GH-45045: [C++][Parquet] Add a benchmark for size_statistics_level #45085

Merged on Jan 8, 2025 (10 commits)

wgtmac (Member) commented Dec 20, 2024

Rationale for this change

Add a benchmark to know the performance of writing different size stats levels.

What changes are included in this PR?

Add a size_stats_benchmark for parquet.

Are these changes tested?

No

Are there any user-facing changes?

No

pitrou (Member) commented Dec 20, 2024

The slowdown on lists is a bit surprising, is it because of levels histograms?

Does the non-list case have nulls?

wgtmac (Member, Author) commented Dec 20, 2024

> The slowdown on lists is a bit surprising, is it because of levels histograms?

Which slowdown do you mean?

  1. T -> List[T] for same level.
  2. Level::None -> Level::ColumnChunk for List[T].

For 1, I think it is due to the explosion in the number of elements.
For 2, I think it is due to the levels histograms, because the string and integer types seem to show a similar regression.

The data size is large enough that most benchmarks run for only 1 iteration, which makes the numbers hard to judge.

> Does the non-list case have nulls?

Yes, the null probability is hard-coded to 50%.
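
For context on the "levels histograms" discussed here: Parquet size statistics can record how many times each definition level and repetition level value occurs. The following is a minimal sketch of such a histogram update, assuming plain `int16_t` levels; the helper name `UpdateLevelHistogram` is hypothetical and this is not the Arrow implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, for illustration only: counts how often each level
// value occurs. `histogram` must have max_level + 1 entries.
void UpdateLevelHistogram(const std::vector<int16_t>& levels,
                          std::vector<int64_t>& histogram) {
  for (int16_t level : levels) {
    assert(level >= 0 && static_cast<size_t>(level) < histogram.size());
    ++histogram[static_cast<size_t>(level)];
  }
}
```

For a top-level nullable primitive column the maximum definition level is 1, so the histogram has two buckets (nulls and non-nulls). The loop does one increment per level, including levels for null slots, so its cost scales with the number of levels rather than with the number of data bytes.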

mapleFU (Member) commented Dec 20, 2024

BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::Int64Type, Compression::ZSTD>                         602237730 ns    602045500 ns            2 bytes_per_second=134.957Mi/s items_per_second=17.4169M/s output_size=46.1176M page_index_size=1.011k
BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type, Compression::ZSTD>                  542915083 ns    542902000 ns            1 bytes_per_second=149.659Mi/s items_per_second=19.3143M/s output_size=46.1177M page_index_size=1.011k
BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type, Compression::ZSTD>           520517541 ns    520418000 ns            1 bytes_per_second=156.124Mi/s items_per_second=20.1487M/s output_size=46.1181M page_index_size=1.417k

The benchmark must be unstable, since the None level comes out slower than the others... 🤔

wgtmac (Member, Author) commented Dec 20, 2024

Yes, the iteration count is 1 in most cases, which is not stable.

wgtmac (Member, Author) commented Dec 20, 2024

I have changed kBenchmarkSize from 10_000_000 to 1_000_000 and below is the result:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                                                 Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::Int64Type, Compression::UNCOMPRESSED>                 199734637 ns    199733786 ns           14 bytes_per_second=40.6791Mi/s items_per_second=5.24987M/s output_size=4.6232M page_index_size=99
BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type, Compression::UNCOMPRESSED>          204370481 ns    204321538 ns           13 bytes_per_second=39.7658Mi/s items_per_second=5.13199M/s output_size=4.62321M page_index_size=99
BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type, Compression::UNCOMPRESSED>   209760009 ns    209680923 ns           13 bytes_per_second=38.7494Mi/s items_per_second=5.00082M/s output_size=4.62325M page_index_size=139
BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::StringType, Compression::UNCOMPRESSED>                231955434 ns    231875167 ns           12 bytes_per_second=39.3514Mi/s items_per_second=4.52216M/s output_size=7.50526M page_index_size=162
BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType, Compression::UNCOMPRESSED>         237242318 ns    236886091 ns           11 bytes_per_second=38.519Mi/s items_per_second=4.4265M/s output_size=7.50528M page_index_size=162
BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType, Compression::UNCOMPRESSED>  223389735 ns    223350000 ns           11 bytes_per_second=40.8534Mi/s items_per_second=4.69477M/s output_size=7.50537M page_index_size=229
BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::Int64Type, Compression::UNCOMPRESSED>                      365880153 ns    365687333 ns            3 bytes_per_second=122.363Mi/s items_per_second=2.86741M/s output_size=43.9176M page_index_size=894
BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type, Compression::UNCOMPRESSED>               443278611 ns    442877333 ns            3 bytes_per_second=101.036Mi/s items_per_second=2.36764M/s output_size=43.9176M page_index_size=894
BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type, Compression::UNCOMPRESSED>        363161333 ns    363023000 ns            2 bytes_per_second=123.261Mi/s items_per_second=2.88846M/s output_size=43.9182M page_index_size=1.54k
BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::StringType, Compression::UNCOMPRESSED>                     519121292 ns    519106000 ns            2 bytes_per_second=143.037Mi/s items_per_second=2.01997M/s output_size=74.8029M page_index_size=1.544k
BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType, Compression::UNCOMPRESSED>              584476896 ns    584031000 ns            2 bytes_per_second=127.136Mi/s items_per_second=1.79541M/s output_size=74.803M page_index_size=1.544k
BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType, Compression::UNCOMPRESSED>       591756459 ns    591446500 ns            2 bytes_per_second=125.542Mi/s items_per_second=1.7729M/s output_size=74.8042M page_index_size=2.588k
BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::Int64Type, Compression::ZSTD>                         182079277 ns    181948818 ns           11 bytes_per_second=44.6554Mi/s items_per_second=5.76303M/s output_size=4.61017M page_index_size=99
BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type, Compression::ZSTD>                  204989156 ns    204887417 ns           12 bytes_per_second=39.6559Mi/s items_per_second=5.11782M/s output_size=4.61019M page_index_size=99
BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type, Compression::ZSTD>           211684645 ns    211571000 ns           12 bytes_per_second=38.4032Mi/s items_per_second=4.95614M/s output_size=4.61023M page_index_size=139
BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::StringType, Compression::ZSTD>                        246236464 ns    246142625 ns            8 bytes_per_second=37.0704Mi/s items_per_second=4.26003M/s output_size=3.51222M page_index_size=162
BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType, Compression::ZSTD>                 253982437 ns    253829125 ns            8 bytes_per_second=35.9478Mi/s items_per_second=4.13103M/s output_size=3.51224M page_index_size=162
BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType, Compression::ZSTD>          236436298 ns    236416571 ns            7 bytes_per_second=38.5955Mi/s items_per_second=4.43529M/s output_size=3.51233M page_index_size=229
BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::Int64Type, Compression::ZSTD>                              409220236 ns    409182000 ns            3 bytes_per_second=109.357Mi/s items_per_second=2.56262M/s output_size=43.3514M page_index_size=894
BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type, Compression::ZSTD>                       385995105 ns    385996500 ns            2 bytes_per_second=115.925Mi/s items_per_second=2.71654M/s output_size=43.3514M page_index_size=894
BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type, Compression::ZSTD>                423469083 ns    422697500 ns            2 bytes_per_second=105.86Mi/s items_per_second=2.48068M/s output_size=43.352M page_index_size=1.54k
BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::StringType, Compression::ZSTD>                             691726084 ns    691472000 ns            1 bytes_per_second=107.381Mi/s items_per_second=1.51644M/s output_size=32.9562M page_index_size=1.544k
BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType, Compression::ZSTD>                      759628875 ns    759026000 ns            1 bytes_per_second=97.8244Mi/s items_per_second=1.38148M/s output_size=32.9563M page_index_size=1.544k
BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType, Compression::ZSTD>               829352332 ns    801603000 ns            1 bytes_per_second=92.6285Mi/s items_per_second=1.3081M/s output_size=32.9575M page_index_size=2.588k

pitrou (Member) commented Dec 20, 2024

Ok, some observations:

  1. Can you arrange for the List benchmarks to be shorter? There are still not enough iterations in these cases IMHO; kBenchmarkSize should probably denote the number of leaf values, not the number of top-level rows.
  2. Are the ZSTD benchmarks useful? They just add the compression overhead; we don't expect any significant insight about page index write speed there.
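
One way to read the first suggestion is to fix the total number of leaf values and derive the list offsets from it, so that primitive and list benchmarks do a comparable amount of work. A minimal sketch, assuming fixed-length lists; the helper name `MakeListOffsets` is hypothetical, and the actual benchmark may generate its lists differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds list offsets so that the lists hold exactly
// `num_leaf_values` values in total. Every list has `list_length` values
// except possibly the last one, which holds the remainder.
std::vector<int32_t> MakeListOffsets(int32_t num_leaf_values, int32_t list_length) {
  assert(num_leaf_values >= 0 && list_length > 0);
  std::vector<int32_t> offsets{0};
  // Use a 64-bit loop variable so `end += list_length` cannot overflow.
  for (int64_t end = list_length; end < num_leaf_values; end += list_length) {
    offsets.push_back(static_cast<int32_t>(end));
  }
  if (num_leaf_values > 0) offsets.push_back(num_leaf_values);
  return offsets;
}
```

With `MakeListOffsets(1024 * 1024, 8)` the result describes 131072 top-level rows but still exactly 1024 * 1024 leaf values, which is the quantity the benchmark size would then denote.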

wgtmac (Member, Author) commented Dec 20, 2024

> are the ZSTD benchmarks useful? they are just adding the compression overhead, we don't expect any significant insight about page index write speed there

I wanted to observe the page index size overhead relative to the file size. It now seems to be pretty trivial, so compression can be removed.

wgtmac (Member, Author) commented Dec 21, 2024

Number of leaf values: 1024 * 1024

------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                      Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------------------------------------------------
BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::Int64Type>                 201108976 ns    201109929 ns           14 bytes_per_second=40.4008Mi/s items_per_second=5.21394M/s output_size=4.6232M page_index_size=99
BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type>          211061192 ns    210837077 ns           13 bytes_per_second=38.5369Mi/s items_per_second=4.97339M/s output_size=4.62321M page_index_size=99
BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type>   205393718 ns    205320000 ns           13 bytes_per_second=39.5724Mi/s items_per_second=5.10703M/s output_size=4.62325M page_index_size=139
BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::StringType>                214885667 ns    214848091 ns           11 bytes_per_second=42.47Mi/s items_per_second=4.88055M/s output_size=7.50526M page_index_size=162
BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType>         226451371 ns    224535909 ns           11 bytes_per_second=40.6376Mi/s items_per_second=4.66997M/s output_size=7.50528M page_index_size=162
BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType>  227898682 ns    227739545 ns           11 bytes_per_second=40.066Mi/s items_per_second=4.60428M/s output_size=7.50537M page_index_size=229
BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::Int64Type>                      206181004 ns    206177800 ns           10 bytes_per_second=41.4084Mi/s items_per_second=5.08579M/s output_size=4.89438M page_index_size=99
BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type>               238991846 ns    238962700 ns           10 bytes_per_second=35.7273Mi/s items_per_second=4.38803M/s output_size=4.89441M page_index_size=99
BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type>        217312880 ns    217313444 ns            9 bytes_per_second=39.2866Mi/s items_per_second=4.82518M/s output_size=4.89448M page_index_size=172
BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::StringType>                     229697459 ns    229696222 ns            9 bytes_per_second=41.5205Mi/s items_per_second=4.56506M/s output_size=7.77632M page_index_size=162
BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType>              236574281 ns    236410750 ns            8 bytes_per_second=40.3413Mi/s items_per_second=4.4354M/s output_size=7.77635M page_index_size=162
BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType>       233096755 ns    233096375 ns            8 bytes_per_second=40.9149Mi/s items_per_second=4.49847M/s output_size=7.77649M page_index_size=280

It is counter-intuitive that PageAndColumnChunk is faster than ColumnChunk for Int64 and List[Int64]

pitrou (Member) commented Dec 21, 2024

Thanks. By the way, can you disable dictionary encoding? Does it use PLAIN encoding?

wgtmac (Member, Author) commented Dec 23, 2024

After disabling dictionary encoding, the results look more reasonable.

------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                      Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------------------------------------------------
BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::Int64Type>                  98489675 ns     98475500 ns           10 bytes_per_second=82.5078Mi/s items_per_second=10.6481M/s output_size=4.32785M page_index_size=99
BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type>          113836396 ns    113687200 ns           10 bytes_per_second=71.468Mi/s items_per_second=9.22334M/s output_size=4.32787M page_index_size=99
BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type>   116892034 ns    116822900 ns           10 bytes_per_second=69.5497Mi/s items_per_second=8.97577M/s output_size=4.32791M page_index_size=139
BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::StringType>                210575609 ns    210545846 ns           13 bytes_per_second=43.3378Mi/s items_per_second=4.98027M/s output_size=7.47408M page_index_size=162
BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType>         211015493 ns    211015417 ns           12 bytes_per_second=43.2414Mi/s items_per_second=4.96919M/s output_size=7.4741M page_index_size=162
BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType>  211377809 ns    211377250 ns           12 bytes_per_second=43.1674Mi/s items_per_second=4.96069M/s output_size=7.47419M page_index_size=229
BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::Int64Type>                      148771942 ns    148768100 ns           10 bytes_per_second=57.388Mi/s items_per_second=7.04839M/s output_size=4.59879M page_index_size=99
BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type>               214763756 ns    214764231 ns           13 bytes_per_second=39.7529Mi/s items_per_second=4.88245M/s output_size=4.59881M page_index_size=99
BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type>        217214279 ns    217205077 ns           13 bytes_per_second=39.3062Mi/s items_per_second=4.82759M/s output_size=4.59889M page_index_size=172
BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::StringType>                     220088534 ns    220089000 ns           10 bytes_per_second=43.333Mi/s items_per_second=4.76433M/s output_size=7.74504M page_index_size=162
BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType>              228813218 ns    228813111 ns            9 bytes_per_second=41.6808Mi/s items_per_second=4.58267M/s output_size=7.74507M page_index_size=162
BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType>       231646792 ns    231642333 ns            9 bytes_per_second=41.1717Mi/s items_per_second=4.5267M/s output_size=7.74521M page_index_size=279

wgtmac (Member, Author) commented Jan 8, 2025

Do you want me to change anything? @pitrou

From the result, I don't think we should enable size stats by default. cc @emkornfield

emkornfield (Contributor) commented

Thanks for the benchmarks. Am I reading this correctly: there is about a 30% regression for plain-encoded integers, but otherwise it does not add substantial overhead? If something like LZ4 raw is used for compression, do we still see such a large regression for integers? (I'd assume most people are doing at least some form of lightweight compression.)
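
To make the "about 30%" figure concrete, the throughput drop can be computed from the `bytes_per_second` counters of the dictionary-disabled run. A small sketch of that arithmetic follows; the helper name `ThroughputDropPercent` is hypothetical, and which rows the estimate refers to is an assumption, since the comment does not say.

```cpp
#include <cassert>

// Percentage by which throughput falls when size statistics are enabled.
// Both arguments must use the same unit (here: MiB/s).
double ThroughputDropPercent(double baseline, double with_stats) {
  assert(baseline > 0.0);
  return (1.0 - with_stats / baseline) * 100.0;
}

// Values copied from the dictionary-disabled run (None vs ColumnChunk):
// Primitive Int64: ThroughputDropPercent(82.5078, 71.468)  ~ 13.4
// List Int64:      ThroughputDropPercent(57.388, 39.7529)  ~ 30.7
```

Under that reading, the roughly 30% drop corresponds to the List[Int64] case, while primitive Int64 drops by roughly 13%.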

wgtmac (Member, Author) commented Jan 8, 2025

@emkornfield I did an experiment with zstd before: #45085 (comment)


// This should result in multiple pages for most primitive types
constexpr int64_t kBenchmarkSize = 1024 * 1024;
constexpr double kNullProbability = 0.5;
Member

This is probably a worst case for definition levels encoding. Perhaps 0.9 or 0.95 would exhibit the size statistics overhead even more?
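
Definition levels are written with Parquet's RLE/bit-packing hybrid encoding, so how often the null flag flips between neighbouring values matters. For independent nulls with probability p, the expected number of runs among n values is 1 + 2(n - 1)p(1 - p), which peaks at p = 0.5. A small sketch of that arithmetic; the independence assumption and the helper name `ExpectedRuns` are illustrative and not taken from the benchmark.

```cpp
#include <cassert>
#include <cstdint>

// Expected number of runs of equal values among `n` independent draws with
// null probability `p`: the first value starts a run, and each of the
// remaining n - 1 positions starts a new run with probability 2p(1 - p).
double ExpectedRuns(int64_t n, double p) {
  assert(n > 0 && p >= 0.0 && p <= 1.0);
  return 1.0 + 2.0 * static_cast<double>(n - 1) * p * (1.0 - p);
}
```

For n = 1024 * 1024, p = 0.5 gives roughly 524,000 runs (average run length about 2), while p = 0.95 gives roughly 99,600 runs (average run length about 10.5). Longer runs should make level encoding cheaper, which would leave per-level work such as histogram updates as a larger share of the total write time.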

Member Author

kNullProbability = 0.95

------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                      Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------------------------------------------------
BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::Int64Type>                  61819253 ns     61819750 ns           20 bytes_per_second=131.43Mi/s items_per_second=16.9618M/s output_size=546.091k page_index_size=33
BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type>           55140742 ns     54834400 ns           10 bytes_per_second=148.173Mi/s items_per_second=19.1226M/s output_size=546.107k page_index_size=33
BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type>    53924121 ns     53887400 ns           10 bytes_per_second=150.777Mi/s items_per_second=19.4586M/s output_size=546.121k page_index_size=47
BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::StringType>                 51773791 ns     51774500 ns           10 bytes_per_second=89.4236Mi/s items_per_second=20.2527M/s output_size=864.083k page_index_size=30
BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType>          65489058 ns     65488500 ns           10 bytes_per_second=70.6973Mi/s items_per_second=16.0116M/s output_size=864.103k page_index_size=30
BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType>   65241288 ns     65241300 ns           10 bytes_per_second=70.9652Mi/s items_per_second=16.0723M/s output_size=864.122k page_index_size=44
BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::Int64Type>                       72174783 ns     72174900 ns           10 bytes_per_second=118.289Mi/s items_per_second=14.5283M/s output_size=625.915k page_index_size=34
BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type>               102759675 ns    102760300 ns           10 bytes_per_second=83.0817Mi/s items_per_second=10.2041M/s output_size=625.937k page_index_size=34
BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type>        105034546 ns    105034000 ns           10 bytes_per_second=81.2832Mi/s items_per_second=9.98321M/s output_size=625.957k page_index_size=54
BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::StringType>                      92049333 ns     92049200 ns           10 bytes_per_second=54.779Mi/s items_per_second=11.3915M/s output_size=944.123k page_index_size=31
BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType>              122477704 ns    122478100 ns           10 bytes_per_second=41.1695Mi/s items_per_second=8.56133M/s output_size=944.149k page_index_size=31
BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType>       121217775 ns    121217000 ns           10 bytes_per_second=41.5978Mi/s items_per_second=8.6504M/s output_size=944.174k page_index_size=51

github-actions bot added the "awaiting committer review" label and removed the "awaiting review" label on Jan 8, 2025
pitrou (Member) commented Jan 8, 2025

Sorry for the delay, @wgtmac. I posted a small suggestion, but this looks good to me.

Later, it would be good to investigate the cause of the overhead. It is surprising that there is a larger overhead for Int64 than for String. The numbers you posted in #45085 (comment) let me compute the following overheads (in ns/item):

|        | Primitive | List |
|--------|-----------|------|
| Int64  | 17.5      | 65   |
| String | 0.8       | 11   |
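
These ns/item figures appear to be reproducible by subtracting the `None` wall time from the `PageAndColumnChunk` wall time and dividing by the 1024 * 1024 leaf values. The formula is inferred from the numbers rather than stated in the thread, and the helper name `OverheadNsPerItem` is hypothetical. A minimal sketch using the timings from the dictionary-disabled run:

```cpp
#include <cassert>
#include <cstdint>

// Per-item overhead in nanoseconds: extra wall time with size statistics
// enabled, divided by the number of leaf values written.
double OverheadNsPerItem(int64_t baseline_ns, int64_t with_stats_ns,
                         int64_t num_items) {
  assert(num_items > 0);
  return static_cast<double>(with_stats_ns - baseline_ns) /
         static_cast<double>(num_items);
}

constexpr int64_t kNumItems = 1024 * 1024;  // leaf values per benchmark run

// Times copied from the dictionary-disabled run (None vs PageAndColumnChunk):
// Primitive Int64:  OverheadNsPerItem(98489675, 116892034, kNumItems)   ~ 17.5
// List Int64:       OverheadNsPerItem(148771942, 217214279, kNumItems)  ~ 65.3
// Primitive String: OverheadNsPerItem(210575609, 211377809, kNumItems)  ~ 0.8
// List String:      OverheadNsPerItem(220088534, 231646792, kNumItems)  ~ 11.0
```

Expressing the overhead per item rather than as a percentage makes the Int64 and String cases directly comparable, since the baseline write times differ by type.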

pitrou (Member) commented Jan 8, 2025

Oh, wait, the benchmark is broken. Will push a fix to get more reasonable numbers.

pitrou (Member) commented Jan 8, 2025

Ok, I now get these numbers locally:

------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                      Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------------------------------------------------
BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::Int64Type>                   8213595 ns      8209893 ns           84 bytes_per_second=989.66Mi/s items_per_second=127.721M/s output_size=537.472k page_index_size=33
BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type>            9757681 ns      9753633 ns           72 bytes_per_second=833.023Mi/s items_per_second=107.506M/s output_size=537.488k page_index_size=33
BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type>     9768503 ns      9764686 ns           72 bytes_per_second=832.08Mi/s items_per_second=107.385M/s output_size=537.502k page_index_size=47

BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::StringType>                 10229574 ns     10226100 ns           68 bytes_per_second=451.728Mi/s items_per_second=102.539M/s output_size=848.305k page_index_size=34
BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType>          11847455 ns     11843439 ns           55 bytes_per_second=390.04Mi/s items_per_second=88.5364M/s output_size=848.325k page_index_size=34
BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType>   11808794 ns     11804771 ns           59 bytes_per_second=391.318Mi/s items_per_second=88.8265M/s output_size=848.344k page_index_size=48

BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::Int64Type>                       13141463 ns     13130383 ns           53 bytes_per_second=650.21Mi/s items_per_second=79.8588M/s output_size=617.464k page_index_size=34
BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type>                16485608 ns     16472720 ns           43 bytes_per_second=518.281Mi/s items_per_second=63.6553M/s output_size=617.486k page_index_size=34
BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type>         16315535 ns     16302010 ns           43 bytes_per_second=523.709Mi/s items_per_second=64.3219M/s output_size=617.506k page_index_size=54

BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::StringType>                      15384749 ns     15370516 ns           46 bytes_per_second=327.375Mi/s items_per_second=68.22M/s output_size=927.326k page_index_size=35
BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType>               19044082 ns     19027424 ns           36 bytes_per_second=264.456Mi/s items_per_second=55.1087M/s output_size=927.352k page_index_size=35
BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType>        18860937 ns     18843797 ns           37 bytes_per_second=267.033Mi/s items_per_second=55.6457M/s output_size=927.377k page_index_size=55

pitrou (Member) commented Jan 8, 2025

And now the size statistics overhead in ns/item is more consistent (and smaller!):

|        | Primitive | List |
|--------|-----------|------|
| Int64  | 1.48      | 3.04 |
| String | 1.50      | 3.22 |

wgtmac (Member, Author) commented Jan 8, 2025

What about the null probability? 0.5 or 0.95?

pitrou (Member) commented Jan 8, 2025

My latest push makes it 0.95. Can you check that the changes look ok to you?

wgtmac (Member, Author) commented Jan 8, 2025

Yes, it looks good. Thanks!

pitrou (Member) left a comment

LGTM, will merge. Thanks @wgtmac !

pitrou merged commit 0804ba6 into apache:main on Jan 8, 2025
34 of 35 checks passed
pitrou removed the "awaiting committer review" label on Jan 8, 2025
github-actions bot added the "awaiting committer review" label on Jan 8, 2025
pitrou added a commit that referenced this pull request Jan 9, 2025
…stics (#45202)

### Rationale for this change

We found out in #45085 that there is a non-trivial overhead when writing size statistics is enabled.

### What changes are included in this PR?

Dramatically reduce overhead by speeding up def/rep levels histogram updates.

Performance results on the author's machine:
```
------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                      Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------------------------------------------------
BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::Int64Type>                   8103053 ns      8098569 ns           86 bytes_per_second=1003.26Mi/s items_per_second=129.477M/s output_size=537.472k page_index_size=33
BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type>            8153499 ns      8148492 ns           86 bytes_per_second=997.117Mi/s items_per_second=128.683M/s output_size=537.488k page_index_size=33
BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type>     8212560 ns      8207754 ns           83 bytes_per_second=989.918Mi/s items_per_second=127.754M/s output_size=537.502k page_index_size=47

BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::StringType>                 10405020 ns     10400775 ns           67 bytes_per_second=444.142Mi/s items_per_second=100.817M/s output_size=848.305k page_index_size=34
BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType>          10464784 ns     10460778 ns           66 bytes_per_second=441.594Mi/s items_per_second=100.239M/s output_size=848.325k page_index_size=34
BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType>   10469832 ns     10465739 ns           67 bytes_per_second=441.385Mi/s items_per_second=100.191M/s output_size=848.344k page_index_size=48

BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::Int64Type>                       13004962 ns     12992678 ns           52 bytes_per_second=657.101Mi/s items_per_second=80.7052M/s output_size=617.464k page_index_size=34
BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type>                13718352 ns     13705599 ns           50 bytes_per_second=622.921Mi/s items_per_second=76.5071M/s output_size=617.486k page_index_size=34
BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type>         13845553 ns     13832138 ns           52 bytes_per_second=617.222Mi/s items_per_second=75.8072M/s output_size=617.506k page_index_size=54

BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::StringType>                      15715263 ns     15702707 ns           44 bytes_per_second=320.449Mi/s items_per_second=66.7768M/s output_size=927.326k page_index_size=35
BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType>               16507328 ns     16493800 ns           43 bytes_per_second=305.079Mi/s items_per_second=63.5739M/s output_size=927.352k page_index_size=35
BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType>        16575359 ns     16561311 ns           42 bytes_per_second=303.836Mi/s items_per_second=63.3148M/s output_size=927.377k page_index_size=55
```

Performance results without this PR:
```
------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                      Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------------------------------------------------
BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::Int64Type>                   8042576 ns      8037678 ns           87 bytes_per_second=1010.86Mi/s items_per_second=130.458M/s output_size=537.472k page_index_size=33
BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type>            9576627 ns      9571279 ns           73 bytes_per_second=848.894Mi/s items_per_second=109.554M/s output_size=537.488k page_index_size=33
BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type>     9570204 ns      9563595 ns           73 bytes_per_second=849.576Mi/s items_per_second=109.642M/s output_size=537.502k page_index_size=47

BM_WritePrimitiveColumn<SizeStatisticsLevel::None, ::arrow::StringType>                 10165397 ns     10160868 ns           69 bytes_per_second=454.628Mi/s items_per_second=103.197M/s output_size=848.305k page_index_size=34
BM_WritePrimitiveColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType>          11662568 ns     11657396 ns           60 bytes_per_second=396.265Mi/s items_per_second=89.9494M/s output_size=848.325k page_index_size=34
BM_WritePrimitiveColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType>   11657135 ns     11653063 ns           60 bytes_per_second=396.412Mi/s items_per_second=89.9829M/s output_size=848.344k page_index_size=48

BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::Int64Type>                       13182006 ns     13168704 ns           51 bytes_per_second=648.318Mi/s items_per_second=79.6264M/s output_size=617.464k page_index_size=34
BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::Int64Type>                16438205 ns     16421762 ns           43 bytes_per_second=519.89Mi/s items_per_second=63.8528M/s output_size=617.486k page_index_size=34
BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::Int64Type>         16424615 ns     16409032 ns           42 bytes_per_second=520.293Mi/s items_per_second=63.9024M/s output_size=617.506k page_index_size=54

BM_WriteListColumn<SizeStatisticsLevel::None, ::arrow::StringType>                      15387808 ns     15373086 ns           46 bytes_per_second=327.32Mi/s items_per_second=68.2086M/s output_size=927.326k page_index_size=35
BM_WriteListColumn<SizeStatisticsLevel::ColumnChunk, ::arrow::StringType>               18319628 ns     18302938 ns           37 bytes_per_second=274.924Mi/s items_per_second=57.29M/s output_size=927.352k page_index_size=35
BM_WriteListColumn<SizeStatisticsLevel::PageAndColumnChunk, ::arrow::StringType>        18346665 ns     18329336 ns           37 bytes_per_second=274.528Mi/s items_per_second=57.2075M/s output_size=927.377k page_index_size=55
```

### Are these changes tested?

Tested by existing tests, validated by existing benchmarks.

### Are there any user-facing changes?

No.

* GitHub Issue: #45201

Authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 0804ba6.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them.
