------------------------------------------------------------------------
r80 | [email protected] | 2013-08-13 14:55:00 +0200 (Tue, 13 Aug 2013) | 6 lines
Add autoconf tests for size_t and ssize_t. Sort-of resolves public issue 79;
it would solve the problem if MSVC typically used autoconf. However, it gives
a natural place (config.h) to put the typedef even for MSVC.
R=jsbell
------------------------------------------------------------------------
r79 | [email protected] | 2013-07-29 13:06:44 +0200 (Mon, 29 Jul 2013) | 14 lines
When we compare the number of bytes produced with the offset for a
backreference, make the signedness of the bytes produced clear,
by sticking it into a size_t. This avoids a signed/unsigned compare
warning from MSVC (public issue 71), and also is slightly clearer.
Since the line is now so long the explanatory comment about the -1u
trick has to go somewhere else anyway, I used the opportunity to
explain it in slightly more detail.
This is a purely stylistic change; the emitted assembler from GCC
is identical.
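
As an illustration of the -1u trick mentioned above, here is a minimal sketch. The helper name OffsetIsValid and its signature are hypothetical and are not taken from the library; the point is only that one unsigned comparison can cover both the zero-offset case and the offset-too-large case.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not the library's actual code). A backreference is
// valid when 1 <= offset <= produced, where `produced` is the number of
// output bytes written so far.
inline bool OffsetIsValid(std::size_t produced, std::size_t offset) {
  // When offset == 0, offset - 1u wraps around to the largest size_t value,
  // so one unsigned comparison rejects both offset == 0 and offset > produced.
  return offset - 1u < produced;
}

// Small self-check exercising the boundary cases.
inline void CheckOffsetIsValid() {
  assert(!OffsetIsValid(10, 0));   // a zero offset is never valid
  assert(OffsetIsValid(10, 1));
  assert(OffsetIsValid(10, 10));   // may reach back to the first output byte
  assert(!OffsetIsValid(10, 11));  // would read before the start of the output
}
```

Keeping both operands as size_t is what makes the signedness explicit and avoids the mixed signed/unsigned comparison.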
R=jeff
------------------------------------------------------------------------
r78 | [email protected] | 2013-06-30 21:24:03 +0200 (Sun, 30 Jun 2013) | 111 lines
In the fast path for decompressing literals, instead of checking
whether there's 16 bytes free and then checking right afterwards
(when having subtracted the literal size) that there are now
5 bytes free, just check once for 21 bytes. This skips a compare
and a branch; although it is easily predictable, it is still
a few cycles on a fast path that we would like to get rid of.
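
A minimal sketch of the combined check, assuming the 16 and 5 refer to input bytes available to the fast path. The function name, signature and constants below are hypothetical and only illustrate the idea; they are not the library's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t kFastCopyBytes = 16;  // bytes the fast path always copies
constexpr std::size_t kNextTagBytes = 5;    // input reserved for the next tag

// Hypothetical fast path (names and signature are illustrative only).
// Returns true and copies 16 bytes if the literal is short and a single
// combined bounds check (16 + 5 = 21 input bytes) passes.
inline bool TryFastLiteral(const char* in, std::size_t in_available,
                           char* out, std::size_t out_space,
                           std::size_t literal_len) {
  assert(in != nullptr && out != nullptr);
  if (literal_len <= kFastCopyBytes &&
      in_available >= kFastCopyBytes + kNextTagBytes &&
      out_space >= kFastCopyBytes) {
    // Copies a fixed 16 bytes; only the first literal_len are meaningful.
    std::memcpy(out, in, kFastCopyBytes);
    return true;
  }
  return false;  // caller falls back to the slower, exact path
}
```

Because literal_len is at most 16 on this path, having 21 input bytes up front implies that at least 5 remain after the literal, so the second check becomes redundant.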
Benchmarking this yields very confusing results. On open-source
GCC 4.8.1 on Haswell, we get exactly the expected results; the
benchmarks where we hit the fast path for literals (in particular
the two HTML benchmarks and the protobuf benchmark) give very nice
speedups, and the others are not really affected.
However, benchmarks with Google's GCC branch on other hardware
are much less clear. It seems that we have a weak loss in some cases
(and the win for the “typical” win cases is not nearly as clear),
but that it depends on microarchitecture and plain luck in how we run
the benchmark. Looking at the generated assembler, it seems that
the removal of the if causes other large-scale changes in how the
function is laid out, which makes it likely that this is just bad luck.
Thus, we should keep this change, even though its exact current impact is
unclear; it's a sensible change per se, and dropping it on the basis of
microoptimization for a given compiler (or even branch of a compiler)
would seem like a bad strategy in the long run.
Microbenchmark results (all in 64-bit, opt mode):
Nehalem, Google GCC:
Benchmark Base (ns) New (ns) Throughput Input Improvement
------------------------------------------------------------------------------
BM_UFlat/0 76747 75591 1.3GB/s html +1.5%
BM_UFlat/1 765756 757040 886.3MB/s urls +1.2%
BM_UFlat/2 10867 10893 10.9GB/s jpg -0.2%
BM_UFlat/3 124 131 1.4GB/s jpg_200 -5.3%
BM_UFlat/4 31663 31596 2.8GB/s pdf +0.2%
BM_UFlat/5 314162 308176 1.2GB/s html4 +1.9%
BM_UFlat/6 29668 29746 790.6MB/s cp -0.3%
BM_UFlat/7 12958 13386 796.4MB/s c -3.2%
BM_UFlat/8 3596 3682 966.0MB/s lsp -2.3%
BM_UFlat/9 1019193 1033493 953.3MB/s xls -1.4%
BM_UFlat/10 239 247 775.3MB/s xls_200 -3.2%
BM_UFlat/11 236411 240271 606.9MB/s txt1 -1.6%
BM_UFlat/12 206639 209768 571.2MB/s txt2 -1.5%
BM_UFlat/13 627803 635722 641.4MB/s txt3 -1.2%
BM_UFlat/14 845932 857816 538.2MB/s txt4 -1.4%
BM_UFlat/15 402107 391670 1.2GB/s bin +2.7%
BM_UFlat/16 283 279 683.6MB/s bin_200 +1.4%
BM_UFlat/17 46070 46815 781.5MB/s sum -1.6%
BM_UFlat/18 5053 5163 782.0MB/s man -2.1%
BM_UFlat/19 79721 76581 1.4GB/s pb +4.1%
BM_UFlat/20 251158 252330 697.5MB/s gaviota -0.5%
Sum of all benchmarks 4966150 4980396 -0.3%
Sandy Bridge, Google GCC:
Benchmark Base (ns) New (ns) Throughput Input Improvement
------------------------------------------------------------------------------
BM_UFlat/0 42850 42182 2.3GB/s html +1.6%
BM_UFlat/1 525660 515816 1.3GB/s urls +1.9%
BM_UFlat/2 7173 7283 16.3GB/s jpg -1.5%
BM_UFlat/3 92 91 2.1GB/s jpg_200 +1.1%
BM_UFlat/4 15147 14872 5.9GB/s pdf +1.8%
BM_UFlat/5 199936 192116 2.0GB/s html4 +4.1%
BM_UFlat/6 12796 12443 1.8GB/s cp +2.8%
BM_UFlat/7 6588 6400 1.6GB/s c +2.9%
BM_UFlat/8 2010 1951 1.8GB/s lsp +3.0%
BM_UFlat/9 761124 763049 1.3GB/s xls -0.3%
BM_UFlat/10 186 189 1016.1MB/s xls_200 -1.6%
BM_UFlat/11 159354 158460 918.6MB/s txt1 +0.6%
BM_UFlat/12 139732 139950 856.1MB/s txt2 -0.2%
BM_UFlat/13 429917 425027 961.7MB/s txt3 +1.2%
BM_UFlat/14 585255 587324 785.8MB/s txt4 -0.4%
BM_UFlat/15 276186 266173 1.8GB/s bin +3.8%
BM_UFlat/16 205 207 925.5MB/s bin_200 -1.0%
BM_UFlat/17 24925 24935 1.4GB/s sum -0.0%
BM_UFlat/18 2632 2576 1.5GB/s man +2.2%
BM_UFlat/19 40546 39108 2.8GB/s pb +3.7%
BM_UFlat/20 175803 168209 1048.9MB/s gaviota +4.5%
Sum of all benchmarks 3408117 3368361 +1.2%
Haswell, upstream GCC 4.8.1:
Benchmark Base (ns) New (ns) Throughput Input Improvement
------------------------------------------------------------------------------
BM_UFlat/0 46308 40641 2.3GB/s html +13.9%
BM_UFlat/1 513385 514706 1.3GB/s urls -0.3%
BM_UFlat/2 6197 6151 19.2GB/s jpg +0.7%
BM_UFlat/3 61 61 3.0GB/s jpg_200 +0.0%
BM_UFlat/4 13551 13429 6.5GB/s pdf +0.9%
BM_UFlat/5 198317 190243 2.0GB/s html4 +4.2%
BM_UFlat/6 14768 12560 1.8GB/s cp +17.6%
BM_UFlat/7 6453 6447 1.6GB/s c +0.1%
BM_UFlat/8 1991 1980 1.8GB/s lsp +0.6%
BM_UFlat/9 766947 770424 1.2GB/s xls -0.5%
BM_UFlat/10 170 169 1.1GB/s xls_200 +0.6%
BM_UFlat/11 164350 163554 888.7MB/s txt1 +0.5%
BM_UFlat/12 145444 143830 832.1MB/s txt2 +1.1%
BM_UFlat/13 437849 438413 929.2MB/s txt3 -0.1%
BM_UFlat/14 603587 605309 759.8MB/s txt4 -0.3%
BM_UFlat/15 249799 248067 1.9GB/s bin +0.7%
BM_UFlat/16 191 188 1011.4MB/s bin_200 +1.6%
BM_UFlat/17 26064 24778 1.4GB/s sum +5.2%
BM_UFlat/18 2620 2601 1.5GB/s man +0.7%
BM_UFlat/19 44551 37373 3.0GB/s pb +19.2%
BM_UFlat/20 165408 164584 1.0GB/s gaviota +0.5%
Sum of all benchmarks 3408011 3385508 +0.7%
------------------------------------------------------------------------
r77 | [email protected] | 2013-06-14 23:42:26 +0200 (Fri, 14 Jun 2013) | 92 lines
Make the two IncrementalCopy* functions take in an ssize_t instead of a len,
in order to avoid having to do 32-to-64-bit signed conversions on a hot path
during decompression. (Also fixes some MSVC warnings, mentioned in public
issue 75, but more of those remain.) They cannot be size_t because we expect
them to go negative and test for that.
This saves a few movzwl instructions, yielding ~2% speedup in decompression.
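
To show why the length has to be signed, here is a simplified sketch. It is not the library's IncrementalCopy: it uses std::ptrdiff_t as a stand-in for ssize_t to stay within standard C++, it assumes non-overlapping buffers, and the name ChunkedCopy is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical sketch. Copies at least `len` bytes in 8-byte chunks.
// Assumptions: the buffers do not overlap, and both have room for up to
// 7 bytes beyond `len`, since the last chunk may overshoot.
inline void ChunkedCopy(const char* src, char* dst, std::ptrdiff_t len) {
  assert(len > 0);
  while (len > 0) {
    std::memcpy(dst, src, 8);
    src += 8;
    dst += 8;
    len -= 8;  // may go negative on the last chunk; the loop test relies on it
  }
}
```

With an unsigned length, the final subtraction would wrap around to a huge value and the loop would not terminate where intended, which is why size_t does not work here.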
Sandy Bridge:
Benchmark Base (ns) New (ns) Throughput Input Improvement
-------------------------------------------------------------------------------------------------
BM_UFlat/0 48009 41283 2.3GB/s html +16.3%
BM_UFlat/1 531274 513419 1.3GB/s urls +3.5%
BM_UFlat/2 7378 7062 16.8GB/s jpg +4.5%
BM_UFlat/3 92 92 2.0GB/s jpg_200 +0.0%
BM_UFlat/4 15057 14974 5.9GB/s pdf +0.6%
BM_UFlat/5 204323 193140 2.0GB/s html4 +5.8%
BM_UFlat/6 13282 12611 1.8GB/s cp +5.3%
BM_UFlat/7 6511 6504 1.6GB/s c +0.1%
BM_UFlat/8 2014 2030 1.7GB/s lsp -0.8%
BM_UFlat/9 775909 768336 1.3GB/s xls +1.0%
BM_UFlat/10 182 184 1043.2MB/s xls_200 -1.1%
BM_UFlat/11 167352 161630 901.2MB/s txt1 +3.5%
BM_UFlat/12 147393 142246 842.8MB/s txt2 +3.6%
BM_UFlat/13 449960 432853 944.4MB/s txt3 +4.0%
BM_UFlat/14 620497 594845 775.9MB/s txt4 +4.3%
BM_UFlat/15 265610 267356 1.8GB/s bin -0.7%
BM_UFlat/16 206 205 932.7MB/s bin_200 +0.5%
BM_UFlat/17 25561 24730 1.4GB/s sum +3.4%
BM_UFlat/18 2620 2644 1.5GB/s man -0.9%
BM_UFlat/19 45766 38589 2.9GB/s pb +18.6%
BM_UFlat/20 171107 169832 1039.5MB/s gaviota +0.8%
Sum of all benchmarks 3500103 3394565 +3.1%
Westmere:
Benchmark Base (ns) New (ns) Throughput Input Improvement
-------------------------------------------------------------------------------------------------
BM_UFlat/0 72624 71526 1.3GB/s html +1.5%
BM_UFlat/1 735821 722917 930.8MB/s urls +1.8%
BM_UFlat/2 10450 10172 11.7GB/s jpg +2.7%
BM_UFlat/3 117 117 1.6GB/s jpg_200 +0.0%
BM_UFlat/4 29817 29648 3.0GB/s pdf +0.6%
BM_UFlat/5 297126 293073 1.3GB/s html4 +1.4%
BM_UFlat/6 28252 27994 842.0MB/s cp +0.9%
BM_UFlat/7 12672 12391 862.1MB/s c +2.3%
BM_UFlat/8 3507 3425 1040.9MB/s lsp +2.4%
BM_UFlat/9 1004268 969395 1018.0MB/s xls +3.6%
BM_UFlat/10 233 227 844.8MB/s xls_200 +2.6%
BM_UFlat/11 230054 224981 647.8MB/s txt1 +2.3%
BM_UFlat/12 201229 196447 610.5MB/s txt2 +2.4%
BM_UFlat/13 609547 596761 685.3MB/s txt3 +2.1%
BM_UFlat/14 824362 804821 573.8MB/s txt4 +2.4%
BM_UFlat/15 371095 374899 1.3GB/s bin -1.0%
BM_UFlat/16 267 267 717.8MB/s bin_200 +0.0%
BM_UFlat/17 44623 43828 835.9MB/s sum +1.8%
BM_UFlat/18 5077 4815 841.0MB/s man +5.4%
BM_UFlat/19 74964 73210 1.5GB/s pb +2.4%
BM_UFlat/20 237987 236745 746.0MB/s gaviota +0.5%
Sum of all benchmarks 4794092 4697659 +2.1%
Istanbul:
Benchmark Base (ns) New (ns) Throughput Input Improvement
-------------------------------------------------------------------------------------------------
BM_UFlat/0 98614 96376 1020.4MB/s html +2.3%
BM_UFlat/1 963740 953241 707.2MB/s urls +1.1%
BM_UFlat/2 25042 24769 4.8GB/s jpg +1.1%
BM_UFlat/3 180 180 1065.6MB/s jpg_200 +0.0%
BM_UFlat/4 45942 45403 1.9GB/s pdf +1.2%
BM_UFlat/5 400135 390226 1008.2MB/s html4 +2.5%
BM_UFlat/6 37768 37392 631.9MB/s cp +1.0%
BM_UFlat/7 18585 18200 588.2MB/s c +2.1%
BM_UFlat/8 5751 5690 627.7MB/s lsp +1.1%
BM_UFlat/9 1543154 1542209 641.4MB/s xls +0.1%
BM_UFlat/10 381 388 494.6MB/s xls_200 -1.8%
BM_UFlat/11 339715 331973 440.1MB/s txt1 +2.3%
BM_UFlat/12 294807 289418 415.4MB/s txt2 +1.9%
BM_UFlat/13 906160 884094 463.3MB/s txt3 +2.5%
BM_UFlat/14 1224221 1198435 386.1MB/s txt4 +2.2%
BM_UFlat/15 516277 502923 979.5MB/s bin +2.7%
BM_UFlat/16 405 402 477.2MB/s bin_200 +0.7%
BM_UFlat/17 61640 60621 605.6MB/s sum +1.7%
BM_UFlat/18 7326 7383 549.5MB/s man -0.8%
BM_UFlat/19 94720 92653 1.2GB/s pb +2.2%
BM_UFlat/20 360435 346687 510.6MB/s gaviota +4.0%
Sum of all benchmarks 6944998 6828663 +1.7%
------------------------------------------------------------------------
r76 | [email protected] | 2013-06-13 18:19:52 +0200 (Thu, 13 Jun 2013) | 9 lines
Add support for uncompressing to iovecs (scatter I/O).
Windows does not have struct iovec defined anywhere,
so we define our own version that's equal to what UNIX
typically has.
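
For readers unfamiliar with scatter I/O, the following sketch shows the general shape of such a structure and how bytes are spread over several buffers. The struct name IoVec and the helper ScatterBytes are hypothetical; the field names mirror what UNIX systems typically use, but this is not the definition added by the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in mirroring the usual UNIX struct iovec layout.
struct IoVec {
  void* iov_base;       // start of this output buffer
  std::size_t iov_len;  // capacity of this buffer in bytes
};

// Scatters `len` bytes from `data` across the buffers, filling them in order.
// Returns false if the buffers are too small; in that case the bytes that
// did fit have already been written.
inline bool ScatterBytes(const char* data, std::size_t len,
                         const IoVec* iov, std::size_t iov_cnt) {
  assert(data != nullptr || len == 0);
  assert(iov != nullptr || iov_cnt == 0);
  for (std::size_t i = 0; i < iov_cnt && len > 0; ++i) {
    const std::size_t n = len < iov[i].iov_len ? len : iov[i].iov_len;
    std::memcpy(iov[i].iov_base, data, n);
    data += n;
    len -= n;
  }
  return len == 0;
}
```

A real decompressor would additionally have to handle backreferences that span buffer boundaries, which this sketch does not attempt.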
The bulk of this patch was contributed by Mohit Aron.
R=jeff
------------------------------------------------------------------------
r75 | [email protected] | 2013-06-12 21:51:15 +0200 (Wed, 12 Jun 2013) | 4 lines
Some code reorganization needed for an internal change.
R=fikes
------------------------------------------------------------------------
r74 | [email protected] | 2013-04-09 17:33:30 +0200 (Tue, 09 Apr 2013) | 4 lines
Supports truncated test data in zippy benchmark.
R=sesse
------------------------------------------------------------------------
r73 | [email protected] | 2013-02-05 15:36:15 +0100 (Tue, 05 Feb 2013) | 4 lines
Release Snappy 1.1.0.
R=sanjay
------------------------------------------------------------------------
r72 | [email protected] | 2013-02-05 15:30:05 +0100 (Tue, 05 Feb 2013) | 9 lines
Make ./snappy_unittest pass without "srcdir" being defined.
Previously, snappy_unittests would read from an absolute path /testdata/..;
convert it to use a relative path instead.
Patch from Marc-Antoine Ruel.
R=maruel
------------------------------------------------------------------------
r71 | [email protected] | 2013-01-18 13:16:36 +0100 (Fri, 18 Jan 2013) | 287 lines
Increase the Zippy block size from 32 kB to 64 kB, winning ~3% density
while being effectively performance neutral.
The longer story about density is that we win 3-6% density on the benchmarks
where this has any effect at all; many of the benchmarks (cp, c, lsp, man)
are smaller than 32 kB and thus will have no effect. Binary data also seems
to win little or nothing; of course, the already-compressed data wins nothing.
The protobuf benchmark wins as much as ~18% depending on architecture,
but I wouldn't be too sure that this is representative of protobuf data in
general.
As for performance, we lose a tiny amount since we get more tags (e.g., a long
literal might be broken up into literal-copy-literal), but we win it back with
less clearing of the hash table, and more opportunities to skip incompressible
data (e.g. in the jpg benchmark). Decompression seems to get ever so slightly
slower, again due to more tags. The total net change is about as close to zero
as we can get, so the end effect seems to be simply more density and no
real performance change.
The comment about not changing kBlockSize, scary as it is, is not really
relevant, since we're never going to have a block-level decompressor without
explicitly marked blocks. Replace it with something more appropriate.
This affects the framing format, but it's okay to change it since it basically
has no users yet.
Density (note that cp, c, lsp and man are all smaller than 32 kB):
Benchmark Description Base (%) New (%) Improvement
--------------------------------------------------------------
ZFlat/0 html 23.57 22.31 +5.6%
ZFlat/1 urls 50.89 47.77 +6.5%
ZFlat/2 jpg 99.88 99.87 +0.0%
ZFlat/3 pdf 82.13 82.07 +0.1%
ZFlat/4 html4 23.55 22.51 +4.6%
ZFlat/5 cp 48.12 48.12 +0.0%
ZFlat/6 c 42.40 42.40 +0.0%
ZFlat/7 lsp 48.37 48.37 +0.0%
ZFlat/8 xls 41.34 41.23 +0.3%
ZFlat/9 txt1 59.81 57.87 +3.4%
ZFlat/10 txt2 64.07 61.93 +3.5%
ZFlat/11 txt3 57.11 54.92 +4.0%
ZFlat/12 txt4 68.35 66.22 +3.2%
ZFlat/13 bin 18.21 18.11 +0.6%
ZFlat/14 sum 51.88 48.96 +6.0%
ZFlat/15 man 59.36 59.36 +0.0%
ZFlat/16 pb 23.15 19.64 +17.9%
ZFlat/17 gaviota 38.27 37.72 +1.5%
Geometric mean 45.51 44.15 +3.1%
Microbenchmarks (64-bit, opt):
Westmere 2.8 GHz:
Benchmark Base (ns) New (ns) Throughput Input Improvement
-------------------------------------------------------------------------------------------------
BM_UFlat/0 75342 75027 1.3GB/s html +0.4%
BM_UFlat/1 723767 744269 899.6MB/s urls -2.8%
BM_UFlat/2 10072 10072 11.7GB/s jpg +0.0%
BM_UFlat/3 30747 30388 2.9GB/s pdf +1.2%
BM_UFlat/4 307353 306063 1.2GB/s html4 +0.4%
BM_UFlat/5 28593 28743 816.3MB/s cp -0.5%
BM_UFlat/6 12958 12998 818.1MB/s c -0.3%
BM_UFlat/7 3700 3792 935.8MB/s lsp -2.4%
BM_UFlat/8 999685 999905 982.1MB/s xls -0.0%
BM_UFlat/9 232954 230079 630.4MB/s txt1 +1.2%
BM_UFlat/10 200785 201468 592.6MB/s txt2 -0.3%
BM_UFlat/11 617267 610968 666.1MB/s txt3 +1.0%
BM_UFlat/12 821595 822475 558.7MB/s txt4 -0.1%
BM_UFlat/13 377097 377632 1.3GB/s bin -0.1%
BM_UFlat/14 45476 45260 805.8MB/s sum +0.5%
BM_UFlat/15 4985 5003 805.7MB/s man -0.4%
BM_UFlat/16 80813 77494 1.4GB/s pb +4.3%
BM_UFlat/17 251792 241553 727.7MB/s gaviota +4.2%
BM_UValidate/0 40343 40354 2.4GB/s html -0.0%
BM_UValidate/1 426890 451574 1.4GB/s urls -5.5%
BM_UValidate/2 187 179 661.9GB/s jpg +4.5%
BM_UValidate/3 13783 13827 6.4GB/s pdf -0.3%
BM_UValidate/4 162393 163335 2.3GB/s html4 -0.6%
BM_UDataBuffer/0 93756 93302 1046.7MB/s html +0.5%
BM_UDataBuffer/1 886714 916292 730.7MB/s urls -3.2%
BM_UDataBuffer/2 15861 16401 7.2GB/s jpg -3.3%
BM_UDataBuffer/3 38934 39224 2.2GB/s pdf -0.7%
BM_UDataBuffer/4 381008 379428 1029.5MB/s html4 +0.4%
BM_UCord/0 92528 91098 1072.0MB/s html +1.6%
BM_UCord/1 858421 885287 756.3MB/s urls -3.0%
BM_UCord/2 13140 13464 8.8GB/s jpg -2.4%
BM_UCord/3 39012 37773 2.3GB/s pdf +3.3%
BM_UCord/4 376869 371267 1052.1MB/s html4 +1.5%
BM_UCordString/0 75810 75303 1.3GB/s html +0.7%
BM_UCordString/1 735290 753841 888.2MB/s urls -2.5%
BM_UCordString/2 11945 13113 9.0GB/s jpg -8.9%
BM_UCordString/3 33901 32562 2.7GB/s pdf +4.1%
BM_UCordString/4 310985 309390 1.2GB/s html4 +0.5%
BM_UCordValidate/0 40952 40450 2.4GB/s html +1.2%
BM_UCordValidate/1 433842 456531 1.4GB/s urls -5.0%
BM_UCordValidate/2 1179 1173 100.8GB/s jpg +0.5%
BM_UCordValidate/3 14481 14392 6.1GB/s pdf +0.6%
BM_UCordValidate/4 164364 164151 2.3GB/s html4 +0.1%
BM_ZFlat/0 160610 156601 623.6MB/s html (22.31 %) +2.6%
BM_ZFlat/1 1995238 1993582 335.9MB/s urls (47.77 %) +0.1%
BM_ZFlat/2 30133 24983 4.7GB/s jpg (99.87 %) +20.6%
BM_ZFlat/3 74453 73128 1.2GB/s pdf (82.07 %) +1.8%
BM_ZFlat/4 647674 633729 616.4MB/s html4 (22.51 %) +2.2%
BM_ZFlat/5 76259 76090 308.4MB/s cp (48.12 %) +0.2%
BM_ZFlat/6 31106 31084 342.1MB/s c (42.40 %) +0.1%
BM_ZFlat/7 10507 10443 339.8MB/s lsp (48.37 %) +0.6%
BM_ZFlat/8 1811047 1793325 547.6MB/s xls (41.23 %) +1.0%
BM_ZFlat/9 597903 581793 249.3MB/s txt1 (57.87 %) +2.8%
BM_ZFlat/10 525320 514522 232.0MB/s txt2 (61.93 %) +2.1%
BM_ZFlat/11 1596591 1551636 262.3MB/s txt3 (54.92 %) +2.9%
BM_ZFlat/12 2134523 2094033 219.5MB/s txt4 (66.22 %) +1.9%
BM_ZFlat/13 593024 587869 832.6MB/s bin (18.11 %) +0.9%
BM_ZFlat/14 114746 110666 329.5MB/s sum (48.96 %) +3.7%
BM_ZFlat/15 14376 14485 278.3MB/s man (59.36 %) -0.8%
BM_ZFlat/16 167908 150070 753.6MB/s pb (19.64 %) +11.9%
BM_ZFlat/17 460228 442253 397.5MB/s gaviota (37.72 %) +4.1%
BM_ZCord/0 164896 160241 609.4MB/s html +2.9%
BM_ZCord/1 2070239 2043492 327.7MB/s urls +1.3%
BM_ZCord/2 54402 47002 2.5GB/s jpg +15.7%
BM_ZCord/3 85871 83832 1073.1MB/s pdf +2.4%
BM_ZCord/4 664078 648825 602.0MB/s html4 +2.4%
BM_ZDataBuffer/0 174874 172549 566.0MB/s html +1.3%
BM_ZDataBuffer/1 2134410 2139173 313.0MB/s urls -0.2%
BM_ZDataBuffer/2 71911 69551 1.7GB/s jpg +3.4%
BM_ZDataBuffer/3 98236 99727 902.1MB/s pdf -1.5%
BM_ZDataBuffer/4 710776 699104 558.8MB/s html4 +1.7%
Sum of all benchmarks 27358908 27200688 +0.6%
Sandy Bridge 2.6 GHz:
Benchmark Base (ns) New (ns) Throughput Input Improvement
-------------------------------------------------------------------------------------------------
BM_UFlat/0 49356 49018 1.9GB/s html +0.7%
BM_UFlat/1 516764 531955 1.2GB/s urls -2.9%
BM_UFlat/2 6982 7304 16.2GB/s jpg -4.4%
BM_UFlat/3 15285 15598 5.6GB/s pdf -2.0%
BM_UFlat/4 206557 206669 1.8GB/s html4 -0.1%
BM_UFlat/5 13681 13567 1.7GB/s cp +0.8%
BM_UFlat/6 6571 6592 1.6GB/s c -0.3%
BM_UFlat/7 2008 1994 1.7GB/s lsp +0.7%
BM_UFlat/8 775700 773286 1.2GB/s xls +0.3%
BM_UFlat/9 165578 164480 881.8MB/s txt1 +0.7%
BM_UFlat/10 143707 144139 828.2MB/s txt2 -0.3%
BM_UFlat/11 443026 436281 932.8MB/s txt3 +1.5%
BM_UFlat/12 603129 595856 771.2MB/s txt4 +1.2%
BM_UFlat/13 271682 270450 1.8GB/s bin +0.5%
BM_UFlat/14 26200 25666 1.4GB/s sum +2.1%
BM_UFlat/15 2620 2608 1.5GB/s man +0.5%
BM_UFlat/16 48908 47756 2.3GB/s pb +2.4%
BM_UFlat/17 174638 170346 1031.9MB/s gaviota +2.5%
BM_UValidate/0 31922 31898 3.0GB/s html +0.1%
BM_UValidate/1 341265 363554 1.8GB/s urls -6.1%
BM_UValidate/2 160 151 782.8GB/s jpg +6.0%
BM_UValidate/3 10402 10380 8.5GB/s pdf +0.2%
BM_UValidate/4 129490 130587 2.9GB/s html4 -0.8%
BM_UDataBuffer/0 59383 58736 1.6GB/s html +1.1%
BM_UDataBuffer/1 619222 637786 1049.8MB/s urls -2.9%
BM_UDataBuffer/2 10775 11941 9.9GB/s jpg -9.8%
BM_UDataBuffer/3 18002 17930 4.9GB/s pdf +0.4%
BM_UDataBuffer/4 259182 259306 1.5GB/s html4 -0.0%
BM_UCord/0 59379 57814 1.6GB/s html +2.7%
BM_UCord/1 598456 615162 1088.4MB/s urls -2.7%
BM_UCord/2 8519 8628 13.7GB/s jpg -1.3%
BM_UCord/3 18123 17537 5.0GB/s pdf +3.3%
BM_UCord/4 252375 252331 1.5GB/s html4 +0.0%
BM_UCordString/0 49494 49790 1.9GB/s html -0.6%
BM_UCordString/1 524659 541803 1.2GB/s urls -3.2%
BM_UCordString/2 8206 8354 14.2GB/s jpg -1.8%
BM_UCordString/3 17235 16537 5.3GB/s pdf +4.2%
BM_UCordString/4 210188 211072 1.8GB/s html4 -0.4%
BM_UCordValidate/0 31956 31587 3.0GB/s html +1.2%
BM_UCordValidate/1 340828 362141 1.8GB/s urls -5.9%
BM_UCordValidate/2 783 744 158.9GB/s jpg +5.2%
BM_UCordValidate/3 10543 10462 8.4GB/s pdf +0.8%
BM_UCordValidate/4 130150 129789 2.9GB/s html4 +0.3%
BM_ZFlat/0 113873 111200 878.2MB/s html (22.31 %) +2.4%
BM_ZFlat/1 1473023 1489858 449.4MB/s urls (47.77 %) -1.1%
BM_ZFlat/2 23569 19486 6.1GB/s jpg (99.87 %) +21.0%
BM_ZFlat/3 49178 48046 1.8GB/s pdf (82.07 %) +2.4%
BM_ZFlat/4 475063 469394 832.2MB/s html4 (22.51 %) +1.2%
BM_ZFlat/5 46910 46816 501.2MB/s cp (48.12 %) +0.2%
BM_ZFlat/6 16883 16916 628.6MB/s c (42.40 %) -0.2%
BM_ZFlat/7 5381 5447 651.5MB/s lsp (48.37 %) -1.2%
BM_ZFlat/8 1466870 1473861 666.3MB/s xls (41.23 %) -0.5%
BM_ZFlat/9 468006 464101 312.5MB/s txt1 (57.87 %) +0.8%
BM_ZFlat/10 408157 408957 291.9MB/s txt2 (61.93 %) -0.2%
BM_ZFlat/11 1253348 1232910 330.1MB/s txt3 (54.92 %) +1.7%
BM_ZFlat/12 1702373 1702977 269.8MB/s txt4 (66.22 %) -0.0%
BM_ZFlat/13 439792 438557 1116.0MB/s bin (18.11 %) +0.3%
BM_ZFlat/14 80766 78851 462.5MB/s sum (48.96 %) +2.4%
BM_ZFlat/15 7420 7542 534.5MB/s man (59.36 %) -1.6%
BM_ZFlat/16 112043 100126 1.1GB/s pb (19.64 %) +11.9%
BM_ZFlat/17 368877 357703 491.4MB/s gaviota (37.72 %) +3.1%
BM_ZCord/0 116402 113564 859.9MB/s html +2.5%
BM_ZCord/1 1507156 1519911 440.5MB/s urls -0.8%
BM_ZCord/2 39860 33686 3.5GB/s jpg +18.3%
BM_ZCord/3 56211 54694 1.6GB/s pdf +2.8%
BM_ZCord/4 485594 479212 815.1MB/s html4 +1.3%
BM_ZDataBuffer/0 123185 121572 803.3MB/s html +1.3%
BM_ZDataBuffer/1 1569111 1589380 421.3MB/s urls -1.3%
BM_ZDataBuffer/2 53143 49556 2.4GB/s jpg +7.2%
BM_ZDataBuffer/3 65725 66826 1.3GB/s pdf -1.6%
BM_ZDataBuffer/4 517871 514750 758.9MB/s html4 +0.6%
Sum of all benchmarks 20258879 20315484 -0.3%
AMD Istanbul 2.4 GHz:
Benchmark Base (ns) New (ns) Throughput Input Improvement
-------------------------------------------------------------------------------------------------
BM_UFlat/0 97120 96585 1011.1MB/s html +0.6%
BM_UFlat/1 917473 948016 706.3MB/s urls -3.2%
BM_UFlat/2 21496 23938 4.9GB/s jpg -10.2%
BM_UFlat/3 44751 45639 1.9GB/s pdf -1.9%
BM_UFlat/4 391950 391413 998.0MB/s html4 +0.1%
BM_UFlat/5 37366 37201 630.7MB/s cp +0.4%
BM_UFlat/6 18350 18318 580.5MB/s c +0.2%
BM_UFlat/7 5672 5661 626.9MB/s lsp +0.2%
BM_UFlat/8 1533390 1529441 642.1MB/s xls +0.3%
BM_UFlat/9 335477 336553 431.0MB/s txt1 -0.3%
BM_UFlat/10 285140 292080 408.7MB/s txt2 -2.4%
BM_UFlat/11 888507 894758 454.9MB/s txt3 -0.7%
BM_UFlat/12 1187643 1210928 379.5MB/s txt4 -1.9%
BM_UFlat/13 493717 507447 964.5MB/s bin -2.7%
BM_UFlat/14 61740 60870 599.1MB/s sum +1.4%
BM_UFlat/15 7211 7187 560.9MB/s man +0.3%
BM_UFlat/16 97435 93100 1.2GB/s pb +4.7%
BM_UFlat/17 362662 356395 493.2MB/s gaviota +1.8%
BM_UValidate/0 47475 47118 2.0GB/s html +0.8%
BM_UValidate/1 501304 529741 1.2GB/s urls -5.4%
BM_UValidate/2 276 243 486.2GB/s jpg +13.6%
BM_UValidate/3 16361 16261 5.4GB/s pdf +0.6%
BM_UValidate/4 190741 190353 2.0GB/s html4 +0.2%
BM_UDataBuffer/0 111080 109771 889.6MB/s html +1.2%
BM_UDataBuffer/1 1051035 1085999 616.5MB/s urls -3.2%
BM_UDataBuffer/2 25801 25463 4.6GB/s jpg +1.3%
BM_UDataBuffer/3 50493 49946 1.8GB/s pdf +1.1%
BM_UDataBuffer/4 447258 444138 879.5MB/s html4 +0.7%
BM_UCord/0 109350 107909 905.0MB/s html +1.3%
BM_UCord/1 1023396 1054964 634.7MB/s urls -3.0%
BM_UCord/2 25292 24371 4.9GB/s jpg +3.8%
BM_UCord/3 48955 49736 1.8GB/s pdf -1.6%
BM_UCord/4 440452 437331 893.2MB/s html4 +0.7%
BM_UCordString/0 98511 98031 996.2MB/s html +0.5%
BM_UCordString/1 933230 963495 694.9MB/s urls -3.1%
BM_UCordString/2 23311 24076 4.9GB/s jpg -3.2%
BM_UCordString/3 45568 46196 1.9GB/s pdf -1.4%
BM_UCordString/4 397791 396934 984.1MB/s html4 +0.2%
BM_UCordValidate/0 47537 46921 2.0GB/s html +1.3%
BM_UCordValidate/1 505071 532716 1.2GB/s urls -5.2%
BM_UCordValidate/2 1663 1621 72.9GB/s jpg +2.6%
BM_UCordValidate/3 16890 16926 5.2GB/s pdf -0.2%
BM_UCordValidate/4 192365 191984 2.0GB/s html4 +0.2%
BM_ZFlat/0 184708 179103 545.3MB/s html (22.31 %) +3.1%
BM_ZFlat/1 2293864 2302950 290.7MB/s urls (47.77 %) -0.4%
BM_ZFlat/2 52852 47618 2.5GB/s jpg (99.87 %) +11.0%
BM_ZFlat/3 100766 96179 935.3MB/s pdf (82.07 %) +4.8%
BM_ZFlat/4 741220 727977 536.6MB/s html4 (22.51 %) +1.8%
BM_ZFlat/5 85402 85418 274.7MB/s cp (48.12 %) -0.0%
BM_ZFlat/6 36558 36494 291.4MB/s c (42.40 %) +0.2%
BM_ZFlat/7 12706 12507 283.7MB/s lsp (48.37 %) +1.6%
BM_ZFlat/8 2336823 2335688 420.5MB/s xls (41.23 %) +0.0%
BM_ZFlat/9 701804 681153 212.9MB/s txt1 (57.87 %) +3.0%
BM_ZFlat/10 606700 597194 199.9MB/s txt2 (61.93 %) +1.6%
BM_ZFlat/11 1852283 1803238 225.7MB/s txt3 (54.92 %) +2.7%
BM_ZFlat/12 2475527 2443354 188.1MB/s txt4 (66.22 %) +1.3%
BM_ZFlat/13 694497 696654 702.6MB/s bin (18.11 %) -0.3%
BM_ZFlat/14 136929 129855 280.8MB/s sum (48.96 %) +5.4%
BM_ZFlat/15 17172 17124 235.4MB/s man (59.36 %) +0.3%
BM_ZFlat/16 190364 171763 658.4MB/s pb (19.64 %) +10.8%
BM_ZFlat/17 567285 555190 316.6MB/s gaviota (37.72 %) +2.2%
BM_ZCord/0 193490 187031 522.1MB/s html +3.5%
BM_ZCord/1 2427537 2415315 277.2MB/s urls +0.5%
BM_ZCord/2 85378 81412 1.5GB/s jpg +4.9%
BM_ZCord/3 121898 119419 753.3MB/s pdf +2.1%
BM_ZCord/4 779564 762961 512.0MB/s html4 +2.2%
BM_ZDataBuffer/0 213820 207272 471.1MB/s html +3.2%
BM_ZDataBuffer/1 2589010 2586495 258.9MB/s urls +0.1%
BM_ZDataBuffer/2 121871 118885 1018.4MB/s jpg +2.5%
BM_ZDataBuffer/3 145382 145986 616.2MB/s pdf -0.4%
BM_ZDataBuffer/4 868117 852754 458.1MB/s html4 +1.8%
Sum of all benchmarks 33771833 33744763 +0.1%
------------------------------------------------------------------------
r70 | [email protected] | 2013-01-06 20:21:26 +0100 (Sun, 06 Jan 2013) | 6 lines
Adjust the Snappy open-source distribution for the changes in Google's
internal file API.
R=sanjay
------------------------------------------------------------------------
r69 | [email protected] | 2013-01-04 12:54:20 +0100 (Fri, 04 Jan 2013) | 15 lines
Change a few ORs to additions where they don't matter. This helps the compiler
use the LEA instruction more efficiently, since e.g. a + (b << 2) can be encoded
as one instruction. Even more importantly, it can constant-fold the
COPY_* enums together with the shifted negative constants, which also saves
some instructions. (We don't need it for LITERAL, since it happens to be 0.)
I am unsure why the compiler couldn't do this itself, but the theory is that
it cannot prove that len-1 and len-4 cannot underflow/wrap, and thus can't
do the optimization safely.
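
The equivalence can be checked with a small sketch. The enum values and function names below are assumptions chosen for illustration, and the tag layout is simplified to a length field shifted by two bits; it is not the library's emitter.

```cpp
#include <cassert>
#include <cstdint>

// Tag type constants; the values are assumed here for illustration.
enum : std::uint32_t { kLiteral = 0, kCopy1ByteOffset = 1, kCopy2ByteOffset = 2 };

// Tag byte built with OR, as before the change.
inline std::uint8_t TagWithOr(std::uint32_t len) {
  return static_cast<std::uint8_t>(kCopy2ByteOffset | ((len - 1) << 2));
}

// Tag byte built with addition. The low two bits of ((len - 1) << 2) are
// always zero, so adding the 2-bit tag type gives the same result as OR.
// The expression is also equal to (len << 2) - 2, so the enum and the
// shifted -1 can fold into a single constant.
inline std::uint8_t TagWithAdd(std::uint32_t len) {
  return static_cast<std::uint8_t>(kCopy2ByteOffset + ((len - 1) << 2));
}

// Verifies both forms agree for every length from 1 to 64.
inline void CheckTagEncodings() {
  for (std::uint32_t len = 1; len <= 64; ++len) {
    assert(TagWithOr(len) == TagWithAdd(len));
    assert(TagWithAdd(len) == static_cast<std::uint8_t>((len << 2) - 2));
  }
}
```

The addition form is the one that maps naturally onto a single LEA-style address computation.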
The gains are small but measurable; 0.5-1.0% over the BM_Z* benchmarks
(measured on Westmere, Sandy Bridge and Istanbul).
R=sanjay
------------------------------------------------------------------------
r68 | [email protected] | 2012-10-08 13:37:16 +0200 (Mon, 08 Oct 2012) | 5 lines
Stop giving -Werror to automake, due to an incompatibility between current
versions of libtool and automake on non-GNU platforms (e.g. Mac OS X).
R=sanjay
------------------------------------------------------------------------
r67 | [email protected] | 2012-08-17 15:54:47 +0200 (Fri, 17 Aug 2012) | 5 lines
Fix public issue 66: Document GetUncompressedLength better, in particular that
it leaves the source in a state that's not appropriate for RawUncompress.
R=sanjay
------------------------------------------------------------------------
r66 | [email protected] | 2012-07-31 13:44:44 +0200 (Tue, 31 Jul 2012) | 5 lines
Fix public issue 64: Check for <sys/time.h> at configure time,
since MSVC seemingly does not have it.
R=sanjay
------------------------------------------------------------------------
r65 | [email protected] | 2012-07-04 11:34:48 +0200 (Wed, 04 Jul 2012) | 10 lines
Handle the case where gettimeofday() goes backwards or returns the same value
twice; it could cause division by zero in the unit test framework.
(We already had one fix for this in place, but it was incomplete.)
This could in theory happen on any system, since there are few guarantees
about gettimeofday(), but seems to only happen in practice on GNU/Hurd, where
gettimeofday() is cached and only updated every so often.
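
One way to guard against this, shown here only as a sketch, is to clamp the elapsed time before dividing by it. The helper names are hypothetical and the actual fix in the test framework may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: elapsed microseconds between two clock readings,
// clamped to at least 1 so that a later division is safe even if the clock
// stood still or went backwards between the readings.
inline std::int64_t SafeElapsedMicros(std::int64_t start_us,
                                      std::int64_t stop_us) {
  const std::int64_t elapsed = stop_us - start_us;
  return elapsed > 0 ? elapsed : 1;
}

// Throughput in bytes per microsecond, using the clamped elapsed time.
inline double BytesPerMicro(std::int64_t bytes, std::int64_t start_us,
                            std::int64_t stop_us) {
  assert(bytes >= 0);
  return static_cast<double>(bytes) /
         static_cast<double>(SafeElapsedMicros(start_us, stop_us));
}
```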
R=sanjay
------------------------------------------------------------------------
r64 | [email protected] | 2012-07-04 11:28:33 +0200 (Wed, 04 Jul 2012) | 6 lines
Mark ARMv4 as not supporting unaligned accesses (not just ARMv5 and ARMv6);
apparently Debian still targets these by default, giving us segfaults on
armel.
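
On targets without unaligned access support, a load through memcpy is the usual portable alternative to dereferencing a misaligned pointer. The sketch below uses a hypothetical helper name and is not the library's macro.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Portable unaligned 32-bit load. memcpy lets the compiler choose
// instructions that are safe for the target, instead of a direct
// (possibly faulting) load through a misaligned uint32_t pointer.
inline std::uint32_t UnalignedLoad32(const void* p) {
  assert(p != nullptr);
  std::uint32_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}
```

On architectures that do support unaligned loads, compilers commonly turn this memcpy into a single load instruction, so the portable form usually costs nothing there.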
R=sanjay
------------------------------------------------------------------------
r63 | [email protected] | 2012-05-22 11:46:05 +0200 (Tue, 22 May 2012) | 5 lines
Fix public bug #62: Remove an extraneous comma at the end of an enum list,
causing compile errors when embedded in Mozilla on OpenBSD.
R=sanjay
------------------------------------------------------------------------
r62 | [email protected] | 2012-05-22 11:32:50 +0200 (Tue, 22 May 2012) | 8 lines
Snappy library no longer depends on iostream.
Achieved by moving logging macro definitions to a test-only
header file, and by changing non-test code to use assert,
fprintf, and abort instead of LOG/CHECK macros.
R=sesse
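To illustrate the pattern of using assert, fprintf, and abort in place of stream-based logging macros, here is a small sketch. The function CheckedCopyLength and its error message are hypothetical and do not appear in Snappy; only the three C facilities named in the entry are taken from it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical function, for illustration only. A stream-based CHECK-style
// macro would pull in <iostream>; the plain C facilities below do not.
inline std::size_t CheckedCopyLength(std::size_t len, std::size_t capacity) {
  // Debug-only invariant: compiled out when NDEBUG is defined.
  assert(capacity > 0);
  if (len > capacity) {
    // Hard error that must also be caught in release builds.
    std::fprintf(stderr, "buffer too small: need %lu, have %lu\n",
                 static_cast<unsigned long>(len),
                 static_cast<unsigned long>(capacity));
    std::abort();
  }
  return len;
}
```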
------------------------------------------------------------------------
r61 | [email protected] | 2012-02-24 16:46:37 +0100 (Fri, 24 Feb 2012) | 4 lines
Release Snappy 1.0.5.
R=sanjay
------------------------------------------------------------------------
r60 | [email protected] | 2012-02-23 18:00:36 +0100 (Thu, 23 Feb 2012) | 57 lines
For 32-bit platforms, do not try to accelerate multiple neighboring
32-bit loads with a 64-bit load during compression (it's not a win).
The main target for this optimization is ARM, but 32-bit x86 gets
a small gain, too, although there is noise in the microbenchmarks.
It's a no-op for 64-bit x86. It does not affect decompression.
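The two strategies being compared can be sketched as follows. This is an illustration under stated assumptions, not the code in snappy.cc: all function names are hypothetical, the shift-based extraction assumes a little-endian target, and pointer width is used as a stand-in for "64-bit platform".

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Portable unaligned loads; memcpy with a constant size typically compiles
// to a single load instruction where the target allows it.
inline std::uint32_t Load32(const char* p) {
  std::uint32_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}
inline std::uint64_t Load64(const char* p) {
  std::uint64_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}

// 64-bit strategy: one 8-byte load is shared by several neighboring 32-bit
// windows, each extracted with a shift. Assumes little-endian byte order.
inline std::uint32_t Window32From64(std::uint64_t eight_bytes, int offset) {
  assert(offset >= 0 && offset <= 4);
  return static_cast<std::uint32_t>(eight_bytes >> (8 * offset));
}

// 32-bit strategy: one 4-byte load per window, no 64-bit arithmetic.
inline std::uint32_t Window32Direct(const char* p, int offset) {
  assert(offset >= 0 && offset <= 4);
  return Load32(p + offset);
}

// Hypothetical dispatcher: picks the strategy from the pointer width.
// p must point at 8 or more readable bytes.
inline std::uint32_t Window32(const char* p, int offset) {
  if (sizeof(void*) == 8) return Window32From64(Load64(p), offset);
  return Window32Direct(p, offset);
}
```

On a 32-bit CPU the 64-bit value occupies two registers and each shift becomes a multi-instruction sequence, which is a plausible reason why the shared load is "not a win" there.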
Microbenchmark results on a Cortex-A9 1GHz, using g++ 4.6.2 (from
Ubuntu/Linaro), -O2 -DNDEBUG -Wa,-march=armv7a -mtune=cortex-a9
-mthumb-interwork, minimum 1000 iterations:
Benchmark Time(ns) CPU(ns) Iterations
---------------------------------------------------
BM_ZFlat/0 1158277 1160000 1000 84.2MB/s html (23.57 %) [ +4.3%]
BM_ZFlat/1 14861782 14860000 1000 45.1MB/s urls (50.89 %) [ +1.1%]
BM_ZFlat/2 393595 390000 1000 310.5MB/s jpg (99.88 %) [ +0.0%]
BM_ZFlat/3 650583 650000 1000 138.4MB/s pdf (82.13 %) [ +3.1%]
BM_ZFlat/4 4661480 4660000 1000 83.8MB/s html4 (23.55 %) [ +4.3%]
BM_ZFlat/5 491973 490000 1000 47.9MB/s cp (48.12 %) [ +2.0%]
BM_ZFlat/6 193575 192678 1038 55.2MB/s c (42.40 %) [ +9.0%]
BM_ZFlat/7 62343 62754 3187 56.5MB/s lsp (48.37 %) [ +2.6%]
BM_ZFlat/8 17708468 17710000 1000 55.5MB/s xls (41.34 %) [ -0.3%]
BM_ZFlat/9 3755345 3760000 1000 38.6MB/s txt1 (59.81 %) [ +8.2%]
BM_ZFlat/10 3324217 3320000 1000 36.0MB/s txt2 (64.07 %) [ +4.2%]
BM_ZFlat/11 10139932 10140000 1000 40.1MB/s txt3 (57.11 %) [ +6.4%]
BM_ZFlat/12 13532109 13530000 1000 34.0MB/s txt4 (68.35 %) [ +5.0%]
BM_ZFlat/13 4690847 4690000 1000 104.4MB/s bin (18.21 %) [ +4.1%]
BM_ZFlat/14 830682 830000 1000 43.9MB/s sum (51.88 %) [ +1.2%]
BM_ZFlat/15 84784 85011 2235 47.4MB/s man (59.36 %) [ +1.1%]
BM_ZFlat/16 1293254 1290000 1000 87.7MB/s pb (23.15 %) [ +2.3%]
BM_ZFlat/17 2775155 2780000 1000 63.2MB/s gaviota (38.27 %) [+12.2%]
Core i7 in 32-bit mode (only one run and 100 iterations, though, so noisy):
Benchmark Time(ns) CPU(ns) Iterations
---------------------------------------------------
BM_ZFlat/0 227582 223464 3043 437.0MB/s html (23.57 %) [ +7.4%]
BM_ZFlat/1 2982430 2918455 233 229.4MB/s urls (50.89 %) [ +2.9%]
BM_ZFlat/2 46967 46658 15217 2.5GB/s jpg (99.88 %) [ +0.0%]
BM_ZFlat/3 115298 114864 5833 783.2MB/s pdf (82.13 %) [ +1.5%]
BM_ZFlat/4 913440 899743 778 434.2MB/s html4 (23.55 %) [ +0.3%]
BM_ZFlat/5 110302 108571 7000 216.1MB/s cp (48.12 %) [ +0.0%]
BM_ZFlat/6 44409 43372 15909 245.2MB/s c (42.40 %) [ +0.8%]
BM_ZFlat/7 15713 15643 46667 226.9MB/s lsp (48.37 %) [ +2.7%]
BM_ZFlat/8 2625539 2602230 269 377.4MB/s xls (41.34 %) [ +1.4%]
BM_ZFlat/9 808884 811429 875 178.8MB/s txt1 (59.81 %) [ -3.9%]
BM_ZFlat/10 709532 700000 1000 170.5MB/s txt2 (64.07 %) [ +0.0%]
BM_ZFlat/11 2177682 2162162 333 188.2MB/s txt3 (57.11 %) [ -1.4%]
BM_ZFlat/12 2849640 2840000 250 161.8MB/s txt4 (68.35 %) [ -1.4%]
BM_ZFlat/13 849760 835476 778 585.8MB/s bin (18.21 %) [ +1.2%]
BM_ZFlat/14 165940 164571 4375 221.6MB/s sum (51.88 %) [ +1.4%]
BM_ZFlat/15 20939 20571 35000 196.0MB/s man (59.36 %) [ +2.1%]
BM_ZFlat/16 239209 236544 2917 478.1MB/s pb (23.15 %) [ +4.2%]
BM_ZFlat/17 616206 610000 1000 288.2MB/s gaviota (38.27 %) [ -1.6%]
R=sanjay
------------------------------------------------------------------------
r59 | [email protected] | 2012-02-21 18:02:17 +0100 (Tue, 21 Feb 2012) | 107 lines
Enable the use of unaligned loads and stores for ARM-based architectures
where they are available (ARMv7 and higher). This gives a significant
speed boost on ARM, both for compression and decompression.
It should not affect x86 at all.
There are more changes possible to speed up ARM, but it might not be
that easy to do without hurting x86 or making the code uglier.
Also, we do not try to use NEON yet.
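A compile-time selection of this kind could be sketched as below. It is not the code from the Snappy sources: the macro SKETCH_FAST_UNALIGNED and the helper SketchCopy8 are hypothetical, and the sketch assumes that __ARM_FEATURE_UNALIGNED (the ACLE feature macro) is how the compiler signals that unaligned accesses are permitted.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumption: x86 always tolerates unaligned accesses, and ARM compilers
// define __ARM_FEATURE_UNALIGNED when the target permits them. Every other
// target takes the conservative path.
#if defined(__i386__) || defined(__x86_64__) || defined(__ARM_FEATURE_UNALIGNED)
#define SKETCH_FAST_UNALIGNED 1
#else
#define SKETCH_FAST_UNALIGNED 0
#endif

// Hypothetical helper: copies 8 bytes between possibly unaligned addresses.
inline void SketchCopy8(const char* src, char* dst) {
  assert(src != nullptr && dst != nullptr);
#if SKETCH_FAST_UNALIGNED
  // Two 32-bit moves. memcpy with a constant size lets the compiler emit
  // plain word loads and stores when the target allows them.
  std::uint32_t lo, hi;
  std::memcpy(&lo, src, 4);
  std::memcpy(&hi, src + 4, 4);
  std::memcpy(dst, &lo, 4);
  std::memcpy(dst + 4, &hi, 4);
#else
  // Byte-wise fallback for targets that fault on unaligned word accesses.
  for (int i = 0; i < 8; ++i) dst[i] = src[i];
#endif
}
```

Using two 32-bit moves rather than one 64-bit move mirrors the remark in this entry that the copies moved from 64-bit to 32-bit operations.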
Microbenchmark results on a Cortex-A9 1GHz, using g++ 4.6.2 (from Ubuntu/Linaro),
-O2 -DNDEBUG -Wa,-march=armv7a -mtune=cortex-a9 -mthumb-interwork:
Benchmark Time(ns) CPU(ns) Iterations
---------------------------------------------------
BM_UFlat/0 524806 529100 378 184.6MB/s html [+33.6%]
BM_UFlat/1 5139790 5200000 100 128.8MB/s urls [+28.8%]
BM_UFlat/2 86540 84166 1901 1.4GB/s jpg [ +0.6%]
BM_UFlat/3 215351 210176 904 428.0MB/s pdf [+29.8%]
BM_UFlat/4 2144490 2100000 100 186.0MB/s html4 [+33.3%]
BM_UFlat/5 194482 190000 1000 123.5MB/s cp [+36.2%]
BM_UFlat/6 91843 90175 2107 117.9MB/s c [+38.6%]
BM_UFlat/7 28535 28426 6684 124.8MB/s lsp [+34.7%]
BM_UFlat/8 9206600 9200000 100 106.7MB/s xls [+42.4%]
BM_UFlat/9 1865273 1886792 106 76.9MB/s txt1 [+32.5%]
BM_UFlat/10 1576809 1587301 126 75.2MB/s txt2 [+32.3%]
BM_UFlat/11 4968450 4900000 100 83.1MB/s txt3 [+32.7%]
BM_UFlat/12 6673970 6700000 100 68.6MB/s txt4 [+32.8%]
BM_UFlat/13 2391470 2400000 100 203.9MB/s bin [+29.2%]
BM_UFlat/14 334601 344827 522 105.8MB/s sum [+30.6%]
BM_UFlat/15 37404 38080 5252 105.9MB/s man [+33.8%]
BM_UFlat/16 535470 540540 370 209.2MB/s pb [+31.2%]
BM_UFlat/17 1875245 1886792 106 93.2MB/s gaviota [+37.8%]
BM_UValidate/0 178425 179533 1114 543.9MB/s html [ +2.7%]
BM_UValidate/1 2100450 2000000 100 334.8MB/s urls [ +5.0%]
BM_UValidate/2 1039 1044 172413 113.3GB/s jpg [ +3.4%]
BM_UValidate/3 59423 59470 3363 1.5GB/s pdf [ +7.8%]
BM_UValidate/4 760716 766283 261 509.8MB/s html4 [ +6.5%]
BM_ZFlat/0 1204632 1204819 166 81.1MB/s html (23.57 %) [+32.8%]
BM_ZFlat/1 15656190 15600000 100 42.9MB/s urls (50.89 %) [+27.6%]
BM_ZFlat/2 403336 410677 487 294.8MB/s jpg (99.88 %) [+16.5%]
BM_ZFlat/3 664073 671140 298 134.0MB/s pdf (82.13 %) [+28.4%]
BM_ZFlat/4 4961940 4900000 100 79.7MB/s html4 (23.55 %) [+30.6%]
BM_ZFlat/5 500664 501253 399 46.8MB/s cp (48.12 %) [+33.4%]
BM_ZFlat/6 217276 215982 926 49.2MB/s c (42.40 %) [+25.0%]
BM_ZFlat/7 64122 65487 3054 54.2MB/s lsp (48.37 %) [+36.1%]
BM_ZFlat/8 18045730 18000000 100 54.6MB/s xls (41.34 %) [+34.4%]
BM_ZFlat/9 4051530 4000000 100 36.3MB/s txt1 (59.81 %) [+25.0%]
BM_ZFlat/10 3451800 3500000 100 34.1MB/s txt2 (64.07 %) [+25.7%]
BM_ZFlat/11 11052340 11100000 100 36.7MB/s txt3 (57.11 %) [+24.3%]
BM_ZFlat/12 14538690 14600000 100 31.5MB/s txt4 (68.35 %) [+24.7%]
BM_ZFlat/13 5041850 5000000 100 97.9MB/s bin (18.21 %) [+32.0%]
BM_ZFlat/14 908840 909090 220 40.1MB/s sum (51.88 %) [+22.2%]
BM_ZFlat/15 86921 86206 1972 46.8MB/s man (59.36 %) [+42.2%]
BM_ZFlat/16 1312315 1315789 152 86.0MB/s pb (23.15 %) [+34.5%]
BM_ZFlat/17 3173120 3200000 100 54.9MB/s gaviota (38.27%) [+28.1%]
The move from 64-bit to 32-bit operations for the copies also affected 32-bit x86;
positive on the decompression side, and slightly negative on the compression side
(unless that is noise; I only ran once):
Benchmark Time(ns) CPU(ns) Iterations
-----------------------------------------------------
BM_UFlat/0 86279 86140 7778 1.1GB/s html [ +7.5%]
BM_UFlat/1 839265 822622 778 813.9MB/s urls [ +9.4%]
BM_UFlat/2 9180 9143 87500 12.9GB/s jpg [ +1.2%]
BM_UFlat/3 35080 35000 20000 2.5GB/s pdf [+10.1%]
BM_UFlat/4 350318 345000 2000 1.1GB/s html4 [ +7.0%]
BM_UFlat/5 33808 33472 21212 701.0MB/s cp [ +9.0%]
BM_UFlat/6 15201 15214 46667 698.9MB/s c [+14.9%]
BM_UFlat/7 4652 4651 159091 762.9MB/s lsp [ +7.5%]
BM_UFlat/8 1285551 1282528 538 765.7MB/s xls [+10.7%]
BM_UFlat/9 282510 281690 2414 514.9MB/s txt1 [+13.6%]
BM_UFlat/10 243494 239286 2800 498.9MB/s txt2 [+14.4%]
BM_UFlat/11 743625 740000 1000 550.0MB/s txt3 [+14.3%]
BM_UFlat/12 999441 989717 778 464.3MB/s txt4 [+16.1%]
BM_UFlat/13 412402 410076 1707 1.2GB/s bin [ +7.3%]
BM_UFlat/14 54876 54000 10000 675.3MB/s sum [+13.0%]
BM_UFlat/15 6146 6100 100000 660.8MB/s man [+14.8%]
BM_UFlat/16 90496 90286 8750 1.2GB/s pb [ +4.0%]
BM_UFlat/17 292650 292000 2500 602.0MB/s gaviota [+18.1%]
BM_UValidate/0 49620 49699 14286 1.9GB/s html [ +0.0%]
BM_UValidate/1 501371 500000 1000 1.3GB/s urls [ +0.0%]
BM_UValidate/2 232 227 3043478 521.5GB/s jpg [ +1.3%]
BM_UValidate/3 17250 17143 43750 5.1GB/s pdf [ -1.3%]
BM_UValidate/4 198643 200000 3500 1.9GB/s html4 [ -0.9%]
BM_ZFlat/0 227128 229415 3182 425.7MB/s html (23.57 %) [ -1.4%]
BM_ZFlat/1 2970089 2960000 250 226.2MB/s urls (50.89 %) [ -1.9%]
BM_ZFlat/2 45683 44999 15556 2.6GB/s jpg (99.88 %) [ +2.2%]
BM_ZFlat/3 114661 113136 6364 795.1MB/s pdf (82.13 %) [ -1.5%]
BM_ZFlat/4 919702 914286 875 427.2MB/s html4 (23.55%) [ -1.3%]
BM_ZFlat/5 108189 108422 6364 216.4MB/s cp (48.12 %) [ -1.2%]
BM_ZFlat/6 44525 44000 15909 241.7MB/s c (42.40 %) [ -2.9%]
BM_ZFlat/7 15973 15857 46667 223.8MB/s lsp (48.37 %) [ +0.0%]
BM_ZFlat/8 2677888 2639405 269 372.1MB/s xls (41.34 %) [ -1.4%]
BM_ZFlat/9 800715 780000 1000 186.0MB/s txt1 (59.81 %) [ -0.4%]
BM_ZFlat/10 700089 700000 1000 170.5MB/s txt2 (64.07 %) [ -2.9%]
BM_ZFlat/11 2159356 2138365 318 190.3MB/s txt3 (57.11 %) [ -0.3%]
BM_ZFlat/12 2796143 2779923 259 165.3MB/s txt4 (68.35 %) [ -1.4%]
BM_ZFlat/13 856458 835476 778 585.8MB/s bin (18.21 %) [ -0.1%]
BM_ZFlat/14 166908 166857 4375 218.6MB/s sum (51.88 %) [ -1.4%]
BM_ZFlat/15 21181 20857 35000 193.3MB/s man (59.36 %) [ -0.8%]
BM_ZFlat/16 244009 239973 2917 471.3MB/s pb (23.15 %) [ -1.4%]
BM_ZFlat/17 596362 590000 1000 297.9MB/s gaviota (38.27%) [ +0.0%]
R=sanjay
------------------------------------------------------------------------
r58 | [email protected] | 2012-02-11 23:11:22 +0100 (Sat, 11 Feb 2012) | 9 lines
Lower the size allocated in the "corrupted input" unit test from 256 MB
to 2 MB. This fixes issues with running the unit test on platforms with
little RAM (e.g. some ARM boards).
Also, reactivate the 2 MB test for 64-bit platforms; there's no good
reason why it shouldn't run there.
R=sanjay
------------------------------------------------------------------------
r57 | [email protected] | 2012-01-08 18:55:48 +0100 (Sun, 08 Jan 2012) | 2 lines
Minor refactoring to accommodate changes in Google's internal code tree.
------------------------------------------------------------------------
r56 | [email protected] | 2012-01-04 14:10:46 +0100 (Wed, 04 Jan 2012) | 19 lines
Fix public issue 57: Fix most warnings with -Wall, mostly signed/unsigned
warnings. There are still some in the unit test, but the main .cc file should
be clean. We haven't enabled -Wall for the default build, since the unit test
is still not clean.
This also fixes a real bug in the open-source implementation of
ReadFileToStringOrDie(); it would not detect errors correctly.
I had to go through some pains to avoid performance loss as the types
were changed; I think there might still be some with 32-bit if and only if LFS
is enabled (i.e., size_t is 64-bit), but for regular 32-bit and 64-bit I can't
see any losses, and I've diffed the generated GCC assembler between the old and
new code without seeing any significant changes. If anything, it's ever so
slightly faster.
This may or may not enable compression of very large blocks (>2^32 bytes)
when size_t is 64-bit, but I haven't checked, and it is still not a supported
case.
------------------------------------------------------------------------
r55 | [email protected] | 2012-01-04 11:46:39 +0100 (Wed, 04 Jan 2012) | 6 lines
Add a framing format description. We do not have any implementation of this at
the current point, but there seems to be enough of a general interest in the
topic (cf. public bug #34).
R=csilvers,sanjay
------------------------------------------------------------------------
r54 | [email protected] | 2011-12-05 22:27:26 +0100 (Mon, 05 Dec 2011) | 81 lines
Speed up decompression by moving the refill check to the end of the loop.
This seems to work because in most of the branches, the compiler can evaluate
“ip_limit_ - ip” in a more efficient way than reloading ip_limit_ from memory
(either by already having the entire expression in a register, or reconstructing
it from “avail”, or something else). Memory loads, even from L1, are seemingly
costly in the big picture at the current decompression speeds.
Microbenchmarks (64-bit, opt mode):
Westmere (Intel Core i7):
Benchmark Time(ns) CPU(ns) Iterations
--------------------------------------------
BM_UFlat/0 74492 74491 187894 1.3GB/s html [ +5.9%]
BM_UFlat/1 712268 712263 19644 940.0MB/s urls [ +3.8%]
BM_UFlat/2 10591 10590 1000000 11.2GB/s jpg [ -6.8%]
BM_UFlat/3 29643 29643 469915 3.0GB/s pdf [ +7.9%]
BM_UFlat/4 304669 304667 45930 1.3GB/s html4 [ +4.8%]
BM_UFlat/5 28508 28507 490077 823.1MB/s cp [ +4.0%]
BM_UFlat/6 12415 12415 1000000 856.5MB/s c [ +8.6%]
BM_UFlat/7 3415 3415 4084723 1039.0MB/s lsp [+18.0%]
BM_UFlat/8 979569 979563 14261 1002.5MB/s xls [ +5.8%]
BM_UFlat/9 230150 230148 60934 630.2MB/s txt1 [ +5.2%]
BM_UFlat/10 197167 197166 71135 605.5MB/s txt2 [ +4.7%]
BM_UFlat/11 607394 607390 23041 670.1MB/s txt3 [ +5.6%]
BM_UFlat/12 808502 808496 17316 568.4MB/s txt4 [ +5.0%]
BM_UFlat/13 372791 372788 37564 1.3GB/s bin [ +3.3%]
BM_UFlat/14 44541 44541 313969 818.8MB/s sum [ +5.7%]
BM_UFlat/15 4833 4833 2898697 834.1MB/s man [ +4.8%]
BM_UFlat/16 79855 79855 175356 1.4GB/s pb [ +4.8%]
BM_UFlat/17 245845 245843 56838 715.0MB/s gaviota [ +5.8%]
Clovertown (Intel Core 2):
Benchmark Time(ns) CPU(ns) Iterations
--------------------------------------------
BM_UFlat/0 107911 107890 100000 905.1MB/s html [ +2.2%]
BM_UFlat/1 1011237 1011041 10000 662.3MB/s urls [ +2.5%]
BM_UFlat/2 26775 26770 523089 4.4GB/s jpg [ +0.0%]
BM_UFlat/3 48103 48095 290618 1.8GB/s pdf [ +3.4%]
BM_UFlat/4 437724 437644 31937 892.6MB/s html4 [ +2.1%]
BM_UFlat/5 39607 39600 358284 592.5MB/s cp [ +2.4%]
BM_UFlat/6 18227 18224 768191 583.5MB/s c [ +2.7%]
BM_UFlat/7 5171 5170 2709437 686.4MB/s lsp [ +3.9%]
BM_UFlat/8 1560291 1559989 8970 629.5MB/s xls [ +3.6%]
BM_UFlat/9 335401 335343 41731 432.5MB/s txt1 [ +3.0%]
BM_UFlat/10 287014 286963 48758 416.0MB/s txt2 [ +2.8%]
BM_UFlat/11 888522 888356 15752 458.1MB/s txt3 [ +2.9%]
BM_UFlat/12 1186600 1186378 10000 387.3MB/s txt4 [ +3.1%]
BM_UFlat/13 572295 572188 24468 855.4MB/s bin [ +2.1%]
BM_UFlat/14 64060 64049 218401 569.4MB/s sum [ +4.1%]
BM_UFlat/15 7264 7263 1916168 555.0MB/s man [ +1.4%]
BM_UFlat/16 108853 108836 100000 1039.1MB/s pb [ +1.7%]
BM_UFlat/17 364289 364223 38419 482.6MB/s gaviota [ +4.9%]
Barcelona (AMD Opteron):
Benchmark Time(ns) CPU(ns) Iterations
--------------------------------------------
BM_UFlat/0 103900 103871 100000 940.2MB/s html [ +8.3%]
BM_UFlat/1 1000435 1000107 10000 669.5MB/s urls [ +6.6%]
BM_UFlat/2 24659 24652 567362 4.8GB/s jpg [ +0.1%]
BM_UFlat/3 48206 48193 291121 1.8GB/s pdf [ +5.0%]
BM_UFlat/4 421980 421850 33174 926.0MB/s html4 [ +7.3%]
BM_UFlat/5 40368 40357 346994 581.4MB/s cp [ +8.7%]
BM_UFlat/6 19836 19830 708695 536.2MB/s c [ +8.0%]
BM_UFlat/7 6100 6098 2292774 581.9MB/s lsp [ +9.0%]
BM_UFlat/8 1693093 1692514 8261 580.2MB/s xls [ +8.0%]
BM_UFlat/9 365991 365886 38225 396.4MB/s txt1 [ +7.1%]
BM_UFlat/10 311330 311238 44950 383.6MB/s txt2 [ +7.6%]
BM_UFlat/11 975037 974737 14376 417.5MB/s txt3 [ +6.9%]
BM_UFlat/12 1303558 1303175 10000 352.6MB/s txt4 [ +7.3%]
BM_UFlat/13 517448 517290 27144 946.2MB/s bin [ +5.5%]
BM_UFlat/14 66537 66518 210352 548.3MB/s sum [ +7.5%]
BM_UFlat/15 7976 7974 1760383 505.6MB/s man [ +5.6%]
BM_UFlat/16 103121 103092 100000 1097.0MB/s pb [ +8.7%]
BM_UFlat/17 391431 391314 35733 449.2MB/s gaviota [ +6.5%]
R=sanjay
------------------------------------------------------------------------
r53 | [email protected] | 2011-11-23 12:14:17 +0100 (Wed, 23 Nov 2011) | 88 lines
Speed up decompression by making the fast path for literals faster.
We do the fast-path step as soon as possible; in fact, as soon as we know the
literal length. Since we usually hit the fast path, we can then skip the checks
for long literals and available input space (beyond what the fast path check
already does).
Note that this changes the decompression Writer API; however, it does not
change the ABI, since writers are always templatized and as such never
cross compilation units. The new API is slightly more general, in that it
doesn't hard-code the value 16. Note that we also take care to check
for len <= 16 first, since the other two checks almost always succeed
(so we don't want to waste time checking for them until we have to).
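The ordering of the three checks can be sketched as follows. This is a simplified illustration, not the actual Writer class: the struct SketchWriter and its members are hypothetical, and the constant 16 is kept inside the writer so the caller does not need to know it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical writer, for illustration only.
struct SketchWriter {
  char* op;        // next output position
  char* op_limit;  // one past the end of the output buffer

  // Tries to append a literal of `len` bytes starting at `ip`, where
  // `available` bytes of input are readable from `ip`. Returns false if the
  // caller must fall back to the slow path.
  bool TryFastAppend(const char* ip, std::size_t available, std::size_t len) {
    assert(op <= op_limit);
    const std::size_t space_left = static_cast<std::size_t>(op_limit - op);
    // len <= 16 is tested first: it is the check most likely to fail, so
    // the other two are only evaluated when they can matter.
    if (len <= 16 && available >= 16 && space_left >= 16) {
      // Always copy 16 bytes, then advance by the real length. The extra
      // bytes are overwritten by whatever is appended next.
      std::memcpy(op, ip, 16);
      op += len;
      return true;
    }
    return false;
  }
};
```

Copying a fixed 16 bytes avoids a length-dependent copy loop, which is why both the input and the output need at least 16 bytes of slack before the fast path is taken.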
The improvements are most marked on Nehalem, but are generally positive
on other platforms as well. All microbenchmarks are 64-bit, opt.
Clovertown (Core 2):
Benchmark Time(ns) CPU(ns) Iterations
--------------------------------------------
BM_UFlat/0 110226 110224 100000 886.0MB/s html [ +1.5%]
BM_UFlat/1 1036523 1036508 10000 646.0MB/s urls [ -0.8%]
BM_UFlat/2 26775 26775 522570 4.4GB/s jpg [ +0.0%]
BM_UFlat/3 49738 49737 280974 1.8GB/s pdf [ +0.3%]
BM_UFlat/4 446790 446792 31334 874.3MB/s html4 [ +0.8%]