Skip to content

Latest commit

 

History

History
447 lines (355 loc) · 13.3 KB

stats.org

File metadata and controls

447 lines (355 loc) · 13.3 KB

Performance metrics

Download the test BAM file from https://www.internationalgenome.org/

wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00100/alignment/HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam

View

Processing an 841Mb BAM file.

Penguin2 56x Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz, 256Gb

Sambamba appears to be slightly faster between 0.8.1 and 0.8.2 thanks to ldc2 and LLVM updates.

time ./sambamba-0.8.2 sort HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam >
 /dev/null

sambamba 0.8.2 by Artem Tarasov and Pjotr Prins (C) 2012-2021
    LDC 1.27.1 / DMD v2.097.2 / LLVM11.0.0 / bootstrap LDC - the LLVM D compiler (1.27.1)

real    0m18.636s
user    3m47.704s
sys     0m18.764s

sambamba 0.8.1 by Artem Tarasov and Pjotr Prins (C) 2012-2021
    LDC 1.26.0 / DMD v2.096.1 / LLVM9.0.1 / bootstrap LDC - the LLVM D compiler (0.17.6)

real    0m19.045s
user    3m50.088s
sys     0m16.908s

AMD Ryzen 7 3700X 8-Core Processor

time ./bin/sambamba-0.8.1-pre1 view HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam > /dev/null

sambamba 0.8.1-pre1
 by Artem Tarasov and Pjotr Prins (C) 2012-2021
    LDC 1.26.0 / DMD v2.096.1 / LLVM9.0.1 / bootstrap LDC - the LLVM D compiler (0.17.6)
real    0m1.398s
user    0m16.589s
sys     0m0.240s

time ./bin/sambamba-0.8.1-pre1 sort HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam

sambamba 0.8.1-pre1
 by Artem Tarasov and Pjotr Prins (C) 2012-2021
    LDC 1.26.0 / DMD v2.096.1 / LLVM9.0.1 / bootstrap LDC - the LLVM D compiler (0.17.6)
real    0m9.151s
user    2m5.779s
sys     0m3.101s

time ./bin/sambamba-0.8.1-pre1 markdup HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.sorted.bam dedup.bam
 by Artem Tarasov and Pjotr Prins (C) 2012-2021
    LDC 1.26.0 / DMD v2.096.1 / LLVM9.0.1 / bootstrap LDC - the LLVM D compiler (0.17.6)
real    0m11.319s
user    1m47.719s
sys     0m4.070s

4x Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz (hyperthreaded)

sambamba 0.6.8-pre3

monza:~/tmp$ time ./sambamba view /gnu/data/HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam.orig > /dev/null

sambamba 0.6.8-pre3 by Artem Tarasov and Pjotr Prins (C) 2012-2018
    LDC 1.11.0 / DMD v2.081.2 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.6)

real    0m6.930s
user    0m26.940s
sys     0m0.516s

sambamba 0.6.8-pre2 by Artem Tarasov and Pjotr Prins (C) 2012-2018
    LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)

real    0m6.854s
user    0m26.456s
sys     0m0.584s

        linux-vdso.so.1 (0x00007ffd227fc000)
        librt.so.1 => /gnu/store/n6nvxlk2j8ysffjh3jphn1k5silnakh6-glibc-2.25/lib/librt.so.1 (0x00007f5d31082000)
        libpthread.so.0 => /gnu/store/n6nvxlk2j8ysffjh3jphn1k5silnakh6-glibc-2.25/lib/libpthread.so.0 (0x00007f5d30e64000)
        libm.so.6 => /gnu/store/n6nvxlk2j8ysffjh3jphn1k5silnakh6-glibc-2.25/lib/libm.so.6 (0x00007f5d30b52000)
        libdl.so.2 => /gnu/store/n6nvxlk2j8ysffjh3jphn1k5silnakh6-glibc-2.25/lib/libdl.so.2 (0x00007f5d3094e000)
        libgcc_s.so.1 => /gnu/store/h3z6nshhdlc8zgh4mi13x1br03xipi9r-gcc-7.2.0-lib/lib/libgcc_s.so.1 (0x00007f5d30737000)
        libc.so.6 => /gnu/store/n6nvxlk2j8ysffjh3jphn1k5silnakh6-glibc-2.25/lib/libc.so.6 (0x00007f5d30398000)
        /gnu/store/n6nvxlk2j8ysffjh3jphn1k5silnakh6-glibc-2.25/lib/ld-linux-x86-64.so.2 (0x00007f5d3128a000)

8x Intel(R) Core(TM) i7-6770HQ CPU @ 2.60GHz, 16Gb, SSD (hyperthreaded)

time ./build/sambamba view /gnu/data/HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam.orig > /dev/null

sambamba 0.6.8

sambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018
    LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)

real    0m2.869s
user    0m21.972s
sys     0m0.356s

sambamba 0.6.7

This version was built with:
    LDC 1.1.1
    using DMD v2.071.2
    using LLVM 3.9.1
    bootstrapped with LDC - the LLVM D compiler (1.1.1)

real    0m3.150s
user    0m24.668s
sys     0m0.320s

This version was built with:
    LDC 1.7.0
    using DMD v2.077.1
    using LLVM 5.0.1
    bootstrapped with LDC - the LLVM D compiler (1.7.0)

real    0m2.869s
user    0m22.344s
sys     0m0.344s

Sort

56x Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz, 256Gb

time ./sambamba_v0.6.6 sort -m 20615843020 -N -o /dev/null ENCFF696RLQ.bam -p

sambamba 0.6.8

sambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018
    LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)

real    7m50.558s
user    89m10.808s
sys     2m57.188s

and with 120GB RAM

sambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018
    LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)

real    3m49.953s
user    81m16.956s
sys     1m58.332s

sambamba 0.6.8-pre1

Wed Feb  7 03:43:14 CST 2018
sambamba 0.6.8-pre1

This version was built with:
    LDC 1.7.0
    using DMD v2.077.1
    using LLVM 5.0.1
    bootstrapped with LDC - the LLVM D compiler (1.7.0)

real    8m0.528s
user    88m44.084s
sys     2m45.888s

When sambamba is given enough RAM to hold everything in memory sambamba is twice as fast (apparently half the time goes to intermediate IO)

time ./sambamba sort -N -o /dev/null ENCFF696RLQ.bam -p -m 120G
real    3m46.856s
user    81m44.524s
sys     1m56.388s

with 64GB it is

real    5m36.062s
user    88m43.176s
sys     3m0.536s

and with 32GB it is

real    7m22.125s
user    89m6.188s
sys     2m51.228s

sambamba 0.6.7

This version was built with:
    LDC 1.7.0
    using DMD v2.077.1
    using LLVM 5.0.1
    bootstrapped with LDC - the LLVM D compiler (1.7.0)

real    18m15.809s
user    158m30.148s
sys     3m15.932s

Ouch! A regression in the shipped release 0.6.7.

This version was built with:
    LDC 1.1.1
    using DMD v2.071.2
    using LLVM 3.9.1
    bootstrapped with LDC - the LLVM D compiler (1.1.1)
    ldc2 -wi -I. -IBioD -IundeaD/src -g -O3 -release -enable-inlining -boundscheck=off

real    18m40.223s
user    159m34.292s
sys     3m19.300s

So, the same build is 2x slower than the previous version.

This version was built with:
    LDC 1.1.1
    using DMD v2.071.2
    using LLVM 3.9.1
    bootstrapped with LDC - the LLVM D compiler (1.1.1)
    Using ldmd2 @sambamba-ldmd-release.rsp
    "-g" "-O2" "-c" "-m64" "-release" "-IBioD/" "-IundeaD/src/" "-ofbuild/sambamba.o" "-odbuild" "-I."
    gcc -Wl,--gc-sections -o build/sambamba build/sambamba.o -Lhtslib -Llz4/lib -Wl,-Bstatic -lhts -llz4 -Wl,-Bdynamic /home/wrk/opt/ldc2-1.1.1-linux-x86_64/lib/libphobos2-ldc.a /home/wrk/opt/ldc2-1.1.1-linux-x86_64/lib/libdruntime-ldc.a -lrt -lpthread -lm

real    9m9.465s
user    97m56.204s
sys     2m50.512s

Updated the makefile to build with -singleobj. Now LLVM kicks in!

This version was built with:
    LDC 1.7.0
    using DMD v2.077.1
    using LLVM 5.0.1
    bootstrapped with LDC - the LLVM D compiler (1.7.0)

real    8m1.978s
user    89m13.936s
sys     2m47.392s

Next I tried adding profile guided optimization. But that turned out to be slower

This version was built with:
    LDC 1.7.0
    using DMD v2.077.1
    using LLVM 5.0.1
    bootstrapped with LDC - the LLVM D compiler (1.7.0)

real    11m16.267s
user    116m15.556s
sys     2m56.244s

So, the release is reverted an after a version bump:

sambamba 0.6.6

This version was built with:
    LDC 0.17.1
    using DMD v2.068.2
    using LLVM 3.8.0
    bootstrapped with version not available

real    10m0.932s
user    151m39.172s
sys     3m7.596s

This version was built with:
    LDC 1.1.1
    using DMD v2.071.2
    using LLVM 3.9.1
    bootstrapped with LDC - the LLVM D compiler (1.1.1)

real    9m22.501s
user    98m24.748s
sys     2m51.996s

Note, updating compiler shows a speed gain for 0.6.6.

Markdup

8x Intel(R) Core(TM) i7-6770HQ CPU @ 2.60GHz, 16Gb, SSD (hyperthreaded)

sambamba 0.6.8

sambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018
    LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)

finding positions of the duplicate reads in the file...
  sorted 11286293 end pairs
     and 156042 single ends (among them 0 unmatched pairs)
  collecting indices of duplicate reads...   done in 1325 ms
  found 6603388 duplicates
collected list of positions in 0 min 16 sec
marking duplicates...
collected list of positions in 1 min 2 sec
        Command being timed: "./bin/sambamba markdup /gnu/data/in_raw.sorted.bam /gnu/data/in_raw.sorted.bam t2.bam"
        User time (seconds): 406.49
        System time (seconds): 3.86
        Percent of CPU this job got: 649%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:03.13
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1709720
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 1140382
        Voluntary context switches: 393213
        Involuntary context switches: 8993
        Swaps: 0
        File system inputs: 0
        File system outputs: 2663824
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

Uses slightly more memory but is faster than

sambamba 0.6.7-pre1

/usr/bin/time --verbose sambamba markdup /gnu/data/in_raw.sorted.bam /gnu/data/in_raw.sorted.bam t2.bam
finding positions of the duplicate reads in the file...
  sorted 11286293 end pairs
     and 156042 single ends (among them 0 unmatched pairs)
  collecting indices of duplicate reads...   done in 1521 ms
  found 6603388 duplicates
collected list of positions in 0 min 16 sec
marking duplicates...
total time elapsed: 1 min 4 sec
        Command being timed: "sambamba markdup /gnu/data/in_raw.sorted.bam /gnu/data/in_raw.sorted.bam t2.bam"
        User time (seconds): 423.78
        System time (seconds): 4.47
        Percent of CPU this job got: 666%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:04.24
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1542764
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 1839470
        Voluntary context switches: 368082
        Involuntary context switches: 8537
        Swaps: 0
        File system inputs: 0
        File system outputs: 2643840
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

8x Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz (Thinkpad T480)

/usr/bin/time --verbose ./bin/sambamba-0.7.1 "--DRT-gcopt=profile:1" markdup HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam test.bam

sambamba 0.7.1

Commit 5f52f04aae3de1dce2d13b9e748002b4e513ded0

 by Artem Tarasov and Pjotr Prins (C) 2012-2019
    LDC 1.17.0 / DMD v2.087.1 / LLVM8.0.1 / bootstrap LDC - the LLVM D compiler (1.17.0)

finding positions of the duplicate reads in the file...
  sorted 3969781 end pairs
     and 73839 single ends (among them 22397 unmatched pairs)
  collecting indices of duplicate reads...   done in 372 ms
  found 239673 duplicates
collected list of positions in 0 min 6 sec
marking duplicates...
collected list of positions in 0 min 22 sec
        Number of collections:  107
        Total GC prep time:  10 milliseconds
        Total mark time:  548 milliseconds
        Total sweep time:  26 milliseconds
        Max Pause Time:  10 milliseconds
        Grand total GC time:  585 milliseconds
GC summary: 1158 MB,  107 GC  585 ms, Pauses  558 ms <   10 ms
        Command being timed: "./bin/sambamba-0.7.1 --DRT-gcopt=profile:1 markdup HG00100.chrom20.ILLUMINA.bwa.GBR.low_coverage.20130415.bam test2.bam"
        User time (seconds): 136.00
        System time (seconds): 2.39
        Percent of CPU this job got: 583%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:23.70
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1282940
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 396600
        Voluntary context switches: 199806
        Involuntary context switches: 5017
        Swaps: 0
        File system inputs: 16
        File system outputs: 1967376
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0