Example results on bare metal

I ran these on the following hardware:

  • Intel Xeon E5-2650 v4 @ 2.20 GHz
  • 512GB DDR4 memory (not that we would need it)
  • NVIDIA Tesla P100 (16GB memory)

Software stack:

  • CentOS 7

  • GNU compiler toolkit 8.3.0

  • Python 3.7.3

  • CUDA 10.1

  • Most packages pulled from conda-forge (see below for exceptions)

  • Backend versions:

    bohrium==0.11.0.post19  # built from source
    cupy==7.6.0
    jax==0.1.72  # built from source
    jaxlib==0.1.51  # built from source
    llvmlite==0.33.0
    numba==0.50.1
    numpy==1.19.0
    pytorch==1.4.0
    tensorflow==2.2.0
    theano==1.0.4

Equation of state

A single equation with more than 100 terms, no data dependencies, and only elementary math. This benchmark should represent a best-case scenario for vector instructions and GPU performance.
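
To illustrate what "no data dependencies" means here, the following is a minimal NumPy sketch. The function `toy_polynomial` and its coefficients are hypothetical and much shorter than the real equation of state; the point is only that each output element depends on the inputs at the same index and nothing else, so evaluating the whole array at once or one element at a time gives the same result.

```python
import numpy as np


def toy_polynomial(salt, temp, pressure):
    """Hypothetical stand-in for an equation of state: a few elementwise terms."""
    return (
        1.0
        + 0.5 * salt
        - 0.1 * temp * temp
        + 0.01 * pressure * np.sqrt(salt)
        + 0.001 * salt * temp * pressure
    )


rng = np.random.default_rng(0)
salt, temp, pressure = rng.random((3, 4096))

# Vectorized evaluation over the whole array at once.
vectorized = toy_polynomial(salt, temp, pressure)

# Evaluating each element on its own gives the same values, because no output
# element depends on any other element of the inputs or outputs.
one_by_one = np.array(
    [toy_polynomial(s, t, p) for s, t, p in zip(salt, temp, pressure)]
)

print(np.allclose(vectorized, one_by_one))
```

Because the elements are independent, a compiler or GPU backend is free to process them in any order and in parallel, which is presumably why this benchmark favors those backends.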

CPU

$ taskset -c 23 python run.py benchmarks/equation_of_state/

benchmarks.equation_of_state
============================
Running on CPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ
------------------------------------------------------------------------------------------------------------------
       4,096  numba         10,000     0.001     0.000     0.001     0.001     0.001     0.001     0.013     2.977
       4,096  theano        10,000     0.001     0.000     0.001     0.001     0.001     0.001     0.013     2.964
       4,096  tensorflow    10,000     0.001     0.000     0.001     0.001     0.001     0.001     0.010     2.937
       4,096  jax           10,000     0.001     0.000     0.001     0.001     0.001     0.001     0.012     2.584
       4,096  numpy          1,000     0.002     0.001     0.002     0.002     0.002     0.002     0.013     1.000
       4,096  pytorch        1,000     0.002     0.000     0.002     0.002     0.002     0.002     0.011     0.758
       4,096  bohrium          100     0.055     0.001     0.054     0.054     0.055     0.055     0.058     0.031

      16,384  tensorflow    10,000     0.002     0.000     0.002     0.002     0.002     0.002     0.013     3.784
      16,384  jax           10,000     0.002     0.000     0.002     0.002     0.002     0.002     0.014     3.414
      16,384  theano         1,000     0.002     0.000     0.002     0.002     0.002     0.002     0.011     3.103
      16,384  numba         10,000     0.002     0.000     0.002     0.002     0.002     0.002     0.016     2.867
      16,384  numpy          1,000     0.007     0.000     0.007     0.007     0.007     0.007     0.016     1.000
      16,384  pytorch        1,000     0.007     0.000     0.007     0.007     0.007     0.007     0.015     0.999
      16,384  bohrium          100     0.057     0.001     0.056     0.056     0.057     0.057     0.060     0.126

      65,536  tensorflow     1,000     0.006     0.000     0.006     0.006     0.006     0.006     0.016     5.515
      65,536  jax            1,000     0.007     0.000     0.007     0.007     0.007     0.007     0.008     5.042
      65,536  theano         1,000     0.009     0.000     0.008     0.009     0.009     0.009     0.021     4.077
      65,536  numba          1,000     0.010     0.001     0.010     0.010     0.010     0.010     0.019     3.657
      65,536  pytorch          100     0.034     0.005     0.027     0.028     0.036     0.040     0.042     1.037
      65,536  numpy            100     0.035     0.006     0.029     0.029     0.037     0.041     0.045     1.000
      65,536  bohrium          100     0.065     0.002     0.063     0.064     0.064     0.064     0.072     0.548

     262,144  tensorflow     1,000     0.021     0.001     0.020     0.021     0.021     0.021     0.030     8.316
     262,144  jax            1,000     0.025     0.001     0.023     0.024     0.025     0.025     0.028     7.036
     262,144  theano           100     0.031     0.001     0.030     0.031     0.031     0.031     0.034     5.647
     262,144  numba            100     0.035     0.001     0.035     0.035     0.035     0.035     0.038     5.020
     262,144  bohrium          100     0.091     0.002     0.090     0.091     0.091     0.091     0.100     1.924
     262,144  pytorch          100     0.173     0.009     0.147     0.167     0.171     0.180     0.198     1.015
     262,144  numpy            100     0.176     0.006     0.164     0.169     0.179     0.181     0.187     1.000

   1,048,576  tensorflow       100     0.099     0.003     0.092     0.097     0.099     0.101     0.108     7.383
   1,048,576  jax              100     0.100     0.003     0.094     0.098     0.099     0.101     0.108     7.288
   1,048,576  theano           100     0.129     0.003     0.125     0.127     0.127     0.129     0.139     5.673
   1,048,576  numba            100     0.145     0.003     0.144     0.144     0.144     0.145     0.160     5.021
   1,048,576  bohrium          100     0.207     0.004     0.200     0.205     0.206     0.206     0.220     3.534
   1,048,576  numpy             10     0.730     0.005     0.724     0.726     0.730     0.735     0.737     1.000
   1,048,576  pytorch           10     0.839     0.008     0.830     0.833     0.837     0.843     0.858     0.870

   4,194,304  tensorflow        10     0.389     0.009     0.379     0.386     0.387     0.388     0.413     9.164
   4,194,304  jax               10     0.407     0.012     0.388     0.407     0.408     0.409     0.432     8.751
   4,194,304  theano            10     0.518     0.015     0.510     0.510     0.510     0.511     0.547     6.878
   4,194,304  numba             10     0.581     0.020     0.570     0.570     0.571     0.578     0.627     6.125
   4,194,304  bohrium           10     0.650     0.002     0.648     0.649     0.650     0.651     0.654     5.476
   4,194,304  numpy             10     3.560     0.019     3.548     3.550     3.554     3.560     3.614     1.000
   4,194,304  pytorch           10     4.790     0.024     4.770     4.774     4.778     4.793     4.841     0.743

(time in wall seconds, less is better)

$ taskset -c 23 python run.py benchmarks/equation_of_state/ -s 16777216

benchmarks.equation_of_state
============================
Running on CPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ
------------------------------------------------------------------------------------------------------------------
  16,777,216  tensorflow        10     1.496     0.029     1.480     1.483     1.483     1.486     1.577     9.149
  16,777,216  jax               10     1.811     0.039     1.769     1.793     1.797     1.820     1.904     7.559
  16,777,216  theano            10     2.104     0.040     2.086     2.088     2.090     2.094     2.224     6.506
  16,777,216  numba             10     2.117     0.023     2.105     2.106     2.109     2.110     2.183     6.466
  16,777,216  bohrium           10     2.449     0.037     2.423     2.425     2.426     2.481     2.511     5.588
  16,777,216  numpy             10    13.686     0.025    13.660    13.666    13.673    13.704    13.732     1.000
  16,777,216  pytorch           10    18.330     0.035    18.270    18.310    18.331    18.339    18.409     0.747

(time in wall seconds, less is better)
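
The Δ column is not defined in this file. Since the NumPy rows always show 1.000, it appears to be the speedup relative to NumPy, that is, the NumPy mean time divided by the backend's mean time. The sketch below checks that assumption against the 16,777,216-element CPU table above; small differences are expected because the printed means are rounded to three decimals.

```python
# Mean wall times (seconds) and reported Δ values, copied from the
# 16,777,216-element CPU table above.
rows = {
    "tensorflow": (1.496, 9.149),
    "jax": (1.811, 7.559),
    "theano": (2.104, 6.506),
    "numba": (2.117, 6.466),
    "bohrium": (2.449, 5.588),
    "numpy": (13.686, 1.000),
    "pytorch": (18.330, 0.747),
}

numpy_mean = rows["numpy"][0]

# Assumption: Δ = (NumPy mean time) / (backend mean time).
speedups = {name: numpy_mean / mean for name, (mean, _) in rows.items()}

for name, (mean, reported) in rows.items():
    print(f"{name:<12} computed {speedups[name]:6.3f}   reported {reported:6.3f}")
```

Under this reading, values above 1 mean the backend is faster than NumPy and values below 1 mean it is slower.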

GPU

$ for backend in bohrium cupy jax pytorch tensorflow theano; do CUDA_VISIBLE_DEVICES="0" python run.py benchmarks/equation_of_state/ --gpu -b $backend -b numpy; done

benchmarks.equation_of_state
============================
Running on GPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ
------------------------------------------------------------------------------------------------------------------
       4,096  numpy         10,000     0.002     0.000     0.002     0.002     0.002     0.002     0.015     1.000
       4,096  bohrium          100     0.056     0.001     0.055     0.055     0.055     0.055     0.061     0.029

      16,384  numpy          1,000     0.007     0.000     0.007     0.007     0.007     0.007     0.012     1.000
      16,384  bohrium          100     0.056     0.001     0.055     0.055     0.055     0.055     0.059     0.126

      65,536  numpy            100     0.030     0.002     0.029     0.029     0.029     0.029     0.040     1.000
      65,536  bohrium          100     0.056     0.001     0.055     0.055     0.055     0.055     0.061     0.531

     262,144  bohrium          100     0.056     0.001     0.055     0.055     0.055     0.056     0.059     2.423
     262,144  numpy            100     0.135     0.004     0.120     0.133     0.133     0.134     0.161     1.000

   1,048,576  bohrium          100     0.056     0.001     0.055     0.056     0.056     0.056     0.061    13.903
   1,048,576  numpy             10     0.779     0.009     0.771     0.774     0.774     0.785     0.794     1.000

   4,194,304  bohrium          100     0.057     0.001     0.056     0.057     0.057     0.057     0.062    62.559
   4,194,304  numpy             10     3.566     0.022     3.552     3.555     3.559     3.562     3.631     1.000

(time in wall seconds, less is better)

benchmarks.equation_of_state
============================
Running on GPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ
------------------------------------------------------------------------------------------------------------------
       4,096  numpy         10,000     0.002     0.001     0.002     0.002     0.002     0.002     0.015     1.000
       4,096  cupy           1,000     0.008     0.001     0.008     0.008     0.008     0.008     0.020     0.199

      16,384  numpy          1,000     0.007     0.001     0.007     0.007     0.007     0.007     0.021     1.000
      16,384  cupy           1,000     0.008     0.001     0.008     0.008     0.008     0.008     0.022     0.870

      65,536  cupy           1,000     0.009     0.001     0.008     0.008     0.008     0.008     0.022     4.882
      65,536  numpy            100     0.042     0.003     0.030     0.040     0.042     0.042     0.049     1.000

     262,144  cupy           1,000     0.009     0.001     0.008     0.008     0.008     0.008     0.020    21.548
     262,144  numpy            100     0.185     0.002     0.181     0.184     0.184     0.185     0.192     1.000

   1,048,576  cupy           1,000     0.016     0.001     0.016     0.016     0.016     0.016     0.030    46.246
   1,048,576  numpy             10     0.747     0.001     0.745     0.746     0.747     0.748     0.748     1.000

   4,194,304  cupy             100     0.060     0.000     0.060     0.060     0.060     0.060     0.061    58.375
   4,194,304  numpy             10     3.527     0.002     3.523     3.525     3.527     3.528     3.530     1.000

(time in wall seconds, less is better)

benchmarks.equation_of_state
============================
Running on GPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ
------------------------------------------------------------------------------------------------------------------
       4,096  jax           10,000     0.000     0.000     0.000     0.000     0.000     0.000     0.016     6.402
       4,096  numpy         10,000     0.002     0.001     0.002     0.002     0.002     0.002     0.017     1.000

      16,384  jax           10,000     0.000     0.001     0.000     0.000     0.000     0.000     0.016    35.237
      16,384  numpy          1,000     0.010     0.002     0.007     0.009     0.009     0.012     0.022     1.000

      65,536  jax           10,000     0.000     0.001     0.000     0.000     0.000     0.000     0.017   160.817
      65,536  numpy            100     0.048     0.003     0.029     0.047     0.048     0.050     0.051     1.000

     262,144  jax           10,000     0.000     0.001     0.000     0.000     0.000     0.000     0.016   515.477
     262,144  numpy            100     0.188     0.002     0.171     0.187     0.188     0.189     0.192     1.000

   1,048,576  jax           10,000     0.002     0.001     0.001     0.001     0.002     0.002     0.016   515.759
   1,048,576  numpy             10     0.789     0.005     0.783     0.786     0.787     0.789     0.802     1.000

   4,194,304  jax           10,000     0.001     0.001     0.001     0.001     0.001     0.001     0.017  2566.638
   4,194,304  numpy             10     3.490     0.016     3.476     3.480     3.485     3.490     3.527     1.000

(time in wall seconds, less is better)

benchmarks.equation_of_state
============================
Running on GPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ
------------------------------------------------------------------------------------------------------------------
       4,096  pytorch      100,000     0.000     0.000     0.000     0.000     0.000     0.000     0.016    27.642
       4,096  numpy         10,000     0.002     0.000     0.002     0.002     0.002     0.002     0.017     1.000

      16,384  pytorch      100,000     0.000     0.000     0.000     0.000     0.000     0.000     0.017   113.228
      16,384  numpy          1,000     0.007     0.000     0.007     0.007     0.007     0.007     0.019     1.000

      65,536  pytorch      100,000     0.000     0.000     0.000     0.000     0.000     0.000     0.016   413.623
      65,536  numpy            100     0.036     0.007     0.029     0.029     0.033     0.043     0.045     1.000

     262,144  pytorch      100,000     0.000     0.000     0.000     0.000     0.000     0.000     0.015  1028.368
     262,144  numpy            100     0.179     0.007     0.168     0.171     0.183     0.184     0.193     1.000

   1,048,576  pytorch       10,000     0.001     0.000     0.001     0.001     0.001     0.001     0.001  1252.851
   1,048,576  numpy             10     0.722     0.004     0.718     0.719     0.720     0.726     0.728     1.000

   4,194,304  pytorch       10,000     0.002     0.000     0.002     0.002     0.002     0.002     0.014  1754.906
   4,194,304  numpy             10     3.470     0.002     3.467     3.468     3.470     3.472     3.474     1.000

(time in wall seconds, less is better)

benchmarks.equation_of_state
============================
Running on GPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ
------------------------------------------------------------------------------------------------------------------
       4,096  tensorflow    10,000     0.001     0.001     0.000     0.000     0.000     0.001     0.018     3.225
       4,096  numpy          1,000     0.002     0.000     0.002     0.002     0.002     0.002     0.014     1.000

      16,384  tensorflow    10,000     0.001     0.001     0.000     0.001     0.001     0.001     0.016    12.167
      16,384  numpy          1,000     0.008     0.001     0.006     0.007     0.007     0.008     0.012     1.000

      65,536  tensorflow    10,000     0.001     0.001     0.001     0.001     0.001     0.001     0.016    62.993
      65,536  numpy            100     0.046     0.004     0.042     0.044     0.045     0.047     0.078     1.000

     262,144  tensorflow    10,000     0.001     0.001     0.001     0.001     0.001     0.001     0.019   144.502
     262,144  numpy            100     0.191     0.007     0.180     0.187     0.189     0.193     0.224     1.000

   1,048,576  tensorflow    10,000     0.003     0.000     0.003     0.003     0.003     0.003     0.014   248.496
   1,048,576  numpy             10     0.807     0.010     0.796     0.801     0.804     0.815     0.823     1.000

   4,194,304  tensorflow    10,000     0.009     0.000     0.009     0.009     0.009     0.009     0.023   384.760
   4,194,304  numpy             10     3.616     0.079     3.543     3.564     3.590     3.624     3.809     1.000

(time in wall seconds, less is better)

benchmarks.equation_of_state
============================
Running on GPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ
------------------------------------------------------------------------------------------------------------------
       4,096  theano        10,000     0.000     0.001     0.000     0.000     0.000     0.000     0.013     6.897
       4,096  numpy          1,000     0.002     0.000     0.002     0.002     0.002     0.002     0.012     1.000

      16,384  theano        10,000     0.000     0.001     0.000     0.000     0.000     0.000     0.013    23.210
      16,384  numpy          1,000     0.007     0.001     0.007     0.007     0.007     0.007     0.021     1.000

      65,536  theano        10,000     0.001     0.001     0.001     0.001     0.001     0.001     0.013    65.748
      65,536  numpy            100     0.038     0.002     0.030     0.037     0.037     0.039     0.044     1.000

     262,144  theano        10,000     0.002     0.000     0.001     0.001     0.001     0.002     0.014   121.565
     262,144  numpy            100     0.183     0.005     0.175     0.180     0.182     0.187     0.195     1.000

   1,048,576  theano         1,000     0.006     0.001     0.005     0.005     0.005     0.008     0.020   130.103
   1,048,576  numpy             10     0.816     0.018     0.785     0.810     0.814     0.831     0.843     1.000

   4,194,304  theano         1,000     0.029     0.000     0.024     0.029     0.029     0.030     0.032   123.718
   4,194,304  numpy             10     3.633     0.051     3.541     3.593     3.642     3.669     3.705     1.000

(time in wall seconds, less is better)

Isoneutral mixing

A more balanced routine with many data dependencies (stencil operations), and tensor shapes of up to 5 dimensions. This is the most expensive part of Veros, so in a way this is the benchmark that interests me the most.
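
As a rough picture of what a stencil operation looks like in NumPy, here is a minimal sketch. The function `neighbor_average` is a hypothetical 2D five-point stencil, not code from the benchmark, which works on arrays of up to 5 dimensions. Each output point reads its neighbors, so the work is dominated by shifted array slices rather than pure elementwise math.

```python
import numpy as np


def neighbor_average(field):
    """Average each interior point with its four nearest neighbors."""
    out = field.copy()
    out[1:-1, 1:-1] = 0.2 * (
        field[1:-1, 1:-1]   # the point itself
        + field[:-2, 1:-1]  # neighbor in the previous row
        + field[2:, 1:-1]   # neighbor in the next row
        + field[1:-1, :-2]  # neighbor in the previous column
        + field[1:-1, 2:]   # neighbor in the next column
    )
    return out


# A single spike gets spread over its neighborhood; boundary values are kept.
field = np.zeros((5, 5))
field[2, 2] = 5.0
smoothed = neighbor_average(field)
print(smoothed)
```

Every shifted slice is a separate read of the input, and plain NumPy evaluates each `+` into a temporary array, which is one place where compiling backends can plausibly gain by fusing the whole expression.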

CPU

$ taskset -c 23 python run.py benchmarks/isoneutral_mixing/

benchmarks.isoneutral_mixing
============================
Running on CPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ
------------------------------------------------------------------------------------------------------------------
       4,096  numba         10,000     0.001     0.000     0.001     0.001     0.001     0.001     0.009     3.333
       4,096  jax           10,000     0.002     0.001     0.002     0.002     0.002     0.002     0.036     2.046
       4,096  theano         1,000     0.003     0.000     0.002     0.003     0.003     0.003     0.006     1.415
       4,096  numpy          1,000     0.004     0.000     0.004     0.004     0.004     0.004     0.011     1.000
       4,096  pytorch        1,000     0.007     0.000     0.007     0.007     0.007     0.007     0.011     0.557
       4,096  bohrium          100     0.072     0.001     0.071     0.071     0.071     0.071     0.076     0.053

      16,384  numba          1,000     0.006     0.000     0.005     0.006     0.006     0.006     0.013     2.549
      16,384  jax            1,000     0.006     0.000     0.006     0.006     0.006     0.006     0.010     2.353
      16,384  theano         1,000     0.010     0.000     0.010     0.010     0.010     0.010     0.015     1.435
      16,384  numpy          1,000     0.014     0.000     0.014     0.014     0.014     0.014     0.020     1.000
      16,384  pytorch        1,000     0.015     0.000     0.015     0.015     0.015     0.015     0.020     0.930
      16,384  bohrium          100     0.078     0.001     0.077     0.077     0.077     0.078     0.082     0.184

      65,536  numba          1,000     0.025     0.001     0.025     0.025     0.025     0.025     0.033     2.238
      65,536  jax            1,000     0.026     0.001     0.025     0.026     0.026     0.026     0.037     2.173
      65,536  theano           100     0.039     0.001     0.039     0.039     0.039     0.039     0.042     1.427
      65,536  pytorch          100     0.044     0.001     0.043     0.043     0.044     0.044     0.046     1.278
      65,536  numpy            100     0.056     0.001     0.055     0.056     0.056     0.056     0.062     1.000
      65,536  bohrium          100     0.106     0.003     0.103     0.104     0.104     0.108     0.115     0.528

     262,144  numba            100     0.098     0.004     0.095     0.095     0.096     0.098     0.114     2.359
     262,144  jax              100     0.117     0.002     0.115     0.116     0.116     0.116     0.126     1.972
     262,144  theano           100     0.170     0.010     0.151     0.161     0.170     0.177     0.191     1.358
     262,144  pytorch          100     0.175     0.006     0.166     0.170     0.175     0.180     0.190     1.314
     262,144  bohrium          100     0.210     0.005     0.204     0.206     0.208     0.214     0.221     1.099
     262,144  numpy            100     0.230     0.009     0.218     0.221     0.230     0.236     0.256     1.000

   1,048,576  numba             10     0.457     0.001     0.456     0.456     0.456     0.457     0.459     2.452
   1,048,576  jax               10     0.532     0.001     0.531     0.531     0.532     0.533     0.533     2.106
   1,048,576  bohrium           10     0.646     0.013     0.633     0.641     0.643     0.644     0.684     1.733
   1,048,576  theano            10     0.760     0.004     0.756     0.758     0.759     0.759     0.770     1.475
   1,048,576  pytorch           10     0.958     0.013     0.948     0.949     0.956     0.959     0.994     1.169
   1,048,576  numpy             10     1.120     0.004     1.115     1.118     1.119     1.122     1.127     1.000

   4,194,304  numba             10     1.886     0.024     1.851     1.867     1.885     1.904     1.931     2.529
   4,194,304  jax               10     2.255     0.021     2.242     2.245     2.248     2.254     2.318     2.115
   4,194,304  bohrium           10     2.374     0.024     2.349     2.358     2.363     2.382     2.433     2.009
   4,194,304  theano            10     3.073     0.028     3.054     3.058     3.065     3.071     3.155     1.552
   4,194,304  pytorch           10     4.756     0.188     4.556     4.563     4.752     4.910     5.007     1.003
   4,194,304  numpy             10     4.769     0.017     4.752     4.757     4.762     4.779     4.799     1.000

(time in wall seconds, less is better)

$ taskset -c 23 python run.py benchmarks/isoneutral_mixing/ -s 16777216

benchmarks.isoneutral_mixing
============================
Running on CPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ
------------------------------------------------------------------------------------------------------------------
  16,777,216  numba             10     7.652     0.032     7.575     7.637     7.664     7.669     7.693     2.935
  16,777,216  jax               10     8.880     0.052     8.838     8.847     8.859     8.890     9.022     2.529
  16,777,216  bohrium           10     9.559     0.124     9.354     9.519     9.566     9.611     9.791     2.350
  16,777,216  theano            10    12.890     0.050    12.801    12.859    12.888    12.921    12.977     1.743
  16,777,216  numpy             10    22.462     0.049    22.373    22.424    22.468    22.496    22.548     1.000
  16,777,216  pytorch           10    24.891     0.039    24.839    24.866    24.884    24.915    24.973     0.902

(time in wall seconds, less is better)

GPU

$ for backend in bohrium cupy jax pytorch theano; do CUDA_VISIBLE_DEVICES="0" python run.py benchmarks/isoneutral_mixing/ --gpu -b $backend -b numpy; done

benchmarks.isoneutral_mixing
============================
Running on GPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ
------------------------------------------------------------------------------------------------------------------
       4,096  numpy          1,000     0.004     0.001     0.004     0.004     0.004     0.004     0.012     1.000
       4,096  bohrium          100     0.076     0.002     0.074     0.075     0.075     0.075     0.082     0.052

      16,384  numpy          1,000     0.014     0.001     0.014     0.014     0.014     0.014     0.047     1.000
      16,384  bohrium          100     0.076     0.002     0.074     0.075     0.075     0.076     0.088     0.190

      65,536  numpy            100     0.057     0.002     0.055     0.056     0.056     0.056     0.063     1.000
      65,536  bohrium          100     0.077     0.002     0.075     0.076     0.077     0.078     0.086     0.733

     262,144  bohrium          100     0.080     0.003     0.076     0.077     0.080     0.081     0.089     3.098
     262,144  numpy            100     0.248     0.006     0.234     0.243     0.248     0.249     0.267     1.000

   1,048,576  bohrium          100     0.092     0.005     0.086     0.086     0.094     0.095     0.103    12.269
   1,048,576  numpy             10     1.127     0.013     1.118     1.120     1.122     1.128     1.165     1.000

   4,194,304  bohrium          100     0.155     0.005     0.145     0.152     0.156     0.157     0.166    31.080
   4,194,304  numpy             10     4.815     0.068     4.755     4.768     4.779     4.841     4.941     1.000

(time in wall seconds, less is better)

benchmarks.isoneutral_mixing
============================
Running on GPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ
------------------------------------------------------------------------------------------------------------------
       4,096  numpy          1,000     0.004     0.001     0.004     0.004     0.004     0.004     0.010     1.000
       4,096  cupy           1,000     0.011     0.001     0.010     0.011     0.011     0.011     0.015     0.356

      16,384  cupy           1,000     0.011     0.001     0.010     0.011     0.011     0.011     0.017     1.320
      16,384  numpy          1,000     0.014     0.000     0.014     0.014     0.014     0.014     0.018     1.000

      65,536  cupy           1,000     0.011     0.001     0.011     0.011     0.011     0.011     0.017     5.163
      65,536  numpy            100     0.056     0.002     0.055     0.055     0.055     0.056     0.065     1.000

     262,144  cupy           1,000     0.011     0.001     0.011     0.011     0.011     0.011     0.015    21.722
     262,144  numpy            100     0.245     0.005     0.234     0.241     0.245     0.247     0.261     1.000

   1,048,576  cupy           1,000     0.022     0.000     0.022     0.022     0.022     0.022     0.026    50.066
   1,048,576  numpy             10     1.120     0.006     1.114     1.115     1.117     1.122     1.134     1.000

   4,194,304  cupy             100     0.085     0.001     0.085     0.085     0.085     0.086     0.088    55.967
   4,194,304  numpy             10     4.778     0.021     4.768     4.769     4.772     4.775     4.840     1.000

(time in wall seconds, less is better)

benchmarks.isoneutral_mixing
============================
Running on GPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ
------------------------------------------------------------------------------------------------------------------
       4,096  jax           10,000     0.001     0.000     0.001     0.001     0.001     0.001     0.011     2.724
       4,096  numpy          1,000     0.004     0.000     0.004     0.004     0.004     0.004     0.008     1.000

      16,384  jax           10,000     0.001     0.000     0.001     0.001     0.001     0.001     0.007     9.922
      16,384  numpy          1,000     0.015     0.001     0.014     0.014     0.014     0.015     0.027     1.000

      65,536  jax           10,000     0.002     0.000     0.002     0.002     0.002     0.002     0.008    27.143
      65,536  numpy            100     0.058     0.003     0.055     0.056     0.058     0.059     0.071     1.000

     262,144  jax            1,000     0.006     0.000     0.006     0.006     0.006     0.006     0.011    43.892
     262,144  numpy            100     0.251     0.003     0.244     0.249     0.250     0.251     0.264     1.000

   1,048,576  jax            1,000     0.019     0.000     0.019     0.019     0.019     0.019     0.024    57.624
   1,048,576  numpy             10     1.117     0.005     1.111     1.113     1.114     1.122     1.128     1.000

   4,194,304  jax              100     0.072     0.000     0.071     0.071     0.072     0.072     0.073    66.022
   4,194,304  numpy             10     4.728     0.023     4.692     4.707     4.741     4.745     4.749     1.000

(time in wall seconds, less is better)

benchmarks.isoneutral_mixing
============================
Running on GPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ
------------------------------------------------------------------------------------------------------------------
       4,096  numpy          1,000     0.004     0.000     0.004     0.004     0.004     0.004     0.010     1.000
       4,096  pytorch        1,000     0.008     0.001     0.008     0.008     0.008     0.008     0.015     0.483

      16,384  pytorch        1,000     0.008     0.001     0.008     0.008     0.008     0.008     0.013     1.793
      16,384  numpy          1,000     0.014     0.001     0.014     0.014     0.014     0.014     0.019     1.000

      65,536  pytorch        1,000     0.008     0.001     0.008     0.008     0.008     0.008     0.014     6.817
      65,536  numpy            100     0.056     0.002     0.055     0.055     0.055     0.056     0.065     1.000

     262,144  pytorch        1,000     0.008     0.001     0.008     0.008     0.008     0.008     0.012    29.159
     262,144  numpy            100     0.246     0.004     0.224     0.242     0.247     0.249     0.259     1.000

   1,048,576  pytorch        1,000     0.020     0.000     0.020     0.020     0.020     0.020     0.024    56.054
   1,048,576  numpy             10     1.115     0.003     1.113     1.113     1.114     1.116     1.122     1.000

   4,194,304  pytorch          100     0.074     0.001     0.074     0.074     0.074     0.074     0.079    64.054
   4,194,304  numpy             10     4.732     0.004     4.726     4.730     4.733     4.735     4.739     1.000

(time in wall seconds, less is better)

benchmarks.isoneutral_mixing
============================
Running on GPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ
------------------------------------------------------------------------------------------------------------------
       4,096  theano        10,000     0.002     0.000     0.002     0.002     0.002     0.002     0.007     2.240
       4,096  numpy          1,000     0.004     0.000     0.004     0.004     0.004     0.004     0.008     1.000

      16,384  theano        10,000     0.003     0.000     0.002     0.002     0.002     0.003     0.007     5.480
      16,384  numpy          1,000     0.014     0.000     0.014     0.014     0.014     0.014     0.019     1.000

      65,536  theano         1,000     0.006     0.000     0.006     0.006     0.006     0.007     0.013     9.001
      65,536  numpy            100     0.056     0.001     0.055     0.055     0.055     0.056     0.061     1.000

     262,144  theano         1,000     0.018     0.002     0.017     0.017     0.017     0.019     0.032    12.259
     262,144  numpy            100     0.226     0.011     0.218     0.220     0.222     0.225     0.279     1.000

   1,048,576  theano           100     0.103     0.007     0.085     0.099     0.100     0.113     0.115    10.890
   1,048,576  numpy             10     1.127     0.021     1.110     1.113     1.120     1.124     1.174     1.000

   4,194,304  theano            10     0.386     0.018     0.380     0.380     0.380     0.381     0.439    12.280
   4,194,304  numpy             10     4.741     0.020     4.723     4.729     4.733     4.739     4.781     1.000

(time in wall seconds, less is better)

Turbulent kinetic energy

This routine consists of some stencil operations and some linear algebra (a tridiagonal matrix solver). The solver cannot be vectorized, because each elimination step depends on the result of the previous one.
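To illustrate why a tridiagonal solve resists vectorization, here is a minimal sketch of the textbook Thomas algorithm in plain Python. It is not the solver used by the benchmark. The function name `solve_tridiagonal` and the argument layout are assumptions made for this example. Both loops carry a dependency from one iteration to the next, so they cannot be replaced by a single array expression along the solve dimension.

```python
def solve_tridiagonal(a, b, c, d):
    """Solve a tridiagonal system with the Thomas algorithm (illustrative sketch).

    a: sub-diagonal (a[0] is unused)
    b: main diagonal
    c: super-diagonal (c[-1] is unused)
    d: right-hand side
    """
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward sweep: iteration i reads cp[i - 1] and dp[i - 1],
    # so the iterations have to run in order.
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    # Back substitution: iteration i reads x[i + 1].
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


# Small diagonally dominant system whose exact solution is [1, 2, 3, 4]
a = [0.0, 1.0, 1.0, 1.0]
b = [4.0, 4.0, 4.0, 4.0]
c = [1.0, 1.0, 1.0, 0.0]
d = [6.0, 12.0, 18.0, 19.0]
x = solve_tridiagonal(a, b, c, d)
print(x)  # close to [1.0, 2.0, 3.0, 4.0], up to floating-point rounding
```

When many independent systems have to be solved at once, the loop body can still operate on whole arrays across those systems. Only the loop over the solve dimension itself has to stay sequential.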

CPU

$ taskset -c 23 python run.py benchmarks/turbulent_kinetic_energy/

benchmarks.turbulent_kinetic_energy
===================================
Running on CPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ
------------------------------------------------------------------------------------------------------------------
       4,096  jax           10,000     0.001     0.000     0.001     0.001     0.001     0.001     0.007     2.001
       4,096  numba         10,000     0.001     0.000     0.001     0.001     0.001     0.001     0.005     1.852
       4,096  numpy          1,000     0.002     0.000     0.002     0.002     0.002     0.002     0.007     1.000
       4,096  bohrium           10     0.048     0.001     0.046     0.047     0.047     0.049     0.050     0.048

      16,384  jax           10,000     0.003     0.000     0.002     0.003     0.003     0.003     0.013     2.728
      16,384  numba          1,000     0.004     0.000     0.004     0.004     0.004     0.004     0.009     1.773
      16,384  numpy          1,000     0.007     0.000     0.007     0.007     0.007     0.007     0.011     1.000
      16,384  bohrium          100     0.049     0.001     0.048     0.048     0.048     0.049     0.052     0.148

      65,536  jax            1,000     0.010     0.000     0.009     0.010     0.010     0.010     0.013     2.669
      65,536  numba          1,000     0.013     0.000     0.013     0.013     0.013     0.013     0.018     2.009
      65,536  numpy          1,000     0.026     0.001     0.026     0.026     0.026     0.026     0.031     1.000
      65,536  bohrium          100     0.056     0.001     0.054     0.055     0.055     0.057     0.059     0.465

     262,144  numba            100     0.046     0.002     0.042     0.043     0.047     0.047     0.050     2.585
     262,144  jax              100     0.051     0.002     0.044     0.051     0.051     0.052     0.054     2.319
     262,144  bohrium           10     0.085     0.003     0.079     0.086     0.086     0.087     0.089     1.385
     262,144  numpy            100     0.118     0.009     0.099     0.116     0.123     0.124     0.130     1.000

   1,048,576  numba            100     0.178     0.004     0.175     0.176     0.177     0.178     0.195     3.026
   1,048,576  bohrium          100     0.198     0.005     0.193     0.194     0.197     0.203     0.213     2.721
   1,048,576  jax              100     0.250     0.002     0.247     0.249     0.250     0.251     0.261     2.154
   1,048,576  numpy             10     0.539     0.006     0.535     0.537     0.537     0.538     0.557     1.000

   4,194,304  bohrium           10     0.643     0.006     0.633     0.639     0.645     0.646     0.652     3.194
   4,194,304  numba             10     0.683     0.007     0.675     0.677     0.681     0.690     0.693     3.005
   4,194,304  jax               10     1.155     0.009     1.145     1.148     1.151     1.162     1.172     1.778
   4,194,304  numpy             10     2.053     0.018     2.032     2.041     2.046     2.062     2.095     1.000

(time in wall seconds, less is better)

$ taskset -c 23 python run.py benchmarks/turbulent_kinetic_energy/ -s 16777216

benchmarks.turbulent_kinetic_energy
===================================
Running on CPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ
------------------------------------------------------------------------------------------------------------------
  16,777,216  bohrium           10     2.386     0.011     2.370     2.381     2.384     2.386     2.410     4.315
  16,777,216  numba             10     2.629     0.032     2.598     2.607     2.615     2.641     2.710     3.917
  16,777,216  jax               10     4.397     0.016     4.379     4.388     4.391     4.399     4.436     2.342
  16,777,216  numpy             10    10.297     0.092    10.217    10.234    10.274    10.280    10.476     1.000

(time in wall seconds, less is better)
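The Δ column in these tables appears to be the speedup over NumPy, that is, NumPy's mean time divided by the backend's mean time (an interpretation inferred from the printed values, not from the benchmark source). The sketch below recomputes it from the rounded means of the 16,777,216 run above. The helper `speedup` is a hypothetical name, and the results can differ from the printed Δ in the last digit because the inputs are already rounded to three decimals.

```python
def speedup(reference_mean, backend_mean):
    """Ratio of the reference (NumPy) mean time to a backend's mean time."""
    return reference_mean / backend_mean


# Mean wall times in seconds, copied from the 16,777,216 CPU table
means = {"bohrium": 2.386, "numba": 2.629, "jax": 4.397, "numpy": 10.297}

# Print backends from fastest to slowest, as in the table
for backend, mean in sorted(means.items(), key=lambda item: item[1]):
    print(f"{backend:<10s} {mean:8.3f} {speedup(means['numpy'], mean):8.3f}")
```

A value above 1 means the backend is faster than NumPy at that problem size, and a value below 1 means it is slower.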

GPU

$ for backend in bohrium jax; do CUDA_VISIBLE_DEVICES="0" python run.py benchmarks/turbulent_kinetic_energy/ --gpu -b $backend -b numpy; done

benchmarks.turbulent_kinetic_energy
===================================
Running on GPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ
------------------------------------------------------------------------------------------------------------------
       4,096  numpy         10,000     0.002     0.000     0.002     0.002     0.002     0.002     0.012     1.000
       4,096  bohrium          100     0.048     0.002     0.047     0.048     0.048     0.048     0.061     0.043

      16,384  numpy          1,000     0.007     0.000     0.007     0.007     0.007     0.007     0.009     1.000
      16,384  bohrium          100     0.048     0.002     0.047     0.048     0.048     0.049     0.061     0.145

      65,536  numpy            100     0.026     0.001     0.025     0.026     0.026     0.026     0.032     1.000
      65,536  bohrium          100     0.049     0.002     0.048     0.049     0.049     0.050     0.061     0.529

     262,144  bohrium          100     0.053     0.006     0.049     0.049     0.052     0.055     0.082     1.998
     262,144  numpy            100     0.106     0.005     0.099     0.101     0.103     0.111     0.128     1.000

   1,048,576  bohrium           10     0.064     0.003     0.054     0.065     0.065     0.066     0.067     8.519
   1,048,576  numpy             10     0.548     0.014     0.532     0.536     0.546     0.555     0.578     1.000

   4,194,304  bohrium          100     0.091     0.010     0.082     0.083     0.091     0.094     0.137    23.098
   4,194,304  numpy             10     2.099     0.094     2.029     2.038     2.080     2.099     2.363     1.000

(time in wall seconds, less is better)

benchmarks.turbulent_kinetic_energy
===================================
Running on GPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ
------------------------------------------------------------------------------------------------------------------
       4,096  jax            1,000     0.002     0.000     0.001     0.002     0.002     0.002     0.008     1.303
       4,096  numpy         10,000     0.002     0.000     0.002     0.002     0.002     0.002     0.010     1.000

      16,384  jax           10,000     0.002     0.000     0.002     0.002     0.002     0.002     0.010     4.141
      16,384  numpy          1,000     0.007     0.000     0.007     0.007     0.007     0.008     0.013     1.000

      65,536  jax            1,000     0.002     0.000     0.002     0.002     0.002     0.002     0.008    12.552
      65,536  numpy            100     0.029     0.003     0.025     0.026     0.029     0.032     0.035     1.000

     262,144  jax            1,000     0.004     0.001     0.004     0.004     0.004     0.004     0.009    31.513
     262,144  numpy            100     0.126     0.008     0.106     0.120     0.126     0.129     0.158     1.000

   1,048,576  jax            1,000     0.012     0.000     0.012     0.012     0.012     0.012     0.018    44.869
   1,048,576  numpy             10     0.550     0.008     0.543     0.544     0.547     0.554     0.570     1.000

   4,194,304  jax              100     0.047     0.000     0.047     0.047     0.047     0.047     0.049    44.246
   4,194,304  numpy             10     2.090     0.034     2.061     2.064     2.075     2.099     2.156     1.000

(time in wall seconds, less is better)