Use NVPerfWorks to measure FLOPs #3

Draft: wants to merge 1 commit into base: main

@vchuravy (Collaborator) commented Apr 5, 2023

I still need to verify that the data is accurate, using ncu as a baseline,
and we need to decide whether to average the measurements with something like OnlineStats.jl.
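
To illustrate what averaging across repeated runs could look like, here is a minimal sketch of a streaming mean and variance using Welford's algorithm. It is written in Python for illustration only and is a stand-in, not the OnlineStats.jl API. The `RunningMean` class and the sample values are hypothetical, not part of this PR or measured data.

```python
from dataclasses import dataclass


@dataclass
class RunningMean:
    """Streaming mean and sample variance via Welford's algorithm (hypothetical helper)."""

    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the current mean

    def fit(self, x: float) -> None:
        # Update the statistics with one new observation, without storing it.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; undefined for fewer than two observations.
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


# Made-up FLOP/s values standing in for repeated measurements (not real data).
samples = [1.70e11, 1.73e11, 1.75e11]

stat = RunningMean()
for s in samples:
    stat.fit(s)

print(f"mean = {stat.mean:.4e} FLOP/s, std = {stat.variance ** 0.5:.4e} FLOP/s")
```

The appeal of a streaming estimator here is that each profiling run contributes one update, so no per-run history needs to be kept.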

This measures FLOP/s in "kernel time", not "host time": the FLOP counts are divided by the time spent in kernels rather than by the wall-clock duration.

Running locally with NZ=12 RESOLUTION=3 gives:

┌ Info: Duration
│   time = 1.44577531
│   gctime = 0.0
└   bytes = 13188120
┌ Info: Fraction active
└   fraction = 0.5
┌ Info: Kernel performance
│   time = 0.5812019466666665 s
│   D_FLOP = 1.00283101181e11 Instruction
│   F_FLOP = 7.03965312e9 Instruction
│   H_FLOP = 0.0 Instruction
│   D_FLOPs = 1.725443312021713e11 Instruction s^-1
│   F_FLOPs = 1.2112232521542831e10 Instruction s^-1
└   H_FLOPs = 0.0 Instruction s^-1
┌ Info: Arithmetic intensity (DRAM)
│   AI_D_DRAM = 10.103490546902368 Instruction Byte^-1
│   AI_F_DRAM = 0.7092428127349074 Instruction Byte^-1
└   AI_H_DRAM = 0.0 Instruction Byte^-1
┌ Info: Arithmetic intensity (L2)
│   AI_D_L2 = 5.165718443517095 Instruction Byte^-1
│   AI_F_L2 = 0.3626220722104721 Instruction Byte^-1
└   AI_H_L2 = 0.0 Instruction Byte^-1
┌ Info: Arithmetic intensity (L1)
│   AI_D_L1 = 2.4958553124543723 Instruction Byte^-1
│   AI_F_L1 = 0.17520355304605265 Instruction Byte^-1
└   AI_H_L1 = 0.0 Instruction Byte^-1
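
As a sanity check on the arithmetic intensities in the log above, the sketch below back-computes the byte traffic implied at each memory level. It assumes each intensity is the FLOP count divided by the bytes moved at that level, which the `Instruction Byte^-1` units suggest but the log does not state. Under that assumption the double- and single-precision rows should imply the same byte count per level. The helper name `implied_bytes` is hypothetical, and Python is used purely for illustration.

```python
# Values copied from the log above.
d_flop = 1.00283101181e11  # double-precision count
f_flop = 7.03965312e9      # single-precision count

ai_d = {"DRAM": 10.103490546902368, "L2": 5.165718443517095, "L1": 2.4958553124543723}
ai_f = {"DRAM": 0.7092428127349074, "L2": 0.3626220722104721, "L1": 0.17520355304605265}


def implied_bytes(count: float, intensity: float) -> float:
    """Bytes moved, ASSUMING intensity = count / bytes (not confirmed by the log)."""
    return count / intensity


# Store both estimates per level so they can be compared.
traffic = {
    level: (implied_bytes(d_flop, ai_d[level]), implied_bytes(f_flop, ai_f[level]))
    for level in ai_d
}

for level, (from_d, from_f) in traffic.items():
    print(f"{level}: {from_d:.4e} B (from double), {from_f:.4e} B (from single)")
```

If the two estimates agree at every level, that is consistent with the assumed definition. Lower intensity closer to the cores (L1 below L2 below DRAM) then simply reflects more bytes moving through the caches than through DRAM.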

So we get about 172 GFLOP/s in double precision and 12 GFLOP/s in single precision. Note that this is FLOP/s averaged across all kernels;
the peak FLOP/s of individual kernels is likely higher.
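
The difference between "kernel time" and "host time" can be made concrete with the numbers from the log. The sketch below recomputes the reported rates by dividing the FLOP counts by the kernel time, then repeats the division with the `Duration` time. Treating `Duration` as the host wall-clock time for the same region is an assumption, not something the log confirms. Python is used for illustration, and `rate` is a hypothetical helper.

```python
# Values copied from the log above.
kernel_time = 0.5812019466666665  # s, from "Kernel performance"
host_time = 1.44577531            # s, from "Duration" (ASSUMED to be host wall-clock time)
d_flop = 1.00283101181e11         # double-precision count
f_flop = 7.03965312e9             # single-precision count


def rate(count: float, seconds: float) -> float:
    """Average throughput over the given interval."""
    return count / seconds


# Rates in kernel time: these should reproduce D_FLOPs and F_FLOPs from the log.
d_kernel = rate(d_flop, kernel_time)
f_kernel = rate(f_flop, kernel_time)

# Rates in host time: lower, because the interval also covers time outside kernels.
d_host = rate(d_flop, host_time)
f_host = rate(f_flop, host_time)

print(f"double: {d_kernel / 1e9:.1f} GFLOP/s (kernel) vs {d_host / 1e9:.1f} GFLOP/s (host)")
print(f"single: {f_kernel / 1e9:.1f} GFLOP/s (kernel) vs {f_host / 1e9:.1f} GFLOP/s (host)")
```

Both figures are averages over their interval, so neither says anything about the peak rate of an individual kernel.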
