Dask large graph warning #921

Open
keflavich opened this issue Oct 9, 2024 · 6 comments

Comments

@keflavich
Contributor

Dask has started giving me these warnings:

/blue/adamginsburg/adamginsburg/miniconda3/envs/python312/lib/python3.12/site-packages/dask/base.py:1539: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results.
  warnings.warn(
/blue/adamginsburg/adamginsburg/miniconda3/envs/python312/lib/python3.12/site-packages/distributed/client.py:3362: UserWarning: Sending large graph of size 47.13 MiB.
This may cause some slowdown.
Consider loading the data with Dask directly
 or using futures or delayed objects to embed the data into the graph without repetition.
See also https://docs.dask.org/en/stable/best-practices.html#load-data-with-dask for more information.
  warnings.warn(
/blue/adamginsburg/adamginsburg/miniconda3/envs/python312/lib/python3.12/site-packages/distributed/client.py:3362: UserWarning: Sending large graph of size 126.51 MiB.
This may cause some slowdown.

The associated graph is:

>>> print("Dask graph:\n", cube._data.max().__dask_graph__(), flush=True)
Dask graph:
 HighLevelGraph with 8 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x150f9ce61c70>
 0. 6977221d-6a75-4ab8-aed6-75a692e17ad4
 1. chunk_max-c293526886e109dbc1ba2e3d6e94fd78
 2. chunk_max-partial-9b4ae50165aa3ddaa8891d2cab4066a4
 3. chunk_max-partial-b4bbe00a6376ca780c5337c38c199e47
 4. chunk_max-partial-098e6a0a6325d9963750709ed6a35a79
 5. chunk_max-partial-f2ff242a5c6da2f66470609a428891ea
 6. chunk_max-partial-73d557f29fcdc9eabfd781ef9c432f25
 7. max-aggregate-680d8b8300e162151ed6176d3c2726bb

Questions:

  • How can we avoid this warning?
  • Can/should we silence it?
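
On the second question, one option is a standard-library warning filter. This is a minimal sketch that assumes the message keeps starting with "Sending large graph of size", as in the log above; the helper name is hypothetical. It only hides the message and does nothing about the size of the graph being sent.

```python
import warnings


def silence_large_graph_warning():
    """Ignore the 'Sending large graph' UserWarning (hypothetical helper)."""
    # Assumption: the warning text begins with this prefix, as in the log above.
    # The `message` argument is a regex matched against the start of the text.
    warnings.filterwarnings(
        "ignore",
        message=r"Sending large graph of size",
        category=UserWarning,
    )
```

Calling `silence_large_graph_warning()` once, before the first `compute()`, should be enough for the current process. Other `UserWarning`s are left untouched.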
@e-koch
Contributor

e-koch commented Oct 9, 2024

Where are those -partial graphs getting generated?

@keflavich
Contributor Author

Specifically, this line:
https://github.com/ACES-CMZ/reduction_ACES/blob/a46e180b703b223e5dd62a1d4ff50e5795fdfd6c/aces/analysis/giantcube_cuts.py#L23
Everything else is just standard spectral-cube data loading.

@keflavich
Contributor Author

I tried rechunking, and that didn't solve the problem:

Rechunked
DaskSpectralCube with shape=(350, 4080, 11120) and unit=Jy / beam and chunk size (350, 200, 200):
 n_x:  11120  type_x: GLON-TAN  unit_x: deg    range:     0.909178 deg:  359.364966 deg
 n_y:   4080  type_y: GLAT-TAN  unit_y: deg    range:    -0.311997 deg:    0.254476 deg
 n_s:    350  type_s: VRAD      unit_s: km / s  range:     -146.953 km / s:     147.798 km / s
Dask graph:
 HighLevelGraph with 9 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x145f1638c9e0>
 0. 5744e66f-a4b3-444f-832b-b2ae12657309
 1. rechunk-merge-db4a6e74f53bd62f6a6b8b1c8817a488
 2. chunk_max-85ceef86ff8d9df5784eceac4ed6a6cd
 3. chunk_max-partial-7f09f44688721f7f7643a821e6d100b5
 4. chunk_max-partial-633428dfbe2f9b5cb24dc2c75a9b1d4a
 5. chunk_max-partial-0b42dfbd8ef559b3b6f72917920388cf
 6. chunk_max-partial-e960640a0197802e76bfd8c078b0459c
 7. chunk_max-partial-c51f3ba01fe06d52273b4035c44d6ceb
 8. max-aggregate-db525d0d4d6361435d2cdbeb84ce14e6

mom0.  dt=0.12298011779785156
/blue/adamginsburg/adamginsburg/miniconda3/envs/python312/lib/python3.12/site-packages/dask/base.py:1539: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results.
  warnings.warn(
/blue/adamginsburg/adamginsburg/miniconda3/envs/python312/lib/python3.12/site-packages/distributed/client.py:3362: UserWarning: Sending large graph of size 173.64 MiB.
This may cause some slowdown.
Consider loading the data with Dask directly
 or using futures or delayed objects to embed the data into the graph without repetition.
See also https://docs.dask.org/en/stable/best-practices.html#load-data-with-dask for more information.

This is a gigantic FITS cube, so the data are stored on disk as flat layers, one image plane after another. Maybe that layout is forcing the large graph?
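
To put rough numbers on the on-disk layout: a FITS cube read into NumPy is C-ordered, with the last axis varying fastest, so a chunk that does not span whole image planes is assembled from many short contiguous reads. The helper below is a hypothetical back-of-the-envelope function, not part of spectral-cube or Dask, and it only considers an interior (full-size) chunk. It describes the read pattern rather than graph size, so it may or may not be related to the warning.

```python
from math import prod


def contiguous_runs(shape, chunk):
    """Return (n_runs, run_length) for one interior chunk of a C-ordered array.

    n_runs is the number of separate contiguous stretches on disk that make
    up the chunk; run_length is the number of elements in each stretch.
    """
    run_length = 1
    axis = len(shape) - 1
    # Walk from the fastest-varying axis; axes the chunk spans completely
    # merge into one longer contiguous run.
    while axis >= 0:
        run_length *= chunk[axis]
        if chunk[axis] != shape[axis]:
            break
        axis -= 1
    return prod(chunk) // run_length, run_length


cube_shape = (350, 4080, 11120)  # shape reported in the thread
print(contiguous_runs(cube_shape, (350, 200, 200)))   # chunking used above
print(contiguous_runs(cube_shape, (1, 4080, 11120)))  # one full image plane
```

Under these assumptions, a (350, 200, 200) chunk is made of 350 × 200 = 70,000 runs of 200 elements each, while a chunk covering one whole image plane is a single run.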

@astrofrog
Member

I started seeing these kinds of warnings with reproject too. The graphs aren't necessarily complex, though; it could just be that a data slice is being sent in binary as part of the graph. I'm not sure how to avoid this.
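
A toy illustration of that hypothesis, using only `pickle` and plain dicts: it is not Dask's actual serialization, and the task names, the 200 × 200 slice size and the `cube.fits` path are all made up. It only shows how quickly a graph grows when each task carries raw bytes instead of a description of where to read them.

```python
import pickle

SLICE_BYTES = 200 * 200 * 4  # one hypothetical 200 x 200 float32 slice


def inline_graph(n_tasks):
    """Toy graph in which every task embeds its own raw data bytes."""
    return {
        f"chunk-{i}": ("max", bytes([i % 256]) * SLICE_BYTES)
        for i in range(n_tasks)
    }


def lazy_graph(n_tasks):
    """Toy graph in which every task only records what to read, and where."""
    # "cube.fits" is a placeholder path, not a file from this issue.
    return {f"chunk-{i}": ("max", ("read", "cube.fits", i)) for i in range(n_tasks)}


inline_size = len(pickle.dumps(inline_graph(100)))
lazy_size = len(pickle.dumps(lazy_graph(100)))
print(f"inline: {inline_size / 2**20:.2f} MiB, lazy: {lazy_size / 2**10:.2f} KiB")
```

If something like the first form is what is being sent, the size in the warning would scale with the amount of embedded data rather than with the number of tasks.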

@keflavich
Contributor Author

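One good sign is that using bigger chunks does reduce the number of partial operations (five chunk_max-partial layers with (255, 255, 255) chunks, three after rechunking to (300, 1000, 1000)):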

Dask client number of workers: 1
DaskSpectralCube with shape=(300, 4080, 11120) and unit=Jy / beam and chunk size (255, 255, 255):
 n_x:  11120  type_x: GLON-TAN  unit_x: deg    range:     0.909178 deg:  359.364966 deg
 n_y:   4080  type_y: GLAT-TAN  unit_y: deg    range:    -0.311997 deg:    0.254476 deg
 n_s:    300  type_s: VRAD      unit_s: km / s  range:     -221.189 km / s:     222.674 km / s
Dask graph:
 HighLevelGraph with 8 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x149add0a6c00>
 0. ea43d47d-ab76-4786-9ed3-049e52076e81
 1. chunk_max-5de7d249bf54deff61ac7995a5277344
 2. chunk_max-partial-b240a90a4f7cdee879affd0289a964c8
 3. chunk_max-partial-66d97cf593469547b0533bf50babfbf1
 4. chunk_max-partial-2382c6b9397c9760b8992bbddfc3a257
 5. chunk_max-partial-f6a5157013c6b9151be571910896ec8a
 6. chunk_max-partial-01c3ef2d806b4549a6850ec4da7adfa0
 7. max-aggregate-77ff3583db01b576400d5b794cd328cd

Rechunked
DaskSpectralCube with shape=(300, 4080, 11120) and unit=Jy / beam and chunk size (300, 1000, 1000):
 n_x:  11120  type_x: GLON-TAN  unit_x: deg    range:     0.909178 deg:  359.364966 deg
 n_y:   4080  type_y: GLAT-TAN  unit_y: deg    range:    -0.311997 deg:    0.254476 deg
 n_s:    300  type_s: VRAD      unit_s: km / s  range:     -221.189 km / s:     222.674 km / s
Dask graph:
 HighLevelGraph with 7 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x149add119490>
 0. ea43d47d-ab76-4786-9ed3-049e52076e81
 1. rechunk-merge-575b61836efc57b192f77658004f907f
 2. chunk_max-0c11ae1ececeeb643ba57d27dd18ecab
 3. chunk_max-partial-cdfe2ffa0dc31776e247150b41d4ba24
 4. chunk_max-partial-6feefd99e51d090685b8eb38f157059f
 5. chunk_max-partial-d3db70dfbb53e172394396ea02546c64
 6. max-aggregate-cf9eda41d10c8faf3794c7f7d5766264
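
The layer counts above are consistent with a tree reduction whose depth depends on the number of chunks along the most finely chunked axis. The sketch below is an estimate, not Dask's code: it assumes the default `split_every` of 4, shared across the reduced axes so that each axis is combined two blocks at a time, and the function name is hypothetical.

```python
import math


def n_partial_layers(shape, chunks, split_every=4):
    """Estimate the number of '-partial' layers for a reduction over all axes.

    Assumption: split_every is shared across the reduced axes, with a floor
    of 2 blocks per axis per step, and one partial layer is added for each
    tree level beyond the final aggregate.
    """
    per_axis = max(int(split_every ** (1 / len(shape))), 2)
    depth = 1
    for size, chunk in zip(shape, chunks):
        n_blocks = math.ceil(size / chunk)
        depth = max(depth, max(1, math.ceil(math.log(n_blocks, per_axis))))
    return depth - 1


print(n_partial_layers((300, 4080, 11120), (255, 255, 255)))    # 5
print(n_partial_layers((300, 4080, 11120), (300, 1000, 1000)))  # 3
print(n_partial_layers((350, 4080, 11120), (350, 200, 200)))    # 5
```

These estimates match the five, three and five chunk_max-partial layers in the graphs printed in this thread. Under the same assumptions, fewer and larger chunks along the longest axis are what shorten the tree.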
