Breaking: Standardise cf standards and missingval handling #695

Merged Jan 12, 2025 (43 commits)

rafaqz
Owner

rafaqz commented Jul 12, 2024

This PR standardises cf and missingval behaviour across all sources.

New keywords are:

  • cf=true: apply offset and scale if they are not zero and one.
  • maskingval=missing: the value that the missingval will be converted to, if there is one.
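
To make the two keywords concrete, here is a minimal sketch in plain Julia of the two-way conversion they describe. It assumes the usual CF packing rule (unpacked = packed * scale + offset) and an integer storage type. `decode_value` and `encode_value` are hypothetical helper names for illustration only, not functions from Rasters.jl or DiskArrays.jl.

```julia
# Hypothetical helpers for illustration; this is not the Rasters.jl implementation.

# Reading: swap the stored sentinel (`missingval`) for `maskingval`,
# otherwise apply the CF scale and offset.
function decode_value(x; scale=1.0, offset=0.0, missingval=nothing, maskingval=missing)
    if missingval !== nothing && isequal(x, missingval)
        return maskingval
    end
    return x * scale + offset
end

# Writing: the reverse direction. `isequal` is used so that `missing`
# compares equal to `missing` instead of propagating.
function encode_value(x, ::Type{T}; scale=1.0, offset=0.0,
                      missingval=nothing, maskingval=missing) where {T<:Integer}
    if missingval !== nothing && isequal(x, maskingval)
        return convert(T, missingval)
    end
    return round(T, (x - offset) / scale)
end

# Made-up example values: Int16 storage with -9999 as the sentinel.
stored = Int16[-9999, 0, 150, 2500]
opts = (scale=0.01, offset=273.15, missingval=Int16(-9999))

decoded = [decode_value(x; opts...) for x in stored]
encoded = [encode_value(x, Int16; opts...) for x in decoded]
```

In this sketch the sentinel becomes `missing` on the way in and is restored on the way out, so `encoded` matches `stored` again. Rounding in `encode_value` absorbs the small floating point error from the scaling step.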

I don't like either of these names! If you can think of anything better, please suggest something.

@felixcremer @meggart this is the new diskarrays object that does 2-way cf and missingval/maskingval conversions

https://github.com/rafaqz/Rasters.jl/compare/cf?expand=1#diff-1478a88a8bdcffcf7f556cc1865dbe18d69b5c03f75b4481e1f5889e8a6b6b73

rafaqz changed the title from "Cf" to "Standardise cf standards and missingval handling" Jul 12, 2024
rafaqz changed the title from "Standardise cf standards and missingval handling" to "Breaking: Standardise cf standards and missingval handling" Jul 15, 2024
@Rapsodia86

Rapsodia86 commented Jul 28, 2024

How about inverting it? Instead of "cf" call it "raw". If raw=false, offset and scale are applied, if raw=true, offset and scale are NOT applied?

And raw=true would be the default?

Maybe "navalue" instead of "maskingval"?

@rafaqz
Owner Author

rafaqz commented Jul 28, 2024

Currently I've got scaled=true as the default instead of cf. raw might suggest the missing value is also not transformed?

Main concern with navalue is it sounds like R, and isn't really used in julia

@Rapsodia86

Yes, you are right that with "raw" one may expect that data were not touched at all.

Indeed, "navalue" originates from my heavy R usage! And I agree NA is not a very Julia way.

Point taken:)

@rafaqz
Owner Author

rafaqz commented Jul 28, 2024

Thanks for the input though, good to have all the options to choose from

@rafaqz
Owner Author

rafaqz commented Aug 13, 2024

Coming back to this, I like how short and obvious raw=true is!

But it would mean no scaling and no masking, so it will override everything else.
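
One way to picture "override everything else" is a small keyword-resolution step. The sketch below is an assumption about how that precedence could work, using the keyword names from this thread (`raw`, `scaled`, `maskingval`). `resolve_options` is a hypothetical name and the logic is not taken from the Rasters.jl source.

```julia
# Hypothetical keyword resolution; illustrative only.
function resolve_options(; raw::Bool=false, scaled::Bool=true, maskingval=missing)
    if raw
        # `raw` wins over the other keywords: no scaling, and `nothing`
        # stands in here for "leave the stored missing value untouched".
        return (scaled=false, maskingval=nothing)
    end
    return (scaled=scaled, maskingval=maskingval)
end

default_opts = resolve_options()                     # scaling and masking both on
raw_opts = resolve_options(raw=true, scaled=true)    # `scaled=true` is ignored
```

With this shape, a user who passes `raw=true` gets untouched data regardless of what else was set, which matches the concern raised earlier that "raw" should imply the missing value is not transformed either.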

src/create.jl

"""
create([f!], [filename], template; kw...)
Collaborator

When should I use this as opposed to Raster(data, dims; ...)? What are the advantages beyond efficiency in writing to file?

Collaborator

Or is the direct constructor going to call create?

Owner Author

rafaqz commented Sep 2, 2024

Well Raster(filename) reads an existing file... create(filename) creates one.

So this is best for building something new from scratch, Raster is best for opening something, rewrapping a DimArray, etc. Otherwise you need to define the Raster in-memory then write it as separate steps.

(create is already called in a lot of places under the hood where a new file might be created, like in nearly all methods with a filename keyword. It's been here for years, just not publicly)

Collaborator

Gotcha, I guess I'm not entirely clear on what to use when creating a purely in memory dataset. Does the pure Raster(data, dims; kwargs...) constructor have the same options? Should it? That's the most intuitive option IMO to create an in memory raster...

Owner Author

Yeah it will be pretty much identical for in-memory rasters. But if you want chunks and e.g. compression options you have to do it separately in write. An in-memory raster doesn't have chunks or options like that.

Collaborator

Sweet that sounds good. Will add a suggestion for a line in the docstring to clarify.

rafaqz and others added 2 commits September 3, 2024 11:10
This fixes a strange issue I was having where Rasters.jl read the wrong values from a Kerchunk/Zarr dataset.
rafaqz changed the base branch from main to breaking January 12, 2025 15:20
rafaqz merged commit 6cbad97 into breaking Jan 12, 2025
4 checks passed
rafaqz deleted the cf branch January 12, 2025 23:28

3 participants