local processing: support UDF's #158
Comments
A first implementation of the run_udf process can be found here: https://github.com/VincentVerelst/openeo-processes-dask/tree/local-udf
Hi @VincentVerelst, can you also provide example code with a sample UDF showing how to call this?
Here is sample code to use it:

import logging

logging.basicConfig(level=logging.INFO)

import openeo
from openeo.local import LocalConnection

local_conn = LocalConnection("./")

url = "https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a"
spatial_extent = {"east": 11.40, "north": 46.52, "south": 46.46, "west": 11.25}
temporal_extent = ["2022-06-01", "2022-06-10"]
bands = ["red", "nir"]
properties = {"eo:cloud_cover": dict(lt=80)}

s2_datacube = local_conn.load_stac(
    url=url,
    spatial_extent=spatial_extent,
    temporal_extent=temporal_extent,
    bands=bands,
    properties=properties,
)

# Compute NDVI from the red and near-infrared bands, then take the median over time.
b04 = s2_datacube.band("red")
b08 = s2_datacube.band("nir")
ndvi = (b08 - b04) / (b08 + b04)
ndvi_median = ndvi.reduce_dimension(dimension="time", reducer="median")

# Build a UDF object from an inline string with Python source code.
udf = openeo.UDF("""
from openeo.udf import XarrayDataCube

def apply_datacube(cube: XarrayDataCube, context: dict) -> XarrayDataCube:
    array = cube.get_array()
    print(array.shape)
    array.values = 0.0001 * array.values
    return cube
""")

# Or load the UDF code from a separate file.
# udf = openeo.UDF.from_file("udf-code.py")

# Apply the UDF to a cube.
rescaled_cube = ndvi_median.apply(process=udf)
print(rescaled_cube.execute())
@VincentVerelst I noticed that you're basically calling the implementation available in the openeo-python-client. A common source of confusion is that someone debugs locally and then gets a different result in the cloud, which can be caused by different chunk sizes. Currently, if I check the shape of the array that the UDF code manipulates, everything is loaded at once without respecting the chunking. This needs to be addressed, so that chunks are used if present.
Reviving this one, as we need it for the openEO platform project. The only problem, perhaps, is that different implementations are to some extent allowed to use different types of chunking. For local debugging, it could be nice to have a way to enable different chunking modes, making it possible to check whether the UDF works in all cases. I would, however, not address that in this initial implementation.
@jdries I agree that a fixed chunk size would be enough for the first implementation. What's the default at VITO? On our side, @jzvolensky is the one working on it.
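To make the chunking concern from the comments above concrete, here is a minimal, dependency-free sketch (plain Python lists rather than xarray or dask, so it is not the openeo-processes-dask implementation). The helper names `apply_chunked` and `is_chunk_invariant` are hypothetical. It runs a UDF-like function under several chunk sizes and checks whether the result stays the same:

```python
from statistics import mean


def split_into_chunks(values, chunk_size):
    """Split a flat list into consecutive chunks of at most chunk_size items."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def apply_chunked(udf, values, chunk_size):
    """Call the UDF once per chunk and concatenate the per-chunk results."""
    result = []
    for chunk in split_into_chunks(values, chunk_size):
        result.extend(udf(chunk))
    return result


def rescale(chunk):
    # Element-wise: each output depends only on its own input value.
    return [0.0001 * v for v in chunk]


def subtract_chunk_mean(chunk):
    # Chunk-dependent: the output depends on which values share a chunk.
    chunk_mean = mean(chunk)
    return [v - chunk_mean for v in chunk]


def is_chunk_invariant(udf, values, chunk_sizes):
    """Return True if the UDF gives the same result for every chunk size."""
    reference = apply_chunked(udf, values, len(values))  # everything in one chunk
    return all(apply_chunked(udf, values, size) == reference for size in chunk_sizes)


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(is_chunk_invariant(rescale, data, [2, 4, 8]))              # True
print(is_chunk_invariant(subtract_chunk_mean, data, [2, 4, 8]))  # False
```

An element-wise UDF such as the rescaling in the sample code is unaffected by chunking, while anything that aggregates within the chunk it receives can behave differently locally and on a backend.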
Add UDF support, with the main goal of making it work for 'local' processing in the Python client.
The Python client already has the basics in place to run a UDF on a chunk of data:
https://github.com/Open-EO/openeo-python-client/blob/master/openeo/udf/run_code.py#L143
But it needs to be connected to the local processing implementation, which should be fairly similar to running one of the predefined functions.
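As a rough illustration of what "similar to running one of the predefined functions" could mean, the sketch below registers a `run_udf` callable next to a predefined process in one lookup table. Everything here is an assumption for illustration: `PROCESS_REGISTRY`, `execute_node`, and the list-based `apply_datacube` signature are hypothetical and are not the API of the Python client or of openeo-processes-dask.

```python
def run_udf(data, udf, context=None):
    """Execute UDF source code and call its apply_datacube function on data."""
    namespace = {}
    exec(udf, namespace)  # trusted code only: exec runs arbitrary Python
    func = namespace.get("apply_datacube")
    if not callable(func):
        raise ValueError("UDF code must define apply_datacube(cube, context)")
    return func(data, context or {})


def absolute(data):
    """Stand-in for a predefined process."""
    return [abs(v) for v in data]


# Hypothetical registry: predefined processes and run_udf are looked up the same way.
PROCESS_REGISTRY = {
    "absolute": absolute,
    "run_udf": run_udf,
}


def execute_node(process_id, arguments):
    """Look up a process by id and call it with the node's arguments."""
    try:
        process = PROCESS_REGISTRY[process_id]
    except KeyError:
        raise ValueError(f"Unknown process: {process_id}") from None
    return process(**arguments)


UDF_CODE = """
def apply_datacube(cube, context):
    factor = context.get("factor", 1.0)
    return [factor * v for v in cube]
"""

print(execute_node("absolute", {"data": [-1.0, 2.0]}))
print(execute_node("run_udf", {"data": [1.0, 2.0], "udf": UDF_CODE, "context": {"factor": 2.0}}))
```

The point of the design is that the graph executor does not need a special case for UDFs: `run_udf` is just another entry whose arguments happen to include source code.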
It remains to be seen which parent processes we want to support: apply_neighborhood is certainly very popular, but apply_dimension is also relevant.
The motivation for this is that UDFs are used in quite a few user workflows, so while users often ask for local debugging, we can only point them to this feature once it supports UDFs.
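The two parent processes hand data to the UDF differently, which the following simplified 1-D sketch tries to show. It uses plain lists and hypothetical helper names, and it clips the overlap at the array edges, which is an assumption of this sketch and not necessarily what a given backend does.

```python
def apply_neighborhood_1d(values, process, size, overlap):
    """Apply process to windows of `size` items, each extended by `overlap`
    items on both sides; only the core part of each result is kept."""
    result = []
    for start in range(0, len(values), size):
        stop = min(start + size, len(values))
        lo = max(start - overlap, 0)            # clip the overlap at the edges
        hi = min(stop + overlap, len(values))
        processed = process(values[lo:hi])
        # Drop the overlap again so every input item yields exactly one output.
        core_start = start - lo
        result.extend(processed[core_start:core_start + (stop - start)])
    return result


def apply_dimension_rows(rows, process):
    """Apply process to each complete 1-D series (here: each row)."""
    return [process(row) for row in rows]


def moving_average(window):
    """3-point moving average; edge items average over the neighbours that exist."""
    out = []
    for i in range(len(window)):
        neighbours = window[max(i - 1, 0):i + 2]
        out.append(sum(neighbours) / len(neighbours))
    return out


data = [0.0, 3.0, 6.0, 9.0, 12.0, 15.0]
whole = moving_average(data)

# With enough overlap, the windowed result matches processing the full series.
print(apply_neighborhood_1d(data, moving_average, size=2, overlap=1) == whole)  # True
# Without overlap, window borders change the result.
print(apply_neighborhood_1d(data, moving_average, size=2, overlap=0) == whole)  # False
# apply_dimension-style: the UDF always sees the full series along one dimension.
print(apply_dimension_rows([data, data], moving_average) == [whole, whole])     # True
```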