Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

discuss best structure for modules (file names, classes, function names and arguments) #15

Closed
Tracked by #2
cchwala opened this issue Sep 25, 2024 · 4 comments
Closed
Tracked by #2

Comments

@cchwala
Copy link
Member

cchwala commented Sep 25, 2024

In this issue we can collect and discuss ideas regarding the structure of our code including:

  • high-level class structure
  • function names, names of module files and where to put which function
  • low-level function argument and return values
@cchwala
Copy link
Member Author

cchwala commented Sep 25, 2024

A first high-level functionality is being added now in #13 by @eoydvin. We might not have time to iterate it a lot. Because of that I add some thoughts here and not in the PR.

Regarding high-level adjustment functions

  • we have to allow gauge data as input
  • we have to evaluate how we can do the adjustment of different time stamps in parallel using dask, most probably via dask.delayed, since this might note work nicely if all data for all time stamps is stored in the adjustment class and the __call__(timestamp) then just operates on different time stamps.

Regarding low-level adjustment functions

The low-level functions could all look like this

def some_adjustment_function(da_radar_t, da_gauges_t, da_cmls_t, parameter_1, parameter_2, return_intermediate_fields=False):
    # note that da_radar_t would be 2D, i.e. only one time stamp

    # do adjustment
   ds_all_fields['intermediate_field_1'] = field_1

   # !!! we should agree on a good name for the final adjusted field which should always be the same var name !!!
   ds_all_fields['adjusted_rainfall'] = final_adjusted_field 

   if return_intermediate_fields:
      return ds_all_fields
   else:
      ds_all_fields['adjusted_rainfall']

@cchwala
Copy link
Member Author

cchwala commented Sep 25, 2024

Some thoughts and where we have to take into account the difference between line and point geometries and where we already do that (marked by x)

processing step Point Line does not matter
radar value at sensor x x
radar value at sensor with best of 9 x
calc diff or factor x
IDW of diff or factor x
Estimate Krig parameters x ?
Kriging of diff or factor x ?
KED x ?

@cchwala
Copy link
Member Author

cchwala commented Sep 27, 2024

I am still pondering over class structures for the "high-level interface for doing the merging. Here I just put my current thoughts (and might add some updated to this comment later with even more thoughts...)


The following is very similar to what @eoydvin is currently doing in #13 but does things a bit differently:

  • not adding the full Datasets in __init__
  • trying to do a "composition approach" in adjust where the actual adjustment calculation happens

Note that his is very much pseudo-code and not tested at all in a Python interpreter. This is just for me to remember the general idea.

# this could also be an abstract base class
class AdjusterBase:
    def __init__(grid, lines, points, some_more_params):
        '''Just add the geometries for init to precalcualte stuff'''
        
        # set stuff up
        self._grid_at_point = poligrain.GridAtPoint(..., some_params) 
        self._grid_at_line = poligrain.GridAtLine(..., params)

    def __call__(grid_t, lines_t, points_t):
        '''adjust one or more time steps'''

        # check that cached pre-calculated data is sufficent, and calc missing data e.g. when new CMLs are added
        # But, not sure if I like that, updating that much of the inner state of the object...?!
        self._update_cached_data(grid_t, lines_t, points_t)    

        # do adjustment, return also intermediate results at sensors
        ds_adjusted, ds_lines_intermed, ds_points_intermed = self._adjust(grid_t, lines_t, points_t)

        return ds_adjusted, ds_lines_intermed, ds_points_intermed

    def _update_cached_data(grid_t, lines_t, points_t):
        '''Check of pre-calculated stuff e.g. from GridAtLine needs update because of new sensors'''
        # do something

    def _adjust(grid_t, lines_t, points_t):
        '''needs to be implemented in subclass for each adjustment method'''

        return ds_adjusted, ds_lines_intermed, ds_points_intermed 


class AdjusterAdditivIDW(AdjusterBase):
    def __init__(grid, lines, points, some_param_like_best_of_9, some_idw_param):
        # we might need to also adjust the pre-calculation based on e.g. best_of_9, but we might also
        # be able to do e.g. best_of_9 separately later for CMLs by shifting the intersect_weights matrix
        # up, down, left and right
        super().__init__(grid, lines, points)

       # some custom init for the method

       # interpolators
       self._interpolator_point_grid() = SomeIDW()
       self._interpolator_line_grid() = SomeOtherIDW() # simple case would be to take the mid-point, could inherit from LineGridInterpolator()???
       # or maybe have on interpolator that takes both, an interpolator for lines and for points
       self._interpolator = Interpolator(PointGridIDW(), PointLineIDW())

    def _adjust(grid_t, lines_t, points_t):
        ds_points_intermed['radar_at_sensor']  = self._grid_at_point(...)
        ds_lines_intermed['radar_at_sensor']  = self._grid_at_line(...)

       ds_point_intermed['diff_at_sensor'] = ...

        # interpolate adjustment field
        ds_adjusted = self_interpolator(
            grid_t.rainfall_amount, 
            ds_lines_intermed.diff_at_sensor, 
            ds_points_intermed.diff_at_sensor,
        )

       return ds_adjusted, ds_lines_intermed, ds_points_intermed
  • UPDATE1:
    • Moved poligrain.GridAtPoint and GridAtLine to init in base class because we always need to do this calculations.
    • improved code in AdjusterAdditivIDW._adjust
  • UPDATE2:
    • added more explanation via comment and pseudo-code for AdjusterBase._update_cached_data

@cchwala
Copy link
Member Author

cchwala commented Nov 7, 2024

This issue could still be discussed further at a later stage, but after #21 we now have a first version of our high-level API that we want to keep.

I am closing this issue now, but we can reopen once there is the need (and the time) to iterate high-level functions, naming, etc. once more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Development

No branches or pull requests

1 participant