-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor code (and some other changes) #43
Conversation
…d making the choice of CI API a parameter
…be nore versatile and be the only new object used
… APIs are added later)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Llannelongue for your work and the detailed PR description. As this is a large set of changes I won't be able to look at everything in one go.
About the API interface:
I don't see the benefit of introducing a new plain class for the API interface, compared to the currently implemented APIInterface namedtuple. It provides the same access to the request url and response parsing functions through dot notation and a lot less code.
More problematic to me is that the API-specific logic of forming the query URL and parsing the response is coupled back to the API interface code. Somebody wishing to interface cats with a new web API would have to modify the implementation of CI_API_interface
itself, adding a new statememt if self.choice_CI_API == 'myapi.org'
and hardcoding API-specific code in there. I feel like this is a step back from the current implementation (introduced in #37) in which interfacing against a different API is only a matter of providing two functions (possibly grouping them into a new APIInterface object) without having to touch the code of cats itself at any point.
Lastly, do we really want to rename api_query.py
and api_interface.py
?
Thanks for these @tlestang
More on this below after the second comment, but regarding
This is an interesting point, and probably needs more thinking. However not sure about not having to touch the code, I see two different use cases here:
I opened issue #44 to have a space to discuss (2) further.
The motivation behind that is to clarify that the code in there is related to pulling carbon intensities, in case more APIs for other parts of the code are included later. But at the moment, this is only future-proofing so I don't mind removing the |
Following up on that, commit 8c927c7 reintroduces namedtuple for the api interfaces. |
Agreed 100%, and that's the motivation behind #37. But the separation between the API query code and the API specific bits wasn't done with a particular usage of cats in mind - its generally useful for developers adding new APIs because they don't have to touch the api query module. Turns out its also useful for users playing with cats as a library as they can actually bring they own functions. Sure, the right default API interface must be selected in the init script for now but that's about it. Could as well be specified at the command line. |
cats/optimise_starttime.py
Outdated
@@ -0,0 +1,91 @@ | |||
from datetime import datetime, timezone |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add my general comment on this module here so we can have a thread.
I think this PR adds useful missing pieces but I'm a bit uncomfortable with the fast it completely removes the timeseries_conversion
and forecast
modules. Makes more sense to me to build upon them instead of discarding them alltogether: they already contain most of the functionality (re)implemented in this module. Particularly, I'm not sure why WindowedForecast
disappeared -- it was merged 4 days ago -- see #36 . I think it would be straightforward to make it work with a simple summation of intensity values.
That being a few welcome things in this module that are missing in the current implementation:
- Detecting cases where the jon duration exceeds the maximum forecast duration.
- Allowing to compute the windowed/summed/integrated intensity over the window by just summing intervals.
- Handle the fact that the first and last interval in the time window are fraction of the default interval length (30min).
Again -- I think we'd be better off integrating the above in the current code, rather than reimplementing a whole new module from scratch. Unless of course we think that there are strong reasons to do so, but not obvious to me at this stage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding WindowedForecast
: should definitely be kept, as highlighted in the PR release note and in the code here, the goal is to keep the trapezoidal integration as an option, especially while we figure out which one is right for the UK (issue #42). But it's a rather delicate function to get the bounds right (especially first and last segments), so rather than me just shoehorn in into this updated structure on my own, it'd be good to discuss about the best way to include it. Very happy to see suggestions.
Regarding forecast
it's not removed, just renamed as I'm not sure forecast
still describes what this part of the code is doing. But it still serves the same purpose.
As of timeseries_conversion
(and most of the code in this PR), it's of course reusing existing code, sometimes grouped differently based on how usage has evolved. Which functions in particular from timeseries_conversion
have been removed completely but still needed? My understanding is that csv_loader
and cat_converter
aren't needed anymore (as per PR #41), check_duration
has just been moved to check_clean_arguments
as it's what it does and get_lowest_carbon_intensity
is now in optimise_starttime
for the same reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the goal is to keep the trapezoidal integration as an option [..]. But it's a rather delicate function to get the bounds right (especially first and last segments), so rather than me just shoehorn in into this updated structure on my own, it'd be good to discuss about the best way to include it.
Don't you think it will more effort re-integrating something that is already there (and covered by a couple of tests) into new code? As opposed to start from what is there. Sorry -- I understand the appeal of writing that module from scratch, but I'm questioning whether it will make the next few steps actually harder. Obviously that's just my own humble opinion, let's see what other people have to say about this!
Regarding forecast it's not removed, just renamed as I'm not sure forecast still describes what this part of the code is doing. But it still serves the same purpose.
Did forecast
become optimise_starttime
? It looks like a new module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll submit a commit later today that reintroduces WindowedForecast
closer to its original form so that we can hopefully get the best of both worlds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out that properly integrating the first and last time windows into the trapezoidal model was a bit harder than planned (didn't manage with the original implementation but this one is very close) but it's now done alongside a test scenario: commit
1aca7ab
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also @tlestang if you think testing the integration method on the sinus function is still needed, it can be put back in as it's only commented out for now, for now there is a test on a dummy carbon intensity forecast
I'm not sure exactly where we are with this one - does it make sense to discuss at the meeting on Wednesday. One possibility would be to try and chop the changes into smaller sets but I think some wider discussion around the architecture could be useful. My two cents is that a useful approach to thinking about this is to consider other future uses of (maybe bits of) the code base and think about how they would work. So, for cats, I imagine a user who has some big python package providing a service that runs some regular processing, and it's the processing that they want to schedule (no need to integrate with a scheduler). Or, somebody wants to post-process job scheduler logs. Some (partial) answers to the questions above:
The way the python currently interfaces with |
I'm closing this for now as we discussed it won't be merged as-in. But let's keep the branch around for reference. |
This is following discussions on #26 and during the meeting that now that we are going toward a stable version, it would be worth refactoring the code to make it a bit more future-proof.
This is most likely not final at all, and it'd be great to hear everyone's thoughts to refine it. And also to check that it still works as expected (in particular if I have inadvertently deleted something important!). And we can open dedicated issues for anything arising for discussions here.
Here is a schema of the draft new structure:
The guiding principles in refactoring were to try to:
cats.run
As a reminder, the main steps of CATS are:
An overview of some changes beyond structure
I aimed at keeping the functionalities as they were as much as possible, just trying to enhance when I could. Here are some changed:
Regarding how to find the best start time:
Now all carbon intensity values with timestamps (from the API or computed internally) are stored as a
CarbonIntensityEstimate
(defined inCI_api_interface.py
but could be defined more centrally). It is a simple adaptation of the previously existingCarbonIntensityAverageEstimate
and has properties such asvalue
,start
,end
andtimedelta
and can be sorted by CI value.Added some progress update/error messages throughout (more needed!)
How this release addresses existing open issues
end_of_forecast - duration
). How to remove this constraints probably needs to be discussed further as it's not trivial.CI_api_interface.py
(e.g. here)Open questions:
stderr
and others onstdout
? As a temporary measure, all updates/messages are sent tostdout
and only errors/warnings tostderr
, but would be great to hear some thoughts on that.at
linux scheduler with the different output, any thoughts on the best way to pass information on start time?What is still needed
Not necessarily before merging, but here is a list non-exhaustive of what still needs to be done (the idea of this PR is to discuss this new structure while polishing the details).
at
linux command