I have finished refactoring and rewriting my initial work on the Shell/DPDHL/Unilever/AEP ingestion scripts. I think the resulting script does a credible job of the lightweight transform envisioned by an EtLT process for those four cases, plus Altria. The source code is here:
https://github.com/os-climate/osc-ingest-shell/blob/master/notebooks/osc-ingest-v3.ipynb
I've intentionally inserted a syntax error so that the script runs its transformation process but stops before attempting to load the transformed data into Trino. After all, the script only claims to do the Et part of EtLT, not the LT part (yet).
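As a possible alternative to the deliberate syntax error, the Et/LT boundary could be made explicit with a dry-run switch, so the notebook runs end to end and simply skips the load step. The sketch below is illustrative only: `light_transform`, `run_pipeline`, the column-normalization rule, and the `load` callback (standing in for whatever writes to Trino) are hypothetical and not taken from the notebook.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical record type: one spreadsheet row represented as a dict.
Row = Dict[str, object]


def light_transform(rows: List[Row]) -> List[Row]:
    """Hypothetical 'Et' step: normalize column names and trim string
    values without reshaping the data."""
    out: List[Row] = []
    for row in rows:
        clean: Row = {}
        for key, value in row.items():
            # " Company Name " -> "company_name"
            name = str(key).strip().lower().replace(" ", "_")
            clean[name] = value.strip() if isinstance(value, str) else value
        out.append(clean)
    return out


def run_pipeline(
    rows: List[Row],
    load: Optional[Callable[[List[Row]], None]] = None,
    dry_run: bool = True,
) -> List[Row]:
    """Run the light transform; call `load` (the hypothetical 'LT'
    entry point, e.g. a Trino writer) only when dry_run is False."""
    transformed = light_transform(rows)
    if not dry_run and load is not None:
        load(transformed)
    return transformed


if __name__ == "__main__":
    # Made-up sample row, for illustration only.
    sample = [{" Company Name ": " Example Co ", "Value": 1}]
    print(run_pipeline(sample))  # dry run: transformed, nothing loaded
```

With `dry_run=True` as the default, reviewers get the Et output without touching Trino, and enabling the LT part later is a one-argument change rather than removing a planted error.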
In the review process, I'd like to see how this script complements or duplicates the rules-based table processor contributed by Allianz, what additional extraction/transformation logic it should implement, what logic should perhaps be removed, and whether we now have a clear idea of how to define a second processing step that turns a consistent input format into consistent (i.e., comparable), catalog-able, and browsable data with proper KPIs.
One way to review this code would be to see how easily others can add processing support for some of the additional spreadsheets that Heather reported in #1 (comment). That will certainly test the documentation and comments!
@idemir-ids @ChristianMeyndt @erikerlandson @caldeirav @HeatherAck @hbaltzell @JeremyGohBNP @toki8 @LeaADeleris @ludans @joriscram @MichaelClifford @oindrillac