Code review and next steps #3

Open
MichaelTiemannOSC opened this issue Nov 17, 2021 · 0 comments
Labels
help wanted Extra attention is needed

Comments

@MichaelTiemannOSC
Contributor

I have finished refactoring and rewriting my initial work on the Shell/DPDHL/Unilever/AEP ingestion scripts. I think the resulting script does a credible job of the lightweight transform envisioned by an EtLT process for those four cases, plus Altria. The source code is here:

https://github.com/os-climate/osc-ingest-shell/blob/master/notebooks/osc-ingest-v3.ipynb

I've intentionally inserted a syntax error so that the script runs its transformation process but does not attempt to load the resulting transformed data into Trino. After all, the script only claims to do the Et part of EtLT, not the LT part (yet).
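For reviewers who would rather not trip over a deliberate syntax error, the same "transform but don't load" behavior could be sketched with an explicit flag. This is only an illustration, not what the notebook does: `LOAD_TO_TRINO`, `transform`, `load`, and `run` are hypothetical names, and the column-name normalization merely stands in for the real transform logic.

```python
# Hypothetical switch: leave False while only the "Et" part is under review.
LOAD_TO_TRINO = False


def transform(rows):
    """Stand-in for the lightweight transform: normalize column names."""
    return [
        {str(key).strip().lower().replace(" ", "_"): value for key, value in row.items()}
        for row in rows
    ]


def load(rows):
    """Placeholder for the Trino load step, which is out of scope in this sketch."""
    raise NotImplementedError("loading into Trino is not implemented here")


def run(rows, load_enabled=LOAD_TO_TRINO):
    """Always transform; only attempt the load when explicitly enabled."""
    transformed = transform(rows)
    if load_enabled:
        load(transformed)
    return transformed
```

Calling `run(rows)` with the flag left at its default returns the transformed rows and never touches the load step.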

What I'd like to do in the review process is to see how this script complements, or is redundant with, the rules-based table processor contributed by Allianz; what additional extraction/transformation logic it should implement; what logic should perhaps be removed; and whether we now have a clear idea of how to define a second processing step that digests a consistent input format into consistent (i.e., comparable), catalog-able, and browsable data, with proper KPIs.

One way to review this code would be to see how easy/difficult others find it to add processing support for some of the additional spreadsheets that Heather reported here: #1 (comment). That will test the documentation and comments for sure!

@idemir-ids @ChristianMeyndt @erikerlandson @caldeirav @HeatherAck @hbaltzell @JeremyGohBNP @toki8 @LeaADeleris @ludans @joriscram @MichaelClifford @oindrillac

MichaelTiemannOSC added the help wanted label Nov 17, 2021