feat(upload): improve SINAN upload; add helpers, validations & views to upload large files #732

luabida · 2024-12-04T22:01:24Z

No description provided.


luabida · 2025-01-06T14:36:28Z

AlertaDengue/upload/tasks.py

-        logger.info("Converting Parquet file into chunks")
+    # df_chunk = df_chunk.dropna(
+    #     subset=SINANUpload.REQUIRED_COLS, how="any"
+    # ) # it usually drops the entire chunk


I don't know if it's supposed to happen this way, but this dropna usually drops the entire chunk; only a few rows are free of None values in the required columns.

Edit: found the issue. The dropna is correct, since required_cols can't contain None values; I was incorrectly replacing a string with None.
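A minimal pandas sketch of why this happened: `dropna(subset=..., how="any")` drops a row as soon as any required column is NA, so wrongly turning a string into `None` in a required column removes every affected row. The column names and values below are hypothetical stand-ins for `SINANUpload.REQUIRED_COLS`, not the real list.

```python
import pandas as pd

# Hypothetical subset of required columns (the real list is SINANUpload.REQUIRED_COLS).
REQUIRED_COLS = ["ID_AGRAVO", "DT_NOTIFIC", "ID_MUNICIP"]

chunk = pd.DataFrame(
    {
        "ID_AGRAVO": ["A90", "A90", "A90"],
        "DT_NOTIFIC": ["2024-01-02", "2024-01-03", ""],
        "ID_MUNICIP": ["330455", "330455", "330455"],
    }
)

# how="any": a row is dropped if ANY of the subset columns is NA.
# An empty string is not NA, so all three rows survive.
clean = chunk.dropna(subset=REQUIRED_COLS, how="any")
print(len(clean))  # 3

# Bug pattern: a string in a required column is wrongly replaced by None,
# which pandas treats as NA, so every affected row is dropped.
broken = chunk.copy()
broken.loc[broken["ID_AGRAVO"] == "A90", "ID_AGRAVO"] = None
dropped = broken.dropna(subset=REQUIRED_COLS, how="any")
print(len(dropped))  # 0
```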

Is this DROPNA necessary for the conversion to Parquet? If not, we should retain all rows, regardless of whether they contain NAs. chunk_parquet_file should not make decisions about data quality; it should already receive sanitized data.

I thought that the Parquet file would contain the raw data (the DBF converted to Parquet, for instance) and that the rows inserted from that file would be tracked in a separate table.

You are right. The sanitization should be done as a separate step.
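A sketch of that separation, assuming pandas DataFrames: the chunking step only slices rows and never filters, while a separate step applies the `dropna` where data quality is actually decided. `iter_chunks` and `sanitize` are hypothetical names for illustration; the PR's own chunker is `chunk_parquet_file`.

```python
from typing import Iterator

import pandas as pd


def iter_chunks(df: pd.DataFrame, chunk_size: int) -> Iterator[pd.DataFrame]:
    """Yield consecutive row slices of df unchanged: no dropna, no filtering."""
    for start in range(0, len(df), chunk_size):
        yield df.iloc[start : start + chunk_size]


def sanitize(chunk: pd.DataFrame, required_cols: list[str]) -> pd.DataFrame:
    """Separate data-quality step, applied only before inserting into the database."""
    return chunk.dropna(subset=required_cols, how="any")


raw = pd.DataFrame({"ID_MUNICIP": ["330455", None, "330455", "355030", None]})
chunks = list(iter_chunks(raw, chunk_size=2))
kept = sum(len(c) for c in chunks)  # raw layer: every row is kept
clean = sum(len(sanitize(c, ["ID_MUNICIP"])) for c in chunks)
print(len(chunks), kept, clean)  # 3 5 3
```

Keeping the raw chunks untouched means the same Parquet output can be re-sanitized later with different rules without going back to the DBF.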


luabida · 2025-01-07T20:03:24Z

@fccoelho I'll merge this PR as it is now so the SINAN upload can be used in production, but it's still missing the file overview (merely visual) and the task to convert the DBF to Parquet (until we resolve how the data should be included in the Parquet file). Can you review it again, please?

fccoelho · 2025-01-07T20:13:30Z

@luabida I think the Parquet file should have exactly the same content as the DBF (as you mentioned above), so that we don't need to keep the DBFs.
But we should have a separate diagnostic step that can report problems in the DBF before it is inserted into the production database: simple things like checking that it has all the columns we need, counting the number of NAs per column, etc.
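A possible shape for that diagnostic step, sketched with pandas. `diagnose` and `DiagnosticReport` are hypothetical names, and the report only covers the two checks mentioned above: missing required columns and NA counts per column. It reads the data without modifying or filtering it.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class DiagnosticReport:
    n_rows: int
    missing_columns: list[str]  # required columns absent from the file
    na_counts: dict[str, int]  # number of NAs in each column present


def diagnose(df: pd.DataFrame, required_cols: list[str]) -> DiagnosticReport:
    """Report problems without modifying or filtering the data."""
    missing = [col for col in required_cols if col not in df.columns]
    na_counts = {str(col): int(n) for col, n in df.isna().sum().items()}
    return DiagnosticReport(len(df), missing, na_counts)


df = pd.DataFrame(
    {"DT_NOTIFIC": ["2024-01-02", None], "ID_MUNICIP": ["330455", "330455"]}
)
report = diagnose(df, ["DT_NOTIFIC", "ID_MUNICIP", "ID_AGRAVO"])
print(report.missing_columns)  # ['ID_AGRAVO']
print(report.na_counts)  # {'DT_NOTIFIC': 1, 'ID_MUNICIP': 0}
```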

fccoelho

Looks Good.

luabida force-pushed the chunked-upload-sinan branch from 992d5b5 to 709841f December 6, 2024 13:11

fccoelho approved these changes Dec 6, 2024


luabida force-pushed the chunked-upload-sinan branch from 709841f to 7c6f032 December 6, 2024 17:52

luabida added 2 commits December 9, 2024 16:43

feat(upload): improve SINAN upload; add helpers, validations & views to upload big files

0be8819

remove leftovers from upload app

97859f9

luabida force-pushed the chunked-upload-sinan branch from 7c6f032 to 97859f9 December 9, 2024 19:45

luabida added 6 commits December 10, 2024 18:35

improve sinan/upload index

d3cb215

remove export_date, municipio on SINANUpload

b117a48

disassemblying fileupload.js -> chunked upload

4e0cfc6

finally fix upload_chunked && include DELETE method on chunked upload

ffd052a
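A rough sketch of the server-side bookkeeping a chunked upload needs, including a counterpart to the DELETE method mentioned in this commit. The class name, the offset check and the SHA-256 verification are assumptions for illustration, not the PR's `upload_chunked` implementation.

```python
import hashlib
import os
import tempfile


class ChunkedUploadSketch:
    """Hypothetical server-side state for a single chunked upload."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.offset = 0  # bytes received so far
        open(path, "wb").close()  # start from an empty file

    def append(self, chunk: bytes, start: int) -> None:
        # Accept only the chunk that continues where the previous one ended.
        if start != self.offset:
            raise ValueError(f"expected offset {self.offset}, got {start}")
        with open(self.path, "ab") as fh:
            fh.write(chunk)
        self.offset += len(chunk)

    def complete(self, sha256_hex: str) -> bool:
        # Compare the client-side checksum with the assembled file.
        digest = hashlib.sha256()
        with open(self.path, "rb") as fh:
            for block in iter(lambda: fh.read(1 << 20), b""):
                digest.update(block)
        return digest.hexdigest() == sha256_hex

    def delete(self) -> None:
        # What a DELETE request would do: discard the partial file.
        if os.path.exists(self.path):
            os.remove(self.path)


data = b"x" * 2500
upload = ChunkedUploadSketch(os.path.join(tempfile.mkdtemp(), "upload.part"))
for start in range(0, len(data), 1000):
    upload.append(data[start : start + 1000], start)
ok = upload.complete(hashlib.sha256(data).hexdigest())
print(upload.offset, ok)  # 2500 True
upload.delete()
```

Tracking the expected offset lets the server reject duplicated or out-of-order chunks instead of silently corrupting a large DBF.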

include upload card on upload dashboard template

9485075

Enable replication on upload cards

7d9cc83

luabida force-pushed the chunked-upload-sinan branch from 512d6ab to 7d9cc83 December 20, 2024 21:16

luabida added 4 commits December 23, 2024 10:45

Include on error message

8714c64

Include submit button

1871585

include signals back, exclude file on delete; improve form check for file uploads

caadb63

replace SINANUpload.file by SINANChunkedUpload

fde6413

luabida force-pushed the chunked-upload-sinan branch from 19b3a2e to fde6413 December 27, 2024 15:10

create SINANUploadLogStatus

8b37888

luabida force-pushed the chunked-upload-sinan branch from 0afd8a4 to 8b37888 December 30, 2024 14:41

luabida added 5 commits December 30, 2024 17:38

Change chunked_upload path to /DBF_SINAN/; create tasks

c4ece71

Implement insert_to_db tasks on SINANUpload

3a61db1
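One way an insert task can tell new rows from updated ones, sketched with Python's built-in `sqlite3`. The production database is not SQLite, and the `notif` table, its columns and the `upsert_chunk` helper are hypothetical: the sketch looks up which ids already exist, then upserts the chunk with `ON CONFLICT ... DO UPDATE` (SQLite 3.24 or later).

```python
import sqlite3


def upsert_chunk(
    conn: sqlite3.Connection, rows: list[tuple[int, str]]
) -> tuple[list[int], list[int]]:
    """Upsert (id, value) rows; return (inserted_ids, updated_ids).

    Assumes ids are unique within a chunk.
    """
    if not rows:
        return [], []
    ids = [row_id for row_id, _ in rows]
    placeholders = ",".join("?" * len(ids))
    # Ids already in the table will be updated; the rest will be inserted.
    existing = {
        r[0]
        for r in conn.execute(
            f"SELECT id FROM notif WHERE id IN ({placeholders})", ids
        )
    }
    conn.executemany(
        "INSERT INTO notif (id, value) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET value = excluded.value",
        rows,
    )
    conn.commit()
    inserted = [i for i in ids if i not in existing]
    updated = [i for i in ids if i in existing]
    return inserted, updated


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notif (id INTEGER PRIMARY KEY, value TEXT)")
print(upsert_chunk(conn, [(1, "a"), (2, "b")]))  # ([1, 2], [])
print(upsert_chunk(conn, [(2, "B"), (3, "c")]))  # ([3], [2])
```

For large chunks the `IN` lookup would have to be split to stay under SQLite's limit on bound parameters.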

Start implementing upload status box

10cc252

Implement status.html with error message

2883a13

implement re-rendering of each status item on the dashboard

0e78261

fccoelho added the enhancement label Jan 3, 2025

improvements and minor fixes

29c3296

luabida force-pushed the chunked-upload-sinan branch from eda79e3 to 29c3296 January 4, 2025 03:53

luabida commented Jan 6, 2025


luabida force-pushed the chunked-upload-sinan branch from 74059a9 to a7119c6 January 6, 2025 16:42

REMOVE dropa() FROM CHUNK*; replace SINANUploadHistory by integer fields due to huge time consuming on insert

aaebf2d

luabida force-pushed the chunked-upload-sinan branch from a7119c6 to aaebf2d January 6, 2025 21:55

luabida added 3 commits January 6, 2025 21:15

replace db transactions by log regex (celery is not saving models attrs)

461363c
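A minimal sketch of recovering counts from the task log instead of from model attributes. The log line format (`inserted: N` / `updated: N`) and the `totals_from_log` helper are assumptions, since the PR's actual log format isn't shown here.

```python
import re

# Hypothetical log line format; the real format used by the Celery task may differ.
COUNT_RE = re.compile(r"\b(?P<kind>inserted|updated): (?P<count>\d+)")


def totals_from_log(log_text: str) -> dict[str, int]:
    """Sum the per-chunk counts a Celery task wrote to its log."""
    totals = {"inserted": 0, "updated": 0}
    for match in COUNT_RE.finditer(log_text):
        totals[match.group("kind")] += int(match.group("count"))
    return totals


log_text = """INFO chunk 1 inserted: 1000
INFO chunk 1 updated: 12
INFO chunk 2 inserted: 480
"""
print(totals_from_log(log_text))  # {'inserted': 1480, 'updated': 12}
```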

pickle inserts and updates ids lists (saving it to models takes too long)

3910b0f

return len of pickled ids instead of loading it to memory

be0e5ec

luabida force-pushed the chunked-upload-sinan branch from 2469f7e to 9a1ba83 January 7, 2025 14:15

Include progress on log system

871f42b

luabida force-pushed the chunked-upload-sinan branch 2 times, most recently from f606721 to 6a394dd January 7, 2025 14:58

transform inserts() & updates() into generators so they don't load the entire array into memory

45648f9
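The pickled id lists, counting them without loading, and `inserts()`/`updates()` as generators can be sketched together with the standard library: each batch of ids is appended as its own pickle record, so readers can stream the file batch by batch. The file layout and the helper names (`append_ids`, `iter_batches`, `iter_ids`, `count_ids`) are hypothetical.

```python
import os
import pickle
import tempfile
from typing import Iterator


def append_ids(path: str, ids: list[int]) -> None:
    """Append one batch of ids as its own pickle record (no rewrite of old data)."""
    with open(path, "ab") as fh:
        pickle.dump(ids, fh, protocol=pickle.HIGHEST_PROTOCOL)


def iter_batches(path: str) -> Iterator[list[int]]:
    """Yield the stored batches one by one; only one batch is in memory at a time."""
    with open(path, "rb") as fh:
        while True:
            try:
                yield pickle.load(fh)
            except EOFError:  # no more records
                return


def iter_ids(path: str) -> Iterator[int]:
    """Generator over individual ids, in the order they were appended."""
    for batch in iter_batches(path):
        yield from batch


def count_ids(path: str) -> int:
    """Count ids batch by batch instead of loading the whole list."""
    return sum(len(batch) for batch in iter_batches(path))


path = os.path.join(tempfile.mkdtemp(), "inserts.pkl")
append_ids(path, [1, 2, 3])
append_ids(path, [10, 11])
print(count_ids(path), list(iter_ids(path)))  # 5 [1, 2, 3, 10, 11]
```

Since `pickle.load` can execute arbitrary code, these files should only be read back by the application that wrote them.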

luabida force-pushed the chunked-upload-sinan branch from 6a394dd to 45648f9 January 7, 2025 17:24

populate info-box on upload success

f337d7f

luabida marked this pull request as ready for review January 7, 2025 19:58

luabida requested a review from fccoelho January 7, 2025 20:03

include overview page

9596c41

luabida force-pushed the chunked-upload-sinan branch from 4e74acc to 9596c41 January 7, 2025 20:15

fccoelho approved these changes Jan 7, 2025


luabida merged commit 623cf21 into AlertaDengue:main Jan 7, 2025
0 of 4 checks passed

luabida deleted the chunked-upload-sinan branch January 7, 2025 20:20

