Filter rootnames and filenames before calling the filename_parser in archive_database_update #1657

bhilbert4 · 2024-10-30T18:17:24Z

The motivation for this PR is to reduce the size of the log files from archive_database_update. After #1651, the filename_parser creates a log message for each file that it cannot parse. But the relevant lines are also inside a try/except, so each time a non-parsable file is encountered, 4-5 lines of log messages are created. In WFSS observations, there can be thousands of filenames that can't be parsed. On top of that, the filename_parser is actually called twice on all the files as part of archive_database_update. This was resulting in log files that were ~350MB each, once per hour.

This PR filters out those filenames before calling the filename_parser, so those logging lines should no longer be generated.
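
For illustration only, here is a minimal sketch of the pre-filtering idea. It is not the code in this PR: the function name keep_parseable, the pattern EXAMPLE_ROOTNAME, and the sample names are hypothetical, and the regex is a simplified assumption about what a parseable rootname looks like.

```python
import re

# Hypothetical, simplified pattern (an assumption, not the regex used in this PR):
# "jw" + 11 digits, a 5-digit group, a 5-digit exposure number, and a detector name.
EXAMPLE_ROOTNAME = re.compile(r"^jw\d{11}_\d{5}_\d{5}_[a-z0-9]+$")


def keep_parseable(rootnames, pattern=EXAMPLE_ROOTNAME):
    """Return only the names that match the pattern, preserving order.

    Names that do not match are dropped here, so the parser is never
    called on them and never logs a failure for them.
    """
    return [name for name in rootnames if pattern.match(name)]


# Made-up sample names: the first matches the pattern, the second does not.
names = [
    "jw01234001001_02101_00001_nrca1",
    "jw01076-o101_s00001_nircam_f322w2-grismr",
]
filtered = keep_parseable(names)
```

Filtering up front like this means the cost of an unparsable name is one cheap regex check rather than several log lines per call to the parser.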

pep8speaks · 2024-10-30T18:17:31Z

Hello @bhilbert4, Thank you for updating !

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated at 2025-01-15 17:12:29 UTC

mfixstsci

@bhilbert4 whipping out the re dark arts here. I think this is the best way to handle this problem, thanks for submitting this.

mfixstsci · 2025-01-15T15:20:23Z

Oh! @bhilbert4 one last thing. I am trying to be better about this in my reviews... Would you mind submitting a test or two for this new function?

It doesn't have to be too complex:

def test_filter_rootnames():
    good_rootname = ['jw_good_rootname']
    bad_rootname = ['jw_bad_rootname']

    good_result = filter_rootnames(good_rootname)
    bad_result = filter_rootnames(bad_rootname)

    assert good_result == good_rootname  # since it should return something
    assert bad_result is None  # since this should return None

If you want to parameterize it or do something fancy I support that too.

bhilbert4 · 2025-01-15T15:57:57Z

New test added! It's pretty simple, but shows that the correct level 2 files are being filtered out. Let me know if you want anything more complex.

mfixstsci

Nice. There is a failure now, going to check on that real quick.

bhilbert4 · 2025-01-15T17:17:20Z

@mfixstsci the failures have been resolved with some shuffling of the import statements in archive_database_update.

Filter rootnames/filenames before calling the filename_parser

f288a55

bhilbert4 self-assigned this Oct 30, 2024

bhilbert4 added 2 commits October 30, 2024 14:19

Remove commented out regex

b074376

Remove unused constant

dbad337

bhilbert4 requested a review from mfixstsci October 30, 2024 18:42

mfixstsci approved these changes Jan 15, 2025

Merge branch 'develop' into filter-rootnames-in-archive-db-update

0a6e258

Add test for new function

e125c98

mfixstsci reviewed Jan 15, 2025

bhilbert4 added 3 commits January 15, 2025 12:02

Only define FILESYSTEM if not on github actions

79f835a

Fix import order and location

365506d

make imports easier to read

bcc92d1

mfixstsci merged commit e9669fd into spacetelescope:develop Jan 15, 2025
11 checks passed
