Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: persistent errors for job processing [DHIS2-15276] #15575

Merged
merged 14 commits into from
Nov 8, 2023

Conversation

jbee
Copy link
Contributor

@jbee jbee commented Nov 2, 2023

Summary

Main changes to extend the job progress tracking API with a error collection feature.

Errors are attached to the overall progress JSON object.
A new column errorCodes was added to the table which contains the codes extracted from the progress object.
This is purely to allow easier search by error code(s) using existing filter API tech. The column is implicitly maintained when the progress JSON changes.

To accommodate this the persisted JSON changed from an array of processes which previously would be wrapped in a Progress object on in the memory model which had the sequence of Processes this root object is now explicit in the stored JSON so it can also contain the errors list. To be backwards compatibly with the JSON array stored previously the extraction pf the progress JSON will auto-wrap any array values in an appropiate object in the SQL so existing data will continue to work.

Other changes

  • the ErrorReport of the metadata import now does contain the plain arguments to the message so they can be forwarded to the job progress error list
  • the heartbeat job was renamed to housekeeping job
  • tracker Error class is now called TrackerImportError in OpenAPI to resolve the name clash with the new generic Error class added
  • Metadata import errors are forwarded to the job progress tracking
  • cancelling jobs requires either ALL or F_PERFORM_MAINTENANCE authority or must be performed by the user executing the job
  • access to job /progress and /errors is restricted to ALL or F_PERFORM_MAINTENANCE authority

Fixes

  • sending email messages in case a job failed is now async so any exception in the sending does not cause a rollback of the main TX for transitioning the job to stopped/failed state
  • the housekeeping job is re-spawned not only at startup but in each loop cycle that has no single job to execute as the housekeeping is expected to be in that list
  • the progress JSON object is now stored as a jsonb object not as string of the escaped JSON (that worked by accident for store and decode)

Automatic Testing

The affected code is covered by many existing test indirectly. No explicit new tests were added yet.

Manual Testing

  • open import/export app
  • run a metadata import with a file that has errors, for example export metadata, edit the file and change a UID of an object
  • check the /api/jobConfigurations/{uid}/errors and /api/jobConfigurations/{uid}/progress endpoints of the job created for the import and check the error added to the file is recorded (find the UID e.g. by the newest METADATA_IMPORT job in the list api/jobConfigurations/gist?order=created:desc)
  • make sure the list entries of /errors have a readable message property

@jbee jbee self-assigned this Nov 2, 2023
Copy link

codecov bot commented Nov 6, 2023

Codecov Report

Merging #15575 (181275a) into master (178839d) will decrease coverage by 0.04%.
Report is 4 commits behind head on master.
The diff coverage is 32.23%.

@@             Coverage Diff              @@
##             master   #15575      +/-   ##
============================================
- Coverage     66.23%   66.19%   -0.04%     
+ Complexity    31250    31240      -10     
============================================
  Files          3485     3485              
  Lines        129791   129851      +60     
  Branches      15145    15165      +20     
============================================
- Hits          85965    85960       -5     
- Misses        36741    36802      +61     
- Partials       7085     7089       +4     
Flag Coverage Δ
integration 49.75% <29.75%> (-0.03%) ⬇️
integration-h2 32.40% <11.57%> (-0.03%) ⬇️
unit 30.31% <10.74%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...ava/org/hisp/dhis/scheduling/JobConfiguration.java 76.11% <ø> (ø)
...rc/main/java/org/hisp/dhis/scheduling/JobType.java 94.87% <100.00%> (ø)
.../analytics/event/data/DefaultQueryItemLocator.java 85.56% <100.00%> (ø)
.../main/java/org/hisp/dhis/config/StartupConfig.java 100.00% <100.00%> (ø)
...a/org/hisp/dhis/message/DefaultMessageService.java 65.16% <100.00%> (+0.19%) ⬆️
...in/java/org/hisp/dhis/scheduling/HeartbeatJob.java 3.70% <100.00%> (ø)
...ain/java/org/hisp/dhis/startup/SchedulerStart.java 0.00% <ø> (ø)
.../dhis/metadata/DefaultMetadataWorkflowService.java 86.89% <100.00%> (ø)
...va/org/hisp/dhis/tracker/imports/report/Error.java 100.00% <ø> (ø)
...main/java/org/hisp/dhis/feedback/ErrorMessage.java 92.30% <85.71%> (-7.70%) ⬇️
... and 12 more

... and 8 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 376fc9b...181275a. Read the comment docs.

Copy link

sonarqubecloud bot commented Nov 7, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

0.0% 0.0% Coverage
0.0% 0.0% Duplication

@jbee jbee marked this pull request as ready for review November 7, 2023 13:42
}

default void addError(ErrorCode code, String uid, String type, Integer index, List<String> args) {
// default implementation is a NOOP, we don't remember or handle the error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow this reasoning for this empty method.
Do only certain jobs care about errors & they would then override this method? (which would result in the persisting of the error(s)?
And if jobs using this default method have errors, they are silently swallowed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not related to certain jobs but only one implementation of the JobProgress interfaces wants to record them and here we have no access to the implementation. To avoid having to implement the method in all other implementation with an empty method it is instead implemented like that as default.

Copy link
Contributor

@david-mackessy david-mackessy Nov 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the default to be an empty method suggests it shouldn't be part of the interface. It feels like we're adding this confusing behaviour to satisfy only 1 implementer.
Could a javadoc comment be added to the method at least to explain why we don't want to handle errors here (handling errors seems like a very common use case) as the default and when would overriding be necessary. This would at least help someone reading the code to hopefully understand the reasoning for it & when it should be overridden.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do that in the next PR I was planning on doing anyhow

Copy link
Contributor

@david-mackessy david-mackessy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch with the jsonb

@jbee jbee merged commit 9ecd184 into dhis2:master Nov 8, 2023
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants