
This app solves the technical challenge task from RavenPack.

Running instructions:

Outcome: the application returns the story count and any errors found during file processing. The following assumptions are made:

Records do not normally arrive archived, so extraction from the rar file was out of scope. If unarchiving were required by the task, the simplest approach would be to extract the archive in the container before the app starts.
The provided file has a fixed length, but the system is designed to process records arriving indefinitely from a stream. So I don't keep all the stories in one object, since on an unlucky day that would lead to running out of memory. Instead, I keep only a counter of stories rather than a list of them.
The regex pattern for RP_ENTITY_ID is "[A-Z|0-9]{6}". It may not be strict enough, but I could not find a better one given RP_ENTITY_ID":"KOXQBB" and RP_ENTITY_ID":"660345" as examples. Note that inside a character class the "|" is a literal character rather than alternation, so this pattern also accepts "|"; "[A-Z0-9]{6}" would be the stricter form.
Each story arrives as a whole: the records of one story are never interleaved with the records of another story.
Checked on Windows 10. I don't have a Linux machine to verify that the provided instructions work fully on Linux.
The following possible errors are validated (the provided file does not contain any of them, so you won't see any errors in the log):
1. The first record index of a story should be 1.
2. The document count should be the same for all records that relate to the same story.
3. The records of the same story should come sequentially: the first record index should be 1, the next 2, and so on.
4. The record count should be exactly equal to DOCUMENT_RECORD_COUNT.
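
The four checks above, together with the counter-only approach, can be sketched roughly as follows. This is a minimal illustration and not the code in src: it assumes records are already parsed into dicts, and the field names RP_DOCUMENT_ID and DOCUMENT_RECORD_INDEX are assumptions (only DOCUMENT_RECORD_COUNT is named in this README). The function name `count_stories` is hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple


def count_stories(records: Iterable[Dict[str, Any]]) -> Tuple[int, List[str]]:
    """Count stories in a record stream, keeping only per-story state in memory."""
    stories = 0
    errors: List[str] = []
    current_id = None  # id of the story being read (assumed field RP_DOCUMENT_ID)
    seen = 0           # records seen so far for the current story
    declared = 0       # DOCUMENT_RECORD_COUNT declared by the story's first record

    for rec in records:
        doc_id = rec["RP_DOCUMENT_ID"]             # assumed field name
        index = int(rec["DOCUMENT_RECORD_INDEX"])  # assumed field name
        count = int(rec["DOCUMENT_RECORD_COUNT"])

        if doc_id != current_id:
            # A new story starts, so close the previous one (check 4).
            if current_id is not None and seen != declared:
                errors.append(f"{current_id}: got {seen} records, expected {declared}")
            stories += 1
            current_id, seen, declared = doc_id, 0, count
            if index != 1:  # check 1
                errors.append(f"{doc_id}: first record index is {index}, expected 1")
        else:
            if count != declared:  # check 2
                errors.append(f"{doc_id}: record count changed from {declared} to {count}")
            if index != seen + 1:  # check 3
                errors.append(f"{doc_id}: record index is {index}, expected {seen + 1}")
        seen += 1

    # Close the last story once the stream ends (check 4).
    if current_id is not None and seen != declared:
        errors.append(f"{current_id}: got {seen} records, expected {declared}")
    return stories, errors
```

Because only the current story's id and two integers are kept, memory use stays constant no matter how long the stream runs. The sketch relies on the assumption stated earlier that stories are never interleaved.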

What could be enhanced here:

Add a common channel for logging errors;
Add connectors (interfaces) to make switching to other sources (for example, streams or queues) smoother.

This was not done because it would add complexity unrelated to the scope of the test task.
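
For illustration only, a connector interface of the kind mentioned above might look like the sketch below. Nothing here exists in the repository: `RecordSource` and `JsonLinesSource` are hypothetical names, and the one-JSON-object-per-line input format is an assumption.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Protocol


class RecordSource(Protocol):
    """Anything that can yield parsed records one at a time (hypothetical interface)."""

    def records(self) -> Iterator[Dict[str, Any]]:
        ...


class JsonLinesSource:
    """Reads records from an iterable of text lines, assuming one JSON object per line."""

    def __init__(self, lines: Iterable[str]) -> None:
        self._lines = lines

    def records(self) -> Iterator[Dict[str, Any]]:
        for line in self._lines:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

A file object, a socket wrapper, or a queue consumer could each be passed in through a class with the same `records()` method, so the processing code would not need to know where the records come from.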
