codingfun-data-engineering-tht

This is a benchmark test that lets data engineers and Python developers show a good understanding of the fundamentals of reading, coding and delivering to a timeframe.

Rules: A link to a public Git repository with your final solution must be provided within 48 hours of receipt of the test.

Guidelines:

  • To understand how you approach the problem, we will assess your use of source control and how you build up to the final solution, checking what is committed at each step (hint: push frequently).
  • The code must be written in Python 3.
  • You may use any frameworks or libraries to complete this task, excluding data analysis libraries such as Pandas.
  • Unit tests must be provided.

Let's imagine that you are working for an e-commerce company that delivers gifts from warehouses/stores to individuals.

We have a table of raw/unprocessed data coming from our app that stores the information concerning a delivery. In this unprocessed table, a delivery is made of 4 steps:

  1. Order placed
  2. Driver accepted order
  3. Food is picked up at restaurant
  4. Food is delivered to customer
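One way to keep the step numbers readable in code is an integer enumeration. This is only a sketch: the class name `DeliveryStep` and its member names are hypothetical and are not part of the task statement.

```python
from enum import IntEnum


class DeliveryStep(IntEnum):
    """The four steps of a delivery, numbered as in the raw table (names are hypothetical)."""
    ORDER_PLACED = 1
    DRIVER_ACCEPTED = 2
    PICKED_UP = 3
    DELIVERED = 4


# IntEnum members compare equal to plain integers, so raw step values can be used directly.
is_final = DeliveryStep(4) is DeliveryStep.DELIVERED
```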

Dataset:

WITH order_data AS (
  SELECT 1 AS deliveryStep, 1 AS Customer, CAST("2020-01-01T12:00:00" AS DATETIME) AS timestamp
  UNION ALL
  SELECT 2, 1, CAST("2022-01-01T12:05:00" AS DATETIME)
  UNION ALL
  SELECT 3, 1, CAST("2022-01-01T12:15:00" AS DATETIME)
  UNION ALL
  SELECT 1, 1, CAST("2022-01-01T18:00:00" AS DATETIME)
  UNION ALL
  SELECT 1, 2, CAST("2022-01-01T18:01:00" AS DATETIME)
  UNION ALL
  SELECT 2, 1, CAST("2022-01-01T18:10:00" AS DATETIME)
  UNION ALL
  SELECT 3, 1, CAST("2022-01-01T18:15:00" AS DATETIME)
  UNION ALL
  SELECT 4, 1, CAST("2022-01-01T18:20:00" AS DATETIME)
)
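For unit tests it can help to have the same eight rows available in Python. The sketch below copies the values from the table as they are, in the same order. The name `ORDER_DATA`, the dictionary keys (taken from the record example given for `ingest`) and the choice to hold timestamps as ISO 8601 strings are assumptions.

```python
# The eight rows of order_data as (deliveryStep, Customer, timestamp) tuples.
_ROWS = [
    (1, 1, "2020-01-01T12:00:00"),
    (2, 1, "2022-01-01T12:05:00"),
    (3, 1, "2022-01-01T12:15:00"),
    (1, 1, "2022-01-01T18:00:00"),
    (1, 2, "2022-01-01T18:01:00"),
    (2, 1, "2022-01-01T18:10:00"),
    (3, 1, "2022-01-01T18:15:00"),
    (4, 1, "2022-01-01T18:20:00"),
]

# Hypothetical fixture: one dictionary per row, shaped like the record passed to ingest.
ORDER_DATA = [
    {"Delivery Step": step, "Customer": customer, "Timestamp": timestamp}
    for step, customer, timestamp in _ROWS
]
```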

You have a stream processing app that ingests each row of the previous table as it is generated by our mobile app. Whenever a new row is ready to be ingested, the stream processing app will call the following Python function:

def ingest(record):

  • where record is a dictionary as follows: {'Delivery Step': 1, 'Customer': 1, 'Timestamp': '2020-01-01T12:00:00'}
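A minimal sketch of unpacking such a record, assuming the timestamp arrives as an ISO 8601 string. The helper name `parse_record` is hypothetical. `datetime.fromisoformat` is in the standard library from Python 3.7.

```python
from datetime import datetime


def parse_record(record):
    """Return (step, customer, timestamp), with the timestamp converted to a datetime."""
    return (
        int(record['Delivery Step']),
        record['Customer'],
        # Assumption: the timestamp is an ISO 8601 string such as '2020-01-01T12:00:00'.
        datetime.fromisoformat(record['Timestamp']),
    )


step, customer, ts = parse_record(
    {'Delivery Step': 1, 'Customer': 1, 'Timestamp': '2020-01-01T12:00:00'}
)
```

Converting to `datetime` early means durations can later be computed by plain subtraction, which yields a `timedelta`.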

Question:

Implement the ingest function so that it outputs the average duration of a successful delivery each time a record is passed to the function.

  1. You are allowed to use stateful variables to store information between calls of the ingest function.
  2. How much memory will the function need for this dataset?

Assessment:

Your code will be reviewed and assessed according to the following:

  1. Adherence to the requirements
  2. Code quality – readability, structure of the code, performance
  3. Unit test coverage and relevance of the tests