From a8d3022d25d68689bde0d732cb0bb3bb449239a4 Mon Sep 17 00:00:00 2001 From: Marj E Date: Sun, 13 Sep 2020 11:00:58 -0700 Subject: [PATCH 01/14] added a new text file for step 0 --- step_0.txt | 0 1 file changed, 0 insertions(+), 0 deletions(-) create mode 100644 step_0.txt diff --git a/step_0.txt b/step_0.txt new file mode 100644 index 0000000..e69de29 From 1a698e2cd562a2117b9dd3e7f8862233ec23bcfb Mon Sep 17 00:00:00 2001 From: Marj E Date: Sun, 13 Sep 2020 11:02:19 -0700 Subject: [PATCH 02/14] answered questions in step 0 text file --- step_0.txt | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/step_0.txt b/step_0.txt index e69de29..6835b56 100644 --- a/step_0.txt +++ b/step_0.txt @@ -0,0 +1,8 @@ +1. What _things_ (objects, nouns) are represented or described in this file? We can think of at least six different things. +A. Driver, Date, Cost, Rider, Rating, One Ride + +2. From the things you listed in the previous question, all of those things have relationships to each other. (An ID belongs to a person, for instance. As an abstract, unrelated example, a VIN belongs to a vehicle, and a vehicle has a VIN.) Consider the relationships between the pieces of data. +A. All of the information combines to form the data for one ride. You can find the number of rides, or the average number of rides, that a driver gives or a rider takes. You can find the average rating a rider gives or the average rating a driver gets. You can see this not just for one driver or rider, but for all of them as a grouped average. You can count how many rides occurred on a given day, month, or year. You may be able to determine the possible cause of a high or low cost (distance, traffic or time of day, or a split cost if ride sharing is an option), but that would need more data, such as a timestamp. You could see if there is a link between cost and ratings. 
You can find rider trends in travel cost: are there more low-cost, shorter trips or higher-cost, longer trips? (This may also need more data, such as start and stop locations.) + +3. Lastly, in this assignment, we will rearrange all of the data into one data structure (with a lot of nested layers) that can be held in one variable. List some ideas: considering all of the relationships listed in the last question, what piece of data can contain the others at the top-most level? (Compared to the json example before, think about what the top-most layer of the hash was and what it represented.) There is more than one correct answer, so just list out the options at this moment. +A. An array can hold the other data at the top-most level, with each element a hash representing one ride. Since each row represents a ride, I think it is best to maintain that structure as much as possible. Considering what I want to get from the data, or what this data may be used to analyse, I still think using the table column headings as keys in each ride hash is wise. If more columns are added in the future to collect more data, it will be easy to add each new column and its data to the ride hashes by following the same pattern. 
\ No newline at end of file From d6ef39c19bfb1191c0146ccecfb57102ce4c2ca0 Mon Sep 17 00:00:00 2001 From: Marj E Date: Sun, 13 Sep 2020 11:03:33 -0700 Subject: [PATCH 03/14] added the rubymine auto-generated file name to file --- .gitignore | 1 + 1 file changed, 1 insertion(+) diff --git a/.gitignore b/.gitignore index 63123fb..b7cd941 100644 --- a/.gitignore +++ b/.gitignore @@ -1 +1,2 @@ .DS_store +.idea \ No newline at end of file From 3fc805b550eb5d3b14876bbe5c704be0a0473c84 Mon Sep 17 00:00:00 2001 From: Marj E Date: Sun, 13 Sep 2020 11:04:09 -0700 Subject: [PATCH 04/14] answered first question in worksheet.rb file --- worksheet.rb | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/worksheet.rb b/worksheet.rb index 95b085d..a149ebd 100644 --- a/worksheet.rb +++ b/worksheet.rb @@ -3,9 +3,18 @@ # In this section of the file, as a series of comments, # create a list of the layers you identify. + # Layer 1 - Array of ride hashes + # Layer 2 - DRIVER_ID, DATE, COST, RIDER_ID, RATING + # Layer 3 - Day, Month, Year + # Which layers are nested in each other? + # Layer 3 is nested in Date of Layer 2, and Layer 2 is nested in Layer 1 + # Which layers of data "have" within it a different layer? + # Date of Layer 2 + # Which layers are "next" to each other? 
+ # All of the column headings ######################################################## # Step 2: Assign a data structure to each layer From adf4e97994fa063ac654e0f57c9400a91a8bcfd4 Mon Sep 17 00:00:00 2001 From: Marj E Date: Sun, 13 Sep 2020 11:07:17 -0700 Subject: [PATCH 05/14] added answers to question 2 --- worksheet.rb | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/worksheet.rb b/worksheet.rb index a149ebd..ff28378 100644 --- a/worksheet.rb +++ b/worksheet.rb @@ -21,6 +21,10 @@ # Copy your list from above, and in this section # determine what data structure each layer should have + # Layer 1 - Array of ride hashes + # Layer 2 - Hash with keys for DRIVER_ID, DATE, COST, RIDER_ID, RATING + # Layer 3 - Array of ints for Day, Month, Year + ######################################################## # Step 3: Make the data structure! From d162522f7f2498e4bf767c75296bf2811ea62700 Mon Sep 17 00:00:00 2001 From: Marj E Date: Sun, 13 Sep 2020 13:49:07 -0700 Subject: [PATCH 06/14] added answer to step 3 --- worksheet.rb | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 85 insertions(+) diff --git a/worksheet.rb b/worksheet.rb index ff28378..9b9c007 100644 --- a/worksheet.rb +++ b/worksheet.rb @@ -36,6 +36,91 @@ # into this data structure, such as "DR0004" # and "3rd Feb 2016" and "RD0022" +# my data structure blueprint +# [ +# { +# driver_id: "", +# date: [0, 0, 0], +# cost: 0, +# rider_id: "", +# rating: 0 +# } +# ] + +rides_data = [ + ['DRIVER_ID','DATE','COST','RIDER_ID','RATING'], + ['DR0004','3rd Feb 2016','5','RD0022','5'], + ['DR0001','3rd Feb 2016','10','RD0003','3'], + ['DR0002','3rd Feb 2016','25','RD0073','5'], + ['DR0001','3rd Feb 2016','30','RD0015','4'], + ['DR0003','4th Feb 2016','5','RD0066','5'], + ['DR0004','4th Feb 2016','10','RD0022','4'], + ['DR0002','4th Feb 2016','15','RD0013','1'], + ['DR0003','5th Feb 2016','50','RD0003','2'], + ['DR0002','5th Feb 2016','35','RD0066','3'], + ['DR0004','5th Feb 
2016','20','RD0073','5'], + ['DR0001','5th Feb 2016','45','RD0003','2'] +] + +# return an array with integer representation of dates +def parse_date(date_string) + day_month_year = [] + date_string = date_string.split(" ") + day = date_string[0].split("").select { |chr| chr =~ /\d/ }.join.to_i + month = date_string[1].downcase + year = date_string[2].to_i + + # re-assign month from string to int + case month + when 'jan' + month = 1 + when 'feb' + month = 2 + when 'mar' + month = 3 + when 'apr' + month = 4 + when 'may' + month = 5 + when 'jun' + month = 6 + when 'jul' + month = 7 + when 'aug' + month = 8 + when 'sep' + month = 9 + when 'oct' + month = 10 + when 'nov' + month = 11 + when 'dec' + month = 12 + end + + return day_month_year << day << month << year +end + +def create_structure(data) + array = [] + headings = data[0].map { |heading| heading.downcase } + + (data.length - 1).times do |index| + #skip heading + index += 1 + + #choose row of data + row = data[index] + + #create ride hash + ride = Hash[headings[0], row[0], headings[1], parse_date(row[1]), headings[2], row[2].to_i, headings[3], row[3], headings[4], row[4].to_i] + array << ride + end + return array +end + +ride_share_data = create_structure(rides_data) +pp ride_share_data ######################################################## # Step 4: Total Driver's Earnings and Number of Rides From 7e36c0370028fcb5428136895e7dae38fc492331 Mon Sep 17 00:00:00 2001 From: Marj E Date: Sun, 13 Sep 2020 13:53:38 -0700 Subject: [PATCH 07/14] changed variable name array to top_array in step 3 method --- worksheet.rb | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/worksheet.rb b/worksheet.rb index 9b9c007..3e37b39 100644 --- a/worksheet.rb +++ b/worksheet.rb @@ -102,7 +102,7 @@ def parse_date(date_string) end def create_structure(data) - array = [] + top_array = [] headings = data[0].map { |heading| heading.downcase } (data.length - 1).times do |index| @@ -114,9 +114,9 @@ def 
create_structure(data) #create ride hash ride = Hash[headings[0], row[0], headings[1], parse_date(row[1]), headings[2], row[2].to_i, headings[3], row[3], headings[4], row[4].to_i] - array << ride + top_array << ride end - return array + return top_array end ride_share_data = create_structure(rides_data) From 68828a0970b8e713afaacda032d1d3d7aea81d02 Mon Sep 17 00:00:00 2001 From: Marj E Date: Sun, 13 Sep 2020 14:21:07 -0700 Subject: [PATCH 08/14] modified data structure blueprint, added clarifying comment in step 3 method, started step 4 --- worksheet.rb | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/worksheet.rb b/worksheet.rb index 3e37b39..7a11672 100644 --- a/worksheet.rb +++ b/worksheet.rb @@ -36,14 +36,14 @@ # into this data structure, such as "DR0004" # and "3rd Feb 2016" and "RD0022" -# my data structure blueprint +# data structure blueprint # [ # { -# driver_id: "", -# date: [0, 0, 0], -# cost: 0, -# rider_id: "", -# rating: 0 +# driver_id: str, +# date: [int, int, int], +# cost: int, +# rider_id: str, +# rating: int # } # ] @@ -103,6 +103,7 @@ def parse_date(date_string) def create_structure(data) top_array = [] + # create default hash keys based on column headings headings = data[0].map { |heading| heading.downcase } (data.length - 1).times do |index| @@ -126,7 +127,10 @@ def create_structure(data) # Use an iteration blocks to print the following answers: # - the number of rides each driver has given +unique_drivers = ride_share_data.uniq { |ride| ride['driver_id'] } + # - the total amount of money each driver has made +# # - the average rating for each driver # - Which driver made the most money? # - Which driver has the highest average rating? 
\ No newline at end of file From be854a1ba0572951444a951ee62493ba236e0d50 Mon Sep 17 00:00:00 2001 From: Marj E Date: Sun, 13 Sep 2020 17:26:18 -0700 Subject: [PATCH 09/14] changed step 3 method name, changed comment description --- worksheet.rb | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/worksheet.rb b/worksheet.rb index 7a11672..090e0ea 100644 --- a/worksheet.rb +++ b/worksheet.rb @@ -101,10 +101,10 @@ def parse_date(date_string) return day_month_year << day << month << year end -def create_structure(data) +def structure_ride_share(data) top_array = [] # create default hash keys based on column headings - headings = data[0].map { |heading| heading.downcase } + headings = data[0].map { |heading| heading.downcase.to_sym } (data.length - 1).times do |index| #skip heading @@ -113,24 +113,25 @@ def create_structure(data) #choose row of data row = data[index] - #create ride hash + #populate ride hashes ride = Hash[headings[0], row[0], headings[1], parse_date(row[1]), headings[2], row[2].to_i, headings[3], row[3], headings[4], row[4].to_i] top_array << ride end return top_array end -ride_share_data = create_structure(rides_data) +ride_share_data = structure_ride_share(rides_data) pp ride_share_data ######################################################## # Step 4: Total Driver's Earnings and Number of Rides # Use an iteration blocks to print the following answers: # - the number of rides each driver has given -unique_drivers = ride_share_data.uniq { |ride| ride['driver_id'] } - +unique_drivers = ride_share_data.map { |ride_hash| ride_hash[:driver_id] }.uniq +p unique_drivers # - the total amount of money each driver has made -# +# Your code here to find that menu item's price +# menu.each{ |hash| item_price = hash[:price] if hash.value?(unique_driver) } # - the average rating for each driver # - Which driver made the most money? # - Which driver has the highest average rating? 
\ No newline at end of file From 3a3638daa7277b5929aecdd84747d2198ebe26fc Mon Sep 17 00:00:00 2001 From: Marj E Date: Mon, 14 Sep 2020 00:26:16 -0700 Subject: [PATCH 10/14] changed uniq variable to method, added calc avg rating method and driver summary variable --- worksheet.rb | 52 ++++++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 48 insertions(+), 4 deletions(-) diff --git a/worksheet.rb b/worksheet.rb index 090e0ea..9393acb 100644 --- a/worksheet.rb +++ b/worksheet.rb @@ -121,17 +121,61 @@ def structure_ride_share(data) end ride_share_data = structure_ride_share(rides_data) +#test pp ride_share_data ######################################################## # Step 4: Total Driver's Earnings and Number of Rides # Use an iteration blocks to print the following answers: +def find_unique_values(value_type, data) + unique_values = data.map { |ride_hash| ride_hash[value_type] }.uniq + return unique_values +end +#test +pp find_unique_values(:driver_id, ride_share_data) + # - the number of rides each driver has given -unique_drivers = ride_share_data.map { |ride_hash| ride_hash[:driver_id] }.uniq -p unique_drivers +def count_total_rides(id, data) + count = data.count { |ride_hash| ride_hash.has_value? 
id } + return count +end +#test +pp count_total_rides('DR0004', ride_share_data) + # - the total amount of money each driver has made -# Your code here to find that menu item's price -# menu.each{ |hash| item_price = hash[:price] if hash.value?(unique_driver) } +def total_ride_cost(id, data) + total_cost = 0 + data.each{ |ride_hash| total_cost += ride_hash[:cost] if ride_hash.value?(id) } + return total_cost +end +#test +pp total_ride_cost('DR0004', ride_share_data) + # - the average rating for each driver +def calculate_average_rating(id, data) + total_rides = count_total_rides(id, data) + total_ratings = 0.to_f + + data.each{ |ride_hash| total_ratings += ride_hash[:rating] if ride_hash.value?(id) } + + average = total_ratings / total_rides + return average.round(1) +end +#test +pp calculate_average_rating('DR0004', ride_share_data) + +driver_summaries = find_unique_values(:driver_id, ride_share_data).map do |driver| + { + driver_id: driver, + total_rides: count_total_rides(driver, ride_share_data), + total_cost: total_ride_cost(driver, ride_share_data), + average_rating: calculate_average_rating(driver, ride_share_data), + } +end +#test +pp driver_summaries + # - Which driver made the most money? + + # - Which driver has the highest average rating? 
\ No newline at end of file From 0a9640ab744f4acf2c19bd9cc96bf3fddc3c5fac Mon Sep 17 00:00:00 2001 From: Marj E Date: Mon, 14 Sep 2020 00:39:05 -0700 Subject: [PATCH 11/14] added info for date parser method refactoring in comments --- worksheet.rb | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/worksheet.rb b/worksheet.rb index 9393acb..934e1ca 100644 --- a/worksheet.rb +++ b/worksheet.rb @@ -101,6 +101,16 @@ def parse_date(date_string) return day_month_year << day << month << year end +#Should refactor date parser method to use ruby Date.parse - see following: +# d = Date.parse('3rd Feb 2001') +# #=> #<Date: 2001-02-03 ...> +# d.year #=> 2001 +# d.mon #=> 2 +# d.mday #=> 3 +# d.wday #=> 6 +# d += 1 #=> #<Date: 2001-02-04 ...> +# d.strftime('%a %d %b %Y') #=> "Sun 04 Feb 2001" + def structure_ride_share(data) top_array = [] # create default hash keys based on column headings From 0e4abbbeef22906a9e9d54e6148e4b68d68c736a Mon Sep 17 00:00:00 2001 From: Marj E Date: Mon, 14 Sep 2020 00:57:54 -0700 Subject: [PATCH 12/14] finished answers to step 4 --- worksheet.rb | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/worksheet.rb b/worksheet.rb index 934e1ca..d31aae0 100644 --- a/worksheet.rb +++ b/worksheet.rb @@ -186,6 +186,7 @@ def calculate_average_rating(id, data) pp driver_summaries # - Which driver made the most money? +p driver_summaries.max { |a_hash, b_hash| a_hash[:total_cost] <=> b_hash[:total_cost] }[:driver_id] - -# - Which driver has the highest average rating? \ No newline at end of file +# - Which driver has the highest average rating? 
+p driver_summaries.max { |a_hash, b_hash| a_hash[:average_rating] <=> b_hash[:average_rating] }[:driver_id] \ No newline at end of file From ef28a94a88be04c49665c91334e7ce79a90ccd64 Mon Sep 17 00:00:00 2001 From: Marj E Date: Mon, 14 Sep 2020 01:05:56 -0700 Subject: [PATCH 13/14] refactored date parser method to use ruby Date --- worksheet.rb | 47 +++-------------------------------------------- 1 file changed, 3 insertions(+), 44 deletions(-) diff --git a/worksheet.rb b/worksheet.rb index d31aae0..5a0a97c 100644 --- a/worksheet.rb +++ b/worksheet.rb @@ -1,3 +1,4 @@ +require 'date' ######################################################## # Step 1: Establish the layers @@ -65,52 +66,10 @@ # return an array with integer representation of dates def parse_date(date_string) day_month_year = [] - date_string = date_string.split(" ") - day = date_string[0].split("").select { |chr| chr =~ /\d/ }.join.to_i - month = date_string[1].downcase - year = date_string[2].to_i - - # re-assign month from string to int - case month - when 'jan' - month = 1 - when 'feb' - month = 2 - when 'mar' - month = 3 - when 'apr' - month = 4 - when 'may' - month = 5 - when 'jun' - month = 6 - when 'jul' - month = 7 - when 'aug' - month = 8 - when 'sep' - month = 9 - when 'oct' - month = 10 - when 'nov' - month = 11 - when 'dec' - month = 12 - end - - return day_month_year << day << month << year + d = Date.parse(date_string) + return day_month_year << d.mday << d.mon << d.year end -#Should refactor date parser method to use ruby Date.parse - see following: -# d = Date.parse('3rd Feb 2001') -# #=> #<Date: 2001-02-03 ...> -# d.year #=> 2001 -# d.mon #=> 2 -# d.mday #=> 3 -# d.wday #=> 6 -# d += 1 #=> #<Date: 2001-02-04 ...> -# d.strftime('%a %d %b %Y') #=> "Sun 04 Feb 2001" - def structure_ride_share(data) top_array = [] # create default hash keys based on column headings From 4cdb5d050b42163457e16aa18da1503355e9ba03 Mon Sep 17 00:00:00 2001 From: Marj E Date: Mon, 14 Sep 2020 01:21:34 -0700 Subject: [PATCH 14/14] added line breaks for console print out 
--- worksheet.rb | 29 +++++++++++++++++++---------- 1 file changed, 19 insertions(+), 10 deletions(-) diff --git a/worksheet.rb b/worksheet.rb index 5a0a97c..fe1747d 100644 --- a/worksheet.rb +++ b/worksheet.rb @@ -64,12 +64,17 @@ ] # return an array with integer representation of dates +def line_break + puts '--------------------------------------------------------' +end + def parse_date(date_string) day_month_year = [] d = Date.parse(date_string) return day_month_year << d.mday << d.mon << d.year end +# create the data structure from the blueprint def structure_ride_share(data) top_array = [] # create default hash keys based on column headings @@ -82,7 +87,7 @@ def structure_ride_share(data) #choose row of data row = data[index] - #populate ride hashes + #populate ride hashes - really long! ride = Hash[headings[0], row[0], headings[1], parse_date(row[1]), headings[2], row[2].to_i, headings[3], row[3], headings[4], row[4].to_i] top_array << ride end @@ -90,8 +95,11 @@ def structure_ride_share(data) end ride_share_data = structure_ride_share(rides_data) -#test +line_break +puts 'Ride Share Data:' +line_break pp ride_share_data + ######################################################## # Step 4: Total Driver's Earnings and Number of Rides @@ -100,16 +108,12 @@ def find_unique_values(value_type, data) unique_values = data.map { |ride_hash| ride_hash[value_type] }.uniq return unique_values end -#test -pp find_unique_values(:driver_id, ride_share_data) # - the number of rides each driver has given def count_total_rides(id, data) count = data.count { |ride_hash| ride_hash.has_value? 
id } return count end -#test -pp count_total_rides('DR0004', ride_share_data) # - the total amount of money each driver has made def total_ride_cost(id, data) @@ -117,8 +121,6 @@ def total_ride_cost(id, data) data.each{ |ride_hash| total_cost += ride_hash[:cost] if ride_hash.value?(id) } return total_cost end -#test -pp total_ride_cost('DR0004', ride_share_data) # - the average rating for each driver def calculate_average_rating(id, data) @@ -130,9 +132,10 @@ def calculate_average_rating(id, data) average = total_ratings / total_rides return average.round(1) end -#test -pp calculate_average_rating('DR0004', ride_share_data) +line_break +puts 'Driver Summary:' +line_break driver_summaries = find_unique_values(:driver_id, ride_share_data).map do |driver| { driver_id: driver, @@ -145,7 +148,13 @@ def calculate_average_rating(id, data) pp driver_summaries # - Which driver made the most money? +line_break +puts 'Driver that made the most money:' +line_break p driver_summaries.max { |a_hash, b_hash| a_hash[:total_cost] <=> b_hash[:total_cost] }[:driver_id] # - Which driver has the highest average rating? +line_break +puts 'Driver that has the highest average rating:' +line_break p driver_summaries.max { |a_hash, b_hash| a_hash[:average_rating] <=> b_hash[:average_rating] }[:driver_id] \ No newline at end of file
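
A possible follow-up, not part of the patches above: the per-driver helpers in worksheet.rb each rescan the full ride array, and each ride hash is built with one long Hash[...] call. The sketch below is one hedged alternative that builds the same array-of-hashes blueprint with zip and groups rides once per driver. It assumes Ruby 2.4 or later (for Array#sum). The names SAMPLE_ROWS, build_rides, and summarize_drivers are hypothetical and do not appear in the series; the sample rows are copied from rides_data.

```ruby
require 'date'

# Hypothetical sample: the heading row plus five rows copied from rides_data.
SAMPLE_ROWS = [
  ['DRIVER_ID', 'DATE', 'COST', 'RIDER_ID', 'RATING'],
  ['DR0004', '3rd Feb 2016', '5', 'RD0022', '5'],
  ['DR0001', '3rd Feb 2016', '10', 'RD0003', '3'],
  ['DR0002', '3rd Feb 2016', '25', 'RD0073', '5'],
  ['DR0001', '3rd Feb 2016', '30', 'RD0015', '4'],
  ['DR0004', '4th Feb 2016', '10', 'RD0022', '4']
].freeze

# Build the array of ride hashes described in the blueprint comment.
# Assumption: the first row always holds the column headings.
def build_rides(rows)
  headings, *data_rows = rows
  keys = headings.map { |heading| heading.downcase.to_sym }

  data_rows.map do |row|
    ride = keys.zip(row).to_h
    date = Date.parse(ride[:date])
    # Convert the string fields to the types named in the blueprint.
    ride.merge(date: [date.mday, date.mon, date.year],
               cost: ride[:cost].to_i,
               rating: ride[:rating].to_i)
  end
end

# Group the rides by driver once, then derive each summary from its group.
def summarize_drivers(rides)
  rides.group_by { |ride| ride[:driver_id] }.map do |driver_id, driver_rides|
    rating_total = driver_rides.sum { |ride| ride[:rating] }
    {
      driver_id: driver_id,
      total_rides: driver_rides.length,
      total_cost: driver_rides.sum { |ride| ride[:cost] },
      average_rating: (rating_total.to_f / driver_rides.length).round(1)
    }
  end
end

rides = build_rides(SAMPLE_ROWS)
summaries = summarize_drivers(rides)

# max_by compares on a single key, so the key name appears only once.
top_earner = summaries.max_by { |summary| summary[:total_cost] }[:driver_id]
top_rated = summaries.max_by { |summary| summary[:average_rating] }[:driver_id]

puts "Driver that made the most money: #{top_earner}"
puts "Driver that has the highest average rating: #{top_rated}"
```

Grouping first keeps each summary tied to that driver's own rides, so a value lookup such as has_value? cannot accidentally match another column, and max_by avoids repeating the hash key on both sides of a comparison.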