Commit 15d89d2 (1 parent: f2c0d3a)
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 3 changed files with 91 additions and 0 deletions.
_freeze/docs/geocode/bulk-geocoding/execute-results/html.json (15 changes: 15 additions & 0 deletions)
@@ -0,0 +1,15 @@
{
  "hash": "f14351616d67d8381a1cf4cc5f0b03a4",
  "result": {
    "engine": "knitr",
    "markdown": "---\ntitle: \"Bulk geocoding\"\n---\n\n\n\n\nBulk geocoding is provided by the `geocode_addresses()` function in `{arcgisgeocode}`. Rather than geocoding a single address and returning a set of match candidates, bulk geocoding takes many addresses, geocodes them all at once, and returns a single location per address. \n\nBulk geocoding may incur a cost. For details, see [geocoding pricing](https://developers.arcgis.com/documentation/mapping-apis-and-services/geocoding/services/geocoding-service/#pricing).\n\n\nIn this example, you will geocode restaurant addresses in Boston, MA, collected by the [Boston Area Research Initiative (BARI)](https://cssh.northeastern.edu/bari/). The data originally comes from their [data portal](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DMWCBT).\n\n# Step 1. Authenticate\n\nTo use the bulk geocoding capabilities of the ArcGIS World Geocoder, you must first authenticate using `{arcgisutils}`. This example uses user-based authentication via `auth_user()`; you can choose a different authentication function if it better suits your workflow. \n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(arcgisutils)\nlibrary(arcgisgeocode)\n\nset_arc_token(auth_user())\n```\n:::\n\n\n# Step 2. Prepare the data \n\nAs with `find_address_candidates()`, the geocoding results include an ID that can be used to join them back onto the original dataset. First, read the dataset from a URL using `readr::read_csv()`, then create a unique identifier with `dplyr::mutate()` and `dplyr::row_number()`. 
\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Boston Yelp addresses\n# Source: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DMWCBT\nfp <- \"https://analysis-1.maps.arcgis.com/sharing/rest/content/items/0423768816b343b69d9a425b82351912/data\"\n\nlibrary(dplyr)\nrestaurants <- readr::read_csv(fp) |>\n mutate(id = row_number())\n\nrestaurants\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 2,664 × 28\n restaurant_name restaurant_ID restaurant_address restaurant_tag rating price\n <chr> <dbl> <chr> <chr> <dbl> <chr>\n 1 100% Delicias 2 635 Hyde Park Ave… Latin America… 2 $$ \n 2 100% Delicias E… 3 660A Centre St,Ja… Dominican,Emp… 4 <NA> \n 3 107 4 107 Salem St,Bost… Restaurants, NA <NA> \n 4 140 Supper Club 6 138 St James Ave,… Diners, 5 <NA> \n 5 163 Vietnamese … 7 66 Harrison Ave,B… Vietnamese,Co… 3.5 $ \n 6 180 Cafe 8 23 Edinboro St,Bo… Cafes, 4 <NA> \n 7 180 Restaurant … 9 174 Lincoln St,Bo… Restaurants, NA <NA> \n 8 224 Boston Stre… 11 224 Boston St,Dor… American (New… 4 $$ \n 9 24 Hour Pizza D… 12 686 Morton St,Bos… Pizza, 1 $$$$ \n10 2Twenty2 13 222 Friend St,Bos… Asian Fusion,… 3 <NA> \n# ℹ 2,654 more rows\n# ℹ 22 more variables: review_number <dbl>, unique_reviewer <dbl>,\n# reviews_Jan_19 <dbl>, reviews_Feb_19 <dbl>, reviews_Mar_19 <dbl>,\n# reviews_Apr_19 <dbl>, reviews_May_19 <dbl>, reviews_Jun_19 <dbl>,\n# reviews_Jul_19 <dbl>, reviews_Aug_19 <dbl>, reviews_Jan_20 <dbl>,\n# reviews_Feb_20 <dbl>, reviews_Mar_20 <dbl>, reviews_Apr_20 <dbl>,\n# reviews_May_20 <dbl>, reviews_Jun_20 <dbl>, reviews_Jul_20 <dbl>, …\n```\n\n\n:::\n:::\n\n\n# Step 3. Geocode addresses\n\nThe restaurant addresses are contained in the `restaurant_address` column. 
Pass this column into the `single_line` argument of `geocode_addresses()` and store the results in `geocoded`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngeocoded <- geocode_addresses(\n single_line = restaurants[[\"restaurant_address\"]]\n)\n\n# preview the first 10 columns (sf keeps geometry, so 11 are shown)\nglimpse(geocoded[, 1:10])\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 2,664\nColumns: 11\n$ result_id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…\n$ loc_name <chr> \"World\", \"World\", \"World\", \"World\", \"World\", \"World\", \"Wor…\n$ status <chr> \"M\", \"M\", \"M\", \"M\", \"M\", \"M\", \"M\", \"M\", \"M\", \"M\", \"M\", \"M\"…\n$ score <dbl> 100.00, 100.00, 100.00, 100.00, 100.00, 100.00, 100.00, 10…\n$ match_addr <chr> \"635 Hyde Park Avenue, Roslindale, Massachusetts, 02131\", …\n$ long_label <chr> \"635 Hyde Park Avenue, Roslindale, MA, 02131, USA\", \"660A …\n$ short_label <chr> \"635 Hyde Park Avenue\", \"660A Centre Street\", \"107\", \"138 …\n$ addr_type <chr> \"PointAddress\", \"PointAddress\", \"POI\", \"PointAddress\", \"Po…\n$ type_field <chr> NA, NA, \"Bank\", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…\n$ place_name <chr> NA, NA, \"107\", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…\n$ geometry <POINT [°]> POINT (-71.11936 42.27857), POINT (-71.11386 42.3128…\n```\n\n\n:::\n:::\n\n\n:::{.callout-tip}\nYou can use `dplyr::reframe()` to geocode these addresses in a dplyr-friendly way. \n:::\n\n# Step 4. Join the results\n\nIn the previous step, you geocoded the addresses and got back a data frame of location information. In most cases you will want those locations joined onto the original dataset. You can do this with `dplyr::left_join()`, joining the `id` column you created to the `result_id` column of the geocoding results. 
\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\njoined_addresses <- left_join(\n restaurants,\n geocoded,\n by = c(\"id\" = \"result_id\")\n)\n\ndplyr::glimpse(joined_addresses)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 2,664\nColumns: 87\n$ restaurant_name <chr> \"100% Delicias\", \"100% Delicias Express\", \"107…\n$ restaurant_ID <dbl> 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 16, 17, 18, 2…\n$ restaurant_address <chr> \"635 Hyde Park Ave,Roslindale, MA 02131,\", \"66…\n$ restaurant_tag <chr> \"Latin American,Dominican,\", \"Dominican,Empana…\n$ rating <dbl> 2.0, 4.0, NA, 5.0, 3.5, 4.0, NA, 4.0, 1.0, 3.0…\n$ price <chr> \"$$\", NA, NA, NA, \"$\", NA, NA, \"$$\", \"$$$$\", N…\n$ review_number <dbl> 37, 26, 0, 1, 335, 8, 0, 248, 31, 63, 10, 232,…\n$ unique_reviewer <dbl> 34, 25, 0, 1, 335, 8, 0, 248, 31, 63, 10, 232,…\n$ reviews_Jan_19 <dbl> 0, 1, 0, 0, 0, 0, 0, 1, 0, 8, 0, 1, 7, 0, 1, 0…\n$ reviews_Feb_19 <dbl> 1, 2, 0, 0, 0, 0, 0, 4, 0, 3, 0, 0, 2, 0, 0, 0…\n$ reviews_Mar_19 <dbl> 1, 3, 0, 0, 0, 1, 0, 5, 1, 2, 0, 0, 3, 0, 2, 0…\n$ reviews_Apr_19 <dbl> 0, 3, 0, 0, 1, 0, 0, 3, 0, 4, 0, 3, 5, 0, 0, 0…\n$ reviews_May_19 <dbl> 2, 1, 0, 0, 1, 0, 0, 1, 0, 2, 0, 0, 6, 0, 0, 0…\n$ reviews_Jun_19 <dbl> 0, 0, 0, 0, 1, 0, 0, 1, 0, 4, 0, 1, 3, 0, 0, 0…\n$ reviews_Jul_19 <dbl> 0, 1, 0, 0, 3, 1, 0, 4, 1, 0, 4, 0, 3, 0, 2, 0…\n$ reviews_Aug_19 <dbl> 0, 7, 0, 0, 0, 0, 0, 3, 0, 7, 3, 0, 0, 0, 0, 0…\n$ reviews_Jan_20 <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 5, 1, 0, 0…\n$ reviews_Feb_20 <dbl> 0, 1, 0, 0, 1, 0, 0, 2, 0, 2, 1, 3, 8, 6, 0, 0…\n$ reviews_Mar_20 <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 6, 0, 0…\n$ reviews_Apr_20 <dbl> 1, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 2, 0, 0…\n$ reviews_May_20 <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0…\n$ reviews_Jun_20 <dbl> 0, 0, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 6, 0, 0…\n$ reviews_Jul_20 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 1, 3, 0…\n$ reviews_Aug_20 <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 4, 1, 0…\n$ 
restaurant_neighborhood <chr> \"Roslindale\", \"Jamaica Plain\", \"Boston\", \"Bost…\n$ GIS_ID <dbl> 1806741000, 1901410000, 302366000, 401087000, …\n$ CT_ID_10 <dbl> 25025140400, 25025120400, 25025030400, 2502501…\n$ id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,…\n$ loc_name <chr> \"World\", \"World\", \"World\", \"World\", \"World\", \"…\n$ status <chr> \"M\", \"M\", \"M\", \"M\", \"M\", \"M\", \"M\", \"M\", \"M\", \"…\n$ score <dbl> 100.00, 100.00, 100.00, 100.00, 100.00, 100.00…\n$ match_addr <chr> \"635 Hyde Park Avenue, Roslindale, Massachuset…\n$ long_label <chr> \"635 Hyde Park Avenue, Roslindale, MA, 02131, …\n$ short_label <chr> \"635 Hyde Park Avenue\", \"660A Centre Street\", …\n$ addr_type <chr> \"PointAddress\", \"PointAddress\", \"POI\", \"PointA…\n$ type_field <chr> NA, NA, \"Bank\", NA, NA, NA, NA, NA, NA, NA, NA…\n$ place_name <chr> NA, NA, \"107\", NA, NA, NA, NA, NA, NA, NA, NA,…\n$ place_addr <chr> \"635 Hyde Park Avenue, Roslindale, Massachuset…\n$ phone <chr> NA, NA, \"(617) 227-6236\", NA, NA, NA, NA, NA, …\n$ url <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…\n$ rank <dbl> 20, 20, 19, 20, 20, 20, 20, 20, 20, 20, 20, 20…\n$ add_bldg <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…\n$ add_num <chr> \"635\", \"660A\", \"107\", \"138\", \"66\", \"23\", \"174\"…\n$ add_num_from <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…\n$ add_num_to <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…\n$ add_range <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…\n$ side <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…\n$ st_pre_dir <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…\n$ st_pre_type <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…\n$ st_name <chr> \"Hyde Park\", \"Centre\", \"Salem\", \"Saint James\",…\n$ st_type <chr> \"Avenue\", \"Street\", \"St\", \"Avenue\", \"Avenue\", …\n$ st_dir <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…\n$ bldg_type <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA…\n$ bldg_name <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…\n$ level_type <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…\n$ level_name <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…\n$ unit_type <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…\n$ unit_name <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…\n$ sub_addr <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…\n$ st_addr <chr> \"635 Hyde Park Avenue\", \"660A Centre Street\", …\n$ block <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…\n$ sector <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…\n$ nbrhd <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…\n$ district <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…\n$ city <chr> \"Roslindale\", \"Jamaica Plain\", \"Boston\", \"Bost…\n$ metro_area <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…\n$ subregion <chr> \"Suffolk County\", \"Suffolk County\", \"Suffolk C…\n$ region <chr> \"Massachusetts\", \"Massachusetts\", \"Massachuset…\n$ region_abbr <chr> \"MA\", \"MA\", \"MA\", \"MA\", \"MA\", \"MA\", \"MA\", \"MA\"…\n$ territory <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…\n$ zone <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…\n$ postal <chr> \"02131\", \"02130\", \"02113\", \"02116\", \"02111\", \"…\n$ postal_ext <chr> \"4723\", NA, NA, \"5071\", \"1907\", \"2131\", \"2404\"…\n$ country <chr> \"USA\", \"USA\", \"USA\", \"USA\", \"USA\", \"USA\", \"USA…\n$ cntry_name <chr> \"United States\", \"United States\", \"United Stat…\n$ lang_code <chr> \"ENG\", \"ENG\", \"ENG\", \"ENG\", \"ENG\", \"ENG\", \"ENG…\n$ distance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…\n$ x <dbl> -71.11936, -71.11386, -71.05537, -71.07624, -7…\n$ y <dbl> 42.27857, 42.31285, 42.36419, 42.34923, 42.351…\n$ display_x <dbl> -71.11936, -71.11386, -71.05537, -71.07624, -7…\n$ display_y <dbl> 42.27857, 42.31285, 42.36419, 42.34923, 42.351…\n$ xmin <dbl> -71.12036, -71.11486, -71.05637, -71.07724, -7…\n$ xmax 
<dbl> -71.11836, -71.11286, -71.05437, -71.07524, -7…\n$ ymin <dbl> 42.27757, 42.31185, 42.36319, 42.34823, 42.350…\n$ ymax <dbl> 42.27957, 42.31385, 42.36519, 42.35023, 42.352…\n$ ex_info <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…\n$ geometry <POINT [°]> POINT (-71.11936 42.27857), POINT (-71.1…\n```\n\n\n:::\n:::",
    "supporting": [],
    "filters": [
      "rmarkdown/pagebreak.lua"
    ],
    "includes": {},
    "engineDependencies": {},
    "preserve": {},
    "postProcess": true
  }
}
@@ -0,0 +1,74 @@
---
title: "Bulk geocoding"
---

```{r include=FALSE}
knitr::opts_chunk$set(message = FALSE)
```

Bulk geocoding is provided by the `geocode_addresses()` function in `{arcgisgeocode}`. Rather than geocoding a single address and returning a set of match candidates, bulk geocoding takes many addresses, geocodes them all at once, and returns a single location per address.

Bulk geocoding may incur a cost. For details, see [geocoding pricing](https://developers.arcgis.com/documentation/mapping-apis-and-services/geocoding/services/geocoding-service/#pricing).

In this example, you will geocode restaurant addresses in Boston, MA, collected by the [Boston Area Research Initiative (BARI)](https://cssh.northeastern.edu/bari/). The data originally comes from their [data portal](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DMWCBT).

# Step 1. Authenticate

To use the bulk geocoding capabilities of the ArcGIS World Geocoder, you must first authenticate using `{arcgisutils}`. This example uses user-based authentication via `auth_user()`; you can choose a different authentication function if it better suits your workflow.

```{r message=FALSE}
library(arcgisutils)
library(arcgisgeocode)

set_arc_token(auth_user())
```

# Step 2. Prepare the data

As with `find_address_candidates()`, the geocoding results include an ID that can be used to join them back onto the original dataset. First, read the dataset from a URL using `readr::read_csv()`, then create a unique identifier with `dplyr::mutate()` and `dplyr::row_number()`.

```{r message=FALSE}
# Boston Yelp addresses
# Source: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DMWCBT
fp <- "https://analysis-1.maps.arcgis.com/sharing/rest/content/items/0423768816b343b69d9a425b82351912/data"

library(dplyr)
restaurants <- readr::read_csv(fp) |>
  mutate(id = row_number())

restaurants
```
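
The join in Step 4 depends on `id` lining up with `result_id`. In the output on this page, `result_id` runs 1, 2, 3, … in input order. That suggests it is each address's position in the vector passed to `single_line`, but treat this as an assumption: the page does not state it. Under that assumption, if you drop rows before geocoding (for example, rows with a missing address), create `id` *after* filtering so the numbering still matches. The toy tibble below is hypothetical and makes no geocoding request.

```r
library(dplyr)

# Hypothetical toy data standing in for `restaurants`
toy_restaurants <- tibble(
  restaurant_name = c("100% Delicias", "Unknown", "107"),
  restaurant_address = c(
    "635 Hyde Park Ave, Roslindale, MA 02131",
    NA,
    "107 Salem St, Boston, MA 02113"
  )
)

# Filter first, then number the remaining rows, so that `id` equals each
# address's position in the vector that would be geocoded
# (assumes result_id is that position)
toy_prepared <- toy_restaurants |>
  filter(!is.na(restaurant_address)) |>
  mutate(id = row_number())

# The vector you would pass to `single_line`
addresses <- toy_prepared[["restaurant_address"]]
```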

# Step 3. Geocode addresses

The restaurant addresses are contained in the `restaurant_address` column. Pass this column into the `single_line` argument of `geocode_addresses()` and store the results in `geocoded`.

```{r message=FALSE}
geocoded <- geocode_addresses(
  single_line = restaurants[["restaurant_address"]]
)

# preview the first 10 columns (sf keeps geometry, so 11 are shown)
glimpse(geocoded[, 1:10])
```
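
Before using the locations, it is worth checking match quality. The ArcGIS batch geocoding documentation describes `status` values of `M` (matched), `T` (tied), and `U` (unmatched), with `score` ranging from 0 to 100. The sketch below uses a hypothetical data frame with the same column names as `geocoded` and an arbitrary review threshold of 90, so it runs without credentials.

```r
library(dplyr)

# Hypothetical stand-in with the same column names as `geocoded`
toy_geocoded <- tibble(
  result_id = 1:4,
  status = c("M", "M", "T", "U"),
  score = c(100, 85.5, 92, 0)
)

# How many addresses fall into each status?
status_counts <- count(toy_geocoded, status)

# Rows worth reviewing: anything not matched, or matched with a low score
# (90 is an illustrative threshold, not a recommended value)
needs_review <- filter(toy_geocoded, status != "M" | score < 90)
```

If `geocoded` is an sf object, as its `geometry` column suggests, you may want to drop the geometry with `sf::st_drop_geometry()` before tallying, so the summary stays a plain data frame.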

:::{.callout-tip}
You can use `dplyr::reframe()` to geocode these addresses in a dplyr-friendly way.
:::
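
The tip leaves out the `reframe()` call itself. The sketch below shows the general pattern, assuming dplyr 1.1.0 or later. `reframe()` unpacks an unnamed data-frame result into columns, so a function that returns one row per address can be called directly on a column. To keep the example runnable offline, `fake_geocode()` is a hypothetical stand-in. On real data you would presumably call `geocode_addresses(single_line = restaurant_address)` in its place; that substitution is untested here.

```r
library(dplyr)

# Hypothetical stand-in for geocode_addresses(): one row per input address,
# with `result_id` giving the address's position in the input
fake_geocode <- function(single_line) {
  tibble(
    result_id = seq_along(single_line),
    match_addr = toupper(single_line)
  )
}

toy_input <- tibble(
  restaurant_address = c("635 Hyde Park Ave", "660A Centre St", "107 Salem St")
)

# reframe() evaluates the call once on the whole column and splices the
# unnamed data frame it returns into the result's columns
toy_results <- toy_input |>
  reframe(fake_geocode(restaurant_address))
```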

# Step 4. Join the results

In the previous step, you geocoded the addresses and got back a data frame of location information. In most cases you will want those locations joined onto the original dataset. You can do this with `dplyr::left_join()`, joining the `id` column you created to the `result_id` column of the geocoding results.

```{r}
joined_addresses <- left_join(
  restaurants,
  geocoded,
  by = c("id" = "result_id")
)

dplyr::glimpse(joined_addresses)
```
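
With dplyr 1.1.0 or later, `join_by()` is an equivalent way to write the `by` argument. An anti-join is a quick way to confirm that every original row found a geocoding result. The toy tables below are hypothetical stand-ins for `restaurants` and `geocoded`.

```r
library(dplyr)

# Hypothetical stand-ins for `restaurants` and `geocoded`
toy_restaurants <- tibble(
  restaurant_name = c("100% Delicias", "107", "180 Cafe"),
  id = 1:3
)
toy_geocoded <- tibble(
  result_id = c(1L, 2L),
  score = c(100, 100)
)

# Same join as above, written with join_by()
toy_joined <- left_join(
  toy_restaurants,
  toy_geocoded,
  by = join_by(id == result_id)
)

# Original rows that have no matching geocoding result
missing_results <- anti_join(
  toy_restaurants,
  toy_geocoded,
  by = join_by(id == result_id)
)
```

Because `restaurants` is the left-hand table, the joined result keeps the tibble class rather than becoming an sf object. If you need sf, `sf::st_as_sf()` can likely convert it, assuming the `geometry` column carries over as in the output above.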