forward geocoding doc
JosiahParry committed Jun 5, 2024
1 parent b49f2af commit f2c0d3a
Showing 8 changed files with 189 additions and 0 deletions.
15 changes: 15 additions & 0 deletions _freeze/docs/geocode/forward-geocoding/execute-results/html.json
@@ -0,0 +1,15 @@
{
"hash": "d8ef2099ddcfaf861ce33943369845df",
"result": {
"engine": "knitr",
"markdown": "---\ntitle: Forward Geocoding\n--- \n\n\nForward geocoding is the process of taking an address or place information and identifying its location on the globe. \n\nTo geocode addresses, the `{arcgisgeocode}` package provides the function `find_address_candidates()`. This function geocodes a single address at a time and returns up to 50 address candidates (ranked by a score). \n\nThere are two ways in which you can provide address information: \n\n1. Provide the entire address as a string via the `single_line` argument\n2. Provide parts of the address using the arguments `address`, `city`, `region`, `postal` etc. \n\n# Single line address geocoding \n\nIt can be tough to parse out addresses into their components. Using the `single_line` argument is a very flexible way of geocoding addresses. Doing utilizes the ArcGIS World Geocoder's address parsing capabilities. \n\nFor example, we can geocode the same location using 3 decreasingly specific addresses.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(arcgisgeocode)\n\naddresses <- c(\n \"380 New York Street Redlands, California, 92373, USA\",\n \"Esri Redlands\",\n \"ESRI CA\"\n)\n\nlocs <- find_address_candidates(\n addresses,\n max_locations = 1L\n)\n\nlocs$geometry\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nGeometry set for 3 features \nGeometry type: POINT\nDimension: XY\nBounding box: xmin: -117.1948 ymin: 34.05726 xmax: -117.1948 ymax: 34.05726\nGeodetic CRS: WGS 84\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nPOINT (-117.1948 34.05726)\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nPOINT (-117.1957 34.05609)\nPOINT (-117.1957 34.05609)\n```\n\n\n:::\n:::\n\n\nIn each case, it finds the correct address! \n\n# Geocoding from a dataframe \n\nMost commonly, you will need to geocode addresses from a column in a data.frame. It is important to note that the `find_address_candidates()` function does not work well in a `dplyr::mutate()` function call. 
Particularly because it is possible to return more than 1 address at a time. \n\nLet's read in a csv of bike stores in Tacoma, WA. To use `find_address_candidates()` with a data.frame, it is recommended to create a unique identifier of the row positions. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(dplyr)\n\nfp <- \"https://www.arcgis.com/sharing/rest/content/items/9a9b91179ac44db1b689b42017471ae6/data\"\n\nbike_stores <- readr::read_csv(fp) |>\n mutate(id = row_number())\n\nbike_stores\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 10 × 3\n store_name original_address id\n <chr> <chr> <int>\n 1 Cascadia Wheel Co. 3320 N Proctor St, Tacoma, WA 984… 1\n 2 Puget Sound Bike and Ski Shop between 3206 N. 15th and 1414, N … 2\n 3 Takoma Bike & Ski 3010 6th Ave, Tacoma, WA 98406 3\n 4 Trek Bicycle Tacoma University Place 3550 Market Pl W Suite 102, Unive… 4\n 5 Opalescent Cyclery 814 6th Ave, Tacoma, WA 98405 5\n 6 Sound Bikes 108 W Main, Puyallup, WA 98371 6\n 7 Trek Bicycle Tacoma North End 3009 McCarver St, Tacoma, WA 98403 7\n 8 Second Cycle 1205 M.L.K. Jr Way, Tacoma, WA 98… 8\n 9 Penny bike co. 6419 24th St NE, Tacoma, WA 98422 9\n10 Spider's Bike, Ski & Tennis Lab 3608 Grandview St, Gig Harbor, WA… 10\n```\n\n\n:::\n:::\n\n\n\nTo geocode addresses from a data.frame, you can use `dplyr::reframe()`. 
\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbike_stores |>\n reframe(\n find_address_candidates(original_address)\n )\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 13 × 62\n input_id result_id loc_name status score match_addr long_label short_label\n <int> <int> <chr> <chr> <dbl> <chr> <chr> <chr> \n 1 1 NA World M 100 3320 N Proct… 3320 N Pr… 3320 N Pro…\n 2 2 NA World M 97.6 N 15th St & … N 15th St… N 15th St …\n 3 2 NA World M 97.3 1414 N Alder… 1414 N Al… 1414 N Ald…\n 4 2 NA World M 94.7 S 15th St & … S 15th St… S 15th St …\n 5 2 NA World M 84.4 3206 N 15th … 3206 N 15… 3206 N 15t…\n 6 3 NA World M 100 3010 6th Ave… 3010 6th … 3010 6th A…\n 7 4 NA World M 100 3550 Market … 3550 Mark… 3550 Marke…\n 8 5 NA World M 100 814 6th Ave,… 814 6th A… 814 6th Ave\n 9 6 NA World M 100 108 W Main, … 108 W Mai… 108 W Main \n10 7 NA World M 100 3009 McCarve… 3009 McCa… 3009 McCar…\n11 8 NA World M 100 1205 Martin … 1205 Mart… 1205 Marti…\n12 9 NA World M 97.9 6419 24th St… 6419 24th… 6419 24th …\n13 10 NA World M 100 3608 Grandvi… 3608 Gran… 3608 Grand…\n# ℹ 54 more variables: addr_type <chr>, type_field <chr>, place_name <chr>,\n# place_addr <chr>, phone <chr>, url <chr>, rank <dbl>, add_bldg <chr>,\n# add_num <chr>, add_num_from <chr>, add_num_to <chr>, add_range <chr>,\n# side <chr>, st_pre_dir <chr>, st_pre_type <chr>, st_name <chr>,\n# st_type <chr>, st_dir <chr>, bldg_type <chr>, bldg_name <chr>,\n# level_type <chr>, level_name <chr>, unit_type <chr>, unit_name <chr>,\n# sub_addr <chr>, st_addr <chr>, block <chr>, sector <chr>, nbrhd <chr>, …\n```\n\n\n:::\n:::\n\n\nNotice how there are multiple results for each `input_id`. This is because the `max_locations` argument was not specified. 
To ensure only the best match is returned set `max_locations = 1`\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngeocoded <- bike_stores |>\n reframe(\n find_address_candidates(original_address, max_locations = 1)\n ) |>\n # reframe drops the sf class, must be added\n sf::st_as_sf()\n\ngeocoded\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nSimple feature collection with 10 features and 61 fields\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: -122.5871 ymin: 47.19164 xmax: -122.294 ymax: 47.32301\nGeodetic CRS: WGS 84\n# A tibble: 10 × 62\n input_id result_id loc_name status score match_addr long_label short_label\n <int> <int> <chr> <chr> <dbl> <chr> <chr> <chr> \n 1 1 NA World M 100 3320 N Proct… 3320 N Pr… 3320 N Pro…\n 2 2 NA World M 97.6 N 15th St & … N 15th St… N 15th St …\n 3 3 NA World M 100 3010 6th Ave… 3010 6th … 3010 6th A…\n 4 4 NA World M 100 3550 Market … 3550 Mark… 3550 Marke…\n 5 5 NA World M 100 814 6th Ave,… 814 6th A… 814 6th Ave\n 6 6 NA World M 100 108 W Main, … 108 W Mai… 108 W Main \n 7 7 NA World M 100 3009 McCarve… 3009 McCa… 3009 McCar…\n 8 8 NA World M 100 1205 Martin … 1205 Mart… 1205 Marti…\n 9 9 NA World M 97.9 6419 24th St… 6419 24th… 6419 24th …\n10 10 NA World M 100 3608 Grandvi… 3608 Gran… 3608 Grand…\n# ℹ 54 more variables: addr_type <chr>, type_field <chr>, place_name <chr>,\n# place_addr <chr>, phone <chr>, url <chr>, rank <dbl>, add_bldg <chr>,\n# add_num <chr>, add_num_from <chr>, add_num_to <chr>, add_range <chr>,\n# side <chr>, st_pre_dir <chr>, st_pre_type <chr>, st_name <chr>,\n# st_type <chr>, st_dir <chr>, bldg_type <chr>, bldg_name <chr>,\n# level_type <chr>, level_name <chr>, unit_type <chr>, unit_name <chr>,\n# sub_addr <chr>, st_addr <chr>, block <chr>, sector <chr>, nbrhd <chr>, …\n```\n\n\n:::\n:::\n\n\nWith this result, you can now join the address fields back onto the `bike_stores` data.frame using a `left_join()`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nleft_join(\n bike_stores,\n geocoded,\n by = 
c(\"id\" = \"input_id\")\n) |>\n # left_join keeps the class of the first table\n # must add sf class back on\n sf::st_as_sf()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nSimple feature collection with 10 features and 63 fields\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: -122.5871 ymin: 47.19164 xmax: -122.294 ymax: 47.32301\nGeodetic CRS: WGS 84\n# A tibble: 10 × 64\n store_name original_address id result_id loc_name status score match_addr\n <chr> <chr> <int> <int> <chr> <chr> <dbl> <chr> \n 1 Cascadia W… 3320 N Proctor … 1 NA World M 100 3320 N Pr…\n 2 Puget Soun… between 3206 N.… 2 NA World M 97.6 N 15th St…\n 3 Takoma Bik… 3010 6th Ave, T… 3 NA World M 100 3010 6th …\n 4 Trek Bicyc… 3550 Market Pl … 4 NA World M 100 3550 Mark…\n 5 Opalescent… 814 6th Ave, Ta… 5 NA World M 100 814 6th A…\n 6 Sound Bikes 108 W Main, Puy… 6 NA World M 100 108 W Mai…\n 7 Trek Bicyc… 3009 McCarver S… 7 NA World M 100 3009 McCa…\n 8 Second Cyc… 1205 M.L.K. Jr … 8 NA World M 100 1205 Mart…\n 9 Penny bike… 6419 24th St NE… 9 NA World M 97.9 6419 24th…\n10 Spider's B… 3608 Grandview … 10 NA World M 100 3608 Gran…\n# ℹ 56 more variables: long_label <chr>, short_label <chr>, addr_type <chr>,\n# type_field <chr>, place_name <chr>, place_addr <chr>, phone <chr>,\n# url <chr>, rank <dbl>, add_bldg <chr>, add_num <chr>, add_num_from <chr>,\n# add_num_to <chr>, add_range <chr>, side <chr>, st_pre_dir <chr>,\n# st_pre_type <chr>, st_name <chr>, st_type <chr>, st_dir <chr>,\n# bldg_type <chr>, bldg_name <chr>, level_type <chr>, level_name <chr>,\n# unit_type <chr>, unit_name <chr>, sub_addr <chr>, st_addr <chr>, …\n```\n\n\n:::\n:::",
"supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"
],
"includes": {},
"engineDependencies": {},
"preserve": {},
"postProcess": true
}
}
15 changes: 15 additions & 0 deletions _freeze/docs/geocode/overview/execute-results/html.json
@@ -0,0 +1,15 @@
{
"hash": "08f4c4ce2f08106e03a5a393817ad0bb",
"result": {
"engine": "knitr",
"markdown": "---\ntitle: Overview\n---\n\n\nAddresses represent a physical place. They're meant to be interpreted by people and help guide navigation of the built environment. Addresses represent a geographical place but lack geographic data.\n\nThe package `{arcgisgeocode}` enables you to search for an address (geocode), reverse geocode, find candidate matches, get suggestions, and batch geocode. Geocoding is the process of converting text to an address and a location.\n\n- **Address geocoding**, also known as forward geocoding, is the process of converting text for an address to a complete address with a location.\n- **Place geocoding** is the process of searching for addresses for businesses, administrative locations, and geographic features.\n- **Reverse geocoding** is the process of converting a point to an address or place.\n- **Batch geocoding**, also known as bulk geocoding, is the process of converting a list of addresses or place names to a set of complete addresses with locations.\n\n# Licensing considerations\n\nMany features of the ArcGIS World Geocoder are provided for free such as forward geocoding, reverse geocoding, and place search. However, **storing results is not free**. Additionally, the bulk geocoding functionality requires a developer account or available credits. \n\nIn order to store results, each function has an argument `for_storage` which should be set to `TRUE` if you intend to store the results. \n\nTo learn more about free and paid geocoding operations refer to the [storage parameter documentation](https://developers.arcgis.com/documentation/mapping-apis-and-services/geocoding/services/geocoding-service/#storage-parameter).\n\n| Function | Description | Free |\n| -------- | ----------- | ---- |\n| `find_address_candidates()` | Finds up to 50 location candidates based on a provided address. _This function is vectorized_ to work with many addresses at a time. 
| ✅ |\n| `reverse_geocode()` | Returns an address based on the provided coordinate. _This function is vectorized_ to work with many locations at a time. | ✅ |\n| `suggest_places()` | Returns possible POI information based on a location and a search phrase. This function is not vectorized. | ✅ |\n| `geocoded_addresses()` | Bulk geocodes addresses returning a single location per address. Use this for highly performant and scalable address geocoding. | ❌ |\n\n\n# Get started\n\nTo start geocoding with the R-ArcGIS Bridge, install the R package from CRAN. \n\n\n\n\n```r\n# install from CRAN\ninstall.packages(\"arcgisgeocode\")\n```\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Load the library\nlibrary(arcgisgeocode)\n```\n:::\n\n\n## Geocode an address\n\nPerform single address geocoding using the `find_address_candidates()` function. Limit the number of results using the `max_locations` argument. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nloc <- find_address_candidates(\n \"501 Edgewood Ave SE, Atlanta, GA 30312\", max_locations = 1\n)\n\nloc[, 1:8]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nSimple feature collection with 1 feature and 8 fields\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: -84.37108 ymin: 33.75396 xmax: -84.37108 ymax: 33.75396\nGeodetic CRS: WGS 84\n input_id result_id loc_name status score\n1 1 NA World M 100\n match_addr\n1 501 Edgewood Ave SE, Atlanta, Georgia, 30312\n long_label short_label\n1 501 Edgewood Ave SE, Atlanta, GA, 30312, USA 501 Edgewood Ave SE\n geometry\n1 POINT (-84.37108 33.75396)\n```\n\n\n:::\n:::\n\n\n## Reverse geocode \n\nFrom a location, find its corresponding address using `reverse_geocode()`. 
\n\n\n::: {.cell}\n\n```{.r .cell-code}\nreverse_geocode(c(-84.371, 33.753))\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\nRegistered S3 method overwritten by 'jsonify':\n method from \n print.json jsonlite\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\nSimple feature collection with 1 feature and 22 fields\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: -84.37103 ymin: 33.75322 xmax: -84.37103 ymax: 33.75322\nGeodetic CRS: WGS 84\n match_addr\n1 39 Daniel St SE, Atlanta, Georgia, 30312\n long_label short_label addr_type\n1 39 Daniel St SE, Atlanta, GA, 30312, USA 39 Daniel St SE PointAddress\n type_field place_name add_num address block sector neighborhood\n1 39 39 Daniel St SE \n district city metro_area subregion region region_abbr territory\n1 Atlanta Fulton County Georgia GA \n postal postal_ext country_name country_code geometry\n1 30312 1907 United States USA POINT (-84.37103 33.75322)\n```\n\n\n:::\n:::\n",
"supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"
],
"includes": {},
"engineDependencies": {},
"preserve": {},
"postProcess": true
}
}
3 changes: 3 additions & 0 deletions _quarto.yml
Expand Up @@ -49,6 +49,9 @@ website:
- text: "Truncate and append features"
href: docs/editing/overwrite-features.qmd
- section: Geocoding
contents:
- docs/geocode/overview.qmd
- docs/geocode/forward-geocoding.qmd
- section: Places


Empty file added docs/geocode/bulk-geocoding.qmd
Empty file.
Empty file.
91 changes: 91 additions & 0 deletions docs/geocode/forward-geocoding.qmd
@@ -0,0 +1,91 @@
---
title: Forward Geocoding
---

Forward geocoding is the process of taking an address or place information and identifying its location on the globe.

To geocode addresses, the `{arcgisgeocode}` package provides the function `find_address_candidates()`. This function is vectorized over its inputs and returns up to 50 address candidates per address, ranked by score.

There are two ways in which you can provide address information:

1. Provide the entire address as a string via the `single_line` argument
2. Provide parts of the address using the arguments `address`, `city`, `region`, `postal` etc.
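
The second approach is not demonstrated elsewhere in this document, so here is a minimal sketch of it. It assumes the component arguments named above can be combined in a single call, and it reuses the Redlands address from this document, split by hand into parts. Like the other examples, it needs network access to the ArcGIS World Geocoder.

```{r}
library(arcgisgeocode)

# The Redlands address, supplied as separate components
# rather than as one string
loc <- find_address_candidates(
  address = "380 New York Street",
  city = "Redlands",
  region = "California",
  postal = "92373",
  max_locations = 1L
)

loc$geometry
```

Supplying components may help when the data already stores street, city, and postal code in separate columns, since nothing has to be pasted together first.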

# Single line address geocoding

It can be tough to parse addresses into their components. The `single_line` argument is a very flexible way of geocoding addresses, because it relies on the ArcGIS World Geocoder's address parsing capabilities.

For example, we can geocode the same location using 3 decreasingly specific addresses.

```{r}
library(arcgisgeocode)

addresses <- c(
  "380 New York Street Redlands, California, 92373, USA",
  "Esri Redlands",
  "ESRI CA"
)

locs <- find_address_candidates(
  addresses,
  max_locations = 1L
)

locs$geometry
```

In each case, it finds the correct address!

# Geocoding from a dataframe

Most commonly, you will need to geocode addresses from a column in a data.frame. It is important to note that `find_address_candidates()` does not work well inside a `dplyr::mutate()` call, because it can return more than one candidate per input address.
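
The row-count mismatch can be reproduced without calling the geocoder. The sketch below uses `fake_candidates()`, a hypothetical stand-in (not part of `{arcgisgeocode}`) that returns two rows per input, to show that `dplyr::mutate()` rejects the result while `dplyr::reframe()` accepts it.

```{r}
library(dplyr)

# Hypothetical stand-in for a geocoder: two candidate rows per input
fake_candidates <- function(x) {
  data.frame(
    input_id = rep(seq_along(x), each = 2),
    score = rep(c(100, 90), times = length(x))
  )
}

stores <- tibble(address = c("a", "b", "c"))

# mutate() expects one output row per input row, so 6 rows for 3 inputs fails
mutate_result <- try(
  mutate(stores, fake_candidates(address)),
  silent = TRUE
)
inherits(mutate_result, "try-error")

# reframe() allows any number of output rows
reframe_result <- reframe(stores, fake_candidates(address))
nrow(reframe_result)
```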

Let's read in a CSV of bike stores in Tacoma, WA. To use `find_address_candidates()` with a data.frame, it is recommended to create a unique identifier from the row positions, so that results can be matched back to their input rows.

```{r message = FALSE}
library(dplyr)

fp <- "https://www.arcgis.com/sharing/rest/content/items/9a9b91179ac44db1b689b42017471ae6/data"

bike_stores <- readr::read_csv(fp) |>
  mutate(id = row_number())

bike_stores
```


To geocode addresses from a data.frame, you can use `dplyr::reframe()`.

```{r}
bike_stores |>
  reframe(
    find_address_candidates(original_address)
  )
```

Notice that a single `input_id` can have multiple results: here, `input_id` 2 returns four candidates. This is because the `max_locations` argument was not specified. To ensure only the best match is returned, set `max_locations = 1`.


```{r}
geocoded <- bike_stores |>
  reframe(
    find_address_candidates(original_address, max_locations = 1)
  ) |>
  # reframe() drops the sf class, so it must be added back
  sf::st_as_sf()

geocoded
```

With this result, you can now join the address fields back onto the `bike_stores` data.frame using a `left_join()`.

```{r}
left_join(
  bike_stores,
  geocoded,
  by = c("id" = "input_id")
) |>
  # left_join() keeps the class of the first table (a tibble),
  # so the sf class must be added back
  sf::st_as_sf()
```
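
As an alternative to `max_locations = 1`, you could keep every candidate and choose the best one yourself. The sketch below is only an illustration: `candidates` is a hand-built table holding the `input_id` and `score` values printed for the first two stores, not a live geocoder result.

```{r}
library(dplyr)

# Hand-built stand-in for geocoder output (scores copied from the results above)
candidates <- tibble(
  input_id = c(1L, 2L, 2L, 2L, 2L),
  score = c(100, 97.6, 97.3, 94.7, 84.4)
)

# Keep only the highest-scoring candidate for each input row
best <- candidates |>
  group_by(input_id) |>
  slice_max(score, n = 1, with_ties = FALSE) |>
  ungroup()

best
```

Keeping all candidates costs more rows, but it leaves room to inspect lower-scoring matches before discarding them.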