---
output: github_document
---
<!-- badges: start -->
[![R-CMD-check](https://github.com/R-ArcGIS/arcpbf/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/R-ArcGIS/arcpbf/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
options(max.print = 100)
```
# arcpbf
`{arcpbf}` is an R package that processes [Esri FeatureCollection Protocol Buffers](https://github.com/Esri/arcgis-pbf/tree/main/proto/FeatureCollection).
It is written in Rust and powered by the [extendr](https://github.com/extendr/extendr) library.
arcpbf has functions for reading protocol buffer (pbf) results from an ArcGIS
REST API result. pbf results are returned when `f=pbf` in a [query](https://developers.arcgis.com/rest/services-reference/enterprise/query-feature-service-layer-.htm).
The package is extremely lightweight and fast.
Limitation: this package does not support Z and M dimensions at this point.
## TL;DR
- `open_pbf()` will read a FeatureCollection `pbf` file into a raw vector
- `read_pbf()` will read a FeatureCollection `pbf` file _and_ process it
- `resp_body_pbf()` and `resps_data_pbf()` process `httr2_response` objects
with FeatureCollection pbf bodies
- `process_pbf()` will process a raw vector or a list of raw vectors
- `post_process_pbf()` will apply post-processing steps to the results of
`process_pbf()`
- set `use_sf = TRUE` to return an `sf` object if possible. Applied by
default in `read_pbf()`, `resp_body_pbf()` and `resps_data_pbf()`.
> ***Developer Note***: Rust must be installed to compile the package. Follow the one-line
installation instructions at https://rustup.rs/. To verify that your Rust installation
is compatible, run `rextendr::rust_sitrep()`. That's it.
### PBF support
Note that _only_ the FeatureCollection pbf specification is supported by arcpbf.
If you want to process OSM pbf files use [`osmextract::oe_read()`](https://docs.ropensci.org/osmextract/reference/oe_read.html).
Or, if you want to create and read arbitrary protocol buffers directly in R,
use [`RProtoBuf`](https://cran.r-project.org/web/packages/RProtoBuf).
## Basic usage
In most cases, we will be processing a protocol buffer directly from an HTTP
request created with [`{httr2}`](https://httr2.r-lib.org/).
```{r}
library(arcpbf)
# specify the URL to send our request to
url <- "https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/ACS_Population_by_Race_and_Hispanic_Origin_Boundaries/FeatureServer/2/query?where=1=1&outFields=objectid&resultRecordCount=10&f=pbf&token="
req <- httr2::request(url)
resp <- httr2::req_perform(req)
resp
```
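The query string in the URL above can also be assembled from named parameters, which tends to be easier to read and edit. The following is a minimal sketch using `httr2::req_url_path_append()` and `httr2::req_url_query()`. The names `layer_url` and `req_built` are illustrative only and are not part of arcpbf, and the empty `token` parameter is left out.

```r
library(httr2)

# feature layer endpoint (same service as above, without the query string)
layer_url <- "https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/ACS_Population_by_Race_and_Hispanic_Origin_Boundaries/FeatureServer/2"

req_built <- request(layer_url) |>
  req_url_path_append("query") |>
  req_url_query(
    where = "1=1",
    outFields = "objectid",
    resultRecordCount = "10",
    f = "pbf" # ask the service for a protocol buffer response
  )

# inspect the URL that would be requested; nothing is sent yet
req_built$url
```

A request built this way can be passed to `httr2::req_perform()` in the same way as `req`.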
We can process responses with `resp_body_pbf()`. Post-processing is applied and
an `sf` object is returned where possible, because the arguments `post_process`
and `use_sf` are both `TRUE` by default.
```{r}
resp_body_pbf(resp)
```
### Multiple response objects
When running multiple requests in parallel using
`httr2::req_perform_parallel()` the responses are returned as a list of
responses. `resps_data_pbf()` processes the responses in a vectorized
manner.
```{r}
# create a list of requests
reqs <- replicate(5, req, simplify = FALSE)
# perform them in parallel
resps <- httr2::req_perform_parallel(reqs)
# process the responses
resps_data_pbf(resps)
```
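In practice, a list of requests often comes from paging through a layer with the `resultOffset` and `resultRecordCount` query parameters rather than from repeating one request. The sketch below builds one request per page. The helper `page_requests()`, the page size of 2000, the feature count, and the `example.com` URL are all assumptions made for illustration; a real service defines its own maximum record count.

```r
library(httr2)

# hypothetical helper: build one request per page of `page_size` records
page_requests <- function(query_url, n_features, page_size = 2000) {
  offsets <- seq(0, n_features - 1, by = page_size)
  lapply(offsets, function(offset) {
    request(query_url) |>
      req_url_query(
        where = "1=1",
        outFields = "*",
        f = "pbf",
        # format as plain integers to avoid scientific notation
        resultOffset = sprintf("%d", as.integer(offset)),
        resultRecordCount = sprintf("%d", as.integer(page_size))
      )
  })
}

# placeholder URL and feature count, for illustration only
page_reqs <- page_requests(
  "https://example.com/FeatureServer/0/query",
  n_features = 4500
)
length(page_reqs)

# against a real service, these could be performed and processed as above:
# resps <- req_perform_parallel(page_reqs)
# resps_data_pbf(resps)
```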
### Reading from a file
In some cases you may have a pbf file on disk that you want to process.
Use `read_pbf()` to do so. Again, post-processing steps are applied by default.
```{r}
fp <- system.file("small-points.pbf", package = "arcpbf")
read_pbf(fp)
```
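One way such a file might end up on disk is by saving the raw body of a response. The sketch below uses `httr2::response()` to create a stand-in response with a few placeholder bytes, so it runs without a network connection; the bytes are not a valid FeatureCollection, and the names `mock_resp` and `pbf_path` are illustrative.

```r
library(httr2)

# stand-in response for illustration; a real one would come from req_perform()
mock_resp <- response(
  status_code = 200,
  body = as.raw(c(0x0a, 0x02, 0x08, 0x01))
)

# write the raw response body to a temporary file
pbf_path <- tempfile(fileext = ".pbf")
writeBin(resp_body_raw(mock_resp), pbf_path)

# the file now holds the same number of bytes as the body
file.size(pbf_path)

# with a real FeatureCollection response, the file could then be read with:
# read_pbf(pbf_path)
```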
## FeatureCollection Result Types
There are three types of PBF FeatureCollection responses that may be
returned as a result of a [Feature Service Query request](https://developers.arcgis.com/rest/services-reference/enterprise/query-feature-service-layer-.htm).
- **Feature Results**:
  - the default query response type. Contains individual features with their
    attributes and geometries if available.
- **Count Result**:
  - returned when `returnCountOnly=true` in an API request. Returned as a scalar
    integer vector.
- **Object ID Result**:
  - returned when `returnIdsOnly=true`. A `data.frame` containing object IDs
    where the column name is set to the object ID field name of the feature
    service.
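
Because the shape of the returned object depends on the result type, downstream code may want to check what it received. The helper below is a hypothetical sketch in base R and is not part of arcpbf. It relies only on the shapes described in this README, including the unprocessed three-element list covered under Feature Results, and it cannot tell a geometry-free feature result apart from an object ID result because both are plain `data.frame`s.

```r
# hypothetical helper, not part of arcpbf
describe_pbf_result <- function(x) {
  if (inherits(x, "sf")) {
    # checked first because sf objects are also data.frames
    "feature result with geometry (post-processed)"
  } else if (is.data.frame(x)) {
    "feature result without geometry, or object ID result"
  } else if (is.list(x) && all(c("attributes", "sr", "geometry") %in% names(x))) {
    "feature result with geometry (unprocessed)"
  } else if (is.numeric(x) && length(x) == 1) {
    "count result"
  } else {
    "unknown"
  }
}

# mock inputs standing in for processed responses
describe_pbf_result(42L)
describe_pbf_result(data.frame(objectid = 1:3))
describe_pbf_result(list(attributes = data.frame(), sr = list(), geometry = list()))
```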
### Feature Results
Feature results can either omit geometry entirely, for example in the case of a
Table or when the query parameter `returnGeometry=false`, or include it. When
geometry is omitted entirely, the response is processed as a simple
`data.frame`. However, if the response does contain geometry, the response is a
bit more complex.
Unprocessed feature results with geometries return a named list with 3 elements:
- `attributes`:
  - a `data.frame` of the fields and their values
- `sr`:
  - a named list with elements `wkt`, `wkid`, `latest_wkid`, `vcs_wkid`,
    and `latest_vcs_wkid`. These determine the coordinate reference system of the
    response as well as the vertical coordinate reference system.
- `geometry`:
  - an `sfc` object _**without a computed bounding box or coordinate reference
    system (CRS) set**_.
```{r}
# read an example pbf without post-processing
fc_fp <- system.file("small-points.pbf", package = "arcpbf")
res <- read_pbf(fc_fp, post_process = FALSE)
res
```
When post-processing is applied to a geometry Feature Result, the CRS is set
and the bounding box is computed. This requires the `sf` package to be available.
```{r}
post_process_pbf(res)
```
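To make the idea concrete, the sketch below combines a mock three-element list into an `sf` object by hand. It is only an illustration of the concept under stated assumptions and is not how `post_process_pbf()` is implemented: the list `mock_res` is made up, only the `wkid` element of `sr` is filled in, and the geometry is created with `sf::st_sfc()`, so it already carries a bounding box.

```r
library(sf)

# mock unprocessed result; values are invented for illustration
mock_res <- list(
  attributes = data.frame(objectid = 1:2),
  sr = list(wkid = 4326L),
  geometry = st_sfc(
    st_point(c(-118.24, 34.05)),
    st_point(c(-73.99, 40.73))
  )
)

# attach the CRS identified by the spatial reference's wkid
geom <- st_set_crs(mock_res$geometry, mock_res$sr$wkid)

# combine attributes and geometry into a single sf object
processed <- st_sf(mock_res$attributes, geometry = geom)
processed
```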
## Lower level functions
The function `open_pbf()` will read a pbf file into a raw vector which can be
passed to `process_pbf()`. In general you will not need this function, but it
is handy for the sake of example.
```{r}
pbf_raw <- open_pbf(fc_fp)
head(pbf_raw, 20)
```
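For comparison, base R can also read a file's bytes into a raw vector with `readBin()`. The sketch below assumes that `open_pbf()` returns the file's bytes unchanged, which this README does not state explicitly, and uses a small temporary file with placeholder bytes rather than a real pbf.

```r
# write a few placeholder bytes to a temporary file
tmp <- tempfile(fileext = ".pbf")
writeBin(as.raw(c(0x12, 0x34, 0x56)), tmp)

# read the whole file back into a raw vector
bytes <- readBin(tmp, what = "raw", n = file.size(tmp))
bytes
```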
This raw vector can be turned into an R object using `process_pbf()`. The output
_will not_ be post-processed.
```{r}
res <- process_pbf(pbf_raw)
res
```
Post-processing can be applied to the result of `process_pbf()` using
`post_process_pbf()`.
```{r}
post_process_pbf(res)
```
`post_process_pbf()` can also be applied to a list of processed pbf responses.
```{r}
multi_res <- list(res, res, res)
post_process_pbf(multi_res)
```
## Benchmarking
Below is a benchmark that compares processing pbfs to the current approach of processing
raw JSON in arcgislayers and arcgisutils. It recreates the example from the arcgislayers README.
```{r}
jsn <- function() {
json_reqs <- c(
"https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Counties_Generalized_Boundaries/FeatureServer/0/query?outFields=%2A&where=1%3D1&outSR=%7B%22wkid%22%3A4326%7D&returnGeometry=TRUE&token=&f=json&resultOffset=0",
"https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Counties_Generalized_Boundaries/FeatureServer/0/query?outFields=%2A&where=1%3D1&outSR=%7B%22wkid%22%3A4326%7D&returnGeometry=TRUE&token=&f=json&resultOffset=2001"
)
reqs <- lapply(json_reqs, httr2::request)
resps <- httr2::req_perform_parallel(reqs) |>
lapply(function(x) arcgisutils::parse_esri_json(httr2::resp_body_string(x)))
do.call(rbind.data.frame, resps) |>
sf::st_as_sf()
}
# protobuf processing
pbf <- function() {
pbf_reqs <- c(
"https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Counties_Generalized_Boundaries/FeatureServer/0/query?outFields=%2A&where=1%3D1&outSR=%7B%22wkid%22%3A4326%7D&returnGeometry=TRUE&token=&f=pbf&resultOffset=0",
"https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Counties_Generalized_Boundaries/FeatureServer/0/query?outFields=%2A&where=1%3D1&outSR=%7B%22wkid%22%3A4326%7D&returnGeometry=TRUE&token=&f=pbf&resultOffset=2001"
)
reqs <- lapply(pbf_reqs, httr2::request)
httr2::req_perform_parallel(reqs) |>
resps_data_pbf()
}
bench::mark(
jsn(),
pbf(),
check = FALSE,
relative = TRUE,
iterations = 5
)
```
## Internals
Internally, there is a Rust crate, [`esripbf`](./src/rust/esripbf), which is
a library built with [`prost`](https://github.com/tokio-rs/prost) to handle the [FeatureCollection Protocol Buffer Specification](https://github.com/Esri/arcgis-pbf/tree/main/proto/FeatureCollection).
## Future Notes
In the future, it may make sense to write to a geoarrow array and convert to `sfc`
using `{wk}`. These are just thoughts.