mod_spatial.qmd

---
title: "Working with Spatial Data"
code-annotations: hover
---

```{r design-team-note}
#| include: false

# !!! NOTE TO SSECR DESIGN TEAM !!!

# This module uses raster and vector data that *are not widely available*
# You _must_ run run the 'prep-data_spatial.R' script in the "scripts/" directory before you will be able to render this module
# Reach out to Nick Lyon if questions exist
```

## Overview

Synthesis projects often have need of spatial datasets. At its simplest, it can be helpful to have a map of the original project locations including in the synthesis dataset. In more complex instances you want to extract spatial data within a certain area of sampling locations. Regardless of 'why' you're using spatial data, it may come up during your primary or synthesis work and thus deserves consideration in this course's materials. There are _many_ modes of working with spatial data, and not all of these tools require coding literacy but for consistency with the rest of the modules **this module will focus on _scripted_ approaches to interacting with spatial data**.

## Learning Objectives

After completing this topic you will be able to: 

- <u>Define</u> the difference between the two major types of spatial data
- <u>Manipulate</u> spatial data with R
- <u>Create</u> maps using spatial data
- <u>Integrate</u> spatial data with tabular data

## Preparation

This is a "bonus" module and thus was created for asynchronous learners. There is no suggested preparatory work.

## Networking Session

::: panel-tabset
### 2024 Guests

This was a bonus module for the 2024-25 cohort and so did not include a networking session.
:::

## Needed Packages

If you'd like to follow along with the code chunks included throughout this module, you'll need to install the following packages:

```{r install-pkgs}
#| eval: false

# Note that these lines only need to be run once per computer
## So you can skip this step if you've installed these before
install.packages("tidyverse")
install.packages("sf")
install.packages("terra")
install.packages("maps")
install.packages("exactextractr")
```

## Types of Spatial Data

There are two main types of spatial data: vector and raster. Both types (and the packages they require) are described in the tabs below.

:::{.panel-tabset}
### Vector Data

Vector data are stored as polygons. Essentially vector data are a set of points and--sometimes--the lines between them that define the edges of a shape. They may store additional data that is retained in a semi-tabular format that relates to the polygon(s) but isn't directly stored in them.

Common vector data types include shape files or GeoJSONs.

```{r read-vector}
#| message: false
#| output: false

# Load needed library
library(sf)

# Read in shapefile
nc_poly <- sf::st_read(dsn = file.path("data", "nc_borders.shp")) #<1>
```
1. Note that even though we're only specifying the ".shp" file in this function you _must_ also have the associated files in that same folder. In this case that includes a ".dbf", ".prj", and ".shx", though in other contexts you may have others.

Once you have read in the shapefile, you can check its structure as you would any other data object. Note that the object has both the 'data.frame' class and the 'sf' ("simple features") class. In this case, the shapefile relates to counties in North Carolina and some associated demographic data in those counties.

```{r str-vector}
# Check structure
str(nc_poly)
```

If desired, we could make a simple R base plot-style map. In this case we'll do it based on just the county areas so that the polygons are filled with a color corresponding to how large the county is.

```{r plot-vector}
#| fig-align: center

# Make a graph
plot(nc_poly["AREA"], axes = T)
```

### Raster Data

Raster data are stored as values in pixels. The resolution (i.e., size of the pixels) may differ among rasters but in all cases the data are stored at a per-pixel level.

Common raster data types include GeoTIFFs (.tif) and NetCDF (.nc) files.

```{r read-raster}
#| message: false

# Load needed library
library(terra)

# Read in raster
nc_pixel <- terra::rast(x = file.path("data", "nc_elevation.tif"))
```

Once you've read in the raster file you can check it's structure as you would any other object but the resulting output is much less informative than for other object classes.

```{r str-raster}
# Check structure
str(nc_pixel)
```

Regardless, now that we have the raster loaded we can make a simple graph to check out what sort of data is stored in it. In this case, each pixel is 3 arcseconds on each side (~0.0002° latitude/longitude) and contains the elevation (in meters) of that pixel.

```{r plot-raster}
#| fig-align: center

# Make a graph
terra::plot(nc_pixel)
```

:::

## Coordinate Reference Systems

A fundamental problem in spatial data is how to project data collected on a nearly spherical planet onto a two-dimensional plane. This has been solved--or at least clarified--by the use of <u>C</u>oordinate <u>R</u>eference <u>S</u>ystems (a.k.a. "CRS"). All spatial data have a CRS that is explicitly identified in the data and/or the metadata because the data _are not interpretable_ without knowing which CRS is used.


The CRS defines the following information:

1. **Datum** -- model for shape of the Earth including the starting coordinate pair and angular units that together define any particular point on the planet
    - Note that there can be global datums that work for any region of the world and local datums that only work for a particular area
2. **Projection** -- math for the transformation to get from a round planet to a flat map
3. **Additional parameters** -- any other information necessary to support the projection
    - E.g., the coordinates at the center of the map

Some people use the analogy of peeling a citrus fruit and flattening the peel to describe the components of a CRS. The datum is the choice between a lemon or a grapefruit (i.e., the shape of the not-quite-spherical object) while the projection is the instructions for taking the complete peel and flattening it.

You can check and transform the CRS in any scripted language that allows the loading of spatial data though the specifics do differ between the types of spatial data we introduced earlier.

:::{.panel-tabset}
### Vector CRS

For vector data we can check the CRS with other functions from the `sf` library. It can be a little difficult to parse all of the information that returns but essentially it is most important that the CRS match that of any other spatial data with which we are working.

```{r crs-check-vector}
# Check CRS
sf::st_crs(x = nc_poly)
```

Once you know the CRS, you can transform the data to another CRS if desired. This is a relatively fast operation for vector data because we're transforming geometric data rather than potentially millions of pixels.

```{r crs-transform-vector}
# Transform CRS
nc_poly_nad83 <- sf::st_transform(x = nc_poly, crs = 3083) # <1>

# Re-check CRS
sf::st_crs(nc_poly_nad83)
```
1. In order to transform to a new CRS you'll need to identify the four-digit EPSG code for the desired CRS.

### Raster CRS

For raster data we can check the CRS with other functions from the `terra` library. It can be a little difficult to parse all of the information that returns but essentially it is most important that the CRS match that of any other spatial data with which we are working.

```{r crs-check-raster}
# Check CRS
terra::crs(nc_pixel)
```

As with vector data, if desired you can transform the data to another CRS. However, unlike vector data, transforming the CRS of raster data is _very_ computationally intense. If at all possible you should avoid re-projecting rasters. If you must re-project, consider doing so in an environment with greater computing power than a typical laptop. Also, you should export a new raster in your preferred CRS after transforming so that you reduce the likelihood that you need to re-project again later in the lifecylce of your project.

In the interests of making this website reasonably quick to re-build, the following code chunk is not actually evaluated but is the correct syntax for this operation.

```{r crs-transform-raster}
#| eval: false

# Transform CRS
nc_pixel_nad83 <- terra::project(x = nc_pixel, y = "epsg:3083")

# Re-check CRS
terra::crs(nc_pixel_nad83)
```

:::

## Making Maps

Now that we've covered the main types of spatial data as well as how to check the CRS (and transform if needed) we're ready to make maps! For consistency with other modules on data visualization, we'll use `ggplot2` to make our maps but note that base R also supports map making and there are many useful tutorials elsewhere on making maps in that framework.

The `maps` package includes some useful national and US state borders so we'll begin by preparing an object that combines both sets of borders.

```{r map-borders}
# Load library
library(maps)

# Make 'borders' object
borders <- sf::st_as_sf(maps::map(database = "world", plot = F, fill = T)) %>%
  dplyr::bind_rows(sf::st_as_sf(maps::map(database = "state", plot = F, fill = T)))
```

Note that the simplest way of making a map that includes a raster is to coerce that raster into a dataframe. To do this we will translate each pixel's geographic coordinates into X and Y values.

```{r map-raster-df}
nc_pixel_df <- as.data.frame(nc_pixel, xy = T) %>% 
    # Rename the 'actual' data layer more clearly
    dplyr::rename(elevation_m = SRTMGL3_NC.003_SRTMGL3_DEM_doy2000042_aid0001)
```

With the borders object and our modified raster in hand, we can now make a map that includes useful context for state/nation borders. Synthesis projects often cover a larger geographic extent than primary projects so this is particularly useful in ways it might not be for primary research.

```{r map-raster-ACTUAL}
#| include: false
#| eval: false

# Load library
library(ggplot2)

# Make map
raster_map <- ggplot(borders) +
  geom_sf(fill = "gray95") +
  coord_sf(xlim = c(-70, -90), ylim = c(30, 40), expand = F) +
  geom_tile(data = nc_pixel_df, aes(x = x, y = y, fill = elevation_m)) +
  labs(x = "Longitude", y = "Latitude")

# Export it
ggsave(filename = file.path("images", "spatial_raster-map-demo.png"),
       plot = raster_map, width = 8, height = 4, units = "in")
```

```{r map-raster}
#| eval: false

# Load library
library(ggplot2)

# Make map
ggplot(borders) +
  geom_sf(fill = "gray95") + # <1>
  coord_sf(xlim = c(-70, -90), ylim = c(30, 40), expand = F) + # <2>
  geom_tile(data = nc_pixel_df, aes(x = x, y = y, fill = elevation_m)) +
  labs(x = "Longitude", y = "Latitude")
```
1. This line is filling our nation polygons with a pale gray (helps to differentiate from ocean)
2. Here we set the map extent so that we're only getting our region of interest

<p align="center">
<img src="images/spatial_raster-map-demo.png" alt="ggplot2-style map of the southeastern US where North Carolina is colored based on the per-pixel elevation" width="80%">
</p>

From here we can make additional `ggplot2`-style modifications as/if needed. This variant of map-making supports adding tabular data objects as well (though they would require separate geometries). Many of the LTER Network Office-funded groups that make maps include points for specific study locations along with a raster layer for an environmental / land cover characteristic that is particularly relevant to their research question and/or hypotheses.

## Extracting Spatial Data

By far the most common spatial operation that LNO-funded synthesis working groups want to perform is extraction of some spatial covariate data within their region of interest. "Extraction" here includes (1) the actual gathering of values from the desired area, (2) summarization of those values, and (3) attaching those summarized values to an existing tabular dataset for further analysis/visualization. As with any coding task there are many ways of accomplishing this end but we'll focus on one method in the following code chunks: extraction in R via the `exactextractr` package.

This package expects that you'll want to extract raster data within a the borders described in some type of vector data. If you want the values in all the pixels of a GeoTIFF that fall inside the boundary defined by a shapefile, tools in this package will be helpful.

We'll begin by making a simpler version of our North Carolina vector data. This ensures that the extraction is as fast as possible for demonstrative purposes while still being replicable for you.

```{r extract-prep}
# Simplify the vector data
nc_poly_simp <- nc_poly %>% 
  dplyr::filter(NAME %in% c("Wake", "Swain")) %>% 
  dplyr::select(NAME, AREA)

# Check structure to demonstrate simplicity
dplyr::glimpse(nc_poly_simp) # <1>
```
1. Note that even though we used `select` to remove all but one column, the geometry information is retained!

Now let's use this simplified object and extract elevation for our counties of interest (normally we'd likely do this for all counties but the process is the same).

```{r extract-actual}
#| message: false

# Load needed libraries
library(exactextractr)
library(purrr)

# Perform extraction
extracted_df <- exactextractr::exact_extract(x = nc_pixel, y = nc_poly_simp, # <1>
                                             include_cols = c("NAME", "AREA"), # <2>
                                             progress = F) %>% # <3>
  # Collapse to a dataframe
  purrr::list_rbind(x = .) # <4>

# Check structure
dplyr::glimpse(extracted_df)
```
1. Note that functions like this one <u>assume that both spatial data objects use the same CRS</u>. We checked that earlier so we're good but remember to include that check _every_ time you do something like this!
2. All column names specified here from the vector data (see the `y` argument) will be retained in the output. Otherwise only the extracted value and coverage fraction are included.
3. This argument controls whether a progress bar is included. _Extremely_ useful when you have many polygons / the extraction takes a long time!
4. The default output of this function is a list with one dataframe of extracted values per polygon in your vector data so we'll unlist to a dataframe for ease of future operations

In the above output we can see that it has extracted the elevation of _every pixel_ within each of our counties of interest and provided us with the percentage of that pixel that is covered by the polygon (i.e., by the shapefile). We can now summarize this however we'd like and--eventually--join it back onto the county data via the column(s) we specified should be retained from the original vector data.

## Additional Resources

### Papers & Documents

- 

### Workshops & Courses

- The Carpentries. [Introduction to Geospatial Raster and Vector Data with R](https://datacarpentry.org/r-raster-vector-geospatial/). **2024**.
- The Carpentries. [Introduction to R for Geospatial Data](https://datacarpentry.org/r-intro-geospatial/index.html). **2024**.
- King, R. [Spatial Data Visualization](https://github.com/king0708/spatial-data-viz). **2024**.
- Flower, J. [Introduction to Rasters with `terra`](https://jflowernet.github.io/intro-terra-ecodatascience/). **2024**.
- Clark, S.J., _et al._ [Spatial and Image Data Using GeoPandas](https://learning.nceas.ucsb.edu/2023-03-arctic/sections/geopandas.html). **2023**.

### Websites

- NASA. [AppEEARS Portal](https://appeears.earthdatacloud.nasa.gov/)