Comparison BIEN4/GBIF #34

Open
basille opened this issue Jul 4, 2022 · 0 comments

Hey @bmaitner!

Following our conversation a couple of weeks ago, I'm finally taking the time to provide a comparison (with an example) between BIEN4 and GBIF data, using the two relevant R packages (BIEN and rgbif). I'll use the sycamore maple (Acer pseudoplatanus) as the illustration, although the choice of species probably doesn't matter. Here we go:

BIEN4 occurrence data

Note: The output below comes from a run I saved a few days ago, as the BIEN servers seem unresponsive today ("The BIEN servers are currently undergoing updates and may be slower than usual at present.").

Information about BIEN:

library("BIEN")
BIEN_metadata_database_version()
  db_version db_release_date
1      4.2.5      2021-12-07

Get the data:

acps_bien <- BIEN_occurrence_species("Acer pseudoplatanus", 
    native.status = TRUE, 
    political.boundaries = TRUE)
dim(acps_bien)
[1] 1699   22

Keep only records collected after 1 January 1990:

acps_bien$date_collected <- lubridate::ymd(acps_bien$date_collected)
acps_bien <- subset(acps_bien, date_collected > lubridate::ymd("1990-01-01"))
dim(acps_bien)
[1] 728  22
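
One thing to keep in mind for the comparison: the filter above is strictly "after 1990-01-01" with no upper bound, whereas the GBIF query further down uses a closed 1990 to 2020 year window. If the goal is to compare the same period, a minimal sketch of a matching window could look like the following. The toy data frame and its dates are invented for illustration, and base R's as.Date() is used instead of lubridate only to keep the example self-contained.

```r
# Toy records standing in for acps_bien (dates invented for illustration)
toy <- data.frame(
  date_collected = as.Date(c("1989-12-31", "1990-01-01", "2005-07-14", "2021-03-02"))
)

# Filter as in the post: strictly after 1 January 1990, no upper bound
after_1990 <- subset(toy, date_collected > as.Date("1990-01-01"))

# Alternative: closed year window 1990 to 2020, mirroring the GBIF predicates
rec_year <- as.integer(format(toy$date_collected, "%Y"))
in_window <- toy[rec_year >= 1990 & rec_year <= 2020, , drop = FALSE]

after_1990$date_collected  # keeps 2005-07-14 and 2021-03-02
in_window$date_collected   # keeps 1990-01-01 and 2005-07-14
```

Both filters keep two of the four toy records, but not the same two, so the choice can matter at the edges of the period.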

Convert to sf class for mapping:

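library("sf")
library("ggplot2")
# Assumption: `world` and `acps_nom_scient` are not defined in this post;
# the two lines below are plausible definitions, not necessarily the ones used.
world <- rnaturalearth::ne_countries(scale = "medium", returnclass = "sf")
acps_nom_scient <- "Acer pseudoplatanus"
# Build point geometries from the coordinates (WGS 84)
acps_bien <- st_as_sf(acps_bien, coords = c("longitude", "latitude"), remove = FALSE,
    crs = 4326, agr = "constant")
# Map over Europe in ETRS89-LAEA (EPSG:3035)
ggplot(data = world) +
    geom_sf(color = gray(.5), fill = "antiquewhite") +
    geom_sf(data = acps_bien, size = .1, alpha = .2, col = "brown3") +
    coord_sf(xlim = c(2.5e6, 7e6), ylim = c(1.3e6, 5.3e6), crs = st_crs(3035)) +
    labs(
        x = "Longitude",
        y = "Latitude",
        title = acps_nom_scient,
        subtitle = "BIEN data"
    ) +
    theme(
        panel.grid.major = element_line(color = gray(.7),
            linetype = "dashed", size = 0.5),
        panel.background = element_rect(fill = "aliceblue"),
        plot.title = element_text(face = "italic")
    )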

[Figure: acps-bien-carte-1 (map of the BIEN occurrence records)]

GBIF occurrence data and comparison

Prepare the query and download the data:

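library("rgbif")
# occ_download() needs GBIF credentials (e.g. the GBIF_USER, GBIF_PWD and GBIF_EMAIL environment variables)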
acps_gbif_dl <- occ_download(
    pred("taxonKey", name_backbone(name = "Acer pseudoplatanus", rank = "species")$speciesKey), # Main key
    pred("hasGeospatialIssue", FALSE), # Remove default geospatial issues
    pred("hasCoordinate", TRUE),       # Keep only records with coordinates
    pred("occurrenceStatus","PRESENT"), # Remove absent records
    pred_not(pred_in("basisOfRecord",c("FOSSIL_SPECIMEN","LIVING_SPECIMEN"))), # Remove fossils and living specimens (zoo/botanical garden)
    pred_and( # Between 1990–2020 (both included)
        pred_gte("year", "1990"),
        pred_lte("year", "2020")),
    format = "SIMPLE_CSV"
)
occ_download_wait(acps_gbif_dl)
acps_gbif <- occ_download_get(acps_gbif_dl, path = "Data/gbif-acps/", overwrite = TRUE) |>
    occ_download_import()

Remove records under a non-commercial license and check the result:

acps_gbif <- subset(acps_gbif, license != "CC_BY_NC_4_0")
dim(acps_gbif)
[1] 387557  50
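
To see how much the license filter actually removes, it can help to tabulate the license column before subsetting. A minimal sketch on toy data follows. The data frame is invented for illustration, and the license codes other than "CC_BY_NC_4_0" are assumed to follow the same naming pattern.

```r
# Toy stand-in for the imported GBIF table (values invented for illustration)
toy_gbif <- data.frame(
  gbifID  = 1:6,
  license = c("CC_BY_4_0", "CC_BY_NC_4_0", "CC0_1_0",
              "CC_BY_NC_4_0", "CC_BY_4_0", "CC_BY_4_0")
)

# Number of records per license before filtering
license_counts <- table(toy_gbif$license)
license_counts

# Same filter as above, then the number of records it removed
kept <- subset(toy_gbif, license != "CC_BY_NC_4_0")
n_removed <- nrow(toy_gbif) - nrow(kept)
n_removed
```

Running the same two steps on the real acps_gbif table (before the subset) would give the exact number of non-commercial records dropped.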

Convert to sf class for mapping:

# Same packages (sf, ggplot2) and objects (`world`, `acps_nom_scient`) as for the BIEN map above
acps_gbif <- st_as_sf(acps_gbif, coords = c("decimalLongitude", "decimalLatitude"),
    remove = FALSE, crs = 4326, agr = "constant")
ggplot(data = world) +
    geom_sf(color = gray(.5), fill = "antiquewhite") +
    geom_sf(data = acps_gbif, size = .1, alpha = .05, col = "brown3") +
    coord_sf(xlim = c(2.5e6, 7e6), ylim = c(1.3e6, 5.3e6), crs = st_crs(3035)) +
    labs(
        x = "Longitude",
        y = "Latitude",
        title = acps_nom_scient,
        subtitle = "GBIF data"
    ) +
    theme(
        panel.grid.major = element_line(color = gray(.7),
            linetype = "dashed", size = 0.5),
        panel.background = element_rect(fill = "aliceblue"),
        plot.title = element_text(face = "italic")
    )

[Figure: acps-cartes-1 (map of the GBIF occurrence records)]

Summary

There is a striking difference between the two datasets: 728 records from BIEN vs. 387557 from GBIF, and the GBIF count is what remains after removing the records under a non-commercial license.
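
To go beyond the two totals, one option is to line up the record counts per year from both sources. The sketch below uses toy data only (all values invented). It assumes the BIEN table carries a date_collected column, as above, and that the GBIF table has an integer year column.

```r
# Toy stand-ins for the two datasets (values invented for illustration)
bien_toy <- data.frame(
  date_collected = as.Date(c("1995-06-01", "2003-04-12", "2003-09-30"))
)
gbif_toy <- data.frame(year = c(1995L, 2003L, 2003L, 2003L, 2019L))

# Count records per year on a common 1990 to 2020 axis;
# factor levels keep years without any record as zeros
years <- 1990:2020
bien_year <- as.integer(format(bien_toy$date_collected, "%Y"))
counts <- data.frame(
  year = years,
  bien = as.integer(table(factor(bien_year, levels = years))),
  gbif = as.integer(table(factor(gbif_toy$year, levels = years)))
)

# Show only the years where at least one source has records
counts[counts$bien > 0 | counts$gbif > 0, ]
```

A table like this would show whether the gap is spread evenly over time or concentrated in particular years.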
