Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Download functions - downloadLandings #135

Open
MikkoVihtakari opened this issue Mar 30, 2021 · 1 comment
Open

Download functions - downloadLandings #135

MikkoVihtakari opened this issue Mar 30, 2021 · 1 comment
Assignees
Labels
enhancement New feature or request

Comments

@MikkoVihtakari
Copy link

MikkoVihtakari commented Mar 30, 2021

I don't know what you guys think, but I'd need functions that download data from NMD in my work. The most natural place for me would be to place these functions in the RstoxData package. Under is my attempt to download landings data.

FDIR species codes:

tmp.file <- tempfile()
download.file("https://github.com/MikkoVihtakari/RstoxUtils/raw/master/data/FDIRcodes.rda", tmp.file)
load(tmp.file)

Function to download the landings (requires FDIRcodes data frame):

#' @title Download landings data for a species from the IMR database
#' @description The function downloads landings ("sluttseddel") data from IMR database. Requires access to the intranet. 
#' @param species any species identification name in \code{FDIRcodes$speciesCodes} as character. Only one species at the time allowed.
#' @param years an integer vector of years to download. If \code{NULL} (default), all years are downloaded. Please note that this option can take very long time and lead to huge datasets.
#' @author Mikko Vihtakari
#' @importFrom RstoxData readXmlFile
#' @examples \dontrun{
#' downloadLandings("brugde") # Basking shark, all years
#' downloadLandings("kveite", years = 2000:2001) # halibut, 2000-01
#' }
#' @export

# species  <- "blåkveite"; years <- c(1900:2020)
downloadLandings <- function(species, years = NULL) {
  
  ## Set up variables
  
  splist <- as.data.frame(FDIRcodes$speciesCodes)
  dest <- tempfile(fileext = ".xml")
  APIpath <- "http://tomcat7.imr.no:8080/apis/nmdapi/landing/v2?version=2.0&type=search"
  
  ## Find the species code
  
  species <- paste0("^", species, "$")
  tmp <- sapply(colnames(splist), function(x) grep(species, splist[,x], ignore.case = TRUE))
  
  if(all(sapply(tmp, function(k) length(k) == 0))) stop(paste(species, "not found from FDIRcodes$speciesCodes"))
  
  if(sum(sapply(tmp, function(k) length(k) == 1)) > 1) {
    stop(
      paste(species, "was matched to", 
            paste(names(sapply(tmp, function(k) length(k) == 1)[sapply(tmp, function(k) length(k) == 1)]), collapse = ", "),
            ". Cannot extract information from multiple columns."
      )
    )
  }
  
  # spCol <- names(tmp[sapply(tmp, function(k) length(k) == 1)])
  spRow <- unlist(unname(tmp[sapply(tmp, function(k) length(k) == 1)]))
  spCode <- splist[spRow, "idNS"]
  
  spCode <- ifelse(nchar(spCode) == 3, paste0(0, spCode), spCode)
  
  ## Set up the download path
  
  if(is.null(years)) {
    DownloadPath <- paste0(APIpath, "&Art_kode=", spCode)
  } else {
    DownloadPath <- paste0(APIpath, "&Art_kode=", spCode, "&Fangstar=", paste(years, collapse = ","))
  }
  
  ## Download the data from the database 
  
  status <- suppressMessages(suppressWarnings(try(utils::download.file(DownloadPath, dest), silent = TRUE)))
  
  if(class(status) == "try-error") {
    
    ## Stop processing if not found
    
    stop(paste("Species code", spCode, "with years", paste(years, collapse = ","), "not found from the database."))
    
  } else {
    
    ## Read the data
    
    RstoxData::readXmlFile(dest)
    
  }
}

And a function to prepare FDIR codes (requires their messy Excel sheet):


#' @title Retrieve the Norwegian Directorate of Fisheries codes from a code list
#' @description This function retrieves codes used in the electronic logbook data from an Excel sheet published by the Directorate of Fisheries. This list is already supplied in the package and the function is only required to update the codes.
#' @param path Character string specifying the path to the Excel file downloaded from the \href{https://www.fiskeridir.no/Yrkesfiske/Rapportering-ved-landing/Kodeliste}{Directorate of Fisheries webpage}.
#' @param speciesSheet Character string specifying the name of the tab containing species codes.
#' @param speciesStartRow Integer specifying the \code{skip} argument for \code{\link[readxl]{read_xlsx}} in the species code tab.
#' @param speciesHeaderRow Integer specifying row number of header in the species code tab.
#' @param gearSheet Character string specifying the name of the tab containing species codes.
#' @param gearStartRow Integer specifying the \code{skip} argument for \code{\link[readxl]{read_xlsx}} in the gear code tab.
#' @details The function has been written for \href{https://www.fiskeridir.no/Yrkesfiske/Rapportering-ved-landing/Kodeliste}{the code list Excel sheet} published on 2020-10-30. You may have to adjust the function depending on changes in newer versions of the file.
#' @import readxl
#' @author Mikko Vihtakari
#' @export

# path = "~/Desktop/Kodeliste-landing-171219.xlsx"; speciesSheet = "B-Fiskeslag"; speciesStartRow = 19; speciesHeaderRow = 17; gearSheet = "A7-Redskap"; gearStartRow = 8
readFdirCodes <- function(path,
                          speciesSheet = "B-Fiskeslag",
                          speciesStartRow = 19,
                          speciesHeaderRow = 17,
                          gearSheet = "A7-Redskap",
                          gearStartRow = 8
) {
  
  ## Species codes
  
  dt <- suppressMessages(readxl::read_xlsx(path = path, sheet = speciesSheet, skip = speciesStartRow, col_names = FALSE))
  header <- suppressMessages(readxl::read_xlsx(path = path, sheet = speciesSheet, col_names = FALSE, range = paste0("A", speciesHeaderRow, ":", LETTERS[ncol(dt)], speciesHeaderRow)))
  
  colnames(dt) <- as.character(header[1,])
  
  dt <- dt[c("Tall", "FAO", "Norsk navn", "Engelsk navn", "Latinsk navn")]
  dt <- dt[rowSums(is.na(dt)) != ncol(dt), ]
  colnames(dt) <- c("idNS", "idFAO", "norwegian", "english", "latin")
  
  dt$idNS <- suppressWarnings(as.numeric(dt$idNS))
  dt <- dt[!is.na(dt$idNS),]
  dt <- dt[!duplicated(dt$idNS),]
  dt$norwegian <- trimws(gsub("\\*", "", dt$norwegian))
  dt$english <- trimws(gsub("\\*", "", dt$english))
  
  speciesCodes <- dt
  
  ## Gear codes
  
  dt <- suppressMessages(readxl::read_xlsx(path = path, sheet = gearSheet, skip = gearStartRow, col_names = FALSE))
  dt <- dt[,1:2]
  colnames(dt) <- c("idGear", "gearName")
  dt <- dt[!is.na(dt$idGear),]
  dt$gearCategory <- cut(dt$idGear, seq(10, 100, 10), right = FALSE, labels = c("Noter", "Garn", "Kroker", "Ruser", "Traal", "Noter", "Skytevaapen", "Annet", "Annet"))
  
  gearCodes <- dt
  
  ## Return
  
  list(speciesCodes = speciesCodes, gearCodes = gearCodes)
  
}

Source: https://github.com/MikkoVihtakari/RstoxUtils

@MikkoVihtakari MikkoVihtakari added the enhancement New feature or request label Mar 30, 2021
@edvinf
Copy link
Contributor

edvinf commented Sep 20, 2021

As NMD provide an API for for landings as well, it would be natural to include download services in the package @arnejohannesholmin presents in comments to #136. If it is going to be a pure download package, the code for parsing the Fdir codelists is probably best included in RstoxData along with the code for parsing the landings.

@edvinf edvinf self-assigned this Sep 20, 2021
arnejohannesholmin added a commit that referenced this issue Apr 20, 2023
Changed ImputeSuperIndividuals() to not add rows from EDSUs with no a…
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants