WC_config_file • ArgosQC

Wildlife Computers QC config file structure

The Wildlife Computers JSON config file has the same 4-block structure as the SMRU_config_file, and the meta block is similarly optional. However, some of the parameters within the blocks differ. Below is the config file for a NSWOO (IRAP) QC workflow on WC SPOT6 tags deployed on loggerhead turtles in NSW, Australia.
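
As an illustration of the 4-block structure, a config file of this kind might look like the following sketch. Every value is a placeholder chosen for illustration, not an actual NSWOO setting. The nesting of the setup, harvest, model and meta blocks as top-level JSON objects is an assumption, as is the use of JSON null wherever the parameter descriptions say NULL. The parameter names themselves follow the lists that describe each block.

```json
{
  "setup": {
    "program": "irap",
    "data.dir": "tagdata",
    "meta.file": null,
    "maps.dir": "maps",
    "diag.dir": "diag",
    "output.dir": "output",
    "return.R": true
  },
  "harvest": {
    "download": true,
    "owner.id": "<collaborator-id>",
    "wc.akey": "<access-key>",
    "wc.skey": "<secret-key>",
    "tag.list": null,
    "dropIDs": null
  },
  "model": {
    "model": "rw",
    "vmax": 2,
    "time.step": 6,
    "proj": null,
    "reroute": false,
    "dist": 100,
    "barrier": null,
    "buffer": 1,
    "centroids": false,
    "cut": false,
    "min.gap": 72,
    "QCmode": "nrt",
    "pred.int": 6
  },
  "meta": {
    "common_name": "loggerhead turtle",
    "species": "Caretta caretta",
    "release_site": "NSW",
    "state_country": "Australia"
  }
}
```

Because meta.file is null in this sketch, the meta block is included. With a metadata file named in the setup block, the meta block could be left out.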

The slightly different config parameter structure accounts for differences in data structures between the 2 manufacturers’ tags. The WC config parameters are as follows:

setup config block specifies the program overseeing data assembly & paths to required data, metadata & output directories:
- program the national (or other) program of which the data is a part. For example, imos, atn, otn.
- data.dir the name of the data directory. Must reside within the wd.
- meta.file the metadata filename. Must reside within the wd. Can be NULL, in which case, the meta config block (see below) must be present & tag-specific metadata are acquired from the WC Data Portal.
- maps.dir the directory path to write diagnostic maps of QC’d tracks.
- diag.dir the directory path to write diagnostic time-series plots of QC’d lon & lat.
- output.dir the directory path to write QC output CSV files. Must reside within the wd.
- return.R a logical indicating whether the function should return a list of QC-generated, internal R objects. This results in a single large object returned to the R work space containing the following elements:
  - dropIDs the WC uuids dropped from the QC process
  - wc the WC tag data files downloaded from the WC Data Portal
  - meta the deployment metadata formatted for use in the QC workflow (i.e., not the final output format)
  - locations_sf the projected location data to be passed as input to the SSM
  - fit1 the initial SSM output fit object
  - fit2 the final SSM output fit object including re-routed locations if specified.
  - wc_ssm the SSM-annotated WC tag data files. This output object can be useful for troubleshooting undesirable results during delayed-mode (supervised) QC workflows.

harvest config block specifies data harvesting parameters:
- download a logical indicating whether tag data are to be downloaded from the WC Data Portal API or read from the local data.dir.
- owner.id the Wildlife Computers collaborator ID, which is required if the user does not own or otherwise does not have direct access to the tag data. Note that data-sharing collaborations must be set up in the Wildlife Computers Data Portal before the data can be accessed via the API.
- wc.akey the Wildlife Computers Access Key that all Portal users must have to access the API.
- wc.skey the Wildlife Computers Secret Key.
- tag.list a .CSV file with a single variable named uuid, providing the uuids to be downloaded. This ensures that only the desired subset of tag datasets is downloaded. If not provided (NULL), then ALL the tag datasets attributed to the owner.id, or to the user providing the WC keys, will be downloaded.
- dropIDs a .CSV file with a single variable named uuid, providing the uuids to be ignored by the QC process. Can be NULL.

model config block specifies model- and data-specific parameters:
- model the aniMotum SSM model to be used for the location QC - typically either rw or crw.
- vmax the maximum plausible travel rate (m/s), used during SSM fitting to identify implausible locations
- time.step the prediction interval (in decimal hours) to be used by the SSM
- proj the proj4string to be used for the location data & for the SSM-estimated locations. Can be NULL, which will result in one of 5 projections being used, depending on whether the centroid of the observed latitudes lies in N or S polar regions, temperate or equatorial regions, or if tracks straddle (or lie close to) -180,180 longitude.
- reroute a logical; whether QC’d tracks should be re-routed off of land (default is FALSE). Note, in some circumstances this can substantially increase processing time. Default land polygon data are sourced from the ropensci/rnaturalearthhires R package.
- dist the distance in km from outside the convex hull of observed locations from which to select land polygon data for re-routing. Ignored if reroute = FALSE.
- barrier an optional filepath to an alternate polygon data file (shapefile) for the land (or other) barrier. For example, higher resolution local coastline data can be supplied, provided the data extend at least dist km beyond the extent of the track data.
- buffer the distance in km to buffer rerouted locations from the coastline. Ignored if reroute = FALSE.
- centroids whether centroids are to be included in the visibility graph mesh used by the rerouting algorithm. See ?pathroutr::prt_visgraph for details. Ignored if reroute = FALSE.
- cut logical; should predicted locations be dropped if they lie within a large data gap (default is FALSE).
- min.gap the minimum data gap duration (h) to be used for cutting predicted locations (default is 72 h)
- QCmode one of either nrt for Near Real-Time QC or dm for Delayed Mode QC.
- pred.int the prediction interval (in hours) to be used. Typically, this is the same as the time.step.

meta config block specifies species and deployment location information. This config block is only necessary when no metadata file is provided in the setup block.
- common_name the species common name (e.g., “loggerhead turtle”)
- species the species scientific name (e.g., “Caretta caretta”)
- release_site the location where tags were deployed (e.g., “NSW”)
- state_country the country/territory name (e.g., “Australia”)

With a completed config file, the standard call to initiate the QC workflow within R is:

wc_qc(wd = "test", config = "irap_config.json")

where wd is the file path for the working directory within which all QC data/metadata inputs are downloaded (or read) and outputs are written.

Additional details on config parameters

The Wildlife Computers API credentials (owner.id, wc.akey, and wc.skey) may be used to download data directly from the Wildlife Computers Portal. In this case, data are written to tag-specific directories within the specified data.dir directory. Alternatively, wc_qc() may be used with local copies of Wildlife Computers tag data, provided they are stored in tag-specific directories within the data.dir directory.
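
For the local-data case, the relevant settings might be sketched as follows. The directory name tagdata is hypothetical. Leaving the credentials as null when download is false is an assumption, not a behaviour confirmed here. Only the parameters relevant to this choice are shown.

```json
{
  "setup": {
    "data.dir": "tagdata"
  },
  "harvest": {
    "download": false,
    "owner.id": null,
    "wc.akey": null,
    "wc.skey": null
  }
}
```

With download set to true, the same data.dir would instead receive the tag-specific directories written by the download step.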

The proj argument specifies the projection (as a proj4string) to which the tag-measured locations are converted as input to the QC state-space model (SSM), i.e., the working projection in km for the SSM. Any valid proj4string may be used, provided the units are in km. If proj is left as NULL, then the QC algorithm will project the data differently depending on the centroid latitude of the tracks. The default projections are:

| Central Latitude or Longitude | Projection (with `+units=km`) |
| --- | --- |
| -55 to -25 or 25 to 55 Lat | Equidistant Conic with standard parallels at the tracks’ 25th & 75th percentile Latitudes |
| < -55 or > 55 Lat | Stereographic with origin at the tracks’ centroid |
| -25 to 25 Lat | Mercator with origin at the tracks’ centroid |
| -25 to 25 Lat & Long straddles -180,180 | Longitudes are shifted to 0, 360 and a Mercator with origin at the tracks’ centroid is used |
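
To override these defaults, proj can be set explicitly. The fragment below is a sketch only: the Lambert azimuthal equal-area projection and its origin (34°S, 152°E) are illustrative assumptions rather than settings from the NSWOO workflow. The one requirement taken from the text is `+units=km`.

```json
{
  "model": {
    "proj": "+proj=laea +lat_0=-34 +lon_0=152 +datum=WGS84 +units=km +no_defs"
  }
}
```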

The model argument specifies the aniMotum SSM to be used, typically either rw or crw. The latter is usually less biased when data gaps are absent; the former is best when data gaps are present. A general recommendation is to use model:rw as the SSM for unsupervised (e.g., NRT) QC workflows.

The SSM fitting algorithm has a few fundamental parameters that need to be specified. vmax is the animals’ maximum plausible travel rate in m s$^{-1}$. For example, vmax:3 is usually appropriate for seals and vmax:2 for turtles. The SSM prediction interval in hours is specified with time.step; decimal hours can be used for intervals shorter than 1 h. This time interval determines the temporal resolution of the predicted track, and the predicted track locations provide the basis for interpolation to the time of each tag-measured ocean observation or behavioural event. Typically, 6 hours is appropriate for most Argos data collected from seals and turtles, but a finer time interval may be required for faster-moving species and/or more frequently measured ocean observations, and a coarser interval for more sporadically observed locations. Further details on SSM fitting to Argos and GPS data are provided in the aniMotum R package vignettes and in Jonsen et al. 2023.
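
As a contrast to the typical 6-hour setting, the following sketch shows how a finer, sub-hourly interval might be expressed in decimal hours. The scenario (a delayed-mode seal deployment with few data gaps) and the 30-minute value are illustrative assumptions. The vmax value follows the guidance above for seals, and pred.int is set equal to time.step as is typical.

```json
{
  "model": {
    "model": "crw",
    "vmax": 3,
    "time.step": 0.5,
    "pred.int": 0.5,
    "QCmode": "dm"
  }
}
```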

When animals pass close to land some SSM-predicted locations may implausibly lie on land. Often, this is due to the spatial and temporal resolution of the Argos tracking data. In these cases, SSM-predicted locations can be adjusted minimally off of land by setting reroute:true. The pathroutr R package is used for efficient rerouting. In this case, additional arguments should be specified:

- dist the distance in km beyond track locations from which coastline polygon data should be sampled (smaller values provide less information for path re-routing; larger values increase computation time)
- barrier an optional parameter that can provide an alternate spatial polygon dataset, as a shapefile, for the land (or other barriers to movement). Typically, this alternate dataset would be a localised, high-resolution coastline dataset.
- buffer the distance in km to buffer rerouted locations from the coastline
- centroids whether to include the visibility graph centroids for greater resolution
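
Taken together, a rerouting setup might be sketched as below. The numeric values and the shapefile path are hypothetical placeholders, not recommended settings. Setting barrier to null would fall back to the default land polygon data.

```json
{
  "model": {
    "reroute": true,
    "dist": 50,
    "barrier": "shp/local_coastline.shp",
    "buffer": 0.5,
    "centroids": true
  }
}
```

If an alternate barrier file is supplied, it needs to extend at least dist km beyond the extent of the track data.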

SSM-predicted tracks can be cut (cut:true) in regions where large location data gaps exist. These location data gaps can occur when the tags are unable to transmit for extended periods or when animal surfacing occurs during periods of Argos satellite unavailability (more common closer to the equator than at higher latitudes). In this case, min.gap is used to specify the minimum data gap duration (h) from which to cut SSM-predicted locations. This will limit interpolation artefacts due to implausible SSM-predicted locations in excessively long data gap periods.
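
A minimal sketch of the corresponding settings is shown below. The 72 h value is the stated default for min.gap; whether it suits a given deployment is a judgement for the user.

```json
{
  "model": {
    "cut": true,
    "min.gap": 72
  }
}
```

With these settings, predicted locations falling within data gaps of 72 h or longer would be dropped from the output.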

The QCmode parameter sets whether the QC is being conducted in delayed mode (dm) or near real-time (nrt). Delayed mode is reserved for when tag deployments have ended and usually involves greater user intervention, such as deciding whether to remove aberrant portions of a deployment (e.g., as tag batteries begin failing). The nrt mode is meant to be fully automated and only used while a deployment is active. In both cases, the output .CSV and plot file names will include the QCmode as a suffix.