Add a haulout indicator to tracking data from an SMRU haulout file

smru_haulout() reads an SMRU-format haulout data file (or accepts a pre-read data frame) and stamps a binary haulout indicator, ho, onto each row of the tracking data. The indicator records whether that observation's timestamp falls within any haulout interval [s_date, e_date] for the matching individual. The resulting ho column is recognised automatically by fit_ssm() when supplied via the haulout argument, and is used by the mp, crw, and rw process models to constrain location estimates during haulout periods.

smru_haulout(x, haulout, ref = "ref", tz = "UTC", id_fun = NULL)

Arguments

x: a data.frame, tibble or sf-tibble of tracking observations with at least columns id and date (POSIXct or coercible to one). Typically the raw input to fit_ssm, before format_data is called.
haulout: either a character string giving the path to the SMRU haulout CSV file, or a pre-read data frame. Must contain the individual identifier column named by the ref argument ("ref" by default), plus s_date and e_date. Only the s_date and e_date columns are used to define haulout intervals; all other columns (including lat, lon) are ignored.
ref: name of the column in haulout that identifies individuals and corresponds to the id column in x. Default "ref", which is the standard SMRU column name.
tz: timezone for parsing s_date and e_date. Default "UTC".
id_fun: optional function applied to the ref column of the haulout file to transform individual IDs before matching against x$id. Useful when the two files use different naming conventions, e.g., id_fun = function(x) gsub("-", "_", x) to replace hyphens with underscores. Default NULL (no transformation).

Value

x with an additional integer column ho: 1 if the observation falls within a haulout period, 0 otherwise. If x already has an ho column, it is overwritten and a warning is issued.

Details

The SMRU haulout file contains one row per haulout event, with columns s_date (haulout start) and e_date (haulout end) in ISO 8601 UTC format (YYYY-MM-DDTHH:MM:SSZ). Interval membership is inclusive at both boundaries: an observation at exactly s_date or e_date is coded ho = 1.
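
The inclusive interval rule can be illustrated with a small base R sketch. This is not the package's implementation; the toy data, the object names x and ho_tab, and the loop-based matching are assumptions made purely for illustration.

```r
# Hypothetical toy tracking data (not from a real SMRU file)
x <- data.frame(
  id = c("a", "a", "a", "b"),
  date = as.POSIXct(c("2020-01-01 00:00:00", "2020-01-01 06:00:00",
                      "2020-01-02 00:00:00", "2020-01-01 03:00:00"),
                    tz = "UTC"),
  stringsAsFactors = FALSE
)

# Hypothetical haulout table: one row per haulout event, ISO 8601 UTC strings
ho_tab <- data.frame(
  ref = c("a", "a"),
  s_date = c("2020-01-01T05:00:00Z", "2020-01-02T00:00:00Z"),
  e_date = c("2020-01-01T06:00:00Z", "2020-01-02T02:00:00Z"),
  stringsAsFactors = FALSE
)

# Parse the YYYY-MM-DDTHH:MM:SSZ strings as UTC date-times
iso <- "%Y-%m-%dT%H:%M:%SZ"
ho_tab$s_date <- as.POSIXct(ho_tab$s_date, format = iso, tz = "UTC")
ho_tab$e_date <- as.POSIXct(ho_tab$e_date, format = iso, tz = "UTC")

# For each observation, test membership in any interval of the same individual;
# >= and <= make both boundaries inclusive
x$ho <- vapply(seq_len(nrow(x)), function(i) {
  iv <- ho_tab[ho_tab$ref == x$id[i], ]
  as.integer(any(x$date[i] >= iv$s_date & x$date[i] <= iv$e_date))
}, integer(1))

x$ho
```

In this sketch the second and third observations sit exactly on an e_date and an s_date respectively, so both are coded 1, while individual "b" has no haulout records and is coded 0.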

This function is intended to be called either directly by the user before fit_ssm, or internally via the haulout argument of fit_ssm. In either case, the ho column must be added to x before format_data and prefilter are applied, as ho does not survive format_data or prefilter.

A warning is issued for any individual in x that has no matching records in the haulout file. All observations for that individual will have ho = 0, which is the safe fallback (no haulout constraint applied).
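
To anticipate that warning, the IDs in the two files can be compared before calling the function. The following is a minimal base R sketch, assuming the tracking IDs and haulout refs have already been read into character vectors; the names track_ids and haulout_refs and the second tracking ID are hypothetical.

```r
# IDs as they might appear in the two files; "ct189_600_25" is a made-up
# individual with no haulout records
track_ids    <- c("ct189_576_25", "ct189_600_25")
haulout_refs <- c("ct189-576-25", "ct189-576-25")

# Same transformation that would be passed as id_fun
id_fun <- function(x) gsub("-", "_", x)

# Tracking IDs with no counterpart among the transformed haulout refs;
# these individuals would receive ho = 0 throughout
unmatched <- setdiff(unique(track_ids), id_fun(unique(haulout_refs)))

if (length(unmatched) > 0) {
  message("No haulout records for: ", paste(unmatched, collapse = ", "))
}
```

A non-empty result often points to a naming mismatch that id_fun can resolve, rather than to a genuine absence of haulout data.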

Examples

if (FALSE) { # \dontrun{
## called directly - add ho before fitting
d <- read.csv("my_tracks.csv")
d <- smru_haulout(d, "haulout_ct189.csv")
fit <- fit_ssm(d, model = "mp", time.step = 24,
               control = ssm_control(ho_scale = 0.01, verbose = 0))

## called via fit_ssm haulout argument - equivalent to above
fit <- fit_ssm(d, model = "mp", time.step = 24,
               haulout = "haulout_ct189.csv",
               control = ssm_control(ho_scale = 0.01, verbose = 0))

## pre-read and filter the haulout file before passing to fit_ssm
ho_raw <- read.csv("haulout_ct189.csv")
ho_raw <- subset(ho_raw, cid == "ct189")
fit <- fit_ssm(d, model = "mp", time.step = 24,
               haulout = ho_raw,
               control = ssm_control(ho_scale = 0.01, verbose = 0))

## IDs differ between files: haulout uses "ct189-576-25",
## tracking data uses "ct189_576_25"
d <- smru_haulout(d, "haulout_ct189.csv",
                  id_fun = function(x) gsub("-", "_", x))
} # }