| Title: | Lagged Cross-Correlation Analysis of Environmental Time Series |
|---|---|
| Description: | Tools for analysing lagged relationships between environmental variables and ecological or epidemiological time series. The package implements a workflow to aggregate meteorological data over multiple lagged intervals, fit regression models for each lag window, and visualise effect strength and direction using cross-correlation maps (CCM). |
| Authors: | Nicolas Moiroux [aut, cre] (ORCID: <https://orcid.org/0000-0001-6755-6167>), Colombine Bartholomée [aut] (ORCID: <https://orcid.org/0000-0001-7291-5195>), Paul Taconet [aut] (ORCID: <https://orcid.org/0000-0001-7429-7204>) |
| Maintainer: | Nicolas Moiroux <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.1.0 |
| Built: | 2026-02-05 22:28:59 UTC |
| Source: | https://github.com/Nmoiroux/ecoXCorr |
This function computes aggregated values of one or several meteorological
time series over all possible lagged time intervals defined relative to one
or more reference dates. For each reference date d, all intervals
are generated,
where i is the interval length (in days) and k, l range from 1 to m with
k >= l. Each interval is then used to aggregate the specified
meteorological variables using one or more summary functions.
aggregate_lagged_intervals( data, date_col, value_cols, d, i = 1, m, funs = list(mean = mean, min = min, max = max, sum = sum), na.rm = TRUE )aggregate_lagged_intervals( data, date_col, value_cols, d, i = 1, m, funs = list(mean = mean, min = min, max = max, sum = sum), na.rm = TRUE )
data |
A data.frame containing the meteorological time series. |
date_col |
Character string giving the name of the date column in
|
value_cols |
Character vector giving the names of numeric variables to be aggregated (e.g. rainfall, temperature). |
d |
Vector of reference *dates*. Can be of class |
i |
Integer giving the length of the base time *interval*, expressed in days
(e.g. |
m |
Integer giving the *maximum* lag (number of intervals) to consider.
All combinations of lag windows with |
funs |
Named list of aggregation functions to apply to each variable.
Each function must accept a numeric vector as first argument. The names
of the list are used to construct output column names
(e.g. |
na.rm |
Logical indicating whether missing values should be removed
before aggregation. Passed to the aggregation functions (default: |
The function supports multiple reference dates, multiple variables, and multiple aggregation functions, and returns all combinations as additional columns in the output data frame.
Reference dates for which at least one interval contains no observations are reported in the console as having missing data. Reference dates for which at least one interval partially lies outside the temporal bounds of the input time series are reported as having truncated intervals.
Intervals are defined as left-closed and right-open
([start, end)). An interval is considered truncated if it extends
beyond the temporal bounds of the input time series. An interval is considered
missing if no observations fall within it.
Console messages are printed to inform the user of reference dates for which missing data or truncated intervals occurred.
A data.frame with one row per reference date and lag window, containing:
date: reference date
start, end: start (inclusive) and end (exclusive) of
the aggregation interval
lag_start, lag_end: lag indices defining the interval
One column per combination of variable and aggregation function
(e.g. rain_mean, temperature_sum)
This dataset contains entomological sampling records of female *Aedes albopictus* collected in Montpellier (France) during 2023. The data originate from GBIF and correspond to a zero-truncated subset (i.e. only positive captures) of adult female mosquitoes collected using fixed traps across different areas of the city.
albopictusMPL2023albopictusMPL2023
A data.frame with the following variables:
Scientific name of the species (*Aedes albopictus*).
Number of female individuals captured during the sampling event.
Original event date as provided by GBIF.
Identifier of the trap used for sampling.
Identifier of the area where the trap is located.
Sampling date (class Date).
The dataset was derived from a GBIF annotated archive (DOI: 10.15468/4qafbu) and processed to extract sampling dates, trap identifiers, and spatial grouping variables.
Only records corresponding to adult females of *Aedes albopictus* were retained. Observations with zero counts were removed, resulting in a zero-truncated dataset suitable for abundance modelling.
The dataset can be used to illustrate lagged associations between
environmental variables and mosquito abundance, for example in
conjunction with the functions aggregate_lagged_intervals(),
fit_models_by_lag(), and plotCCM().
GBIF occurrence data: doi:10.15468/4qafbu
This function fits a regression model separately for each lag window
defined by the lag_start and lag_end columns of the input
data frame. For each lag window, the model is fitted using
observations corresponding to different reference dates (date),
and summary statistics (p-value, betas, sign of effect, , AIC reduction, sample size) are returned
for the specified predictor.
fit_models_by_lag( data, response, predictors, random = "", family = "gaussian()", min_n = 10, track = F, ... )fit_models_by_lag( data, response, predictors, random = "", family = "gaussian()", min_n = 10, track = F, ... )
data |
A data frame containing, at minimum, the columns
|
response |
Character string giving the name of the response variable. |
predictors |
Character vector of predictor names. Currently, only a single predictor is supported; providing more than one predictor will result in an error. |
random |
Optional character string specifying random-effects terms
to be added to the model formula (without a leading |
family |
Character string. The name of a family function
to be used in GLM or GLMM models. Default to "gaussian" (Linear model).
see |
min_n |
Minimum number of observations required to fit a model. (Currently not enforced; retained for future extensions.) |
track |
If TRUE, lag window is printed in the console before model fitting. |
... |
Additional arguments passed to the underlying modelling
function ( |
Both fixed-effect and mixed-effect models are supported.
The modelling function used depends on the random arguments:
For mixed-effects models, marginal (Nakagawa) is returned. For fixed-effects
models, classical is used.
For each unique combination of lag_start and lag_end, the
function:
Subsets the data to the corresponding lag window,
Removes rows with missing values in the response or predictor,
Fits the specified model,
Extracts beta parameter of the linear predictor,
Extracts the p-value of the predictor effect,
Computes AIC for the specified and null models,
Computes the model (marginal for mixed models),
Records the sign of the estimated effect and the sample size.
The returned table is suitable for lag-window screening, heatmap visualisation, or sensitivity analyses in epidemiological or ecological studies.
A data frame with one row per lag window, containing:
Start lag index of the aggregation window.
End lag index of the aggregation window.
Name of the predictor variable.
P-value associated with the predictor effect.
Coefficient of determination (marginal for mixed models).
Estimated beta parameter of the linear predictor.
Sign of the estimated predictor effect (-1 or +1).
AIC reduction compared to the null model.
Number of observations used to fit the model.
This dataset contains daily meteorological variables for Montpellier (Fréjorgues Airport, WMO station 07643) during the year 2023. The data were derived from 3-hourly SYNOP observations provided by Météo-France and aggregated to daily summaries.
meteoMPL2023meteoMPL2023
A data.frame with one row per day and the following variables:
Date of observation (class Date).
Daily mean wind speed (m·s⁻¹).
Daily minimum wind speed (m·s⁻¹).
Daily maximum wind speed (m·s⁻¹).
Daily mean air temperature (°C).
Daily minimum air temperature (°C).
Daily maximum air temperature (°C).
Daily mean dew point temperature (°C).
Daily minimum dew point temperature (°C).
Daily maximum dew point temperature (°C).
Daily mean relative humidity (%).
Daily minimum relative humidity (%).
Daily maximum relative humidity (%).
Daily mean atmospheric pressure (Pa).
Daily minimum atmospheric pressure (Pa).
Daily maximum atmospheric pressure (Pa).
Daily mean precipitation (mm).
Daily minimum precipitation (mm).
Daily maximum precipitation (mm).
Daily total precipitation (mm).
The dataset includes temperature, humidity, wind speed, atmospheric pressure and precipitation, and is intended for use in studies of lagged associations between environmental conditions and ecological or epidemiological time series.
Original 3-hourly SYNOP observations were filtered to retain data from Montpellier–Fréjorgues Airport (WIGOS station 0-20000-0-07643). Air temperature and dew point temperature were converted from Kelvin to degrees Celsius. Negative precipitation values were set to zero prior to aggregation.
Daily statistics (mean, minimum and maximum) were computed for all variables. For precipitation, daily totals are also provided.
This dataset is designed to be used in combination with
aggregate_lagged_intervals() to generate lagged environmental
predictors for ecological or epidemiological modelling.
Météo-France SYNOP data, distributed via data.gouv.fr: https://meteo.data.gouv.fr/datasets/686f8595b351c06a3a790867
This function visualises the strength and direction of associations between
a response variable and a lagged predictor across multiple lag windows, using
the output of fit_models_by_lag. The resulting plot is a
two-dimensional "cross-correlation map", where each tile represents a
lag window defined by lag_start and lag_end.
plotCCM( data, model_outcome = c("r2sign", "r2", "d_aic", "betas"), threshold_p = 1 )plotCCM( data, model_outcome = c("r2sign", "r2", "d_aic", "betas"), threshold_p = 1 )
data |
A data.frame produced by |
model_outcome |
Character string specifying the model's outcomes to plot. Either:
|
threshold_p |
Numeric value giving the p-value threshold above which
associations are masked (set to |
The colour of each tile corresponds to model_outcome. Positive associations are shown in red, negative
associations in blue, and non-significant or filtered values (using threshold_p) are shown in grey.
The lag window yielding the maximum absolute value of model_outcome is highlighted
with a coloured border.
The x-axis corresponds to lag_start (displayed in reverse order), and
the y-axis corresponds to lag_end. Tiles are coloured using a diverging
colour scale centred on zero. Lag windows with p_value >= threshold_p
are not displayed and appear in grey.
This function does not perform any modelling itself; it is intended solely
for visualising results obtained from fit_models_by_lag.
A ggplot2 object representing the cross-correlation map.