| Title: | Sparse Multiple Index Models for Nonparametric Forecasting |
|---|---|
| Description: | Implements a general algorithm for estimating Sparse Multiple Index (SMI) models for nonparametric forecasting and prediction. Estimation of SMI models requires the Gurobi mixed integer programming (MIP) solver via the gurobi R package. To use this functionality, the Gurobi Optimizer must be installed, and a valid license obtained and activated from <https://www.gurobi.com>. The gurobi R package must then be installed and configured following the instructions at <https://support.gurobi.com/hc/en-us/articles/14462206790033-How-do-I-install-Gurobi-for-R>. The package also includes functions for fitting nonparametric additive models with backward elimination, group-wise additive index models, and projection pursuit regression models as benchmark comparison methods. In addition, it provides tools for generating prediction intervals to quantify uncertainty in point forecasts produced by the SMI model and benchmark models, using the classical block bootstrap and a new method called conformal bootstrap, which integrates block bootstrap with split conformal prediction. |
| Authors: | Nuwani Palihawadana [aut, cre, cph] (ORCID: <https://orcid.org/0009-0008-6395-7797>), Xiaoqian Wang [ctb] (ORCID: <https://orcid.org/0000-0003-4827-496X>) |
| Maintainer: | Nuwani Palihawadana <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.1.3 |
| Built: | 2026-05-09 06:39:02 UTC |
| Source: | https://github.com/nuwani-palihawadana/smimodel |
Maintainer: Nuwani Palihawadana [email protected] (ORCID) [copyright holder]
Other contributors:
Xiaoqian Wang [email protected] (ORCID) [contributor]
Useful links:
Report bugs at https://github.com/nuwani-palihawadana/smimodel/issues
Constructs a vector of coefficients for each index, containing a coefficient for every predictor that enters any index; that is, if no coefficient is provided for a particular predictor in a particular index, the function fills in a zero for it.
```r
allpred_index(num_pred, num_ind, ind_pos, alpha)
```
num_pred |
Number of predictors. |
num_ind |
Number of indices. |
ind_pos |
A list of length = |
alpha |
A vector of index coefficients. |
A list containing the following components:
alpha_init_new |
A
|
index |
An |
index_positions |
A list of length = |
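The zero-filling performed by allpred_index can be sketched in base R as follows. fill_index_coefs() is a hypothetical helper, not the package's function, and the stacking order of the output (all coefficients of index 1, then index 2, and so on) is an assumption rather than a description of the package's internal layout.

```r
# Hypothetical helper (not the smimodel implementation): expand a compact
# coefficient vector into one full-length vector per index, inserting zeros
# for predictors that do not enter that index.
fill_index_coefs <- function(num_pred, num_ind, ind_pos, alpha) {
  stopifnot(length(ind_pos) == num_ind,
            length(alpha) == sum(lengths(ind_pos)))
  full <- matrix(0, nrow = num_pred, ncol = num_ind)
  start <- 1
  for (j in seq_len(num_ind)) {
    pos <- ind_pos[[j]]
    if (length(pos) > 0) {
      # Coefficients for index j are taken from alpha in order
      full[pos, j] <- alpha[start:(start + length(pos) - 1)]
      start <- start + length(pos)
    }
  }
  # Assumed layout: stack the per-index vectors (index 1 first)
  list(alpha = as.vector(full),
       index = rep(seq_len(num_ind), each = num_pred))
}

# 4 predictors, 2 indices: index 1 uses predictors 1-2, index 2 uses 3-4
res <- fill_index_coefs(num_pred = 4, num_ind = 2,
                        ind_pos = list(1:2, 3:4),
                        alpha = c(0.8, 0.6, 0.5, 0.5))
res$alpha  # 0.8 0.6 0.0 0.0 0.0 0.0 0.5 0.5
```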
augment() method for class 'backward'
Generates residuals and fitted values of a fitted backward object.
```r
## S3 method for class 'backward'
augment(x, ...)
```
x |
A |
... |
Other arguments not currently used. |
A tibble.
```r
library(smimodel)
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)

# Simulate data
n <- 1205
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
  mutate(
    # Add x_lags
    x_lag = lag_matrix(x_lag_000, 5)) |>
  unpack(x_lag, names_sep = "_") |>
  mutate(
    # Response variable
    y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
    # Add an index to the data set
    inddd = seq(1, n)) |>
  drop_na() |>
  select(inddd, y, starts_with("x_lag")) |>
  # Make the data set a `tsibble`
  as_tsibble(index = inddd)

# Training set
sim_train <- sim_data[1:1000, ]
# Validation set
sim_val <- sim_data[1001:1200, ]
# Predictors taken as non-linear variables
s.vars <- colnames(sim_data)[3:8]

# Model fitting
backwardModel <- model_backward(data = sim_train, val.data = sim_val,
                                yvar = "y", s.vars = s.vars)

# Obtain residuals and fitted values
augment(backwardModel)
```
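The augment() methods return fitted values and residuals side by side. The relationship between the two (observed = fitted + residual) can be illustrated in base R with stats::lm. This is only an illustration, not the smimodel implementation, and the column names .fitted and .resid are assumed (broom-style), not taken from the package output.

```r
# Illustration with stats::lm, not smimodel internals
set.seed(1)
df <- data.frame(x = runif(50))
df$y <- 2 * df$x + rnorm(50, sd = 0.1)
fit <- stats::lm(y ~ x, data = df)

# Hypothetical broom-style column names for fitted values and residuals
aug <- data.frame(df, .fitted = stats::fitted(fit),
                  .resid = stats::residuals(fit))
head(aug)
```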
augment() method for class 'gaimFit'
Generates residuals and fitted values of a fitted gaimFit object.
```r
## S3 method for class 'gaimFit'
augment(x, ...)
```
x |
A |
... |
Other arguments not currently used. |
A tibble.
```r
library(smimodel)
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)

# Simulate data
n <- 1005
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
  mutate(
    # Add x_lags
    x_lag = lag_matrix(x_lag_000, 5)) |>
  unpack(x_lag, names_sep = "_") |>
  mutate(
    # Response variable
    y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
    # Add an index to the data set
    inddd = seq(1, n)) |>
  drop_na() |>
  select(inddd, y, starts_with("x_lag")) |>
  # Make the data set a `tsibble`
  as_tsibble(index = inddd)

# Predictors taken as index variables
index.vars <- colnames(sim_data)[3:7]
# Assign group indices for each predictor
index.ind <- c(rep(1, 3), rep(2, 2))
# Predictors taken as non-linear variables not entering indices
s.vars <- "x_lag_005"

# Model fitting
gaimModel <- model_gaim(data = sim_data, yvar = "y",
                        index.vars = index.vars, index.ind = index.ind,
                        s.vars = s.vars)

# Obtain residuals and fitted values
augment(gaimModel)
```
augment() method for class 'gamFit'
Generates residuals and fitted values of a fitted gamFit object.
```r
## S3 method for class 'gamFit'
augment(x, ...)
```
x |
A |
... |
Other arguments not currently used. |
A tibble.
```r
library(smimodel)
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)

# Simulate data
n <- 1005
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
  mutate(
    # Add x_lags
    x_lag = lag_matrix(x_lag_000, 5)) |>
  unpack(x_lag, names_sep = "_") |>
  mutate(
    # Response variable
    y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
    # Add an index to the data set
    inddd = seq(1, n)) |>
  drop_na() |>
  select(inddd, y, starts_with("x_lag")) |>
  # Make the data set a `tsibble`
  as_tsibble(index = inddd)

# Predictors taken as non-linear variables
s.vars <- colnames(sim_data)[3:6]
# Predictors taken as linear variables
linear.vars <- colnames(sim_data)[7:8]

# Model fitting
gamModel <- model_gam(data = sim_data, yvar = "y",
                      s.vars = s.vars, linear.vars = linear.vars)

# Obtain residuals and fitted values
augment(gamModel)
```
augment() method for class 'lmFit'
Generates residuals and fitted values of a fitted lmFit object.
```r
## S3 method for class 'lmFit'
augment(x, ...)
```
x |
A |
... |
Other arguments not currently used. |
A tibble.
```r
library(smimodel)
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)

# Simulate data
n <- 1005
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
  mutate(
    # Add x_lags
    x_lag = lag_matrix(x_lag_000, 5)) |>
  unpack(x_lag, names_sep = "_") |>
  mutate(
    # Response variable
    y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
    # Add an index to the data set
    inddd = seq(1, n)) |>
  drop_na() |>
  select(inddd, y, starts_with("x_lag")) |>
  # Make the data set a `tsibble`
  as_tsibble(index = inddd)

# Predictor variables
linear.vars <- colnames(sim_data)[3:8]

# Model fitting
lmModel <- model_lm(data = sim_data, yvar = "y", linear.vars = linear.vars)

# Obtain residuals and fitted values
augment(lmModel)
```
augment() method for class 'pprFit'
Generates residuals and fitted values of a fitted pprFit object.
```r
## S3 method for class 'pprFit'
augment(x, ...)
```
x |
A |
... |
Other arguments not currently used. |
A tibble.
```r
library(smimodel)
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)

# Simulate data
n <- 1005
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
  mutate(
    # Add x_lags
    x_lag = lag_matrix(x_lag_000, 5)) |>
  unpack(x_lag, names_sep = "_") |>
  mutate(
    # Response variable
    y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
    # Add an index to the data set
    inddd = seq(1, n)) |>
  drop_na() |>
  select(inddd, y, starts_with("x_lag")) |>
  # Make the data set a `tsibble`
  as_tsibble(index = inddd)

# Index variables
index.vars <- colnames(sim_data)[3:8]

# Model fitting
pprModel <- model_ppr(data = sim_data, yvar = "y", index.vars = index.vars)

# Obtain residuals and fitted values
augment(pprModel)
```
augment() method for class 'smimodel'
Generates residuals and fitted values of a fitted smimodel object.
```r
## S3 method for class 'smimodel'
augment(x, ...)
```
x |
A |
... |
Other arguments not currently used. |
A tibble.
```r
if (requireNamespace("gurobi", quietly = TRUE)) {
  library(smimodel)
  library(dplyr)
  library(ROI)
  library(tibble)
  library(tidyr)
  library(tsibble)

  # Simulate data
  n <- 1005
  set.seed(123)
  sim_data <- tibble(x_lag_000 = runif(n)) |>
    mutate(
      # Add x_lags
      x_lag = lag_matrix(x_lag_000, 5)) |>
    unpack(x_lag, names_sep = "_") |>
    mutate(
      # Response variable
      y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
      # Add an index to the data set
      inddd = seq(1, n)) |>
    drop_na() |>
    select(inddd, y, starts_with("x_lag")) |>
    # Make the data set a `tsibble`
    as_tsibble(index = inddd)

  # Index variables
  index.vars <- colnames(sim_data)[3:8]

  # Model fitting
  smimodel_ppr <- model_smimodel(data = sim_data, yvar = "y",
                                 index.vars = index.vars,
                                 initialise = "ppr")

  # Obtain residuals and fitted values
  augment(smimodel_ppr)
}
```
augment() method for class 'smimodelFit'
Generates residuals and fitted values of a fitted smimodelFit object.
```r
## S3 method for class 'smimodelFit'
augment(x, ...)
```
x |
A |
... |
Other arguments not currently used. |
A tibble.
autoplot() method for class 'smimodel'
Plots the fitted spline(s). If a set of multiple models is fitted, plots the fitted spline(s) of the model specified in the argument model.
```r
## S3 method for class 'smimodel'
autoplot(object, model = 1, ...)
```
object |
A |
model |
An |
... |
Other arguments not currently used. |
Plot(s) of fitted spline(s).
```r
if (requireNamespace("gurobi", quietly = TRUE)) {
  library(smimodel)
  library(dplyr)
  library(ROI)
  library(tibble)
  library(tidyr)
  library(tsibble)

  # Simulate data
  n <- 1005
  set.seed(123)
  sim_data <- tibble(x_lag_000 = runif(n)) |>
    mutate(
      # Add x_lags
      x_lag = lag_matrix(x_lag_000, 5)) |>
    unpack(x_lag, names_sep = "_") |>
    mutate(
      # Response variable
      y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
      # Add an index to the data set
      inddd = seq(1, n)) |>
    drop_na() |>
    select(inddd, y, starts_with("x_lag")) |>
    # Make the data set a `tsibble`
    as_tsibble(index = inddd)

  # Index variables
  index.vars <- colnames(sim_data)[3:8]

  # Model fitting
  smimodel_ppr <- model_smimodel(data = sim_data, yvar = "y",
                                 index.vars = index.vars,
                                 initialise = "ppr")

  # Plot the fitted spline(s)
  autoplot(smimodel_ppr)
}
```
This is a wrapper for the function conformalForecast::coverage.
It calculates the mean coverage and the ifinn (indicator) matrix for prediction
intervals on the validation set. If window is not NULL, a matrix of the rolling
means of interval forecast coverage is also returned.
```r
avgCoverage(object, level = 95, window = NULL, na.rm = FALSE)
```
object |
An object of class |
level |
Target confidence level for prediction intervals. |
window |
If not |
na.rm |
A |
A list of class coverage with the following components:
mean |
Mean coverage across the validation set. |
ifinn |
An indicator matrix as a multivariate time series, where the
|
rollmean |
If |
```r
library(smimodel)
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)

# Simulate data
n <- 1055
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
  mutate(
    # Add x_lags
    x_lag = lag_matrix(x_lag_000, 5)) |>
  unpack(x_lag, names_sep = "_") |>
  mutate(
    # Response variable
    y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
    # Add an index to the data set
    inddd = seq(1, n)) |>
  drop_na() |>
  select(inddd, y, starts_with("x_lag")) |>
  # Make the data set a `tsibble`
  as_tsibble(index = inddd)

# Training set
sim_train <- sim_data[1:1000, ]
# Test set
sim_test <- sim_data[1001:1050, ]
# Index variables
index.vars <- colnames(sim_data)[3:8]

# Model fitting
pprModel <- model_ppr(data = sim_train, yvar = "y", index.vars = index.vars)

# Conformal bootstrap prediction intervals (2-steps-ahead interval forecasts)
set.seed(12345)
pprModel_cb <- cb_cvforecast(object = pprModel, data = sim_data, yvar = "y",
                             predictor.vars = index.vars, h = 2, ncal = 30,
                             num.futures = 100, window = 1000)

# Mean coverage of generated 95% conformal bootstrap prediction intervals
cov_data <- avgCoverage(object = pprModel_cb)
cov_data$mean
```
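The mean coverage reported by avgCoverage is the share of actual values that fall inside their prediction intervals. The following base R sketch shows this for a single forecast horizon, assuming plain vectors rather than the multivariate time series handled by conformalForecast::coverage; all objects are hypothetical toy data.

```r
# Hypothetical single-horizon example: actual values and interval bounds
actual <- c(1, 5, 3, 10)
lower  <- c(0, 0, 0, 0)
upper  <- c(4, 4, 4, 4)

# Indicator: 1 if the actual value lies inside its interval, else 0
ifinn <- as.integer(actual >= lower & actual <= upper)  # 1 0 1 0
mean_cov <- mean(ifinn)                                 # 0.5

# Rolling mean coverage over a window of 2 forecasts (NA until the window fills)
window <- 2
roll_cov <- stats::filter(ifinn, rep(1 / window, window), sides = 1)
as.numeric(roll_cov)                                    # NA 0.5 0.5 0.5
```

stats::filter() is called with its namespace because dplyr, loaded in the examples above, masks filter().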
This is a wrapper for the function conformalForecast::width.
It calculates the mean width of prediction intervals on the validation set. If
window is not NULL, a matrix of the rolling means of interval
width is also returned. If includemedian is TRUE, information on
the median interval width is also returned.
```r
avgWidth(
  object,
  level = 95,
  includemedian = FALSE,
  window = NULL,
  na.rm = FALSE
)
```
object |
An object of class |
level |
Target confidence level for prediction intervals. |
includemedian |
If |
window |
If not |
na.rm |
A logical indicating whether |
A list of class width with the following components:
width |
Forecast interval width as a multivariate time series, where the
|
mean |
Mean interval width across the validation set. |
rollmean |
If |
median |
Median interval width across the validation set. |
rollmedian |
If |
```r
library(smimodel)
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)

# Simulate data
n <- 1055
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
  mutate(
    # Add x_lags
    x_lag = lag_matrix(x_lag_000, 5)) |>
  unpack(x_lag, names_sep = "_") |>
  mutate(
    # Response variable
    y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
    # Add an index to the data set
    inddd = seq(1, n)) |>
  drop_na() |>
  select(inddd, y, starts_with("x_lag")) |>
  # Make the data set a `tsibble`
  as_tsibble(index = inddd)

# Training set
sim_train <- sim_data[1:1000, ]
# Test set
sim_test <- sim_data[1001:1050, ]
# Index variables
index.vars <- colnames(sim_data)[3:8]

# Model fitting
pprModel <- model_ppr(data = sim_train, yvar = "y", index.vars = index.vars)

# Conformal bootstrap prediction intervals (2-steps-ahead interval forecasts)
set.seed(12345)
pprModel_cb <- cb_cvforecast(object = pprModel, data = sim_data, yvar = "y",
                             predictor.vars = index.vars, h = 2, ncal = 30,
                             num.futures = 100, window = 1000)

# Mean width of generated 95% conformal bootstrap prediction intervals
width_data <- avgWidth(object = pprModel_cb)
width_data$mean
```
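Interval width is simply the upper bound minus the lower bound, summarised by its mean, its median and, optionally, rolling statistics. A base R sketch for a single horizon with hypothetical toy bounds (not the conformalForecast::width implementation):

```r
# Hypothetical single-horizon interval bounds
lower <- c(0.0, 0.5, 1.0, 1.5)
upper <- c(2.0, 2.5, 2.0, 4.5)

width <- upper - lower            # 2 2 1 3
mean_width <- mean(width)         # 2
median_width <- median(width)     # 2

# Rolling mean and rolling median over a window of k forecasts
k <- 2
roll_mean <- stats::filter(width, rep(1 / k, k), sides = 1)  # NA 2.0 1.5 2.0
roll_median <- sapply(seq_along(width), function(i) {
  if (i < k) NA_real_ else median(width[(i - k + 1):i])
})                                                            # NA 2.0 1.5 2.0
```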
Compute prediction intervals by applying the single season block bootstrap method to subsets of time series data using a rolling forecast origin.
```r
bb_cvforecast(
  object,
  data,
  yvar,
  neighbour = 0,
  predictor.vars,
  h = 1,
  season.period = 1,
  m = 1,
  num.futures = 1000,
  level = c(80, 95),
  forward = TRUE,
  initial = 1,
  window = NULL,
  roll.length = 1,
  exclude.trunc = NULL,
  recursive = FALSE,
  recursive_colNames = NULL,
  na.rm = TRUE,
  verbose = list(solver = FALSE, progress = FALSE),
  ...
)
```
object |
Fitted model object of class |
data |
Data set. Must be a data set of class |
yvar |
Name of the response variable as a character string. |
neighbour |
If multiple models are fitted: Number of neighbours of each
key (i.e. grouping variable) to be considered in model fitting to handle
smoothing over the key. Should be an |
predictor.vars |
A character vector of names of the predictor variables. |
h |
Forecast horizon. |
season.period |
Length of the seasonal period. |
m |
Multiplier. (Block size = |
num.futures |
Number of possible future sample paths to be generated. |
level |
Confidence level for prediction intervals. |
forward |
If |
initial |
Initial period of the time series where no cross-validation forecasting is performed. |
window |
Length of the rolling window. If |
roll.length |
Number of observations by which each rolling/expanding window should be rolled forward. |
exclude.trunc |
The names of the predictor variables that should not be truncated for stable predictions, as a character string. (Since the nonlinear functions are estimated using splines, extrapolation is not desirable. Hence, any predictor variable treated non-linearly in the estimated model is truncated to the in-sample range before predictions are obtained. Variables listed here are excluded from such truncation.) |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colNames |
If |
na.rm |
logical; if |
verbose |
A named list controlling verbosity options. Defaults to
|
... |
Other arguments not currently used. |
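The truncation described under exclude.trunc clamps each non-excluded predictor to the range observed in the training data. A minimal base R sketch, assuming a simple min/max clamp; truncate_to_range() is a hypothetical helper, not a package function.

```r
# Hypothetical helper: clamp predictors to their in-sample range,
# skipping those named in exclude_trunc.
truncate_to_range <- function(newdata, train, vars, exclude_trunc = NULL) {
  for (v in setdiff(vars, exclude_trunc)) {
    rng <- range(train[[v]], na.rm = TRUE)
    # Clamp values below the training minimum / above the training maximum
    newdata[[v]] <- pmin(pmax(newdata[[v]], rng[1]), rng[2])
  }
  newdata
}

train <- data.frame(x1 = c(0, 0.5, 1), x2 = c(10, 20, 30))
test  <- data.frame(x1 = c(-0.2, 1.4), x2 = c(5, 40))
out <- truncate_to_range(test, train, vars = c("x1", "x2"), exclude_trunc = "x2")
out$x1  # 0 1   (clamped to the training range)
out$x2  # 5 40  (excluded from truncation)
```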
An object of class bb_cvforecast, which is a list that
contains the following elements:
x |
The original time series. |
method |
A character string "bb_cvforecast". |
fit_times |
The number of times the model is fitted in cross-validation. |
mean |
Point forecasts as a multivariate time series, where the
|
res |
The matrix of in-sample residuals produced in cross-validation.
The number of rows corresponds to |
model_fit |
Models fitted in cross-validation. |
level |
The confidence levels associated with the prediction intervals. |
lower |
A list containing
lower bounds for prediction intervals for each level. Each element within
the list will be a multivariate time series with the same dimensional
characteristics as |
upper |
A list containing upper bounds
for prediction intervals for each level. Each element within the list will
be a multivariate time series with the same dimensional characteristics as
|
possible_futures |
A list of matrices containing future sample paths generated at each cross-validation step. |
```r
if (requireNamespace("gurobi", quietly = TRUE)) {
  library(smimodel)
  library(dplyr)
  library(ROI)
  library(tibble)
  library(tidyr)
  library(tsibble)

  # Simulate data
  n <- 1105
  set.seed(123)
  sim_data <- tibble(x_lag_000 = runif(n)) |>
    mutate(
      # Add x_lags
      x_lag = lag_matrix(x_lag_000, 5)) |>
    unpack(x_lag, names_sep = "_") |>
    mutate(
      # Response variable
      y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 +
        (0.35*x_lag_002 + 0.7*x_lag_005)^2 + rnorm(n, sd = 0.1),
      # Add an index to the data set
      inddd = seq(1, n)) |>
    drop_na() |>
    select(inddd, y, starts_with("x_lag")) |>
    # Make the data set a `tsibble`
    as_tsibble(index = inddd)

  # Index variables
  index.vars <- colnames(sim_data)[3:8]
  # Training set
  sim_train <- sim_data[1:1000, ]
  # Test set
  sim_test <- sim_data[1001:1100, ]

  # Model fitting
  smimodel_ppr <- model_smimodel(data = sim_train, yvar = "y",
                                 index.vars = index.vars,
                                 initialise = "ppr")

  # Block bootstrap prediction intervals (3-steps-ahead interval forecasts)
  set.seed(12345)
  smimodel_ppr_bb <- bb_cvforecast(object = smimodel_ppr, data = sim_data,
                                   yvar = "y", predictor.vars = index.vars,
                                   h = 3, num.futures = 50, window = 1000)
}
```
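A rolling forecast origin moves a fixed-length training window forward by roll.length observations and forecasts the next h observations each time. The sketch below only generates the index sets for such a scheme; rolling_splits() is hypothetical, and it does not reproduce the package's handling of initial, forward or an expanding window (window = NULL).

```r
# Hypothetical helper: training/test index sets for a rolling forecast origin
rolling_splits <- function(n, window, h, roll_length = 1) {
  stopifnot(n > window)
  # End points of successive training windows
  ends <- seq(from = window, to = n - 1, by = roll_length)
  lapply(ends, function(e) {
    list(train = (e - window + 1):e,        # fixed-length training window
         test  = (e + 1):min(e + h, n))     # next h observations (fewer at the end)
  })
}

splits <- rolling_splits(n = 10, window = 6, h = 2, roll_length = 2)
splits[[1]]  # train 1:6, test 7:8
splits[[2]]  # train 3:8, test 9:10
```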
Generates possible future sample paths by applying the single season block bootstrap method.
```r
blockBootstrap(
  object,
  newdata,
  resids,
  preds,
  season.period = 1,
  m = 1,
  num.futures = 1000,
  exclude.trunc = NULL,
  recursive = FALSE,
  recursive_colRange = NULL
)
```
object |
Fitted model object. |
newdata |
Test data set. Must be a data set of class |
resids |
In-sample residuals from the fitted model. |
preds |
Predictions for the test set (i.e. data for the forecast horizon). |
season.period |
Length of the seasonal period. |
m |
Multiplier. (Block size = |
num.futures |
Number of possible future sample paths to be generated. |
exclude.trunc |
The names of the predictor variables that should not be truncated for stable predictions as a character string. |
recursive |
Whether to obtain recursive forecasts or not (default - FALSE). |
recursive_colRange |
If |
A matrix of simulated future sample paths.
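The idea behind block bootstrap sample paths is to resample contiguous blocks of in-sample residuals, preserving short-range dependence, and add them to the point forecasts. The sketch below is a generic moving block bootstrap in base R, not necessarily identical to the package's single season block bootstrap. It assumes block size = season_period * m, one path per row, and a percentile interval per horizon; block_bootstrap_paths() is hypothetical.

```r
# Hypothetical sketch of a moving block bootstrap of residuals
block_bootstrap_paths <- function(preds, resids, season_period = 1, m = 1,
                                  num_futures = 1000) {
  h <- length(preds)
  block_size <- season_period * m            # assumed block length
  n <- length(resids)
  stopifnot(n >= block_size)
  n_blocks <- ceiling(h / block_size)        # blocks needed to cover h steps
  paths <- matrix(NA_real_, nrow = num_futures, ncol = h)
  for (i in seq_len(num_futures)) {
    # Random start points of contiguous residual blocks
    starts <- sample.int(n - block_size + 1, n_blocks, replace = TRUE)
    e <- unlist(lapply(starts, function(s) resids[s:(s + block_size - 1)]))
    # Add the first h resampled residuals to the point forecasts
    paths[i, ] <- preds + e[seq_len(h)]
  }
  paths
}

set.seed(2024)
resids <- rnorm(200, sd = 0.5)   # stand-in for in-sample residuals
preds  <- c(1.0, 1.2, 1.4)       # stand-in point forecasts for h = 3
paths  <- block_bootstrap_paths(preds, resids, season_period = 2, m = 1,
                                num_futures = 500)
# Percentile-based 95% interval for each horizon (rows: lower, upper)
ints <- apply(paths, 2, quantile, probs = c(0.025, 0.975))
```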
Compute prediction intervals by applying the conformal bootstrap method to subsets of time series data using a rolling forecast origin.
```r
cb_cvforecast(
  object,
  data,
  yvar,
  neighbour = 0,
  predictor.vars,
  h = 1,
  ncal = 100,
  num.futures = 1000,
  level = c(80, 95),
  forward = TRUE,
  initial = 1,
  window = NULL,
  roll.length = 1,
  exclude.trunc = NULL,
  recursive = FALSE,
  recursive_colNames = NULL,
  na.rm = TRUE,
  nacheck_frac_numerator = 2,
  nacheck_frac_denominator = 3,
  verbose = list(solver = FALSE, progress = FALSE),
  ...
)
```
object |
Fitted model object of class |
data |
Data set. Must be a data set of class |
yvar |
Name of the response variable as a character string. |
neighbour |
If multiple models are fitted: Number of neighbours of each
key (i.e. grouping variable) to be considered in model fitting to handle
smoothing over the key. Should be an |
predictor.vars |
A character vector of names of the predictor variables. |
h |
Forecast horizon. |
ncal |
Length of a calibration window. |
num.futures |
Number of possible future sample paths to be generated in bootstrap. |
level |
Confidence level for prediction intervals. |
forward |
If |
initial |
Initial period of the time series where no cross-validation forecasting is performed. |
window |
Length of the rolling window. If |
roll.length |
Number of observations by which each rolling/expanding window should be rolled forward. |
exclude.trunc |
The names of the predictor variables that should not be truncated for stable predictions, as a character string. (Since the nonlinear functions are estimated using splines, extrapolation is not desirable. Hence, any predictor variable treated non-linearly in the estimated model is truncated to the in-sample range before predictions are obtained. Variables listed here are excluded from such truncation.) |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colNames |
If |
na.rm |
logical; if |
nacheck_frac_numerator |
Numerator of the fraction of non-missing values that is required in a test set. |
nacheck_frac_denominator |
Denominator of the fraction of non-missing values that is required in a test set. |
verbose |
A named list controlling verbosity options. Defaults to
|
... |
Other arguments not currently used. |
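Together, the two nacheck_frac_* arguments define a minimum fraction of non-missing values in a test set (2/3 by default). The sketch below implements such a check as a hypothetical helper, enough_nonmissing(), which is not part of smimodel. Whether the package compares with `>=` or `>` at the boundary is an assumption. The comparison uses integer cross-multiplication so that floating-point rounding cannot affect the boundary case.

```r
# Hypothetical helper, not part of smimodel: TRUE when at least
# numerator / denominator of the test-set values are non-missing.
enough_nonmissing <- function(y_test,
                              nacheck_frac_numerator = 2,
                              nacheck_frac_denominator = 3) {
  n_obs <- sum(!is.na(y_test))
  # Checks n_obs / length(y_test) >= numerator / denominator by cross-multiplying
  n_obs * nacheck_frac_denominator >= nacheck_frac_numerator * length(y_test)
}

enough_nonmissing(c(1.2, NA, 0.8))  # 2 of 3 observed: TRUE
enough_nonmissing(c(1.2, NA, NA))   # 1 of 3 observed: FALSE
```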
An object of class cb_cvforecast, which is a list that contains the following elements:

| Element | Description |
|---|---|
| x | The original time series. |
| method | A character string "cb_cvforecast". |
| fit_times | The number of times the model is fitted in cross-validation. |
| mean | Point forecasts as a multivariate time series, where the |
| error | Forecast errors given by |
| res | The matrix of in-sample residuals produced in cross-validation. |
| level | The confidence levels associated with the prediction intervals. |
| cal_times | The number of calibration windows considered in cross-validation. |
| num_cal | The number of non-missing multi-step forecast errors in each calibration window. |
| skip_cal | An indicator vector showing whether a calibration window is skipped without constructing prediction intervals because of a missing model or missing data in the test set. |
| lower | A list containing lower bounds for prediction intervals for each level. Each element within the list will be a multivariate time series with the same dimensional characteristics as |
| upper | A list containing upper bounds for prediction intervals for each level. Each element within the list will be a multivariate time series with the same dimensional characteristics as |
| possible_futures | A list of matrices containing future sample paths generated at each calibration step. |

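The possible_futures, lower and upper elements connect simulated sample paths to interval bounds. The sketch below illustrates only the classical block bootstrap ingredient. The hypothetical helpers block_bootstrap_paths() and percentile_intervals() (not smimodel functions) add contiguous blocks of resampled residuals to a set of point forecasts and then read off percentile bounds for each level. The conformal calibration over windows of length ncal is not reproduced, and the block length is an assumption.

```r
# Hypothetical helpers, not smimodel functions.
# Build num.futures sample paths of length h by adding contiguous blocks of
# resampled residuals to the point forecasts (classical block bootstrap).
block_bootstrap_paths <- function(point_fc, residuals, num.futures = 1000,
                                  block_len = length(point_fc)) {
  h <- length(point_fc)
  n <- length(residuals)
  paths <- matrix(NA_real_, nrow = num.futures, ncol = h)
  for (i in seq_len(num.futures)) {
    draws <- numeric(0)
    # Concatenate randomly started blocks until h errors have been drawn
    while (length(draws) < h) {
      start <- sample.int(n - block_len + 1, 1)
      draws <- c(draws, residuals[start:(start + block_len - 1)])
    }
    paths[i, ] <- point_fc + draws[seq_len(h)]
  }
  paths
}

# Percentile bounds for each horizon (columns) and each confidence level
percentile_intervals <- function(paths, level = c(80, 95)) {
  lapply(setNames(level, paste0(level, "%")), function(l) {
    a <- (100 - l) / 200
    rbind(lower = apply(paths, 2, quantile, probs = a, names = FALSE),
          upper = apply(paths, 2, quantile, probs = 1 - a, names = FALSE))
  })
}

set.seed(1)
res <- rnorm(200, sd = 0.1)   # stand-in for in-sample residuals
paths <- block_bootstrap_paths(point_fc = c(1, 1.1, 1.2), residuals = res,
                               num.futures = 500)
intervals <- percentile_intervals(paths)
intervals[["95%"]]
```

Resampling whole blocks, rather than individual residuals, keeps short-range serial dependence intact within each simulated path. This is the motivation for using a block bootstrap with time series.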
if (requireNamespace("gurobi", quietly = TRUE)) {
  library(dplyr)
  library(ROI)
  library(tibble)
  library(tidyr)
  library(tsibble)
  # Simulate data
  n <- 1105
  set.seed(123)
  sim_data <- tibble(x_lag_000 = runif(n)) |>
    mutate(
      # Add x_lags
      x_lag = lag_matrix(x_lag_000, 5)) |>
    unpack(x_lag, names_sep = "_") |>
    mutate(
      # Response variable
      y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + (0.35*x_lag_002 + 0.7*x_lag_005)^2 + rnorm(n, sd = 0.1),
      # Add an index to the data set
      inddd = seq(1, n)) |>
    drop_na() |>
    select(inddd, y, starts_with("x_lag")) |>
    # Make the data set a `tsibble`
    as_tsibble(index = inddd)
  # Index variables
  index.vars <- colnames(sim_data)[3:8]
  # Training set
  sim_train <- sim_data[1:1000, ]
  # Test set
  sim_test <- sim_data[1001:1100, ]
  # Model fitting
  smimodel_ppr <- model_smimodel(data = sim_train, yvar = "y",
                                 index.vars = index.vars,
                                 initialise = "ppr")
  # Conformal bootstrap prediction intervals (3-steps-ahead interval forecasts)
  set.seed(12345)
  smimodel_ppr_cb <- cb_cvforecast(object = smimodel_ppr, data = sim_data,
                                   yvar = "y", predictor.vars = index.vars,
                                   h = 3, ncal = 30, num.futures = 100,
                                   window = 1000)
}
Eliminates a specified variable, fits a nonparametric additive model with the
remaining variables, and returns the validation set MSE. This is an internal
function of the package and is designed to be called from
model_backward.
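The sketch below illustrates one elimination step. The function eliminate_sketch() is hypothetical. It uses lm() in place of the nonparametric additive model so that the example runs with base R only. The selection criterion, validation set MSE, follows the description above.

```r
# Hypothetical stand-in for one elimination step (not the smimodel code):
# drop the predictor at position `ind`, refit on the training set and
# return the validation set MSE. lm() replaces the additive model here.
eliminate_sketch <- function(ind, train, val, yvar, vars) {
  keep <- vars[-ind]
  fit <- lm(reformulate(keep, response = yvar), data = train)
  pred <- predict(fit, newdata = val)
  mean((val[[yvar]] - pred)^2)
}

make_data <- function(n) {
  d <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
  d$y <- 2 * d$x1 + 0.5 * d$x2 + rnorm(n, sd = 0.1)  # x3 is irrelevant
  d
}
set.seed(123)
train <- make_data(300)
val <- make_data(100)
vars <- c("x1", "x2", "x3")
# Validation MSE after removing each predictor in turn
mse <- sapply(seq_along(vars), eliminate_sketch,
              train = train, val = val, yvar = "y", vars = vars)
names(mse) <- vars
mse
```

Removing x1 hurts the most. Removing the irrelevant x3 leaves the validation MSE close to the noise variance. This is the kind of comparison a backward elimination uses to decide which variable to drop.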
eliminate( ind, train, val, yvar, family = gaussian(), s.vars = NULL, s.basedim = NULL, linear.vars = NULL, exclude.trunc = NULL, recursive = FALSE, recursive_colRange = NULL )

| Argument | Description |
|---|---|
| ind | An |
| train | The data set on which the model(s) will be trained. Must be a data set of class |
| val | Validation data set (the data set on which the model selection will be performed). Must be a data set of class |
| yvar | Name of the response variable as a character string. |
| family | A description of the error distribution and link function to be used in the model (see |
| s.vars | A |
| s.basedim | Dimension of the bases used to represent the smooth terms corresponding to |
| linear.vars | A |
| exclude.trunc | The names of the predictor variables that should not be truncated for stable predictions, as a character string. (Since the nonlinear functions are estimated using splines, extrapolation is not desirable. Hence, if any predictor variable in |
| recursive | Whether to obtain recursive forecasts or not (default - FALSE). |
| recursive_colRange | If |

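The truncation described under exclude.trunc can be sketched as clamping each predictor in the new data to its training range. truncate_predictors() is a hypothetical helper, not the package's internal code.

```r
# Hypothetical helper: clamp predictors in new data to the range seen in the
# training data, leaving any variables named in exclude.trunc untouched.
truncate_predictors <- function(newdata, train, vars, exclude.trunc = NULL) {
  for (v in setdiff(vars, exclude.trunc)) {
    rng <- range(train[[v]], na.rm = TRUE)
    newdata[[v]] <- pmin(pmax(newdata[[v]], rng[1]), rng[2])
  }
  newdata
}

train <- data.frame(x1 = c(0, 0.5, 1), x2 = c(0, 0.5, 1))
test <- data.frame(x1 = c(-0.2, 1.4), x2 = c(-0.2, 1.4))
out <- truncate_predictors(test, train, vars = c("x1", "x2"),
                           exclude.trunc = "x2")
out  # x1 is clamped to 0 and 1; x2 keeps -0.2 and 1.4
```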
A numeric value: the validation set MSE.
Returns forecasts and other information for nonparametric additive models with backward elimination.
## S3 method for class 'backward' forecast( object, h = 1, level = c(80, 95), newdata, exclude.trunc = NULL, recursive = FALSE, recursive_colRange = NULL, ... )

| Argument | Description |
|---|---|
| object | An object of class |
| h | Forecast horizon. |
| level | Confidence level for prediction intervals. |
| newdata | The set of new data for which the forecasts are required (i.e. test set; should be a |
| exclude.trunc | The names of the predictor variables that should not be truncated for stable predictions, as a character string. |
| recursive | Whether to obtain recursive forecasts or not (default - FALSE). |
| recursive_colRange | If |
| ... | Other arguments not currently used. |

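With recursive = TRUE, each forecast is fed back into the lagged-response columns of later rows of newdata. The sketch below assumes that the lag columns are identified by name (the role recursive_colRange plays with column positions) and uses a toy linear predictor. Both recursive_forecast() and ar_pred() are hypothetical.

```r
# Hypothetical sketch of recursive forecasting (not the smimodel code).
# lag_cols[j] names the column holding the response lagged by j steps.
recursive_forecast <- function(predict_fun, newdata, lag_cols) {
  h <- nrow(newdata)
  fc <- numeric(h)
  for (i in seq_len(h)) {
    fc[i] <- predict_fun(newdata[i, , drop = FALSE])
    # Feed the new forecast into the lag columns of later rows
    for (j in seq_along(lag_cols)) {
      if (i + j <= h) newdata[i + j, lag_cols[j]] <- fc[i]
    }
  }
  fc
}

# Toy predictor: y_t = 0.5 * y_{t-1} + 0.2 * y_{t-2}
ar_pred <- function(row) 0.5 * row$y_lag_001 + 0.2 * row$y_lag_002
newdata <- data.frame(y_lag_001 = c(1, NA, NA), y_lag_002 = c(2, 1, NA))
fc <- recursive_forecast(ar_pred, newdata,
                         lag_cols = c("y_lag_001", "y_lag_002"))
fc  # 0.9, 0.65, 0.505
```

The second forecast uses the first forecast as its one-step lag. The third forecast uses both earlier forecasts, so the unknown future values of the response never have to be supplied.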
An object of class forecast. Here, it is a list containing the following elements:

| Element | Description |
|---|---|
| method | The name of the forecasting method as a character string. |
| model | The fitted model. |
| mean | Point forecasts as a time series. |
| residuals | Residuals from the fitted model. |
| fitted | Fitted values (one-step forecasts). |

library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n <- 1215
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
  mutate(
    # Add x_lags
    x_lag = lag_matrix(x_lag_000, 5)) |>
  unpack(x_lag, names_sep = "_") |>
  mutate(
    # Response variable
    y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
    # Add an index to the data set
    inddd = seq(1, n)) |>
  drop_na() |>
  select(inddd, y, starts_with("x_lag")) |>
  # Make the data set a `tsibble`
  as_tsibble(index = inddd)
# Training set
sim_train <- sim_data[1:1000, ]
# Validation set
sim_val <- sim_data[1001:1200, ]
# Test set
sim_test <- sim_data[1201:1210, ]
# Predictors taken as non-linear variables
s.vars <- colnames(sim_data)[3:8]
# Model fitting
backwardModel <- model_backward(data = sim_train, val.data = sim_val,
                                yvar = "y", s.vars = s.vars)
forecast(backwardModel, newdata = sim_test)
Returns forecasts and other information for GAIMs.
## S3 method for class 'gaimFit' forecast( object, h = 1, level = c(80, 95), newdata, exclude.trunc = NULL, recursive = FALSE, recursive_colRange = NULL, ... )

| Argument | Description |
|---|---|
| object | An object of class |
| h | Forecast horizon. |
| level | Confidence level for prediction intervals. |
| newdata | The set of new data for which the forecasts are required (i.e. test set; should be a |
| exclude.trunc | The names of the predictor variables that should not be truncated for stable predictions, as a character string. |
| recursive | Whether to obtain recursive forecasts or not (default - FALSE). |
| recursive_colRange | If |
| ... | Other arguments not currently used. |

An object of class forecast. Here, it is a list containing the following elements:

| Element | Description |
|---|---|
| method | The name of the forecasting method as a character string. |
| model | The fitted model. |
| mean | Point forecasts as a time series. |
| residuals | Residuals from the fitted model. |
| fitted | Fitted values (one-step forecasts). |

library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n <- 1015
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
  mutate(
    # Add x_lags
    x_lag = lag_matrix(x_lag_000, 5)) |>
  unpack(x_lag, names_sep = "_") |>
  mutate(
    # Response variable
    y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
    # Add an index to the data set
    inddd = seq(1, n)) |>
  drop_na() |>
  select(inddd, y, starts_with("x_lag")) |>
  # Make the data set a `tsibble`
  as_tsibble(index = inddd)
# Training set
sim_train <- sim_data[1:1000, ]
# Test set
sim_test <- sim_data[1001:1010, ]
# Predictors taken as index variables
index.vars <- colnames(sim_data)[3:7]
# Assign group indices for each predictor
index.ind <- c(rep(1, 3), rep(2, 2))
# Predictors taken as non-linear variables not entering indices
s.vars <- "x_lag_005"
# Model fitting
gaimModel <- model_gaim(data = sim_train, yvar = "y",
                        index.vars = index.vars, index.ind = index.ind,
                        s.vars = s.vars)
forecast(gaimModel, newdata = sim_test)
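In the example above, index.ind = c(rep(1, 3), rep(2, 2)) places the first three index variables in index 1 and the last two in index 2. The sketch below shows how such a grouping vector turns predictors into index values, each a linear combination of its group's predictors. The coefficients in alpha are made up for illustration; they are not estimates from model_gaim().

```r
# Illustrative only: alpha holds made-up coefficients, one per index variable.
index.vars <- c("x_lag_000", "x_lag_001", "x_lag_002", "x_lag_003", "x_lag_004")
index.ind <- c(rep(1, 3), rep(2, 2))
alpha <- c(0.8, 0.5, 0.3, 0.6, 0.8)

set.seed(123)
X <- matrix(runif(10 * 5), ncol = 5, dimnames = list(NULL, index.vars))

# Predictor names belonging to each index
groups <- split(index.vars, index.ind)
# One column per index: the weighted sum of that group's predictors
indices <- sapply(groups, function(vars) {
  X[, vars, drop = FALSE] %*% alpha[match(vars, index.vars)]
})
dim(indices)  # 10 observations, 2 indices
```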
Returns forecasts and other information for GAMs.
## S3 method for class 'gamFit' forecast( object, h = 1, level = c(80, 95), newdata, exclude.trunc = NULL, recursive = FALSE, recursive_colRange = NULL, ... )

| Argument | Description |
|---|---|
| object | An object of class |
| h | Forecast horizon. |
| level | Confidence level for prediction intervals. |
| newdata | The set of new data for which the forecasts are required (i.e. test set; should be a |
| exclude.trunc | The names of the predictor variables that should not be truncated for stable predictions, as a character string. |
| recursive | Whether to obtain recursive forecasts or not (default - FALSE). |
| recursive_colRange | If |
| ... | Other arguments not currently used. |

An object of class forecast. Here, it is a list containing the following elements:

| Element | Description |
|---|---|
| method | The name of the forecasting method as a character string. |
| model | The fitted model. |
| mean | Point forecasts as a time series. |
| residuals | Residuals from the fitted model. |
| fitted | Fitted values (one-step forecasts). |

library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n <- 1015
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
  mutate(
    # Add x_lags
    x_lag = lag_matrix(x_lag_000, 5)) |>
  unpack(x_lag, names_sep = "_") |>
  mutate(
    # Response variable
    y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
    # Add an index to the data set
    inddd = seq(1, n)) |>
  drop_na() |>
  select(inddd, y, starts_with("x_lag")) |>
  # Make the data set a `tsibble`
  as_tsibble(index = inddd)
# Training set
sim_train <- sim_data[1:1000, ]
# Test set
sim_test <- sim_data[1001:1010, ]
# Predictors taken as non-linear variables
s.vars <- colnames(sim_data)[3:6]
# Predictors taken as linear variables
linear.vars <- colnames(sim_data)[7:8]
# Model fitting
gamModel <- model_gam(data = sim_train, yvar = "y",
                      s.vars = s.vars, linear.vars = linear.vars)
forecast(gamModel, newdata = sim_test)
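The split between s.vars and linear.vars can be read as a model formula with smooth terms for the former and linear terms for the latter. The sketch below assumes mgcv-style s() notation. It only builds the formula with base R's reformulate() and does not claim to reproduce how model_gam() constructs its model.

```r
# Sketch assuming mgcv-style smooth terms; only the formula is built here.
s.vars <- c("x_lag_000", "x_lag_001", "x_lag_002", "x_lag_003")
linear.vars <- c("x_lag_004", "x_lag_005")
gam_formula <- reformulate(c(sprintf("s(%s)", s.vars), linear.vars),
                           response = "y")
gam_formula
```

Fitting this formula would require mgcv::gam(), which is not loaded here.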
Returns forecasts and other information for PPR models.
## S3 method for class 'pprFit' forecast( object, h = 1, level = c(80, 95), newdata, exclude.trunc = NULL, recursive = FALSE, recursive_colRange = NULL, ... )

| Argument | Description |
|---|---|
| object | An object of class |
| h | Forecast horizon. |
| level | Confidence level for prediction intervals. |
| newdata | The set of new data for which the forecasts are required (i.e. test set; should be a |
| exclude.trunc | The names of the predictor variables that should not be truncated for stable predictions, as a character string. |
| recursive | Whether to obtain recursive forecasts or not (default - FALSE). |
| recursive_colRange | If |
| ... | Other arguments not currently used. |

An object of class forecast. Here, it is a list containing the following elements:

| Element | Description |
|---|---|
| method | The name of the forecasting method as a character string. |
| model | The fitted model. |
| mean | Point forecasts as a time series. |
| residuals | Residuals from the fitted model. |
| fitted | Fitted values (one-step forecasts). |

library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n <- 1015
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
  mutate(
    # Add x_lags
    x_lag = lag_matrix(x_lag_000, 5)) |>
  unpack(x_lag, names_sep = "_") |>
  mutate(
    # Response variable
    y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
    # Add an index to the data set
    inddd = seq(1, n)) |>
  drop_na() |>
  select(inddd, y, starts_with("x_lag")) |>
  # Make the data set a `tsibble`
  as_tsibble(index = inddd)
# Training set
sim_train <- sim_data[1:1000, ]
# Test set
sim_test <- sim_data[1001:1010, ]
# Index variables
index.vars <- colnames(sim_data)[3:8]
# Model fitting
pprModel <- model_ppr(data = sim_train, yvar = "y", index.vars = index.vars)
forecast(pprModel, newdata = sim_test)
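For comparison, projection pursuit regression is available directly in base R as stats::ppr(). This standalone sketch fits and forecasts with it on data simulated in the same spirit as the example above. It does not call model_ppr(), and nterms = 2 is an arbitrary choice.

```r
# Standalone analogue using stats::ppr() directly; model_ppr() is not called.
set.seed(123)
n <- 500
d <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
d$y <- (0.9 * d$x1 + 0.6 * d$x2)^3 + rnorm(n, sd = 0.1)
train <- d[1:490, ]
test <- d[491:500, ]
# Two ridge terms, i.e. two projection directions
fit <- ppr(y ~ x1 + x2 + x3, data = train, nterms = 2)
fc <- predict(fit, newdata = test)
round(fit$alpha, 2)  # one row of projection coefficients per ridge term
```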
Returns forecasts and other information for SMI models.
## S3 method for class 'smimodel' forecast( object, h = 1, level = c(80, 95), newdata, exclude.trunc = NULL, recursive = FALSE, recursive_colRange = NULL, ... )

| Argument | Description |
|---|---|
| object | An object of class |
| h | Forecast horizon. |
| level | Confidence level for prediction intervals. |
| newdata | The set of new data for which the forecasts are required (i.e. test set; should be a |
| exclude.trunc | The names of the predictor variables that should not be truncated for stable predictions, as a character string. |
| recursive | Whether to obtain recursive forecasts or not (default - FALSE). |
| recursive_colRange | If |
| ... | Other arguments not currently used. |

An object of class forecast. Here, it is a list containing the following elements:

| Element | Description |
|---|---|
| method | The name of the forecasting method as a character string. |
| model | The fitted model. |
| mean | Point forecasts as a time series. |
| residuals | Residuals from the fitted model. |
| fitted | Fitted values (one-step forecasts). |

if (requireNamespace("gurobi", quietly = TRUE)) {
  library(dplyr)
  library(ROI)
  library(tibble)
  library(tidyr)
  library(tsibble)
  # Simulate data
  n <- 1015
  set.seed(123)
  sim_data <- tibble(x_lag_000 = runif(n)) |>
    mutate(
      # Add x_lags
      x_lag = lag_matrix(x_lag_000, 5)) |>
    unpack(x_lag, names_sep = "_") |>
    mutate(
      # Response variable
      y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
      # Add an index to the data set
      inddd = seq(1, n)) |>
    drop_na() |>
    select(inddd, y, starts_with("x_lag")) |>
    # Make the data set a `tsibble`
    as_tsibble(index = inddd)
  # Training set
  sim_train <- sim_data[1:1000, ]
  # Test set
  sim_test <- sim_data[1001:1010, ]
  # Index variables
  index.vars <- colnames(sim_data)[3:8]
  # Model fitting
  smimodel_ppr <- model_smimodel(data = sim_train, yvar = "y",
                                 index.vars = index.vars,
                                 initialise = "ppr")
  forecast(smimodel_ppr, newdata = sim_test)
}
Performs a greedy search over a given grid of penalty parameter combinations (lambda0, lambda2), and fits SMI model(s) with best (lowest validation set MSE) penalty parameter combination(s). If the optimal combination lies on the edge of the grid, the penalty parameters are adjusted by ±10%, and a second round of grid search is performed. If a grouping variable is used, penalty parameters are tuned separately for each individual model.
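The greedy search can be pictured as moving through the (lambda0, lambda2) grid one neighbouring cell at a time while the validation MSE keeps falling. This is a hedged sketch. The function greedy_grid_search(), the starting cell and the neighbour rule are all assumptions, and toy_mse() stands in for fitting an SMI model and computing the validation set MSE.

```r
# Hypothetical sketch (not the smimodel algorithm): start at one grid cell
# and move to the best neighbouring cell while validation MSE improves.
greedy_grid_search <- function(lambda0_seq, lambda2_seq, val_mse,
                               start = c(1, 1)) {
  cur <- start
  cur_mse <- val_mse(lambda0_seq[cur[1]], lambda2_seq[cur[2]])
  moves <- rbind(c(1, 0), c(-1, 0), c(0, 1), c(0, -1))
  repeat {
    # Cells one step away in either direction, kept inside the grid
    cand <- sweep(moves, 2, cur, "+")
    ok <- cand[, 1] >= 1 & cand[, 1] <= length(lambda0_seq) &
      cand[, 2] >= 1 & cand[, 2] <= length(lambda2_seq)
    cand <- cand[ok, , drop = FALSE]
    mse <- apply(cand, 1, function(p) val_mse(lambda0_seq[p[1]], lambda2_seq[p[2]]))
    if (min(mse) >= cur_mse) break  # no neighbour improves: stop
    cur <- cand[which.min(mse), ]
    cur_mse <- min(mse)
  }
  list(lambda0 = lambda0_seq[cur[1]], lambda2 = lambda2_seq[cur[2]],
       mse = cur_mse)
}

# Toy stand-in for "fit an SMI model and return validation set MSE"
toy_mse <- function(l0, l2) log10(l0)^2 + (log10(l2) - 1)^2
res <- greedy_grid_search(10^(-2:2), 10^(-2:2), toy_mse)
res
```

On this toy surface the search stops at lambda0 = 1 and lambda2 = 10, where toy_mse() is zero. A greedy search evaluates only the cells along its path rather than the whole grid, which matters when each evaluation is an MIP fit.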
greedy_smimodel( data, val.data, yvar, neighbour = 0, family = gaussian(), index.vars, initialise = c("ppr", "additive", "linear", "multiple", "userInput"), num_ind = 5, num_models = 5, seed = 123, index.ind = NULL, index.coefs = NULL, s.vars = NULL, linear.vars = NULL, nlambda = 100, lambda.min.ratio = 1e-04, refit = TRUE, M = 10, max.iter = 50, tol = 0.001, tolCoefs = 0.001, TimeLimit = Inf, MIPGap = 1e-04, NonConvex = -1, verbose = list(solver = FALSE, progress = FALSE), parallel = FALSE, workers = NULL, exclude.trunc = NULL, recursive = FALSE, recursive_colRange = NULL )

| Argument | Description |
|---|---|
| data | Training data set on which models will be trained. Must be a data set of class |
| val.data | Validation data set (the data set on which the penalty parameter selection will be performed). Must be a data set of class |
| yvar | Name of the response variable as a character string. |
| neighbour | If multiple models are fitted: number of neighbours of each key (i.e. grouping variable) to be considered in model fitting to handle smoothing over the key. Should be an |
| family | A description of the error distribution and link function to be used in the model (see |
| index.vars | A |
| initialise | The model structure with which the estimation process should be initialised. The default is |
| num_ind | If |
| num_models | If |
| seed | If |
| index.ind | If |
| index.coefs | If |
| s.vars | A |
| linear.vars | A |
| nlambda | The number of values for lambda0 (penalty parameter for the L0 penalty); default is 100. |
| lambda.min.ratio | Smallest value for lambda0, as a fraction of lambda0.max (data derived). |
| refit | Whether to refit the model combining training and validation sets after parameter tuning. If |
| M | Big-M value used in MIP. |
| max.iter | Maximum number of MIP iterations performed to update index coefficients for a given model. |
| tol | Tolerance for the objective function value (loss) of MIP. |
| tolCoefs | Tolerance for coefficients. |
| TimeLimit | A limit for the total time (in seconds) expended in a single MIP iteration. |
| MIPGap | Relative MIP optimality gap. |
| NonConvex | The strategy for handling non-convex quadratic objectives or non-convex quadratic constraints in the Gurobi solver. |
| verbose | A named list controlling verbosity options. Defaults to list(solver = FALSE, progress = FALSE). |
| parallel | The option to use parallel processing in fitting SMI models for different penalty parameter combinations. |
| workers | If |
| exclude.trunc | The names of the predictor variables that should not be truncated for stable predictions, as a character string. (Since the nonlinear functions are estimated using splines, extrapolation is not desirable. Hence, if any predictor variable in |
| recursive | Whether to obtain recursive forecasts or not (default - FALSE). |
| recursive_colRange | If |

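nlambda and lambda.min.ratio define the lambda0 sequence relative to a data-derived lambda0.max. The sketch below assumes a log-spaced sequence, as in glmnet-style regularisation paths. The function lambda0_grid() is hypothetical, and lambda0.max is supplied directly rather than derived from data.

```r
# Sketch assuming a log-spaced sequence from lambda0_max down to
# lambda.min.ratio * lambda0_max; lambda0_grid() is hypothetical.
lambda0_grid <- function(lambda0_max, nlambda = 100, lambda.min.ratio = 1e-04) {
  exp(seq(log(lambda0_max), log(lambda0_max * lambda.min.ratio),
          length.out = nlambda))
}

grid <- lambda0_grid(lambda0_max = 50, nlambda = 5, lambda.min.ratio = 0.1)
signif(grid, 3)  # five values decreasing from 50 to 5
```

Log spacing puts as many grid points between 5 and 50 as between 0.5 and 5. This suits penalties whose effect is roughly multiplicative.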
An object of class smimodel. This is a tibble with two columns:

| Column | Description |
|---|---|
| key | The level of the grouping variable (i.e. key of the training data set). |
| fit | Information of the fitted model corresponding to the |

Each row of the column fit contains a list with six elements:

| Element | Description |
|---|---|
| initial | A list of information of the model initialisation. (For descriptions of the list elements see |
| best | A list of information of the final optimised model. (For descriptions of the list elements see |
| best_lambdas | Selected penalty parameter combination. |
| lambda0_seq | Sequence of values for lambda0 used to construct the initial grid. |
| lambda2_seq | Sequence of values for lambda2 used to construct the initial grid. |
| searched | A |

The number of rows of the tibble equals the number of levels in the grouping variable.
if (requireNamespace("gurobi", quietly = TRUE)) {
  library(dplyr)
  library(ROI)
  library(tibble)
  library(tidyr)
  library(tsibble)
  # Simulate data
  n <- 1205
  set.seed(123)
  sim_data <- tibble(x_lag_000 = runif(n)) |>
    mutate(
      # Add x_lags
      x_lag = lag_matrix(x_lag_000, 5)) |>
    unpack(x_lag, names_sep = "_") |>
    mutate(
      # Response variable
      y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
      # Add an index to the data set
      inddd = seq(1, n)) |>
    drop_na() |>
    select(inddd, y, starts_with("x_lag")) |>
    # Make the data set a `tsibble`
    as_tsibble(index = inddd)
  # Training set
  sim_train <- sim_data[1:1000, ]
  # Validation set
  sim_val <- sim_data[1001:1200, ]
  # Index variables
  index.vars <- colnames(sim_data)[3:8]
  # Model fitting
  smi_greedy <- greedy_smimodel(data = sim_train, val.data = sim_val,
                                yvar = "y", index.vars = index.vars,
                                initialise = "ppr", lambda.min.ratio = 0.1)
  # Best (optimised) fitted model
  smi_greedy$fit[[1]]$best
  # Selected penalty parameter combination
  smi_greedy$fit[[1]]$best_lambdas
}
Performs a greedy search over a given grid of penalty parameter
combinations (lambda0, lambda2) and fits a single SMI model with the best
(lowest validation set MSE) penalty parameter combination. If the optimal
combination lies on the edge of the grid, the penalty parameters are adjusted
by ±10%, and a second round of grid search is performed. This is a helper
function designed to be called from greedy_smimodel.
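The edge rule can be sketched for a single penalty parameter as follows. The same check would apply to lambda0 and lambda2 separately. The function expand_if_on_edge() is hypothetical, and the exact second-round candidate set ({0.9b, b, 1.1b} for a selected value b) is an assumption based on the ±10% adjustment described above.

```r
# Hypothetical sketch of the edge rule for one penalty parameter: if the
# selected value b is the smallest or largest grid value, return new
# candidates {0.9 b, b, 1.1 b}; otherwise no second round is needed.
expand_if_on_edge <- function(best, grid) {
  if (!(best %in% range(grid))) return(NULL)  # interior optimum
  c(0.9, 1, 1.1) * best
}

expand_if_on_edge(best = 100, grid = c(1, 10, 100))  # on the upper edge
expand_if_on_edge(best = 10, grid = c(1, 10, 100))   # interior: NULL
```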
greedy.fit( data, val.data, yvar, neighbour = 0, family = gaussian(), index.vars, initialise = c("ppr", "additive", "linear", "multiple", "userInput"), num_ind = 5, num_models = 5, seed = 123, index.ind = NULL, index.coefs = NULL, s.vars = NULL, linear.vars = NULL, nlambda = 100, lambda.min.ratio = 1e-04, refit = TRUE, M = 10, max.iter = 50, tol = 0.001, tolCoefs = 0.001, TimeLimit = Inf, MIPGap = 1e-04, NonConvex = -1, verbose = list(solver = FALSE, progress = FALSE), parallel = FALSE, workers = NULL, exclude.trunc = NULL, recursive = FALSE, recursive_colRange = NULL )

| Argument | Description |
|---|---|
| data | Training data set on which models will be trained. Must be a data set of class |
| val.data | Validation data set (the data set on which the penalty parameter selection will be performed). Must be a data set of class |
| yvar | Name of the response variable as a character string. |
| neighbour | |
| family | A description of the error distribution and link function to be used in the model (see |
| index.vars | A |
| initialise | The model structure with which the estimation process should be initialised. The default is |
| num_ind | If |
| num_models | If |
| seed | If |
| index.ind | If |
| index.coefs | If |
| s.vars | A |
| linear.vars | A |
| nlambda | The number of values for lambda0 (penalty parameter for the L0 penalty); default is 100. |
| lambda.min.ratio | Smallest value for lambda0, as a fraction of lambda0.max (data derived). |
| refit | Whether to refit the model combining training and validation sets after parameter tuning. If |
| M | Big-M value used in MIP. |
| max.iter | Maximum number of MIP iterations performed to update index coefficients for a given model. |
| tol | Tolerance for the objective function value (loss) of MIP. |
| tolCoefs | Tolerance for coefficients. |
| TimeLimit | A limit for the total time (in seconds) expended in a single MIP iteration. |
| MIPGap | Relative MIP optimality gap. |
| NonConvex | The strategy for handling non-convex quadratic objectives or non-convex quadratic constraints in the Gurobi solver. |
| verbose | A named list controlling verbosity options. Defaults to list(solver = FALSE, progress = FALSE). |
| parallel | The option to use parallel processing in fitting SMI models for different penalty parameter combinations. |
| workers | If |
| exclude.trunc | The names of the predictor variables that should not be truncated for stable predictions, as a character string. (Since the nonlinear functions are estimated using splines, extrapolation is not desirable. Hence, if any predictor variable in |
| recursive | Whether to obtain recursive forecasts or not (default - FALSE). |
| recursive_colRange | If |

A list that contains six elements:

| Element | Description |
|---|---|
| initial | A list of information of the model initialisation. (For descriptions of the list elements see |
| best | A list of information of the final optimised model. (For descriptions of the list elements see |
| best_lambdas | Selected penalty parameter combination. |
| lambda0_seq | Sequence of values for lambda0 used to construct the initial grid. |
| lambda2_seq | Sequence of values for lambda2 used to construct the initial grid. |
| searched | A |

Initialises index coefficient vector through linear regression or penalised linear regression.
init_alpha( Y, X, index.ind, init.type = "penalisedReg", lambda0 = 1, lambda2 = 1, M = 10 )

| Argument | Description |
|---|---|
| Y | Column matrix of response. |
| X | Matrix of predictors entering indices. |
| index.ind | An |
| init.type | Type of initialisation for index coefficients. ( |
| lambda0 | If |
| lambda2 | If |
| M | If |

A list containing the following components:

| Component | Description |
|---|---|
| alpha_init | Normalised vector of index coefficients. |
| alpha_nonNormalised | Non-normalised (i.e. prior to normalising) vector of index coefficients. |

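The sketch below shows an initialisation based on plain linear regression. It assumes that "normalised" means scaling the coefficients of each index to unit Euclidean norm. The function init_alpha_sketch() is hypothetical: it uses lm.fit() without an intercept and does not cover the penalised option.

```r
# Sketch, not the smimodel code: least-squares starting values (no
# intercept), then each index's coefficients scaled to unit Euclidean norm.
init_alpha_sketch <- function(Y, X, index.ind) {
  alpha_raw <- as.vector(lm.fit(X, drop(Y))$coefficients)
  alpha <- alpha_raw
  for (g in unique(index.ind)) {
    pos <- which(index.ind == g)
    nrm <- sqrt(sum(alpha_raw[pos]^2))
    if (nrm > 0) alpha[pos] <- alpha_raw[pos] / nrm
  }
  list(alpha_init = alpha, alpha_nonNormalised = alpha_raw)
}

set.seed(123)
X <- matrix(runif(200 * 4), ncol = 4)
# Column matrix of response, as init_alpha() expects for Y
Y <- matrix(X %*% c(2, 1, 0.5, 0.5) + rnorm(200, sd = 0.1), ncol = 1)
out <- init_alpha_sketch(Y, X, index.ind = c(1, 1, 2, 2))
out$alpha_init
```

Normalising each index separates the direction of the index (the coefficients) from its scale, which the nonlinear function can absorb.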
Iteratively updates index coefficients and non-linear functions using mixed
integer programming. (A helper function used within
update_smimodelFit; users are not expected to directly call
this function.)
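For orientation, a generic big-M formulation of L0 + L2 penalised least squares is shown below. It is an assumption about the general shape of the MIP solved at each update, not the package's exact objective. Here $\tilde{y}_i$ and $\tilde{\mathbf{x}}_i$ denote a working response and working predictors at the current iterate, $\alpha_j$ are index coefficients, $z_j$ are binary selection indicators, and $M$, $\lambda_0$ and $\lambda_2$ correspond to the M, lambda0 and lambda2 arguments.

$$
\min_{\boldsymbol{\alpha},\,\mathbf{z}} \; \sum_{i=1}^{n} \left( \tilde{y}_i - \tilde{\mathbf{x}}_i^{\top} \boldsymbol{\alpha} \right)^2 + \lambda_0 \sum_{j=1}^{p} z_j + \lambda_2 \sum_{j=1}^{p} \alpha_j^2
\quad \text{subject to} \quad |\alpha_j| \le M z_j, \quad z_j \in \{0, 1\}, \quad j = 1, \dots, p.
$$

When $z_j = 0$, the big-M constraint forces $\alpha_j = 0$. Hence $\sum_j z_j$ counts the selected predictors, $\lambda_0$ is the price paid for each one, and $M$ must be large enough not to bind on any coefficient that should be nonzero.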
inner_update( x, data, yvar, family = gaussian(), index.vars, s.vars, linear.vars, num_ind, dgz, alpha_old, lambda0 = 1, lambda2 = 1, M = 10, max.iter = 50, tol = 0.001, TimeLimit = Inf, MIPGap = 1e-04, NonConvex = -1, verbose = list(solver = FALSE, progress = FALSE) )
x |
Fitted |
data |
Training data set on which models will be trained. Should be a
|
yvar |
Name of the response variable as a character string. |
family |
A description of the error distribution and link function to be
used in the model (see |
index.vars |
A |
s.vars |
A |
linear.vars |
A |
num_ind |
Number of indices. |
dgz |
The |
alpha_old |
Current vector of index coefficients. |
lambda0 |
Penalty parameter for L0 penalty. |
lambda2 |
Penalty parameter for L2 penalty. |
M |
Big-M value to be used in MIP. |
max.iter |
Maximum number of MIP iterations performed to update index coefficients for a given model. |
tol |
Tolerance for the objective function value (loss) of MIP. |
TimeLimit |
A limit for the total time (in seconds) expended in a single MIP iteration. |
MIPGap |
Relative MIP optimality gap. |
NonConvex |
The strategy for handling non-convex quadratic objectives or non-convex quadratic constraints in Gurobi solver. |
verbose |
A named list controlling verbosity options. Defaults to
|
A list containing the following elements:
best_alpha |
The vector of best index coefficient estimates. |
min_loss |
Minimum value of the objective function (loss). |
index.ind |
An |
ind_pos |
A list that indicates which predictors belong to which index,
corresponding to |
X_new |
A matrix of selected predictor variables, corresponding to
|
Generates a specified number of lagged versions of the given variable in the form of a tibble.
lag_matrix(variable, n = 10)
variable |
Variable to be lagged. |
n |
Number of lags. The default value is 10. |
A tibble.
library(dplyr)
library(tibble)
library(tidyr)
# Adding lagged variables to an existing tibble
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(100)) |>
  mutate(x_lag = lag_matrix(x_lag_000, 3)) |>
  unpack(x_lag, names_sep = "_")
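For readers who want to see the mechanics of lagging, a base-R sketch follows. `lag_matrix_sketch()` is hypothetical: it returns a data.frame rather than a tibble, and the zero-padded column names ("001", "002", ...) are an assumption chosen to mimic the x_lag_001-style names that unpack() produces in the example.

```r
# Hypothetical base-R sketch of lagging a variable 1..n steps;
# lag_matrix() itself returns a tibble.
lag_matrix_sketch <- function(variable, n = 10) {
  out <- sapply(seq_len(n), function(k) {
    # Shift right by k, pad with NA, and keep the original length
    c(rep(NA, k), head(variable, -k))[seq_along(variable)]
  })
  out <- as.data.frame(out)
  names(out) <- sprintf("%03d", seq_len(n))  # assumed naming scheme
  out
}

lag_matrix_sketch(1:5, n = 2)
```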
Calculates the value of the objective function (loss function) of the mixed integer program used to estimate an SMI model.
loss(Y, Yhat, alpha, lambda0, lambda2)
Y |
Column matrix of response. |
Yhat |
Predicted value of the response. |
alpha |
Vector of index coefficients. |
lambda0 |
Penalty parameter for L0 penalty. |
lambda2 |
Penalty parameter for L2 penalty. |
A numeric.
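The objective can be sketched as a penalised least-squares loss. The form below (residual sum of squares, plus lambda0 times the number of non-zero index coefficients, plus lambda2 times their squared L2 norm) is an assumption based on the L0 and L2 penalty parameters described above; `loss_sketch()` is hypothetical.

```r
# Hypothetical sketch, ASSUMING the objective
#   sum((Y - Yhat)^2) + lambda0 * ||alpha||_0 + lambda2 * ||alpha||_2^2
loss_sketch <- function(Y, Yhat, alpha, lambda0, lambda2) {
  sum((Y - Yhat)^2) +          # squared-error fit term
    lambda0 * sum(alpha != 0) + # L0 penalty: count of non-zero coefficients
    lambda2 * sum(alpha^2)      # L2 (ridge) penalty
}

# SSE = 1, two non-zero coefficients, squared norm = 0.5, so the loss is 3.5
loss_sketch(Y = c(1, 2, 3), Yhat = c(1, 2, 2), alpha = c(0.5, 0, 0.5),
            lambda0 = 1, lambda2 = 1)
```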
Point estimate accuracy measures
MAE(residuals, na.rm = TRUE, ...)
MSE(residuals, na.rm = TRUE, ...)
point_measures
residuals |
A vector of residuals from either the validation or test data. |
na.rm |
If |
... |
Additional arguments for each measure. |
For the individual functions (MAE, MSE), returns a single numeric
scalar giving the requested accuracy measure.
For the exported object point_measures, a named list of length 2
containing these functions, which can be supplied to higher-level
accuracy routines.
set.seed(123)
ytrain <- rnorm(100)
ytest <- rnorm(30)
yhat <- ytest + rnorm(30, sd = 0.3)
resid <- ytest - yhat
MAE(resid)
MSE(resid)
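Assuming the usual definitions (the mean of the absolute residuals and the mean of the squared residuals, with missing values dropped when na.rm = TRUE), the two measures and a point_measures-style named list can be sketched as follows. The `_sketch` names are hypothetical.

```r
# Hypothetical sketches assuming the standard MAE/MSE definitions
MAE_sketch <- function(residuals, na.rm = TRUE, ...) {
  mean(abs(residuals), na.rm = na.rm)
}
MSE_sketch <- function(residuals, na.rm = TRUE, ...) {
  mean(residuals^2, na.rm = na.rm)
}
# A named list of measure functions, in the spirit of point_measures
point_measures_sketch <- list(MAE = MAE_sketch, MSE = MSE_sketch)

r <- c(1, -2, NA, 3)
sapply(point_measures_sketch, function(f) f(r))
```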
gam object to a smimodelFit object
Converts a given object of class gam to an object of class smimodelFit.
make_smimodelFit(x, data, yvar, neighbour, index.vars, index.ind, index.data, index.names, alpha, s.vars = NULL, linear.vars = NULL, lambda0 = NULL, lambda2 = NULL, M = NULL, max.iter = NULL, tol = NULL, tolCoefs = NULL, TimeLimit = NULL, MIPGap = NULL, NonConvex = NULL)
x |
A fitted |
data |
The original training data set. |
yvar |
Name of the response variable as a character string. |
neighbour |
|
index.vars |
A |
index.ind |
An |
index.data |
A |
index.names |
A |
alpha |
A vector of index coefficients. |
s.vars |
A |
linear.vars |
A |
lambda0 |
Penalty parameter for L0 penalty. |
lambda2 |
Penalty parameter for L2 penalty. |
M |
Big-M value to be used in MIP. |
max.iter |
Maximum number of MIP iterations performed to update index coefficients for a given model. |
tol |
Tolerance for the objective function value (loss) of MIP. |
tolCoefs |
Tolerance for coefficients. |
TimeLimit |
A limit for the total time (in seconds) expended in a single MIP iteration. |
MIPGap |
Relative MIP optimality gap. |
NonConvex |
The strategy for handling non-convex quadratic objectives or non-convex quadratic constraints in Gurobi solver. |
An object of class smimodelFit, which is a list that contains the
following elements:
alpha |
A sparse matrix of index coefficient vectors. Each column of the matrix is the index coefficient vector of one index. |
derivatives |
A |
var_y |
Name of the response variable. |
vars_index |
A |
vars_s |
A |
vars_linear |
A |
neighbour |
Number of neighbours of each key considered in model fitting. |
gam |
Fitted |
lambda0 |
L0 penalty parameter used for model fitting. |
lambda2 |
L2 penalty parameter used for model fitting. |
M |
Big-M value used in MIP. |
max.iter |
Maximum number of MIP iterations for a single round of index coefficients update. |
tol |
Tolerance for the objective function value (loss) used in solving MIP. |
tolCoefs |
Tolerance for coefficients used in updating index coefficients. |
TimeLimit |
Limit for the total time (in seconds) expended in a single MIP iteration. |
MIPGap |
Relative MIP optimality gap used. |
Nonconvex |
The strategy used for handling non-convex quadratic objectives or non-convex quadratic constraints in Gurobi solver. |
Fits a nonparametric additive model, with simultaneous variable selection through a backward elimination procedure as proposed by Fan and Hyndman (2012).
model_backward(data, val.data, yvar, neighbour = 0, family = gaussian(), s.vars = NULL, s.basedim = NULL, linear.vars = NULL, refit = TRUE, tol = 0.001, parallel = FALSE, workers = NULL, exclude.trunc = NULL, recursive = FALSE, recursive_colRange = NULL, verbose = FALSE)
data |
Training data set on which models will be trained. Must be a data
set of class |
val.data |
Validation data set. (The data set on which the model
selection will be performed.) Must be a data set of class |
yvar |
Name of the response variable as a character string. |
neighbour |
If multiple models are fitted: Number of neighbours of each
key (i.e. grouping variable) to be considered in model fitting to handle
smoothing over the key. Should be an |
family |
A description of the error distribution and link function to be
used in the model (see |
s.vars |
A |
s.basedim |
Dimension of the bases used to represent the smooth terms
corresponding to |
linear.vars |
A |
refit |
Whether to refit the model combining training and validation
sets after model selection. If |
tol |
Tolerance for the ratio of relative change in validation set MSE, used in model selection. |
parallel |
Whether to use parallel computing in model selection or not. |
workers |
If |
exclude.trunc |
The names of the predictor variables that should not be
truncated for stable predictions as a character string. (Since the
nonlinear functions are estimated using splines, extrapolation is not
desirable. Hence, if any predictor variable in |
recursive |
Whether to obtain recursive forecasts or not (default - FALSE). |
recursive_colRange |
If |
verbose |
Logical; controls whether progress messages (model indices) are printed during fitting. Defaults to FALSE. |
This function fits a nonparametric additive model formulated through backward elimination, as proposed by Fan and Hyndman (2012). The process starts with all predictors included in an additive model, and predictors are progressively omitted until the best model, as judged on the validation set, is obtained. The final model is then re-fitted on the data set combining the training and validation sets (when refit = TRUE). For more details see the reference.
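A minimal sketch of the backward elimination loop follows. It substitutes lm() for the spline-based gam() fits that model_backward() uses, and its stopping rule (stop once the best single removal raises the validation MSE by more than tol in relative terms) is an assumed reading of the tol argument. `backward_sketch()` is hypothetical, always keeps at least one predictor, and omits the final refit on the combined data.

```r
# Hypothetical sketch: lm() stands in for the spline-based gam() fits.
backward_sketch <- function(train, val, yvar, vars, tol = 0.001) {
  # Validation-set MSE of a model using predictors `v`
  val_mse <- function(v) {
    fit <- lm(reformulate(v, response = yvar), data = train)
    mean((val[[yvar]] - predict(fit, newdata = val))^2)
  }
  current <- vars
  best_mse <- val_mse(current)
  # Keep at least one predictor (a simplification made for this sketch)
  while (length(current) > 1) {
    # Validation MSE after dropping each remaining predictor in turn
    trial <- vapply(current, function(v) val_mse(setdiff(current, v)),
                    numeric(1))
    drop <- which.min(trial)
    # Stop once even the best removal worsens the MSE by more than `tol`
    if ((trial[[drop]] - best_mse) / best_mse > tol) break
    current <- current[-drop]
    best_mse <- trial[[drop]]
  }
  list(vars = current, val_mse = best_mse)
}

set.seed(42)
n <- 300
d <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
d$y <- 3 * d$x1 + rnorm(n, sd = 0.1)
res <- backward_sketch(train = d[1:200, ], val = d[201:300, ],
                       yvar = "y", vars = c("x1", "x2", "x3"))
res$vars
```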
An object of class backward. This is a tibble with two
columns:
key |
The level of the grouping variable (i.e. key of the training data set). |
fit |
Information of the fitted model
corresponding to the |
Each element of the column fit is an
object of class gam. For details refer to mgcv::gamObject.
Fan, S. & Hyndman, R.J. (2012). Short-Term Load Forecasting Based on a Semi-Parametric Additive Model. IEEE Transactions on Power Systems, 27(1), 134–141. doi:10.1109/TPWRS.2011.2162082.
model_smimodel, model_gaim,
model_ppr, model_gam, model_lm
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1205
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
  mutate(
    # Add x_lags
    x_lag = lag_matrix(x_lag_000, 5)) |>
  unpack(x_lag, names_sep = "_") |>
  mutate(
    # Response variable
    y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
    # Add an index to the data set
    inddd = seq(1, n)) |>
  drop_na() |>
  select(inddd, y, starts_with("x_lag")) |>
  # Make the data set a `tsibble`
  as_tsibble(index = inddd)
# Training set
sim_train <- sim_data[1:1000, ]
# Validation set
sim_val <- sim_data[1001:1200, ]
# Predictors taken as non-linear variables
s.vars <- colnames(sim_data)[3:8]
# Model fitting
backwardModel <- model_backward(data = sim_train,
                                val.data = sim_val,
                                yvar = "y",
                                s.vars = s.vars)
# Fitted model
backwardModel$fit[[1]]
A wrapper for cgaim::cgaim() enabling multiple GAIM models based on a
grouping variable. Currently does not support constrained GAIMs (CGAIMs).
model_gaim(data, yvar, neighbour = 0, index.vars, index.ind, s.vars = NULL, linear.vars = NULL, verbose = FALSE, ...)
data |
Training data set on which models will be trained. Must be a data
set of class |
yvar |
Name of the response variable as a character string. |
neighbour |
If multiple models are fitted: Number of neighbours of each
key (i.e. grouping variable) to be considered in model fitting to handle
smoothing over the key. Should be an |
index.vars |
A |
index.ind |
An |
s.vars |
A |
linear.vars |
A |
verbose |
Logical; controls whether progress messages (model indices) are printed during fitting. Defaults to FALSE. |
... |
Other arguments not currently used. (Note that the arguments in
|
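A Group-wise Additive Index Model (GAIM) can be written in the form
$$y_{i} = \sum_{j=1}^{p} g_{j}(\boldsymbol{\alpha}_{j}^{T}\boldsymbol{x}_{ij}) + \varepsilon_{i}, \quad i = 1, \dots, n,$$
where $y_{i}$ is the univariate response, $\boldsymbol{x}_{ij} \in \mathbb{R}^{\ell_{j}}$, $j = 1, \dots, p$, are pre-specified non-overlapping subsets of
$\boldsymbol{x}_{i}$, $\boldsymbol{\alpha}_{j}$ are the
corresponding index coefficients, $g_{j}$ is an unknown (possibly
nonlinear) component function, and $\varepsilon_{i}$ is the random
error, which is independent of $\boldsymbol{x}_{i}$.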
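In the package interface the subsets $\boldsymbol{x}_{ij}$ are specified through index.ind, which assigns each index variable to an index (for example c(rep(1, 3), rep(2, 2)) in the example). The hypothetical helper `compute_indices_sketch()` shows how such an assignment turns predictor columns and coefficients into the index values $\boldsymbol{\alpha}_{j}^{T}\boldsymbol{x}_{ij}$ that enter each $g_{j}$; the coefficient values are illustrative only.

```r
# Hypothetical helper: one column of index values per index j
compute_indices_sketch <- function(X, alpha, index.ind) {
  sapply(sort(unique(index.ind)), function(j) {
    pos <- which(index.ind == j)            # predictors belonging to index j
    X[, pos, drop = FALSE] %*% alpha[pos]   # alpha_j' x_ij for every row i
  })
}

X <- matrix(1:10, nrow = 2)            # 2 observations, 5 index variables
alpha <- c(1, 0, 0, 0.5, 0.5)          # illustrative coefficients
index.ind <- c(1, 1, 1, 2, 2)          # first 3 -> index 1, last 2 -> index 2
H <- compute_indices_sketch(X, alpha, index.ind)
H
```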
An object of class gaimFit. This is a tibble with two
columns:
key |
The level of the grouping variable (i.e. key of the training data set). |
fit |
Information of the fitted model
corresponding to the |
Each element of the column fit is an
object of class cgaim. For details refer to cgaim::cgaim().
model_smimodel, model_backward,
model_ppr, model_gam, model_lm
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1005
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
  mutate(
    # Add x_lags
    x_lag = lag_matrix(x_lag_000, 5)) |>
  unpack(x_lag, names_sep = "_") |>
  mutate(
    # Response variable
    y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
    # Add an index to the data set
    inddd = seq(1, n)) |>
  drop_na() |>
  select(inddd, y, starts_with("x_lag")) |>
  # Make the data set a `tsibble`
  as_tsibble(index = inddd)
# Predictors taken as index variables
index.vars <- colnames(sim_data)[3:7]
# Assign group indices for each predictor
index.ind = c(rep(1, 3), rep(2, 2))
# Predictors taken as non-linear variables not entering indices
s.vars = "x_lag_005"
# Model fitting
gaimModel <- model_gaim(data = sim_data, yvar = "y",
                        index.vars = index.vars,
                        index.ind = index.ind,
                        s.vars = s.vars)
# Fitted model
gaimModel$fit[[1]]
A wrapper for mgcv::gam() enabling multiple GAMs based on a grouping
variable.
model_gam(data, yvar, family = gaussian(), neighbour = 0, s.vars, s.basedim = NULL, linear.vars = NULL, verbose = FALSE, ...)
data |
Training data set on which models will be trained. Must be a data
set of class |
yvar |
Name of the response variable as a character string. |
family |
A description of the error distribution and link function to be
used in the model (see |
neighbour |
If multiple models are fitted: Number of neighbours of each
key (i.e. grouping variable) to be considered in model fitting to handle
smoothing over the key. Should be an |
s.vars |
A |
s.basedim |
Dimension of the bases used to represent the smooth terms
corresponding to |
linear.vars |
A |
verbose |
Logical; controls whether progress messages (model indices) are printed during fitting. Defaults to FALSE. |
... |
Other arguments not currently used. |
An object of class gamFit. This is a tibble with two
columns:
key |
The level of the grouping variable (i.e. key of the training data set). |
fit |
Information of the fitted model
corresponding to the |
Each element of the column fit is an
object of class gam. For details refer to mgcv::gamObject.
model_smimodel, model_backward,
model_gaim, model_ppr, model_lm
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1005
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
  mutate(
    # Add x_lags
    x_lag = lag_matrix(x_lag_000, 5)) |>
  unpack(x_lag, names_sep = "_") |>
  mutate(
    # Response variable
    y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
    # Add an index to the data set
    inddd = seq(1, n)) |>
  drop_na() |>
  select(inddd, y, starts_with("x_lag")) |>
  # Make the data set a `tsibble`
  as_tsibble(index = inddd)
# Predictors taken as non-linear variables
s.vars <- colnames(sim_data)[3:6]
# Predictors taken as linear variables
linear.vars <- colnames(sim_data)[7:8]
# Model fitting
gamModel <- model_gam(data = sim_data, yvar = "y",
                      s.vars = s.vars, linear.vars = linear.vars)
# Fitted model
gamModel$fit[[1]]
A wrapper for lm enabling multiple linear models based on a
grouping variable.
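Fitting models "based on a grouping variable" amounts to fitting one model per key. A minimal sketch for the neighbour = 0 case, assuming a plain split-and-fit approach: `fit_by_key_sketch()` is hypothetical and returns a named list, whereas model_lm() returns a tibble with key and fit columns.

```r
# Hypothetical sketch: one lm() per level of a grouping variable
# (neighbour = 0 only; no borrowing of data from neighbouring keys)
fit_by_key_sketch <- function(data, key, yvar, linear.vars) {
  f <- reformulate(linear.vars, response = yvar)
  lapply(split(data, data[[key]]), function(d) lm(f, data = d))
}

set.seed(1)
d <- data.frame(grp = rep(c("a", "b"), each = 50), x = runif(100))
d$y <- ifelse(d$grp == "a", 2, -1) * d$x + rnorm(100, sd = 0.05)
fits <- fit_by_key_sketch(d, key = "grp", yvar = "y", linear.vars = "x")
sapply(fits, function(m) coef(m)[["x"]])  # one slope per key
```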
model_lm(data, yvar, neighbour = 0, linear.vars, verbose = FALSE, ...)
data |
Training data set on which models will be trained. Must be a data
set of class |
yvar |
Name of the response variable as a character string. |
neighbour |
If multiple models are fitted: Number of neighbours of each
key (i.e. grouping variable) to be considered in model fitting to handle
smoothing over the key. Should be an |
linear.vars |
A character vector of names of the predictor variables. |
verbose |
Logical; controls whether progress messages (model indices) are printed during fitting. Defaults to FALSE. |
... |
Other arguments not currently used. |
An object of class lmFit. This is a tibble with two
columns:
key |
The level of the grouping variable (i.e. key of the training data set). |
fit |
Information of the fitted model
corresponding to the |
Each element of the column fit is
an object of class lm. For details refer to stats::lm().
model_smimodel, model_backward,
model_gaim, model_ppr, model_gam
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1005
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
  mutate(
    # Add x_lags
    x_lag = lag_matrix(x_lag_000, 5)) |>
  unpack(x_lag, names_sep = "_") |>
  mutate(
    # Response variable
    y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
    # Add an index to the data set
    inddd = seq(1, n)) |>
  drop_na() |>
  select(inddd, y, starts_with("x_lag")) |>
  # Make the data set a `tsibble`
  as_tsibble(index = inddd)
# Predictor variables
linear.vars <- colnames(sim_data)[3:8]
# Model fitting
lmModel <- model_lm(data = sim_data, yvar = "y", linear.vars = linear.vars)
# Fitted model
lmModel$fit[[1]]
A wrapper for stats::ppr() enabling multiple PPR models based on a
grouping variable.
model_ppr(data, yvar, neighbour = 0, index.vars, num_ind = 5, verbose = FALSE, ...)
data |
Training data set on which models will be trained. Must be a data
set of class |
yvar |
Name of the response variable as a character string. |
neighbour |
If multiple models are fitted: Number of neighbours of each
key (i.e. grouping variable) to be considered in model fitting to handle
smoothing over the key. Should be an |
index.vars |
A |
num_ind |
An |
verbose |
Logical; controls whether progress messages (model indices) are printed during fitting. Defaults to FALSE. |
... |
Other arguments not currently used. (For more information on other
arguments that can be passed, refer |
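A Projection Pursuit Regression (PPR) model (Friedman & Stuetzle (1981)) is given by
$$y_{i} = \sum_{j=1}^{p} g_{j}(\boldsymbol{\alpha}_{j}^{T}\boldsymbol{x}_{i}) + \varepsilon_{i}, \quad i = 1, \dots, n,$$
where $y_{i}$ is the response,
$\boldsymbol{x}_{i}$ is the $q$-dimensional predictor vector,
$\boldsymbol{\alpha}_{j} = (\alpha_{1j}, \dots, \alpha_{qj})^{T}$, $j = 1, \dots, p$,
are $q$-dimensional projection vectors (or
vectors of "index coefficients"), the $g_{j}$'s are unknown nonlinear
functions, and $\varepsilon_{i}$ is the random error.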
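For comparison with the formula above, here is a direct call to stats::ppr() on simulated data similar to the examples. nterms sets the number of ridge functions $p$. That model_ppr() forwards num_ind to ppr() as nterms is an assumption, not something stated here.

```r
# Illustrative direct use of stats::ppr(), outside the smimodel wrapper
set.seed(123)
n <- 500
d <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
d$y <- (0.9 * d$x1 + 0.6 * d$x2)^3 + rnorm(n, sd = 0.1)
# nterms = number of ridge functions g_j (p in the formula above)
fit <- ppr(y ~ x1 + x2 + x3, data = d, nterms = 2)
fit$alpha  # estimated projection directions (index coefficients)
```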
An object of class pprFit. This is a tibble with two
columns:
key |
The level of the grouping variable (i.e. key of the training data set). |
fit |
Information of the fitted model
corresponding to the |
Each element of the column fit is an
object of class c("ppr.form", "ppr"). For details refer to
stats::ppr().
Friedman, J. H. & Stuetzle, W. (1981). Projection pursuit regression. Journal of the American Statistical Association, 76, 817–823. doi:10.2307/2287576.
model_smimodel, model_backward,
model_gaim, model_gam, model_lm
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n = 1005
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
  mutate(
    # Add x_lags
    x_lag = lag_matrix(x_lag_000, 5)) |>
  unpack(x_lag, names_sep = "_") |>
  mutate(
    # Response variable
    y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
    # Add an index to the data set
    inddd = seq(1, n)) |>
  drop_na() |>
  select(inddd, y, starts_with("x_lag")) |>
  # Make the data set a `tsibble`
  as_tsibble(index = inddd)
# Index variables
index.vars <- colnames(sim_data)[3:8]
# Model fitting
pprModel <- model_ppr(data = sim_data, yvar = "y", index.vars = index.vars)
# Fitted model
pprModel$fit[[1]]
Fits nonparametric multiple index model(s), with simultaneous predictor selection (hence "sparse") and predictor grouping. Multiple SMI models can be fitted based on a grouping variable.
model_smimodel(data, yvar, neighbour = 0, family = gaussian(), index.vars, initialise = c("ppr", "additive", "linear", "multiple", "userInput"), num_ind = 5, num_models = 5, seed = 123, index.ind = NULL, index.coefs = NULL, s.vars = NULL, linear.vars = NULL, lambda0 = 1, lambda2 = 1, M = 10, max.iter = 50, tol = 0.001, tolCoefs = 0.001, TimeLimit = Inf, MIPGap = 1e-04, NonConvex = -1, verbose = list(solver = FALSE, progress = FALSE))
data |
Training data set on which models will be trained. Must be a data
set of class |
yvar |
Name of the response variable as a character string. |
neighbour |
If multiple models are fitted: Number of neighbours of each
key (i.e. grouping variable) to be considered in model fitting to handle
smoothing over the key. Should be an |
family |
A description of the error distribution and link function to be
used in the model (see |
index.vars |
A |
initialise |
The model structure with which the estimation process
should be initialised. The default is |
num_ind |
If |
num_models |
If |
seed |
If |
index.ind |
If |
index.coefs |
If |
s.vars |
A |
linear.vars |
A |
lambda0 |
Penalty parameter for L0 penalty. |
lambda2 |
Penalty parameter for L2 penalty. |
M |
Big-M value to be used in MIP. |
max.iter |
Maximum number of MIP iterations performed to update index coefficients for a given model. |
tol |
Tolerance for the objective function value (loss) of MIP. |
tolCoefs |
Tolerance for coefficients. |
TimeLimit |
A limit for the total time (in seconds) expended in a single MIP iteration. |
MIPGap |
Relative MIP optimality gap. |
NonConvex |
The strategy for handling non-convex quadratic objectives or non-convex quadratic constraints in Gurobi solver. |
verbose |
A named list controlling verbosity options. Defaults to
|
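The Sparse Multiple Index (SMI) model is a semi-parametric model that can be written as
$$y_{i} = \beta_{0} + \sum_{j=1}^{p} g_{j}(\boldsymbol{\alpha}_{j}^{T}\boldsymbol{x}_{ij}) + \sum_{k=1}^{d} f_{k}(w_{ik}) + \boldsymbol{\theta}^{T}\boldsymbol{u}_{i} + \varepsilon_{i}, \quad i = 1, \dots, n,$$
where $y_{i}$ is the univariate
response, $\beta_{0}$ is the model intercept, $\boldsymbol{x}_{ij} \in \mathbb{R}^{\ell_{j}}$, $j = 1, \dots, p$, are subsets of predictors
entering indices, $\boldsymbol{\alpha}_{j}$ is a vector of index
coefficients corresponding to the index $h_{ij} = \boldsymbol{\alpha}_{j}^{T}\boldsymbol{x}_{ij}$, and $g_{j}$ is a
smooth nonlinear function (estimated by a penalised cubic regression
spline). The model also allows for predictors that do not enter any
indices, including covariates $w_{ik}$ that relate to the response
through nonlinear functions $f_{k}$, $k = 1, \dots, d$, and linear
covariates $\boldsymbol{u}_{i}$ with coefficients $\boldsymbol{\theta}$; $\varepsilon_{i}$ is the random error.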
In the model formulation related to this implementation, both the number of
indices and the predictor grouping among indices are assumed to be
unknown prior to model estimation. Suppose we observe $y_{1}, \dots, y_{n}$,
along with a set of potential predictors,
$\boldsymbol{x}_{1}, \dots, \boldsymbol{x}_{n}$, with each vector
$\boldsymbol{x}_{i}$ containing $q$ predictors. This function
implements algorithmic variable selection for index variables (i.e.
predictors entering indices) of the SMI model by allowing for zero index
coefficients for predictors. Non-overlapping predictors among indices are
assumed (i.e. no predictor enters more than one index). For algorithmic
details see the reference.
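The sparsity and non-overlap conditions can be checked directly on a matrix of index coefficients laid out like the alpha element of a smimodelFit object (one column per index). The helper below is hypothetical, uses a base R matrix rather than a sparse matrix, and the coefficient values are illustrative, not output from a fit.

```r
# Hypothetical helper summarising an index-coefficient matrix
check_smi_alpha_sketch <- function(alpha_mat) {
  nz <- alpha_mat != 0
  list(
    selected = rownames(alpha_mat)[rowSums(nz) > 0],  # predictors kept
    non_overlapping = all(rowSums(nz) <= 1),          # each in <= 1 index
    num_indices = sum(colSums(nz) > 0)                # non-empty indices
  )
}

alpha_mat <- matrix(c(0.8, 0.6, 0, 0, 0, 0,   # index 1
                      0,   0,   0, 1, 0, 0),  # index 2
                    ncol = 2,
                    dimnames = list(paste0("x_lag_00", 0:5),
                                    c("index_1", "index_2")))
res <- check_smi_alpha_sketch(alpha_mat)
res
```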
An object of class smimodel. This is a tibble with two
columns:
key |
The level of the grouping variable (i.e. key of the training data set). |
fit |
Information of the fitted model
corresponding to the |
Each row of the column fit contains a list with two elements:
initial |
A list of information of the model initialisation. (For
descriptions of the list elements see |
best |
A list of information of the final optimised model. (For
descriptions of the list elements see |
Palihawadana, N.K., Hyndman, R.J. & Wang, X. (2024). Sparse Multiple Index Models for High-Dimensional Nonparametric Forecasting. (Department of Econometrics and Business Statistics Working Paper Series 16/24).
if(requireNamespace("gurobi", quietly = TRUE)){
  library(dplyr)
  library(ROI)
  library(tibble)
  library(tidyr)
  library(tsibble)
  # Simulate data
  n = 1005
  set.seed(123)
  sim_data <- tibble(x_lag_000 = runif(n)) |>
    mutate(
      # Add x_lags
      x_lag = lag_matrix(x_lag_000, 5)) |>
    unpack(x_lag, names_sep = "_") |>
    mutate(
      # Response variable
      y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
      # Add an index to the data set
      inddd = seq(1, n)) |>
    drop_na() |>
    select(inddd, y, starts_with("x_lag")) |>
    # Make the data set a `tsibble`
    as_tsibble(index = inddd)
  # Index variables
  index.vars <- colnames(sim_data)[3:8]
  # Model fitting
  smimodel_ppr <- model_smimodel(data = sim_data, yvar = "y",
                                 index.vars = index.vars,
                                 initialise = "ppr")
  # Best (optimised) fitted model
  smimodel_ppr$fit[[1]]$best
}
smimodelFit
Constructs an object of class smimodelFit using the information passed
to its arguments.
new_smimodelFit(data, yvar, neighbour = 0, family = gaussian(), index.vars, initialise = c("additive", "linear", "userInput"), index.ind = NULL, index.coefs = NULL, s.vars = NULL, linear.vars = NULL)
data |
Training data set on which models will be trained. Must be a data
set of class |
yvar |
Name of the response variable as a character string. |
neighbour |
|
family |
A description of the error distribution and link function to be
used in the model (see |
index.vars |
A |
initialise |
The model structure with which the estimation process
should be initialised. The default is "additive", where the initial model
will be a nonparametric additive model. The other options are "linear" -
linear regression model (i.e. a special case single-index model, where the
initial values of the index coefficients are obtained through a linear
regression), and "userInput" - user specifies the initial model structure
(i.e. the number of indices and the placement of index variables among
indices) and the initial index coefficients through |
index.ind |
If |
index.coefs |
If |
s.vars |
A |
linear.vars |
A |
A list of initial model information. (For descriptions of the list
elements see make_smimodelFit.)
Scales a coefficient vector of a particular index to have unit norm.
normalise_alpha(alpha)
alpha |
A vector of index coefficients. |
A numeric vector.
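Assuming "unit norm" means the Euclidean (L2) norm, the scaling is a single division. `normalise_alpha_sketch()` is hypothetical; returning an all-zero vector unchanged (instead of dividing by zero) is a choice made for this sketch, not documented package behaviour.

```r
# Hypothetical sketch, assuming unit Euclidean (L2) norm
normalise_alpha_sketch <- function(alpha) {
  nrm <- sqrt(sum(alpha^2))
  if (nrm == 0) alpha else alpha / nrm  # leave an all-zero vector as is
}

normalise_alpha_sketch(c(3, 4))  # scaled to unit length
```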
Generates possible future sample paths (multi-step) using residuals of a fitted benchmark model through recursive forecasting.
possibleFutures_benchmark(object, newdata, bootstraps, exclude.trunc = NULL, recursive_colRange)
object |
A fitted model object of the class |
newdata |
The set of new data for which the forecasts are required
(i.e. test set; should be a |
bootstraps |
Generated matrix of bootstrapped residual series. |
exclude.trunc |
The names of the predictor variables that should not be truncated for stable predictions as a character string. |
recursive_colRange |
The range of column numbers in |
A list containing the following components:
firstFuture |
A
|
future_cols |
A list of multi-step-ahead simulated futures, where
each list element corresponds to each 1-step-ahead simulated future in
|
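The bootstraps argument expects a matrix of bootstrapped residual series. One way to produce such a matrix is a moving block bootstrap, sketched below under stated assumptions: blocks of consecutive residuals are drawn with replacement, concatenated, and trimmed to the required length, with one series per column. The function name, its arguments and the column orientation are hypothetical, not the package's internals.

```r
# Hypothetical moving block bootstrap of a residual series
block_bootstrap_sketch <- function(residuals, series_length, block_size,
                                   times) {
  n <- length(residuals)
  n_blocks <- ceiling(series_length / block_size)
  replicate(times, {
    # Random block start points, drawn with replacement
    starts <- sample.int(n - block_size + 1, n_blocks, replace = TRUE)
    # Concatenate the blocks, preserving within-block serial dependence
    idx <- as.vector(sapply(starts, function(s) s:(s + block_size - 1)))
    residuals[idx][seq_len(series_length)]
  })
}

set.seed(1)
res <- rnorm(100)
boots <- block_bootstrap_sketch(res, series_length = 10, block_size = 4,
                                times = 3)
dim(boots)  # one bootstrapped series per column
```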
smimodel residuals
Generates possible future sample paths (multi-step) using residuals of a
fitted smimodel through recursive forecasting.
possibleFutures_smimodel(object, newdata, bootstraps, exclude.trunc = NULL, recursive_colRange)
object |
A |
newdata |
The set of new data for which the forecasts are required
(i.e. test set; should be a |
bootstraps |
Generated matrix of bootstrapped residual series. |
exclude.trunc |
The names of the predictor variables that should not be truncated for stable predictions as a character string. |
recursive_colRange |
The range of column numbers in |
A list containing the following components:
firstFuture |
A
|
future_cols |
A list of multi-step-ahead simulated futures, where
each list element corresponds to each 1-step-ahead simulated future in
|
mgcv::gam
Gives recursive forecasts on a test set.
predict_gam(object, newdata, exclude.trunc = NULL, recursive = FALSE, recursive_colRange = NULL, ...)
object |
A |
newdata |
The set of new data for which the forecasts are required
(i.e. test set; should be a |
exclude.trunc |
The names of the predictor variables that should not be
truncated for stable predictions as a character string. (Since the
nonlinear functions are estimated using splines, extrapolation is not
desirable. Hence, if any predictor variable in |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colRange |
If |
... |
Other arguments not currently used. |
A tibble with forecasts on test set.
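The recursive option can be illustrated with a minimal sketch: when lagged responses are predictors, each forecast is fed back as the lagged-response value for the next step, because the actual values are not yet known. Base R `lm()` stands in for the spline-based models here, and the names `recursive_forecast`, `y_lag_001` and `y_lag_002` are hypothetical.

```r
# Hypothetical sketch of recursive (multi-step) forecasting with a model
# that uses lagged responses as predictors. Base R lm() stands in for the
# package's spline-based models; column names are illustrative.
set.seed(1)
n <- 200
y <- as.numeric(arima.sim(list(ar = 0.6), n = n))
train <- data.frame(y = y[3:n], y_lag_001 = y[2:(n - 1)], y_lag_002 = y[1:(n - 2)])
fit <- lm(y ~ y_lag_001 + y_lag_002, data = train)

recursive_forecast <- function(fit, last_obs, h) {
  # last_obs: the two most recent actual values, oldest first
  history <- last_obs
  out <- numeric(h)
  for (i in seq_len(h)) {
    k <- length(history)
    newrow <- data.frame(y_lag_001 = history[k], y_lag_002 = history[k - 1])
    out[i] <- predict(fit, newdata = newrow)
    history <- c(history, out[i])  # feed the forecast back in as a lag
  }
  out
}

recursive_forecast(fit, last_obs = y[(n - 1):n], h = 5)
```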
backward
Gives forecasts on a test set.
## S3 method for class 'backward'
predict(object, newdata, exclude.trunc = NULL, recursive = FALSE, recursive_colRange = NULL, ...)
object |
A |
newdata |
The set of new data for which the forecasts are required
(i.e. test set; should be a |
exclude.trunc |
The names of the predictor variables that should not be
truncated for stable predictions as a character string. (Since the
nonlinear functions are estimated using splines, extrapolation is not
desirable. Hence, if any predictor variable in |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colRange |
If |
... |
Other arguments not currently used. |
A tsibble with forecasts on test set.
library(smimodel)
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n <- 1215
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
  mutate(
    # Add x_lags
    x_lag = lag_matrix(x_lag_000, 5)) |>
  unpack(x_lag, names_sep = "_") |>
  mutate(
    # Response variable
    y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
    # Add an index to the data set
    inddd = seq(1, n)) |>
  drop_na() |>
  select(inddd, y, starts_with("x_lag")) |>
  # Make the data set a `tsibble`
  as_tsibble(index = inddd)
# Training set
sim_train <- sim_data[1:1000, ]
# Validation set
sim_val <- sim_data[1001:1200, ]
# Test set
sim_test <- sim_data[1201:1210, ]
# Predictors taken as non-linear variables
s.vars <- colnames(sim_data)[3:8]
# Model fitting
backwardModel <- model_backward(data = sim_train,
                                val.data = sim_val,
                                yvar = "y",
                                s.vars = s.vars)
predict(object = backwardModel, newdata = sim_test)
gaimFit
Gives forecasts on a test set.
## S3 method for class 'gaimFit'
predict(object, newdata, exclude.trunc = NULL, recursive = FALSE, recursive_colRange = NULL, ...)
object |
A |
newdata |
The set of new data for which the forecasts are required
(i.e. test set; should be a |
exclude.trunc |
The names of the predictor variables that should not be
truncated for stable predictions as a character string. (Since the
nonlinear functions are estimated using splines, extrapolation is not
desirable. Hence, if any predictor variable in |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colRange |
If |
... |
Other arguments not currently used. |
A tsibble with forecasts on test set.
library(smimodel)
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n <- 1015
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
  mutate(
    # Add x_lags
    x_lag = lag_matrix(x_lag_000, 5)) |>
  unpack(x_lag, names_sep = "_") |>
  mutate(
    # Response variable
    y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
    # Add an index to the data set
    inddd = seq(1, n)) |>
  drop_na() |>
  select(inddd, y, starts_with("x_lag")) |>
  # Make the data set a `tsibble`
  as_tsibble(index = inddd)
# Training set
sim_train <- sim_data[1:1000, ]
# Test set
sim_test <- sim_data[1001:1010, ]
# Predictors taken as index variables
index.vars <- colnames(sim_data)[3:7]
# Assign group indices for each predictor
index.ind <- c(rep(1, 3), rep(2, 2))
# Predictors taken as non-linear variables not entering indices
s.vars <- "x_lag_005"
# Model fitting
gaimModel <- model_gaim(data = sim_train,
                        yvar = "y",
                        index.vars = index.vars,
                        index.ind = index.ind,
                        s.vars = s.vars)
predict(object = gaimModel, newdata = sim_test)
gamFit
Gives forecasts on a test set.
## S3 method for class 'gamFit'
predict(object, newdata, exclude.trunc = NULL, recursive = FALSE, recursive_colRange = NULL, ...)
object |
A |
newdata |
The set of new data for which the forecasts are required
(i.e. test set; should be a |
exclude.trunc |
The names of the predictor variables that should not be
truncated for stable predictions as a character string. (Since the
nonlinear functions are estimated using splines, extrapolation is not
desirable. Hence, if any predictor variable in |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colRange |
If |
... |
Other arguments not currently used. |
A tsibble with forecasts on test set.
library(smimodel)
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n <- 1015
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
  mutate(
    # Add x_lags
    x_lag = lag_matrix(x_lag_000, 5)) |>
  unpack(x_lag, names_sep = "_") |>
  mutate(
    # Response variable
    y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
    # Add an index to the data set
    inddd = seq(1, n)) |>
  drop_na() |>
  select(inddd, y, starts_with("x_lag")) |>
  # Make the data set a `tsibble`
  as_tsibble(index = inddd)
# Training set
sim_train <- sim_data[1:1000, ]
# Test set
sim_test <- sim_data[1001:1010, ]
# Predictors taken as non-linear variables
s.vars <- colnames(sim_data)[3:6]
# Predictors taken as linear variables
linear.vars <- colnames(sim_data)[7:8]
# Model fitting
gamModel <- model_gam(data = sim_train,
                      yvar = "y",
                      s.vars = s.vars,
                      linear.vars = linear.vars)
predict(object = gamModel, newdata = sim_test)
lmFit
Gives forecasts on a test set.
## S3 method for class 'lmFit'
predict(object, newdata, recursive = FALSE, recursive_colRange = NULL, ...)
object |
A |
newdata |
The set of new data for which the forecasts are required
(i.e. test set; should be a |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colRange |
If |
... |
Other arguments not currently used. |
A tsibble with forecasts on test set.
library(smimodel)
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n <- 1015
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
  mutate(
    # Add x_lags
    x_lag = lag_matrix(x_lag_000, 5)) |>
  unpack(x_lag, names_sep = "_") |>
  mutate(
    # Response variable
    y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
    # Add an index to the data set
    inddd = seq(1, n)) |>
  drop_na() |>
  select(inddd, y, starts_with("x_lag")) |>
  # Make the data set a `tsibble`
  as_tsibble(index = inddd)
# Training set
sim_train <- sim_data[1:1000, ]
# Test set
sim_test <- sim_data[1001:1010, ]
# Predictor variables
linear.vars <- colnames(sim_data)[3:8]
# Model fitting
lmModel <- model_lm(data = sim_train,
                    yvar = "y",
                    linear.vars = linear.vars)
predict(object = lmModel, newdata = sim_test)
pprFit
Gives forecasts on a test set.
## S3 method for class 'pprFit'
predict(object, newdata, exclude.trunc = NULL, recursive = FALSE, recursive_colRange = NULL, ...)
object |
A |
newdata |
The set of new data for which the forecasts are required
(i.e. test set; should be a |
exclude.trunc |
The names of the predictor variables that should not be
truncated for stable predictions as a character string. (Since the
nonlinear functions are estimated using splines, extrapolation is not
desirable. Hence, if any predictor variable in |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colRange |
If |
... |
Other arguments not currently used. |
A tsibble with forecasts on test set.
library(smimodel)
library(dplyr)
library(tibble)
library(tidyr)
library(tsibble)
# Simulate data
n <- 1015
set.seed(123)
sim_data <- tibble(x_lag_000 = runif(n)) |>
  mutate(
    # Add x_lags
    x_lag = lag_matrix(x_lag_000, 5)) |>
  unpack(x_lag, names_sep = "_") |>
  mutate(
    # Response variable
    y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
    # Add an index to the data set
    inddd = seq(1, n)) |>
  drop_na() |>
  select(inddd, y, starts_with("x_lag")) |>
  # Make the data set a `tsibble`
  as_tsibble(index = inddd)
# Training set
sim_train <- sim_data[1:1000, ]
# Test set
sim_test <- sim_data[1001:1010, ]
# Index variables
index.vars <- colnames(sim_data)[3:8]
# Model fitting
pprModel <- model_ppr(data = sim_train,
                      yvar = "y",
                      index.vars = index.vars)
predict(object = pprModel, newdata = sim_test)
smimodel
Gives forecasts on a test set.
## S3 method for class 'smimodel'
predict(object, newdata, exclude.trunc = NULL, recursive = FALSE, recursive_colRange = NULL, ...)
object |
A |
newdata |
The set of new data for which the forecasts are required
(i.e. test set; should be a |
exclude.trunc |
The names of the predictor variables that should not be
truncated for stable predictions as a character string. (Since the
nonlinear functions are estimated using splines, extrapolation is not
desirable. Hence, if any predictor variable in |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colRange |
If |
... |
Other arguments not currently used. |
A tsibble with forecasts on test set.
if (requireNamespace("gurobi", quietly = TRUE)) {
  library(smimodel)
  library(dplyr)
  library(ROI)
  library(tibble)
  library(tidyr)
  library(tsibble)
  # Simulate data
  n <- 1015
  set.seed(123)
  sim_data <- tibble(x_lag_000 = runif(n)) |>
    mutate(
      # Add x_lags
      x_lag = lag_matrix(x_lag_000, 5)) |>
    unpack(x_lag, names_sep = "_") |>
    mutate(
      # Response variable
      y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
      # Add an index to the data set
      inddd = seq(1, n)) |>
    drop_na() |>
    select(inddd, y, starts_with("x_lag")) |>
    # Make the data set a `tsibble`
    as_tsibble(index = inddd)
  # Training set
  sim_train <- sim_data[1:1000, ]
  # Test set
  sim_test <- sim_data[1001:1010, ]
  # Index variables
  index.vars <- colnames(sim_data)[3:8]
  # Model fitting
  smimodel_ppr <- model_smimodel(data = sim_train,
                                 yvar = "y",
                                 index.vars = index.vars,
                                 initialise = "ppr")
  predict(object = smimodel_ppr, newdata = sim_test)
}
smimodelFit
Gives forecasts on a test set.
## S3 method for class 'smimodelFit'
predict(object, newdata, exclude.trunc = NULL, recursive = FALSE, recursive_colRange = NULL, ...)
object |
A |
newdata |
The set of new data for which the forecasts are required
(i.e. test set; should be a |
exclude.trunc |
The names of the predictor variables that should not be
truncated for stable predictions as a character string. (Since the
nonlinear functions are estimated using splines, extrapolation is not
desirable. Hence, if any predictor variable in |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colRange |
If |
... |
Other arguments not currently used. |
A tibble with forecasts on test set.
if (requireNamespace("gurobi", quietly = TRUE)) {
  library(smimodel)
  library(dplyr)
  library(ROI)
  library(tibble)
  library(tidyr)
  library(tsibble)
  # Simulate data
  n <- 1015
  set.seed(123)
  sim_data <- tibble(x_lag_000 = runif(n)) |>
    mutate(
      # Add x_lags
      x_lag = lag_matrix(x_lag_000, 5)) |>
    unpack(x_lag, names_sep = "_") |>
    mutate(
      # Response variable
      y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
      # Add an index to the data set
      inddd = seq(1, n)) |>
    drop_na() |>
    select(inddd, y, starts_with("x_lag")) |>
    # Make the data set a `tsibble`
    as_tsibble(index = inddd)
  # Training set
  sim_train <- sim_data[1:1000, ]
  # Test set
  sim_test <- sim_data[1001:1010, ]
  # Index variables
  index.vars <- colnames(sim_data)[3:8]
  # Model fitting
  smimodel_ppr <- model_smimodel(data = sim_train,
                                 yvar = "y",
                                 index.vars = index.vars,
                                 initialise = "ppr")
  # Predict from the best fitted smimodelFit object
  predict(object = smimodel_ppr$fit[[1]]$best, newdata = sim_test)
}
Prepares a test data set for recursive forecasting by removing the existing (actual) values from a specified range of columns (the lagged response columns) of the data set. Handles seasonal data with gaps.
prep_newdata(newdata, recursive_colRange)
newdata |
Data set to be prepared. Should be a |
recursive_colRange |
The range of column numbers (lagged response
columns) in |
A tibble.
backward object
The default print method for a backward object.
## S3 method for class 'backward'
print(x, ...)
x |
A model object of class |
... |
Other arguments not currently used. |
No return value; called for side effects. Prints a summary of the fitted model(s) to console.
gaimFit object
The default print method for a gaimFit object.
## S3 method for class 'gaimFit'
print(x, ...)
x |
A model object of class |
... |
Other arguments not currently used. |
No return value; called for side effects. Prints a summary of the fitted model(s) to console.
pprFit object
The default print method for a pprFit object.
## S3 method for class 'pprFit'
print(x, ...)
x |
A model object of class |
... |
Other arguments not currently used. |
No return value; called for side effects. Prints a summary of the fitted model(s) to console.
smimodel object
The default print method for a smimodel object.
## S3 method for class 'smimodel'
print(x, ...)
x |
An object of class |
... |
Other arguments not currently used. |
No return value; called for side effects. Prints a summary of the fitted model(s) to console.
smimodelFit object
The default print method for a smimodelFit object.
## S3 method for class 'smimodelFit'
print(x, ...)
x |
An object of class |
... |
Other arguments not currently used. |
No return value; called for side effects. Prints a summary of the fitted model to console.
Samples a block of a specified size from a given series, starting from a random point in the series.
randomBlock(series, block.size)
series |
A series from which a block should be sampled. |
block.size |
Size of the block to be sampled. |
A numeric vector.
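A minimal sketch of drawing one contiguous block at a random start, assuming the start point is chosen uniformly among positions where the whole block fits inside the series (the package may instead wrap around the end). The name `random_block_sketch` is hypothetical.

```r
# Hypothetical sketch: sample a contiguous block of length block.size
# starting at a uniformly chosen position (assumes block.size <= length).
random_block_sketch <- function(series, block.size) {
  n <- length(series)
  stopifnot(block.size <= n)
  start <- sample.int(n - block.size + 1, 1)
  series[start:(start + block.size - 1)]
}

set.seed(1)
random_block_sketch(1:20, block.size = 4)
```

Keeping each block contiguous preserves the short-range autocorrelation of the residuals, which is the point of block rather than i.i.d. resampling.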
Appropriately removes existing (actual) values from the specified column range (lagged response columns) of a given data set (typically a test set for which recursive forecasting is required).
remove_lags(data, recursive_colRange)
data |
Data set (a |
recursive_colRange |
The range of column numbers in |
A tibble.
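A minimal sketch of the idea, under loudly stated assumptions: the k-th column in `recursive_colRange` holds lag k of the response, and the test set follows the training data directly with no gaps. Row i can then only use an actual lag-k value when i <= k; later values would be future responses, so they are set to NA and later filled by recursive forecasts. The name `remove_lags_sketch` is hypothetical, and a plain `data.frame` stands in for a `tsibble`.

```r
# Hypothetical sketch of removing "future" lagged-response values from a
# test set before recursive forecasting. Assumes the k-th column in the
# range holds lag k of the response, and that the test set starts right
# after the training data with no gaps.
remove_lags_sketch <- function(data, recursive_colRange) {
  rows <- seq_len(nrow(data))
  for (k in seq_along(recursive_colRange)) {
    col <- recursive_colRange[k]
    # Row i only knows lag k from actual data when i <= k
    data[rows > k, col] <- NA
  }
  data
}

test <- data.frame(y = 11:14, y_lag_001 = 10:13, y_lag_002 = 9:12)
remove_lags_sketch(test, recursive_colRange = 2:3)
```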
Generates multiple replications of single season block bootstrap series.
residBootstrap(x, season.period = 1, m = 1, num.bootstrap = 1000)
x |
A series of residuals from which bootstrap series to be generated. |
season.period |
Length of the seasonal period. |
m |
Multiplier. (Block size = |
num.bootstrap |
Number of bootstrap series to be generated. |
A matrix of bootstrapped series.
smimodel
Generates residuals from a fitted smimodel object.
## S3 method for class 'smimodel'
residuals(object, ...)
object |
A |
... |
Other arguments not currently used. |
A numeric vector of residuals.
if (requireNamespace("gurobi", quietly = TRUE)) {
  library(smimodel)
  library(dplyr)
  library(ROI)
  library(tibble)
  library(tidyr)
  library(tsibble)
  # Simulate data
  n <- 1005
  set.seed(123)
  sim_data <- tibble(x_lag_000 = runif(n)) |>
    mutate(
      # Add x_lags
      x_lag = lag_matrix(x_lag_000, 5)) |>
    unpack(x_lag, names_sep = "_") |>
    mutate(
      # Response variable
      y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 + rnorm(n, sd = 0.1),
      # Add an index to the data set
      inddd = seq(1, n)) |>
    drop_na() |>
    select(inddd, y, starts_with("x_lag")) |>
    # Make the data set a `tsibble`
    as_tsibble(index = inddd)
  # Index variables
  index.vars <- colnames(sim_data)[3:8]
  # Model fitting
  smimodel_ppr <- model_smimodel(data = sim_data,
                                 yvar = "y",
                                 index.vars = index.vars,
                                 initialise = "ppr")
  residuals(smimodel_ppr)
}
Scales the columns of the data corresponding to index.vars.
scaling(data, index.vars)
data |
Training data set on which models will be trained. Should be a
|
index.vars |
A character vector of names of the predictor variables for which indices should be estimated. |
A list containing the following components:
scaled_data |
The
scaled data set of class |
scaled_info |
A named
|
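A minimal sketch of what such a scaling step can look like, assuming ordinary standardisation (centre by the mean, divide by the standard deviation) and a plain `data.frame` in place of a `tsibble`. The name `scaling_sketch` and the layout of the returned `scaled_info` list are hypothetical.

```r
# Hypothetical sketch: standardise the index variables and keep the
# centring and scaling constants so coefficients can be mapped back later.
scaling_sketch <- function(data, index.vars) {
  scaled <- scale(as.matrix(data[index.vars]))
  data[index.vars] <- as.data.frame(scaled)
  list(
    scaled_data = data,
    scaled_info = list(center = attr(scaled, "scaled:center"),
                       scale = attr(scaled, "scaled:scale"))
  )
}

d <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30), y = 1:3)
s <- scaling_sketch(d, index.vars = c("a", "b"))
s$scaled_data
```

Only the index variables are touched; the response and any other columns pass through unchanged.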
Generates a single replication of single season block bootstrap series.
seasonBootstrap(x, season.period = 1, m = 1)
x |
A series of residuals from which bootstrap series to be generated. |
season.period |
Length of the seasonal period. |
m |
Multiplier. (Block size = |
A numeric vector.
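The two block bootstrap helpers above can be sketched together. This assumes the block size is `season.period * m`, that block start points are drawn uniformly with replacement, and that concatenated blocks are truncated to the original series length; the names `season_bootstrap_sketch` and `resid_bootstrap_sketch` are hypothetical and do not reproduce the package's code.

```r
# Hypothetical sketch of a block bootstrap of a residual series.
# Assumes block size = season.period * m and that blocks are drawn with
# random start points and concatenated until the original length is reached.
season_bootstrap_sketch <- function(x, season.period = 1, m = 1) {
  n <- length(x)
  block.size <- season.period * m
  num.blocks <- ceiling(n / block.size)
  starts <- sample.int(n - block.size + 1, num.blocks, replace = TRUE)
  out <- unlist(lapply(starts, function(s) x[s:(s + block.size - 1)]))
  out[seq_len(n)]  # trim the last block so the length matches x
}

# Many replications, one bootstrapped series per column
resid_bootstrap_sketch <- function(x, season.period = 1, m = 1, num.bootstrap = 1000) {
  replicate(num.bootstrap, season_bootstrap_sketch(x, season.period, m))
}

set.seed(123)
e <- rnorm(48)
B <- resid_bootstrap_sketch(e, season.period = 12, m = 1, num.bootstrap = 200)
dim(B)  # 48 200
```

With the defaults (`season.period = 1`, `m = 1`) each block has length one, so the sketch reduces to an ordinary i.i.d. residual bootstrap.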
Fits a single nonparametric multiple index model to the data. This is a
helper function designed to be called from user-facing wrapper functions,
model_smimodel and greedy_smimodel.
smimodel.fit(data, yvar, neighbour = 0, family = gaussian(), index.vars, initialise = c("ppr", "additive", "linear", "multiple", "userInput"), num_ind = 5, num_models = 5, seed = 123, index.ind = NULL, index.coefs = NULL, s.vars = NULL, linear.vars = NULL, lambda0 = 1, lambda2 = 1, M = 10, max.iter = 50, tol = 0.001, tolCoefs = 0.001, TimeLimit = Inf, MIPGap = 1e-04, NonConvex = -1, verbose = list(solver = FALSE, progress = FALSE))
data |
Training data set on which models will be trained. Must be a data
set of class |
yvar |
Name of the response variable as a character string. |
neighbour |
|
family |
A description of the error distribution and link function to be
used in the model (see |
index.vars |
A |
initialise |
The model structure with which the estimation process
should be initialised. The default is |
num_ind |
If |
num_models |
If |
seed |
If |
index.ind |
If |
index.coefs |
If |
s.vars |
A |
linear.vars |
A |
lambda0 |
Penalty parameter for L0 penalty. |
lambda2 |
Penalty parameter for L2 penalty. |
M |
Big-M value to be used in MIP. |
max.iter |
Maximum number of MIP iterations performed to update index coefficients for a given model. |
tol |
Tolerance for the objective function value (loss) of MIP. |
tolCoefs |
Tolerance for coefficients. |
TimeLimit |
A limit for the total time (in seconds) expended in a single MIP iteration. |
MIPGap |
Relative MIP optimality gap. |
NonConvex |
The strategy for handling non-convex quadratic objectives or non-convex quadratic constraints in Gurobi solver. |
verbose |
A named list controlling verbosity options. Defaults to
|
A list with two elements:
initial |
A list of information of the
model initialisation. (For descriptions of the list elements see
|
best |
A list of information of the
final optimised model. (For descriptions of the list elements see
|
Splits a given number of predictors into a given number of indices.
split_index(num_pred, num_ind)
num_pred |
Number of predictors. |
num_ind |
Number of indices. |
A list containing the following components:
index |
An
|
index_positions |
A list of length = |
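One plausible way to split predictors into indices is shown below as a minimal sketch: contiguous groups of nearly equal size, returning both a group label per predictor and the positions belonging to each index. This scheme is an assumption for illustration, not necessarily the rule the package uses, and the name `split_index_sketch` is hypothetical.

```r
# Hypothetical sketch: split num_pred predictors into num_ind contiguous
# groups of (nearly) equal size. One plausible scheme only.
split_index_sketch <- function(num_pred, num_ind) {
  index <- sort(rep_len(seq_len(num_ind), num_pred))
  index_positions <- lapply(seq_len(num_ind), function(k) which(index == k))
  list(index = index, index_positions = index_positions)
}

split_index_sketch(num_pred = 7, num_ind = 3)$index  # 1 1 1 2 2 3 3
```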
Truncates predictors to be in the in-sample range to avoid spline extrapolation.
truncate_vars(range.object, data, cols.trunc)
range.object |
A matrix containing range of each predictor variable. Should be a matrix with two rows for min and max, and the columns should correspond to variables. |
data |
Out-of-sample data set of which variables should be truncated. |
cols.trunc |
Column names of the variables to be truncated. |
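Truncation to the in-sample range amounts to clamping each selected column between its training minimum and maximum. The minimal sketch below follows the `range.object` layout described above (row 1 = min, row 2 = max, one column per variable); the name `truncate_vars_sketch` is hypothetical and a plain `data.frame` is used.

```r
# Hypothetical sketch: clamp out-of-sample predictors to the training range
# so that spline-based fits are never asked to extrapolate.
truncate_vars_sketch <- function(range.object, data, cols.trunc) {
  for (v in cols.trunc) {
    data[[v]] <- pmin(pmax(data[[v]], range.object[1, v]), range.object[2, v])
  }
  data
}

train <- data.frame(x1 = c(0, 5, 10), x2 = c(-1, 0, 1))
rng <- sapply(train, range)  # row 1 = min, row 2 = max
test <- data.frame(x1 = c(-3, 4, 12), x2 = c(2, 0, -5))
truncate_vars_sketch(rng, test, cols.trunc = c("x1", "x2"))
```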
Fits a nonparametric multiple index model to the data for a given combination
of the penalty parameters (lambda0, lambda2), and returns the validation set
mean squared error (MSE). (Used within greedy.fit; users are
not expected to use this function directly.)
tune_smimodel(data, val.data, yvar, neighbour = 0, family = gaussian(), index.vars, initialise = c("ppr", "additive", "linear", "multiple", "userInput"), num_ind = 5, num_models = 5, seed = 123, index.ind = NULL, index.coefs = NULL, s.vars = NULL, linear.vars = NULL, lambda.comb = c(1, 1), M = 10, max.iter = 50, tol = 0.001, tolCoefs = 0.001, TimeLimit = Inf, MIPGap = 1e-04, NonConvex = -1, verbose = list(solver = FALSE, progress = FALSE), exclude.trunc = NULL, recursive = FALSE, recursive_colRange = NULL)
data |
Training data set on which models will be trained. Must be a data
set of class |
val.data |
Validation data set. (The data set on which the penalty
parameter selection will be performed.) Must be a data set of class
|
yvar |
Name of the response variable as a character string. |
neighbour |
|
family |
A description of the error distribution and link function to be
used in the model (see |
index.vars |
A |
initialise |
The model structure with which the estimation process
should be initialised. The default is |
num_ind |
If |
num_models |
If |
seed |
If |
index.ind |
If |
index.coefs |
If |
s.vars |
A |
linear.vars |
A |
lambda.comb |
A |
M |
Big-M value used in MIP. |
max.iter |
Maximum number of MIP iterations performed to update index coefficients for a given model. |
tol |
Tolerance for the objective function value (loss) of MIP. |
tolCoefs |
Tolerance for coefficients. |
TimeLimit |
A limit for the total time (in seconds) expended in a single MIP iteration. |
MIPGap |
Relative MIP optimality gap. |
NonConvex |
The strategy for handling non-convex quadratic objectives or non-convex quadratic constraints in Gurobi solver. |
verbose |
A named list controlling verbosity options. Defaults to
|
exclude.trunc |
The names of the predictor variables that should not be
truncated for stable predictions as a character string. (Since the
nonlinear functions are estimated using splines, extrapolation is not
desirable. Hence, if any predictor variable in |
recursive |
Whether to obtain recursive forecasts or not (default -
|
recursive_colRange |
If |
A numeric.
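The tuning principle (fit on the training set, score each penalty value by validation-set MSE, keep the best) can be shown with a minimal sketch. Closed-form ridge regression, which has only an L2 penalty, is swapped in for the MIP-based SMI fit, so only a single penalty is searched rather than a (lambda0, lambda2) pair; `val_mse_ridge` and the simulated data are hypothetical.

```r
# Hypothetical sketch: score one penalty value by validation-set MSE.
# Closed-form ridge regression (an L2 penalty only) stands in for the
# package's MIP-based SMI model fit.
val_mse_ridge <- function(X, y, X_val, y_val, lambda2) {
  p <- ncol(X)
  beta <- solve(crossprod(X) + lambda2 * diag(p), crossprod(X, y))
  pred <- X_val %*% beta
  mean((y_val - pred)^2)
}

set.seed(7)
X <- matrix(rnorm(200 * 5), 200, 5)
beta_true <- c(1, -0.5, 0, 0, 0.25)
y <- drop(X %*% beta_true + rnorm(200, sd = 0.3))
train <- 1:150
val <- 151:200
lambdas <- c(0.01, 0.1, 1, 10, 100)
mse <- sapply(lambdas, function(l)
  val_mse_ridge(X[train, ], y[train], X[val, ], y[val], l))
lambdas[which.min(mse)]  # penalty with the lowest validation MSE
```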
smimodel
Back-transforms the index coefficients to suit the original-scale index
variables when those variables were scaled during estimation of the smimodel
(which happens with initialise = "ppr" in model_smimodel or
greedy_smimodel). Users are not expected to call this function directly;
it is usually called within smimodel.fit.
unscaling(object, scaledInfo)
object |
A |
scaledInfo |
The list returned from a call of the function
|
A smimodel object.
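Why a back-transformation is possible can be seen from a short derivation. It assumes the scaling step centres each index variable $x_j$ by a constant $\mu_j$ and divides it by $s_j$; the exact constants stored in scaledInfo are not spelled out here.

```latex
z_j = \frac{x_j - \mu_j}{s_j}
\quad\Longrightarrow\quad
\sum_{j} \alpha_j z_j
  = \sum_{j} \frac{\alpha_j}{s_j}\, x_j \;-\; \sum_{j} \frac{\alpha_j \mu_j}{s_j},
\qquad
\tilde{\alpha}_j = \frac{\alpha_j}{s_j}.
```

The constant term only shifts the argument of the nonparametric function of the index, so it can be absorbed into that function; under this assumption the original-scale coefficients are $\tilde{\alpha}_j$, possibly renormalised to unit norm afterwards.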
Updates index coefficients by solving a mixed integer program.
update_alpha(Y, X, num_pred, num_ind, index.ind, dgz, alpha_old, lambda0 = 1, lambda2 = 1, M = 10, TimeLimit = Inf, MIPGap = 1e-04, NonConvex = -1, verbose = FALSE)
Y |
Column matrix of response. |
X |
Matrix of predictors (size adjusted to number of indices). |
num_pred |
Number of predictors. |
num_ind |
Number of indices. |
index.ind |
An integer vector that assigns group index for each predictor. |
dgz |
The |
alpha_old |
Vector of index coefficients from previous iteration. |
lambda0 |
Penalty parameter for L0 penalty. |
lambda2 |
Penalty parameter for L2 penalty. |
M |
Big-M value to be used in MIP. |
TimeLimit |
A limit for the total time (in seconds) expended in a single MIP iteration. |
MIPGap |
Relative MIP optimality gap. |
NonConvex |
The strategy for handling non-convex quadratic objectives or non-convex quadratic constraints in Gurobi solver. |
verbose |
The option to print detailed solver output. |
A vector of normalised index coefficients.
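For orientation, a standard big-M formulation of an L0 plus L2 penalised least-squares problem is shown below, using the roles suggested by the arguments Y, X, lambda0, lambda2 and M. This is an assumed, simplified form; how the package builds the working design from dgz and alpha_old, and how it treats several indices through index.ind, is not reproduced here.

```latex
\begin{aligned}
\min_{\boldsymbol{\alpha},\, \mathbf{z}} \quad
  & \lVert \mathbf{Y} - \mathbf{X}\boldsymbol{\alpha} \rVert_2^2
    + \lambda_0 \sum_{j=1}^{p} z_j
    + \lambda_2 \lVert \boldsymbol{\alpha} \rVert_2^2 \\
\text{subject to} \quad
  & -M z_j \le \alpha_j \le M z_j, \qquad z_j \in \{0, 1\}, \qquad j = 1, \dots, p.
\end{aligned}
```

Setting $z_j = 0$ forces $\alpha_j = 0$, so the $\lambda_0$ term counts the predictors that stay in the model, while $M$ must be large enough not to cut off the optimal non-zero coefficients.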
smimodelFit
Optimises and updates a given smimodelFit.
update_smimodelFit(object, data, lambda0 = 1, lambda2 = 1, M = 10, max.iter = 50, tol = 0.001, tolCoefs = 0.001, TimeLimit = Inf, MIPGap = 1e-04, NonConvex = -1, verbose = list(solver = FALSE, progress = FALSE), ...)
object |
A |
data |
Training data set on which models will be trained. Must be a data
set of class |
lambda0 |
Penalty parameter for L0 penalty. |
lambda2 |
Penalty parameter for L2 penalty. |
M |
Big-M value to be used in MIP. |
max.iter |
Maximum number of MIP iterations performed to update index coefficients for a given model. |
tol |
Tolerance for the objective function value (loss) of MIP. |
tolCoefs |
Tolerance for coefficients. |
TimeLimit |
A limit for the total time (in seconds) expended in a single MIP iteration. |
MIPGap |
Relative MIP optimality gap. |
NonConvex |
The strategy for handling non-convex quadratic objectives or non-convex quadratic constraints in Gurobi solver. |
verbose |
A named list controlling verbosity options. Defaults to
|
... |
Other arguments not currently used. |
A list of optimised model information. For descriptions of the list
elements, see make_smimodelFit.