Importance of variable v is measured as the drop in performance after permuting the values of v; see Fisher et al. 2018 (reference below).
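The idea can be illustrated with a minimal base-R sketch. It is not the package's implementation: the helper `perm_importance()` and the plain `rmse()` function are hypothetical names introduced here for illustration, and only a single permutation per variable is drawn.

```r
# Minimal base-R sketch of permutation importance (not the package internals).
set.seed(1)

fit <- lm(Sepal.Length ~ Species + Petal.Length, data = iris)

# Root-mean-squared error: lower values are better
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Hypothetical helper: change in the metric after shuffling a single column
perm_importance <- function(fit, data, y, v, metric) {
  base <- metric(data[[y]], predict(fit, newdata = data))
  shuffled <- data
  shuffled[[v]] <- sample(shuffled[[v]])  # break the link between v and y
  metric(data[[y]], predict(fit, newdata = shuffled)) - base
}

vars <- setdiff(names(iris), "Sepal.Length")
imp <- vapply(
  vars,
  function(v) perm_importance(fit, iris, "Sepal.Length", v, rmse),
  numeric(1)
)
sort(imp, decreasing = TRUE)
```

Variables that the model does not use (here Sepal.Width and Petal.Width) leave the predictions unchanged when shuffled, so their importance is exactly zero.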
light_importance(x, ...)
# Default S3 method
light_importance(x, ...)
# S3 method for class 'flashlight'
light_importance(
  x,
  data = x$data,
  by = x$by,
  type = c("permutation", "shap"),
  v = NULL,
  n_max = Inf,
  seed = NULL,
  m_repetitions = 1L,
  metric = x$metrics[1L],
  lower_is_better = TRUE,
  use_linkinv = FALSE,
  ...
)
# S3 method for class 'multiflashlight'
light_importance(x, ...)
x
An object of class "flashlight" or "multiflashlight".
...
Further arguments passed to light_performance().
data
An optional data.frame.
by
An optional vector of column names used to additionally group the results.
type
Type of importance: "permutation" (currently the only option).
v
Vector of variable names to assess importance for. Defaults to all variables in data except "by" and "y".
n_max
Maximum number of rows to consider.
seed
An integer random seed used to select and shuffle rows.
m_repetitions
Number of permutations. Defaults to 1. A value above 1 provides more stable estimates of variable importance and allows the calculation of standard errors measuring the uncertainty from permuting.
metric
An optional named list of length one with a metric as element. Defaults to the first metric in the flashlight. The metric needs to be a function with at least four arguments: actual, predicted, case weights w and ....
lower_is_better
Logical flag indicating if lower values in the metric are better or not. If set to FALSE, the increase in metric is multiplied by -1.
use_linkinv
Should retransformation function be applied? Default is FALSE.
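How metric, lower_is_better and m_repetitions interact can be sketched in base R. This is an illustration under assumptions, not the package's code: the helper `one_run()` and the metric `r_squared()` are hypothetical, and the standard error is taken as the standard deviation of the runs divided by the square root of their number, which is one common choice and may differ from what the package computes.

```r
# Sketch only: repeated permutations with a custom metric (base R).
set.seed(2)

fit <- lm(Sepal.Length ~ Species + Petal.Length, data = iris)

# A metric with the argument shape described above: actual, predicted, w, ...
# R-squared is a "higher is better" metric.
r_squared <- function(actual, predicted, w = NULL, ...) {
  if (is.null(w)) w <- rep(1, length(actual))
  1 - sum(w * (actual - predicted)^2) /
    sum(w * (actual - weighted.mean(actual, w))^2)
}
metric <- list(r_squared = r_squared)  # named list of length one
lower_is_better <- FALSE

# Hypothetical helper: one permutation run for variable v
one_run <- function(v, data, y) {
  f <- metric[[1]]
  base <- f(data[[y]], predict(fit, newdata = data))
  shuffled <- data
  shuffled[[v]] <- sample(shuffled[[v]])
  increase <- f(data[[y]], predict(fit, newdata = shuffled)) - base
  # Flip the sign when higher metric values are better
  if (lower_is_better) increase else -increase
}

m_repetitions <- 4
runs <- replicate(m_repetitions, one_run("Petal.Length", iris, "Sepal.Length"))

importance <- mean(runs)
std_error <- sd(runs) / sqrt(m_repetitions)  # assumed definition, see above
c(importance = importance, std_error = std_error)
```

With a single repetition, `sd()` returns NA, which matches the statement that standard errors require more than one permutation.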
An object of class "light_importance" with the following elements:
data
A tibble with results.
by
Same as input by.
type
Same as input type. For information only.
The minimum required elements in the (multi-)flashlight are "y", "predict_function", "model", "data" and "metrics".
light_importance(default): Default method not implemented yet.
light_importance(flashlight): Variable importance for a flashlight.
light_importance(multiflashlight): Variable importance for a multiflashlight.
Fisher A., Rudin C., Dominici F. (2018). All Models are Wrong but many are Useful: Variable Importance for Black-Box, Proprietary, or Misspecified Prediction Models, using Model Class Reliance. arXiv.
library(flashlight)

# First model uses only two of the available features
fit_part <- lm(Sepal.Length ~ Species + Petal.Length, data = iris)
fl_part <- flashlight(
  model = fit_part, label = "part", data = iris, y = "Sepal.Length"
)
# No effect of some variables (incl. standard errors)
plot(light_importance(fl_part, m_repetitions = 4), fill = "chartreuse4")
# Second model includes all variables
fit_full <- lm(Sepal.Length ~ ., data = iris)
fl_full <- flashlight(
  model = fit_full, label = "full", data = iris, y = "Sepal.Length"
)
fls <- multiflashlight(list(fl_part, fl_full))
plot(light_importance(fls), fill = "chartreuse4")
plot(light_importance(fls, by = "Species"))