missRanger 2.6.2
Maintenance
- Update code coverage version #86.
missRanger 2.6.1
CRAN release: 2024-12-07
Improvement
Solves an incompatibility with the {formula.tools} package. formula.tools:::as.character.formula() breaks base::as.character() for formulas, which prevented {missRanger} from working, see also https://github.com/decisionpatterns/formula.tools/issues/11. We have added a workaround in #81.
missRanger 2.6.0
CRAN release: 2024-08-17
Major bug fix
Fixes a major bug by which responses were used as covariates in the random forests. Thanks to @flystar233 for reporting, see #78. You can expect different and better imputations.
Major feature
Out-of-sample application is now possible! Thanks to @jeandigitale for pushing the idea in #58.
This means you can run imp <- missRanger(..., keep_forests = TRUE) and then apply its models to new data via predict(imp, newdata). The “missRanger” object can be saved and loaded as a binary file, e.g., via saveRDS()/readRDS(), for later use.
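The following is a minimal sketch of this workflow. It assumes missRanger >= 2.6.0 is installed. The iris data get missing values from generateNA(). The values num.trees = 50 and pmm.k = 3 are arbitrary choices for illustration. The "new" rows are taken from iris itself only to keep the example self-contained.

```r
library(missRanger)

# Training data with roughly 10% missing values per column
iris_NA <- generateNA(iris, p = 0.1, seed = 1)

# keep_forests = TRUE stores the random forests of the best iteration;
# data_only then defaults to FALSE, so a "missRanger" object is returned
imp <- missRanger(
  iris_NA,
  pmm.k = 3,
  num.trees = 50,
  keep_forests = TRUE,
  seed = 1,
  verbose = 0
)

# Save the object as a binary file and reload it later
path <- tempfile(fileext = ".rds")
saveRDS(imp, path)
imp2 <- readRDS(path)

# Apply the stored models to new rows containing missing values
newdata <- generateNA(iris[1:10, ], p = 0.2, seed = 2)
imputed <- predict(imp2, newdata)
```

Storing the forests can use a lot of memory, so a moderate num.trees keeps the saved file small.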
Note that out-of-sample imputation works best for rows in newdata with only one missing value (counting only missing values in variables used as covariates in the random forests). We call this the “easy case”. In the “hard case”, even multiple iterations (set by iter) can lead to unsatisfactory results.
The out-of-sample algorithm works as follows:
1. Univariately impute all relevant columns by randomly drawing values from the original unimputed data. This step only impacts “hard case” rows.
2. Replace the univariate imputations by predictions of the random forests. This is done sequentially over variables, where the variables are sorted to minimize the impact of univariate imputations. Optionally, this is followed by predictive mean matching (PMM).
3. Repeat Step 2 for “hard case” rows multiple times.
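The three steps can be sketched in base R. This is a simplified illustration, not the package's implementation, and it rests on several assumptions:

- oos_impute_sketch() is a hypothetical helper.
- Linear models from lm() stand in for the stored random forests, and they are fitted on the complete cases of the training data.
- All columns are assumed to be numeric.
- Variables are processed from fewest to most missing values.
- The optional PMM step is omitted.

```r
# Simplified sketch of the out-of-sample algorithm (hypothetical helper).
# Assumptions: all columns numeric, lm() stands in for the stored random
# forests, and variables are processed from fewest to most missing values.
oos_impute_sketch <- function(train, newdata, iter = 3L) {
  vars <- names(train)
  newdata <- newdata[vars]
  miss <- is.na(newdata)      # originally missing cells
  hard <- rowSums(miss) > 1   # "hard case" rows

  # Step 1: univariate imputation by random draws from observed training values
  for (v in vars) {
    m <- miss[, v]
    if (any(m)) {
      pool <- train[[v]][!is.na(train[[v]])]
      newdata[[v]][m] <- pool[sample.int(length(pool), sum(m), replace = TRUE)]
    }
  }

  # Step 2: replace univariate imputations by model predictions, sequentially
  # over variables. Step 3: later passes only revisit "hard case" rows.
  ord <- vars[order(colSums(miss))]
  for (i in seq_len(iter)) {
    for (v in ord) {
      m <- miss[, v]
      if (i > 1) m <- m & hard
      if (!any(m)) next
      fit <- lm(reformulate(setdiff(vars, v), response = v), data = train)
      newdata[[v]][m] <- predict(fit, newdata = newdata[m, , drop = FALSE])
    }
  }
  newdata
}

set.seed(1)
train <- iris[, 1:4]
train[sample(nrow(train), 20), "Sepal.Width"] <- NA
newdata <- iris[1:5, 1:4]
newdata[1, "Sepal.Length"] <- NA                  # easy case
newdata[2, c("Sepal.Width", "Petal.Length")] <- NA  # hard case
out <- oos_impute_sketch(train, newdata)
```

An "easy case" row gets its final prediction in the first pass, because all its covariates are observed. Only "hard case" rows depend on the random draws of Step 1 and are refined in later passes.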
Possibly breaking changes
- Columns of special types like date/time can no longer be imputed. You will need to convert them to numeric before imputation.
- pmm() is pickier: xtrain and xtest must both be either numeric, logical, or factor (with identical levels).
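Below is a base R sketch of what predictive mean matching does with xtrain, xtest and ytrain. It makes a few assumptions:

- pmm_sketch() is hypothetical and is not missRanger's pmm().
- It mirrors the stricter type rule above.
- Factors are matched on their integer level codes.
- Like pmm() since 2.1.0, it ignores training pairs with a missing prediction or response.

```r
# Hypothetical PMM sketch (not missRanger's own pmm()).
pmm_sketch <- function(xtrain, xtest, ytrain, k = 1L) {
  # Mirror the stricter type rule: both numeric, both logical,
  # or both factors with identical levels
  same_type <- (is.numeric(xtrain) && is.numeric(xtest)) ||
    (is.logical(xtrain) && is.logical(xtest)) ||
    (is.factor(xtrain) && is.factor(xtest) &&
       identical(levels(xtrain), levels(xtest)))
  if (!same_type) stop("xtrain and xtest must be of the same supported type")

  # Compare on a numeric scale (assumption: factors via their level codes)
  xtrain <- as.numeric(xtrain)
  xtest <- as.numeric(xtest)

  # Drop training pairs with a missing prediction or response
  ok <- !is.na(xtrain) & !is.na(ytrain)
  xtrain <- xtrain[ok]
  ytrain <- ytrain[ok]
  k <- min(k, length(xtrain))

  # For each test prediction, pick one of its k nearest training
  # predictions at random and return that row's observed response
  idx <- vapply(xtest, function(x) {
    nn <- order(abs(xtrain - x))[seq_len(k)]
    nn[sample.int(k, 1L)]
  }, integer(1))
  ytrain[idx]
}

res <- pmm_sketch(
  xtrain = c(1, 2, NA, 4),     # OOB predictions on training rows (one missing)
  xtest = c(1.1, 3.9),         # predictions for the rows to impute
  ytrain = c(10, 20, 30, 40),  # observed values on training rows
  k = 1
)
res
# returns 10 40
```

With k = 1 the result is deterministic. The test predictions 1.1 and 3.9 are closest to the training predictions 1 and 4, so the observed values 10 and 40 are returned. A larger k adds randomness by drawing among the k closest candidates.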
Other changes
- Now requires ranger >= 0.16.0.
- More compact vignettes.
- Better examples and README.
- Many relevant ranger() arguments are now explicit arguments in missRanger() to improve the tab-completion experience:
  - num.trees = 500
  - mtry = NULL
  - min.node.size = NULL
  - min.bucket = NULL
  - max.depth = NULL
  - replace = TRUE
  - sample.fraction = if (replace) 1 else 0.632
  - case.weights = NULL
  - num.threads = NULL
  - save.memory = FALSE
- For variables that can’t be used, more information is printed.
- If keep_forests = TRUE, the argument data_only is set to FALSE by default.
- The “missRanger” object now stores pmm.k.
- The verbose argument is passed to ranger() as well.
missRanger 2.5.0
CRAN release: 2024-07-12
Bug fixes
- Since release 2.3.0, negative formula terms had unintentionally not been dropped, see #62. This is fixed now.
missRanger 2.4.0
CRAN release: 2023-11-19
Future Output API
- New argument data_only = TRUE to control whether only the imputed data should be returned (default) or an object of class “missRanger”. This object contains the imputed data and information like OOB prediction errors, fixing #28. The value FALSE will later become the default in {missRanger 3.0.0}. This will be announced via a deprecation cycle.
Enhancements
- New argument keep_forests = FALSE controls whether the random forests of the best iteration (the one that generated the final imputed data) are added to the “missRanger” object. Note that this will use a lot of memory. Only relevant if data_only = FALSE. This solves #54.
missRanger 2.3.0
CRAN release: 2023-10-20
Major improvements
- missRanger() now works with syntactically invalid variable names like “1bad:variable”. This solves an old issue, recently popping up in this new issue.
- missRanger() now works with any number of features, as long as the formula is left at its default (. ~ .). This solves this issue.
missRanger 2.2.1
CRAN release: 2023-04-28
- Switch from importFrom to :: code style.
- Improved documentation.
missRanger 2.2.0
CRAN release: 2023-03-24
missRanger 2.1.5 (not on CRAN)
Maintenance release,
- switching to testthat 3,
- changing the package structure, and
- bringing the vignettes into the right order.
missRanger 2.1.1
CRAN release: 2021-03-20
Minor changes
- Allow the use of “mtry” as suggested by Thomas Lumley. Recommended values are NULL (default), 1, or a function of the number of covariables m, e.g., mtry = function(m) max(1, m %/% 3). Keep in mind that missRanger() might use a growing set of covariables in the first iteration of the process, so passing mtry = 2 might result in an error.
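Here is a quick base R check of what the suggested function returns for different numbers of covariables m. A function adapts when only one or two covariables are available, as can happen in the first iteration. A fixed mtry = 2 would then exceed the number of covariables, which ranger() rejects. The name mtry_fun is only used for illustration.

```r
# mtry as a function of the number of covariables m, as suggested above
mtry_fun <- function(m) max(1, m %/% 3)

# Features tried per split for a growing number of covariables
sapply(c(1, 2, 3, 7, 10), mtry_fun)
# returns 1 1 1 2 3

# Passed to missRanger() instead of a fixed number, e.g.:
# missRanger(iris_NA, mtry = function(m) max(1, m %/% 3))
```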
missRanger 2.1.0
CRAN release: 2019-06-30
This is a summary of all changes since version 1.x.x.
Major changes
- missRanger now also imputes and uses logical variables, character variables and further variables of mode numeric like dates and times.
- Added a formula interface to specify which variables to impute (those on the left hand side) and those used to do so (those on the right hand side). Here are some (pseudo) examples:
  - . ~ . (default): Use all variables to impute all variables. Note that only those with missing values will be imputed. Variables without missing values will only be used to impute others.
  - . ~ . - ID: Use all variables except ID to impute all missing values.
  - Species ~ Sepal.Width: Use Sepal.Width to impute Species. Only works if Sepal.Width does not contain missing values. (Add it to the left hand side if it does.)
  - Species + Sepal.Length ~ Species + Petal.Length: Use Species and Petal.Length to impute Species and Sepal.Length. Only works if Petal.Length does not contain missing values because it does not appear on the left hand side and is therefore not imputed itself.
  - . ~ 1: Univariate imputation for all relevant columns (as nothing is selected on the right hand side).
- The first argument of generateNA is called x instead of data, for consistency with imputeUnivariate.
- imputeUnivariate now also works for data frames and matrices.
- In PMM mode, missRanger relies on OOB predictions. The smaller the value of num.trees, the higher the risk of missing OOB predictions, which caused an error in PMM. Now, pmm allows for missing values in xtrain or ytrain. Thus, the algorithm will even work with num.trees = 1. This will be useful to impute large data sets with PMM.
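The sketch below runs some of the formulas listed above, assuming a recent missRanger version (>= 2.6.0). Missing values are added by hand only to Sepal.Length and Species, so Petal.Length stays complete as the second formula requires. The ID column, the number of missing values and the seeds are illustrative choices.

```r
library(missRanger)

# Add missing values by hand to Sepal.Length and Species only
set.seed(1)
iris_NA <- iris
iris_NA$Sepal.Length[sample(nrow(iris), 15)] <- NA
iris_NA$Species[sample(nrow(iris), 15)] <- NA
iris_NA$ID <- seq_len(nrow(iris))  # complete identifier, not a useful predictor

# Use all variables except ID to impute all missing values
imp1 <- missRanger(iris_NA, . ~ . - ID, num.trees = 50, seed = 1, verbose = 0)

# Use Species and Petal.Length to impute Species and Sepal.Length;
# this works because Petal.Length has no missing values
imp2 <- missRanger(
  iris_NA, Species + Sepal.Length ~ Species + Petal.Length,
  num.trees = 50, seed = 1, verbose = 0
)

# Univariate imputation of all relevant columns
imp3 <- missRanger(iris_NA, . ~ 1, seed = 1, verbose = 0)
```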
Minor changes
- The function imputeUnivariate has received a seed argument.
- The function imputeUnivariate has received a v argument, specifying the columns to impute.
- The function generateNA now offers the possibility to use different proportions of missing values for each column.
- If verbose is not 0, then missRanger will show which variables will be imputed in which order and which variables will be used for imputation.
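The following short illustration of these arguments assumes a recent missRanger version. First, generateNA() receives a named vector of per-column proportions. Then imputeUnivariate() is restricted to a single column via v. The proportions and seeds are arbitrary.

```r
library(missRanger)

# Different proportions of missing values per column (named vector);
# columns not named keep their values
x <- generateNA(iris, p = c(Sepal.Length = 0.2, Petal.Width = 0.05), seed = 3)
colSums(is.na(x))

# Univariate imputation of selected columns only, reproducible via seed
x_imp <- imputeUnivariate(x, v = "Sepal.Length", seed = 3)
colSums(is.na(x_imp))  # Petal.Width still contains missing values
```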
