Impute missing values on newdata based on an object of class "missRanger".
For multivariate imputation, use missRanger(..., keep_forests = TRUE).
For univariate imputation, no forests are required. This can be enforced by predict(..., iter = 0) or via missRanger(. ~ 1, ...).
Note that out-of-sample imputation works best for rows in newdata with only one missing value (counting only missings in variables used as covariates in random forests). We call this the "easy case". In the "hard case", even multiple iterations (set by iter) can lead to unsatisfactory results.
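To get a feeling for how many rows of a new dataset fall into the "easy case" and the "hard case", one can count missing values per row over the covariate columns. The following is a minimal base R sketch, not part of missRanger. The toy data and the vector `covariates` are assumptions for illustration, and "hard" is taken to mean more than one missing covariate.

```r
# Toy data standing in for `newdata`; NA marks a missing value.
newdata <- data.frame(
  x1 = c(1.0, NA, 3.0, NA, 5.0),
  x2 = c(NA, 2.0, 1.5, NA, 0.5),
  x3 = c("a", "b", NA, NA, "c")
)

# Assumption: all three columns were used as covariates in the random forests.
covariates <- c("x1", "x2", "x3")

# Count missing values per row, restricted to the covariate columns.
n_missing <- rowSums(is.na(newdata[, covariates, drop = FALSE]))

# Exactly one missing covariate is the "easy case"; more than one is "hard".
case <- ifelse(
  n_missing == 0, "complete",
  ifelse(n_missing == 1, "easy", "hard")
)
table(case)
```

A large share of "hard" rows suggests that the out-of-sample imputations should be inspected with extra care.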
Usage
# S3 method for class 'missRanger'
predict(
object,
newdata,
pmm.k = object$pmm.k,
iter = 4L,
num.threads = NULL,
seed = NULL,
verbose = 1L,
...
)
Arguments
- object
'missRanger' object.
- newdata
A data.frame with missing values to impute.
- pmm.k
Number of candidate predictions of the original dataset for predictive mean matching (PMM). By default the same value as during fitting.
- iter
Number of iterations for "hard case" rows. 0 for univariate imputation.
- num.threads
Number of threads used by ranger's predict function. The default NULL uses all threads.
- seed
Integer seed used for initial univariate imputation and PMM.
- verbose
Should info be printed? (1 = yes, the default; 0 = no.)
- ...
Passed to the predict function of ranger.
Details
The out-of-sample algorithm works as follows:
1. Impute univariately all relevant columns by randomly drawing values from the original unimputed data. This step will only impact "hard case" rows.
2. Replace univariate imputations by predictions of random forests. This is done sequentially over variables, where the variables are sorted to minimize the impact of univariate imputations. Optionally, this is followed by predictive mean matching (PMM).
3. Repeat Step 2 for "hard case" rows multiple times.
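The initial univariate step can be sketched in base R as follows. This is an illustration under assumptions, not the package's internal code: `impute_univariate()` is a hypothetical helper, and `train` and `new` are toy stand-ins for the original unimputed data and for newdata.

```r
# Hypothetical helper (not part of missRanger): fill the NAs in `x` by
# sampling with replacement from the observed values in `pool`.
impute_univariate <- function(x, pool) {
  pool <- pool[!is.na(pool)]
  miss <- is.na(x)
  if (any(miss)) {
    # Sample indices rather than values, which is safe for a pool of length 1.
    x[miss] <- pool[sample.int(length(pool), sum(miss), replace = TRUE)]
  }
  x
}

set.seed(1)

# Toy stand-ins for the original unimputed data and for newdata.
train <- data.frame(a = c(1, 2, NA, 4), b = factor(c("u", NA, "v", "v")))
new <- data.frame(
  a = c(NA, 10),
  b = factor(c("u", NA), levels = levels(train$b))
)

# Fill each column of `new` with donors drawn from the same column of `train`.
filled <- new
for (v in names(new)) {
  filled[[v]] <- impute_univariate(new[[v]], train[[v]])
}
filled
```

Observed values are left untouched. Only missing cells receive a randomly drawn donor value, which the random forest predictions of the next step would then replace.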
Examples
iris2 <- generateNA(iris, seed = 20, p = c(Sepal.Length = 0.2, Species = 0.1))
imp <- missRanger(iris2, pmm.k = 5, num.trees = 100, keep_forests = TRUE, seed = 2)
#> Missing value imputation by random forests
#>
#> Variables to impute: Species, Sepal.Length
#> Variables used to impute: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species
#>
#> iter 1
#> iter 2
#> iter 3
#> iter 4
predict(imp, head(iris2), seed = 3)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.4         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.1         3.6          1.4         0.2  setosa
#> 6          6.0         3.9          1.7         0.4  setosa
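For comparison, univariate out-of-sample imputation can be requested with iter = 0, as noted in the description. The sketch below reuses the setup from the example above and assumes the missRanger package is installed. Output is omitted because the filled-in values depend on the random draws.

```r
library(missRanger)

# Same setup as in the example above, with printing switched off.
iris2 <- generateNA(iris, seed = 20, p = c(Sepal.Length = 0.2, Species = 0.1))
imp <- missRanger(
  iris2, num.trees = 100, keep_forests = TRUE, seed = 2, verbose = 0
)

# iter = 0 requests univariate imputation only.
uni <- predict(imp, head(iris2), iter = 0, seed = 3)
anyNA(uni)
```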