This function returns a list of in-sample and out-of-sample indices per fold for time series k-fold cross-validation; see Details.

create_timefolds(y, k = 5L, use_names = TRUE, type = c("extending", "moving"))

Arguments

y

Any vector of the same length as the data to be split.

k

Number of folds.

use_names

Should folds be named ("Fold1", "Fold2", ...)? Default is TRUE.

type

Should the in-sample data be "extending" over the folds (default), or consist of a single block ("moving")? See Details.
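The default for type follows R's usual idiom for enumerated arguments: when a formal's default is a character vector, match.arg() returns its first element unless the caller supplies a value. The sketch below assumes that create_timefolds() resolves type this way (the signature suggests it, but this is an assumption); pick_type() is a hypothetical stand-in, not part of the package.

```r
# Hypothetical stand-in: shows how a choice vector in the signature
# resolves to a single value (assumes match.arg() is used).
pick_type <- function(type = c("extending", "moving")) {
  match.arg(type)
}

pick_type()          # "extending": the first element acts as the default
pick_type("moving")  # "moving"
pick_type("mov")     # "moving": match.arg() accepts unique partial matches
```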

Value

A nested list with one element per fold. Each fold is itself a list with two vectors of row indices: insample and outsample.
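To show how the returned structure is typically consumed, the sketch below builds a fold list by hand with the same shape and indices as the first example (n = 100, k = 5, type = "extending") and runs a rolling-origin evaluation: a linear trend is fit on each fold's in-sample rows and scored by RMSE on the following out-of-sample rows. The hand-built folds, the simulated series, and the lm() trend model are illustrative assumptions; in practice the list would come from create_timefolds().

```r
set.seed(1)
n <- 100
# Simulated series with a linear trend (illustrative data only)
dat <- data.frame(t = seq_len(n))
dat$y <- 0.5 * dat$t + rnorm(n)

# Block end points read off the first example's output (n = 100, k = 5)
ends <- c(0, 17, 34, 51, 68, 85, 100)
folds <- lapply(1:5, function(j) {
  list(insample  = seq_len(ends[j + 1]),           # B_1, ..., B_j
       outsample = (ends[j + 1] + 1):ends[j + 2])  # B_{j+1}
})
names(folds) <- paste0("Fold", seq_along(folds))

# Fit on in-sample rows, evaluate on the subsequent out-of-sample rows
rmse <- vapply(folds, function(f) {
  fit  <- lm(y ~ t, data = dat[f$insample, ])
  pred <- predict(fit, newdata = dat[f$outsample, ])
  sqrt(mean((dat$y[f$outsample] - pred)^2))
}, numeric(1))

rmse        # one RMSE per fold, named Fold1, ..., Fold5
mean(rmse)  # overall cross-validated RMSE
```

Because every out-of-sample block lies strictly after its in-sample rows, each score measures a genuine forecast rather than an interpolation.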

Details

The data is first partitioned into \(k+1\) sequential blocks \(B_1, \ldots, B_{k+1}\). Each fold consists of two index vectors: one with in-sample row numbers, the other with out-of-sample row numbers. The first fold uses \(B_1\) as in-sample and \(B_2\) as out-of-sample data. The second fold uses either \(B_2\) (if type = "moving") or \(\{B_1, B_2\}\) (if type = "extending") as in-sample data and \(B_3\) as out-of-sample data, and so on. Finally, the kth fold uses \(\{B_1, \ldots, B_k\}\) ("extending") or \(B_k\) ("moving") as in-sample data and \(B_{k+1}\) as out-of-sample data. This ensures that out-of-sample data always follows the in-sample data in time. In the examples below, n = 100 and k = 5 yield five blocks of 17 observations followed by a final block of 15.
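The rules above can be stated compactly. Writing \(I_j\) and \(O_j\) for the in-sample and out-of-sample index sets of fold \(j\), the paragraph amounts to the following (a restatement, not additional behavior):

```latex
\[
O_j = B_{j+1}, \qquad
I_j =
\begin{cases}
\bigcup_{i=1}^{j} B_i, & \text{(extending)},\\[2pt]
B_j, & \text{(moving)},
\end{cases}
\qquad j = 1, \ldots, k.
\]
% Because the blocks are sequential, every in-sample index precedes
% every out-of-sample index:
\[
\max I_j < \min O_j \quad \text{for all } j = 1, \ldots, k.
\]
```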

Examples

y <- runif(100)
create_timefolds(y)
#> $Fold1
#> $Fold1$insample
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
#> 
#> $Fold1$outsample
#>  [1] 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
#> 
#> 
#> $Fold2
#> $Fold2$insample
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [26] 26 27 28 29 30 31 32 33 34
#> 
#> $Fold2$outsample
#>  [1] 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
#> 
#> 
#> $Fold3
#> $Fold3$insample
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
#> [51] 51
#> 
#> $Fold3$outsample
#>  [1] 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
#> 
#> 
#> $Fold4
#> $Fold4$insample
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
#> [51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
#> 
#> $Fold4$outsample
#>  [1] 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85
#> 
#> 
#> $Fold5
#> $Fold5$insample
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
#> [51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
#> [76] 76 77 78 79 80 81 82 83 84 85
#> 
#> $Fold5$outsample
#>  [1]  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100
#> 
#> 
create_timefolds(y, use_names = FALSE)
#> [[1]]
#> [[1]]$insample
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
#> 
#> [[1]]$outsample
#>  [1] 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
#> 
#> 
#> [[2]]
#> [[2]]$insample
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [26] 26 27 28 29 30 31 32 33 34
#> 
#> [[2]]$outsample
#>  [1] 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
#> 
#> 
#> [[3]]
#> [[3]]$insample
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
#> [51] 51
#> 
#> [[3]]$outsample
#>  [1] 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
#> 
#> 
#> [[4]]
#> [[4]]$insample
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
#> [51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
#> 
#> [[4]]$outsample
#>  [1] 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85
#> 
#> 
#> [[5]]
#> [[5]]$insample
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
#> [51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
#> [76] 76 77 78 79 80 81 82 83 84 85
#> 
#> [[5]]$outsample
#>  [1]  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100
#> 
#> 
create_timefolds(y, use_names = FALSE, type = "moving")
#> [[1]]
#> [[1]]$insample
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
#> 
#> [[1]]$outsample
#>  [1] 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
#> 
#> 
#> [[2]]
#> [[2]]$insample
#>  [1] 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
#> 
#> [[2]]$outsample
#>  [1] 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
#> 
#> 
#> [[3]]
#> [[3]]$insample
#>  [1] 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
#> 
#> [[3]]$outsample
#>  [1] 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
#> 
#> 
#> [[4]]
#> [[4]]$insample
#>  [1] 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
#> 
#> [[4]]$outsample
#>  [1] 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85
#> 
#> 
#> [[5]]
#> [[5]]$insample
#>  [1] 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85
#> 
#> [[5]]$outsample
#>  [1]  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100
#> 
#>
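For readers who want to see the Details rules in code, here is a minimal base-R sketch that reproduces the index sets printed above. timefolds_sketch() is a hypothetical helper, not the package's implementation; in particular, the block size ceiling(n / (k + 1)) with a shorter last block is an assumption, chosen because it matches the example output (blocks of 17 and a final block of 15 for n = 100, k = 5).

```r
# Hypothetical re-implementation of the fold rules described in Details.
timefolds_sketch <- function(y, k = 5L, use_names = TRUE,
                             type = c("extending", "moving")) {
  type <- match.arg(type)
  n <- length(y)
  # Assumed block rule: blocks of size ceiling(n / (k + 1)),
  # with the remainder in the last block
  size <- ceiling(n / (k + 1))
  block <- ceiling(seq_len(n) / size)
  if (max(block) != k + 1) {
    stop("this sketch needs n large enough to form k + 1 blocks")
  }
  blocks <- split(seq_len(n), block)  # B_1, ..., B_{k+1}

  folds <- lapply(seq_len(k), function(j) {
    ins <- if (type == "extending") {
      unlist(blocks[seq_len(j)], use.names = FALSE)  # B_1 to B_j combined
    } else {
      blocks[[j]]                                    # B_j only
    }
    list(insample = ins, outsample = blocks[[j + 1]])
  })
  if (use_names) names(folds) <- paste0("Fold", seq_len(k))
  folds
}

f <- timefolds_sketch(runif(100), use_names = FALSE, type = "moving")
f[[2]]$insample  # 18:34, matching the "moving" example above
```

The guard on max(block) makes the assumed block rule explicit: for some small n (for example n = 10, k = 5) it would produce fewer than k + 1 blocks, and the sketch stops rather than silently returning too few folds.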