Dataset Shift via Resampling (Prediction) Uncertainty

Tests for no adverse shift between two samples using prediction uncertainty. The outlier scores come from out-of-bag predictions of random forests fitted with the ranger package. The prefix rue stands for resampling uncertainty, the notion of outlier used here: the score is the standard error of the mean prediction. Both the training and test sets must be labeled.

rue_pt(
  x_train,
  x_test,
  R = 1000,
  sub_ratio = 1/2,
  num_trees = 500L,
  response_name = "label"
)

Arguments

x_train	Training sample.
x_test	Test sample.
R	The number of permutations. May be ignored.
sub_ratio	Subsampling ratio for sample splitting. May be ignored.
num_trees	The number of trees in random forests.
response_name	The column name of the categorical outcome to predict.

Value

A named list or object of class outlier.test containing:

statistic: observed WAUC statistic
seq_mct: sequential Monte Carlo test, if applicable
p_value: p-value
outlier_scores: outlier scores from training and test set
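
As an illustration of how these fields might be read, the sketch below builds a stand-in list by hand instead of calling rue_pt. The statistic and p_value are copied from the output in Examples. The outlier scores and the 0.05 threshold are made-up assumptions, not package defaults.

```r
# Stand-in for a rue_pt() result. It is built by hand, NOT by the package.
result <- structure(
  list(
    seq_mct = NULL,                      # placeholder for the simctest object
    statistic = 0.345,                   # observed WAUC, from Examples
    p_value = 0.048,                     # p-value, from Examples
    outlier_scores = list(
      train = c(0.006, 0.003, 0),        # hypothetical scores
      test = c(0, 0.004, 0.009)          # hypothetical scores
    )
  ),
  class = "outlier.test"
)

alpha <- 0.05  # hypothetical significance level
# The null hypothesis is "no adverse shift"; a small p-value argues against it.
reject_no_adverse_shift <- result$p_value <= alpha

# Pool the scores from both samples, e.g. for plotting or further inspection.
pooled_scores <- unlist(result$outlier_scores, use.names = FALSE)
```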

Details

The suffix pt stands for permutation test. The p-value is estimated from the empirical null distribution of the test statistic, the weighted AUC (WAUC; Li & Fine, 2010), using R permutations. The asymptotic (theoretical) null distribution of the WAUC is not used. For speed, the permutations are run as a sequential Monte Carlo test with the simctest package; see Gandy (2009) for details. This is the recommended approach for small samples.
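
The idea of a permutation p-value can be sketched in base R. This is a simplified illustration under stated assumptions, not the package's implementation: it uses a fixed number of permutations rather than the sequential test from simctest, it substitutes the plain AUC for the WAUC, and it shuffles precomputed scores directly. The scores and the helper names auc and permutation_p_value are hypothetical.

```r
set.seed(1)
# Hypothetical outlier scores; higher means more anomalous.
scores_train <- rexp(100)
scores_test <- rexp(50, rate = 0.5)

# Plain AUC (stand-in for the WAUC): the probability that a test score
# exceeds a training score, computed from ranks (Mann-Whitney U).
auc <- function(train, test) {
  n0 <- length(train)
  n1 <- length(test)
  ranks <- rank(c(train, test))
  u <- sum(ranks[n0 + seq_len(n1)]) - n1 * (n1 + 1) / 2
  u / (n0 * n1)
}

# One-sided permutation p-value: how often does a random relabeling of the
# pooled scores give a statistic at least as large as the observed one?
permutation_p_value <- function(train, test, n_perm = 1000L) {
  observed <- auc(train, test)
  pooled <- c(train, test)
  n0 <- length(train)
  null_stats <- replicate(n_perm, {
    shuffled <- sample(pooled)
    auc(shuffled[seq_len(n0)], shuffled[-seq_len(n0)])
  })
  # The +1 terms count the observed statistic as one of the permutations.
  (1 + sum(null_stats >= observed)) / (n_perm + 1)
}

p_value <- permutation_p_value(scores_train, scores_test)
```

A sequential Monte Carlo test, as described in Gandy (2009), can stop drawing permutations early once the decision is clear, which the fixed-size loop above does not attempt.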

Notes

For resampling uncertainty, we essentially implement the approach of Schulam & Saria (2019) with random forests. The underlying scores are the standard errors of the mean predictions. Random forests are the default in this implementation, but any performant method for confidence-based out-of-distribution detection (Berger et al., 2021) can replace them.
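
To make the notion of resampling uncertainty concrete, here is a minimal sketch in base R. It is an assumption-laden stand-in, not the package's method: it refits a logistic regression on bootstrap resamples instead of using out-of-bag predictions from ranger, and it reduces the outcome to a binary indicator. The names dat, n_boot, preds and resampling_se are hypothetical.

```r
set.seed(1)
# Binary outcome for illustration: is the flower a virginica?
dat <- transform(iris, is_virginica = as.integer(Species == "virginica"))

n_boot <- 200L  # number of bootstrap refits (arbitrary choice)

# Refit the model on each bootstrap resample and predict for every original
# row. The result is a matrix with one row per observation and one column
# per refit.
preds <- replicate(n_boot, {
  boot <- dat[sample(nrow(dat), replace = TRUE), ]
  fit <- glm(is_virginica ~ Sepal.Length + Sepal.Width,
             data = boot, family = binomial())
  predict(fit, newdata = dat, type = "response")
})

# Spread of the predictions across refits: one uncertainty score per
# observation. Larger values mean the prediction is less stable under
# resampling, which is the sense of "outlier" used here.
resampling_se <- apply(preds, 1, sd)
```

Observations whose predictions change a lot from one resample to the next receive high scores, which mirrors the role the standard errors play as outlier scores.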

References

Kamulete, V. M. (2021). Test for non-negligible adverse shifts. arXiv preprint arXiv:2107.02990.

Schulam, P., & Saria, S. (2019, April). Can you trust this prediction? Auditing pointwise reliability after learning. In The 22nd International Conference on Artificial Intelligence and Statistics (pp. 1022-1031). PMLR.

Berger, C., Paschali, M., Glocker, B., & Kamnitsas, K. (2021). Confidence-based Out-of-Distribution Detection: A Comparative Study and Analysis. arXiv preprint arXiv:2107.02568.

Gandy, A. (2009). Sequential implementation of Monte Carlo tests with uniformly bounded resampling risk. Journal of the American Statistical Association, 104(488), 1504-1511.

Li, J., & Fine, J. P. (2010). Weighted area under the receiver operating characteristic curve and its application to gene selection. Journal of the Royal Statistical Society: Series C (Applied Statistics), 59(4), 673-692.

Examples

# \donttest{
library(dsos)
set.seed(12345)
data(iris)
idx <- sample(nrow(iris), 2 / 3 * nrow(iris))
xy_train <- iris[idx, ]
xy_test <- iris[-idx, ]
iris_test <- rue_pt(xy_train, xy_test, response_name = "Species")
str(iris_test)
#> List of 4
#>  $ seq_mct       :Formal class 'sampalgontheflyres' [package "simctest"] with 10 slots
#>   .. ..@ porig  : num [1:55] 8.58e-06 3.24e-05 7.86e-05 1.60e-04 2.99e-04 ...
#>   .. ..@ U      : int [1:500] 47 47 47 47 47 48 48 48 48 48 ...
#>   .. ..@ L      : int [1:500] 7 7 7 8 8 8 8 8 8 8 ...
#>   .. ..@ ind    : num 499
#>   .. ..@ preverr: num [1:2] 0.000494 0.000499
#>   .. ..@ p.value: num NA
#>   .. ..@ steps  : int 1000
#>   .. ..@ pos    : num 48
#>   .. ..@ alg    :Formal class 'sampalgonthefly' [package "simctest"] with 1 slot
#>   .. .. .. ..@ internal:<environment: 0x000001d2c90a1880> 
#>   .. ..@ gen    :function ()  
#>   .. .. ..- attr(*, "srcref")= 'srcref' int [1:8] 6 14 9 3 14 3 6 9
#>   .. .. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x000001d2d753abe0> 
#>  $ statistic     : num 0.345
#>  $ p_value       : num 0.048
#>  $ outlier_scores:List of 2
#>   ..$ train: num [1:100] 0.00567 0.00299 0.00538 0 0 ...
#>   ..$ test : num [1:50] 0 0 0 0 0 0 0 0 0 0 ...
#>  - attr(*, "class")= chr "outlier.test"
# }