Title: | Robust Nonparametric Two-Sample Tests for Location/Scale |
---|---|
Description: | Implementations of several robust nonparametric two-sample tests for location or scale differences. The test statistics are based on robust location and scale estimators, e.g. the sample median or the Hodges-Lehmann estimators as described in Fried & Dehling (2011) <doi:10.1007/s10260-011-0164-1>. The p-values can be computed via the permutation principle, the randomization principle, or by using the asymptotic distributions of the test statistics under the null hypothesis, which ensures (approximate) distribution independence of the test decision. To test for a difference in scale, we apply the tests for location difference to transformed observations; see Fried (2012) <doi:10.1016/j.csda.2011.02.012>. Random noise on a small range can be added to the original observations in order to hold the significance level on data from discrete distributions. The location tests assume homoscedasticity and the scale tests require the location parameters to be zero. |
Authors: | Sermad Abbas [aut, cre] , Barbara Brune [aut] , Roland Fried [aut] |
Maintainer: | Sermad Abbas <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.1.0 |
Built: | 2024-11-23 04:18:58 UTC |
Source: | https://github.com/s-abbas/robnptests |
hl1_test
performs a two-sample location test based on
the difference of the one-sample Hodges-Lehmann estimators of both samples.
hl1_test(
  x,
  y,
  alternative = c("two.sided", "greater", "less"),
  delta = ifelse(scale.test, 1, 0),
  method = c("asymptotic", "permutation", "randomization"),
  scale = c("S1", "S2"),
  n.rep = 10000,
  na.rm = FALSE,
  scale.test = FALSE,
  wobble = FALSE,
  wobble.seed = NULL
)
x |
a (non-empty) numeric vector of data values. |
y |
a (non-empty) numeric vector of data values. |
alternative |
a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater", or "less". |
delta |
a numeric value indicating the true difference in the location or scale parameter, depending on whether the test should be performed for a difference in location or in scale. The default is delta = ifelse(scale.test, 1, 0), i.e. 0 for a location test and 1 for a scale test. |
method |
a character string specifying how the p-value is computed, with possible values "asymptotic", "permutation", and "randomization". |
scale |
a character string specifying the scale estimator used for standardization of the test statistic; must be one of "S1" and "S2". |
n.rep |
an integer value specifying the number of random splits used to calculate the randomization distribution if method = "randomization". The default is n.rep = 10000. |
na.rm |
a logical value indicating whether NA values in x and y should be removed before the computation proceeds. The default is na.rm = FALSE. |
scale.test |
a logical value to specify if the samples should be compared for a difference in scale. The default is scale.test = FALSE. |
wobble |
a logical value indicating whether the sample should be checked for duplicated values that can cause the scale estimate to be zero. If such values are present, uniform noise is added to the sample, see wobble. The default is wobble = FALSE. |
wobble.seed |
an integer value used as a seed for the random number generation in case of wobble = TRUE, or when noise is added because of zeros with scale.test = TRUE. The default is wobble.seed = NULL. |
The test statistic for this test is based on the difference of the one-sample Hodges-Lehmann estimators of x and y, see hodges_lehmann. Three versions of the test are implemented: randomization, permutation, and asymptotic.
The test statistic for the permutation and randomization versions of the test is standardized using a robust scale estimator, see Fried and Dehling (2011).
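With scale = "S1", the scale is estimated by
$$S = \mathrm{med}\left\{ |x_i - x_j| : 1 \le i < j \le m,\; |y_i - y_j| : 1 \le i < j \le n \right\},$$
whereas scale = "S2" uses
$$S = \mathrm{med}\left\{ |z_i - z_j| : 1 \le i < j \le m + n \right\}.$$
Here, m and n are the sizes of x and y, and
$$z = (z_1, \ldots, z_{m+n}) = (x_1 - \mathrm{med}(x), \ldots, x_m - \mathrm{med}(x),\; y_1 - \mathrm{med}(y), \ldots, y_n - \mathrm{med}(y))$$
is the median-corrected sample.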
The randomization distribution is based on randomly drawn splits with replacement. The function permp (Phipson and Smyth 2010) is used to calculate the p-value. For the asymptotic test, a transformed version of the difference of the HL1-estimators, which asymptotically follows a normal distribution, is used. For more details on the asymptotic test, see Fried and Dehling (2011).
For scale.test = TRUE, the test compares the two samples for a difference in scale. This is achieved by log-transforming the original squared observations, i.e. x is replaced by log(x^2) and y by log(y^2). A potential scale difference then appears as a location difference between the transformed samples, see Fried (2012). Note that the samples need to have equal locations. The sample should not contain zeros to prevent problems with the necessary log-transformation. If it contains zeros, uniform noise is added to all variables in order to remove zeros and a message is printed.
If the sample has been modified (either because of zeros if scale.test = TRUE or wobble = TRUE), the modified samples can be retrieved using set.seed(wobble.seed); wobble(x, y).
Both samples need to contain at least 5 non-missing values.
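The standardization described above can be sketched in base R. This is only an illustration, not the package code: the helper names (pair_abs_diff, scale_s1, scale_s2, hl_one_sample, hl1_statistic) are hypothetical, and the estimator definitions are assumed to follow Fried and Dehling (2011), i.e. medians over pairs with i < j.

```r
# Hypothetical helpers, not part of robnptests.

# All pairwise absolute differences |v_i - v_j| with i < j
pair_abs_diff <- function(v) as.vector(dist(v))

# Assumed S1: median of the within-sample pairwise absolute differences
scale_s1 <- function(x, y) median(c(pair_abs_diff(x), pair_abs_diff(y)))

# Assumed S2: median of pairwise absolute differences in the
# median-corrected joint sample
scale_s2 <- function(x, y) {
  z <- c(x - median(x), y - median(y))
  median(pair_abs_diff(z))
}

# One-sample Hodges-Lehmann estimate: median of pairwise means (i < j)
hl_one_sample <- function(v) median(combn(v, 2, FUN = mean))

# Standardized difference of the one-sample Hodges-Lehmann estimates
hl1_statistic <- function(x, y, scale_fun = scale_s1) {
  (hl_one_sample(x) - hl_one_sample(y)) / scale_fun(x, y)
}

x <- c(1, 2, 4, 8)
y <- c(0, 1, 3)
scale_s1(x, y)       # 3
scale_s2(x, y)       # 2
hl1_statistic(x, y)  # (3.75 - 1.5) / 3 = 0.75
```

The toy samples are deliberately shorter than the 5 non-missing values the package requires, so that the values can be checked by hand.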
A named list with class "htest" containing the following components:
statistic |
the value of the test statistic. |
p.value |
the p-value for the test. |
estimate |
the one-sample Hodges-Lehmann estimates of x and y. |
null.value |
the specified hypothesized value of the mean difference/squared scale ratio. |
alternative |
a character string describing the alternative hypothesis. |
method |
a character string indicating how the p-value was computed. |
data.name |
a character string giving the names of the data. |
Phipson B, Smyth GK (2010). “Permutation p-values should never be zero: Calculating exact p-values when permutations are randomly drawn.” Statistical Applications in Genetics and Molecular Biology, 9(1), Article 39. doi:10.2202/1544-6115.1585.
Fried R, Dehling H (2011). “Robust nonparametric tests for the two-sample location problem.” Statistical Methods & Applications, 20(4), 409–422. doi:10.1007/s10260-011-0164-1.
Fried R (2012). “On the online estimation of piecewise constant volatilities.” Computational Statistics & Data Analysis, 56(11), 3080–3090. doi:10.1016/j.csda.2011.02.012.
# Generate random samples
set.seed(108)
x <- rnorm(20); y <- rnorm(20)

# Asymptotic HL1 test
hl1_test(x, y, method = "asymptotic", scale = "S1")

## Not run:
# HL12 test using randomization principle by drawing 1000 random permutations
# with replacement
hl1_test(x, y, method = "randomization", n.rep = 1000, scale = "S2")

## End(Not run)
hl2_test
performs a two-sample location test based on the two-sample
Hodges-Lehmann estimator for shift.
hl2_test(
  x,
  y,
  alternative = c("two.sided", "greater", "less"),
  delta = ifelse(scale.test, 1, 0),
  method = c("asymptotic", "permutation", "randomization"),
  scale = c("S1", "S2"),
  n.rep = 10000,
  na.rm = FALSE,
  scale.test = FALSE,
  wobble = FALSE,
  wobble.seed = NULL
)
x |
a (non-empty) numeric vector of data values. |
y |
a (non-empty) numeric vector of data values. |
alternative |
a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater", or "less". |
delta |
a numeric value indicating the true difference in the location or scale parameter, depending on whether the test should be performed for a difference in location or in scale. The default is delta = ifelse(scale.test, 1, 0), i.e. 0 for a location test and 1 for a scale test. |
method |
a character string specifying how the p-value is computed, with possible values "asymptotic", "permutation", and "randomization". |
scale |
a character string specifying the scale estimator used for standardization of the test statistic; must be one of "S1" and "S2". |
n.rep |
an integer value specifying the number of random splits used to calculate the randomization distribution if method = "randomization". The default is n.rep = 10000. |
na.rm |
a logical value indicating whether NA values in x and y should be removed before the computation proceeds. The default is na.rm = FALSE. |
scale.test |
a logical value to specify if the samples should be compared for a difference in scale. The default is scale.test = FALSE. |
wobble |
a logical value indicating whether the sample should be checked for duplicated values that can cause the scale estimate to be zero. If such values are present, uniform noise is added to the sample, see wobble. The default is wobble = FALSE. |
wobble.seed |
an integer value used as a seed for the random number generation in case of wobble = TRUE, or when noise is added because of zeros with scale.test = TRUE. The default is wobble.seed = NULL. |
The test statistic for this test is based on the two-sample Hodges-Lehmann estimator of x and y, see hodges_lehmann_2sample. Three versions of the test are implemented: randomization, permutation, and asymptotic.
The test statistic for the permutation and randomization versions of the test is standardized using a robust scale estimator, see Fried and Dehling (2011).
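With scale = "S1", the scale is estimated by
$$S = \mathrm{med}\left\{ |x_i - x_j| : 1 \le i < j \le m,\; |y_i - y_j| : 1 \le i < j \le n \right\},$$
whereas scale = "S2" uses
$$S = \mathrm{med}\left\{ |z_i - z_j| : 1 \le i < j \le m + n \right\}.$$
Here, m and n are the sizes of x and y, and
$$z = (z_1, \ldots, z_{m+n}) = (x_1 - \mathrm{med}(x), \ldots, x_m - \mathrm{med}(x),\; y_1 - \mathrm{med}(y), \ldots, y_n - \mathrm{med}(y))$$
is the median-corrected sample.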
The randomization distribution is based on randomly drawn splits with replacement. The function permp (Phipson and Smyth 2010) is used to calculate the p-value. For the asymptotic test, a transformed version of the HL2-estimator, which asymptotically follows a normal distribution, is used. For more details on the asymptotic test, see Fried and Dehling (2011).
For scale.test = TRUE, the test compares the two samples for a difference in scale. This is achieved by log-transforming the original squared observations, i.e. x is replaced by log(x^2) and y by log(y^2). A potential scale difference then appears as a location difference between the transformed samples, see Fried (2012). Note that the samples need to have equal locations. The sample should not contain zeros to prevent problems with the necessary log-transformation. If it contains zeros, uniform noise is added to all variables in order to remove zeros and a message is printed.
If the sample has been modified (either because of zeros if scale.test = TRUE or wobble = TRUE), the modified samples can be retrieved using set.seed(wobble.seed); wobble(x, y).
Both samples need to contain at least 5 non-missing values.
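To illustrate the log-transformation behind scale.test = TRUE, the following base R sketch (not the package implementation) transforms two zero-free samples and estimates the location difference of log(x^2) and log(y^2) by the median of all pairwise differences. The helper name log_sq_shift is hypothetical, and reading exp() of the shift as a squared scale ratio is an assumption based on the description of null.value.

```r
# Hypothetical helper, not part of robnptests: location difference of the
# log-transformed squared observations, estimated by the median of all
# pairwise differences.
log_sq_shift <- function(x, y) {
  stopifnot(all(x != 0), all(y != 0))  # log(0) is -Inf, so zeros are excluded
  lx <- log(x^2)
  ly <- log(y^2)
  median(outer(lx, ly, "-"))
}

x <- c(-3, -1, 0.5, 2, 4)
y <- 2 * x                    # x stretched by a factor of 2 about zero
shift <- log_sq_shift(x, y)   # equals -log(4) up to rounding
ratio <- exp(shift)           # back-transformed: 1/4, the squared scale ratio
```

Doubling every observation adds log(4) to each transformed value, so the scale change shows up as a pure location shift of the transformed sample.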
A named list with class "htest" containing the following components:
statistic |
the value of the test statistic. |
p.value |
the p-value for the test. |
estimate |
the estimated location difference between x and y. |
null.value |
the specified hypothesized value of the mean difference/squared scale ratio. |
alternative |
a character string describing the alternative hypothesis. |
method |
a character string indicating how the p-value was computed. |
data.name |
a character string giving the names of the data. |
Phipson B, Smyth GK (2010). “Permutation p-values should never be zero: Calculating exact p-values when permutations are randomly drawn.” Statistical Applications in Genetics and Molecular Biology, 9(1), Article 39. doi:10.2202/1544-6115.1585.
Fried R, Dehling H (2011). “Robust nonparametric tests for the two-sample location problem.” Statistical Methods & Applications, 20(4), 409–422. doi:10.1007/s10260-011-0164-1.
Fried R (2012). “On the online estimation of piecewise constant volatilities.” Computational Statistics & Data Analysis, 56(11), 3080–3090. doi:10.1016/j.csda.2011.02.012.
# Generate random samples
set.seed(108)
x <- rnorm(20); y <- rnorm(20)

# Asymptotic HL2 test
hl2_test(x, y, method = "asymptotic", scale = "S1")

## Not run:
# HL22 test using randomization principle by drawing 1000 random permutations
# with replacement
hl2_test(x, y, method = "randomization", n.rep = 1000, scale = "S2")

## End(Not run)
hodges_lehmann
calculates the one-sample Hodges-Lehmann estimator
of a sample.
hodges_lehmann(x, na.rm = FALSE)
x |
a (non-empty) numeric vector of data values. |
na.rm |
a logical value indicating whether NA values in x should be removed before the computation proceeds. The default is na.rm = FALSE. |
The one-sample Hodges-Lehmann estimator for a sample x of size n is defined as
$$\mathrm{med}\left\{ \frac{x_i + x_j}{2} : 1 \le i < j \le n \right\}.$$
The one-sample Hodges-Lehmann estimator.
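For orientation, the one-sample Hodges-Lehmann estimate can be computed by hand in base R as the median of all pairwise means. This is a minimal sketch, assuming pairs with i < j are used; the helper name hl_one_sample is hypothetical and the package function may differ in details such as NA handling.

```r
# Hypothetical helper, not part of robnptests: median of the pairwise
# means (x_i + x_j) / 2 over all pairs with i < j.
hl_one_sample <- function(x) {
  stopifnot(length(x) >= 2)
  pair_means <- combn(x, 2, FUN = mean)
  median(pair_means)
}

x <- c(1, 2, 4, 8)
# Pairwise means: 1.5, 2.5, 4.5, 3, 5, 6, so the median is (3 + 4.5) / 2
hl_one_sample(x)  # 3.75
```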
Hodges JL, Lehmann EL (1963). “Estimates of location based on rank tests.” The Annals of Mathematical Statistics, 34(2), 598–611. doi:10.1214/aoms/1177704172.
# Generate random sample
set.seed(108)
x <- rnorm(10)

# Compute one-sample Hodges-Lehmann estimator
hodges_lehmann(x)
hodges_lehmann_2sample
calculates the two-sample Hodges-Lehmann
estimator for the location difference of two samples x and y.
hodges_lehmann_2sample(x, y, na.rm = FALSE)
x |
a (non-empty) numeric vector of data values. |
y |
a (non-empty) numeric vector of data values. |
na.rm |
a logical value indicating whether NA values in x and y should be removed before the computation proceeds. The default is na.rm = FALSE. |
The two-sample Hodges-Lehmann estimator for two samples x and y of sizes m and n is defined as
$$\mathrm{med}\left\{ x_i - y_j : 1 \le i \le m,\; 1 \le j \le n \right\}.$$
The two-sample Hodges-Lehmann estimator.
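A base R sketch of the two-sample estimate, assuming the usual definition as the median of all m * n pairwise differences x_i - y_j; the helper name hl_two_sample is hypothetical and is not the package function.

```r
# Hypothetical helper, not part of robnptests: median of all pairwise
# differences between the two samples.
hl_two_sample <- function(x, y) {
  median(outer(x, y, "-"))
}

x <- c(1, 3, 5)
y <- c(0, 1, 2)
# Differences: 1, 0, -1, 3, 2, 1, 5, 4, 3, so the median is 2
hl_two_sample(x, y)  # 2
```

Because outer() materializes all m * n differences, this direct approach is convenient for small samples but memory-hungry for large ones.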
Hodges JL, Lehmann EL (1963). “Estimates of location based on rank tests.” The Annals of Mathematical Statistics, 34(2), 598–611. doi:10.1214/aoms/1177704172.
# Generate random samples
set.seed(108)
x <- rnorm(10); y <- rnorm(10)

# Compute two-sample Hodges-Lehmann estimator
hodges_lehmann_2sample(x, y)
m_est
calculates an M-estimate of location and its variance
for different psi functions.
m_est(
  x,
  psi,
  k = robustbase::.Mpsi.tuning.default(psi),
  tol = 1e-06,
  max.it = 15,
  na.rm = FALSE
)
x |
a (non-empty) numeric vector of data values. |
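psi |
kernel used for optimization. Must be one of "huber", "hampel", or "bisquare". |
k |
tuning parameter(s) for the respective kernel function, defaults to parameters implemented in .Mpsi.tuning.default(psi) in the package robustbase. |
tol |
tolerance for convergence. The default is 1e-06. |
max.it |
the maximum number of iterations. The default is 15. |
na.rm |
a logical value indicating whether NA values in x should be removed before the computation proceeds. The default is na.rm = FALSE. |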
To compute the M-estimate, the iterative algorithm described in Maronna et al. (2019) is used. The variance is estimated as in Huber (1981).
If max.it contains decimal places, it is truncated to an integer value.
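The idea of an iterative M-estimate of location can be sketched with a generic iteratively reweighted mean for Huber's psi function. This is an illustration under stated assumptions and not the algorithm used by m_est: the scale is held fixed at the MAD, the tuning constant k = 1.345 is a commonly used Huber value chosen here by hand, the stopping rule is hypothetical, and the helper name huber_location is made up. The variance estimate is omitted.

```r
# Hypothetical helper, not part of robnptests: Huber M-estimate of location
# via iteratively reweighted means with a fixed MAD scale.
huber_location <- function(x, k = 1.345, tol = 1e-06, max.it = 15) {
  mu <- median(x)  # robust starting value
  s <- mad(x)      # fixed scale estimate, assumed to be non-zero
  stopifnot(s > 0)
  for (it in seq_len(trunc(max.it))) {
    r <- (x - mu) / s
    w <- pmin(1, k / abs(r))  # Huber weights; r = 0 gives weight 1
    mu_new <- sum(w * x) / sum(w)
    converged <- abs(mu_new - mu) < tol * s
    mu <- mu_new
    if (converged) break
  }
  mu
}

x <- c(1, 2, 3, 4, 100)
huber_location(x)  # stays close to the bulk of the data
mean(x)            # 22, pulled towards the outlier
```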
A named list containing the components:
est |
estimated mean. |
var |
estimated variance. |
Maronna RA, Martin DR, Yohai VJ, Salibián-Barrera M (2019). Robust Statistics: Theory and Methods (with R), Wiley Series in Probability and Statistics, Second edition. Wiley. doi:10.1002/9781119214656.
Huber PJ (1981). Robust Statistics. Wiley, New York. doi:10.1002/0471725250.
# Generate random sample
set.seed(108)
x <- rnorm(10)

# Compute Huber's M-estimate
m_est(x, psi = "huber")
m_test
performs a two-sample location test based on an M-estimator.
m_test(
  x,
  y,
  alternative = c("two.sided", "greater", "less"),
  delta = ifelse(scale.test, 1, 0),
  method = c("asymptotic", "permutation", "randomization"),
  psi = c("huber", "hampel", "bisquare"),
  k = robustbase::.Mpsi.tuning.default(psi),
  n.rep = 10000,
  na.rm = FALSE,
  scale.test = FALSE,
  wobble.seed = NULL,
  ...
)
x |
a (non-empty) numeric vector of data values. |
y |
a (non-empty) numeric vector of data values. |
alternative |
a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater", or "less". |
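delta |
a numeric value indicating the true difference in the location or scale parameter, depending on whether the test should be performed for a difference in location or in scale. The default is delta = ifelse(scale.test, 1, 0), i.e. 0 for a location test and 1 for a scale test. |
method |
a character string specifying how the p-value is computed, with possible values "asymptotic", "permutation", and "randomization". |
psi |
kernel used for optimization. Must be one of "huber", "hampel", or "bisquare". |
k |
tuning parameter(s) for the respective kernel function, defaults to parameters implemented in .Mpsi.tuning.default(psi) in the package robustbase. |
n.rep |
an integer value specifying the number of random splits used to calculate the randomization distribution if method = "randomization". The default is n.rep = 10000. |
na.rm |
a logical value indicating whether NA values in x and y should be removed before the computation proceeds. The default is na.rm = FALSE. |
scale.test |
a logical value to specify if the samples should be compared for a difference in scale. The default is scale.test = FALSE. |
wobble.seed |
an integer value used as a seed for the random number generation in case that noise is added to remove zeros when scale.test = TRUE. The default is wobble.seed = NULL. |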
... |
additional arguments |
The test statistic for this test is based on the difference of the M-estimates of location of x and y, see m_est. Three different psi-functions can be used: huber, hampel, and bisquare. The corresponding tuning parameter(s) can be set by the argument k of the function.
The estimate for the location difference is scaled by a pooled estimate for the standard deviation. This estimate is based on the tau-estimate of scale (Maronna and Zamar 2002) and is computed with the default parameter settings of the function scaleTau2. These can be changed by setting c1 and c2.
More details on the construction of the test statistic are given in the vignettes vignette("robnptests") and vignette("m_tests").
Three versions of the test are implemented: randomization, permutation, and asymptotic.
The randomization distribution is based on randomly drawn splits with replacement. The function permp (Phipson and Smyth 2010) is used to calculate the p-value. The psi-function for the M-estimate is computed with the implementations in the package robustbase.
For the asymptotic test, the distribution of the test statistic is approximated by a standard normal distribution. However, this is only justified under the normality assumption. When the observations do not come from a normal distribution, the tests might not keep the desired significance level. Simulations indicate that the level is kept under symmetric distributions if the variance exists. Under skewed distributions, the test tends to be anti-conservative, see the vignette vignette("m_tests"). In such cases, the test statistic can be corrected by a factor which has to be determined individually for the specific distribution.
For scale.test = TRUE, the test compares the two samples for a difference in scale. This is achieved by log-transforming the original squared observations, i.e. x is replaced by log(x^2) and y by log(y^2). A potential scale difference then appears as a location difference between the transformed samples, see Fried (2012). Note that the samples need to have equal locations. The sample should not contain zeros to prevent problems with the necessary log-transformation. If it contains zeros, uniform noise is added to all variables in order to remove zeros and a message is printed.
If the sample has been modified because of zeros when scale.test = TRUE, the modified samples can be retrieved using set.seed(wobble.seed); wobble(x, y).
Both samples need to contain at least 5 non-missing values.
A named list with class "htest" containing the following components:
statistic |
the value of the test statistic. |
parameter |
the degrees of freedom for the test statistic. |
p.value |
the p-value for the test. |
estimate |
the M-estimates of x and y. |
null.value |
the specified hypothesized value of the mean difference/squared scale ratio. |
alternative |
a character string describing the alternative hypothesis. |
method |
a character string indicating how the p-value was computed. |
data.name |
a character string giving the names of the data. |
Fried R (2012). “On the online estimation of piecewise constant volatilities.” Computational Statistics & Data Analysis, 56(11), 3080–3090. doi:10.1016/j.csda.2011.02.012.
Maronna RA, Zamar RH (2002). “Robust estimates of location and dispersion of high-dimensional datasets.” Technometrics, 44(4), 307–317. doi:10.1198/004017002188618509.
Phipson B, Smyth GK (2010). “Permutation p-values should never be zero: Calculating exact p-values when permutations are randomly drawn.” Statistical Applications in Genetics and Molecular Biology, 9(1), Article 39. doi:10.2202/1544-6115.1585.
# Generate random samples
set.seed(108)
x <- rnorm(20); y <- rnorm(20)

# Asymptotic test based on Huber M-estimator
m_test(x, y, method = "asymptotic", psi = "huber")

## Not run:
# Randomization test based on Hampel M-estimator with 1000 random permutations
# drawn with replacement
m_test(x, y, method = "randomization", n.rep = 1000, psi = "hampel")

## End(Not run)
m_test_statistic
calculates the test statistics for
tests based on M-estimators.
m_test_statistic(x, y, psi, k = robustbase::.Mpsi.tuning.default(psi), ...)
x |
a (non-empty) numeric vector of data values. |
y |
a (non-empty) numeric vector of data values. |
psi |
kernel used for optimization. Must be one of "huber", "hampel", or "bisquare". |
k |
tuning parameter(s) for the respective kernel function, defaults to parameters implemented in .Mpsi.tuning.default(psi) in the package robustbase. |
... |
additional arguments |
For details on how the test statistic is constructed, we refer to the vignette vignette("m_tests").
A named list containing the following components:
statistic |
standardized test statistic. |
estimates |
M-estimates of location for both x and y. |
# Generate random samples
set.seed(108)
x <- rnorm(20); y <- rnorm(20)

# Compute Huber-M-statistic
m_test_statistic(x, y, psi = "huber")
med_test
performs a two-sample location test based on
the difference of the sample medians for both samples.
med_test(
  x,
  y,
  alternative = c("two.sided", "greater", "less"),
  delta = ifelse(scale.test, 1, 0),
  method = c("asymptotic", "permutation", "randomization"),
  scale = c("S3", "S4"),
  n.rep = 10000,
  na.rm = FALSE,
  scale.test = FALSE,
  wobble = FALSE,
  wobble.seed = NULL
)
x |
a (non-empty) numeric vector of data values. |
y |
a (non-empty) numeric vector of data values. |
alternative |
a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater", or "less". |
delta |
a numeric value indicating the true difference in the location or scale parameter, depending on whether the test should be performed for a difference in location or in scale. The default is delta = ifelse(scale.test, 1, 0), i.e. 0 for a location test and 1 for a scale test. |
method |
a character string specifying how the p-value is computed, with possible values "asymptotic", "permutation", and "randomization". |
scale |
a character string specifying the scale estimator used for standardization of the test statistic; must be one of "S3" and "S4". |
n.rep |
an integer value specifying the number of random splits used to calculate the randomization distribution if method = "randomization". The default is n.rep = 10000. |
na.rm |
a logical value indicating whether NA values in x and y should be removed before the computation proceeds. The default is na.rm = FALSE. |
scale.test |
a logical value to specify if the samples should be compared for a difference in scale. The default is scale.test = FALSE. |
wobble |
a logical value indicating whether the sample should be checked for duplicated values that can cause the scale estimate to be zero. If such values are present, uniform noise is added to the sample, see wobble. The default is wobble = FALSE. |
wobble.seed |
an integer value used as a seed for the random number generation in case of wobble = TRUE, or when noise is added because of zeros with scale.test = TRUE. The default is wobble.seed = NULL. |
The test statistic for this test is based on the difference of the sample medians of x and y. Three versions of the test are implemented: randomization, permutation, and asymptotic.
The test statistic for the permutation and randomization versions of the test is standardized using a robust scale estimator, see Fried and Dehling (2011).
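With scale = "S3", the scale is estimated by
$$S = 2 \cdot \mathrm{med}\left\{ |x_1 - \mathrm{med}(x)|, \ldots, |x_m - \mathrm{med}(x)|,\; |y_1 - \mathrm{med}(y)|, \ldots, |y_n - \mathrm{med}(y)| \right\},$$
whereas scale = "S4" uses
$$S = \mathrm{med}\left\{ |x_1 - \mathrm{med}(x)|, \ldots, |x_m - \mathrm{med}(x)| \right\} + \mathrm{med}\left\{ |y_1 - \mathrm{med}(y)|, \ldots, |y_n - \mathrm{med}(y)| \right\}.$$
Here, m and n are the sizes of x and y.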
When computing the randomization distribution based on randomly drawn splits with replacement, the function permp (Phipson and Smyth 2010) is used to calculate the p-value. For the asymptotic test, a transformed version of the difference of the sample medians, which asymptotically follows a normal distribution, is used. For more details on the asymptotic test, see Fried and Dehling (2011).
For scale.test = TRUE, the test compares the two samples for a difference in scale. This is achieved by log-transforming the original squared observations, i.e. x is replaced by log(x^2) and y by log(y^2). A potential scale difference then appears as a location difference between the transformed samples, see Fried (2012). Note that the samples need to have equal locations. The sample should not contain zeros to prevent problems with the necessary log-transformation. If it contains zeros, uniform noise is added to all variables in order to remove zeros and a message is printed.
If the sample has been modified (either because of zeros for scale.test = TRUE, or wobble = TRUE), the modified samples can be retrieved using set.seed(wobble.seed); wobble(x, y).
Both samples need to contain at least 5 non-missing values.
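The median-based standardization can be sketched in base R as follows. This is an illustration rather than the package code: the helper names (abs_dev, scale_s3, scale_s4, med_statistic) are hypothetical, and the definitions of the two scale estimators are assumed to follow Fried and Dehling (2011).

```r
# Hypothetical helpers, not part of robnptests.

# Absolute deviations from the sample median
abs_dev <- function(v) abs(v - median(v))

# Assumed S3: twice the median of the joint absolute deviations
scale_s3 <- function(x, y) 2 * median(c(abs_dev(x), abs_dev(y)))

# Assumed S4: sum of the two within-sample median absolute deviations
scale_s4 <- function(x, y) median(abs_dev(x)) + median(abs_dev(y))

# Standardized difference of the sample medians
med_statistic <- function(x, y, scale_fun = scale_s3) {
  (median(x) - median(y)) / scale_fun(x, y)
}

x <- c(1, 2, 4, 8)
y <- c(0, 1, 3)
scale_s3(x, y)                 # 2 * 1 = 2
scale_s4(x, y)                 # 1.5 + 1 = 2.5
med_statistic(x, y)            # (3 - 1) / 2 = 1
med_statistic(x, y, scale_s4)  # (3 - 1) / 2.5 = 0.8
```

With many tied observations more than half of the absolute deviations can be zero, which makes these scale estimates zero; this is the situation the wobble argument is meant to handle.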
A named list with class "htest" containing the following components:
statistic |
the value of the test statistic. |
p.value |
the p-value for the test. |
estimate |
the sample medians of x and y. |
null.value |
the specified hypothesized value of the mean difference/squared scale ratio. |
alternative |
a character string describing the alternative hypothesis. |
method |
a character string indicating how the p-value was computed. |
data.name |
a character string giving the names of the data. |
Phipson B, Smyth GK (2010). “Permutation p-values should never be zero: Calculating exact p-values when permutations are randomly drawn.” Statistical Applications in Genetics and Molecular Biology, 9(1), Article 39. doi:10.2202/1544-6115.1585.
Fried R, Dehling H (2011). “Robust nonparametric tests for the two-sample location problem.” Statistical Methods & Applications, 20(4), 409–422. doi:10.1007/s10260-011-0164-1.
Fried R (2012). “On the online estimation of piecewise constant volatilities.” Computational Statistics & Data Analysis, 56(11), 3080–3090. doi:10.1016/j.csda.2011.02.012.
# Generate random samples
set.seed(108)
x <- rnorm(20); y <- rnorm(20)

# Asymptotic MED test
med_test(x, y, method = "asymptotic", scale = "S3")

## Not run:
# MED2 test using randomization principle by drawing 1000 random permutations
# with replacement
med_test(x, y, method = "randomization", n.rep = 1000, scale = "S4")

## End(Not run)
rob_perm_statistic
calculates test statistics for robust
permutation/randomization tests based on the sample median, the one-sample
Hodges-Lehmann estimator, or the two-sample Hodges-Lehmann estimator.
rob_perm_statistic(
  x,
  y,
  type = c("HL11", "HL12", "HL21", "HL22", "MED1", "MED2"),
  na.rm = FALSE
)
x |
a (non-empty) numeric vector of data values. |
y |
a (non-empty) numeric vector of data values. |
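type |
a character string specifying the desired test statistic. It must be one of "HL11", "HL12", "HL21", "HL22", "MED1", and "MED2". |
na.rm |
a logical value indicating whether NA values in x and y should be removed before the computation proceeds. The default is na.rm = FALSE. |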
The test statistics returned by rob_perm_statistic are of the form
$$\frac{D_i}{S_j},$$
where the D_i, i = 1,...,3, are different estimators of location and the S_j, j = 1,...,4, are estimates for the mutual sample scale. See Fried and Dehling (2011) or the vignette vignette("robnptests") for details.
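To show how a statistic of this form can feed a randomization test, here is a minimal base R sketch. It is not the package implementation: the helper names med2_statistic and randomization_p are hypothetical, the statistic uses an assumed S4-type scale (sum of the two median absolute deviations), and the p-value uses the simple estimator (b + 1) / (n.rep + 1) instead of the permp function named in this documentation.

```r
# Hypothetical helpers, not part of robnptests.

# Difference of medians divided by the sum of the unscaled MADs
med2_statistic <- function(x, y) {
  (median(x) - median(y)) / (mad(x, constant = 1) + mad(y, constant = 1))
}

# Two-sided randomization p-value from n.rep independently drawn splits
randomization_p <- function(x, y, stat, n.rep = 1000) {
  observed <- stat(x, y)
  z <- c(x, y)
  m <- length(x)
  drawn <- replicate(n.rep, {
    idx <- sample(length(z), m)  # random split of the joint sample
    stat(z[idx], z[-idx])
  })
  b <- sum(abs(drawn) >= abs(observed))
  (b + 1) / (n.rep + 1)          # never exactly zero
}

set.seed(108)
x <- rnorm(20)
y <- rnorm(20) + 5               # clear location shift
p <- randomization_p(x, y, med2_statistic, n.rep = 1000)
```

Each call to sample() draws a split independently of the previous ones, so the same split can occur more than once, which matches the idea of drawing splits with replacement.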
A named list containing the following components:
statistic |
the selected test statistic. |
estimates |
estimate of location for each sample if available. |
Fried R, Dehling H (2011). “Robust nonparametric tests for the two-sample location problem.” Statistical Methods & Applications, 20(4), 409–422. doi:10.1007/s10260-011-0164-1.
# Generate random samples
set.seed(108)
x <- rnorm(20); y <- rnorm(20)

# Compute HL21-statistic
rob_perm_statistic(x, y, type = "HL21")
rob_scale
calculates an estimator for the within-sample dispersion
based on two samples.
rob_scale(
  x,
  y,
  type = c("S1", "S2", "S3", "S4"),
  na.rm = FALSE,
  check.for.zero = FALSE
)
x |
a (non-empty) numeric vector of data values. |
y |
a (non-empty) numeric vector of data values. |
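type |
a character string that specifies the estimator for the variance; can be "S1", "S2", "S3", or "S4". |
na.rm |
a logical value indicating whether NA values in x and y should be removed before the computation proceeds. The default is na.rm = FALSE. |
check.for.zero |
a logical value indicating whether a warning should be triggered if the scale estimate is zero. The default is check.for.zero = FALSE. |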
For definitions of the scale estimators, see Fried and Dehling (2011).
If check.for.zero = TRUE, an error is thrown when the scale estimate is zero. This argument is only included because the function is used in rob_perm_statistic to compute values of robust test statistics where the scale estimate is used for standardization. A scale estimate of zero leads to a non-existing test statistic, so that the corresponding test cannot be performed.
An estimate of the pooled variance of the two samples.
Fried R, Dehling H (2011). “Robust nonparametric tests for the two-sample location problem.” Statistical Methods & Applications, 20(4), 409–422. doi:10.1007/s10260-011-0164-1.
trim_mean
calculates a trimmed mean of a sample.
trim_mean(x, gamma = 0.2, na.rm = FALSE)
x |
a (non-empty) numeric vector of data values. |
gamma |
a numeric value in [0, 0.5] specifying the fraction of observations to be trimmed from each end of the sample before calculating the mean. The default value is 0.2. |
na.rm |
a logical value indicating whether NA values in x should be removed before the computation proceeds. The default is na.rm = FALSE. |
This is a wrapper function for the function mean.
The trimmed mean.
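Since the function is described as a wrapper around mean, its behaviour can be approximated with the trim argument of base R's mean. The wrapper below is a hypothetical sketch that assumes gamma is passed on as trim; it is not the package source.

```r
# Hypothetical stand-in, not part of robnptests: assumes gamma maps to the
# trim argument of base::mean.
trim_mean_sketch <- function(x, gamma = 0.2, na.rm = FALSE) {
  stopifnot(gamma >= 0, gamma <= 0.5)
  mean(x, trim = gamma, na.rm = na.rm)
}

x <- c(1, 2, 3, 4, 100)
# floor(5 * 0.2) = 1 observation is dropped from each end: mean(2, 3, 4)
trim_mean_sketch(x, gamma = 0.2)  # 3
mean(x)                           # 22
```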
# Generate random sample
set.seed(108)
x <- rnorm(10)

# Compute 20% trimmed mean
trim_mean(x, gamma = 0.2)
trimmed_t
calculates the test statistic for the two-sample trimmed t-test.
trimmed_t(x, y, gamma = 0.2, na.rm = FALSE)
x |
a (non-empty) numeric vector of data values. |
y |
a (non-empty) numeric vector of data values. |
gamma |
a numeric value in [0, 0.5] specifying the fraction of observations to be trimmed from each end of the sample before calculating the mean. The default value is 0.2. |
na.rm |
a logical value indicating whether NA values in x and y should be removed before the computation proceeds. The default is na.rm = FALSE. |
A named list containing the following components:
statistic |
the value of the test statistic. |
estimates |
the trimmed means for both samples. |
df |
the degrees of freedom for the test statistic. |
Yuen KK, Dixon WT (1973). “The approximate behaviour and performance of the two-sample trimmed t.” Biometrika, 60(2), 369–374. doi:10.2307/2334550.
Yuen KK (1974). “The two-sample trimmed t for unequal population variances.” Biometrika, 61(1), 165–170. doi:10.2307/2334299.
# Generate random samples
set.seed(108)
x <- rnorm(20); y <- rnorm(20)
# Compute trimmed t-statistic
trimmed_t(x, y, gamma = 0.2)
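The three returned components can be illustrated with a base R sketch of the pooled trimmed t-statistic in the form given by Yuen and Dixon (1973). The names winsorize and trimmed_t_sketch are hypothetical, and it is an assumption that floor(gamma * n) observations are trimmed at each end; the package's internal computation may differ in detail. The sketch assumes gamma < 0.5.

```r
# Winsorize a sample: the g smallest values are replaced by the (g + 1)-th
# smallest value, the g largest values by the (n - g)-th smallest value
winsorize <- function(x, gamma = 0.2) {
  n <- length(x)
  g <- floor(gamma * n)
  xs <- sort(x)
  if (g > 0) {
    xs[seq_len(g)] <- xs[g + 1]
    xs[(n - g + 1):n] <- xs[n - g]
  }
  xs
}

# Hypothetical sketch of the pooled trimmed t-statistic
trimmed_t_sketch <- function(x, y, gamma = 0.2) {
  m <- length(x); n <- length(y)
  # Effective sample sizes after trimming
  h.x <- m - 2 * floor(gamma * m)
  h.y <- n - 2 * floor(gamma * n)
  # Winsorized sums of squared deviations
  ssd.x <- (m - 1) * var(winsorize(x, gamma))
  ssd.y <- (n - 1) * var(winsorize(y, gamma))
  pooled <- (ssd.x + ssd.y) / (h.x + h.y - 2)
  est <- c(mean(x, trim = gamma), mean(y, trim = gamma))
  list(
    statistic = (est[1] - est[2]) / sqrt(pooled * (1 / h.x + 1 / h.y)),
    estimates = est,
    df = h.x + h.y - 2
  )
}

set.seed(108)
x <- rnorm(20); y <- rnorm(20)
trimmed_t_sketch(x, y, gamma = 0.2)
```

For gamma = 0 nothing is trimmed and the sketch reduces to the classical pooled two-sample t-statistic.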
trimmed_test
performs the two-sample trimmed t-test.
trimmed_test(
  x,
  y,
  gamma = 0.2,
  alternative = c("two.sided", "less", "greater"),
  method = c("asymptotic", "permutation", "randomization"),
  delta = ifelse(scale.test, 1, 0),
  n.rep = 1000,
  na.rm = FALSE,
  scale.test = FALSE,
  wobble.seed = NULL
)
x |
a (non-empty) numeric vector of data values. |
y |
a (non-empty) numeric vector of data values. |
gamma |
a numeric value in [0, 0.5] specifying the fraction of observations to be trimmed from each end of the sample before calculating the mean. The default value is 0.2. |
alternative |
a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater", or "less". |
method |
a character string specifying how the p-value is computed with
possible values "asymptotic", "permutation", and "randomization". |
delta |
a numeric value indicating the true difference in the location or
scale parameter, depending on whether the test should be performed
for a difference in location or in scale. The default is
delta = 0 for a location test and delta = 1 for a scale test. |
n.rep |
an integer value specifying the number of random splits used to
calculate the randomization distribution if method = "randomization". |
na.rm |
a logical value indicating whether NA values in x and y should be stripped before the computation proceeds. |
scale.test |
a logical value to specify if the samples should be compared
for a difference in scale. The default is scale.test = FALSE. |
wobble.seed |
an integer value used as a seed for the random number
generation in case of scale.test = TRUE with zeros in x or y. |
The function performs Yuen's t-test based on the trimmed mean and winsorized
variance (Yuen and Dixon 1973).
The amount of trimming/winsorization is set in gamma and defaults to 0.2,
i.e. 20% of the values at each end of the sample are removed/replaced.
In addition to the asymptotic distribution, a permutation and a
randomization version of the test are implemented.
When computing a randomization distribution based on randomly drawn splits
with replacement, the function permp (Phipson and Smyth 2010) is used to
calculate the p-value.
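The randomization principle can be sketched in base R as follows. This is an illustration only: randomization_p is a hypothetical name, the difference of trimmed means is used as an unstandardized stand-in for the test statistic, and the p-value is the simple estimate (b + 1) / (n.rep + 1) rather than the refined computation carried out by permp.

```r
# Hypothetical sketch of a two-sided randomization p-value; the statistic
# (difference of trimmed means) is chosen for illustration only
randomization_p <- function(x, y, n.rep = 1000, gamma = 0.2) {
  stat <- function(a, b) mean(a, trim = gamma) - mean(b, trim = gamma)
  observed <- stat(x, y)
  joint <- c(x, y)
  m <- length(x)
  # Each replication draws one random split of the joint sample; the splits
  # are drawn independently, so the same split may occur more than once
  replicated <- replicate(n.rep, {
    idx <- sample.int(length(joint), m)
    stat(joint[idx], joint[-idx])
  })
  # Number of splits with a statistic at least as extreme as the observed one
  b <- sum(abs(replicated) >= abs(observed))
  # Simple estimate that can never be exactly zero
  (b + 1) / (n.rep + 1)
}

set.seed(108)
x <- rnorm(20); y <- rnorm(20, mean = 2)
randomization_p(x, y, n.rep = 1000)
```

Adding one to both numerator and denominator keeps the estimated p-value strictly positive, which is the concern that motivates the use of permp.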
For scale.test = TRUE, the test compares the two samples for a difference
in scale. This is achieved by log-transforming the original squared observations,
i.e. x is replaced by log(x^2) and y by log(y^2).
A potential scale difference then appears as a location difference between
the transformed samples, see Fried (2012).
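The effect of the transformation can be checked with a short base R sketch. The sample sizes and the scale factor 3 are chosen for illustration only.

```r
set.seed(108)
x <- rnorm(200)
y <- 3 * rnorm(200)  # same location (zero), three times the scale

# Transformation used for the scale test
x.trans <- log(x^2)
y.trans <- log(y^2)

# Multiplying the observations by a constant c shifts log(x^2) by 2 * log(c)
all.equal(log((3 * x)^2), log(x^2) + 2 * log(3))  # TRUE

# The location difference between the transformed samples estimates the
# logarithm of the squared scale ratio, here log(3^2)
median(y.trans) - median(x.trans)
log(3^2)
```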
Note that the samples need to have equal locations. The samples should not
contain zeros, because the log-transformation is undefined for them. If
zeros are present, uniform noise is added to all observations in order to
remove the zeros, and a message is printed.
If the samples have been modified because of zeros when scale.test = TRUE,
the modified samples can be retrieved using
set.seed(wobble.seed); wobble(x, y)
Both samples need to contain at least 5 non-missing values.
A named list with class "htest" containing the following components:
statistic |
the value of the test statistic. |
parameter |
the degrees of freedom for the test statistic. |
p.value |
the p-value for the test. |
estimate |
the trimmed means of x and y. |
null.value |
the specified hypothesized value of the mean difference/squared scale ratio. |
alternative |
a character string describing the alternative hypothesis. |
method |
a character string indicating how the p-value was computed. |
data.name |
a character string giving the names of the data. |
Yuen KK, Dixon WT (1973). “The approximate behaviour and performance of the two-sample trimmed t.” Biometrika, 60(2), 369–374. doi:10.2307/2334550.
Yuen KK (1974). “The two-sample trimmed t for unequal population variances.” Biometrika, 61(1), 165–170. doi:10.2307/2334299.
Fried R (2012). “On the online estimation of piecewise constant volatilities.” Computational Statistics & Data Analysis, 56(11), 3080–3090. doi:10.1016/j.csda.2011.02.012.
# Generate random samples
set.seed(108)
x <- rnorm(20); y <- rnorm(20)
# Trimmed t-test
trimmed_test(x, y, gamma = 0.1)
win_mean
calculates the winsorized mean of a sample.
win_mean(x, gamma = 0.2, na.rm = FALSE)
x |
a (non-empty) numeric vector of data values. |
gamma |
a numeric value in [0, 0.5] specifying the fraction of observations to be replaced at each end of the sample before calculating the mean. The default value is 0.2. |
na.rm |
a logical value indicating whether NA values in x should be stripped before the computation proceeds. |
The winsorized mean.
# Generate random sample
set.seed(108)
x <- rnorm(10)
# Compute 20% winsorized mean
win_mean(x, gamma = 0.2)
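Winsorization replaces extreme observations instead of removing them. A minimal base R sketch, assuming that floor(gamma * n) observations are replaced at each end and that gamma < 0.5 (win_mean_sketch is a hypothetical name, and the package's exact rounding rule may differ):

```r
# Hypothetical stand-in for win_mean(): values below the (g + 1)-th smallest
# or above the (n - g)-th smallest observation are clamped to these bounds
win_mean_sketch <- function(x, gamma = 0.2) {
  n <- length(x)
  g <- floor(gamma * n)
  xs <- sort(x)
  lower <- xs[g + 1]
  upper <- xs[n - g]
  mean(pmin(pmax(x, lower), upper))
}

x <- c(1, 2, 3, 4, 100)
# g = 1, so the winsorized sample is 2, 2, 3, 4, 4
win_mean_sketch(x, gamma = 0.2)  # 3
mean(x)                          # 22
```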
win_var
calculates the winsorized variance of a sample.
win_var(x, gamma = 0, na.rm = FALSE)
x |
a (non-empty) numeric vector of data values. |
gamma |
a numeric value in [0, 0.5] specifying the fraction of observations to be replaced at each end of the sample before calculating the variance. The default value is 0. |
na.rm |
a logical value indicating whether NA values in x should be stripped before the computation proceeds. |
A named list containing the following items:
var |
winsorized variance. |
h |
degrees of freedom used for tests based on trimmed means and the winsorized variance. |
# Generate random sample
set.seed(108)
x <- rnorm(10)
# Compute 20% winsorized variance
win_var(x, gamma = 0.2)
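The two returned items can be illustrated with a base R sketch. Both the rounding rule floor(gamma * n) and the formula h = n - 2 * floor(gamma * n) are assumptions based on the usual definition of the trimmed t-test; win_var_sketch is a hypothetical name, and gamma < 0.5 is assumed.

```r
# Hypothetical stand-in for win_var()
win_var_sketch <- function(x, gamma = 0) {
  n <- length(x)
  g <- floor(gamma * n)
  xs <- sort(x)
  if (g > 0) {
    # Replace the g smallest and the g largest observations
    xs[seq_len(g)] <- xs[g + 1]
    xs[(n - g + 1):n] <- xs[n - g]
  }
  # Sample variance of the winsorized values and the number of
  # observations that remain after trimming
  list(var = var(xs), h = n - 2 * g)
}

x <- c(1, 2, 3, 4, 100)
# Winsorized sample: 2, 2, 3, 4, 4
win_var_sketch(x, gamma = 0.2)  # var = 1, h = 3
# Without winsorization the ordinary sample variance is returned
all.equal(win_var_sketch(x)$var, var(x))  # TRUE
```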
wobble
adds noise from a continuous uniform distribution to the
observations to remove ties.
wobble(x, y, check = TRUE)
x |
a (non-empty) numeric vector of data values. |
y |
a (non-empty) numeric vector of data values. |
check |
a logical value indicating whether the samples should be checked
for ties prior to adding uniform noise or not, defaults to
TRUE. |
If check = TRUE, the function checks whether all values in the two numeric
input vectors are distinct. If so, it returns the original values; otherwise
the ties are removed by adding noise from a continuous uniform distribution
to all observations. If check = FALSE, it simply determines the number
of digits and adds uniform noise.
More precisely, we determine the minimum number of digits d_min in the sample
and then add random numbers from the U[-0.5 * 10^(-d_min), 0.5 * 10^(-d_min)]
distribution to each of the observations.
A named list of length two containing the modified input samples x and y.
Fried R, Gather U (2007). “On rank tests for shift detection in time series.” Computational Statistics & Data Analysis, 52(1), 221–233. doi:10.1016/j.csda.2006.12.017.
x <- rnorm(20); y <- rnorm(20); x <- round(x)
wobble(x, y)
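The noise step of wobble can be sketched in base R. In this sketch the value of d_min is supplied by hand through the hypothetical argument digits instead of being determined from the data, and add_uniform_noise is a hypothetical name.

```r
# Hypothetical sketch of the noise step; 'digits' plays the role of d_min
add_uniform_noise <- function(x, digits) {
  half.width <- 0.5 * 10^(-digits)
  x + runif(length(x), min = -half.width, max = half.width)
}

set.seed(108)
x <- round(rnorm(20))  # integer-valued sample with many ties
x.wobbled <- add_uniform_noise(x, digits = 0)

anyDuplicated(x)            # non-zero: ties are present
anyDuplicated(x.wobbled)    # 0: ties are removed
all(round(x.wobbled) == x)  # TRUE: rounding recovers the original values
```

Because the noise stays within half a unit of the last digit, the original observations can be recovered by rounding.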