Compute z-statistic for Two Sample Rank-Sum test (ZScoreForTwoSampleRankSumTest)
class cerebunit.statistics.stat_scores.zTwoSampleRankSumScore.ZScoreForTwoSampleRankSumTest(*args, **kwargs)
Compute the z-statistic for the Two Sample Rank-Sum Test (also known as the Wilcoxon rank-sum or Mann-Whitney test). Note that this is not the Wilcoxon Signed Rank test.
Definitions and their interpretation:
- \(\eta_0\): some specified value
- \(n_1\): sample size for sample 1
- \(n_2\): sample size for sample 2
- \(N\): total sample size, \(n_1 + n_2\)
- \(W\): sum of ranks for observations in sample 1 (post dataset ranking)
- \(\mu_W\): assuming \(H_0\) is true, \(\mu_W = \frac{ n_1(1+N) }{ 2 }\)
- \(\sigma_W\): assuming \(H_0\) is true, \(\sigma_W = \sqrt{ \frac{ n_1 n_2 (1+N) }{12} }\)
- z-statistic, z: \(z = \frac{ W - \mu_W }{ \sigma_W }\)
NOTE:
- \(H_0\) is true \(\Rightarrow\) the population distributions of samples 1 and 2 are the same
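The definitions above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the library's implementation: the function name `rank_sum_z` and the input numbers are illustrative assumptions, and the formulas are applied exactly as stated above (without any tie correction to \(\sigma_W\)).

```python
import math

def rank_sum_z(w, n1, n2):
    """Hypothetical helper: return (mu_W, sigma_W, z) for rank sum w of sample 1."""
    n_total = n1 + n2                                   # N = n_1 + n_2
    mu_w = n1 * (1 + n_total) / 2                       # mean of W under H_0
    sigma_w = math.sqrt(n1 * n2 * (1 + n_total) / 12)   # std. dev. of W under H_0
    z = (w - mu_w) / sigma_w
    return mu_w, sigma_w, z

# Illustrative input: ranks of four sample-1 values within a pooled dataset of eight
sample1_ranks = [5.5, 2.5, 4, 7.5]
w = sum(sample1_ranks)                                  # W = 19.5
mu_w, sigma_w, z = rank_sum_z(w, n1=4, n2=4)
print(mu_w, round(sigma_w, 4), round(z, 4))             # 18.0 3.4641 0.433
```

Here \(W = 19.5\) lies 1.5 above \(\mu_W = 18\), which gives a z of about 0.43 when divided by \(\sigma_W = \sqrt{12}\).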
Use Case:
    x = ZScoreForTwoSampleRankSumTest.compute( observation, prediction )
    score = ZScoreForTwoSampleRankSumTest(x)
Note: As part of the SciUnit framework this custom score should have the following methods:
- compute() (class method)
- sort_key() (property)
- __str__()
Additionally,
- get_observation_rank() (class method)
- orderdata_ranks() (static method)
classmethod compute(observation, prediction)
Argument and value type:
- first argument: dictionary; observation/experimental data
- second argument: dictionary; simulated data
Note:
- the observation must have the key “raw_data”, whose value is a list of numbers
- the simulation, i.e., the model prediction, must also have the key “raw_data”
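A minimal sketch of the two input dictionaries, assuming nothing beyond the "raw_data" requirement stated above. The helper `has_raw_data` is hypothetical (it is not part of cerebunit) and the numbers are illustrative only.

```python
# Illustrative inputs; only the "raw_data" key is required by the notes above.
observation = {"raw_data": [65, 60, 62, 70]}   # experimental data (sample 1)
prediction = {"raw_data": [60, 55, 65, 70]}    # simulated data (sample 2)

def has_raw_data(data):
    """Hypothetical check: 'raw_data' exists and is a non-empty list of numbers."""
    values = data.get("raw_data")
    return (
        isinstance(values, list)
        and len(values) > 0
        and all(isinstance(v, (int, float)) for v in values)
    )

print(has_raw_data(observation), has_raw_data(prediction))
```

Dictionaries that pass this check have the shape that compute( observation, prediction ) is documented to accept.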
classmethod get_observation_rank(observation, prediction)
Returns ranks for the observation data.
- sample 1, observation[“raw_data”]
- sample 2, prediction[“raw_data”]
Example for describing what ‘ranking’ means:
\(sample1 = [65, 60, 62, 70]\)
\(sample2 = [60, 55, 65, 70]\)
Then,
\(ordered\_data = [55, 60, 60, 62, 65, 65, 70, 70]\)
\(raw\_ranks = [ 1, 2, 3, 4, 5, 6, 7, 8]\)
and
\(correct\_ranks= [ 1, 2.5, 2.5, 4, 5.5, 5.5, 7.5, 7.5]\)
Therefore, the ranks for sample1 are
\(sample1\_ranks = [5.5, 2.5, 4, 7.5]\)
NOTE:
- corrected ranks have midranks for repeated values
- the returned sample1 ranks are a NumPy array
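The last step of the example, mapping the corrected ranks back onto sample 1, can be sketched as a simple lookup. This is an illustration under stated assumptions rather than the method's actual code: it takes the ordered data and corrected ranks from the example as given, and it returns a plain list where the method is documented to return a NumPy array.

```python
sample1 = [65, 60, 62, 70]
ordered_data = [55, 60, 60, 62, 65, 65, 70, 70]
correct_ranks = [1, 2.5, 2.5, 4, 5.5, 5.5, 7.5, 7.5]

# Tied values share one midrank, so a value-to-rank mapping is unambiguous.
rank_of = dict(zip(ordered_data, correct_ranks))

# Look up the rank of each sample-1 value, keeping the original sample order.
sample1_ranks = [rank_of[value] for value in sample1]
print(sample1_ranks)  # [5.5, 2.5, 4, 7.5]
```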
static orderdata_ranks(observation, prediction)
Static function that orders the data and returns its appropriate rank.
- sample 1, observation[“raw_data”]
- sample 2, prediction[“raw_data”]
Step-1:
- append the two lists (i.e., the two samples)
- sort the values in ascending order
Step-2:
- get unique values in the ordered data
- also get the frequency (count) of each unique value
Step-3:
- construct raw ranks based on the ordered data
Step-4:
- for each value in the ordered data find its index in unique values array
- if the corresponding count is more than one, compute its midrank (sum of its raw ranks / its count)
- replace the raw ranks of all occurrences of that value with the computed midrank
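The four steps can be sketched in pure Python as follows. This is an assumption-laden illustration, not the cerebunit source: the function name `midranks` is hypothetical, it takes the two raw lists directly instead of the dictionaries, and it uses `collections.Counter` for the unique values and their counts.

```python
from collections import Counter

def midranks(sample1, sample2):
    """Hypothetical sketch: return (ordered data, ranks with midranks for ties)."""
    # Step 1: append the two samples and sort in ascending order
    ordered = sorted(list(sample1) + list(sample2))
    # Step 2: unique values and how often each occurs
    counts = Counter(ordered)
    # Step 3: raw ranks 1..N for the ordered data
    ranks = [float(r) for r in range(1, len(ordered) + 1)]
    # Step 4: give every run of tied values the mean of its raw ranks
    i = 0
    while i < len(ordered):
        count = counts[ordered[i]]
        if count > 1:
            mid = sum(ranks[i:i + count]) / count
            ranks[i:i + count] = [mid] * count
        i += count  # skip to the next distinct value
    return ordered, ranks

ordered, ranks = midranks([65, 60, 62, 70], [60, 55, 65, 70])
print(ordered)  # [55, 60, 60, 62, 65, 65, 70, 70]
print(ranks)    # [1.0, 2.5, 2.5, 4.0, 5.5, 5.5, 7.5, 7.5]
```

Because the data are sorted, all occurrences of a value are adjacent, so each tie can be handled as one contiguous slice of the raw ranks.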