Compute chi^2-statistic for test on proportions as the categorical variable (Chi2Score)

class cerebunit.statistics.stat_scores.chi2Score.Chi2Score(*args, **kwargs)

Compute chi2-statistic for chi squared Test of proportions.

Applies to any two-way contingency table, for example:

Possibilities for            Possibilities for categorical variable, B
categorical variable, A      Yes             No
a1                           \(O_{00}\)      \(O_{01}\)
a2                           \(O_{10}\)      \(O_{11}\)

Definitions       Interpretation
\(r\)             number of rows (categories of the row variable)
\(c\)             number of columns (categories of the column variable)
\(O_{ij}\)        observed count for the cell in the \(i^{th}\) row, \(j^{th}\) column
\(R_{i}\)         total observations in the \(i^{th}\) row, \(\sum_{\forall j \in c} O_{ij}\)
\(C_{j}\)         total observations in the \(j^{th}\) column, \(\sum_{\forall i \in r} O_{ij}\)
\(n\)             total count for the entire table, \(\sum_{\forall i \in r} R_i\) or \(\sum_{\forall j \in c} C_j\)
\(E_{ij}\)        expected count for the cell in the \(i^{th}\) row, \(j^{th}\) column, \(E_{ij} = \frac{R_i C_j}{n}\)
test-statistic    \(\chi^2 = \sum_{\forall i,j} \frac{(O_{ij}-E_{ij})^2}{E_{ij}}\)
\(df\)            degrees of freedom, \(df = (r-1)(c-1)\)
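
The definitions above can be sketched in a few lines of plain Python. This is only an illustration of the formulas, not the code used by the class; the function name chi2_from_table is hypothetical.

```python
def chi2_from_table(observed):
    """Return (chi2, df, expected) for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in observed]            # R_i
    col_totals = [sum(col) for col in zip(*observed)]      # C_j
    n = sum(row_totals)                                    # total count
    # E_ij = R_i * C_j / n
    expected = [[ri * cj / n for cj in col_totals] for ri in row_totals]
    # chi^2 = sum over all cells of (O_ij - E_ij)^2 / E_ij
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)     # (r-1)(c-1)
    return chi2, df, expected


# 3 x 2 table of counts: chi2 is about 6.69 with df = 2
chi2, df, expected = chi2_from_table([[129, 49], [150, 29], [137, 39]])
```

The expected counts always sum to the same total as the observed counts, which is a quick sanity check on any hand computation.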

Special note. Consider a 2 x 2 table such as the one below:

Possibilities for            Possibilities for categorical variable, B
categorical variable, A      column-1     column-2     Total
row-1                        A            B            R1
row-2                        C            D            R2
Total                        C1           C2           N

Notice that for a 2 x 2 table, \(df = 1\) and the test statistic can be calculated with the shortcut formula

\(\chi^2 = \frac{ N(AD-BC)^2 }{ R_1 R_2 C_1 C_2 }\)
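
As a quick check of the shortcut, the sketch below compares it against the general cell-by-cell sum for one 2 x 2 table. Both function names are hypothetical and the counts are illustrative only.

```python
def chi2_2x2_shortcut(a, b, c, d):
    """Shortcut: N (AD - BC)^2 / (R1 R2 C1 C2)."""
    r1, r2 = a + b, c + d          # row totals
    c1, c2 = a + c, b + d          # column totals
    n = r1 + r2                    # grand total
    return n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)


def chi2_2x2_general(a, b, c, d):
    """General formula: sum over cells of (O - E)^2 / E with E = R_i C_j / n."""
    r1, r2 = a + b, c + d
    c1, c2 = a + c, b + d
    n = r1 + r2
    cells = [(a, r1, c1), (b, r1, c2), (c, r2, c1), (d, r2, c2)]
    return sum((o - r * col / n) ** 2 / (r * col / n) for o, r, col in cells)


shortcut = chi2_2x2_shortcut(129, 49, 150, 29)
general = chi2_2x2_general(129, 49, 150, 29)
```

Note that scipy.stats.chi2_contingency applies Yates' continuity correction by default when \(df = 1\), so its result for a 2 x 2 table matches this formula only when it is called with correction=False.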

This class uses scipy.stats.chi2_contingency. The same statistic and p-value can be obtained from scipy.stats.chisquare when it is given the expected counts and a matching ddof, as demonstrated below

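import numpy as np
import scipy.stats

obs = np.array([ [129, 49], [150, 29], [137, 39] ])
chi2, p, df, expected = scipy.stats.chi2_contingency( obs )
# choose ddof so that chisquare uses the same degrees of freedom, df
chi2_, p_ = scipy.stats.chisquare( obs.ravel(), f_exp=expected.ravel(), ddof=obs.size-1-df )
# compare floats with a tolerance rather than exact equality; chi2 is about 6.69
bool( np.isclose(chi2, chi2_) and round(chi2, 2) == 6.69 )
True
# p is about 0.035
bool( np.isclose(p, p_) and round(p, 3) == 0.035 )
True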

Use Case:

x = Chi2Score.compute( observation, prediction )
score = Chi2Score(x)

Note: As part of the SciUnit framework, this custom score should have the following methods:

  • compute() (class method)
  • sort_key() (property)
  • __str__()
classmethod compute(observation, prediction)
Argument           Value type
first argument     dictionary; observation/experimental data must have keys “sample_size” and “success_numbers”
second argument    dictionary; model prediction must also have keys “sample_size” and “success_numbers”

Note:

  • for a 2 x 2 table, the value for the key “success_numbers” is a single number for both observation and prediction.
  • for a 2 x k table, the values for the key “success_numbers” (in both observation and prediction) are either lists or arrays.
  • chi squared tests are by nature two-sided, so there is no option for one-sidedness.
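
The page does not show how the two dictionaries are turned into a contingency table. The sketch below illustrates one plausible layout for the 2 x 2 case, under the assumption that each dictionary contributes one row of successes and failures, with failures taken as sample_size minus success_numbers. The helper two_by_two_table is hypothetical and not part of cerebunit, and the counts are illustrative only.

```python
def two_by_two_table(observation, prediction):
    """Hypothetical helper: build [[successes, failures], ...] from the two dicts.

    Assumes "success_numbers" is a single number (the 2 x 2 case) and that
    failures = sample_size - success_numbers. This layout is an assumption,
    not something confirmed by the documented class.
    """
    rows = []
    for data in (observation, prediction):
        successes = data["success_numbers"]
        failures = data["sample_size"] - successes
        rows.append([successes, failures])
    return rows


# illustrative dictionaries with the two required keys
observation = {"sample_size": 178, "success_numbers": 129}
prediction = {"sample_size": 179, "success_numbers": 150}
table = two_by_two_table(observation, prediction)
```

With dictionaries shaped like these, the documented call would be Chi2Score.compute( observation, prediction ).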