Compute chi^2-statistic for chi^2 goodness-of-fit test on proportions of categories of a categorical variable (Chi2GOFScore)

class cerebunit.statistics.stat_scores.chi2GOFScore.Chi2GOFScore(*args, **kwargs)

Compute the chi-squared statistic for the chi-squared goodness-of-fit test of proportions.

One may think of this as a one-way contingency table.

sample size \(n\)   \(k\) categories of a categorical variable of interest
                    \(x_1\)    \(x_2\)    \(\ldots\)   \(x_k\)
observations        \(O_1\)    \(O_2\)    \(\ldots\)   \(O_k\)
probabilities       \(p_1\)    \(p_2\)    \(\ldots\)   \(p_k\)
expected            \(np_1\)   \(np_2\)   \(\ldots\)   \(np_k\)

Note that the probabilities of the \(k\) categories satisfy \(\sum_{\forall i} p_i = 1\). The expected count for each category is either given directly or derived from its probability as \(np_i\), so that \(\sum_{\forall i} np_i = n\).

Definitions      Interpretation
\(n\)            sample size; total number of experiments done
\(k\)            number of categories of the categorical variable
\(O_i\)          observed count (frequency) for the \(i^{th}\) category
\(p_i\)          probability for the \(i^{th}\) category such that \(\sum_{\forall i} p_i = 1\)
\(E_i\)          expected count for the \(i^{th}\) category such that \(E_i = n p_i\)
test-statistic   \(\chi^2 = \sum_{\forall i} \frac{(O_i - E_i)^2}{E_i}\)
\(df\)           degrees of freedom, \(df = k-1\)
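
As a worked illustration of the definitions above, the following minimal sketch computes \(E_i = n p_i\), the \(\chi^2\) statistic, and \(df = k-1\) in plain Python. The function name chi2_gof and the sample numbers are assumptions made for this example; they are not part of cerebunit.

```python
def chi2_gof(observed, probabilities, n):
    """Return (chi2, df) for a goodness-of-fit test on k categories.

    Hypothetical helper for illustration only; not part of cerebunit.
    """
    if len(observed) != len(probabilities):
        raise ValueError("observed and probabilities must have the same length")
    # Expected count for each category: E_i = n * p_i
    expected = [n * p for p in probabilities]
    # chi^2 = sum over all categories of (O_i - E_i)^2 / E_i
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Degrees of freedom: df = k - 1
    df = len(observed) - 1
    return chi2, df


# Made-up data: n = 100 observations spread over k = 3 categories
observed = [20, 55, 25]
probabilities = [0.25, 0.5, 0.25]
chi2, df = chi2_gof(observed, probabilities, n=100)
print(chi2, df)  # 1.5 2
```

Here the expected counts are 25, 50 and 25, so the three terms of the sum are 25/25, 25/50 and 0, giving \(\chi^2 = 1.5\) with \(df = 2\).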

Note that the modifications made relative to a two-way \(\chi^2\) test are

  • the calculation of expected counts, \(E_i = n p_i\)
  • the degrees of freedom, \(df = k-1\)

This class uses scipy.stats.chisquare.
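
For orientation, a direct call to scipy.stats.chisquare might look like the sketch below. The data are made up, and how Chi2GOFScore passes its arguments internally is not shown in this page, so treat this only as an illustration of the SciPy function itself.

```python
from scipy.stats import chisquare

# Made-up data: observed counts and expected counts E_i = n * p_i
# for n = 100 and p = (0.25, 0.5, 0.25)
observed = [20, 55, 25]
expected = [25.0, 50.0, 25.0]

# With the default ddof=0, the p-value uses df = k - 1 degrees of freedom
result = chisquare(f_obs=observed, f_exp=expected)
statistic, pvalue = result.statistic, result.pvalue
print(statistic, pvalue)
```

The observed and expected counts must sum to the same total, which holds automatically when the expected counts are computed as \(n p_i\).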

Use Case:

from cerebunit.statistics.stat_scores.chi2GOFScore import Chi2GOFScore

x = Chi2GOFScore.compute( observation, prediction )
score = Chi2GOFScore(x)

Note: As part of the SciUnit framework, this custom score class should have the following methods:

  • compute() (class method)
  • sort_key() (property)
  • __str__()

classmethod compute(observation, prediction)

Argument          Value type
first argument    dictionary; observation/experimental data; must have the keys “sample_size”, whose value is a number, and “observed_freq”, whose value is an array
second argument   dictionary; model prediction; must have either “probabilities” or “expected”, whose value is an array (same length as “observed_freq”)
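
The argument shapes described above can be sketched as follows. The helper expected_counts is hypothetical and only illustrates how expected counts could be obtained from either form of the prediction; it is not the library's implementation, and preferring “expected” when both keys are present is an assumption of this sketch.

```python
def expected_counts(observation, prediction):
    """Return expected counts from a prediction dictionary.

    Hypothetical helper for illustration only; not part of cerebunit.
    """
    n = observation["sample_size"]
    k = len(observation["observed_freq"])
    if "expected" in prediction:
        # Expected counts are given directly
        expected = list(prediction["expected"])
    elif "probabilities" in prediction:
        # Derive expected counts as E_i = n * p_i
        expected = [n * p for p in prediction["probabilities"]]
    else:
        raise KeyError('prediction needs "probabilities" or "expected"')
    if len(expected) != k:
        raise ValueError('prediction must match the length of "observed_freq"')
    return expected


# Made-up inputs using the keys named in the table above
observation = {"sample_size": 100, "observed_freq": [20, 55, 25]}
prediction_p = {"probabilities": [0.25, 0.5, 0.25]}
prediction_e = {"expected": [25, 50, 25]}

print(expected_counts(observation, prediction_p))  # [25.0, 50.0, 25.0]
print(expected_counts(observation, prediction_e))  # [25, 50, 25]
```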

Note:

  • Chi-squared tests (for goodness-of-fit or contingency tables) are by nature two-sided, so there is no option for one-sidedness.