Compute chi^2-statistic for chi^2 goodness-of-fit test on proportions of categories of a categorical variable (Chi2GOFScore)
class cerebunit.statistics.stat_scores.chi2GOFScore.Chi2GOFScore(*args, **kwargs)
Compute the chi2-statistic for the chi-squared goodness-of-fit test of proportions.
One may think of this as a one-way contingency table.
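======================================================  ========  ========  ==========  ========
sample size                                             \(n\)
\(k\) categories of a categorical variable of interest  \(x_1\)   \(x_2\)   \(\ldots\)  \(x_k\)
observations                                            \(O_1\)   \(O_2\)   \(\ldots\)  \(O_k\)
probabilities                                           \(p_1\)   \(p_2\)   \(\ldots\)  \(p_k\)
expected                                                \(np_1\)  \(np_2\)  \(\ldots\)  \(np_k\)
======================================================  ========  ========  ==========  ========

Notice that for the probabilities of the \(k\) categories, \(\sum_{\forall i} p_i = 1\). The expected counts for each category can be derived from these probabilities (or may already be given) such that \(\sum_{\forall i} np_i = n\).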
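As a minimal sketch of the relation \(E_i = n p_i\), the following derives expected counts from category probabilities. The helper name `expected_counts` and the sample values are assumptions made for illustration; they are not part of cerebunit.

```python
import math


def expected_counts(sample_size, probabilities):
    """Return E_i = n * p_i for each category (hypothetical helper)."""
    # The probabilities of the k categories must sum to 1.
    if not math.isclose(sum(probabilities), 1.0, rel_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return [sample_size * p for p in probabilities]


n = 100                          # illustrative sample size
p = [0.25, 0.5, 0.25]            # illustrative probabilities for k = 3 categories
expected = expected_counts(n, p)
print(expected)                  # [25.0, 50.0, 25.0]
print(sum(expected))             # 100.0, i.e. the expected counts sum to n
```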
==============  ==============
Definitions     Interpretation
==============  ==============
\(n\)           sample size; total number of experiments done
\(k\)           number of categories of the categorical variable
\(O_i\)         observed count (frequency) for the \(i^{th}\) category
\(p_i\)         probability for the \(i^{th}\) category such that \(\sum_{\forall i} p_i = 1\)
\(E_i\)         expected count for the \(i^{th}\) category such that \(E_i = n p_i\)
test-statistic  \(\chi^2 = \sum_{\forall i} \frac{(O_i - E_i)^2}{E_i}\)
\(df\)          degrees of freedom, \(df = k-1\)
==============  ==============

Note that the modifications made relative to a two-way \(\chi^2\) test are
- the calculation of expected counts, \(E_i = n p_i\)
- the degrees of freedom, \(df = k-1\)
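The test statistic and degrees of freedom defined above can be computed by hand, which may help when checking results. This is a sketch only: the function name `chi2_gof` and the counts are illustrative assumptions, not cerebunit code.

```python
def chi2_gof(observed, expected):
    """Return (chi2, df) for a one-way goodness-of-fit test (hypothetical helper)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # chi2 = sum over categories of (O_i - E_i)^2 / E_i
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # df = k - 1 for k categories
    df = len(observed) - 1
    return statistic, df


observed = [18, 55, 27]          # illustrative observed counts, n = 100
expected = [25.0, 50.0, 25.0]    # E_i = n * p_i with p = (0.25, 0.5, 0.25)
statistic, df = chi2_gof(observed, expected)
print(round(statistic, 2), df)   # 2.62 2
```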
This class uses scipy.stats.chisquare.
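For orientation, a direct call to `scipy.stats.chisquare` might look like the sketch below. The counts are illustrative, and how `Chi2GOFScore` passes its arguments internally is not shown here; this only demonstrates the SciPy function itself.

```python
from scipy.stats import chisquare

observed = [18, 55, 27]          # illustrative observed counts, n = 100
expected = [25.0, 50.0, 25.0]    # expected counts; must sum to the same total

# With the default ddof=0 the p-value uses df = k - 1 degrees of freedom.
result = chisquare(f_obs=observed, f_exp=expected)
print(result.statistic)          # chi-squared statistic, about 2.62
print(result.pvalue)             # p-value, about 0.27
```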
Use Case:
    x = Chi2GOFScoreForProportionChi2GOFTest.compute( observation, prediction )
    score = Chi2GOFScoreForProportionChi2GOFTest(x)
Note: As part of the SciUnit framework, this custom TScore should have the following methods:
- compute() (class method)
- sort_key() (property)
- __str__()
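The three methods listed above could be laid out roughly as follows. This is a stand-alone sketch that does not import SciUnit: `Chi2ScoreSketch` is a hypothetical class, a real score would subclass the SciUnit score base class, and the choices made in `sort_key` and `__str__` are assumptions.

```python
class Chi2ScoreSketch:
    """Hypothetical stand-in showing the three expected methods."""

    def __init__(self, score):
        self.score = score

    @classmethod
    def compute(cls, observation, prediction):
        """Return the chi-squared statistic from observed and expected counts."""
        observed = observation["observed_freq"]
        expected = prediction["expected"]
        return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    @property
    def sort_key(self):
        """Value used to order scores; here simply the statistic (an assumption)."""
        return self.score

    def __str__(self):
        return "chi2 = {:.3f}".format(self.score)


# Illustrative data, following the two-step pattern of the use case above.
observation = {"sample_size": 100, "observed_freq": [18, 55, 27]}
prediction = {"expected": [25.0, 50.0, 25.0]}
x = Chi2ScoreSketch.compute(observation, prediction)
score = Chi2ScoreSketch(x)
print(score)                     # chi2 = 2.620
```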
classmethod compute(observation, prediction)

===============  ==========
Argument         Value type
===============  ==========
first argument   dictionary; observation/experimental data must have keys “sample_size” with a number as its value and “observed_freq” whose value is an array
second argument  dictionary; model prediction must have either “probabilities” or “expected” whose value is an array (same length as “observed_freq”)
===============  ==========

Note:
- chi-squared tests (for goodness-of-fit or contingency tables) are by nature two-sided, so there is no option for one-sidedness.
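The two argument dictionaries described for compute() can be illustrated with the sketch below, which resolves expected counts from either key of the prediction. The helper `resolve_expected` is hypothetical, and giving “expected” priority when both keys are present is an assumption, not documented behaviour.

```python
def resolve_expected(observation, prediction):
    """Return expected counts from a prediction dictionary (hypothetical helper)."""
    n = observation["sample_size"]
    observed = observation["observed_freq"]
    if "expected" in prediction:
        # Assumption: use expected counts directly when they are supplied.
        expected = list(prediction["expected"])
    elif "probabilities" in prediction:
        # Otherwise derive them as E_i = n * p_i.
        expected = [n * p for p in prediction["probabilities"]]
    else:
        raise KeyError('prediction needs "probabilities" or "expected"')
    if len(expected) != len(observed):
        raise ValueError('prediction array must match "observed_freq" in length')
    return expected


observation = {"sample_size": 100, "observed_freq": [18, 55, 27]}
from_probabilities = resolve_expected(observation, {"probabilities": [0.25, 0.5, 0.25]})
from_expected = resolve_expected(observation, {"expected": [25.0, 50.0, 25.0]})
print(from_probabilities == from_expected)   # True
```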