Hypothesis testing about proportions (HtestAboutProportions)

class cerebunit.statistics.hypothesis_testings.aboutproportions.HtestAboutProportions(observation, prediction, test={'name': 'proportions_z_test_1pop', 'sample_statistic': 0.0, 'side': 'not_equal', 'z_statistic': 0.0})

Hypothesis Testing (significance testing) about proportions.

This is a parametric test that assumes the individuals in the sample are chosen randomly and that the experiments are equivalent to binomial experiments.

1. Verify necessary data conditions.

The verification is based on the sample size requirement. The other condition, a random sample or a binomial experiment with independent trials, is assumed.

Statistic name | Single sample test | Double sample test
sample size | \(n\) (observation) | \(n_1\) (observation), \(n_2\) (prediction)
null value | \(p_0\) (prediction) | \(p_0 = 0\)
proportions with trait (successes) | - | \(p_1\) (observation), \(p_2\) (prediction)

Such that,

  • \(np_0 \geq lb \land n(1-p_0) \geq lb\)
  • \(n_1p_1 \geq lb \land n_1(1-p_1) \geq lb\)
  • \(n_2p_2 \geq lb \land n_2(1-p_2) \geq lb\)
  • \(lb = 5\) (default); an alternative value is \(lb = 10\)
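
The sample size conditions above can be sketched as a small check. This is a minimal illustration, not the class's own code: the function name `sample_size_ok` and the example numbers are hypothetical.

```python
def sample_size_ok(n, p, lb=5):
    """Return True when both n*p and n*(1-p) reach the lower bound lb."""
    return n * p >= lb and n * (1 - p) >= lb


# Single sample test: check the observation size n against the null value p0.
single_ok = sample_size_ok(n=50, p=0.3)

# Two sample test: each sample is checked with its own proportion,
# here with the stricter alternative bound lb = 10.
samples = [(50, 0.3), (20, 0.1)]  # hypothetical (n_i, p_i) pairs
double_ok = all(sample_size_ok(n, p, lb=10) for n, p in samples)

print(single_ok, double_ok)
```

In the two sample case a single failing sample is enough to fail the whole check, which is why `all` is used.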

2. Defining null and alternate hypotheses.

For single sample test

Statistic Interpretation
sample statistic, \(\hat{p}\) proportion of observation with the characteristic trait (successes)
null value/population parameter, \(p_0\) proportion of prediction taken as the specified value
null hypothesis, \(H_0\) \(\hat{p} = p_0\)
alternate hypothesis, \(H_a\) \(\hat{p} \neq p_0\) or \(\hat{p} < p_0\) or \(\hat{p} > p_0\)

For two sample test

Statistic Interpretation
sample statistic, \(\hat{p}_1-\hat{p}_2\) difference between the proportions (observation, 1, and prediction, 2) with the characteristic trait (successes)
null value/population parameter, \(p_0\) 0
null hypothesis, \(H_0\) \(\hat{p}_1-\hat{p}_2 = 0\)
alternate hypothesis, \(H_a\) \(\hat{p}_1-\hat{p}_2 \neq 0\) or \(\hat{p}_1-\hat{p}_2 < 0\) or \(\hat{p}_1-\hat{p}_2 > 0\)

3. Assuming H0 is true, find p-value.

For single sample test

Statistic Interpretation
\(n\) number of observations
\(x\) number of observations with characteristic trait (successes)
\(\hat{p}\) sample statistic, \(\hat{p} = \frac{x}{n}\)
\(se_{\hat{p}}\) standard error assuming that \(H_0\) is true, \(se_{\hat{p}} = \sqrt{\frac{ p_0(1-p_0) }{ n }}\)
z_statistic, \(z\) \(z = \frac{ \hat{p}-p_0 }{ se_{\hat{p}} }\)
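
As a worked illustration of the single sample quantities, the sketch below computes \(\hat{p}\), the standard error and \(z\). It takes the standard error in its usual form, the square root of \(p_0(1-p_0)/n\). The function name `z_single_sample` and the numbers are hypothetical and are not part of the class.

```python
import math


def z_single_sample(x, n, p0):
    """z-statistic for a single sample proportion test.

    x  : number of observations with the trait (successes)
    n  : number of observations
    p0 : null value
    """
    p_hat = x / n                      # sample statistic
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error assuming H0 is true
    return (p_hat - p0) / se


# 60 successes in 100 observations, tested against p0 = 0.5
z = z_single_sample(x=60, n=100, p0=0.5)
print(round(z, 4))
```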

For two sample test

Statistic Interpretation
\(n_1\) number of observations
\(n_2\) number of predictions
\(x_1\) number of observations with characteristic trait (successes)
\(x_2\) number of predictions with characteristic trait (successes)
\(\hat{p}_1\) proportion of observations with successes, \(\hat{p}_1 = \frac{x_1}{n_1}\)
\(\hat{p}_2\) proportion of predictions with successes, \(\hat{p}_2 = \frac{x_2}{n_2}\)
\(\hat{p}\) combined proportion assuming that \(H_0: p_1 = p_2 = p\) is true, \(\hat{p} = \frac{x_1+x_2}{n_1+n_2}\)
\(\hat{p}_1-\hat{p}_2\) sample statistic
\(se_{\hat{p}_1-\hat{p}_2}\) standard error assuming that \(H_0\) is true, \(se_{\hat{p}_1-\hat{p}_2}=\sqrt{\frac{\hat{p}(1-\hat{p})}{n_1}+\frac{\hat{p}(1-\hat{p})}{n_2} }\)
z_statistic, \(z\) \(z = \frac{\hat{p}_1-\hat{p}_2 - p_0}{se_{\hat{p}_1-\hat{p}_2}}\)
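
The two sample quantities can be sketched the same way. This is an illustration under the formulas listed above; the function name `z_two_sample` and the numbers are hypothetical.

```python
import math


def z_two_sample(x1, n1, x2, n2, p0=0.0):
    """z-statistic for the difference between two proportions.

    x1, n1 : successes and size of the observation sample
    x2, n2 : successes and size of the prediction sample
    p0     : null value for the difference (0 in this test)
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Combined (pooled) proportion, valid under H0: p1 = p2 = p
    p_comb = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_comb * (1 - p_comb) / n1 + p_comb * (1 - p_comb) / n2)
    return (p1_hat - p2_hat - p0) / se


# 60/100 successes observed versus 40/100 successes predicted
z = z_two_sample(x1=60, n1=100, x2=40, n2=100)
print(round(z, 4))
```

The combined proportion is used in both terms of the standard error because the computation assumes \(H_0\), under which both samples share one proportion.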

Note:

  • The p-value corresponding to z is obtained from a z look-up table for the standard normal curve.
  • The p-value derived from the z-statistic is approximate.
  • For the single sample test, an exact p-value can be calculated from the binomial distribution.
  • The notation \(\hat{p}\) represents the sample statistic in the single sample test; in the two sample test it is the combined proportion, not the sample statistic.
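
The notes above can be illustrated with the standard library alone. The sketch replaces the z look-up table with the standard normal cumulative probability computed from `math.erfc`, and adds a one-sided exact binomial p-value for the single sample case. All function names here are hypothetical, and this is not necessarily how the class's `_compute_pvalue()` works.

```python
import math


def normal_cdf(z):
    """Cumulative probability of the standard normal curve up to z."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def p_value_from_z(z, side="not_equal"):
    """Approximate p-value for a z-statistic and a given side."""
    if side == "less_than":
        return normal_cdf(z)
    if side == "greater_than":
        return normal_cdf(-z)
    if side == "not_equal":
        return 2.0 * normal_cdf(-abs(z))
    raise ValueError("side must be 'not_equal', 'less_than' or 'greater_than'")


def exact_binomial_p_value(x, n, p0, side="less_than"):
    """Exact one-sided p-value for x successes in n trials under p0."""
    def pmf(k):
        return math.comb(n, k) * p0**k * (1.0 - p0) ** (n - k)

    if side == "less_than":
        return sum(pmf(k) for k in range(0, x + 1))
    if side == "greater_than":
        return sum(pmf(k) for k in range(x, n + 1))
    # Two-sided exact p-values follow several conventions; left out here.
    raise ValueError("only one-sided exact p-values are sketched here")


print(p_value_from_z(2.0))
print(exact_binomial_p_value(3, 20, 0.5, side="less_than"))
```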

4. Report and answer the question: based on the p-value, is the result (assuming a true H0) statistically significant?

The answer is not provided by the class; it is up to the person viewing the reported result. The reports are obtained from the attributes .statistics and .outcome, the latter being assigned to .description. This is illustrated below.

# the side, e.g. "less_than", is set through test_result["side"]
ht = HtestAboutProportions( observation, prediction, test_result )
score.description = ht.outcome
score.statistics = ht.statistics

Arguments

Argument | Representation | Value type
first | experiment/observation | dictionary that must have the keys “sample_size” and “success_numbers”
second | model prediction | float or dictionary; the latter for two sample cases, with the keys “sample_size” and “success_numbers”
third (keyword) | test result | dictionary with keywords: “name”: string, “proportions_z_test_1pop” or “proportions_z_test_2pop”; “sample_statistic”: float; “z_statistic”: float; “side”: string, “not_equal”, “less_than” or “greater_than”; and any additional names that are specific to the test

This constructor method generates statistics and outcome (the latter is then assigned to description within the validation test class where this hypothesis test class is implemented).
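
For orientation, the three arguments could be laid out as below. The keys follow the argument table and the default `test` value in the class signature; every number is made up for illustration, and the constructor call is left out because it depends on an installed cerebunit.

```python
# First argument: experiment/observation (hypothetical counts)
observation = {"sample_size": 100, "success_numbers": 60}

# Second argument: a float null value for the single sample test ...
prediction_1pop = 0.5
# ... or a dictionary for the two sample test
prediction_2pop = {"sample_size": 100, "success_numbers": 40}

# Third argument (keyword `test`): starts from the defaults in the signature
test_1pop = {
    "name": "proportions_z_test_1pop",
    "sample_statistic": 0.0,
    "z_statistic": 0.0,
    "side": "not_equal",
}
# Same layout for the two sample test, here with a one-sided alternative
test_2pop = dict(test_1pop, name="proportions_z_test_2pop", side="less_than")

print(sorted(test_2pop))
```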

static alternate_hypothesis(side, symbol_null_value, symbol_sample_statistic)

Returns the statement for the alternate hypothesis, Ha.

static null_hypothesis(symbol_null_value, symbol_sample_statistic)

Returns the statement for the null hypothesis, H0.

test_outcome()

Puts together the returned values of null_hypothesis(), alternate_hypothesis(), and _compute_pvalue(). Then returns the string value for .outcome.