Hypothesis testing about proportions (HtestAboutProportions)
class cerebunit.statistics.hypothesis_testings.aboutproportions.HtestAboutProportions(observation, prediction, test={'name': 'proportions_z_test_1pop', 'sample_statistic': 0.0, 'side': 'not_equal', 'z_statistic': 0.0})
Hypothesis Testing (significance testing) about proportions.
This is a parametric test that assumes that the individuals in the sample are chosen randomly and that the experiments are equivalent to binomial experiments.
1. Verify necessary data conditions.
Only the sample size requirement is verified. The other condition (a random sample, or a binomial experiment with independent trials) is assumed.
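==================================  ====================  ======================
Statistic name                      Single sample test    Double sample test
==================================  ====================  ======================
sample size                         \(n\) (observation)   \(n_1\) (observation),
                                                          \(n_2\) (prediction)
null value                          \(p_0\) (prediction)  \(p_0 = 0\)
proportions with trait (successes)                        \(p_1\) (observation),
                                                          \(p_2\) (prediction)
==================================  ====================  ======================
Such that,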
- \(np_0 \geq lb \land n(1-p_0) \geq lb\) (single sample test)
- \(n_1p_1 \geq lb \land n_1(1-p_1) \geq lb\) (two sample test, observation)
- \(n_2p_2 \geq lb \land n_2(1-p_2) \geq lb\) (two sample test, prediction)
- \(lb = 5\) (default); the alternative value is \(lb = 10\)
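The sample size conditions above can be sketched in a few lines of Python. This is a minimal illustration, not the class's own implementation; the function name `sample_size_ok` and its arguments are assumptions introduced here.

```python
def sample_size_ok(n, p, lb=5):
    """Return True when n*p >= lb and n*(1-p) >= lb.

    Hypothetical helper (not part of cerebunit): n is a sample size,
    p the proportion checked against it, lb the lower bound (5 or 10).
    """
    return n * p >= lb and n * (1 - p) >= lb


# Single sample test: check n against the null value p_0.
print(sample_size_ok(n=100, p=0.5))

# Two sample test: each sample is checked against its own proportion.
print(sample_size_ok(n=60, p=0.1) and sample_size_ok(n=80, p=0.25))

# The stricter bound lb = 10 can reject a sample that lb = 5 accepts.
print(sample_size_ok(n=60, p=0.1, lb=10))
```

Keeping `lb` as a parameter mirrors the text, where 5 is the default and 10 is the stricter alternative.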
2. Defining null and alternate hypotheses.
For single sample test
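========================================  ===================================================================
Statistic                                 Interpretation
========================================  ===================================================================
sample statistic, \(\hat{p}\)             proportion of observation with the characteristic trait (successes)
null value/population parameter, \(p_0\)  proportion of prediction taken as the specified value
null hypothesis, \(H_0\)                  \(\hat{p} = p_0\)
alternate hypothesis, \(H_a\)             \(\hat{p} \neq p_0\), \(\hat{p} < p_0\) or \(\hat{p} > p_0\)
========================================  ===================================================================
For two sample test
=========================================  ========================================================
Statistic                                  Interpretation
=========================================  ========================================================
sample statistic, \(\hat{p}_1-\hat{p}_2\)  difference between the proportions (observation, 1, and
                                           prediction, 2) with the characteristic trait (successes)
null value/population parameter, \(p_0\)   0
null hypothesis, \(H_0\)                   \(\hat{p}_1-\hat{p}_2 = 0\)
alternate hypothesis, \(H_a\)              \(\hat{p}_1-\hat{p}_2 \neq 0\), \(< 0\) or \(> 0\)
=========================================  ========================================================
3. Assuming H0 is true, find p-value.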
For single sample test
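================  ============================================================
Statistic         Interpretation
================  ============================================================
\(n\)             number of observations
\(x\)             number of observations with characteristic trait (successes)
\(\hat{p}\)       sample statistic, \(\hat{p} = \frac{x}{n}\)
\(se_{\hat{p}}\)  standard error assuming \(H_0\) is true,
                  \(se_{\hat{p}} = \sqrt{\frac{p_0(1-p_0)}{n}}\)
z_statistic, z    z = \(\frac{\hat{p}-p_0}{se_{\hat{p}}}\)
================  ============================================================
For two sample test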
============================  =================================================================================================
Statistic                     Interpretation
============================  =================================================================================================
\(n_1\)                       number of observations
\(n_2\)                       number of predictions
\(x_1\)                       number of observations with characteristic trait (successes)
\(x_2\)                       number of predictions with characteristic trait (successes)
\(\hat{p}_1\)                 proportion of observation with successes,
                              \(\hat{p}_1 = \frac{x_1}{n_1}\)
\(\hat{p}_2\)                 proportion of predictions with successes,
                              \(\hat{p}_2 = \frac{x_2}{n_2}\)
\(\hat{p}\)                   combined proportion assuming that \(H_0: p_1 = p_2 = p\) is true,
                              \(\hat{p} = \frac{x_1+x_2}{n_1+n_2}\)
\(\hat{p}_1-\hat{p}_2\)       sample statistic
\(se_{\hat{p}_1-\hat{p}_2}\)  standard error assuming \(H_0\) is true,
                              \(se_{\hat{p}_1-\hat{p}_2}=\sqrt{\frac{\hat{p}(1-\hat{p})}{n_1}+\frac{\hat{p}(1-\hat{p})}{n_2}}\)
z_statistic, z                z = \(\frac{\hat{p}_1-\hat{p}_2 - p_0}{se_{\hat{p}_1-\hat{p}_2}}\)
============================  =================================================================================================
Note:
- The z-statistic is looked up in the table for the standard normal curve, which returns the corresponding p-value.
- The p-value derived from the z-statistic is approximate.
- For the single sample test, an exact p-value can be calculated from the binomial distribution.
- The notation \(\hat{p}\) represents the sample statistic in the single sample test, but in the two sample test it represents the combined proportion, not the sample statistic.
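The z-statistics and the approximate p-value described in step 3 can be sketched as follows. This is an illustration under stated assumptions, not the class's own code: the function names are hypothetical, and the standard normal lookup table is replaced by the error function `math.erf` from the Python standard library.

```python
import math


def z_test_1pop(x, n, p0):
    """z-statistic for the single sample test (hypothetical helper)."""
    p_hat = x / n                              # sample statistic
    se = math.sqrt(p0 * (1.0 - p0) / n)        # standard error under H0
    return (p_hat - p0) / se


def z_test_2pop(x1, n1, x2, n2):
    """z-statistic for the two sample test (hypothetical helper)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_hat = (x1 + x2) / (n1 + n2)              # combined proportion under H0
    se = math.sqrt(p_hat * (1.0 - p_hat) / n1 + p_hat * (1.0 - p_hat) / n2)
    return (p1_hat - p2_hat - 0.0) / se        # null value p_0 = 0


def p_value_from_z(z, side="not_equal"):
    """Approximate p-value from the standard normal distribution."""
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # P(Z <= z)
    if side == "less_than":
        return cdf
    if side == "greater_than":
        return 1.0 - cdf
    if side == "not_equal":
        return 2.0 * min(cdf, 1.0 - cdf)
    raise ValueError(f"unknown side: {side!r}")


# Hypothetical data: 60 successes in 100 observations, null value 0.5.
z = z_test_1pop(x=60, n=100, p0=0.5)
print(z, p_value_from_z(z, side="not_equal"))
```

The `side` strings follow the values listed for the test dictionary ("not_equal", "less_than", "greater_than").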
4. Report and answer the question: based on the p-value, is the result (assuming H0 is true) statistically significant?
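The class does not provide the answer; it is left to the person viewing the reported result. The reports are obtained through the attributes .statistics and .description, as illustrated below.
    from cerebunit.statistics.hypothesis_testings.aboutproportions import HtestAboutProportions
    # observation, prediction, test_result and score are defined by the calling validation test
    ht = HtestAboutProportions( observation, prediction, test_result, side="less_than" )
    score.description = ht.outcome
    score.statistics = ht.statistics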
Arguments
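===============  ======================  =======================================================================
Argument         Representation          Value type
===============  ======================  =======================================================================
first            experiment/observation  dictionary that must have keys: "sample_size", "success_numbers"
second           model prediction        float or dictionary; the latter for two sample cases, with keys:
                                         "sample_size", "success_numbers"
third (keyword)  test result             dictionary with keywords:
                                         "name": string, "proportions_z_test_1pop" or "proportions_z_test_2pop";
                                         "sample_statistic": float;
                                         "z_statistic": float;
                                         "side": string, "not_equal", "less_than" or "greater_than";
                                         and any additional names that are specific to the test
===============  ======================  =======================================================================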
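As an illustration of the argument shapes listed above, the sketch below builds the dictionaries as plain Python objects. All values are hypothetical, and the helpers `missing_keys` and `is_two_sample` are assumptions introduced here, not part of the class.

```python
# Hypothetical values, for illustration only.
observation = {"sample_size": 100, "success_numbers": 60}

# Single sample case: the prediction is a float (the null value p_0).
prediction_1pop = 0.5

# Two sample case: the prediction is a dictionary with the same keys.
prediction_2pop = {"sample_size": 100, "success_numbers": 50}

# Test result for the single sample case, consistent with the values above:
# sample statistic 60/100 = 0.6, z = (0.6 - 0.5) / sqrt(0.5 * 0.5 / 100) = 2.0.
test_result = {
    "name": "proportions_z_test_1pop",
    "sample_statistic": 0.6,
    "z_statistic": 2.0,
    "side": "not_equal",
}

REQUIRED_KEYS = {"sample_size", "success_numbers"}


def missing_keys(data):
    """Return the required keys that are absent from a dictionary argument."""
    return REQUIRED_KEYS - set(data)


def is_two_sample(prediction):
    """A dictionary prediction indicates the two sample case."""
    return isinstance(prediction, dict)


print(missing_keys(observation), is_two_sample(prediction_2pop))
```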
This constructor method generates statistics and outcome (which is then assigned to description within the validation test class where this hypothesis test class is implemented).
static alternate_hypothesis(side, symbol_null_value, symbol_sample_statistic)
Returns the statement for the alternate hypothesis, Ha.
static null_hypothesis(symbol_null_value, symbol_sample_statistic)
Returns the statement for the null hypothesis, H0.
test_outcome()
Puts together the returned values of null_hypothesis(), alternate_hypothesis(), and _compute_pvalue(). Then returns the string value for .outcome.