By Sheng BI
Today, we are going to look at four approaches to hypothesis testing.
Background
A survey investigates whether a pregnant woman should be able to obtain a legal abortion if she wants it for any reason. Of 1934 respondents, 895 reported Yes and the rest reported No.
Let $\theta$ denote the unknown population proportion of respondents who are for the proposition. The question of interest is whether a majority of the population supports the proposition in the survey item.
Frequentist approach.
The estimate of $\theta$ is $\hat{\theta} = 895/1934 \approx .46$. The data are binomial (independent Bernoulli trials) in nature. However, given the large sample, we can use a normal distribution to approximate the frequentist sampling distribution of $\hat{\theta}$.
Typically, we test the null hypothesis $H_0$ : $\theta = .5$ against the two-sided alternative $H_A$ : $\theta \ne .5$, or against the one-sided alternative $H_B$ : $\theta < .5$.
We then ask how unlikely the value of $\hat{\theta}$ actually obtained would be if the sampling distribution of $\hat{\theta}$ were centered at the hypothesized value (under $H_0$). The standard deviation of the normal sampling distribution (i.e., the standard error of $\hat{\theta}$) under $H_0$ is
$$ se(\hat{\theta}) = \sqrt{\frac{\theta_{H_0}(1 - \theta_{H_0})}{n}} = \sqrt{\frac{0.5 (1 - 0.5)}{1934}} \approx 0.011 $$The realized value of $\hat{\theta}$ is $(.5 - .46)/.011 \approx 3.64$ standard errors away from the hypothesized value. The corresponding tail proportion (two-sided p-value) is
$$2\times\int_{3.64}^{\infty}\phi(z)\,dz=2\times[1-\Phi(3.64)]\approx0.00028$$where $\phi$ and $\Phi$ are, respectively, the pdf and cdf of the standard normal distribution. Frequentists would then reject the null hypothesis, with a p-value of .00028 for $H_0$ against $H_A$ and .00014 against $H_B$.
That means that, over repeated random samples, only a small proportion of estimates $\hat{\theta}$ would lie 3.64 or more standard errors away from the hypothesized mean of the sampling distribution; this is an extremely rare event under a normal distribution.
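This frequentist calculation takes only a few lines of R. Here is a minimal sketch (the variable names are mine, and `binom.test` gives the exact binomial version for comparison):

# Normal-approximation test of H0: theta = 0.5 (the text rounds theta-hat to .46)
yes <- 895; n <- 1934
theta_hat <- yes / n                      # ~0.463
se_h0 <- sqrt(0.5 * (1 - 0.5) / n)        # standard error under H0, ~0.011
z <- (theta_hat - 0.5) / se_h0            # ~ -3.3 (about -3.64 if theta-hat is rounded to .46)
p_two <- 2 * pnorm(-abs(z))               # two-sided p-value for H0 vs H_A
p_one <- pnorm(z)                         # one-sided p-value for H0 vs H_B: theta < 0.5
c(z = z, p.two.sided = p_two, p.one.sided = p_one)
binom.test(yes, n, p = 0.5)               # exact binomial test, for comparison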
Bayesian Approach ...... Basics
For simplicity, assume a uniform prior for $\theta$. Then the posterior has the same shape as the likelihood. With a large sample, the likelihood is approximated by a normal density, so $p(\theta|y) \approx N(.46,.011)$. Centered at 0.46, most of the posterior probability mass lies below .5, suggesting that the hypothesis $\theta > .5$ is not well supported by the data.
$$ \text{Pr}(\theta>0.5|y)=\int_{0.5}^{\infty}p(\theta|y)\,d\theta\approx1-\Phi\!\left(\frac{0.5-0.46}{0.011}\right)\approx0.00014 $$This Bayesian probability relies on the researcher's beliefs about $\theta$, updated via Bayes' rule; the quantity reported is the posterior probability of the hypothesis, Pr$(H_0|y)$.
The frequentist p-value is obtained differently and has a different interpretation: it conditions on the null hypothesis, using the sampling distribution $f(\hat{\theta}|H_0)$, and the p-value for $H_0$ against the alternatives is a statement about repeated sampling.
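The Bayesian tail probability is easy to check numerically. Below is a minimal sketch using the normal approximation $N(.46, .011)$ quoted above; the last line adds, for comparison, the exact Beta posterior that a uniform prior implies for these counts.

# Pr(theta > 0.5 | y) under the normal approximation N(.46, .011)
post_mean <- 0.46
post_sd <- 0.011
pnorm(0.5, mean = post_mean, sd = post_sd, lower.tail = FALSE)   # ~0.00014
# Exact conjugate check: uniform prior + 895 Yes / 1039 No => Beta(896, 1040) posterior
pbeta(0.5, 895 + 1, 1039 + 1, lower.tail = FALSE)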
The R code that plots the posterior density and the sampling distribution under $H_0$ is as follows.
## Code implementation
library(ggplot2)
library(magrittr)
library(gridExtra)

graphs <- new.env()

# Posterior density of theta (normal approximation under a uniform prior),
# with a zoomed-in panel shading the area corresponding to Pr(theta > 0.5 | y)
graphs$draw_posterior <- function(data = c(895, 1039)) {
  h0 <- 0.5
  m  <- data[1] / sum(data)
  sd <- (m * (1 - m) / sum(data)) %>% sqrt  # parentheses needed: sqrt of m(1-m)/n
  p1 <- ggplot(data.frame(x = c(m - 4 * sd, m + 4 * sd)), aes(x)) +
    stat_function(fun = dnorm, args = list(mean = m, sd = sd)) +
    geom_vline(xintercept = m, linetype = 'dashed', color = 'red') +
    ggtitle('Posterior density')
  p2 <- ggplot(data.frame(x = c(0.495, 0.51)), aes(x)) +
    stat_function(fun = dnorm, args = list(mean = m, sd = sd)) +
    stat_function(fun = dnorm, xlim = c(h0, 0.510),
                  args = list(mean = m, sd = sd), geom = "area") +
    geom_vline(xintercept = h0, linetype = 'dashed', color = 'blue')
  grid.arrange(p1, p2, nrow = 1)
}
graphs$draw_posterior()
# Sampling distribution of theta-hat under H0: theta = 0.5,
# with a zoomed-in panel shading the tail below the observed estimate
graphs$draw_sampling <- function(data = c(895, 1039)) {
  h0 <- 0.5
  m  <- data[1] / sum(data)
  sd <- (h0 * (1 - h0) / sum(data)) %>% sqrt  # standard error under H0, as in the text
  p1 <- ggplot(data.frame(x = c(h0 - 4 * sd, h0 + 4 * sd)), aes(x)) +
    stat_function(fun = dnorm, args = list(mean = h0, sd = sd)) +
    geom_vline(xintercept = h0, linetype = 'dashed', color = 'red') +
    ggtitle('Sampling distribution under H_0')
  p2 <- ggplot(data.frame(x = c(0.450, 0.465)), aes(x)) +
    stat_function(fun = dnorm, args = list(mean = h0, sd = sd)) +
    stat_function(fun = dnorm, xlim = c(0.450, m),
                  args = list(mean = h0, sd = sd), geom = "area") +
    geom_vline(xintercept = m, linetype = 'dashed', color = 'blue')
  grid.arrange(p1, p2, nrow = 1)
}
graphs$draw_sampling()
Bayesian Approach ...... Model Choice.
Remember that the Bayesian approach gives us a posterior density, $f(\theta|y)$, not a point estimate or a binary decision about a hypothesis.
Now, hypothesis testing is a discrete decision problem. Our Bayesian procedures help us determine the ‘best model’ from a class of models for a given data set $y$.
Consider two models, $M = \{M_1, M_2\}$. In the Bayesian approach, the researcher forms a prior belief as to which model is superior. The problem boils down to comparing $P(M_1|y)$ and $P(M_2|y)$, where
$$P(M_{i}|y)=\frac{P(M_{i})l(\text{y}|M_{i})}{\sum_{j}P(M_{j})l(\text{y}|M_{j})}$$where $l(\text{y}|M_{i})$ is the marginal likelihood with
$$l(\text{y}|M_{i})=\int_{\Theta}l(\text{y}|\theta_{i},M_{i})p(\theta_{i})d\theta_{i}$$Now, writing $M^{*}$ for the best model, we define the 0-1 loss function $$ \operatorname{Loss}(M_{i},M^{*})=\begin{cases} 0 & M_{i}=M^{*}\\ 1 & M_{i}\ne M^{*} \end{cases} $$
The expected loss becomes
$$ E[\operatorname{Loss}]=P(\{M_{i}\ne M^{*}\}|\text{y})=1-P(M_{i}|\text{y}). $$Thus we can choose the model with the highest posterior probability.
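As a small illustration of this decision rule, the sketch below turns prior model probabilities and marginal likelihoods into posterior model probabilities (the helper function is hypothetical, not from any package); under 0-1 loss we simply pick the model with the largest entry.

# Posterior model probabilities P(M_i | y) from prior probabilities and marginal likelihoods
posterior_model_probs <- function(prior, marg_lik) {
  w <- prior * marg_lik      # numerator of Bayes' rule for each model
  w / sum(w)                 # normalize so the probabilities sum to one
}
# With equal priors and the marginal likelihoods derived in the example below:
posterior_model_probs(prior = c(0.5, 0.5), marg_lik = c(0.00028, 2))   # ~ (0.00014, 0.99986)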
Coming back to the last example.
With a large sample, the likelihood for these data can be approximated by $N(0.46, 0.011)$. The posterior is proportional to this likelihood because of our uniform-prior assumption.
More precisely, consider the following hypotheses:
$$ H_{0}: 0.5\leq\theta\leq1 \text{ and } H_{1}: 0\leq\theta<0.5, $$$$ p_{H_{0}}(\theta)=\begin{cases} 2 & 0.5\leq\theta\leq1\\ 0 & \text{otherwise} \end{cases} $$$$ p_{H_{1}}(\theta)=\begin{cases} 2 & 0\leq\theta<0.5\\ 0 & \text{otherwise} \end{cases} $$A priori, we are indifferent between the two models, i.e. $P(H_{0})=P(H_{1})=\tfrac{1}{2}$.
Under $H_{0}$, the marginal likelihood is
$$ l(\text{y}|H_{0})=\int_{0.5}^{1}l(\text{y}|H_{0},\theta)p(\theta)d\theta=2\left(\Phi\!\left(\frac{1-0.46}{0.011}\right)-\Phi\!\left(\frac{0.5-0.46}{0.011}\right)\right)\approx0.00028 $$Under $H_{1}$, we have
$$ l(\text{y}|H_{1})=\int_{0}^{0.5}l(\text{y}|H_{1},\theta)p(\theta)d\theta=2\left(\Phi\!\left(\frac{0.5-0.46}{0.011}\right)-\Phi\!\left(\frac{0-0.46}{0.011}\right)\right)\approx2 $$Thus,
$$ P(H_{0}|\text{y})=\frac{\frac{1}{2}\times0.00028}{\frac{1}{2}\times0.00028+\frac{1}{2}\times2}\approx0.00014, \qquad P(H_{1}|\text{y})=1-P(H_{0}|\text{y})\approx0.99986 $$Hence $H_{1}$ is far more plausible.
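These marginal likelihoods and posterior model probabilities can be reproduced with `pnorm` under the same normal approximation; a minimal sketch:

# Marginal likelihoods under H0: theta in [0.5, 1] and H1: theta in [0, 0.5),
# with the N(.46, .011) approximation to the likelihood and truncated-uniform priors (density 2)
m <- 0.46; s <- 0.011
l_H0 <- 2 * (pnorm((1 - m) / s) - pnorm((0.5 - m) / s))   # ~0.00028
l_H1 <- 2 * (pnorm((0.5 - m) / s) - pnorm((0 - m) / s))   # ~2
p_H0 <- 0.5 * l_H0 / (0.5 * l_H0 + 0.5 * l_H1)            # ~0.00014
c(l_H0 = l_H0, l_H1 = l_H1, p_H0 = p_H0, p_H1 = 1 - p_H0)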
Bayesian Approach ...... Bayes factors.
The Bayes factor $B_{10}$ is defined as the ratio of marginal likelihoods; equivalently, it is the ratio of the posterior odds to the prior odds:
$$ B_{10}=\frac{l(\text{y}|M_{1})}{l(\text{y}|M_{0})}=\frac{p(M_{1}|\text{y})}{p(M_{0}|\text{y})}\Big/\frac{p(M_{1})}{p(M_{0})} $$In our context, $$ B_{10}=\frac{l(\text{y}|H_{1})}{l(\text{y}|H_{0})}=\frac{2}{0.00028}\approx7143. $$ Following the convention of Jeffreys (1961), when $B_{10}>12$ we say there is strong evidence that $H_{1}$ is superior to $H_{0}$.
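Continuing the sketch above, the Bayes factor is just the ratio of the two marginal likelihoods:

# Bayes factor B_10 = l(y | H1) / l(y | H0), using l_H1 and l_H0 from the previous sketch
B_10 <- l_H1 / l_H0
B_10   # ~7000, well beyond the 'strong evidence' threshold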
Remark.
In this Bayesian analysis, there is no reference to a sampling distribution from which a particular statistic is obtained. Conclusions are drawn from the data at hand, rather than from what might happen over repeated applications of random sampling.
References:
Jeffreys, H. (1961). Theory of Probability (3rd ed.). Oxford University Press.