By Sheng BI
Today, we are going to look at four approaches to hypothesis testing.
Background
A survey investigates whether a pregnant woman should be able to obtain a legal abortion if she wants it for any reason. Of 1934 respondents, 895 reported Yes and the rest reported No.
Let $\theta$ denote the unknown population proportion of respondents who are for the proposition. The question of interest is whether a majority of the population supports the proposition in the survey item.
Frequentist approach.
The estimate of $\theta$ is $\hat{\theta} = 895/1934 \approx .46$. The data are binomial (independent Bernoulli trials) in nature. However, given the large sample, we can use a normal distribution to approximate the frequentist sampling distribution of $\hat{\theta}$.
Typically, we test the null hypothesis $H_0$ : $\theta = .5$ against the two-sided alternative $H_A$ : $\theta \ne .5$, or against the one-sided alternative $H_B$ : $\theta < .5$.
We then ask how unlikely the value of $\hat{\theta}$ actually obtained would be if the sampling distribution of $\hat{\theta}$ were centered at the hypothesized value (under $H_0$). The standard deviation of the normal sampling distribution (i.e., the standard error of $\hat{\theta}$) under $H_0$ is
$$ se(\hat{\theta}) = \sqrt{\frac{\theta_{H_0}(1 - \theta_{H_0})}{n}} = \sqrt{\frac{0.5 (1 - 0.5)}{1934}} \approx 0.011 $$The realized value of $\hat{\theta}$ is $(.5 - .46)/.011 \approx 3.64$ standard errors away from the hypothesized value. The corresponding tail proportion (two-sided p-value) is
$$2\times\int_{3.64}^{\infty}\phi(z)\,dz=2\times[1-\Phi(3.64)]\approx0.00028$$where $\phi$ and $\Phi$ are, respectively, the pdf and cdf of the standard normal distribution. Frequentists would then reject the null hypothesis, with a p-value of .00028 for $H_0$ against $H_A$ and .00014 against $H_B$.
That means that, over repeated random samples, only a small proportion of estimates $\hat{\theta}$ would lie 3.64 or more standard errors away from the hypothesized mean of the sampling distribution; this is an extremely rare event under a normal distribution.
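This frequentist calculation takes only a few lines of R. Here is a minimal sketch (the variable names are mine, and `binom.test` gives the exact binomial version for comparison):

# Normal-approximation test of H0: theta = 0.5 (the text rounds theta-hat to .46)
yes <- 895; n <- 1934
theta_hat <- yes / n                      # ~0.463
se_h0 <- sqrt(0.5 * (1 - 0.5) / n)        # standard error under H0, ~0.011
z <- (theta_hat - 0.5) / se_h0            # ~ -3.3 (about -3.64 if theta-hat is rounded to .46)
p_two <- 2 * pnorm(-abs(z))               # two-sided p-value for H0 vs H_A
p_one <- pnorm(z)                         # one-sided p-value for H0 vs H_B: theta < 0.5
c(z = z, p.two.sided = p_two, p.one.sided = p_one)
binom.test(yes, n, p = 0.5)               # exact binomial test, for comparison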
Bayesian Approach ...... Basics
For simplicity, assume a uniform prior for $\theta$. Then the posterior has the same shape as the likelihood. With a large sample, the likelihood is approximated by a normal density, so $p(\theta|y) \approx N(.46,.011)$. Centered at 0.46, most of the posterior probability mass lies below .5, suggesting that the hypothesis $\theta > .5$ is not well supported by the data.
$$ \text{Pr}(\theta>0.5|y)=\int_{0.5}^{\infty}p(\theta|y)\,d\theta\approx1-\Phi\!\left(\frac{0.5-0.46}{0.011}\right)\approx0.00014 $$This Bayesian probability relies on the researcher's beliefs about $\theta$, updated via Bayes' rule; the quantity reported is the posterior probability of the hypothesis, Pr$(H_0|y)$.
The frequentist p-value is obtained differently and has a different interpretation: it conditions on the null hypothesis, using the sampling distribution $f(\hat{\theta}|H_0)$, and the p-value for $H_0$ against the alternatives is a statement about repeated sampling.
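The Bayesian tail probability is easy to check numerically. Below is a minimal sketch using the normal approximation $N(.46, .011)$ quoted above; the last line adds, for comparison, the exact Beta posterior that a uniform prior implies for these counts.

# Pr(theta > 0.5 | y) under the normal approximation N(.46, .011)
post_mean <- 0.46
post_sd <- 0.011
pnorm(0.5, mean = post_mean, sd = post_sd, lower.tail = FALSE)   # ~0.00014
# Exact conjugate check: uniform prior + 895 Yes / 1039 No => Beta(896, 1040) posterior
pbeta(0.5, 895 + 1, 1039 + 1, lower.tail = FALSE)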
The R code that plots the posterior density and the sampling distribution under $H_0$ is as follows.
## Code implementation
library(ggplot2)
library(magrittr)
library(gridExtra)

graphs <- new.env()

# Posterior density of theta (normal approximation under a uniform prior),
# with a zoomed-in panel shading the area corresponding to Pr(theta > 0.5 | y)
graphs$draw_posterior <- function(data = c(895, 1039)) {
  h0 <- 0.5
  m  <- data[1] / sum(data)
  sd <- (m * (1 - m) / sum(data)) %>% sqrt  # parentheses needed: sqrt of m(1-m)/n
  p1 <- ggplot(data.frame(x = c(m - 4 * sd, m + 4 * sd)), aes(x)) +
    stat_function(fun = dnorm, args = list(mean = m, sd = sd)) +
    geom_vline(xintercept = m, linetype = 'dashed', color = 'red') +
    ggtitle('Posterior density')
  p2 <- ggplot(data.frame(x = c(0.495, 0.51)), aes(x)) +
    stat_function(fun = dnorm, args = list(mean = m, sd = sd)) +
    stat_function(fun = dnorm, xlim = c(h0, 0.510),
                  args = list(mean = m, sd = sd), geom = "area") +
    geom_vline(xintercept = h0, linetype = 'dashed', color = 'blue')
  grid.arrange(p1, p2, nrow = 1)
}
graphs$draw_posterior()
# Sampling distribution of theta-hat under H0: theta = 0.5,
# with a zoomed-in panel shading the tail below the observed estimate
graphs$draw_sampling <- function(data = c(895, 1039)) {
  h0 <- 0.5
  m  <- data[1] / sum(data)
  sd <- (h0 * (1 - h0) / sum(data)) %>% sqrt  # standard error under H0, as in the text
  p1 <- ggplot(data.frame(x = c(h0 - 4 * sd, h0 + 4 * sd)), aes(x)) +
    stat_function(fun = dnorm, args = list(mean = h0, sd = sd)) +
    geom_vline(xintercept = h0, linetype = 'dashed', color = 'red') +
    ggtitle('Sampling distribution under H_0')
  p2 <- ggplot(data.frame(x = c(0.450, 0.465)), aes(x)) +
    stat_function(fun = dnorm, args = list(mean = h0, sd = sd)) +
    stat_function(fun = dnorm, xlim = c(0.450, m),
                  args = list(mean = h0, sd = sd), geom = "area") +
    geom_vline(xintercept = m, linetype = 'dashed', color = 'blue')
  grid.arrange(p1, p2, nrow = 1)
}
graphs$draw_sampling()
Bayesian Approach ...... Model Choice.
Remember that the Bayesian approach gives us a posterior density, $f(\theta|y)$, not a point estimate or a binary decision about a hypothesis.
Now, hypothesis testing is a discrete decision problem. Our Bayesian procedures help us determine the ‘best model’ from a class of models for a given data set $y$.
Consider two models, $M = \{M_1, M_2\}$. In the Bayesian approach, the researcher forms a prior belief as to which model is superior. The problem boils down to comparing $P(M_1|y)$ and $P(M_2|y)$, where
$$P(M_{i}|y)=\frac{P(M_{i})l(\text{y}|M_{i})}{\sum_{j}P(M_{j})l(\text{y}|M_{j})}$$where $l(\text{y}|M_{i})$ is the marginal likelihood with
$$l(\text{y}|M_{i})=\int_{\Theta}l(\text{y}|\theta_{i},M_{i})p(\theta_{i})d\theta_{i}$$Now, writing $M^{*}$ for the best model, we define the 0-1 loss function $$ \operatorname{Loss}(M_{i},M^{*})=\begin{cases} 0 & M_{i}=M^{*}\\ 1 & M_{i}\ne M^{*} \end{cases} $$
The expected loss becomes
$$ E[\operatorname{Loss}]=P(\{M_{i}\ne M^{*}\}|\text{y})=1-P(M_{i}|\text{y}). $$Thus we can choose the model with the highest posterior probability.
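As a small illustration of this decision rule, the sketch below turns prior model probabilities and marginal likelihoods into posterior model probabilities (the helper function is hypothetical, not from any package); under 0-1 loss we simply pick the model with the largest entry.

# Posterior model probabilities P(M_i | y) from prior probabilities and marginal likelihoods
posterior_model_probs <- function(prior, marg_lik) {
  w <- prior * marg_lik      # numerator of Bayes' rule for each model
  w / sum(w)                 # normalize so the probabilities sum to one
}
# With equal priors and the marginal likelihoods derived in the example below:
posterior_model_probs(prior = c(0.5, 0.5), marg_lik = c(0.00028, 2))   # ~ (0.00014, 0.99986)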
Coming back to the last example.
With a large sample, the likelihood for these data can be approximated by $N(0.46, 0.011)$. The posterior is proportional to this likelihood because of our uniform-prior assumption.
More precisely, consider the following hypotheses:
$$ H_{0}: 0.5\leq\theta\leq1 \text{ and } H_{1}: 0\leq\theta<0.5, $$$$ p_{H_{0}}(\theta)=\begin{cases} 2 & 0.5\leq\theta\leq1\\ 0 & \text{otherwise} \end{cases} $$$$ p_{H_{1}}(\theta)=\begin{cases} 2 & 0\leq\theta<0.5\\ 0 & \text{otherwise} \end{cases} $$A priori, we are indifferent between the two models, i.e. $P(H_{0})=P(H_{1})=\tfrac{1}{2}$.
Under $H_{0}$, the marginal likelihood is
$$ l(\text{y}|H_{0})=\int_{0.5}^{1}l(\text{y}|H_{0},\theta)p(\theta)d\theta=2\left(\Phi\!\left(\frac{1-0.46}{0.011}\right)-\Phi\!\left(\frac{0.5-0.46}{0.011}\right)\right)\approx0.00028 $$Under $H_{1}$, we have
$$ l(\text{y}|H_{1})=\int_{0}^{0.5}l(\text{y}|H_{1},\theta)p(\theta)d\theta=2\left(\Phi\!\left(\frac{0.5-0.46}{0.011}\right)-\Phi\!\left(\frac{0-0.46}{0.011}\right)\right)\approx2 $$Thus,
$$ P(H_{0}|\text{y})=\frac{\frac{1}{2}\times0.00028}{\frac{1}{2}\times0.00028+\frac{1}{2}\times2}\approx0.00014, \qquad P(H_{1}|\text{y})=1-P(H_{0}|\text{y})\approx0.99986 $$Hence $H_{1}$ is far more plausible.
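These marginal likelihoods and posterior model probabilities can be reproduced with `pnorm` under the same normal approximation; a minimal sketch:

# Marginal likelihoods under H0: theta in [0.5, 1] and H1: theta in [0, 0.5),
# with the N(.46, .011) approximation to the likelihood and truncated-uniform priors (density 2)
m <- 0.46; s <- 0.011
l_H0 <- 2 * (pnorm((1 - m) / s) - pnorm((0.5 - m) / s))   # ~0.00028
l_H1 <- 2 * (pnorm((0.5 - m) / s) - pnorm((0 - m) / s))   # ~2
p_H0 <- 0.5 * l_H0 / (0.5 * l_H0 + 0.5 * l_H1)            # ~0.00014
c(l_H0 = l_H0, l_H1 = l_H1, p_H0 = p_H0, p_H1 = 1 - p_H0)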
Bayesian Approach ...... Bayes factors.
The Bayes factor $B_{10}$ is defined as the ratio of marginal likelihoods; equivalently, it is the ratio of the posterior odds to the prior odds:
$$ B_{10}=\frac{l(\text{y}|M_{1})}{l(\text{y}|M_{0})}=\frac{p(M_{1}|\text{y})}{p(M_{0}|\text{y})}\Big/\frac{p(M_{1})}{p(M_{0})} $$In our context, $$ B_{10}=\frac{l(\text{y}|H_{1})}{l(\text{y}|H_{0})}=\frac{2}{0.00028}\approx7143. $$ Following the convention of Jeffreys (1961), when $B_{10}>12$ we say there is strong evidence that $H_{1}$ is superior to $H_{0}$.
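Continuing the sketch above, the Bayes factor is just the ratio of the two marginal likelihoods:

# Bayes factor B_10 = l(y | H1) / l(y | H0), using l_H1 and l_H0 from the previous sketch
B_10 <- l_H1 / l_H0
B_10   # ~7000, well beyond the 'strong evidence' threshold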
Remark.
In this Bayesian analysis, there is no reference to a sampling distribution from which a particular statistic is obtained. Conclusions are drawn from the data at hand, rather than from what might happen over repeated applications of random sampling.
References:
Jeffreys, H. (1961). Theory of Probability (3rd ed.). Oxford University Press.