In the world of statistics, two major philosophical approaches compete: the frequentist approach, which has dominated for much of the 20th century, and the Bayesian approach, which has experienced a remarkable renaissance in recent decades. Bayesian statistics offers a powerful framework for updating beliefs in light of new evidence, making it particularly valuable in fields ranging from machine learning to medical diagnosis. Let’s explore this fascinating approach that’s transforming how we analyze data and make decisions under uncertainty.
The Fundamental Concept: Bayes’ Theorem
At the heart of Bayesian statistics lies Bayes’ theorem, a mathematical formula that describes how to update the probability of a hypothesis when given new evidence. In its simplest form:
P(H|E) = [P(E|H) × P(H)] / P(E)
Where:
- P(H|E) is the posterior probability: the probability of hypothesis H given the evidence E
- P(E|H) is the likelihood: the probability of observing evidence E given that hypothesis H is true
- P(H) is the prior probability: the initial belief in hypothesis H before seeing the evidence
- P(E) is the marginal likelihood: the total probability of observing evidence E under all possible hypotheses
This elegant formula provides a formal mechanism for learning from experience and incorporating new information into our beliefs.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from scipy import stats
# Set a consistent style for all plots
sns.set(style="whitegrid")
# Simple example of Bayes' theorem
# Medical test scenario: Rare disease (1% prevalence)
# Test has 95% sensitivity and 90% specificity
# Prior probabilities
p_disease = 0.01 # 1% of population has the disease
p_no_disease = 0.99 # 99% of population doesn't have the disease
# Likelihoods
p_positive_given_disease = 0.95 # Sensitivity: 95% of sick people test positive
p_positive_given_no_disease = 0.10 # False positive rate: 10% of healthy people test positive
# Calculate the probability of a positive test
p_positive = (p_positive_given_disease * p_disease) + (p_positive_given_no_disease * p_no_disease)
# Calculate the posterior probability using Bayes' theorem
p_disease_given_positive = (p_positive_given_disease * p_disease) / p_positive
# Visualize the Bayesian update
labels = ['Prior (Base Rate)', 'Posterior (After Positive Test)']
probabilities = [p_disease, p_disease_given_positive]
plt.figure(figsize=(10, 6))
plt.bar(labels, probabilities, color=['blue', 'red'])
plt.title("Bayesian Update: Probability of Having the Disease")
plt.ylabel("Probability")
plt.ylim(0, 1)
# Add percentage labels on bars
for i, prob in enumerate(probabilities):
plt.text(i, prob + 0.02, f"{prob:.1%}", ha='center')
plt.show()
print(f"Prior probability of disease: {p_disease:.1%}")
print(f"Posterior probability of disease given positive test: {p_disease_given_positive:.1%}")
print(f"Despite the positive test, there's still a {1-p_disease_given_positive:.1%} chance the person doesn't have the disease.")
Prior Distributions: Starting with What We Know
A key feature of Bayesian statistics is the explicit use of prior distributions to represent our initial beliefs before seeing the data. These priors can be:
Informative priors: Based on previous studies, expert knowledge, or theoretical considerations.
Weakly informative priors: Providing some constraints but allowing the data to dominate.
Non-informative priors: Designed to have minimal impact on the posterior, letting the data “speak for itself.”
The choice of prior is sometimes criticized as subjective, but it allows us to incorporate existing knowledge and can be particularly valuable when data is limited.
# Demonstrate different prior distributions and their impact
np.random.seed(42)
# Generate some data: coin flips (10 flips, 7 heads)
data = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0]) # 1 = heads, 0 = tails
n_heads = np.sum(data)
n_flips = len(data)
# Define different prior distributions for the probability of heads
theta = np.linspace(0, 1, 1000) # Possible values for probability of heads
# 1. Uniform prior (Beta(1,1)): All probabilities equally likely
prior_uniform = stats.beta.pdf(theta, 1, 1)
# 2. Informative prior centered at 0.5 (Beta(5,5)): Belief that coin is fair
prior_fair = stats.beta.pdf(theta, 5, 5)
# 3. Informative prior centered at 0.7 (Beta(7,3)): Belief that coin is biased toward heads
prior_biased = stats.beta.pdf(theta, 7, 3)
# Calculate the likelihood function: P(data|theta)
likelihood = theta ** n_heads * (1 - theta) ** (n_flips - n_heads)
# Calculate the unnormalized posteriors
posterior_uniform = likelihood * prior_uniform
posterior_fair = likelihood * prior_fair
posterior_biased = likelihood * prior_biased
# Normalize the posteriors
posterior_uniform /= np.trapz(posterior_uniform, theta)
posterior_fair /= np.trapz(posterior_fair, theta)
posterior_biased /= np.trapz(posterior_biased, theta)
# Calculate the posterior means (Bayesian estimates)
estimate_uniform = np.trapz(theta * posterior_uniform, theta)
estimate_fair = np.trapz(theta * posterior_fair, theta)
estimate_biased = np.trapz(theta * posterior_biased, theta)
# Visualize the priors, likelihood, and posteriors
fig, axes = plt.subplots(3, 3, figsize=(15, 12))
# Plot priors
axes[0, 0].plot(theta, prior_uniform)
axes[0, 0].set_title('Uniform Prior (Beta(1,1))')
axes[0, 0].set_ylabel('Density')
axes[1, 0].plot(theta, prior_fair)
axes[1, 0].set_title('Fair Coin Prior (Beta(5,5))')
axes[1, 0].set_ylabel('Density')
axes[2, 0].plot(theta, prior_biased)
axes[2, 0].set_title('Biased Coin Prior (Beta(7,3))')
axes[2, 0].set_ylabel('Density')
axes[2, 0].set_xlabel('θ (Probability of Heads)')
# Plot likelihood (same for all scenarios)
for i in range(3):
axes[i, 1].plot(theta, likelihood)
axes[i, 1].set_title('Likelihood: 7 Heads in 10 Flips')
if i == 2:
axes[i, 1].set_xlabel('θ (Probability of Heads)')
# Plot posteriors
axes[0, 2].plot(theta, posterior_uniform)
axes[0, 2].axvline(estimate_uniform, color='red', linestyle='--',
label=f'Mean: {estimate_uniform:.3f}')
axes[0, 2].set_title('Posterior with Uniform Prior')
axes[0, 2].legend()
axes[1, 2].plot(theta, posterior_fair)
axes[1, 2].axvline(estimate_fair, color='red', linestyle='--',
label=f'Mean: {estimate_fair:.3f}')
axes[1, 2].set_title('Posterior with Fair Coin Prior')
axes[1, 2].legend()
axes[2, 2].plot(theta, posterior_biased)
axes[2, 2].axvline(estimate_biased, color='red', linestyle='--',
label=f'Mean: {estimate_biased:.3f}')
axes[2, 2].set_title('Posterior with Biased Coin Prior')
axes[2, 2].set_xlabel('θ (Probability of Heads)')
axes[2, 2].legend()
plt.tight_layout()
plt.show()
# Compare the estimates
print("Bayesian Estimates for Probability of Heads:")
print(f"Frequentist MLE: {n_heads/n_flips:.3f}")
print(f"With Uniform Prior: {estimate_uniform:.3f}")
print(f"With Fair Coin Prior: {estimate_fair:.3f}")
print(f"With Biased Coin Prior: {estimate_biased:.3f}")
Posterior Distributions: Updating Beliefs with Data
The posterior distribution represents our updated beliefs after observing the data. It combines the prior distribution with the likelihood function, which measures how well different parameter values explain the observed data.
The posterior serves as a complete summary of what we know about the parameters, allowing us to:
- Calculate point estimates (e.g., mean, median, or mode of the posterior)
- Quantify uncertainty through credible intervals
- Make predictions about future observations
- Test hypotheses by comparing posterior probabilities
As more data is collected, the posterior typically becomes more concentrated, reflecting increased certainty about the parameters.
# Demonstrate how posterior distributions evolve with more data
np.random.seed(42)
# True probability of heads
true_prob = 0.7
# Generate sequence of coin flips
n_flips_total = 100
all_flips = np.random.binomial(1, true_prob, n_flips_total)
# Define checkpoints to observe the posterior evolution
checkpoints = [0, 1, 5, 10, 20, 50, 100]
# Set up the plot
plt.figure(figsize=(12, 8))
# Use a uniform prior (Beta(1,1))
alpha_prior = 1
beta_prior = 1
# Plot the prior
theta = np.linspace(0, 1, 1000)
prior = stats.beta.pdf(theta, alpha_prior, beta_prior)
plt.plot(theta, prior, 'k--', label='Prior (Beta(1,1))')
# Plot the posterior at each checkpoint
colors = plt.cm.viridis(np.linspace(0, 1, len(checkpoints)-1))
for i, n_flips in enumerate(checkpoints[1:]):
# Get data up to this point
data_subset = all_flips[:n_flips]
n_heads = np.sum(data_subset)
# Calculate Beta posterior parameters
alpha_posterior = alpha_prior + n_heads
beta_posterior = beta_prior + n_flips - n_heads
# Calculate posterior distribution
posterior = stats.beta.pdf(theta, alpha_posterior, beta_posterior)
# Plot the posterior
plt.plot(theta, posterior, color=colors[i],
label=f'After {n_flips} flips: {n_heads} heads')
# Calculate 95% credible interval
ci_low, ci_high = stats.beta.ppf([0.025, 0.975], alpha_posterior, beta_posterior)
# Mark the posterior mean
posterior_mean = alpha_posterior / (alpha_posterior + beta_posterior)
plt.plot(posterior_mean, stats.beta.pdf(posterior_mean, alpha_posterior, beta_posterior),
'o', color=colors[i])
# Add a vertical line for the true probability
plt.axvline(true_prob, color='r', linestyle='--', label=f'True probability: {true_prob}')
plt.title('Evolution of Posterior Distribution with More Data')
plt.xlabel('θ (Probability of Heads)')
plt.ylabel('Density')
plt.legend(loc='upper left')
plt.grid(True, alpha=0.3)
plt.show()
# Create a table showing how estimates and intervals evolve
results = []
for n_flips in checkpoints[1:]:
data_subset = all_flips[:n_flips]
n_heads = np.sum(data_subset)
# Calculate posterior parameters
alpha_posterior = alpha_prior + n_heads
beta_posterior = beta_prior + n_flips - n_heads
# Calculate point estimate and credible interval
point_estimate = alpha_posterior / (alpha_posterior + beta_posterior)
ci_low, ci_high = stats.beta.ppf([0.025, 0.975], alpha_posterior, beta_posterior)
ci_width = ci_high - ci_low
results.append({
'Flips': n_flips,
'Heads': n_heads,
'Proportion': n_heads / n_flips,
'Posterior Mean': point_estimate,
'95% CI Lower': ci_low,
'95% CI Upper': ci_high,
'CI Width': ci_width
})
results_df = pd.DataFrame(results)
print("Evolution of Estimates and Credible Intervals:")
print(results_df.to_string(index=False, float_format=lambda x: f"{x:.3f}"))
Bayesian Inference vs. Frequentist Inference
The Bayesian approach differs from traditional frequentist statistics in several key ways:
Interpretation of probability: Bayesians view probability as a degree of belief, while frequentists interpret it as a long-run frequency.
Parameter treatment: Bayesians treat parameters as random variables with distributions, while frequentists consider them fixed but unknown constants.
Incorporation of prior knowledge: Bayesians explicitly use prior distributions, while frequentists typically rely solely on the data at hand.
Uncertainty quantification: Bayesians use credible intervals (direct probability statements about parameters), while frequentists use confidence intervals (statements about the procedure, not the parameter itself).
# Compare Bayesian credible intervals with frequentist confidence intervals
np.random.seed(42)
# Generate sample data from a normal distribution
mu_true = 5 # True population mean
sigma_true = 2 # True population standard deviation
sample_size = 20
sample_data = np.random.normal(mu_true, sigma_true, sample_size)
# Calculate sample statistics
sample_mean = np.mean(sample_data)
sample_std = np.std(sample_data, ddof=1)
# Frequentist confidence interval
confidence = 0.95
t_critical = stats.t.ppf((1 + confidence) / 2, df=sample_size-1)
margin_of_error = t_critical * (sample_std / np.sqrt(sample_size))
ci_lower = sample_mean - margin_of_error
ci_upper = sample_mean + margin_of_error
# Bayesian analysis with a weakly informative prior
# Prior: Normal distribution centered at 0 with standard deviation 10
prior_mean = 0
prior_std = 10
# Calculate posterior parameters (assuming known variance)
# For simplicity, we'll use the true variance in this example
posterior_variance = 1 / ((1 / sigma_true**2) * sample_size + (1 / prior_std**2))
posterior_mean = posterior_variance * ((sample_mean * sample_size / sigma_true**2) +
(prior_mean / prior_std**2))
posterior_std = np.sqrt(posterior_variance)
# Calculate Bayesian credible interval
credible_lower = stats.norm.ppf(0.025, posterior_mean, posterior_std)
credible_upper = stats.norm.ppf(0.975, posterior_mean, posterior_std)
# Visualize the comparison
x = np.linspace(-5, 15, 1000)
prior_pdf = stats.norm.pdf(x, prior_mean, prior_std)
likelihood_pdf = stats.norm.pdf(x, sample_mean, sigma_true / np.sqrt(sample_size))
posterior_pdf = stats.norm.pdf(x, posterior_mean, posterior_std)
plt.figure(figsize=(12, 8))
# Plot distributions
plt.plot(x, prior_pdf, 'b--', label='Prior')
plt.plot(x, likelihood_pdf, 'g-', label='Likelihood')
plt.plot(x, posterior_pdf, 'r-', label='Posterior')
# Mark the true mean
plt.axvline(mu_true, color='k', linestyle='-', label='True Mean')
# Mark the intervals
plt.axvline(ci_lower, color='g', linestyle=':', label='95% Confidence Interval')
plt.axvline(ci_upper, color='g', linestyle=':')
plt.axvline(credible_lower, color='r', linestyle=':', label='95% Credible Interval')
plt.axvline(credible_upper, color='r', linestyle=':')
# Add shading for the intervals
plt.fill_between(x, 0, likelihood_pdf, where=(x >= ci_lower) & (x <= ci_upper),
color='g', alpha=0.2)
plt.fill_between(x, 0, posterior_pdf, where=(x >= credible_lower) & (x <= credible_upper),
color='r', alpha=0.2)
plt.title('Comparison of Frequentist and Bayesian Intervals')
plt.xlabel('Parameter Value (μ)')
plt.ylabel('Density')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
# Print the results
print(f"True Mean: {mu_true}")
print(f"Sample Mean: {sample_mean:.3f}")
print(f"Frequentist 95% Confidence Interval: ({ci_lower:.3f}, {ci_upper:.3f})")
print(f"Bayesian Posterior Mean: {posterior_mean:.3f}")
print(f"Bayesian 95% Credible Interval: ({credible_lower:.3f}, {credible_upper:.3f})")
Bayesian Computation: From Theory to Practice
Implementing Bayesian methods often involves complex integrals that lack analytical solutions. Several computational approaches have been developed:
Conjugate priors: Using prior distributions that, when combined with certain likelihood functions, yield posteriors in the same family, allowing for analytical solutions.
Markov Chain Monte Carlo (MCMC): A family of algorithms that sample from the posterior distribution, including Metropolis-Hastings and Gibbs sampling.
Variational inference: Approximating the posterior with a simpler distribution by minimizing the Kullback-Leibler divergence.
Approximate Bayesian Computation (ABC): Simulation-based approach useful when the likelihood function is intractable.
Modern computational tools and libraries have made these methods accessible to practitioners across disciplines.
# Demonstrate MCMC sampling with PyMC3
import pymc3 as pm
import arviz as az
# Generate some data
np.random.seed(42)
true_mu = 5
true_sigma = 2
data = np.random.normal(true_mu, true_sigma, 100)
# Create a Bayesian model
with pm.Model() as model:
# Priors
mu = pm.Normal('mu', mu=0, sigma=10)
sigma = pm.HalfNormal('sigma', sigma=5)
# Likelihood
likelihood = pm.Normal('likelihood', mu=mu, sigma=sigma, observed=data)
# Sample from the posterior
trace = pm.sample(2000, tune=1000, return_inferencedata=True)
# Visualize the results
az.plot_trace(trace)
plt.tight_layout()
plt.show()
# Plot posterior distributions
az.plot_posterior(trace, var_names=['mu', 'sigma'],
ref_val=[true_mu, true_sigma])
plt.tight_layout()
plt.show()
# Summary statistics
summary = az.summary(trace)
print("MCMC Sampling Results:")
print(summary)
Applications in Signal Processing
Bayesian methods are particularly valuable in signal processing for:
Signal detection and estimation: Incorporating prior knowledge about signal characteristics to improve detection in noisy environments.
Image restoration: Using priors on image smoothness or sparsity to recover images from degraded observations.
Adaptive filtering: Updating filter coefficients based on incoming data and prior beliefs about system dynamics.
Source separation: Separating mixed signals by incorporating priors on source independence or sparsity.
# Bayesian approach to signal denoising
np.random.seed(42)
# Generate a clean signal
t = np.linspace(0, 1, 500)
true_freq = 5
true_amp = 1.5
true_phase = 0.5
clean_signal = true_amp * np.sin(2 * np.pi * true_freq * t + true_phase)
# Add noise
noise_level = 0.8
noisy_signal = clean_signal + noise_level * np.random.randn(len(t))
# Bayesian signal recovery using PyMC3
with pm.Model() as signal_model:
# Priors for signal parameters
amp = pm.HalfNormal('amplitude', sigma=3)
freq = pm.Uniform('frequency', lower=0, upper=10)
phase = pm.Uniform('phase', lower=0, upper=2*np.pi)
# Prior for noise level
sigma = pm.HalfNormal('noise_level', sigma=2)
# Expected signal given parameters
expected_signal = amp * pm.math.sin(2 * np.pi * freq * t + phase)
# Likelihood (data distribution)
likelihood = pm.Normal('likelihood', mu=expected_signal, sigma=sigma, observed=noisy_signal)
# Sample from the posterior
signal_trace = pm.sample(1000, tune=1000, return_inferencedata=True)
# Extract posterior samples
posterior_samples = az.extract(signal_trace)
amp_samples = posterior_samples.amplitude.values
freq_samples = posterior_samples.frequency.values
phase_samples = posterior_samples.phase.values
# Generate reconstructed signals from posterior samples
n_samples = 100
posterior_signals = np.zeros((n_samples, len(t)))
for i in range(n_samples):
idx = np.random.randint(0, len(amp_samples))
posterior_signals[i] = amp_samples[idx] * np.sin(2 * np.pi * freq_samples[idx] * t + phase_samples[idx])
# Calculate the mean reconstructed signal
mean_signal = np.mean(posterior_signals, axis=0)
# Visualize the results
plt.figure(figsize=(12, 8))
plt.subplot(2, 1, 1)
plt.plot(t, clean_signal, 'g-', label='Clean Signal')
plt.plot(t, noisy_signal, 'b-', alpha=0.5, label='Noisy Signal')
plt.title('Original and Noisy Signals')
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.legend()
plt.grid(True)
plt.subplot(2, 1, 2)
plt.plot(t, clean_signal, 'g-', label='Clean Signal')
plt.plot(t, mean_signal, 'r-', label='Bayesian Reconstruction')
# Plot credible intervals
percentiles = np.percentile(posterior_signals, [5, 95], axis=0)
plt.fill_between(t, percentiles[0], percentiles[1], color='r', alpha=0.2, label='90% Credible Interval')
plt.title('Bayesian Signal Reconstruction')
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
# Print parameter estimates
print("True Parameters:")
print(f"Amplitude: {true_amp}")
print(f"Frequency: {true_freq}")
print(f"Phase: {true_phase}")
print("nBayesian Estimates (Posterior Means):")
print(f"Amplitude: {np.mean(amp_samples):.3f}")
print(f"Frequency: {np.mean(freq_samples):.3f}")
print(f"Phase: {np.mean(phase_samples):.3f}")
Conclusion
Bayesian statistics offers a coherent framework for reasoning under uncertainty, allowing us to update our beliefs as new evidence emerges. Its explicit incorporation of prior knowledge, intuitive interpretation of results, and ability to handle complex models make it increasingly valuable in our data-rich world.
While computational challenges once limited its practical application, modern algorithms and computing power have unleashed the full potential of Bayesian methods across disciplines. From medical diagnosis to artificial intelligence, from financial forecasting to signal processing, the Bayesian approach provides powerful tools for extracting insights from data and making decisions in the face of uncertainty.
As statistician Dennis Lindley noted, “Inside every non-Bayesian, there is a Bayesian struggling to get out.” Whether you’re analyzing experimental data, developing machine learning models, or processing signals, understanding Bayesian statistics will enrich your analytical toolkit and provide a principled approach to learning from data.