Lab 6   Direct simulations,  Monte Carlo integration , Importance Sampling
 

 1. Empirical discrete approximation to a probability distribution

Empiricals for an Exponential distribution
> set.seed(1345)

> y <- rexp(10000,4)      # 10000 samples from a Exp(4) distribution
> hist(y, prob=T)         # plot of the empirical distribution

Compare to the true density function

> yrange <- seq( min(y), max(y), length=30 )  # a sequence covering the range of y
> lines(yrange, dexp(yrange, 4))

> mean(y)              #large sample mean 
[1] 0.2518447

> var(y)               #large sample variance 
[1] 0.06292401

> median(y)            # sample median
[1] 0.1765852
> qexp(.5,4)           # 50% quantile
[1] 0.1732868

> mean( y<=1 )         # empirical prob that y<=1
[1] 0.9815
> pexp( 1,4)           # large sample prob
[1] 0.9816844


2. Direct simulation  (notes in  P28 )
Random samples from the Dirichlet distribution:

> rdirichlet <- function(n,p) {
  mat <- matrix(NA, n,length(p))
   for (i in 1:length(p)) {
    mat[,i]<- rgamma(n,p[i],1);
   }
  mat <- mat/ apply(mat,1,sum);
  mat;
}
 

Samples from the Posterior distribution for (theta1, theta2, theta3) given X:
> thetas <- rdirichlet(10000, c(728,584,174) )

> hist(thetas[,1], nclass=50)  # approximate to marginal posterior distribution: theta1 | X
> hist(thetas[,2], nclass=50)  # approximate to marginal posterior distribution: theta2 | X

> hist(thetas[,1]-thetas[,2], nclass=50) # approximate to posterior dist: theta1-theta2| X

> quantile(thetas[,1]-thetas[,2] , c(.025,.975) ) 
      2.5%      97.5% 
0.04885152 0.14394429
3. Order statistics

Expected value and variance of the 5th order statistics from a sample of size 25 from standard
normal distribution.

> sim.n = matrix( rnorm(250000),  25, 10000 )
> sim.ord= apply(sim.n, 2, sort)

> mean(sim.ord[5, ])

4. Tail probability of Cauchy distribution

Int(2, infinity) 1/(pi (1+x^2)) dx

> sim.1= rcauchy(10000)
> sum(sim.1 > 2)/10000       # or,
> mean(sim.1> 2)
[1] 0.1435

1/2 -Int(0,2) 1/(pi (1+x^2)) dx

> sim.2 = runif( 10000, 0, 2)
> 0.5- mean(2/ (pi*(1+sim.2^2) ) )
[1] 0.1468432
> var(2/ (pi*(1+sim.2^2) ) )
[1] 0.02871329

Int(0,1/2) y^{-2}/(pi (1+y^{-2}) dy

> sim.3= runif(10000, 0, 0.5)
> mean( sim.3^(-2) / (2*pi*(1+sim.3^(-2))  ) )
  [1] 0.14752
> var( sim.3^(-2) / (2*pi*(1+sim.3^(-2))  ) )
[1] 9.532248e-05

5. Normalizing constant (Importance sampling)

For X ~N(0,1), the normalizing constant is 1/sqrt(2 pi) = 0.3989423

> x.sim= rnorm(10000,0,1)

> den = function(x){
  exp(-x^2/2)
}

> 1/mean( den(x.sim)/ dnorm(x.sim,0,1) )



Your turn

1. Suppose that the posterior distribution [p | X]  is Beta (6,5)  for the unknown proportion
of defective items produced by a firm.

i) First, draw 10000 variables from the posterior distribution and make a histogram. Compare
it with the true probability density function.

ii) Sometimes the quantity such as 'odds (ratio)' defined as p/(1-p) is more convenient to work
with than p. However, the exact density for odds ratio is hard to derive by hand.

Using the variables drawn in i),  plot the empirical distribution of the odds to approximate
the true density. ( Use the option 'nclass=20' for better plot)

iii) Find the approximate expectation and standard deviation of the odds.

iv) Approximate the probability that  odds is greater than, or equal to 1. Compare it with the
 approximate probability that p is greater than, or equal to .5.

v) Using appropriate quantiles, give the approximate 90% quantile intervals for odds.
 

2. Performances of the Monte Carlo estimation

i)  Compute the probability : P(Z =<3)- P(Z<1), where Z ~N(0,1).

ii) The above probability can be expressed as an integral with a standard normal  probability
density function where limits are 1 to 3. Generate 10000 values from the standard normal
distribution and approximate the integral by the Monte Carlo method.

Evaluate the quality of the approximation.

iii) Draw 10000 uniform random variables and approximate the integral (probability) by  the
Monte Carlo method.  Evaluate the quality of the approximation.

3.
i) Draw 100000 realizations from a uniform(0,1). Then,  using the probability integral
transformation, approximate the Exponential (2) density function. Compare it with the true
probability density function.

       Note that here we use the density f(x)= 2 exp(-2x) for X ~ Exponential (2) as in R

ii) Consider the integral :   Int (from 0 to infinity)  4 x^2 sin (pi x) exp(-2x) dx

  Draw 100000 values from Exp(2) and approximate the value of the  integral above.

iii) Approximate the integral using the uniform variables drawn in i)
 

4. Let X be a random variable with the probability distribution proportional to

(2+ sin^2 (x) )  exp( -(3+ cos^3(x) ) x ) for x >= 0.

i) Estimate the normalizing constant using the exponential (0.1) random variable.

ii) Estimate the normalizing constant using the exponential (2) random variable.