Bootstrapping and Non-parametric p-values in Julia

* The script to reproduce the results of this tutorial in Julia is located here.

Suppose that we wish to test to see if the the parameter estimates of \beta are statistically different from zero and if the estimate of \sigma^2 is different from one for the OLS parameters defined in a previous post. Suppose further that we do not know how to compute analytically the standard errors of the MLE parameter estimates; the MLE estimates were presented in the previous post.

We decide to bootstrap by resampling cases in order to estimate the standard errors. This means that we treat the sample of N individuals as if it were a population from which we randomly draw B samples, each of size N. This produces a sample of MLEs of size B, that is, it provides an empirical approximation to the distribution of the MLE. From the empirical approximation, we can compare the full-sample point MLE to the MLE distribution under the null hypothesis.

To perform bootstrapping, we rely on Julia’s built-in sample function.


Wrapper Functions and the Likelihood of Interest

Now that we have a bootstrap index, we define the log-likelihood as a function of x and y, which are any subsets of X and Y, respectively.

function loglike(rho,y,x)
   beta = rho[1:4]
   sigma2 = exp(rho[5])
   residual = y-*(x,beta)
   dist = Normal(0, sqrt(sigma2))
   contributions = logpdf(dist,residual)
   loglikelihood = sum(contributions)
   return -loglikelihood
end

Then, if we wish to evaluate loglike across various subsets of x and y, we use what is called a wrapper, which simply creates a copy of loglike that has already set the values of x and y. For example, the following function will evaluate loglike when x=X and y=Y:

function wrapLoglike(rho)
   return loglike(rho,Y,X)
end

We do this because we want the optimizer to find the optimal \rho, holding x and y fixed, but we also want to be able to adjust x and y to suit our purposes. The wrapper function allows the user to modify x and y, but tells the optimizer not to bother them.

Tip: Use wrapper functions to manage arguments of your objective function that are not supposed to be accessed by the optimizer. Give the optimizer functions with only one argument — the parameters over which it is supposed to optimize your objective function.


Bootstrapping the OLS MLE

Now, we will use a random index, which is drawn for each b using the sample function, to take a random sample of individuals from the data, feed them into the function using a wrapper, then have the optimizer maximize the wrapper across the parameters. We repeat this process in a loop, so that we obtain the MLE for each subset. The following loop stores the MLE in each row of the matrix samples using 1,000 bootstrap samples of size one-half (M) of the available sample:

B=1000
samples = zeros(B,5)
for b=1:B
   theIndex = sample(1:N,N)
   x = X[theIndex,:]
   y = Y[theIndex,:]
   function wrapLoglike(rho)
      return loglike(rho,y,x)
   end
   samples[b,:] = optimize(wrapLoglike,params0,method=:cg).minimum
end
samples[:,5] = exp(samples[:,5])

The resulting matrix contains 1,000 samples of the MLE. As always, we must remember to exponentiate the variance estimates, because they were stored in log-units.


Bootstrapping for Non-parametric p-values

Estimates of the standard errors of the MLE estimates can be obtained by computing the standard deviation of each column,

bootstrapSE = std(samples,1)

where the number 1 indicates that the standard deviation is taken over columns (instead of rows).

Standard errors like these can be used directly for hypothesis testing under parametric assumptions. For example, if we assume an MLE is normally distributed, then we reject the null hypothesis that the parameter is equal to some point if the parameter estimate differs from the point by at least 1.96 standard errors (using that the sample is large). However, we can make fewer assumptions using non-parametric p-values. The following code creates the distribution implied by the null hypothesis that \beta_0 =0, \beta_1=0, \beta_2=0, \beta_3=0, \sigma^2=1 by subtracting the mean from each distribution (thus imposing a zero mean) and then adding 1 to the distribution of \sigma^2 (thus imposing a mean of one); this is called nullDistribution.

nullDistribution = samples
pvalues = ones(5)
for i=1:5
   nullDistribution[:,i] = nullDistribution[:,i]-mean(nullDistribution[:,i])
end
nullDistribution[:,5] = 1 + nullDistribution[:,5]

The non-parametric p-value (for two-sided hypothesis testing) is the fraction of times that the absolute value of the MLE is greater than the absolute value of the null distribution.

pvalues = [mean(abs(MLE[i]).<abs(nullDistribution[:,i])) for i=1:5]

If we are interested in one-sided hypothesis testing, the following code would test the null hypothesis \beta_0 =0 against the alternative that \beta_0>0:

pvalues = [mean(MLE[i].<nullDistribution[:,i]) for i=1:5]

Conversely, the following code would test the null hypothesis \beta_0 =0 against the alternative that \beta_0<0:

pvalues = [mean(MLE[i].>nullDistribution[:,i]) for i=1:5]

Thus, two-sided testing uses the absolute value (abs), and one-sided testing only requires that we choose the right comparison operator (.> or .<).


Results
Let the true parameters be,

julia> trueParams = [0.01,0.05,0.05,0.07]

The resulting bootstrap standard errors are,

julia> bootstrapSE = std(samples,1)
1x5 Array{Float64,2}:
 0.0308347  0.0311432  0.0313685  0.0305757  0.0208229

and the non-parametric two-sided p-value estimates are,

julia> pvalues = [mean(abs(MLE[i]).<abs(nullDistribution[:,i])) for i=1:5]
5-element Array{Any,1}:
 0.486
 0.383
 0.06 
 0.009
 0.289

Thus, we reject the null hypotheses for the third and fourth parameters only and conclude that \beta_2 \neq 0, \beta_3 \neq 0, but find insufficient evidence to reject the null hypotheses that \beta_0 =0, \beta_1 =0 and \sigma^2=1.


Bradley J. Setzler

One comment

  1. Hi Bradley:

    In the step of Bootstrapping the OLS MLE, you defined a wrapLoglike function to serve as the objective function to optimize, but I think it can be done using anonymous function as well:

    samples[b,:] = optimize(rho->loglike(rho,y,x), params0,method=:cg).minimum

    I’m curious whether it’s a bit better in speed because the original code defined a function for each loop.

    Thanks!
    Regards,
    Allen

Leave a comment