* Scripts that reproduce the results of this tutorial in Julia are located here and here.

To review our extended example: previously, we simulated the OLS data-generating process (DGP), solved for the MLE of the OLS parameters, bootstrapped the MLE, and computed various non-parametric p-values.

The only part of our previous work that may have slowed down your computer was forming the 10,000 bootstrap samples (*samples*). In this post, we will see how to speed up the bootstrapping by using more than one processor to collect the bootstrap samples.

**The Estimator that We Wish to Bootstrap**

We are interested in the following MLE estimator, adapted from our previous work. First, we import the needed packages and generate the data:

```julia
# Prepare
srand(2)
using Distributions
using Optim
using DataFrames

N = 1000
K = 3

# Generate variables
genX = MvNormal(eye(K))
X = rand(genX, N)
X = X'
X_noconstant = X
constant = ones(N)
X = [constant X]
genEpsilon = Normal(0, 1)
epsilon = rand(genEpsilon, N)
trueParams = [0.01, 0.05, 0.05, 0.07]
Y = X * trueParams + epsilon

# Export data
data = DataFrame(hcat(Y, X))
names!(data, [:Y, :one, :X1, :X2, :X3])
writetable("data.csv", data)
```

Next, we define the likelihood of OLS:

```julia
function loglike(rho, y, x)
    beta = rho[1:4]
    sigma2 = exp(rho[5]) + eps(Float64)
    residual = y - x * beta
    dist = Normal(0, sqrt(sigma2))
    contributions = logpdf(dist, residual)
    loglikelihood = sum(contributions)
    return -loglikelihood
end
```
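Since *loglike* simply sums independent normal log-densities, a quick way to sanity-check it is against the closed-form log-likelihood. The sketch below is not part of the original tutorial; it hand-rolls the normal logpdf so that it needs only the standard library:

```julia
# Sketch: summing per-observation logpdf terms, as loglike does, equals the
# closed-form normal log-likelihood
#   -(N/2)*log(2*pi*sigma2) - sum(residual.^2)/(2*sigma2).
using Random

Random.seed!(2)
N = 1000
residual = randn(N)        # stand-in for y - x*beta
sigma2 = 1.5

# per-observation log-density of Normal(0, sqrt(sigma2))
logpdf_normal(r, s2) = -0.5 * log(2 * pi * s2) - r^2 / (2 * s2)

ll_summed = sum(logpdf_normal.(residual, sigma2))
ll_closed = -(N / 2) * log(2 * pi * sigma2) - sum(residual .^ 2) / (2 * sigma2)

isapprox(ll_summed, ll_closed)   # true
```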

Finally, we define the bootstrap function, which includes within it a wrapper for the likelihood:

```julia
function bootstrapSamples(B)
    println("hi")
    samples = zeros(B, 5)
    for b = 1:B
        theIndex = sample(1:N, N)
        x = X[theIndex, :]
        y = Y[theIndex, :]
        function wrapLoglike(rho)
            return loglike(rho, y, x)
        end
        # params0 is the vector of starting values defined in our previous work
        samples[b, :] = optimize(wrapLoglike, params0, method = :cg).minimum
    end
    samples[:, 5] = exp(samples[:, 5])
    println("bye")
    return samples
end
```
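The heart of *bootstrapSamples* is the line `theIndex = sample(1:N,N)`, which draws N observation indices with replacement (*sample* is provided by the Distributions/StatsBase packages). Here is a minimal self-contained sketch of the same resampling idea, using Base's `rand(1:N, N)` in place of `sample` and bootstrapping a sample mean instead of an MLE:

```julia
# Sketch of nonparametric bootstrap resampling: draw N indices with
# replacement and recompute the statistic on each resampled dataset.
using Random, Statistics

Random.seed!(2)
N = 500
y = randn(N) .+ 1.0          # toy data with true mean 1

B = 2000
boot_means = zeros(B)
for b in 1:B
    idx = rand(1:N, N)       # indices drawn with replacement
    boot_means[b] = mean(y[idx])
end

se_boot = std(boot_means)    # bootstrap standard error, near std(y)/sqrt(N)
```

With B = 10,000 as in the tutorial, a loop like this is exactly the work that the next section distributes across processors.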

The reason for the cute *println* statements will become apparent later. Save all of this in a file called “bootstrapFunction.jl”. Finally, we import the data created above:

```julia
data = readtable("data.csv")
N = size(data, 1)
Y = array(data[:Y])
X = array(data[[:one, :X1, :X2, :X3]])
```

**Parallel Bootstrapping**

We now have the function, *bootstrapSamples*, saved in the file “bootstrapFunction.jl”, that we wish to run in parallel. The idea is that, instead of running the MLE 10,000 times on one processor, we run it 2,500 times on each of four processors, then collect the results back on the first processor. Unfortunately, Julia makes it very difficult to parallelize functions within the same file in which the functions are defined; the user must explicitly tell Julia about every parameter and function that is needed on each processor, which is a frustrating process of copy-pasting the command *@everywhere* all around your code.

Instead, we separate our work into two files, the first one you have already created. Create a second file in the same location on your computer as “bootstrapFunction.jl”. In the second file, write the following commands:

```julia
addprocs(4)
require("bootstrapFunction.jl")
B = 10000
b = 2500
samples_pmap = pmap(bootstrapSamples, [b, b, b, b])
samples = vcat(samples_pmap[1], samples_pmap[2], samples_pmap[3], samples_pmap[4])
```

The command *addprocs(4)* means to add four processors to your Julia session. The command *require* is the key to getting this right: it ensures that all of the functions and parameters contained in “bootstrapFunction.jl” are available to each processor. *pmap* is the function that actually applies *bootstrapSamples* to each value in the list *[b,b,b,b]*. I set the list to *[b,b,b,b]*, where *b=2500*, so that *bootstrapSamples* collects 2,500 bootstrap samples on each processor. The resulting matrices from all four processors are stored in the length-4 vector called *samples_pmap*. Finally, we vertically stack (*vcat*) the four matrices into one matrix, *samples*, which is exactly the matrix of bootstrap samples we have seen previously.

Tip: For parallel processing in Julia, separate your code into two files: one file containing the functions and parameters that need to be run in parallel, and the other file managing the processing and collecting the results. Use the *require* command in the managing file to import the functions and parameters to all processors.
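For readers on current Julia versions (1.x), the same two-file pattern still works, but *addprocs* and *pmap* now live in the Distributed standard library, and *require* has been replaced by *@everywhere include(...)*. A minimal self-contained sketch, with the worker function defined inline via *@everywhere* rather than in a second file, and *fakeBootstrap* standing in for *bootstrapSamples*:

```julia
# Sketch of the parallel pattern in current Julia (1.x).
using Distributed

addprocs(2)    # two workers instead of four, for brevity

# defined on every processor; a stand-in for bootstrapSamples
@everywhere fakeBootstrap(b) = randn(b, 5)   # returns a b-by-5 matrix

b = 250
samples_pmap = pmap(fakeBootstrap, [b, b, b, b])  # one call per list entry
samples = vcat(samples_pmap...)                   # stack into one matrix

size(samples)   # (1000, 5)
```

To load worker code from a file instead, replace the *@everywhere* definition with `@everywhere include("bootstrapFunction.jl")`.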

**Results**

Once you have run the outer file, you can do speed tests to compare the two ways of getting 10,000 bootstrap samples. We use the *@elapsed* macro to time each approach. Here are the results:

```julia
julia> @elapsed pmap(bootstrapSamples, [b, b, b, b])
        From worker 2:  hi
        From worker 3:  hi
        From worker 4:  hi
        From worker 5:  hi
        From worker 2:  bye
        From worker 5:  bye
        From worker 4:  bye
        From worker 3:  bye
66.528345161
```

Each of the processors tells us “hi” and “bye” when it begins and finishes its job, respectively. The total time was a bit more than a minute (66.5 seconds). By comparison, the following code will compute 10,000 bootstrap samples on only one processor, as in our previous work:

```julia
julia> @elapsed bootstrapSamples(10000)
hi
bye
145.265103729
```

So, in the absence of parallel processing, the same result would require just over 145 seconds, or almost two and a half minutes. Thus, parallel processing on a four-core personal computer allowed us to reduce the time required to bootstrap our estimator by a factor of around 2.2. As a general rule, parallel processing of the bootstrap estimator saves you more time when the function being bootstrapped is more complicated.
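The *@elapsed* comparison above is easy to replicate on a toy workload. A self-contained sketch follows; absolute times, and how much *pmap* wins by, depend on the machine and on how CPU-bound the task is:

```julia
# Sketch: timing serial map vs. parallel pmap with @elapsed.
using Distributed

addprocs(2)

# a deliberately slow, CPU-bound stand-in for the bootstrap function
@everywhere slow_task(n) = sum(sqrt(i) for i in 1:n)

n = 10_000_000
pmap(slow_task, fill(n, 4))    # warm-up call, so compilation is not timed
t_serial   = @elapsed map(slow_task, fill(n, 4))
t_parallel = @elapsed pmap(slow_task, fill(n, 4))
```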

Bradley J. Setzler

Thanks for putting these tutorials together, Brad. They’re great for boosting Julia numerical skills.

Wanted to make sure you’re aware that Julia provides good support for Greek characters using LaTeX-style tab completion, e.g. \Sigma. The REPL and IJulia notebook, as well as several Julia editor modes, support it. For functions such as your loglike(), it can boost readability for folks used to the math:

```julia
Σ = sum

function loglike(ρ, y, x)
    β = ρ[1:K+1]
    σ² = exp(ρ[K+2])
    residual = y - x*β
    dist = Normal(0, sqrt(σ²))
    contributions = logpdf(dist, residual)
    loglikelihood = Σ(contributions)
    return -loglikelihood
end
```

Hi Bradley,

Thanks for putting this together. I’ve been trying to adapt your code for a mixed logit, but I’ve been running into some issues. The basic thought is to generate the data much like your example, but, when it comes time to simulate the integral via averaging (the mixed part of the mixed logit), to do this in parallel. It’s just a little different from your example, because I want to pass different error draws to the function rather than just run the bootstrap B/4 times. I’ve written up an example that I’ll call “mixed OLS” (using your examples), where it’s clearer where the issue I’m having is.

In my example, I do everything on the same core, but ideally I could split this up and do (1/n)th of the work on each of n cores. Any help you could offer would be appreciated. Once I get this running with the simple OLS example, it’ll be trivial to make it a mixed logit, and I think it would make a good tutorial for your site.

Best,

Alex

```julia
## Add parameters that define the simulation
srand(2)

## Number of desired (different) replications
sim_reps = 100;
numb_markets = 1000;
## Number of variables
numb_variables = 9;

## Make the entire dataset
## Generate data that is the same across draws (from JuliaEconomics.com)
using Distributions
genX = MvNormal(eye(numb_variables));
X = rand(genX, numb_markets);
X = X';
constant = ones(numb_markets);
X = [constant X];
genEpsilon = Normal(0, 1)
epsilon = rand(genEpsilon, numb_markets)
trueParams = rand(numb_variables + 1, 1)
Y = X*trueParams + epsilon

## Generate data that is different across draws (error sequence)
## The number of draws for each rep
n = 100;
## This gives the total number of draws
tot_numb_draw = sim_reps*numb_markets
## Generate the std normal error sequence
draws = rand(tot_numb_draw);
normal_draws = sqrt(2).*erfcinv(2.*draws)

## Now I want to split this into pieces
norm_error = zeros(numb_markets, sim_reps);
for i = 1:sim_reps
    norm_error[:,i] = [normal_draws[(i-1)*numb_markets+1:(i*numb_markets)]]
end

## OLS function
function ols(x, y)
    estimate = inv(x'*x)*(x'*y)
    return estimate
end

## Estimate basic function here
sim_estimates = zeros(sim_reps, size(trueParams, 1) + 1);
for i = 1:sim_reps
    sim_estimates[i,:] = transpose(ols([X norm_error[:,i]], Y))
end

## Result
mixed_ols_estimates = mean(sim_estimates, 1)
```

Hi Bradley,

I’ve since figured out my error, I was using a bracket, where it should have been a parentheses. Argh! I’m going to work on the mixed logit example and I’ll send it to you once I’ve finished in case you want to add a variant tutorial for parallel programming.

Alex

Hi Bradley, I just wanted to note for any future readers that pmap is also a great and easy way to parallelize C code. I have a VFI that was taking too long in Julia, so I made a C function that does the optimization stage for each combination of state values, returning policies and optimal values through pmap. Then within Julia I check whether it has converged, and otherwise repeat.

Usually, it is not necessary to port part of your code to C to get a speedup. In most cases you can get the same speedup within Julia with less effort.