Parallel Processing in Julia: Bootstrapping the MLE

* Scripts that reproduce the results of this tutorial in Julia are located here and here.

To review our extended example, we first simulated the OLS DGP, solved for the MLE of OLS, bootstrapped the MLE, and computed various non-parametric p-values.

The only part of our previous work that may have slowed down your computer was forming the 10,000 bootstrap samples (samples). In this post, we will see how to speed up the bootstrapping by using more than one processor to collect the bootstrap samples.

The Estimator that We Wish to Bootstrap

We are interested in the following MLE estimator, adapted from our previous work. First, we import the two needed packages and generate the data:

# Prepare
using Distributions
using Optim
using DataFrames

# Generate variables
genX = MvNormal(eye(K))
X = rand(genX,N)
X = X'
X_noconstant = X
constant = ones(N)
X = [constant X]
genEpsilon = Normal(0, 1)
epsilon = rand(genEpsilon,N)
trueParams = [0.01,0.05,0.05,0.07]
Y = X*trueParams + epsilon

# Export data
data = DataFrame(hcat(Y,X))

Next, we define the likelihood of OLS:

function loglike(rho,y,x)
    beta = rho[1:4]
    sigma2 = exp(rho[5])+eps(Float64)
    residual = y-x*beta
    dist = Normal(0, sqrt(sigma2))
    contributions = logpdf(dist,residual)
    loglikelihood = sum(contributions)
    return -loglikelihood

Finally, we define the bootstrap function, which includes within it a wrapper for the likelihood:

function bootstrapSamples(B)
    samples = zeros(B,5)
    for b=1:B
		theIndex = sample(1:N,N)
		x = X[theIndex,:]
		y = Y[theIndex,:]
		function wrapLoglike(rho)
			return loglike(rho,y,x)
		samples[b,:] = optimize(wrapLoglike,params0,method=:cg).minimum
	samples[:,5] = exp(samples[:,5])
    return samples

The reason for the cute println statements will become apparent later. Save all of this in a file called “bootstrapFunction.jl”. Finally, we import the data created above:

data = readtable("data.csv")
N = size(data,1)
Y = array(data[:Y])
X = array(data[[:one,:X1,:X2,:X3]])

Parallel Bootstrapping

We now have the function, bootstrapSamples, saved in the file “bootstrapFunction.jl”, that we wish to run in parallel. The idea is that, instead of running the MLE B times on one processor, we run it B/4 times on each of four processors, then collect the results back to the first processor. Unfortunately, Julia makes it very difficult to parallelize functions within the same file that the functions are defined; the user must explicitly tell Julia every parameter and function that is needed on each processor, which is a frustrating process of copy-pasting the command @everywhere all around your code.

Instead, we separate our work into two files, the first one you have already created. Create a second file in the same location on your computer as “bootstrapFunction.jl”. In the second file, write the following commands:

samples_pmap = pmap(bootstrapSamples,[b,b,b,b])
samples = vcat(samples_pmap[1],samples_pmap[2],samples_pmap[3],samples_pmap[4])

The command addprocs(4) means to add four processors into your Julia session. The command require is the key to getting this right: it ensures that all of the functions and parameters contained in “bootstrapFunction.jl” are available to each processor. pmap is the function that actually applies bootstrapSamples to each value in the list [b,b,b,b]. I set the list next to it to be [b,b,b,b], where b=2500, so that bootstrapEstimates would collect 2,500 bootstrap samples on each processor. The resulting matrices from all four processors are stored in the length-4 vector called samples_pmap. Finally, we vertically stack (vcat) the four matrices into one matrix, samples, which is exactly the matrix of bootstrap samples we have seen previously.

Tip: For parallel processing in Julia, separate your code into two files, one file containing the functions and parameters that need to be run in parallel, and the other file managing the processing and collecting results. Use the require command in the managing file to import the functions and parameters to all processors.


Once you have run the outer file, you can do speed tests to compare the two ways of getting 10,000 bootstrap samples. We use the @elapsed command to time each code. Here are the results:

@elapsed pmap(bootstrapSamples,[b,b,b,b])
	From worker 2:	hi
	From worker 3:	hi
	From worker 4:	hi
	From worker 5:	hi
	From worker 2:	bye
	From worker 5:	bye
	From worker 4:	bye
	From worker 3:	bye

Each of the processors tells us “hi” and “bye” when it begins and finishes its job, respectively. The total time was a bit more than a minute (66.5 seconds). By comparison, the following code will compute 10,000 bootstrap samples on only one processor, as in our previous work:

@elapsed bootstrapSamples(10000)

So, in the absence of parallel processing, the same result would require just over 145 seconds, or almost two and a half minutes. Thus, parallel processing on a four-core personal computer allowed us to reduce the time required to bootstrap our estimator by a factor of around 2.2. As a general rule, parallel processing of the bootstrap estimator saves you more time when the function being bootstrapped is more complicated.

Bradley J. Setzler


  1. Thanks for putting these tutorial together, Brad. They’re great for boosting Julia numeric skills.

    Wanted to make sure you’re aware that Julia provides good support Greek characters using Latex type tab-completion, e.g. \Sigma. The REPL and IJulia Notebook as well as several Julia editor modes support it. For functions such as your loglike(), it can boost readability for folks used to the math–


    function loglike(ρ,x,y)
    β = ρ[1:K+1]
    σ₂ = exp(ρ[K+2])
    residual = y-x*β
    dist = Normal(0, σ₂)
    contributions = logpdf(dist,residual)
    loglikelihood = Σ(contributions)
    return -loglikelihood

  2. Hi Bradley,

    Thanks for putting this together. I’ve been trying to adapt your code for a mixed logit, but I’ve been running into some issues. The basic thought is to generate the data much like your example, but when time to simulate the integral via averaging (the mixed part of the mixed logit), to do this in parallel. It’s just a little different that your example, because I want to pass different error draws to the function rather than just run the bootstrap B/4 times. I’ve written up an example that I’ll call “mixed OLS” (using your examples), where it’s more clear where the issue I’m having is.

    In my example, I do everything on the same core, but ideally I could split this and do (1/n)th of the work on each of n cores. Any help you could offer would be appreciated. Once I get this running with the simple OLS example, it’ll be trivial to make it mixed logit and I think it would make tutorial on your site.



    ## Add Parameters that define simulation
    ## Number of desired (different) replications


    ## Make the entire dataset
    ## Generate Data that is the same across draws (from

    ## Number of variable

    using Distributions

    genX = MvNormal(eye(numb_variables));
    X = rand(genX,numb_markets);
    X = X’;
    constant = ones(numb_markets);
    X = [constant X];
    genEpsilon = Normal(0, 1)
    epsilon = rand(genEpsilon,numb_markets)
    trueParams = rand(numb_variables+1,1)
    Y = X*trueParams + epsilon

    ## Generate Data that is different across draws (Error Sequence)

    ## The number of Draws for each rep


    ## This gives the total number of draws


    ## Generate the std normal error sequence



    ## Now I want to split this into pieces

    for i=1:sim_reps

    ## OLS Function
    function ols(x,y)
    estimate = inv(x’*x)*(x’*y)
    return estimate

    ## Estimate Basic Function Here

    for i=1:sim_reps
    sim_estimates[i,:]=transpose(ols([X norm_error[:,i]],Y))

    ## Result

    1. Hi Bradley,

      I’ve since figured out my error, I was using a bracket, where it should have been a parentheses. Argh! I’m going to work on the mixed logit example and I’ll send it to you once I’ve finished in case you want to add a variant tutorial for parallel programming.


  3. Hi Bradley, I just wanted to leave for any future readers that using pmap also is a great and easy way to parallelize C code. I have a VFI that was taking too long in Julia, so I just made a C function does the optimization stage for each combination of states values, returning policies and optimal values with pmap. Then within Julia I check if it has converged, otherwise repeat.

    1. Usually, it is not necessary to port part of your code to C to get speed up. You can in most cases get the same speedup within Julia with less effort.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s