Selection Bias Corrections in Julia, Part 1

Selection bias arises when a data sample is not a random draw from the population that it is intended to represent. This is especially problematic when the probability that a particular individual appears in the sample depends on variables that also affect the relationships we wish to study. Selection bias corrections based on models of economic behavior were pioneered by the economist James J. Heckman in his seminal 1979 paper.

For an example of selection bias, suppose we wish to study the effectiveness of a treatment (a new medicine for patients with a particular disease, a preschool curriculum for children facing particular disadvantages, etc.). A random sample is drawn from the population of interest, and the treatment is randomly assigned to a subset of this sample, with the remaining subset serving as the untreated (“control”) group. If the subsets followed instructions, then the resulting data would serve as a random draw from the data generating process that we wish to study.

However, suppose the treatment and control groups do not comply with their assignments. In particular, if only some of the treated benefit from treatment while the others are in fact harmed by treatment, then we might expect the harmed individuals to leave the study. If we accepted the resulting data as a random draw from the data generating process, it would appear that the treatment was much more successful than it actually was; an individual who benefits is more likely to be present in the data than one who does not benefit.

Conversely, if treatment were very beneficial, then some individuals in the control group may find a way to obtain treatment, possibly without our knowledge. The benefits received by the control group would make it appear that the treatment was less beneficial than it actually was; the receipt of treatment is no longer random.

In this tutorial, I present some parameterized examples of selection bias. Then, I present examples of parametric selection bias corrections, evaluating their effectiveness in recovering the data generating processes. Along the way, I demonstrate the use of the GLM package in Julia. A future tutorial demonstrates non-parametric selection bias corrections.


Example 1: Selection on a Normally-Distributed Unobservable

Suppose that we wish to study of the effect of the observable variable X_{i} on Y_i. The data generating process is given by,

Y_i=\beta_0+X_i\beta_1+\epsilon_i,

where X_i and \epsilon_i are independent in the population. Because of this independence condition, the ordinary least squares estimator would be unbiased if the data (Y,X) were drawn randomly from the population. However, suppose that the probability that individual i were in the data set \mathcal{S} were a function of X and \epsilon. For example, suppose that,

\Pr\left( i \in \mathcal{S}\Big| X_i,\epsilon_i\right)=1 if X_i > \epsilon_i,

and the probability is zero otherwise.  This selection rule ensures that, among the individuals in the data (the i in \mathcal{S}), the covariance between X and \epsilon will be positive, even though the covariance is zero in the population. When X covaries positively with \epsilon, then the OLS estimate of \beta_1 is biased upward, i.e., the OLS estimator converges to a value that is greater than \beta_1.

To see the problem, consider the following simulation of the above process in which X and \epsilon are drawn as independent standard normal random variables:

srand(2)
N = 1000
X = randn(N)
epsilon = randn(N)
beta_0 = 0.
beta_1 = 1.
Y = beta_0 + beta_1.*X + epsilon
populationData = DataFrame(Y=Y,X=X,epsilon=epsilon)
selected = X.>epsilon
sampleData = DataFrame(Y=Y[selected],X=X[selected],epsilon=epsilon[selected])

(Note: that should say “selected = X.>epsilon”, but WordPress currently has a terrible bug when using the greater-than symbol within code format.) There are 1,000 individuals in the population data, but 492 of them are selected to be included in the sample data. The covariance between X and epsilon in the population data is,

cov(populationData[:X],populationData[:epsilon])
0.00861456704877879

which is approximately zero, but in the sample data, it is,

cov(sampleData[:X],sampleData[:epsilon])
0.32121357192108513

which is approximately 0.32.

Now, we regress Y on X with the two data sets to obtain,

linreg(array(populationData[:X]),array(populationData[:Y]))
2-element Array{Float64,1}:
 -0.027204
  1.00882

linreg(array(sampleData[:X]),array(sampleData[:Y]))
2-element Array{Float64,1}:
 -0.858874
  1.4517

where, in Julia 0.3.0, the command array() replaces the old command matrix() in converting a DataFrame into a numerical Array. This simulation demonstrates severe selection bias associated with using the sample data to estimate the data generating process instead of the population data, as the true parameters, \beta_0=0,\beta_1=1, are not recovered by the sample estimator.


Correction 1: Heckman (1979)

The key insight of Heckman (1979) is that the correlation between X and \epsilon can be represented as an omitted variable in the OLS moment condition,

\mathbb{E}\left[Y_i\Big|X_i,i\in\mathcal{S}\right]=\mathbb{E}\left[\beta_0 + \beta_1X_i+\epsilon_i\Big|X_i,i\in\mathcal{S}\right]=\beta_0 + \beta_1X_i+\mathbb{E}\left[\epsilon_i\Big|X_i,i\in\mathcal{S}\right],

Furthermore, using the conditional density of the standard Normally distributed \epsilon,

\mathbb{E}\left[\epsilon_i\Big|X_i,i\in\mathcal{S}\right]=\mathbb{E}\left[\epsilon_i\Big|X_i,X_i>\epsilon_i\right]=\frac{-\phi\left(X_i\right)}{\Phi\left(X_i\right)}.

which is called the inverse Mills ratio, where \phi and \Phi are the probability and cumulative density functions of the standard normal distribution. As a result, the moment condition that holds in the sample is,

\mathbb{E}\left[Y_i\Big|X_i,i\in\mathcal{S}\right]=\beta_0 + \beta_1X_i+\frac{-\phi\left(X_i\right)}{\Phi\left(X_i\right)}

Returning to our simulation, the inverse Mills ratio is added to the sample data as,

sampleData[:invMills] = -pdf(Normal(),sampleData[:X])./cdf(Normal(),sampleData[:X])

Then, we run the regression corresponding to the sample moment condition,

linreg(array(sampleData[[:X,:invMills]]),array(sampleData[:Y]))
3-element Array{Float64,1}:
 -0.166541
  1.05648
  0.827454

We see that the estimate for \beta_1 is now 1.056, which is close to the true value of 1, compared to the non-corrected estimate of 1.452 above. Similarly, the estimate for \beta_0 has improved from -0.859 to -0.167, when the true value is 0. To see that the Heckman (1979) correction is consistent, we can increase the sample size to N=100,000, which yields the estimates,

linreg(array(sampleData[[:X,:invMills]]),array(sampleData[:Y]))
3-element Array{Float64,1}:
 -0.00417033
  1.00697
  0.991503

which are very close to the true parameter values.

Note that this analysis generalizes to the case in which X contains K variables and the selection rule is,

\delta_0 + \delta_1 X_1 + \delta_2 X_2 + \ldots + \delta_K X_k > \epsilon,

which is the case considered by Heckman (1979). The only difference is that the coefficients \delta_k must first be estimated by regressing an indicator for i \in \mathcal{S} on X, then using the fitted equation within the inverse Mills ratio. This requires that we observe X for i \notin \mathcal{S}. Probit regression is covered in a slightly different context below.

As a matter of terminology, the process of estimating \delta is called the “first stage”, and estimating \beta conditional on the estimates of \delta is called the “second stage”. When the coefficient on the inverse Mills ratio is positive, it is said that “positive selection” has occurred, with “negative selection” otherwise. Positive selection means that, without the correction, the estimate of \beta_1 would have been upward-biased, while negative selection results in a downward-biased estimate. Finally, because the selection rule is driven by an unobservable variables \epsilon, this is a case of “selection on unobservables”. In the next section we consider a case of “selection on observables”.


Example 2: Probit Selection on Observables

Suppose that we wish to know the mean and variance of Y in the population. However, our sample of Y suffers from selection bias. In particular, there is some X such that the probability of observing Y depends on X according to,

\Pr\left(i\in\mathcal{S}\Big| X_i\right)=F\left(X_i\right),

where F is some function with range [0,1]. Notice that, if Y and X were independent, then the resulting sample distribution of Y would be a random draw from the population (marginal distribution) of Y. Instead, we suppose \mathrm{Cov}\left(X,Y\right)\neq 0. For example,

srand(2)
N = 10000
populationData = DataFrame(rand(MvNormal([0,0.],[1 .5;.5 1]),N)')
names!(populationData,[:X,:Y])

mean(array(populationData),1)
1x2 Array{Float64,2}:
 -0.0281916  -0.022319

cov(array(populationData))
2x2 Array{Float64,2}:
 0.98665   0.500912
 0.500912  1.00195 

In this simulated population, the estimated mean and variance of Y are -0.022 and 1.002, and the covariance between X and Y is 0.501. Now, suppose the probability that Y_i is observed is a probit regression of X_i,

\Pr\left( i\in\mathcal{S}\Big| X_i \right) = \Phi\left( \beta_0 + \beta_1 X_i \right),

where \Phi is the CDF of the standard normal distribution. Letting D_i=1 indicate that i \in \mathcal{S}, we can generate the sample selection rule D as,

beta_0 = 0
beta_1 = 1
index = (beta_0 + beta_1*data[:X])
probability = cdf(Normal(0,1),index)
D = zeros(N)
for i=1:N
    D[i] = rand(Bernoulli(probability[i]))
end
populationData[:D] = D
sampleData = populationData
sampleData[D.==0,:Y] = NA

The sample data has missing values in place of Y_i if D_i=0. The estimated mean and variance of Y in the sample data are 0.275 (which is too large) and 0.862 (which is too small).


Correction 2: Inverse Probability Weighting

The reason for the biased estimates of the mean and variance of Y in Example 2 is sample selection on the observable X. In particular, certain values of Y are over-represented due to their relationship with X. Inverse probability weighting is a way to correct for the over-representation of certain types of individuals, where the “type” is captured by the probability of being included in the sample.

In the above simulation, conditional on appearing in the population, the probability that an individual of type X_i=1 is included in the sample is 0.841. By contrast, the probability that an individual of type X_i=0 is included in the sample is 0.5, so type X_i is over-represented by a factor of 0.841/0.5 = 1.682. If we could reduce the impact that type X_i=1 has in the computation of the mean and variance of Y by a factor of 1.682, we would alter the balance of types in the sample to match the balance of types in the population. Inverse probability weighting generalizes this logic by weighting each individual’s impact by the inverse of the probability that this individual appears in the sample.

Before we can make the correct, we must first estimate the probability of sample inclusion. This can be done by fitting the probit regression above by least-squares. For this, we use the GLM package in Julia, which can be installed the usual way with the command Pkg.add(“GLM”).

using GLM
Probit = glm(D ~ X, sampleData, Binomial(), ProbitLink())
DataFrameRegressionModel{GeneralizedLinearModel,Float64}:

Coefficients:
             Estimate Std.Error  z value Pr(>|z|)
(Intercept)  0.114665  0.148809 0.770554   0.4410
X             1.14826   0.21813  5.26414    1e-6

estProb = predict(Probit)
weights = 1./estProb[D.==1]/sum(1./estProb[D.==1])

which are the inverse probability weights needed to match the sample distribution to the population distribution.

Now, we use the inverse probability weights to correct the mean and variance estimates of Y,

correctedMean = sum(sampleData[D.==1,:Y].*weight)
-0.024566923132025013

correctedVariance = (N/(N-1))*sum((sampleData[D.==1,:Y]-correctedMean).^2.*weight)
1.0094029613131092

which are very close to the population values of -0.022319 and 1.00195. The logic here extends to the case of multivariate X, as more coefficients are added to the Probit regression. The logic also extends to other functional forms of F, for example, switching from Probit to Logit is achieved by replacing the ProbitLink() with LogitLink() in the glm() estimation above.


Example 3: Generalized Roy Model

For the final example of this tutorial, we consider a model which allows for rich, realistic economic behavior. In words, the Generalized Roy Model is the economic representation of a world in which each individual must choose between two options, where each option has its own benefits, and one of the options costs more than the other. In math notation, the first alternative, denoted D_i=1, relates the outcomes Y_i to the individual’s observable characteristics, X_i, by,

Y_{1,i} = \mu_1\left(X_i\right)+U_{1,i}.

Similarly, the second alternative, denoted D_i=0, relates Y_i to X_i, by,

Y_{0,i} = \mu_0\left(X_i\right)+U_{0,i}.

The value of Y_i that appears in our sample is thus given by,

Y_i = D_iY_{1,i} + (1-D_i)Y_{0,i}.

Finally, the value of D_i is chosen by individual i according to,

D_i=1 if Y_{1,i}-Y_{0,i}-C_i>0,

where C_i is the cost of choosing the alternative D_i=1 and is given by,

C_i = \mu_C\left(X_i,Z_i\right)+U_{C,i},

where Z contains additional characteristics of i that are not included in X.

We assume that the data only contains Y_i,D_i,X_i,Z_i; it does not contain the variables Y_{i,1},Y_{i,0},U_{i,1},U_{i,0},U_{i,C} or the functions \mu_1,\mu_0,\mu_C. Assuming that the three \mu functions follow the linear form and that the unobservables U are independent and Normally distributed, we can simulate the data generating process as,

srand(2)
N = 1000
sampleData = DataFrame(rand(MvNormal([0,0.],[1 .5; .5 1]),N)')
names!(sampleData,[:X,:Z])
U1 = rand(Normal(0,.5),N)
U0 = rand(Normal(0,.7),N)
UC = rand(Normal(0,.9),N)
betas1 = [0,1]
betas0 = [.3,.2]
betasC = [.1,.1,.1]
Y1 = betas1[1] + betas1[2].*sampleData[:X] + U1
Y0 = betas0[1] + betas0[2].*sampleData[:X] + U0
C = betasC[1] + betasC[2].*sampleData[:X] + betasC[3].*sampleData[:Z] + UC
D = Y1-Y0-C.>0
Y = D.*Y1 + (1-D).*Y0
sampleData[:D] = D
sampleData[:Y] = Y

(Note: that should say “D = Y1-Y0-C.>0″, but WordPress currently has a terrible bug when using the greater-than symbol within code format.) In this simulation, about 38% of individuals choose the alternative D_i=1. About 10% of individuals choose D_i=0 even though they receive greater benefits under D_i=1 due to the high cost C_i associated with D_i=1.


Solution 3: Heckman Correction for Generalized Roy Model

The identification of this model is attributable to Heckman and Honore (1990). Estimation proceeds in steps. The first step is to notice that the left- and right-hand terms in the following moment equation motivate a Probit regression:

\mathbb{E}\left[D_i\Big|X_i,Z_i\right]=\Pr\left(\mu_D\left(X_i,Z_i\right)>U_{D,i}\Big|X_i,Z_i\right)=\Phi\left(\mu_D\left(X_i,Z_i\right)\right),

where U_{D,i}\equiv -\left(U_{1,i}-U_{0,i}-U_{C,i}\right)/\sigma_D\sim\mathcal{N}\left(0,1\right) is the negative of the total error term arising in the equation that determines D_i above, \sigma_D \equiv \sqrt{\mathrm{Var}\left(U_{1,i}-U_{0,i}-U_{C,i}\right)}, and,

\mu_D\left(X,Z\right) \equiv \left([1,X]\beta_1 -[1,X]\beta_0 -[1,X,Z]\beta_C\right)/\sigma_D \equiv [1,X,Z]\beta_D,

In the simulation above, \beta_D = [-.4,.7,-.1]/\sqrt{.5+.7+.9}\approx[-0.276,0.483,-0.069]. We can estimate \beta_D from the Probit regression of D on X and Z.

betasD = coef(glm(D~X+Z,sampleData,Binomial(),ProbitLink()))
3-element Array{Float64,1}:
 -0.299096
  0.59392 
 -0.103155

Next, notice that,

\mathbb{E}\left[Y_i\Big|D_i=1,X_i,Z_i\right] =[1,X_i]\beta_1+\mathbb{E}\left[U_{1,i}\Big|D_i=1,X_i,Z_i\right],

where,

\mathbb{E}\left[U_{1,i}\Big|D_i=1,X_i,Z_i\right] =\mathbb{E}\left[U_{1,i}\Big|\mu_D\left(X_i,Z_i\right)>U_{i,D},X_i,Z_i\right]=\rho_1 \frac{-\phi\left(\mu_D\left(X_i,Z_i\right)\right)}{\Phi\left(\mu_D\left(X_i,Z_i\right)\right)},

which is the inverse Mills ratio again, where \rho_1 \equiv \frac{\mathrm{Cov}\left(U_1,U_D\right)}{\mathrm{Var}\left(U_D\right)}. Substituting in the estimate for \beta_D, we consistently estimate \beta_1:

fittedVals = hcat(ones(N),array(sampleData[[:X,:Z]]))*betasD

sampleData[:invMills1] = -pdf(Normal(0,1),fittedVals)./cdf(Normal(0,1),fittedVals)

correctedBetas1 = linreg(array(sampleData[D,[:X,:invMills1]]),vec(array(sampleData[D,[:Y]])))
3-element Array{Float64,1}:
  0.0653299
  0.973568 
 -0.135445 

To see how well the correction has performed, compare these estimates to the uncorrected estimates of \beta_1,

biasedBetas1 = linreg(array(sampleData[D,[:X]]),vec(array(sampleData[D,[:Y]])))
2-element Array{Float64,1}:
 0.202169
 0.927317

Similar logic allows us to estimate \beta_0:

sampleData[:invMills0] = pdf(Normal(0,1),fittedVals)./(1-cdf(Normal(0,1),fittedVals))

correctedBetas0 = linreg(array(sampleData[D.==0,[:X,:invMills0]]),vec(array(sampleData[D.==0,[:Y]])))
3-element Array{Float64,1}:
 0.340621
 0.207068
 0.323793

biasedBetas0 = linreg(array(sampleData[D.==0,[:X]]),vec(array(sampleData[D.==0,[:Y]])))
2-element Array{Float64,1}:
 0.548451
 0.295698

In summary, we can consistently estimate the benefits associated with each of two alternative choices, even though we only observe each individual in one of the alternatives, subject to heavy selection bias, by extending the logic introduced by Heckman (1979).


Bradley J. Setzler

Upgrade to Julia Version 0.3.0

The Julia developers have officially released version 0.3.0, and it is a good idea to upgrade. The key packages in Julia that we use for economics — especially DataFrames — are being rapidly developed for 0.3.0, so you will need to upgrade to take advantage of new and improved features.

It’s relatively easy to upgrade Julia. However, it appears that Julia Studio does not and will not support 0.3.0, so the bigger problem users will face is abandoning Julia Studio in favor of another environment for writing and testing codes.

There are a few candidates from which you may choose to replace Julia Studio. My recommendation is IJulia. IJulia is somewhat easy to install, easy to use, and excellent for working with graphics. A drawback of using IJulia is that it loads in your browser and, at least in my experience, it often crashes the browser (I am using Safari on a Mac), so you will need to save your work frequently.

The greater drawback is that IJulia is much more difficult to install than Julia Studio, as it requires the installation of IPython. It used to be true that a complete novice could install and use both Julia and a coding environment with a few clicks. Now, it is more involved.


How to Install Julia 0.3.0 and IJulia

Here is a (somewhat) quick guide to upgrading Julia and replacing Julia Studio with IJulia:

  1. Delete your existing Julia and Julia Studio. On a Mac, these are located within the Applications folder. These can only cause problems by leaving them on your computer. You may need to check for hidden files relating to Julia and delete those as well (this is definitely true on a Mac, as the hidden ~.julia must be found and deleted). You can find instructions specific to your operating system here under Platform Specific Instructions.
  2. Go here and download Julia version 0.3.0. Once it finishes downloading, open the file and allow it to install. On a Mac, you can find Julia-0.3.0 in Applications once it has finished installing. Open Julia by double-clicking the application. It will open a command prompt/terminal that indicates you are using Julia version 0.3.0. You have successfully upgraded Julia.
  3. To install IJulia, first you will need to install IPython here. If you are lucky, pip install works for you and you can install IPython with one command. If not, you should consider installing a Python distribution that includes IPython; the easiest option may be to install Anaconda, although this gives you much more than you need.
  4. If you have made it this far, the rest is easy. Simply open Julia, use the Julia commands Pkg.add(“IJulia”), then Pkg.build(“IJulia”), then Pkg.update().
  5. You are finished. To open IJulia, you can either use the commands using IJulia then notebook() within Julia, or within the command prompt/terminal use the command ipython notebook –profile julia.

Going Forward

Compare these moderately complicated instructions to my previous promise that anyone could have Julia and a Julia coding environment ready to use in under 5 minutes. 5 minute installation was one of my favorite aspects of Julia, and this is an unfortunate casualty of the upgrade to 0.3.0.

At the moment, it is easier to install R and R Studio than to install Julia and IJulia, unless you already have IPython on your computer. Hopefully, the Julia developer community, which is actively engaged and dedicated to the performance of this language, create a new one-click approach to installing both Julia and a Julia coding environment, which used to be available from Julia Studio. In the mean time, IJulia appears to be the best option for new users.

I hope this guide makes the transition a bit easier for you.


Bradley J. Setzler

Kalman Filter for Panel Data and MLE in Julia, Part 1

* The script to reproduce the results of this tutorial in Julia is located here.

The Kalman Filter is notoriously difficult to estimate. I think this can be attributed to the following issues:

  • It necessitates a large number of parameters that are difficult to manage.
  • It requires the sequential inversion of matrices, so we must be careful to keep the matrices invertible and monitor numerical error.
  • Most papers and Internet resources about the Kalman Filter assume that we have time-series rather than panel data, so they do not show how to adapt the parameters for panel data estimation
  • Similarly, this literature makes “macro” assumptions like initializing at the steady state rather than the “micro” approach of estimating the initial state.
  • Professional software is not well-suited to parameter estimation (more on this below).

In this tutorial, I will show how to write your own program that estimates the MLE of the most basic type of panel data Kalman Filter: time-invariant parameters and dedicated measures. It will explicitly impose assumptions to ensure identification, so that the estimates returned are uniquely determined from the data, unlike what would happen if we maximized the likelihood with the unrestricted Kalman Filter. In Part 2 of the tutorial, I generalize the estimation to permit a time-varying parameter space as well as time-varying parameter values, show how to identify the initial state, permit non-dedicated measures, and show how to “anchor” the model so that the units of the state space are determined to scale.


Kalman Filter DGP

The data generating process (DGP) corresponding to the panel data Kalman Filter is,

\theta_{i,t+1}=A\theta_{i,t}+\nu_{i,t+1},\;\nu_{i,t+1}\sim\mathcal{N}\left(0,V\right),

Y_{i,t+1}=C\theta_{i,t+1}+\omega_{i,t+1},\;\omega_{i,t+1}\sim\mathcal{N}\left(0,W\right),

\theta_{i,1} \sim \mathcal{N}\left(\mu_1,\Sigma_1 \right),

for time periods t=2,\ldots,T. We assume that Y, referred to as “the measures” or “the data”, is the only observed term in this entire system. The first equation determines the evolution of the unobserved state, \theta, and we will refer to it as the “transition equation”. The second equation determines the relationship between the state and measures or signals of the state, Y, and we will refer to it as the “measurement equation”. The shocks, v_{i,t+1} and w_{i,t+1}, are assumed to be independent across i, across t, and even across components so that V and W are diagonal.

For our purposes, the parameters of interest are typically A,V, although we generally need to identify C,W as well so that we can separate them from A,V. It is no trouble to add other observables to these equations, e.g., in optimal control theory, there is usually an observed “input”, call it u, on the right-hand side the transition equation, so that the equation takes the form,

\theta_{i,t+1}=A\theta_{i,t}+Bu_{i,t}+\nu_{i,t+1},\;\nu_{i,t+1}\sim\mathcal{N}\left(0,V\right),

but this is a detail that could be added later with little difficulty, so we omit it until Part 2 of this tutorial. Also, we will assume for now that the initial distribution of the state is characterized by \mu_1=0 and \Sigma_1=\mathbf{1}, but relax this assumption in Part 2.


Failure of Identification and Necessary Assumptions

Before simulating the DGP, notice that it is indifferent to the scale or even the sign of \theta. In particular, consider dividing \theta by a constant \alpha \neq 0, and multiplying C by \alpha. Then,

Y_{i,t+1}=\left(\alpha C\right)\frac{\theta_{i,t+1}}{\alpha}+\omega_{i,t+1}=C\theta_{i,t+1}+\omega_{i,t+1},

so \alpha cancels out and we are left with the same expression.

Since our only information about \theta comes from Y, this means that we will never be able to know C and \theta apart from one another, that is, we will never know the scale of \theta. Similarly, we cannot distinguish A and \theta. By choosing \alpha<0, we see that neither the sign nor the magnitude of A is determined by knowledge of Y. Let me emphasize this point: we can change the sign of every entry in A as we wish and get the same data. Since A is one of the parameters of interest — in fact, our desire to learn A is often the motivation for using the Kalman Filter — this is a major issue. Since \alpha can take on a continuum of values, we see that the DGP results in the same Y for a continuum of parameters \alpha A, \alpha C.

If we need to know the unique value of each parameter, we must make additional assumptions. The following assumptions, adapted from Cunha and Heckman (2008), are sufficient to ensure identification:

  1. Each of the measures (i.e., the components of Y) is a measure of only one of the factors (i.e., the components of \theta). This is the “dedicated measures” assumption, and it is equivalent to requiring each row of C to have exactly one entry that is not assumed to be zero.
  2. There are at least 3 dedicated measures for each factor. This means that each column in C must have at least three entries that are not assumed to be zero.

The first assumption can be relaxed a bit if there are more than 3 measures per factor, but we can never relax the assumption that there is at least one dedicated measure for one of the factors.

In practice, are these assumptions valid? Possibly, it depends on your faith in the measures available in your data. You must be able to confidently claim that certain measures in your data are measuring only one factor. The alternative is to abandon estimating the model. In other words, the Kalman Filter is not all-powerful — certain models simply cannot be estimated with certain data, and trying to force it to estimate a model that is not identified from your data will result in incorrect estimates.

Notice that professional Kalman Filter software does not allow one to impose these assumptions. This is one of the main motivations for this tutorial: at present, you must program the Kalman Filter yourself if you want valid estimates of the model parameters, and estimating the model parameters rather than projecting the state from given model parameters is the main purpose of the Kalman Filter in microeconomic applications.


Simulation of the Kalman Filter DGP

Before simulating the Kalman Filter DGP, we must first deal with the issue I mentioned at the beginning of this tutorial: the Kalman Filter DGP necessitates a large number of parameters that are difficult to manage. We need a function that can convert an array of parameters into the matrices A,V,C,W, including exponentiating variance terms so that when we use an optimizer later, the optimizer can try out different parameters without error. The following function, unpackParams, is my solution to this problem:

function unpackParams(params)
    place = 0
    A = reshape(params[(place+1):(place+stateDim^2)],(stateDim,stateDim))
    place += stateDim^2
    V = diagm(exp(params[(place+1):(place+stateDim)]))
    place += stateDim
    Cparams = params[(place+1):(place+stateDim*(obsDim-1))]
    C = zeros(stateDim*obsDim,stateDim)
    for j in [1:stateDim]
            C[(1+obsDim*(j-1)):obsDim*j,j] = [1,Cparams[(1+(obsDim-1)*(j-1)):(obsDim-1)*j]]
    end
    place += (obsDim-1)*stateDim
    W = diagm(exp(params[(place+1):(place+obsDim*stateDim)]))
    return [&amp;quot;A&amp;quot;=&amp;gt;A,&amp;quot;V&amp;quot;=&amp;gt;V,&amp;quot;C&amp;quot;=&amp;gt;C,&amp;quot;W&amp;quot;=&amp;gt;W]
end

where stateDim is the length of \theta and obsDim is the number of measures on each component of \theta. This structure assumes that all measures are dedicated to a single factor, and the same number of measures is available for each factor. The function can be modified as needed to account for other structures. Here is the output from using this function on a set of parameters when there are two components of \theta with 3 measures on each:

julia&amp;gt; stateDims = 2
julia&amp;gt; obsDims = 3
julia&amp;gt; params0 = [1.,.3,.3,1.,0.,0.,.5,-.5,.5,-.5,0.,0.,0.,0.,0.,0.]
julia&amp;gt; unpackParams(params0)
[&amp;quot;V&amp;quot;=&amp;gt;2x2 Array{Float64,2}:
 1.0  0.0
 0.0  1.0,
&amp;quot;C&amp;quot;=&amp;gt;6x2 Array{Float64,2}:
  1.0   0.0
  0.5   0.0
 -0.5   0.0
  0.0   1.0
  0.0   0.5
  0.0  -0.5,
&amp;quot;A&amp;quot;=&amp;gt;2x2 Array{Float64,2}:
 1.0  0.3
 0.3  1.0,
&amp;quot;W&amp;quot;=&amp;gt;6x6 Array{Float64,2}:
 1.0  0.0  0.0  0.0  0.0  0.0
 0.0  1.0  0.0  0.0  0.0  0.0
 0.0  0.0  1.0  0.0  0.0  0.0
 0.0  0.0  0.0  1.0  0.0  0.0
 0.0  0.0  0.0  0.0  1.0  0.0
 0.0  0.0  0.0  0.0  0.0  1.0]

Clearly, the most difficult part to construct is C, the measurement matrix.

Now that we can convert an array of parameters into the appropriate parameter matrices, it is straight-forward to define a function that simulates the DGP of the Kalman Filter:

function KalmanDGP(params)
    # initialize data
    data = zeros(N,stateDim*obsDim*T)
    # parameters
    unpacked = unpackParams(params)
    A = unpacked[&amp;quot;A&amp;quot;]
    V = unpacked[&amp;quot;V&amp;quot;]
    C = unpacked[&amp;quot;C&amp;quot;]
    W = unpacked[&amp;quot;W&amp;quot;]
    # draw from DGP
    for i=1:N
        # data of individual i
        iData = ones(stateDim*obsDim*T)
        current_state = rand(MvNormal(reshape(init_exp,(stateDim,)),init_var))
        iData[1:stateDim*obsDim] = C*current_state + rand(MvNormal(W))
        for t=2:T
            current_state = A*current_state + rand(MvNormal(V))
            iData[((t-1)*stateDim*obsDim+1):(t*stateDim*obsDim)] = C*current_state + rand(MvNormal(W))
        end
        # add individual to data
        data[i,:] = iData
    end
    return data
end

Here is a simulated data set that satisfies the assumptions of time-invariant parameters and at least three dedicated measures (in particular, I use obsDims=4):

srand(2)
N = 1000
T = 4
stateDims = 2
obsDims = 4
init_exp = zeros(stateDim)
init_var = eye(stateDim)
data=KalmanDGP(params0)

Estimation Strategy for the Kalman Filter DGP

Now that we understand the Kalman Filter model, we can consider the Kalman Filter itself. The Kalman Filter is the estimator of what I have been calling the Kalman Filter DGP. The aim of the Kalman Filter for panel data is to estimate the mean and variance of \theta_{i,t} for each i and each t.

Estimation can be divided into two steps. Denoting \mathcal{Y}^t_i as the set of all data available at or before time t, the projection step is trivial to derive and given by:

\mathbb{E}\left[\theta_{i,t+1}\Big|\mathcal{Y}^t_i\right] = A\mathbb{E}\left[\theta_{i,t}\Big| \mathcal{Y}^t_i\right]

\mathbb{E}\left[Y_{i,t+1}\Big|\mathcal{Y}^t_i\right] = C\mathbb{E}\left[\theta_{i,t+1}\Big| \mathcal{Y}^t_i\right]

\mathrm{Var}\left[Y_{i,t+1}\Big|\mathcal{Y}^t_i\right] = A\mathbb{E}\left[\theta_{i,t}\Big| \mathcal{Y}^t_i\right]A'+V

\mathrm{Var}\left[\theta_{i,t+1}\Big|\mathcal{Y}^t_i\right] = C\mathbb{E}\left[\theta_{i,t}\Big| \mathcal{Y}^t_i\right]C'+W

The update step is:

\mathbb{E}\left[\theta_{i,t+1}\Big|\mathcal{Y}^{t+1}_i\right] = \mathbb{E}\left[\theta_{i,t+1}\Big| \mathcal{Y}^t_i\right]+K_{t+1} \tilde{Y}_{i,t+1}

\mathrm{Var}\left[\theta_{i,t+1}\Big|\mathcal{Y}^{t+1}_i\right] = \mathrm{Var}\left[\theta_{i,t+1}\Big| \mathcal{Y}^{t}_i\right]-K_{t+1} C\mathrm{Var}\left[\theta_{i,t+1}\Big| \mathcal{Y}^{t}_i\right]

where,

\tilde{Y}_{t+1}=Y_{t+1}-\mathbb{E}\left[Y_{i,t+1}\Big|\mathcal{Y}^{t}_i\right]

is the error from projecting Y_{t+1} at time t, and,

K_{t+1}=\mathrm{Var}\left[\theta_{i,t+1}\Big|\mathcal{Y}^{t}_i\right]C'\left(\mathrm{Var}\left[Y_{i,t+1}\Big|\mathcal{Y}^{t}_i\right]\right)^{-1}

is called the optimal Kalman gain and is derived as the solution to a least-squares problem.

The individual likelihood of the Kalman Filter is given by,

\mathcal{L}\left(A,V,C,W\Big|\mathcal{Y}^T_i\right)\propto\Pr\left(\mathcal{Y}^T_i\Big|A,V,C,W\right)=\Pr\left(Y^{1}_i\Big|A,V,C,W\right)\prod_{t=1}^{T-1}\Pr\left(Y^{t+1}_i\Big|\mathcal{Y}^t_i,A,V,C,W\right)

The initial conditions enter through \Pr\left(Y^1_i\Big|A,V,C,W\right). In this tutorial, we assume \Pr\left(Y^1_i\Big|A,V,C,W\right)=\Pr\left(Y^1_i\right)=\phi\left(Y_1^i\Big|0,\mathbf{1}\right),  where \phi is the multivariate normal PDF. We permit individual-specific initial conditions and unknown parameters in Part 2 of the tutorial. Furthermore, \Pr\left(Y^{t+1}_i\Big|\mathcal{Y}^t_i,A,V,C,W\right)=\phi\left(Y_i^{t+1}\big|\mathbb{E}\left[Y_{i,t+1}\Big|\mathcal{Y}^t_i\right],\mathrm{Var}\left[Y_{i,t+1}\Big|\mathcal{Y}^t_i\right]\right).

Finally, since individuals are assumed to be independent,

\mathcal{L}\left(A,V,C,W\Big|\mathcal{Y}^T\right)\propto \prod_{i=1}^N\Pr\left(Y_{i,1}\right)\prod_{t=1}^{T-1}\Pr\left(Y^{t+1}_i\Big|\mathcal{Y}^t_i,A,V,C,W\right)

Now that we have a simple expression for the likelihood, we need only convert it into a Julia code.


Estimation of the Kalman Filter in Julia

To simplify computation, we write a single function that simultaneously predicts and updates (in that order) from any time period t to the next time period t+1. It begins with the “posterior” estimates of the expected state and variance of the state at time t, that is, \mathbb{E}\left[\theta_{i,t}\Big|\mathcal{Y}^{t}_i\right],\mathrm{Var}\left[\theta_{i,t}\Big|\mathcal{Y}^{t}_i\right], uses these to predict the “prior” estimates at time t+1, that is, \mathbb{E}\left[\theta_{i,t+1}\Big|\mathcal{Y}^{t}_i\right],\mathrm{Var}\left[\theta_{i,t+1}\Big|\mathcal{Y}^{t}_i\right], uses the priors to obtain the likelihood of the new observation at time t+1, and finally uses the priors as well as the new observation at time t+1 to update to the posterior estimates at time t+1. The function, incrementKF (“a time increment of the Kalman Filter”, is as follows:

function incrementKF(params,post_exp,post_var,new_obs)
    # unpack parameters
    unpacked = unpackParams(params)
    A = unpacked[&amp;quot;A&amp;quot;]
    V = unpacked[&amp;quot;V&amp;quot;]
    C = unpacked[&amp;quot;C&amp;quot;]
    W = unpacked[&amp;quot;W&amp;quot;]
    # predict
    prior_exp = A*post_exp
    prior_var = A*post_var*A' + V
    obs_prior_exp = C*prior_exp
    obs_prior_var = C*prior_var*C' + W
    # update
    residual = new_obs - obs_prior_exp
    obs_prior_cov = prior_var*C'
    kalman_gain = obs_prior_cov*inv(obs_prior_var)
    post_exp = prior_exp + kalman_gain*residual
    post_var = prior_var - kalman_gain*obs_prior_cov'
    # step likelihood
    dist = MvNormal(reshape(obs_prior_exp,(length(obs_prior_exp),)),obs_prior_var)
    log_like = logpdf(dist,new_obs)
    return [&amp;quot;post_exp&amp;quot;=&amp;gt;post_exp,&amp;quot;post_var&amp;quot;=&amp;gt;post_var,&amp;quot;log_like&amp;quot;=&amp;gt;log_like]
end

We see that the code is quite simple, as it closely mimics the mathematical formulas above in syntax. The part of the code that does not mimic the usual mathematical syntax — preparing parameters — is handled elsewhere by unpackParams.

Now that we have the code that predicts and updates the Kalman Filter for any individual while outputting the contribution to the log-likelihood, we can iterate over time and collect all of the log-likelihood contributions from an individual’s time path:

function indivKF(params,init_exp,init_var,i)
    iData = data[i,:]
    # initialization
    post_exp = init_exp
    post_var = init_var
    init_obs = matrix([iData[obsDict[1]]])'
    dist = MvNormal(eye(length(init_obs)))
    log_like=logpdf(dist,init_obs)
    for t = 1:(T-1)
        # predict and update
        new_obs = matrix([iData[obsDict[t+1]]])'
        new_post = incrementKF(params,post_exp,post_var,new_obs)
        # replace
        post_exp = new_post[&amp;quot;post_exp&amp;quot;]
        post_var = new_post[&amp;quot;post_var&amp;quot;]
        # contribute
        log_like += new_post[&amp;quot;log_like&amp;quot;]
    end
    return log_like
end

This initializes the individual’s contribution to the likelihood using the assumed initial distribution of \mathcal{N}\left(0,\mathbf{1}\right), then proceeds to predict and update the Kalman Filter over time until all log-likelihood contributions have been collected. The only confusing part is obsDict, which is a dictionary containing the column names of the observations, which are assumed to be stored in a DataFrame called data. This strategy isn’t necessary, but it avoids a lot of complications, and also makes our example better resemble a real-world application to a data set.

Finally, we loop through each individual, summing all of their lifetime log-likelihood contributions:

function sampleKF(params,init_exp,init_var)
    log_like = 0.0
    N = size(data,1)
    for i in 1:N
        log_like += indivKF(params,init_exp,init_var,i)
    end
    neg_avg_log_like = -log_like/N
    println(&amp;quot;current average negative log-likelihood: &amp;quot;,neg_avg_log_like)
    return neg_avg_log_like[1]
end

where I like to include a print statement so that I can monitor the convergence of the likelihood. Thus, we have the panel data Kalman Filter likelihood, under the appropriate assumptions to ensure identification. Finally, we use a wrapper to prepare it for optimization:

function wrapLoglike(params)
    print(&amp;quot;current parameters: &amp;quot;,params)
    return sampleKF(params,init_exp,init_var)
end

where I have included a print statement so that I can monitor the parameters that the optimizer is testing and know in advance if the estimates are diverging (if it were diverging, I would know my code has a bug and could stop the optimizer before it wastes too much time).

We conclude by estimating the Kalman Filter MLE on simulated data.


Results

First, we simulate data from KalmanDGP:

srand(2)
N = 1000
T = 4
stateDim = 2
obsDim = 3
init_exp = zeros(stateDim)
init_var = eye(stateDim)
params0 = [1.,0.,0.,1.,0.,0.,.5,-.5,.5,-.5,0.,0.,0.,0.,0.,0.]

data=KalmanDGP(params0)
data = DataFrame(data)
colnames!(data,[&amp;quot;one_1&amp;quot;,&amp;quot;one_2&amp;quot;,&amp;quot;one_3&amp;quot;,&amp;quot;one_4&amp;quot;,&amp;quot;one_5&amp;quot;,&amp;quot;one_6&amp;quot;,&amp;quot;two_1&amp;quot;,&amp;quot;two_2&amp;quot;,&amp;quot;two_3&amp;quot;,&amp;quot;two_4&amp;quot;,&amp;quot;two_5&amp;quot;,&amp;quot;two_6&amp;quot;,&amp;quot;three_1&amp;quot;,&amp;quot;three_2&amp;quot;,&amp;quot;three_3&amp;quot;,&amp;quot;three_4&amp;quot;,&amp;quot;three_5&amp;quot;,&amp;quot;three_6&amp;quot;,&amp;quot;four_1&amp;quot;,&amp;quot;four_2&amp;quot;,&amp;quot;four_3&amp;quot;,&amp;quot;four_4&amp;quot;,&amp;quot;four_5&amp;quot;,&amp;quot;four_6&amp;quot;,&amp;quot;outcome&amp;quot;])
obsDict = {[&amp;quot;one_1&amp;quot;,&amp;quot;one_2&amp;quot;,&amp;quot;one_3&amp;quot;,&amp;quot;one_4&amp;quot;,&amp;quot;one_5&amp;quot;,&amp;quot;one_6&amp;quot;],[&amp;quot;two_1&amp;quot;,&amp;quot;two_2&amp;quot;,&amp;quot;two_3&amp;quot;,&amp;quot;two_4&amp;quot;,&amp;quot;two_5&amp;quot;,&amp;quot;two_6&amp;quot;],[&amp;quot;three_1&amp;quot;,&amp;quot;three_2&amp;quot;,&amp;quot;three_3&amp;quot;,&amp;quot;three_4&amp;quot;,&amp;quot;three_5&amp;quot;,&amp;quot;three_6&amp;quot;],[&amp;quot;four_1&amp;quot;,&amp;quot;four_2&amp;quot;,&amp;quot;four_3&amp;quot;,&amp;quot;four_4&amp;quot;,&amp;quot;four_5&amp;quot;,&amp;quot;four_6&amp;quot;]}

Everything here is the same as in the DGP code above, except that we are converting the data matrix into a DataFrame. The columns are named using the syntax time_number, so that three_4 means the 4th observation in time period 3. It creates obsDict such that, for example, data[obsDict[3]] would return the variables from three_1 to three_6 – all of the observations at time 3 in the correct order.

Now, we are ready to estimate the Kalman Filter as follows:

MLE = optimize(wrapLoglike,params0,method=:cg,ftol=1e-8)

When it is finished (and we have watched it converge using the print statements), we can view the parameters in the appropriate context by:

julia&amp;gt; unpackParams(MLE.minimum)
[&amp;quot;V&amp;quot;=&amp;gt;2x2 Array{Float64,2}:
 1.11323  0.0
 0.0      0.918925,
&amp;quot;C&amp;quot;=&amp;gt;6x2 Array{Float64,2}:
  1.0        0.0
  0.489902   0.0
 -0.490378   0.0
  0.0        1.0
  0.0        0.528178
  0.0       -0.517231,
&amp;quot;A&amp;quot;=&amp;gt;2x2 Array{Float64,2}:
 0.976649    -0.00976326
 0.00762499   0.994169  ,
&amp;quot;W&amp;quot;=&amp;gt;6x6 Array{Float64,2}:
 0.977998  0.0      0.0     0.0       0.0      0.0
 0.0       0.97972  0.0     0.0       0.0      0.0
 0.0       0.0      1.0241  0.0       0.0      0.0
 0.0       0.0      0.0     0.990665  0.0      0.0
 0.0       0.0      0.0     0.0       1.00625  0.0
 0.0       0.0      0.0     0.0       0.0      1.01042]

This compares favorably to the true parameters,

julia&amp;gt; unpackParams(params0)
[&amp;quot;V&amp;quot;=&amp;gt;2x2 Array{Float64,2}:
 1.0  0.0
 0.0  1.0,
&amp;quot;C&amp;quot;=&amp;gt;6x2 Array{Float64,2}:
  1.0   0.0
  0.5   0.0
 -0.5   0.0
  0.0   1.0
  0.0   0.5
  0.0  -0.5,
&amp;quot;A&amp;quot;=&amp;gt;2x2 Array{Float64,2}:
 1.0  0.0
 0.0  1.0,
&amp;quot;W&amp;quot;=&amp;gt;6x6 Array{Float64,2}:
 1.0  0.0  0.0  0.0  0.0  0.0
 0.0  1.0  0.0  0.0  0.0  0.0
 0.0  0.0  1.0  0.0  0.0  0.0
 0.0  0.0  0.0  1.0  0.0  0.0
 0.0  0.0  0.0  0.0  1.0  0.0
 0.0  0.0  0.0  0.0  0.0  1.0]

On a single processor of my personal laptop, optimization requires 446 seconds. In response to a reader comment, I also added this version of the code that achieves optimization in 393 seconds by removing all global variables.


Bradley J. Setzler

Announcements

  • Please visit JuliaBloggers.com, in which JuliaEconomics.com is an active participant, to keep track of the latest news, tips, and advances in programming with Julia.
  • JuliaEconomics.com is featured in this week’s edition of Data Science Weekly.
  • For those visiting this site from Chicago, there will be an all-day Julia programming event on June 28 at Ida Noyes Hall on campus at the University of Chicago.
  • There is now a dedicated Github repository that reproduces all of the tutorials on this site.
  • For those in social sciences at the University of Chicago, the manager of our Acropolis cluster has completed my request to install Julia.
  • I have responded to all of the comments made on this website. Please provide additional comments to let me know how the tutorials are working for you, how I can make them better, and any tutorials you would like added.

Bradley J. Setzler

 

REVISITED: Julia vs Python Speed Comparison: Bootstrapping the OLS MLE

I originally switched to Julia because Julia was estimating a complicated MLE about 100-times faster than Python. Yesterday, I demonstrated how to bootstrap the OLS MLE in parallel using Julia. I presented the amount of time required on my laptop to bootstrap 1,000 times: about 21.3 seconds on a single processor, 8.7 seconds using four processors.

For comparison, I translated this code into Python, using only NumPy and SciPy for the calculations, and Multiprocessing for the parallelization. The Python script is available here. For this relatively simple script, I find that Python requires 110.9 seconds on a single processor, 66.0 seconds on four processors.

Thus, Julia performed more than 5-times faster than Python on a single processor, and about 7.5-times faster on four processors.

I also considered increasing the number of bootstrap samples from 1,000 to 10,000. Julia requires 211 seconds on a single processor and 90 seconds on four processors. Python requires 1135 seconds on a single processor and 598 seconds on four processors. Thus, even as the size of the task became greater, Julia remained more than 5-times faster on one processor and around 7-times faster on four processors.

In this simple case, Julia is between 5- and 7.5-times faster than Python, depending on configuration.


Bradley J. Setzler

Parallel Processing in Julia: Bootstrapping the MLE

* Scripts that reproduce the results of this tutorial in Julia are located here and here.

To review our extended example, we first simulated the OLS DGP, solved for the MLE of OLS, bootstrapped the MLE, and computed various non-parametric p-values.

The only part of our previous work that may have slowed down your computer was forming the 1,000 bootstrap samples (samples). In this post, we will see how to speed up the bootstrapping by using more than one processor on our computers.


The Estimator that We Wish to Bootstrap

We are interested in the following MLE estimator, adapted from our previous work. The only difference is that I will include more independent variables in X by setting K=10 instead of 4, knowing that this will slow down our computers. I have to adjust trueParams and some other parts of the code to make room for the additional independent variables. First, we import the two needed packages and generate the data:

using Distributions
using Optim
N=1000
K=10
genX = MvNormal(eye(K))
X = rand(genX,N)
X = X'
constant = ones(N)
X = [constant X]
genEpsilon = Normal(0, 1)
epsilon = rand(genEpsilon,N)
trueParams = [-K/2:K/2]*.02
Y = X*trueParams + epsilon

Next, we define the likelihood of OLS:

function loglike(rho,x,y)
   beta = rho[1:K+1]
   sigma2 = exp(rho[K+2])
   residual = y-x*beta
   dist = Normal(0, sigma2)
   contributions = logpdf(dist,residual)
   loglikelihood = sum(contributions)
   return -loglikelihood
end

Finally, we define the bootstrap function, which includes within it a wrapper for the likelihood:

function bootstrapSamples(B)
   println("hi")
   M=convert(Int,floor(N/2))
   samples = zeros(B,K+2)
   for b=1:B
      theIndex = randperm(N)[1:M]
      x = X[theIndex,:]
      y = Y[theIndex,:]
      function wrapLoglike(rho)
         return loglike(rho,x,y)
      end
      samples[b,:] = optimize(wrapLoglike,params0,method=:cg).minimum
   end
   samples[:,K+2] = exp(samples[:,K+2])
   println("bye")
   return samples
end

The reason for the cute println statements will become apparent later. Save all of this in a file called “bootstrapFunction.jl”.


Parallel Bootstrapping

We now have the function, bootstrapSamples, saved in the file “bootstrapFunction.jl”, that we wish to run in parallel. The idea is that, instead of running the MLE B times on one processor, we run it B/4 times on each of four processors, then collect the results back to the first processor. Unfortunately, Julia makes it very difficult to parallelize functions within the same file that the functions are defined; the user must explicitly tell Julia every parameter and function that is needed on each processor, which is a frustrating process of copy-pasting the command @everywhere all around your code.

Instead, we separate our work into two files, the first one you have already created. Create a second file in the same location on your computer as “bootstrapFunction.jl”. In the second file, write the following commands:

srand(2)
addprocs(4)
require("bootstrapFunction.jl")
B=1000
b=250
samples_pmap = pmap(bootstrapSamples,[b,b,b,b])
samples = vcat(samples_pmap[1],samples_pmap[2],samples_pmap[3],samples_pmap[4])

The command addprocs(4) means to add four processors into your Julia session. The command require is the key to getting this right: it ensures that all of the functions and parameters contained in “bootstrapFunction.jl” are available to each processor. pmap is the function that actually applies bootstrapSamples to each value in the list [b,b,b,b]. I set the list next to it to be [b,b,b,b], where b=250, so that bootstrapEstimates would collect 250 bootstrap samples on each processor. The resulting matrices from all four processors are stored in the length-4 vector called samples_pmap. Finally, we vertically stack (vcat) the four matrices into one matrix, samples, which is exactly the matrix of bootstrap samples we have seen previously.

Tip: For parallel processing in Julia, separate your code into two files, one file containing the functions and parameters that need to be run in parallel, and the other file managing the processing and collecting results. Use the require command in the managing file to import the functions and parameters to all processors.


Results

Once you have run the outer file, you can do speed tests to compare the two ways of getting 1,000 bootstrap samples. We use the @elapsed command to time each code. Here are the results:

julia> @elapsed pmap(bootstrapSamples,[b,b,b,b])
From worker 2: hi
From worker 3: hi
From worker 4: hi
From worker 5: hi
From worker 5: bye
From worker 2: bye
From worker 4: bye
From worker 3: bye
8.687256164

Each of the processors tells us “hi” and “bye” when it begins and finishes its job, respectively. The total time was just under 9 seconds. By comparison, the following code will compute 1,000 bootstrap samples on only one processor, as in our previous work:

julia> @elapsed bootstrapSamples(B)
hi
bye
21.335191688

So, in the absence of parallel processing, the same result would require just over 21 seconds. Thus, parallel processing on a four-core personal computer allowed us to reduce the time required to bootstrap our estimator by a factor greater than 3.


Bradley J. Setzler

Julia Attracts an International Audience

I created this site 2 days ago to help economists learn Julia. I am pleasantly surprised that so many share this interest. Below, see the number of views of this site over these 2 days by country. Hopefully, all of this momentum in favor of Julia will speed its accumulation of useful packages and documentation.

Please, do not hesitate to comment or email me with ideas to improve the codes I post or ideas of worthwhile codes to work on next. Thank you for visiting this site.


JuliaEconomics_Stats_1JuliaEconomics_Stats_2

 


Bradley J. Setzler