Monday, February 7, 2011

Example 8.24: MplusAutomation and Mplus

In recent entries (here, here, and here), we've been fitting a series of latent class models using SAS and R. One of the most powerful and commonly used software packages for latent class model estimation is Mplus. This commercial software supports many features that are not presently available in R or SAS. As an example, while the randomLCA package supports data with clustering, and the poLCA package supports polytomous variables, neither package supports both clustering and polytomous variables in the same model.

In this entry, we demonstrate how to use the R package MplusAutomation to automate the process of fitting and interpreting a series of models using Mplus.

The key to all this magic is the template file, which is used to create the Mplus input files. Here we demonstrate automating the creation of 4 models with 1, 2, 3, and 4 latent classes, using a template file called mplus.txt.

[[init]]
iterators = classes;
classes = 1:4;
dir = "Z:/field/blog";
filename = "mplus-[[classes]]-class-.inp";
outputDirectory = [[dir]];
[[/init]]
TITLE: [[classes]]-class
DATA: FILE IS mplus.dat;
VARIABLE: NAMES ARE homeless cesdcut satreat linkstatus;
CLASSES = c ([[classes]]);
CATEGORICAL = all;
ANALYSIS: TYPE = MIXTURE;
STARTS = 2000 200;
STITERATIONS=1000;
OUTPUT: TECH1 TECH10;
SAVEDATA: FILE IS "mplus-[[classes]]-class.cprob";
SAVE IS CPROB;

The package's createModels() function will loop through the four possible numbers of classes (1 through 4) and create separate Mplus input files. Multiple iterators are supported, and they can be referenced numerically or symbolically. This can be very helpful if there are different variables being used in each of the models, or other variations in the model.
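To make the substitution concrete, here is a minimal sketch in base R of the kind of tag replacement described above. It is only a toy stand-in, not MplusAutomation's actual implementation: the expand() helper and the inputs list are hypothetical names introduced for illustration, and only three lines of the template are used.

```r
# Illustrative only: a toy version of the [[classes]] substitution,
# not the code that createModels() actually runs.
template = c(
  "TITLE: [[classes]]-class",
  "CLASSES = c ([[classes]]);",
  "SAVEDATA: FILE IS \"mplus-[[classes]]-class.cprob\";"
)
filename_pattern = "mplus-[[classes]]-class-.inp"

# hypothetical helper: replace every [[classes]] tag with one iterator value
expand = function(text, value) {
  gsub("[[classes]]", as.character(value), text, fixed=TRUE)
}

# loop over the iterator values, as the template's "classes = 1:4;" requests
classes = 1:4
inputs = lapply(classes, function(k) expand(template, k))
names(inputs) = sapply(classes, function(k) expand(filename_pattern, k))

names(inputs)
inputs[["mplus-1-class-.inp"]]
```

Each element of inputs holds the expanded lines for one model, keyed by the file name that the template's filename setting would produce.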

When the createModels() function is run for this example, it generates 4 files. The file mplus-1-class-.inp looks like:

TITLE: 1-class
DATA: FILE IS mplus.dat;
VARIABLE: NAMES ARE homeless cesdcut satreat linkstatus;
CLASSES = c (1);
CATEGORICAL = all;
ANALYSIS: TYPE = MIXTURE;
STARTS = 2000 200;
STITERATIONS=1000;
OUTPUT: TECH1 TECH10;
SAVEDATA: FILE IS "mplus-1-class.cprob";
SAVE IS CPROB;

We call Mplus using the runModels() function after reading in the data and writing out a dataset in Mplus format (with prepareMplusData). Then the results can be collated and displayed.

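library(MplusAutomation)
# read the help.csv dataset
ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
# dichotomize the cesd score at 20
ds$cesdcut = ifelse(ds$cesd>20, 1, 0)
# keep complete cases for the four indicators
smallds = na.omit(ds[, c("homeless", "cesdcut", "satreat", "linkstatus")])
# write the data in Mplus format
prepareMplusData(smallds, file="mplus.dat")
# create the four input files from the template, then run Mplus on each
createModels("mplus.txt")
runModels()
# collect the fit statistics and the full results from the output files
summary=extractModelSummaries()
models=readModels()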

We see that the three-class solution has the lowest AICC (and the lowest AIC), while the one-class solution has the lowest aBIC (and the lowest BIC).

> summary
    Title                                  AnalysisType
1 1-class MIXTURE; STARTS = 2000 200; STITERATIONS=1000
2 2-class MIXTURE; STARTS = 2000 200; STITERATIONS=1000
3 3-class MIXTURE; STARTS = 2000 200; STITERATIONS=1000
4 4-class MIXTURE; STARTS = 2000 200; STITERATIONS=1000
    DataType Estimator Observations Parameters        LL
1 INDIVIDUAL       MLR          431          4 -1045.656
2 INDIVIDUAL       MLR          431          9 -1040.513
3 INDIVIDUAL       MLR          431         14 -1032.484
4 INDIVIDUAL       MLR          431         19 -1032.067
  LLCorrectionFactor      AIC      BIC     aBIC Entropy
1              1.000 2099.313 2115.577 2102.883      NA
2              1.019 2099.026 2135.621 2107.060   0.349
3              1.000 2092.967 2149.893 2105.465   0.941
4              1.000 2102.134 2179.390 2119.095   0.832
      AICC           Filename
1 2099.407 mplus-1-class-.out
2 2099.454 mplus-2-class-.out
3 2093.977 mplus-3-class-.out
4 2103.983 mplus-4-class-.out
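As a rough check on the comparison above, the information criteria can be recomputed from the log-likelihoods, parameter counts, and sample size shown in the output. The sketch below uses the usual textbook definitions; that Mplus uses exactly these formulas is an assumption on our part, though they reproduce the printed values to within rounding. The fits data frame and the best vector are names introduced here for illustration.

```r
# values copied from the summary output above
fits = data.frame(
  classes = 1:4,
  LL = c(-1045.656, -1040.513, -1032.484, -1032.067),
  p = c(4, 9, 14, 19)
)
n = 431

# textbook definitions (assumed, not taken from the Mplus documentation)
fits$AIC = -2*fits$LL + 2*fits$p
fits$BIC = -2*fits$LL + fits$p*log(n)
fits$aBIC = -2*fits$LL + fits$p*log((n + 2)/24)
fits$AICC = fits$AIC + 2*fits$p*(fits$p + 1)/(n - fits$p - 1)
round(fits, 3)

# number of classes that minimizes each criterion
best = sapply(fits[, c("AIC", "BIC", "aBIC", "AICC")],
  function(x) fits$classes[which.min(x)])
best
```

The same which.min() idea could be applied directly to the columns of the summary object, for example which.min(summary$AICC), when the Mplus output is available.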

Additional results for each of the specific models can be found in the returned objects.

> names(models)
[1] "mplus.1.class..out" "mplus.2.class..out"
[3] "mplus.3.class..out" "mplus.4.class..out"
> names(models$mplus.1.class..out)
[1] "parameters" "savedata" "summaries"
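The doubled periods in these element names look odd, but they match what base R's make.names() produces when applied to the output file names: each hyphen becomes a period. Whether readModels() calls make.names() internally is an assumption; the sketch below only shows that the two sets of names agree.

```r
# the four output file names listed in the Filename column above
outfiles = paste0("mplus-", 1:4, "-class-.out")

# make.names() replaces characters that are invalid in R names with "."
make.names(outfiles)
```

Knowing the pattern makes it easier to build the element name for a given model, for example models[[make.names("mplus-3-class-.out")]] for the three-class solution.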

In a future entry, we'll explore more ways to utilize the information in the Mplus output, including displaying the prevalences in each group in a graphical manner.
