# Latent Class Mixture Model with Dirichlet Allocation

## Purpose of the Model

The overall purpose of the model is to identify the latent classes underlying a set of i.i.d. binomially distributed frequency counts. Each count could be the number of 'true' or 'yes' answers in a set of i.i.d. Bernoulli-distributed 0-1 (no-yes, false-true) answers to test or questionnaire items.
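To make this data-generating assumption concrete, here is a minimal Python sketch (an illustration only, not part of the BUGS model). One subject answers 40 items, each a Bernoulli trial with an arbitrarily chosen success probability of 0.6, and the frequency count *k* of 1-answers is then one draw from a binomial distribution. The helper names are hypothetical.

```python
import random


def simulate_answers(n_items, prob, rng):
    """Draw n_items i.i.d. Bernoulli(prob) answers coded 0 (no/false) or 1 (yes/true)."""
    return [1 if rng.random() < prob else 0 for _ in range(n_items)]


def frequency_count(answers):
    """The frequency count k is simply the number of 1-answers."""
    return sum(answers)


rng = random.Random(2024)          # fixed seed for reproducibility
answers = simulate_answers(40, 0.6, rng)
k = frequency_count(answers)       # one draw from Binomial(40, 0.6)
print(k, "of", len(answers), "answers are 'yes'")
```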

The model is designed (1) to detect the number *ng* of latent classes, (2) to estimate the group-specific *ability* parameters *ability[g]* for generating 'true' or 'yes' responses, (3) to identify for each of the *p* subjects the class *z_max[i]* with the greatest membership probability, (4) to estimate the person-specific ability parameter *theta[i]* for generating 'true' or 'yes' responses, and (5) to model the person-specific parameter *theta[i]* as a *mixture* of the group-specific ability parameters *ability[g]*, where the person-specific class-membership probabilities *z[i,g]* are the *mixture coefficients*.

## BUGS Code for Models with Two or Three Hypothesized Latent Classes and Latent Dirichlet Allocation (LDA)

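This is a code walk-through. Each group-specific ability parameter *ability[g]* is drawn from a *beta(1,1)* prior, which is the same as a *uniform(0,1)* distribution. Each sample is therefore constrained like a conventional probability, *0 <= ability[g] <= 1*. Unlike the class probabilities, however, the abilities are drawn independently and need not sum to 1 (the posterior means reported below, 0.0732 and 0.9745, sum to about 1.05).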
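A minimal Python sketch of this prior, assuming ng = 3 classes purely for illustration: `random.betavariate(1, 1)` from the standard library draws from *beta(1,1)*, i.e. uniformly on [0, 1], once per class.

```python
import random

rng = random.Random(42)   # fixed seed for reproducibility
ng = 3                    # number of hypothesized latent classes (illustrative)

# ability[g] ~ beta(1, 1), which is the uniform(0, 1) distribution;
# each class gets its own independent draw.
ability = [rng.betavariate(1.0, 1.0) for _ in range(ng)]

for g, a in enumerate(ability, start=1):
    print(f"ability[{g}] = {a:.4f}")
```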

The class membership probabilities *pr_class[g]* are sampled from a Dirichlet distribution with parameters *alpha_g[g] = 1*. This Dirichlet(1, ..., 1) distribution is the multivariate generalization of *beta(1,1)*: it is uniform over all probability vectors of length *ng*. In other words, we assume that the membership probabilities are drawn from a maximally uninformative prior.
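The Python standard library has no Dirichlet sampler, so this sketch uses the standard gamma construction: draw independent *Gamma(alpha_g, 1)* variables and divide each by their sum. The function name `dirichlet` and the choice ng = 3 are illustrative.

```python
import random


def dirichlet(alpha, rng):
    """Sample a probability vector from Dirichlet(alpha).

    Uses the gamma construction: x_g ~ Gamma(alpha_g, 1) independently,
    then normalize x_g / sum(x).
    """
    x = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(x)
    return [xg / total for xg in x]


rng = random.Random(7)
ng = 3
alpha_g = [1.0] * ng                 # flat prior: Dirichlet(1, ..., 1)
pr_class = dirichlet(alpha_g, rng)   # one draw of the class probabilities
print([round(v, 4) for v in pr_class], sum(pr_class))
```

With ng = 2 and both parameters equal to 1, the first component of such a draw is distributed as *beta(1,1)*, which is the sense in which the Dirichlet generalizes the beta prior.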

For each person *i* we sample a set of class membership probabilities *z[i,1:ng]* from a Dirichlet distribution whose parameters are biased by the class membership probabilities *pr_class[g]* and the total number of subjects *p*.
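The text does not state the exact form of these Dirichlet parameters. The sketch below assumes, purely for illustration, that every person's parameters are *p · pr_class[g]*; this parameterization and the function names are hypothetical. Under this assumption the expected value of *z[i,g]* equals *pr_class[g]*, and a larger *p* concentrates the draws more tightly around it.

```python
import random


def dirichlet(alpha, rng):
    """Dirichlet(alpha) draw via normalized independent Gamma(alpha_g, 1) variables."""
    x = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(x)
    return [xg / total for xg in x]


def sample_memberships(pr_class, p, rng):
    """Draw z[i, 1:ng] for all p subjects.

    ASSUMPTION (hypothetical parameterization): alpha[i, g] = p * pr_class[g]
    for every subject i; the original model may use a different form.
    """
    alpha = [p * pc for pc in pr_class]
    return [dirichlet(alpha, rng) for _ in range(p)]


rng = random.Random(3)
pr_class = [0.2, 0.8]   # illustrative class membership probabilities
p = 15                  # number of subjects
z = sample_memberships(pr_class, p, rng)
print([round(row[1], 3) for row in z])   # z[i, 2] for each subject
```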

Next we compute the person-specific ability parameter *theta[i]* as the expected value of the group-specific ability parameters *ability[g]*, using the person-specific class-membership probabilities *z[i,g]* as *mixture coefficients*. As a side effect we obtain the class membership *z_max[i]* of person *i*, which is the group *g* with the highest *z[i,g]*.
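In Python, the two computations for a single person might look as follows. The abilities and membership probabilities are illustrative numbers, not estimates from our analyses, and the function names are hypothetical. Because *theta[i]* is a convex combination of the *ability[g]*, it always lies between the smallest and the largest group ability.

```python
def person_ability(z_i, ability):
    """theta[i] = sum over g of z[i, g] * ability[g]."""
    return sum(zg * ag for zg, ag in zip(z_i, ability))


def most_likely_class(z_i):
    """z_max[i]: 1-based index of the largest z[i, g] (the first one wins on ties)."""
    return max(range(len(z_i)), key=lambda g: z_i[g]) + 1


ability = [0.1, 0.9]   # illustrative group-specific abilities
z_i = [0.25, 0.75]     # illustrative class-membership probabilities of person i

theta_i = person_ability(z_i, ability)   # 0.25 * 0.1 + 0.75 * 0.9, about 0.7
z_max_i = most_likely_class(z_i)         # class 2
print(theta_i, z_max_i)
```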

The third-to-last line of the program specifies that the frequency count *k[i]* of person *i* is binomially distributed with success probability *theta[i]* (the person-specific ability parameter) and *n* trials, where *n* denotes the number of items.
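Putting the walk-through together, the model can be summarized as below. The person-level Dirichlet parameters are written with a generic function *f* and the symbol *β* (both introduced here), because the text only states that they depend on *pr_class[g]* and *p*. In BUGS/JAGS syntax the last line corresponds to *k[i] ~ dbin(theta[i], n)*, where the success probability is the first argument.

$$
\begin{aligned}
\mathit{ability}_g &\sim \mathrm{Beta}(1, 1), \qquad g = 1, \dots, \mathit{ng}\\
(\mathit{pr\_class}_1, \dots, \mathit{pr\_class}_{\mathit{ng}}) &\sim \mathrm{Dirichlet}(\alpha_1, \dots, \alpha_{\mathit{ng}}), \qquad \alpha_g = 1\\
(z_{i,1}, \dots, z_{i,\mathit{ng}}) &\sim \mathrm{Dirichlet}(\beta_{i,1}, \dots, \beta_{i,\mathit{ng}}), \qquad \beta_{i,g} = f(\mathit{pr\_class}_g,\, p)\\
\theta_i &= \sum_{g=1}^{\mathit{ng}} z_{i,g}\, \mathit{ability}_g\\
\mathit{z\_max}_i &= \operatorname*{arg\,max}_{g}\, z_{i,g}\\
k_i &\sim \mathrm{Binomial}(n, \theta_i), \qquad i = 1, \dots, p
\end{aligned}
$$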

The first data set contains the original data from Lee & Wagenmakers' Exam Scores problem with one exception: the second datum is changed from 17 to 16 to make the results more clear-cut. We assume two latent classes (ng = 2). To our surprise, the ability parameters tend toward the extremes (Fig.01). There is a low-tendency group 1 with mean(ability[1]) = 0.0732 and Pr(Class=1) = 0.2107, and a high-tendency group 2 with mean(ability[2]) = 0.9745 and Pr(Class=2) = 0.7893 (Fig.04). Through the mixture, both ability parameters generate the person-specific ability parameters theta[i]. Pr(Class) and all theta[i] are within our expectations. The class assignments z_max (Fig.02-03) show that subjects 6-15 are classified without any doubt into latent class 2 (the high-tendency class). The classification of subjects 1-5 is uncertain: all of these subjects except subject 2 have a higher classification probability for class 2, but at the same time a nonzero probability for class 1. This shows why we lowered the score of subject 2.
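For reference, here is this data set in Python form. The scores are reproduced as we recall them from the exam-scores example in Lee & Wagenmakers (15 subjects, 40 items) and should be checked against the book; the only change is the one described above for subject 2.

```python
n = 40   # number of items

# Exam scores of the p = 15 subjects (Lee & Wagenmakers; please verify)
k_original = [21, 17, 21, 18, 22, 31, 31, 34, 34, 35, 35, 36, 39, 36, 35]

# Modified first data set: subject 2 scores 16 instead of 17
k = list(k_original)
k[1] = 16          # Python lists are 0-based, so index 1 is subject 2
p = len(k)

# Observed proportions of correct answers per subject
proportions = [ki / n for ki in k]
print(p, k)
print([round(x, 3) for x in proportions])
```

With these scores, subjects 1-5 answer between 40 % and 55 % of the items correctly and subjects 6-15 between 77.5 % and 97.5 %, which is consistent with the uncertain classification of the first five subjects.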

When we analyze the same data set under the assumption of ng = 3 latent classes, the posterior distributions of the ability parameters are bimodal and nearly identical in shape (Fig.05). We suspect that the hypothesis ng = 3 should be abandoned in favor of ng = 2. The class assignments z_max[i] support this conclusion: apart from class 2, they show no clear class preference.

A second data set was generated by appending copies of the first five data points as subjects 16-20, adding 10 to n, and adding 10 to the scores of subjects 11-15. Intuitively, we would expect three classes with Pr(Class=1) = 0.50, Pr(Class=2) = 0.25, and Pr(Class=3) = 0.25. The results (Fig.09-13) demonstrate a clear separation of two latent classes.
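The construction of the second data set can be written out explicitly. This Python sketch assumes that the copied subjects 16-20 carry the modified scores of the first data set (including k[2] = 16); the group labels "low", "middle" and "high" are our own illustrative names for the intuitively expected classes, whose sizes reproduce the proportions 0.50, 0.25 and 0.25.

```python
# Modified first data set: p = 15 subjects, n = 40 items, k[2] = 16
k1 = [21, 16, 21, 18, 22, 31, 31, 34, 34, 35, 35, 36, 39, 36, 35]
n1 = 40

# Second data set:
#   - subjects 16-20 are copies of subjects 1-5,
#   - the number of items grows by 10,
#   - subjects 11-15 get 10 additional correct answers.
k2 = k1 + k1[:5]
for i in range(10, 15):      # 0-based indices of subjects 11-15
    k2[i] += 10
n2 = n1 + 10

# Intuitive grouping (1-based subject numbers) and its class proportions
groups = {
    "low": list(range(1, 6)) + list(range(16, 21)),
    "middle": list(range(6, 11)),
    "high": list(range(11, 16)),
}
shares = {name: len(members) / len(k2) for name, members in groups.items()}
print(n2, k2)
print(shares)   # {'low': 0.5, 'middle': 0.25, 'high': 0.25}
```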

The same holds for the analysis of this data set under the hypothesis ng = 3 (Fig.14-19): classes 2 and 3 cannot be clearly discriminated. This means that subjects 1-5 and 16-20 belong to one latent class and subjects 6-15 to the other.

What still needs reflection are the estimates of Pr(Class), which are less clear-cut. Overall, however, the results are very promising.

To our knowledge this model is new. It does not require the EM algorithm. In the near future we will study how our model relates to the LDA topic model (Blei, D.M.; Ng, A.Y.; Jordan, M.I.; Latent Dirichlet Allocation, Journal of Machine Learning Research, 3 (4-5), 2003, pp. 993-1022).

## JAGS Code for Models with Two or Three Hypothesized Latent Classes and Latent Dirichlet Allocation (LDA)

Various authors prefer JAGS to BUGS (e.g. Kruschke, J.K., Doing Bayesian Data Analysis, 2015, 2/e, Academic Press, ISBN 978-0-12-405888-0). The main differences are described in chapter 12.6 of Lunn et al. (Lunn, D.; Jackson, Chr.; Best, N.; Thomas, A.; Spiegelhalter, D.; The BUGS Book: A Practical Introduction to Bayesian Analysis, Boca Raton, FL, USA: CRC Press, 2013, ISBN 978-1-58488-849-9). We wanted to explore whether there are significant differences in modelling, and found that there are subtle differences between the two modelling languages. The R integration of JAGS is better than that of OpenBUGS: a model can be written as a user-defined, parameterless R function, so all editing, debugging and testing can be done in the excellent R IDE RStudio.

We had to revise our OpenBUGS model in three places: (1) we replaced the BUGS code snippet *model{...}* by the R code snippet *function(){...}*; (2) because the *rank(...)* function is defined differently in BUGS and JAGS, we had to recode the computation of the class assignment *z_max[]*; (3) all data are encoded as R statements and passed as arguments to the *jags* function of the *R2jags* package.
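The two ways of obtaining *z_max[i]* can be illustrated in Python (not in BUGS or JAGS syntax). The class with the largest *z[i,g]* is the one whose ascending rank equals *ng*; the same class can be found without ranks by comparing each *z[i,g]* with the maximum. Both function names are hypothetical. The two agree whenever the maximum is unique, which holds with probability 1 for continuous Dirichlet draws; with ties, the indicator sum would add up several class indices.

```python
def z_max_by_rank(z_i):
    """Class (1-based) whose ascending rank among z[i, 1:ng] equals ng."""
    ng = len(z_i)
    order = sorted(range(ng), key=lambda g: z_i[g])   # indices from smallest to largest
    rank = [0] * ng
    for r, g in enumerate(order, start=1):
        rank[g] = r                                     # rank 1 = smallest, rank ng = largest
    return rank.index(ng) + 1


def z_max_by_indicator(z_i):
    """Rank-free variant: sum over g of g * [z[i, g] == max(z[i, ])]."""
    m = max(z_i)
    return sum(g for g, zg in enumerate(z_i, start=1) if zg == m)


for z_i in ([0.2, 0.5, 0.3], [0.6, 0.4], [0.1, 0.2, 0.3, 0.4]):
    print(z_i, z_max_by_rank(z_i), z_max_by_indicator(z_i))
```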

The results for two latent classes and the slightly modified data of Lee & Wagenmakers (k[2]=16 instead of 17, 15 subjects, 40 items) are nearly identical to those of the OpenBUGS analysis (Fig.01-04): the posterior means of the abilities and class probabilities differ by only 0.08.

Nearly the same is true for the extended problem (k[2]=16 instead of 17, k[16]-k[20] added, 20 subjects, 50 items; Fig.05-08).