Logistic Regression (LR), Linear Discriminant Analysis (LDA), and
Classification and Regression Trees (CART) are common classification techniques for
prediction of group membership. Since these methods are applied for similar purposes
with different procedures, it is important to evaluate the performance of these methods
under different controlled conditions. With this information in hand, researchers can
apply the optimal method for certain conditions. Following previous research which
reported the effects of conditions such as sample size, homogeneity of variance-covariance
matrices, effect size, and predictor distributions, this research focused on
the effects of correlation between predictor variables, number of predictor variables,
number of groups in the outcome variable, and group size ratios on the performance
of LDA, LR, and CART. Data were simulated with Monte Carlo procedures in R
statistical software and a factorial ANOVA with follow-ups was employed to evaluate the
effect of conditions on the performance of each technique as measured by proportions of
correctly predicted observations for all groups and for the smallest group.
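
The two outcome measures can be illustrated with a minimal sketch. This is not the study's code (the simulations were run in R); the function name `accuracy_measures` and the toy labels are hypothetical, and the sketch assumes that the "smallest group" is the group with the fewest observed members.

```python
from collections import Counter

def accuracy_measures(actual, predicted):
    """Return (overall accuracy, accuracy within the smallest actual group).

    Hypothetical helper for illustration only. If several groups tie for
    smallest, the first one encountered in `actual` is used.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")
    # Proportion of correctly predicted observations across all groups.
    overall = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
    # Identify the group with the fewest observed members.
    counts = Counter(actual)
    smallest = min(counts, key=counts.get)
    # Proportion of correctly predicted observations within that group.
    hits = sum(a == p for a, p in zip(actual, predicted) if a == smallest)
    return overall, hits / counts[smallest]

# Toy example with an 80:20 group size ratio (invented labels, not study data).
actual = ["A"] * 8 + ["B"] * 2
predicted = ["A"] * 7 + ["B"] + ["B", "A"]
overall, small = accuracy_measures(actual, predicted)
```

In this toy case 8 of 10 observations are classified correctly overall, but only 1 of the 2 members of the smaller group, which shows why the two measures are reported separately for imbalanced data.
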
In most conditions, for both outcome measures, CART performed better than LDA and
LR. However, in some conditions with a higher number of predictor variables and
groups and low correlation between predictor variables, LR outperformed CART.
Meaningful effects of method, correlation, number of predictor variables, number of
groups, and group size ratio were observed on the accuracy of group membership
prediction. The effects of correlation, group size ratio, number of groups, and number
of predictor variables on prediction accuracy were larger for LDA and LR than for
CART. For all three methods, lower correlation and a greater
number of predictor variables yielded higher prediction accuracies. Balanced rather
than imbalanced data and a greater number of groups led to lower prediction
accuracies for all groups, but having more groups led to better predictions for
the smallest group. In general, based on these results, researchers are encouraged to apply
CART in most conditions except for the cases when there are many predictor variables
(around 10 or more) and non-binary groups with low correlations between predictor
variables, when LR might provide more accurate results.