Thesis No: 602221
Performance evaluation of logistic regression, linear discriminant analysis, and classification and regression trees under controlled conditions
Author: CAHİT POLAT
Advisor: DR. KATHY GREEN
Location: University of Denver / Foreign Institute
Subject: Computer Engineering and Computer Science and Control; Statistics
Status: Approved
Degree: Doctorate
Language: English
Year: 2018
Length: 173 pp.
Logistic Regression (LR), Linear Discriminant Analysis (LDA), and Classification and Regression Trees (CART) are common classification techniques for predicting group membership. Because these methods serve similar purposes through different procedures, it is important to evaluate their performance under different controlled conditions. With this information in hand, researchers can apply the optimal method for given conditions. Previous research reported the effects of conditions such as sample size, homogeneity of variance-covariance matrices, effect size, and predictor distributions. This research focused instead on the effects of correlation between predictor variables, number of predictor variables, number of groups in the outcome variable, and group size ratio on the performance of LDA, LR, and CART. Data were simulated with Monte Carlo procedures in R statistical software, and a factorial ANOVA with follow-ups was employed to evaluate the effect of conditions on the performance of each technique, as measured by the proportion of correctly predicted observations for all groups and for the smallest group. In most conditions, for both outcome measures, CART outperformed LDA and LR. However, in some conditions with a higher number of predictor variables and a higher number of groups with low predictor variable correlation, LR outperformed CART. Meaningful effects of method, correlation, number of predictor variables, number of groups, and group size ratio were observed on the accuracy of group membership prediction. The effects of correlation, group size ratio, number of groups, and number of predictor variables on prediction accuracy were larger for LDA and LR than for CART. For all three methods, lower correlation and a greater number of predictor variables yielded higher prediction accuracy.
Balanced rather than imbalanced data and a greater number of groups led to lower prediction accuracy across all groups, but a greater number of groups led to better prediction for the smallest group. Based on these results, researchers are encouraged to apply CART in most conditions. The exception is when there are many predictor variables (around 10 or more), more than two groups, and low correlations between predictor variables; in that case LR might provide more accurate results.
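
The two outcome measures named in the abstract, the proportion of correctly predicted observations for all groups and for the smallest group, can be illustrated with a minimal sketch. It is written in Python for illustration only (the thesis itself used R), and the function name `accuracy_measures`, the tie-breaking rule, and the example labels are hypothetical rather than taken from the thesis.

```python
from collections import Counter


def accuracy_measures(y_true, y_pred):
    """Return (overall accuracy, accuracy within the smallest true group)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    # Proportion of all observations whose group was predicted correctly.
    overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # The smallest group is the least frequent label among the true labels.
    # Ties are broken by sorted label order; that rule is an assumption.
    counts = Counter(y_true)
    smallest = min(sorted(counts), key=lambda g: counts[g])
    # Proportion of the smallest group's observations predicted correctly.
    hits = sum(t == p for t, p in zip(y_true, y_pred) if t == smallest)
    return overall, hits / counts[smallest]


# Hypothetical imbalanced example: 8 observations in group "A", 2 in group "B".
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 8 + ["A", "B"]  # one of the two "B" cases is missed
overall, small = accuracy_measures(y_true, y_pred)
print(overall, small)
```

In this made-up example the overall measure looks strong while the smallest-group measure is much lower, which is why reporting both measures is informative when group sizes are imbalanced.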