Clinical data in survey trials are commonly incomplete and inconsistent for several reasons. One objective of this study is to identify and eliminate inconsistent data as an important data mining preprocessing step. We define three types of incomplete data: missing data due to skip pattern (SPMD), undetermined missing data (UMD) and genuine missing data (GMD). Identifying the type of missing data is another important objective, since not all missing data types can be treated the same way. This goal cannot be achieved manually on large data sets from complex surveys, since each subject must be processed individually. Experiments are conducted on a longitudinal questionnaire (MESA). The MESA dataset was collected between 1983 and 1990 to create a set of questions that can reliably predict future Urinary Incontinence (UI). The analyses are accomplished in a mathematical framework by exploiting the graph theoretic structure inherent in the questionnaire. An undirected graph is built from mutually inconsistent responses, together with its complement graph. The responses not in the largest maximal clique of the complement graph are considered inconsistent. Further, all potential paths in the questionnaire's graph are considered, based on the responses of subjects, to identify each type of incomplete data. Once SPMD is determined, the MESA data is stratified into strata with potentially different UI risk factors. Rough set imputation is applied to the GMD portion of the incomplete data. The ReliefF attribute selection technique and logistic regression are used on the preprocessed MESA data to determine the potential predictive factors and their corresponding prediction probabilities, which form the continence index. The incomplete data analysis results show 15.4% GMD, 9.8% SPMD, 12.9% UMD and 0.021% inconsistent data. The proposed preprocessing methods are prerequisites for any data mining of clinical survey data. 
The predictive index can be applied for immediate screening and for predicting future urinary incontinence in older women of comparable demographics.
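
The clique-based inconsistency check described above can be illustrated with a minimal sketch. It assumes that each response of a subject is a node and that the pairs of mutually inconsistent responses are already known; the function names and the response labels are hypothetical and are not taken from the MESA questionnaire. The sketch uses a plain Bron-Kerbosch search with pivoting, which is one standard way to enumerate maximal cliques and is not necessarily the procedure used in the study. If several cliques share the largest size, this sketch keeps whichever it finds first.

```python
from itertools import combinations


def complement_graph(nodes, edges):
    """Return adjacency sets for the complement of an undirected graph."""
    edge_set = {frozenset(e) for e in edges}
    adj = {n: set() for n in nodes}
    for u, v in combinations(nodes, 2):
        # Two responses are adjacent in the complement when they do NOT conflict.
        if frozenset((u, v)) not in edge_set:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def largest_maximal_clique(adj):
    """Bron-Kerbosch with pivoting; returns one largest maximal clique."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            # r is a maximal clique; keep it if it is the largest so far.
            if len(r) > len(best):
                best = set(r)
            return
        pivot = max(p | x, key=lambda n: len(adj[n] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best


def inconsistent_responses(responses, conflicts):
    """Responses outside the largest maximal clique of the complement graph."""
    adj = complement_graph(responses, conflicts)
    return set(responses) - largest_maximal_clique(adj)


# Hypothetical responses of one subject and hypothetical conflicting pairs.
responses = ["never_leaks", "leaks_daily", "uses_no_pads", "age_over_60"]
conflicts = [("leaks_daily", "never_leaks"), ("leaks_daily", "uses_no_pads")]
flagged = inconsistent_responses(responses, conflicts)
```

In this toy case the largest mutually consistent set is `never_leaks`, `uses_no_pads` and `age_over_60`, so only `leaks_daily` is flagged as inconsistent.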