Data Mining for Healthcare Data: A Comparison of Neural Network Algorithms

Classification is an important tool for extracting useful information from healthcare datasets, and it can be applied to recognize diseases from their symptoms. This paper compares and evaluates three neural network classification algorithms on healthcare datasets: Multilayer Perceptron, Radial Basis Function, and Voted Perceptron. The algorithms are evaluated on the accuracy, precision, mean absolute error, and root mean squared error of the resulting classifiers, and on classifier training time. All the algorithms are applied to five multivariate healthcare datasets: Echocardiogram, SPECT Heart, Chronic Kidney Disease, Mammographic Mass, and EEG Eye State. Among the three algorithms, this study concludes that Multilayer Perceptron is the best for the chosen datasets. It achieves the best accuracy, precision, and error rates, producing highly accurate classifier models with low error rates, but its training time suffers, especially on large datasets. Voted Perceptron has the lowest accuracy and precision. Future research could investigate whether the number of hidden layers in the Multilayer Perceptron's architecture has a significant impact on training time.


INTRODUCTION
The use of information technology in many fields of human life has increased the amount of digital data. In a healthcare system, for example, the database stores a huge number of patients' medical records, including the results of medical examinations such as x-ray and ultrasound images. These healthcare data hold valuable knowledge, such as hidden relationships and patterns, that can be used to provide better diagnoses. Data mining is widely used to analyze large amounts of data, find the relationships and patterns hidden inside them, and produce valuable and useful knowledge. Combining algorithms from artificial intelligence, machine learning, statistics, and database systems, data mining provides solutions for handling the rapid growth of data. It has been used for data analysis in many fields, such as finance, marketing, insurance, the retail industry, education, biology, telecommunications, fraud detection, intrusion detection, bioinformatics (gene finding, disease diagnosis and prognosis, protein reconstruction), and healthcare. The data sources can be databases, data warehouses, and the web [1].
The process of discovering valuable information from data can be automatic or semiautomatic [2]. Mining data automatically is called clustering, or unsupervised learning. Unsupervised learning means that the learning process does not rely on predefined classes and class-labeled training data; it is a form of learning by observation. Semiautomatic mining, on the other hand, is called classification, or supervised learning, and performs 'learning by examples': it depends on class labels provided in advance. Classification is considered an important tool for extracting useful information from medical datasets, and it can also be applied to recognize diseases from their symptoms.
This study set out to analyze the performance of classification techniques on healthcare datasets using the Waikato Environment for Knowledge Analysis (WEKA) machine learning tool [3]. Three neural network approaches, Radial Basis Function (RBF), Voted Perceptron (VP), and Multilayer Perceptron (MLP), were tested on five multivariate healthcare datasets taken from the University of California Irvine (UCI) repository [4].

RELATED WORKS
A number of studies have evaluated data mining classification techniques on healthcare data, comparing them to find the most suitable one for predicting health issues. Venkatesan & Velmurugan evaluated the performance of decision tree algorithms (J48, CART, ADT, and BFT) on a breast cancer dataset. Their experimental results show that J48 achieved the highest accuracy (99%), followed by BFT (98%), ADT (97%), and CART (96%) [5].
Rahman & Afroz compared the classification algorithms J48, J48graft, Bayes Net, MLP, JRip, and Fuzzy Lattice Reasoning (FLR) for diabetes diagnosis using the Pima Indian Diabetes dataset. They found that the J48graft classifier performed best, with an accuracy of 81.33% and a model building time of 0.135 seconds [6].
Akinola & Oyabugbe compared the J48, Naïve Bayes (NB), and MLP algorithms on Ebola disease datasets. The study was designed to determine how the accuracy and training time of the classification algorithms change as dataset size increases. The results show that as dataset size increased, the accuracy of NB decreased. J48 and MLP achieved high accuracies even on small datasets, and their accuracies stabilized at 100% as dataset size increased. In terms of training time, Naïve Bayes was the fastest, followed by J48 and MLP [7].
Danjuma & Osofisan applied the J48, NB, and MLP algorithms to the Erythemato-squamous disease dataset from the UCI repository and evaluated their performance based on classification accuracy, True Positive (TP) rate, and ROC area (AUC). The comparative analysis shows that the Naïve Bayes classifier performed best, with an accuracy of 97.4%, a TP rate of 97.5%, and an AUC of 99.9%. The MLP classifier was second best, with an accuracy and TP rate of 96.6% and an AUC of 99.8%. The J48 classifier performed worst, with an accuracy of 93.5%, a TP rate of 93.6%, and an AUC of 96.6% [8].
Alkrimi et al. evaluated the RBF neural network, Support Vector Machine (SVM), and k-Nearest Neighbor (k-NN) algorithms for classifying blood cell images. They found that RBF gave the best classification results, with an accuracy of 98%, followed by SVM with 97%, while k-NN achieved a moderate accuracy of 79% [9].
Amin & Habib compared three classification algorithms, J48, NB, and MLP, evaluating them on accuracy, Kappa statistic, and classification time complexity. The best algorithm for the hematological data was J48, with an accuracy of 97.16% and a total model building time of 0.03 seconds. The NB classifier had the lowest average error of the three, at 29.71% [10].
Durairaj & Deepika conducted a comparative assessment of decision tree (J48), NB, and lazy classifiers for predicting leukemia cancer. As in [6] and [10], the researchers analyzed the experimental results using two parameters, accuracy and time. The results show that all the algorithms perform well in predicting leukemia cancer. NB took 0.16 seconds to build a prediction model with an accuracy of 91.17%, outperforming the other two. J48 differed from NB only slightly in time. The lazy classifier was the fastest (0.02 seconds) but produced a less accurate classifier (82.35%) than the decision tree and NB [11].
Barnaghi Azuraliza evaluated decision tree (J48 and LMT), Bayesian (Bayes Net and NB), and neural network (MLP and RBF) classifiers on the Liver Disorder dataset. They used percentage split as the assessment method to observe whether the accuracy of the classifiers is affected by the size of the training set. The accuracy of the tested algorithms increased, with fluctuations, as the training set grew. MLP, RBF, and J48 obtained the highest accuracy (79.41%) with a 90-10 training-test split [12].
Gupta, Rawal, Narasimhan & Shiwani compared the accuracy, sensitivity, and specificity of four classification algorithms, J48graft, Bayes Net, MLP, and JRip, on a diabetes dataset. The results indicate that J48graft has the highest accuracy, at 81.33% [13].
Kumar & Sahoo evaluated three Bayesian algorithms (Bayes Net, NB, and Naïve Bayes Updateable), two neural network algorithms (MLP and VP), and the J48 decision tree. They analyzed the classification time, Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE) on two real-time multivariate healthcare datasets, Sick and Breast Cancer. They observed that Naïve Bayes Updateable took the least time to build the classifier on both datasets (0.03 seconds and 0.0 seconds), whereas MLP took the longest. In the MAE and RMSE analysis, the J48 classifier had the minimum MAE on the small dataset (Breast Cancer) but not on the large one (Sick). Overall, J48 performed best, as it classified more instances correctly than the other techniques [14].
This paper is organized as follows: section 2 introduces the works related to this research, section 3 describes the methodology, section 4 explains the experimental results of the three algorithms, and section 5 provides the conclusion.

MATERIALS AND METHODS
The steps that make up the methodology used in this research to compare the performance of classification algorithms are shown in Fig. 1. The research was conducted in four main steps: data collection, data preprocessing, experimentation, and result analysis. The first step is collecting the datasets needed for the experiments: five healthcare datasets were downloaded from the UCI repository, as shown in Table I. The next step is preprocessing. WEKA accepts several data formats, including ARFF, CSV, C4.5, and XRFF, and the ARFF format is used in this research. All the datasets except Chronic Kidney Disease are available in .txt format, so these four had to be converted into ARFF. The data were loaded into Microsoft Excel and saved in CSV format, and then converted into .arff files using WEKA. The third step in the methodology is conducting the experiments. The three neural network classification algorithms under test, RBF, VP, and MLP, are briefly discussed below.
a. Radial Basis Function (RBF). RBF is a feed-forward network comprising two layers, not counting the input layer, and differs from an MLP in the way its hidden units perform computations. Each hidden unit represents a particular point in input space, and its output for a given instance depends on the distance between its point and the instance: the closer these two points, the stronger the output. RBF implements a Gaussian radial basis function network. The output layer of an RBF network is the same as that of an MLP; it takes a linear combination of the outputs of the hidden units [2].
b. Voted Perceptron (VP). VP is based on the perceptron algorithm developed by Rosenblatt [15]. It works well for data that are linearly separable with a large margin.
The perceptron algorithm classifies data by iterating repeatedly through the training data, instance by instance, and updating the weight vector every time an instance is misclassified by the weights learned so far. The weight vector is updated by adding the instance's attribute values to it or subtracting them from it, so the final weight vector is just the sum of the misclassified instances. The perceptron predicts the class of an instance according to whether the weighted sum of its attribute values is greater or less than zero [2]. The voted variant keeps every intermediate weight vector together with a count of how many training instances it survived without an update, and predicts by a vote of these vectors weighted by their counts.
c. Multilayer Perceptron (MLP). An MLP's architecture is characterized by the number of layers, the number of nodes in each layer, the transfer function used in each layer, and how the nodes in each layer are connected to nodes in adjacent layers [15]. MLP is a feed-forward neural network trained with the backpropagation algorithm, with one or more hidden layers between the input and output layers. Each layer is made up of units. The inputs to the network correspond to the attributes measured for each training instance and are fed simultaneously into the units making up the input layer. They then pass through the input layer, where they are weighted and fed simultaneously to a second layer of 'neuronlike' units, known as a hidden layer. The outputs of the hidden units can be input to another hidden layer. The weighted outputs of the last hidden layer are input to the units making up the output layer [1].
The datasets were tested using WEKA's classifiers, as shown in Table II. The RBF classifier implements a normalized Gaussian radial basis function network, the VP classifier implements Freund and Schapire's voted perceptron algorithm, and the MLP classifier uses backpropagation to classify instances [3].
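To make the VP description concrete, the following is a minimal Python sketch of voted perceptron training and prediction in its basic linear form, following the Freund and Schapire algorithm as we understand it. It is an illustration, not WEKA's VotedPerceptron code. It assumes numeric attributes, labels coded as -1/+1, and a constant 1.0 appended to every instance as a bias input; the helper names train_voted_perceptron and predict_voted are hypothetical.

```python
from typing import List, Sequence, Tuple

# A trained model: every intermediate weight vector paired with its vote count.
Model = List[Tuple[List[float], int]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def sign(z: float) -> int:
    # Assumption: a score of exactly zero is treated as the negative class.
    return 1 if z > 0 else -1


def with_bias(x: Sequence[float]) -> List[float]:
    # Assumption: append a constant 1.0 so the last weight acts as a bias term.
    return list(x) + [1.0]


def train_voted_perceptron(X: Sequence[Sequence[float]], y: Sequence[int],
                           epochs: int = 1) -> Model:
    """Voted perceptron training; labels must be -1 or +1."""
    w = [0.0] * (len(X[0]) + 1)
    count = 0  # votes for the current vector; set to 1 after each update
    model: Model = []
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            x = with_bias(xi)
            if sign(dot(w, x)) == yi:
                count += 1  # current vector survives one more instance
            else:
                if count > 0:
                    model.append((w, count))  # retire the old vector with its votes
                # Perceptron update: add (yi = +1) or subtract (yi = -1) the instance.
                w = [wj + yi * xj for wj, xj in zip(w, x)]
                count = 1
    model.append((w, count))
    return model


def predict_voted(model: Model, xi: Sequence[float]) -> int:
    """Each stored vector votes sign(w . x), weighted by its count."""
    x = with_bias(xi)
    return sign(sum(c * sign(dot(w, x)) for w, c in model))


if __name__ == "__main__":
    X = [[-2.0], [-1.0], [1.0], [2.0]]
    y = [-1, -1, 1, 1]
    model = train_voted_perceptron(X, y, epochs=3)
    print([predict_voted(model, x) for x in X])  # [-1, -1, 1, 1]
```

On this toy data the model keeps two vectors: the initial zero vector with 2 votes and [1.0, 1.0] with 10 votes, so the later, more reliable vector dominates the vote. Keeping every intermediate vector, rather than only the final weights, is what distinguishes the voted perceptron from the plain perceptron.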

RESULTS AND DISCUSSION
This section presents the results of the classification experiments in WEKA. The evaluation used five parameters: percentage accuracy, precision, time taken to build the model, Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE). MAE is a statistical measure of how far estimates are from the actual values, i.e., the average absolute magnitude of the individual errors. It is computed by summing the absolute error of every instance and dividing by the number of instances in the test set that have an actual class label [1,2]. RMSE is a quadratic scoring rule that also measures the average magnitude of the error: the differences between the values predicted by a model and the corresponding observed values are squared, averaged over the instances, and the square root of that average is taken. Smaller values of both measures are better. MAE can never exceed RMSE, and the two are equal only when all errors have the same magnitude, so a large gap between them indicates that a few instances have large errors.
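The two error measures can be written down directly as MAE = (1/n) Σ|p_i − a_i| and RMSE = sqrt((1/n) Σ(p_i − a_i)²). The following Python sketch is only an illustration of these generic definitions: it scores hypothetical predicted probabilities of the positive class against 0/1 actual labels, and it does not reproduce WEKA's exact bookkeeping for nominal classes. The function names are our own.

```python
import math
from typing import Sequence


def _check(predicted: Sequence[float], actual: Sequence[float]) -> None:
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """MAE = (1/n) * sum(|p_i - a_i|)."""
    _check(predicted, actual)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def root_mean_squared_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """RMSE = sqrt((1/n) * sum((p_i - a_i)^2)); note the final square root."""
    _check(predicted, actual)
    mean_sq = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    return math.sqrt(mean_sq)


if __name__ == "__main__":
    # Hypothetical predicted probabilities of the positive class vs. 0/1 labels.
    predicted = [0.9, 0.2, 0.6, 0.4]
    actual = [1, 0, 1, 1]
    mae = mean_absolute_error(predicted, actual)
    rmse = root_mean_squared_error(predicted, actual)
    print(f"MAE = {mae:.4f}, RMSE = {rmse:.4f}")  # MAE = 0.3250, RMSE = 0.3775
```

Here the single large error (0.6) pulls RMSE above MAE, which is the behaviour the text relies on when it reads a gap between the two as a sign of a few badly predicted instances.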
The performance of the three algorithms, RBF, VP, and MLP, on the five healthcare datasets is given in Tables 3, 4, 5, 6, and 7 for the Echocardiogram, SPECT Heart, Chronic Kidney Disease, Mammographic Mass, and EEG Eye State datasets, respectively. The algorithms are compared on accuracy in Fig. 3 and on precision in Fig. 4, and their error rates are compared in Table 8. In terms of accuracy, the results show that on average the MLP classifiers achieve the highest accuracy (80.56%), followed by RBF (79.32%) and VP (72.24%). MLP performs well on three datasets: Echocardiogram, Chronic Kidney Disease, and Mammographic Mass. VP obtains the highest accuracy on the SPECT Heart dataset. On the EEG Eye State dataset, all three algorithms achieve their lowest accuracy, below 50%.
Fig. 3. Comparison of Different Classifiers Accuracy using Different Classification Techniques
The precision results follow the same pattern as the accuracy results, and Figs. 3 and 4 are similar in many cases. MLP gives the highest precision values for Echocardiogram (0.878), Chronic Kidney Disease (0.998), and Mammographic Mass (0.818), while VP gives the highest precision for the SPECT Heart dataset (0.818). On average, the MLP classifiers achieve a precision of 0.8, followed by RBF (0.76) and VP (0.68).
Fig. 5(a) and (b) present the time taken by the three neural network classification algorithms to build the classifiers for the five datasets. Fig. 5(a) shows all the algorithms, while Fig. 5(b) shows RBF and VP separately, since their curves overlap in Fig. 5(a). VP builds its classifiers fastest on the SPECT Heart and EEG Eye State datasets, whereas RBF is fastest on the Echocardiogram, Chronic Kidney Disease, and Mammographic Mass datasets.
On average, RBF is the fastest of the three algorithms, whereas MLP requires the longest time to build its classifiers.
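Precision for a class is the fraction of instances predicted as that class that actually belong to it, TP / (TP + FP). The results report a single precision value per classifier; a common way to summarize several classes, and the one assumed in this Python sketch, is to average the per-class precisions weighted by each class's number of actual instances. The helper names and the toy "sick"/"healthy" labels are hypothetical, and the sketch is not taken from WEKA.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def per_class_precision(y_true: Sequence[Hashable],
                        y_pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Precision of class c = TP / (TP + FP) = correct predictions of c / all predictions of c."""
    precision: Dict[Hashable, float] = {}
    for c in set(y_true) | set(y_pred):
        predicted_as_c = sum(1 for p in y_pred if p == c)
        true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == p == c)
        # Assumption: precision is 0.0 for a class that is never predicted.
        precision[c] = true_positives / predicted_as_c if predicted_as_c else 0.0
    return precision


def weighted_precision(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Average per-class precision, weighting each class by its number of actual instances."""
    per_class = per_class_precision(y_true, y_pred)
    support = Counter(y_true)  # classes absent from y_true get weight 0
    n = len(y_true)
    return sum(per_class[c] * support[c] / n for c in per_class)


if __name__ == "__main__":
    y_true = ["sick", "sick", "sick", "healthy"]
    y_pred = ["sick", "sick", "healthy", "healthy"]
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(accuracy, weighted_precision(y_true, y_pred))  # 0.75 0.875
```

On the toy labels, accuracy is 0.75 while weighted precision is 0.875, so the two measures tend to move together, as observed in Figs. 3 and 4, but need not be equal.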

CONCLUSION
The performance of three neural network classification algorithms has been compared on five healthcare datasets. After the experiments and analysis of the results, the following conclusions were drawn:
1. MLP provides better classifiers for most of the datasets, with an average accuracy of 80.56% and an average precision of 0.8. RBF shows moderate performance, with an average accuracy of 79.32% and an average precision of 0.76. VP has the lowest average accuracy and precision, 72.25% and 0.68 respectively.
2. In terms of MAE, MLP's classifier models are on average superior to those of the other two algorithms.
3. There is a trade-off between accuracy and classifier building time. MLP requires the longest time on average, 5.906 seconds, to build its classifier models. The advantage of RBF observed in this study is the small amount of time it spends building classifier models. VP's training time is moderate, at 0.802 seconds. For all three algorithms, training time increases as the dataset size increases.
Overall, the MLP algorithm performs best in accuracy, precision, and error rate among the parameters tested. It can produce highly accurate classifier models, but its training time suffers, especially on large datasets.