Parkinson's disease (PD) is a neurodegenerative disorder of the nervous system whose underlying cause is believed to be decreased dopamine production in the brain, which occurs as neurons in the substantia nigra gradually die. It is a chronic and progressive illness in which new symptoms may develop over time (Nilashi et al., 2016). People with Parkinson's disease may find it difficult to perform everyday tasks, including those in the workplace. Although clinical evaluations draw on a large amount of data covering many aspects of a patient's condition, it is not always easy to determine from these data alone whether a person has PD. Feature selection methods can help address this issue. Various techniques for diagnosing Parkinson's disease from the relevant information are being researched, developed, and evaluated. This study provides an overview of the use of machine learning algorithms to predict Parkinson's disease, the new technologies that have been developed, and the accuracy that has been achieved.
Parkinson's disease affects a significant number of people worldwide, and the central nervous system is its primary target. The disease can be emotionally and physically taxing both for those who have been diagnosed and for those around them, and patients may experience symptoms such as depression, difficulty concentrating, painful spasms, and more. Parkinson's disease presents a wide range of clinical manifestations, including both motor and non-motor symptoms. Motor signs include hypophonic speech, stiffness, and resting tremor, while non-motor symptoms include hallucinations, depression, constipation, sleep difficulties, cognitive impairment, and impulse control issues. Non-motor symptoms are often more indicative of the disease than motor signs (Ahlrichs & Lawo, 2013; Rovini et al., 2018; Surathi et al., 2016). Medical professionals often face challenges in determining whether a patient is currently affected by Parkinson's disease or is at risk of developing it (Ene, 2008). To address this challenge, it is necessary to design and implement a computational model that can analyze and compile patient data and predict, with a suitable degree of precision, whether a patient is likely to develop Parkinson's disease. Most people diagnosed with Parkinson's disease experience vocal impairment, also known as dysphonia. Dysphonia can be characterized by several voice-related measurements, which can be used to evaluate patients at different stages of the disease (Åström & Koker, 2011).
This research presents a survey on predictions of Parkinson's disease (PD) using machine learning and deep learning approaches that have produced effective models, highlighting the potency of these algorithms in terms of the accuracies achieved, as well as the diverse methodologies utilized (Faust et al., 2018).
It is generally accepted that speech or voice data is useful for diagnosis and can help recognize the presence of Parkinson's disease in up to 90 percent of cases. Individuals with PD typically struggle with their speech, and the impairment can be classified into two distinct types: hypophonia and dysarthria. Both are symptoms of damage to the central nervous system. Hypophonia refers to a voice that is very faint and feeble, while dysarthria refers to slow or slurred speech that is often difficult to understand. Thus, most doctors who treat Parkinson's disease patients look for dysarthria and attempt to rehabilitate patients with specific treatments that improve their ability to modulate vocal intensity (National Parkinson Foundation, n.d.).
Various ML methods have been used to develop several distinct approaches to the early identification of PD. However, if diagnosis and classification are not made precisely and in a timely manner, additional symptoms may develop. PD can be diagnosed by analyzing various types of data, including brain MRI images, voice data, posture images, data recorded by sensors, and handwriting data, among others. Among these, speech or voice data is the most useful for accurately identifying PD (Delenclos et al., 2016). Ericsson et al. (2005) developed a fully automated two-fold technique using 3D images, which, according to their experiments, demonstrated promising results. Little et al. (2009) introduced Pitch Period Entropy (PPE), a novel measure of dysphonia, and evaluated their approach with a kernel support vector machine, which resulted in a classification accuracy of 91%.
Schönwieler et al. (2000) developed an alternative strategy that utilizes voice analysis with an Artificial Neural Network (ANN) and demonstrated good results. However, they noted that cost-effectiveness was a challenge. Ene (2008) proposed a neural network-based strategy that distinguished between healthy individuals and those with Parkinson's Disease (PD) using three different internal procedures.
Gil and Johnson (2009) found that reducing the number of neurons in the hidden layer of the network resulted in poor performance on both the training set and the test set. With a higher number of neurons, however, there was a significant risk of overfitting, even though performance on the training set was good. After experimentation, they found that 13 neurons in the hidden layer was the optimal number for their model.
Bhattacharya and Bhatia (2010) discovered variance in the ROC curve and observed that the true-positive (TP) and false-positive (FP) rates vary as the number of cross-validation (CV) folds increases.
Åström and Koker (2011) proposed a novel approach using parallel neural networks and recommended evaluating the results of each neural network with a decision-making process based on pre-determined rules. During training, the data that had not been learned by each neural network was collected and added to the training set of the subsequent neural network, so that it could learn from what the previous networks had missed. This improved the accuracy of predictions. Chen et al. (2013) developed their fuzzy k-nearest neighbor (FKNN) based system using a 10-fold cross-validation approach.
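The relationship between fold count and TP/FP rates can be illustrated with a minimal sketch. It is not the procedure used by Bhattacharya and Bhatia (2010): the synthetic one-feature data, the toy threshold classifier, and all function names below are assumptions introduced purely for illustration.

```python
import random

def tp_fp_rates(y_true, y_pred):
    """Return (true-positive rate, false-positive rate) for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_threshold(xs, ys):
    """Toy classifier (assumption): threshold halfway between the class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def cross_validate(xs, ys, k):
    """Return one (TPR, FPR) pair per fold, each fold held out once."""
    results = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        thr = fit_threshold([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        preds = [1 if xs[i] > thr else 0 for i in test_idx]
        results.append(tp_fp_rates([ys[i] for i in test_idx], preds))
    return results

# Synthetic data (not from any PD dataset): class 0 centred at 0.0, class 1 at 1.5.
rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(20)] + [rng.gauss(1.5, 1.0) for _ in range(20)]
ys = [0] * 20 + [1] * 20

for k in (2, 5, 10):
    rates = cross_validate(xs, ys, k)
    mean_tpr = sum(t for t, _ in rates) / k
    mean_fpr = sum(f for _, f in rates) / k
    print(f"k={k:2d}  mean TPR={mean_tpr:.2f}  mean FPR={mean_fpr:.2f}")
```

Because each test fold shrinks as k grows, the per-fold rates are computed from fewer samples, which is one plausible source of the variation described above.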
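The idea of forwarding unlearned samples from one learner to the next can be sketched as follows. This is only an illustration of the data flow, not the method of Åström and Koker (2011): a simple threshold learner stands in for each neural network, the training data is assumed to be split into one chunk per learner, and majority voting is assumed as the pre-determined decision rule. All names and data values are hypothetical.

```python
def fit_threshold(samples):
    """Stand-in learner (assumption): threshold halfway between the class means.
    `samples` is a list of (feature, label) pairs with labels 0 or 1."""
    pos = [x for x, y in samples if y == 1]
    neg = [x for x, y in samples if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict(threshold, x):
    """Predict class 1 when the feature exceeds the learned threshold."""
    return 1 if x > threshold else 0

def train_cascade(chunks):
    """Train one learner per chunk; samples a learner gets wrong are carried
    over and appended to the training set of the next learner."""
    thresholds = []
    carried = []
    for chunk in chunks:
        training_set = chunk + carried
        thr = fit_threshold(training_set)
        thresholds.append(thr)
        carried = [(x, y) for x, y in training_set if predict(thr, x) != y]
    return thresholds

def vote(thresholds, x):
    """Decision rule (assumed here): simple majority vote over all learners."""
    votes = sum(predict(t, x) for t in thresholds)
    return 1 if votes * 2 > len(thresholds) else 0

# Hypothetical one-feature training chunks, each containing both classes.
chunks = [
    [(0.2, 0), (0.4, 0), (1.1, 0), (0.9, 1), (1.6, 1), (1.8, 1)],
    [(0.1, 0), (0.5, 0), (0.7, 0), (1.2, 1), (1.5, 1), (2.0, 1)],
    [(0.3, 0), (0.6, 0), (0.8, 0), (1.3, 1), (1.4, 1), (1.9, 1)],
]

thresholds = train_cascade(chunks)
print("learned thresholds:", [round(t, 3) for t in thresholds])
print("vote for x=1.5:", vote(thresholds, 1.5))
print("vote for x=0.3:", vote(thresholds, 0.3))
```

A single threshold cannot separate the overlapping samples in this toy data, so the same hard cases keep being carried forward. A more expressive learner, such as a neural network, would have a chance of fitting them in a later stage.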
The study conducted by Islam et al. (2014) compared various machine learning techniques based on their performance accuracies in identifying Parkinson's disease (PD) in individuals. The researchers suggested that a new classifier could be developed to achieve higher levels of accuracy (Ahmadlou & Adeli, 2010).
Peng et al. (2016) recommended the use of computer-aided analysis with imaging data. They used the software BrainLab to analyze the images and calculate the thickness of the cortex, the volume of grey matter, and the surface area of the cortex in each Region of Interest (ROI), and then presented their findings. The classification performance was significantly enhanced by the use of multilevel ROI-based features.
Avci and Dogantekin (2016) achieved high levels of accuracy with their proposed method, called Genetic Algorithm-Wavelet Kernel-Extreme Learning Machine.
Prashanth et al. (2016) found that multimodal characteristics may be used to provide an accurate prediction of PD at an earlier stage.
Using a genetic algorithm and Principal Component Analysis (PCA) as feature selection methods, Aich et al. (2019) proposed an innovative method for classifying patterns into two categories, PD and non-PD, which increased productivity and saved time. The method involved applying seven Machine Learning (ML) algorithms for classification.
Using ResNet-50, the Optimum-Path Forest (OPF) classifier, and Bayes with Support Vector Machines (SVM), Passos et al. (2018) were able to reach a 96% identification rate. Gupta et al. (2018) used a new approach in which the cuttlefish algorithm was applied to feature selection. Several fitness function approximations were also employed to improve the cuttlefish algorithm, and the result is now known as the Optimized Cuttlefish Algorithm (OCFA).
The study used both decision tree and K-Nearest Neighbor classifiers and achieved an overall accuracy of 94% in identifying PD patients.
Mostafa et al. (2019) proposed the Multiple Feature Evaluation Approach (MFEA) of a multi-agent system for Parkinson's diagnosis. They implemented five classifiers (Decision Tree, Random Forests, Neural Network, Naive Bayes, and Support Vector Machine) both before and after applying their approach (Liu et al., 2022). The average improvements in accuracy obtained were 10.51% for Decision Tree, 15.22% for Naive Bayes, 9.19% for Neural Network, 12.75% for Random Forests, and 9.13% for SVM. Table 1 depicts the survey of various methodologies and their performances.
Of all the ML strategies surveyed, ANN and SVM classifiers are the ones used by the majority of the recommended algorithms to obtain a fast and accurate prediction. Our survey also found that the majority of the models used voice or speech data to diagnose the condition, partly because many therapists regard voice data as an important indicator.
Figure 1 represents the architecture of an artificial neural network with an input layer, one or more hidden layers, and an output layer. The number of hidden layers varies from one network to another. Every circle in the figure represents a neuron, and the inputs and their corresponding weights are processed layer by layer (Zeinali & Story, 2017).
Figure 1. Architecture of Artificial Neural Network
The data used to construct the neural network can be in the form of text, images, or audio files. In general, the input layer is where the features of the dataset are presented; in the architecture described, each node of the input layer represents one feature.
Each input feature is passed to the first hidden layer together with its respective weight. The weights determine how strongly each feature influences the decision or prediction. The hidden layers assist in the feature extraction process by performing calculations on the data at each node. The nodes of the first hidden layer receive the products of the input features and their weight values, and their outputs are passed on as input to the subsequent hidden layers, and so on. The problem and the dataset determine the optimal number of hidden layers and the appropriate number of nodes in each hidden layer.
The activation functions dictate how the nodes in the hidden and output layers transform the data they receive. Common examples of these functions include tanh, sigmoid, and ReLU. When selecting an activation function, one must consider the type of dataset and the requirements of the task. The output of the last hidden layer is used as input for the output layer, which produces the final output in the desired format.
Machine learning techniques have been given a significant role in the medical field. In contrast to models generated by traditional methods, models built with ML approaches can update their results as new data is input into them. One should keep in mind that extensive and focused study is required to gain the knowledge necessary for identifying the ailment. New machine learning algorithms and methods are being presented at a rapid rate. Some have shown promising results, while others have already proved their usefulness in a variety of contexts. A benefit of ML-generated models is that, in general, the more data is used, the higher the precision and the accuracy of the predictions tend to be.
In this study, we have provided a comprehensive review of machine learning approaches used for diagnosing Parkinson's disease by analyzing various types of data. Many researchers have devoted significant effort to predicting Parkinson's disease using innovative methods. The literature survey summarizes the findings of different studies, which indicate that most of the machine learning techniques employed by the authors performed well. However, it is possible that an even more accurate classifier could be built using a unique neural network architecture combined with a specific strategy. To explore this possibility, we plan to develop an artificial neural network with multiple hidden layers and nodes in the near future and compare the accuracy of different implementations.
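The layer-by-layer processing described above can be sketched as a small forward pass. This is a minimal illustration under stated assumptions: the layer sizes, the weight values, the bias terms (not mentioned in the text), and the use of tanh at every layer are all arbitrary choices, and no training is performed.

```python
import math

def dense_layer(inputs, weights, biases, activation):
    """Compute one layer: each node applies `activation` to the weighted
    sum of its inputs plus a bias term."""
    return [
        activation(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]

def forward(inputs, layers, activation=math.tanh):
    """Pass the inputs through each (weights, biases) layer in order."""
    values = inputs
    for weights, biases in layers:
        values = dense_layer(values, weights, biases, activation)
    return values

# Hypothetical network: 3 input features -> 2 hidden nodes -> 1 output node.
hidden = ([[0.5, -0.2, 0.1],     # weights of hidden node 1
           [-0.3, 0.8, 0.4]],    # weights of hidden node 2
          [0.0, 0.1])            # one bias per hidden node
output = ([[1.2, -0.7]],         # weights of the single output node
          [0.05])

features = [0.9, 0.1, 0.4]       # one (made-up) sample with three features
print(forward(features, [hidden, output]))
```

Each inner list of weights belongs to one node, so the number of inner lists sets the number of nodes in that layer, and the length of each inner list must match the size of the previous layer.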
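For reference, the three activation functions named above can be written in a few lines. The sketch below uses their standard definitions and only the Python standard library; the sample input values are arbitrary.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: squashes any real value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Hyperbolic tangent: squashes any real value into the range (-1, 1)."""
    return math.tanh(z)

def relu(z):
    """Rectified linear unit: passes positive values, clips negatives to 0."""
    return max(0.0, z)

# Compare the three functions on the same pre-activation values.
for z in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.3f}  tanh={tanh(z):+.3f}  relu={relu(z):.1f}")
```

Sigmoid is a common choice for the output node of a binary classifier such as PD versus non-PD, since its output can be read as a probability, while ReLU and tanh are more often used in hidden layers.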