A Novel Computer Vision Approach to Kinematic Analysis of Handwriting for Accessible Assessment of Neurodegenerative Diseases
Ron Nachum
Thomas Jefferson High School for Science and Technology
Thank you to Zoran Duric, Kyle Jackson, and Naomi Lynn Gerber for mentorship and assistance throughout this research work.
This paper was originally included in the 2021 print publication of the Teknos Science Journal.
Abstract
Fine motor movement is a demonstrated biomarker for many health conditions that are especially difficult to diagnose early and that require sensitivity to change for monitoring over time. This is especially true for neurodegenerative diseases (NDs), including Parkinson’s Disease (PD) and Alzheimer’s Disease (AD), which are associated with early changes in handwriting and fine motor skills. Kinematic analysis of handwriting is an emerging method for assessing fine motor movement ability, with data typically collected by digitizing tablets; however, these are often expensive, unfamiliar to patients, and limited in the scope of data they can collect. In this paper, we present a vision-based system for the capture and analysis of handwriting kinematics using a commodity camera and RGB images. We demonstrate that this approach is an accurate, accessible, and informative alternative to digitizing tablets, with potential use in early disease diagnosis, treatment assessment, and long-term monitoring.
Clinical relevance
This work establishes a more accessible alternative to digitizing tablets for extracting handwriting kinematic data by processing RGB video captured by commodity cameras, such as those in smartphones, with computer vision and machine learning. The collected data can in turn be analyzed to objectively and quantitatively differentiate between healthy individuals and patients with NDs, including AD and PD, as well as other diseases with biomarkers displayed in fine motor movement. The developed system has many applications, including widening diagnostic access in low-income areas and resource-poor health systems and serving as an accessible form of long-term disease monitoring through telemedicine.
Introduction
The current diagnostic process for neurodegenerative diseases (NDs), such as Alzheimer’s Disease (AD) and Parkinson’s Disease (PD), is complex and taxing on patients. It involves multiple specialists who rely on their judgment and leverage a variety of approaches, such as mental status exams [8], cognitive assessment [16], and brain imaging [14], to build a case and rule out alternative causes for symptoms. This process is often delayed two to three years after symptom onset and takes several months to reach a conclusion [1]. Because of these barriers, up to 50% of patients with NDs are not diagnosed during their lifetime [3]. Even for patients who receive a diagnosis, an accurate conclusion is not guaranteed; studies have shown that the clinical diagnostic process for NDs is typically only 75-80% accurate [2].
Fine motor movement has been demonstrated as a biomarker, or measurable indicator of disease presence, for NDs, including AD and PD [9]. Quantification and kinematic analysis of fine motor movements has applications in diagnostic assessment, as well as in measuring change over time for long-term monitoring and assessment of treatment response [9]. Moreover, kinematic analysis of fine motor movements is applicable to assessing other health conditions with biomarkers displayed in fine motor movement, including strokes [15] and early developmental disorders [13], as well as depression and anxiety [10].
Handwriting tasks are commonly used for assessing fine motor movement ability, with specific tasks including tracing of Archimedean spirals and cursive ‘l’s and ‘e’s, as well as writing of words and short sentences [9]. During these movements, the pen’s position is tracked, which can be used to compute speed, acceleration, and jerk [9]. These kinematic features can be further analyzed to produce measures of movement fluidity and fine motor skill, which can be used to compare groups of people with different health conditions and as supporting information for disease state classification.
Currently, data for studies in this field are usually collected by specialized digitizing tablets [9]. These tablets are expensive and often inaccessible in resource-poor health systems or telemedicine settings. Furthermore, since the use of electronic pens can be unfamiliar to patients, a time-consuming training phase must be completed to acquaint patients with their use. Digitizing tablets collect only pen position and pressure, and are unable to capture other available data (e.g., hand pose) that could improve diagnostic accuracy. By contrast, a computer vision system to quantify these movements offers a fast, easy-to-use, and more widely accessible screening solution due to the pervasiveness of cameras in smartphones and laptops. Furthermore, vision-based systems can collect more than pen position alone, acquiring information about pen grip, arm pose, and compensatory movements with the potential to improve diagnostic accuracy. These data could be used to augment tablet-based systems, which collect accurate pressure data, or potentially to replace them as a more accessible solution.
In this work, we propose a computer vision system for capturing and analyzing handwriting kinematics with commodity cameras. We tested this system’s accuracy through direct comparison to data produced by digitizing tablets during common handwriting tasks. Since commodity cameras capture frames at a lower frequency (typically 30 or 60 Hz) than the sampling rate of digitizing tablets (typically 100 Hz), we investigated the viability of lower-frequency kinematic data for diagnostic assessments using machine learning. To do this, we downsampled the PaHaW dataset of handwriting movements captured by a digitizing tablet and trained classifiers on the resultant data to assess their accuracy.
Materials and Methods
Materials
The primary experimental objectives were to assess the accuracy of kinematic data extracted from videos and the classification accuracy of the resultant diagnostic assessments.
To determine accuracy and statistically assess the developed computer vision-based system for kinematic data extraction, handwriting tasks were simultaneously captured on video by a smartphone camera and quantified by a Wacom Intuos Medium digitizing tablet. These synchronized data streams enabled the comparison of handwriting kinematics captured by the computer vision system and the digitizing tablet. This setup is shown in Figure 1, consisting of a digitizing tablet overlaid with a writing template and connected to a laptop, as well as a smartphone on a small tripod. A total of 214 handwriting movements were captured from a single healthy test subject to demonstrate the feasibility of extracting kinematic information from videos. Measured tasks included Archimedean spiral drawing (124 videos), tracing of l’s and e’s (60 videos), and tracing of words (30 videos) on the PaHaW study writing template [6].
The PaHaW dataset consists of digitizing tablet data for 8 different handwriting tasks from 38 healthy controls (HCs) and 37 PD patients (75 individuals in total) [6]. The collected position data were used both at the original 100 Hz sampling rate, typical of digitizing tablets, and at downsampled frequencies of 30 and 60 Hz, which are typical of commodity cameras. The resultant kinematic data were then filtered with a Gaussian filter with a sigma value of 5.
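A minimal sketch of this preparation step, assuming NumPy arrays and an illustrative `downsample` helper:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def downsample(signal, src_hz, dst_hz):
    """Keep samples at evenly spaced indices to simulate a lower capture rate."""
    idx = np.round(np.arange(0, len(signal), src_hz / dst_hz)).astype(int)
    return signal[idx[idx < len(signal)]]

# Synthetic 100 Hz coordinate trace as a stand-in for tablet data.
x = np.cumsum(np.random.randn(500))          # 5 s of pen x-coordinates at 100 Hz
x30 = downsample(x, src_hz=100, dst_hz=30)   # simulated 30 Hz camera capture
# In the paper, the Gaussian filter (sigma = 5) is applied to the derived
# kinematic profiles; the same call applies to any 1-D signal:
x30_smooth = gaussian_filter1d(x30, sigma=5)
```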
Computer Vision Quantification of Fine Motor Movement
The computer vision data collection system consists of several stages for extracting diagnostic information, built around a recurrent system for determining pen position. The entire computer vision system is outlined in Figure 2.
The central objective of the computer vision system for quantifying fine motor movements, in addition to producing vision-specific features, is to extract kinematic information with accuracy comparable to that collected by digitizing tablets. This requires pen tip x and y coordinates tagged with timestamps.
Preprocessing
In the preprocessing stage, the video frames are prepared for data extraction using thresholding, contour detection, and key point selection, followed by a perspective transform and capture of a pen template image. To determine the location of the paper template, the OpenCV adaptive thresholding function was used to detect lighter regions of the image [4]. OpenCV contour detection with default parameters was then applied to these thresholded frames, and the largest contour detected was chosen as that of the paper template [4]. With this contour, the OpenCV polygonal approximation method with an epsilon value of 1% of contour arc length was used to identify the 4 corners of the paper [4].
From the camera vantage point, this polygon would appear trapezoidal or irregular when in reality it is a rectangle. To correct for differences in camera perspective, OpenCV can be used to calculate a perspective transform matrix, which can then be used to transform the image into a top-down view of the rectangular paper [7]. Lastly, a template image of the pen is captured to be used for later feature matching in the coordinate extraction [4].
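As a concrete illustration, the preprocessing pipeline can be sketched in Python with OpenCV roughly as follows; the output dimensions, threshold block size, and corner-ordering handling are illustrative assumptions rather than values reported in the paper:

```python
import cv2
import numpy as np

def locate_template(frame, out_w=1000, out_h=700):
    """Find the paper template in a frame and return a top-down view of it."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Adaptive thresholding highlights the light paper against the darker scene.
    thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, 51, 0)
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    page = max(contours, key=cv2.contourArea)  # largest contour = paper
    # Polygonal approximation with epsilon = 1% of the contour arc length.
    corners = cv2.approxPolyDP(page, 0.01 * cv2.arcLength(page, True), True)
    assert len(corners) == 4, "expected a quadrilateral paper outline"
    # NOTE: a robust implementation would sort the four corners so they
    # correspond to the destination points (top-left first, clockwise).
    src = corners.reshape(4, 2).astype(np.float32)
    dst = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])
    M = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(frame, M, (out_w, out_h))
```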
Coordinate Extraction
The coordinate extraction phase consists of tracking the pen tip in the perspective-transformed images of the paper template, using a recurrent approach to produce regions of interest for the pen tip location. Feature matching is used to determine a region of interest for the pen in each frame based on the originally captured template image. The region of interest is then sharpened using OpenCV’s detail enhance method and blurred using a median filter with a size of 11 [4]. OpenCV thresholding is then applied to increase contrast between the pen tip and the background, followed by contour detection to outline the pen tip geometry in the image and enable precise detection of the tip [4].
With these extracted coordinate data and the known, consistent capture rate of cameras, kinematic features such as speed, acceleration, and jerk can be calculated. As the next frame is processed, the previous position of the pen and calculated kinematic information can be used to decrease the search area for the pen tip with feature matching, implementing a recurrent region of interest feature matching algorithm. This modification makes this tracking algorithm less computationally expensive and also more accurate, as it has a smaller search area and is less prone to single-frame errors caused by vision jitter and varying lighting conditions.
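A sketch of one iteration of this recurrent tracking loop is shown below; template matching stands in for the feature matching step, and the search radius, Otsu threshold, and lowest-point tip heuristic are illustrative assumptions:

```python
import cv2

def track_pen_tip(frame, pen_template, prev_xy, search_radius=80):
    """One iteration of recurrent ROI tracking: search near the previous tip."""
    h, w = frame.shape[:2]
    px, py = int(prev_xy[0]), int(prev_xy[1])
    # Restrict the search to a window around the last known pen position.
    x0, y0 = max(0, px - search_radius), max(0, py - search_radius)
    x1, y1 = min(w, px + search_radius), min(h, py + search_radius)
    roi = frame[y0:y1, x0:x1]
    # Locate the pen inside the window (template matching stands in for
    # the feature matching described in the paper).
    res = cv2.matchTemplate(roi, pen_template, cv2.TM_CCOEFF_NORMED)
    _, _, _, (mx, my) = cv2.minMaxLoc(res)
    th, tw = pen_template.shape[:2]
    pen = roi[my:my + th, mx:mx + tw]
    # Sharpen, smooth, and binarize to isolate the dark pen tip.
    pen = cv2.detailEnhance(pen)
    gray = cv2.medianBlur(cv2.cvtColor(pen, cv2.COLOR_BGR2GRAY), 11)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    pen_contour = max(contours, key=cv2.contourArea)[:, 0, :]
    tip = max(pen_contour, key=lambda p: p[1])  # lowest image point = tip
    # Map the tip back to full-frame coordinates.
    return x0 + mx + int(tip[0]), y0 + my + int(tip[1])
```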
Comparison to Digitizing Tablet
To assess accuracy of kinematic data produced by the vision-based system, the timestamps associated with the computer vision data were matched in a pairwise fashion to digitizing tablet data with the closest timestamp. The aligned time series data were then used to calculate errors and determine accuracy of the vision-based system.
Mean absolute error (MAE) for position was calculated over the N samples of each time series using the following formula, where $(x_i, y_i)$ denote digitizing tablet coordinates and $(x'_i, y'_i)$ denote vision-based coordinates:

$$\mathrm{MAE}_{\text{position}} = \frac{1}{N}\sum_{i=1}^{N}\sqrt{(x_i - x'_i)^2 + (y_i - y'_i)^2}$$
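In code, the nearest-timestamp pairing and MAE computation might look like this (a minimal sketch assuming NumPy arrays of timestamps in seconds and Nx2 coordinate arrays):

```python
import numpy as np

def align_and_mae(t_cam, xy_cam, t_tab, xy_tab):
    """Pair each camera sample with the tablet sample of closest timestamp,
    then compute the mean absolute position error."""
    # Index of the nearest tablet timestamp for each camera timestamp.
    idx = np.searchsorted(t_tab, t_cam)
    idx = np.clip(idx, 1, len(t_tab) - 1)
    left_closer = (t_cam - t_tab[idx - 1]) < (t_tab[idx] - t_cam)
    idx = np.where(left_closer, idx - 1, idx)
    # Euclidean distance between paired points, averaged over the series.
    return np.mean(np.linalg.norm(xy_cam - xy_tab[idx], axis=1))
```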
Kinematic features of speed, acceleration, and jerk were calculated using symmetric differences:

$$v_i = \frac{\sqrt{(x_{i+1} - x_{i-1})^2 + (y_{i+1} - y_{i-1})^2}}{t_{i+1} - t_{i-1}}, \qquad a_i = \frac{v_{i+1} - v_{i-1}}{t_{i+1} - t_{i-1}}, \qquad j_i = \frac{a_{i+1} - a_{i-1}}{t_{i+1} - t_{i-1}}$$
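The same computation can be expressed compactly in NumPy; this sketch assumes a uniform frame interval, so the denominator $t_{i+1} - t_{i-1}$ reduces to $2/\mathrm{fps}$:

```python
import numpy as np

def central_diff(f, dt):
    """Symmetric (central) difference: (f[i+1] - f[i-1]) / (2 * dt)."""
    return (f[2:] - f[:-2]) / (2.0 * dt)

def kinematics(x, y, fps):
    dt = 1.0 / fps  # constant frame interval of the camera
    vx, vy = central_diff(x, dt), central_diff(y, dt)
    speed = np.hypot(vx, vy)
    accel = central_diff(speed, dt)
    jerk = central_diff(accel, dt)
    return speed, accel, jerk
```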
Assessment of Vision-Based Data for Classification
The PaHaW dataset was used to demonstrate the potential of vision-based data in discriminative ND classification. The collected coordinate information in the dataset was downsampled from the 100 Hz collected by digitizing tablets to 30 Hz and 60 Hz, typical frame rates produced by cameras. The adjusted data were then used to calculate kinematic features, including speed, acceleration, and jerk. A total of 176 derived features were produced, including mean, minimum, maximum, standard deviation, and number of extrema for profiles of each kinematic feature during a handwriting task. These features were then tested for statistical significance using t-tests to produce the final feature set, consisting of the features with p-values less than 0.10 for each data capture rate.
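A sketch of how such features could be derived and screened (helper names like `summarize` and `select_features` are illustrative; `ttest_ind` from SciPy performs the two-sample t-test):

```python
import numpy as np
from scipy.signal import argrelextrema
from scipy.stats import ttest_ind

def summarize(profile):
    """Summary statistics for one kinematic profile (e.g., speed over a task)."""
    n_extrema = (len(argrelextrema(profile, np.greater)[0]) +
                 len(argrelextrema(profile, np.less)[0]))
    return [profile.mean(), profile.min(), profile.max(),
            profile.std(), n_extrema]

def select_features(X_pd, X_hc, alpha=0.10):
    """Keep feature columns whose PD-vs-HC t-test p-value is below alpha."""
    return [j for j in range(X_pd.shape[1])
            if ttest_ind(X_pd[:, j], X_hc[:, j]).pvalue < alpha]
```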
An ensemble classifier, consisting of a neural network [5], support vector machine [12], and random forest [12], was trained on these data using 10-fold cross-validation [5] to prevent overfitting. Each classifier casts a prediction vote for each patient, and the outcome with the most votes (PD or HC) is chosen.
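A minimal sketch of such an ensemble using scikit-learn's VotingClassifier; MLPClassifier stands in here for the Keras neural network used in the study, and the hyperparameters and stand-in data are illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in for the 75-patient feature matrix and PD/HC labels.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(75, 20)), rng.integers(0, 2, 75)

# Hard (majority) vote over the three classifiers described in the paper.
ensemble = VotingClassifier(
    estimators=[
        ("nn", make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000))),
        ("svm", make_pipeline(StandardScaler(), SVC())),
        ("rf", RandomForestClassifier(n_estimators=100)),
    ],
    voting="hard",
)
# 10-fold cross-validation, as in the paper.
scores = cross_val_score(ensemble, X, y, cv=10)
print(f"mean accuracy: {scores.mean():.2f}")
```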
Results
Computer Vision Fine Motor Kinematic Data Extraction
Quantitative comparisons of the vision-based system for quantifying fine motor kinematic data from videos to the digitizing tablet are summarized in Tables 1 and 2. Most important to note are the position MAEs, which are less than 0.5 mm for both spirals (n=124) and writing (n=90). Furthermore, the speed and acceleration MAEs were under 1.1% for spiral tasks (n=124) and under 2% for handwriting tasks (n=90). Figure 3 shows a graphical comparison of these kinematic features for representative Archimedean spiral and handwriting tasks, demonstrating the nearly identical kinematic information captured by our computer vision approach compared to the digitizing tablet.
Machine Learning ND Classification with Vision Data
The ensemble learning classification system’s accuracy was assessed at three capture rates: the original tablet-collected 100 Hz, and downsampled rates of 60 and 30 Hz to simulate vision-based data. The findings are shown in Table 3.
An accuracy of 74% (n=75) was achieved at the 60 Hz capture rate supported by many modern, accessible vision-based systems, nearly identical to the 75% (n=75) achieved at the 100 Hz offered by digitizing tablets, with very similar sensitivity and specificity values. Furthermore, even at a capture rate of 30 Hz, which is attainable with nearly all commodity cameras, an accuracy of 71% (n=75) was achieved in distinguishing PD patients from HCs, with slightly lower sensitivity and specificity values compared to the higher frequencies.
Discussion
The results of this study demonstrate the viability of our framework using commodity cameras, in particular those in smartphones, to accurately quantify kinematic information of fine motor movements with computer vision algorithms. The significance of this is further compounded by the accuracy achieved in classifying PD patients and HCs using data at frequencies that can be captured by commodity cameras, with accuracy rivaling that of the current clinical diagnostic process.
The vision-based nature of this system, combined with widespread access to capable cameras in mobile phones and other devices, makes it a prime candidate for enabling wider access to ND diagnostic screening, especially in lower-income populations and resource-poor health systems. Furthermore, the system’s at-home accessibility enhances long-term monitoring of disease state, including treatment effects, clinical deterioration, and disease progression, via telemedicine. This ease of use also allows for larger-scale data collection of handwriting movements of patients with NDs as well as HCs to develop and improve our understanding of differences between these groups and increase diagnostic accuracy.
In this paper, we have focused primarily on this system’s uses for ND diagnostic assessment. However, the framework for vision-based kinematic analysis of fine motor movement can be utilized to screen for any health conditions in which biomarkers are displayed in handwriting movements, including strokes, early developmental disorders (e.g., dysgraphia), and arthritis. An accessible and easy-to-use tool for assessing these movements is a necessary step to better understand these biomarkers’ significance in the diagnostic process, while the resultant expedited diagnostic processes have potential to improve treatment outcomes for these conditions.
Digitizing tablets are capable of collecting both pen position and pressure data. Currently, vision-based systems are unable to collect high-accuracy pressure data, which has been shown to increase classification accuracy of NDs by 5-10% when combined with kinematic features [6]. However, digitizing tablets are limited in the scope of data they collect, whereas computer vision can provide additional types of data. Computer vision systems are capable of quantifying hand pose and body movements and of classifying pen grip types, which have the potential to further improve diagnostic assessment accuracy, though further research is required to support their use. For example, hand pose and pen grip type can be quantified using Google’s MediaPipe library for hand landmark detection [11].
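As a hedged illustration of that direction (not part of the system evaluated above), MediaPipe's hand-landmark detector can be applied per video frame roughly as follows:

```python
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=1)

def hand_landmarks(frame_bgr):
    """Return the 21 normalized landmarks of the first detected hand, or None."""
    results = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return None
    return [(lm.x, lm.y, lm.z)
            for lm in results.multi_hand_landmarks[0].landmark]
```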
Conclusion
In this study, we developed an accessible, vision-based system for analyzing fine motor movements in handwriting tasks to provide ND diagnostic assessments. Our results show that accurate quantification of fine motor movement kinematic features is possible with low-cost commodity cameras. We further demonstrate that kinematic data sampled at frequencies typical of commodity cameras are viable for distinguishing between ND patients and HCs on the PaHaW dataset, with high sensitivity and specificity achieved in diagnostic assessments. This system can be used to increase ND diagnostic access in lower-income populations and resource-poor health systems, provide a long-term disease monitoring solution through telemedicine, and offer a quantifiable tool to support clinical diagnosis of NDs.
Future work includes data collection to further test the accuracy of the vision-based system for quantifying kinematic information. Data collection would also allow for testing of the significance of vision-specific features such as pen grip and body pose during writing, and exploring the estimation of pen pressure from video data.
References
[1] Balasa, M., Gelpi, E., Antonell, A., Rey, M., Sanchez-Valle, R., Molinuevo, J., & Llado, A. (2011). Clinical features and APOE genotype of pathologically proven early-onset Alzheimer disease. Neurology, 76(20), 1720–1725.
[2] Beach, T. G., Monsell, S. E., Phillips, L. E., & Kukull, W. (2012). Accuracy of the clinical diagnosis of Alzheimer disease at National Institute on Aging Alzheimer Disease Centers, 2005–2010. Journal of Neuropathology and Experimental Neurology, 71(4), 266–273.
[3] Boise, L., Camicioli, R., Morgan, D. L., Rose, J. H., & Congleton, L. (1999). Diagnosing dementia: perspectives of primary care physicians. The Gerontologist, 39(4), 457–464.
[4] Bradski, G. (2000). The OpenCV Library. Dr. Dobb’s Journal of Software Tools.
[5] Chollet, F. (2015). Keras. GitHub. Retrieved from https://github.com/fchollet/keras (Accessed: 2021-02-17)
[6] Drotár, P., Mekyska, J., Rektorová, I., Masarová, L., Smékal, Z., & Faundez-Zanuy, M. (2016). Evaluation of handwriting kinematics and pressure for differential diagnosis of Parkinson’s disease. Artificial Intelligence in Medicine, 67, 39–46.
[7] Geometric transformations of images. (n.d.). Retrieved from https://docs.opencv.org/4.5.0/da/d6e/tutorial (Accessed: 2021-02-17)
[8] Grossman, M., & Irwin, D. J. (2016). The mental status examination in patients with suspected dementia. CONTINUUM: Lifelong Learning in Neurology, 22(2 Dementia), 385.
[9] Impedovo, D., & Pirlo, G. (2018). Dynamic handwriting analysis for the assessment of neurodegenerative diseases: a pattern recognition perspective. IEEE Reviews in Biomedical Engineering, 12, 209–220.
[10] Likforman-Sulem, L., Esposito, A., Faundez-Zanuy, M., Clémençon, S., & Cordasco, G. (2017). EMOTHAW: A novel database for emotional state recognition from handwriting and drawing. IEEE Transactions on Human-Machine Systems, 47(2), 273–284.
[11] Lugaresi, C., Tang, J., Nash, H., McClanahan, C., Uboweja, E., Hays, M., Grundmann, M. (2019). Mediapipe: A framework for building perception pipelines. CoRR, abs/1906.08172. Retrieved from http://arxiv.org/abs/1906.08172
[12] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
[13] Rosenblum, S., & Livneh-Zirinski, M. (2008). Handwriting process and product characteristics of children diagnosed with developmental coordination disorder. Human Movement Science, 27(2), 200–214.
[14] Shimizu, S., Hirose, D., Hatanaka, H., Takenoshita, N., Kaneko, Y., Ogawa, Y., Hanyu, H. (2018). Role of neuroimaging as a biomarker for neurodegenerative diseases. Frontiers in neurology, 9, 265.
[15] Simpson, B., McCluskey, A., Lannin, N., & Cordier, R. (2016). Feasibility of a home-based program to improve handwriting after stroke: a pilot study. Disability and rehabilitation, 38(7), 673–682.
[16] Wilson, R. S., Capuano, A. W., Yu, L., Yang, J., Kim, N., Leurgans, S. E., Boyle, P. A. (2018). Neurodegenerative disease and cognitive retest learning. Neurobiology of Aging, 66, 122–130.