Predicting Medical Conditions with Data: Promising Model if Privacy is Protected
A tweet from @AbbieCitron brought me to the Medical News Today post Electronic Medical Records Could Help Predict Domestic Abuse. The article discusses forecasting patients' risks by using electronic medical records. Specifically, the article deals with domestic abuse screening or predictions.
Dr Ben Reis, of the Children’s Hospital Informatics Program at the Harvard-MIT Division of Health Sciences and Technology, Children’s Hospital Boston, and Harvard Medical School, co-authored the study Longitudinal histories as predictors of future diagnoses of domestic abuse: modelling study. The study concluded,
Commonly available longitudinal diagnostic data can be useful for predicting a patient’s future risk of receiving a diagnosis of abuse. This modelling approach could serve as the basis for an early warning system to help doctors identify high risk patients for further screening.
The study pointed out that the emphasis would be on identifying high-risk patients rather than on diagnosing them, and suggested it might work as follows:
A patient’s longitudinal medical history accumulates over time inside an electronic health record system. Whenever new information is recorded for the patient, the intelligent histories model re-analyses the information accumulated to date to estimate the patient’s risk of receiving a future diagnosis of abuse. The patient’s physician is notified if the patient is at high risk of abuse. The physician uses the visualisation to quickly review the patient’s past diagnoses and identify important long term trends in the patient’s history. The risk estimate, together with the high level view of the patient’s diagnostic history, enables the physician to make a better informed decision about whether to proceed with further screening of the patient. In this way, the intelligent histories model could improve screening by helping physicians to identify high risk patients who might otherwise be missed.
The study notes the possibilities ahead and that,
vast quantities of longitudinal data accumulating in electronic health information systems present an untapped opportunity for improving medical screening and diagnosis.
While I agree the opportunities for using this information are impressive, the consequences of exposing people's personal medical information must also be factored in and protected against. Here, the study described the anonymous data it collected and analyzed,
...longitudinal diagnostic histories of patients aged over 18 who had at least four years between their earliest and latest diagnoses recorded in an anonymised state-wide claims database covering six years of admissions to hospital, stays at hospitals for observation, and emergency department encounters. Some 561,216 patients met the inclusion criteria, having a total of 16,785,977 diagnoses among them.
On the privacy front, as data collection and modeling uses increase, so does the risk that the database loses its anonymity. If patients are being tracked based on the time between visits or on geographic factors, and this becomes part of the model, at what point does the data begin to point to a smaller subset of identifiable people?
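One rough way to ask that question of a dataset is to count how many records share the same combination of indirectly identifying fields. The sketch below is only an illustration of that idea (often called k-anonymity), not anything taken from the study; the records and the field names region, age_band, and days_between_visits are made up for the example.

```python
from collections import Counter

def smallest_group_size(records, quasi_identifiers):
    """Size of the smallest group of records that share the same values
    for the chosen fields. A result of 1 means at least one record is
    unique on those fields and could be singled out."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values())

# Hypothetical, made-up records with no names attached.
records = [
    {"region": "east", "age_band": "30-39", "days_between_visits": 14},
    {"region": "east", "age_band": "30-39", "days_between_visits": 90},
    {"region": "east", "age_band": "30-39", "days_between_visits": 14},
    {"region": "west", "age_band": "40-49", "days_between_visits": 14},
    {"region": "west", "age_band": "40-49", "days_between_visits": 14},
]

# With only coarse fields, every record hides among at least one other.
k_coarse = smallest_group_size(records, ["region", "age_band"])

# Adding the time between visits makes one record unique.
k_fine = smallest_group_size(
    records, ["region", "age_band", "days_between_visits"]
)

print("coarse fields:", k_coarse)
print("with visit timing:", k_fine)
```

In this toy data, the records stay in groups of at least two until the visit timing is added, at which point one patient stands alone. A real assessment would need far more care, but the direction of the effect is the concern raised above: each variable added to a model can shrink the crowd a patient is hidden in.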
In social networks, for instance, mathematical models are available that can identify an individual based on the structure of his or her network. The math applies regardless of the kind of information being observed. For example, the abstract of an article by Kun Liu and Evimaria Terzi, Towards Identity Anonymization on Graphs, states,
The proliferation of network data in various application domains has raised privacy concerns for the individuals involved. Recent studies show that simply removing the identities of the nodes before publishing the graph/social network data does not guarantee privacy. The structure of the graph itself, and in its basic form the degree of the nodes, can be revealing the identities of individuals.
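The point about node degree can be shown with a small sketch. It is an illustration only, not the algorithm from the Liu and Terzi article: the graph is invented, and the function name nodes_with_unique_degree is a hypothetical helper. It counts each node's connections in a graph with the names already removed and reports which nodes have a connection count that no other node shares.

```python
from collections import Counter

def nodes_with_unique_degree(edges):
    """Return the nodes whose number of connections (degree) is shared
    by no other node in an undirected graph given as (a, b) pairs."""
    degrees = Counter()
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    # How many nodes have each degree value.
    nodes_per_degree = Counter(degrees.values())
    return sorted(
        node for node, degree in degrees.items()
        if nodes_per_degree[degree] == 1
    )

# Hypothetical "anonymized" network: labels are arbitrary numbers.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (4, 5)]

exposed = nodes_with_unique_degree(edges)
print("nodes identifiable by degree alone:", exposed)
```

Here node 1 is the only one with three connections and node 5 the only one with a single connection, so anyone who knows how many contacts a person has could pick those two out even though no names were published.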
Given the data involved in medical histories, diagnoses, and geographical variables, any future models would need to account for privacy concerns as well as the model's predictive usefulness. This isn't an impossibility, but simply a consideration in designing future tools. Our future may soon hold the opportunity for robust predictive modeling and DNA data to be coupled in useful ways. This type of endeavor, however, must be advanced in ways that protect individuals and their privacy rights. In addition, as the study also suggests, these tools must be used to assist us in evaluating risks, and must not become ends in themselves or be used to label individuals based solely on predictions.