Integrating data science into the world of healthcare

by Steven Feng, economics and global health technologies sophomore at Rice University, Houston (29.7° N, 95.3° W)


The buzzword “data” has recently become commonplace in the academic, professional, and everyday world, and discussions on data and its applications are ongoing and striking. In the current age of technology, where everything is integrated with some sort of hardware or computer program, avoiding data generation and collection is impossible. Every time a student receives an email, a parent buys a toy for their child on Amazon, or a researcher narrows their search criteria on an academic database, data is being generated and collected.

Though this data is meaningless on its own, professionals and academics can use computational and statistical techniques to analyze it and draw meaningful conclusions; the study and practice of these techniques is known as data science. As Pierre Elias, a cardiology fellow at Columbia University who also conducts research in data science, describes it, data science is the combination of three disciplines: computer science, statistics, and content expertise (the particular subject matter in which a given data scientist is interested or knowledgeable). According to Elias, successful data scientists are not necessarily experts in all three fields; most have a unique balance among the three, which allows them to collaborate successfully with other data scientists who have slightly different skill sets.

Currently, professionals in nearly every field are integrating data science into their practice to optimize or modify current methods. Two commonly cited examples of data science are within marketing and transportation. Companies like Facebook and Google are notorious for compiling and analyzing user data to personalize advertisements for individual users and oftentimes are under scrutiny for the sly ways they collect and access this data. On the other hand, transportation companies such as Uber and Tesla use spatial and geographical data to reroute drivers based on lower driving times or mileage or to power self-driving cars.

Many other fields have long used data science to their benefit yet receive far less media attention. One such field is health informatics, the collection and study of patient and clinical data in healthcare. The healthcare world spans multiple subdisciplines, including pharmacy, private healthcare institutions, academic institutions, and insurance, and each of these generates great amounts of data. Many substantial problems in healthcare can be addressed by analyzing this data with data science techniques. Integrating technology into the long-standing world of healthcare has been a work in progress, and recent developments in the field of health informatics show how much untapped potential still lies in the use of health data.

Professionals in health informatics are primarily concerned with optimizing or overhauling existing healthcare infrastructure. Inefficiencies arise from errors in healthcare practices and from outdated physical capital (that is, machines and hardware used in healthcare, ranging from fax machines to MRI scanners), and many of them can be addressed or remedied with health informatics techniques. Both Elias and Julian Yao, senior director of strategic initiatives at Covera Health, a startup that uses data science to improve patient care, expressed dissatisfaction with the current state of physical infrastructure in healthcare. Yao likened the industry as a whole to an operation stuck in the 1970s, and Elias added that he still receives large data sets through a fax machine and sometimes has to comb through the documents manually in order to extract relevant data. In this example, data science techniques can make the existing capital obsolete by providing a streamlined method for practitioners to compile, organize, and send data electronically.

Another important source of inefficiency in healthcare is misdiagnosis, which may be avoided with more advanced data science practices. When doctors diagnose a patient’s symptoms, their reasoning may be based on false or misleading data points; according to Elias, most patients do not describe their symptoms in enough detail, which can make a diagnosis less accurate. Even with an adequate description from the patient, however, misdiagnosis can still happen: advanced images such as MRIs are often inherently difficult to interpret, and human error by doctors is always possible. Covera Health studied the differences in diagnoses among practitioners by sending one patient with lower back pain to 10 professionals in the greater New York area. Shockingly, not one diagnosis appeared on all 10 reports, and according to Yao, “if you take the two most extreme reports and put them side-by-side, they don’t even look like the same patient.” A misdiagnosis can cost a patient valuable time and money, and it can lead to further medical complications if the patient undergoes treatment for a condition they do not have. It can also damage a doctor’s reputation and place them under considerable legal and financial pressure.

Though healthcare informatics is not the be-all and end-all, proper data science and machine learning techniques can significantly alleviate these problems. Computers, for example, can be trained to scrape, or extract, data from files. To do this, a data scientist would first design an algorithm telling a machine what sort of information to look for and then train the machine by feeding it data and, to put it simply, telling it what is right and wrong. For Elias, a machine that automatically compiles patient data would be a considerable upgrade from his current method of receiving faxes and then extracting data by hand.
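To make the idea of "feeding it data and telling it what is right and wrong" concrete, here is a minimal sketch in Python. It is not the method Elias or any group named in this article uses. It assumes a very simple learner (a perceptron), and every example line and label in it is invented for illustration. The hypothetical task is to decide which lines of a faxed document contain data worth extracting.

```python
# Toy supervised-learning sketch. The learner is a simple perceptron, and
# all example lines and labels are invented for illustration only.
from collections import defaultdict


def features(line):
    """Turn a line of text into a set of lowercase words."""
    return set(line.lower().split())


def train(examples, epochs=10):
    """Learn one weight per word from (line, label) pairs.

    label is 1 for "extract this line" and 0 for "ignore it".
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for line, label in examples:
            words = features(line)
            guess = 1 if bias + sum(weights[w] for w in words) > 0 else 0
            error = label - guess  # 0 when right, +1 or -1 when wrong
            if error:
                # "Telling the machine it was wrong": nudge the weights
                # toward the correct answer for the words it just saw.
                bias += error
                for w in words:
                    weights[w] += error
    return weights, bias


def predict(model, line):
    """Return 1 if the trained model thinks the line holds useful data."""
    weights, bias = model
    score = bias + sum(weights.get(w, 0.0) for w in features(line))
    return 1 if score > 0 else 0


# Hypothetical labeled lines, as a person might mark them by hand.
examples = [
    ("blood pressure 120 over 80", 1),
    ("heart rate 72 bpm", 1),
    ("page 1 of 3", 0),
    ("sent by fax from front desk", 0),
    ("blood glucose 95 mg", 1),
    ("please call the front desk", 0),
]

model = train(examples)

# Lines the model has never seen before.
for new_line in ["blood pressure 130 over 85", "page 2 of 3"]:
    print(new_line, "->", predict(model, new_line))
```

The person supplies the labels and the machine adjusts its own weights whenever its guess disagrees with them. Real systems use far more data and far more capable models, but the loop of guessing, checking, and correcting is the same.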

Machines that can diagnose illnesses are a growing research interest among practitioners and data scientists alike. Following the same machine learning principles described above, data scientists can train machines to analyze MRIs and other images for symptoms by “feeding” the machine examples of images where symptoms are present or absent. With enough labeled examples, the machine can learn to detect symptoms in new images. This automated process can greatly reduce misdiagnosis, provided the technology is trained properly and can reliably detect conditions in any given image. New developments in machines that assist practitioners in diagnosing conditions appear constantly. At Covera Health, for example, Yao and his team specifically tackle misdiagnosis in radiology by amassing clinical data and then analyzing it not only to improve diagnostic accuracy but also to ensure patients get the optimal care in order to improve outcomes. In addition, Elias mentioned developments in machine sensors to better interpret images from echocardiograms, and researchers at Stanford have developed an algorithm known as HeadXNet to detect brain aneurysms in CT angiography scans. Groups of data scientists leverage that same core trifecta of computer science, statistics, and content expertise to effect life-saving changes in an industry long overdue for a technical overhaul.

New developments in healthcare informatics will take some time, given how arduous and time-consuming it is to gather data, develop algorithms, and train machines. In the meantime, both Yao and Elias offered a common piece of advice to undergraduate students: learn data science. As more tasks are automated, data science is becoming more and more relevant to and intertwined with every professional field, and the value of a data science background cannot be overstated. Even so, data science will never be a one-stop shop for solving all of the world’s problems but rather an important tool for doing so; as Elias stressed, it is not a silver bullet destined to fix everything. Strong and reliable data science applications are on the way, however, and developments are only getting better and better.
