What is data science, and what does it have to do with nursing? University at Buffalo School of Nursing Assistant Professor and researcher Suzanne Sullivan explains how nurse scientists can use large, complex data sets to advance knowledge and improve patient outcomes.
BY SUZANNE SULLIVAN, PHD, MBA, RN, CHPN | DECEMBER 19, 2018
The science of analyzing data has become necessary (and possible) due to the unprecedented capacity, storage, and processing speed of modern computers. This “data deluge” has transformed traditional approaches to scientific inquiry, as our capacity to acquire data has outpaced our ability to process and understand it.1
Building upon the first three paradigms of science (experimental, theoretical, and computational), a fourth paradigm is emerging from the need to analyze massive datasets and derive knowledge from them.2 Data science approaches, therefore, help data scientists identify patterns in massive datasets (big data) to answer questions such as, “What is odd about my data?” (anomaly detection) or “What will happen next?” (predictive analytics). For example, credit card companies use data science approaches to detect fraudulent activities by identifying anomalies in spending patterns, and health care providers can use data science to identify patients who are at risk for experiencing an adverse health outcome such as a catheter-associated urinary tract infection.
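To make the “What is odd about my data?” question concrete, here is a minimal sketch of one simple anomaly detection technique: flagging values that sit far from the average, measured in standard deviations (a z-score). The purchase amounts, the function name `flag_anomalies`, and the threshold of 2.0 are all illustrative assumptions. Real fraud detection systems use far richer models than this.

```python
from statistics import mean, stdev

def flag_anomalies(amounts, threshold=2.0):
    """Return the amounts whose z-score magnitude exceeds the threshold.

    The threshold of 2.0 is an assumption chosen for this small sample;
    it is not a standard taken from the article.
    """
    mu = mean(amounts)
    sigma = stdev(amounts)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [x for x in amounts if abs(x - mu) / sigma > threshold]

# Hypothetical card purchases in dollars; one is far outside the usual range.
purchases = [23.5, 41.0, 18.75, 36.2, 29.9, 44.1, 31.6, 27.3, 950.0, 33.8]
suspicious = flag_anomalies(purchases)
print(suspicious)
```

In this made-up sample, only the 950.00 purchase is flagged. The same idea could in principle be applied to clinical measurements, although the choice of threshold would need clinical judgment.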
A paradigm is a shared understanding, or a set of concepts or thought patterns, that inform the way scientists work within a particular field of research.
Data, made up of symbols or characters, are assumed to be facts that form the basis of reasoning or calculation.3 Thus, the term “big data” is used to describe enormous datasets that contain a wide variety of discrete data objects collected from diverse sources such as wearable devices (think smart watches), geospatial data such as satellite imagery, or Google search histories that can be used to understand the world around us.
How big is “big data,” you ask? To give some perspective, in 2013 there were 4.4 zettabytes of data worldwide. By 2020, it is estimated that number will grow to 44 zettabytes.4 Considering that one zettabyte is equivalent to 1 trillion gigabytes (10^21 bytes),5 it is easy to see that the term “big data” refers to a number so large that it is truly impossible to comprehend.
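The unit arithmetic behind these figures can be checked in a few lines. This sketch assumes the decimal (SI) definitions of the units, where a gigabyte is 10^9 bytes and a zettabyte is 10^21 bytes. The variable names are illustrative.

```python
# Decimal (SI) unit sizes, in bytes.
BYTES_PER_GIGABYTE = 10**9
BYTES_PER_ZETTABYTE = 10**21

# How many gigabytes fit in one zettabyte: 10**12, i.e. one trillion.
gigabytes_per_zettabyte = BYTES_PER_ZETTABYTE // BYTES_PER_GIGABYTE

# Figures quoted in the article.
zettabytes_2013 = 4.4
zettabytes_2020_estimate = 44

# The 2020 estimate expressed in gigabytes, and the growth since 2013.
gigabytes_2020_estimate = zettabytes_2020_estimate * gigabytes_per_zettabyte
growth_factor = zettabytes_2020_estimate / zettabytes_2013

print(f"{gigabytes_per_zettabyte:,} gigabytes per zettabyte")
print(f"{gigabytes_2020_estimate:,} gigabytes estimated for 2020")
print(f"roughly {growth_factor:.0f}-fold growth from 2013 to 2020")
```

Put another way, the estimate describes about a tenfold increase in seven years.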
Data science is often characterized by “the 5-V’s”: volume (size), velocity (speed of generation), variety (structured, unstructured), veracity (quality), and value (knowledge discovered).6,7 Data science, therefore, is a term used to describe the process of asking questions of, analyzing, and manipulating large datasets to discover patterns and derive knowledge by using a variety of mathematical models and methods.
Data analytics is the application of this knowledge to solve real-world problems. For example, a data scientist may ask, “What products do my customers typically purchase together?” The data analyst could use this knowledge to identify specific customers who are likely to respond to targeted marketing campaigns.
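As a rough illustration of the purchasing example, the sketch below counts how often each pair of products appears in the same basket (the data scientist's question) and then lists customers who bought one item of the most common pair but not the other (one way an analyst might pick targets for a campaign). The customer IDs, products, and targeting rule are hypothetical, and simple pair counting stands in here for the more sophisticated association methods used in practice.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase history: customer ID -> set of products bought.
baskets = {
    "cust_a": {"bread", "butter", "milk"},
    "cust_b": {"bread", "butter"},
    "cust_c": {"milk", "cereal"},
    "cust_d": {"bread", "butter", "cereal"},
    "cust_e": {"milk", "bread"},
}

# Data science step: count every pair of products bought together.
pair_counts = Counter()
for items in baskets.values():
    # Sorting gives each pair one consistent ordering, e.g. ("bread", "butter").
    for pair in combinations(sorted(items), 2):
        pair_counts[pair] += 1

top_pair, top_count = pair_counts.most_common(1)[0]

# Analytics step: find customers who bought the first item but not the second.
first, second = top_pair
targets = sorted(
    customer
    for customer, items in baskets.items()
    if first in items and second not in items
)

print(top_pair, top_count)
print(targets)
```

In this toy data, bread and butter are bought together most often (three baskets), and the one customer who bought bread without butter becomes the candidate for a targeted offer.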
Once described by the Harvard Business Review as the “sexiest job of the 21st century,”8 “data scientist” is an elusive term describing a person whose job it is to understand, process, extract value from, visualize, and communicate knowledge from extremely large datasets.9 Data scientists, therefore, require the skills of a hacker, statistician, and content expert.9 It is said that a data scientist knows more statistics than a computer scientist and more computer science than a statistician.
Nurses are playing an increasingly important role in using data science approaches to solve health care problems. In a recent review of the literature,10 nurses used data science to discover knowledge and to predict and evaluate outcomes of nursing care interventions. For instance, nurses analyzed large datasets (typically electronic health record data) to uncover patterns, associations, or factors related to patient outcomes. Nurses also developed predictive models to identify risks for adverse health outcomes such as infection or mortality. Moreover, nurses used data science to develop, assess, and evaluate patient outcomes by way of clinical decision support tools, health portals, and care coordination activities.10,11
The impact that data science can make to improve nursing practice, patient outcomes, health equity, and global health is limited only by the imagination. In fact, nurses, as patient advocates, are uniquely qualified to bring clinical insight into the lab, where important questions can be asked of large datasets to improve patient care.6,7 Precision health approaches, which aim to use data analytics to deliver actionable alerts to engage patients in improving their own health,12 are especially promising, as nurses will continue to play a critical and expanding role in developing evidence-based practice models to improve the lives of patients, their families, and our communities.