Exploring the Power of Big Data for Public Health
The Department of Epidemiology and Biostatistics will host the first UMD Big Data in Public Health conference on February 28, 2020. The event is free and open to anyone interested in learning more about collaborations that use big data methods to advance public health. Dr. Quynh Nguyen, assistant professor of epidemiology, is on the organizing committee, which is chaired by Dr. Tianzhou (Charles) Ma, assistant professor of biostatistics, and co-chaired by Dr. Mei-Ling Ting Lee, professor of biostatistics. Dr. Nguyen spoke with Kelly Blake, assistant dean of communications, about the application of big data in her research.
1) What do we mean by "big data" and how is it changing public health?
There are many sources of organic big data that are available but currently not leveraged for health research. In public health, we have traditionally used health surveys to answer questions.
But now we have data sources that provide the possibility for answering questions about how behaviors and environmental factors influence health that we could not easily answer before. These include social media data, images from Google Street View, electronic health records, claims data from payers, genetic databases, public records, web searches and much more.
But how do we make sense of it? It is like swimming in a lake: because these data sources are unstructured, you need creativity to figure out which variables to explore. That’s why these research endeavors are best approached by interdisciplinary teams that include epidemiologists, computer scientists and engineers. Many of these large data sources require real-time processing and very high transmission speeds.
2) How do big data techniques allow you to answer questions about how our neighborhood environment impacts our health?
A person's ZIP code is a stronger predictor of their overall health than many other factors, including race and genetics. Neighborhood environments are linked to an array of health outcomes, including mortality and chronic diseases like obesity and diabetes. Big data techniques can speed up our understanding of how the built environment impacts health. My research team used Google Street View images and computer vision algorithms to assess neighborhood features, such as walkability, green spaces, urban development (or decay), and building types and conditions, over large geographic areas. We have found that people living in more walkable areas with greater proportions of green space were much less likely to be obese or have diabetes.
3) How are you using Twitter to gauge sentiments or behaviors and how they influence health?
I have two recent papers that explored sentiments expressed in race-related tweets and connected them with vital statistics at the state and county level to examine the correlation with different health outcomes. One paper looked at birth outcomes and the other explored cardiovascular disease outcomes.
We found that people living in areas where the highest number of racist tweets originated had poorer health outcomes. In the case of birth outcomes, there was an increased prevalence of low birth weight (+6%), very low birth weight (+9%), and preterm birth (+10%). For cardiovascular disease outcomes, there was a higher prevalence of high blood pressure (+11%), diabetes (+15%), obesity (+14%), stroke (+30%), heart attacks (+14%) and other cardiovascular disease outcomes for those living in states from which the highest number of racist tweets originated. These poorer health outcomes were seen in both Blacks and whites.
A lot of past research looking at the correlation between racism and health has focused only on associations between adverse outcomes and discrimination among minorities, so this finding that adverse outcomes are seen among whites as well is new. Although whites are not the recipients of discrimination against minorities, racial hostility may still behave like a stressor and be linked with worse outcomes. A social environment characterized by more negativity may be bad for population health overall.
Using Twitter data allows us to look at not just people’s individual experiences of discrimination, but also the broader context of the social environment. Understanding area-level characteristics gives us a broader way of characterizing and examining the influence of discrimination on health.
4) Why did the Department of Epidemiology and Biostatistics organize this Big Data in Public Health meeting?
We are seeing a movement to incorporate data science into public health. It allows us to use data sources that we can’t process with the methods we typically use in public health, but that we can handle by collaborating with computer scientists and engineers. We are looking to encourage more collaborations between researchers from different disciplines and different institutions. It is also a good opportunity to introduce our students to this skillset and to the power of using big data tools.
There’s a common theme of moving to more interdisciplinary work. I am excited that we are bringing in many leaders in the field, from those working in public health to others in diverse social sciences. On the panel that I am on, Lisa Singh from Georgetown University will discuss how news and social media shaped the 2016 presidential election. David Broniatowski from George Washington University studies misinformation on social media and how it is affecting people’s understanding of infectious diseases and vaccination.