Using "big data" to understand neighborhood impact on health
A person's ZIP code is a stronger predictor of his or her overall health than many other factors, including race and genetics. Neighborhood environments are linked to an array of health outcomes, including mortality and chronic diseases like obesity and diabetes. Researchers and public health advocates often gather data on built environment characteristics and their potential effects on health through time-consuming and expensive collection methods, including on-the-ground visual assessments and population-based neighborhood surveys of residents.
But newer, technology-enabled methods may increase the availability of data that can elucidate how our neighborhoods are affecting our health. In a new study, Dr. Quynh C. Nguyen, assistant professor of epidemiology and biostatistics in the University of Maryland School of Public Health, and a team of researchers used Google Street View images and computer vision algorithms to assess neighborhood features over large geographic areas, including Salt Lake City, Utah; Charleston, West Virginia; and Chicago, Illinois.
She and her research team also examined the associations between neighborhood features and the prevalence of obesity and diabetes in Salt Lake City, controlling for individual and ZIP code-level predisposing characteristics. People living in ZIP codes with the highest proportion of green streets, crosswalks and commercial buildings/apartments were 25-28% less likely to be obese and 12-18% less likely to suffer from diabetes than those in neighborhoods with the least abundance of these features. This research is published in the Journal of Epidemiology and Community Health.
ZIP code distribution of built environment characteristics in Chicago, Illinois.
“We are optimistic that these methods for characterizing built environments will make data more available and will help advance our understanding of the impact of neighborhood characteristics on health,” said Dr. Nguyen. “These data can be used by city agencies and public health practitioners to inform strategies for improving community health.”
The team developed algorithms to refine the accuracy of the computer vision tasks needed to process the geographic data. “We manually labeled 14,000 images to give the network the information needed to be able to recognize and label what a human would see,” Dr. Nguyen explained. “We found that the algorithms achieved an accuracy of 86-93% compared to our manual annotations.”
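To illustrate what an accuracy figure of this kind measures, here is a minimal sketch that compares algorithm output with manual annotations for a single feature. The function name `accuracy` and the toy labels are hypothetical and are not taken from the study; the team's actual evaluation procedure may differ.

```python
def accuracy(manual, predicted):
    """Return the fraction of images whose predicted label matches the manual label."""
    if len(manual) != len(predicted):
        raise ValueError("label lists must be the same length")
    if not manual:
        raise ValueError("need at least one labeled image")
    matches = sum(m == p for m, p in zip(manual, predicted))
    return matches / len(manual)


# Hypothetical toy labels for ten images (1 = crosswalk visible, 0 = not visible).
# These are illustrative values only, not data from the study.
manual_labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
model_labels = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

print(f"accuracy = {accuracy(manual_labels, model_labels):.0%}")
```

In this toy case the algorithm agrees with the human annotator on nine of ten images. In practice, such a comparison would be run on labeled images held out from training, separately for each neighborhood feature.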
She acknowledges that the computer vision models have limitations with regard to the type and detail of neighborhood features that can be identified. “We tried to assess the presence of litter and dilapidated buildings,” Nguyen explained. “Unfortunately, it is hard to distinguish between litter and dead leaves in photographs using computer vision. We also didn’t have enough labeled images of dilapidated buildings—you need thousands of examples to train the computer to identify them with an algorithm.”
She also recognizes that the analyses of health outcomes in Salt Lake City, which used administrative and clinical records, were unable to evaluate long-term trends and did not take into account residential histories or the length of time people had lived in their communities.
Still, she is working to refine and expand this method and is gathering data from more geographic areas across the U.S. with plans to examine the relationship to health data gathered from diverse sources, including the National Health Interview Survey and the National Health and Nutrition Examination Survey, among other large data sets. With this information, Dr. Nguyen brings us closer to being able to predict individuals’ health outcomes from their neighborhood characteristics, and to having better information needed to intervene to improve these outcomes.
This work was supported by the National Institutes of Health's Big Data to Knowledge Initiative (BD2K) grants 5K01ES025433 and 3K01ES025433-03S1 (Dr. Nguyen, PI).
Data sharing statement: ZIP code- and census tract-level indicators of the built environment developed for this manuscript can be downloaded at our project website: https://hashtaghealth.github.io/geoportal/start.html
In addition to Dr. Nguyen, who is in the UMD School of Public Health’s Department of Epidemiology and Biostatistics, the study’s co-authors include Mehdi Sajjadi, MS; Tolga Tasdizen, PhD; Matt McCullough, MNR; Minh Pham; Weijun Yu, MS; Hsien-wen Meng, MS; Ming Wen, PhD; Ken R. Smith, PhD; and Feifei Li, PhD, from the University of Utah; Thu T. Nguyen, PhD, from the University of California San Francisco School of Medicine; and Kimberly D. Brunisholz, PhD, from the Institute for Healthcare
“Neighbourhood looking glass: 360 automated characterization of the built environment for neighbourhood effects research” was written by Quynh C Nguyen, Mehdi Sajjadi, Matt McCullough, Minh Pham, Thu T Nguyen, Weijun Yu, Hsien-Wen Meng, Ming Wen, Feifei Li, Ken R Smith, Kim Brunisholz, and Tolga Tasdizen and published in the Journal of Epidemiology and Community Health (doi:10.1136/jech-2017-209456).