Assessing Demographic Representation in Digital Data for Public Health Research
Project Summary
There are a number of systems that use data from the Internet (such as news, social media, and crowd-sourced reports) and other digital sources (e.g., cell phones and wearable devices) to monitor disease spread, assess population attitudes toward vaccines, and improve understanding of the interaction between changes in population behavior and health. In addition to the challenge of extracting public health signals from the noise inherent in these data sources, there are significant biases arising from differences in how individuals from different locations, age groups, and racial/ethnic backgrounds are represented. Although several publications have discussed the limitations of these data sources, no project has yet developed a rigorous and comprehensive approach to systematically investigate these limitations and explore mitigation strategies.
Understanding the strengths and limitations of these data sources and systems would enable a rigorous assessment of their usefulness for public health research and could increase their acceptance among public health practitioners. Assessing the representativeness and quality of these data at the United States county level is especially important: information on health outcomes is typically available only at the state or national level, and insufficient sample sizes at finer geographic resolutions make it difficult to assess health needs such as chronic disease prevalence, quality-of-life measures, and important determinants of health.
To this end, we will develop a process to assess the comprehensiveness and quality of data for public health research at the US county level. To illustrate this approach, we will focus on social media data, which are widely used to measure various health outcomes and to conduct disease surveillance. Improving population health requires an understanding of current health and disease trends, and evaluating the quality of the data used for public health research is one step toward that understanding.
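The summary does not specify how representativeness will be measured. As a purely hypothetical illustration, not the project's method, one simple check for a single county compares each demographic group's share of a social media sample with its share of the county population (for example, from census estimates). The function name `representation_ratios`, the age-group labels, and all counts below are invented for illustration; in practice the sample counts would themselves depend on inferred user demographics, which carry their own error.

```python
from typing import Dict


def representation_ratios(sample_counts: Dict[str, int],
                          population_counts: Dict[str, int]) -> Dict[str, float]:
    """Hypothetical helper: ratio of each group's share in a sample to its
    share in the reference population.

    1.0 means proportional representation; values below 1.0 mean the group
    is under-represented in the sample, values above 1.0 over-represented.
    """
    sample_total = sum(sample_counts.values())
    pop_total = sum(population_counts.values())
    if sample_total == 0 or pop_total == 0:
        raise ValueError("sample and population counts must not sum to zero")

    ratios: Dict[str, float] = {}
    for group, pop in population_counts.items():
        pop_share = pop / pop_total
        # Groups missing from the sample get a share of 0 (fully unrepresented).
        sample_share = sample_counts.get(group, 0) / sample_total
        ratios[group] = sample_share / pop_share if pop_share > 0 else float("nan")
    return ratios


# Invented counts for one hypothetical county.
sample = {"18-29": 600, "30-49": 300, "50+": 100}
population = {"18-29": 2000, "30-49": 3000, "50+": 5000}
print({g: round(r, 2) for g, r in representation_ratios(sample, population).items()})
# {'18-29': 3.0, '30-49': 1.0, '50+': 0.2}
```

In this made-up example, young adults appear three times more often in the sample than in the population, while residents aged 50 and over appear at one fifth of their population share. A single ratio like this ignores sampling error, which becomes large in small counties.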
Publications
Cesare N, Grant C, Hawkins JB, Brownstein JS, Nsoesie EO. Demographics in Social Media Data for Public Health Research: Does it matter? arXiv preprint arXiv:1710.11048. 2017 Oct 30.
Cesare N, Grant C, Nsoesie EO. Detection of User Demographics on Social Media: A Review of Methods and Recommendations for Best Practices. arXiv preprint arXiv:1702.01807. 2017 Feb 6.
Henly S, Tuli G, Kluberg S, Hawkins JB, Nguyen Q, Anema A, Maharana A, Brownstein JS, Nsoesie EO. Disparities in Digital Reporting of Illness: A Demographic and Socioeconomic Assessment. Preventive Medicine. 2017;101:18-22.
Nsoesie EO, Flor LS, Maharana A, Hawkins JB, Skotnes T, Marinho F, Brownstein JS. Social Media as a Sentinel for Dengue Surveillance: what does sociodemographic status have to do with it? PLOS Currents Outbreaks. 2016;8. doi: 10.1371/currents.outbreaks.cc09a42586e16dc7dd62813b7ee5d6b6.
Code
Coming soon ...
Funding
Robert Wood Johnson Foundation