Data are crucial for policy makers to estimate health outcomes, identify health-related risks and develop public health programs for communities. These data should reflect the communities’ dynamic state of physical, mental and social well-being.
One common approach to data collection is to conduct surveys through telephone interviews or custom questionnaires. However, conducting surveys can be expensive and time-consuming. Depending on the scope of the survey, it could take years to collect all of the results.
Social media as a rich source of health-related data
Social media has become one of the most popular means by which people from around the world communicate with one another and share their opinions, activities, and status updates. Online content, when harnessed appropriately, can provide useful information about diseases and health dynamics in populations. As a result, the study of population health through social media has become a new field called “digital epidemiology”.
Text, images, and more…
Social media offers a variety of data types, most notably text and images. For some time, text-based posts have been the primary data source for many health analysis studies. More recently, visual data has been used as an additional source of information that can capture important health issues.
Online social networks also reflect real-life social networks. People join social groups that interest them, make friends, and display their attitudes and opinions through status updates, comments, likes and reactions. They interact with each other, engage with social groups, tag friends, and more. Together, these interactions comprise a new form of data called digital social capital. Every interaction offers an opportunity to understand something about the underlying health of a population.
In our latest work, we describe a comprehensive investigation of population health analyses from social media data using machine learning techniques. Specifically, we propose a method that combines various forms of social media data into a single analysis. To do this, we developed new frameworks for extracting visual features using a deep neural network, and combined them using location-tagged social media data to produce “social capital features”. We evaluated the effectiveness of the different social media data types by comparing predictions of population health based on social media data against health outcome estimates based on health-related questions in the Behavioral Risk Factor Surveillance System (BRFSS) [https://www.cdc.gov/brfss/index.html]. BRFSS is conducted annually by the U.S. Centers for Disease Control and Prevention and is currently the world's largest health survey. We conducted extensive experiments on large-scale datasets collected from two popular social networks: Foursquare and Flickr.
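To make the evaluation idea concrete, here is a minimal sketch of how per-region feature blocks might be combined and how predictions could be scored against survey-based estimates. It is not the method from the paper: the ridge regression model, the Pearson correlation metric, the feature dimensions, and all function and variable names are assumptions chosen for illustration, and the data are randomly generated rather than drawn from Foursquare, Flickr, or BRFSS.

```python
import numpy as np


def combine_features(*blocks):
    """Concatenate per-region feature blocks (one row per region) column-wise."""
    blocks = [np.asarray(b, dtype=float) for b in blocks]
    if len({b.shape[0] for b in blocks}) != 1:
        raise ValueError("all feature blocks must cover the same regions")
    return np.hstack(blocks)


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    weights = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ weights
    return weights, intercept


def predict(X, weights, intercept):
    """Predict a health outcome for each region (row of X)."""
    return X @ weights + intercept


def pearson_r(a, b):
    """Pearson correlation between predictions and reference estimates."""
    return float(np.corrcoef(a, b)[0, 1])


# Synthetic stand-ins for the three data types (hypothetical dimensions).
rng = np.random.default_rng(0)
n_regions = 200
text_feats = rng.normal(size=(n_regions, 5))
visual_feats = rng.normal(size=(n_regions, 8))
social_feats = rng.normal(size=(n_regions, 3))

X = combine_features(text_feats, visual_feats, social_feats)

# Synthetic "survey estimate" per region: a linear signal plus noise.
true_w = rng.normal(size=X.shape[1])
survey_estimate = X @ true_w + rng.normal(scale=0.5, size=n_regions)

# Fit on some regions, then compare predictions with the held-out estimates.
train, test = slice(0, 150), slice(150, None)
weights, intercept = fit_ridge(X[train], survey_estimate[train], alpha=1.0)
predicted = predict(X[test], weights, intercept)
r = pearson_r(predicted, survey_estimate[test])
print(f"held-out Pearson r: {r:.3f}")
```

Fitting the same model on a single block, for example only `text_feats`, and comparing the held-out correlation with the combined result is one simple way to ask whether an additional data type adds predictive value.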
This article is based on the research paper: “An empirical study on prediction of population health through social media”. Hung Nguyen, Thin Nguyen, and Duc Thanh Nguyen. Journal of Biomedical Informatics, vol. 99, 2019.
Author: Dr Thin Nguyen
Editor: Dr Thomas Quinn