Nature Journal
January 29, 2020

Data from sites like Twitter, Reddit and Instagram can help scientists learn about human health and behavior, but social media bots that pump out computer-generated content can make it difficult to distinguish genuine public opinion from automated chatter.

Bots can interfere with public health research and pollute datasets, Amelia Jamison, a faculty research assistant at the Maryland Center for Health Equity, told the journal Nature in a recent article.

“You might be artificially giving the bots a voice by treating them as if they are really part of the discussion, when they are actually just amplifying something that may not be voiced by the community,” Jamison said. 

For example, tweets generated by bots are more likely to claim that e-cigarettes help people quit smoking, or to tout unproven health benefits of cannabis.

Bots are designed to behave like real people, posting original content at random intervals, Nature said. Sometimes human-generated content is even mixed in with the computer-generated content, making the bots more difficult to detect.

“...Bot detectors are locked in an arms race with bot developers,” the article stated. Once a detection technique becomes known, bot creators can use that same knowledge to evade it.

Jamison, who studies health disparities, has mined social media for posts that oppose vaccination. She said that failing to weed out bots could lead her to conclude that people are generating more or different anti-vaccination chatter than they actually are.

But as researchers grapple with how to define and detect bots, many fail to filter out automated content, in part because some feel they lack the expertise to do so.

Related Links

Nature: Social scientists battle bots to glean insights from online chatter

Related People
Amelia Jamison