Automated Content Moderation: A Primer

A primer on the predictive models used for automated content moderation, known as classifiers.

Large Internet platforms rely on automated tools to moderate content posted by their users. Facebook, for example, reports that it relies on such tools – not user reports – to identify 97% of content the platform removes for violating its hate speech policies. Regulation of platforms and online information-sharing must reflect this reality. Policymakers should understand – and ask informed questions about – the potential strengths and weaknesses of automated tools that platforms use now, or would use to comply with proposed laws.

This explainer is intended to help them do so. Author Nafia Chowdhury, a former Global Risk Operations Specialist at Facebook, describes the development and deployment of predictive, machine-learning-based content moderation tools. The paper discusses the risks and tradeoffs at each stage, and reviews real-world capabilities for identifying text, images, and other kinds of online content that may violate the law or platforms’ own policies. It identifies some sources of potentially avoidable error, including reliance on incomplete or biased data sets to train automated tools. As it further explains, though, some tradeoffs between competing goals or priorities are inevitable. Most prominently, platforms deploying automated tools must weigh the known, quantified risk of overenforcement (taking down innocuous content) against the risk of underenforcement (failing to take down prohibited content). Understanding these tradeoffs is essential both in evaluating platforms’ descriptions of their own success rates and in anticipating the consequences of proposed legal mandates.
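To make the overenforcement/underenforcement tradeoff concrete, the sketch below is a hypothetical illustration, not any platform's actual system: it assumes a classifier that assigns each post a score for how likely it is to violate policy, and shows how the choice of decision threshold shifts errors between removing innocuous content and missing prohibited content. The scores, labels, and the `enforcement_outcomes` helper are invented for illustration only.

```python
# Illustrative sketch: how a removal threshold trades overenforcement
# (false positives) against underenforcement (false negatives).

# Hypothetical classifier scores paired with ground-truth labels from human
# review: (predicted probability of violation, actually violates policy?)
labeled_items = [
    (0.95, True), (0.82, True), (0.74, False), (0.61, True),
    (0.55, False), (0.47, True), (0.33, False), (0.12, False),
]

def enforcement_outcomes(threshold):
    """Count both error types if every item scoring at or above `threshold` is removed."""
    false_positives = sum(1 for score, violates in labeled_items
                          if score >= threshold and not violates)  # overenforcement
    false_negatives = sum(1 for score, violates in labeled_items
                          if score < threshold and violates)       # underenforcement
    return false_positives, false_negatives

for threshold in (0.3, 0.5, 0.7, 0.9):
    fp, fn = enforcement_outcomes(threshold)
    print(f"threshold={threshold:.1f}: "
          f"{fp} innocuous item(s) removed, {fn} violating item(s) missed")
```

Running this toy example, a low threshold removes every violating post but also sweeps in innocuous ones, while a high threshold removes almost nothing innocuous but lets violating posts through; the same score distribution cannot minimize both error types at once, which is the tradeoff the paper describes.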