Comparing Platform Hate Speech Policies: Reddit's Inevitable Evolution

On Monday, June 29, 2020, Reddit updated its policy on hate speech, traditionally considered among the most difficult areas of content moderation to regulate. The new policy aligns more closely with other major platforms’ moderation policies by prohibiting content that “promote[s] hate based on identity or vulnerability” and listing vulnerable groups. Previously, Reddit’s content policy was vague; according to co-founder and CEO Steve Huffman, the rules around hate speech were “implicit.” Reddit began enforcing the new policy immediately, removing roughly 2,000 subreddits, including notable communities such as r/The_Donald and r/ChapoTrapHouse.

This post outlines how platforms grapple with hate speech, one of many issues addressed in a forthcoming book based on the Stanford Internet Observatory’s Trust and Safety Engineering course. We present a comparative assessment of platform policies and enforcement practices on hate speech, and discuss how Reddit fits into this framework.

The absence of a clear legal standard on which companies can base their policies, together with the importance of context in determining whether a post containing known harmful words constitutes hate speech, makes technical solutions incredibly challenging to build. In this light, the Reddit policy update reveals the tradeoffs involved in identifying and stopping abuse of systems that allow for public and private conversation.
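
To make the context problem concrete, consider a toy keyword filter. The sketch below is deliberately naive, with hypothetical blocklist tokens and invented example posts; it shows how the same word can appear in an attack or in a victim’s report, while hateful posts that avoid known keywords slip through entirely.

```python
# A deliberately naive keyword filter. "slur_a" and "slur_b" are hypothetical
# placeholders for actual slurs; the example posts are invented.

BLOCKLIST = {"slur_a", "slur_b"}

def naive_filter(post: str) -> bool:
    """Flag a post if any token matches the blocklist, ignoring all context."""
    return any(token.strip(".,!?") in BLOCKLIST for token in post.lower().split())

posts = [
    "slur_a users do not belong on this site",   # attack: flagged (true positive)
    "reporting someone who called me slur_a",    # victim's report: flagged (false positive)
    "those people are vermin and should leave",  # hateful, no keyword: missed (false negative)
]

for post in posts:
    print(naive_filter(post), "-", post)
```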

Reddit rose to prominence in part due to the lack of gatekeeping on the platform. The site’s previous content policy outlined eight basic rules detailing prohibited behavior, including harassment, impersonation, illegal content, and tampering with the site. Yet the lack of clarity around what constituted hate speech, and the inconsistent enforcement of such violations, enabled users to abuse the platform to advance hateful ideologies, earning Reddit a reputation as a “cesspool of racism.”

Reddit’s situation is further complicated by its unusual content-moderation model. Most content on Reddit is hosted in “subreddits,” user-created and user-moderated discussion forums focused on particular topics. Subreddits have several levels of privacy settings, and the platform provides subreddit moderators a variety of tools to enforce both Reddit’s global policies and optional subreddit-specific policies. Subreddit rules often institute norms specific to that community: for example, the 25-million-member r/aww, a subreddit devoted to cute photos of “puppies, bunnies, babies, and so on,” prohibits “sad” content. This stands in contrast to centrally moderated platforms such as YouTube and Facebook. While the Reddit model allows for positive self-regulation by user communities, it is also uniquely vulnerable to abuse by malicious unpaid moderators.
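
The two-tier structure described above can be sketched in a few lines of Python. This is a simplified illustration, not Reddit’s actual implementation: the rule checks are toy placeholders, and only r/aww’s “no sad content” rule comes from the post itself.

```python
# A simplified sketch of a two-tier moderation model: sitewide rules apply
# everywhere, while each subreddit's moderators can layer on local rules.

from dataclasses import dataclass, field
from typing import Callable

Rule = Callable[[str], bool]  # returns True if the post violates the rule

def sitewide_hate_rule(post: str) -> bool:
    # Placeholder for a global "promoting hate" check.
    return "hateful" in post.lower()

@dataclass
class Subreddit:
    name: str
    local_rules: list[Rule] = field(default_factory=list)

    def violates(self, post: str) -> bool:
        # Global rules are enforced everywhere; local rules only here.
        global_rules = [sitewide_hate_rule]
        return any(rule(post) for rule in global_rules + self.local_rules)

# r/aww's "no sad content" rule, reduced to a toy keyword check.
aww = Subreddit("r/aww", local_rules=[lambda post: "sad" in post.lower()])

print(aww.violates("A sad story about a puppy"))  # True: violates the local rule
print(aww.violates("A happy puppy photo"))        # False
```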

All social media platforms share the challenge of weighing free expression against user protection. As a result, hate speech standards have largely converged around protecting specific groups or individual attributes: most prohibit attacks targeting an individual’s race, ethnicity, national origin, religious affiliation, sexual orientation, gender, age, or disability. Many platform policies add further protected groups: Twitter includes caste as a protected category, and TikTok prohibits hate speech based on users’ immigration status. Table 1 below shows the different ways platforms define and address hate speech. Definitions generally fall within four broad categories: 1) protecting vulnerable groups, 2) protecting specific characteristics or attributes of individuals, 3) prohibiting hate speech without offering a definition, and 4) providing no policy on hate speech at all.

Table 1: Language on hate speech in policies of major online platforms:

Microsoft

  • “Content that advocates violence or promotes hatred based on: age, disability, gender, national or ethnic origin, race, religion, sexual orientation, gender identity”

Facebook

  • “A direct attack on people based on what we call protected characteristics — race, ethnicity, national origin, religious affiliation, sexual orientation, caste, sex, gender, gender identity, and serious disease or disability”

Instagram

  • Hate speech mentioned in Community Guidelines, but not defined

Pinterest

  • “Content attacking vulnerable groups based on their race, ethnicity, national origin, religion, sex, gender, sexual orientation, age, disability, or medical condition, among others”

YouTube

  • “Content promoting violence or hatred against individuals or groups based on any of the following attributes: age, caste, disability, ethnicity, gender identity and expression, nationality, race, immigration status, religion, sex/gender, sexual orientation, victims of a major violent event and their kin, and veteran status”

Reddit

  • “Promoting hate based on identity or vulnerability”
  • “Marginalized or vulnerable groups include, but are not limited to, groups based on their actual and perceived race, color, religion, national origin, ethnicity, immigration status, gender, gender identity, sexual orientation, pregnancy, or disability. These include victims of a major violent event and their families”
  • “While the rule on hate protects such groups, it does not protect those who promote attacks of hate or who try to hide their hate in bad faith claims of discrimination.”

Twitter

  • Hateful conduct: promoting “violence against or directly attacking or threatening other people on the basis of race, ethnicity, national origin, caste, sexual orientation, gender, gender identity, religious affiliation, age, disability, or serious disease”
  • Hateful imagery and display names: Using “hateful images or symbols in profile image or profile header” and/or using “username, display name, or profile bio to engage in abusive behavior, such as targeted harassment or expressing hate towards a person, group, or protected category”

Snapchat

  • “Content that demeans, defames, or promotes discrimination or violence on the basis of race, ethnicity, national origin, religion, sexual orientation, gender identity, disability, or veteran status”

TikTok

  • “Content that does or intends to attack, threaten, incite violence against, or dehumanize an individual or a group of individuals on the basis of the following protected attributes: race, ethnicity, national origin, religion, caste, sexual orientation, sex, gender, gender identity, serious disease or disability, and immigration status”

Telegram

  • No mention of hate speech in company policies

WhatsApp

  • No mention of hate speech in company policies

Discord

  • “An attack on a person or a community based on attributes such as their race, ethnicity, national origin, sex, gender, sexual orientation, religious affiliation, or disabilities”

Parler

  • No policy on hate speech
  • Obscenity: “Obscenity is as close as it gets to a hate speech law, but it is illegal”

Gab

  • No mention of hate speech in company policies
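
One way to make the comparison in Table 1 concrete is to treat each policy’s protected categories as a set and compute the differences directly. The sketch below uses only the Facebook and Twitter language transcribed from the table, with compound entries such as “serious disease or disability” split into separate items for comparability.

```python
# Protected categories transcribed from Table 1, represented as Python sets.
# Compound entries are split apart so policies can be compared like for like.

facebook = {
    "race", "ethnicity", "national origin", "religious affiliation",
    "sexual orientation", "caste", "sex", "gender", "gender identity",
    "serious disease", "disability",
}

twitter = {
    "race", "ethnicity", "national origin", "caste", "sexual orientation",
    "gender", "gender identity", "religious affiliation", "age",
    "disability", "serious disease",
}

print(sorted(twitter - facebook))  # ['age']: Twitter protects age; Facebook does not
print(sorted(facebook - twitter))  # ['sex']: Facebook lists sex separately from gender
print(sorted(facebook & twitter))  # the shared core of protected categories
```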

Reddit’s initial updated language followed the lead of its peers by listing marginalized or vulnerable groups, with one notable caveat: “While the rule on hate protects such groups, it does not protect all groups or all forms of identity. For example, the rule does not protect groups of people who are in the majority or who promote such attacks of hate.” While the rule implied that hate speech is permissible when directed at groups that constitute “the majority” or that promote hate speech, the company offered little clarification as to which groups constitute said majorities. Users and researchers pointed out that certain ethnic or social groups listed in the marginalized and vulnerable categories were in fact also the majority in some countries. Women, for example, a traditionally targeted and thus protected group, comprise the majority of the population in most countries, rendering gender a contestable “vulnerable group” under the guidelines. In other cases, a religious or ethnic minority may hold political power over the majority of their fellow citizens due to a variety of historical factors. The policy did not seem to grapple with the complexity of such situations.

Reddit moved quickly to update the language; by Wednesday, July 1, the rule read: “While the rule on hate protects such groups, it does not protect those who promote attacks of hate or who try to hide their hate in bad faith claims of discrimination.” The initial Reddit post containing the “majority” clause has since been deleted from the site.

Figure 1: The initial Reddit hate speech policy post, since removed.

How companies enforce hate speech violations matters precisely because the policy language defining hate speech is imprecise. Users often gauge how far they can push the boundaries of acceptable behavior based on the signals companies project through their enforcement practices.

Even with standardized language, enforcement practices diverge significantly between companies. Large, well-resourced companies like Facebook and YouTube can deploy machine-learning tools to find and remove hate speech, while others rely on human review after content has been flagged by users. Comprehensive enforcement largely depends on how proactively companies choose to search for and weed out hate speech on their platforms. The table below captures divergences in enforcement based on publicly available company policies.

Table 2: Hate speech enforcement practices of major online platforms

[Table comparing Microsoft, Facebook, Instagram, YouTube, Twitter, Reddit, TikTok, Snapchat, Telegram, WhatsApp, Discord, Parler, and Gab on three dimensions: whether hate speech is explicitly prohibited in company policies; whether the platform automatically detects hate speech; and whether company employees or contractors review reported hate speech violations.]
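
The gap between proactive and reactive enforcement described above can also be sketched in code. The pipeline below is a hypothetical illustration, not any platform’s actual system: the classifier, threshold, and review queue are stand-ins.

```python
# A toy contrast between two enforcement models: proactive machine-learning
# detection versus reactive review of user reports.

from collections import deque

REVIEW_QUEUE: deque = deque()  # posts awaiting human review

def toy_classifier(post: str) -> float:
    """Stand-in for a trained hate speech model; returns a score in [0, 1]."""
    return 0.9 if "hateful" in post.lower() else 0.1

def proactive_pipeline(post: str, threshold: float = 0.8) -> None:
    """Well-resourced platforms scan every post at submission time."""
    if toy_classifier(post) >= threshold:
        REVIEW_QUEUE.append(post)  # or auto-remove at a stricter threshold

def reactive_pipeline(post: str, user_reported: bool) -> None:
    """Smaller platforms act only after a user flags the content."""
    if user_reported:
        REVIEW_QUEUE.append(post)

proactive_pipeline("a hateful attack on a protected group")  # queued without any report
reactive_pipeline("a borderline post", user_reported=True)   # queued only once reported
print(list(REVIEW_QUEUE))
```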

In response to Reddit’s policy changes, the most extreme hate groups may seek out smaller platforms with more permissive policies. These communities often congregate where they can still reach an audience but where firm guardrails against hate speech have yet to be established, including platforms where the ambiguity around hate speech can be exploited or where the rules around enforcement remain unclear.

Some have speculated that r/The_Donald’s userbase will migrate to newer platforms like Parler, an app developed in response to complaints of over-censorship on other platforms, which, like Gab, bills itself as “the free speech social network.” Yet even these platforms may begin to change their hate speech policies. In response to criticism over arbitrary user bans, Parler CEO John Matze recently posted new rules about permissible behavior on the platform, imploring users to simply “grow up.”

Figure 2: Parler CEO John Matze’s June 30th post regarding his platform’s updated policies. Some words have been removed by the author of this blog post.

As platforms scale, they inevitably face pressure to moderate content or risk public backlash. While some platforms may initially differentiate themselves with policies favoring free speech and minimal intervention, their policies often converge toward industry standards. Reddit’s hate speech policy update and swift revision exemplify this evolution. The best that platforms can aspire to is to institute policies that are tightly scoped, clearly defined, and transparently enforced.
