When it comes to mitigating online harms, the U.S. Congress is at least united on one point: There is a need for greater transparency from tech companies. But amid debate over how to reform the liability protections of Section 230 of the Communications Decency Act, the exact shape of proposals to mandate transparency remains uncertain at best.
While “transparency” means different things to different people, it speaks to the desire among lawmakers and researchers for more information about how social media platforms work internally. On the one hand, the desire to impose transparency requirements runs the risk of becoming a catch-all solution for online harms. On the other, if lawmakers are ever to arrive at “wise legal solutions” for these harms, they will need better data to diagnose them correctly, as my Stanford colleague Daphne Keller has argued.
For platform transparency to be meaningful, scholars argue, companies need to be specific about the type of information they disclose. Platforms cannot simply increase the amount of information they make public; they also need to communicate that information to stakeholders in a way that empowers them to hold platforms to account. Currently, tech platforms oversee themselves and are not legally obligated to disclose how they regulate their own domain. Without mandatory regulation, we are left with self-regulatory efforts, which have no teeth. Congress is considering transparency requirements in bills such as the Platform Accountability and Consumer Transparency (PACT) Act, which would require platforms to publish transparency reports, and the Online Consumer Protection Act (OCPA), which would require platforms to disclose their content moderation policies. These proposals, as Mike Masnick and others have argued, embrace social media platforms’ model of transparency, but rather than improve it they add further restrictions that may be more harmful than helpful.
A well-formulated transparency regime might provide a measure of oversight that is currently sorely lacking, but if lawmakers are to craft such a regime, they need to first understand that transparency reports as they are currently structured with aggregated statistics aren’t helpful; that community standards aren’t static rules but need space to evolve and change; and that any transparency regulation will have important privacy, speech, and incentive tradeoffs that will need to be taken into consideration.
The cautionary tale of aggregated metrics
The proposals to impose transparency on the tech sector express a desire for more data on how platforms operate. Such data can be used to assess how prevalent a given type of harm is on a platform. But the drive to produce more data can also distort how we understand online platforms. A good example is the data describing the prevalence of online child sexual abuse material (CSAM).
In 2018, electronic service providers forwarded some 45 million reports of images and videos containing CSAM to the National Center for Missing and Exploited Children, according to a groundbreaking and disturbing New York Times investigation. That report concluded that Facebook provided most of these reports. The same holds true today: Of the 21.4 million reports filed in 2020, 20.3 million came from Facebook. But this doesn’t mean that most of the abuse happens on Facebook; rather, it’s a reflection of the massive amount of overall material posted to Facebook and the fact that Facebook does extensive voluntary detection and reporting. Basing our understanding on total numbers results in a misleading sense of the scope of the problem—and who is responsible for ameliorating it.
The effort to eliminate CSAM also provides a cautionary tale for cracking down on other forms of online content. Basically everyone can agree on what CSAM is and that it is harmful. But for other types of content that legislators would like to see less of online, such as misinformation and disinformation, there is far less clarity and consensus. Tools like PhotoDNA, which are used to detect CSAM, can’t be used to determine if content is false, and speech related to voting and politics can’t be as easily parsed for abuse. For mis- and disinformation, it is all but impossible to place them into neat, mutually exclusive categories. Requiring platforms to disclose the prevalence of such material on their platform might seem attractive until you attempt to define exactly what such material entails.
These difficulties in settling on appropriate, clear definitions of problematic content were evident in the 2020 election and in online narratives about voter fraud. In one instance, a TikTok user posted a video claiming that she had been sent a ballot from the state of Washington, even though she is not a citizen and therefore cannot vote. Upon closer inspection, it is hard to tell whether the envelope in her hand holds a ballot or a voter registration form. In the lead-up to the election, such forms confused other ineligible voters and were taken as a sign that the election was rigged. Whether she is genuinely mistaken about the mail in her hand determines which category of content the video belongs in: misinformation, true information, or, if she is intentionally trying to deceive people, disinformation. Which bucket should it belong in?
There is no simple answer to that question; rather, the difficulty in answering it should humble anyone trying to come up with good, strict ways to measure misinformation and reflect that aggregated number in the context of a transparency report.
The need for flexibility and creativity in community standards
As it stands, platform community standards, where content moderation policies reside, are poorly publicized, scattered, and difficult to keep track of after updates are made. One congressional transparency proposal, the Online Consumer Protection Act (OCPA), attempts to address this in part by directing platforms to disclose their content moderation policies, including how companies inform users about action taken against their content or their accounts. The bill would allow users to sue the company if the platform took action that isn’t spelled out in its community standards.
Giving users insight into why action was taken against their content is a central pillar of the Santa Clara Principles, which were created in 2018 by a group of scholars, organizations, and advocates of transparency. However, the OCPA fails users because it does not fully account for how content moderation works in practice. Critics of bills such as the OCPA argue that they do not allow for the flexibility and adaptability that content moderation requires. Rather, the bills try to solve a problem on the false premise that “the internet—and bad actors on the internet—are a static phenomena,” as Masnick puts it. While it is important to communicate content moderation policies, it is also critical to communicate that these standards change.
In the run-up to the 2020 election, platforms had to modify their policies to respond to how their systems were being abused. A week after former President Trump told the far-right Proud Boys to “stand back and stand by,” Facebook, Twitter, and, later, YouTube adapted their policies to say that they would take action on content that encourages or incites violence at polling places. It’s easy to argue that platforms should have already had policies such as these in place, but mandates from Congress on what should be disclosed in policy documents can’t predict what rules will be needed as circumstances change. Nor does mandatory disclosure encourage policy iteration.
Whatever forms of transparency Congress decides to demand, it’s imperative to understand that mandatory transparency regulations carry inherent tradeoffs and therefore call for nuanced solutions. If Congress were to require platforms to disclose certain types of data, one potential consequence is that platforms would devote most or all of their resources to the types of abuse they have to report and neglect those they don’t, either because they don’t want to look or because they have no resources left to do so. The PACT Act, for example, would require reporting on the total number of instances of “illegal content, illegal activity, or potentially policy-violating content” on a given platform that were a) flagged; b) acted on by the platform; and c) appealed and then restored, along with a description of the tools or practices used to enforce its policies. Under such a regime, if a platform’s community standards do not cover misinformation or disinformation and commit to acting against it, the platform would not be required to address that content in a mandatory disclosure, because the content is not illegal.
Any time Congress requires platforms to look for certain content, it requires monitoring all users’ speech on any given platform for that content. While we may want answers about the prevalence of content questioning the efficacy of COVID-19 vaccines, getting those answers requires monitoring all speech for vaccine hesitancy. This may be a tradeoff we are willing to make, but consideration of that privacy tradeoff is currently missing from the debate. These and other tradeoffs need to be factored into the conversation about transparency.
When legislation applies to many different types of social media platforms, let alone all the entities that Section 230 covers, regulation risks losing a level of specificity that is important to meaningful transparency. What may be a good content moderation practice for YouTube in dealing with videos of political misinformation may not transfer to political misinformation on Reddit, a forum made up of text and memes. By introducing a one-size-fits-all policy, we lose the opportunity to address issues in a specific and tailored way that could prove more helpful in the long run.
Carly Miller is a research analyst at the Stanford Internet Observatory.