Privacy, Middleware, and Interoperability

Can Technical Solutions, Including Blockchain, Help Us Avoid Hard Tradeoffs?

Interoperability and distributed content moderation models have tremendous promise. They could temper major platforms’ power over public discourse, introducing both more economic competition and more diverse and pluralistic spaces for online speech. But these models — which I will reductively refer to as “middleware,” following Francis Fukuyama’s coinage — also raise a number of as-yet-unresolved problems. As I explained in this short piece, one of the hardest problems involves privacy: When a user opts in to a new service, can she give that service permission to process other people’s data and content?

This post examines some possible technical solutions to protect privacy while enabling interoperability or distributed content moderation. It focuses in part on blockchain technologies, and draws on recent conversations with a number of seriously smart people about possible technical designs. My takeaway is that there is no magic bullet: we can’t have perfect privacy and also optimize for platform interoperability or middleware. But blockchain technologies, and some other technical design approaches, can whittle away at the problem. The rest of this post examines ways to do that.

Note: This post starts from a pretty high level of wonkiness, and then gets wonkier. If you’re new to the “middleware” topic, my short take on why it matters and why it’s hard is here; Cory Doctorow and I also had a great live discussion about it here. If you’re a blockchain maven and spot any mistakes, they are definitely mine and not the fault of the experts I talked to.

This post doesn’t run down every possible permutation of the ideas it discusses, and it is un-wonkily reductive about at least three complicated things. (1) It uses the term “middleware” very loosely, as a way to lump together models including platform-to-platform interoperability, “protocols not platforms,” “Magic APIs,” and federated systems like Mastodon. (2) It elides differences between real blockchain technologies, describing a hypothetical version (perhaps most similar to Project Liberty) optimized for Middleware. (3) It simplifies the kinds of user data and content used by social networks.

OK, here we go.

What Blockchain Can Do

As a social media user, I could get the following from blockchain technologies:

1. An authenticatable identity that I control and can use to log in or validate my identity across multiple services.

2. A social graph linking my identity to other people’s similarly authenticated identities (meaning I control a “contact list” and decide when and how other people or services can access it).

3. For purposes of Middleware, a copy of every piece of content I post. This doesn’t have to be blockchain-linked — it could just be stored on a physical device I control. But the content could be stored subject to my control as the blockchain-authenticated user, or in principle even stored on-chain (though then it couldn’t be deleted). To be useful for Middleware, this content would need to be kept in a format usable by any platform or Middleware provider. (This is quite hard to do in practice.) For each item of content, the stored copy could include a record of which service I shared it with and which contacts from my social graph could see it on that service.
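
To make those three pieces concrete, here is a minimal data-model sketch in TypeScript. All of the names in it (the DID-style identifier strings, the field and interface names) are my own illustrative assumptions, not drawn from any real blockchain project or specification.

```ts
// Hypothetical data model for the three pieces described above.
// Everything here is illustrative, not any real chain's schema.

// 1. A self-controlled, authenticatable identity: a key pair whose public
//    half is anchored on-chain, referenced by a DID-style string.
interface BlockchainIdentity {
  did: string;        // e.g., "did:example:ann"
  publicKey: string;  // lets any service verify signatures I produce
}

// 2. A social graph I control: my contacts, plus my rules about who
//    (people or services) may read the graph itself.
interface SocialGraph {
  owner: string;              // my DID
  contacts: string[];         // other people's DIDs
  graphReadAccess: string[];  // DIDs allowed to access my contact list
}

// 3. A controlled copy of each piece of content I post, stored off-chain
//    on my own device or storage but signed by my on-chain identity.
interface ContentRecord {
  author: string;               // my DID
  body: string;                 // the post, in some portable format
  signature: string;            // proves authorship without the platform
  sharedOnService: string;      // e.g., "facebook"
  visibleToContacts: string[];  // DIDs of contacts who could see it there
}
```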

The Problem with Sharing Other People’s Data

Suppose three friends — Ann, Balaji, and Carlos — all have the set-up described above. Each has a blockchain-linked identity, a blockchain-linked social graph listing one another as contacts, and a controlled copy of all their posted content. They are currently friends on Facebook.

Ann and Balaji both opt in to see their Facebook feeds ranked and moderated by a new middleware provider called NewService. Once this happens, both can see — and, importantly, allow NewService to “see” and process — all of the past and future posts, comments, and other content they could have seen from each other on Facebook. They don’t need Facebook to transmit this content, since it is all stored under Ann’s and Balaji’s control in the first place. The blockchain setup described above creates an easy, low-cost way for these Facebook users to migrate to a competing service, use two interoperable services at once, or layer a competing provider of content moderation on top of Facebook.

The situation is different for Carlos. He isn’t interested in NewService, or doesn’t trust its operators to protect his privacy, so he doesn’t sign up. NewService can’t see, assess, or moderate his posts. Ann and Balaji can still see Carlos’s posts on Facebook, but won’t see them using NewService. (Or perhaps Carlos’s posts could flow directly from Facebook to Ann’s and Balaji’s devices, appearing somewhere in their social media user interfaces without the benefit of any labeling, ranking, or other content moderation services from NewService.)

NewService can only perform the Middleware function if (1) it is limited to moderating content from friends who have also opted in to NewService (Ann can have NewService moderate Balaji’s content, but not Carlos’s); or (2) our social graph contacts are forced to share their data with the services we choose to trust (Ann’s consent allows NewService to see posts from Carlos). The first choice prioritizes privacy and avoids Cambridge Analytica-like scenarios. The second prioritizes interoperability, competition, and diversified content moderation. Adding these blockchain technologies greatly streamlines consensual data sharing, but doesn’t let us avoid this difficult trade-off.
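
To see the fork in the road concretely, here is a minimal sketch of option (1) as an access-control rule. The `optedIn` registry and `serviceMaySeePost` function are hypothetical names of my own, not any real API.

```ts
// Option (1) as code: a service may process a post only when BOTH the
// viewer and the author have opted in to that service.
const optedIn = new Map<string, Set<string>>([
  ["ann",    new Set(["newservice"])],
  ["balaji", new Set(["newservice"])],
  ["carlos", new Set<string>()],  // Carlos never signed up
]);

function serviceMaySeePost(service: string, viewer: string, author: string): boolean {
  const viewerOk = optedIn.get(viewer)?.has(service) ?? false;
  const authorOk = optedIn.get(author)?.has(service) ?? false;
  return viewerOk && authorOk;  // option (2) would simply drop authorOk
}

console.log(serviceMaySeePost("newservice", "ann", "balaji"));  // true
console.log(serviceMaySeePost("newservice", "ann", "carlos"));  // false
```

The whole policy question lives in that one `authorOk` check: keep it and you get privacy at the cost of middleware coverage; drop it and you get option (2).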

Some Partial Solutions

We could tweak this system in a few ways to allow more middleware moderation. These partial solutions make things somewhat better, and could be used in combination. But none solve more than a portion of the problems, and many involve privacy trade-offs of their own.

Pre-emptive permissions: A user could presumably set permissions on her controlled copy of her own posted content to say “This content (or a subset, such as the content already shared with specific contacts) may be shared with any service used by the following authenticated contacts from my social graph.” If Carlos sets his permissions to trust Ann’s new services, then when Ann opts in to NewService, she can let it “see” and moderate Carlos’s posts even though Carlos never signed up for NewService. This model becomes more feasible and secure with blockchain-based identity authentication. But it requires more fiddling with settings than most people are willing to do.
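
As a sketch, such a delegation rule might extend the opt-in check like this. The `trustServicesUsedBy` field and everything around it are hypothetical names, assumed for illustration.

```ts
// Pre-emptive permissions: Carlos authorizes any service that certain
// trusted contacts opt in to, without opting in himself.
interface PreemptivePermissions {
  trustServicesUsedBy: Set<string>;  // contacts whose services inherit access
}

const optedIn = new Map<string, Set<string>>([
  ["ann",    new Set(["newservice"])],
  ["carlos", new Set<string>()],  // still not signed up anywhere
]);

const carlosPerms: PreemptivePermissions = {
  trustServicesUsedBy: new Set(["ann"]),  // "I trust whatever Ann uses"
};

function serviceMaySeePost(
  service: string, viewer: string, author: string,
  authorPerms: PreemptivePermissions,
): boolean {
  const viewerOk = optedIn.get(viewer)?.has(service) ?? false;
  const authorOk = optedIn.get(author)?.has(service) ?? false;
  // New path: the author pre-authorized services used by this viewer.
  const delegated = authorPerms.trustServicesUsedBy.has(viewer) && viewerOk;
  return viewerOk && (authorOk || delegated);
}

console.log(serviceMaySeePost("newservice", "ann", "carlos", carlosPerms));  // true
```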

Only moderating “less private” content: A Middleware service could be designed to only “see” and assess content that is on a public URL (like a New York Times story); or only content identical to that which has been shared more than x times (like Facebook’s practice of showing Social Science One researchers content shared publicly over 100 times). This would make NewService useful to Ann when Carlos shares links to New York Times stories on Facebook, or when he shares a popular, viral cat picture. But NewService still couldn’t see Carlos’s unique private posts, like his original written observations or personal photographs. (FWIW, I don’t think using blockchain affects this option.)
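
A sketch of that filter, with the caveat that the `isPublicUrl` flag and `shareCount` field are assumptions; in practice, a middleware provider would need some trustworthy source for the share count.

```ts
// "Less private" content rule: process a post only if it is a public URL
// or identical content has already been shared widely.
interface Post {
  body: string;
  isPublicUrl: boolean;  // e.g., a link to a news story
  shareCount: number;    // times identical content has been shared
}

const SHARE_THRESHOLD = 100;  // echoing the Social Science One example

function middlewareMayProcess(post: Post): boolean {
  return post.isPublicUrl || post.shareCount > SHARE_THRESHOLD;
}

console.log(middlewareMayProcess({ body: "https://nytimes.com/...", isPublicUrl: true, shareCount: 3 }));  // true
console.log(middlewareMayProcess({ body: "viral cat picture", isPublicUrl: false, shareCount: 5000 }));    // true
console.log(middlewareMayProcess({ body: "private family photo", isPublicUrl: false, shareCount: 1 }));    // false
```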

Seeing content but not knowing who shared it: A Middleware service could see the content of all posts, but not the identity of the person who shared them. That provides some degree of privacy protection, but it still fails to protect Carlos’s privacy if he shares personally identifying content (a photo of himself, a birthday greeting to his spouse that discloses their name) or if his posts are personally identifying in aggregate (as is likely). And it improves NewService’s moderation by letting it “see” and assess all content. But it leaves NewService unable to use other important moderation techniques that depend on tracking metadata about users and their interactions. (I don’t think using blockchain affects this model, either.)
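
A sketch of that stripping step, which would have to happen on a user’s device or some trusted intermediary before anything reaches NewService. The types are hypothetical; note that dropping the author field entirely is safer than hashing it, since even a salted hash would let the service link posts by the same person over time.

```ts
// Remove author identity before content goes to the middleware service.
interface AuthoredPost   { authorDid: string; body: string; }
interface AnonymizedPost { author: null;      body: string; }

function anonymizeForMiddleware(post: AuthoredPost): AnonymizedPost {
  // Deliberately no pseudonym or hash: any stable identifier would let
  // the service re-aggregate one person's posts.
  return { author: null, body: post.body };
}

const post: AuthoredPost = { authorDid: "did:example:carlos", body: "Happy birthday!" };
console.log(anonymizeForMiddleware(post));  // { author: null, body: "Happy birthday!" }
```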

Client-side content moderation: Carlos’s privacy can be protected if NewService’s ranking and moderation services take place entirely on Ann’s computer, phone, or other device — which never sends Carlos’s posts to NewService’s servers. A crude model for this could use locally stored hashes to detect duplicates of known prohibited images, similar to the system Apple recently announced to scan iPhones for child abuse images. Technology like this would have real limits, both in terms of local device processing capacity and in terms of the quality of moderation possible without NewService “seeing” and assessing novel content. A more sophisticated version of this approach might send partial information to NewService’s server, similar to Firefox sending incomplete URL information to Mozilla’s servers for anti-phishing. (I don’t think using blockchain affects this, and I am certain that there are more technical opportunities and issues than I can spot. I expect they will have a fair amount of overlap with work from Jonathan Mayer, CDT, and others on moderation models for end-to-end encrypted communications.)
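
A crude sketch of the local hash check, using Node’s built-in crypto module; the block list contents and sync mechanism are assumed. Note that an exact cryptographic hash like SHA-256 only catches byte-identical copies; systems like Apple’s instead use perceptual hashes designed to survive resizing and re-encoding.

```ts
// Client-side duplicate detection: Ann's device hashes incoming content
// and checks it against a locally stored block list, so Carlos's posts
// never reach NewService's servers.
import { createHash } from "node:crypto";

const localBlockList = new Set<string>([
  // SHA-256 digests of known prohibited items, periodically synced
  // from NewService while the device is online.
]);

function shouldHideLocally(content: Buffer): boolean {
  const digest = createHash("sha256").update(content).digest("hex");
  return localBlockList.has(digest);
}

console.log(shouldHideLocally(Buffer.from("some incoming post")));  // false
```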

Conclusion

In a sense, the “privacy trade-off” framing is misleading. It presupposes that Facebook and other existing platforms were somehow good stewards of our data in the first place, and that their continued control should be the building block for our privacy in the future. It disregards the ways in which, by sharing our letters and emails and posts and comments with friends, we already relinquished control and entrusted our content to those human beings. Giving Facebook control, and taking control away from our friends, seems like a perverse outcome.

That said, no one would even have such massive and perfect troves of data about our behavior and communication if Internet platforms hadn’t collected and stored it all. Treating existing platforms as the locus of control and the site of privacy trade-offs may be unavoidable. It’s also how the law works. Under both US and European privacy law, platforms are charged with deciding when and how to share our data. Making the wrong decisions can get them in a lot of trouble, legally and reputationally. We need legal rules that will prompt them to make the best choices. As the middleware discussion highlights, our idea of what choices are “best” may turn on how we prioritize competing values and goals — including privacy, competition, and improved speech environments. I think we will see an evolving array of technical solutions that nibble away at the edges and make these trade-offs less stark. But we can’t avoid them altogether.