The complexity of building seemingly simple lists

Some thoughts on the Oversight Board decision on Meta’s cross-check system

Dec 06, 2022

Story time.

It’s 2012, and I’ve been at Facebook for just over a year. My job is working with Republican candidates and officials to create and use Facebook pages to connect with voters. One day I get a call from someone on the Obama digital team. The President’s page had been taken down. My colleague Adam, who worked with the Democrats, was on vacation. They were in a panic and wanted help figuring out what happened.

The early engineering culture at Facebook was one where engineers could quickly build and push code. It turned out that one enterprising person had built a list of profane words and decided that any page with one of those words in its title or about section should be removed. One of the words on that list was “dick.” The President’s team had listed on his page that his favorite book was Moby Dick, and that’s why the page came down.

We got it back up pretty quickly, but we also realized we needed a way to ensure the President of the United States’ page could be protected from mayhem like this. So we figured something out.
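
To make that failure mode concrete, here is a minimal sketch in Python. It is not Facebook’s actual code: the word list, the page IDs, the protected set, and the function names are all hypothetical. It only shows how a naive word-list rule catches an innocent book title, and how a second list of protected pages changes the outcome.

```python
import re

# Hypothetical data for illustration only; not Facebook's real lists.
BLOCKED_WORDS = {"dick"}
PROTECTED_PAGE_IDS = {"potus"}


def flagged_words(text):
    """Return the blocked words that appear as whole words in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sorted(set(tokens) & BLOCKED_WORDS)


def should_remove(page_id, title, about):
    """Naive rule: remove a page if its title or about text has a blocked word.

    Pages on the (assumed) protected list are never removed automatically.
    """
    if page_id in PROTECTED_PAGE_IDS:
        return False
    return bool(flagged_words(title) or flagged_words(about))


about_text = "Favorite book: Moby Dick"
print(should_remove("some_page", "Barack Obama", about_text))
print(should_remove("potus", "Barack Obama", about_text))
```

Notice that the fix is itself another list. Someone now has to decide who belongs on the protected list and keep it current, which is exactly where the complexity starts.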

While this isn’t the exact moment that the system now known as cross-check was created, it is one of the earliest examples of why we needed it. Little did I know how this would grow in complexity over the years.

Today’s analysis by the Oversight Board of what that program has become over the last 10+ years is very thorough. It does a good job of identifying the complexities of the program, and I think it provides an unprecedented level of transparency into how platforms like Meta make tradeoffs and decisions based on various concerns. While they had to be dragged into it a bit, Meta should be applauded for doing this, and the Oversight Board should be applauded for holding them to account. I shared more thoughts in this tweet thread.

What I didn’t have a ton of space to go into in that thread is how hard it is to create and maintain lists. So many times over the years, I’ve heard people suggest that lists are the solution to various integrity problems. Create a list of politicians, a list of people who should be allowed to do something, a list of people who can’t do something. Just make a list!

Oh, if only it were that easy.

To start, defining who should or shouldn’t be on a list gets complicated fast.

For instance, the Board, in its analysis, says Meta should do more to “prioritize expression that is important for human rights, including expression which is of special public importance.” Meta is criticized for not having a “comprehensive system in place to systematically assess which journalists, human rights defenders or civil society figures in a particular geography should be subject to ERSR.”

Yeah, that’s because it’s hard enough just to define who a journalist is, let alone deal with the real security risks of Meta holding a clean taxonomy of journalists and activists worldwide. While I understand that the Board doesn’t see it as part of their job to write policies, I wish they had given a little more guidance on how they would want Meta to define and find these people. An opt-in system would help some, but you still need a way to confirm that the people asking for protection should get it, and to ensure you are covering those who may not realize they can request it.

We ran into the same problem in defining who a politician is. Let’s create a list of every politician and government page worldwide. Sounds so simple at first blush, right? Surely someone has built this. Nope. They still haven’t.

In the U.S., at least you have things like Ballotpedia, but that is the exception rather than the rule for the rest of the world. For some of my colleagues at Meta, their ENTIRE job was creating and maintaining what we called the civic graph. We tried to automate where we could, but overall it was a very manual process. It involved finding lists of candidates from the relevant authorities (this was challenging, too, because for some countries with parliaments, such as India, you don’t even know all of the candidates on Election Day!), then searching for their Facebook pages and Instagram accounts, confirming we had found the right ones and not imposters, and then putting those in a list.

But how do you determine if someone is a candidate? Sounds easy enough; go see if they filed papers with the relevant authority. Nope - that doesn’t work when it takes days for those authorities to make that information public. Oh, and most have no API or other automated way for us to check.

This became an issue in 2019 when someone ran for office in California to test Facebook’s policy about not fact-checking politicians. Candidates didn’t get that protection until we added them to our list, and they weren’t added until they were listed on the California Election Commission site. Because of that delay, his content was removed in the window between when he announced and when he was added to the list. The press had a field day, and we had to adjust our definition of when someone is a candidate.
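
The gap is easier to see in a small sketch. This is only an illustration under assumptions: the `Candidate` record, the two rule names, and the dates are made up and are not Meta’s real data model or the real timeline. It shows how the same person is or isn’t a “candidate” on a given day depending on which definition the list uses.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical record; field names are assumptions for illustration."""
    name: str
    announced_on: Optional[date]
    listed_by_authority_on: Optional[date]


def candidate_since(c, rule):
    """Return the date from which the person counts as a candidate under a rule."""
    if rule == "official_listing":
        return c.listed_by_authority_on
    if rule == "announcement":
        # Earliest public signal: the announcement or the official listing.
        known = [d for d in (c.announced_on, c.listed_by_authority_on) if d]
        return min(known) if known else None
    raise ValueError(f"unknown rule: {rule}")


def is_candidate(c, on, rule):
    """True if the person counts as a candidate on the given day."""
    since = candidate_since(c, rule)
    return since is not None and since <= on


# Made-up dates: announced on the 1st, listed by the authority on the 8th.
person = Candidate("Example Person", date(2019, 1, 1), date(2019, 1, 8))
day_in_gap = date(2019, 1, 4)
print(is_candidate(person, day_in_gap, "official_listing"))
print(is_candidate(person, day_in_gap, "announcement"))
```

Under the stricter rule, the person is unprotected for every day in the gap. The looser rule closes the gap but raises its own question: what counts as an announcement, and who verifies it?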

I have a ton more stories just like these. I still understand and will defend why we needed systems like this at the platform. I also agree with the Oversight Board that we should have had more governance, oversight, and better processes around it. Many of my colleagues did a lot to improve this over the years, and many still are. But hindsight is always 20/20.

Next year, one of the things I want to work on is creating frameworks for how we think about politicians and governments online. How can we help smaller platforms that don’t have the resources of Meta think about how they handle these entities? How can they learn from what Meta has gone through? Today’s analysis is a milestone towards bringing more transparency, and thus more robust debate, to how we should handle these very complicated issues.