February 21, 2022

In search of a better community prediction graph

Metaculus displays its users’ recent predictions in a histogram under each question, like this:

Taken from the Ukraine question.

There’s also a curve that accompanies each histogram. It’s not easy to see on this question, so here’s another:

Taken from the 5G question.

What are these curves meant to convey? I imagine they’re meant to estimate the density of the community’s predictions: if you took a bunch of random Metaculus users and made them give predictions for this question, their predictions would be distributed roughly as shown. That’s a nice idea, but I don’t think this curve does a great job of it. If you inspect the JavaScript that draws the curve, you’ll see that it’s just a beta distribution. The parameters are determined from (weighted) first and second sample moments of users’ predictions, with more “recent” predictions given more weight, as described in the FAQ. (I put “recent” in quotes because it’s based on the order the predictions came in, not the amount of time that passed.)

Mostly I’m fine with this. It’s easy to calculate, which is important so it can be updated client-side on the fly. And it looks good enough most of the time. But my main issues are twofold:

  1. The beta distribution smooths over the spikes that rounding produces in the histogram at multiples of 5% and 10%.

  2. A beta distribution is unimodal (for most parameter values), so it can’t show when the community has split into factions with substantially different views.

We can fix these issues with a more sophisticated model of user predictions. This would come at the expense of simplicity and ease of calculation, but maybe that’s worth it sometimes. If it’s a really important question, and you want the best inference you can get, you can totally do better than a crude beta distribution.
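For concreteness, here’s a minimal sketch of that kind of moment-matching beta fit. The function name and interface are my own, not Metaculus’s actual code, and I’ve left the recency weighting as an input rather than reproducing their scheme:

```python
import numpy as np

def beta_from_moments(preds, weights):
    """Fit Beta(a, b) by matching weighted first and second sample moments."""
    p = np.asarray(preds, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    m1 = np.sum(w * p)          # weighted mean
    m2 = np.sum(w * p**2)       # weighted second moment
    var = m2 - m1**2            # weighted variance
    # standard method-of-moments inversion for the beta distribution
    common = m1 * (1.0 - m1) / var - 1.0
    return m1 * common, (1.0 - m1) * common
```

By construction the fitted mean \(a/(a+b)\) equals the weighted sample mean, and the fitted variance equals the weighted sample variance; everything else about the shape is forced by the beta family, which is exactly the limitation at issue.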

I don’t have a complete solution to this problem yet. So far I’ve only tried accounting for rounding. That part’s easy. A user will round their answer to the nearest 10% with probability \(r_{10}\), or to the nearest 5% with probability \(r_5\). And otherwise they’ll round to the nearest 1%, because that’s the highest precision Metaculus affords them. We assign priors to \(r_{10}\) and \(r_5\), and that’s pretty much all there is to it. The likelihood is a bit complicated, but it’s doable. The tricky part is accounting for factions.
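As a sketch, the likelihood of a single reported prediction under this rounding model looks something like the following. I work in integer percentage points to avoid floating-point headaches; the function names are my own, and the priors on \(r_{10}\) and \(r_5\) would sit on top of this:

```python
def round_to(q_pct, step_pct):
    """Round an integer-percent value to the nearest multiple of step_pct."""
    return step_pct * round(q_pct / step_pct)

def report_likelihood(x_pct, q_pct, r10, r5):
    """P(user reports x | latent probability q) under the rounding mixture.

    With probability r10 the user rounds q to the nearest 10%,
    with probability r5 to the nearest 5%, and otherwise to the
    nearest 1% (i.e. reports q itself, since q is in whole percent).
    """
    lik = 0.0
    if x_pct == round_to(q_pct, 10):
        lik += r10
    if x_pct == round_to(q_pct, 5):
        lik += r5
    if x_pct == q_pct:
        lik += 1.0 - r10 - r5
    return lik
```

The full likelihood of the observed predictions then multiplies this over users and integrates over each user’s latent probability, which is where things get more involved.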

Consider the following scenario:

  1. You ask your buddy—let’s call him John—how likely an invasion of Ukraine is this year. He says 70%. You can tell he might be rounding his answer, but at least this gives you a rough idea of what he’s thinking.

  2. Now you ask another friend, Rose, and she says 63%. A precise answer, and not far from John’s. So it seems like Rose and John are in agreement.

  3. A week later, you ask Dave what he thinks. He says 35%. This is sufficiently novel that you have to consider two possibilities. First, maybe Dave is just different from Rose and John, and his answer doesn’t tell you much about what they’re thinking. Alternatively, he might be thinking the same things they are, in which case you can infer that Rose and John have reduced their credences since last week.

  4. Then you ask Jade what she thinks, and she says 50%. Given the possible rounding, it’s unclear if she’s closer to Dave or to Rose and John. So there are now many more possibilities to consider.

I could go on, but you get the idea. Things get even more complicated if you allow people to switch factions. I don’t see any reason this can’t be done, but it might be hard to compute. On the bright side, you could do a lot with this kind of model. For example, you could contact a person from each faction and try to get them to talk to each other.
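To see why exact computation gets hard, note that the hypotheses being juggled above are partitions of the users into factions, and the number of partitions is the Bell number of the user count, which grows explosively. A quick pure-Python sketch (the generator is my own illustration):

```python
def partitions(items):
    """Yield every way of splitting items into non-empty factions."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # put `first` into each existing faction in turn...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ...or into a brand-new faction of its own
        yield [[first]] + part

# the partition counts are the Bell numbers, which explode quickly
counts = [sum(1 for _ in partitions(list(range(n)))) for n in range(1, 8)]
```

Here `counts` comes out to 1, 2, 5, 15, 52, 203, 877 for one through seven users, so brute-force enumeration is only feasible for a handful of predictors; beyond that you’d need MCMC or a variational approximation.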

What kind of model can handle all of this? I’m thinking some kind of time-dependent Dirichlet process mixture model. I took a quick look at what’s out there, and it seems that people have come up with multiple ways of adding time to a DPMM. So I may make a follow-up post later if I find something promising.
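The Dirichlet process part, at least, is easy to sketch: faction assignments follow a Chinese restaurant process, where each new predictor joins an existing faction with probability proportional to its size, or starts a new faction with probability proportional to a concentration parameter \(\alpha\). A minimal static version (no time dependence, and the names are my own):

```python
import random

def crp_assignments(n_users, alpha, seed=0):
    """Sample faction labels for n_users from a Chinese restaurant process."""
    rng = random.Random(seed)
    sizes = []        # current faction sizes
    labels = []
    for _ in range(n_users):
        # existing factions weighted by size; a new faction weighted by alpha
        weights = sizes + [alpha]
        u = rng.random() * sum(weights)
        k = 0
        while u > weights[k]:
            u -= weights[k]
            k += 1
        if k == len(sizes):
            sizes.append(0)   # open a new faction
        sizes[k] += 1
        labels.append(k)
    return labels
```

Each faction would then get its own latent probability, with users’ reports generated by the rounding model on top. The time-dependent variants I’ve seen differ mainly in how they let these faction sizes decay or let users re-draw their assignment as time passes.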