# Analyzing 'Are You the One?'

Neel Somani -

My brother recently introduced me to the show "Are You the One?" From the Wikipedia page:

> A group of men and women are secretly paired into couples by producers, via a matchmaking algorithm. Then, while living together, the contestants try to identify all of these "perfect matches." If they succeed, the entire group shares a prize of up to $1 million.

They're given two major pieces of information:

- Each week, the contestants guess what the correct pairings are. They're shown some number of lights, which represents how many pairs they got correct.
- The contestants can send one couple to the "truth booth" each week, which tells them whether those contestants are certainly together or certainly not together.

So after k weeks, we have:

- N actors: A_{1}, ..., A_{N}
- N / 2 unknown couples: (A_{i_{1,1}}, A_{i_{1, 2}}), ..., (A_{i_{N/2, 1}}, A_{i_{N/2, 2}})
- k groups of couples (each of which I'll call a "pairing") and the number of lights they produced for each week
- m pairs that MUST be in the solution (went to the truth booth successfully): (A_{j_{1,1}}, A_{j_{1,2}}), ...
- (k - m) pairs that must NOT be in the solution (went to the truth booth unsuccessfully)

What's the probability of each pair being together? What's the optimal strategy? If you want to try out the code yourself, you can check it out here: https://github.com/neelsomani/are-you-the-one

All of the results in this blog post are in the `examples.py` file.

Here's what the probability matrix looks like for this week (season 8, week 6):

Note that while Jasmine and Nour haven't been in the truth booth, they must be each other's perfect match. Jenna and Paige have about a 90% chance of being each other's perfect match. I rounded the numbers for the visualization. There are only 11 possible pairings at this point in the season.

I'll outline some of the results below.

## Defining the Solution Space

A "pairing" is a mapping of each contestant to another contestant. The mapping is symmetric and one-to-one.

A pairing is "valid" if the pairing does not violate the constraints above. That is, for each week, if that pairing were the truth, it should have produced the observed number of lights. The pairs that must be together are paired in a valid pairing, and no pairs that cannot be together are paired.

The probability of a couple being together is the number of valid pairings where the couple is together, divided by the total number of valid pairings.
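The definitions above can be sketched in a few lines of Python. This is a minimal brute-force illustration, not the code from the linked repository: the names `all_pairings`, `lights`, `valid_pairings`, and `pair_probabilities` are hypothetical, and a pairing is assumed to be represented as a frozenset of two-person frozensets.

```python
from collections import Counter


def all_pairings(people):
    """Yield every way to split the list `people` into couples."""
    if not people:
        yield frozenset()
        return
    first, rest = people[0], people[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for sub in all_pairings(remaining):
            yield sub | {frozenset((first, partner))}


def lights(guess, truth):
    """Number of couples the guess shares with a hypothetical truth."""
    return len(guess & truth)


def valid_pairings(people, weeks, must, must_not):
    """Keep the pairings consistent with every week's lights and truth booth."""
    return [
        p for p in all_pairings(people)
        if all(lights(guess, p) == n for guess, n in weeks)
        and all(pair in p for pair in must)
        and not any(pair in p for pair in must_not)
    ]


def pair_probabilities(valid):
    """Fraction of valid pairings in which each couple appears."""
    counts = Counter(pair for p in valid for pair in p)
    return {pair: c / len(valid) for pair, c in counts.items()}


# Toy example with four made-up contestants.
people = ["A", "B", "C", "D"]
guess = frozenset({frozenset("AB"), frozenset("CD")})
valid = valid_pairings(people, [(guess, 0)], must=[], must_not=[frozenset("AC")])
print(pair_probabilities(valid))
```

Enumerating every pairing is only practical for small groups or heavily constrained seasons; with 16 contestants there are already about two million pairings to filter.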

Since in this season, no one has any gender preference, any of the candidates could have been with anyone else a priori. In the case of gender preferences, we can incorporate that information in the same way as above: A group of straight men, e.g., can be defined as a group of people that cannot be paired with each other. By converting this group into a list of pairs (the Cartesian product of the group with itself), we can treat them the same way as couples who went into the truth booth and were confirmed not to be matches.
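As a small sketch of that conversion (the function name `excluded_pairs` and the contestant names are made up for illustration): since couples are unordered and nobody is paired with themselves, `itertools.combinations` yields the same constraints as the Cartesian product, without duplicates or self-pairs.

```python
from itertools import combinations


def excluded_pairs(group):
    """All unordered pairs within `group`, to be treated as failed truth booths."""
    return {frozenset(pair) for pair in combinations(group, 2)}


# Three hypothetical contestants who cannot be matched with each other.
print(excluded_pairs(["Al", "Bo", "Cy"]))
```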

Here's what the probability matrix looked like for season 7, episode 4:

## Defining Optimality

Let's define an optimal move as one that minimizes the size of the solution space in expectation. For example, a move that doesn't shrink the size of the solution space gives us no information and is therefore suboptimal. A move that shrinks the solution space down to a single possible pairing is as good as it gets.

### Optimal play for the truth booth

For the truth booth, the contestants should send the pair that is the closest to a 50% chance of being a match. This is because that probability minimizes the size of the solution space in expectation.

Specifically, if persons A and B are a match in a proportion p of the possible pairings, then the expected number of possible pairings after sending them to the truth booth is p * p * N + (1 - p) * (1 - p) * N, where N here is the current number of possible pairings (not the number of contestants). This is because with probability p you'll end up with p * N possible pairings left, and with probability (1 - p) you'll have (1 - p) * N pairings left. Minimizing with respect to p, you'll find that p = .5 is the optimal proportion.
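For completeness, here is the minimization written out, with E(p) as shorthand (my notation) for the expected number of remaining pairings:

```latex
\begin{align*}
E(p) &= p \cdot pN + (1 - p) \cdot (1 - p)N = N\left(2p^2 - 2p + 1\right) \\
\frac{dE}{dp} &= N(4p - 2) = 0 \quad\Longrightarrow\quad p = \tfrac{1}{2} \\
\frac{d^2E}{dp^2} &= 4N > 0 \quad\text{(so this is a minimum)}, \qquad E\!\left(\tfrac{1}{2}\right) = \tfrac{N}{2}
\end{align*}
```

So the best possible truth booth visit halves the solution space in expectation, and a pair with p near 0 or 1 leaves it almost unchanged.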

### Optimal play for the pairings

Unfortunately I wasn't able to fully solve the pairings this afternoon. To pick the optimal pairing P_hat, you could go through each pairing and calculate:

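P_hat = argmin_{P} sum_{k} { # of possible pairings that would give P k lights } / { total # of possible pairings } * { # of possible pairings that would give P k lights }

= argmin_{P} sum_{k} { # of possible pairings that would give P k lights }^2

(The second line drops the total number of possible pairings, which is the same for every P and so doesn't change which P is the minimizer.)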
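A rough sketch of that calculation, assuming pairings are represented as frozensets of two-person frozensets. The names `expected_remaining` and `best_guess` are hypothetical and not taken from the linked repository.

```python
from collections import Counter


def expected_remaining(guess, valid):
    """Expected number of valid pairings left after seeing the lights for `guess`."""
    # counts[k] = number of valid pairings that would give `guess` k lights
    counts = Counter(len(guess & truth) for truth in valid)
    return sum(c * c for c in counts.values()) / len(valid)


def best_guess(candidates, valid):
    """Candidate pairing with the smallest expected remaining solution space."""
    return min(candidates, key=lambda g: expected_remaining(g, valid))


# Toy example: the three possible pairings of four made-up contestants.
valid = [
    frozenset({frozenset("AB"), frozenset("CD")}),
    frozenset({frozenset("AC"), frozenset("BD")}),
    frozenset({frozenset("AD"), frozenset("BC")}),
]
print(expected_remaining(valid[0], valid))
```

Passing `valid` as `candidates` restricts the search to valid pairings; searching over every pairing instead only requires a different `candidates` argument.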

Under this framework, the optimal pairings for this week would be:

Actually, this is one of a few equally optimal pairings for this week.

## Solving the Problem Efficiently

The solution in the image above isn't quite correct, because I only looked across the valid pairings, not all pairings. I suspect that the P_hat that minimizes the equation above would be in the space of valid pairings, but I wasn't able to formally prove that today, and I'm really not sure that it's even true. One motivation is that:

argmin_{P} sum_{k} { # of possible pairings that would give P k lights }^2

= argmin_{P} sum_{k} { # of possible pairings that would give P k lights }^2 / { total # of possible pairings }^2

= argmin_{P} sum_{k} Pr[P has k lights]^2 subject to sum_{k} Pr[P has k lights] = 1

which, by the method of Lagrange multipliers, shows that the best possible solution would have an equal chance of having any number of lights (Pr[P has 0 lights] = Pr[P has 1 light] = ...). That solution definitely doesn't exist in general, since Pr[P has N / 2 - 1 lights] = 0 (you can't have only one pair mismatched) and Pr[P has N / 2 lights] = 1 / |{valid pairings}| iff P is a valid pairing (0 otherwise). The next best thing would be to make the remaining number of lights all equally likely, which would give a slightly lower expectation for a valid pairing.
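The Lagrange multiplier step can be spelled out as follows, writing q_k for Pr[P has k lights] and treating the q_k as free variables (an idealization, since in practice only the distributions produced by actual pairings are available):

```latex
\begin{align*}
\mathcal{L}(q, \lambda) &= \sum_{k=0}^{N/2} q_k^2 - \lambda\left(\sum_{k=0}^{N/2} q_k - 1\right) \\
\frac{\partial \mathcal{L}}{\partial q_k} &= 2q_k - \lambda = 0 \quad\Longrightarrow\quad q_k = \frac{\lambda}{2} \text{ for every } k \\
\sum_{k=0}^{N/2} q_k = 1 &\quad\Longrightarrow\quad q_k = \frac{1}{N/2 + 1}
\end{align*}
```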

Note that this also shows that if such an optimal pairing exists (such that Pr[P has 0 lights] = Pr[P has 1 light] = ...), minimax will yield the solution. If no such solution exists, then there are times when minimax will not minimize the size of the solution space in expectation. For example, if there were only six contestants (with made-up numbers that, for the sake of illustration, ignore the fact that N / 2 - 1 lights is impossible), then minimax would choose a situation with:

Pr[P has 0 lights] = .35, Pr[P has 1 light] = .35, Pr[P has 2 lights] = .29, Pr[P has 3 lights] = .01

over:

Pr[P has 0 lights] = .36, Pr[P has 1 light] = .32, Pr[P has 2 lights] = .31, Pr[P has 3 lights] = .01

since the worst case for the latter would leave 36% of the possibilities, but the worst case for the former would leave 35%. But the latter distribution minimizes the size of the solution space in expectation.
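The comparison is easy to check numerically. In this short sketch, `former` and `latter` are the two distributions above, and the helper names are my own:

```python
former = [0.35, 0.35, 0.29, 0.01]
latter = [0.36, 0.32, 0.31, 0.01]


def worst_case(dist):
    """Largest fraction of the solution space that could remain (minimax criterion)."""
    return max(dist)


def expected_fraction(dist):
    """Expected fraction of the solution space that remains."""
    return sum(q * q for q in dist)


for name, dist in [("former", former), ("latter", latter)]:
    print(name, worst_case(dist), round(expected_fraction(dist), 4))
```

The expected fractions come out to roughly 0.3292 for the former and 0.3282 for the latter, so the two criteria disagree.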

## Measuring the Value of Information

I wanted to see how many weeks the 16 contestants would need, under optimal play, to figure out the correct pairing, assuming they only had access to the lights data (no truth booths). That would require me to know the optimal strategy, which I'm not sure about. Ideally, at each time step, the contestants should guess the pairing that would give them the most information (read: narrow the solution space by the largest amount in expectation). I can estimate the solution above by sampling some small number of pairings when calculating the expectation, but the result is noisy.

Instead, I just had the contestants pick one of the valid pairings at random each week. On average, they took about 12.4 weeks to determine the correct pairing. The value of knowing one couple right off the bat is equal to 12.4 minus the expected number of weeks it would take 14 contestants, since without loss of generality we can treat the known couple as the last two contestants and solve for the remaining 14. The difference ended up being about 1.5 weeks.

That's a very imperfect analysis, but it gives a rough sense of how much the solution space is narrowed with one (correct) visit to the truth booth.
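The random-guess simulation can be sketched as below. This is my own minimal version, not the code from the linked repository: the function names are hypothetical, it uses six contestants so that brute-force enumeration stays fast, and it assumes a game is "solved" as soon as only one valid pairing remains (counting conventions may differ from the ones behind the 12.4 figure).

```python
import random


def all_pairings(people):
    """Yield every way to split the list `people` into couples."""
    if not people:
        yield frozenset()
        return
    first, rest = people[0], people[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for sub in all_pairings(remaining):
            yield sub | {frozenset((first, partner))}


def weeks_to_solve(people, rng):
    """Play one lights-only game, guessing a random valid pairing each week."""
    valid = list(all_pairings(people))
    truth = rng.choice(valid)
    weeks = 0
    while len(valid) > 1:
        guess = rng.choice(valid)
        observed = len(guess & truth)  # lights shown for this guess
        # Keep only the pairings that would have produced the same lights.
        valid = [p for p in valid if len(guess & p) == observed]
        weeks += 1
    return weeks


rng = random.Random(0)
trials = [weeks_to_solve(list("ABCDEF"), rng) for _ in range(1000)]
print(sum(trials) / len(trials))
```

Each guess is itself a valid pairing, so a wrong guess always eliminates at least itself and the loop is guaranteed to terminate.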

Combinatorially, the exact difference in the size of the solution space is 16! / (8! * 2^8) - 14! / (7! * 2^7) = 1891890 pairings eliminated. That's really substantial.
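That count is quick to verify. The helper name `num_pairings` is mine:

```python
from math import factorial


def num_pairings(n):
    """Number of ways to split n people into n / 2 unordered couples."""
    return factorial(n) // (factorial(n // 2) * 2 ** (n // 2))


print(num_pairings(16), num_pairings(14), num_pairings(16) - num_pairings(14))
# prints: 2027025 135135 1891890
```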

## Conclusion

I hope that this added some insight into the show. Feel free to let me know if you need help using the linked code to do your own analysis.