The Riddler Classic for this week gives the following problem, which I have generalized:

If you are collecting cards from a set of size \(N\), and you collect them in packs of \(c\) unique cards, what is the expected number of packs you will need to go through to collect them all?

This immediately brings to mind the cereal box problem (set \(c = 1\)). In particular, for the cereal box problem, if you are in a state of having \(x\) items left to collect (call this an “\(x\) state”, or just denote it by \(x\)), you can only go to two states: either you get a new item and move to \(x-1\), or you stay at \(x\). This means the expected value \(E[x] = f(E[x-1], E[x])\) depends only on the \(x-1\) state and itself, so you can rearrange to get something of the form \(E[x] = g(E[x-1])\). It turns out that \(E[x] = \frac{N}{x} + E[x-1]\), and so \(E[x]\) is just \(N\) times the \(x\)th harmonic number (which makes sense: the expected time to reach the next state should just be the inverse of the probability of moving to the next state). I omit some of the calculations because we will just borrow the intuition from this solution.
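For the curious, the omitted rearrangement is short. Conditioning on a single draw when \(x\) items are still missing,

\[ E[x] = 1 + \frac{N - x}{N} E[x] + \frac{x}{N} E[x-1] \implies E[x] = \frac{N}{x} + E[x-1], \]

and unrolling from \(E[0] = 0\) gives \(E[N] = N \sum_{k=1}^{N} \frac{1}{k} = N H_N\).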

The case for any \(c\) is what we want to solve. The Riddler asks for \(c=10\), which is how I first set about thinking about it. If each pack is \(10\) cards, then there are really \(11\) possible outcomes when we open a pack, and we can be sure that exactly one of them will happen: the pack will add one of \(0, 1, \dots, 10\) new cards to our collection. So, if we have \(x\) cards left to get and we open a pack, we expect to land in one of the states \(x, x-1, \dots, x - 10\), each weighted by its probability. So, we can say that:

\[ E[x] = 1 + \sum_{y = 0}^{10}{E[x - y] \, P(Y = y, x)} \]

where \(Y\) is the R.V. for the number of new cards in the pack.
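Note that the \(y = 0\) term on the right-hand side contains \(E[x]\) itself, so just as in the cereal box case we rearrange before computing anything:

\[ E[x] = \frac{1 + \sum_{y = 1}^{10}{E[x - y] \, P(Y = y, x)}}{1 - P(Y = 0, x)}, \qquad E[0] = 0. \]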

Now, all we need to find are the \(P(Y = y, x)\) values.

\(P(Y = y, x)\) is the probability of getting exactly \(y\) new cards if there are \(x\) cards left to obtain from a pool of \(N\) cards, when opening one pack of \(c\) unique cards. We just count favorable outcomes and divide by total possibilities, which is fairly straightforward. Let \({^a C_b}\) and \(^a P_b\) be the notation for combinations and permutations. We have \(c\) slots in which to place the \(y\) new cards, or \(^c C_y\) choices. We have \(x\) cards that could be the first new card, \(x - 1\) that could be the second, all the way down to \(x - y + 1\) for the \(y\)th, which is just \(^x P_y\). Similarly, we have \(^{N-x} P_{c-y}\) ways to fill in the blanks with old cards. And, in total, we draw \(c\) cards from a pool of \(N\), which is \(^N P_c\). Putting it together,

\[ P(Y = y, x) = \frac{{^c C_y} \; {^x P_y} \; {^{N-x} P_{c-y}}}{{^N P_c}}. \]
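As a quick sketch, that probability is a one-liner in Python (the function name `prob_new` is my own, and I lean on `math.comb` and `math.perm` here just for brevity):

```python
from math import comb, perm

def prob_new(y, x, N, c):
    """P(Y = y, x): probability that a pack of c unique cards (from a set of N)
    contains exactly y of the x cards we are still missing."""
    # slots for the new cards * orderings of the new cards * orderings of the old cards,
    # over all ordered ways to draw c distinct cards from N
    return comb(c, y) * perm(x, y) * perm(N - x, c - y) / perm(N, c)
```

This also simplifies to the hypergeometric form \({^x C_y}\,{^{N-x} C_{c-y}} / {^N C_c}\), which makes a handy sanity check.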

I used Python to calculate the values of \(E[x]\) given \(N\) and \(c\). The code is fairly straightforward, using an array to store the \(E[x]\) values, and computing combinations and permutations without using closed-form factorial expressions.
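The full code is at the end of this post; as a rough sketch of the shape of that calculation (the helper names here are illustrative, not necessarily the ones in the real code), it looks something like this:

```python
def perm(n, k):
    """n P k as an iterative falling factorial: n * (n-1) * ... * (n-k+1)."""
    if k < 0 or k > n:
        return 0
    result = 1
    for i in range(k):
        result *= n - i
    return result

def comb(n, k):
    """n C k, built from the falling factorial rather than full factorials."""
    if k < 0 or k > n:
        return 0
    return perm(n, k) // perm(k, k)

def prob_new(y, x, N, c):
    """P(Y = y, x): probability a pack of c unique cards adds exactly y new cards
    when x of the N cards are still missing."""
    return comb(c, y) * perm(x, y) * perm(N - x, c - y) / perm(N, c)

def expected_packs(N, c):
    """Expected number of packs of c unique cards needed to collect all N cards."""
    E = [0.0] * (N + 1)  # E[x] = expected packs remaining when x cards are missing
    for x in range(1, N + 1):
        numerator = 1.0
        for y in range(1, min(c, x) + 1):
            numerator += prob_new(y, x, N, c) * E[x - y]
        E[x] = numerator / (1.0 - prob_new(0, x, N, c))
    return E[N]

print(expected_packs(100, 10))  # roughly 49.95
print(expected_packs(300, 10))  # roughly 186.09
```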

For \(N = 100\) (with \(c = 10\)), I get a value of about \(49.95\) packs on average to collect all the cards; for \(N = 300\), about \(186.09\).

Earlier, I also misread the question and failed to see that it was \(10\) unique cards, and so I tried to solve the problem where packs can double up on cards. The values of \(P(y, x)\) are a lot harder to calculate there, but I think the problem could just be reformulated by treating each card as a separate pack of \(c = 1\) and dividing the expected number of draws by \(10\), which we know how to solve in principle. The one problem arises at low values of \(x\), namely when computing \(E[1], \dots, E[10]\). At that point, we can't quite treat each pack as \(10\) separate cards because, for example, if \(x = 10\) there is a chance we open \(10\) brand new cards right away and finish in a single pack. I wonder whether there is an elegant way to calculate the \(P(y, x)\)'s for this related problem, or whether computing \(E[1], \dots, E[10]\) should just be done by brute force in Python.
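For what it's worth, the brute-force route is easy to sketch with a quick Monte Carlo simulation of the duplicates-allowed variant; this just estimates the whole answer directly rather than the individual \(P(y, x)\)'s:

```python
import random

def simulate_packs_with_duplicates(N, c, trials=10_000):
    """Estimate the expected number of packs when a pack is c independent draws
    (duplicates allowed within a pack) from a set of N cards."""
    total_packs = 0
    for _ in range(trials):
        owned = set()
        packs = 0
        while len(owned) < N:
            owned.update(random.randrange(N) for _ in range(c))
            packs += 1
        total_packs += packs
    return total_packs / trials

# e.g. simulate_packs_with_duplicates(100, 10) gives a rough estimate for the misread version
```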

Well, I think I will just wonder about that, because I don’t want to think about it anymore.

Code