Duolingo Leagues

Duolingo pools users into ten leagues (from Bronze, Silver, …, Diamond). Each week, the top XP-scoring members of each league are promoted and the bottom scoring-XP members are demoted (the specifics are here). Assuming no users enter, this can be modeled as a discrete time Markov chain:

\[\mathbf{x}_{n+1} = \mathbf{A}\mathbf{x}_n\]

where $\mathbf{x}_i$ is the vector of population levels for each league in week $i$ and $\mathbf{A}$ is a tridiagonal, irreducible, left-stochastic matrix representing the transitions. Starting from a distribution where everyone is in the bottom three leagues, and assuming no one enters or leaves, we observe the existing transition rules push populations towards the higher leagues:

Alt Text

We can calculate this steady state distribution using linear algebra. In particular, we identify the principal eigenvector of $\mathbf{A}$, which will be associated with an eigenvalue of 1.

A question I have: Can one construct a triadiagonal, irreducible left-stochastic matrix $\mathbf{A}$, given a specified principal eigenvector? This would be of use to Duolingo, if say, they wanted most users to be near the top (i.e., in the pearl and obsidian leagues).

One idea is to cast this as a constraint satisfaction problem. In particular, we can create the following formulation, where the decision variables, $a_{ij}$, are the entries of the matrix $\mathbf{A}$, $\mathbf{v}$ is the (given) dominant eigenvector, and $\epsilon > 0$ is a given minimum transition proportion:

\[\begin{align} \sum_{j=1}^n a_{ij}v_j &= v_i \quad &\forall i \in \{1,\dots,n\} \\ \sum_{i=1}^n a_{ij} &= 1 \quad &\forall j \in \{1,\dots,n\} \\ a_{ij} &= 0 \quad &\forall i,j \in \{1,\dots,n\}, |i-j| > 1 \\ a_{ij} &\ge \epsilon \quad &\forall i,j \in \{1, \dots, n\}, |i-j| \le 1 \\ 0 \le a_{ij} &\le 1 \quad &\forall i,j \in \{1, \dots, n\} \end{align}\]

Here, constraints (1) assert that $\mathbf{v}$ is an eigenvector of $\mathbf{A}$, with eigenvalue 1. Constraints (2) and (5) ensure that $\mathbf{A}$ is left-stochastic (by enforcing column sums to equal 1, and entries to be between 0 and 1). Constraints (3) limit our search to tridiagonal matrices. Constraints (4) enforce the matrix to be irreducible, by guaranteeing all elements on the tridiagonal are at least $\epsilon > 0$.

Constraints (4) are a little bit of a hack. You don’t need all elements on the tridiagonal to be strictly positive in order to guarantee irreducibility, and so we are cutting off feasible solutions. However, I haven’t yet found a way to express the irreducibility with a linear constraint.

However, with $\epsilon$ set to 0.01, for a given $\mathbf{v}$, CPLEX was able to find a solution. I set a target distribution 5% for the bottom five leagues, 10% for the emerald and amythest, 20% for pearl, 30% for obsidian, and 5% for diamond. The resulting transition matrix then leads a random distribution towards the target. The convergence, however, is rather slow (it takes 3500 weeks to get there):

Alt Text