## Dividing up work equitably

12 Dec 2019

I started TA-ing this semester, which has been a blast. The course is Intro to Comparative Politics, taught by Kimuli Kasara. Throughout the semester, my fellow TAs (Oscar and Jackie) and I have been figuring out how best to allocate grading duties so that we each do a more or less equal amount of work, given that we have quite different numbers of students. Mostly, we’ve been doing it on an ad-hoc basis. Exams, for example, are structured around shorter identification (ID) questions and slightly longer but still short essays. Since students could pick which questions to answer within each section (e.g. answer 5 of 7 questions), we decided to grade by question, so that every answer to a given question was graded by the same person. One TA would grade the ID portion, which generally went by more quickly (we had 3 exams, so it worked out nicely there), and then we would divide up the short essays, taking into account which questions were most popular with students.

Given that the characteristics of the exams were not relevant for the final paper, I wanted to take a stab at thinking about how to do this in a more rigorous manner (though thinking this through for the exams would be interesting as well). I didn’t know of an off-the-shelf algorithm that would do what I wanted, and I do not formally study computer science, so here was my thought process. The numbers in the following section are completely made up for privacy purposes.

Suppose there are 79 students in the class. These students are not distributed equally among TAs.

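``````
students <- 1:79
TAs <- c("Jackie", "Julian", "Oscar")

# Simulate section enrollment: each student lands with a TA at random
TA_assignment <- sample(TAs, 79, replace = TRUE)
roster <- data.frame(students, TA_assignment, stringsAsFactors = FALSE)

# Number of students per TA
table(roster$TA_assignment)
``````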

In this (simulated) case, Jackie was assigned to teach 28 students, Oscar was assigned to teach 24 students, and I was assigned to teach 27 students. Of course, in reality, assignment probabilities are not equal. My two section times were early Monday mornings and Thursday late afternoons, so you can imagine my section was really popular.

Anyways, let’s move on to the constraints. I want to assign paper grading (all the essays have a 2,000 word limit, so let’s assume they are the same length) in such a way that 1) no TA is assigned his or her own students, and 2) each TA grades a more or less equal number of papers. Here was my approach, recognizing that it is very much in “brute force” style and won’t be super efficient for large numbers of observations. Let’s start by making 1,000 copies of the roster:

``````
sims <- 1000
# A list holding `sims` identical copies of the roster
essay_roster <- lapply(1:sims, function(i) roster)
``````

Now let’s add a new column to each of these data frames that records the grading “match”: for each student, a randomly chosen TA other than the student’s own.

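``````
# Add an empty "match" column to every copy
for (i in 1:sims) {
  essay_roster[[i]]$match <- rep(NA, nrow(essay_roster[[i]]))
}

# For each student, draw a grader from the TAs other than their own
for (i in 1:sims) {
  for (j in 1:nrow(essay_roster[[i]])) {
    others <- TAs[TAs != essay_roster[[i]]$TA_assignment[j]]
    essay_roster[[i]]$match[j] <- sample(others, 1)
  }
}
``````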
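The nested loops can also be wrapped in a small helper, which makes the first constraint easy to check. This is only a sketch, not what I originally ran: `assign_graders()` is a name made up for illustration, the seed is arbitrary, and the roster is simulated afresh so the block runs on its own.

```r
set.seed(2019)  # arbitrary seed, only so the sketch is reproducible

TAs <- c("Jackie", "Julian", "Oscar")
roster <- data.frame(
  students = 1:79,
  TA_assignment = sample(TAs, 79, replace = TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each student, draw one grader at random
# from the TAs other than the student's own TA.
assign_graders <- function(own_TA, TAs) {
  vapply(own_TA, function(ta) sample(setdiff(TAs, ta), 1),
         character(1), USE.NAMES = FALSE)
}

roster$match <- assign_graders(roster$TA_assignment, TAs)

# Constraint 1: no student is matched to their own TA
all(roster$match != roster$TA_assignment)
```

Because each draw comes from `setdiff(TAs, ta)`, a student's own TA can never be picked, so the final check should hold for any seed.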

This satisfies the first constraint but does not guarantee the second, which is why it was done 1,000 times. For any given simulation, the distribution might not be even, but one can pick out the “best” (most even) one pretty easily by taking the simulation whose grading counts have the lowest standard deviation (if several tie, the first one, which is as good as random), like so:

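``````
# Grading load per TA in each simulation; factor() keeps a TA with zero papers in the count
outcomes <- lapply(1:sims, function(i) table(factor(essay_roster[[i]]$match, levels = TAs)))
SDs <- sapply(1:sims, function(i) sd(outcomes[[i]]))

# Index of the first simulation with the smallest spread
which.min(SDs)
final <- essay_roster[[which.min(SDs)]]
``````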
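To sanity-check the chosen assignment, a cross-tabulation of each student’s own TA against the matched grader is handy: the diagonal should be all zeros, and the column totals are each TA’s grading load. The sketch below reruns the whole procedure end to end on a freshly simulated roster, so its counts will not match the ones quoted in this post. The seed and the `draw_match()` helper are made up for illustration.

```r
set.seed(1212)  # arbitrary seed, only so the sketch is reproducible

TAs <- c("Jackie", "Julian", "Oscar")
own_TA <- sample(TAs, 79, replace = TRUE)

# Hypothetical helper: one random assignment that satisfies constraint 1
draw_match <- function(own_TA, TAs) {
  vapply(own_TA, function(ta) sample(setdiff(TAs, ta), 1),
         character(1), USE.NAMES = FALSE)
}

sims <- 1000
draws <- replicate(sims, draw_match(own_TA, TAs), simplify = FALSE)

# Spread of grading loads in each draw; factor() keeps TAs with zero papers
SDs <- vapply(draws, function(m) sd(table(factor(m, levels = TAs))), numeric(1))
final_match <- draws[[which.min(SDs)]]

# Rows: the student's own TA. Columns: the TA who grades the paper.
check <- table(own = factor(own_TA, levels = TAs),
               grader = factor(final_match, levels = TAs))
check
diag(check)     # should be all zeros (constraint 1)
colSums(check)  # grading load per TA (constraint 2)
```

If the column totals still look lopsided, raising `sims` is the simplest lever, at the cost of more runtime.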

The `which.min` function returns the index of the first simulation with the lowest standard deviation. Here it popped up pretty quickly, on the sixth simulation, one in which I grade 27 essays and Oscar and Jackie grade 26 essays each. If the distribution of students to TAs is a lot more lopsided, it might take far longer. Moreover, there’s a minimum number of students that each TA must have for it to work, that is, to produce a relatively equal grading load per TA (of course, one could always just take the outcome with the minimum standard deviation, however uneven it is). I haven’t thought this all through (this is a super quick write-up) or formally derived these conditions, so if you know of and can point to an existing version of this problem, I’d love to hear about it in the comments. For now, I just wanted to share my approach. Ok, now back to studying for my own exams…