# The Prisoner’s Dilemma

Today’s edition of Occupy Math looks into cooperation and conflict – from a mathematical point of view. Occupy Math has published more than thirty scientific papers on a tool for modeling cooperation and conflict called the iterated prisoner’s dilemma. The big news in this edition of Occupy Math is that many published papers on the prisoner’s dilemma completely ignored a factor that controls their results. Let’s begin by understanding Prisoner’s Dilemma. It is based on the following story.

Two suspects are caught by a sheriff. She has them cold for trespassing, but she want to convict them of armed robbery — which she is pretty sure they committed. She puts the suspects in separate rooms and offers them a deal — leniency for testimony. Each suspect can cooperate with his partner (stay silent) or defect against his partner (squeal on him).

To make this math, we need to turn it into a game with scores. If both suspects (players from now on) cooperate they each score 3; they only pay the penalty for trespassing. If both of them squeal they score 1; they get sent up for armed robbery, but with leniency because they helped the sheriff. If only one player defects, the defector goes free and gets a score of 5 while the other player gets 0, he is both convicted of armed robbery and granted no leniency.

The dilemma is that no matter what your partner does, you get a better score by defecting.

If the two players have no altruism in them, and can reason effectively, they will deduce that their best outcome results from defecting. This means they both defect and get the second worst of the four scores. So sad — and not much point writing a blog about it. Here’s where things get interesting: one-shot interactions are very rare. What happens when we play this game over and over?

Occupy Math lived in Pasadena, California (yes, the place where Big Bang Theory is set) for five years while he was in graduate school. Occupy Math even went to CalTech. The apartment was right across from a large parking lot that was also the local market for illegal drugs. The weird thing is that there were usually police there, but in the great state of California a police officer had to see something exchanged before he could search anyone.

The drug dealers and their customers got used to this really fast. When they wanted to make a deal the dealer would drop a packet of white powder into the ornamental ivy on one side of the parking lot while his customer would put some folded up paper into a drain-pipe of one of the buildings that backed onto the lot. Both were, of course, careful to make sure the police were looking somewhere else when they made their drop.

The pair trying to commit black-market capitalism would then mosey to where the other person had dropped something, pick up the thing, and go on about their day. This is a situation modeled by prisoners dilemma. Cooperation means cocaine in the ivy and money in the drainpipe. Defection consists of baby-laxative and newspapers in an envelope, respectively.

The analysis of the one-shot prisoner’s dilemma says no drugs were sold.

Okay, that did not happen. The one-shot model is too simple! The fact that the dealer will want money tomorrow and the customer will want drugs tomorrow changes their behavior. When the game is played multiple times (iterated to use the technical jargon), it creates the potential for cooperation, even between nominally hostile parties.

Robert Axelrod has published a wonderful book called The Evolution of Cooperation which uses the iterated prisoner’s dilemma model to explain a number of real world happenings. Occupy Math’s favorite example is the Christmas day soccer match in no-mans land during World War I, but there are several excellent examples in the book.

So what does Occupy Math have to contribute?

A number of studies came out showing that software agents playing the iterated prisoner’s dilemma could learn to cooperate. Occupy Math used these as a starting point for a model of human mating behavior that was part of a larger effort to model the AIDS epidemic. This was when the inherent skepticism of the mathematician kicked in and started causing trouble.

Occupy Math started by replicating a study that used a type of software agent called a finite state machine. While the groups of agents being trained fell in and out of the cooperative state, about 95% of the groups were in the cooperative state at any given point after a brief burn-in period at the beginning of training. This strongly confirmed Axelrod’s model and was pretty cool.

At the time, Occupy Math was regularly teaching a course in digital evolution. A student asked him to suggest a final project and Occupy Math suggested doing the iterated prisoner’s dilemma experiment, but with artificial neural nets instead of finite state machines — the student was already working with neural nets.

The neural nets never, ever learned to cooperate!

Occupy Math checked and replicated the student’s results and gave him a good grade — his results were correct. Two years later, another student did almost the same thing, but with a type of evolvable computer code instead of neural nets or finite state machines. Again: zero cooperation. At this point Occupy Math was deeply concerned!

Occupy Math coded up twelve versions of the experiment using twelve different types of software agents: two different ways of coding finite state machines, two different encodings of neural nets, logic formulas, logic formulas with time delays in them, simple lookup tables, probabilistic lookup tables, the evolvable computer code, and three different encodings of arithmetic formulas. The results were appalling.

The table below shows the chance of different types of agents learning to cooperate.

The vertical bars show the range where 95% of the experiments fell with 0 representing no cooperation and 1 representing universal cooperation. The results for encoding finite state machines in two different ways yielded very different results. On the other hand adding delay lines (DEL) to the logic formulas (TRE) has almost no effect. The real problem, however, was that…

The experiment could be made to come out almost any way you wanted by selecting the right encoding for your software agents!

When you are doing science, it is important to identify all the relevant variables and control for them as best you can. At the time Occupy Math did this research there were several thousand published papers using iterated prisoner’s dilemma and none of them controlled for choice of agent type. Of course it took Occupy Math two years and two blatant examples to notice. One of the take-home lessons here is that it is good for a researcher to also be a teacher. Occupy Math might never have noticed this phenomenon were it not for his students (Tony Hammit on neural nets, Mark Joenks on evolvable programs, lets have a hand for the band!)

This lurid picture shows the tracks of ten sets each of evolving agents of three different types over 250 generations of learning to play iterated prisoner’s dilemma. The blue tracks are neural nets (no cooperation), the red tracks are finite state machines (lots of cooperation), and the green tracks are simple lookup tables (highly cooperative). The colors blend where they overlap so “yellow” is lots of finite state machines and lookup tables together.

Look how different the behaviors, over time, are for the three different agent types!

Occupy Math has done many experiments with iterated prisoner’s dilemma, but he will mention only one more. Most of the agents are allowed to base their actions on the last few plays by their opponent. We built in a slowly time-decaying measurement of the plays the agent encountered – a type of long term memory or simple emotion. The big result is that agents with simple emotions play much more effectively than agents without emotions. Having a sense of how things are going in the long term really helps.

The picture below is complicated. Each block is 100 different agent-training experiments. Each experiment is one color track (row) and each block has 100 rows. The agents are with or without emotion. The upper blocks have agents that misunderstand their opponents plays 1% of the time (this is a very real-world feature) while the lower block of agents misunderstand their opponents plays 5% of the time. Red means the agents are defecting, blue denotes cooperation, and green something in the middle. The average score values, within a group, is given by the color bar at the bottom.

Look how much of an impact emotion had on behavior!

The emotional agents have far more diverse behavior than the non-emotional ones. Separate experiments show they are much better competitors. This is one of the coolest experiments Occupy Math has managed to publish. It also illustrates how much imagination combined with the care that mathematics teaches you can help the overall progress of research.

The most important result in this blog is that agent encoding controls the experimental outcome. This represents the discovery of a critical, uncontrolled variable in many published experiments. Occupy Math has also worked with other games, both mathematical games like prisoner’s dilemma and more traditional games. An earlier post talked about making maps for fantasy role-playing games, for example. Generating game elements automatically is a large and growing research area. Comment or tweet if you have game-related topics you would like to see in Occupy Math.

I hope to see you here again,
Daniel Ashlock,
University of Guelph,
Department of Mathematics and Statistics