Occupy Math often tries to find click-baity titles for his posts. This week is not an exception, but it is unusual in that a phrase that Occupy Math heard more than ten times at the IEEE 2017 Congress on Evolutionary Computation in Donostia-San Sebastian Spain last week. In other words, the curse of dimensionality is a real thing. This week’s post looks at the very strange behavior of normal-seeming objects when we create higher-dimensional versions of them. This strangeness often corresponds to problems getting much harder as we increase the dimension, hence the use of the word “curse”.
Big issue – your intuition is shaped by two-dimensional and three-dimensional objects and is just wrong in higher dimensions.
Let’s start by explaining what on earth “higher dimensions” are. Occupy Math has frequently had students object that three is the number of the dimensions and the number of dimensions is three. Up and down, left and right, toward and away, and we’ve got our entire field of view nailed down. What’s this stuff about extra dimensions? At this point a bright student will chip in that time is actually a fourth dimension, but still, that’s all we need, right? This whole discussion gets into the abstract nature of math.
Suppose we are working with a robot arm. It has a base pivot (one angle), three elbows (one angle each), can rotate its manipulator (another angle), and can control how far apart its two fingers are. Examine the picture above to verify all of this. Then, to describe the position of the robot’s arm, you need to know six numbers. This means that the diagram of how the robot arm can be positioned is a surface in six (abstract, mathematical) dimensions. The dimensions do not need to be “real” to be very useful in figuring out what’s going on. Also notice that this six-dimensional object is only a small part of 6D-space; between the total length of its parts and problems with bumping into itself, the robot only accesses a very small fraction of the space its motion diagram lives in.
To supply another example, suppose that you have a consumer questionnaire with 20 questions. Some are yes-no, some are a five point preference scale from “really like” to “totally hate” and some are numbers (like income). If there are 20 questions, then the data from that questionnaire are living in a twenty-dimensional space!
But what goes wrong in higher dimensions?
When you’re working on a math problem in a bounded domain, you often take samples inside the area of interest. The three images below show samples out of a square using 3, 5, or 7 samples along each side of the square. The number of points goes up as the number of samples along a side, squared. That makes the number of sample points you need to look at go 9, 25, 49. Not too bad, eh?
What happens when, instead, we take thee samples along each side of the “square” but go square, cube, hypercube, hypercube in five dimensions, and so on? Assume we are only taking three samples per side of the square, cube, hypercube, or whatever. Let’s make a table of the number of sample points based on the dimension.
An important issue to keep in mind is that three samples along each side of a cube-like-object is not enough. It isn’t a representative number of samples. As weird as this sounds:
In fifty dimensions, 717897987691852588770249 points is not even close to enough samples!
At least this huge number of points is not enough if you want to accurately model interesting phenomena taking place in a fifty-dimensional hypercube with a grid of sample points. If he really wants a good set of sample points in this many dimensions, Occupy Math would do something else (there is an example near the end of the post). If fifty dimensions sounds like an insane number to you, remember all you would need to get into the kind of situation where such sampling was needed is to measure fifty facts about potential customers, which is not outside of the possible or even the useful. Notice, on the other hand, you will never, ever, ever have that many customers. This highlights the next problem: the number of possibilities vastly exceeds the number of actual samples when the dimension goes up! This brings us to the issue of sparsity.
What is sparsity?
Informally, sparsity means “really spread out” or “scarce on the ground”, but lets do a math example with pictures. Since we’ve already started on them, lets talk about cubes some more. A point in a cube (in n dimensions) is just a list of n numbers that say where the point is. To be in the cube at all, all the values must be between zero and one, inclusive. Suppose we are looking at the problem of dividing up something — like profits or dessert. Then we want the coordinates to represent the fraction of the profits each of n different people get. This gives us our motivating question: if a cube has a length of one along each side, what fraction of the points inside the cube have coordinates that add up to one or less? Let’s look at this special set for the square (which is also called the 2-cube). These points are shown below in light blue. The sides of the cube are labeled “x” for the horizontal coordinate and “y” for the vertical one. Half the points are in the special set.
Now lets take a cube that is length one on each side. The side labels are “x” for horizontal, “y” for vertical”, and “z” for the third dimensions which is shown as going into the screen at a slant. Key point: the points shown in various shades of blue are a lot less than half the cube. In fact, they are exactly 1/6th of the cube. What’s going on here is that, because you’re adding up three numbers instead of two, the numbers need to be quite a bit smaller, at least on average.
When you look at the example provided by the square, you think “half the points have coordinates that add up to one or less” but, as you increase the dimension, the fraction drops off fast. Let’s tabulate again. The first statement is a mathematical tautology (obviously, almost stupidly true statement) that included to show you what happens in one dimension. Remember a “cube” in n dimensions is the set of all points that have all of their coordinates in the range zero to one.
- In one dimension there is a 1 in 1 chance that the one coordinate of a point in the line segment (one dimensional cube) will add to no more than one.
- In two dimensions there is a 1 in 2 chance that the two coordinates of a point in the square will add to no more than one.
- In three dimensions there is a 1 in 6 chance that the three coordinates of a point in the cube will add to no more than one.
- In four dimensions there is a 1 in 24 chance that the four coordinates of a point in the hypercube (four dimensional cube) will add to no more than one.
- In five dimensions there is a 1 in 120 chance that the five coordinates of a point in the 5-hypercube (five dimensional cube) will add to no more than one.
This is getting out of hand really quickly! Occupy Math’s readers with a bit of experience in probability or combinatorics will notice that the chance of getting a point whose coordinates add up to no more than one is dropping off as the factorial of the number of dimensions. This means that the fraction of the points in an n-cube that sum to one is 1/n times the fraction in the last cube. This fraction is shrinking faster than exponentially.
This special sort of point, with coordinates that add up to no more than one, is actually of interest in some real world problems, but it also stands as a surrogate for any interesting type of point. As you increase the dimension, the fraction of the space occupied by interesting points is going to drop off in some appalling fashion. Another name for this type of situation is combinatorial explosion.
What can you do about this?
Occupy Math uses evolution to pack points into high-dimensional sets with the goal of evolution being to space a pre-determined number of points out as much as possible. This is a useful technology. Packing points into the red-blue-green color space, for example, permits the selection of maximally contrasting colors. This space is a cube with each dimension corresponding to one of the basic colors for the RGB encoding scheme. The value zero means “none” and one means “as much as possible” for each of red, blue, and green. The number given by the Greek δ is the target distance between the points in the RGB cube. Of course, this is only a three-dimensional example, but the ones in high dimensions are really hard to explain. On the other hand, this technique is a way around needing a berjillion sample points in a sampling grid.
The pictures below are the results of using evolution to space out points in the RGB cube. They are displayed around a circle in sorted order by overall brightness — but the RGB values used to generate those colors were the ones spaced out in a three-dimensional color cube with a digital evolution algorithm.
Occupy Math hopes that this introduction to the weirdness of higher dimensions and the issue of sparsity helps you understand some of the things going on when people try to work with big data or abstract math. This post also firmly connects abstract math to applied issues like understanding survey data. The take-home message is that, if you are working with multifactorial data, don’t rely on your intuition from two and three dimensions: it won’t work. Do you know of other weirdness when there are more dimensions? Don’t be shy, comment or tweet!
I hope to see you here again,
University of Guelph,
Department of Mathematics and Statistics