My previous post got me thinking about the nature of infinity. It lead me to a thought provoking post by Steve Patterson on logical errors regarding infinite sets. While I haven’t taken enough time to fully appreciate his logic, it got me thinking about how probability relates to metaphysics and epistemology. Steve’s claim is that modern mathematics has made a mistake with regards to infinity, specifically that by definition a set cannot be infinite (or else it would be finite), and thus arguments built on this premise are unsound (though the logic, granting the premise may be valid). Steve does a great job illustrating this point in numerous posts, which I won’t dive into here. The least I can do is encourage you to check them out for yourself.

All of this reminded me of a John von Neuman quote, which I paraphrase as “tell me precisely what a machine can’t do, and I will build a machine that can do just that”. My interpretation of what von Neuman means is that a machine is only limited by our ability to provide a precise definition. Computers can’t represent infinite sets, or at least they can’t truly represent infinite sets (just bounded finite rational approximations). Steve Patterson argues this is because infinite sets don’t exist, as this contradicts their conception. Of course, computers can represent an arbitrarily large collection of objects (or an arbitrarily precise rational approximation of a real number), limited only by the finite memory of the computer and the time it takes to compute it.

What’s another concept that a computer can’t truly represent? How about this: randomness. A computer can’t produce a truly random number (only a pseudo-random one). But what is a truly random number? What is meant by “randomness”? Some would describe a random outcome as one that has no discernible rule that determined it. Is this even possible? Its hard to conceive, but that doesn’t necessarily make it impossible. Randomness doesn’t seem to be a contraction in the same way that an infinite set is. Is our definition of randomness sufficient to describe what we mean? What about all of that talk in quantum physics about the universe being inherently random? This brings back the premises of the Bohr and Einstein debates, my favorite response to which is from E.T. Jaynes. All of the philosophical debates aside, what would probability theory look like in a universe that is both finite and non-random i.e. deterministic?

Let’s briefly explore this concept.

Suppose that there is an outcome of some process that interests us. We will refer to it as \(y\) and to us the value of \(y\) is unknown, but since the universe is finite, the number of possible values \(y\) could be is also finite. Let’s presume we can enumerate the possible outcomes and collect them in the set \(Y = \{y_1, y_2,...,y_n\}\). Because the world is deterministic and we have enumerated all the possible outcomes via \(Y\), we have that \(y\) must be equal to one and only one element of \(Y\).

Since the universe is finite and deterministic, there exists a finite ordered tuple of observed variables \(x = (x^{(1)},x^{(2)},...,x^{(m)}\}\) whose values completely determine the value of \(y\). In other words, \(y = f(x)\). There are no ifs, ands, or buts about it. \(y\) is completely determined by \(x\). Thus the probability that \(y = y_i\) for some \(y_i\) is either \(1\) or \(0\), being \(1\) for exactly one \(y_i \in Y\).

But what if we were missing the value of one of the determining variables e.g. \(x^{(j)}\)? This is where probabilistic reasoning has its first non-trivial introduction into our finite and deterministic world. Since our world is finite, \(x^{(j)}\) must only be able to take on some finite number of possible values. In our case, let’s capture these in the set \(X^{(j)} = \{x^{(j)}_1, x^{(j)}_2,..., x^{(j)}_k\}\). Let’s refer to the incomplete set of observed variables (missing only \(x^{(j)}\)) as \(x^{(-j)}\). Since we know the possible values of \(x^{(j)}\), we could compute \(f(x^{(-j)},x_h^{(j)})\) for each \(x_h^{(j)} \in X^{(j)}\). For each computation we would end up with a value from \(Y\) i.e. \(f(x^{(-j)},x_h^{(j)}) \in Y\). Some \(y_i\) will be featured more than once, others might not be featured at all. Regardless we will have a set that yields a \(y_i \in Y\) for each \(x_h^{(j)}\) given \(x^{(-j)}\) and \(f\) (this “given” part will come up in a bit).

After performing the finite computation, we will can compute probabilities as follows:

\[P(y = y_i|x^{(-j)}) = \frac{1}{k}\sum_{h=1}^k 1_{\{y_i = f(x^{(-j)},x_h^{(j)})\}}\]

That is, the probability that \(y = y_i\) given \(x^{(-j)}\) is exactly the number of times that \(f(x^{(-j)},x_h^{(j)})\) would produce \(y_i\) when considering all possible values of \(x^{(j)}\). In fact, given any missing subset of \(x\) we could compute a probability given the cartesian product of the possible values of the missing variables. This may seem like a reasonable approach to the notion of probability, but there are a number of items we have not addressed:

What determines the values of \(x\)? Is there some finite regress that gets us back to an initial condition? It can’t be infinite, as we said the universe is finite. It can’t be random, as we said the universe is deterministic. If we are missing some \(x\), is there a set \(z\) that sufficiently determines the missing vlaues?

Are some values of \(x^{(j)}\) in some way more likely than others? What determines this? This is clearly related to our previous question. If likelihoods are different, should we not consider these likelihoods in computing the probability of \(y\)?

How do we know what the elements of \(x\) are? Is there some sense that we can tell if \(x\) is missing variables? How do we deal with redundant variables i.e. variables that are excessive or not necessary? My thought is that this would relate to the variation in observations of \(y\) over time, but what if \(y\) is a one-shot process?

Where does \(f\) come from? Is it an element of \(x\)? If its form is dependent on the members of \(x\), and it is required to determine \(y\), then certainly it must be in \(x\). If we don’t know \(f\), can we consider a set of all possible functions?

What do we do if we don’t know any elements of \(x\) i.e. we are completely ignorant of the variables and function that determine \(y\)? My sense is that this is where the principle of insufficient reason enters the scene.

How do we start from a state of ignorance about \(x\) to a state of certainty about \(x\)? Our certainty about \(x\) directly relates to our certainty about \(y\). My intuition is that this is where collecting data and making observations come from, but there is no garuntee that our finite set of observations will be sufficient to infer the overall system i.e. Hume’s problem of induction.

There are many more questions we could ask, but for now I am going to leave this be. While it might be too strong to assume the universe is finite and deterministic, we can at least say that a computable universe must be i.e. if we wish to have a probability theory that is computable, it will have to represent the world in a finite and deterministic way.