A defense of strong atheism

This post is addressed to weak/agnostic atheists and makes an argument for strong atheism. If you believe in God, stop right here. The post will be useless to you, since you are probably not being rational about your beliefs anyway. If you are honestly seeking the truth, though, read on.

This post has had a long time to mature in my mind, and I finally feel able to express the idea clearly. Let me summarize it right away: the probability of an almighty God existing is zero, due to Occam’s razor, which is a mathematical necessity. Usually, atheists take the position of weak atheism, meaning that they merely lack belief in God and consider the probability of his existence to be low: a 6 out of 7, as Richard Dawkins would say, where 7 means certainty that he doesn’t exist. I would like to take the final step from 6 to 7, and I have the audacity to think that this step is well-grounded. Bear with me, dear reader, since we will have to delve into the mathematical areas of probabilistic inference and algorithmic probability in order to arrive at a clear understanding of the argument.

Burdens of proof

Strong atheism, also known as “gnostic atheism”, takes a much more difficult stance than weak atheism. The former carries the full burden of proving that God doesn’t exist, whereas the latter merely has to rebut theist claims of God’s existence, which, to be honest, is child’s play compared to the strong position. It suffices to point out that the theist cannot prove the existence of God in order to remain unconvinced and settle into the comfortable “I don’t know” position of agnostic or weak atheism.

That being said, the reader can grab some popcorn and wait curiously for me to provide evidence for God’s non-existence. But I will not do so; instead, I will conclude his non-existence without any evidence for it, while still taking a completely rational step of reasoning. To understand it, however, we need a short introduction to probabilistic reasoning.

Making inferences

The mathematical discipline of probability theory provides us with a precise framework for making inferences. First, I will introduce it and then explain intuitively how it works. Let’s say we have some (not necessarily finite) set of hypotheses \{H_0, H_1, H_2,\ldots\}. Maybe H_0 is the hypothesis that the world has been created by an almighty God. Further, let D be the set of data or evidence that we have received about the world.

Let P(H) be a probability distribution that encodes our prior beliefs in the hypotheses. For example, suppose there are only two hypotheses, H_0=\mbox{earth is flat} and H_1=\mbox{earth is round}. Then P(H) expresses what we believe before we receive any evidence for either hypothesis, which is why it is called the prior distribution. We might, for instance, have no preference for either hypothesis, which would make P(H=H_0)=0.5 and P(H=H_1)=0.5. Note the rule that probabilities have to sum to 1, which encodes our belief that some hypothesis has to be true, i.e. the earth has to be either flat or round. Similarly, P(D) denotes the overall probability of observing the evidence D.

In order to make inferences, we have to connect the hypotheses with the evidence: hypotheses explain the evidence more or less well, which makes them more or less likely. This connection is expressed by the so-called likelihood P(D|H), read as the probability of data D given hypothesis H. For example, D might be the observation that when we go out into space, we see a round earth. We then have to encode our intuition that observing a round earth from space makes the hypothesis of a round earth more likely. Hence we would write P(D=\mbox{earth looks round in space}|H=\mbox{earth is round})=0.95 and P(D=\mbox{earth looks round in space}|H=\mbox{earth is flat})=0.1, since a round earth is likely to look round, while a flat one is unlikely to. It still might, though, if we only had a single snapshot from above and the earth were a flat disc with round borders, like a plate. Hence we wrote 0.1 instead of 0.0.

Finally, we draw conclusions using the so-called Bayes rule:

P(H|D)=\frac{P(D|H)\cdot P(H)}{P(D)}

The posterior probability P(H|D) of the hypothesis H given data D can be computed from the likelihood P(D|H), the prior P(H) and the evidence P(D)=\sum_i P(D|H_i)\cdot P(H_i), which sums over all hypotheses and ensures that the posteriors add up to 1. The posterior encodes what you should believe after having seen the evidence, if you make a rational inference. Keep in mind that this is all very basic probability theory, which can be read up on in any book on Bayesian inference.
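The flat/round example can be run through Bayes rule directly. The following is a minimal Python sketch, assuming only the prior and likelihood numbers given above; the function name `posterior` is an illustrative choice, not part of any library. It computes the evidence P(D) by summing P(D|H)\cdot P(H) over both hypotheses, which is what makes the posteriors add up to 1.

```python
# Prior P(H) and likelihood P(D|H) from the flat/round example,
# where D = "earth looks round in space".
prior = {"flat": 0.5, "round": 0.5}
likelihood = {"flat": 0.1, "round": 0.95}


def posterior(prior, likelihood):
    """Bayes rule P(H|D) = P(D|H) * P(H) / P(D), applied to every hypothesis."""
    # Evidence P(D): probability of the data, summed over all hypotheses.
    evidence = sum(likelihood[h] * prior[h] for h in prior)
    return {h: likelihood[h] * prior[h] / evidence for h in prior}


post = posterior(prior, likelihood)
print(post)  # about 0.095 for "flat" and 0.905 for "round"
```

Seeing a round earth from space moves the round-earth hypothesis from 0.5 to about 0.905, i.e. 0.475 / 0.525.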

The point of this exercise is to sharpen the reader’s attention to the prior P(H). Two comments are in order. First, there is no inference without priors. This is very important: you cannot arrive at conclusions without having made any assumptions. You can assume that every hypothesis is a priori equally likely, or any other distribution, but any choice is an assumption. Second, the prior is in the numerator, which makes the posterior probability P(H|D) of a hypothesis proportional to its prior P(H). This reflects the fact that if you are biased toward some hypothesis a priori, you will also consider it more likely after seeing the evidence a posteriori. The evidence may weaken the hypothesis, but it will weaken it less if the prior belief in it was strong.
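To see the second comment in numbers, the Python sketch below reuses the likelihoods from the flat/round example and feeds the same evidence to increasingly confident believers in a flat earth. The priors 0.9 and 0.99 are made up for illustration, and `posterior` is again just an illustrative helper.

```python
def posterior(prior, likelihood):
    """Bayes rule P(H|D) = P(D|H) * P(H) / P(D) for every hypothesis H."""
    evidence = sum(likelihood[h] * prior[h] for h in prior)
    return {h: likelihood[h] * prior[h] / evidence for h in prior}


# Same evidence as before: the earth looks round from space.
likelihood = {"flat": 0.1, "round": 0.95}

# Hypothetical priors for "flat", from indifferent to strongly biased.
results = {}
for p_flat in (0.5, 0.9, 0.99):
    post = posterior({"flat": p_flat, "round": 1 - p_flat}, likelihood)
    results[p_flat] = post["flat"]
    print(f"prior P(flat) = {p_flat:.2f} -> posterior P(flat) = {post['flat']:.3f}")
```

The evidence lowers the flat-earth belief in every case, but from a prior of 0.9 it only drops to about 0.49, and from 0.99 it stays above 0.9.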

Here is an important observation: if you are a priori convinced that some hypothesis H_0 cannot be true, i.e. P(H=H_0)=0, then the posterior of that hypothesis will also be zero. The same happens if you are 100% certain about a belief, P(H=H_0)=1: the posterior stays at one. This is what we usually call dogmatism: if you hold on to a belief with absolute certainty, no amount of contrary evidence can make you think differently. It is usually a sign of bad reasoning, since one can rarely justify such a strong prior. After all, if we reason rationally, our prior beliefs come from previous conclusions, i.e. P(H)=P(H|D=\mbox{previous experiences}), and have therefore been formed from likelihoods and earlier priors. If the earlier priors were neither zero nor one (and the likelihoods were not zero), then the previous conclusions are neither zero nor one, and neither is our current prior. The bottom line is that when we start out not knowing anything and make rational inferences, we never arrive at dogmatic positions that cannot be shattered by any amount of evidence. A rational person always keeps his belief distributions slightly above zero and slightly below one, so that he remains open to changing his mind in a rational way if the evidence demands it.
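Repeated updating, where each posterior becomes the next prior, can be sketched for a single yes/no hypothesis H. The assumption here is that every observation is nine times more likely if H is true (likelihood 0.9 versus 0.1); these numbers and the helper names `update` and `after_observations` are hypothetical. A prior of exactly 0 or exactly 1 never moves, while a prior of 0.01 is quickly won over.

```python
def update(p_h, lik_h, lik_not_h):
    """One Bayesian update of P(H) for a yes/no hypothesis H."""
    evidence = lik_h * p_h + lik_not_h * (1 - p_h)
    return lik_h * p_h / evidence


def after_observations(prior, n, lik_h, lik_not_h):
    """Apply n updates; yesterday's posterior is today's prior."""
    p = prior
    for _ in range(n):
        p = update(p, lik_h, lik_not_h)
    return p


# Ten observations, each nine times more likely if H is true (assumed numbers).
for prior in (0.0, 0.01, 0.5):
    print(prior, "->", after_observations(prior, 10, 0.9, 0.1))

# Ten observations pointing the other way cannot move a prior of exactly 1.
print(1.0, "->", after_observations(1.0, 10, 0.1, 0.9))
```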

Prior beliefs

Let’s consider the God hypothesis: the existence of an almighty creator God. Bayesian inference tells us that there are only two ways of arriving at the non-existence of God, i.e. at P(\mbox{God}|\mbox{Evidence})=0: either the likelihood P(\mbox{Evidence}|\mbox{God}) is zero, or the prior P(\mbox{God}) is zero. Now the likelihood cannot be zero, since if we assume God’s existence, everything can be perfectly explained by his interventions, wonders and creations. The only way to arrive at a zero posterior is therefore to set the prior to zero. But would that not amount to dogmatism, as just argued? If I assume that God doesn’t exist, then it is easy to prove that he doesn’t exist, and we merely get the circular argument of a dogmatic atheist. However, as I will argue, this is not the case, since the laws of mathematics themselves do not allow us to assign God a non-zero prior. Read on.

A core property of probability distributions is that probabilities must take values between zero and one (inclusive) and have to sum to one. Something has to be true. This means that if the number of hypotheses is large, they all have to squeeze themselves into the interval [0,1]. For example, if you toss a coin, the prior of seeing “head” might be 0.5, and for “tail” also 0.5. But when you throw a die, there are not 2 but 6 possible outcomes, which makes you assign a prior of 1/6 ≈ 0.167 to each particular outcome. The more hypotheses there are, the more you have to spread the probability mass among them and the less probability each one receives. Now, of course, the prior is somewhat arbitrary, and nobody forces us to assume a uniform distribution. We could still set the prior of throwing a “3” to 0.9 and give the other five values a probability of 0.02 each. Then it will still sum to 1 (0.02 + 0.02 + 0.9 + 0.02 + 0.02 + 0.02 = 1.0). Thus, the mere presence of a higher number of hypotheses does not force us to choose a uniform prior. However, in order to avoid being biased toward any particular hypothesis before considering the evidence, we should choose a “fair” distribution, so that we remain maximally open to evidence.
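A quick check of the coin and dice arithmetic, as a small Python sketch: the helper `is_distribution` is hypothetical and simply tests the two rules stated above (every value in [0,1], total exactly 1). Exact fractions from the standard `fractions` module are used instead of floats so that rounding cannot spoil the comparison with 1.

```python
from fractions import Fraction


def is_distribution(probs):
    """A prior must lie in [0, 1] for every outcome and sum to exactly 1."""
    return all(0 <= p <= 1 for p in probs) and sum(probs) == 1


coin = [Fraction(1, 2)] * 2           # uniform over head/tail
die_uniform = [Fraction(1, 6)] * 6    # uniform: 1/6 per face
# Biased die from the text: 0.9 on a "3", 0.02 on each of the other five faces.
die_biased = [Fraction(2, 100)] * 2 + [Fraction(9, 10)] + [Fraction(2, 100)] * 3

print(is_distribution(coin), is_distribution(die_uniform), is_distribution(die_biased))
# True True True
```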

Occam’s razor

To summarize, if we have n hypotheses, then the uniform prior assigns the probability 1/n to each hypothesis. But what if we have an infinite number of hypotheses? After all, there can be infinitely many possible explanations for a given observation. Then one might want to assign the probability 1/\infty to each one, but that is zero, and the sum of infinitely many zeros is zero, not 1. Conversely, any fixed non-zero value would make the sum infinite, so no uniform distribution over infinitely many hypotheses exists. Therefore, we can’t help but assign different priors to different hypotheses. For example, we might enumerate all hypotheses somehow and call the ith hypothesis H_i, for i=1,2,3,\ldots. Then we might define, for example, P(H_i)=2^{-i}. The sum P(H_1)+P(H_2)+\cdots = 2^{-1}+2^{-2}+\cdots then converges to exactly 1 (this is the geometric series from calculus; don’t bother if you don’t get this particular point).
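The contrast between a uniform prior and the geometric prior P(H_i)=2^{-i} can be checked numerically. This Python sketch (the function names are illustrative) shows the partial sums of the geometric prior creeping up to 1, and computes how many hypotheses it takes for any fixed equal weight c > 0 to push the total past 1.

```python
import math
from fractions import Fraction


def geometric_prior(i):
    """P(H_i) = 2^-i for hypotheses numbered i = 1, 2, 3, ..."""
    return Fraction(1, 2 ** i)


def partial_sum(n):
    """Total prior mass of H_1..H_n; equals 1 - 2^-n and approaches 1."""
    return sum(geometric_prior(i) for i in range(1, n + 1))


def uniform_overflow(c):
    """Smallest n such that n hypotheses of equal weight c > 0 exceed 1 in total."""
    return math.floor(1 / c) + 1


for n in (1, 2, 10, 50):
    print(n, float(partial_sum(n)))

# Even a tiny equal weight breaks normalization after finitely many hypotheses.
print(uniform_overflow(Fraction(1, 1000)))
```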

We conclude that in the case of an infinite number of hypotheses, we have to assign them different prior probabilities, so that their sum equals 1. But on what basis should we do that? At this point we must define more precisely what we mean by a hypothesis. It might be an almighty God, or the process of evolution, or some other process or cause. In any case, hypotheses need to have a description expressed in some language. Some hypotheses are simple, which means that their description is short. Other hypotheses require long descriptions. Formally, in so-called algorithmic information theory, descriptions are computer programs that are executed and produce an output: the data or observation that is explained, or computed, in this way. The length of the description is often referred to as its complexity.

This can be expressed much more precisely, but I don’t want to burden this post with too much mathematical detail. The point is this: the more complex a hypothesis is (the longer its description), the larger the number of hypotheses of the same complexity. For example, for binary strings, the number of strings of length n is 2^n, so each additional bit doubles the number of possible hypotheses. This is also intuitive: if there are 10 possible murderers, there are many more ways the crime could have happened than if there was only one. The number of hypotheses grows exponentially with their complexity.
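The exponential growth is easy to see by brute force. Treating hypotheses as binary strings, as in the example above, this Python sketch (the helper name is hypothetical) enumerates every string of length n with `itertools.product` and counts them.

```python
from itertools import product


def hypotheses_of_length(n):
    """All binary strings of length n, read here as hypotheses of complexity n."""
    return ["".join(bits) for bits in product("01", repeat=n)]


for n in range(6):
    print(n, len(hypotheses_of_length(n)))  # counts 1, 2, 4, 8, 16, 32: doubling each step
```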

Now, given that the probabilities of all hypotheses have to sum to 1, we have to assign lower probabilities to complex hypotheses. This is what is called Occam’s razor: assume a priori that a complex explanation is less likely. It is a mathematical necessity, since there are so many complex explanations and their probabilities cannot sum to more than 1. Let’s now take the final step: the mathematical limit. What about hypotheses of infinite complexity? There are infinitely many hypotheses of infinite complexity. If we assign the same non-zero probability to each of them, the sum will be infinite. Therefore, their probability has to be zero if we want to treat them equally.
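One concrete way to watch the razor at work, assuming hypotheses are binary strings and using a simple self-delimiting code chosen purely for illustration (write n ones, a zero, then the n bits): weighting each hypothesis by 2 to the minus its code length gives every string of length n the prior 2^{-(2n+1)}. This is a toy, not Solomonoff’s universal prior, but it has the same shape. Because no code word is a prefix of another, Kraft’s inequality guarantees that the total never exceeds 1, and each complexity class n receives only 2^{-(n+1)} in total, so longer hypotheses get exponentially less weight. The names `encode` and `prior` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def encode(x):
    """Illustrative self-delimiting code: len(x) ones, a zero, then x itself."""
    return "1" * len(x) + "0" + x


def prior(x):
    """Weight 2^-(code length), i.e. 2^-(2n + 1) for a string of length n."""
    return Fraction(1, 2 ** len(encode(x)))


def strings_up_to(max_len):
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)


hyps = list(strings_up_to(8))
codes = [encode(x) for x in hyps]
# No code word is a prefix of another, so Kraft's inequality keeps the total <= 1.
prefix_free = not any(a != b and b.startswith(a) for a in codes for b in codes)
total = sum(prior(x) for x in hyps)
# Total weight of each complexity class n: 2^n strings times 2^-(2n+1) each.
per_length = {n: sum(prior(x) for x in hyps if len(x) == n) for n in range(9)}

print(prefix_free, total, per_length[0], per_length[8])  # True 511/512 1/2 1/512
```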

God’s nonexistence

An almighty God is a hypothesis of infinite complexity. Why? Because otherwise, we could think of a world that he couldn’t create. Since infinitely complex hypotheses have zero prior probability, and are hence a priori false, an almighty God does not exist. That’s it.

Now, one could think that out of all infinitely complex hypotheses one could assign a non-zero prior probability to an almighty God. Mathematically, nothing speaks against it. However, this is what it means to be biased. If we want to make unbiased, rational inferences, we have to treat all hypotheses of equal complexity equally.

This argument could be formalized in much more detail and is essentially a consequence of Solomonoff’s theory of inductive inference, which has been proven to make optimal predictions: for any computable source of data, its predictions converge to the true ones.

One could also express the thought more intuitively: since there is an infinite number of possible ways to explain things and an almighty God is merely one of them, the probability that God is the correct explanation approaches zero and, in the limit, equals zero, if one tries to be rational and unbiased.

Atheists usually grasp this point intuitively: most of them do not assign a probability of 50% to God’s existence. After all, not having evidence in favor of God should lead one to a position of indifference at 50%, right? Wrong. This is why atheists often say that they find it quite unlikely that God exists (a 6.9 out of 7). The reason why they don’t go all the way to 7 is simply that humans are bad at probabilistic reasoning, especially when it comes to taking mathematical limits. Humans have difficulty distinguishing small probabilities from very small ones, although there can be orders of magnitude between them. Similarly, we have trouble telling infinity apart from a very large number. However, if one thinks clearly about these matters with the help of mathematics, one can’t help but take the limit: God does not exist.

Another reason why atheists are reluctant to go all the way is that they want to practice openness and readiness to change their minds if evidence in favor of God is given. Richard Dawkins once said something to the effect that if the stars rearranged themselves to spell out “I am your God”, he would believe. However, as Arthur C. Clarke put it, any sufficiently advanced technology is indistinguishable from magic to a society that is not as advanced. Hence, chances are that there are aliens who can rearrange the stars and make fun of us by arranging them into such messages. Since powerful, advanced and humorous aliens could definitely exist with non-zero probability, the probability that it’s actually aliens making jokes is much higher than that it is an almighty God. In fact, it is infinitely more likely, since the ratio of a non-zero probability to one that goes to zero grows without bound. Thus again, our inability to distinguish small numbers from very small ones or from zero is the reason for wrong conclusions. A similar argument can be put forward for any finite observation, however mind-boggling. Thus no finite amount of evidence can ever point to God.
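The aliens-versus-God comparison can be written as posterior odds: prior odds times the likelihood ratio. In the Python sketch below, both hypotheses are assumed to predict the star message equally well (likelihood ratio 1), and the prior of 1e-12 for pranking aliens is an arbitrary number picked for illustration. The God prior is then shrunk toward zero to show the odds in favor of aliens growing without bound; at exactly zero the ratio is undefined, which is why the text speaks of a limit. The function name `posterior_odds` is hypothetical.

```python
def posterior_odds(prior_a, prior_b, lik_a, lik_b):
    """Posterior odds of A over B given the same data: prior odds times likelihood ratio."""
    return (prior_a / prior_b) * (lik_a / lik_b)


PRIOR_ALIENS = 1e-12  # hypothetical: small but non-zero
LIK = 1.0             # assumption: both hypotheses explain the message equally well

for prior_god in (1e-20, 1e-50, 1e-100):
    odds = posterior_odds(PRIOR_ALIENS, prior_god, LIK, LIK)
    print(f"P(God) = {prior_god:g}: aliens favored {odds:.3g} to 1")
```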

Of course, it is difficult to say these things in public, since people who have not even arrived at weak atheism will not understand them and will assume that one is simply dogmatic and doxastically closed, just as believers are often accused of being. However, as this post has hopefully clarified, there are good reasons for believing in God’s nonexistence a priori.
