On the certainty that God exists and why Bayesians should go π

Up to this day I have defined my theological position as Agnostic, which is not saying much given the different interpretations and philosophical flavors available to position ourselves when it comes to God. This is why I sometimes simply reply to The Question with something like “Both alternatives are equally crazy, so I don’t know.” But can we use statistics to better describe our position in these kinds of philosophical matters, or even dictate how we should live our lives? Yes, we can.

WARNING: Beware agnostics!!! I will show mathematical arguments that might turn you into a full blown Believer or a hardcore Atheist… So if you keep reading don’t say I did not warn you.

Marx
These are my principles…    But if you don’t like them I have others.

If we envision probability as a measure linked to a random process, then questions like “What is the probability that God exists?” imply a sort of Supra-God that creates universes with Gods with a frequency p. But then some might argue that this Supra-God is actually God so, in the end, these kinds of philosophical questions make no statistical sense under such a frequentist interpretation of probability.

Then we have those who interpret probability as a degree of belief on matters subject to uncertainty; this interpretation is the one held by Bayesian Statistics.

So if I wear a Bayesian hat and I am asked The Question then, instead of replying “I don’t know” to describe my ignorance, I should reply with “50%” or “p=1/2”. This is so because when Bayesians (The Objective Kind) have no information on a problem they use a plethora of principles, in a Groucho-style fashion, to figure out a prior distribution to kick off the Bayes’ Theorem machinery.

But there are an infinite number of prior distributions with an expected value of 1/2 so, which one among this infinite number best describes my agnosticism? Is there such a thing as a unique agnostic prior to rule them all? Well, it seems this Holy Grail does not exist, since we can read in highly commendable Bayesian books like Bernardo & Smith things like:

In general we feel that it is sensible to choose a non-informative prior which expresses ignorance relative to information which can be supplied by a particular experiment. If the experiment is changed, then the expression of relative ignorance can be expected to change correspondingly. (Box and Tiao, 1973 p.46).

Wait, what? We change the experiment and our prior ignorance changes too? In fact, not all Bayesians agree that such “non-informative” priors exist (Howson 2002; O’Hagan 2006; Press 2003); they regard any Objective Bayesian “non-informative” prior simply as a well formed belief… So I’ll pick the Subjective interpretation, and in this post I am going to well form my belief in God.

Plus, in the process of cooking my Agnostic prior I’ll discuss why Bayesians should measure their beliefs from 0 to π instead of from 0 to 1. The latter measure is too frequentist for them, and π makes more mathematical sense since trigonometric functions are going to pop up naturally everywhere in our prior belief endeavor.

The Game

When doing point estimation we usually choose the least wrong possible value in one way or another, but does it make sense to use this approach in philosophical matters?

Absolutely! The great René Descartes, in his amazing Discourse on the Method, explained how, until he found something to be true, he would choose to be moderate in everything, both for practical reasons and to avoid being too far away from the truth… By the way, this least wrong approach also works to lie to the Senate and get away with it.

But in this case I do not want to be the least wrong possible, I want to be the most right possible! This is because, if I am wrong, I do not care by how much. So, in order to maximize my chances of being right, I will pick the mode from the resulting distributions instead of any other value, since it is the one with the highest probability density.

But let’s play the game and create our own collection of “non-informative” priors. To do so we will look for ways to maximize, minimize or go indifferent in whatever way we can possibly imagine in our formulas, always keeping in mind that we are well forming beliefs… so here we go.

2. The Sinus Agnostic Prior

Let’s say F(x) is a CDF; then we can expand F(x) as a Taylor series around a point a:

F(x) = \sum_{n = 0}^{\infty} F^{ \left( n \right)} \left( a \right) \frac{\left(  x - a \right)^n}{n!}

So, since I have no reason to believe or disbelieve in God, we will apply the following indifference principles to this series expansion:

  • a will be the middle point of the support of F(x).
  • Since we are indifferent to any value, the PDF must be symmetric, and a simple way to achieve this is to remove all the even-powered terms in F(x).
  • Since we are indifferent about the sign of the derivatives, we will alternate their signs.
  • Once we have the sign, the derivatives might take any value from 0 to infinity, but there is no middle / indifference point in such an interval. Derivatives, though, can be associated with angles from 0 to π/2 (tan(angle)=b/a). The middle point will thus be π/4, which corresponds to derivatives having a value of one.
  • Once we have a function with our desired indifference properties, we will apply simple transformations to turn the result into a proper CDF. (This means that in the previous step the derivatives do not need to have a value of one, only equal values everywhere, giving every derivative the same indifference weight.)

If we now apply all these indifference principles we are left with:

F \left( x \right) = \sum_{n = 0}^{\infty} \left( - 1 \right)^{n}  \frac{\left( x - a \right)^{2 n + 1}}{\left( 2 n + 1 \right) !}

Which happens to simplify into:

F(x)= -\sin (a-x)
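The simplification above can be checked numerically. The following is a minimal sketch, not part of the original derivation: the function name `indifference_series` and the number of terms are illustrative choices, and a = π/2 is picked only because it is the midpoint used later in the post.

```python
import math

def indifference_series(x, a, terms=20):
    """Partial sum of sum_n (-1)^n (x - a)^(2n+1) / (2n+1)!."""
    total = 0.0
    for n in range(terms):
        k = 2 * n + 1  # only odd powers survive the symmetry principle
        total += (-1) ** n * (x - a) ** k / math.factorial(k)
    return total

a = math.pi / 2  # assumed midpoint of the support [0, pi]
for x in (0.0, 0.5, math.pi / 2, 2.0, math.pi):
    series = indifference_series(x, a)
    closed_form = -math.sin(a - x)
    print(f"x={x:.4f}  series={series:+.6f}  -sin(a-x)={closed_form:+.6f}")
```

The two columns should agree to within floating-point error, since the series is just the Taylor expansion of sin(x − a).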

Now, probabilities are measured from 0 to 1 since this is the natural way to represent frequencies, that is, # cases / # total cases. But if we are dealing with a belief, any range would be all right to measure it. In this case, and since we just got a trigonometric function as a CDF, the natural range to measure beliefs should be from 0 to π to capture all the properties of the function. If we do so, the middle point will be π/2 and F(x) further simplifies into:

F(x)= -\cos (x)

which means that the PDF f(x) from 0 to π is:

f(x)= \sin (x)

Like any PDF measuring probabilities from 0 to π, its integral over the support should equal π, but in this case it equals two, which means we need to scale f(x) by π/2 so that it integrates to π.

[Figure: pdf.sin]
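The scaling step can be written out explicitly. This is only a worked version of the statement above; the symbol c for the scaling constant is introduced here for illustration and does not appear in the original.

```latex
\int_0^{\pi} \sin(x)\,dx = \left[ -\cos(x) \right]_0^{\pi} = 2
\quad\Rightarrow\quad
c \cdot 2 = \pi
\quad\Rightarrow\quad
c = \frac{\pi}{2}

f(x) = \frac{\pi}{2} \sin(x), \qquad
F(x) = \int_0^{x} f(t)\,dt = \frac{\pi}{2} \left( 1 - \cos(x) \right), \qquad
F(0) = 0, \quad F(\pi) = \pi
```

So the scaled CDF runs from 0 to π, as expected for beliefs measured on that range.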

Obviously, f(x) could be tweaked to fit the support from 0 to 1 but, again, since we are working with a belief system and trigonometric functions appear nonchalantly, π is just the natural way to go.

2.1 Interpretation

Applying these indifference principles of ours to F(x) has rendered a PDF in which the belief values zero and π (or one if standardized) have a density of zero. This could be interpreted as a limit to our knowledge on the question we are asking, that is, no amount of information will ever update from 0 the two values that would give us certainty about the existence or nonexistence of God.

In other words, it is not reasonable under this prior to completely believe or disbelieve in God, since 0 and π will never be the mode, mean or median of the distribution no matter how much information is used to update the prior belief.

Also, the fact that π/2 (1/2 if standardized) is the mean, mode and median shows that, under these indifference principles, the most reasonable stance is Agnosticism, or better, Sinus Agnosticism.
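The claim that π/2 is at once the mean, the median and the mode can be checked with a few lines of standard-library Python. This is a rough sketch under one assumption: the prior is taken to be sin(x) rescaled to (π/2) sin(x) so that it integrates to π, as the post requires. The helper names are hypothetical.

```python
import math

def sinus_pdf(x):
    # sin(x) rescaled so that the total mass over [0, pi] is pi
    return (math.pi / 2) * math.sin(x)

def sinus_cdf(x):
    return (math.pi / 2) * (1 - math.cos(x))

def midpoint_integral(g, lo, hi, steps=100_000):
    """Plain midpoint rule, accurate enough for this smooth integrand."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))

total = midpoint_integral(sinus_pdf, 0.0, math.pi)
mean = midpoint_integral(lambda x: x * sinus_pdf(x), 0.0, math.pi) / total

# Median: bisection for the point where the CDF reaches half the total mass.
lo, hi = 0.0, math.pi
for _ in range(60):
    mid = (lo + hi) / 2
    if sinus_cdf(mid) < total / 2:
        lo = mid
    else:
        hi = mid
median = (lo + hi) / 2

# Mode: the grid point with the highest density.
grid = [i * math.pi / 1000 for i in range(1001)]
mode = max(grid, key=sinus_pdf)

print(f"total={total:.6f} mean={mean:.6f} median={median:.6f} mode={mode:.6f}")
```

All three location measures should land on π/2 up to numerical error, and the total mass on π.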

3. The Jeffreys’ Atheist / Believer Prior

So for the Sinus Agnostic prior we applied our indifference principles to the probability distribution F(x). Let’s see what happens when we apply those very same principles to the data itself.

If F(x) is a CDF for the random variable X, and we consider beliefs from 0 to π for the reasons explained above, then F(X)~U(0,π), which means that X=F^{-1}(U) where U~U(0,π).

Since there exists an a in [0, π] such that F(E(X)) = a, if we now expand F^{-1}(U) around a we have:

X = \sum_{n = 0}^{\infty} F^{- 1 \left( n \right)} \left( a \right)  \frac{\left( U - a \right)^n}{n!}

So applying the same indifference principles as before we have:

X = \sum_{n = 0}^{\infty} \frac{\left( - 1 \right)^n}{\left( 2 n + 1 \right)  !} \left( U - \pi / 2 \right)^{2 n + 1}= -\cos(U)

But since X is measuring beliefs from 0 to π we need to center X around π/2 and scale the support to π. To do so we will use the following transformation:

X_{a, b} = \frac{b - a}{2} \left( \frac{2 X}{\max \left( X \right) - \min  \left( X \right)} \right) + \frac{a + b}{2}

Therefore

X_{0, \pi} = \frac{\pi}{2} \left( X \right) + \frac{\pi}{2} = \frac{\pi}{2}  \left( 1 - \cos \left( U \right) \right)

And it follows that

F^{- 1} \left( u \right) = \frac{\pi}{2} \left( 1 - \cos \left( u \right)  \right) \Rightarrow F \left( x \right) = \arccos \left( 1 - \frac{2 x}{\pi}  \right)
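The pair F^{-1}(u) and F(x) above can be sanity-checked by inverse-transform sampling. The sketch below is only an illustration: it draws U~U(0,π), maps it through F^{-1}, and compares the empirical CDF (scaled to run from 0 to π, following the post's convention) with arccos(1 − 2x/π). The function names, the seed and the sample size are arbitrary choices.

```python
import math
import random

def jeffreys_inverse_cdf(u):
    # F^{-1}(u) = (pi/2) (1 - cos(u)) for u in [0, pi]
    return (math.pi / 2) * (1 - math.cos(u))

def jeffreys_cdf(x):
    # F(x) = arccos(1 - 2x/pi) for x in [0, pi]
    return math.acos(1 - 2 * x / math.pi)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 200_000
samples = [jeffreys_inverse_cdf(rng.uniform(0.0, math.pi)) for _ in range(n)]

def empirical_cdf(x):
    """Share of samples at or below x, scaled to the 0..pi belief range."""
    return math.pi * sum(s <= x for s in samples) / n

for x in (0.1, 0.5, math.pi / 2, 2.5, 3.0):
    print(f"x={x:.3f}  empirical={empirical_cdf(x):.4f}  arccos form={jeffreys_cdf(x):.4f}")
```

With this many samples the two columns should differ only in the second or third decimal place.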

And the derivative of this CDF renders Jeffreys’ Prior for beliefs from 0 to π:

[Figure: pdf.Jeffreys]
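For reference, the derivative can be worked out by hand. This is a routine calculation added for convenience; the last line uses the standardization x = π p, which is the "one if standardized" scale mentioned in the post.

```latex
f(x) = \frac{d}{dx} \arccos \left( 1 - \frac{2 x}{\pi} \right)
     = \frac{2 / \pi}{\sqrt{1 - \left( 1 - \frac{2 x}{\pi} \right)^2}}
     = \frac{2 / \pi}{\frac{2}{\pi} \sqrt{x \left( \pi - x \right)}}
     = \frac{1}{\sqrt{x \left( \pi - x \right)}}, \qquad 0 < x < \pi

\int_0^{\pi} \frac{dx}{\sqrt{x \left( \pi - x \right)}} = \pi

x = \pi p \quad\Rightarrow\quad f \propto p^{-1/2} \left( 1 - p \right)^{-1/2}
```

The last expression is the Beta(1/2, 1/2) shape, which is the Jeffreys prior for a Bernoulli proportion.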

3.1 Interpretation

When applying the very same indifference principles to the data X itself instead of to its CDF, we have just rendered Jeffreys’ Prior. The values 0 and π (one if standardized) have an infinite density, which could be interpreted as these being the two most reasonable positions in the mode sense explained above.

This distribution would indicate that, even though we don’t know if God exists, we should live our lives either as if God exists or as if God does not exist at all. In other words, whatever the truth is, it makes no sense to live our lives in an “I don’t know” stance.
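One way to see the contrast between the two priors is to compare how much belief mass each puts near the extremes and near the middle. The sketch below is illustrative only: the three regions (the outer tenths and the central tenth of the range) are arbitrary choices, and the sine CDF is assumed to be the rescaled version (π/2)(1 − cos x).

```python
import math

def sinus_cdf(x):
    # Sinus Agnostic CDF, rescaled to run from 0 to pi
    return (math.pi / 2) * (1 - math.cos(x))

def jeffreys_cdf(x):
    # Jeffreys CDF from section 3
    return math.acos(1 - 2 * x / math.pi)

def share(cdf, lo, hi):
    """Fraction of the total belief mass (pi) that falls in [lo, hi]."""
    return (cdf(hi) - cdf(lo)) / math.pi

regions = {
    "near 0": (0.0, 0.1 * math.pi),
    "around pi/2": (0.45 * math.pi, 0.55 * math.pi),
    "near pi": (0.9 * math.pi, math.pi),
}
for name, (lo, hi) in regions.items():
    print(f"{name:12s} sinus={share(sinus_cdf, lo, hi):.4f}  "
          f"jeffreys={share(jeffreys_cdf, lo, hi):.4f}")
```

Each region has the same width, so under a flat prior every share would be 0.1. The sine prior concentrates mass in the middle region, while Jeffreys’ prior concentrates it at the two ends.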

4. Batman Prior

How about if we take our indifference further and are indifferent about whether to be indifferent on how the CDF behaves or on how the data behaves? Then we can use an indifferent mixture of the two previous priors to achieve this cool looking prior:

[Figure: pdf.batman]

Which does not change the interpretation much from the one for Jeffreys’ Prior, since we still have 0 and π going to infinity… But still cooler than Jeffreys’, right?
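A possible reading of the "indifferent mixture" is an equal-weight arithmetic average of the two densities, which is how the update at the end of the post describes it. The sketch below follows that reading and assumes the sine density is rescaled to (π/2) sin(x) and that the Jeffreys density on 0 to π is 1/sqrt(x(π − x)), the derivative of the arccos CDF in section 3. The function names and the integration helper are hypothetical.

```python
import math

def sinus_pdf(x):
    return (math.pi / 2) * math.sin(x)

def jeffreys_pdf(x):
    return 1.0 / math.sqrt(x * (math.pi - x))

def batman_pdf(x):
    # equal-weight arithmetic mixture of the two priors
    return 0.5 * (sinus_pdf(x) + jeffreys_pdf(x))

def total_mass(pdf, steps=100_000):
    """Integrate pdf over (0, pi) with the substitution x = (pi/2)(1 - cos u).

    The substitution cancels the endpoint singularities of the Jeffreys
    term, so a plain midpoint rule in u is enough.
    """
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        x = (math.pi / 2) * (1 - math.cos(u))
        dx_du = (math.pi / 2) * math.sin(u)
        total += pdf(x) * dx_du * h
    return total

print(f"total mass = {total_mass(batman_pdf):.6f}")
for x in (0.001, 0.05, 0.3, math.pi / 2, math.pi - 0.3, math.pi - 0.001):
    print(f"x={x:.3f}  density={batman_pdf(x):.4f}")
```

The printed densities should show the shape in the figure: two tall "ears" at 0 and π, a dip on each side, and a lower hump around π/2.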

5. Going Hyperbolic

So we could actually go on tweaking these principles here and there to make them fit whatever indifference property suits our beliefs. For example, in the indifference principle for choosing signs we decided to alternate positive and negative signs, but choosing all positives keeps the symmetry around π/2 and, in a way, expresses our commitment to whatever values are considered instead of trying to stay “centered” by switching signs.

So if we set all signs positive in the previous steps, all the trigonometric functions in our calculations turn hyperbolic. This results in what Bayesians call improper prior distributions, which is another way to say “this is not a probability distribution but let’s use it anyhow and see what happens”.

But, anyhow, since we are talking about beliefs we can interpret the hyperbolic results as the extreme siblings of the previous results, since now their support takes all the real numbers and they are not useful to talk about beliefs from 0 to π.
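The effect of dropping the alternating signs can be seen directly in the series from section 2. The following is a short worked version of that step only; it does not attempt to reproduce the later transformations behind the figures.

```latex
F(x) = \sum_{n = 0}^{\infty} \frac{\left( x - a \right)^{2 n + 1}}{\left( 2 n + 1 \right) !}
     = \sinh \left( x - a \right),
\qquad
F'(x) = \cosh \left( x - a \right) \geq 1

\int_{-\infty}^{\infty} \cosh \left( x - a \right) dx = \infty
```

Since the derivative never falls below one, it cannot be normalized over the real line, which is what makes the result improper.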

Nonetheless, it is interesting that when going hyperbolic on F(x) and inverting the result to achieve a CDF we get the following distribution:

[Figure: pdf.sech]

And when going hyperbolic on the data X and squaring the results to achieve a proper CDF we have a Cauchy distribution:

[Figure: pdf.cauchy]

And we can go on. For example, relaxing the “no even factors” indifference principle breaks symmetry, but we can regain symmetry by squaring the result and, thus, achieve a Standard Normal distribution… So many ways to be indifferent, right?

6. Conclusion

So what now? Things are still a matter of belief, but now I have compelling mathematical reasons that force me to make a pick. When I went fully indifferent, that is, being indifferent even about the indifference prior that I have to choose, the Batman prior was rendered, but this prior suggests that I should live my life either like an Atheist or like a Believer.

The middle point, though it is the next most probability-dense choice, is at an infinite distance from 0 and π so, if I want to be coherent with the absolute indifference and lack of knowledge that I have always claimed about The Question… I have no choice but to make a choice.

Damn! Am I a Batman π believer?… I can’t seven asterisks believe this!!

UPDATE. The Lex Luthor Prior

After rendering our superhero Batman Prior, which forced me to take sides, I wondered if there was any super-villain prior that would allow me to stay in my cozy theological “I don’t know” answer. Since the Batman Prior came from the arithmetic average mixture of the indifference priors we had, I wondered what I would get if, instead of the arithmetic average, I used a geometric average for the mixture. This is the result, named after Lex Luthor for obvious bald reasons:

[Figure: pdf.lex]
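The post does not spell out the exact construction, so the following is only one plausible sketch of a geometric-average mixture: take the square root of the product of the two densities and rescale it so that it integrates to π. The rescaling step, the assumed Jeffreys density 1/sqrt(x(π − x)), and all function names are assumptions made for illustration.

```python
import math

def sinus_pdf(x):
    return (math.pi / 2) * math.sin(x)

def jeffreys_pdf(x):
    return 1.0 / math.sqrt(x * (math.pi - x))

def lex_unnormalised(x):
    # geometric average of the two priors (assumed reading of the post)
    return math.sqrt(sinus_pdf(x) * jeffreys_pdf(x))

def total_mass(pdf, steps=100_000):
    """Midpoint rule in u after substituting x = (pi/2)(1 - cos u)."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        x = (math.pi / 2) * (1 - math.cos(u))
        total += pdf(x) * (math.pi / 2) * math.sin(u) * h
    return total

scale = math.pi / total_mass(lex_unnormalised)

def lex_pdf(x):
    # rescaled so that the total belief mass is pi (an assumption)
    return scale * lex_unnormalised(x)

grid = [i * math.pi / 1000 for i in range(1, 1000)]  # interior points only
mode = max(grid, key=lex_pdf)
print(f"mode = {mode:.4f}  (pi/2 = {math.pi / 2:.4f})")
print(f"density near 0: {lex_pdf(0.001):.4f}, at pi/2: {lex_pdf(math.pi / 2):.4f}")
```

Under this reading the infinite ears of Jeffreys’ prior are cancelled by the zeros of the sine prior, so the density goes to zero at both ends and peaks at π/2, which matches the bald dome in the figure.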

So by the powers of Bayesian Subjectivity I am not forced anymore to make a choice, so I am back claiming my π/2 belief for The Question… Ah, Bayesian priors, you gotta love them; there is always one that makes you happy.

7 thoughts on “On the certainty that God exists and why Bayesians should go π”

    • rammeez,

      Well, the “Discourse” rather sets the rules of the game, the full development of the proof of God existence is in his “Meditations on First Philosophy”.

      Now that I have too much time available I took the opportunity to revisit these two masterpieces. It is pretty amazing reading, especially when considering the time when they were written!

        • Well, in my opinion the first cognitive revolution came from the hand of Socrates; the first man in history (as far as I know) that realized he truly knew nothing.

          But I agree that the second one came from the hand of Descartes; the first man finding an undeniable truth (we exist while we think) and a methodology to find more.

          • To be clear: The bit from Chomsky was about the cognitive revolution in the technical sense. So as Herbert Simon says, cognitive science started in 1956 with people like Simon, Chomsky, Miller, Bruner and others, putting forth radically new ideas challenging the dominant paradigms at that time. And people call that the “cognitive revolution”. But Chomsky often reminds people that similar questions were being discussed by the Cartesians back in the 17th century, when automatons were being made and incredible questions about human abilities and solutions to them were being presented. So in Chomsky’s view the first cognitive revolution took place in the 17th century while the one in 1956 should either not be considered a revolution at all or should be referred to as the second cognitive revolution.
