R corset: Bringing Math models back in shape

So your perfect, ideal mathematical model returns impossible values: probabilities greater than one or less than zero, negative stock prices, et cetera, and now you feel like quoting George Box… again.


Sometimes the model itself has a built-in way to keep things real, as logistic regression does. Very often, however, popular models such as ARIMA offer no way to bound their results within business or scientific constraints, and then what? These are a few common options:
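One widely used option, which may or may not be among those the full post lists, is to transform the bounded variable onto the whole real line, fit the model there, and map the forecasts back. The sketch below assumes values strictly inside an interval (lo, hi) and uses a scaled logit. The function `linear_trend_forecast` is a hypothetical stand-in for ARIMA: any model fitted on the transformed scale would slot in the same way.

```python
import math

def to_unbounded(y, lo, hi):
    """Scaled logit: maps a value strictly inside (lo, hi) onto the real line."""
    u = (y - lo) / (hi - lo)
    return math.log(u / (1 - u))

def to_bounded(z, lo, hi):
    """Inverse scaled logit: maps any real number back into (lo, hi)."""
    return lo + (hi - lo) / (1 + math.exp(-z))

def linear_trend_forecast(series, steps):
    """Hypothetical stand-in for ARIMA: least-squares line, extrapolated."""
    n = len(series)
    x_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(series))
    slope = sxy / sxx
    return [y_mean + slope * (n - 1 + h - x_mean) for h in range(1, steps + 1)]

def bounded_forecast(series, steps, lo=0.0, hi=1.0):
    """Fit on the logit scale, then map the forecasts back into (lo, hi)."""
    z = [to_unbounded(y, lo, hi) for y in series]
    return [to_bounded(v, lo, hi) for v in linear_trend_forecast(z, steps)]

probs = [0.80, 0.85, 0.90, 0.94]  # hypothetical probability series
print(max(linear_trend_forecast(probs, 10)))  # raw trend overshoots 1
print(max(bounded_forecast(probs, 10)))       # stays below 1
```

One trade-off of this approach: a back-transformed point forecast always respects the bounds, but it is generally not the mean of the forecast distribution on the original scale.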

Continue reading

The I Ching, random numbers, and why you are doing it wrong

One would think that humanity had no need for good random number generators until computers and simulations were invented since, for most practical purposes, tossing a coin or throwing a die should suffice. So imagine my surprise when I found, in the four-to-five-thousand-year-old Chinese divination book called the I Ching, an RNG algorithm reminiscent of modern Linear Congruential Generators! But why the need for such a complex procedure to produce random numbers?

[Figure: Artemisia stems]

The I Ching divination process requires randomly selecting two trigrams through a rather convoluted procedure using stems of either Artemisia or yarrow. And although I acquired this ancient book a long, long time ago, the truth is that when consulting it as an oracle I always used the simplified version for lazy, busy people, which consists of simply tossing three coins and checking the combination of heads and tails.

I always thought that the traditional form was just a magical way of doing the same thing we can do by tossing three coins, but today, for no particular reason other than having too much free time on my hands, I took a deeper mathematical look at this traditional form, and it turns out that it produces a completely different random result than tossing three coins!

Well, a mathematical curiosity, you might think, but does it matter? It might! Millions of people seek advice using the simplified coin version to cast I Ching Yin Yang oracles. In this post I will show how the three-coin method yields equal proportions of Old Yin and Old Yang signs, whereas the traditional method yields three times as many Old Yang signs as Old Yin!
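The claim can be checked by exact enumeration. This sketch assumes the usual conventions: heads counts 3 and tails 2; line values are 6 = Old Yin, 7 = Young Yang, 8 = Young Yin, 9 = Old Yang. For the stalks it starts from 49 stalks and models each of the three "changes" as: split into two heaps, take one stalk from the right heap, count both heaps off by fours (a remainder of 0 counts as 4), and set aside 1 plus both remainders; 4 or 5 set aside is worth 3, 8 or 9 is worth 2. The idealization that the left heap's remainder is equally likely to be 1, 2, 3 or 4 is an assumption, not necessarily the exact analysis of the full post.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Line values: 6 = Old Yin, 7 = Young Yang, 8 = Young Yin, 9 = Old Yang.

def coin_line_probs():
    """Exact line distribution for three fair coins (heads = 3, tails = 2)."""
    dist = Counter()
    for faces in product((2, 3), repeat=3):
        dist[sum(faces)] += Fraction(1, 8)
    return dict(dist)

def heap_remainder(count):
    """Counting a heap off by fours; a remainder of 0 is taken as 4."""
    return count % 4 or 4

def yarrow_line_probs(stalks=49):
    """Exact line distribution for the yarrow-stalk method.

    Assumption: in each of the three changes, the left heap's remainder
    is equally likely to be 1, 2, 3 or 4 (an idealised random split)."""
    dist = Counter()
    for left_rems in product((1, 2, 3, 4), repeat=3):
        n, line = stalks, 0
        for r_left in left_rems:
            # One stalk is taken from the right heap before counting it off.
            r_right = heap_remainder(n - 1 - r_left)
            removed = 1 + r_left + r_right  # always 4, 5, 8 or 9
            line += 3 if removed <= 5 else 2
            n -= removed
        dist[line] += Fraction(1, 64)
    return dict(dist)

print("coins: ", dict(sorted(coin_line_probs().items())))
print("yarrow:", dict(sorted(yarrow_line_probs().items())))
```

The coins give 1/8, 3/8, 3/8, 1/8 for lines 6 to 9, while the stalks give 1/16, 5/16, 7/16, 3/16: Old Yang (9) is three times as likely as Old Yin (6) only with the stalks.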

This means that the I Ching, in its traditional way of drawing oracles, promotes Yang behaviour over Yin; that is, it promotes action, imagination, creativity and strength among its users, whereas nowadays, with the simplified three-coin version, active and passive answers are evened out.

I am neither a sinologist nor a psychologist, so I cannot really tell which version would have a better influence on practitioners' lives. I do know, though, that the traditional form promotes Yang among those seeking advice, which at first glance seems like a positive thing and, since this book is used by millions of people, maybe experts in the field should advise practitioners to stop using three coins when consulting the I Ching. For those interested in a traditionally sound oracle in terms of probability, I will show a few simple ways to achieve just that at the end of this post.
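As a sketch of one such approach (it may differ from the ways the full post shows): the yarrow-stalk line probabilities are 1/16, 5/16, 7/16 and 3/16 for lines 6 to 9, and four fair coins give 16 equally likely outcomes, so reading the coins as the bits of a number from 0 to 15 and slicing that range 1 / 5 / 7 / 3 reproduces them exactly. The names `YARROW_SIXTEENTHS`, `line_from_index`, `four_coin_line` and `cast_hexagram` are illustrative, not from the source.

```python
import random

# Yarrow-stalk line probabilities in sixteenths:
# 6 = Old Yin, 7 = Young Yang, 8 = Young Yin, 9 = Old Yang.
YARROW_SIXTEENTHS = {6: 1, 7: 5, 8: 7, 9: 3}

def line_from_index(index):
    """Map an outcome 0..15 to a line value using the yarrow weights."""
    if not 0 <= index < 16:
        raise ValueError("index must be in range(16)")
    for value, weight in YARROW_SIXTEENTHS.items():
        if index < weight:
            return value
        index -= weight

def four_coin_line(rng=random):
    """Toss four fair coins and read them as the bits of a number 0..15."""
    index = sum(rng.randint(0, 1) << bit for bit in range(4))
    return line_from_index(index)

def cast_hexagram(rng=random):
    """Six lines, conventionally recorded from the bottom up."""
    return [four_coin_line(rng) for _ in range(6)]

print(cast_hexagram(random.Random(2024)))
```

With physical coins, the same mapping works by noting which of the 16 head/tail patterns came up.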

This book has impressed mathematicians like Leibniz, psychologists like Jung, poets like Jorge Luis Borges and all kinds of intellectuals all over the world for centuries. And regardless of whether you believe it has magical properties, it certainly has deep psychological and sapiential ones. This is not only one of the oldest books in human history, but a beautiful one. So, before we plunge into the mathematical details of the traditional algorithm for drawing oracles, let's share this poem by Borges about the I Ching to break the ice.

For a Version of I Ching / Para una versión del I King
The future is as immutable
As rigid yesterday. There is nothing
That is not a single, silent letter
In the eternal and inscrutable
Writing whose book is time. He who walks away
From home has already come back.
Our life is a future and well-traveled track.
Nothing dismisses us. Nothing leaves us.
Do not give up. The prison is dark,
Its fabric is made of incessant iron,
But in some corner of your cell
You might discover a mistake, a cleft.
The path is fatal as an arrow
But God is in the rifts, waiting.

El porvenir es tan irrevocable
Como el rígido ayer. No hay una cosa
Que no sea una letra silenciosa
De la eterna escritura indescifrable
Cuyo libro es el tiempo. Quien se aleja
De su casa ya ha vuelto. Nuestra vida
Es la senda futura y recorrida.
Nada nos dice adiós. Nada nos deja.
No te rindas. La ergástula es oscura,
La firme trama es de incesante hierro,
Pero en algún recodo de tu encierro
Puede haber un descuido, una hendidura,
El camino es fatal como la flecha
Pero en las grietas está Dios, que acecha.

Continue reading

15 to 42 percent of medical research are false positives (Yet Another Calculation)

A while ago, via a post on the Simply Statistics blog, I found a very interesting paper by Leah R. Jager and Jeffrey T. Leek arguing that most published medical research is true, with a false positive rate among reported results of 14% ± 1%. Their paper was a response to an essay by John P. A. Ioannidis and several other authors claiming that most published research findings are false.

After addressing some criticisms, Mr. Leek made a good point in his post:

“I also hope that by introducing a new estimator of the science-wise fdr we inspire more methodological development and that philosophical criticisms won’t prevent people from looking at the data in new ways.”

And so, following this advice, I did not let criticisms prevent me from looking at the data in a new way: I devised a probability distribution for p-values, fitted it to the data via maximum likelihood estimation (MLE), and inferred the false positive rate from the fit.

[Figure: p-value PDF and CDF]

So this is my take: a 15.33% false positive rate, with a worst-case scenario of 41.75% depending on how mischievous researchers are. In any case, and contrary to what other authors claim, most medical research seems to be true.
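The distribution devised in the post is not shown in this excerpt. As a stand-in, here is a minimal sketch of the general idea using a different, commonly used model: significant p-values (below `ALPHA` = 0.05) are treated as a mixture of a uniform component for null effects and a Beta(a, 1) component truncated to (0, `ALPHA`) for true effects. The mixing weight `pi0` is then the estimated share of significant results that are false positives. EM is used here as one standard way to reach the maximum-likelihood fit; the function names and the simulated data are hypothetical.

```python
import math
import random

ALPHA = 0.05  # only p-values below the significance threshold are modelled

def fit_pvalue_mixture(pvalues, iters=300, pi0=0.5, a=0.5):
    """EM fit of pi0 * Uniform(0, ALPHA) + (1 - pi0) * Beta(a, 1) on (0, ALPHA).

    Works on x = p / ALPHA in (0, 1], where the null density is 1 and the
    alternative density is a * x**(a - 1). Returns (pi0, a); pi0 is the
    estimated share of significant results that are false positives."""
    logs = [math.log(p / ALPHA) for p in pvalues]
    for _ in range(iters):
        # E-step: probability that each p-value comes from a true effect.
        w = []
        for lx in logs:
            alt = (1 - pi0) * a * math.exp((a - 1) * lx)
            w.append(alt / (pi0 + alt))
        # M-step: closed-form maximisers of the expected log-likelihood.
        pi0 = 1 - sum(w) / len(w)
        a = -sum(w) / sum(wi * lx for wi, lx in zip(w, logs))
    return pi0, a

def simulate_pvalues(n, pi0, a, seed=0):
    """Draw synthetic significant p-values from the same mixture (for testing)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        u = 1.0 - rng.random()  # in (0, 1], so log() never sees 0
        if rng.random() < pi0:
            out.append(ALPHA * u)             # null: uniform on (0, ALPHA]
        else:
            out.append(ALPHA * u ** (1 / a))  # inverse CDF of (p / ALPHA) ** a
    return out

pi0_hat, a_hat = fit_pvalue_mixture(simulate_pvalues(5000, pi0=0.2, a=0.3))
print(f"estimated false positive rate: {pi0_hat:.3f} (simulated truth 0.2)")
```

Fitting real reported p-values would also have to deal with rounding and with values reported only as thresholds (such as "p < 0.01"), which this sketch ignores.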

Continue reading

Don Quijote de la Red

Here I publish my articles in English to reach a wider audience interested in random topics; however, publishing an article about Don Quijote in English feels strange to me, and not only because I miss using my native tongue while grazing in distant lands, but also because, Quijote-like, I dream of joining forces with other Quijotes (though I will never snub a good Sancho) interested in the following network adventure:

I recently published an article (Don Quijote de la Network) on using social network analysis as a tool to study the interactions or, to be more precise, the co-appearances of characters in literary works.

The spark for this deed, I must confess, was spying through a lattice that the network analysis tool Gephi offered, as one of its examples, the co-appearance network of Les Misérables, Victor Hugo's famous novel.

Search as I might, I found no equivalent network online for the story of the famous hidalgo Don Quijote de la Mancha, and since, sitting idle, no nag came my way, I had no choice but to charge at this windmill on foot. So judge for yourselves, your graces, but remember at the same time to lend a shoulder or to cheer on generosity and, this being so, may God repay you, for I cannot. I begin.

Continue reading

Don Quijote de la Network

Network theory is quite a thrilling subject, especially in today's big data society, where we have awesome free tools like Gephi at our disposal.

There are many different kinds of networks and fields where this kind of analysis can take place; today's post is about literature and, in particular, the social network structures within the masterpiece of Spanish literature: El ingenioso hidalgo don Quijote de la Mancha.
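How the Don Quijote network was built is not shown in this excerpt. A common way to build a character co-appearance network is to link two characters whenever they appear in the same chapter or scene and to weight the edge by how often that happens. The minimal sketch below assumes scene-level character lists; the scenes are hypothetical toy data, not extracted from the novel, and weighted degree is used as one simple centrality measure.

```python
from collections import Counter
from itertools import combinations

def co_appearance_edges(scenes):
    """Weighted, undirected edges: how many scenes each pair of characters shares."""
    edges = Counter()
    for characters in scenes:
        for pair in combinations(sorted(set(characters)), 2):
            edges[pair] += 1
    return edges

def weighted_degree(edges):
    """Sum of edge weights touching each character (a simple centrality measure)."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree

# Hypothetical toy scenes, NOT extracted from the novel.
scenes = [
    ["Don Quijote", "Sancho"],
    ["Don Quijote", "Sancho", "Dulcinea"],
    ["Sancho", "Teresa"],
    ["Don Quijote", "Sancho", "Sansón Carrasco"],
]
edges = co_appearance_edges(scenes)
for (a, b), w in edges.most_common():
    print(f"{a},{b},{w}")  # Source,Target,Weight rows for an edge list
```

An edge list of this shape is the kind of table that network tools such as Gephi import for layout and further analysis.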

As an interesting anecdote about the qualities of this novel, Sigmund Freud first came to Don Quijote as a boy and loved it so much that he learnt Spanish in order to read it in the original, keeping it secret from his parents, who might have disapproved of the hobby. So if you want to fully enjoy the book without losing anything in translation, go Freud on it.

So let's follow Don Quijote through the network and, in case it is not obvious, doing this sort of analysis on a book means major spoilers ahead.

Continue reading

Bayesian P-Value… No, for real.

When combining p-values in a meta-analysis framework, there are many different methods we can apply depending on where the p-values come from and how they relate to each other. The best known are Fisher's method and Stouffer's method.
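For reference, here is a minimal standard-library sketch of the two classic methods (not the Bayesian method this post introduces). Fisher's method assumes independent p-values and uses the fact that -2 Σ ln p follows a chi-square distribution with 2k degrees of freedom under the null; Stouffer's method converts one-sided p-values to z-scores and sums them, here with optional weights. The function names are illustrative; if SciPy is available, `scipy.stats.combine_pvalues` covers both methods.

```python
import math
from statistics import NormalDist

def fisher_combined(pvalues):
    """Fisher's method for independent p-values.

    X = -2 * sum(ln p) follows a chi-square with 2k degrees of freedom under
    the null; for even df its survival function has the closed form
    exp(-x/2) * sum_{j<k} (x/2)**j / j!."""
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    term = total = 1.0
    for j in range(1, len(pvalues)):
        term *= half_x / j
        total += term
    return math.exp(-half_x) * total

def stouffer_combined(pvalues, weights=None):
    """Stouffer's (optionally weighted) Z method for one-sided p-values in (0, 1)."""
    std_normal = NormalDist()
    if weights is None:
        weights = [1.0] * len(pvalues)
    z = sum(w * std_normal.inv_cdf(1 - p) for w, p in zip(weights, pvalues))
    z /= math.sqrt(sum(w * w for w in weights))
    return 1 - std_normal.cdf(z)

pvals = [0.04, 0.10, 0.03]  # hypothetical p-values from three studies
print(fisher_combined(pvals), stouffer_combined(pvals))
```

With a single p-value both methods return it unchanged, which is a quick sanity check on the formulas.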

So I thought I would add one more, for fun and because why not: the Bayesian P-Value method! It's sort of evil… Muhahaha! Muhahaha! Muhahaha!

So this is how…

Continue reading

If you play with your Prior you’ll go blind


And so the Huffington Post predicted a 98% probability that Hillary Clinton would be the next President of the United States. Amen… Let's tease them a little bit, shall we?

My Bayesian friends, I understand that playing with your priors is a very joyful activity but, you see, it leads to blindness. It allows you to believe, let me cap & bold this one, BELIEVE that Hillary's chances of being the next President of the United States were 98%! No wonder betting sites heavily favored Hillary days before the election! I mean, 98%! Who wouldn't put some money there, right?

But you know, a 98% probability coming from a Bayesian means very little unless, of course, they do some math pirouette to guarantee that the probability has frequentist properties, but then, if they do that, why bother going Bayesian in the first place?

If a frequentist tells you there is a 98% probability that an event will happen, he/she means that in 98 out of 100 situations like this one, the event will occur. Now, if a Bayesian tells you there is a 98% probability, he/she means that this is his/her degree of belief (wot?) that the event will happen… Amen again.

In other words, Bayesian results are only as credible as the beliefs of the Bayesian statistician making the calculations; now we can understand why they calculate credible intervals instead of confidence intervals.
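To make the prior's influence concrete, here is a minimal Beta-Binomial sketch with hypothetical numbers; it is not HuffPo's model. The same poll (52 of 100 respondents favouring a candidate) is combined with a flat Beta(1, 1) prior and with a hypothetical informative Beta(60, 40) prior standing in for "information from previous elections". The function computes the posterior probability that the candidate's true support exceeds 50%, using the identity that for integer a and b, P(θ > 0.5) under Beta(a, b) equals P(Binomial(a + b - 1, 0.5) ≤ a - 1).

```python
from math import comb

def posterior_prob_above_half(prior_a, prior_b, successes, trials):
    """P(theta > 0.5 | data) for a Beta(prior_a, prior_b) prior and binomial data.

    Conjugacy gives a Beta(prior_a + successes, prior_b + trials - successes)
    posterior. For integer parameters, P(theta > 0.5) under Beta(a, b) equals
    P(Binomial(a + b - 1, 0.5) <= a - 1), computed exactly with math.comb."""
    a = prior_a + successes
    b = prior_b + trials - successes
    n = a + b - 1
    return sum(comb(n, j) for j in range(a)) / 2 ** n

# Hypothetical poll: 52 of 100 respondents favour the candidate.
flat = posterior_prob_above_half(1, 1, 52, 100)       # uniform prior
informed = posterior_prob_above_half(60, 40, 52, 100)  # hypothetical informative prior
print(f"flat prior: {flat:.3f}, informative prior: {informed:.3f}")
```

Same data, very different "probability of winning": with these made-up numbers the flat prior leaves the race fairly open, while the informative prior pushes the posterior probability above 90%.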

If we check the HuffPo methodology, we can read:

Many Bayesian models ― including the Pollster averaging model as it’s implemented for our charts ― use “uninformed” priors that don’t affect the model or provide any background information.

However, we do use information from previous elections in these priors to make predictions in our presidential model.

Ba dum tsssss

Much has been written on the pros and cons of going Bayesian and how evil Frequentists are, but this amazing Bayesian result from Huffpo was just too good to let go as a beautiful example of how blind you can go when playing with your priors.

Objectivity is dead, long live Objectivity!

Are p-values an objective measure? Bayesian statistics is not as objective as frequentist statistics for the simple reason that it needs more assumptions, namely a prior. This is why even talking about Objective Bayesian Statistics is an oxymoron, and yet it seems to be the most popular Bayesian school out there. But anyhow, what about p-values: can they be subjective? Is there such a thing as objectivity in statistics?

For a time I thought p-values were an objective measure, but then a couple of blows put to rest my dream of having an objective procedure for dealing with uncertainty. This is the story of the Subjectivity one-two combo that knocked my Objectivity dreams flat…

Continue reading

Social Network Analysis & GOP Verbal Attacks

Not that I know anything about the GOP debates or candidates, but I came across this nice visualization of verbal attacks during the RL GOP Debate in a CNN post, and I thought I would do a little social network analysis (SNA) and try to draw conclusions about the debate WITHOUT actually having seen it…

Let's see how it goes and, please, if you've seen the debate and know better than me, let me know if I am very wrong 🙂
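A debate's verbal attacks map naturally onto a directed graph: an edge from attacker to target for each attack. A minimal first step in that kind of SNA is counting attacks made (weighted out-degree) and attacks received (weighted in-degree) per candidate. The sketch below uses hypothetical placeholder names and edges, not data read from the CNN visualization.

```python
from collections import Counter

def attack_degrees(attacks):
    """Weighted out-degree (attacks made) and in-degree (attacks received).

    `attacks` is a list of (attacker, target) pairs; repeated pairs count as
    repeated attacks."""
    made, received = Counter(), Counter()
    for attacker, target in attacks:
        made[attacker] += 1
        received[target] += 1
    return made, received

# Hypothetical placeholder data, NOT read from the CNN visualization.
attacks = [
    ("Candidate A", "Candidate B"),
    ("Candidate A", "Candidate B"),
    ("Candidate B", "Candidate A"),
    ("Candidate C", "Candidate A"),
]
made, received = attack_degrees(attacks)
for name in sorted(set(made) | set(received)):
    print(name, "made:", made[name], "received:", received[name])
```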

Continue reading