# Robocoap: Text ⇢ Gephi

As promised in the Don Quijote de la Network post, I just packaged the R code that generated the data used in Gephi to visualize the network graphs describing Don Quijote.

Now it should be fairly simple (or at least simpler) for anybody to generate such graphs for their favorite books. And since the package automates the process, as if a robot were collecting the co-appearances of elements within a text, its name came to be… Robocoap.

For now the package has just one function (novel.coap), intended for books in novel format. Down the road, and with minor work, the package will also handle theater plays and movie scripts and, with a little more work, collections of research papers. Until then, enjoy your novels and have fun!
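For readers who want a feel for what "collecting co-appearances" means, here is a minimal sketch in Python. It is not the Robocoap code and does not reproduce how novel.coap works: the function names, the paragraph-sized window, and the naive substring matching are all assumptions made for illustration. It counts how often two names show up in the same paragraph and writes the result as a Source, Target, Weight edge list, a layout that Gephi's spreadsheet import should accept.

```python
import csv
import io
import re
from collections import Counter
from itertools import combinations


def coappearances(text, names):
    """Count how often each pair of names shows up in the same paragraph.

    Hypothetical helper: the paragraph window and the plain substring
    match are illustrative assumptions, not Robocoap's actual rules.
    """
    counts = Counter()
    # Treat one or more blank lines as a paragraph break.
    for paragraph in re.split(r"\n\s*\n", text):
        lowered = paragraph.lower()
        present = sorted(n for n in names if n.lower() in lowered)
        # Every unordered pair present in the paragraph gets one count.
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


def to_edge_csv(counts):
    """Render the counts as Source,Target,Weight rows (an edge list)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(counts.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()


sample = (
    "Don Quijote spoke to Sancho about giants.\n\n"
    "Sancho saw only windmills, and Dulcinea was far away.\n\n"
    "Don Quijote thought of Dulcinea while Sancho slept."
)
edges = coappearances(sample, ["Don Quijote", "Sancho", "Dulcinea"])
print(to_edge_csv(edges))
```

A real text needs more care than this (aliases, titles, names that are substrings of other names), which is exactly the sort of work a package can take off your hands.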

# R corset: Bringing Math models back in shape

So your perfect, ideal mathematical model returns impossible values: probabilities bigger than one or smaller than zero, negative stock market values, et cetera. And now you feel like quoting George Box… again.

Sometimes the mathematical model has a built-in solution to keep things real, as logistic regression does. Very often, however, popular models like ARIMA offer no way to bound their results within business or scientific constraints, and then what? These are a few common options:

# Don Quijote de la Red

Here I publish my articles in English to reach a wider audience interested in random topics. Still, publishing an article about Don Quijote in English feels strange to me, and not only because I miss using my mother tongue while it grazes in faraway lands, but also because, Quijote-style, I dream of cooperating with other Quijotes (though I will never snub a good Sancho) interested in the following network adventure:

I recently published an article (Don Quijote de la Network) about using social network analysis as a tool to analyze the interactions or, to be more exact, the co-appearances of characters in literary works.

The spark for this deed, I must admit, was noticing through a lattice window that the network analysis tool Gephi offered, as one of its examples, the co-appearance network of the novel Les Misérables, the famous literary work by Victor Hugo.

However hard I searched, I found no equivalent on the web for the story of the famous hidalgo Don Quijote de la Mancha, and since no nag would come near me while I sat idle, I had no choice but to charge at this windmill on foot. So judge for yourselves, your worships, but remember meanwhile to lend a shoulder or to encourage the giving of gifts and, that being so, may God repay you, for I cannot. I begin.

# Don Quijote de la Network

Network theory is quite a thrilling subject, especially in today's big data society, where we have awesome free tools like Gephi at our disposal.

There are many different kinds of networks and many fields where these analyses can take place. Today's post is on literature and, in particular, the social network structures within the masterpiece of Spanish literature: El ingenioso hidalgo don Quijote de la Mancha.

As an interesting anecdote about the qualities of this novel, Sigmund Freud first came to Don Quijote as a boy and loved the novel so much that he learnt Spanish in order to read it in its original language, keeping it a secret from his parents, who might have disapproved of the hobby. So if you want to fully enjoy the book and not lose anything in translation, go Freud on it.

So let’s follow Don Quijote through the Network and, in case it is not obvious enough, doing this sort of analysis on a book implies major spoilers ahead.

# Bayesian P-Value… No, for real.

When combining p-values in a meta-analysis framework there are many different methods we can apply, depending on where the p-values come from and how they relate to each other. The best-known are Fisher’s method and Stouffer’s method.
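As a refresher on those two classics, here is a minimal sketch in Python using only the standard library. It assumes independent, one-sided p-values strictly between 0 and 1, and the function names are made up for this example. Fisher’s method sums −2·ln(p) and compares the total to a chi-square distribution with 2k degrees of freedom; Stouffer’s method turns each p-value into a z-score, sums the scores, and rescales by √k.

```python
import math
from statistics import NormalDist


def fisher_combined(p_values):
    """Fisher's method: -2 * sum(ln p) ~ chi-square with 2k degrees of freedom."""
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    # With an even number of degrees of freedom (2k), the chi-square
    # upper tail has a closed form:
    #   exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def stouffer_combined(p_values):
    """Stouffer's method: sum of z-scores divided by sqrt(k), unweighted."""
    normal = NormalDist()
    # inv_cdf raises an error for p equal to 0 or 1, hence the
    # "strictly between 0 and 1" assumption.
    z_scores = [normal.inv_cdf(1.0 - p) for p in p_values]
    z = sum(z_scores) / math.sqrt(len(z_scores))
    return 1.0 - normal.cdf(z)


p_values = [0.05, 0.05]
print(fisher_combined(p_values))
print(stouffer_combined(p_values))
```

With a single p-value both functions hand it back unchanged, which is a handy sanity check. With several, the two methods generally disagree, because they weigh small and large p-values differently.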

So I thought I would add one more, for fun and because why not: the Bayesian P-Value method! It’s sort of evil… Muhahaha! Muhahaha! Muhahaha!

So this is how…

# If you play with your Prior you’ll go blind

And thus, the Huffington Post predicted a 98% probability for Hillary Clinton to be the next President of the United States. Amen… Let’s tease them a little bit, shall we?

My Bayesian friends, I understand playing with your priors is a very joyful activity but, you see, it leads to blindness. It allows you to believe, let me cap & bold this one, BELIEVE that Hillary’s chances of being the next President of the United States were 98%! No wonder betting sites heavily favored Hillary’s side days before the election! I mean, 98%! Who wouldn’t put some money there, right?

But you know, a 98% probability coming from a Bayesian means very little unless, of course, they do some mathematical pirouette to guarantee that the probability has frequentist properties. But then, if they do that, why bother going Bayesian in the first place?

If a frequentist tells you there is a 98% probability of an event happening, he/she means that in 98 out of 100 situations like the one at hand, the event will occur. Now, if a Bayesian tells you there is a 98% probability, he/she means that this is his/her degree of belief (wot?) that the event will happen… Amen again.

In other words, Bayesian results are only as credible as the beliefs of the Bayesian statistician making the calculations. Now we can understand why they calculate credible intervals instead of confidence ones.

If we check the Huffpo methodology, we can read:

Many Bayesian models ― including the Pollster averaging model as it’s implemented for our charts ― use “uninformed” priors that don’t affect the model or provide any background information.

However, we do use information from previous elections in these priors to make predictions in our presidential model.

Ba dum tsssss

Much has been written on the pros and cons of going Bayesian and how evil Frequentists are, but this amazing Bayesian result from Huffpo was just too good to let go as a beautiful example of how blind you can go when playing with your priors.

# Objectivity is dead, long live Objectivity!

Are p-values an objective measure? Bayesian statistics is not as objective as Frequentist statistics for the simple reason that it needs more assumptions, namely a prior. This is why even talking about Objective Bayesian Statistics is an oxymoron, and yet it seems to be the most popular Bayesian school out there. But anyhow, how about p-values then, can they be subjective? Is there such a thing as Objectivity in statistics?

For a time I thought p-values were an objective measure, but then a couple of blows put to rest my dream of having an objective procedure to deal with uncertainty. This is the story of the Subjectivity one-two combo that knocked my Objectivity dreams out flat…