Network theory is a thrilling subject, especially in today’s big-data society, where we have awesome free tools like Gephi at our disposal.
There are many different kinds of networks and many fields where this kind of analysis can take place. Today’s post is about literature and, in particular, the social network structure within the masterpiece of Spanish literature: El ingenioso hidalgo don Quijote de la Mancha.
As an interesting anecdote about the qualities of this novel, Sigmund Freud first came to Don Quijote as a boy and loved the novel so much that he learnt Spanish to read it in its original language, keeping it secret from his parents, who might have disapproved of the hobby. So if you want to fully enjoy the book and not lose anything in translation, go Freud on it.
So let’s follow Don Quijote through the Network and, in case it is not obvious enough, doing this sort of analysis on a book implies major spoilers ahead.
Gephi and Les Misérables
So it turns out the amazing Gephi tool has some example social networks for us to play with but, ¡ay!, not the Spanish masterpiece. Instead it has another masterpiece, a French one: Les Misérables by Victor Hugo. Let’s have a look at it:
This network is the weighted co-appearance network of characters throughout the book, built from a data set developed by Donald E. Knuth (The Stanford GraphBase: A Platform for Combinatorial Computing. Addison-Wesley, Reading, MA, 1993).
The size of each node is proportional to its degree, its color shows its modularity class, and the width of each edge is proportional to the co-appearance weight.
What Les Misérables social network tells us
The first thing we can see is that Jean Valjean is the central character, not only because he has the largest degree but also because he has the largest betweenness centrality by a long shot. In other words, the whole story spins mainly around him, with the exception of… Gavroche!
I place an exclamation mark because what I know about this novel comes mainly from the musical (what an amazing musical, by the way), where Gavroche plays a small, though very emotional, part. In the book, however, Gavroche seems to be a heavyweight character with his own story, detached from Jean Valjean’s.
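To make the two measures mentioned above concrete, here is a minimal sketch in plain Python. It is not how Gephi computes them internally; it assumes an unweighted, undirected graph and uses Brandes' algorithm for betweenness. The toy graph is invented for illustration and is not Knuth's data set.

```python
from collections import deque

def degree(graph):
    """Number of neighbours of each node in an undirected graph."""
    return {node: len(neighbours) for node, neighbours in graph.items()}

def betweenness(graph):
    """Unnormalised betweenness centrality (Brandes' algorithm, unweighted)."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        preds = {node: [] for node in graph}
        paths = {node: 0 for node in graph}
        dist = {node: -1 for node in graph}
        paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path shares.
        delta = {node: 0.0 for node in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every undirected pair was visited from both of its endpoints.
    return {node: value / 2 for node, value in score.items()}

# Invented toy graph (adjacency lists); NOT the real Les Misérables data.
toy = {
    "Valjean": ["Javert", "Cosette", "Marius", "Gavroche"],
    "Javert": ["Valjean"],
    "Cosette": ["Valjean", "Marius"],
    "Marius": ["Valjean", "Cosette", "Gavroche"],
    "Gavroche": ["Valjean", "Marius"],
}

print(degree(toy))
print(betweenness(toy))
```

In this toy graph the hub node has both the largest degree and the largest betweenness, because most shortest paths between the other nodes pass through it.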
The modularity colors clearly identify the main plots in the book, that is:
- Indigo: Bishop Myriel saving Jean Valjean’s soul with his generosity.
- Sea Green: The mean Javert & the Thénardiers messing with Jean Valjean’s life.
- Purple: The classic love triangle of Cosette, Marius & Jean Valjean.
- Lime: Gavroche and his team of revolutionaries.
- Royal Blue: Fantine’s struggle in Jean Valjean’s factory and later as a prostitute.
- Yellow Green: Valjean’s moral test, saving an innocent man by revealing his true identity and becoming a fugitive.
It is quite remarkable that a quick network analysis can dissect the structure of a novel so well and, quite frankly, I don’t see how modern literature majors can go on without specific courses in social network analysis. Science rules even in the arts!
Building Don Quijote de la Network
So let’s try to use the same principles to build a similar network for Don Quijote de la Mancha. For now we’ll look at just the first book. Why Miguel de Cervantes decided to write a second book is quite an interesting story in itself; he was kind of forced into it. That is why the first book is a work of art in its own right, and it makes sense to analyze it on its own.
The other reason why I am not analyzing the second part is time; identifying characters is something that I need to do manually and I want to take my time. And yes, I know that there are NLP algorithms to recognize entities but they’re not good enough. Anyway, here we go:
Finding Don Quijote’s Characters
Since each character will be a node in our network, we need to identify them first. For this task we have at our disposal nice, freely available Named Entity Recognizer (NER) tools, like the one shared by the Stanford Natural Language Processing Group.
However, even if the NER does a perfect job recognizing entities, characters can use different names throughout the book. That is the case for Jean Valjean when he hides his true identity or with Don Quijote when he renames himself after he is “knighted”. This means that the only way to match the same character with different entities is having an understanding of the story and, so far, only humans can do that.
That is why, for Don Quijote’s network, I manually selected the most important characters and their name variants, plus only a handful of secondary characters.
Once we have our nodes we need to connect them. In this case it makes sense to calculate a co-appearance weight, counting each pair of consecutive appearances. So if characters A and B appear in the order A, B, A, B, A, A, A, B, then we will have two nodes, A and B, where:
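One simple way to apply such a hand-made selection is a lookup table from name variants to a single canonical node name. The sketch below is only an illustration: the table entries and the `canonical` helper are hypothetical, not the list actually used for this network.

```python
# Hypothetical alias table: keys are lower-cased name variants found in the
# text, values are the canonical character (node) name.
ALIASES = {
    "don quijote": "Don Quijote",
    "caballero de la triste figura": "Don Quijote",
    "sancho": "Sancho Panza",
    "sancho panza": "Sancho Panza",
    "el cura": "El Cura",
}

def canonical(mention, aliases=ALIASES):
    """Return the canonical character for a mention, or None if unknown."""
    # Normalise case and collapse repeated whitespace before the lookup.
    key = " ".join(mention.lower().split())
    return aliases.get(key)

print(canonical("Caballero de la Triste Figura"))
print(canonical("SANCHO"))
```

Mentions that are not in the table return `None`, which makes it easy to list the names still waiting for a human decision.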
- A connects with itself with weight 2
- A connects with B with weight 3
- B connects with A with weight 2
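The three weights above match what you get by counting each pair of consecutive appearances as a directed edge. The sketch below assumes that reading; the function name `coappearance_weights` is made up for this example and is not taken from the scripts behind this post.

```python
from collections import Counter

def coappearance_weights(appearances):
    """Count directed transitions between consecutive appearances."""
    # zip pairs every appearance with the one that follows it.
    return Counter(zip(appearances, appearances[1:]))

weights = coappearance_weights(["A", "B", "A", "B", "A", "A", "A", "B"])
for (source, target), weight in sorted(weights.items()):
    print(source, "->", target, "weight", weight)
```

Because the direction of each transition is kept, the A to B and B to A weights can differ, which is why the resulting network is directed.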
This is Don Quijote’s network when considering co-appearance among its characters. Node size is proportional to degree (10 to 20) and edge width to co-appearance weight. The node colors are the result of running a modularity algorithm looking for the eight most relevant clusters.
Since Don Quijote de La Mancha is, among many other things, a travel and adventure book, there are many characters (real and imaginary) that don’t last more than a few lines. I did not include any of those; if I had, they would mainly be small satellites around Don Quijote and Sancho Panza.
What Don Quijote social network tells us
No surprise that Don Quijote & Sancho Panza are the main characters. In fact, Sancho plays the role of Quijote’s connection to reality, and they are so inseparable that I placed them one on top of the other, since everything in the network makes more sense when they are considered as one entity.
Then we have El Cura (The Priest) and El Ventero (The Innkeeper). Actually, there is more than one of each throughout the book and, though it might be a good idea to separate them, it also makes a lot of sense to keep them as the same entity so that we can appropriately weigh how relevant the role is. Indeed, it is this gang of four (Don Quijote, Sancho, El Cura & El Ventero) that makes up the usual suspects in Don Quijote’s adventures.
The tremendous weight of the edge joining Don Quijote and Sancho Panza shows that this book relies less on a complex plot, as Les Misérables does, than on a complex relationship between the two main characters and the clash of their two opposing worlds and realities.
El Cura, though, is fundamental to the story, since he is the one who takes us away from the exhausting battle between the two opposing forces represented by Don Quijote & Sancho Panza and gives readers a rest with other stories.
The modularity colors, once again, can help us identify plots in the book, the most dramatic example being the novel within the novel, ‘El Curioso Impertinente’, which tells of the love triangle among Camila, Anselmo and Lotario.
Gephi also allows for dynamic time series, so I added the chapters in which each character appears as a dynamic feature of the network, and this is the result for a three-chapter window animation:
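As a side note on what a three-chapter window means, here is a small sketch of the selection step only. It is not Gephi's implementation, and the character names and chapter numbers are placeholders rather than data from the book.

```python
def active_in_window(chapters_by_character, start, width=3):
    """Characters appearing in any chapter from start to start + width - 1."""
    window = range(start, start + width)
    return sorted(
        name
        for name, chapters in chapters_by_character.items()
        if any(chapter in window for chapter in chapters)
    )

# Placeholder appearance data, for illustration only.
example = {
    "A": [1, 2, 3, 4, 5],
    "B": [4, 5],
    "C": [2, 3],
}

# Slide the window one chapter at a time, as an animation would.
for start in range(1, 5):
    print(start, active_in_window(example, start))
```

Each step of the animation shows only the nodes returned for the current window.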
Next… the ongoing process
So I am going to end this post, which is perhaps already way too long for a blog, by sharing the Gephi network file that I used for the analysis and welcoming anyone who wants to improve the network:
Don Quijote de la Mancha I (Parte Primera)
As for the code that I used to build the network, allow me to bundle the scripts into a nice R package for network literature analysis, FOSS it and write another post about it.
Anyway, I would really love to see this network finished for both books, so, in the meantime, if you like literature in general and Don Quijote in particular, and you feel like helping to brush up this network, please reach out. There are many things that could be improved, since I don’t know Don Quijote as well as I would like:
- Identifying more minor characters for the first book.
- Identifying all characters for the second book.
- Checking on the accuracy of the network.
- Interpreting the network results.
- Translating this post into Spanish (note to myself)
- Redoing this network analysis using an undirected network, since most nodes connect in a symmetric fashion.
- you tell me.
10 thoughts on “Don Quijote de la Network”
On a side note, and since I myself have been accused of being a kind of “Don Quixote” (among other things, but that is another story), let me comment on Cervantes’ piece and the meanings around it.
“Don Quixote” is a social satire (criticising, among other things, chivalry codes which had become empty forms devoid of meaning, prostitutes who were nobler than ladies, and so on) which uses the tools of comedy and “foolishness” and/or “madness” to tell society truths about itself. Much like medieval clowns sometimes used these tools to tell truths to authority (if authority could “get” them, of course). And much like “Dante’s Divine Comedy” tells truths about society and its institutions in a comic and playful manner. Even earlier, Aristophanes’ plays both informed about and criticised the things around us. For example, Socrates himself is playfully portrayed as a kind of “Don Quixote” in one of Aristophanes’ plays. One can go on and mention Lewis Carroll, Jonathan Swift, Edwin Abbott and others in this genre of social critique through satire. It is an approach to “utopia” from the “other side”, one can say, whereas “Plato’s Republic” and “Thomas More’s Utopia” (among others) are approaches to “utopia” in a more literal sense.
“Utopophobia” is among the reasons “Don Quixote” is used as the “archetypal madman” label for anyone who not only advocates but also works towards changing things, striving for progress. It is no surprise Freud was fond of that book and not of “Les Miserables”, for example (“Les Miserables” IS about utopia btw, yay for the French Revolution :)).
When I was a child (8 to 10 years old; I am now around 40), the (official) TV channel showed an animated series of “Don Quixote” (that is how I know much of the story), but never one of “Les Miserables” (which I had to seek out and read myself, even in French when needed).
Anyway, back to the post: there are approaches relating mathematics (not just network theory) to literature (see for example http://math.unipa.it/~grim/SiLipsey.PDF), and there is Apostolos Doxiadis’ work (https://en.wikipedia.org/wiki/Apostolos_Doxiadis) as both a writer and a mathematician who uses such approaches to teach mathematics (and maybe literature :).
All the best,
Oh, well then, bienvenido my Don Quijote friend! It is not easy to be one.
One question though: what would be, in your opinion, the most iconic piece of literature in the Greek language? The Iliad?
Hmm, it might not be an easy task (after all, who said that “difficult” is not worth pursuing, or that “easier” is necessarily better?). It is a truth, though, and pretending otherwise may seem easier but is useless, to say the least. In any case, as you wish: I can stop commenting here (since it will probably involve political issues at times). By the way, don’t think mathematics, or science in general, is devoid of political references and content.
As for iconic pieces of literature in the Greek language, it depends on what one would call iconic or important. Homer’s epics are certainly important, both for the content and history they provide and for the artistic value of the pieces themselves. They are considered the first samples of (ancient) Greek literature. So in this sense they are certainly iconic.
Plato’s philosophical and political treatises are also iconic and remain influential even nowadays. Apart from their philosophical content, they have high artistic value, for example in their use of a certain dialectic method to present their subjects. Writers of that period such as Sophocles, Euripides and Aristophanes should also be included here.
Then one has the medieval period of Greek literature, amidst Ottoman rule and the Greek revolution, which is interesting in itself (for example, the anthem of the modern Greek state is the “Ode to Freedom” by D. Solomos, from that period; note that only the first verses of the poem are used for the state anthem. Maybe this is iconic; it certainly is).
As far as modern Greek literature is concerned, there were 2 Nobel prizes for poetry and literature (G. Seferis, O. Elytis) and 2 Lenin prizes for poetry and literature (K. Varnalis, G. Ritsos). Most of these works have also been set to music and song by great composers (notably M. Theodorakis). In a sense, this brings higher poetry and literature to the masses (according to the composer himself, who was imprisoned and exiled during and after the war and the coup by the military junta; his music was forbidden and banned, along with most of the writers and works I refer to here, including ancient ones like Sophocles).
Then there is the great literature and philosophy of the Greek diaspora, like Kavafis in Alexandria, Egypt. There are many other examples of modern Greek literature, but one has to search for them, and I myself have not searched enough to provide more information.
Here are some samples for you:
— Ode to Freedom by Dionysios Solomos music by Nikolaos Mantzaros (full)
— Virtual Sun of Justice by Odysseas Elytis music by Mikis Theodorakis (part of the whole opus)
— I held my life by Giorgos Seferis music by Mikis Theodorakis (part of the whole opus)
— Thermopylae by Konstantinos Kavafis music by Thanasis Gaifilias
— The ballad of sir Mentios (a donkey, an allegory for slaves and workers) by Kostas Varnalis music by Lucas Thanos
— The bells (of freedom) will ring, this soil is both theirs and ours by Giannis Ritsos music by Mikis Theodorakis (part of the whole opus)
I forgot one important part of literature, both before and during the Greek revolution era: that of Rigas Velestinlis (Rigas from Velestino, also known as Ferres, hence also called Rigas Ferreos). He was an educator as well as a revolutionary activist against Ottoman rule (not just for Greeks, mind you). He was executed for his revolutionary activity, along with his comrades, in Austria under Metternich’s rule. He is considered (one of) the first revolutionaries.
— Here is Rigas’ Thourios (War song) in a traditional (demotic) genre
So it seems, from the facts themselves, that the themes of freedom and justice are THE iconic parts of Greek literature, past and present. Pick any one.
Geez.. that was a documented answer, Nikos; thank you very much for the work you put into it. I would like to network a few masterpieces once I find some time or, even better, package the code I have well enough for experts to do it themselves. But I think I’ll do The Iliad or the Odyssey, not just for the reasons you mentioned above but also because, just as you had exposure to the Don Quijote animated stories on TV when you were a kid, so had I to Ulysses’ adventures with Ulysses 31. Thanks again, Nikos.
Yeah, Ulysses 31 was nice. And I finally managed to watch the last episode of the series on the internet.