The Cluster of Kerry: pt 1

Part 1 deals with pulling down the relevant tweets and creating a graph and word cloud. Part 2 cleans and clusters the tweets. All of the code for both parts can be found on my GitHub account.

On Monday the 22nd of June an episode of the People’s Debate was filmed in the Gleneagle Hotel in Killarney. After attending the live filming and watching the broadcast when it aired two days later, I got the idea of trying to cluster the tweets sent during the broadcast. This was inspired by a blog article I had seen shortly before on clustering chapters from novels.

Two days later I wrote a basic script to pull down all tweets sent during the debate containing #vinb. Hashtag searches can only go back a week, so you may no longer be able to get these tweets, but the file with all the tweets I pulled down when I ran the script is on my GitHub account.


Michael Healy-Rae’s hat would have a cluster all to itself.

A small number of relatively clean clusters (I will be proven very wrong in part 2)

Graphing the tweets

First I cleaned the tweets and graphed them in 5 minute blocks. For every 5 minute period I counted how many tweets were sent in total and how many were sent in each of a number of categories. So if a tweet contained any of the words "healy", "rae", "healyrae", "michael" or "johnny", it counted as one tweet in the "healy-rae" category (the cleaning should have removed all punctuation, such as -, and all text has been converted to lowercase).
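The counting step described above can be sketched roughly like this. It is a minimal illustration and not the script from my GitHub account: the function names are made up for the example, only the "healy-rae" keywords come from the text above, and the "post office" keywords are an assumption.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Only the "healy-rae" words are from the post; "post office" is an assumed example.
CATEGORIES = {
    "healy-rae": {"healy", "rae", "healyrae", "michael", "johnny"},
    "post office": {"post", "office", "offices"},
}

def clean(text):
    """Lowercase the text and drop everything except letters, digits and whitespace."""
    return re.sub(r"[^a-z0-9\s]", "", text.lower())

def bucket_start(ts, minutes=5):
    """Round a datetime down to the start of its bucket (minutes should divide 60)."""
    return ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)

def count_by_bucket(tweets, minutes=5):
    """tweets: iterable of (datetime, text) pairs. Returns {bucket start: Counter}."""
    counts = defaultdict(Counter)
    for ts, text in tweets:
        words = set(clean(text).split())
        bucket = counts[bucket_start(ts, minutes)]
        bucket["total"] += 1
        for name, keywords in CATEGORIES.items():
            if words & keywords:
                # One count per tweet per category, however many keywords match.
                bucket[name] += 1
    return dict(counts)
```

Because the hyphen is removed rather than replaced with a space, "Healy-Rae" becomes the single word "healyrae", which is why that spelling is in the keyword list.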

There are a number of large spikes, which are described at the end of the script, but what becomes clear is the lack of a single dominant topic of interest. Even when one topic is predominant over the others, such as at 22:35 when there is a spike in tweets containing references to post offices, there are still a lot of other tweets being sent that do not contain keywords in any of the main categories. At this stage in the debate there were a number of speakers on post office closures. (You can see an orange spike for references to "Grant Thornton" at the start of the brown Post Office line. This refers to the first speaker on the night, who stood up and mentioned a report by Grant Thornton on post office closures. I guess Twitter just really liked his name.) It is clear, though, that a lot of the other tweets are not about the topics being discussed. This of course has to do with using #vinb instead of the much less popular #peoplesdebate.

It should be noted that spikes in the magenta line for Brendan Griffin/Fine Gael represent general statements on the government and a number of tweets about water charges, a topic that was not discussed at all on the night, as well as tweets relating to Brendan Griffin’s speeches on the night.

There are a number of spikes in the red line representing any time a member of the Healy-Rae family appeared on camera, flat-capped or not. (I did another graph containing only this red line and a line representing mentions of "hat" or "cap" and they correlated very strongly, but there weren’t enough hat-related tweets to be able to say anything concrete, such as "the twitter machine made hat and cap related jokes every time Michael Healy-Rae spoke" or "the twitter machine is obsessed/in love with/terrified of/enraged by Michael Healy-Rae’s flat-cap".)
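One way to put a number on how closely two lines like these track each other is the Pearson correlation coefficient of their per-block counts. The sketch below is only an illustration of that calculation, with a made-up function name, and is not taken from the script behind the graph.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series of counts."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)
```

A value near 1 means the two lines rise and fall together. With only a handful of tweets in one of the series, though, even a high value says very little, which is the problem described above.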

The green line at 23:40 shows the total number of tweets sent referring to the speaker talking about what day Ireland should celebrate its independence. While a popular topic, it again only represents around one fifth of the total tweets sent, and this serves as an indication of some of the problems that I faced when clustering the tweets.

The peak in the magenta line at 23:00 comes shortly after a woman spoke on cuts to the lone parent payment, which Brendan Griffin was asked to respond to, and it also includes general tweets about water charges and Greece. I plotted a line of tweets containing Greece but there weren’t that many of them, although enough of them contained #fg to affect the magenta line. The spike in the black line representing Martin Ferris/Sinn Fein at 23:40 is when many users started questioning if the "scrap" between himself and Vincent Browne had been cut, which it had. (After the speaker on Independence Day had sat down, Browne rounded on Ferris about an IRA execution in Kerry in the 80s. After this the Independence Day speaker took the microphone again and, taking it that Browne was connecting his question with the modern IRA, condemned him. As in "I would go so far as to say I condemn you." Kerry, huh?) At 22:45 Fianna Fail gets a tiny little mention when two of its councillors are called on to answer questions relating to previous FF policy, especially in relation to health, and Norma Foley also makes a statement from the audience.


Overall the graph captured the main talking points pretty well and showed user engagement with the show through the vinb hashtag. However, it also shows the hashtag’s general use for political commentary (a number of tweets contained references to other political shows as well).

The lines are very dependent on good cleaning of the tweets. I am sure there are a few I missed, such as tweets referencing Michael Healy-Rae’s Twitter account. By going down through all the tweets it should be possible to find a few more keywords for each category.
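Going through the tweets by hand is one option. Another, sketched below, is to count the most common words in the tweets that did not match any existing keyword and look through that list for candidates. This is an assumption about how it could be done rather than something I ran for the graph, and the function name and parameters are made up for the example.

```python
import re
from collections import Counter

def candidate_keywords(tweets, known_keywords, stopwords=frozenset(), top=20):
    """Most common words in the tweets that contain none of the known keywords."""
    counts = Counter()
    for text in tweets:
        # Same cleaning idea as before: lowercase, strip punctuation, split on spaces.
        words = set(re.sub(r"[^a-z0-9\s]", "", text.lower()).split())
        if words & known_keywords:
            continue  # this tweet already lands in a category
        counts.update(words - stopwords)  # each word counted once per tweet
    return counts.most_common(top)
```

The stopwords set is there to keep words like "the" and "and" out of the list. Whatever comes back still needs a human to decide which words belong in which category.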

The categories I picked mainly represent the main individuals and political parties represented at the debate. I didn’t try to graph a line for "health", which was discussed around the 22:45-22:55 section, or a line representing how many times references were made to water charges. I did quickly try one for Greece, but maybe with some better cleaning and more keywords I could have gotten better results.

The graph would look quite different using 2 minute or 10 minute buckets instead of 5 minute ones, and this might reveal different patterns to the ones I have pointed out above.
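To see how much the bucket size matters, the same timestamps can be counted at several widths. The sketch below is a small illustration with a made-up function name, not the code used for the graph.

```python
from collections import Counter
from datetime import datetime, timedelta

def tweets_per_bucket(timestamps, minutes):
    """Count tweets per bucket of the given width in minutes."""
    width = timedelta(minutes=minutes)
    origin = datetime(1970, 1, 1)  # fixed reference point so buckets line up on the clock
    counts = Counter()
    for ts in timestamps:
        # Floor the timestamp to a whole number of bucket widths since the origin.
        counts[origin + ((ts - origin) // width) * width] += 1
    return counts
```

Calling this with minutes set to 2, 5 and 10 on the same list of timestamps gives three sets of counts to plot side by side. Narrow buckets give spikier lines and wide buckets smooth short bursts away, so a spike that stands out at one width can disappear at another.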

I was having trouble installing the Python word cloud library, so I repurposed some code I had written in R for the Analytics Edge course on edX.