Early in the year there was a slew of blog posts about applying natural language processing (NLP) to the State of the Union addresses made by US Presidents over the last two centuries. At the same time I was reading Ronan Fanning’s excellent biography of Eamon De Valera and hit upon the idea of taking an NLP look at "The Treaty Debates".
The debates centred on the Treaty signed by British and Irish representatives in London on the 6th of December 1921, which would give Ireland Dominion status after a two-year War of Independence. Dáil Éireann quickly split between those who supported and those who opposed the Treaty, and a series of debates was held from the 14th of December 1921 to the 7th of January 1922, when the Treaty was approved by 64 votes to 57. At the following session on the 9th De Valera resigned as President and lost the vote on his re‐election by 2 votes. The next day he and his supporters left the Dáil while Arthur Griffith was elected in his place. This is seen by many as the start of a series of events which would lead to the outbreak of the Irish Civil War in June.
I wanted to see what type of information NLP could get out of the Treaty Debates and began the lengthy process of pulling down the full text from the Oireachtas website. I had to clean the files manually because there was no consistency in how individual TDs were named. For example, one TD appears in the debates under the names "Michael Collins", "Mr. Michael Collins", "Mr. Collins", "Mr. M Collins", "Mr. Michael Collins (Minister For Finance)", "Mr. Michael Collins, Minister For Finance" and "Michael Collins, Minister For Finance", more often than not appearing under at least two and up to four variations in a single day’s text.
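The clean-up described above was done by hand. As a minimal sketch of how part of it might be automated, the snippet below strips titles and ministerial offices from a raw speaker label and then looks the remainder up in an alias table. The names `CANONICAL` and `normalise_speaker` are hypothetical, the title list is an assumption, and a bare surname is only safe as a key when no other TD shares it.

```python
import re

# Hypothetical alias table: cleaned, lower-cased label -> one canonical name.
CANONICAL = {
    "collins": "Michael Collins",
    "m collins": "Michael Collins",
    "michael collins": "Michael Collins",
}

def normalise_speaker(raw):
    """Reduce a raw speaker label to a single canonical name where known."""
    name = re.sub(r"\(.*?\)", "", raw)       # drop "(Minister For Finance)"
    name = name.split(",")[0].strip()        # drop ", Minister For Finance"
    name = re.sub(r"^(Mr|Mrs|Miss|Dr)\.?\s+", "", name)  # drop a leading title
    key = re.sub(r"[^\w\s]", "", name).lower()
    key = re.sub(r"\s+", " ", key).strip()
    # Fall back to the cleaned label when there is no alias for it.
    return CANONICAL.get(key, name.strip())

variants = [
    "Michael Collins",
    "Mr. Michael Collins",
    "Mr. Collins",
    "Mr. M Collins",
    "Mr. Michael Collins (Minister For Finance)",
    "Mr. Michael Collins, Minister For Finance",
    "Michael Collins, Minister For Finance",
]
for label in variants:
    print(label, "->", normalise_speaker(label))
```

Labels that are not in the alias table come back with only the title and office removed, so they can still be reviewed by hand.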
I used Pandas in Python for basic text analysis:
- 101 recorded speakers spoke approx. 448,000 words over the 15 days of debates.
- 3,040 separate recorded entries. De Valera spoke most often, 407 times, followed by Collins at 267 and Griffith at 229.
- De Valera also had the highest word count at 44,358, followed by Mary MacSwiney at 34,149, Collins at 22,001 and Griffith at 21,804. Thomas O’Donnell of Sligo‐Mayo East spoke only a single 14-word sentence during the debates.
- MacSwiney upset some in the Dáil with her long speeches. She had 3 of the 10 longest speeches, the longest at 6,282 words. Childers was second at 5,858, then Liam de Róiste with 3,928.
- A small group of people dominated the debates: De Valera, Collins and Griffith, as well as MacSwiney, Sean Milroy and Cathal Brugha. This group accounts for 33% of the total word count and 40% of all speeches made. (Eoin MacNeill and Brian O’Higgins are not counted as they served as Ceann Comhairle and Leas Ceann Comhairle.)
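Figures like the ones above come from counting entries and words per speaker. The sketch below shows that aggregation using only the standard library rather than Pandas. The `entries` records, the placeholder speakers and their text are invented for illustration and are not taken from the debates.

```python
from collections import Counter

# Hypothetical records: one (speaker, text) pair per recorded entry.
entries = [
    ("Deputy A", "a short opening statement"),
    ("Deputy B", "a reply"),
    ("Deputy A", "a second and rather longer contribution"),
]

entry_counts = Counter()   # number of recorded entries per speaker
word_counts = Counter()    # total words per speaker
for speaker, text in entries:
    entry_counts[speaker] += 1
    word_counts[speaker] += len(text.split())

total_words = sum(word_counts.values())
for speaker, words in word_counts.most_common():
    share = 100 * words / total_words
    print(f"{speaker}: {entry_counts[speaker]} entries, "
          f"{words} words ({share:.0f}% of all words)")
```

Words are split on whitespace here, which is an assumption. A different tokeniser would give slightly different totals.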
At the same time as I began working on these debates, more NLP projects started to pop up examining the speeches being made by current US Presidential candidates. Through these I found out about the Flesch-Kincaid readability tests, which measure the ease with which a text can be read. I used the textstat library to find the Flesch score for the collected speeches of each TD. The score nominally runs from 0 (requiring a college education) to 100 (easily read by all), although very long sentences can push it below zero. The results aren’t massively interesting, with every TD that spoke consistently falling between 50 and 70, which is relatively easy to read. However, some speeches taken in isolation throw up more interesting results. Of the 15 most difficult passages to read, 7 were made by Eamon De Valera. Low scores of between 0.0 and 30.0 are described as "Very difficult to read. Best understood by university graduates." The lowest score for any speech goes to one made by Eoin MacNeill. It’s just 112 words and scores -40:
The following Notice of motion has been received: Notice of Motion by Eoin Mac Neill, Deputy for the National University of Ireland and for Derry City and County: To move that ’Dáil Éireann affirms that Ireland is a sovereign nation deriving its sovereignty in all respects from the will of the people of Ireland; that all the international relations of Ireland are governed on the part of Ireland by this sovereign status; and that all facilities and accommodations accorded by Ireland to another state or country are subject to the right of the Irish Government to take care that the liberty and well-being of the people of Ireland are not endangered.’
De Valera scores a 4 with one of the most important statements of the last century, made when, seeing that Griffith would be elected President, he and his supporters left the Dáil:
As a protest against the election as President of the Irish Republic of the Chairman of the Delegation, who is bound by the Treaty conditions to set up a State which is to subvert the Republic, and who, in the interim period, instead of using the office as it should be used-to support the Republic-will, of necessity, have to be taking action which will tend to its destruction, I, while this vote is being taken, as one, am going to leave the House.
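Both passages above are essentially one very long sentence, which is what the Flesch Reading Ease formula punishes most: 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word). A 112-word passage treated as a single sentence loses about 114 points on sentence length alone. The sketch below implements the formula directly. The syllable counter is a rough vowel-group heuristic of my own, not textstat's method, so its scores will not match textstat exactly.

```python
import re

def count_syllables(word):
    """Rough heuristic: count vowel groups, discounting a silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouyáéíóú]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_reading_ease(text):
    """Flesch Reading Ease: higher is easier, and it can go below zero."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[^\W\d_]+", text)   # runs of letters only
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * len(words) / len(sentences)
            - 84.6 * syllables / len(words))

# Made-up sample texts, not quotations from the debates.
short_text = "The cat sat. The dog ran."
long_text = ("The international relations of the sovereign nation are governed "
             "by constitutional authority deriving from the people and subject "
             "to the right of the government to protect liberty")

print(round(flesch_reading_ease(short_text), 2))
print(round(flesch_reading_ease(long_text), 2))
```

Short sentences of one-syllable words can score above 100, and a single long sentence of polysyllabic words drops sharply, which is the pattern seen in the formal motions and statements quoted above.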
I found out about Latent Dirichlet Allocation (LDA) from a blog post by Dublin-based Aylien and adapted their code for an earlier project. LDA is used to discover the main topics in a body of text. I liked the results from my previous try and decided to use it again to find the main topics that were discussed during the debates. I created two datasets, one containing all speeches by those who would vote for the Treaty and one for those who would vote against it (again, MacNeill and Brian O’Higgins are excluded). I found that searching for 8 topics worked best. Below are the results returned, first for the Anti-Treaty dataset and then for the Pro-Treaty dataset. I carried out a quick search of the debate text to see where and in what context these words appeared, and after each set of topics I have added what I believe each topic is about.
Anti-Treaty:
- Topic 0: would minister mr said made delegation cabinet finance us one chairman oath dáil document lloyd griffith signed delegates oath
- Topic 1: document would treaty could say signed plenipotentiaries cabinet question make association certain
- Topic 2: ireland british king oath great irish britain england authority government treaty clause
- Topic 3: government republic president mr irish griffith dáil eireann must ireland provisional members
- Topic 4: treaty people irish republic government war would peace british state country ireland
- Topic 5: men us people ireland say one country said treaty england told world
- Topic 6: would think one say could know like get position way want dáil
- Topic 7: public house session private motion president hear amendment question press order matter
- #0 Chairman of the delegation (Griffith), Minister for Finance (Collins), Oath, Lloyd George.
- #1 Rights of the plenipotentiaries to sign Treaty, counter wording of Oath - "for the purpose of Association..."
- #2 Oath to and accepting authority of the King of Great Britain. Clauses in the Treaty (mostly Childers).
- #3 Griffith’s position as President of the Provisional Government and stance towards the Republic.
- #4 War will not result from rejection of Treaty. Treaty does not give a lasting peace. The Republic.
- #5 Ireland’s image before the world. England.
- #6 The position of the Dáil before the negotiations began/ Collins' position in the army (Brugha)/ TDs saying "I am of the position..."/ Position of Ireland to have an army etc. after signing.
- #7 Motions/amendments for private/public meetings.
Pro-Treaty:
- Topic 0: treaty say country know said people going man position think thing dáil
- Topic 1: document cabinet dáil documents members mr agreement house hear meeting delegation agreed
- Topic 2: session public private motion president order house press think time treaty question
- Topic 3: ireland people irish treaty nation british country english men england believe freedom
- Topic 4: treaty war british irish london republic england association signed peace alternative plenipotentiaries
- Topic 5: president government people valera treaty irish dáil going vote republic free majority
- Topic 6: oath minister say constitutional king mr said allegiance law canada state defence
- Topic 7: say want think time house know deputy make way said going right
- #0 Position delegation was placed in, position of defending the Treaty, position Collins holds De Valera in.
- #1 Call for all private (meeting) documents to be published, document put forward by De Valera. Articles of Agreement/ This Agreement.
- #2 Motions about private/public sessions, points of order, the Press.
- #3 Treaty will secure freedom for Ireland.
- #4 Alternative to treaty is war. "Sending our plenipotentiaries to London".
- #5 President De Valera. Majority of people will accept Treaty. Vote.
- #6 Constitutional status like Canada. Control of coastal defence. Oath is allegiance to Free State and of faithfulness to King.
Overall I was pretty satisfied with the topics that came up, although the last Pro-Treaty topic doesn’t seem to make much sense. The 3 main players are mentioned by the opposing sides, and from a quick glance at some of the text the topics seem to reflect what most TDs were talking about.
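For readers who have not met LDA before, here is a minimal sketch of how topic word lists like these can be produced. It is not the code I adapted from Aylien. It is a from-scratch collapsed Gibbs sampler using only the standard library, and the function names, the hyperparameters `alpha` and `beta` and the toy documents are all assumptions made for illustration.

```python
import random
from collections import Counter

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]          # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(n_topics)]   # tokens per (topic, word)
    topic_total = [0] * n_topics                        # tokens per topic
    assignments = []
    # Start by giving every token a random topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Remove this token from the counts before resampling it.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalised probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word

def top_words(topic_word, n=12):
    """The n most frequent words assigned to each topic."""
    return [[w for w, c in counts.most_common(n) if c > 0]
            for counts in topic_word]

# Toy documents, already tokenised and stripped of stop words.
docs = [
    ["oath", "king", "allegiance", "oath", "king"],
    ["session", "motion", "private", "public", "session"],
    ["king", "oath", "allegiance", "king"],
    ["public", "press", "motion", "session"],
]
topic_word = lda_gibbs(docs, n_topics=2)
for k, words in enumerate(top_words(topic_word, n=4)):
    print(f"Topic {k}: {' '.join(words)}")
```

As with the results above, the output is only a list of words per topic. Deciding what each topic is "about" still means going back to the text.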
I had only heard of Named-Entity Recognition shortly before I finished this project and decided to try it out. NER is a process by which named entities, such as people, places and companies, can be identified and extracted from a text. Capitalisation and sentence position are very important for determining whether a word is a named entity. The NER algorithm in nltk picked out 1620 potential entities, which I went through manually to remove duplicates and words that weren’t entities. I also removed the names of TDs who were in attendance at the debates. That left 351 people (included are some notes I made), 255 locations, 73 "organisations" (mainly military units, branches of Sinn Fein and the British Civil Service), 64 nationalities or descriptions, 18 newspapers and 7 documents, including the Monroe Doctrine, the Ten Commandments and Brehon law. This should not be considered an exhaustive list, as some actual entities may have been missed. The people mentioned are a fascinating mix, from Fionn MacCumhail to the Ameer of Afghanistan, Aodh Ruadh O’Domhnaill to the Blacksmith of Ballinalee, St. Patrick, the Emperor of Austria, Peadar Clancy, the Mikado of Japan, Napoleon, Roger Casement and Almighty God.
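To illustrate why capitalisation and sentence position matter, and why so much manual checking was needed, here is a deliberately naive candidate extractor. It is not how nltk does it. It simply collects runs of capitalised words and ignores a lone capitalised word at the start of a sentence. The function name and the sample sentences are made up for illustration, and abbreviations such as "St." would confuse the simple sentence splitter.

```python
import re

def capitalised_candidates(text):
    """Return runs of capitalised words as naive named-entity candidates."""
    found = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = [w.strip(".,;:!?\"'()") for w in sentence.split()]
        run = []     # current run of consecutive capitalised words
        start = 0    # position of the run's first word in the sentence
        for i, word in enumerate(words + [""]):   # "" flushes the last run
            if word[:1].isupper():
                if not run:
                    start = i
                run.append(word)
            else:
                # A lone sentence-initial capital is probably not an entity.
                if run and not (start == 0 and len(run) == 1):
                    found.append(" ".join(run))
                run = []
    return found

sample = ("In the debates the Emperor of Austria was named alongside "
          "Napoleon and Roger Casement. Deputies also cited the Monroe Doctrine.")
print(capitalised_candidates(sample))
# ['Emperor', 'Austria', 'Napoleon', 'Roger Casement', 'Monroe Doctrine']
```

Even on this small sample the heuristic splits "Emperor of Austria" in two, which gives a flavour of the duplicates and non-entities that had to be removed by hand.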
For the word clouds a friend of mine made silhouettes of De Valera, Collins and Griffith so I could use them with Andreas Mueller’s wordcloud library. The 5 TDs with the highest word counts on each of the Pro- and Anti-Treaty sides are represented, and I have also made word clouds for the combined Pro- and Anti-Treaty texts. I have always liked the font used for Irish language publications at the start of the 1900s and found a version called Bunchló made by Vincent Morley.
I am very happy with the way the word clouds turned out and intend to make more. There are some interesting observations to be made, such as how little Collins mentions the word "republic" in comparison to Griffith or De Valera. I am satisfied with the LDA topic results, the Flesch scores and the NER results, though it is a bit difficult to process what is returned.
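An observation like the one about "republic" can be checked with a simple rate per 1,000 words, which allows for some TDs speaking far more than others. The sketch below is one way to compute it. The function name and the placeholder texts are hypothetical and are not drawn from the debates.

```python
import re
from collections import Counter

def rate_per_thousand(text, term):
    """Occurrences of `term` per 1,000 words of `text`, case-insensitive."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return 0.0
    return 1000 * Counter(words)[term.lower()] / len(words)

# Placeholder texts standing in for each speaker's combined speeches.
speeches = {
    "Deputy A": "the republic and the republic again",
    "Deputy B": "the treaty and the treaty and the people",
}
for speaker, text in speeches.items():
    print(speaker, round(rate_per_thousand(text, "republic"), 1))
```

Comparing rates rather than raw counts stops the longest speakers from dominating simply because they said more.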
Though well studied, the Treaty Debates aren’t very accessible due to their size, but natural language processing allows us a better understanding without needing to read the whole thing. Overall the debates are well known and the LDA topic results are in line with what has been written about the period. We can therefore hope to use NLP on vastly larger texts and have some confidence in the returned results.