Exploiting Linked Data and Natural Language Processing for the Classification of Political Speech

International Conference for E-Democracy and Open Government 2014
Giuseppe Futia, Federico Cairo, Federico Morando, Luca Leschiutta
21-23 May 2014

This paper shows the effectiveness of a DBpedia-based approach for text categorization in the e-government field. Our use case is the analysis of all the speech transcripts of current White House members. The analysis is performed with TellMeFirst, an open-source tool that leverages the DBpedia knowledge base and the English Wikipedia linguistic corpus for topic extraction. The results make it possible to identify the main political trends addressed by the White House, increasing citizens' awareness of the issues discussed by politicians. Unlike methods based on string recognition, TellMeFirst classifies documents semantically through DBpedia URIs, gathering all the synonyms, hypernyms and hyponyms of a lemma under the same unambiguous concept.
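The contrast between string recognition and concept-level classification can be illustrated with a toy sketch. It is not TellMeFirst's algorithm, which draws on DBpedia and the Wikipedia corpus. Here a small, hypothetical hand-written lexicon maps surface forms to DBpedia resource URIs. The lexicon, the sample text and the function names are assumptions made for illustration only. Counting raw strings spreads the evidence across many labels, while aggregating by URI groups synonyms under a single concept.

```python
import re
from collections import Counter
from typing import Dict

# Hypothetical toy lexicon: surface forms -> DBpedia resource URIs.
# A real system would derive these mappings from DBpedia/Wikipedia.
LEXICON: Dict[str, str] = {
    "health care": "http://dbpedia.org/resource/Health_care",
    "healthcare": "http://dbpedia.org/resource/Health_care",
    "medical care": "http://dbpedia.org/resource/Health_care",
    "obamacare": "http://dbpedia.org/resource/Patient_Protection_and_Affordable_Care_Act",
    "affordable care act": "http://dbpedia.org/resource/Patient_Protection_and_Affordable_Care_Act",
    "jobs": "http://dbpedia.org/resource/Employment",
    "employment": "http://dbpedia.org/resource/Employment",
}


def _count(form: str, text: str) -> int:
    """Count whole-word, case-insensitive occurrences of a surface form."""
    pattern = r"\b" + re.escape(form) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def string_counts(text: str, lexicon: Dict[str, str]) -> Counter:
    """String-recognition baseline: each surface form is a separate label."""
    counts: Counter = Counter()
    for form in lexicon:
        n = _count(form, text)
        if n:
            counts[form] += n
    return counts


def concept_counts(text: str, lexicon: Dict[str, str]) -> Counter:
    """Concept-level view: matches of all synonyms are summed under one URI."""
    counts: Counter = Counter()
    for form, uri in lexicon.items():
        n = _count(form, text)
        if n:
            counts[uri] += n
    return counts


if __name__ == "__main__":
    sample = ("Health care matters. Healthcare costs are rising, and medical care "
              "must improve. The Affordable Care Act created jobs.")
    print(string_counts(sample, LEXICON).most_common())  # five labels, one hit each
    print(concept_counts(sample, LEXICON).most_common(1))
    # -> [('http://dbpedia.org/resource/Health_care', 3)]
```

In this sample no single string stands out, but grouping "health care", "healthcare" and "medical care" under one URI makes health care the dominant topic. This is the effect the abstract attributes to URI-based classification.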

The full paper is available as a PDF.