TellMeFirst - A Knowledge Discovery Application

Status: 
ongoing
Period: 
April 2012 - present
Funding: 
In-kind contributions in 2016
Person(s) in charge: 

Giuseppe Futia (Nexa project manager & main developer), Alessio Melandri (Nexa Fellow), Federico Cairo (Project founder & technology advisor)

Executive summary: 

TellMeFirst (TMF) is open source software for classifying and enriching documents using Natural Language Processing (NLP) and Linked Open Data (LOD) technologies. The topics it identifies are expressed as DBpedia resources, each of which is a structured representation of the information in a Wikipedia entry. The input document is then enriched with new information (images, videos, maps, news) retrieved from LOD repositories published on the Web.
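
As an illustration of how a topic expressed as a DBpedia resource can be used to retrieve further information, the following Python sketch maps a Wikipedia title to its DBpedia resource URI and builds a SPARQL query asking for a thumbnail image and geographic coordinates. It is not TMF's actual code: the function names are hypothetical, and the choice of the `dbo:thumbnail`, `geo:lat` and `geo:long` properties is an assumption about what an enrichment step might request.

```python
DBPEDIA_RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def wikipedia_title_to_dbpedia_uri(title: str) -> str:
    """Map a Wikipedia article title to a DBpedia resource URI (hypothetical helper)."""
    # DBpedia resource names follow Wikipedia titles, with spaces as underscores.
    # Titles containing special characters may need additional encoding.
    return DBPEDIA_RESOURCE_PREFIX + title.strip().replace(" ", "_")


def build_enrichment_query(resource_uri: str) -> str:
    """Build a SPARQL query for an image and coordinates of one resource."""
    # Both patterns are OPTIONAL, so a resource without an image or without
    # coordinates still yields a (partially empty) result row.
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?thumbnail ?lat ?long WHERE {{
  OPTIONAL {{ <{resource_uri}> dbo:thumbnail ?thumbnail . }}
  OPTIONAL {{ <{resource_uri}> geo:lat ?lat ; geo:long ?long . }}
}}
""".strip()


if __name__ == "__main__":
    uri = wikipedia_title_to_dbpedia_uri("The Hague")
    print(uri)
    print(build_enrichment_query(uri))
```

The resulting query string could be sent to a public SPARQL endpoint such as the one DBpedia exposes; the sketch stops at building the query so that it runs without network access.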

Background: 

The adoption of Linked Data best practices for exposing and connecting information on the Web has had considerable success in several areas, including multimedia publishing, open government, and health care. Moreover, a specific line of research explores the points of convergence between Linked Data and Natural Language Processing (NLP): DBpedia, a central interlinking hub for the Linking Open Data project, has proven to be a very suitable knowledge base for text classification, for both technical and more theoretical reasons. Furthermore, DBpedia is directly linked to Wikipedia, arguably the largest multilingual annotated corpus ever created, which makes it well suited to automated NLP tasks.

Objectives: 

Last Update: 2017-03-15; Next Expected Update: TODO

TMF aims to leverage Linked Data and NLP technologies to extract the main topics from texts in the form of DBpedia resources and to retrieve new information about them from the Web. In previous years, we created a structured and well-defined process to keep the training set up to date: this is a necessary step for classifying documents that concern recent topics. The next step was the development of a module for building ad hoc training sets for documents related to a specific area of knowledge. For these reasons, we focused on developing a parametrized process that adapts TellMeFirst to different purposes and different semantic areas. With this feature, TMF can be used by companies, public administrations, and cultural institutions that need a classification system for their specific knowledge domains and purposes. The TMF software has now reached maturity, and we will therefore explore use cases of the tool within structured projects.

Results: 

Last Update: 2017-03-15; Next Expected Update: TODO

The latest features developed for TMF were presented in February 2016 at the DBpedia Community Meeting in The Hague (Netherlands) and in September 2016 at the 7th DBpedia Community Meeting in Leipzig.

In 2015 we developed a software pipeline to build a training set for classifying documents related to a specific domain of knowledge. The pipeline is currently driven by SPARQL queries and supported by the Linked Data Recommender developed by the SoftEng Group of the Politecnico di Torino, which is used to discover further entities that the queries alone do not identify. More information is available on GitHub.
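
A SPARQL-driven step of this kind can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline published on GitHub: the function names, the use of `dct:subject` and `skos:broader` to select resources under a DBpedia category, and the example category `Journalism` are all hypothetical choices, and the Linked Data Recommender step is not modelled.

```python
import json
from typing import List

CATEGORY_PREFIX = "http://dbpedia.org/resource/Category:"


def build_domain_query(category: str, limit: int = 100) -> str:
    """Build a SPARQL query selecting resources filed under a category.

    The property path `skos:broader?` matches the category itself or any
    of its direct subcategories (zero or one `skos:broader` step).
    """
    return f"""
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?resource WHERE {{
  ?resource dct:subject ?cat .
  ?cat skos:broader? <{CATEGORY_PREFIX}{category}> .
}} LIMIT {limit}
""".strip()


def extract_resources(sparql_json: str) -> List[str]:
    """Extract resource URIs from a SPARQL 1.1 JSON results document."""
    payload = json.loads(sparql_json)
    return [row["resource"]["value"] for row in payload["results"]["bindings"]]


if __name__ == "__main__":
    print(build_domain_query("Journalism", limit=10))
    # Hand-written sample response, used so the sketch runs offline.
    sample = json.dumps({
        "head": {"vars": ["resource"]},
        "results": {"bindings": [
            {"resource": {"type": "uri",
                          "value": "http://dbpedia.org/resource/Data_journalism"}},
        ]},
    })
    print(extract_resources(sample))
```

In this sketch the category name and the result limit play the role of the parameters that adapt the process to a given knowledge domain; the URIs returned would then identify the entries from which a domain-specific training set is built.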

Moreover, drawing on the experience gained in developing TellMeFirst, Giuseppe Futia won the “Best tool for multi-lingual journalists” prize at the #newsHACK 2016 event organized by the BBC.

Related Publications:
Oscar Rodríguez-Rocha, Iacopo Vagliano, Christian Figueroa, Federico Cairo, Giuseppe Futia, Carlo Licciardi, Marco Marengo, Federico Morando
15 January 2015
IT PROFESSIONAL
Giuseppe Futia, Federico Cairo, Federico Morando, Luca Leschiutta
21-23 May 2014
International Conference for E-Democracy and Open Government 2014

tellmefirst commits feed

11/03/2016 - 17:24
Better integration of BBC module
11/03/2016 - 17:01
Update BBC module
10/03/2016 - 17:02
Update apache commons version
09/02/2016 - 09:59
Update Maven version on doc