Status: concluded
Period: November 2016 – January 2021
Funding: 99,500 €
Funding organization: Italian Ministry of Education, University and Research; Comitato ICT
Person(s) in charge: Giuseppe Futia
Executive summary
Knowledge Graphs (KGs) have emerged as a core abstraction for incorporating human knowledge into intelligent systems. This knowledge is encoded in a graph-based structure whose nodes represent real-world entities and whose edges define multiple relations between these entities. KGs are gaining attention from both industry and academia because they provide a flexible way to capture, organize, and query large amounts of multi-relational data. Deep Learning on Graphs, also known as Graph Representation Learning (GRL), is the standard toolbox for learning from graph data. GRL techniques drive improvements on high-impact problems in different fields, such as content recommendation and drug discovery. Unlike learning from other types of data, such as images, learning from graph data requires specific methods. As Michael Bronstein (Imperial College London) puts it, “these methods are based on some form of message passing on the graph allowing different nodes to exchange information”.
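To make the message-passing idea concrete, here is a minimal sketch in plain Python with NumPy; it is purely illustrative and not code from the project. It performs one round of neighborhood aggregation on a toy graph: each node averages its neighbors' feature vectors and mixes the result into its own state.

    import numpy as np

    # Toy undirected graph as an adjacency list: node -> neighbors.
    neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

    # Each node starts with a small random feature vector.
    rng = np.random.default_rng(42)
    h = {v: rng.normal(size=4) for v in neighbors}

    def message_passing_step(h, neighbors):
        # Each node aggregates (averages) its neighbors' features and
        # mixes the result with its own state: the essence of message passing.
        h_new = {}
        for v, nbrs in neighbors.items():
            message = np.mean([h[u] for u in nbrs], axis=0)
            h_new[v] = 0.5 * h[v] + 0.5 * message
        return h_new

    h = message_passing_step(h, neighbors)  # nodes have now exchanged information

Stacking several such rounds lets information propagate across multi-hop neighborhoods, which is what GRL methods exploit.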
Background
Structured data sources play a fundamental role in the data ecosystem because much of the valuable (and reusable) information within organizations and on the Web is available as structured data. Publishing these data into KGs is a complex process, because it requires extracting and integrating information from heterogeneous sources. The goal of integrating these sources is to harmonize their data and provide a coherent perspective on the overall information. Heterogeneous sources range from unstructured data, such as plain text, to structured data, including tabular formats, such as CSV files and relational databases, and tree-structured formats, such as JSON and XML files. The integration of structured data is enabled by mappings that describe the relationships between the global schema of an ontology (the semantic skeleton of a KG) and the local schema of the target data source. The result of the mapping can be seen as a graph, known as a semantic model, which expresses the links between the local schema, represented by the attributes of the target data source, and the global schema, represented by the reference ontologies. A semantic model is a powerful tool for representing a mapping for two main reasons: first, it frames the relations between ontology classes as paths in the graph; second, it enables the use of graph algorithms to detect the correct mapping. The research activity related to this project concerns the automatic publishing of data into KGs from structured data sources through a mapping-based approach built on semantic models.
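To illustrate how a semantic model ties the local schema to the global schema, the sketch below builds a toy semantic model as a labeled directed graph with networkx; the ontology terms and column names are invented for the example. Note how the relation between the two source attributes is recovered as a path through the ontology classes, as described above.

    import networkx as nx

    sm = nx.DiGraph()

    # Global schema: ontology classes connected by an ontology property.
    sm.add_edge("schema:Person", "schema:Organization", label="schema:worksFor")

    # Local schema: source attributes annotated with their semantic types.
    sm.add_edge("schema:Person", "col:employee_name", label="schema:name")
    sm.add_edge("schema:Organization", "col:company", label="schema:legalName")

    # The relation between the two attributes appears as a path in the graph.
    path = nx.shortest_path(sm.to_undirected(), "col:employee_name", "col:company")
    print(path)
    # ['col:employee_name', 'schema:Person', 'schema:Organization', 'col:company']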
Objectives
The main goal of this research project is to automate the process of creating semantic models. This goal is pursued with a novel approach based on Graph Neural Networks (GNNs) that automatically identifies the relations connecting already-annotated data attributes. GNNs are a family of GRL techniques specialized in neighborhood aggregation, which they use to encode representations of graph nodes and edges. In this approach, GNNs are trained on Linked Data (LD) graphs that contain semantic information and act as background knowledge for reconstructing the semantics of data sources: the intuition is that the relations other people have used to semantically describe data in a domain are likely to express the semantics of a target source in the same domain.
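The following sketch conveys this intuition in a few lines of NumPy; it is a simplified stand-in, not the actual architecture described in the project publications. One graph-convolution step computes node embeddings over a background graph, and a DistMult-style bilinear scorer (an assumption made here for illustration, not necessarily the project's choice) rates how plausible a candidate relation is between two annotated attributes.

    import numpy as np

    rng = np.random.default_rng(0)
    n_nodes, dim = 5, 8

    # Adjacency matrix of a small background graph.
    A = np.array([[0, 1, 1, 0, 0],
                  [1, 0, 1, 0, 0],
                  [1, 1, 0, 1, 0],
                  [0, 0, 1, 0, 1],
                  [0, 0, 0, 1, 0]], dtype=float)

    # Row-normalized adjacency with self-loops (a common GCN preprocessing).
    A_hat = A + np.eye(n_nodes)
    A_hat = A_hat / A_hat.sum(axis=1, keepdims=True)

    X = rng.normal(size=(n_nodes, dim))  # initial node features
    W = rng.normal(size=(dim, dim))      # layer weights (trained in practice)

    # One graph-convolution step: aggregate neighborhoods, transform, ReLU.
    H = np.maximum(A_hat @ X @ W, 0.0)

    # DistMult-style scorer: the plausibility of relation r between nodes u, v
    # is a bilinear form with a diagonal relation matrix (random here; trained
    # in practice against relations observed in the background graph).
    def score(u, v, r):
        return float(H[u] @ np.diag(r) @ H[v])

    r = rng.normal(size=dim)  # embedding of one candidate relation
    print(score(0, 3, r))     # higher score = more plausible relation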
Results
The main achievements of this research project are the following:

(i) A PhD thesis: “Neural Networks for Building Semantic Models and Knowledge Graphs”.

(ii) A journal publication: Futia, G., Vetrò, A., & De Martin, J. C. (2020). SeMi: A SEmantic Modeling machIne to build Knowledge Graphs with graph neural networks. SoftwareX, 12, 100516.

(iii) A journal publication: Futia, G., & Vetrò, A. (2020). On the integration of knowledge graphs into deep learning models for a more comprehensible AI—Three challenges for future research. Information, 11(2), 122.

(iv) A workshop paper: Futia, G., Garifo, G., Vetrò, A., & De Martin, J. C. (2020). Modeling the semantics of data sources with graph neural networks. Bridge Between Perception and Reasoning: Graph Neural Networks & Beyond, ICML 2020 Workshop.

(v) A conference publication: Futia, G., Vetrò, A., Melandri, A., & De Martin, J. C. (2018). Training neural language models with SPARQL queries for semi-automatic semantic mapping. Procedia Computer Science, 137, 187–198, describing the use of a neural language model (Word2Vec) for semi-automatic semantic type detection (a hedged sketch of this idea follows the list).

(vi) A technical report: “Linked Data Validity”, carried out by researchers and students who participated in the International Semantic Web Research School (ISWS) 2018.

(vii) A conference publication: Futia, G., Melandri, A., Vetrò, A., Morando, F., & De Martin, J. C. (2017). Removing barriers to transparency: A case study on the use of semantic technologies to tackle procurement data inconsistency. European Semantic Web Conference, 623–637.

(viii) A workshop publication: Futia, G., Morando, F., Melandri, A., Canova, L., & Ruggiero, F. (2018). ContrattiPubblici.org, a semantic knowledge graph on public procurement information. AI Approaches to the Complexity of Legal Systems, 380–393.
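As referenced in item (v), the following is a hedged sketch of the Word2Vec idea, assuming gensim (version 4 or later); the queries and ontology terms are invented for illustration. Tokenized SPARQL graph patterns serve as training sentences, so ontology terms that co-occur across queries obtain nearby embeddings, which can in turn suggest a semantic type for a new, unannotated attribute.

    from gensim.models import Word2Vec

    # Each "sentence" is a tokenized SPARQL basic graph pattern.
    corpus = [
        ["?person", "rdf:type", "schema:Person"],
        ["?person", "schema:name", "?name"],
        ["?person", "schema:worksFor", "?org"],
        ["?org", "rdf:type", "schema:Organization"],
        ["?org", "schema:legalName", "?label"],
    ]

    model = Word2Vec(corpus, vector_size=32, window=2, min_count=1, epochs=200)

    # Terms that co-occur across queries get nearby embeddings; the nearest
    # ontology terms can then suggest a semantic type for a new attribute.
    print(model.wv.most_similar("schema:name", topn=3))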
Related Publications
2020
@conference{11583_2855045,
title = {Modeling the semantics of data sources with graph neural networks},
author = {Giuseppe Futia and Giovanni Garifo and Antonio Vetro' and Juan Carlos De Martin},
url = {https://logicalreasoninggnn.github.io/papers/5.pdf},
year = {2020},
date = {2020-01-01},
urldate = {2020-01-01},
abstract = {Semantic models are fundamental to publish data into Knowledge Graphs (KGs), since they encode the precise meaning of data sources, through concepts and properties defined within reference ontologies. However, building semantic models requires significant manual effort and expertise. In this paper, we present a novel approach based on Graph Neural Networks (GNNs) to build semantic models of data sources. GNNs are trained on Linked Data (LD) graphs, which serve as background knowledge to automatically infer the semantic relations connecting the attributes of a data source. At the best of our knowledge, this is the first approach that employs GNNs to identify the semantic relations. We tested our approach on 15 target sources from the advertising domain (used in other studies in the literature), and compared its performance against two baselines and a technique largely used in the state of the art. The evaluation showed that our approach outperforms the state of the art in cases of data source with the largest amount of semantic relations defined in the ground truth.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
@article{11583_2834234,
title = {SeMi: A SEmantic Modeling machIne to build Knowledge Graphs with graph neural networks},
author = {Giuseppe Futia and Antonio Vetrò and Juan Carlos De Martin},
url = {https://www.sciencedirect.com/science/article/pii/S2352711019302626},
doi = {10.1016/j.softx.2020.100516},
year = {2020},
date = {2020-01-01},
urldate = {2020-01-01},
journal = {SOFTWAREX},
volume = {12},
publisher = {Elsevier},
abstract = {SeMi (SEmantic Modeling machIne) is a tool to semi-automatically build large-scale Knowledge Graphs from structured sources such as CSV, JSON, and XML files. To achieve such a goal, SeMi builds the semantic models of the data sources, in terms of concepts and relations within a domain ontology. Most of the research contributions on automatic semantic modeling is focused on the detection of semantic types of source attributes. However, the inference of the correct semantic relations between these attributes is critical to reconstruct the precise meaning of the data. SeMi covers the entire process of semantic modeling: (i) it provides a semi-automatic step to detect semantic types; (ii) it exploits a novel approach to inference semantic relations, based on a graph neural network trained on background linked data. At the best of our knowledge, this is the first technique that exploits a graph neural network to support the semantic modeling process. Furthermore, the pipeline implemented in SeMi is modular and each component can be replaced to tailor the process to very specific domains or requirements. This contribution can be considered as a step ahead towards automatic and scalable approaches for building Knowledge Graphs.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
@article{11583_2797449,
title = {On the Integration of Knowledge Graphs into Deep Learning Models for a More Comprehensible AI—Three Challenges for Future Research},
author = {Giuseppe Futia and Antonio Vetro},
url = {https://www.mdpi.com/2078-2489/11/2/122},
doi = {10.3390/info11020122},
year = {2020},
date = {2020-01-01},
urldate = {2020-01-01},
journal = {INFORMATION},
volume = {11},
number = {2},
publisher = {MDPI},
abstract = {Deep learning models contributed to reaching unprecedented results in prediction and classification tasks of Artificial Intelligence (AI) systems. However, alongside this notable progress, they do not provide human-understandable insights on how a specific result was achieved. In contexts where the impact of AI on human life is relevant (e.g., recruitment tools, medical diagnoses, etc.), explainability is not only a desirable property, but it is—or, in some cases, it will be soon—a legal requirement. Most of the available approaches to implement eXplainable Artificial Intelligence (XAI) focus on technical solutions usable only by experts able to manipulate the recursive mathematical functions in deep learning algorithms. A complementary approach is represented by symbolic AI, where symbols are elements of a lingua franca between humans and deep learning. In this context, Knowledge Graphs (KGs) and their underlying semantic technologies are the modern implementation of symbolic AI—while being less flexible and robust to noise compared to deep learning models, KGs are natively developed to be explainable. In this paper, we review the main XAI approaches existing in the literature, underlying their strengths and limitations, and we propose neural-symbolic integration as a cornerstone to design an AI which is closer to non-insiders comprehension. Within such a general direction, we identify three specific challenges for future research—knowledge matching, cross-disciplinary explanations and interactive explanations.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
2018
@conference{11583_2712334b,
title = {Training Neural Language Models with SPARQL queries for Semi-Automatic Semantic Mapping},
author = {Giuseppe Futia and Antonio Vetro' and Alessio Melandri and Juan Carlos De Martin},
url = {https://www.sciencedirect.com/science/article/pii/S1877050918316235},
doi = {10.1016/j.procs.2018.09.018},
year = {2018},
date = {2018-01-01},
urldate = {2018-01-01},
booktitle = {Procedia Computer Science},
publisher = {Elsevier},
abstract = {Knowledge graphs are labeled and directed multi-graphs that encode information in the form of entities and relationships. They are gaining attention in different areas of computer science: from the improvement of search engines to the development of virtual personal assistants. Currently, an open challenge in building large-scale knowledge graphs from structured data available on the Web (HTML tables, CSVs, JSONs) is the semantic integration of heterogeneous data sources. In fact, such diverse and scattered information rarely provide a formal description of metadata that is required to accomplish the integration task. In this paper we propose an approach based on neural networks to reconstruct the semantics of data sources to produce high quality knowledge graphs in terms of semantic accuracy. We developed a neural language model trained on a set of SPARQL queries performed on knowledge graphs. Through this model it is possible to semi-automatically generate a semantic map between the attributes of a data source and a domain ontology.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
@conference{11583_2659315,
title = {ContrattiPubblici.org, a Semantic Knowledge Graph on Public Procurement Information},
author = {Giuseppe Futia and Federico Morando and Alessio Melandri and Lorenzo Canova and Francesco Ruggiero},
url = {https://link.springer.com/chapter/10.1007/978-3-030-00178-0_26},
doi = {10.1007/978-3-030-00178-0_26},
isbn = {978-3-030-00177-3},
year = {2018},
date = {2018-01-01},
urldate = {2018-01-01},
booktitle = {AI Approaches to the Complexity of Legal Systems},
pages = {380–393},
publisher = {Springer},
abstract = {The Italian anti-corruption Act (law n. 190/2012) requires all public administrations to spread procurement information as open data. Each body is therefore obliged to yearly release standardized XML files, on its public website, containing data that describe all issued public contracts. Though this information is currently available on a machine-readable format, the data is fragmented and published in different files on different websites, without a unified and human-readable view of the information. The ContrattiPubblici.org project aims at developing a semantic knowledge graph based on linked open data principles in order to overcome the fragmentation of existent datasets and to allow easy analysis and the reuse of information. The objectives are to increase public awareness about public spending, to improve transparency on the public procurement chain and to help companies to retrieve useful knowledge for their business activities.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
2017
@conference{11583_2670034,
title = {Removing Barriers to Transparency: a Case Study on the Use of Semantic Technologies to Tackle Procurement Data Inconsistency},
author = {Giuseppe Futia and Alessio Melandri and Antonio Vetro' and Federico Morando and Juan Carlos De Martin},
url = {https://link.springer.com/chapter/10.1007/978-3-319-58068-5_38},
doi = {10.1007/978-3-319-58068-5_38},
isbn = {978-3-319-58067-8},
year = {2017},
date = {2017-01-01},
urldate = {2017-01-01},
booktitle = {The Semantic Web},
pages = {623–637},
publisher = {Springer},
abstract = {Public Procurement (PP) information, made available as Open Government Data (OGD), leads to tangible benefits to identify government spending for goods and services. Nevertheless, making data freely available is a necessary, but not sufficient condition for improving transparency. Fragmentation of OGD due to diverse processes adopted by different administrations and inconsistency within data affect opportunities to obtain valuable information. In this article, we propose a solution based on linked data to integrate existing datasets and to enhance information coherence. We present an application of such principles through a semantic layer built on Italian PP information available as OGD. As result, we overcame the fragmentation of data sources and increased the consistency of information, enabling new opportunities for analyzing data to fight corruption and for raising competition between companies in the market.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}