Keys as features for graph entity matching

Published in 2020 IEEE 36th International Conference on Data Engineering (ICDE), 2020

Abstract

Keys for graphs aim to uniquely identify entities represented by vertices in a graph, using a combination of topological constraints and value-equality constraints. This paper proposes graph matching keys (GMKs), an extension of graph keys with similarity predicates on values that supports approximate entity matching. We treat entity matching as a classification problem and propose GMKSLEM, a supervised learning method for graph entity matching. GMKSLEM first applies a feature extraction method that discovers candidate GMKs (CGMKs) and uses them as features for a vector representation; feature selection then yields high-quality features and representations. Moreover, GMKSLEM supports explaining its classification results. Using real-life data, we experimentally verify the effectiveness of GMKSLEM, as well as its interpretability.
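The idea of using keys as features can be illustrated with a small sketch. This is not the GMKSLEM implementation: the entity layout (a flat dictionary in which a neighbor's value, such as `artist`, stands in for a value reached along a graph edge), the names `CANDIDATE_KEYS`, `featurize`, and `explain`, the thresholds, and the use of `difflib` as the similarity measure are all assumptions made for illustration. Each hypothetical candidate key is a conjunction of similarity predicates, and a pair of entities is mapped to a binary vector with one position per key.

```python
from difflib import SequenceMatcher

def similar(a, b, threshold):
    """Similarity predicate: true when the string ratio reaches the threshold.
    A threshold of 1.0 amounts to value equality."""
    return SequenceMatcher(None, a, b).ratio() >= threshold

# Hypothetical candidate keys: each is a list of (attribute, threshold) predicates.
# Attributes such as "artist" stand in for values reached along a graph edge.
CANDIDATE_KEYS = {
    "name~0.8 AND year=": [("name", 0.8), ("year", 1.0)],
    "name~0.8 AND artist~0.8": [("name", 0.8), ("artist", 0.8)],
}

def key_fires(key, e1, e2):
    """A key fires on a pair when every one of its predicates holds."""
    return all(
        attr in e1 and attr in e2 and similar(str(e1[attr]), str(e2[attr]), thr)
        for attr, thr in key
    )

def featurize(e1, e2, keys=CANDIDATE_KEYS):
    """Binary feature vector for a pair: one position per candidate key."""
    return [int(key_fires(k, e1, e2)) for k in keys.values()]

def explain(e1, e2, keys=CANDIDATE_KEYS):
    """Names of the keys that fired, usable as a readable justification."""
    return [name for name, k in keys.items() if key_fires(k, e1, e2)]

a = {"name": "Abbey Road", "year": "1969", "artist": "The Beatles"}
b = {"name": "Abbey Rd", "year": "1969", "artist": "The Beatles"}
c = {"name": "Let It Be", "year": "1970", "artist": "The Beatles"}

print(featurize(a, b), explain(a, b))
print(featurize(a, c), explain(a, c))
```

Vectors of this kind could be passed to any off-the-shelf classifier trained on labeled pairs, and the list returned by `explain` hints at how key-based features lend themselves to interpretation.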

Ting Deng, Lei Hou, and Ziyan Han. "Keys as features for graph entity matching." 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 2020.