A hybrid machine-crowdsourcing approach for web table matching and cleaning

Chunhua Li, Pengpeng Zhao*, Victor S. Sheng, Zhixu Li, Guanfeng Liu, Jian Wu, Zhiming Cui

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

Abstract

Table matching and data cleaning are two crucial activities in integrating data from different web tables, and they have traditionally been treated as separate activities. We show that data cleaning can effectively help us discover table matches, and vice versa. In this paper, we study a hybrid machine-crowdsourcing approach that handles the two activities together with the help of a well-developed knowledge base. Understanding the semantics of tables is fundamental to both matching and cleaning. We select the most valuable columns for crowdsourcing validation and infer the others by consolidating crowdsourcing results with machine-generated results. When resolving inconsistencies between data and semantics, relative trust is taken into account to validate the data or the semantics via the crowd. Our experimental results on real-life datasets show the effectiveness of the proposed approach for matching and cleaning web tables.

Original language: English
Title of host publication: Web-Age Information Management
Subtitle of host publication: 17th International Conference, WAIM 2016, Proceedings, Part II
Editors: Bin Cui, Nan Zhang, Jianliang Xu, Xiang Lian, Dexi Liu
Place of Publication: Switzerland
Publisher: Springer-VDI-Verlag GmbH & Co. KG
Pages: 132-144
Number of pages: 13
ISBN (Electronic): 9783319399584
ISBN (Print): 9783319399577
DOIs
Publication status: Published - 1 Jan 2016
Externally published: Yes
Event: 17th International Conference on Web-Age Information Management, WAIM 2016 - Nanchang, China
Duration: 3 Jun 2016 to 5 Jun 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9659
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th International Conference on Web-Age Information Management, WAIM 2016
Country/Territory: China
City: Nanchang
Period: 3/06/16 to 5/06/16

Keywords

  • Crowdsourcing
  • Data cleaning
  • Table matching
