Abstract
The last few years have seen a rapid increase in the sheer amount of data produced and communicated over the Internet and the Web. While it is widely believed that the availability of such "Big Data" holds the potential to revolutionize many aspects of modern society (e.g., intelligent transportation, environmental monitoring, and energy saving), many challenges need to be addressed before this potential can be realized. This PhD project focuses on one critical challenge, namely extracting actionable knowledge from Big Data. Tremendous effort has been devoted to mining large-scale data on the Web and constructing comprehensive knowledge bases (KBs). However, existing knowledge extraction systems retrieve data from only a limited range of Web source types. In addition, data fusion approaches take little account of the noise introduced by those knowledge extraction systems. Consequently, the constructed KBs are far from comprehensive and accurate. In this paper, we present our initial design of a framework for extracting machine-readable data with high precision and recall from four types of data sources, namely Web texts, Document Object Model (DOM) trees, existing KBs, and query streams. Confidence scores are attached to the extracted knowledge and can be used to further improve the knowledge fusion results.
Original language | English |
---|---|
Title of host publication | SIGMOD 2015 PhD Symposium - Proceedings of the 2015 ACM SIGMOD PhD Symposium |
Publisher | Association for Computing Machinery |
Pages | 3-8 |
Number of pages | 6 |
Volume | 2015 |
ISBN (Electronic) | 9781450335294 |
Publication status | Published - 31 May 2015 |
Externally published | Yes |
Event | 2015 ACM SIGMOD/PODS Ph.D. Symposium, SIGMOD 2015 - Melbourne, Australia; Duration: 31 May 2015 → … |
Other
Other | 2015 ACM SIGMOD/PODS Ph.D. Symposium, SIGMOD 2015 |
---|---|
Country/Territory | Australia |
City | Melbourne |
Period | 31/05/15 → … |
Keywords
- DOM tree
- knowledge base
- knowledge fusion