Abstract
Continuous improvements in connectivity, storage and data processing capabilities allow access to a deluge of data from sensors, social media, news, user-generated, government and private data sources. Accordingly, in a modern data-oriented landscape, with the advent of various data capture and management technologies, organizations are rapidly shifting to datafication of their processes. In such an environment, analysts may need to deal with a collection of datasets, from relational to NoSQL, that holds a vast amount of data gathered from various private and open data islands, i.e., a data lake. Organizing, indexing and querying the growing volume of internal data and metadata in a data lake is challenging and requires varied skills and experience with dozens of new database and indexing technologies: How to store information items? What technology to use for persisting the data? How to deal with the large volume of streaming data? How to trace and persist information about data? What technology to use for indexing the data? How to query the data lake? To address the above-mentioned challenges, we present CoreDB, an open source data lake service that offers researchers and developers a single REST API to organize, index and query their data and metadata. CoreDB manages multiple database technologies and offers a built-in design for security and tracing.
Original language | English |
---|---|
Title of host publication | CIKM 2017 |
Subtitle of host publication | Proceedings of the 2017 ACM Conference on Information and Knowledge Management |
Publisher | Association for Computing Machinery |
Pages | 2451-2454 |
Number of pages | 4 |
Volume | Part F131841 |
ISBN (Electronic) | 9781450349185 |
Publication status | Published - 6 Nov 2017 |
Externally published | Yes |
Event | 26th ACM International Conference on Information and Knowledge Management, CIKM 2017 - Singapore, Singapore. Duration: 6 Nov 2017 → 10 Nov 2017 |
Conference
Conference | 26th ACM International Conference on Information and Knowledge Management, CIKM 2017 |
---|---|
Country/Territory | Singapore |
City | Singapore |
Period | 6/11/17 → 10/11/17 |
Keywords
- Data API
- Data lake
- Database service