CoreDB: a Data Lake service

Amin Beheshti, Boualem Benatallah, Reza Nouri, Van Munin Chhieng, Huangtao Xiong, Xu Zhao

Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding contribution > peer-review

48 Citations (Scopus)

Abstract

Continuous improvement in connectivity, storage and data processing capabilities allows access to a data deluge from sensors, social media, news, user-generated, government and private data sources. Accordingly, in a modern data-oriented landscape, with the advent of various data capture and management technologies, organizations are rapidly shifting to datafication of their processes. In such an environment, analysts may need to deal with a collection of datasets, from relational to NoSQL, that holds a vast amount of data gathered from various private/open data islands, i.e. a Data Lake. Organizing, indexing and querying the growing volume of internal data and metadata in a data lake is challenging and requires various skills and experience to deal with dozens of new database and indexing technologies: How to store information items? What technology to use for persisting the data? How to deal with the large volume of streaming data? How to trace and persist information about data? What technology to use for indexing the data? How to query the data lake? To address the above-mentioned challenges, we present CoreDB, an open source data lake service, which offers researchers and developers a single REST API to organize, index and query their data and metadata. CoreDB manages multiple database technologies and offers a built-in design for security and tracing.

Original language: English
Title of host publication: CIKM 2017
Subtitle of host publication: Proceedings of the 2017 ACM Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery
Pages: 2451-2454
Number of pages: 4
Volume: Part F131841
ISBN (Electronic): 9781450349185
Publication status: Published - 6 Nov 2017
Externally published: Yes
Event: 26th ACM International Conference on Information and Knowledge Management, CIKM 2017 - Singapore, Singapore
Duration: 6 Nov 2017 to 10 Nov 2017

Conference

Conference: 26th ACM International Conference on Information and Knowledge Management, CIKM 2017
Country/Territory: Singapore
City: Singapore
Period: 6/11/17 to 10/11/17

Keywords

  • Data API
  • Data lake
  • Database service
