TY - GEN
T1 - CrowdCorrect
T2 - CAiSE Forum 2018 held as part of the 30th International Conference on Advanced Information Systems Engineering, CAiSE 2018
AU - Beheshti, Amin
AU - Vaghani, Kushal
AU - Benatallah, Boualem
AU - Tabebordbar, Alireza
PY - 2018/1/1
Y1 - 2018/1/1
N2 - Process and data are equally important for business process management. Data-driven approaches in process analytics aims to value decisions that can be backed up with verifiable private and open data. Over the last few years, data-driven analysis of how knowledge workers and customers interact in social contexts, often with data obtained from social networking services such as Twitter and Facebook, have become a vital asset for organizations. For example, governments started to extract knowledge and derive insights from vastly growing open data to improve their services. A key challenge in analyzing social data is to understand the raw data generated by social actors and prepare it for analytic tasks. In this context, it is important to transform the raw data into a contextualized data and knowledge. This task, known as data curation, involves identifying relevant data sources, extracting data and knowledge, cleansing, maintaining, merging, enriching and linking data and knowledge. In this paper we present CrowdCorrect, a data curation pipeline to enable analysts cleansing and curating social data and preparing it for reliable business data analytics. The first step offers automatic feature extraction, correction and enrichment. Next, we design micro-tasks and use the knowledge of the crowd to identify and correct information items that could not be corrected in the first step. Finally, we offer a domain-model mediated method to use the knowledge of domain experts to identify and correct items that could not be corrected in previous steps. We adopt a typical scenario for analyzing Urban Social Issues from Twitter as it relates to the Government Budget, to highlight how CrowdCorrect significantly improves the quality of extracted knowledge compared to the classical curation pipeline and in the absence of knowledge of the crowd and domain experts.
AB - Process and data are equally important for business process management. Data-driven approaches in process analytics aims to value decisions that can be backed up with verifiable private and open data. Over the last few years, data-driven analysis of how knowledge workers and customers interact in social contexts, often with data obtained from social networking services such as Twitter and Facebook, have become a vital asset for organizations. For example, governments started to extract knowledge and derive insights from vastly growing open data to improve their services. A key challenge in analyzing social data is to understand the raw data generated by social actors and prepare it for analytic tasks. In this context, it is important to transform the raw data into a contextualized data and knowledge. This task, known as data curation, involves identifying relevant data sources, extracting data and knowledge, cleansing, maintaining, merging, enriching and linking data and knowledge. In this paper we present CrowdCorrect, a data curation pipeline to enable analysts cleansing and curating social data and preparing it for reliable business data analytics. The first step offers automatic feature extraction, correction and enrichment. Next, we design micro-tasks and use the knowledge of the crowd to identify and correct information items that could not be corrected in the first step. Finally, we offer a domain-model mediated method to use the knowledge of domain experts to identify and correct items that could not be corrected in previous steps. We adopt a typical scenario for analyzing Urban Social Issues from Twitter as it relates to the Government Budget, to highlight how CrowdCorrect significantly improves the quality of extracted knowledge compared to the classical curation pipeline and in the absence of knowledge of the crowd and domain experts.
UR - http://www.scopus.com/inward/record.url?scp=85048638999&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-92901-9_3
DO - 10.1007/978-3-319-92901-9_3
M3 - Conference proceeding contribution
AN - SCOPUS:85048638999
SN - 9783319929002
T3 - Lecture Notes in Business Information Processing
SP - 24
EP - 38
BT - Information systems in the big data era
A2 - Mendling, Jan
A2 - Mouratidis, Haralambos
PB - Springer-VDI-Verlag GmbH & Co. KG
Y2 - 11 June 2018 through 15 June 2018
ER -