Supporting accessibility and reproducibility in language research in the Alveo virtual laboratory

Steve Cassidy, Dominique Estival

Research output: Contribution to journal, Article, Research, peer-review

Abstract

Reproducibility is an important part of scientific research, and studies published in speech and language research usually make some attempt to ensure that the work reported could be reproduced by other researchers. This paper looks at current practice in the field relating to the citation and availability of both data and software methods. It is common to use widely available shared datasets in this field, which helps to ensure that studies can be reproduced; however, a brief survey of recent papers shows a wide range of styles of data citation, only some of which clearly identify the exact data used in the study. Similarly, practices in describing and sharing software artefacts vary considerably, from detailed descriptions of algorithms to linked repositories. The Alveo Virtual Laboratory is a web-based platform to support research based on collections of text, speech and video. Alveo provides a central repository for language data and a set of services for discovery and analysis of data. We argue that some of the features of the Alveo platform may make it easier for researchers to share their data more precisely and cite the exact software tools used to develop published results. Alveo makes use of ideas developed in other areas of science, and we discuss these and how they can be applied to speech and language research.

Language: English
Pages: 375-391
Number of pages: 17
Journal: Computer Speech and Language
Volume: 45
DOI: 10.1016/j.csl.2017.01.003
Publication status: Published - Sep 2017

Keywords

  • Corpus infrastructure
  • Data citation
  • EResearch
  • Reproducibility
  • Research methods
  • Research workflow

Cite this

@article{5bc0f40f17b0434b9c11eced5d4d53a0,
title = "Supporting accessibility and reproducibility in language research in the Alveo virtual laboratory",
keywords = "Corpus infrastructure, Data citation, EResearch, Reproducibility, Research methods, Research workflow",
author = "Steve Cassidy and Dominique Estival",
year = "2017",
month = "9",
doi = "10.1016/j.csl.2017.01.003",
language = "English",
volume = "45",
pages = "375--391",
journal = "Computer Speech and Language",
issn = "0885-2308",
publisher = "Academic Press Inc.",
}

Supporting accessibility and reproducibility in language research in the Alveo virtual laboratory. / Cassidy, Steve; Estival, Dominique.

In: Computer Speech and Language, Vol. 45, 09.2017, p. 375-391.

TY - JOUR
T1 - Supporting accessibility and reproducibility in language research in the Alveo virtual laboratory
AU - Cassidy, Steve
AU - Estival, Dominique
PY - 2017/9
Y1 - 2017/9
KW - Corpus infrastructure
KW - Data citation
KW - EResearch
KW - Reproducibility
KW - Research methods
KW - Research workflow
UR - http://www.scopus.com/inward/record.url?scp=85013797411&partnerID=8YFLogxK
U2 - 10.1016/j.csl.2017.01.003
DO - 10.1016/j.csl.2017.01.003
M3 - Article
VL - 45
SP - 375
EP - 391
JO - Computer Speech and Language
T2 - Computer Speech and Language
JF - Computer Speech and Language
SN - 0885-2308
ER -