Abstract
Background: Datasets available for abstract sentence classification modelling consist predominantly of abstracts sourced from biomedical research.
Aims: To contribute a large non-biomedical multidisciplinary dataset for abstract sentence classification model research.
Method: Bulk extraction and transformation of Emerald Group Publishing structured abstracts indexed on Scopus.
Results: We present the largest multidisciplinary dataset for abstract sentence classification modelling, consisting of 1,050,397 sentences from 103,457 abstracts.
Original language | English |
---|---|
Title of host publication | Proceedings of the 17th Workshop of the Australasian Language Technology Association |
Editors | Meladel Mistica, Massimo Piccardi, Andrew MacKinlay |
Place of Publication | Melbourne, VIC |
Publisher | Australasian Language Technology Association |
Pages | 120-125 |
Number of pages | 6 |
Publication status | Published - 2019 |
Event | 17th Annual Workshop of The Australasian Language Technology Association (ALTA 2019) - Sydney, Australia. Duration: 4 Dec 2019 → 6 Dec 2019 |
Conference
Conference | 17th Annual Workshop of The Australasian Language Technology Association (ALTA 2019) |
---|---|
Country/Territory | Australia |
City | Sydney |
Period | 4/12/19 → 6/12/19 |
Bibliographical note
Copyright the Publisher 2019. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
Keywords
- Structured abstracts
- Natural language processing
- Information systems