TY - JOUR
T1 - Property Graph vs RDF triple store
T2 - a comparison on glycan substructure search
AU - Alocci, Davide
AU - Mariethoz, Julien
AU - Horlacher, Oliver
AU - Bolleman, Jerven T.
AU - Campbell, Matthew P.
AU - Lisacek, Frederique
N1 - Copyright the Author(s) 2015. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
PY - 2015/12/14
Y1 - 2015/12/14
N2 - Resource description framework (RDF) and Property Graph databases are emerging technologies that are used for storing graph-structured data. We compare these technologies through a molecular biology use case: glycan substructure search. Glycans are branched tree-like molecules composed of building blocks linked together by chemical bonds. The molecular structure of a glycan can be encoded into a directed acyclic graph where each node represents a building block and each edge serves as a chemical linkage between two building blocks. In this context, Graph databases are possible software solutions for storing glycan structures and Graph query languages, such as SPARQL and Cypher, can be used to perform a substructure search. Glycan substructure searching is an important feature for querying structure and experimental glycan databases and retrieving biologically meaningful data. This applies, for example, to identifying a region of the glycan recognised by a glycan binding protein (GBP). In this study, 19,404 glycan structures were selected from GlycomeDB (www.glycome-db.org) and modelled for being stored into an RDF triple store and a Property Graph. We then performed two different sets of searches and compared the query response times and the results from both technologies to assess performance and accuracy. The two implementations produced the same results, but interestingly we noted a difference in the query response times. Qualitative measures such as portability were also used to define further criteria for choosing the technology adapted to solving glycan substructure search and other comparable issues.
AB - Resource description framework (RDF) and Property Graph databases are emerging technologies that are used for storing graph-structured data. We compare these technologies through a molecular biology use case: glycan substructure search. Glycans are branched tree-like molecules composed of building blocks linked together by chemical bonds. The molecular structure of a glycan can be encoded into a directed acyclic graph where each node represents a building block and each edge serves as a chemical linkage between two building blocks. In this context, Graph databases are possible software solutions for storing glycan structures and Graph query languages, such as SPARQL and Cypher, can be used to perform a substructure search. Glycan substructure searching is an important feature for querying structure and experimental glycan databases and retrieving biologically meaningful data. This applies, for example, to identifying a region of the glycan recognised by a glycan binding protein (GBP). In this study, 19,404 glycan structures were selected from GlycomeDB (www.glycome-db.org) and modelled for being stored into an RDF triple store and a Property Graph. We then performed two different sets of searches and compared the query response times and the results from both technologies to assess performance and accuracy. The two implementations produced the same results, but interestingly we noted a difference in the query response times. Qualitative measures such as portability were also used to define further criteria for choosing the technology adapted to solving glycan substructure search and other comparable issues.
UR - http://www.scopus.com/inward/record.url?scp=84957111696&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0144578
DO - 10.1371/journal.pone.0144578
M3 - Article
C2 - 26656740
AN - SCOPUS:84957111696
SN - 1932-6203
VL - 10
SP - 1
EP - 17
JO - PLoS ONE
JF - PLoS ONE
IS - 12
M1 - e0144578
ER -