TY - JOUR
T1 - A fast and progressive algorithm for skyline queries with totally- and partially-ordered domains
AU - Jung, Hyungsoo
AU - Han, Hyuck
AU - Yeom, Heon Y.
AU - Kang, Sooyong
PY - 2010/3
Y1 - 2010/3
N2 - We devise a skyline algorithm that can efficiently mitigate the enormous overhead of processing millions of tuples on totally- and partially-ordered domains (henceforth, TODs and PODs). With massive datasets, existing techniques spend a significant amount of time on a dominance comparison because of both a large number of skyline points and the unprogressive method of skyline computing with PODs. (If data has high dimensionality, the situation is undoubtedly aggravated.) The progressiveness property turns out to be the key feature for solving all remaining problems. This article presents a FAST-SKY algorithm that deals successfully with these two obstacles and improves skyline query processing time strikingly, even with high-dimensional data. Progressive skyline evaluation with PODs is guaranteed by new index structures and topological sorting order. A stratification technique is adopted to index data on PODs, and we propose two new index structures: stratified R-trees (SR-trees) for low-dimensional data and stratified MinMax treaps (SM-treaps) for high-dimensional data. A fast dominance comparison is achieved by using a reporting query instead of a dominance query, and a dimensionality reduction technique. Experimental results suggest that in general cases (anti-correlated and uniform distributions) FAST-SKY is orders of magnitude faster than existing algorithms.
KW - Optimality
KW - Partially-ordered domain
KW - Progressiveness
KW - Skyline computation
UR - http://www.scopus.com/inward/record.url?scp=75349085532&partnerID=8YFLogxK
U2 - 10.1016/j.jss.2009.09.032
DO - 10.1016/j.jss.2009.09.032
M3 - Article
AN - SCOPUS:75349085532
VL - 83
SP - 429
EP - 445
JO - The Journal of Systems and Software
JF - The Journal of Systems and Software
SN - 0164-1212
IS - 3
ER -