TY - JOUR
T1 - Data-dependent Rectangular Bounding Processes
AU - Fan, Xuhui
AU - Li, Bin
AU - Rahman, Prosha A.
AU - Sisson, Scott A.
PY - 2025/10/6
Y1 - 2025/10/6
N2 - Stochastic partition processes divide a multi-dimensional space into a number of regions, such that the data within each region exhibit some form of homogeneity. Due to the nature of their partition strategies, partition processes can often create many unnecessary divisions in sparse regions when trying to describe data in dense regions. To avoid this problem we introduce a parsimonious partition model – the Rectangular Bounding Process (RBP) – to efficiently partition multi-dimensional spaces, by employing a bounding strategy to enclose data points within rectangular bounding boxes. The RBP is self-consistent and as such can be directly extended from a finite hypercube to an infinite (unbounded) space. We extend the RBP to establish a data-dependent RBP (data-RBP) to generate bounding boxes only over existing data points in a sequential manner, which can effectively reduce model complexity and enable online learning. To achieve this, we design an alternative way to generate bounding boxes and prove the distributional equivalence between the data-RBP and the RBP when empty boxes are removed. We demonstrate application of the RBP and the data-RBP in three scenarios: regression trees, relational modelling, and random feature construction for online learning. Extensive experimental results validate the performance of the RBP and the data-RBP for both accuracy and efficiency.
AB - Stochastic partition processes divide a multi-dimensional space into a number of regions, such that the data within each region exhibit some form of homogeneity. Due to the nature of their partition strategies, partition processes can often create many unnecessary divisions in sparse regions when trying to describe data in dense regions. To avoid this problem we introduce a parsimonious partition model – the Rectangular Bounding Process (RBP) – to efficiently partition multi-dimensional spaces, by employing a bounding strategy to enclose data points within rectangular bounding boxes. The RBP is self-consistent and as such can be directly extended from a finite hypercube to an infinite (unbounded) space. We extend the RBP to establish a data-dependent RBP (data-RBP) to generate bounding boxes only over existing data points in a sequential manner, which can effectively reduce model complexity and enable online learning. To achieve this, we design an alternative way to generate bounding boxes and prove the distributional equivalence between the data-RBP and the RBP when empty boxes are removed. We demonstrate application of the RBP and the data-RBP in three scenarios: regression trees, relational modelling, and random feature construction for online learning. Extensive experimental results validate the performance of the RBP and the data-RBP for both accuracy and efficiency.
UR - http://www.scopus.com/inward/record.url?scp=105018359805&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2025.3618585
DO - 10.1109/TPAMI.2025.3618585
M3 - Article
C2 - 41052111
AN - SCOPUS:105018359805
SN - 0162-8828
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
ER -