In this paper, we study the node distribution of an R-tree storing region data, like, for instance, islands, lakes, or human-inhabited areas. We will show that real region datasets are packed in an R-tree into minimum bounding rectangles (MBRs) whose area distribution follows the same power law, named REGAL (REGion Area Law), as that for the regions themselves. Moreover, these MBRs are packed in their turn into MBRs following the same law, and so on iteratively, up to the root of the R-tree. Based on this observation, we are able to accurately estimate the search effort for range queries, using a small number of easy-to-retrieve parameters. Furthermore, since our analysis exploits, through a realistic mathematical model, the proximity relations existing among the regions in the dataset, we show how to use our model to predict the selectivity of a self-spatial join query posed on the dataset. Experiments on a variety of real datasets (islands, lakes, human-inhabited areas) show that our estimations are accurate, enjoying a geometric average relative error ranging from 22 percent to 32 percent for the search effort of a range query, and from 14 percent to 34 percent for the selectivity of a self-spatial join query. This is significantly better than using a naive model based on uniformity assumption, which gives rise to a geometric average relative error up to 270 percent and up to 85 percent for the two problems, respectively.
Analysis of Range Queries and Self Spatial Join Queries on Real Region Datasets Stored Using an R-tree
PROIETTI, GUIDO;
2000-01-01
Abstract
In this paper, we study the node distribution of an R-tree storing region data, like, for instance, islands, lakes, or human-inhabited areas. We will show that real region datasets are packed in an R-tree into minimum bounding rectangles (MBRs) whose area distribution follows the same power law, named REGAL (REGion Area Law), as that for the regions themselves. Moreover, these MBRs are packed in their turn into MBRs following the same law, and so on iteratively, up to the root of the R-tree. Based on this observation, we are able to accurately estimate the search effort for range queries, using a small number of easy-to-retrieve parameters. Furthermore, since our analysis exploits, through a realistic mathematical model, the proximity relations existing among the regions in the dataset, we show how to use our model to predict the selectivity of a self-spatial join query posed on the dataset. Experiments on a variety of real datasets (islands, lakes, human-inhabited areas) show that our estimations are accurate, enjoying a geometric average relative error ranging from 22 percent to 32 percent for the search effort of a range query, and from 14 percent to 34 percent for the selectivity of a self-spatial join query. This is significantly better than using a naive model based on uniformity assumption, which gives rise to a geometric average relative error up to 270 percent and up to 85 percent for the two problems, respectively.Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.