Accurate Modeling of Region Data

Proietti, Guido; Faloutsos, C.

Spatial data appear in numerous applications, such as GIS, multimedia, and even traditional databases. Most analyses of spatial data have focused on point data, typically under the uniformity assumption or, more accurately, a fractal distribution. However, no results exist for nonpoint spatial data, such as 2D regions (e.g., islands) and 3D volumes (e.g., physical objects in the real world). This is exactly the problem we solve in this paper. Based on experimental evidence that real areas and volumes follow a "power law," which we name REGAL (REGion Area Law), we show 1) the theoretical implications of our model and its connection with the ubiquitous fractals and 2) the first of its practical uses, namely selectivity estimation for range queries. Experiments on a variety of real data sets (islands, lakes, and human-inhabited areas) show that our method is extremely accurate, with a maximum relative error ranging from 1 to 5 percent, versus 30-70 percent for a naive model that uses the uniformity assumption.
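
As a rough illustration of what fitting a power law to region areas can look like, the sketch below estimates an exponent from a list of areas. The abstract does not state the exact form of the law, so the form used here is an assumption: the number of regions with area at least a is taken to be proportional to a^(-B). The function name `fit_power_law_exponent` and the synthetic data are hypothetical and are not taken from the paper.

```python
import math


def fit_power_law_exponent(areas):
    """Estimate B, assuming count(area >= a) is proportional to a ** (-B).

    This is an illustrative least-squares fit on a log-log scale,
    not the estimation procedure of the paper.
    """
    # Sort descending so the region at index i has rank i + 1, i.e. rank
    # equals the number of regions whose area is at least as large
    # (this assumes the areas are distinct).
    ordered = sorted(areas, reverse=True)
    xs = [math.log(a) for a in ordered]
    ys = [math.log(rank) for rank in range(1, len(ordered) + 1)]

    # Ordinary least-squares slope of log(rank) against log(area).
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx

    # Under the assumed form the slope is -B.
    return -slope


# Synthetic areas built to follow the assumed law with B = 0.8:
# the k-th largest area is k ** (-1 / B), so its rank k equals area ** (-B).
synthetic_areas = [k ** (-1 / 0.8) for k in range(1, 1001)]
estimated_b = fit_power_law_exponent(synthetic_areas)
```

On real data the log-log points would only approximately fall on a line, and how well they do is what would indicate whether a power law is a reasonable model for the data set at hand.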