Class ExtendedQuadTree<T>

  • All Implemented Interfaces:
    Serializable

    public class ExtendedQuadTree<T>
    extends PartitioningUtils
    implements Serializable
    The ExtendedQuadTree class uses a modified quad-tree approach for partitioning spatial data, as described in "Simba: Efficient In-Memory Spatial Analytics".

    In this approach, a global R-tree index is constructed by taking a set of random samples from the dataset and building the R-tree on the master node. For each partition, the distance from the furthest point in the partition to its centroid is calculated. Using the R-tree, the k-nearest neighbors of each centroid are found, and a distance bound is derived for each partition. This bound ensures that the k-nearest neighbors of any point in a partition can be found within a subset of the data identified by a circle range query centered at the centroid with the derived radius. This method guarantees efficient and accurate k-nearest neighbor joins by leveraging both local and global spatial indexing.

    See Also:
    Serialized Form
    • Constructor Detail

      • ExtendedQuadTree

        public ExtendedQuadTree​(org.locationtech.jts.geom.Envelope boundary,
                                int numPartitions)
        Constructor to initialize the partitions list.
        Parameters:
        boundary -
        numPartitions -
      • ExtendedQuadTree

        public ExtendedQuadTree​(ExtendedQuadTree<?> extendedQuadTree,
                                boolean useNonOverlapped)
        Constructor to initialize the partitions list with non-overlapped boundaries.
        Parameters:
        extendedQuadTree -
        useNonOverlapped -
    • Method Detail

      • getExpandedBoundaries

        public HashMap<Integer,​List<org.locationtech.jts.geom.Envelope>> getExpandedBoundaries()
      • getSpatialExpandedBoundaryIndex

        public org.locationtech.jts.index.strtree.STRtree getSpatialExpandedBoundaryIndex()
      • getBoundary

        public org.locationtech.jts.geom.Envelope getBoundary()
        Returns the boundary of the partition zones.
        Returns:
      • getPartitionNum

        public int getPartitionNum()
        Returns the number of partitions.
        Returns:
      • insert

        public void insert​(org.locationtech.jts.geom.Envelope sample)
        Insert a new sample into the sample list.
        Parameters:
        sample - Envelope object to be inserted.
      • placeObject

        public Iterator<scala.Tuple2<Integer,​org.locationtech.jts.geom.Geometry>> placeObject​(org.locationtech.jts.geom.Geometry geometry)
        Check the geometry against the partition zones to find the IDs of overlapping ranges. Note that the geometry can be in multiple ranges because ranges can overlap.
        Specified by:
        placeObject in class PartitioningUtils
        Parameters:
        geometry - Geometry object to be placed.
        Returns:
        Iterator of Tuple2 containing partition ID and the corresponding geometry.
      • getKeys

        public Set<Integer> getKeys​(org.locationtech.jts.geom.Geometry geometry)
        Check the geometry against the partition zones to find the IDs of overlapping ranges. Only returns the IDs of the overlapping partitions. Note that the geometry can be in multiple ranges because ranges can overlap.
        Specified by:
        getKeys in class PartitioningUtils
        Parameters:
        geometry - Geometry object to be checked.
        Returns:
        Set of integers representing the IDs of the overlapping partitions.
      • build

        public void build​(int neighborSampleNumber)
                   throws Exception
        Builds the quad-tree partitioning structure and calculates the expanded boundaries for efficient spatial partitioning.

        This method performs the following steps: 1. Forces the quad-tree to grow to a minimum level to ensure a sufficient number of partitions, which might slightly differ from the specified number. 2. Constructs the quad-tree partitioning using the given samples and boundary, initializing it to the calculated minimum level. 3. Creates the expanded boundaries by building an STR (Sort-Tile-Recursive) tree using the provided number of neighbor samples and sampling probability. 4. Clears the samples to avoid broadcasting them to all nodes involved in partitioning.

        Parameters:
        neighborSampleNumber - the number of neighbor samples to consider for building the STR tree.
        Throws:
        Exception