Simba is a distributed in-memory spatial analytics engine built on Apache Spark. It extends the Spark SQL engine across the system stack to support rich spatial queries and analytics through both SQL and DataFrame query interfaces. In addition, Simba introduces native indexing support over RDDs to enable efficient spatial operators. It also extends Spark SQL's query optimizer with spatial-aware, cost-based optimizations that make the best use of existing indexes and statistics.
Simba extends the SQL and DataFrame query interfaces of Spark SQL, providing a natural way to express complex spatial analysis queries.
Simba supports building native (spatial) indexes over RDDs inside the kernel to achieve superior query performance over large data sets.
Simba implements efficient algorithms for different spatial operators, which are tailored to its indexing support and underlying Spark engine.
Simba introduces spatial- and index-aware optimizations to both the logical and physical optimizers of Spark SQL, and uses a cost-based optimization (CBO) module to select good query plans.
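To make the idea of index-backed spatial operators more concrete, here is a minimal single-machine sketch in plain Python. It is not Simba's API or implementation. It assumes a simple two-level layout: points are grouped into grid cells that stand in for partitions, a small table of per-partition bounding boxes prunes partitions that cannot match, and only the surviving partitions are scanned. The names `GridIndex`, `Partition`, `range_query`, and `scanned` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(box: Box, p: Point) -> bool:
    """True if point p lies inside box, boundary included."""
    return box[0] <= p[0] <= box[2] and box[1] <= p[1] <= box[3]


@dataclass
class Partition:
    """Hypothetical stand-in for one data partition."""
    points: List[Point] = field(default_factory=list)

    def bounds(self) -> Box:
        xs = [p[0] for p in self.points]
        ys = [p[1] for p in self.points]
        return (min(xs), min(ys), max(xs), max(ys))


class GridIndex:
    """Illustrative two-level index: partition bounding boxes plus local scans."""

    def __init__(self, points: List[Point], cell: float) -> None:
        self.partitions: Dict[Tuple[int, int], Partition] = {}
        for p in points:
            # Assign each point to the grid cell that contains it.
            key = (int(p[0] // cell), int(p[1] // cell))
            self.partitions.setdefault(key, Partition()).points.append(p)
        # "Global" level: one bounding box per partition.
        self.global_index: Dict[Tuple[int, int], Box] = {
            key: part.bounds() for key, part in self.partitions.items()
        }
        self.scanned = 0  # points examined by the most recent query

    def range_query(self, query: Box) -> List[Point]:
        """Return all points inside query, skipping partitions that cannot match."""
        self.scanned = 0
        result: List[Point] = []
        for key, mbr in self.global_index.items():
            if not intersects(mbr, query):
                continue  # pruned without touching the partition's data
            part = self.partitions[key]
            self.scanned += len(part.points)
            result.extend(p for p in part.points if contains(query, p))
        return sorted(result)


if __name__ == "__main__":
    pts = [(float(x), float(y)) for x in range(10) for y in range(10)]
    index = GridIndex(pts, cell=5.0)
    hits = index.range_query((0.0, 0.0, 2.0, 2.0))
    print(len(hits), index.scanned)  # prints "9 25"
```

In this toy run the query touches one of four partitions, so 25 of the 100 points are examined instead of all of them. A real engine would use a proper spatial structure (for example an R-tree) inside each partition and would let the optimizer decide when the index is worth using; both are left out here to keep the sketch short.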
Venue | Publication |
---|---|
VLDB 2017 | Distributed Trajectory Similarity Search. Dong Xie, Feifei Li, Jeff M. Phillips |
SIGMOD 2016 | Simba: Efficient In-Memory Spatial Analytics. Dong Xie, Feifei Li, Bin Yao, Gefei Li, Liang Zhou, Minyi Guo |
SIGSPATIAL 2016 | Simba: Spatial In-Memory Big Data Analysis (Demo Paper). Dong Xie, Feifei Li, Bin Yao, Gefei Li, Zhongpu Chen, Liang Zhou, Minyi Guo |