Spatial In-Memory Big Data Analytics

Simba is a distributed in-memory spatial analytics engine based on Apache Spark. It extends the Spark SQL engine across the system stack to support rich spatial queries and analytics through both SQL and DataFrame query interfaces. In addition, Simba introduces native indexing support over RDDs, which it uses to build efficient spatial operators. It also extends Spark SQL's query optimizer with spatial-aware and cost-based optimizations to make the best use of existing indexes and statistics.

Core Features

SQL & DataFrame API

Simba extends the SQL and DataFrame query interfaces of Spark SQL, providing a natural way to express complex spatial analysis queries.

Index over RDDs

Simba supports building native (spatial) indexes over RDDs inside the kernel to achieve superior query performance over large data sets.
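
To illustrate why a spatial index helps, here is a minimal, self-contained Python sketch of a uniform grid index answering a circle range query. It is not Simba's index structure or API: the class `GridIndex`, the function `scan_range`, and the `cell_size` parameter are hypothetical names chosen for this example. The idea it shows is general: an index lets a query inspect only the cells that can contain results instead of every point.

```python
from collections import defaultdict
from math import floor


class GridIndex:
    """Hypothetical uniform grid over 2-D points (illustration only)."""

    def __init__(self, points, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)
        for (x, y) in points:
            self.cells[self._cell(x, y)].append((x, y))

    def _cell(self, x, y):
        # Map a coordinate to the integer id of the grid cell containing it.
        return (floor(x / self.cell_size), floor(y / self.cell_size))

    def circle_range(self, cx, cy, r):
        """Return all points within distance r of (cx, cy)."""
        lo_i, lo_j = self._cell(cx - r, cy - r)
        hi_i, hi_j = self._cell(cx + r, cy + r)
        hits = []
        # Visit only the cells overlapping the circle's bounding box.
        for i in range(lo_i, hi_i + 1):
            for j in range(lo_j, hi_j + 1):
                for (x, y) in self.cells.get((i, j), ()):
                    if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                        hits.append((x, y))
        return hits


def scan_range(points, cx, cy, r):
    """Baseline without an index: test every point."""
    return [(x, y) for (x, y) in points
            if (x - cx) ** 2 + (y - cy) ** 2 <= r * r]


points = [(1.0, 1.0), (2.5, 2.0), (9.0, 9.0), (3.0, 4.0), (-1.0, 0.5)]
index = GridIndex(points, cell_size=2.0)
indexed_hits = sorted(index.circle_range(2.0, 2.0, 2.5))
scanned_hits = sorted(scan_range(points, 2.0, 2.0, 2.5))
```

Both lookups return the same points; the indexed version simply skips cells that lie outside the query's bounding box.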

Efficient Algorithms

Simba implements efficient algorithms for different spatial operators, which are tailored to its indexing support and the underlying Spark engine.
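
As one example of a spatial operator on partitioned data, the sketch below shows a generic two-phase k-nearest-neighbor (kNN) search: each partition computes its local top k, and the local candidates are then merged into a global top k. This is a textbook pattern written in plain Python, not the algorithm Simba implements. The function names `local_knn` and `knn` are hypothetical, and the partitions are ordinary lists standing in for distributed data.

```python
import heapq


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def local_knn(partition, query, k):
    """Phase 1: the k points of one partition closest to the query."""
    return heapq.nsmallest(k, partition, key=lambda p: sq_dist(p, query))


def knn(partitions, query, k):
    """Phase 2: merge per-partition candidates into the global k nearest."""
    candidates = []
    for part in partitions:
        candidates.extend(local_knn(part, query, k))
    return heapq.nsmallest(k, candidates, key=lambda p: sq_dist(p, query))


partitions = [
    [(0, 0), (5, 5)],
    [(1, 1), (10, 10)],
    [(2, 2)],
]
nearest = knn(partitions, query=(0, 0), k=2)
```

The merge step only ever sees at most k candidates per partition, so the amount of data gathered in one place stays small regardless of partition size. An index-aware variant could additionally skip partitions whose bounding region is farther away than the current k-th candidate.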

Query Optimizations

Simba introduces spatial and index-aware optimizations to both the logical and physical optimizers of Spark SQL, and uses a cost-based optimization (CBO) module to select good query plans.
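
The following Python sketch illustrates the general idea behind cost-based plan selection: estimate a cost for each candidate plan from table statistics and pick the cheapest. It does not reproduce Simba's cost model. The names `TableStats`, `estimate_costs`, and `choose_plan`, as well as the constants `index_lookup_cost` and `random_access_penalty`, are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    """Hypothetical statistics an optimizer might keep for a table."""
    row_count: int
    has_spatial_index: bool


def estimate_costs(stats, selectivity,
                   index_lookup_cost=50.0, random_access_penalty=4.0):
    """Return an estimated cost per candidate plan (toy cost model)."""
    costs = {"full_scan": float(stats.row_count)}
    if stats.has_spatial_index:
        # Index scan: fixed lookup overhead plus a per-row cost for the
        # fraction of rows expected to match the spatial predicate.
        matching_rows = selectivity * stats.row_count
        costs["index_scan"] = (index_lookup_cost
                               + random_access_penalty * matching_rows)
    return costs


def choose_plan(stats, selectivity):
    """Pick the plan with the lowest estimated cost."""
    costs = estimate_costs(stats, selectivity)
    return min(costs, key=costs.get)


indexed_table = TableStats(row_count=1_000_000, has_spatial_index=True)
selective_plan = choose_plan(indexed_table, selectivity=0.001)
broad_plan = choose_plan(indexed_table, selectivity=0.5)
```

Under this toy model, a highly selective predicate favors the index scan, while a predicate matching half the table favors a full scan, because the per-row penalty of index access outweighs the rows it avoids reading.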

Publications

Venue	Publication	Authors	Link
VLDB 2017	Distributed Trajectory Similarity Search	Dong Xie, Feifei Li, Jeff M. Phillips	pdf
SIGMOD 2016	Simba: Efficient In-Memory Spatial Analytics	Dong Xie, Feifei Li, Bin Yao, Gefei Li, Liang Zhou, Minyi Guo	pdf
SIGSPATIAL 2016	Simba: Spatial In-Memory Big Data Analysis (Demo Paper)	Dong Xie, Feifei Li, Bin Yao, Gefei Li, Zhongpu Chen, Liang Zhou, Minyi Guo	pdf