
Is there a reason that Cassandra doesn't have Geospatial support?

Cassandra is based on the Dynamo paper (a distributed, self-balancing hash table) plus BigTable, and there are spatial indexes that would fit nicely into that paradigm (quadkey or geohash). Is there a reason that geospatial support hasn't been implemented?

You could add a GeoPoint datatype as a tuple with an internal geohash and mark a CF as containing geo data. From there you could choose whether the geo data acts as a secondary index or as a denormalized SCF. That could lay the groundwork for geospatial development, and you could start with low-hanging fruit such as .nearby(), which could simply return the columns that share the same geohash. (I know that wouldn't give you the "nearest" columns: for that you'd have to walk the surrounding geohashes, or use a shape and a space-filling curve, which could be implemented later. It is still a useful general operation for finding some nearby columns.)
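As a rough illustration of the .nearby() idea, here is a minimal sketch in Python. It is not Cassandra code: `GeoIndex` is a hypothetical in-memory stand-in for a CF keyed by geohash, and `geohash_encode` is a plain implementation of the standard geohash encoding. As noted above, points that sit just across a cell boundary are missed.

```python
from collections import defaultdict

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision=6):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # geohash interleaves bits, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


class GeoIndex:
    """Hypothetical in-memory stand-in for a CF keyed by geohash."""

    def __init__(self, precision=6):
        self.precision = precision
        self._cells = defaultdict(list)

    def insert(self, name, lat, lon):
        cell = geohash_encode(lat, lon, self.precision)
        self._cells[cell].append(name)

    def nearby(self, lat, lon):
        # Returns only entries in the same cell, not the true nearest ones.
        cell = geohash_encode(lat, lon, self.precision)
        return list(self._cells.get(cell, []))


index = GeoIndex(precision=6)
index.insert("a", 57.64911, 10.40744)
index.insert("b", 57.6490, 10.4080)
index.insert("far", 40.0, -74.0)
print(index.nearby(57.64911, 10.40744))
```

A lower precision gives larger cells, so the choice of precision trades result-set size against how many neighbouring cells a later "nearest" query would have to walk.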

I know SimpleGeo/Urban Airship built geo support into Cassandra, but it doesn't look like that was ever open-sourced. Also, let me know if there's a better place to ask this (Quora, mailing lists, etc.).

agentargo asked Apr 04 '14 16:04




1 Answer

I think there are two parts to the answer.

The reason it's not there is that nobody who commits code to Cassandra has thought of this feature, or considered it a high enough priority to spend major time on. Most Cassandra development is done by DataStax, and they, being a commercial entity, are privy to user demands and suggestions and are pretty pragmatic about which new features will give them the most ROI.

If there were a good enough third-party developer (or team) with enough time on their hands, this could be done, and conceptually the C* committers would likely have no objection to adding a major feature like this.

The second aspect is that Cassandra supports blobs (byte arrays), which means that what you're describing can be implemented in the client app/driver in a relatively straightforward manner. The driver would in that case be responsible for translating geo calls into the appropriate raw byte operations. I also suspect this would be less work than supporting a whole new data primitive, with its own set of operators, in the core storage engine.
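To make the client-side approach concrete, here is a minimal sketch in Python of how an application could turn a geo point into a byte array suitable for a blob column, and back. The helper names `encode_geopoint` and `decode_geopoint` and the 16-byte layout (two big-endian doubles) are assumptions for illustration, not part of any Cassandra driver.

```python
import struct

# Assumed layout: latitude then longitude, each a big-endian 64-bit float.
_GEOPOINT = struct.Struct(">dd")


def encode_geopoint(lat, lon):
    """Pack a latitude/longitude pair into 16 bytes for a blob column."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude or longitude out of range")
    return _GEOPOINT.pack(lat, lon)


def decode_geopoint(blob):
    """Unpack a 16-byte blob back into a (lat, lon) tuple."""
    return _GEOPOINT.unpack(blob)


blob = encode_geopoint(57.64911, 10.40744)
print(len(blob), decode_geopoint(blob))
```

The resulting bytes would be written and read through whatever blob support the driver in use already provides; any geo operations, such as grouping by geohash, would live entirely in the client.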

Daniel S. answered Sep 25 '22 15:09
