I am considering a proof of concept for handling large volumes of spatial data (more than 10 GB) that requires at least 200 writes per second and about 50 reads per second. The system is also growing. I am currently considering moving this data into a NoSQL, Bigtable-style database for performance reasons.
I have taken a closer look at MongoDB and Cassandra. As far as my reading goes:
MongoDB:
- seems to have a write-lock problem
- one of the posts on Stack Overflow recommended it when there is no need for multiple servers
- indexes are kept in memory, so performance is said to deteriorate as the indexes grow
- its advantage is direct support for spatial data and indexing, with features such as finding nearby locations
- the post "Cassandra Or MongoDB For Our Location Based Application" suggests MongoDB as the best choice
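To make the "finding nearby locations" feature concrete, here is a minimal sketch of a MongoDB `$near` filter, built as a plain Python dict so it runs without a server. It assumes a hypothetical collection whose `location` field stores GeoJSON points and carries a `2dsphere` index, which `$near` requires.

```python
def near_query(field: str, lon: float, lat: float, max_meters: float) -> dict:
    """Filter for documents whose `field` lies within `max_meters` of a point.

    GeoJSON coordinates are ordered [longitude, latitude].
    """
    return {
        field: {
            "$near": {
                "$geometry": {"type": "Point", "coordinates": [lon, lat]},
                # metres, for a GeoJSON point queried against a 2dsphere index
                "$maxDistance": max_meters,
            }
        }
    }


query = near_query("location", -122.4194, 37.7749, 500)
print(query)
# With pymongo and a running server (not needed for the sketch above):
#   collection.create_index([("location", pymongo.GEOSPHERE)])
#   for doc in collection.find(query): ...  # results come back nearest first
```

The field name `location`, the coordinates and the 500 m radius are illustrative assumptions only.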
Cassandra:
- seems to be the best among the related databases
- seems to have great write as well as read performance
- does not natively support spatial indexing, but this can be added via geohashing
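Geohashing interleaves longitude and latitude bits into a single base-32 string, so points close to each other usually share a common prefix. As an illustration (a sketch of the standard geohash algorithm, not the code of any particular library), encoding looks like this:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a latitude/longitude pair as a geohash string.

    Bits alternate longitude, latitude, longitude, ...; each bit halves the
    current interval, and every 5 bits become one base-32 character.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    value, nbits, is_lon = 0, 0, True
    while len(chars) < precision:
        if is_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value, lon_lo = (value << 1) | 1, mid  # upper half -> bit 1
            else:
                value, lon_hi = value << 1, mid  # lower half -> bit 0
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value, lat_lo = (value << 1) | 1, mid
            else:
                value, lat_hi = value << 1, mid
        is_lon = not is_lon
        nbits += 1
        if nbits == 5:  # 5 bits complete one character
            chars.append(BASE32[value])
            value, nbits = 0, 0
    return "".join(chars)


print(geohash_encode(42.6, -5.6, 5))  # ezs42
print(geohash_encode(57.64911, 10.40744, 7))  # u4pruyd
```

Because each extra character only refines the previous cell, a shorter geohash of a point is always a prefix of a longer one. That property is what could let a geohash, or a fixed-length prefix of it, act as an ordinary Cassandra key column.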
I lean toward MongoDB because of its good documentation and direct support for spatial data. Has anybody had a bad experience using MongoDB for systems this large? I do see a lot of posts about MongoDB iostat and performance.
If MongoDB is not suited, can someone give some pointers on geohashing with Cassandra? I saw http://code.google.com/p/geospatialweb/ for creating the hashes, but I still have questions about how to query them.
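On the querying question, one common pattern works as follows. This is a sketch under stated assumptions, not the geospatialweb project's code and not a built-in Cassandra feature. Store each point under a fixed-length geohash prefix used as the partition key. For a "nearby" query, read the cell containing the search point plus its eight neighbours, then filter the candidates by true distance. The neighbours matter because two close points can sit on opposite sides of a cell boundary and share no prefix. The in-memory `table` is a hypothetical stand-in for a Cassandra table such as `PRIMARY KEY ((geohash5), id)`. The precision of 5 and all function names are assumptions. The search radius should not exceed one cell's width (roughly 4.9 km at the equator for 5 characters, less toward the poles), or matches beyond the neighbouring cells would be missed.

```python
import math

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"
PRECISION = 5  # assumed prefix length used as the partition key


def encode(lat, lon, precision):
    """Geohash encoding: bits alternate lon/lat, 5 bits per base-32 char."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, value, nbits, is_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, x = (lon_rng, lon) if is_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if x >= mid:
            value = (value << 1) | 1
            rng[0] = mid
        else:
            value <<= 1
            rng[1] = mid
        is_lon, nbits = not is_lon, nbits + 1
        if nbits == 5:
            chars.append(BASE32[value])
            value, nbits = 0, 0
    return "".join(chars)


def cell_size(precision):
    """(height, width) of a geohash cell in degrees."""
    bits = 5 * precision
    return 180.0 / 2 ** (bits // 2), 360.0 / 2 ** ((bits + 1) // 2)


def covering_cells(lat, lon, precision):
    """Cell containing the point plus its 8 neighbours (fewer at the poles)."""
    dlat, dlon = cell_size(precision)
    cells = set()
    for dy in (-1, 0, 1):
        nlat = lat + dy * dlat
        if not -90.0 <= nlat <= 90.0:
            continue  # nothing beyond the pole
        for dx in (-1, 0, 1):
            nlon = (lon + dx * dlon + 180.0) % 360.0 - 180.0  # wrap at +/-180
            cells.add(encode(nlat, nlon, precision))
    return cells


def haversine_km(lat1, lon1, lat2, lon2):
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))  # 6371 km = mean Earth radius


# Hypothetical stand-in for a Cassandra table keyed by geohash prefix:
# each dict entry plays the role of one partition.
table = {}


def insert(point_id, lat, lon):
    table.setdefault(encode(lat, lon, PRECISION), []).append((point_id, lat, lon))


def nearby(lat, lon, radius_km):
    hits = []
    for cell in covering_cells(lat, lon, PRECISION):
        # In Cassandra this would be one "WHERE geohash5 = ?" read per cell.
        for point_id, plat, plon in table.get(cell, []):
            if haversine_km(lat, lon, plat, plon) <= radius_km:
                hits.append(point_id)
    return sorted(hits)


insert("a", 37.7749, -122.4194)
insert("b", 37.7790, -122.4100)  # roughly 0.9 km from "a"
insert("c", 34.0522, -118.2437)  # hundreds of km away
print(nearby(37.7749, -122.4194, 2.0))  # ['a', 'b']
```

The nine per-cell reads are independent, so they could be issued in parallel. A larger radius would call for a shorter prefix rather than more neighbours.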
MongoDB does have an advantage if your data model includes nested objects that require indexes, because it has better support for secondary indexes. Cassandra, by contrast, has only cursory support for secondary indexes, and they are limited to single columns and equality comparisons.
The most common way to process and analyze spatial data is with a GIS (geographic information system): a program, or a set of programs working together, that helps users make sense of their spatial data.
NoSQL databases like MongoDB and ElasticSearch are good at handling large datasets and have decent geospatial support. There are also graph databases, such as Neo4j, that handle large datasets well and support geospatial queries.
With spatial data you can discover growth insights, manage facilities and networks, and provide location information to customers. If you ignore spatial components and how they relate to your business, you increase the risk of poor results.
I realize this is an older question and that this doesn't directly answer it, but depending on your queries, Cassandra may not be the best option, and getting your queries to work with MongoDB's indexing can be problematic as well (in my own experience). IMHO, Mongo has a slight edge over Cassandra for heavy geo data and queries.
I'd suggest also looking into ElasticSearch, which, depending on the shape of your data and the types of queries you'll run, is probably the best solution. When you posted your question it was likely less of an option than it is today, though.
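For a sense of what such a query looks like, here is a minimal sketch of an Elasticsearch `geo_distance` filter built as a Python dict. It assumes a hypothetical index whose `location` field is mapped as `geo_point`. The sketch only builds the request body. Sending it (for example as the body of `POST /<index>/_search`) is left to whichever client you use.

```python
import json


def geo_distance_query(field: str, lat: float, lon: float, distance: str) -> dict:
    """Build a search body matching documents within `distance` of (lat, lon).

    `field` must be mapped as geo_point in the (hypothetical) index.
    """
    return {
        "query": {
            "bool": {
                # filter context: documents either match or not, no scoring
                "filter": {
                    "geo_distance": {
                        "distance": distance,  # a distance string such as "2km"
                        field: {"lat": lat, "lon": lon},
                    }
                }
            }
        }
    }


body = geo_distance_query("location", 37.7749, -122.4194, "2km")
print(json.dumps(body, indent=2))
```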
Try Cassandra + Solr. This might be useful: http://digbigdata.com/geospatial-search-cassandra-datastax-enterprise/
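Cassandra itself has no spatial index, so this approach relies on a Solr index over the Cassandra data (as in DataStax Enterprise Search). As a hedged sketch, Solr's spatial filter syntax `{!geofilt sfield=... pt=lat,lon d=km}` can be generated as shown below. The field name `location` is hypothetical and must be a Solr spatial field in your schema. How the filter is passed through DSE (for example via its `solr_query` mechanism) depends on the DSE version, so check its documentation.

```python
import json


def geofilt(field: str, lat: float, lon: float, km: float) -> str:
    """Solr filter matching documents within `km` kilometres of (lat, lon)."""
    # pt is "latitude,longitude"; d is the radius in kilometres
    return "{!geofilt sfield=%s pt=%s,%s d=%s}" % (field, lat, lon, km)


fq = geofilt("location", 37.7749, -122.4194, 2)
# A JSON-style Solr request: "q" matches everything, "fq" narrows it spatially.
request = json.dumps({"q": "*:*", "fq": fq})
print(request)
```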
Regards, Goutham Kumar