Hadoop on Cassandra database

I am using Cassandra to store my data and Hive to process it. I have set up Cassandra on 5 machines and use 2 more machines as analytics nodes (where Hive runs). My question: does Hive run MapReduce only on the two analytics nodes and bring the data there, or does it also move the computation to the 5 Cassandra nodes and process the data on those machines? (As far as I know, in Hadoop the process moves to the data, not the data to the process.)

asked Feb 12 '13 07:02

Aashish Katta

1 Answer

If you are interested in marrying Hadoop and Cassandra, the first place to look is DataStax, a company built around this concept: http://www.datastax.com/. They build and support a Hadoop distribution in which HDFS is replaced with Cassandra. To the best of my understanding, they do have data locality: http://blog.octo.com/en/introduction-to-datastax-brisk-an-hadoop-and-cassandra-distribution/

There is a good answer about Hadoop and Cassandra data locality when running MapReduce against Cassandra: "Cassandra and MapReduce - minimal setup requirements".
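To make the idea of data locality concrete, here is a minimal sketch. It is not how the Hadoop scheduler is implemented; it only counts how many input splits could be processed by a worker running on a host that already holds the data. The host names and the replica layout (each split held by one node and its neighbour) are hypothetical.

```python
def count_local_splits(split_hosts, worker_hosts):
    """Count input splits that have at least one replica on a worker host."""
    workers = set(worker_hosts)
    return sum(1 for replicas in split_hosts if workers & set(replicas))


# Hypothetical host names matching the setup in the question.
cassandra_nodes = ["cass1", "cass2", "cass3", "cass4", "cass5"]
analytics_nodes = ["hive1", "hive2"]

# Assumed layout: one split per Cassandra node, replicated on the next node.
splits = [
    [cassandra_nodes[i], cassandra_nodes[(i + 1) % len(cassandra_nodes)]]
    for i in range(len(cassandra_nodes))
]

# Workers only on the analytics nodes: no split can be read locally.
separate = count_local_splits(splits, analytics_nodes)

# Workers on the Cassandra nodes themselves: every split has a local worker.
colocated = count_local_splits(splits, cassandra_nodes)

print("local splits with separate analytics nodes:", separate)
print("local splits with co-located workers:", colocated)
```

With workers only on the two analytics nodes, every split has to be pulled over the network; a task can be data-local only if a worker runs on a host that stores the data.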

Regarding your question, there is a tradeoff:
a) If you run Hadoop / Hive on separate nodes, you lose data locality, and therefore your data throughput is limited by your network bandwidth.
b) If you run Hadoop / Hive on the same nodes as Cassandra, you can get data locality, but the MapReduce processing behind Hive queries might clog your network (and other resources) and therefore degrade the quality of service you get from Cassandra.
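For option (a), a rough back-of-the-envelope estimate shows what "limited by your network bandwidth" can mean. The sketch below assumes the network links of the reading nodes are the only bottleneck; the data size and link speed are made-up illustration values, not measurements from any real cluster.

```python
def network_bound_scan_seconds(data_gb, link_gbit_per_s, reader_nodes):
    """Lower bound on full-scan time when all data crosses the readers' links."""
    total_bytes = data_gb * 1e9
    # Convert Gbit/s to bytes/s and sum over all reading nodes.
    bytes_per_second = link_gbit_per_s * 1e9 / 8 * reader_nodes
    return total_bytes / bytes_per_second


# Hypothetical numbers: 500 GB scanned by 2 analytics nodes with 1 Gbit/s links.
estimate = network_bound_scan_seconds(data_gb=500, link_gbit_per_s=1, reader_nodes=2)
print(f"network-bound scan time: about {estimate / 60:.0f} minutes")
```

Under these assumptions a full scan cannot finish faster than about 33 minutes, however much CPU the analytics nodes have. Real throughput would also depend on disks, serialization, and Cassandra read performance.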

My suggestion is to use separate Hive nodes if the performance of your Cassandra cluster is critical.
If your Cassandra cluster is mostly used as a data store and does not handle real-time requests, then running Hive on each node will improve performance and hardware utilization.

answered Sep 23 '22 21:09

David Gruzman

Tags: cassandra, hadoop, hive