 

Difference Between Apache Spark SQL and MongoDB? [closed]

My only experience is with the RDBMS PostgreSQL, but I'm new to Apache Spark and MongoDB, so I have the following questions:

1) What is the difference between Apache Spark SQL and MongoDB?
2) In what kinds of places/scenarios/domains should I use Spark SQL, MongoDB, or the two combined?
3) Is Apache Spark a replacement for the likes of MongoDB, Cassandra, etc.?
4) I have multiple terabytes of data in MongoDB on which I want to do data analytics and then provide reports.

Please share your knowledge and give me your inputs.

Regards
Shankar S

asked Sep 23 '16 by Shankar S


2 Answers

1) Apache Spark: Apache Spark is a computing engine for running parallel operations on big data, and Spark SQL is the part of it that lets you express those operations as SQL queries.

MongoDB: MongoDB is a document store and essentially a database, so it cannot be directly compared with Spark, which is a computing engine and not a store.

2) Spark SQL is ideal for processing structured data that has been loaded into the Spark cluster, where you have millions of records and heavy computation to run. MongoDB is the better fit where you need NoSQL functionality (it has full NoSQL capabilities, unlike Spark SQL). A minimal Spark SQL sketch follows.
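Here is a minimal sketch of what "processing structured data with Spark SQL" looks like in PySpark. The file name and column names (orders.csv, customer_id, amount) are illustrative assumptions, not part of the original question.

```python
# Minimal Spark SQL sketch: load structured data, query it with SQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sparksql-example").getOrCreate()

# Load structured data (a CSV with a header row) into a DataFrame.
orders = spark.read.csv("orders.csv", header=True, inferSchema=True)
orders.createOrReplaceTempView("orders")

# The aggregation runs in parallel across the cluster's executors.
totals = spark.sql("""
    SELECT customer_id, SUM(amount) AS total_spent
    FROM orders
    GROUP BY customer_id
""")
totals.show()
```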

3) No, Apache Spark serves a different purpose, so you cannot replace MongoDB or Cassandra with it. It is a computing engine that produces results (including predictive analytics) over large data sets.

4) Use a third-party service like SlamData (http://slamdata.com/) to run analytics directly on MongoDB, or use a Spark DataFrame to read the data in from MongoDB, as in the sketch below.
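A rough sketch of reading a MongoDB collection into a Spark DataFrame with the MongoDB Spark Connector. The connection URI, database, and collection names are assumptions, and the exact config key and format name depend on the connector version you use.

```python
from pyspark.sql import SparkSession

# Config key and format name below are for mongo-spark-connector 2.x/3.x;
# newer connector versions use spark.mongodb.read.connection.uri and "mongodb".
spark = (SparkSession.builder
         .appName("mongo-read")
         .config("spark.mongodb.input.uri",
                 "mongodb://localhost:27017/salesdb.orders")  # assumed URI
         .getOrCreate())

# The connector infers a schema by sampling documents in the collection.
df = spark.read.format("mongo").load()
df.printSchema()

# Once loaded, the collection can be queried like any other Spark table.
df.createOrReplaceTempView("orders")
spark.sql("SELECT status, COUNT(*) AS n FROM orders GROUP BY status").show()
```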

answered Sep 25 '22 by Wasiq Muhammad


These are broad questions about quite different things, but I will make an attempt to answer them:

1) What is the difference between Apache Spark SQL and MongoDB?

Spark SQL is a library provided by Apache Spark for running parallel computing operations over big data using SQL queries. MongoDB is a document store, essentially a database, so it cannot be compared with Spark, which is a computing engine and not a store. The sketch below makes the distinction concrete.
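An illustrative contrast (the connection string, database, collection, and field names are assumptions): MongoDB both stores the documents and answers queries against them itself.

```python
# Illustrative only: querying MongoDB in place with pymongo.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # assumed local server
orders = client["salesdb"]["orders"]                 # assumed db/collection

# The database evaluates the filter against its own stored documents.
for doc in orders.find({"status": "pending"}).limit(5):
    print(doc)
```

Spark, by contrast, stores nothing itself; it has to load data from some store (MongoDB, HDFS, a CSV file, and so on) before it can run SQL over it.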

2) In what kinds of places/scenarios/domains should I use Spark SQL, MongoDB, or the two combined?

Spark SQL is ideal for processing structured data that has been imported into the Spark cluster. MongoDB is ideal where you need NoSQL functionality (it has full NoSQL capabilities, unlike Spark SQL).

3) Is Apache Spark a replacement for the likes of MongoDB, Cassandra, etc.?

Not exactly, since they have different scopes. Apache Spark does not replace them; rather, it can be seen as the successor to MapReduce for parallel computing on big data sets.

4) I have multiple terabytes of data in MongoDB on which I want to do data analytics and then provide reports.

Use a Spark DataFrame to read the data in from MongoDB (typically via the MongoDB Spark Connector rather than a JDBC driver), run Spark SQL queries on that DataFrame, and then use visualization tools such as matplotlib's pyplot to generate the reports. A sketch of the whole pipeline follows.
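A minimal end-to-end sketch under those assumptions. The connection URI, database, collection, and field names (order_date, amount) are illustrative, and the exact connector config key and format name depend on the mongo-spark-connector version.

```python
# Sketch: MongoDB -> Spark DataFrame -> Spark SQL -> matplotlib report.
import matplotlib.pyplot as plt
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("mongo-reporting")
         # Config key for mongo-spark-connector 2.x/3.x; newer versions
         # use spark.mongodb.read.connection.uri instead.
         .config("spark.mongodb.input.uri",
                 "mongodb://localhost:27017/salesdb.orders")  # assumed URI
         .getOrCreate())

orders = spark.read.format("mongo").load()
orders.createOrReplaceTempView("orders")

# The heavy aggregation runs in parallel on the Spark cluster.
monthly = spark.sql("""
    SELECT date_format(order_date, 'yyyy-MM') AS month,
           SUM(amount)                        AS revenue
    FROM orders
    GROUP BY date_format(order_date, 'yyyy-MM')
    ORDER BY month
""")

# Only the small, aggregated result is pulled back to the driver for plotting.
report = monthly.toPandas()
report.plot(x="month", y="revenue", kind="bar", legend=False)
plt.ylabel("revenue")
plt.tight_layout()
plt.savefig("monthly_revenue.png")
```

The point of the design is that MongoDB holds the terabytes, Spark does the distributed aggregation, and only the small summarized result is brought back to a single machine for the report.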

Thanks,

Charles.

answered Sep 22 '22 by charles gomes