
Distributed alternatives to Hadoop

I have a curious question.

What are some distributed and scalable alternatives to Hadoop? I am looking for a distributed file system like HDFS that can be used as cheap and effective storage, with a data processing engine (batch or real-time) on top of it. I know Spark can be a good alternative, but I would like to use this system as a file archive that is distributed, fault tolerant, and scalable. Are there any suitable solutions? Suggestions are welcome. Thanks :)

Sachin asked Aug 17 '16 05:08





2 Answers

Here are some other alternatives to Hadoop and Apache Spark: Cluster MapReduce and Hydra. Both are relatively good for big data projects. Read more here: https://datafloq.com/read/Big-Data-Hadoop-Alternatives/1135

Frank Odoom answered Sep 28 '22 09:09



If you are still looking into alternatives, this Gigaom article may help: https://gigaom.com/2012/07/11/because-hadoop-isnt-perfect-8-ways-to-replace-hdfs/ Note that Spark typically writes to HDFS by default when it runs on a Hadoop cluster.

Since HDFS is an open source alternative to GFS (Google FS), you can use a connector to GFS (Google FS is available through the Google Cloud Platform storage services). There is a catch: it is expensive for massive data transfers between nodes/clusters. Also, Hadoop was designed for less dynamic data, not for real-time data.
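To make the connector idea a bit more concrete: Hadoop-compatible tools usually pick the storage backend from the URI scheme of the path, for example `hdfs://` for an HDFS cluster and `gs://` for Google Cloud Storage through its connector. Below is a minimal sketch of that dispatch in plain Python. The `BACKENDS` table and the `describe_backend` helper are hypothetical names made up for illustration, not part of Hadoop or Spark, and the sketch only classifies paths without talking to any real storage.

```python
from urllib.parse import urlparse

# Hypothetical table mapping a URI scheme to the storage backend it selects.
# The schemes are the ones commonly used by Hadoop-compatible tools; the
# descriptions are illustrative only.
BACKENDS = {
    "hdfs": "HDFS cluster",
    "gs": "Google Cloud Storage (via a connector)",
    "file": "local file system",
}


def describe_backend(uri: str) -> str:
    """Return a description of the backend that a path's scheme points to."""
    # A path without a scheme (e.g. "/tmp/data") is treated as a local file here;
    # that fallback is an assumption of this sketch.
    scheme = urlparse(uri).scheme or "file"
    if scheme not in BACKENDS:
        raise ValueError(f"no connector registered for scheme {scheme!r}")
    return BACKENDS[scheme]


if __name__ == "__main__":
    # Example paths are made up; the host, port, and bucket names are placeholders.
    for path in ("hdfs://namenode:8020/archive/logs", "gs://my-bucket/archive/logs"):
        print(path, "->", describe_backend(path))
```

The practical point is that a job written against paths can often switch from HDFS to another store by changing the path prefix, provided the matching connector is installed and configured on the cluster.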

  • MapR claims to be 20% faster than regular HDFS (but the underlying FS is HDFS): https://mapr.com/why-mapr/
  • NetApp has an alternative to HDFS as well: http://www.netapp.com/us/solutions/applications/big-data-analytics/index.aspx

All of the above links are from the Gigaom article I shared. I hope this helps somehow.

P.M answered Sep 28 '22 09:09
