
What is the difference between a distributed cache and Tachyon?

A distributed cache is a mechanism that stores the results of common requests so they can be retrieved quickly.

Tachyon is a memory-centric distributed storage file system that avoids going to disk to load datasets that are frequently read.

What is the difference between these two?

Venu A Positive asked Dec 19 '22 00:12



2 Answers

The main difference is in the programming paradigm. Note that by your definition, Tachyon is almost certainly a distributed cache.

Most distributed caches are some form of key-value store. Higher-level data structures can be built atop this, but the core paradigm tends to be key-value.
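To make the key-value paradigm concrete, here is a minimal single-process sketch. `KeyValueCache` and `load_user` are hypothetical names invented for illustration, not the API of any real cache product; the point is only that the application code has to call `get` and `put` on the cache explicitly.

```python
from typing import Callable, Dict, Optional


class KeyValueCache:
    """Toy single-process stand-in for a distributed key-value cache (hypothetical)."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._store[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Returns None on a cache miss.
        return self._store.get(key)


def load_user(cache: KeyValueCache, user_id: int,
              slow_lookup: Callable[[int], bytes]) -> bytes:
    """Cache-aside lookup: the caller talks to the cache explicitly."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = slow_lookup(user_id)  # e.g. a database query in a real system
    cache.put(key, value)
    return value
```

Every framework or application that wants the speed-up has to be written against this kind of interface.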

Tachyon is designed to function as a software file system that is compatible with the HDFS interface prevalent in the big data analytics space. The point of doing this is that it can be used as a drop-in accelerator, rather than having to adapt each framework to use a distributed caching layer explicitly.
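The "drop-in" idea can be sketched as follows. This is not Tachyon's actual implementation or API: `FileSystem`, `SlowDiskFileSystem`, `InMemoryCachingFileSystem` and `count_lines` are hypothetical names chosen for illustration. The sketch only shows that when the caching layer exposes the same file system interface as the store beneath it, the job code does not change.

```python
from typing import Dict, Protocol


class FileSystem(Protocol):
    """The only interface the 'framework' below knows about (hypothetical)."""

    def read(self, path: str) -> bytes: ...


class SlowDiskFileSystem:
    """Stand-in for a disk-backed store such as HDFS."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self._files = files
        self.reads = 0  # counts how often the backing store is hit

    def read(self, path: str) -> bytes:
        self.reads += 1
        return self._files[path]


class InMemoryCachingFileSystem:
    """Same interface as the backing store, but serves repeat reads from memory."""

    def __init__(self, backing: FileSystem) -> None:
        self._backing = backing
        self._memory: Dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        if path not in self._memory:
            self._memory[path] = self._backing.read(path)
        return self._memory[path]


def count_lines(fs: FileSystem, path: str) -> int:
    """A 'job' written only against the file system interface."""
    return len(fs.read(path).splitlines())
```

`count_lines` works unchanged whether it is handed the disk-backed store or the caching wrapper, which is what separates this approach from an explicit key-value caching layer.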

Note that Apache Ignite and Apache Geode (Incubating) are related projects that offer both key-value and file-system-style APIs, making them arguably more flexible.

RobV answered Jan 18 '23 04:01



Tachyon (now known as Alluxio) sits between the computation layer (Apache Spark, Apache Flink, Apache MapReduce) and the storage layer (HDFS, Amazon S3, OpenStack Swift, ...).

It is basically an in-memory file system that hides the underlying storage systems (one or several) from the user.
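One way to picture this abstraction over several storage systems is a mount table that routes paths to different backends and keeps whatever has been read in memory. The sketch below is an assumption-laden toy, not how Tachyon/Alluxio is implemented: `UnifiedNamespace`, `mount` and the reader callbacks are hypothetical names, and plain dictionaries stand in for HDFS and S3.

```python
from typing import Callable, Dict

# A reader takes a path relative to its mount point and returns the bytes.
Reader = Callable[[str], bytes]


class UnifiedNamespace:
    """Toy in-memory layer over several under-storage systems (hypothetical)."""

    def __init__(self) -> None:
        self._mounts: Dict[str, Reader] = {}
        self._memory: Dict[str, bytes] = {}

    def mount(self, prefix: str, reader: Reader) -> None:
        # Normalise so that "/hdfs" and "/hdfs/" are the same mount point.
        self._mounts[prefix.rstrip("/") + "/"] = reader

    def read(self, path: str) -> bytes:
        if path in self._memory:
            return self._memory[path]  # served from memory, no backend access
        matches = [p for p in self._mounts if path.startswith(p)]
        if not matches:
            raise FileNotFoundError(path)
        prefix = max(matches, key=len)  # longest matching mount point wins
        data = self._mounts[prefix](path[len(prefix):])
        self._memory[path] = data
        return data
```

A job reading `/hdfs/...` or `/s3/...` through such a layer sees a single namespace and does not need to know which backend actually holds the data.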

For the computation frameworks or jobs above it, Tachyon is the storage where the data to be processed is kept.

It does not offer advanced distributed computing features, and it does not natively support SQL queries the way some distributed caches (Apache Ignite or Hazelcast) do.

opuertas answered Jan 18 '23 04:01
