A distributed cache is a method of storing the results of common requests so that they can be retrieved quickly.
Tachyon is a memory-centric distributed storage file system that avoids going to disk to load datasets that are frequently read.
What is the difference between these two?
The main difference is in the programming paradigm. Note that by your definition, Tachyon is almost certainly a distributed cache.
Most distributed caches are some form of key-value store. Higher-level data structures can be built on top, but the core paradigm tends to be key-value.
Tachyon is designed to function as a software file system that is compatible with the HDFS interface prevalent in the big data analytics space. The point of doing this is that it can be used as a drop-in accelerator, rather than having to adapt each framework to use a distributed caching layer explicitly.
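To make the paradigm difference concrete, here is a minimal sketch in plain Python. It is a toy model, not Tachyon's or any cache's real API: `KeyValueCache`, `SlowFileSystem`, `MemoryCachedFileSystem` and `count_lines` are all hypothetical names invented for illustration. It assumes a "framework job" that only knows how to call `read(path)`.

```python
from typing import Dict, Optional


class KeyValueCache:
    """Toy key-value cache (hypothetical): callers must use get/put explicitly."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value


class SlowFileSystem:
    """Stand-in for a disk-backed store with a path-based read (hypothetical)."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self._files = files
        self.reads = 0  # how many times the backing store was actually hit

    def read(self, path: str) -> bytes:
        self.reads += 1
        return self._files[path]


class MemoryCachedFileSystem:
    """Same read(path) interface, but keeps already-read files in memory."""

    def __init__(self, backing: SlowFileSystem) -> None:
        self._backing = backing
        self._memory: Dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        if path not in self._memory:
            self._memory[path] = self._backing.read(path)
        return self._memory[path]


def count_lines(fs, path: str) -> int:
    """A 'framework job' written only against read(path)."""
    return len(fs.read(path).splitlines())


def count_lines_with_kv(cache: KeyValueCache, fs: SlowFileSystem, path: str) -> int:
    """The same job adapted by hand to a key-value cache (cache-aside)."""
    data = cache.get(path)
    if data is None:
        data = fs.read(path)
        cache.put(path, data)
    return len(data.splitlines())


disk = SlowFileSystem({"/data/events.log": b"a\nb\nc\n"})
cached = MemoryCachedFileSystem(disk)
first = count_lines(cached, "/data/events.log")
second = count_lines(cached, "/data/events.log")
```

With the file-system-style wrapper, `count_lines` is unchanged and the second call is served from memory. With the key-value cache, the job itself has to be rewritten (`count_lines_with_kv`), which is the per-framework adaptation the drop-in approach avoids.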
Note that Apache Ignite and Apache Geode (Incubating) are related projects that offer both key-value and file-system-style APIs, making them arguably more flexible.
Tachyon (now known as Alluxio) sits between the computation layer (Apache Spark, Apache Flink, Hadoop MapReduce) and the storage layer (HDFS, Amazon S3, OpenStack Swift, ...).
It is basically an in-memory file system that hides the underlying storage systems (one or several) from the user.
For the computation frameworks or jobs above it, Tachyon is the data store where the data to be processed is kept.
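The idea of one in-memory layer sitting in front of several storage systems can be sketched as follows. This is a toy model under stated assumptions, not Alluxio's actual API: `UnderStore`, `UnifiedNamespace` and the mount paths are hypothetical, and the two stores merely stand in for systems such as HDFS or S3.

```python
from typing import Dict, Tuple


class UnderStore:
    """Hypothetical stand-in for an underlying storage system."""

    def __init__(self, name: str, objects: Dict[str, bytes]) -> None:
        self.name = name
        self._objects = objects
        self.reads = 0  # how many times this store was actually hit

    def read(self, key: str) -> bytes:
        self.reads += 1
        return self._objects[key]


class UnifiedNamespace:
    """Toy in-memory layer that routes paths to mounted under-stores."""

    def __init__(self) -> None:
        self._mounts: Dict[str, UnderStore] = {}
        self._memory: Dict[str, bytes] = {}

    def mount(self, prefix: str, store: UnderStore) -> None:
        # Normalize so that "/warehouse" and "/warehouse/" behave the same.
        self._mounts[prefix.rstrip("/") + "/"] = store

    def _resolve(self, path: str) -> Tuple[UnderStore, str]:
        # The longest matching mount prefix wins.
        for prefix in sorted(self._mounts, key=len, reverse=True):
            if path.startswith(prefix):
                return self._mounts[prefix], path[len(prefix):]
        raise FileNotFoundError(path)

    def read(self, path: str) -> bytes:
        # Serve from memory when possible; otherwise load from the under-store.
        if path not in self._memory:
            store, key = self._resolve(path)
            self._memory[path] = store.read(key)
        return self._memory[path]


hdfs_like = UnderStore("hdfs", {"logs/day1.txt": b"hdfs bytes"})
s3_like = UnderStore("s3", {"images/cat.png": b"s3 bytes"})

ns = UnifiedNamespace()
ns.mount("/warehouse", hdfs_like)
ns.mount("/archive", s3_like)

a = ns.read("/warehouse/logs/day1.txt")
b = ns.read("/archive/images/cat.png")
a_again = ns.read("/warehouse/logs/day1.txt")  # served from memory
```

The job on top only sees paths in one namespace. Which backend holds the bytes, and whether they came from memory, is the layer's concern.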
Tachyon does not offer advanced distributed computing features and does not natively support SQL queries, as some distributed caches do (Apache Ignite or Hazelcast).