I recently started using Scribe, Facebook's solution for collecting and transferring log data from many different servers.
What I could not find is how Facebook stores the huge volume of log data it collects (25 TB per day in 2009, according to a presentation).
Has Facebook released any information on how they do it? Hadoop HDFS? Cassandra?
They use Hive on top of Hadoop: the log data lands in HDFS, and Hive provides a SQL-like query layer over it. Cassandra is used for their email/messaging, not logging. Some links:
https://developers.facebook.com/opensource/
http://highscalability.com/blog/2008/11/24/product-scribe-facebooks-scalable-logging-system.html
http://wiki.apache.org/hadoop/Hive