Maybe this question has been asked before, but I think it is worth considering again today, given that these technologies have matured. We're looking to use Flume, Kafka, Scribe, or something similar to store streaming Facebook and Twitter profile information into HBase for doing analytics later on. We're leaning toward Flume, but I haven't worked with the other technologies enough to make an informed decision. Anyone who can shed some light on this would be great! Thanks a lot.
Kafka can support data streams for multiple applications, whereas Flume is specific to Hadoop and big-data analysis. Kafka can process and monitor data in distributed systems, whereas Flume gathers data from distributed systems and lands it on a centralized data store. Kafka runs as a cluster that handles high-volume incoming data streams in real time, while Flume is a tool for collecting log data from distributed web servers.
One of Kafka's best features is that it is highly available, resilient to node failures, and supports automatic recovery. Flume, on the other hand, is designed mainly for Hadoop and is part of the Hadoop ecosystem; it is used to collect data from different sources and transfer it to a centralized data store.
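To get a feel for the Kafka side, here is a minimal Java producer sketch that publishes one profile record to a topic. The broker address, topic name (`profiles`), record key, and JSON payload are all placeholder assumptions for illustration, not anything Kafka prescribes:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProfileProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Address of at least one broker in the Kafka cluster -- adjust for your setup.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Hypothetical profile payload; in practice this would come from the Twitter/Facebook API.
            String profileJson = "{\"id\":\"12345\",\"name\":\"example\",\"followers\":42}";
            // Keying by profile id sends updates for the same profile to the same partition.
            producer.send(new ProducerRecord<>("profiles", "12345", profileJson));
        }
    }
}
```

Multiple downstream consumers (an HBase writer, a real-time dashboard, and so on) can then read the same topic independently, which is the "multiple applications" property mentioned above.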
Apache Flume is the right choice mainly when you have to collect and move huge volumes of log data generated by web servers into Hadoop HDFS, for example to feed analytics such as sentiment analysis.
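If you go the Flume route, the pipeline is usually wired up declaratively rather than in code. Below is a rough sketch of an agent configuration using Flume's bundled Twitter source and HBase sink; the agent name, credentials, table, and column family are placeholders, and a real deployment would likely swap the simple serializer shown here for a custom one that parses the event body into proper columns:

```
# Hypothetical Flume agent: Twitter source -> memory channel -> HBase sink
agent.sources = twitter
agent.channels = mem
agent.sinks = hbase

agent.sources.twitter.type = org.apache.flume.source.twitter.TwitterSource
agent.sources.twitter.consumerKey = YOUR_CONSUMER_KEY
agent.sources.twitter.consumerSecret = YOUR_CONSUMER_SECRET
agent.sources.twitter.accessToken = YOUR_ACCESS_TOKEN
agent.sources.twitter.accessTokenSecret = YOUR_ACCESS_TOKEN_SECRET
agent.sources.twitter.channels = mem

agent.channels.mem.type = memory
agent.channels.mem.capacity = 10000

agent.sinks.hbase.type = hbase
agent.sinks.hbase.table = profiles
agent.sinks.hbase.columnFamily = raw
agent.sinks.hbase.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
agent.sinks.hbase.channel = mem
```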
MediaWiki (Wikipedia) went through this exact exercise and published a nice article on how they arrived at their choice (Kafka) over Scribe, Flume, and others.
http://www.mediawiki.org/wiki/Analytics/Kraken/Request_Logging
new link:
https://wikitech.wikimedia.org/wiki/Analytics/Archive/Hadoop_Logging_-_Solutions_Recommendation
summary for posterity:
"Our recommendation is Apache Kafka, a distributed pub-sub messaging system designed for throughput. We evaluated about a dozen[1] best-of-breed systems drawn from the domains of distributed log collection, CEP / stream processing, and real-time messaging systems. While these systems offer surprisingly similar features, they differ substantially in implementation, and each is specialized to a particular work profile (a more thorough technical discussion is available as an appendix).
"Kafka stands out because it is specialized for throughput and explicitly distributed in all tiers of its architecture. Interestingly, it is also concerned enough with resource conservation[2] to offer sensible tradeoffs that loosen guarantees in exchange for performance — something that may not strike Facebook or Google as an important feature in the systems they design. Constraints breed creativity.
"In addition, Kafka has several perks of particular interest to Operations readers. While it is written in Scala, it ships with a native C++ producer library that can be embedded in a module for our cache servers, obviating the need to run the JVM on those servers. Second, producers can be configured to batch requests to optimize network traffic, but do not create a persistent local log which would require additional maintenance. Kafka's I/O and memory usage is left up to the OS rather than the JVM[3].
"Kafka was written by LinkedIn and is now an Apache project. In production at LinkedIn, approximately 10,000 producers are handled by eight Kafka servers per datacenter. These clusters consolidate their streams into a single analytics datacenter, which Kafka supports out of the box via a simple mirroring configuration.
"These features are a very apt fit for our intended use cases; even those we don't intend to use — such as sharding and routing by "topic" categories — are interesting and might prove useful in the future as we expand our goals.
"The rest of this document dives into these topics in greater detail..."