We're trying to build a BI system that will collect very large amounts of data that should be processed by other components.
We decided that it will be a good idea to have an intermediate layer to collect, store & distribute the data.
The data is represented by a big set of log messages. Each log message has:
System specifics:
We were thinking that Kafka would do this job but we encountered several problems.
We tried to create a topic for each action type and a partition for each product. By doing this we could be able to extract 1 product / 1 action type to be consumed.
Initially we had a problem with "too many opened files", but after we changed the server config to support more files we're getting out-of-memory error (12GB allocated / node)
Also, we had problems with Kafka stability. At a big number of topics, kafka tends to freeze.
Our questions:
Disadvantages Of Apache KafkaDo not have complete set of monitoring tools: Apache Kafka does not contain a complete set of monitoring as well as managing tools. Thus, new startups or enterprises fear to work with Kafka. Message tweaking issues: The Kafka broker uses system calls to deliver messages to the consumer.
Fault tolerance in Kafka is done by copying the partition data to other brokers which are known as replicas. There is a configuration that specifies how many copies of the partition you need. Its called a replication factor. Each broker will hold one or more partitions.
I'm posting this answer so that other users can see the solution we adopted.
Due to Kafka limitations (the large no. of partitions which cause the OS to reach almost reach max open files) and somewhat weak performance we decided to build a custom framework for exactly our needs using libraries like apache commons , guava, trove etc to achieve the performance we needed.
The entire system (distributed and scalable) has 3 main parts:
ETL (reads the data , process it and writes it to binary files)
Framework Core (used to read from the binary files and calculate stats)
As a side note: we tried other solutions like HBase, Storm etc but none live up to our needs.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With