How can Kafka limitations be avoided? [closed]

We're trying to build a BI system that will collect very large amounts of data to be processed by other components.
We decided it would be a good idea to have an intermediate layer to collect, store, and distribute the data.

The data is represented by a big set of log messages. Each log message has:

  • a product
  • an action type
  • a date
  • a message payload

System specifics:

  • average: 1.5 million messages / minute
  • peak: 15 million messages / minute
  • the average message size is 700 bytes (approx. 1.3 TB / day)
  • we have 200 products
  • we have 1100 action types
  • the data should be ingested every 5 minutes
  • consumer applications usually need 1-3 products with 1-3 action types (we need fast access for one product / one action type)

We thought Kafka would do this job, but we ran into several problems.
We tried to create a topic for each action type and a partition for each product: 1,100 topics with 200 partitions each, i.e. 220,000 partitions in total. This way we could extract exactly one product / one action type to be consumed, as in the sketch below.
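
For illustration, a minimal sketch of that layout using the kafka-clients Java producer (broker address, topic naming scheme, and the IDs are hypothetical, and this modern client API postdates the original setup):

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class LogProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // hypothetical broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            String actionType = "click"; // one of the 1,100 action types (hypothetical name)
            int productId = 42;          // one of the 200 products, used as the partition number
            byte[] payload = "...".getBytes(); // ~700 bytes on average in this system

            // Topic per action type, partition per product:
            // every topic needs 200 partitions, 220,000 partitions cluster-wide.
            producer.send(new ProducerRecord<>(
                    "action-" + actionType,    // topic (naming scheme is an assumption)
                    productId,                 // explicit partition = product id
                    String.valueOf(productId), // key
                    payload));
        }
    }
}
```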

Initially we hit a "too many open files" error; after we changed the server config to allow more open files, we started getting out-of-memory errors (with 12 GB allocated per node).
We also had problems with Kafka's stability: with a large number of topics, Kafka tends to freeze.

Our questions:

  • Is Kafka suitable for our use case? Can it support such a large number of topics / partitions?
  • Can we organize the data in Kafka in another way that avoids these problems but still gives good access speed for one product / one action type? (One such layout is sketched after this list.)
  • Can you recommend other alternatives to Kafka that are better suited for this?
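
The alternative layout hinted at in the second question could be, for example, a single topic keyed by the product/action pair, with the default hash partitioner spreading the load and each consumer filtering by key. A minimal sketch (the topic name, group id, and key format are assumptions; the obvious trade-off is that every consumer reads and discards messages it does not need):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class FilteringConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // hypothetical broker
        props.put("group.id", "bi-stats");              // hypothetical group id
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("logs")); // one topic instead of 1,100
            while (true) {
                ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, byte[]> r : records) {
                    // Key format "<productId>:<actionType>" is an assumption.
                    if ("42:click".equals(r.key())) {
                        process(r.value());
                    }
                }
            }
        }
    }

    private static void process(byte[] payload) { /* consumer-specific work */ }
}
```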
asked Jul 21 '14 by Stephan

1 Answer

I'm posting this answer so that other users can see the solution we adopted.

Due to Kafka's limitations (the large number of partitions caused the OS to almost reach the maximum number of open files) and its somewhat weak performance for our case, we decided to build a custom framework tailored exactly to our needs, using libraries like Apache Commons, Guava, Trove, etc. to achieve the performance we needed.

The entire system (distributed and scalable) has 3 main parts:

  1. ETL (reads the data, processes it, and writes it to binary files; a sketch of this step follows the list)

  2. Framework Core (reads the binary files back and calculates stats; see the second sketch below)

  3. API (used by many systems to get data for display)
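
To make part 1 concrete, here is a minimal sketch of what such an ETL writer could look like. The directory layout (root/product/actionType/date.bin) and the length-prefixed record format are illustrative assumptions; the actual framework is not public:

```java
import java.io.BufferedOutputStream;
import java.io.Closeable;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// One binary file per product / action type / day, so a consumer that needs
// one product with one action type reads exactly one file sequentially.
public class BinaryLogWriter implements Closeable {
    private final DataOutputStream out;

    public BinaryLogWriter(Path root, int productId, int actionTypeId, String date) throws IOException {
        Path dir = root.resolve(Integer.toString(productId))
                       .resolve(Integer.toString(actionTypeId));
        Files.createDirectories(dir);
        out = new DataOutputStream(new BufferedOutputStream(
                Files.newOutputStream(dir.resolve(date + ".bin"),
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND)));
    }

    // Record format (an assumption): 8-byte timestamp, 4-byte length, payload.
    public void write(long timestampMillis, byte[] payload) throws IOException {
        out.writeLong(timestampMillis);
        out.writeInt(payload.length);
        out.write(payload);
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}
```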

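A matching sketch for part 2: reading one of those files back and counting messages per 5-minute window with a Trove primitive map (the answer mentions Trove; the window size mirrors the question's 5-minute ingestion interval, but the actual stats logic is an assumption):

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

import gnu.trove.map.hash.TLongLongHashMap;

public class BinaryLogStats {
    // Counts messages per 5-minute bucket for a single product / action type file.
    public static TLongLongHashMap countPerWindow(Path file) throws IOException {
        TLongLongHashMap counts = new TLongLongHashMap(); // primitive longs, no boxing
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            while (true) {
                long ts;
                try {
                    ts = in.readLong();
                } catch (EOFException eof) {
                    break; // clean end of file
                }
                byte[] payload = new byte[in.readInt()];
                in.readFully(payload); // only counting here, so the payload is discarded
                long window = ts / (5 * 60 * 1000); // 5-minute bucket
                counts.adjustOrPutValue(window, 1, 1);
            }
        }
        return counts;
    }
}
```
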
As a side note: we tried other solutions like HBase, Storm, etc., but none lived up to our needs.

answered Oct 06 '22 by Stephan