How can Kafka limitations be avoided? [closed]

We're trying to build a BI system that will collect very large amounts of data to be processed by other components.
We decided it would be a good idea to have an intermediate layer to collect, store, and distribute the data.

The data is represented by a big set of log messages. Each log message has:

  • a product
  • an action type
  • a date
  • a message payload

System specifics:

  • average: 1.5 million messages / minute
  • peak: 15 million messages / minute
  • the average message size is 700 bytes (approx. 1.3 TB / day)
  • we have 200 products
  • we have 1100 action types
  • the data should be ingested every 5 minutes
  • consumer applications usually need 1-3 products with 1-3 action types (we need fast access for one product / one action type)

We thought Kafka would do this job, but we ran into several problems.
We tried to create a topic for each action type and a partition for each product: 1,100 topics with 200 partitions each, i.e. 220,000 partitions in total. This way we could extract exactly one product / one action type to be consumed, as in the sketch below.
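
For illustration, a minimal sketch of that layout using the kafka-clients Java producer (broker address, topic naming scheme, and the IDs are hypothetical, and this modern client API postdates the original setup):

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class LogProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // hypothetical broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            String actionType = "click"; // one of the 1,100 action types (hypothetical name)
            int productId = 42;          // one of the 200 products, used as the partition number
            byte[] payload = "...".getBytes(); // ~700 bytes on average in this system

            // Topic per action type, partition per product:
            // every topic needs 200 partitions, 220,000 partitions cluster-wide.
            producer.send(new ProducerRecord<>(
                    "action-" + actionType,    // topic (naming scheme is an assumption)
                    productId,                 // explicit partition = product id
                    String.valueOf(productId), // key
                    payload));
        }
    }
}
```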

Initially we hit a "too many open files" error; after we changed the server config to allow more open files, we started getting out-of-memory errors (with 12 GB allocated per node).
We also had problems with Kafka's stability: with a large number of topics, Kafka tends to freeze.

Our questions:

  • Is Kafka suitable for our use case? Can it support such a large number of topics / partitions?
  • Can we organize the data in Kafka in another way that avoids these problems but still gives good access speed for one product / one action type? (One such layout is sketched after this list.)
  • Can you recommend other alternatives to Kafka that are better suited for this?
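
The alternative layout hinted at in the second question could be, for example, a single topic keyed by the product/action pair, with the default hash partitioner spreading the load and each consumer filtering by key. A minimal sketch (the topic name, group id, and key format are assumptions; the obvious trade-off is that every consumer reads and discards messages it does not need):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class FilteringConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // hypothetical broker
        props.put("group.id", "bi-stats");              // hypothetical group id
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("logs")); // one topic instead of 1,100
            while (true) {
                ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, byte[]> r : records) {
                    // Key format "<productId>:<actionType>" is an assumption.
                    if ("42:click".equals(r.key())) {
                        process(r.value());
                    }
                }
            }
        }
    }

    private static void process(byte[] payload) { /* consumer-specific work */ }
}
```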
asked Jul 21 '14 by Stephan

1 Answer

I'm posting this answer so that other users can see the solution we adopted.

Due to Kafka's limitations (the large number of partitions caused the OS to almost reach the maximum number of open files) and its somewhat weak performance for our case, we decided to build a custom framework tailored exactly to our needs, using libraries like Apache Commons, Guava, Trove, etc. to achieve the performance we needed.

The entire system (distributed and scalable) has 3 main parts:

  1. ETL (reads the data, processes it, and writes it to binary files; a sketch of this step follows the list)

  2. Framework Core (reads the binary files back and calculates stats; see the second sketch below)

  3. API (used by many systems to get data for display)
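
To make part 1 concrete, here is a minimal sketch of what such an ETL writer could look like. The directory layout (root/product/actionType/date.bin) and the length-prefixed record format are illustrative assumptions; the actual framework is not public:

```java
import java.io.BufferedOutputStream;
import java.io.Closeable;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// One binary file per product / action type / day, so a consumer that needs
// one product with one action type reads exactly one file sequentially.
public class BinaryLogWriter implements Closeable {
    private final DataOutputStream out;

    public BinaryLogWriter(Path root, int productId, int actionTypeId, String date) throws IOException {
        Path dir = root.resolve(Integer.toString(productId))
                       .resolve(Integer.toString(actionTypeId));
        Files.createDirectories(dir);
        out = new DataOutputStream(new BufferedOutputStream(
                Files.newOutputStream(dir.resolve(date + ".bin"),
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND)));
    }

    // Record format (an assumption): 8-byte timestamp, 4-byte length, payload.
    public void write(long timestampMillis, byte[] payload) throws IOException {
        out.writeLong(timestampMillis);
        out.writeInt(payload.length);
        out.write(payload);
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}
```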

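A matching sketch for part 2: reading one of those files back and counting messages per 5-minute window with a Trove primitive map (the answer mentions Trove; the window size mirrors the question's 5-minute ingestion interval, but the actual stats logic is an assumption):

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

import gnu.trove.map.hash.TLongLongHashMap;

public class BinaryLogStats {
    // Counts messages per 5-minute bucket for a single product / action type file.
    public static TLongLongHashMap countPerWindow(Path file) throws IOException {
        TLongLongHashMap counts = new TLongLongHashMap(); // primitive longs, no boxing
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            while (true) {
                long ts;
                try {
                    ts = in.readLong();
                } catch (EOFException eof) {
                    break; // clean end of file
                }
                byte[] payload = new byte[in.readInt()];
                in.readFully(payload); // only counting here, so the payload is discarded
                long window = ts / (5 * 60 * 1000); // 5-minute bucket
                counts.adjustOrPutValue(window, 1, 1);
            }
        }
        return counts;
    }
}
```
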
As a side note: we tried other solutions like HBase, Storm, etc., but none lived up to our needs.

answered Oct 06 '22 by Stephan