I have the following questions regarding topics and partitions:
1) What is the difference between n topics with m partitions each and n*m topics? Would there be a difference between accessing m partitions through m threads and accessing n*m topics using n*m different processes?
2) What would be a perfect use case differentiating the high-level and low-level consumers?
3) In case of a failure (i.e. a message is not delivered), where can I find the error logs in Kafka?
The high-level consumer can manage things like offset commits and rebalancing across consumer instances in a consumer group automatically. With the simple consumer, you have to manage partition subscription, broker leader changes, and offset commits yourself.
If a consumer crashes or is shut down, its partitions will be reassigned to another member of the group, which will begin consuming from the last committed offset of each partition. If the consumer crashes before any offset has been committed, the consumer that takes over its partitions will start from the position given by the offset reset policy (auto.offset.reset).
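The takeover logic can be illustrated with a small in-memory sketch. It is a simulation, not the Kafka client API. The names start_offset and consume are hypothetical. It assumes the committed offset is the offset of the next record to read. The reset policy values "earliest" and "latest" are borrowed from the newer Java consumer's auto.offset.reset setting; older clients used "smallest" and "largest".

```python
# Hypothetical in-memory model of one partition and a group's committed
# offsets; this is NOT the Kafka client API.
from typing import Dict, List, Optional


def start_offset(committed: Optional[int], log_len: int, reset_policy: str) -> int:
    """Where a consumer taking over a partition begins reading."""
    if committed is not None:
        return committed  # resume from the last committed offset
    # Nothing committed yet: fall back to the (assumed) reset policy names.
    if reset_policy == "earliest":
        return 0
    if reset_policy == "latest":
        return log_len
    raise ValueError(f"unknown reset policy: {reset_policy}")


def consume(log: List[str], start: int, max_records: int) -> List[str]:
    return log[start:start + max_records]


partition_log = ["m0", "m1", "m2", "m3", "m4"]
committed: Dict[int, int] = {}  # partition -> next offset to read

# Consumer A reads two records, commits "next offset = 2", then crashes.
a_batch = consume(partition_log, start_offset(committed.get(0), len(partition_log), "earliest"), 2)
committed[0] = 2

# Consumer B takes over partition 0 and resumes where A left off.
b_batch = consume(partition_log, start_offset(committed.get(0), len(partition_log), "earliest"), 10)
print(a_batch, b_batch)  # ['m0', 'm1'] ['m2', 'm3', 'm4']

# Had A crashed before committing, "latest" would skip the existing records.
fresh_start = start_offset(None, len(partition_log), "latest")  # 5
```

With no commit, "earliest" re-reads the whole partition, while "latest" only sees records produced after the takeover.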
The main way to scale data consumption from a Kafka topic is to add more consumers to a consumer group. This helps because Kafka consumers commonly perform high-latency operations, such as writing to a database or running a time-consuming computation on the data. However, if a group has more consumers than the topic has partitions, the extra consumers sit idle.
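The idle-consumer effect can be sketched with a simplified range-style assignment. The function range_assign is hypothetical and only loosely modeled on Kafka's range strategy. The real assignors differ in detail, but the outcome is the same: each partition goes to one member, so surplus members get nothing.

```python
from typing import Dict, List


def range_assign(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Split partitions into contiguous ranges, one range per consumer (simplified)."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    per_consumer, extra = divmod(len(partitions), len(consumers))
    start = 0
    for i, consumer in enumerate(sorted(consumers)):
        # The first `extra` consumers get one additional partition.
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment


# 3 partitions, 4 consumers in one group: one consumer ends up idle.
result = range_assign([0, 1, 2], ["c1", "c2", "c3", "c4"])
idle = [c for c, parts in result.items() if not parts]
print(result)  # {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}
print(idle)    # ['c4']
```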
Within a consumer group, Kafka assigns each topic partition to only one consumer, but multiple consumer groups may read from the same partition, each tracking its own offsets.
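Here is a minimal sketch of why several groups can share a partition without interfering: offsets are tracked per (group, partition). The group names "billing" and "audit" and the poll helper are hypothetical. This is an in-memory illustration, not the Kafka API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

partition_log: List[str] = ["e0", "e1", "e2", "e3"]

# Offsets are keyed by (group, partition), so each group progresses independently.
offsets: Dict[Tuple[str, int], int] = defaultdict(int)


def poll(group: str, partition: int, max_records: int) -> List[str]:
    """Read from this group's position, then commit the new position."""
    start = offsets[(group, partition)]
    batch = partition_log[start:start + max_records]
    offsets[(group, partition)] = start + len(batch)
    return batch


billing = poll("billing", 0, 3)
audit = poll("audit", 0, 1)
audit += poll("audit", 0, 10)
print(billing)  # ['e0', 'e1', 'e2']
print(audit)    # ['e0', 'e1', 'e2', 'e3']
```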
1) What is the difference between n topics with m partitions each and n*m topics?
There has to be at least one partition for every topic. A topic is just a named group of partitions, and the partitions are the actual streams of data. Code that uses the Kafka producer is normally not concerned with partitions; it just sends a message to a topic. By default, the producer spreads messages without a key across partitions (round robin in older clients), while a message with a key goes to the partition chosen by hashing the key. If needed, you can write a custom partitioner that selects a partition based on the message's content.
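The partition-selection rules can be sketched as follows. This is a simplified model, not Kafka's DefaultPartitioner. Kafka hashes keys with murmur2, and zlib.crc32 is swapped in here only as a stable stand-in. SketchPartitioner and by_content are hypothetical names. The by_content rule, which routes messages starting with "ERROR" to partition 0, is an invented example of a content-based custom partitioner.

```python
import itertools
import zlib
from typing import Iterator, Optional


class SketchPartitioner:
    """Round robin for keyless messages, key hash otherwise (simplified)."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self._next: Iterator[int] = itertools.cycle(range(num_partitions))

    def partition(self, key: Optional[bytes]) -> int:
        if key is None:
            return next(self._next)  # spread keyless messages evenly
        # crc32 stands in for Kafka's murmur2: same key -> same partition.
        return zlib.crc32(key) % self.num_partitions


def by_content(value: str, num_partitions: int) -> int:
    """Hypothetical custom rule: errors go to partition 0, the rest are hashed
    over partitions 1..num_partitions-1 (requires num_partitions >= 2)."""
    if value.startswith("ERROR"):
        return 0
    return 1 + zlib.crc32(value.encode()) % (num_partitions - 1)


p = SketchPartitioner(3)
print([p.partition(None) for _ in range(5)])               # [0, 1, 2, 0, 1]
print(p.partition(b"user-42") == p.partition(b"user-42"))  # True
```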
If there is only one partition, only one broker processes messages for the topic and appends them to a file. On the other hand, if the topic has m partitions spread across m brokers, message processing is parallelized and you get up to an m-fold speedup (minus overhead). This assumes that each broker runs on its own machine and that Kafka data storage is not shared among brokers.
If a topic has more partitions than there are brokers, Kafka tries to distribute them evenly among all the brokers.
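As a rough illustration of even distribution and the resulting upper bound on parallelism, here is a minimal sketch. It assumes a plain round-robin placement of partition leaders. Kafka's real assignment also places replicas and uses a randomized starting broker, so treat spread_partitions and ideal_speedup as hypothetical helpers, and the broker IDs as made-up values.

```python
from collections import Counter
from typing import Dict, List


def spread_partitions(num_partitions: int, brokers: List[int]) -> Dict[int, int]:
    """Map each partition to a leader broker, round robin (replicas ignored)."""
    return {p: brokers[p % len(brokers)] for p in range(num_partitions)}


def ideal_speedup(num_partitions: int, num_brokers: int) -> int:
    """Upper bound on parallelism: at most one busy broker per partition."""
    return min(num_partitions, num_brokers)


layout = spread_partitions(8, [101, 102, 103])
print(Counter(layout.values()))  # partitions per broker: 3, 3 and 2
print(ideal_speedup(1, 3), ideal_speedup(3, 3), ideal_speedup(8, 3))  # 1 3 3
```

Going from 3 to 8 partitions on 3 brokers does not raise the broker-side bound. It only keeps the load balanced.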
The same goes for reading from Kafka. If there is only one partition, the consumer's speed is limited by the maximum read speed of a single disk. If there are multiple partitions, messages from all partitions (on different brokers) are retrieved in parallel.
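Parallel retrieval can be sketched with one worker thread per partition, similar to the "m partitions through m threads" setup from the question. The in-memory partitions dict and read_partition are hypothetical stand-ins for broker fetches. Each partition is still read sequentially by its own thread.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical in-memory partitions standing in for logs on different brokers.
partitions: Dict[int, List[str]] = {
    0: ["a0", "a1", "a2"],
    1: ["b0", "b1"],
    2: ["c0", "c1", "c2", "c3"],
}


def read_partition(pid: int) -> List[str]:
    """Read one partition sequentially, as a single fetcher would."""
    out: List[str] = []
    for record in partitions[pid]:
        out.append(record)  # a real fetch would be a network/disk read
    return out


# One worker per partition, so the partitions are read concurrently.
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    results = dict(zip(partitions, pool.map(read_partition, partitions)))

print(results[2])  # ['c0', 'c1', 'c2', 'c3']
```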
1a) Would there be a difference between accessing m partitions through m threads and accessing n*m topics using n*m different processes?
You're mixing partitions and topics here; see my answer above.
2) What would be a perfect use case differentiating the high-level and low-level consumers?
High-level consumer: I just want to use Kafka as an extremely fast, persistent FIFO buffer and not worry much about the details.
Low-level consumer: I want custom partition-consuming logic, e.g. to start reading data from newly created topics without the consumer having to reconnect to the brokers.
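The contrast can be sketched in memory. In the group-managed style, partitions are handed to you. In the simple-consumer style, the caller names the exact topic, partition, and starting offset. The cluster dict, the topics "orders" and "audit", and both helper functions are hypothetical illustrations, not Kafka client calls.

```python
from typing import Dict, List, Tuple

# Hypothetical cluster state: (topic, partition) -> records.
cluster: Dict[Tuple[str, int], List[str]] = {
    ("orders", 0): ["o0", "o1", "o2"],
    ("orders", 1): ["o3", "o4"],
}


def high_level_poll(topic: str, member: int, group_size: int) -> List[str]:
    """Group-managed style: partitions are assigned for you (round robin here)."""
    mine = [p for (t, p) in sorted(cluster) if t == topic and p % group_size == member]
    return [r for p in mine for r in cluster[(topic, p)]]


def low_level_fetch(topic: str, partition: int, offset: int) -> List[str]:
    """Simple-consumer style: the caller picks the exact partition and offset."""
    return cluster[(topic, partition)][offset:]


# A newly created topic can be read right away by naming it explicitly.
cluster[("audit", 0)] = ["x0", "x1"]
print(high_level_poll("orders", member=0, group_size=2))  # ['o0', 'o1', 'o2']
print(low_level_fetch("audit", partition=0, offset=1))    # ['x1']
```

The trade-off is the one described above. The first style hides assignment and offsets. The second gives full control, but tracking positions and leaders becomes your job.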
3) In case of a failure (i.e. a message is not delivered), where can I find the error logs in Kafka?
Kafka uses log4j for logging. For producers and consumers, where the log is stored depends on the log4j configuration of the client application. Kafka broker logs are normally stored in /var/log/kafka/.