Looking for some insight into how to configure Secor to output larger files that are partitioned by datetime rather than by Kafka offset, something akin to hourly backups of Kafka topic streams. Currently, my common.properties file contains these Secor configs:
secor.generation=1
secor.consumer.threads=7
secor.messages.per.second=10000
secor.offsets.per.partition=10000000
secor.topic_partition.forget.seconds=600
secor.local.log.delete.age.hours=-1
secor.file.reader.writer.factory=com.pinterest.secor.io.impl.SequenceFileReaderWriterFactory
secor.max.message.size.bytes=100000
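From what I can tell, the offsets-per-partition setting alone won't give me larger files; the output size seems to be governed by the upload thresholds instead. Below is what I was planning to add, though the property names (secor.max.file.size.bytes, secor.max.file.age.seconds) are taken from the sample secor.prod.properties in my checkout and may differ in other Secor versions:

# Upload thresholds (assumed names; verify against your version's secor.prod.properties).
# A file is uploaded once it grows past this size...
secor.max.file.size.bytes=200000000
# ...or once it is older than this, which should give roughly hourly output
secor.max.file.age.seconds=3600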
Secor's LogFilePath.java mentions that a partition can describe the date of a message:
(line 29) Log file path has the following form: prefix/topic/partition1/.../partitionN/generation_kafkaPartition_firstMessageOffset
(line 34) "partition1, ..., partitionN is the list of partition names extracted from message content. E.g., the partition may describe the message date such as dt=2014-01-01 [...]"
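If I'm reading that comment right, a date-partitioned path would end up looking roughly like the line below (the prefix, topic name, and offset padding are just my guesses to illustrate the layout, using generation=1 and Kafka partition 0):

secor_backup/mytopic/dt=2014-01-01/1_0_00000000000000001234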
And from Secor's README:
JSON date parser: parser that extracts timestamps from JSON messages and groups the output based on the date, similar to the Thrift parser above. To use this parser, start Secor with properties file secor.prod.partition.properties and set secor.message.parser.class=com.pinterest.secor.parser.JsonMessageParser. You may override the field used to extract the timestamp by setting the message.timestamp.name property.
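So my understanding is that the switch boils down to something like the properties below, plus messages that carry a parseable timestamp. The created_at field name and the example message are purely illustrative; per the README, the timestamp field is only overridden via message.timestamp.name if it differs from the default:

# Use the JSON date parser so output is grouped by date instead of offset
secor.message.parser.class=com.pinterest.secor.parser.JsonMessageParser
# Hypothetical field name -- only needed if the timestamp isn't in the default field
message.timestamp.name=created_at

A message such as {"created_at": 1388534400000, "event": "click"} should then land under dt=2014-01-01, assuming my Secor version's parser accepts epoch milliseconds. Is that the right approach, or am I missing a setting?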