I'm currently evaluating Loki and facing issues with running out of disk space due to the volume of chunks.
My instance is running in Docker containers using a docker-compose setup (Loki, Promtail, Grafana) from the official documentation (see docker-compose.yml below).
I'm more or less using the default configuration of Loki and Promtail, except for some tweaks to the retention period (I need 3 months) plus a higher ingestion rate and ingestion burst size (see configs below).
I bind-mounted a volume containing 1 TB of log files (MS Exchange logs) and set up a job in Promtail using only one label.
The resulting chunks are constantly eating up disk space, and I had to expand the VM disk incrementally up to 1 TB.
Currently, I have 0.9 TB of chunks. Shouldn't this be far less (like 25% of the initial log size)? Over the last weekend, I stopped the Promtail container to prevent running out of disk space. Today I started Promtail again and got the following warning:
level=warn ts=2022-01-24T08:54:57.763739304Z caller=client.go:349 component=client host=loki:3100 msg="error sending batch, will retry" status=429 error="server returned HTTP status 429 Too Many Requests (429): Ingestion rate limit exceeded (limit: 12582912 bytes/sec) while attempting to ingest '2774' lines totaling '1048373' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased"
I had this warning before, and increasing ingestion_rate_mb to 12 and ingestion_burst_size_mb to 24 fixed it...
Kind of at a dead-end here.
Docker Compose
version: "3"

networks:
  loki:

services:
  loki:
    image: grafana/loki:2.4.1
    container_name: loki
    restart: always
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml
    volumes:
      - ${DATADIR}/loki/etc:/etc/loki:rw
    networks:
      - loki

  promtail:
    image: grafana/promtail:2.4.1
    container_name: promtail
    restart: always
    volumes:
      - /var/log/exchange:/var/log
      - ${DATADIR}/promtail/etc:/etc/promtail
    ports:
      - "1514:1514" # for syslog-ng
      - "9080:9080" # for http web interface
    command: -config.file=/etc/promtail/config.yml
    networks:
      - loki

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: always
    volumes:
      - grafana_var:/var/lib/grafana
    ports:
      - "3000:3000"
    networks:
      - loki

volumes:
  grafana_var:
Loki Config:
server:
  http_listen_port: 3100

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093

# https://grafana.com/docs/loki/latest/configuration/#limits_config
limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  ingestion_rate_mb: 12
  ingestion_burst_size_mb: 24
  per_stream_rate_limit: 12MB

chunk_store_config:
  max_look_back_period: 336h

table_manager:
  retention_deletes_enabled: true
  retention_period: 2190h

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_encoding: snappy
Promtail Config
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: exchange
    static_configs:
      - targets:
          - localhost
        labels:
          job: exchangelog
          __path__: /var/log/*/*/*log
Update: the issue was solved. The logs were stored on ZFS with compression enabled, so the file system listed them as much smaller than their actual uncompressed size. The chunk size was in fact accurate. My bad.
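For anyone running into the same confusion: comparing the apparent (logical) size of the logs with what is actually allocated on disk makes the compression visible. The ZFS dataset name below is only a placeholder; the path is the one bind-mounted in the compose file.

du -sh --apparent-size /var/log/exchange   # logical (uncompressed) size of the logs
du -sh /var/log/exchange                   # blocks actually allocated on disk (compressed)
zfs get compressratio,logicalused,used tank/exchange-logs   # dataset name is an example

A large gap between the two du values (or a compressratio well above 1.00x) means the "1 TB of logs" was really much more data once ingested and re-stored by Loki.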
Grafana Loki creates a chunk file per log stream every 2 hours - see this article and this post at HackerNews. This means that the number of files is proportional to the number of log streams and to the data retention. The number of log streams is proportional to the number of unique sets of log fields (excluding the message and timestamp fields). A high number of chunks may therefore point either to a high number of log streams or to logs spread over a long retention period. The solution is either to reduce the number of unique log streams (by removing high-cardinality labels, i.e. labels with a large number of unique values) or to reduce the data retention.
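As a rough sketch of the first suggestion (assuming a Promtail setup like the one in the question): keep only static, low-cardinality labels on the scrape job and avoid promoting per-line values to labels. The commented-out pipeline is a purely hypothetical example of what not to do.

scrape_configs:
  - job_name: exchange
    static_configs:
      - targets:
          - localhost
        labels:
          job: exchangelog            # single static, low-cardinality label
          __path__: /var/log/*/*/*log
    # Hypothetical anti-pattern: turning an extracted per-line value into a label
    # would create a new stream (and new chunk files) for every distinct value.
    # pipeline_stages:
    #   - regex:
    #       expression: '(?P<client_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
    #   - labels:
    #       client_ip:

For the second suggestion, lowering retention_period in the Loki config (currently 2190h) reduces how many chunks are kept on disk overall.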