Why does Prometheus consume so much memory?

I'm using Prometheus 2.9.2 to monitor a large environment of nodes. As part of testing the maximum scale of Prometheus in our environment, I simulated a large number of metrics in our test environment.

My management server has 16 GB of RAM and 100 GB of disk space.

During the scale testing, I've noticed that the Prometheus process consumes more and more memory until the process crashes.

I've noticed that the WAL directory fills up quickly with data files while Prometheus' memory usage rises.

The management server scrapes its nodes every 15 seconds and the storage parameters are all set to default.

I would like to know why this happens, and how/if it is possible to prevent the process from crashing.

Thank you!

asked May 13 '19 by Thomason

3 Answers

The out-of-memory crash is usually the result of an excessively heavy query. This may be set in one of your rules (the rule may even be running on a Grafana dashboard instead of Prometheus itself).

If you have a very large number of metrics, it is possible the rule is querying all of them. A quick fix is to specify exactly which metrics to query by using exact label matchers instead of a regex matcher, as in the sketch below.
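
As a rough illustration, the two query styles can be compared through the official Go client, github.com/prometheus/client_golang. The Prometheus address, metric name and label values below are placeholders, not taken from the question:

package main

import (
    "context"
    "fmt"
    "time"

    "github.com/prometheus/client_golang/api"
    v1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
    // Hypothetical Prometheus address; point this at your own server.
    client, err := api.NewClient(api.Config{Address: "http://localhost:9090"})
    if err != nil {
        panic(err)
    }
    promAPI := v1.NewAPI(client)

    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()

    // Regex matcher: forces Prometheus to touch every series of the metric across all jobs.
    heavy := `sum(rate(http_requests_total{job=~".+"}[5m]))`
    // Exact matchers: only the few series you actually need are loaded.
    light := `sum(rate(http_requests_total{job="api-server",instance="node1:9100"}[5m]))`

    for _, q := range []string{heavy, light} {
        result, warnings, err := promAPI.Query(ctx, q, time.Now())
        if err != nil {
            fmt.Println("query error:", err)
            continue
        }
        if len(warnings) > 0 {
            fmt.Println("warnings:", warnings)
        }
        fmt.Printf("%s\n  => %v\n", q, result)
    }
}

The regex version has to load every matching series, while the exact matchers restrict the query to the handful of series you actually care about.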

answered by Sutirtha Das


Because the label combinations depend on your workload, the number of series and blocks can grow without bound, so there is no complete fix for this in Prometheus' current design. However, I suggest compacting small blocks into bigger ones, which reduces the number of blocks.

Memory consumption is huge for two reasons:

  1. The Prometheus TSDB keeps an in-memory block called the "head". Because the head holds every series from the most recent hours, it consumes a lot of memory.
  2. Each block on disk also consumes memory, because every persisted block has an index reader in memory. Unfortunately, all labels, postings and symbols of a block are cached in the index reader struct, so the more blocks on disk, the more memory is occupied. (A way to watch both of these numbers is sketched after the struct below.)

In index/index.go, you will see:

type Reader struct {
    b ByteSlice

    // Close that releases the underlying resources of the byte slice.
    c io.Closer

    // Cached hashmaps of section offsets.
    labels map[string]uint64
    // LabelName to LabelValue to offset map.
    postings map[string]map[string]uint64
    // Cache of read symbols. Strings that are returned when reading from the
    // block are always backed by true strings held in here rather than
    // strings that are backed by byte slices from the mmap'd index file. This
    // prevents memory faults when applications work with read symbols after
    // the block has been unmapped. The older format has sparse indexes so a map
    // must be used, but the new format is not so we can use a slice.
    symbolsV1        map[uint32]string
    symbolsV2        []string
    symbolsTableSize uint64

    dec *Decoder

    version int
}
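
A simple way to see both memory drivers on a running server is to read Prometheus' own /metrics endpoint and pick out prometheus_tsdb_head_series and prometheus_tsdb_blocks_loaded. This is only a sketch, and it assumes Prometheus listens on localhost:9090:

package main

import (
    "bufio"
    "fmt"
    "net/http"
    "strings"
)

func main() {
    // Hypothetical address; adjust to your Prometheus server.
    resp, err := http.Get("http://localhost:9090/metrics")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    // Gauges that track the two memory drivers described above.
    wanted := []string{
        "prometheus_tsdb_head_series",   // series currently held in the in-memory head
        "prometheus_tsdb_blocks_loaded", // persisted blocks whose index readers sit in memory
        "process_resident_memory_bytes", // overall RSS of the Prometheus process
    }

    scanner := bufio.NewScanner(resp.Body)
    for scanner.Scan() {
        line := scanner.Text()
        for _, name := range wanted {
            if strings.HasPrefix(line, name) {
                fmt.Println(line)
            }
        }
    }
    if err := scanner.Err(); err != nil {
        panic(err)
    }
}

Watching these values during a scale test shows whether memory growth tracks the head series count, the number of on-disk blocks, or both.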
answered by chausat


This article explains why Prometheus may use large amounts of memory during data ingestion. If you need to reduce Prometheus' memory usage, the following actions can help:

  • Increasing scrape_interval in the Prometheus config (see the config sketch after this list).
  • Reducing the number of scrape targets and/or the number of scraped metrics per target.
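
For the first item, a minimal prometheus.yml sketch might look like the following. The job name and targets are placeholders, and the 30s value simply doubles the 15s interval mentioned in the question:

global:
  scrape_interval: 30s          # doubling the question's 15s interval roughly halves ingested samples

scrape_configs:
  - job_name: 'node'            # placeholder job name
    static_configs:
      - targets: ['node1:9100', 'node2:9100']   # placeholder targets

Fewer samples per second lowers ingestion and chunk memory, although the number of active series in the head (and therefore the index memory) stays the same.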

P.S. Also take a look at the project I work on - VictoriaMetrics. It can use less memory than Prometheus. See this benchmark for details.

answered by valyala