
Remove data from RRDTool

Tags: rrdtool, rrd

I have several graphs created by RRDTool whose RRDs collected bad data over a period of a couple of hours.

How can I remove the data for that period from the RRDs so that it no longer displays?

asked Apr 24 '12 13:04 by mscccc


3 Answers

The best method I found to do this:

  1. Use rrdtool dump to export the RRD file to XML.
  2. Open the XML file, then find and edit the bad data.
  3. Use rrdtool restore to rebuild the RRD file from the edited XML.
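
The editing step can be scripted instead of done by hand. Below is a minimal sketch, assuming the dump has the layout rrdtool dump normally produces (one <row> per line, preceded by a comment ending in "/ <unixtime> -->"). The function name blank_rows and the sample rows are made up for illustration. It sets every value in the bad window to NaN, which RRDTool treats as unknown and graphs as a gap.

```python
import re

# Matches one dump line such as:
#   <!-- 2014-10-17 20:04:00 BST / 1413572640 --> <row><v>9.8e+01</v></row>
ROW_RE = re.compile(r"/\s*(\d+)\s*-->\s*<row>")
VALUE_RE = re.compile(r"<v>[^<]*</v>")


def blank_rows(xml_text, start, end):
    """Return xml_text with all values of rows in [start, end] set to NaN."""
    out = []
    for line in xml_text.splitlines(keepends=True):
        match = ROW_RE.search(line)
        # Only rows whose timestamp falls inside the bad window are touched.
        if match and start <= int(match.group(1)) <= end:
            line = VALUE_RE.sub("<v>NaN</v>", line)
        out.append(line)
    return "".join(out)


# Hypothetical three-row excerpt of a dump; the middle row holds bad data.
sample = (
    "<!-- 2014-10-17 20:04:00 BST / 1413572640 --> <row><v>9.8e+01</v></row>\n"
    "<!-- 2014-10-17 20:05:00 BST / 1413572700 --> <row><v>5.0e+09</v></row>\n"
    "<!-- 2014-10-17 20:06:00 BST / 1413572760 --> <row><v>9.7e+01</v></row>\n"
)
cleaned = blank_rows(sample, 1413572700, 1413572700)
print(cleaned)
```

In practice the text would come from a file written by rrdtool dump foo.rrd foo.xml, and the cleaned text would be written back out and passed to rrdtool restore (the file names here are placeholders).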
answered Sep 29 '22 00:09 by mscccc


I had a similar problem where I wanted to discard the most recent few hours from my RRDtool databases, so I wrote a quick script to do it (apologies for the unconventional variable names - coding style inherited from work, sigh):

#!/usr/bin/env python3
"""
Modify XML data generated by `rrdtool dump` such that the last update was at
the unixtime specified (decimal). Data newer than this is simply omitted.

Sample usage::

    rrdtool dump foo.rrd \
       | python3 remove_samples_newer_than.py 1414782122 \
       | rrdtool restore - foo_trimmed.rrd
"""

import sys

assert sys.argv[1:], "Must specify maximum Unix timestamp in decimal"

iMaxUpdate = int(sys.argv[1])

for rLine in sys.stdin:
    if "<lastupdate>" in rLine:
        # <lastupdate>1414782122</lastupdate> <!-- 2014-10-31 19:02:02 GMT -->
        _, _, rData = rLine.partition("<lastupdate>")
        rData, _, _ = rData.partition("</lastupdate")
        iLastUpdate = int(rData)
        # Trimming only makes sense if the RRD was updated after the cutoff.
        assert iLastUpdate > iMaxUpdate, "Last update in RRD older than " \
                                    "the time you provided, nothing to do"
        print("<lastupdate>{0}</lastupdate>".format(iMaxUpdate))
    elif "<row>" in rLine:
        # <!-- 2014-10-17 20:04:00 BST / 1413572640 --> <row><v>9.8244774011e+01</v>...</row>
        # The Unix timestamp sits between the "/" and the closing "--".
        rData, _, _ = rLine.partition("<row>")
        _, _, rData = rData.partition("/")
        rData, _, _ = rData.partition("--")
        rData = rData.strip()
        iUpdate = int(rData)
        # Keep only rows older than the cutoff.
        if iUpdate < iMaxUpdate:
            print(rLine, end="")
    else:
        print(rLine, end="")

Worked for me. Hope it helps someone else.

answered Sep 28 '22 22:09 by RobM


If you want to avoid writing and editing an XML file, since this can take a number of file I/O calls (depending on how much bad data you have), you can instead read the entire RRD into memory using fetch and update the values in memory.

I did a similar task using Python + rrdtool, and I ended up doing the following:

  1. Read the RRD into memory as a dictionary.
  2. Fix the values in the dictionary.
  3. Delete the existing RRD file.
  4. Create a new RRD with the same name and repopulate it from the dictionary.
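
The in-memory part of these steps might look like the sketch below. It is not the code I actually used. It assumes the fetched data is a tuple of ((start, end, step), ds_names, rows), the shape the Python rrdtool bindings' fetch is understood to return, and that row i ends at start + (i + 1) * step. The helper names and sample values are made up, and the sample tuple stands in for a real fetch call so the block runs without rrdtool installed.

```python
import math


def fetch_to_dict(fetch_result):
    """Map each row's end timestamp to a mutable list of values."""
    (start, _end, step), _names, rows = fetch_result
    # Assumption: row i covers the interval ending at start + (i + 1) * step.
    return {start + (i + 1) * step: list(row) for i, row in enumerate(rows)}


def blank_range(data, bad_start, bad_end):
    """Mark every value with bad_start <= timestamp <= bad_end as unknown."""
    for ts, values in data.items():
        if bad_start <= ts <= bad_end:
            data[ts] = [None] * len(values)


def to_update_args(data):
    """Build 'timestamp:v1:v2' strings, using 'U' for unknown values."""
    def fmt(value):
        unknown = value is None or (isinstance(value, float) and math.isnan(value))
        return "U" if unknown else repr(value)
    return [
        ":".join([str(ts)] + [fmt(v) for v in data[ts]])
        for ts in sorted(data)
    ]


# Hypothetical fetch result: two data sources, 60-second step.
fetched = (
    (1413572580, 1413572760, 60),
    ("in", "out"),
    [(1.5, 2.5), (9e9, 9e9), (1.0, None)],
)
data = fetch_to_dict(fetched)
blank_range(data, 1413572700, 1413572700)
updates = to_update_args(data)
print(updates)
```

Each resulting string is in the timestamp:value format that rrdtool update accepts, so after recreating the file with the original data source and archive definitions they can be fed back in oldest first. Keep in mind that fetch returns consolidated values from one archive, so this only reproduces the data at that resolution.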
answered Sep 29 '22 00:09 by Sumit Purohit