
How to retrieve more than 10k lines from InfluxDB using Pandas?

I am trying to use InfluxDB's Python client to retrieve data stored in InfluxDB, but I can't get more than 10k lines. The examples I am (unsuccessfully) following are here. In summary:

import influxdb
dfclient = influxdb.DataFrameClient('localhost', 8086, 'root', 'root', 'mydb')
q = "select * from some_measurement"
df = dfclient.query(q, chunked=True)  # Returns only 10k points

The issue seems to be related to InfluxDB's internal limitations documented here (namely, the max-row-limit configuration option). I am going through the sources to try to find out how to get a DataFrame larger than 10k lines, but any help in solving this issue would be highly appreciated.

Gustavo Bezerra asked Mar 05 '17 02:03



1 Answer

The problem is that the DataFrameClient's query method simply ignores the chunked argument [code].

The workaround I found is to use the standard InfluxDBClient instead. The code shown in the question becomes:

import influxdb
import pandas as pd

client = influxdb.InfluxDBClient('localhost', 8086, 'root', 'root', 'mydb')
q = "select * from some_measurement"
# get_points() yields one dict per point; pandas builds the frame from those dicts
df = pd.DataFrame(client.query(q, chunked=True, chunk_size=10000).get_points())  # Returns all points

It is also worth highlighting that from InfluxDB v1.2.2 the max-row-limit setting (i.e. the default value for chunk_size in the above code) has been changed from 10k to unlimited.
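
As a rough sketch of the workaround, the helper below flattens a query result into a plain list of point dicts before handing it to pandas. The name collect_points is hypothetical and not part of the influxdb package. It assumes that, depending on the client version, a chunked query may return either a single ResultSet or an iterable yielding one ResultSet per chunk, so it accepts both shapes.

```python
from itertools import chain


def collect_points(result):
    """Flatten a query result into a list of point dicts.

    Hypothetical helper (not part of the influxdb package). `result` may be:
      * a single ResultSet-like object exposing get_points(), or
      * an iterable of such objects, one per chunk (assumed to be what
        some client versions return for chunked queries).
    """
    if hasattr(result, "get_points"):
        # Single result set: just materialise its points.
        return list(result.get_points())
    # Iterable of result sets: concatenate the points of every chunk in order.
    return list(chain.from_iterable(rs.get_points() for rs in result))
```

With the client and query from the code above, usage would look like df = pd.DataFrame(collect_points(client.query(q, chunked=True, chunk_size=10000))). The helper materialises every point in memory first, so for very large measurements it may be preferable to process chunk by chunk.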

Gustavo Bezerra answered Oct 13 '22 20:10
