While reading large relations from a SQL database to a pandas dataframe, it would be nice to have a progress bar, because the number of tuples is known statically and the I/O rate could be estimated. It looks like the tqdm
module has a function tqdm_pandas
which will report progress on mapping functions over columns, but by default calling it does not have the effect of reporting progress on I/O like this. Is it possible to use tqdm
to make a progress bar on a call to pd.read_sql
?
tqdm is a library in Python which is used for creating Progress Meters or Progress Bars. tqdm got its name from the Arabic name taqaddum which means 'progress'.
4. Working with a while loop and unknown increments. Instead of using tqdm as a wrapper, we can create it outside the loop and update it inside the loop on each iteration. This makes tqdm more flexible for loops with unknown length or unknown increments.
Usage. Using tqdm is very simple, you just need to add your code between tqdm() after importing the library in your code. You need to make sure that the code you put in between the tqdm() function must be iterable or it would not work at all.
tqdm 1is a Python library for adding progress bar. It lets you configure and display a progress bar with metrics you want to track. Its ease of use and versatility makes it the perfect choice for tracking machine learning experiments.
Edit: Answer may be misleading - chunksize
has no effect on database side of the operation. See comments below.
You could use the chunksize
parameter to do something like this:
chunks = pd.read_sql('SELECT * FROM table', con=conn, chunksize=100)
df = pd.DataFrame()
for chunk in tqdm(chunks):
df = pd.concat([df, chunk])
I think this would use less memory as well.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With