Pandas to_csv progress bar with tqdm

Tags: python, pandas, tqdm

As the title suggests, I am trying to display a progress bar while performing pandas.to_csv.
I have the following script:

import pandas as pd
from tqdm import tqdm

def filter_pileup(pileup, output, lists):
    # BAR_DEFAULT_VIEW is a bar_format string defined elsewhere in the script (not shown)
    tqdm.pandas(desc='Reading, filtering, exporting', bar_format=BAR_DEFAULT_VIEW)
    # Reading files
    pileup_df = pd.read_csv(pileup, sep='\t', header=None).progress_apply(lambda x: x)
    lists_df = pd.read_csv(lists, sep='\t', header=None).progress_apply(lambda x: x)
    # Filtering pileup
    intersection = pd.merge(pileup_df, lists_df, on=[0, 1]).progress_apply(lambda x: x)
    intersection.columns = list(range(len(intersection.columns)))
    intersection = intersection.loc[:, 0:5]
    # Exporting filtered pileup
    intersection.to_csv(output, header=False, index=False, sep='\t')

I have found a way to integrate a progress bar into the first few steps, but this method doesn't work for the last line (the to_csv call). How can I achieve that?

507

asked Nov 05 '20 10:11

Eliran Turgeman

1 Answer

You can divide the dataframe into chunks of rows and save it to a CSV file chunk by chunk, using mode='w' for the first chunk (which also writes the header) and mode='a' for the rest:

Example:

import numpy as np
import pandas as pd
from tqdm import tqdm

df = pd.DataFrame(data=[i for i in range(0, 10000000)], columns = ["integer"])

print(df.head(10))

chunks = np.array_split(df.index, 100) # 100 chunks of 100,000 rows each

for chunk, subset in enumerate(tqdm(chunks)):
    if chunk == 0: # first chunk: create the file and write the header
        df.loc[subset].to_csv('data.csv', mode='w', index=True)
    else: # later chunks: append without repeating the header
        df.loc[subset].to_csv('data.csv', header=False, mode='a', index=True)

Output:

   integer
0        0
1        1
2        2
3        3
4        4
5        5
6        6
7        7
8        8
9        9

100%|██████████| 100/100 [00:12<00:00,  8.12it/s]
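The same idea can be wrapped in a small helper so that it drops into the function from the question. The following is only a sketch under some assumptions: the name to_csv_with_progress and the chunk_size parameter are made up for illustration, the bar counts rows rather than chunks, and the file is opened once and each chunk is written to the open handle instead of reopening it with mode='a'.

```python
import pandas as pd
from tqdm import tqdm


def to_csv_with_progress(df, path, chunk_size=100_000, **to_csv_kwargs):
    """Write df to path in row chunks, advancing a tqdm bar after each chunk.

    Hypothetical helper: the name and chunk_size are illustrative,
    not part of pandas or tqdm.
    """
    # Take the header setting out of the kwargs so that it is only
    # applied to the first chunk.
    header = to_csv_kwargs.pop("header", True)

    # newline="" lets pandas control the line terminator itself.
    with open(path, "w", newline="") as handle, \
            tqdm(total=len(df), unit="rows") as bar:
        # max(..., 1) makes an empty frame still produce one (empty) chunk,
        # so the header is written in that case too.
        for start in range(0, max(len(df), 1), chunk_size):
            chunk = df.iloc[start:start + chunk_size]
            chunk.to_csv(
                handle,
                header=header if start == 0 else False,
                **to_csv_kwargs,
            )
            bar.update(len(chunk))
```

In the question's function, the last line would then become something like to_csv_with_progress(intersection, output, sep='\t', header=False, index=False). Smaller chunks make the bar update more smoothly but add some per-call overhead, so the chunk size is worth tuning for the data at hand.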

116

answered Oct 17 '22 15:10

Chicodelarose
