How do I make a progress bar for loading pandas DataFrame from a large xlsx file?

From https://pypi.org/project/tqdm/:

import pandas as pd
import numpy as np
from tqdm import tqdm

df = pd.DataFrame(np.random.randint(0, 100, (100000, 6)))
tqdm.pandas(desc="my bar!")
df.progress_apply(lambda x: x**2)

I took this code and edited it so that I create the DataFrame with read_excel rather than from random numbers:

import pandas as pd
from tqdm import tqdm
import numpy as np

filename="huge_file.xlsx"
df = pd.DataFrame(pd.read_excel(filename))
tqdm.pandas()
df.progress_apply(lambda x: x**2)

This gave me an error, so I changed df.progress_apply to this:

df.progress_apply(lambda x: x)

Here is the final code:

import pandas as pd
from tqdm import tqdm
import numpy as np

filename="huge_file.xlsx"
df = pd.DataFrame(pd.read_excel(filename))
tqdm.pandas()
df.progress_apply(lambda x: x)

This results in a progress bar, but it doesn't actually show any progress. Instead, the bar appears, and when the operation is done it jumps to 100%, which defeats the purpose.

My question is this: How do I make this progress bar work?
What does the function inside of progress_apply actually do?
Is there a better approach? Maybe an alternative to tqdm?

Any help is greatly appreciated.

asked Sep 06 '18 17:09

user2303336

2 Answers

This will not work. pd.read_excel blocks until the file is read, and there is no way to get information from this function about its progress during execution.

It would work for read operations that you can do chunk-wise, like

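chunks = []
# "..." stands for the file path and any other read_csv arguments
for chunk in pd.read_csv(..., chunksize=1000):
    update_progressbar()  # placeholder: advance your progress bar by one chunk
    chunks.append(chunk)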

But as far as I understand, tqdm also needs the number of chunks in advance to show a percentage, so for a proper progress report you would need to read the full file first.
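
As a rough sketch of the chunk-wise idea for CSV files: the total can be estimated with a cheap line count before parsing, so the file is scanned twice but only parsed once. The function name read_csv_with_progress is made up for illustration, and the line count assumes one header line and one record per line (no newlines inside quoted fields) and at least one data row.

```python
import math

import pandas as pd
from tqdm import tqdm


def read_csv_with_progress(path, chunksize=1000):
    """Read a CSV in chunks while tqdm reports chunk-level progress."""
    # Cheap first pass: count lines without parsing them.
    # Assumption: one header line, one record per line.
    with open(path, "rb") as f:
        n_rows = sum(1 for _ in f) - 1
    n_chunks = math.ceil(n_rows / chunksize)

    chunks = []
    # With chunksize set, read_csv returns an iterator of DataFrames.
    reader = pd.read_csv(path, chunksize=chunksize)
    for chunk in tqdm(reader, total=n_chunks, desc="Reading CSV"):
        chunks.append(chunk)
    return pd.concat(chunks, ignore_index=True)
```

If the total is left out, tqdm still runs but shows only a running count and rate instead of a percentage.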

answered Apr 27 '23 22:04

rocksportrocker

This might help people with a similar problem.

For example, with one bar step per sheet:

import pandas as pd
from tqdm import tqdm

# Each sheet is read once; the bar advances only after a read_excel call returns.
sheets = [0, "Sheet1", "Sheet2"]
frames = {}
for sheet in tqdm(sheets, ncols=100, desc="Loading data.."):
    frames[sheet] = pd.read_excel("some_file.xlsx", sheet_name=sheet, header=None)
df, LC_data, FC_data = (frames[s] for s in sheets)
print("------Loading is completed ------")
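
A bar driven by whole read_excel calls can only advance between calls. For row-level progress within a single sheet, one possible workaround is to bypass read_excel and stream the rows with openpyxl in read-only mode. This is only a sketch: read_xlsx_with_progress is a hypothetical helper, it assumes the first row holds the column names, and it relies on the sheet's stored dimensions for the total, which some writers omit.

```python
import pandas as pd
from openpyxl import load_workbook
from tqdm import tqdm


def read_xlsx_with_progress(path, sheet_name=None):
    """Stream one worksheet row by row so tqdm can report progress."""
    # read_only=True makes openpyxl yield rows lazily;
    # data_only=True returns cached formula results instead of formulas.
    wb = load_workbook(path, read_only=True, data_only=True)
    try:
        ws = wb[sheet_name] if sheet_name is not None else wb.worksheets[0]
        # max_row comes from the stored sheet dimensions and may be None,
        # in which case tqdm shows a counter without a percentage.
        total = ws.max_row
        rows = ws.iter_rows(values_only=True)
        data = list(tqdm(rows, total=total, desc="Reading xlsx"))
    finally:
        wb.close()

    if not data:
        return pd.DataFrame()
    # Assumption: the first row is the header.
    header, *body = data
    return pd.DataFrame(body, columns=list(header))
```

Called as read_xlsx_with_progress("huge_file.xlsx"), this returns a DataFrame built from the first sheet. Type inference may differ from what read_excel produces, so the result is worth checking on a small file first.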

answered Apr 27 '23 22:04

sardor mirzaev
