pandas much slower than numpy?

Tags: python, pandas, numpy

The code below suggests that pandas can be much slower than numpy, at least in the specific case of clipping values. What is surprising is that a round trip from pandas to numpy and back to pandas, with the calculation done in numpy, is still much faster than doing it in pandas.

Shouldn't the pandas function have been implemented in this roundabout way?

In [49]: arr = np.random.randn(1000, 1000)

In [50]: df=pd.DataFrame(arr)

In [51]: %timeit np.clip(arr, 0, None)
100 loops, best of 3: 8.18 ms per loop

In [52]: %timeit df.clip_lower(0)
1 loops, best of 3: 344 ms per loop

In [53]: %timeit pd.DataFrame(np.clip(df.values, 0, None))
100 loops, best of 3: 8.4 ms per loop
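For reference, the round trip in In [53] builds the new frame without passing any labels, so a DataFrame with a non-default index or column names would lose them. A minimal sketch that keeps them, assuming a recent pandas where `DataFrame.to_numpy()` is available (0.24+); `clip_lower_via_numpy` is a hypothetical helper name, not a pandas function:

```python
import numpy as np
import pandas as pd


def clip_lower_via_numpy(df, lower):
    """Clip in NumPy, then rebuild the frame with the original labels.

    Hypothetical helper for illustration; assumes an all-numeric frame.
    """
    clipped = np.clip(df.to_numpy(), lower, None)
    return pd.DataFrame(clipped, index=df.index, columns=df.columns)


rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((4, 3)),
    index=list("abcd"),
    columns=["x", "y", "z"],
)
result = clip_lower_via_numpy(df, 0)
```

Passing `index` and `columns` explicitly is what makes the result comparable to the output of the pandas method.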


asked Nov 07 '13 10:11

Soldalma

1 Answer

In master/0.13 (to be released very shortly) this is much faster, though still slightly slower than native numpy because of the handling of alignment, dtypes, and NaNs.

In 0.12 the clip was applied column by column, which made it a relatively expensive operation.
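As a rough illustration only (this is not the actual pandas 0.12 code), the difference is between many small per-column calls and one call over the whole block of values. The sketch assumes a recent pandas, where the call is spelled `df.clip(lower=0)`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(200, 50))

# Column by column: one small clip per column, then reassemble the frame.
per_column = pd.DataFrame({col: df[col].clip(lower=0) for col in df.columns})

# Whole frame: a single vectorized clip over all the values.
whole = df.clip(lower=0)
```

Both give the same values; the per-column version just pays the fixed call overhead once per column instead of once per frame.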

In [4]: arr = np.random.randn(1000, 1000)

In [5]: df=pd.DataFrame(arr)

In [6]: %timeit np.clip(arr, 0, None)
100 loops, best of 3: 6.62 ms per loop

In [7]: %timeit df.clip_lower(0)
100 loops, best of 3: 12.9 ms per loop
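To repeat the comparison outside IPython, something like the sketch below should work. It assumes a recent pandas, where `clip_lower` no longer exists (it was deprecated in 0.24 and removed in 1.0) and `df.clip(lower=0)` is the equivalent call. `best_ms` is a hypothetical helper, and the numbers will differ by machine and version.

```python
import timeit

import numpy as np
import pandas as pd

arr = np.random.randn(1000, 1000)
df = pd.DataFrame(arr)


def best_ms(func, repeat=3, number=10):
    """Best per-call time in milliseconds over `repeat` runs of `number` calls."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number * 1e3


numpy_ms = best_ms(lambda: np.clip(arr, 0, None))
pandas_ms = best_ms(lambda: df.clip(lower=0))

print(f"np.clip(arr, 0, None): {numpy_ms:.2f} ms per call")
print(f"df.clip(lower=0):      {pandas_ms:.2f} ms per call")
```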


answered Oct 10 '22 10:10

Jeff
