"Reduce" function for Series

Tags: performance, python, pandas, vectorization, reduce

Is there an analog for reduce for a pandas Series?

For example, the analog for map is pd.Series.apply, but I can't find any analog for reduce.

In my application, I have a pandas Series of lists:

```
>>> business["categories"].head()
0                      ['Doctors', 'Health & Medical']
1                                        ['Nightlife']
2                 ['Active Life', 'Mini Golf', 'Golf']
3    ['Shopping', 'Home Services', 'Internet Servic...
4    ['Bars', 'American (New)', 'Nightlife', 'Loung...
Name: categories, dtype: object
```

I'd like to merge the Series of lists together using reduce, like so:

```
from functools import reduce  # Python 3: reduce lives in functools

categories = reduce(lambda l1, l2: l1 + l2, categories)
```

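but this takes a horrific amount of time: concatenating two lists in Python is O(n), and because `reduce` builds a new, ever-longer list at every step, the whole reduction is roughly quadratic in the total number of elements. I'm hoping that pd.Series has a vectorized way to do this faster.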
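A minimal sketch, not from the original thread: if you specifically want `reduce`, the quadratic cost comes from `l1 + l2` allocating a new list at each step. Reducing with in-place concatenation (`operator.iadd`, which performs `acc += lst` and returns `acc`) into a fresh accumulator keeps the total work linear. A plain list stands in for the Series here (an assumption for self-containment); `reduce` iterates over a pandas Series the same way.

```python
from functools import reduce
import operator

# Stand-in for business["categories"]; a pandas Series of lists iterates the same way.
categories = [['Doctors', 'Health & Medical'], ['Nightlife'], ['Active Life', 'Mini Golf', 'Golf']]

# Quadratic: each step allocates a new list and copies everything accumulated so far.
slow = reduce(lambda l1, l2: l1 + l2, categories)

# Linear (amortized): operator.iadd extends the accumulator in place and returns it.
# The fresh [] initializer ensures none of the input lists is mutated.
fast = reduce(operator.iadd, categories, [])

print(fast)
```

Leaving out the `[]` initializer would make the first inner list the accumulator, so it would be mutated in place.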


asked Jan 26 '16 00:01

hlin117

1 Answer

With `itertools.chain()` on the values

This could be faster:

```
from itertools import chain

categories = list(chain.from_iterable(categories.values))
```
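The same idea wrapped in a small reusable function, as a sketch: `flatten` is a hypothetical helper name, and a plain list of lists is used so the snippet runs without pandas. With a Series you would pass `categories.values` (or the Series itself), exactly as in the one-liner above.

```python
from itertools import chain

def flatten(series_of_lists):
    """Concatenate an iterable of lists into one list in a single linear pass.

    `flatten` is a hypothetical helper; pass `categories.values` for a pandas Series.
    """
    # chain.from_iterable lazily yields the items of each inner list in order;
    # list() materializes them once, so no intermediate lists are built.
    return list(chain.from_iterable(series_of_lists))

rows = [['Doctors', 'Health & Medical'], ['Nightlife'], [], ['Active Life', 'Mini Golf', 'Golf']]
print(flatten(rows))
```

Empty inner lists simply contribute nothing to the result.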

Performance

```
import pandas as pd
from functools import reduce
from itertools import chain

categories = pd.Series([['a', 'b'], ['c', 'd', 'e']] * 1000)

%timeit list(chain.from_iterable(categories.values))
1000 loops, best of 3: 231 µs per loop

%timeit list(chain(*categories.values.flat))
1000 loops, best of 3: 237 µs per loop

%timeit reduce(lambda l1, l2: l1 + l2, categories)
100 loops, best of 3: 15.8 ms per loop
```
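`%timeit` is an IPython magic, so the session above will not run as a plain script. A sketch of the same comparison using the standard `timeit` module, assuming a plain list of lists in place of the Series (wrap `data` in `pd.Series` to match the original exactly). Timings depend on the machine, so no figures are claimed here.

```python
import timeit
from functools import reduce
from itertools import chain

# Same shape of data as the benchmark above, as a plain list of lists.
data = [['a', 'b'], ['c', 'd', 'e']] * 1000

def with_chain():
    return list(chain.from_iterable(data))

def with_reduce():
    return reduce(lambda l1, l2: l1 + l2, data)

# Both approaches must produce the same flattened list.
assert with_chain() == with_reduce()

for fn in (with_chain, with_reduce):
    # Best of 3 repeats of 10 calls each, reported per call in milliseconds.
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e3:.3f} ms per call")
```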

For this data set the chaining is about 68x faster.

Vectorization?

Vectorization pays off when the data are stored as native NumPy data types (pandas uses NumPy for its data, after all). Here the Series already holds Python lists (dtype `object`) and we want a list as the result, so vectorization is unlikely to speed things up: the conversion between standard Python objects and pandas/NumPy data types would likely eat up any gain it could bring. I made one attempt to vectorize the algorithm in another answer.
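For completeness, a pandas-native sketch that is not part of the original answer: pandas 0.25 and later (released after this thread) provide `Series.explode`, which gives each list element its own row. It is convenient, but it still operates on object data, so do not assume it beats `chain`. Empty lists become `NaN` rows; they are dropped here with `dropna()`, which would also drop any genuine `None`/`NaN` elements inside the lists.

```python
import pandas as pd

categories = pd.Series([['Doctors', 'Health & Medical'], ['Nightlife'], [], ['Golf']])

# explode() emits one row per list element, repeating the original index label.
# An empty list becomes a single NaN row, so drop those before converting.
flat = categories.explode().dropna().tolist()
print(flat)
```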

answered Sep 21 '22 04:09

Mike Müller
