Find length of longest string in Pandas dataframe column

Tags:

pandas

Is there a faster way to find the length of the longest string in a Pandas DataFrame than what's shown in the example below?

    import numpy as np
    import pandas as pd

    x = ['ab', 'bcd', 'dfe', 'efghik']
    x = np.repeat(x, 10**7)  # integer repeat count
    df = pd.DataFrame(x, columns=['col1'])

    print(df.col1.map(lambda x: len(x)).max())
    # result --> 6

It takes about 10 seconds to run df.col1.map(lambda x: len(x)).max() when timing it with IPython's %timeit.


asked Jan 22 '14 22:01

1 Answer

DSM's suggestion seems to be about the best you're going to get without doing some manual microoptimization:

    %timeit -n 100 df.col1.str.len().max()
    100 loops, best of 3: 11.7 ms per loop

    %timeit -n 100 df.col1.map(lambda x: len(x)).max()
    100 loops, best of 3: 16.4 ms per loop

    %timeit -n 100 df.col1.map(len).max()
    100 loops, best of 3: 10.1 ms per loop

Note that explicitly using the str.len() method doesn't seem to be much of an improvement. If you're not familiar with IPython, which is where that very convenient %timeit syntax comes from, I'd definitely suggest giving it a shot for quick testing of things like this.
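If you don't have IPython handy, roughly the same comparison can be sketched with the standard library's timeit module. This is only an illustration: the frame size (10**4 repeats) and the names `values`, `candidates`, and `results` are my own choices rather than anything from the question, and the timings will vary with your machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumption: a much smaller frame than the question's, so this runs quickly.
values = np.repeat(['ab', 'bcd', 'dfe', 'efghik'], 10**4)
df = pd.DataFrame(values, columns=['col1'])

# The three spellings timed above; each should return the same maximum length.
candidates = {
    'str.len': lambda: df.col1.str.len().max(),
    'map(lambda)': lambda: df.col1.map(lambda s: len(s)).max(),
    'map(len)': lambda: df.col1.map(len).max(),
}

# Check that the approaches agree before comparing their speed.
results = {name: int(func()) for name, func in candidates.items()}

for name, func in candidates.items():
    # Best of 3 repeats of 5 calls each, reported as time per call.
    best = min(timeit.repeat(func, number=5, repeat=3)) / 5
    print('%-12s max=%d  %.2f ms per call' % (name, results[name], best * 1e3))
```

Checking that all three return the same value first is worth the extra line, since a faster variant is only useful if it gives the same answer.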



answered Sep 21 '22 10:09

Marius

Question asked by ebressert