
Find length of longest string in Pandas dataframe column

Tags: python, pandas

Is there a faster way to find the length of the longest string in a Pandas DataFrame than what's shown in the example below?

    import numpy as np
    import pandas as pd

    x = ['ab', 'bcd', 'dfe', 'efghik']
    # np.repeat needs an integer count; a float such as 1e7 is rejected by current NumPy
    x = np.repeat(x, 10**7)
    df = pd.DataFrame(x, columns=['col1'])

    print(df.col1.map(lambda x: len(x)).max())
    # result --> 6

It takes about 10 seconds to run df.col1.map(lambda x: len(x)).max() when timing it with IPython's %timeit.

Asked by ebressert on Jan 22 '14 at 22:01




1 Answer

DSM's suggestion seems to be about the best you're going to get without doing some manual microoptimization:

    %timeit -n 100 df.col1.str.len().max()
    100 loops, best of 3: 11.7 ms per loop

    %timeit -n 100 df.col1.map(lambda x: len(x)).max()
    100 loops, best of 3: 16.4 ms per loop

    %timeit -n 100 df.col1.map(len).max()
    100 loops, best of 3: 10.1 ms per loop

Note that explicitly using the str.len() method doesn't seem to be much of an improvement: in the timings above it beats the lambda version but is slightly slower than passing len directly to map. If you're not familiar with IPython, which is where that very convenient %timeit syntax comes from, I'd definitely suggest giving it a shot for quick testing of things like this.
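If you'd rather not depend on IPython, roughly the same comparison can be sketched with the standard library's timeit module. This is only a rough sketch: the row count (10,000 repeats per string) and the names `candidates`, `results` and `timings` are my own choices for illustration, not anything from the question, and the numbers you get will depend on your machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumption: a much smaller frame than the question's, so the sketch runs quickly.
values = np.repeat(['ab', 'bcd', 'dfe', 'efghik'], 10_000)
df = pd.DataFrame(values, columns=['col1'])

# The three approaches compared above (hypothetical labels).
candidates = {
    'str.len': lambda: df.col1.str.len().max(),
    'map(lambda)': lambda: df.col1.map(lambda s: len(s)).max(),
    'map(len)': lambda: df.col1.map(len).max(),
}

# Check that every approach returns the same answer before timing it.
results = {name: int(fn()) for name, fn in candidates.items()}

# Best of 3 repeats, 5 calls each, reported as seconds per call.
timings = {
    name: min(timeit.repeat(fn, number=5, repeat=3)) / 5
    for name, fn in candidates.items()
}

for name in candidates:
    print(f"{name:12s} max={results[name]}  {timings[name] * 1e3:.2f} ms per call")
```

All three should report a maximum length of 6; only the per-call times differ, and the ordering may not match the figures above on every setup.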


Answered by Marius on Sep 21 '22 at 10:09