
How to apply slicing on pandas Series of strings

Tags: python, pandas

I'm playing with pandas and trying to apply string slicing to a Series of strings. Instead of each string being sliced, the Series itself gets sliced:

In [22]: s = p.Series(data=['abcdef']*20)
In [23]: s.apply(lambda x:x[:2])
Out[23]:
0    abcdef
1    abcdef

On the other hand:

In [25]: s.apply(lambda x:x+'qwerty')
Out[25]:
0     abcdefqwerty
1     abcdefqwerty
2     abcdefqwerty
...

I got it to work by using the map function instead, but I think I'm missing something about how it's supposed to work.

Would very much appreciate a clarification.

davidbrai asked Jan 12 '12 20:01


3 Answers

Wes McKinney's answer is a bit out of date, but he made good on his wish: pandas now has efficient string processing methods, available through the .str accessor, including slicing:

In [2]: s = Series(data=['abcdef']*20)

In [3]: s.str[:2]
Out[3]:
0     ab
1     ab
2     ab
...
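
For reference, here is a minimal sketch of the same idea as a standalone script. The `pd` alias and the three-element sample data are assumptions for illustration; `.str.slice` is the method form of the bracket syntax and also accepts a step.

```python
import pandas as pd

# Small sample Series (illustrative data, not from the question)
s = pd.Series(["abcdef"] * 3)

# Bracket slicing on the .str accessor slices each string, not the Series
first_two = s.str[:2]

# Equivalent method form: start, stop, and an optional step
first_two_method = s.str.slice(0, 2)

# Negative indices work the same way as on plain Python strings
last_two = s.str[-2:]

print(first_two.tolist())   # ['ab', 'ab', 'ab']
print(last_two.tolist())    # ['ef', 'ef', 'ef']
```

The result keeps the original index, so it can be assigned straight back into a DataFrame column.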

Tal Yarkoni answered Nov 17 '22 15:11


You're on the right track:

In [3]: s = Series(data=['abcdef']*20)

In [4]: s
Out[4]: 
0     abcdef
1     abcdef
2     abcdef
3     abcdef
4     abcdef
5     abcdef
6     abcdef
7     abcdef
8     abcdef
9     abcdef
10    abcdef
11    abcdef
12    abcdef
13    abcdef
14    abcdef
15    abcdef
16    abcdef
17    abcdef
18    abcdef
19    abcdef

In [5]: s.map(lambda x: x[:2])
Out[5]: 
0     ab
1     ab
2     ab
3     ab
4     ab
5     ab
6     ab
7     ab
8     ab
9     ab
10    ab
11    ab
12    ab
13    ab
14    ab
15    ab
16    ab
17    ab
18    ab
19    ab

I would really like to add a bunch of vectorized, NA-friendly string processing tools to pandas (see here). I always appreciate development help, too.
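
In the meantime, one way to keep the map approach from choking on missing values is the `na_action` parameter of `Series.map`. This is only a sketch: the sample data with a `None` entry is an assumption for illustration, not something from the question.

```python
import pandas as pd

# Illustrative data with one missing entry
s = pd.Series(["abcdef", None, "xyz"])

# na_action="ignore" skips missing values instead of passing them to the
# lambda, where slicing a non-string would raise an error
sliced = s.map(lambda x: x[:2], na_action="ignore")

print(sliced)
```

The missing entry stays missing in the result, and the other strings are sliced element by element.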

Wes McKinney answered Nov 17 '22 14:11


apply first tries to call the function on the whole Series. Only if that fails does it map the function over each element. [:2] is a valid operation on a Series (it returns the first two rows), while + 'qwerty' apparently isn't, which is why you get the implicit mapping only for the latter. If you always want element-wise mapping, use s.map.

apply's source code for reference:

    try:
        result = func(self)
        if not isinstance(result, Series):
            result = Series(result, index=self.index, name=self.name)
        return result
    except Exception:
        mapped = lib.map_infer(self.values, func)
        return Series(mapped, index=self.index, name=self.name)
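
To see the effect of that try/except in isolation, here is a rough re-creation. `legacy_apply` is a hypothetical helper written for this illustration, not a pandas function, and it uses `Series.map` in place of the internal `lib.map_infer`. Current pandas versions may not behave like the snippet above, so treat this as a sketch of the logic quoted here only.

```python
import pandas as pd

def legacy_apply(series, func):
    """Hypothetical helper mimicking the quoted try/except logic."""
    try:
        # First attempt: call func on the whole Series
        result = func(series)
        if not isinstance(result, pd.Series):
            result = pd.Series(result, index=series.index, name=series.name)
        return result
    except Exception:
        # Fallback: map func over each element
        # (stand-in for the internal lib.map_infer call)
        return series.map(func)

s = pd.Series(["abcdef"] * 5)

# x[:2] is valid on a Series, so the Series itself is sliced to two rows
sliced = legacy_apply(s, lambda x: x[:2])

# A Series has no .upper() method, so the whole-Series call fails
# and the function is mapped over the individual strings instead
upper = legacy_apply(s, lambda x: x.upper())

print(sliced.tolist())  # ['abcdef', 'abcdef']
print(upper.tolist())   # ['ABCDEF', 'ABCDEF', 'ABCDEF', 'ABCDEF', 'ABCDEF']
```

The first call never reaches the element-wise fallback, which matches the surprise in the question: the slice succeeds on the Series, so the individual strings are never touched.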

Rob Wouters answered Nov 17 '22 15:11