
Get last "column" after .str.split() operation on column in pandas DataFrame

I have a column in a pandas DataFrame that I would like to split on a single space. The splitting is simple enough with Series.str.split(' '), but I can't make a new column from the last entry. When I call .str.split() on the column I get a Series of lists, and I don't know how to manipulate this to get a new column for my DataFrame.

Here is an example. Each entry in the column contains 'symbol date price', and I would like to split off the price (and eventually remove the leading "p", or "c" in half the cases).

    import pandas as pd

    temp = pd.DataFrame({'ticker' : ['spx 5/25/2001 p500', 'spx 5/25/2001 p600', 'spx 5/25/2001 p700']})
    temp2 = temp.ticker.str.split(' ')

which yields

    0    ['spx', '5/25/2001', 'p500']
    1    ['spx', '5/25/2001', 'p600']
    2    ['spx', '5/25/2001', 'p700']

But temp2[0] just gives the list for the first row, and temp2[:][-1] fails. How can I convert the last entry in each list to a new column? Thanks!

Richard Herron asked Sep 20 '12 01:09




2 Answers

Do this:

    In [43]: temp2.str[-1]
    Out[43]:
    0    p500
    1    p600
    2    p700
    Name: ticker

So all together it would be:

    >>> import pandas as pd
    >>> temp = pd.DataFrame({'ticker' : ['spx 5/25/2001 p500', 'spx 5/25/2001 p600', 'spx 5/25/2001 p700']})
    >>> temp['ticker'].str.split(' ').str[-1]
    0    p500
    1    p600
    2    p700
    Name: ticker, dtype: object
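Since the question asks for a new column, the result of that expression can be assigned back to the DataFrame. The following is a minimal sketch; the column names `price` and `price_rsplit` are chosen here for illustration only, and the `rsplit` variant is an alternative that splits only at the last space.

```python
import pandas as pd

temp = pd.DataFrame({'ticker': ['spx 5/25/2001 p500',
                                'spx 5/25/2001 p600',
                                'spx 5/25/2001 p700']})

# Split each string on a single space, then take the last token of each list.
# 'price' is an illustrative column name, not one from the original post.
temp['price'] = temp['ticker'].str.split(' ').str[-1]

# Alternative: split once from the right, so only the final space is used.
temp['price_rsplit'] = temp['ticker'].str.rsplit(' ', n=1).str[-1]

print(temp)
```

Both columns hold the same values for this data; `rsplit` with `n=1` avoids building the full list of tokens when only the last one is needed.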
Wes McKinney answered Sep 20 '22 23:09



You could use the tolist method as an intermediary:

    In [99]: import pandas as pd

    In [100]: d1 = pd.DataFrame({'ticker' : ['spx 5/25/2001 p500', 'spx 5/25/2001 p600', 'spx 5/25/2001 p700']})

    In [101]: d1.ticker.str.split().tolist()
    Out[101]:
    [['spx', '5/25/2001', 'p500'],
     ['spx', '5/25/2001', 'p600'],
     ['spx', '5/25/2001', 'p700']]

From which you could make a new DataFrame:

    In [102]: d2 = pd.DataFrame(d1.ticker.str.split().tolist(),
       .....:                   columns="symbol date price".split())

    In [103]: d2
    Out[103]:
      symbol       date price
    0    spx  5/25/2001  p500
    1    spx  5/25/2001  p600
    2    spx  5/25/2001  p700
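In more recent pandas versions, the same three-column DataFrame can be built without going through `tolist`, by passing `expand=True` to `str.split`. This is a sketch of that alternative rather than the approach shown above; it assumes every row splits into exactly three tokens.

```python
import pandas as pd

d1 = pd.DataFrame({'ticker': ['spx 5/25/2001 p500',
                              'spx 5/25/2001 p600',
                              'spx 5/25/2001 p700']})

# expand=True returns a DataFrame with one column per token
# instead of a Series of lists.
d2 = d1['ticker'].str.split(expand=True)

# Assumes exactly three tokens per row, so three column names fit.
d2.columns = ['symbol', 'date', 'price']

print(d2)
```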

For good measure, you could fix the price:

    In [104]: d2["price"] = d2["price"].str.replace("p","").astype(float)

    In [105]: d2
    Out[105]:
      symbol       date  price
    0    spx  5/25/2001    500
    1    spx  5/25/2001    600
    2    spx  5/25/2001    700
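The question mentions that about half the entries start with "c" rather than "p", which `replace("p", "")` alone would not handle. One possible way to cover both is `str.extract` with a regular expression, sketched below. The `'c600'` row is made-up sample data, and the names `prefix` and `value` are hypothetical labels introduced for this example.

```python
import pandas as pd

# 'c600' is invented here to illustrate the "c" case from the question.
d2 = pd.DataFrame({'price': ['p500', 'c600', 'p700']})

# Capture the leading letter and the numeric part in separate named groups.
# 'prefix' and 'value' are illustrative group names.
parts = d2['price'].str.extract(r'^(?P<prefix>[pc])(?P<value>\d+(?:\.\d+)?)$')

d2['prefix'] = parts['prefix']
d2['price'] = parts['value'].astype(float)

print(d2)
```

Rows that do not match the pattern would come back as NaN in both groups, which makes unexpected formats easy to spot.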

PS: but if you really just want the last column, apply would suffice:

    In [113]: temp2.apply(lambda x: x[2])
    Out[113]:
    0    p500
    1    p600
    2    p700
    Name: ticker
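Note that `x[2]` picks the third token, which is the last one only when every row has exactly three tokens. If the number of tokens can vary, indexing with `x[-1]` is a safer choice. The sketch below uses a made-up second row with an extra token to show the difference.

```python
import pandas as pd

# The second row has an extra token (invented data) so the lists differ in length.
s = pd.Series(['spx 5/25/2001 p500', 'spx weekly 5/25/2001 c600'], name='ticker')
tokens = s.str.split(' ')

third = tokens.apply(lambda x: x[2])   # third token of each list
last = tokens.apply(lambda x: x[-1])   # last token of each list

print(third.tolist())
print(last.tolist())
```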
DSM answered Sep 19 '22 23:09
