I have a column in a pandas DataFrame that I would like to split on a single space. The splitting is simple enough with DataFrame.str.split(' ')
, but I can't make a new column from the last entry. When I .str.split()
the column I get a list of arrays and I don't know how to manipulate this to get a new column for my DataFrame.
Here is an example. Each entry in the column contains 'symbol data price' and I would like to split off the price (and eventually remove the "p"... or "c" in half the cases).
import pandas as pd temp = pd.DataFrame({'ticker' : ['spx 5/25/2001 p500', 'spx 5/25/2001 p600', 'spx 5/25/2001 p700']}) temp2 = temp.ticker.str.split(' ')
which yields
0 ['spx', '5/25/2001', 'p500'] 1 ['spx', '5/25/2001', 'p600'] 2 ['spx', '5/25/2001', 'p700']
But temp2[0]
just gives one list entry's array and temp2[:][-1]
fails. How can I convert the last entry in each array to a new column? Thanks!
The str. split() function is used to split strings around given separator/delimiter. The function splits the string in the Series/Index from the beginning, at the specified delimiter string.
Split column by delimiter into multiple columnsApply the pandas series str. split() function on the “Address” column and pass the delimiter (comma in this case) on which you want to split the column. Also, make sure to pass True to the expand parameter.
Do this:
In [43]: temp2.str[-1] Out[43]: 0 p500 1 p600 2 p700 Name: ticker
So all together it would be:
>>> temp = pd.DataFrame({'ticker' : ['spx 5/25/2001 p500', 'spx 5/25/2001 p600', 'spx 5/25/2001 p700']}) >>> temp['ticker'].str.split(' ').str[-1] 0 p500 1 p600 2 p700 Name: ticker, dtype: object
You could use the tolist
method as an intermediary:
In [99]: import pandas as pd In [100]: d1 = pd.DataFrame({'ticker' : ['spx 5/25/2001 p500', 'spx 5/25/2001 p600', 'spx 5/25/2001 p700']}) In [101]: d1.ticker.str.split().tolist() Out[101]: [['spx', '5/25/2001', 'p500'], ['spx', '5/25/2001', 'p600'], ['spx', '5/25/2001', 'p700']]
From which you could make a new DataFrame:
In [102]: d2 = pd.DataFrame(d1.ticker.str.split().tolist(), .....: columns="symbol date price".split()) In [103]: d2 Out[103]: symbol date price 0 spx 5/25/2001 p500 1 spx 5/25/2001 p600 2 spx 5/25/2001 p700
For good measure, you could fix the price:
In [104]: d2["price"] = d2["price"].str.replace("p","").astype(float) In [105]: d2 Out[105]: symbol date price 0 spx 5/25/2001 500 1 spx 5/25/2001 600 2 spx 5/25/2001 700
PS: but if you really just want the last column, apply
would suffice:
In [113]: temp2.apply(lambda x: x[2]) Out[113]: 0 p500 1 p600 2 p700 Name: ticker
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With