
How to do a left, right and mid of a string in a pandas DataFrame

Tags:

python

pandas

In a pandas DataFrame, how can I apply something like Excel's LEFT(state, 2) to take only the first two letters? Ideally I want to learn how to use left, right and mid on a DataFrame too, so I need a general equivalent and not a "trick" for this specific example.

import pandas as pd

data = {'state': ['Auckland', 'Otago', 'Wellington', 'Dunedin', 'Hamilton'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
df = pd.DataFrame(data)
print(df)

   pop       state  year
0  1.5    Auckland  2000
1  1.7       Otago  2001
2  3.6  Wellington  2002
3  2.4     Dunedin  2001
4  2.9    Hamilton  2002

I want to get this:

   pop       state  year StateInitial
0  1.5    Auckland  2000           Au
1  1.7       Otago  2001           Ot
2  3.6  Wellington  2002           We
3  2.4     Dunedin  2001           Du
4  2.9    Hamilton  2002           Ha
IcemanBerlin asked Jan 07 '14 11:01




1 Answer

First two letters for each value in a column:

>>> df['StateInitial'] = df['state'].str[:2]
>>> df
   pop       state  year StateInitial
0  1.5    Auckland  2000           Au
1  1.7       Otago  2001           Ot
2  3.6  Wellington  2002           We
3  2.4     Dunedin  2001           Du
4  2.9    Hamilton  2002           Ha

For the last two letters, that would be df['state'].str[-2:]. I don't know exactly what you want for the middle, but you can apply an arbitrary function to a column with the apply method:

>>> # Use // (integer division) so the slice bounds stay ints on Python 3
>>> df['state'].apply(lambda x: x[len(x)//2-1:len(x)//2+1])
0    kl
1    ta
2    in
3    ne
4    il
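If you want something closer to Excel's LEFT, RIGHT and MID, one possible sketch is to wrap the .str accessor in small helpers. The names left, right and mid below are hypothetical and not part of pandas, and mid assumes Excel's 1-based start position:

```python
import pandas as pd

df = pd.DataFrame({'state': ['Auckland', 'Otago', 'Wellington',
                             'Dunedin', 'Hamilton']})

# Hypothetical helpers (not pandas API) mimicking Excel's LEFT/RIGHT/MID.

def left(s, n):
    """First n characters of each string in Series s."""
    return s.str[:n]

def right(s, n):
    """Last n characters of each string in Series s."""
    # s.str[-0:] would return the whole string, so handle n == 0 separately.
    return s.str[-n:] if n > 0 else s.str[:0]

def mid(s, start, n):
    """n characters beginning at 1-based position start, as in Excel's MID."""
    return s.str.slice(start - 1, start - 1 + n)

df['StateInitial'] = left(df['state'], 2)
df['StateEnd'] = right(df['state'], 2)
df['StateMid'] = mid(df['state'], 3, 3)
print(df)
```

These stay vectorized through the .str accessor, so missing values come through as NaN instead of raising, which a plain apply with a lambda would not do.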
alko answered Oct 10 '22 02:10
