In a pandas DataFrame, how can I apply the equivalent of Excel's LEFT('state', 2) to take only the first two letters? Ideally I want to learn how to use LEFT, RIGHT and MID in a DataFrame too, so I need a general equivalent and not a "trick" for this specific example.
import pandas as pd

data = {'state': ['Auckland', 'Otago', 'Wellington', 'Dunedin', 'Hamilton'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
df = pd.DataFrame(data)
print(df)

   pop       state  year
0  1.5    Auckland  2000
1  1.7       Otago  2001
2  3.6  Wellington  2002
3  2.4     Dunedin  2001
4  2.9    Hamilton  2002
I want to get this:
   pop       state  year StateInitial
0  1.5    Auckland  2000           Au
1  1.7       Otago  2001           Ot
2  3.6  Wellington  2002           We
3  2.4     Dunedin  2001           Du
4  2.9    Hamilton  2002           Ha
First two letters for each value in a column:
>>> df['StateInitial'] = df['state'].str[:2]
>>> df
   pop       state  year StateInitial
0  1.5    Auckland  2000           Au
1  1.7       Otago  2001           Ot
2  3.6  Wellington  2002           We
3  2.4     Dunedin  2001           Du
4  2.9    Hamilton  2002           Ha
For the last two characters, use df['state'].str[-2:]. I'm not sure exactly what you want for the middle, but you can apply an arbitrary function to a column with the apply method:
>>> df['state'].apply(lambda x: x[len(x)//2-1:len(x)//2+1])
0    kl
1    ta
2    in
3    ne
4    il
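If you want something closer to Excel's three functions, the same .str slicing can be wrapped in small helpers. This is only a sketch: the names left, right and mid are made up here for illustration and are not part of pandas, mid assumes Excel's 1-based start position, and right assumes n is at least 1 (with n = 0 the slice [-0:] would return the whole string).

```python
import pandas as pd

# Hypothetical helpers (not pandas functions), built on the .str slicing shown above.

def left(s: pd.Series, n: int) -> pd.Series:
    """First n characters of each string, like Excel LEFT(text, n)."""
    return s.str[:n]

def right(s: pd.Series, n: int) -> pd.Series:
    """Last n characters of each string, like Excel RIGHT(text, n). Assumes n >= 1."""
    return s.str[-n:]

def mid(s: pd.Series, start: int, n: int) -> pd.Series:
    """n characters from 1-based position start, like Excel MID(text, start, n)."""
    return s.str[start - 1:start - 1 + n]

df = pd.DataFrame({'state': ['Auckland', 'Otago', 'Wellington', 'Dunedin', 'Hamilton']})
df['Left2'] = left(df['state'], 2)      # Au, Ot, We, Du, Ha
df['Right2'] = right(df['state'], 2)    # nd, go, on, in, on
df['Mid3'] = mid(df['state'], 2, 3)     # uck, tag, ell, une, ami
print(df)
```

Because these go through the .str accessor, missing values stay missing instead of raising an error, which is not the case for the apply with a lambda.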