
Pandas make new column from string slice of another column

Tags:

python

pandas

People also ask

How do I split a string into another column in pandas?

Series.str.split(): pandas provides a method to split a string around a passed separator/delimiter. The result can be stored as a list in a Series, or expanded into multiple DataFrame columns from a single delimited string.
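
As a minimal sketch of the two outcomes described above (the column name full_name and the sample values are assumptions made up for illustration):

```python
import pandas as pd

# Hypothetical data; "full_name" is not from the original question.
df = pd.DataFrame({"full_name": ["Ada Lovelace", "Alan Turing"]})

# Without expand, each element becomes a list of the split parts.
parts = df["full_name"].str.split(" ")

# With expand=True, the parts are returned as separate columns,
# which can be assigned to new column names in one step.
df[["first", "last"]] = df["full_name"].str.split(" ", expand=True)
```

The expand=True form is usually the more convenient one when every row splits into the same number of parts.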

How do you create a new column in DataFrame based on another column?

Using the apply() method: if you need to apply a function to an existing column in order to compute values that will be added as a new column in the same DataFrame, then pandas.DataFrame.apply() (or Series.apply() for a single column) should do the trick.
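
A short sketch of both forms of apply(), reusing the Sample/Value frame that appears further down this page (the new column names Sample_lower and Label are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({"Sample": ["AAB", "BAB"], "Value": [23, 25]})

# Series.apply: call a function on each element of a single column.
df["Sample_lower"] = df["Sample"].apply(str.lower)

# DataFrame.apply with axis=1: call a function on each row,
# so values from several columns can be combined.
df["Label"] = df.apply(lambda row: f"{row['Sample']}-{row['Value']}", axis=1)
```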

How do I slice columns in pandas?

To slice columns by label, the syntax is df.loc[:, start:stop:step], where start is the name of the first column to take, stop is the name of the last column to take (it is included in the result), and step is the number of columns to advance after each selection; for example, a step of 2 selects alternate columns.
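
For instance, a small sketch with a hypothetical five-column frame (the column names a to e are assumptions):

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [4], "e": [5]})

# Label slices with .loc include the stop label, unlike positional slices.
middle = df.loc[:, "b":"d"]

# A step of 2 takes every other column between the two labels.
alternate = df.loc[:, "a":"e":2]
```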


You can use the .str accessor and apply a slice; this is generally quicker than the apply-based approach further down, as it is vectorised (thanks @unutbu):

df['New_Sample'] = df.Sample.str[:1]

You can also apply a lambda function to the column, but this will be slower on larger DataFrames:

In [187]:

df['New_Sample'] = df.Sample.apply(lambda x: x[:1])
df
Out[187]:
  Sample  Value New_Sample
0    AAB     23          A
1    BAB     25          B
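
Besides speed, the two approaches can differ on missing values. The sketch below rebuilds the frame shown above and adds a third row with a missing Sample; that extra row is an assumption added purely for illustration:

```python
import pandas as pd

# The first two rows match the output above; the third (missing) row is hypothetical.
df = pd.DataFrame({"Sample": ["AAB", "BAB", None], "Value": [23, 25, 27]})

# The .str accessor propagates missing values instead of failing.
df["New_Sample"] = df["Sample"].str[:1]

# The plain lambda tries to slice the missing value itself and raises TypeError.
try:
    df["Sample"].apply(lambda x: x[:1])
    apply_failed = False
except TypeError:
    apply_failed = True
```

If the column may contain missing values, the .str form is therefore the safer default; with apply you would need to guard the lambda yourself.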

You can also use str.slice() to slice each string in a Series, as follows:

df['New_sample'] = df['Sample'].str.slice(0,1)

From pandas documentation:

Series.str.slice(start=None, stop=None, step=None)

Slice substrings from each element in the Series/Index
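
A brief sketch of how the three parameters behave, using made-up strings (the Series below is an assumption, not data from the question):

```python
import pandas as pd

s = pd.Series(["abcdef", "uvwxyz"])

first_three = s.str.slice(0, 3)      # same as s.str[:3]
from_third = s.str.slice(start=2)    # same as s.str[2:]
every_other = s.str.slice(step=2)    # same as s.str[::2]
last_two = s.str[-2:]                # negative positions work as in plain Python
```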

To slice the index (if the index holds strings), you can try:

df.index = df.index.str.slice(0,1)
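
One thing to watch for: shortening the labels can make them collide. A minimal sketch, assuming a hypothetical frame whose index holds the sample names:

```python
import pandas as pd

# Hypothetical frame with string labels in the index.
df = pd.DataFrame({"Value": [23, 25, 27]}, index=["AAB", "BAB", "BCD"])

df.index = df.index.str.slice(0, 1)

# Two labels now share the value "B", so the index is no longer unique.
labels = list(df.index)
unique_after = df.index.is_unique
```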

Adding a solution for a common variation, where the slice width varies across DataFrame rows:

# Extract the ID part of each Email (i.e. the part before the @).

# First, find the position of @ in each Email.
d['pos'] = d['Email'].str.find('@')

# Then use that position to slice each Email with a row-wise lambda.
d['new_var'] = d.apply(lambda x: x['Email'][0:x['pos']], axis=1)

# Here x is a single row, so x['Email'] is a plain string and ordinary slicing applies.
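
A possible alternative that avoids the row-wise lambda is sketched below. Note that str.find returns -1 when there is no @, so position-based slicing silently drops the last character of such values. The sample addresses, including the one without an @, are assumptions for illustration:

```python
import pandas as pd

# Hypothetical data; the last value deliberately has no "@".
d = pd.DataFrame({"Email": ["alice@example.com", "bob@example.org", "no-at-sign"]})

# Position-based slicing, as in the approach above: find() gives -1 for the
# last row, so the slice [0:-1] cuts off its final character.
pos = d["Email"].str.find("@")
by_position = [email[:p] for email, p in zip(d["Email"], pos)]

# Splitting on "@" and taking the first part keeps such values whole.
by_split = d["Email"].str.split("@").str[0]

# A regex capture returns a missing value when there is no "@" at all.
by_regex = d["Email"].str.extract(r"^([^@]*)@", expand=False)
```

Which behaviour is right for values without an @ depends on the data, so it is worth choosing deliberately.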

Hope this helps!