Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python pandas: remove everything after a delimiter in a string

I have data frames which contain e.g.:

"vendor a::ProductA" "vendor b::ProductA" "vendor a::Productb" 

I need to remove everything (and including) the two :: so that I end up with:

"vendor a" "vendor b" "vendor a" 

I tried str.trim (which seems to not exist) and str.split without success. what would be the easiest way to accomplish this?

like image 624
f0rd42 Avatar asked Nov 20 '16 14:11

f0rd42


People also ask

How do you remove unwanted parts from pandas strings?

Using map() method In our specific example, we can use map() to apply a lambda function that removes +/- from the beginning of the string and any ascii character from the end of the string.

How do you strip a string in pandas?

lstrip() is used to remove spaces from the left side of string, str. rstrip() to remove spaces from right side of the string and str. strip() removes spaces from both sides.


1 Answers

You can use pandas.Series.str.split just like you would use split normally. Just split on the string '::', and index the list that's created from the split method:

>>> df = pd.DataFrame({'text': ["vendor a::ProductA", "vendor b::ProductA", "vendor a::Productb"]}) >>> df                  text 0  vendor a::ProductA 1  vendor b::ProductA 2  vendor a::Productb >>> df['text_new'] = df['text'].str.split('::').str[0] >>> df                  text  text_new 0  vendor a::ProductA  vendor a 1  vendor b::ProductA  vendor b 2  vendor a::Productb  vendor a 

Here's a non-pandas solution:

>>> df['text_new1'] = [x.split('::')[0] for x in df['text']] >>> df                  text  text_new text_new1 0  vendor a::ProductA  vendor a  vendor a 1  vendor b::ProductA  vendor b  vendor b 2  vendor a::Productb  vendor a  vendor a 

Edit: Here's the step-by-step explanation of what's happening in pandas above:

# Select the pandas.Series object you want >>> df['text'] 0    vendor a::ProductA 1    vendor b::ProductA 2    vendor a::Productb Name: text, dtype: object  # using pandas.Series.str allows us to implement "normal" string methods  # (like split) on a Series >>> df['text'].str <pandas.core.strings.StringMethods object at 0x110af4e48>  # Now we can use the split method to split on our '::' string. You'll see that # a Series of lists is returned (just like what you'd see outside of pandas) >>> df['text'].str.split('::') 0    [vendor a, ProductA] 1    [vendor b, ProductA] 2    [vendor a, Productb] Name: text, dtype: object  # using the pandas.Series.str method, again, we will be able to index through # the lists returned in the previous step >>> df['text'].str.split('::').str <pandas.core.strings.StringMethods object at 0x110b254a8>  # now we can grab the first item in each list above for our desired output >>> df['text'].str.split('::').str[0] 0    vendor a 1    vendor b 2    vendor a Name: text, dtype: object 

I would suggest checking out the pandas.Series.str docs, or, better yet, Working with Text Data in pandas.

like image 177
blacksite Avatar answered Sep 29 '22 09:09

blacksite