I have data frames which contain e.g.:
"vendor a::ProductA" "vendor b::ProductA" "vendor a::Productb"
I need to remove everything (and including) the two :: so that I end up with:
"vendor a" "vendor b" "vendor a"
I tried str.trim (which seems to not exist) and str.split without success. what would be the easiest way to accomplish this?
Using map() method In our specific example, we can use map() to apply a lambda function that removes +/- from the beginning of the string and any ascii character from the end of the string.
lstrip() is used to remove spaces from the left side of string, str. rstrip() to remove spaces from right side of the string and str. strip() removes spaces from both sides.
You can use pandas.Series.str.split
just like you would use split
normally. Just split on the string '::'
, and index the list that's created from the split
method:
>>> df = pd.DataFrame({'text': ["vendor a::ProductA", "vendor b::ProductA", "vendor a::Productb"]}) >>> df text 0 vendor a::ProductA 1 vendor b::ProductA 2 vendor a::Productb >>> df['text_new'] = df['text'].str.split('::').str[0] >>> df text text_new 0 vendor a::ProductA vendor a 1 vendor b::ProductA vendor b 2 vendor a::Productb vendor a
Here's a non-pandas solution:
>>> df['text_new1'] = [x.split('::')[0] for x in df['text']] >>> df text text_new text_new1 0 vendor a::ProductA vendor a vendor a 1 vendor b::ProductA vendor b vendor b 2 vendor a::Productb vendor a vendor a
Edit: Here's the step-by-step explanation of what's happening in pandas
above:
# Select the pandas.Series object you want >>> df['text'] 0 vendor a::ProductA 1 vendor b::ProductA 2 vendor a::Productb Name: text, dtype: object # using pandas.Series.str allows us to implement "normal" string methods # (like split) on a Series >>> df['text'].str <pandas.core.strings.StringMethods object at 0x110af4e48> # Now we can use the split method to split on our '::' string. You'll see that # a Series of lists is returned (just like what you'd see outside of pandas) >>> df['text'].str.split('::') 0 [vendor a, ProductA] 1 [vendor b, ProductA] 2 [vendor a, Productb] Name: text, dtype: object # using the pandas.Series.str method, again, we will be able to index through # the lists returned in the previous step >>> df['text'].str.split('::').str <pandas.core.strings.StringMethods object at 0x110b254a8> # now we can grab the first item in each list above for our desired output >>> df['text'].str.split('::').str[0] 0 vendor a 1 vendor b 2 vendor a Name: text, dtype: object
I would suggest checking out the pandas.Series.str docs, or, better yet, Working with Text Data in pandas.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With