Python pandas: remove everything after a delimiter in a string

Tags: python, python-3.x, pandas

I have data frames that contain values such as:

    "vendor a::ProductA"
    "vendor b::ProductA"
    "vendor a::Productb"

I need to remove everything after (and including) the two colons (::) so that I end up with:

    "vendor a"
    "vendor b"
    "vendor a"

I tried str.trim (which does not seem to exist) and str.split without success. What would be the easiest way to accomplish this?

asked Nov 20 '16 14:11

f0rd42

1 Answer

You can use pandas.Series.str.split just like you would use the built-in str.split. Split on the string '::', then use .str[0] to take the first element of each list that split returns:

    >>> import pandas as pd
    >>> df = pd.DataFrame({'text': ["vendor a::ProductA", "vendor b::ProductA", "vendor a::Productb"]})
    >>> df
                     text
    0  vendor a::ProductA
    1  vendor b::ProductA
    2  vendor a::Productb
    >>> df['text_new'] = df['text'].str.split('::').str[0]
    >>> df
                     text  text_new
    0  vendor a::ProductA  vendor a
    1  vendor b::ProductA  vendor b
    2  vendor a::Productb  vendor a
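
The sample data always contains exactly one '::'. As a rough sketch of what the same expression does with other shapes of input, the snippet below uses made-up values ("vendor c" and "vendor d::X::Y" are assumptions, not from the question): a row without the delimiter comes back unchanged, and a row with several delimiters keeps only the text before the first one.

```python
import pandas as pd

# Hypothetical values chosen to show the edge cases; not from the question
s = pd.Series(["vendor a::ProductA", "vendor c", "vendor d::X::Y"])

# Same expression as in the answer: split on '::', keep the first piece
result = s.str.split("::").str[0]

print(result.tolist())  # ['vendor a', 'vendor c', 'vendor d']
```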

Here's a solution that uses a plain Python list comprehension instead of the .str accessor:

    >>> df['text_new1'] = [x.split('::')[0] for x in df['text']]
    >>> df
                     text  text_new text_new1
    0  vendor a::ProductA  vendor a  vendor a
    1  vendor b::ProductA  vendor b  vendor b
    2  vendor a::Productb  vendor a  vendor a
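
One practical difference between the two approaches shows up when the column has missing values. The sketch below assumes a hypothetical column with one missing entry: the .str accessor leaves it missing, while the plain comprehension raises AttributeError because the missing value has no split method. The guarded comprehension at the end is just one possible workaround.

```python
import pandas as pd

# Hypothetical column with one missing value; not from the question
s = pd.Series(["vendor a::ProductA", None])

# The .str accessor skips the missing entry and leaves it missing
via_accessor = s.str.split("::").str[0]

# The plain comprehension calls .split on the missing value and fails
try:
    via_comprehension = [x.split("::")[0] for x in s]
except AttributeError as exc:
    via_comprehension = None
    print("comprehension failed:", exc)

# A guarded comprehension that passes non-string values through unchanged
guarded = [x.split("::")[0] if isinstance(x, str) else x for x in s]

print(via_accessor.tolist())
print(guarded)
```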

Edit: Here's the step-by-step explanation of what's happening in pandas above:

    # Select the pandas.Series object you want
    >>> df['text']
    0    vendor a::ProductA
    1    vendor b::ProductA
    2    vendor a::Productb
    Name: text, dtype: object

    # using pandas.Series.str allows us to implement "normal" string methods
    # (like split) on a Series
    >>> df['text'].str
    <pandas.core.strings.StringMethods object at 0x110af4e48>

    # Now we can use the split method to split on our '::' string. You'll see that
    # a Series of lists is returned (just like what you'd see outside of pandas)
    >>> df['text'].str.split('::')
    0    [vendor a, ProductA]
    1    [vendor b, ProductA]
    2    [vendor a, Productb]
    Name: text, dtype: object

    # using the pandas.Series.str method, again, we will be able to index through
    # the lists returned in the previous step
    >>> df['text'].str.split('::').str
    <pandas.core.strings.StringMethods object at 0x110b254a8>

    # now we can grab the first item in each list above for our desired output
    >>> df['text'].str.split('::').str[0]
    0    vendor a
    1    vendor b
    2    vendor a
    Name: text, dtype: object
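
For comparison, here is a sketch of a few other .str methods that should give the same result on this data. These are alternatives rather than part of the approach above, and the column names via_split, via_partition and via_replace are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"text": ["vendor a::ProductA", "vendor b::ProductA", "vendor a::Productb"]}
)

# Stop after the first '::' so each row is split at most once
df["via_split"] = df["text"].str.split("::", n=1).str[0]

# partition returns three columns (before, separator, after);
# column 0 holds the text before the first '::'
df["via_partition"] = df["text"].str.partition("::")[0]

# Regex: delete '::' and everything that follows it on the same line
df["via_replace"] = df["text"].str.replace(r"::.*", "", regex=True)

print(df)
```

Which one reads best is largely a matter of taste; passing n=1 or using partition avoids building long lists when a value contains many delimiters.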

I would suggest checking out the pandas.Series.str docs, or, better yet, Working with Text Data in pandas.

answered Sep 29 '22 09:09

blacksite
