pandas dataframe : add & remove prefix/suffix from all cell values of entire dataframe

Tags: python, string, pandas, dataframe, suffix

To add a prefix/suffix to a dataframe, I usually do the following.

For instance, to add a suffix '@',

df = df.astype(str) + '@'

This has basically appended a '@' to all cell values.

I would like to know how to remove this suffix. Is there a method available directly on the pandas.DataFrame class that removes a particular prefix/suffix character from the entire DataFrame?

I've tried iterating through the rows (as series) while using rstrip('@') as follows:

for index in range(df.shape[0]):
    row = df.iloc[index]
    row = row.str.rstrip('@')

Now, in order to make a dataframe out of this series,

new_df = pd.DataFrame(columns=list(df))
new_df = new_df.append(row)

However, this doesn't work. It gives an empty dataframe.

Is there something really basic that I am missing?


asked Dec 13 '16 00:12

murphy1310

3 Answers

You could use applymap to apply your string method to each element:

df = df.applymap(lambda x: str(x).rstrip('@'))

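Note: I wouldn't expect this to be as fast as the vectorized approach, pd.Series.str.rstrip, i.e. transforming each column separately. Also, since pandas 2.1, applymap is deprecated in favor of DataFrame.map, which does the same thing.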
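To make the comparison above concrete, here is a minimal sketch that runs the element-wise version next to the column-wise (vectorized) one. The small frame and the names `suffixed`, `by_element` and `by_column` are made up for illustration, and the `hasattr` check is only an assumption about how you might want to handle both older and newer pandas versions:

```python
import pandas as pd

# Made-up sample data, not from the question
df = pd.DataFrame({"a": ["dog", "lazy"], "b": ["quick", "the"]})
suffixed = df.astype(str) + "@"


def strip_at(value):
    """Remove trailing '@' characters from a single cell."""
    return str(value).rstrip("@")


# Element-wise: DataFrame.map exists in pandas >= 2.1, applymap in older versions
if hasattr(suffixed, "map"):
    by_element = suffixed.map(strip_at)
else:
    by_element = suffixed.applymap(strip_at)

# Column-wise: one vectorized Series.str.rstrip call per column
by_column = suffixed.apply(lambda col: col.str.rstrip("@"))

print(by_column)
```

Both variants should give the same cell values; the column-wise one just makes one call per column instead of one Python call per cell.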


answered Sep 28 '22 07:09

AlexG

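You can use apply and the str.strip method of pd.Series (str.strip removes the character from both ends; use str.rstrip if you only want to touch the suffix):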

In [13]: df
Out[13]:
       a       b      c
0    dog   quick    the
1   lazy    lazy    fox
2  brown   quick    dog
3  quick     the   over
4  brown    over   lazy
5    fox   brown  quick
6  quick     fox    the
7    dog  jumped    the
8   lazy   brown    the
9    dog    lazy    the

In [14]: df = df + "@"

In [15]: df
Out[15]:
        a        b       c
0    dog@   quick@    the@
1   lazy@    lazy@    fox@
2  brown@   quick@    dog@
3  quick@     the@   over@
4  brown@    over@   lazy@
5    fox@   brown@  quick@
6  quick@     fox@    the@
7    dog@  jumped@    the@
8   lazy@   brown@    the@
9    dog@    lazy@    the@

In [16]: df = df.apply(lambda S:S.str.strip('@'))

In [17]: df
Out[17]:
       a       b      c
0    dog   quick    the
1   lazy    lazy    fox
2  brown   quick    dog
3  quick     the   over
4  brown    over   lazy
5    fox   brown  quick
6  quick     fox    the
7    dog  jumped    the
8   lazy   brown    the
9    dog    lazy    the

Note, your approach doesn't work because when you do the following assignment in your for-loop:

row = row.str.rstrip('@')

This merely assigns the result of row.str.rstrip('@') to the name row without mutating the DataFrame. This is the same behavior for all Python objects and simple name assignment:

In [18]: rows = [[1,2,3],[4,5,6],[7,8,9]]

In [19]: print(rows)
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [20]: for row in rows:
    ...:     row = ['look','at','me']
    ...:

In [21]: print(rows)
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

To actually change the underlying data structure you need to use a mutator method:

In [22]: rows
Out[22]: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [23]: for row in rows:
    ...:     row.append("LOOKATME")
    ...:

In [24]: rows
Out[24]: [[1, 2, 3, 'LOOKATME'], [4, 5, 6, 'LOOKATME'], [7, 8, 9, 'LOOKATME']]

Note that slice-assignment is just syntactic sugar for a mutator method:

In [26]: rows
Out[26]: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [27]: for row in rows:
    ...:     row[:] = ['look','at','me']
    ...:
    ...:

In [28]: rows
Out[28]: [['look', 'at', 'me'], ['look', 'at', 'me'], ['look', 'at', 'me']]

This is analogous to pandas loc or iloc based assignment.
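As a rough illustration of that last point, the loop from the question could be made to work by assigning back through iloc instead of rebinding the name row. This is only a sketch on a made-up two-row frame; the vectorized apply shown earlier is still the simpler option:

```python
import pandas as pd

# Made-up sample data with the '@' suffix already applied
df = pd.DataFrame({"a": ["dog@", "lazy@"], "b": ["quick@", "the@"]})

for index in range(df.shape[0]):
    # df.iloc[index] = ... writes into the DataFrame itself,
    # unlike `row = ...`, which only rebinds a local name
    df.iloc[index] = df.iloc[index].str.rstrip("@")

print(df)
```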

answered Sep 28 '22 06:09

juanpa.arrivillaga

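You could make this really easy and just use the pandas.DataFrame.replace() method to replace every "@" with "". Note that regex=True is needed for this: without it, replace only matches cells whose entire value is "@", not an "@" inside a longer string. replace also returns a new DataFrame, so assign the result back:

df = df.replace("@", "", regex=True)

If you are worried about "@" being removed anywhere other than the end of your values, anchor the pattern with $:

df = df.replace("@$", "", regex=True)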
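To see how the pattern and the regex flag change what DataFrame.replace matches, here is a small sketch on a made-up frame (the data and the variable names are assumptions for illustration only):

```python
import pandas as pd

# Made-up data: one cell is exactly "@", one has "@" in the middle as well
df = pd.DataFrame({"a": ["dog@", "@"], "b": ["a@b@", "the@"]})

# Without regex=True, only cells whose whole value equals "@" are replaced
whole_cell = df.replace("@", "")

# With regex=True, every "@" is removed, wherever it occurs
anywhere = df.replace("@", "", regex=True)

# Anchoring with $ removes only a trailing "@"
suffix_only = df.replace("@$", "", regex=True)

print(whole_cell)
print(anywhere)
print(suffix_only)
```

In this sketch only `suffix_only` keeps the "@" in the middle of "a@b@", which is usually what you want when undoing a suffix.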

answered Sep 28 '22 08:09

SummerEla
