Removing html tags in pandas

Q: How do you remove HTML tags from a column of a DataFrame in Python?

df['overview'].str.replace('<[^<]+?>', '', regex=True)  # Use a regex to remove HTML tags.

Tags: python, html, regex, python-3.x, pandas

I am using the pandas library on Python 3.5.1. How can I remove HTML tags from field values? Here are my input and output:

[image: input and expected output]

My code returned an error:

import pandas as pd

code=[1,2,3]
overview =['<p>Environments subject.</p>',
          '<ul><li> property ;</li></ul><ul><li>markets and exchange;</li></ul>',
          '<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">']
# '<p class="SSPBodyText" style="padding: 0cm; text-align: justify;">The subject.</p>']
df= pd.DataFrame(overview,code)

df.columns = ['overview']
df['overview_copy'] = df['overview']

# print(df)

tags_list = ['<p>' ,'</p>' , '<p*>',
             '<ul>','</ul>',
             '<li>','</li>',
             '<br>',
             '<strong>','</strong>',
             '<span*>','</span>',
             '<a href*>','</a>',
             '<em>','</em>']

for tag in tags_list:
#     df['overview_copy'] = df['overview_copy'].str.replace(tag, '')
  df['overview_copy'].replace(to_replace=tag, value='', regex=True, inplace=True)
print(df)

asked Sep 01 '17 11:09

Hamideh

1 Answer

Use a regular expression, like so: re.sub('<[^<]+?>', '', text)

You can find a detailed answer there.

answered Oct 03 '22 10:10

Pobe
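
The answer above gives the pattern but does not show it applied to a DataFrame column. The following is a minimal sketch of one way that could look, using the sample data from the question. The helper name strip_tags and the constant TAG_RE are made up for this illustration, and a regex like this only removes text that looks like a tag; it is not a full HTML parser.

```python
import re

import pandas as pd

# Same sample data as in the question.
code = [1, 2, 3]
overview = [
    '<p>Environments subject.</p>',
    '<ul><li> property ;</li></ul><ul><li>markets and exchange;</li></ul>',
    '<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">',
]
df = pd.DataFrame({'overview': overview}, index=code)

# Pattern from the answer: "<", then one or more characters that are
# not "<" (matched non-greedily), then ">". Compiled once for reuse.
TAG_RE = re.compile(r'<[^<]+?>')


def strip_tags(text):
    """Hypothetical helper: remove anything that looks like an HTML tag."""
    return TAG_RE.sub('', text)


# Apply the helper to every value and assign the result to a new column,
# rather than looping over a hand-written tag list with inplace=True.
df['overview_copy'] = df['overview'].map(strip_tags)

print(df['overview_copy'].tolist())
```

Because a single pattern covers every tag, including ones with attributes such as the MsoNormal paragraph, there is no need to maintain a list of tag names. On pandas versions where Series.str.replace accepts the regex argument, df['overview'].str.replace(r'<[^<]+?>', '', regex=True) should give the same result; the map version is used here only because it does not depend on that argument. This sketch assumes every value in the column is a string.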
