Removing HTML tags in pandas

I am using the pandas library on Python 3.5.1. How can I remove HTML tags from field values? Here are my input and output:

[image: input and desired output]

My code returned an error:

import pandas as pd

code=[1,2,3]
overview =['<p>Environments subject.</p>',
          '<ul><li> property ;</li></ul><ul><li>markets and exchange;</li></ul>',
          '<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">']
# '<p class="SSPBodyText" style="padding: 0cm; text-align: justify;">The subject.</p>']
df= pd.DataFrame(overview,code)

df.columns = ['overview']
df['overview_copy'] = df['overview']

# print(df)

tags_list = ['<p>' ,'</p>' , '<p*>',
             '<ul>','</ul>',
             '<li>','</li>',
             '<br>',
             '<strong>','</strong>',
             '<span*>','</span>',
             '<a href*>','</a>',
             '<em>','</em>']

for tag in tags_list:
    # df['overview_copy'] = df['overview_copy'].str.replace(tag, '')
    df['overview_copy'].replace(to_replace=tag, value='', regex=True, inplace=True)
print(df)
Hamideh asked Sep 01 '17 11:09


1 Answer

Like so: re.sub('<[^<]+?>', '', text)

You can find a more detailed answer there.
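Applied to the DataFrame from the question, the same pattern might look like the sketch below. The helper name strip_tags is only illustrative (it is not from the question or from pandas), and a regex like this is a rough approach: it removes anything that looks like a tag but does not decode entities such as &amp; or cope with malformed markup.

```python
import re

import pandas as pd

# Same pattern as in the answer: "<", then one or more non-"<" characters, then ">".
TAG_RE = re.compile(r'<[^<]+?>')


def strip_tags(text):
    """Hypothetical helper: remove anything that looks like an HTML tag."""
    return TAG_RE.sub('', text)


code = [1, 2, 3]
overview = ['<p>Environments subject.</p>',
            '<ul><li> property ;</li></ul><ul><li>markets and exchange;</li></ul>',
            '<p class="MsoNormal" style="margin: 0cm 0cm 0pt;">']

df = pd.DataFrame({'overview': overview}, index=code)

# Apply the helper to every value instead of looping over a list of tags.
df['overview_copy'] = df['overview'].apply(strip_tags)
print(df)
```

Using apply with a plain Python function avoids relying on the regex keyword of Series.str.replace, which older pandas releases do not have.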

Pobe answered Oct 03 '22 10:10