Python regular expression for HTML parsing (BeautifulSoup)

Tags: python, regex, screen-scraping

I want to grab the value of a hidden input field in HTML.

<input type="hidden" name="fooId" value="12-3456789-1111111111" />

I want to write a regular expression in Python that returns the value of fooId, given that I know the line in the HTML follows this format:

<input type="hidden" name="fooId" value="**[id is here]**" />

Can someone provide an example in Python to parse the HTML for the value?

asked Sep 10 '08 21:09

mshafrir

1 Answer

For this particular case, BeautifulSoup takes more code than a regex, but it is much more robust: it keeps working if the attributes are reordered or the whitespace changes. I'm just contributing the BeautifulSoup example, given that you already know which regexp to use :-)

from bs4 import BeautifulSoup  # BeautifulSoup 4 (pip install beautifulsoup4)

# Or retrieve it from the web, etc.
with open('/yourwebsite/page.html', 'r') as f:
    html_data = f.read()

# Create the soup object from the HTML data
soup = BeautifulSoup(html_data, 'html.parser')

# Find the proper tag. The attribute filters go in attrs, because the
# keyword `name` is already find()'s first parameter (the tag name).
fooId = soup.find('input', attrs={'name': 'fooId', 'type': 'hidden'})
value = fooId['value']  # Index the attribute directly by its name
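
The answer assumes the asker already has a regex in hand but never shows one. As a rough sketch, assuming the tag always appears exactly in the format given in the question (attributes in the order type, name, value, with double-quoted values), something like the following could work. The names `FOO_ID_PATTERN` and `extract_foo_id` are made up for illustration and are not from the original post.

```python
import re

# Assumption: the tag matches the question's format exactly, with the
# attributes in the order type, name, value and double-quoted values.
FOO_ID_PATTERN = re.compile(
    r'<input\s+type="hidden"\s+name="fooId"\s+value="([^"]*)"\s*/?>'
)


def extract_foo_id(html):
    """Return the fooId value, or None if no matching tag is found."""
    match = FOO_ID_PATTERN.search(html)
    return match.group(1) if match else None


sample = '<input type="hidden" name="fooId" value="12-3456789-1111111111" />'
print(extract_foo_id(sample))  # 12-3456789-1111111111
```

If the page ever reorders the attributes or switches to single quotes, this pattern stops matching and the function returns None, which is the kind of fragility that makes the parser-based approach the safer choice.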

answered Sep 20 '22 08:09

Vinko Vrsalovic
