
Get JS var value in HTML source using BeautifulSoup in Python

I'm trying to get the value of a JavaScript variable from HTML source code using BeautifulSoup.

For example, I have:

<script>
[other code]
var my = 'hello';
var name = 'hi';
var is = 'halo';
[other code]
</script>

I want something that returns the value of the variable "my" in Python.

How can I achieve that?

L. K. asked Oct 20 '25 17:10


1 Answer

Another idea would be to use a JavaScript parser: locate the variable declaration node, check that its identifier is the name you want, and extract its initializer. An example using the slimit parser:

from bs4 import BeautifulSoup
from slimit import ast
from slimit.parser import Parser
from slimit.visitors import nodevisitor


data = """
<script>
var my = 'hello';
var name = 'hi';
var is = 'halo';
</script>
"""

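soup = BeautifulSoup(data, "html.parser")

# find the <script> element whose contents declare the variable
script = soup.find("script", string=lambda text: text and "var my" in text)

# parse the JavaScript and walk the syntax tree
parser = Parser()
tree = parser.parse(script.text)
for node in nodevisitor.visit(tree):
    # a VarDecl node holds the declared identifier and its initializer
    if isinstance(node, ast.VarDecl) and node.identifier.value == 'my':
        # strip any quote characters surrounding the string literal
        print(node.initializer.value.strip("'\""))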

Prints hello.
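
If installing a JavaScript parser is more than you need, a lighter option for simple declarations like the ones in the question is a regular expression. The following is only a rough sketch using the standard library: ScriptCollector and get_js_var are hypothetical helper names, and it assumes the value is a plain quoted string on a single "var" statement ending in a semicolon. It will not cope with escaped quotes, concatenation, or anything else a real parser would handle.

```python
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # script contents may arrive in several chunks, so accumulate them
        if self.in_script:
            self.scripts[-1] += data


def get_js_var(html, name):
    """Return the quoted string assigned to `var <name>`, or None if not found."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    # assumption: the declaration looks like  var <name> = '<value>';
    pattern = re.compile(
        r"\bvar\s+" + re.escape(name) + r"\s*=\s*(['\"])(.*?)\1\s*;"
    )
    for script in collector.scripts:
        match = pattern.search(script)
        if match:
            return match.group(2)
    return None


html = """
<script>
var my = 'hello';
var name = 'hi';
var is = 'halo';
</script>
"""

print(get_js_var(html, "my"))
```

The parser-based approach above is still the more robust choice once the script contains anything beyond simple string assignments.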

alecxe answered Oct 22 '25 06:10


