Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert html source code to json object

I am fetching html source code of many pages from one website, I need to convert it into json object and combine with other elements in json doc. . I have seen many questions on same topic but non of them were helpful.

My code:

url = "https://totalhash.cymru.com/analysis/?1ce201cf28c6dd738fd4e65da55242822111bd9f"
htmlContent = requests.get(url, verify=False)
data = htmlContent.text
print("data",data)
jsonD = json.dumps(htmlContent.text)
jsonL = json.loads(jsonD)

ContentUrl='{ \"url\" : \"'+str(urls)+'\" ,'+"\n"+' \"uid\" : \"'+str(uniqueID)+'\" ,\n\"page_content\" : \"'+jsonL+'\" , \n\"date\" : \"'+finalDate+'\"}'

above code gives me unicode type, however, when I put that output in jsonLint it gives me invalid json error. Can somebody help me understand how can I convert the complete html into a json objet?

like image 200
Umesh Kaushik Avatar asked Apr 18 '17 10:04

Umesh Kaushik


People also ask

How can I convert HTML code into a JSON object?

var json = JSON. stringify({html:html}) , that is JSON. What you have is simply a javascript object, no such thing as a "JSON object".

How do I save HTML form data to JSON file?

stringify($("#emails_form"). serializeArray()); If you want to store formData in a JSON file, you need to post it to the server (e.g. per AJAX) and save it. But in that case, you can simply post the form und convert it to JSON on the server itself.

Can HTML be stored in JSON?

Answered by Maiku Mori: The limit for an HTML attribute is potentially 65536 characters. Answered by Quentin: Encode the JSON string before putting it into the attribute, as per the usual conventions. For PHP, use the htmlentities() function.

How do I create a JSON file?

Open a Text editor like Notepad, Visual Studio Code, Sublime, or your favorite one. Copy and Paste below JSON data in Text Editor or create your own base on the What is JSON article. Once file data are validated, save the file with the extension of . json, and now you know how to create the Valid JSON document.


1 Answers

jsonD = json.dumps(htmlContent.text) converts the raw HTML content into a JSON string representation. jsonL = json.loads(jsonD) parses the JSON string back into a regular string/unicode object. This results in a no-op, as any escaping done by dumps() is reverted by loads(). jsonL contains the same data as htmlContent.text.

Try to use json.dumps to generate your final JSON instead of building the JSON by hand:

ContentUrl = json.dumps({
    'url': str(urls),
    'uid': str(uniqueID),
    'page_content': htmlContent.text,
    'date': finalDate
})
like image 143
cg909 Avatar answered Oct 23 '22 22:10

cg909