I am fetching html source code of many pages from one website, I need to convert it into json object and combine with other elements in json doc. . I have seen many questions on same topic but non of them were helpful.
My code:
url = "https://totalhash.cymru.com/analysis/?1ce201cf28c6dd738fd4e65da55242822111bd9f"
htmlContent = requests.get(url, verify=False)
data = htmlContent.text
print("data",data)
jsonD = json.dumps(htmlContent.text)
jsonL = json.loads(jsonD)
ContentUrl='{ \"url\" : \"'+str(urls)+'\" ,'+"\n"+' \"uid\" : \"'+str(uniqueID)+'\" ,\n\"page_content\" : \"'+jsonL+'\" , \n\"date\" : \"'+finalDate+'\"}'
above code gives me unicode type, however, when I put that output in jsonLint it gives me invalid json error. Can somebody help me understand how can I convert the complete html into a json objet?
var json = JSON. stringify({html:html}) , that is JSON. What you have is simply a javascript object, no such thing as a "JSON object".
stringify($("#emails_form"). serializeArray()); If you want to store formData in a JSON file, you need to post it to the server (e.g. per AJAX) and save it. But in that case, you can simply post the form und convert it to JSON on the server itself.
Answered by Maiku Mori: The limit for an HTML attribute is potentially 65536 characters. Answered by Quentin: Encode the JSON string before putting it into the attribute, as per the usual conventions. For PHP, use the htmlentities() function.
Open a Text editor like Notepad, Visual Studio Code, Sublime, or your favorite one. Copy and Paste below JSON data in Text Editor or create your own base on the What is JSON article. Once file data are validated, save the file with the extension of . json, and now you know how to create the Valid JSON document.
jsonD = json.dumps(htmlContent.text)
converts the raw HTML content into a JSON string representation.
jsonL = json.loads(jsonD)
parses the JSON string back into a regular string/unicode object. This results in a no-op, as any escaping done by dumps()
is reverted by loads()
. jsonL
contains the same data as htmlContent.text
.
Try to use json.dumps
to generate your final JSON instead of building the JSON by hand:
ContentUrl = json.dumps({
'url': str(urls),
'uid': str(uniqueID),
'page_content': htmlContent.text,
'date': finalDate
})
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With