I am using Selenium and Python to scrape a website. Some of the scraped text contains '£' characters, but when I write it to JSON they come out as \u00a3 instead (they appear as '£' when I print them to the terminal).
I understand they are Unicode and that I need them in UTF-8 (?). I've tried a few things I found on SO, including .replace('\u00a3', '£'), without success.
How do I get the characters to look like '£' instead of \u00a3?
This is the line that's printing incorrectly. Let me know if you want to see my entire code.
price = page.find_element_by_class_name('header_tags').text
You can parse JSON in Python using the json.loads() function. json.loads() accepts a valid JSON string as input and converts it to the corresponding Python object (a dictionary, for a JSON object).
If you have a Python object, you can convert it into a JSON string by using the json.dumps() function.
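As a rough illustration of the two functions above, the sketch below round-trips a value through json.dumps() and json.loads(). The dictionary and the price value are made up for the example; they are not from the question's scraper. It suggests that \u00a3 is only the escaped spelling of '£' inside the JSON text, and that nothing is lost when the JSON is parsed again.

```python
import json

# Hypothetical scraped value; the real one would come from Selenium's .text
price = "£25.00"

# By default json.dumps() escapes non-ASCII characters as \uXXXX sequences
encoded = json.dumps({"price": price})
print(encoded)  # {"price": "\u00a325.00"}

# json.loads() turns the escape back into the original character
decoded = json.loads(encoded)
print(decoded["price"] == price)  # True
```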
JSON is a syntax for storing and exchanging data. JSON is text, written with JavaScript object notation.
Method 2: Writing JSON to a file in Python using json.dump(). It takes 2 parameters:
- dictionary: the dictionary (or other Python object) that should be converted to JSON.
- file pointer: a file object opened in write or append mode.
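A minimal sketch of that call, assuming a made-up record and a temporary file path (neither comes from the question). With the default settings, the file on disk contains the escaped form \u00a3 rather than the '£' character, which matches what the question describes.

```python
import json
import os
import tempfile

# Hypothetical scraped record, used only for illustration
data = {"item": "header_tags", "price": "£25.00"}

# Hypothetical output location in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "prices.json")

# json.dump(obj, fp): the object first, then a file opened for writing
with open(path, "w", encoding="utf-8") as fp:
    json.dump(data, fp)

# Read the raw text back to see what was actually written
with open(path, encoding="utf-8") as fp:
    raw = fp.read()
print(raw)  # {"item": "header_tags", "price": "\u00a325.00"}
```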
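If you're using json.dump() or json.dumps(), try setting ensure_ascii=False. By default both functions escape every non-ASCII character as a \uXXXX sequence, which is where \u00a3 comes from.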
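Here is a small sketch of ensure_ascii=False with both functions. The data and the temporary file path are placeholders invented for the example. Opening the file with encoding="utf-8" is an extra precaution assumed here, so that the '£' character is written as UTF-8 regardless of the platform's default encoding.

```python
import json
import os
import tempfile

# Hypothetical scraped record, used only for illustration
data = {"price": "£25.00"}

# With ensure_ascii=False the pound sign is kept as a literal character
text = json.dumps(data, ensure_ascii=False)
print(text == '{"price": "£25.00"}')  # True

# Hypothetical output location in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "prices.json")

# Same flag for json.dump(); the explicit encoding controls the bytes on disk
with open(path, "w", encoding="utf-8") as fp:
    json.dump(data, fp, ensure_ascii=False)

# The file now holds the UTF-8 bytes for '£' (0xC2 0xA3), not a \u00a3 escape
with open(path, "rb") as fp:
    raw_bytes = fp.read()
print(raw_bytes)  # b'{"price": "\xc2\xa325.00"}'
```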
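You can encode the string as shown below. Note that encode() returns a new bytes object rather than modifying s, so the result has to be assigned; s itself already contains the pound sign, because '\u00a3' in a string literal is just another way of writing '£'.
s = 'This is a Pound sign \u00a3'
b = s.encode('utf8')  # b'This is a Pound sign \xc2\xa3'
print(s)
Output
This is a Pound sign £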
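text is a property that returns a string, so it cannot be called as text("utf-8"). You need to call encode("utf-8") on it while printing, as follows:
print(page.find_element_by_class_name('header_tags').text.encode("utf-8"))
On Python 3 this prints a bytes literal; printing .text directly already shows '£'.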
But this issue can occur on other lines as well. So, as a best practice, start the Python file with the following line, which declares the encoding of the source file itself:
# -*- coding: UTF-8 -*-
An example:
from selenium import webdriver
# other lines of code
price = page.find_element_by_class_name('header_tags').text