Writing to JSON - Converting \u00a3 to £

I am using Selenium and Python to scrape a website. I am scraping some '£' characters, but when writing to JSON I get \u00a3 instead (they appear as '£' when I print them to the terminal).

I understand they are Unicode and I need them in UTF8 (?). I've tried a few things I've found on SO and haven't had much success.

I have tried .replace('\u00a3', '£'), but without much success.

How do I get the characters to look like '£' instead of \u00a3?

This is the line that's printing incorrectly. Let me know if you want to see my entire code.

price = page.find_element_by_class_name('header_tags').text
asked Oct 21 '18 12:10 by James5949



3 Answers

If you're using json.dump() or json.dumps(), try setting ensure_ascii=False so that non-ASCII characters such as '£' are written as-is instead of being escaped as \u00a3.
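A minimal sketch of the difference, assuming Python 3 and a hypothetical scraped price of '£25.00' (the dictionary key and the file name are illustrative only):

```python
import json
import os
import tempfile

# Hypothetical scraped value; in the question it comes from Selenium's .text
data = {"price": "£25.00"}

# Default (ensure_ascii=True): non-ASCII characters are escaped as \uXXXX
escaped = json.dumps(data)
print(escaped)   # {"price": "\u00a325.00"}

# ensure_ascii=False: the characters are emitted unchanged
readable = json.dumps(data, ensure_ascii=False)
print(readable)  # {"price": "£25.00"}

# Both strings decode back to the same Python data
assert json.loads(escaped) == json.loads(readable) == data

# When writing a file, open it with an explicit UTF-8 encoding so the raw '£'
# can be written regardless of the platform's default encoding
path = os.path.join(tempfile.gettempdir(), "prices.json")  # hypothetical path
with open(path, "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False)
```

The \u00a3 form is still valid JSON and decodes to '£'; ensure_ascii=False only changes how the file looks to a human reader.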

answered Sep 28 '22 08:09 by Vikrant Sharma


You can encode the string like below:

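s = 'This is a Pound sign \u00a3'  # in a Python 3 string literal, \u00a3 is the character '£'
encoded = s.encode('utf8')         # returns UTF-8 bytes; s itself is unchanged
print(s)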

Output

This is a Pound sign £
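A sketch, assuming Python 3, of what encode() produces and where the \u00a3 seen in JSON output comes from (the variable names are illustrative):

```python
import json

s = 'This is a Pound sign \u00a3'

# \u00a3 in a str literal is an escape for the single character '£'
assert s.endswith('£')

# encode() returns a new bytes object; it does not modify s in place
encoded = s.encode('utf8')
print(encoded)                       # b'This is a Pound sign \xc2\xa3'
print(encoded.decode('utf8') == s)   # True

# json.dumps() escapes the same character as \u00a3 by default
as_json = json.dumps(s)
print(as_json)                       # "This is a Pound sign \u00a3"
print(json.loads(as_json) == s)      # True
```

So the text itself is already correct; the \u00a3 appears only at the JSON serialization step.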

answered Sep 28 '22 06:09 by ansu5555


You need to encode the text as UTF-8 while printing, as follows:

# .text is a str property, so call encode() on it rather than calling .text itself
print(page.find_element_by_class_name('header_tags').text.encode("utf-8"))

The same issue can also occur with non-ASCII characters elsewhere in the source, so as a best practice start the Python file with an encoding declaration:

# -*- coding: UTF-8 -*-

An example:

# -*- coding: UTF-8 -*-
from selenium import webdriver
# other lines of code
price = page.find_element_by_class_name('header_tags').text
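The encoding declaration tells Python which codec to use when decoding the source file's bytes, which matters for non-ASCII literals such as '£' typed directly into the code. A sketch, assuming Python 3, that compiles in-memory source bytes to show the effect (the helper run() and the variable price are illustrative only):

```python
# Source bytes as they would be saved on disk in UTF-8: '£' becomes b'\xc2\xa3'
body = "price = '£'\n".encode("utf-8")


def run(source: bytes) -> str:
    """Compile and execute source bytes, returning the value bound to price."""
    namespace = {}
    # For bytes input, compile() honours the coding declaration on line 1
    exec(compile(source, "<scraper>", "exec"), namespace)
    return namespace["price"]


utf8_decl = b"# -*- coding: UTF-8 -*-\n" + body
latin1_decl = b"# -*- coding: latin-1 -*-\n" + body

print(run(utf8_decl))    # £
print(run(latin1_decl))  # Â£  (the same bytes decoded with the wrong codec)
```

In Python 3 source files already default to UTF-8, so the declaration mainly documents the file's encoding; it does not change how json.dumps() serializes strings.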
answered Sep 28 '22 08:09 by undetected Selenium