How do I get Python to write Swedish letters (åäö) into an HTML file? [duplicate]

My code reads an HTML file into a string and then lowercases everything except normal text and comments. The problem is that it also changes åäö into something VS Code can't recognise. From what I can find, it's an encoding problem, but I can't find anything about it for Python 3, and the solutions I found for Python 2 didn't work. Any help is appreciated, and if you know how to improve the code, please tell me.

import re
import os


text_list = []

for root, dirs, files in os.walk("."):
    for filename in files:

        if (
            filename.endswith(".html")
        ):
            text_list.append(os.path.join(root, filename))

for file in text_list:

    file_content = open(f"{file}", "r+").read()

    if file.endswith(".html"):
        os.rename(file, file.replace(" ", "_").lower())
        code_strings = re.findall(r"<.+?>", file_content)
        for i, str in enumerate(code_strings):
            new_code_string = code_strings[i].lower()
            file_content = file_content.replace(code_strings[i], new_code_string)

    else:
        os.rename(file, file.replace(" ", "_").lower())
        file_content = file_content.lower()

    open(f"{file}", "r+").write(file_content)
Fredrik Berzins asked Nov 07 '22 06:11


1 Answer

Open your file with codecs and specify a Unicode encoding such as UTF-8, both when reading and when writing. Example:

import codecs
# Write the text as UTF-8 so that åäö are encoded correctly.
with codecs.open('your_filename_here', encoding='utf-8', mode='w+') as f:
    f.write(file_content)

Docs: Python Unicode Docs
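
In Python 3 the built-in file APIs also take an `encoding` argument, so the same idea works without `codecs`. Below is a minimal sketch, not a definitive fix. It assumes the HTML files are stored as UTF-8, and the helper name `lowercase_tags` is made up for illustration. It reuses the `<.+?>` pattern from the question, so it lowercases everything between angle brackets (comments and attribute values included) and leaves the text between tags, including åäö, untouched.

```python
import re
from pathlib import Path

# Same pattern as in the question: anything between "<" and the next ">".
TAG_PATTERN = re.compile(r"<.+?>")


def lowercase_tags(path, encoding="utf-8"):
    """Lowercase the contents of every <...> match and rewrite the file in place.

    Hypothetical helper for illustration. Assumes the file is UTF-8 encoded.
    """
    path = Path(path)
    # Pass the encoding explicitly instead of relying on the platform default.
    text = path.read_text(encoding=encoding)
    # Replace each match with its lowercase form; other text is not modified.
    text = TAG_PATTERN.sub(lambda match: match.group(0).lower(), text)
    # Write back with the same encoding so non-ASCII letters survive.
    path.write_text(text, encoding=encoding)
    return text
```

Calling `lowercase_tags("index.html")` on a UTF-8 file should leave the Swedish letters intact. If the files were saved in another encoding (for example Latin-1), pass that name as `encoding` instead.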

CFV answered Nov 14 '22 02:11