My code reads an HTML file into a string and then converts everything to lower case except the normal text and the comments. The problem is that it also turns å, ä and ö into something VS Code can't recognise. From what I can find it's an encoding problem, but I can't find anything about it for Python 3, and the Python 2 solutions I found didn't work. Any help is appreciated, and if you know how to improve the code, please tell me.
import re
import os
text_list = []
for root, dirs, files in os.walk("."):
    for filename in files:
        if (
            filename.endswith(".html")
        ):
            text_list.append(os.path.join(root, filename))
for file in text_list:
    file_content = open(f"{file}", "r+").read()
    if file.endswith(".html"):
        os.rename(file, file.replace(" ", "_").lower())
        code_strings = re.findall(r"<.+?>", file_content)
        for i, str in enumerate(code_strings):
            new_code_string = code_strings[i].lower()
            file_content = file_content.replace(code_strings[i], new_code_string)
    else:
        os.rename(file, file.replace(" ", "_").lower())
        file_content = file_content.lower()
    open(f"{file}", "r+").write(file_content)
Open your file with codecs and give it an explicit Unicode encoding such as UTF-8, instead of relying on the platform's default encoding.
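A small sketch of what probably goes wrong, assuming the HTML files are saved as UTF-8 and Python's default encoding on your machine is cp1252 (typical on Windows, where open() without encoding= falls back to the locale's encoding). Each of å, ä and ö is two bytes in UTF-8, so it gets decoded as two wrong characters, and lower() then changes one of them. The bytes written back are no longer valid UTF-8:

```python
# Assumption: the file on disk is UTF-8, but Python opened it as cp1252.
original = "åäö"
raw = original.encode("utf-8")      # bytes on disk: b'\xc3\xa5\xc3\xa4\xc3\xb6'
misread = raw.decode("cp1252")      # what open() without encoding= may return
print(misread)                      # Ã¥Ã¤Ã¶

lowered = misread.lower()           # lower() turns 'Ã' into 'ã'
written = lowered.encode("cp1252")  # bytes the script writes back
# These bytes are not valid UTF-8, so an editor that reads the file as UTF-8
# shows replacement characters instead of åäö.
print(written.decode("utf-8", errors="replace"))
```

With encoding='utf-8' on both the read and the write, misread would already be 'åäö', and lower() would only turn Å, Ä, Ö into å, ä, ö inside the tags it touches.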
Example:
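import codecs
# mode='r' to read; 'w+' would empty the file before it could be read
with codecs.open('your_filename_here', encoding='utf-8', mode='r') as f:
    file_content = f.read()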
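Applied to the script in the question, one possible rewrite is sketched below. It swaps codecs.open for the built-in open(), which in Python 3 accepts encoding= directly, and it assumes every .html file is UTF-8. It also changes two things the original does differently: the file is written before it is renamed (the original renames first and then tries to write to the old name), and only the file name is changed, not the folders in the path. The helper names lowercase_tags, process_file and main are my own, not from the question.

```python
import os
import re

# Same pattern as in the question: matches one <...> tag on a single line.
TAG_RE = re.compile(r"<.+?>")


def lowercase_tags(html: str) -> str:
    """Lower-case everything matched by TAG_RE; text between tags is untouched."""
    return TAG_RE.sub(lambda m: m.group(0).lower(), html)


def process_file(path: str) -> str:
    """Rewrite one HTML file as UTF-8 with lower-cased tags, then rename it."""
    # An explicit encoding on both read and write keeps å, ä, ö intact.
    with open(path, "r", encoding="utf-8") as f:
        content = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(lowercase_tags(content))

    # Rename after writing, and change only the file name, not the directories.
    directory, name = os.path.split(path)
    new_path = os.path.join(directory, name.replace(" ", "_").lower())
    if new_path != path:
        os.rename(path, new_path)
    return new_path


def main(root: str = ".") -> None:
    """Process every .html file below root."""
    html_files = [
        os.path.join(dirpath, name)
        for dirpath, _dirs, names in os.walk(root)
        for name in names
        if name.endswith(".html")
    ]
    for path in html_files:
        process_file(path)
```

Call main() from the folder that holds the HTML files. re.sub with a function rewrites each tag in place, so the findall/replace loop (and the variable named str, which shadows the built-in) is no longer needed. The else branch from the question is dropped because only .html files are collected, so it could never run.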
Docs: Python Unicode Docs