
Registered Trademark: Why does strip remove ® but replace can't find it? How do I remove symbols from folder and file names?

If the registered trademark symbol does not appear at the end of a file or folder name, strip() cannot remove it, because strip() only trims characters from the ends of a string. Why doesn't replace() work?

I have some old files and folders named with a registered trademark symbol that I want to remove.

The files don't have an extension.

  • folder: "\data\originals\Word Finder®"
  • file 1: "\data\originals\Word Finder® DA"
  • file 2: "\data\originals\Word Finder® Thesaurus"

For the folder, os.rename(p, p.strip('®')) works. However, using replace(), as in os.rename(p, p.replace('®', '')), does not work on either the folder or the files.

replace() does work on strings I type in myself, e.g. print 'Registered® Trademark®'.replace('®',''). Is there a reason the paths don't follow the same logic?

Note:

  • I'm using os.walk() to get the folder and file names
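A possible explanation, offered as an assumption rather than a confirmed diagnosis: under Python 2 on Windows, os.walk() on a byte-string path returns names encoded in the ANSI code page (assumed here to be cp1252, where ® is the single byte 0xAE), while a '®' literal in a source file declared `# -*- coding: utf-8 -*-` is the two bytes 0xC2 0xAE. strip() treats its argument as a set of characters to trim from both ends, so it removes the trailing 0xAE byte; replace() looks for the whole two-byte sequence and never finds it. The sketch below simulates this with Python 3 bytes objects standing in for Python 2 strings.

```python
# Python 3 bytes objects stand in for Python 2 byte strings.
# Assumption: names on disk come back as cp1252, source literals are UTF-8.
folder_from_disk = "Word Finder\u00ae".encode("cp1252")    # b'Word Finder\xae'
name_from_disk = "Word Finder\u00ae DA".encode("cp1252")   # b'Word Finder\xae DA'
symbol_in_source = "\u00ae".encode("utf-8")                # b'\xc2\xae', what '®' becomes in a UTF-8 source

# strip() treats its argument as the set of bytes {0xC2, 0xAE} and trims
# them from both ends, so the trailing 0xAE on the folder name disappears.
stripped_folder = folder_from_disk.strip(symbol_in_source)

# replace() searches for the two-byte sequence, which never occurs here.
replaced_file = name_from_disk.replace(symbol_in_source, b"")

# Decoding with the matching code page first lets replace() find U+00AE.
fixed_file = name_from_disk.decode("cp1252").replace("\u00ae", "")

print(stripped_folder)   # b'Word Finder'
print(replaced_file)     # b'Word Finder\xae DA'
print(fixed_file)        # Word Finder DA
```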
skas asked Jul 30 '14 17:07


2 Answers

I have been unable to reproduce your issue, so I'm not sure why it isn't working for you. Here is a workaround, though: instead of putting the registered character itself in your source code and passing it to the string methods, be more explicit and use its Unicode code point, like this:

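import os

# Python 2: os.walk() on a byte-string path yields byte-string names.
# The decode/encode below assumes those names are UTF-8 encoded.
for root, folders, files in os.walk(os.getcwd()):
    for fi in files:
        oldpath = os.path.join(root, fi)
        # Decode to unicode, remove U+00AE (the registered sign), re-encode
        newpath = os.path.join(root, fi.decode("utf-8").replace(u'\u00AE', '').encode("utf-8"))
        if newpath != oldpath:  # only rename files whose name actually changed
            os.rename(oldpath, newpath)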

Explicitly specifying the Unicode code point you're looking for reduces the number of places your code could be going wrong: the interpreter no longer has to worry about the encoding of your source code itself.
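For comparison, here is a Python 3 sketch of the same idea. In Python 3, os.walk() called with a str path yields str names, so the decode()/encode() round trip is unnecessary and the code point can be used directly. This version also renames folders, walking bottom-up (topdown=False) so that renaming a folder doesn't hide its contents from the walk. The helper name strip_registered_sign is hypothetical, and the usage part builds the example tree in a temporary directory rather than touching real data.

```python
import os
import tempfile

REGISTERED_SIGN = "\u00ae"  # U+00AE written as an escape, so source encoding doesn't matter


def strip_registered_sign(top):
    """Hypothetical helper: remove U+00AE from every file and folder name under top."""
    # topdown=False yields a folder's contents before the folder itself,
    # so renaming the folder never invalidates paths os.walk() still needs.
    for root, folders, files in os.walk(top, topdown=False):
        for name in files + folders:
            new_name = name.replace(REGISTERED_SIGN, "")
            if new_name != name:
                os.rename(os.path.join(root, name), os.path.join(root, new_name))


# Usage: recreate the example folder and files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "Word Finder" + REGISTERED_SIGN)
    os.mkdir(folder)
    for fname in ("Word Finder\u00ae DA", "Word Finder\u00ae Thesaurus"):
        open(os.path.join(folder, fname), "w").close()

    strip_registered_sign(tmp)
    result = sorted(os.listdir(os.path.join(tmp, "Word Finder")))
    print(result)  # ['Word Finder DA', 'Word Finder Thesaurus']
```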

skrrgwasme answered Sep 28 '22 03:09


My original question, 'Registered Trademark: Why does strip remove ® but replace can't find it?', is no longer applicable. The problem isn't strip() or replace() but how os.rename() deals with Unicode characters, so I have expanded my question.

Going by what Cameron said, os.rename() doesn't seem to work with Unicode characters (please correct me if this is wrong; I don't know much about this). shutil.move() gives the result that os.rename() should have.

I tried ScottLawson's suggestion to use u'\u00AE' instead of '®', but I could not get it to work.

In short, use shutil.move(old_name, new_name) instead of os.rename():

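#!/usr/bin/env python
# -*- coding: utf-8 -*-

import shutil
import os

# from this answer: https://stackoverflow.com/q/1033424/3889452
def remove(value):
    # In Python 2 this literal is a byte string (the UTF-8 bytes of ®),
    # so the loop removes each of those bytes separately.
    deletechars = '®'
    for c in deletechars:
        value = value.replace(c,'')
    return value

# topdown=False handles a folder's contents before the folder itself,
# so renaming a folder doesn't stop os.walk() from visiting what is inside it.
for root, folders, files in os.walk(r'C:\Users\myname\da\data\originals\Word_4_0', topdown=False):
    for f in files:
        rename = remove(f)
        if rename != f:  # skip names that contain nothing to remove
            shutil.move(os.path.join(root,f),os.path.join(root,rename))
    for folder in folders:
        rename = remove(folder)
        if rename != folder:
            shutil.move(os.path.join(root,folder),os.path.join(root,rename))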
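On Python 3, where os.walk() returns text names, the remove() helper above could instead use str.translate() with a deletion table built by str.maketrans(), which drops every listed character in a single pass. This is a sketch; including ™ and © in the table is an assumption about what else you might want removed.

```python
# Characters to delete, written as escapes so the source encoding is irrelevant.
# The exact set is an assumption: registered sign, trademark sign, copyright sign.
DELETE_TABLE = str.maketrans("", "", "\u00ae\u2122\u00a9")


def remove(value):
    """Return value with every character listed in DELETE_TABLE removed."""
    return value.translate(DELETE_TABLE)


print(remove("Word Finder\u00ae Thesaurus"))       # Word Finder Thesaurus
print(remove("Registered\u00ae Trademark\u2122"))  # Registered Trademark
```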

This version works on the immediate directory only (based on this). It removes every character that isn't in string.printable, so it catches more symbols than just ®, and ® doesn't have to appear in the Python code.

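import shutil
import os
import string

directory_path = r'C:\Users\myname\da\data\originals\Word_4_0'
for file_name in os.listdir(directory_path):
    # Keep only characters in string.printable (ASCII letters, digits, punctuation, whitespace)
    new_file_name = ''.join(c for c in file_name if c in string.printable)
    if new_file_name != file_name:  # skip names that are already clean
        shutil.move(os.path.join(directory_path,file_name),os.path.join(directory_path,new_file_name))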
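One side effect worth knowing: string.printable contains only ASCII characters, so the filter above also deletes accented letters such as é. If that matters, a hypothetical alternative is to drop only characters whose Unicode category is 'So' (Symbol, other), which includes ®, © and ™. This Python 3 sketch compares the two; whether 'So' is the right category for your file names is an assumption.

```python
import string
import unicodedata


def printable_only(name):
    """The filter used above: keep only characters in string.printable (ASCII)."""
    return "".join(c for c in name if c in string.printable)


def drop_other_symbols(name):
    """Hypothetical alternative: drop only characters in Unicode category 'So'."""
    return "".join(c for c in name if unicodedata.category(c) != "So")


name = "Caf\u00e9 Word Finder\u00ae"      # 'Café Word Finder®'
as_printable = printable_only(name)      # 'Caf Word Finder' (the é is lost as well)
as_symbols = drop_other_symbols(name)    # 'Café Word Finder'
```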

skas answered Sep 28 '22 04:09