
Renaming Cyrillic file names

I want to iterate through a folder and check whether any file names contain Cyrillic characters. If they do, I want to rename those files to something else.

How could I do this?

Mario Geuenich asked Mar 17 '23 12:03


2 Answers

Python 3
This function checks each character of the given string and returns True as soon as one falls in the Cyrillic block (U+0400-U+04FF). Strings in Python 3 are Unicode by default. The function encodes each character to UTF-8 and checks whether the result is two bytes whose values fall in the range that encodes the Cyrillic block.

def isCyrillic(filename):
    for char in filename:
        char_utf8 = char.encode('utf-8')  # encode the character to UTF-8

        # Two bytes, with the first in 0xd0-0xd3 and the second in 0x80-0xbf,
        # encode a code point in the Cyrillic block (U+0400-U+04FF).
        if (len(char_utf8) == 2
                and 0xd0 <= char_utf8[0] <= 0xd3
                and 0x80 <= char_utf8[1] <= 0xbf):
            return True

    return False
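
To see why those byte ranges correspond to the Cyrillic block, it can help to print the UTF-8 encoding of a few code points. This is only an illustrative sketch; the helper name utf8_hex is hypothetical and not part of the answer.

```python
def utf8_hex(code_point):
    """Return the UTF-8 encoding of a code point as a hex string."""
    return chr(code_point).encode('utf-8').hex()

# First code point, a sample letter, and last code point of U+0400-U+04FF
for code_point in (0x0400, 0x043A, 0x04FF):
    print('U+{:04X} -> {}'.format(code_point, utf8_hex(code_point)))
# prints:
# U+0400 -> d080
# U+043A -> d0ba
# U+04FF -> d3bf
```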

The same function using ord(), as suggested in a comment:

def isCyrillicOrd(filename):
    for char in filename:                  
        if 0x0400 <= ord(char) <= 0x04FF:    # directly checking unicode code point
            return True

    return False
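
Both functions only cover the basic Cyrillic block (U+0400-U+04FF). If letters from other Cyrillic blocks, such as Cyrillic Supplement (U+0500-U+052F), should count as well, one possible variant is to look at each character's Unicode name through the standard unicodedata module. This is a sketch under that assumption, and the function name hasCyrillicName is hypothetical.

```python
import unicodedata

def hasCyrillicName(filename):
    """Return True if any character's Unicode name mentions CYRILLIC."""
    for char in filename:
        # unicodedata.name() raises ValueError for unnamed code points
        # unless a default is given, so fall back to an empty string.
        if 'CYRILLIC' in unicodedata.name(char, ''):
            return True

    return False
```

The trade-off is that this depends on the Unicode database shipped with the Python version in use, whereas the range check is fixed.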

Test Directory

cycont
   |---- asciifile.txt
   |---- кириллфайл.txt
   |---- украї́нська.txt
   |---- संस्कृत.txt

Test

import os
for (dirpath, dirnames, filenames) in os.walk('G:/cycont'):
    for filename in filenames:
        print(filename, isCyrillic(filename), isCyrillicOrd(filename))

Output

asciifile.txt False False
кириллфайл.txt True True
украї́нська.txt True True
संस्कृत.txt False False
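
The question also asks about renaming. A minimal sketch of that step, assuming the ord()-based range check and a made-up naming scheme (renamed_001, renamed_002, ...), might look like this. The names rename_cyrillic_files and dry_run are hypothetical; with dry_run=True nothing is changed on disk and only the planned renames are returned.

```python
import os

def is_cyrillic(name):
    """True if any character lies in the Cyrillic block U+0400-U+04FF."""
    return any(0x0400 <= ord(char) <= 0x04FF for char in name)

def rename_cyrillic_files(root, dry_run=True):
    """Rename files under root whose names contain Cyrillic characters.

    Returns a list of (old_path, new_path) pairs. The file extension is
    kept as-is, so a Cyrillic extension would survive the rename.
    """
    renamed = []
    counter = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not is_cyrillic(filename):
                continue
            extension = os.path.splitext(filename)[1]
            # Advance the counter until the target name is unused
            while True:
                counter += 1
                new_name = 'renamed_{:03d}{}'.format(counter, extension)
                target = os.path.join(dirpath, new_name)
                if not os.path.exists(target):
                    break
            source = os.path.join(dirpath, filename)
            if not dry_run:
                os.rename(source, target)
            renamed.append((source, target))
    return renamed
```

Calling rename_cyrillic_files('G:/cycont') first and inspecting the returned pairs before passing dry_run=False is probably the safer order.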
embert answered Mar 27 '23 14:03


Python 2:

# -*- coding: utf-8 -*-
def check_value(value):
    # Returns True if value (a byte string) is pure ASCII, False otherwise.
    try:
        value.decode('ascii')
    except UnicodeDecodeError:
        return False
    else:
        return True

Python 3:

In Python 3, a 'str' object has no 'decode' attribute, so you can use encode instead:

# -*- coding: utf-8 -*-
def check_value(value):
    # Returns True if value is pure ASCII, False otherwise.
    try:
        value.encode('ascii')
    except UnicodeEncodeError:
        return False
    else:
        return True

Then you can gather your file names and pass them through the check_value function. Note that it returns False for any name containing non-ASCII characters, not only Cyrillic ones.
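
Gathering the names could be sketched as follows, assuming a flat folder read with os.listdir and the Python 3 version of check_value. The helper name non_ascii_names is hypothetical.

```python
import os

def check_value(value):
    """Return True if value is pure ASCII (Python 3 version)."""
    try:
        value.encode('ascii')
    except UnicodeEncodeError:
        return False
    return True

def non_ascii_names(folder):
    """Return the entries of folder whose names are not pure ASCII."""
    return [name for name in os.listdir(folder) if not check_value(name)]
```

The returned names are candidates for renaming, for example with os.rename.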

CodeLikeBeaker answered Mar 27 '23 12:03