
How to emulate file opened in text mode in Python

I am looking into ways of testing some code that acts on files, but I would like to write some tests that only rely on specific strings within the source file rather than having a specific file somewhere in the file system.

I know that it is possible to provide a file-like stream interface to strings via io.StringIO.

The problem is that the operations do not follow the same semantics. For example, a combination of file.seek() and file.read() produces different results depending on whether the file object comes from open() or from io.StringIO, for strings containing non-ASCII characters:

import io

#      'abgdezhjiklmnxoprstufqyw'
text = 'αβγδεζηθικλμνξoπρστυφχψω'


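# Pass the encoding explicitly so the result does not depend on the platform default.
with open('test.txt', 'w', encoding='utf-8') as file_obj:
    file_obj.write(text)


with open('test.txt', 'r', encoding='utf-8') as file_obj:
    file_obj.seek(8)
    print(file_obj.read(8))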
# εζηθικλμ


with io.StringIO(text) as file_obj:
    file_obj.seek(8)
    print(file_obj.read(8))
# ικλμνξoπ

The issue does not appear for ASCII-only strings:

import io

text = 'abgdezhjiklmnxoprstufqyw'


with open('test.txt', 'w') as file_obj:
    file_obj.write(text)


with open('test.txt', 'r') as file_obj:
    file_obj.seek(8)
    print(file_obj.read(8))
# iklmnxop


with io.StringIO(text) as file_obj:
    file_obj.seek(8)
    print(file_obj.read(8))
# iklmnxop

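Obviously, this is due to .seek() interpreting its offset parameter as a byte position for files opened with open(), while io.StringIO interprets it as a character index. In UTF-8 each of these Greek letters occupies two bytes, so byte offset 8 lands on the fifth character (ε) rather than the ninth (ι).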
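To make the offset difference concrete, here is a minimal sketch that reproduces both results in memory, without a file. It assumes UTF-8 encoding, and the helper names read_by_char_offset and read_by_byte_offset are hypothetical, introduced only for illustration:

```python
def read_by_char_offset(text, offset, size):
    """Mimic io.StringIO: the offset counts characters."""
    return text[offset:offset + size]


def read_by_byte_offset(text, offset, size, encoding='utf-8'):
    """Mimic a UTF-8 file on disk: the offset counts encoded bytes.

    Assumption: the offset falls on a character boundary; otherwise
    decoding raises UnicodeDecodeError.
    """
    data = text.encode(encoding)
    return data[offset:].decode(encoding)[:size]


text = 'αβγδεζηθικλμνξoπρστυφχψω'

by_chars = read_by_char_offset(text, 8, 8)
by_bytes = read_by_byte_offset(text, 8, 8)

print(by_chars)  # same as text[8:16]
print(by_bytes)  # same as text[4:12], since each Greek letter is 2 bytes
```

For ASCII-only text every character is one byte, so the two helpers return the same substring, which matches the behaviour described above.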

I understand that, for performance reasons, it is not practical for seek() to follow str semantics, even when the file is opened in text mode.

Hence, my question is: how do I get an equivalent of io.StringIO() whose seek() method follows the bytes semantics? Do I have to subclass io.StringIO myself, or is there a better approach?

norok2 asked Aug 15 '19 13:08


1 Answer

You can use BytesIO and TextIOWrapper to emulate the behavior of a real file:

import io

text = 'αβγδεζηθικλμνξoπρστυφχψω'

with io.BytesIO(text.encode('utf8')) as binary_file:
    with io.TextIOWrapper(binary_file, encoding='utf8') as file_obj:
        file_obj.seek(8)
        print(file_obj.read(8))
        # εζηθικλμ
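
For use in tests, the same idea can be wrapped in a small factory. This is only a sketch: the function name text_file_from_string is hypothetical, and UTF-8 is an assumed default that should match whatever encoding the code under test passes to open():

```python
import io


def text_file_from_string(text, encoding='utf-8'):
    """Return a text-mode file-like object backed by in-memory bytes.

    Hypothetical helper: seek() offsets on the result refer to positions
    in the encoded byte stream, as with a file opened in text mode.
    """
    binary_file = io.BytesIO(text.encode(encoding))
    # Closing the wrapper also closes the underlying BytesIO.
    return io.TextIOWrapper(binary_file, encoding=encoding)


text = 'αβγδεζηθικλμνξoπρστυφχψω'

with text_file_from_string(text) as file_obj:
    file_obj.seek(8)
    chunk = file_obj.read(8)

print(chunk)
# εζηθικλμ
```

Note that seeking to a byte offset in the middle of a multi-byte character would make the subsequent read fail or return garbage, just as it can with a real file.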
Aran-Fey answered Sep 21 '22 03:09