
Access a file from a python egg

Tags: python, egg

Hi, I am working with Python packaging. I have three non-code files: ['synonyms.csv', 'acronyms.csv', 'words.txt'].

  • These files live in the folder Wordproject/WordProject/Repository/DataBank/
  • I have a RepositoryReader class at the path Wordproject/WordProject/Repository/
  • I've written code that takes the current location of RepositoryReader, looks for a subdirectory called DataBank, and looks for the three files there.

The problem is that when I build an egg from the code and then run it, my code gives me this error:

Could not find the file at X:\1. Projects\Python\Wordproject\venv\lib\site-packages\Wordproject-1.0-py3.6.egg\Wordproject\Repository\DataBank\synonyms.csv

The code can't find or read the file when the path points inside an egg. Is there any way around this? These files have to be in the egg.

iam.Carrot asked Apr 11 '18 18:04



2 Answers

Egg files are just renamed .zip files.

You can use the zipfile library to open the egg and extract or read the file you need.

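import zipfile

# Open the egg as an ordinary zip archive
with zipfile.ZipFile('/path/to/file.egg', 'r') as egg:
    # Member names are full paths inside the archive, with forward slashes;
    # this one is taken from the error message in the question
    with egg.open('Wordproject/Repository/DataBank/synonyms.csv') as f:
        txt = f.read()  # bytes; call .decode() to get text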
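
Since the files in the question are CSVs, a minimal sketch of reading one straight out of the archive as text might look like the following. The helper name `read_csv_from_archive` and its parameters are hypothetical, and UTF-8 is an assumed encoding; nothing here comes from the original project.

```python
import csv
import io
import zipfile


def read_csv_from_archive(archive_path, member_name, encoding="utf-8"):
    """Return the rows of a CSV file stored inside a zip/egg archive.

    archive_path and member_name are illustrative parameters; member_name
    is the full path inside the archive, written with forward slashes.
    """
    with zipfile.ZipFile(archive_path) as archive:
        # ZipFile.open yields a binary stream, so wrap it to get text
        with archive.open(member_name) as raw:
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            return list(csv.reader(text))
```

This only works when the egg is actually a zip file on disk, and it requires knowing the egg's location, which is why the approach is more brittle than asking the packaging tools for the resource.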
Brendan Abel answered Sep 29 '22 20:09


There are two different things you could be trying to do here:

  • Treat the data files as part of your package, like the Python modules, and access them at runtime as if your package were a normal directory tree even if it isn't.
  • Get the data files installed somewhere else at pip install time, to a location you can access normally.

Both are explained in the section on data files in the PyPA/setuptools docs. I think you want the first one here, which is covered in the subsection on Accessing Data Files at Runtime:

Typically, existing programs manipulate a package’s __file__ attribute in order to find the location of data files. However, this manipulation isn’t compatible with PEP 302-based import hooks, including importing from zip files and Python Eggs. It is strongly recommended that, if you are using data files, you should use the ResourceManager API of pkg_resources to access them. The pkg_resources module is distributed as part of setuptools, so if you’re using setuptools to distribute your package, there is no reason not to use its resource management API. See also Accessing Package Resources for a quick example of converting code that uses __file__ to use pkg_resources instead.

Follow that link, and you find what look like some crufty old PEAK docs, but that's only because they really are crufty old PEAK docs. There is a version buried inside the setuptools docs that you may find easier to read and navigate once you manage to find it.

As it says, you could try using get_data (which will work inside an egg/zip) and then fall back to accessing a file (which will work when running from source), but you're better off using the wrappers in pkg_resources. Basically, if your code was doing this:

import os

# DataBank sits next to this module, so start from the module's directory
path = os.path.join(os.path.dirname(__file__), 'DataBank', datathingy)
with open(path) as f:
    for line in f:
        do_stuff(line)

… you'll change it to this:

import pkg_resources

# The resource name is relative to the module named by __name__ and always uses '/'
path = 'DataBank/' + datathingy
f = pkg_resources.resource_stream(__name__, path)
for line in f:
    do_stuff(line.decode())

Notice that resource_stream files are always opened in binary mode. So if you want to read them as text, you need to wrap a TextIOWrapper around them, or decode each line.
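
The TextIOWrapper option can be sketched as a small helper. The name `iter_text_lines` is hypothetical and UTF-8 is an assumed encoding; the helper accepts any binary file-like object, so it should work on the stream returned by resource_stream as well.

```python
import io


def iter_text_lines(binary_stream, encoding="utf-8"):
    """Yield decoded lines, without trailing newlines, from a binary stream.

    binary_stream is any readable binary file object, for example the
    object returned by pkg_resources.resource_stream (assumed usage).
    """
    # TextIOWrapper decodes on the fly and normalizes line endings;
    # leaving the with-block also closes the underlying stream
    with io.TextIOWrapper(binary_stream, encoding=encoding) as text:
        for line in text:
            yield line.rstrip("\n")
```

With that in place, the loop above would become something like `for line in iter_text_lines(pkg_resources.resource_stream(__name__, path)): do_stuff(line)`, with no per-line decode.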

abarnert answered Sep 29 '22 20:09