
How to organize multiple python files into a single module without it behaving like a package?

Is there a way to use __init__.py to organize multiple files into a module?

Reason: Modules are easier to use than packages, because they don't have as many layers of namespace.

Normally it makes a package, this I get. Problem is with a package, 'import thepackage' gives me an empty namespace. Users must then either use "from thepackage import *" (frowned upon) or know exactly what is contained and manually pull it out into a usable namespace.

What I want to have is the user do 'import thepackage' and have nice clean namespaces that look like this, exposing functions and classes relevant to the project for use.

current_module
\
  doit_tools/
  \
   - (class) _hidden_resource_pool
   - (class) JobInfo
   - (class) CachedLookup
   - (class) ThreadedWorker
   - (Fn) util_a
   - (Fn) util_b
   - (Fn) gather_stuff
   - (Fn) analyze_stuff

The maintainer's job would be to avoid defining the same name in different files, which should be easy when the project is small like mine is.

It would also be nice if people could do from doit_tools import JobInfo and have it retrieve the class, rather than a module containing the class.

This is easy if all my code is in one gigantic file, but I like to organize when things start getting big. What I have on disk looks sort of like this:

place_in_my_python_path/
  doit_tools/
    __init__.py
    JobInfo.py
      - class JobInfo:
    NetworkAccessors.py
      - class _hidden_resource_pool:
      - class CachedLookup:
      - class ThreadedWorker:
    utility_functions.py
      - def util_a()
      - def util_b()
    data_functions.py
      - def gather_stuff()
      - def analyze_stuff()

I only separate them so my files aren't huge and unnavigable. They are all related, though someone (possibly me) may want to use the classes by themselves without importing everything.

I've read a number of suggestions in various threads, here's what happens for each suggestion I can find for how to do this:

If I do not use an __init__.py, I cannot import anything because Python doesn't descend into the folder from sys.path.

If I use a blank __init__.py, when I import doit_tools it's an empty namespace with nothing in it. None of my files imported, which makes it more difficult to use.

If I list the submodules in __all__, I can use the (frowned upon?) from thing import * syntax, but all of my classes are behind unnecessary namespace barriers again. The user has to (1) know they should use from x import * instead of import x, (2) manually reshuffle classes until they can reasonably obey line width style constraints.

If I add from thatfile import X statements to __init__.py, I get closer but I have namespace conflicts (?) and extra namespaces for things I didn't want to be in there. In the below example, you'll see that:

  1. The class JobInfo overwrote the module object named JobInfo because their names were the same. Somehow Python can figure this out, because JobInfo is of type <class 'doit_tools.JobInfo.JobInfo'>. (doit_tools.JobInfo is a class, but doit_tools.JobInfo.JobInfo is that same class... this is tangled and seems very bad, but doesn't seem to break anything.)
  2. Each filename made its way into the doit_tools namespace, which makes it more confusing to look through if anyone is looking at the contents of the module. I want doit_tools.utility_functions.py to hold some code, not define a new namespace.


current_module
\
  doit_tools/
  \
   - (module) JobInfo
      \
       - (class) JobInfo
   - (class) JobInfo
   - (module) NetworkAccessors
      \
       - (class) CachedLookup
       - (class) ThreadedWorker
   - (class) CachedLookup
   - (class) ThreadedWorker
   - (module) utility_functions
      \
       - (Fn) util_a
       - (Fn) util_b
   - (Fn) util_a
   - (Fn) util_b
   - (module) data_functions
      \
       - (Fn) gather_stuff
       - (Fn) analyze_stuff
   - (Fn) gather_stuff
   - (Fn) analyze_stuff

Also someone importing just the data abstraction class would get something different than they expect when they do 'from doit_tools import JobInfo':

current_namespace
\
 JobInfo (module)
  \
   -JobInfo (class)

instead of:

current_namespace
\
 - JobInfo (class)

So, is this just a wrong way to organize Python code? If not, what is a correct way to split related code up but still collect it in a module-like way?

Maybe the best case scenario is that doing 'from doit_tools import JobInfo' is a little confusing for someone using the package?

Maybe a python file called 'api' so that people using the code do the following?:

import doit_tools.api
from doit_tools.api import JobInfo

============================================

Examples in response to comments:

Take the following package contents, inside folder 'foo' which is in python path.

foo/__init__.py

__all__ = ['doit','dataholder','getSomeStuff','hold_more_data','SpecialCase']
from another_class import doit
from another_class import dataholder
from descriptive_name import getSomeStuff
from descriptive_name import hold_more_data
from specialcase import SpecialCase

foo/specialcase.py

class SpecialCase:
    pass

foo/descriptive_name.py

def getSomeStuff():
    pass

class hold_more_data(object):
    pass

foo/another_class.py

def doit():
    print "I'm a function."

class dataholder(object):
    pass

Do this:

>>> import foo
>>> for thing in dir(foo): print thing
...
SpecialCase
__builtins__
__doc__
__file__
__name__
__package__
__path__
another_class
dataholder
descriptive_name
doit
getSomeStuff
hold_more_data
specialcase

another_class and descriptive_name are there cluttering things up, and also have extra copies of e.g. doit() underneath their namespaces.

If I have a class named Data inside a file named Data.py, then when I do 'from Data import Data' I get a namespace conflict: the class Data ends up in the current namespace, and the module Data that contains it is somehow in the current namespace too. (But Python seems to be able to handle this.)

Brian asked Sep 22 '12 02:09



2 Answers

You can sort of do it, but it's not really a good idea and you're fighting against the way Python modules/packages are supposed to work. By importing appropriate names in __init__.py you can make them accessible in the package namespace. By deleting module names you can make them inaccessible. (For why you need to delete them, see this question). So you can get close to what you want with something like this (in __init__.py):

from another_class import doit
from another_class import dataholder
from descriptive_name import getSomeStuff
from descriptive_name import hold_more_data
del another_class, descriptive_name
__all__ = ['doit', 'dataholder', 'getSomeStuff', 'hold_more_data']

However, this will break subsequent attempts to import package.another_class. In general, you can't import anything from a package.module without making package.module accessible as an importable reference to that module (although with __all__ you can keep the module out of from package import *).
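A minimal sketch of that trade-off, assuming Python 3 (hence the explicit relative import, unlike the Python 2 style snippet above). The package name demo_hidden is made up for illustration and is built in a temporary directory:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package; "demo_hidden" is a made-up name for illustration.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_hidden"
pkg_dir.mkdir()
(pkg_dir / "another_class.py").write_text(
    "def doit():\n"
    "    return 'I am a function.'\n"
)
(pkg_dir / "__init__.py").write_text(
    "from .another_class import doit\n"
    "del another_class  # drop the submodule name from the package namespace\n"
    "__all__ = ['doit']\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_hidden = importlib.import_module("demo_hidden")

# Only the re-exported function is left among the public names.
public_names = [name for name in dir(demo_hidden) if not name.startswith("_")]
print(public_names)  # -> ['doit']

# The submodule was still loaded; only the attribute on the package is gone.
still_loaded = "demo_hidden.another_class" in sys.modules
attribute_present = hasattr(demo_hidden, "another_class")
print(still_loaded, attribute_present)  # -> True False
```

Any code that later reaches for demo_hidden.another_class as an attribute gets an AttributeError, which is the kind of breakage described above.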

More generally, by splitting up your code by class/function you are working against the Python package/module system. A Python module should generally contain stuff you want to import as a unit. It's not uncommon to import submodule components directly in the top-level package namespace for convenience, but the reverse --- trying to hide the submodules and allow access to their contents only through the top-level package namespace --- is going to lead to problems. In addition, there is nothing to be gained by trying to "cleanse" the package namespace of the modules. Those modules are supposed to be in the package namespace; that's where they belong.

BrenBarn answered Oct 09 '22 22:10


Define __all__ = ['names', 'that', 'are', 'public'] in the __init__.py e.g.:

__all__ = ['foo']

from ._subpackage import foo

Real-world example: numpy/__init__.py.
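As a rough illustration of what __all__ does and does not do, here is a sketch assuming Python 3. The names demo_public, _impl and helper are made up for the example; the package is written to a temporary directory:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# "demo_public", "_impl" and "helper" are made-up names for illustration.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_public"
pkg_dir.mkdir()
(pkg_dir / "_impl.py").write_text(
    "def foo():\n"
    "    return 'foo'\n"
    "\n"
    "def helper():\n"
    "    return 'helper'\n"
)
(pkg_dir / "__init__.py").write_text(
    "__all__ = ['foo']\n"
    "\n"
    "from ._impl import foo, helper\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Run a star import in a scratch namespace and see which names arrive.
scratch = {}
exec("from demo_public import *", scratch)
star_names = sorted(name for name in scratch if not name.startswith("__"))
print(star_names)  # -> ['foo']

# Names left out of __all__ are still reachable when asked for explicitly.
import demo_public
print(demo_public.helper())  # -> helper
```

So __all__ documents the public surface and limits star imports; it does not hide anything from import demo_public followed by attribute access.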


You have some misconceptions about how Python packages work:

> If I do not use an __init__.py, I cannot import anything because Python doesn't descend into the folder from sys.path.

You need an __init__.py file in Python versions older than Python 3.3 to mark a directory as containing a Python package. Since Python 3.3, a directory without one can still be imported as a namespace package.
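A small sketch of the Python 3.3+ behaviour, assuming a throwaway directory on sys.path. The names demo_nspkg and tools are invented for the example:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# "demo_nspkg" and "tools" are made-up names for illustration.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_nspkg"
pkg_dir.mkdir()
# Note: no __init__.py is written into the directory.
(pkg_dir / "tools.py").write_text("ANSWER = 42\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# The submodule imports fine even though the folder has no __init__.py.
tools = importlib.import_module("demo_nspkg.tools")
print(tools.ANSWER)  # -> 42

# A namespace package has no __init__.py behind it, so no file of its own.
namespace_pkg = sys.modules["demo_nspkg"]
print(getattr(namespace_pkg, "__file__", None))  # -> None
```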

> If I use a blank __init__.py, when I import doit_tools it's an empty namespace with nothing in it. None of my files imported, which makes it more difficult to use.

It doesn't prevent the import:

from doit_tools import your_module 

It works as expected.
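To make this concrete, here is a sketch assuming Python 3 and a temporary directory. The package name demo_blank is made up; your_module is the placeholder name used above:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# "demo_blank" is a made-up package name; its __init__.py is left empty.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_blank"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "your_module.py").write_text("VALUE = 1\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Importing the bare package does not load its submodules.
demo_blank = importlib.import_module("demo_blank")
before = hasattr(demo_blank, "your_module")

# Asking for the submodule explicitly loads it and binds it on the package.
from demo_blank import your_module
after = hasattr(demo_blank, "your_module")

print(before, after, your_module.VALUE)  # -> False True 1
```

The empty namespace is therefore a matter of what has been imported so far, not of what is importable.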

> If I list the submodules in __all__, I can use the (frowned upon?) from thing import * syntax, but all of my classes are behind unnecessary namespace barriers again. The user has to (1) know they should use from x import * instead of import x, (2) manually reshuffle classes until they can reasonably obey line width style constraints.

(1) Your users (in most cases) should not use from your_package import * outside an interactive Python shell.

(2) You could use parentheses to break a long import line:

from package import (function1, Class1, Class2, ..snip many other names..,
                     ClassN)

> If I add from thatfile import X statements to __init__.py, I get closer but I have namespace conflicts (?) and extra namespaces for things I didn't want to be in there.

It is up to you to resolve namespace conflicts (different objects with the same name). A name can refer to any object: integer, string, package, module, class, function, etc. Python can't know which object you might prefer, and even if it could, it would be inconsistent to ignore some name bindings in this particular case when name bindings are honored in all other cases.
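The JobInfo case from the question can be reproduced with a short sketch, assuming Python 3 and a made-up package name demo_clash. The last binding of the name wins, while the module object stays available in sys.modules:

```python
import importlib
import inspect
import sys
import tempfile
from pathlib import Path

# "demo_clash" is a made-up package name; JobInfo mirrors the question.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_clash"
pkg_dir.mkdir()
(pkg_dir / "JobInfo.py").write_text(
    "class JobInfo:\n"
    "    pass\n"
)
(pkg_dir / "__init__.py").write_text("from .JobInfo import JobInfo\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_clash = importlib.import_module("demo_clash")

# The import system first binds the submodule as demo_clash.JobInfo, then the
# from-import rebinds the same name to the class, so the class is what remains.
name_is_class = inspect.isclass(demo_clash.JobInfo)
print(name_is_class)  # -> True

# The module object is not lost; it is still registered under its full name.
module_kept = inspect.ismodule(sys.modules["demo_clash.JobInfo"])
print(module_kept)  # -> True

# The class remembers where it was defined, hence the doubled-looking path.
print(demo_clash.JobInfo.__module__)  # -> demo_clash.JobInfo
```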

To mark names as non-public you could prefix them with _ e.g., package/_nonpublic_module.py.

jfs answered Oct 09 '22 23:10