I am writing a program that categorizes a list of Python files by which modules they import. As such, I need to scan the collection of .py files and return a list of the modules they import. As an example, if one of the files has the following lines:

    import os
    import sys, gtk

I would like it to return:

    ["os", "sys", "gtk"]

I played with modulefinder and wrote:

    from modulefinder import ModuleFinder

    finder = ModuleFinder()
    finder.run_script('testscript.py')

    print 'Loaded modules:'
    for name, mod in finder.modules.iteritems():
        print '%s ' % name,

but this returns more than just the modules used in the script. As an example, in a script which merely has:

    import os
    print os.getenv('USERNAME')

the ModuleFinder script returns:

    tokenize  heapq  __future__  copy_reg  sre_compile  _collections  cStringIO  _sre
    functools  random  cPickle  __builtin__  subprocess  cmd  gc  __main__  operator
    array  select  _heapq  _threading_local  abc  _bisect  posixpath  _random
    os2emxpath  tempfile  errno  pprint  binascii  token  sre_constants  re  _abcoll
    collections  ntpath  threading  opcode  _struct  _warnings  math  shlex  fcntl
    genericpath  stat  string  warnings  UserDict  inspect  repr  struct  sys  pwd
    imp  getopt  readline  copy  bdb  types  strop  _functools  keyword  thread
    StringIO  bisect  pickle  signal  traceback  difflib  marshal  linecache
    itertools  dummy_thread  posix  doctest  unittest  time  sre_parse  os  pdb  dis

...whereas I just want it to return 'os', as that was the module used in the script.
Can anyone help me achieve this?
UPDATE: I just want to clarify that I would like to do this without running the Python file being analyzed, and just scanning the code.
To check all the installed Python modules, you can use either of two pip commands: 'pip freeze' or 'pip list'.
You can inspect Python's import path by printing sys.path.
When importing a module from a package, note that __import__('A.B', ...) returns package A when fromlist is empty, but its submodule B when fromlist is not empty. Level is used to determine whether to perform absolute or relative imports.
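The fromlist behaviour described above can be checked with a small sketch. It assumes Python 3 and uses the standard-library package xml.dom purely as a convenient stand-in for 'A.B'; the variable names are illustrative only.

```python
import importlib

# With an empty fromlist, __import__ returns the top-level package (A).
top = __import__("xml.dom")
print(top.__name__)

# With a non-empty fromlist, it returns the submodule itself (A.B).
sub = __import__("xml.dom", fromlist=["minidom"])
print(sub.__name__)

# importlib.import_module always returns the module named by the dotted
# path, which is usually the less surprising call in application code.
same = importlib.import_module("xml.dom")
print(same.__name__)
```

Note that this actually imports the module, so it is not a substitute for scanning source code without running it.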
IMO the best way to do this is to use the snakefood package (http://furius.ca/snakefood/). The author has done all of the required work to get not only the directly imported modules: it also uses the AST to parse the code for runtime dependencies that a more static analysis would miss.
Worked up a command example to demonstrate:

    sfood ./example.py | sfood-cluster > example.deps

That will generate a basic dependency file of each unique module. For even more detail use:

    sfood -r -i ./example.py | sfood-cluster > example.deps

To walk a tree and find all imports, you can also do this in code. Please NOTE: the AST chunks of this routine were lifted from the snakefood source, which has this copyright: Copyright (C) 2001-2007 Martin Blais. All Rights Reserved.

    # Python 2 only: the compiler package was removed in Python 3.
    import os
    import compiler
    from compiler.ast import Discard, Const
    from compiler.visitor import ASTVisitor

    def pyfiles(startPath):
        r = []
        d = os.path.abspath(startPath)
        if os.path.exists(d) and os.path.isdir(d):
            for root, dirs, files in os.walk(d):
                for f in files:
                    n, ext = os.path.splitext(f)
                    if ext == '.py':
                        # Record root (not d) so that files found in
                        # subdirectories are joined to their real directory.
                        r.append([root, f])
        return r

    class ImportVisitor(object):
        def __init__(self):
            self.modules = []
            self.recent = []

        def visitImport(self, node):
            self.accept_imports()
            self.recent.extend((x[0], None, x[1] or x[0], node.lineno, 0)
                               for x in node.names)

        def visitFrom(self, node):
            self.accept_imports()
            modname = node.modname
            if modname == '__future__':
                return # Ignore these.
            for name, as_ in node.names:
                if name == '*':
                    # We really don't know...
                    mod = (modname, None, None, node.lineno, node.level)
                else:
                    mod = (modname, name, as_ or name, node.lineno, node.level)
                self.recent.append(mod)

        def default(self, node):
            pragma = None
            if self.recent:
                if isinstance(node, Discard):
                    children = node.getChildren()
                    if len(children) == 1 and isinstance(children[0], Const):
                        const_node = children[0]
                        pragma = const_node.value
            self.accept_imports(pragma)

        def accept_imports(self, pragma=None):
            self.modules.extend((m, r, l, n, lvl, pragma)
                                for (m, r, l, n, lvl) in self.recent)
            self.recent = []

        def finalize(self):
            self.accept_imports()
            return self.modules

    class ImportWalker(ASTVisitor):
        def __init__(self, visitor):
            ASTVisitor.__init__(self)
            self._visitor = visitor

        def default(self, node, *args):
            self._visitor.default(node)
            ASTVisitor.default(self, node, *args)

    def parse_python_source(fn):
        contents = open(fn, 'rU').read()
        ast = compiler.parse(contents)
        vis = ImportVisitor()

        compiler.walk(ast, vis, ImportWalker(vis))
        return vis.finalize()

    for d, f in pyfiles('/Users/bear/temp/foobar'):
        print d, f
        print parse_python_source(os.path.join(d, f))

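Because the compiler package exists only in Python 2, here is a rough Python 3 sketch of the same idea built on the standard ast module. It is not the snakefood implementation: the function names (iter_py_files, imports_in_source, scan_tree) are made up for this example, the pragma handling is left out, and the tuple layout (module, name, alias, lineno, level) is only modelled on the routine above.

```python
import ast
import os


def iter_py_files(start_path):
    """Yield the full path of every .py file under start_path."""
    for root, _dirs, files in os.walk(os.path.abspath(start_path)):
        for name in files:
            if os.path.splitext(name)[1] == ".py":
                yield os.path.join(root, name)


def imports_in_source(source):
    """Return (module, name, alias, lineno, level) tuples, in line order."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.append((alias.name, None, alias.asname or alias.name,
                              node.lineno, 0))
        elif isinstance(node, ast.ImportFrom):
            if node.module == "__future__":
                continue  # Not a real dependency.
            for alias in node.names:
                if alias.name == "*":
                    # A star import does not say which names are used.
                    found.append((node.module, None, None,
                                  node.lineno, node.level))
                else:
                    found.append((node.module, alias.name,
                                  alias.asname or alias.name,
                                  node.lineno, node.level))
    # ast.walk is breadth-first, so restore source order by line number.
    return sorted(found, key=lambda item: item[3])


def scan_tree(start_path):
    """Map each .py file under start_path to its list of import tuples."""
    result = {}
    for path in iter_py_files(start_path):
        with open(path, encoding="utf-8") as handle:
            result[path] = imports_in_source(handle.read())
    return result
```

Calling scan_tree on a directory returns a dict keyed by file path. The files are only parsed, never executed, and a file that is not valid Python 3 will raise SyntaxError from ast.parse.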
I recently needed all the dependencies for a given Python script, and I took a different approach than the other answers. I only cared about top-level module names (e.g., I wanted foo from import foo.bar).
This is the code using the ast module:

    import ast

    modules = set()

    def visit_Import(node):
        for name in node.names:
            modules.add(name.name.split(".")[0])

    def visit_ImportFrom(node):
        # if node.module is missing it's a "from . import ..." statement
        # if level > 0 it's a "from .submodule import ..." statement
        if node.module is not None and node.level == 0:
            modules.add(node.module.split(".")[0])

    node_iter = ast.NodeVisitor()
    node_iter.visit_Import = visit_Import
    node_iter.visit_ImportFrom = visit_ImportFrom

Testing with a Python file foo.py that contains:

    # foo.py
    import sys, os
    import foo1
    from foo2 import bar
    from foo3 import bar as che
    import foo4 as boo
    import foo5.zoo
    from foo6 import *
    from . import foo7, foo8
    from .foo12 import foo13
    from foo9 import foo10, foo11

    def do():
        import bar1
        from bar2 import foo
        from bar3 import che as baz

I could get all the modules in foo.py by doing something like this:

    with open("foo.py") as f:
        node_iter.visit(ast.parse(f.read()))
    print(modules)

which would give me this output:

    set(['bar1', 'bar3', 'bar2', 'sys', 'foo9', 'foo4', 'foo5', 'foo6', 'os', 'foo1', 'foo2', 'foo3'])

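For reuse across many files, the same approach can be wrapped in a NodeVisitor subclass so that each file gets its own result set instead of sharing a module-level one. This is a minimal sketch assuming Python 3; the names TopLevelImports and top_level_imports are invented for the example.

```python
import ast


class TopLevelImports(ast.NodeVisitor):
    """Collect the top-level names of absolutely imported modules."""

    def __init__(self):
        self.modules = set()

    def visit_Import(self, node):
        for alias in node.names:
            self.modules.add(alias.name.split(".")[0])

    def visit_ImportFrom(self, node):
        # Skip relative imports ("from . import x", "from .sub import y").
        if node.module is not None and node.level == 0:
            self.modules.add(node.module.split(".")[0])


def top_level_imports(source):
    """Return a sorted list of top-level modules imported by source."""
    visitor = TopLevelImports()
    visitor.visit(ast.parse(source))
    return sorted(visitor.modules)


# The example from the question; the code is parsed, never executed.
print(top_level_imports("import os\nimport sys, gtk\n"))
# prints ['gtk', 'os', 'sys']
```

Imports nested inside functions or classes are still found, because NodeVisitor falls back to generic_visit for node types that have no dedicated visit method.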