Based on "Split a string by spaces in Python", which uses shlex.split to split a string with quotes smartly, I would be interested in hearing about other common tasks solved by non-obvious standard library functions.
If this turns into Module of The Week, that's fine too.
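For context, here is a minimal sketch of the shlex.split behaviour the question refers to. The sample command string is made up for illustration.

```python
import shlex

# Hypothetical input: a command line with a quoted argument containing a space.
command = 'copy "my file.txt" backup'

# str.split() breaks on every space and leaves the quote characters in place.
naive = command.split()

# shlex.split() follows shell-like rules, so the quoted part stays one token.
smart = shlex.split(command)

print(naive)  # ['copy', '"my', 'file.txt"', 'backup']
print(smart)  # ['copy', 'my file.txt', 'backup']
```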
Modularization is the technique of splitting a large programming task into smaller, separate, and manageable subtasks. Like most modern programming languages, Python is a modular programming language. Python scripts are modularized through functions, modules, and packages.
A module in Python is just a file containing Python definitions and statements. The module name is derived from the file name by removing the .py suffix. For example, if the file name is fibonacci.py, the module name is fibonacci. Let's turn our Fibonacci functions into a module.
Take some lines of code, give them a name, and you've got a function (which can be reused). Take a collection of functions and package them as a file, and you've got a module (which can also be reused).
To use a module, write the import keyword followed by the module name. When the interpreter comes across an import statement, it loads the module into your current program. You can then use the functions inside the module with the dot (.) operator after the module name.
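As a small illustration of the import statement and the dot operator, the sketch below uses the standard math module rather than a hand-written one. The variable names are arbitrary.

```python
import math                      # bind the whole module to the name "math"
from math import floor           # or bind a single function directly

# Functions are reached through the module name and the dot operator.
root = math.sqrt(16)

# A directly imported name needs no module prefix.
whole = floor(3.7)

print(root)   # 4.0
print(whole)  # 3
```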
I was quite surprised to learn that you could use the bisect module to do a very fast binary search in a sorted sequence. Its documentation doesn't say anything about it:
This module provides support for maintaining a list in sorted order without having to sort the list after each insertion.
The usage is very simple:
>>> import bisect
>>> lst = [4, 7, 10, 23, 25, 100, 103, 201, 333]
>>> bisect.bisect_left(lst, 23)
3
You have to remember, though, that if the list is not already sorted, it's quicker to search it linearly, item by item, than to sort it and then do a binary search on it. The first option is O(n), the second is O(n log n).
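Note that bisect_left returns an insertion point, not a found/not-found answer, so a lookup needs one extra check. A minimal sketch, assuming the list is already sorted; index_sorted is a hypothetical helper name, not part of the bisect module:

```python
import bisect

def index_sorted(seq, x):
    """Return the index of x in the sorted sequence seq, or -1 if x is absent."""
    i = bisect.bisect_left(seq, x)   # leftmost position where x could be inserted
    if i < len(seq) and seq[i] == x:
        return i                     # x is really there
    return -1                        # i was only an insertion point

lst = [4, 7, 10, 23, 25, 100, 103, 201, 333]
print(index_sorted(lst, 23))  # 3
print(index_sorted(lst, 24))  # -1
```

Without the equality check, bisect_left(lst, 24) would return 4, which is where 24 would go, not where it is.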
Oft overlooked modules, uses and tricks:
collections.defaultdict(): for when you want missing keys in a dict to have a default value.
functools.wraps(): for writing decorators that play nicely with introspection.
posixpath: the os.path module for POSIX systems. You can use it for manipulating POSIX paths (including URI elements) even on Windows and other non-POSIX systems.
ntpath: the os.path module for Windows; usable for manipulation of Windows paths on non-Windows systems.
(also: macpath, for MacOS 9 and earlier, os2emxpath for OS/2 EMX, but I'm not sure if anyone still cares.)
pprint: more structured printing of the repr() of containers makes debugging much easier.
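imp: all the tools you need to write your own plugin system or make Python import modules from arbitrary archives. (On current Python, use importlib instead: imp has been deprecated since Python 3.4 and was removed in 3.12.)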
rlcompleter: getting tab-completion in the normal interactive interpreter. Just do "import readline, rlcompleter; readline.parse_and_bind('tab: complete')"
the PYTHONSTARTUP environment variable: can be set to the path to a file that will be executed (in the main namespace) when entering the interactive interpreter; useful for putting things in like the rlcompleter recipe above.
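A few of the items above can be sketched briefly. The sample data, the decorator name logged and the paths are invented for illustration only.

```python
from collections import defaultdict
from functools import wraps
import ntpath
import posixpath

# defaultdict: missing keys get a fresh default (here, an empty list).
groups = defaultdict(list)
for word in ["apple", "avocado", "banana"]:
    groups[word[0]].append(word)

# functools.wraps: the wrapper keeps the wrapped function's name and docstring.
def logged(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@logged
def greet(name):
    """Return a greeting."""
    return "Hello, " + name

# posixpath / ntpath: path handling for a specific platform, on any platform.
url_path = posixpath.join("/api", "v1", "users")
win_name = ntpath.basename(r"C:\Users\me\notes.txt")

print(dict(groups))    # {'a': ['apple', 'avocado'], 'b': ['banana']}
print(greet.__name__)  # greet
print(url_path)        # /api/v1/users
print(win_name)        # notes.txt
```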
I use itertools (especially cycle, repeat, chain) to make Python behave more like R, and in other functional / vector applications. Often this lets me avoid the overhead and complication of NumPy.
# in R, shorter iterables are automatically cycled
# and all functions "apply" in a "map"-like way over lists
> 0:10 + 0:2
[1] 0 2 4 3 5 7 6 8 10 9 11
# Normal Python: + concatenates lists instead of adding element-wise
In [1]: list(range(10)) + list(range(3))
Out[1]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
## this code is terrible, but it demos the idea.
from itertools import cycle
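def addR(L1, L2):
    # the result is as long as the longer input; the shorter one is recycled
    n = max(len(L1), len(L2))
    out = [None] * n
    gen1, gen2 = cycle(L1), cycle(L2)
    ii = 0
    while ii < n:
        # next(gen) works on Python 2.6+ and 3; gen.next() is Python 2 only
        out[ii] = next(gen1) + next(gen2)
        ii += 1
    return out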
In [21]: addR(range(10), range(3))
Out[21]: [0, 2, 4, 3, 5, 7, 6, 8, 10, 9]
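The same recycling idea can probably be written more compactly with zip and islice. This is only a sketch of one possible alternative, not the original author's code; add_recycled is a hypothetical name, and both inputs are assumed to support len().

```python
from itertools import cycle, islice

def add_recycled(a, b):
    """Add two sequences element-wise, recycling the shorter one as R does."""
    n = max(len(a), len(b))
    # zip over two endless cycles, then cut the stream off after n pairs
    return [x + y for x, y in islice(zip(cycle(a), cycle(b)), n)]

print(add_recycled(range(10), range(3)))  # [0, 2, 4, 3, 5, 7, 6, 8, 10, 9]
print(add_recycled(range(11), range(3)))  # [0, 2, 4, 3, 5, 7, 6, 8, 10, 9, 11]
```

The second call matches the R output for 0:10 + 0:2 shown earlier, since 0:10 in R includes the endpoint.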