Readability of Scientific Python Code (Line Continuations, Variable Names, Imports)

Do Python's stylistic best practices apply to scientific coding?

I am finding it difficult to keep scientific Python code readable.

For example, the usual advice is to give variables meaningful names and to keep the namespace orderly by avoiding import *, which leads to code such as:

    import numpy as np
    normbar = np.random.normal(mean, std, np.shape(foo))

But these suggestions can lead to some difficult-to-read code, especially given the 79-character line width. For example, I just wrote the following operation:

    net["weights"][ix1][ix2] += lrate * (CD / nCases - opts["weightcost_pretrain"].dot(net["weights"][ix1][ix2]))

I can split the expression across two lines:

    net["weights"][ix1][ix2] += lrate * (CD / nCases -
         opts["weightcost_pretrain"].dot(net["weights"][ix1][ix2]))

but this does not seem much better, and I am not sure how deep to indent the second line. These kinds of line continuations become even trickier when one is double-indented into a nested loop, and there are only 50 characters available on a line.
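
PEP 8 accepts two styles for a continuation line inside parentheses: aligned with the opening delimiter, or a "hanging indent", where nothing follows the opening parenthesis and the following lines are indented one extra level. Below is a minimal sketch of the hanging-indent form. The values given to lrate, CD, nCases, ix1, ix2, net and opts are hypothetical stand-ins chosen only so the snippet runs.

```python
import numpy as np

# Hypothetical stand-ins for the question's variables; shapes and values
# are assumptions chosen only so the snippet runs.
lrate, nCases = 0.1, 10
CD = np.ones((3, 3))
ix1, ix2 = 0, 1
net = {"weights": [[np.zeros((3, 3)), np.eye(3)]]}
opts = {"weightcost_pretrain": 0.01 * np.eye(3)}

# Hanging indent: nothing follows the opening parenthesis on the first line,
# and the continuation lines are indented one level (4 spaces).
net["weights"][ix1][ix2] += lrate * (
    CD / nCases
    - opts["weightcost_pretrain"].dot(net["weights"][ix1][ix2])
)
```

Breaking before the minus sign, rather than after it, follows PEP 8's current recommendation for new code.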

Should I accept that scientific Python looks clunky, or are there ways to avoid lines like the example above?

Some potential approaches are:

  • using shorter variable names
  • using shorter dictionary key names
  • importing numpy functions directly and assigning them short names
  • defining helper functions for combinations of arithmetic operations
  • breaking operations into smaller pieces, and placing one on each line

I would appreciate any wisdom on which of these to pursue and which to avoid, as well as suggestions for other remedies.

asked Aug 08 '13 19:08 by cjh


2 Answers

  • defining helper functions for combinations of arithmetic operations
  • breaking operations into smaller pieces, and placing one on each line

These are both good ideas, in keeping with the intent behind PEP 8 and with Pythonic style in general. In fact, whenever someone suggests changing PEP 8 to give more guidance on long lines, half the responses are usually "If you're going over the line limit, you're probably doing too much in one expression".

And, more generally, factoring out code and giving sensible names to sensible operations are always a good idea.

Of course, without knowing exactly what all these things represent, I can only guess at how to split them up, but I think something like this would be pretty readable and meaningful:

    cost = opts["weightcost_pretrain"].dot(net["weights"][ix1][ix2])
    update = lrate * (CD / nCases - cost)
    net["weights"][ix1][ix2] += update
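
The other suggestion, a helper function, could look something like the sketch below. The name penalized_update and its parameter names are hypothetical, and the stand-in data are made up. The helper only packages the arithmetic from the question so that the call site fits on two short lines.

```python
import numpy as np


def penalized_update(weights, grad, n_cases, weight_cost, lrate):
    """Return lrate * (grad / n_cases - weight_cost.dot(weights)).

    Hypothetical helper mirroring the question's arithmetic; the caller
    decides where the returned update is added.
    """
    return lrate * (grad / n_cases - weight_cost.dot(weights))


# Hypothetical stand-ins for the question's variables.
lrate, nCases = 0.1, 10
CD = np.ones((3, 3))
ix1, ix2 = 0, 1
net = {"weights": [[np.zeros((3, 3)), np.eye(3)]]}
opts = {"weightcost_pretrain": 0.01 * np.eye(3)}

net["weights"][ix1][ix2] += penalized_update(
    net["weights"][ix1][ix2], CD, nCases, opts["weightcost_pretrain"], lrate)
```

A descriptive function name documents what the arithmetic means, which a comment on a long one-liner would otherwise have to do.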

answered Jan 02 '23 06:01 by abarnert


I think the style guide always applies. I use Python daily for scientific work, and I find that I can read my code more easily, and come back to it months later with little effort, if I've split long lines into logical components with sensible variable names, or moved them into a function.

I'd do something more like this:

    weights = net["weights"][ix1][ix2]
    opts_arr = opts["weightcost_pretrain"]
    # += changes net["weights"] in place only if this entry is a NumPy array
    weights += lrate * (CD / nCases - opts_arr.dot(weights))
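
One caveat about this version: weights += ... changes net only because += on a NumPy array modifies the array in place, and the name weights refers to that same array object. If the stored entry were an immutable value such as a Python float, += would simply rebind the local name and net would be left unchanged. A small check, using made-up data:

```python
import numpy as np

# Case 1: the stored entry is an ndarray, so the alias is the same object.
net = {"weights": [[np.eye(2)]]}
weights = net["weights"][0][0]
weights += 1.0  # ndarray in-place add: net["weights"][0][0] changes too

# Case 2: the stored entry is a plain float, which is immutable.
params = {"bias": [[0.5]]}
bias = params["bias"][0][0]
bias += 1.0  # rebinds the local name only; params is unchanged here
params["bias"][0][0] = bias  # so the result has to be written back
```

When the entries might not be arrays, writing the result back explicitly, as in the last line, is the safer pattern.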

Another way of saying that Python is "concise" is that Python is syntactically dense, and I find it harder to read and understand a long line of Python than a long line of Java (especially when using high-level functions from 3rd party libraries that hide low-level logic, like NumPy).

answered Jan 02 '23 07:01 by mdscruggs