Pylint message about module length reasoning and ratio of docstrings to lines of code

Question

I know that this could be dismissed as opinion-based, but googling isn't finding the resources I was hoping for, and I am looking for links to any established and agreed best practices in the Python community.

I am an intermediate Python programmer in an organization with a pretty terrible history of writing obfuscated code in every language ever invented. I would really like to set an example of good programming styles and practices. To that end, I am following PEP 8, running pylint on everything I write, and thinking deeply about each of its suggestions rather than simply dismissing them. I have broken up longer, complex methods into shorter ones, in part due to its advice. I also write detailed docstrings following this style: http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html

One challenge for me is that, while I am not the only Python programmer in my organization, I seem to be the only one who takes any of this stuff seriously, and my colleagues don't seem to mind undocumented, repetitive code with naming that doesn't follow any particular schema, for example. So I don't think getting them to review my code or learning from them is my best option.

I just got my first "Too many lines in module" message from pylint. I am not done writing the module - I wanted to add at least one more class and several methods to the existing classes. I know that the idea is that a module should "do one thing" but that "thing" is not yet fully implemented.

Here are some statistics that pylint gives me:

+---------+-------+-----------+-----------+------------+---------+
|type     |number |old number |difference |%documented |%badname |
+=========+=======+===========+===========+============+=========+
|module   |1      |1          |=          |100.00      |0.00     |
+---------+-------+-----------+-----------+------------+---------+
|class    |3      |3          |=          |100.00      |0.00     |
+---------+-------+-----------+-----------+------------+---------+
|method   |27     |27         |=          |100.00      |0.00     |
+---------+-------+-----------+-----------+------------+---------+
|function |2      |2          |=          |100.00      |0.00     |
+---------+-------+-----------+-----------+------------+---------+

+----------+-------+------+---------+-----------+
|type      |number |%     |previous |difference |
+==========+=======+======+=========+===========+
|code      |266    |24.98 |266      |=          |
+----------+-------+------+---------+-----------+
|docstring |747    |70.14 |747      |=          |
+----------+-------+------+---------+-----------+
|comment   |41     |3.85  |41       |=          |
+----------+-------+------+---------+-----------+
|empty     |11     |1.03  |11       |=          |
+----------+-------+------+---------+-----------+

I really don't think that 266 lines of code is too many for a module. My docstrings are about 70% of the lines in the module - is this standard? My docstrings are pretty repetitive, since my methods are smallish operations on data. Each docstring will tend to state, for example, that one argument is a pandas dataframe and list the required and optional columns of the dataframe with their meanings, and that is repeated in each method or function that does anything to the dataframe.

Is there some sort of systematic error that it seems I might be making here? Are there guidelines for what to read in order to improve my code? Are my docstrings too long? Is there such a thing as a too-long docstring? Should I simply disable the pylint module-too-long message and get on with my life?

ChipJust · Accepted Answer

Wow, great question. Having the desire to write high-quality code is really not all that common. Some advice about your coworkers, though. Don't dismiss their point of view. They probably don't intend to do a bad job, but you have to somehow connect the idea of software quality with their idea of value. Taking the time to talk to people about the code you have written is not all about what you get out of the experience. Influencing the organization to respect and pursue software quality will be necessary for you to have any lasting impact on the performance of your company. Otherwise, it just won't matter how good the code you write is. Sorry for the aside; I know this is not really your question.

In some languages, like Java, it is normal to have exactly one class in a file, and to name that file the same as the class it contains. This is not normal in Python, but I think it gives some good guidance. You want the code to be easy to navigate, and that requires striking a balance between putting things as closely together as you can and organizing them, which is the primary reason we separate things. So you might start by reviewing these two concerns with regard to your problem space and how well the ideas in your code are aligning with the ideas in your problem space.

I use docstrings, but I have not tried to write them with Sphinx, reStructuredText or LaTeX. I work in a large code base that uses Doxygen, but I honestly don't put much effort into using the features of the tool in my comments, although I do occasionally poke at the Doxygen documentation to see if there is something I am missing. I have worked with form-like coding styles before, but I am not actually sold that the paperwork brings value. The important thing to go for in your comments is the same as what you go for in your design and in your implementation, and that is understanding. What does each word in each comment add to the understanding? I don't want valueless filler words like Name, Parameter, Return... I mean, I do, but only grudgingly, because I want people to tell me up front what their interface is. I see all those filler words as paperwork I am willing to tolerate. I think it is a trap for people to feel like great comments make good code. They help, but often, if I feel I must comment, one of two things is happening: it is an interface or it is a design flaw. If I have to comment something that isn't an interface, it probably means my design was not very clear or that my implementation is getting cluttered because I am too lazy to figure out how to make each function do one thing. I will probably come clean that up if I am ever in here again.

Without seeing your code I can't give very much advice on how to make it high quality, but it might help to think about how you define "software quality." I define it as "how easy the code is to change." This depends on the types of change the code is likely to need, which implies that evaluating your code for quality really must include some anticipation of what is likely to be needed. Counter-intuitively, actually making your code easier to change often involves not trying to implement anything that is not required right now. Even so, I often will implement some things at the slightest provocation, especially in Python. For example, implementing the __str__ method is a great idea; making your object hashable by implementing __eq__, __ne__ and __hash__ is even cooler, as that allows you to use your object as a key in a dictionary or a member of a set.
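As a minimal sketch of that last point: the class below is hypothetical (ColumnSpec and its fields are illustrative names, not taken from the question). It assumes Python 3, where __ne__ falls back to the inverse of __eq__ and so does not need to be written by hand, and it assumes the fields are not changed once an instance has been used as a key.

```python
class ColumnSpec:
    """Hypothetical value object describing one dataframe column."""

    def __init__(self, name, dtype):
        self.name = name
        self.dtype = dtype

    def __str__(self):
        # Human-readable form, used by print() and str().
        return f"{self.name} ({self.dtype})"

    def __eq__(self, other):
        # Two specs are equal when all their fields match.
        if not isinstance(other, ColumnSpec):
            return NotImplemented
        return (self.name, self.dtype) == (other.name, other.dtype)

    def __hash__(self):
        # Must agree with __eq__: equal objects need equal hashes.
        return hash((self.name, self.dtype))


# Equal instances now behave as the same dictionary key or set member.
units = {ColumnSpec("price", "float64"): "USD"}
print(units[ColumnSpec("price", "float64")])
print(len({ColumnSpec("price", "float64"), ColumnSpec("price", "float64")}))
```

Hashing on the same tuple that __eq__ compares keeps the two methods consistent, which is what dictionaries and sets rely on.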

One other (somewhat random) item of advice would be to be wary of object-oriented thinking. There is so much good about it, but there are some pitfalls. For example, don't make functions like get_thing(self). It is way better to have an attribute for that, and if you need to do extra work you can make a @property getter and setter, which still leaves the caller with a simple attribute access, which is way cleaner. I find that people who just learned some object-oriented ideas tend to see making lots of get and set methods as a good thing to do, but I prefer to drive state completely out of the design, if I can, and all these get and set methods imply a state to the object.
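A minimal sketch of the attribute-plus-property idea, using a hypothetical Reading class (the names and the validation rule are assumptions made for illustration):

```python
class Reading:
    """Hypothetical temperature reading; names are illustrative only."""

    def __init__(self, celsius):
        # Plain assignment; this goes through the property setter below.
        self.celsius = celsius

    @property
    def celsius(self):
        # Callers read reading.celsius, not reading.get_celsius().
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        # Extra work (validation) stays hidden behind attribute syntax.
        if value < -273.15:
            raise ValueError("temperature below absolute zero")
        self._celsius = float(value)

    @property
    def fahrenheit(self):
        # Derived value computed on access, so there is no second
        # piece of state to keep in sync.
        return self._celsius * 9 / 5 + 32


reading = Reading(100)
reading.celsius = 25
print(reading.celsius, reading.fahrenheit)
```

The class could start with a bare self.celsius attribute and gain the property later without any change to calling code, which is why writing get_ and set_ methods up front is rarely needed in Python.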

Tags: python, docstring, pylint

moink