
How to get lineno of "end-of-statement" in Python ast

I am trying to work on a script that manipulates another Python script; the script to be modified has a structure like:

class SomethingRecord(Record):
    description = 'This records something'
    author = 'john smith'

I use ast to locate the description line number, and I use some code to change the original file with a new description string based on the line number. So far so good.

Now the only issue is that description is occasionally a multi-line string, e.g.

    description = ('line 1'
                   'line 2'
                   'line 3')

or

    description = 'line 1' \
        'line 2' \
        'line 3'

and I only have the line number of the first line, not the following lines, so my one-line replacer would produce

    description = 'new value'
        'line 2' \
        'line 3'

and the code is broken. I figured that if I knew both the starting line number and the end line number (or the number of lines) of the description assignment, I could repair my code to handle such situations. How do I get this information with the Python standard library?

asked Sep 29 '16 by xis


4 Answers

I looked at the other answers; it appears people are doing backflips to get around the problems of computing line numbers, when your real problem is one of modifying the code. That suggests the baseline machinery is not helping you the way you really need.

If you use a program transformation system (PTS), you could avoid a lot of this nonsense.

A good PTS will parse your source code to an AST, then let you apply source-level rewrite rules to modify the AST, and finally convert the modified AST back into source text. Generically, a PTS accepts transformation rules of essentially this form:

   if you see *this*, replace it by *that*

[A parser that builds an AST is NOT a PTS. Parsers don't allow rules like this; you can write ad hoc code to hack at the tree, but that's usually pretty awkward. Nor do they do the AST-to-source-text regeneration.]

My PTS (see bio), called DMS, could accomplish this. OP's specific example would be handled easily by the following rewrite rule:

 source domain Python; -- tell DMS the syntax of pattern left hand sides
 target domain Python; -- tell DMS the syntax of pattern right hand sides

 rule replace_description(e: expression): statement -> statement =
     " description = \e "
  ->
     " description = ('line 1'
                      'line 2'
                      'line 3')";

The transformation rule is given the name replace_description to distinguish it from all the other rules we might define. The rule parameter (e: expression) indicates that the pattern will allow an arbitrary expression as defined by the source language. statement -> statement means the rule maps a statement in the source language to a statement in the target language; we could use any other syntax category from the Python grammar provided to DMS. The " used here is a metaquote, used to distinguish the syntax of the rule language from the syntax of the subject language. The second -> separates the source pattern this from the target pattern that.

You'll notice that there is no need to mention line numbers. The PTS converts the rule surface syntax into corresponding ASTs by actually parsing the patterns with the same parser used to parse the source file. The ASTs produced for the patterns are used to effect the pattern match/replacement. Because this is driven from ASTs, the actual layout of the original code (spacing, line breaks, comments) doesn't affect DMS's ability to match or replace. Comments aren't a problem for matching because they are attached to tree nodes rather than being tree nodes; they are preserved in the transformed program. DMS does capture line and precise column information for all tree elements; it just isn't needed to implement transformations. Code layout is also preserved in the output by DMS, using that line/column information.

Other PTSes offer generally similar capabilities.
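If you want to stay within the Python standard library, a rough approximation of this rewrite-rule idea is to edit the AST directly and regenerate the source with ast.unparse (Python 3.9+). This is only a sketch: unlike a PTS, unparsing discards the original layout and comments, which is exactly the limitation the answers above and below are working around. The snippet reuses the class from the question.

```python
import ast

source = '''\
class SomethingRecord(Record):
    description = ('line 1'
                   'line 2'
                   'line 3')
'''

tree = ast.parse(source)
for node in ast.walk(tree):
    # Find the assignment to `description` and swap in a new value.
    if isinstance(node, ast.Assign):
        target = node.targets[0]
        if isinstance(target, ast.Name) and target.id == 'description':
            node.value = ast.Constant('new value')

new_source = ast.unparse(tree)  # Python 3.9+; loses comments/formatting
print(new_source)
```

This correctly replaces the whole multi-line assignment regardless of how it was wrapped, at the cost of reformatting the rest of the file.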

answered Oct 19 '22 by Ira Baxter


As a workaround you can change:

    description = 'line 1' \
              'line 2' \
              'line 3'

to:

    description = 'new value'; tmp = 'line 1' \
              'line 2' \
              'line 3'

etc.

It is a simple change, but the code it produces is admittedly ugly.

answered Oct 19 '22 by Ohad Eytan


Indeed, the information you need is not stored in the ast. I don't know the details of what you need, but it looks like you could use the tokenize module from the standard library. The idea is that every logical Python statement ends with a NEWLINE token (it could also end with a semicolon, but as I understand that is not your case). I tested this approach with this file:

# first comment
class SomethingRecord:
    description = ('line 1'
                   'line 2'
                   'line 3')

class SomethingRecord2:
    description = ('line 1',
                   'line 2',
                   # comment in the middle

                   'line 3')

class SomethingRecord3:
    description = 'line 1' \
                  'line 2' \
                  'line 3'
    whatever = 'line'

class SomethingRecord3:
    description = 'line 1', \
                  'line 2', \
                  'line 3'
                  # last comment

And here is what I propose to do:

import tokenize
from io import BytesIO
from collections import defaultdict

with tokenize.open('testmod.py') as f:
    code = f.read()
    enc = f.encoding

rl = BytesIO(code.encode(enc)).readline
tokens = list(tokenize.tokenize(rl))

token_table = defaultdict(list)  # mapping line numbers to token numbers
for i, tok in enumerate(tokens):
    token_table[tok.start[0]].append(i)

def find_end(start):
    i = token_table[start][-1]  # last token number on the start line
    while tokens[i].exact_type != tokenize.NEWLINE:
        i += 1
    return tokens[i].start[0]

print(find_end(3))
print(find_end(8))
print(find_end(15))
print(find_end(21))

This prints out:

5
12
17
23

This seems to be correct; you could tune this approach depending on what exactly you need. tokenize is more verbose than ast but also more flexible. Of course, the best approach is to use both of them for different parts of your task.


EDIT: I tried this in Python 3.4, but I think it should also work in other versions.
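Once find_end gives you both line numbers, the repair the question asks about is a simple slice over the file's lines. A minimal sketch (the helper name replace_statement is my own, not from the answer):

```python
def replace_statement(source, start, end, new_text):
    """Replace lines start..end (1-indexed, inclusive) with new_text."""
    lines = source.splitlines(keepends=True)
    return ''.join(lines[:start - 1] + [new_text + '\n'] + lines[end:])

code = "a = 1\nb = ('x'\n     'y')\nc = 3\n"
# The `b = ...` statement spans lines 2-3, as find_end would report.
fixed = replace_statement(code, 2, 3, "b = 'new value'")
print(fixed)
```

This removes the continuation lines along with the first line, so the broken output shown in the question can no longer occur.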

answered Oct 19 '22 by ivanl


This is now available as the end_lineno attribute on AST nodes since Python 3.8.
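For example, reusing the class from the question (requires Python 3.8+):

```python
import ast

source = '''\
class SomethingRecord:
    description = ('line 1'
                   'line 2'
                   'line 3')
'''

tree = ast.parse(source)
assign = tree.body[0].body[0]  # the `description` assignment node
print(assign.lineno, assign.end_lineno)  # → 2 4
```

lineno and end_lineno give the first and last line of the whole assignment, which is exactly the span the one-line replacer needs to overwrite.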

answered Oct 19 '22 by davetapley