
Is it possible to split a sequence of pandas commands across multiple lines?

Tags: python, pandas

I have a long chain of pandas commands, for example:

df.groupby(['x','y']).apply(lambda x: (np.max(x['z'])-np.min(x['z']))).sort_values(ascending=False)

I would like to present it across multiple lines while keeping it a single statement (without saving results to a temporary object or defining the lambda as a function).

An example of how I would like it to look:

df.groupby(['x','y'])
.apply(lambda x: (np.max(x['z'])-np.min(x['z'])))
.sort_values(ascending=False)

Is it possible to do so? (I know '_' has this functionality in Python, but it doesn't seem to work with chained commands.)

user2808117 asked Nov 26 '15 17:11


3 Answers

In Python you can continue onto the next line by ending the line with a backslash or by enclosing the expression in parentheses.

df.groupby(['x','y']) \
.apply(lambda x: (np.max(x['z'])-np.min(x['z']))) \
.sort_values(ascending=False)

or

(df.groupby(['x','y'])
.apply(lambda x: (np.max(x['z'])-np.min(x['z'])))
.sort_values(ascending=False))
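
As a quick check that the two forms are equivalent, here is a minimal runnable sketch. The sample data is made up for illustration; only the column names x, y and z come from the question. Selecting the z column before calling apply is a small variation of mine, so the example does not depend on how apply treats the grouping columns in a given pandas version.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "x": ["a", "a", "b", "b"],
    "y": [1, 1, 2, 2],
    "z": [10, 4, 7, 2],
})

# Form 1: backslash continuation. Nothing may follow the backslash,
# not even a comment or trailing whitespace.
with_backslash = df.groupby(["x", "y"])["z"] \
    .apply(lambda z: z.max() - z.min()) \
    .sort_values(ascending=False)

# Form 2: the whole chain wrapped in parentheses.
with_parens = (
    df.groupby(["x", "y"])["z"]
    .apply(lambda z: z.max() - z.min())
    .sort_values(ascending=False)
)

# Both spellings are the same expression, so the results match.
print(with_backslash.equals(with_parens))
```

With this data the range of z is 6 for group ('a', 1) and 5 for group ('b', 2), so both results list ('a', 1) first.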
GaryBishop answered Oct 02 '22 05:10


The preferred way of wrapping long lines is by using Python's implied line continuation inside parentheses, brackets and braces. Long lines can be broken over multiple lines by wrapping expressions in parentheses. These should be used in preference to using a backslash for line continuation.

from https://www.python.org/dev/peps/pep-0008/#id19

So this may be better:

df.groupby(['x', 'y']).apply(
    lambda x: (np.max(x['z'])-np.min(x['z']))
).sort_values(ascending=False)
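
The implied continuation that the quoted PEP 8 passage describes is not specific to pandas. A small plain-Python sketch, with variable names chosen only for illustration, shows it working inside each of the three kinds of delimiters:

```python
# Parentheses: an expression can break anywhere inside them.
total = (
    1
    + 2
    + 3
)

# Brackets: a list literal spread over several lines.
columns = [
    "x",
    "y",
]

# Braces: a dict literal spread over several lines.
options = {
    "ascending": False,
}

print(total, columns, options)
```

In each case the statement ends only when the opening delimiter is closed, which is why no backslash is needed.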

The "_" variable, which holds the value of the last printed expression, exists only in the interactive Python console, so in a script or module it cannot be used for that purpose unless you assign it explicitly.
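
To illustrate where "_" comes from, the sketch below calls the console's display hook by hand. This is only a demonstration of the mechanism: a script never calls the hook for expression statements, which is why "_" is not set there automatically.

```python
import builtins
import sys

# sys.__displayhook__ is the original hook that the interactive console
# calls for each expression result. It prints repr(value) and stores the
# value in builtins._ (calling it manually here is purely illustrative).
sys.__displayhook__(6 * 7)

# The hook has now set "_", exactly as the console would after evaluating 6 * 7.
last_value = builtins._
print(last_value)
```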

Zoli answered Sep 28 '22 05:09


Since this has the nature of a command, I would probably format it close to your example, like this:

df.groupby(['x','y']) \
    .apply(lambda x: np.max(x['z'])-np.min(x['z'])) \
    .sort_values(ascending=False)

It took me a long time to realize I could break these expressions before the dots, which is often more readable than breaking inside the parentheses (same goes for "some long string".format()).
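
The string case mentioned in the parenthetical can be sketched the same way. The template text and argument names here are invented for the example, and the expression is wrapped in parentheses so no backslash is needed:

```python
# Break before the dot: the string literal sits on one line and the
# method call on the next, all inside one pair of parentheses.
message = (
    "{name} has {count} rows"
    .format(name="df", count=3)
)

print(message)
```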

If this were more like an expression evaluation, I'd wrap the whole thing in parentheses, which is considered more "Pythonic" than line continuation markers:

var = (
    df.groupby(['x','y'])
        .apply(
            lambda x: np.max(x['z'])-np.min(x['z'])
        )
        .sort_values(ascending=False)
)

Update: Since writing this, I've moved away from backslashes for line continuation whenever possible, including here, where it's not meaningful to chain the operations without assigning the result to a variable or passing it to a function. I've also switched to using one level of indentation for each level of nesting inside parentheses or brackets, to avoid going too deep and/or getting a wiggly effect. So I would now write your expression like this:

var = (
    df
    .groupby(['x','y'])
    .apply(
        lambda x: np.max(x['z']) - np.min(x['z'])
    )
    .sort_values(ascending=False)
)
Matthias Fripp answered Sep 30 '22 05:09