Pandas equivalent of dplyr everything()

Tags: pandas, r, dplyr

In R I frequently use dplyr's select in combination with everything()

df %>% select(var4, var17, everything())

The above, for example, reorders the columns of the dataframe so that var4 comes first, var17 second, and all remaining columns follow. What is the most pandathonic way of doing this? With many columns, spelling them all out explicitly and keeping track of their positions is a pain.

The ideal solution is short, readable and can be used in pandas chaining.


asked Jun 05 '20 07:06

safex

2 Answers

Use Index.difference to get all columns not specified in the list, then join the two lists together:

import pandas as pd

df = pd.DataFrame({
    'G': list('abcdef'),
    'var17': [4, 5, 4, 5, 5, 4],
    'A': [7, 8, 9, 4, 2, 3],
    'var4': [1, 3, 5, 7, 1, 0],
    'E': [5, 3, 6, 9, 2, 4],
    'F': list('aaabbb')
})

cols = ['var4', 'var17']
# sort=False keeps the remaining columns in their original order
another = df.columns.difference(cols, sort=False).tolist()
df = df[cols + another]
print(df)
   var4  var17  G  A  E  F
0     1      4  a  7  5  a
1     3      5  b  8  3  a
2     5      4  c  9  6  a
3     7      5  d  4  9  b
4     1      5  e  2  2  b
5     0      4  f  3  4  b
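
If the sort argument of Index.difference is not available or not wanted, the same reordering can be sketched with a plain list comprehension, which keeps the remaining columns in their original order by construction. This is only an illustrative alternative; the names rest and reordered are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'G': list('abcdef'),
    'var17': [4, 5, 4, 5, 5, 4],
    'A': [7, 8, 9, 4, 2, 3],
    'var4': [1, 3, 5, 7, 1, 0],
    'E': [5, 3, 6, 9, 2, 4],
    'F': list('aaabbb'),
})

cols = ['var4', 'var17']
# Every column not named in cols, in the order it appears in df
rest = [c for c in df.columns if c not in cols]
reordered = df[cols + rest]
print(list(reordered.columns))
# ['var4', 'var17', 'G', 'A', 'E', 'F']
```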

EDIT: For chaining, you can use DataFrame.pipe, which passes the DataFrame to the function as its first argument:

def everything_after(df, cols):
    # Columns not listed in cols, kept in their original order
    another = df.columns.difference(cols, sort=False).tolist()
    return df[cols + another]

df = df.pipe(everything_after, ['var4', 'var17'])
print(df)
   var4  var17  G  A  E  F
0     1      4  a  7  5  a
1     3      5  b  8  3  a
2     5      4  c  9  6  a
3     7      5  d  4  9  b
4     1      5  e  2  2  b
5     0      4  f  3  4  b
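
To show how this fits into a longer method chain, here is a minimal sketch that reuses everything_after in the middle of a chain. The total column and the variable names result and same are invented for illustration. The last statement is a possible inline variant that relies on .loc accepting a callable for the column selector, so no helper function is needed.

```python
import pandas as pd

def everything_after(df, cols):
    # Columns not listed in cols, kept in their original order
    another = df.columns.difference(cols, sort=False).tolist()
    return df[cols + another]

df = pd.DataFrame({
    'G': list('abcdef'),
    'var17': [4, 5, 4, 5, 5, 4],
    'A': [7, 8, 9, 4, 2, 3],
    'var4': [1, 3, 5, 7, 1, 0],
    'E': [5, 3, 6, 9, 2, 4],
    'F': list('aaabbb'),
})

cols = ['var4', 'var17']

# pipe passes the intermediate frame as the first argument,
# so the reordering sees columns created earlier in the chain
result = (
    df
    .assign(total=lambda d: d['A'] + d['E'])  # hypothetical extra column
    .pipe(everything_after, cols)
    .head(3)
)
print(list(result.columns))

# Inline variant: .loc evaluates the callable on the frame it is applied to
same = df.loc[:, lambda d: cols + [c for c in d.columns if c not in cols]]
print(list(same.columns))
```

The helper keeps the chain readable when the reordering is needed in several places, while the .loc form avoids defining a function for a one-off use.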


answered Oct 14 '22 07:10

jezrael

Here is how smoothly you can do it with datar:

>>> from datar import f
>>> from datar.datasets import iris
>>> from datar.dplyr import select, everything, slice_head
>>> iris >> slice_head(5)
   Sepal_Length  Sepal_Width  Petal_Length  Petal_Width Species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
>>> iris >> select(f.Species, everything()) >> slice_head(5)
  Species  Sepal_Length  Sepal_Width  Petal_Length  Petal_Width
0  setosa           5.1          3.5           1.4          0.2
1  setosa           4.9          3.0           1.4          0.2
2  setosa           4.7          3.2           1.3          0.2
3  setosa           4.6          3.1           1.5          0.2
4  setosa           5.0          3.6           1.4          0.2

I am the author of the package. Feel free to submit issues if you have any questions.

answered Oct 14 '22 08:10

Panwen Wang
