In R I frequently use dplyr
's select
in combination with everything()
df %>% select(var4, var17, everything())
The above for example would reorder the columns of the dataframe, such that var4
is the first, var17
is the second and subsequently all remaining columns are listed. What is the most pandathonic way of doing this? Working with many columns makes explicitly spelling them out a pain as well as keeping track of their position.
The ideal solution is short, readable and can be used in pandas chaining.
Dplython. Package dplython is dplyr for Python users. It provide infinite functionality for data preprocessing.
Both Pandas and dplyr can connect to virtually any data source, and read from any file format. That's why we won't spend any time exploring connection options but will use a build-in dataset instead. There's no winner in this Pandas vs. dplyr comparison, as both libraries are near identical with the syntax.
Pandas DataFrame all() Method The all() method returns one value for each column, True if ALL values in that column are True, otherwise False. By specifying the column axis ( axis='columns' ), the all() method returns True if ALL values in that axis are True.
One of the great things about the R world has been a collection of R packages called tidyverse that are easy for beginners to learn and provide a consistent data manipulation and visualisation space. The value of these tools has been so great that many of them have been ported to Python.
Use Index.difference
for all values without specified in list and join together:
df = pd.DataFrame({
'G':list('abcdef'),
'var17':[4,5,4,5,5,4],
'A':[7,8,9,4,2,3],
'var4':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')
})
cols = ['var4','var17']
another = df.columns.difference(cols, sort=False).tolist()
df = df[cols + another]
print (df)
var4 var17 G A E F
0 1 4 a 7 5 a
1 3 5 b 8 3 a
2 5 4 c 9 6 a
3 7 5 d 4 9 b
4 1 5 e 2 2 b
5 0 4 f 3 4 b
EDIT: For chaining is possible use DataFrame.pipe
with passed DataFrame
:
def everything_after(df, cols):
another = df.columns.difference(cols, sort=False).tolist()
return df[cols + another]
df = df.pipe(everything_after, ['var4','var17']))
print (df)
var4 var17 G A E F
0 1 4 a 7 5 a
1 3 5 b 8 3 a
2 5 4 c 9 6 a
3 7 5 d 4 9 b
4 1 5 e 2 2 b
5 0 4 f 3 4 b
Now how smoothly you can do it with datar
!
>>> from datar import f
>>> from datar.datasets import iris
>>> from datar.dplyr import select, everything, slice_head
>>> iris >> slice_head(5)
Sepal_Length Sepal_Width Petal_Length Petal_Width Species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa
>>> iris >> select(f.Species, everything()) >> slice_head(5)
Species Sepal_Length Sepal_Width Petal_Length Petal_Width
0 setosa 5.1 3.5 1.4 0.2
1 setosa 4.9 3.0 1.4 0.2
2 setosa 4.7 3.2 1.3 0.2
3 setosa 4.6 3.1 1.5 0.2
4 setosa 5.0 3.6 1.4 0.2
I am the author of the package. Feel free to submit issues if you have any questions.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With