If I have an R data.frame df and colnames(df) returns
[1] "a" "b" "c" "d" "e"
I can select columns "a", "c", "d" and "e" as follows:
df[ , c(1, 3:5)]
Is there a simple equivalent in pandas? I know I can use
df.loc[:, ['a', 'c', 'd', 'e']]
and this is fine for a few columns.
For many sequences of columns, the R code is still straightforward:
df2[ , c(1:10, 25:30, 40, 50:100)]
UPDATE: There is no need to use numpy.hstack; you can just call numpy.r_, as below.
Use iloc + numpy.r_:
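In [17]: from pandas import DataFrame
In [18]: from numpy import r_
In [19]: from numpy.random import randn
In [20]: df = DataFrame(randn(10, 3), columns=list('abc'))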
In [21]: df
Out[21]:
a b c
0 0.228163 -1.311485 -1.335604
1 0.292547 -1.636901 0.001765
2 0.744605 -0.325580 0.205003
3 -0.580471 -0.531553 -0.740697
4 0.250574 1.076019 -0.594915
5 -0.148449 0.076951 -0.653595
6 -1.065314 -0.166018 -1.471532
7 1.133336 -0.529738 -1.213841
8 -1.715281 -2.058831 0.113237
9 -0.382412 -0.072540 0.294853
[10 rows x 3 columns]
In [22]: df.iloc[:, r_[:2]]
Out[22]:
a b
0 0.228163 -1.311485
1 0.292547 -1.636901
2 0.744605 -0.325580
3 -0.580471 -0.531553
4 0.250574 1.076019
5 -0.148449 0.076951
6 -1.065314 -0.166018
7 1.133336 -0.529738
8 -1.715281 -2.058831
9 -0.382412 -0.072540
[10 rows x 2 columns]
To concatenate integer ranges, use numpy.r_:
In [35]: df = DataFrame(randn(10, 6), columns=list('abcdef'))
In [36]: df.iloc[:, r_[:2, 2:df.columns.size:2]]
Out[36]:
a b c e
0 -1.358623 -0.622909 0.025609 -1.166303
1 0.527027 0.310530 2.892384 0.190451
2 -0.251138 -1.246113 0.738264 0.062078
3 -1.716028 0.419139 0.060225 -1.191527
4 -1.308635 0.045396 -0.599367 -0.202491
5 -0.620343 0.796364 -0.008802 0.160020
6 0.199739 0.111816 -0.278119 1.051317
7 -0.311206 0.090348 -0.237887 0.958215
8 0.363161 2.449031 1.023352 0.743853
9 0.039451 -0.855733 -0.836921 -0.835078
[10 rows x 4 columns]
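Applied to the longer R selection from the question, the same approach might look like the sketch below. The 100-column frame and its column names (c1 to c100) are made up for illustration. The main thing to watch is the indexing convention: R ranges are 1-based with inclusive ends, while numpy.r_ slices are 0-based with exclusive ends, so each start shifts down by one and each end stays the same.

```python
import numpy as np
import pandas as pd

# Hypothetical 100-column frame standing in for df2 from the question.
df2 = pd.DataFrame(np.zeros((3, 100)),
                   columns=[f"c{i}" for i in range(1, 101)])

# R:      df2[ , c(1:10, 25:30, 40, 50:100)]   (1-based, ends inclusive)
# pandas: subtract 1 from each start; keep each end, because Python
#         slices exclude it. A single column (40) becomes position 39.
cols = np.r_[0:10, 24:30, 39, 49:100]
subset = df2.iloc[:, cols]
```

Here numpy.r_ expands the slices and the scalar into one flat integer array, which iloc accepts as a list of column positions.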