 

keep/slice specific columns in pandas

Tags: python, pandas

I know about these column slice methods:

df2 = df[["col1", "col2", "col3"]] and df2 = df.ix[:,0:2]

but I'm wondering if there is a way to slice columns from the front/middle/end of a dataframe in the same slice without specifically listing each one.

For example, a dataframe df with columns: col1, col2, col3, col4, col5 and col6.

Is there a way to do something like this?

df2 = df.ix[:, [0:2, "col5"]]

I'm in the situation where I have hundreds of columns and routinely need to slice specific ones for different requests. I've checked through the documentation and haven't seen something like this. Have I overlooked something?

asked Feb 25 '13 at 16:02 by bdiamante




2 Answers

IIUC, the simplest way I can think of would be something like this:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(np.random.randn(5, 10))
>>> df[list(df.columns[:2]) + [7]]
          0         1         7
0  0.210139  0.533249  1.780426
1  0.382136  0.083999 -0.392809
2 -0.237868  0.493646 -1.208330
3  1.242077 -0.781558  2.369851
4  1.910740 -0.643370  0.982876

The list call isn't optional here: without it, the Index object would treat + as elementwise addition and try to add 7 to each of its labels rather than append a new label.
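The same idea applies to named columns like the ones in the question. Here is a minimal sketch, assuming a made-up one-row frame with columns col1 through col6 (the data and the `wanted` variable are only for illustration):

```python
import pandas as pd

# Hypothetical frame with the six column names from the question.
df = pd.DataFrame([[1, 2, 3, 4, 5, 6]],
                  columns=[f"col{i}" for i in range(1, 7)])

# Turn the Index slice into a plain list so that + concatenates lists.
wanted = list(df.columns[:2]) + ["col5"]

df2 = df[wanted]
print(df2.columns.tolist())
```

Any further pieces (another slice, another single label) can be appended to the list the same way before indexing.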

It would be possible to special-case something like numpy's r_ so that

df[col_[:2, "col5", 3:6]]

would work, although I don't know if it would be worth the trouble.
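The col_ object above is hypothetical and does not exist in pandas. Something close can be had with numpy's real r_ helper, provided you work with integer positions and .iloc. This is a rough sketch, not a built-in pandas feature; the frame and the `positions` name are made up for the example:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the six column names from the question.
df = pd.DataFrame(np.zeros((2, 6)),
                  columns=[f"col{i}" for i in range(1, 7)])

# np.r_ flattens slices and scalars into one integer array.
# get_loc translates a column label into its integer position
# (an int as long as the label is unique).
positions = np.r_[0:2, df.columns.get_loc("col5")]

df2 = df.iloc[:, positions]
print(df2.columns.tolist())
```

Because np.r_ only understands numbers, each label has to be converted with get_loc first, which is the part a special-cased col_ would have hidden.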

answered Oct 05 '22 at 22:10 by DSM


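If your column names carry information that you can filter on, you could use df.filter(regex='^name'). Note that the regex argument is a regular expression, not a shell-style glob, so a pattern like 'name*' means "nam followed by zero or more e" and can match more than you intend. I am using this to filter between my 189 data channels from a1_01 to b3_21 and it works fine.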
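As a small illustration, assuming a handful of made-up channel names in the same style (a1_01, a1_02, b3_21) plus an unrelated time column:

```python
import pandas as pd

# Hypothetical channel columns; the real data has many more.
df = pd.DataFrame([[0.1, 0.2, 0.3, 0.4]],
                  columns=["a1_01", "a1_02", "b3_21", "time"])

# regex is matched with re.search, so anchor it with ^ to mean "starts with".
a1_channels = df.filter(regex=r"^a1_")

# like= does a plain substring match, with no regex syntax involved.
first_of_each = df.filter(like="_01")

print(a1_channels.columns.tolist())
print(first_of_each.columns.tolist())
```

Both calls filter column labels by default on a DataFrame, so no axis argument is needed here.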

answered Oct 05 '22 at 22:10 by K.-Michael Aye