How to get every nth column in pandas?

Tags: python, pandas

I have a dataframe which looks like this:

    a1    b1    c1    a2    b2    c2    a3    ...
x   1.2   1.3   1.2   ...   ...   ...   ...
y   1.4   1.2   ...   ...   ...   ...   ...
z   ...

What I want is to group every nth column. In other words, I want one dataframe with all the a columns, one with the b columns and one with the c columns:

    a1     a2     a3
x   1.2    ...    ...
y
z

In another SO question I saw that it is possible to do df.iloc[::5,:], for example, to get every 5th row. I could of course do df.iloc[:,::3] to get the a columns (positions 0, 3, 6, ...), but that doesn't work for getting b and c.

Any ideas?

asked Mar 10 '16 22:03 by Angelo


1 Answer

Slice the columns:

df[df.columns[::2]]

This gets every 2nd column; use a step of n (df.columns[::n]) to get every nth column.

Example:

In [2]:
import pandas as pd
cols = ['a1','b1','c1','a2','b2','c2','a3']
df = pd.DataFrame(columns=cols)
df

Out[2]:
Empty DataFrame
Columns: [a1, b1, c1, a2, b2, c2, a3]
Index: []

In [3]:
df[df.columns[::3]]

Out[3]:
Empty DataFrame
Columns: [a1, a2, a3]
Index: []
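
The same slice also accepts a start offset, which covers the b and c columns from the question. The following is a minimal sketch, assuming the columns are strictly interleaved in a, b, c order; the sample values and the names n, groups, a_cols, b_cols and c_cols are made up for illustration.

```python
import pandas as pd

# Small frame mirroring the column layout from the question (values are invented).
cols = ['a1', 'b1', 'c1', 'a2', 'b2', 'c2', 'a3']
df = pd.DataFrame([[1, 2, 3, 4, 5, 6, 7]], columns=cols, index=['x'])

n = 3  # number of interleaved groups (a, b, c)

# Start at offset i and step by n: i=0 -> a columns, i=1 -> b, i=2 -> c.
groups = [df.iloc[:, i::n] for i in range(n)]
a_cols, b_cols, c_cols = groups

print(list(a_cols.columns))  # ['a1', 'a2', 'a3']
print(list(b_cols.columns))  # ['b1', 'b2']
print(list(c_cols.columns))  # ['c1', 'c2']
```

This relies purely on position, so it gives wrong groups if a column is missing or out of order; filtering by name is safer in that case.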

You can also filter using startswith:

In [5]:
a = df.columns[df.columns.str.startswith('a')]
df[a]

Out[5]:
Empty DataFrame
Columns: [a1, a2, a3]
Index: []

Do the same for the b and c columns.

You can get a set of all the unique col prefixes using the following:

In [19]:
df.columns.str.extract(r'([a-zA-Z])').unique()

Out[19]:
array(['a', 'b', 'c'], dtype=object)

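You can then use these values to filter the columns using startswith. Note that on newer pandas versions str.extract returns a DataFrame by default, so pass expand=False there; unique() then returns an Index rather than an array.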
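
Putting the prefix extraction and the startswith filter together might look like the sketch below. It assumes single-letter prefixes as in the question and passes expand=False so that str.extract returns a one-dimensional result on current pandas versions; the sample values and the by_prefix dictionary are hypothetical.

```python
import pandas as pd

# Same column layout as in the question (values are invented).
cols = ['a1', 'b1', 'c1', 'a2', 'b2', 'c2', 'a3']
df = pd.DataFrame([[1, 2, 3, 4, 5, 6, 7]], columns=cols, index=['x'])

# expand=False keeps the result one-dimensional, so unique() is available.
prefixes = df.columns.str.extract(r'([a-zA-Z])', expand=False).unique()

# Hypothetical helper dict: one sub-DataFrame per prefix, selected by name.
by_prefix = {p: df.loc[:, df.columns.str.startswith(p)] for p in prefixes}

for p, sub in by_prefix.items():
    print(p, list(sub.columns))
```

by_prefix['b'] then holds only the b columns, regardless of where they sit in the frame.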

answered Sep 21 '22 06:09 by EdChum