I have a dataframe which looks like this:
a1 b1 c1 a2 b2 c2 a3 ...
x 1.2 1.3 1.2 ... ... ... ...
y 1.4 1.2 ... ... ... ... ...
z ...
What I want is to group by every nth column. In other words, I want one dataframe with all the a columns, one with the b columns and one with the c columns:
a1 a2 a3
x 1.2 ... ...
y
z
In another SO question I saw that it is possible to do df.iloc[::5,:], for example, to get every 5th row. I could of course do df.iloc[:,::3] to get every third column, but that only gives me one of the groups, not all three.
Any ideas?
To get the nth row of a pandas DataFrame, use the iloc indexer. For example, df.iloc[4] returns the 5th row, because positions start at 0.
To select every nth row of a DataFrame, use slicing, which works much like slicing a list or a string in Python. For every 2nd row, put the step 2 after the two colons: df.iloc[::2].
.iloc[] is primarily integer-position based (from 0 to length-1 of the axis), but may also be used with a boolean array. Allowed inputs are: an integer, e.g. 5; a list or array of integers, e.g. [4, 3, 0]; a slice object with ints, e.g. 1:7.
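The input types listed above can be sketched as follows. The single-column frame and its values are made up purely for illustration.

```python
import pandas as pd

# Hypothetical data: six rows, one column.
df = pd.DataFrame({"v": [10, 20, 30, 40, 50, 60]})

fifth_row = df.iloc[4]        # single integer: the 5th row, returned as a Series
picked = df.iloc[[4, 3, 0]]   # list of positions, kept in the given order
window = df.iloc[1:4]         # slice: positions 1, 2, 3 (end is excluded)
every_second = df.iloc[::2]   # slice with a step: positions 0, 2, 4

# Boolean array: one flag per row; converted to NumPy so iloc gets a plain array.
masked = df.iloc[(df["v"] > 30).to_numpy()]
```

The same slices work on the column axis by putting them after the comma, e.g. df.iloc[:, ::2].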
The values property returns a NumPy representation of the DataFrame. Only the values are returned; the axis labels are dropped. A DataFrame where all columns have the same type (e.g., int64) results in an array of that type.
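A small sketch of that behaviour, using two made-up frames. The second one mixes integer and float columns, in which case NumPy upcasts to a common type.

```python
import pandas as pd

# Hypothetical frames for illustration.
same = pd.DataFrame({"a": [1, 2], "b": [3, 4]})        # both columns int64
mixed = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5]})   # int64 and float64

same_arr = same.values    # 2x2 array, labels dropped, dtype int64
mixed_arr = mixed.values  # integers are upcast so the array is float64
```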
Slice the columns:
df[df.columns[::2]]
to get every nth column (here, every 2nd).
Example:
In [2]:
import pandas as pd
cols = ['a1','b1','c1','a2','b2','c2','a3']
df = pd.DataFrame(columns=cols)
df
Out[2]:
Empty DataFrame
Columns: [a1, b1, c1, a2, b2, c2, a3]
Index: []
In [3]:
df[df.columns[::3]]
Out[3]:
Empty DataFrame
Columns: [a1, a2, a3]
Index: []
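To get the b and c columns with the same slicing approach, one option is to start the slice at a different offset. This is a minimal sketch, assuming the columns repeat strictly in a, b, c order; the row values and the names n, a_df, b_df and c_df are made up for illustration.

```python
import pandas as pd

cols = ["a1", "b1", "c1", "a2", "b2", "c2", "a3"]
# Hypothetical single row of data so the frames are not empty.
df = pd.DataFrame([[1.2, 1.3, 1.2, 1.4, 1.2, 1.1, 1.0]], index=["x"], columns=cols)

n = 3  # assumed group size: the pattern a, b, c repeats every 3 columns

# Offset 0 picks the a columns, offset 1 the b columns, offset 2 the c columns.
a_df, b_df, c_df = (df.iloc[:, start::n] for start in range(n))
```

This relies only on column position, so it breaks if a column is missing or the order changes; filtering on the column names avoids that.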
You can also filter using str.startswith:
In [5]:
a = df.columns[df.columns.str.startswith('a')]
df[a]
Out[5]:
Empty DataFrame
Columns: [a1, a2, a3]
Index: []
and do the same for the b columns, the c columns, and so on.
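Repeating this for each prefix could look like the sketch below. The prefix list is hard-coded here as an assumption, and by_prefix is just an illustrative name.

```python
import pandas as pd

cols = ["a1", "b1", "c1", "a2", "b2", "c2", "a3"]
df = pd.DataFrame(columns=cols)

# One sub-frame per prefix, selected with a boolean mask over the column names.
by_prefix = {
    prefix: df.loc[:, df.columns.str.startswith(prefix)]
    for prefix in ["a", "b", "c"]  # assumed, known-in-advance prefixes
}
```

by_prefix["b"] then holds only the b columns.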
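You can get the unique column prefixes using the following. On recent pandas versions extract needs expand=False, because with the default expand=True it returns a DataFrame, which has no unique method; the result is then displayed as an Index rather than an array:
In [19]:
df.columns.str.extract(r'([a-zA-Z])', expand=False).unique()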
Out[19]:
array(['a', 'b', 'c'], dtype=object)
You can then use these values to filter the columns using startswith.
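Putting the two steps together might look like this. It is a sketch that assumes every column name starts with a single-letter prefix; frames is an illustrative name, and expand=False is passed so that extract returns an Index.

```python
import pandas as pd

cols = ["a1", "b1", "c1", "a2", "b2", "c2", "a3"]
df = pd.DataFrame(columns=cols)

# First letter of every column name, then the distinct letters in order of appearance.
prefixes = df.columns.str.extract(r"([a-zA-Z])", expand=False).unique()

# One sub-frame per discovered prefix.
frames = {p: df.loc[:, df.columns.str.startswith(p)] for p in prefixes}
```

With multi-letter prefixes, startswith("a") would also match a name such as "ab1", so the regex and the filter would need to be adapted.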