
Randomly selecting columns from dataframe

Tags: python, pandas

My question is quite simple: is there a way to randomly choose columns from a DataFrame in pandas? To be clear, I want to pick n columns at random, together with their values. I know there is a method for randomly picking rows:

import pandas as pd

df = pd.read_csv(filename, sep=',', nrows=None)
a = df.sample(n = 2)

So the question is: is there an equivalent method for selecting random columns?

asked Aug 08 '17 12:08 by ewolsen




2 Answers

sample also accepts an axis parameter:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(1, 10, (10, 5)), columns=list('abcde'))

df
Out: 
   a  b  c  d  e
0  4  5  9  8  3
1  7  2  2  8  7
2  1  5  7  9  2
3  3  3  5  2  4
4  8  4  9  8  6
5  6  5  7  3  4
6  6  3  6  4  4
7  9  4  7  7  3
8  4  4  8  7  6
9  5  6  7  6  9

df.sample(2, axis=1)
Out: 
   a  d
0  4  8
1  7  8
2  1  9
3  3  2
4  8  8
5  6  3
6  6  4
7  9  7
8  4  7
9  5  6
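
As a minimal sketch of the same idea with a reproducible draw: the seed values and the variable names picked and picked_frac are assumptions for illustration, not part of the answer above. random_state and frac are standard sample parameters and apply along columns the same way they do along rows.

```python
import numpy as np
import pandas as pd

# Illustrative frame; the seed is an assumption, used only so reruns match.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 10, size=(10, 5)), columns=list("abcde"))

# axis=1 (or axis="columns") samples columns instead of rows.
# random_state fixes the draw so repeated calls pick the same columns.
picked = df.sample(n=2, axis=1, random_state=42)

# frac also works along columns: 40% of 5 columns is 2 columns.
picked_frac = df.sample(frac=0.4, axis="columns", random_state=42)

print(picked.columns.tolist())
print(picked_frac.shape)
```

Leaving out random_state gives a different pair of columns on each call, which is usually what you want outside of tests.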
answered Sep 26 '22 07:09 by ayhan


You can just do df.columns.to_series().sample(n=2)

To randomly sample the columns, first convert the column index to a Series by calling to_series(), then call sample as before:

In[24]:
df.columns.to_series().sample(2)

Out[24]: 
C    C
A    A
dtype: object

Example:

In[30]:
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
df

Out[30]: 
          a         b         c
0 -0.691534  0.889799  1.137438
1 -0.949422  0.799294  1.360521
2  0.974746 -1.231078  0.812712
3  1.043434  0.982587  0.352927
4  0.462011 -0.591438 -0.214508

In[31]:
df[df.columns.to_series().sample(2)]

Out[31]: 
          b         a
0  0.889799 -0.691534
1  0.799294 -0.949422
2 -1.231078  0.974746
3  0.982587  1.043434
4 -0.591438  0.462011
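
Note that the sampled columns come back in the order they were drawn (b before a above). A hedged sketch of how one might keep the original column order instead: the names chosen, subset and ordered, and the random_state value, are assumptions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 3), columns=list("abc"))

# Draw two distinct column labels; random_state is optional and only
# makes the draw repeatable.
chosen = df.columns.to_series().sample(2, random_state=1)

# Indexing with the labels returns the columns in the sampled order.
subset = df[chosen.tolist()]

# A boolean mask over df.columns keeps the frame's original column order.
ordered = df.loc[:, df.columns.isin(chosen)]

print(subset.columns.tolist())
print(ordered.columns.tolist())
```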
answered Sep 22 '22 07:09 by EdChum