
Shuffle all rows of a csv file with Python

I have an input csv file with data:

a   15
b   14
c   20
d   45

I want to generate a new csv file that contains all the complete rows from the input file, but in shuffled order.

For example, the output file might contain:

b 14
a 15
c 20
d 45 

I have tried this code:

import random
import sys
op=open('random.csv','w+')
ip=open(sys.argv[1],'r')
data=ip.read()
data1=str(random.choices(data))
op.write(data1)
op.close()
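The attempt above does not work because `ip.read()` returns the whole file as one string, and `random.choices(data)` draws a single random character from that string (with replacement, `k=1` by default), so the output is something like `['b']`. A minimal standard-library sketch, assuming a comma-separated file, reads the rows into a list and shuffles the list instead. `shuffle_csv_rows` is a hypothetical helper name, and `seed` is an optional parameter added here only for repeatable output.

```python
import csv
import random


def shuffle_csv_rows(in_path, out_path, seed=None):
    """Hypothetical helper: write every row of in_path to out_path in random order."""
    # Read all rows; each row is a list of fields, so rows stay intact.
    with open(in_path, newline="") as f:
        rows = list(csv.reader(f))

    # Shuffle the list of rows in place (rows, not characters of the text).
    random.Random(seed).shuffle(rows)

    # Write the shuffled rows to the output file.
    with open(out_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return rows
```

With the question's script this would be called as `shuffle_csv_rows(sys.argv[1], 'random.csv')`. Because each row is read as one unit, a line such as `a   15` that contains no commas still moves as a whole.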
asked Feb 24 '17 12:02 by Rosh Verma



2 Answers

If your CSV has a header row, you can shuffle it with pandas like this:

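import pandas as pd

df = pd.read_csv(file_name)  # keep the default header inference; avoid header=None
shuffled_df = df.sample(frac=1)  # frac=1 returns all rows in random order
shuffled_df.to_csv(new_file_name, index=False)  # index=False: don't write the row index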

This way the header row stays at the top instead of being shuffled with the data, and index=False keeps the DataFrame index out of the new CSV.
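As a small self-contained check of the header behaviour, here is the same approach run on in-memory sample data (the `name,value` columns are made up for illustration). `random_state` is an optional `DataFrame.sample` parameter that makes the shuffle repeatable; it is not needed for a one-off shuffle.

```python
import io

import pandas as pd

# Made-up sample data with a header row.
text = "name,value\na,15\nb,14\nc,20\nd,45\n"

df = pd.read_csv(io.StringIO(text))  # first line is inferred as the header
shuffled_df = df.sample(frac=1, random_state=0)  # random_state makes the order repeatable

# With no path argument, to_csv returns the CSV text as a string.
out = shuffled_df.to_csv(index=False)
print(out)
```

The header line is always written first, and only the data rows below it change order.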

answered Sep 19 '22 12:09 by a_k_v



Another shot using pandas. You can read your .csv file with:

import pandas as pd

df = pd.read_csv('yourfile.csv', header=None)  # header=None: the first line is data, not column names

and then use df.sample to shuffle the rows. It returns a random sample of the DataFrame's rows in random order; with frac=1 the sample is the whole set, so every row is kept:

In [18]: df
Out[18]: 
   0   1
0  a  15
1  b  14
2  c  20
3  d  45

In [19]: ds = df.sample(frac=1)

In [20]: ds
Out[20]: 
   0   1
1  b  14
3  d  45
0  a  15
2  c  20

If you need to save the shuffled data to a new file, you can just do:

ds.to_csv('newfile.csv', header=False, index=False)  # write only the data rows, like the input file
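For a headerless file like the one in the question, note that `to_csv` with its defaults also writes the column labels (`0`, `1`) as a header line and the original index as an extra first column. A minimal sketch, assuming comma-separated input and using in-memory text instead of real files, that writes only the data rows and then renumbers the shuffled frame:

```python
import io

import pandas as pd

# Headerless input shaped like the question's data (assumed comma-separated).
text = "a,15\nb,14\nc,20\nd,45\n"

df = pd.read_csv(io.StringIO(text), header=None)
ds = df.sample(frac=1, random_state=42)  # random_state only makes the order repeatable

# header=False and index=False write just the data rows, matching the input layout.
out = ds.to_csv(header=False, index=False)
print(out)

# If you keep working in memory, reset_index(drop=True) renumbers the rows 0..n-1.
ds = ds.reset_index(drop=True)
```

Dropping the old index is optional; it only matters if later code relies on positional labels.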
answered Sep 19 '22 12:09 by Fabio Lamanna
