 

Bin values based on ranges with pandas [duplicate]


I have multiple CSV files with values like this in a folder:

GroupID.csv is the file name. There are multiple files like this, but the value ranges are defined in the same XML file. I'm trying to group the values into those ranges. How can I do that?

UPDATE 1: Based on BobHaffner's comments, I've done this:

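```python
import glob
import os

import pandas as pd

path = r'path/to/files'
allFiles = glob.glob(path + "/*.csv")
list_ = []
for file_ in allFiles:
    # header=None: the files have no header row, so columns are labelled 0 and 1
    df = pd.read_csv(file_, index_col=None, header=None)
    # glob already returns the full path; keep only the file name
    df['file'] = os.path.basename(file_)
    list_.append(df)
# ignore_index=True renumbers the rows 0..n-1 instead of repeating each file's index
frame = pd.concat(list_, ignore_index=True)
print(frame)
```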

to get something like this:

I need to group the values based on the bins from the XML file. I'd truly appreciate any help.

pam asked Jul 31 '15 at 01:07





1 Answer

To bucket your series, use the pd.cut() function:

```python
# header=None gives integer column labels, so the values are in column 1
df['bin'] = pd.cut(df[1], [0, 50, 100, 200])
```

```
         0    1        file         bin
0  person1   24     age.csv     (0, 50]
1  person2   17     age.csv     (0, 50]
2  person3   98     age.csv   (50, 100]
3  person4    6     age.csv     (0, 50]
4  person2  166  Height.csv  (100, 200]
5  person3  125  Height.csv  (100, 200]
6  person5  172  Height.csv  (100, 200]
```
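By default the intervals are closed on the right, so a value exactly on an edge (50) lands in the lower bin, a value of 0 falls outside every bin, and anything outside the edges becomes NaN. The sketch below uses made-up sample values (not from the question) to show how the `include_lowest` and `right` arguments of `pd.cut` change that.

```python
import pandas as pd

# Made-up sample values: the lower edge (0), a boundary value (50),
# and one value (250) outside all the bins.
values = pd.Series([0, 24, 50, 51, 250])
edges = [0, 50, 100, 200]

# Default: right-closed intervals (0, 50], (50, 100], ... so 0 is NaN.
default_bins = pd.cut(values, edges)

# include_lowest=True also puts the first edge (0) into the first bin.
inclusive_bins = pd.cut(values, edges, include_lowest=True)

# right=False switches to left-closed intervals [0, 50), [50, 100), ...
left_closed = pd.cut(values, edges, right=False)

print(pd.DataFrame({
    "value": values,
    "default": default_bins,
    "include_lowest": inclusive_bins,
    "right=False": left_closed,
}))
```

Values that end up as NaN are simply outside the given edges; widen the first or last edge if they should be counted.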

If you want to name the bins yourself, you can use the labels= argument, like this:

```python
df['bin'] = pd.cut(df[1], [0, 50, 100, 200], labels=['0-50', '50-100', '100-200'])
```

```
         0    1        file      bin
0  person1   24     age.csv     0-50
1  person2   17     age.csv     0-50
2  person3   98     age.csv   50-100
3  person4    6     age.csv     0-50
4  person2  166  Height.csv  100-200
5  person3  125  Height.csv  100-200
6  person5  172  Height.csv  100-200
```
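The question says the ranges come from an XML file, and ages and heights may need different ranges. The sketch below assumes a hypothetical XML layout (the real file isn't shown): the `<group file=...>`/`<bin min=... max=...>` names, the `read_edges`/`assign_bins` helpers and the Height.csv ranges are all made up for illustration. It parses the edges with `xml.etree.ElementTree`, applies `pd.cut` separately to each file's rows, then counts rows per bin with `groupby`.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# HYPOTHETICAL XML layout: element and attribute names are assumptions,
# and the ranges for each file are assumed to be contiguous.
XML_TEXT = """
<bins>
  <group file="age.csv">
    <bin min="0" max="50"/>
    <bin min="50" max="100"/>
  </group>
  <group file="Height.csv">
    <bin min="0" max="150"/>
    <bin min="150" max="200"/>
  </group>
</bins>
"""


def read_edges(xml_text):
    """Return {file name: sorted bin edges} parsed from the hypothetical XML."""
    root = ET.fromstring(xml_text)
    edges = {}
    for group in root.iter("group"):
        points = set()
        for b in group.iter("bin"):
            points.add(float(b.get("min")))
            points.add(float(b.get("max")))
        edges[group.get("file")] = sorted(points)
    return edges


def assign_bins(frame, edges):
    """Cut column 1 of each file's rows using that file's own edges.

    Assumes frame has a unique index (e.g. built with ignore_index=True).
    """
    pieces = []
    for name, rows in frame.groupby("file"):
        e = edges[name]
        # Labels such as '0-50', in the same style as the labels= example
        names = [f"{lo:g}-{hi:g}" for lo, hi in zip(e[:-1], e[1:])]
        pieces.append(pd.cut(rows[1], e, labels=names).astype(object))
    # Put the rows back in their original order
    return pd.concat(pieces).reindex(frame.index)


# Same rows as the combined frame shown in the output above
frame = pd.DataFrame({
    0: ["person1", "person2", "person3", "person4",
        "person2", "person3", "person5"],
    1: [24, 17, 98, 6, 166, 125, 172],
    "file": ["age.csv"] * 4 + ["Height.csv"] * 3,
})

frame["bin"] = assign_bins(frame, read_edges(XML_TEXT))
# Group the values: number of rows in each (file, bin) pair
counts = frame.groupby(["file", "bin"]).size()
print(counts)
```

If every file uses the same ranges, a single pd.cut call on the whole column, as in the answer, is enough; the per-file loop only matters when the ranges differ between files.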
firelynx answered Sep 27 '22 at 22:09
