
Pandas: Creating new data frame from only certain columns

Tags: python, pandas, csv

I have a csv file with measurements, and I want to create a new csv file with the hourly averages and standard deviations, but only for certain columns.

Example:

csv1:

YY-MO-DD HH-MI-SS_SSS    |     Acceleration  |        Lumx     |    Pressure
2015-12-07 20:51:06:608  |        22.7       |        32.3     |     10
2015-12-07 20:51:07:609  |        22.5       |        47.7     |     15

to csv2 (only for the pressure and acceleration):

YY-MO-DD HH-MI-SS_SSS    | Acceleration avg  |   Pressure avg
2015-12-07 20:00:00:000  |        22.6       |        12.5
2015-12-07 21:00:00:000  |        ....       |        ....

Now I have an idea (thanks to the people on this site) of how to calculate the averages, but I'm having trouble creating a new, smaller dataframe that contains the calculations for only a few columns.

Thanks !!!

ValientProcess asked Apr 09 '16 14:04


2 Answers

You can make a smaller DataFrame like this:

csv2 = csv1[['Acceleration', 'Pressure']].copy()

Then you can work with csv2, which has only the columns you want. (You said you already have an idea about the average calculation.)
FYI, .copy() can be omitted if you are sure about view versus copy behavior.
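
For completeness, here is a minimal sketch of how the column selection might fit together with the hourly statistics the question asks for. The sample rows, the "timestamp" column name, and the flattened output column names are assumptions made for illustration; the question does not show the real file.

```python
import io
import pandas as pd

# Hypothetical sample data shaped like the question's csv1.
# The "timestamp" header is an assumed name, not taken from the real file.
raw = io.StringIO(
    "timestamp,Acceleration,Lumx,Pressure\n"
    "2015-12-07 20:51:06:608,22.7,32.3,10\n"
    "2015-12-07 20:51:07:609,22.5,47.7,15\n"
    "2015-12-07 21:05:00:000,23.0,40.0,20\n"
)
csv1 = pd.read_csv(raw)

# Parse the timestamps (milliseconds follow a colon) and use them as the index.
csv1["timestamp"] = pd.to_datetime(
    csv1["timestamp"], format="%Y-%m-%d %H:%M:%S:%f"
)
csv1 = csv1.set_index("timestamp")

# Keep only the columns of interest, as an independent copy.
csv2 = csv1[["Acceleration", "Pressure"]].copy()

# Group into one-hour bins and compute mean and standard deviation per column.
hourly = csv2.resample(pd.Timedelta(hours=1)).agg(["mean", "std"])

# Flatten the two-level column index into names like "Pressure mean".
hourly.columns = [f"{col} {stat}" for col, stat in hourly.columns]

# Called without a path, to_csv() returns the CSV text instead of writing a file.
print(hourly.to_csv())
```

An hour that contains a single measurement has an undefined sample standard deviation, so its std cells come out empty in the CSV text.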

su79eu7k answered Oct 25 '22 17:10


csv2 = csv1.loc[:, ['Acceleration', 'Pressure']]
  • .loc[] helps keep the subsetting operation explicit and consistent.

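  • When given a list of column labels, as here, .loc[] returns a copy, so changes to csv2 do not modify the original dataframe. (Other forms of selection, such as slices, may return a view instead.)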

(for further discussion and great examples of the different view vs. copy alternatives please see: Pandas: Knowing when an operation affects the original dataframe)
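
A quick way to check this for yourself is the small sketch below. The two-row frame is made-up data based on the question's example, and 999 is just an arbitrary marker value.

```python
import pandas as pd

# Made-up frame with the question's column names.
csv1 = pd.DataFrame(
    {
        "Acceleration": [22.7, 22.5],
        "Lumx": [32.3, 47.7],
        "Pressure": [10, 15],
    }
)

# Select two columns by passing a list of labels to .loc[].
csv2 = csv1.loc[:, ["Acceleration", "Pressure"]]

# Change a value in the subset only.
csv2.loc[0, "Pressure"] = 999

# The original frame still holds the old value.
print(csv1.loc[0, "Pressure"])
print(csv2.loc[0, "Pressure"])
```

If the original should stay untouched regardless of how the subset was selected, adding an explicit .copy() makes that intent unambiguous.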

leerssej answered Oct 25 '22 18:10