using pandas to store experimental data

Question

I am using a pandas DataFrame to store data from a series of experiments so that I can easily make cuts across various parameter values for the next stage of analysis. I have a few questions about how to do this most effectively.

Currently I create my DataFrame from a dictionary of lists. There are typically a few thousand rows in the DataFrame. One of the columns is a device_index, which indicates which of the 20 devices the experimental data pertains to. Other columns include info about the experimental setup, like temperature, power, etc., and measurement results, like resonant_frequency, bandwidth, etc.
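The "cuts" across parameter values map directly onto boolean indexing. A minimal sketch, with hypothetical column names modeled on the ones described above and made-up values (not the real frame):

```python
import pandas as pd

# Hypothetical columns and made-up values, for illustration only.
data = {
    "device_index": [0, 0, 1, 1, 2],
    "temperature": [0.1, 0.2, 0.1, 0.2, 0.1],
    "power": [-90.0, -90.0, -80.0, -80.0, -90.0],
    "resonant_frequency": [5.01e9, 5.02e9, 5.10e9, 5.11e9, 5.20e9],
}
df = pd.DataFrame(data)  # a dict of equal-length lists becomes one column per key

# A "cut": all rows at one temperature and one power, via a boolean mask.
cut = df.loc[(df["temperature"] == 0.1) & (df["power"] == -90.0)]

# The same cut written as a query string.
cut_q = df.query("temperature == 0.1 and power == -90.0")
```

Both forms keep the original row labels, so a cut can still be matched back to the full frame.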

So far, I've been using this DataFrame rather "naively," that is, I use it sort of like a numpy record array, and so I don't think I'm fully taking advantage of the power of the DataFrame. The following are some examples of what I'm trying to achieve.

First, I want to create a new column, max_freq, holding the maximum resonant_frequency measured for a given device over all experiments. I do this like so:

import numpy as np

df['max_freq'] = np.zeros(df.shape[0])  # create the new column
for index in np.unique(df.device_index):
    group = df[df.device_index == index]
    group_max = group.resonant_frequency.max()
    # .loc avoids chained assignment, which may silently fail to write to df
    df.loc[df.device_index == index, 'max_freq'] = group_max

Second, one of my columns contains 1-D numpy arrays of a noise measurement. I want to compute a statistic on each 1-D array and put it into a new column. Currently I do this as:

noise_est = []
for vals, freq in zip(df.noise, df.resonant_frequency):  # zip pairs the two columns row by row
    noise_est.append(vals.std() / (1e6 * freq))
df['noise_est'] = noise_est

Third, related to the previous one: is it possible to iterate through the rows of a DataFrame such that each row object has attribute access to the columns? I.e., something like:

for row in df:
    row.noise_est = row.noise.std()/(1e6*row.resonant_frequency)

I know that this instead iterates through columns. I also know there is an iterrows method, but this provides a Series which doesn't allow attribute access.
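A note on the iterrows point: iterrows yields (index, Series) pairs, and reading a value as row.noise generally does work on a Series when the column name is a valid identifier that does not clash with a Series attribute (a column named std or max would not). Writing is the real problem: pandas documents that iterrows may return copies, so assigning to a row is not guaranteed to modify df. A minimal sketch of attribute-style iteration with DataFrame.itertuples, using made-up data and the column names from the question's code:

```python
import numpy as np
import pandas as pd

# Made-up data using the question's column names; 'noise' holds 1-D arrays.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "resonant_frequency": [5.0e9, 5.1e9, 5.2e9],
    "noise": [rng.standard_normal(100) for _ in range(3)],
})

# itertuples yields namedtuples, so columns are readable as attributes.
# The namedtuples are immutable and not linked back to df, so collect
# results in a list and attach them as a new column at the end.
noise_est = []
for row in df.itertuples(index=False):
    noise_est.append(row.noise.std() / (1e6 * row.resonant_frequency))
df["noise_est"] = noise_est
```

itertuples renames columns that are not valid Python identifiers, are repeated, or start with an underscore to positional names, so attribute access depends on reasonably named columns.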

I think this should get me started for now, thanks for your time!

Edited to add df.info() and df.head(), as requested:

df.info()  # df.head() looks the same, but with 5 non-null values

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9620 entries, 0 to 9619
Data columns (total 83 columns):
A_mag                                       9620  non-null values
A_mag_err                                   9620  non-null values
A_phase                                     9620  non-null values
A_phase_err                                 9620  non-null values
....
total_dac_atten                             9600  non-null values
round_temp                                  9620  non-null values
dtypes: bool(1), complex128(4), float64(39), int64(12), object(27)

I trimmed this down because it has 83 columns, and I don't think the rest adds much to the code snippets above, but I've posted this part in case it's helpful.

Jeff · Accepted Answer

Create some data. Note that storing a numpy array INSIDE a frame is generally not a good idea, as it's pretty inefficient: such a column has object dtype, so operations on it run element by element in Python.

In [84]: df = pd.DataFrame(dict(A = np.random.randn(20), B = np.random.randint(0,3,size=20), C = [ np.random.randn(5) for i in range(20) ]))

In [85]: df
Out[85]: 
           A  B                                                  C
0  -0.493730  1  [-0.8790126045, -1.87366673214, 0.76227570837,...
1  -0.105616  2  [0.612075134682, -1.64452324091, 0.89799758012...
2   1.487656  1  [-0.379505426885, 1.17611806172, 0.88321152932...
3   0.351694  2  [0.132071242514, -1.54701609348, 1.29813626801...
4  -0.330538  2  [0.395383858214, 0.874419943107, 1.21124463921...
5   0.360041  0  [0.439133138619, -1.98615530266, 0.55971723554...
6  -0.505198  2  [-0.770830608002, 0.243255072359, -1.099514797...
7   0.631488  1  [0.676233200011, 0.622926691271, -0.1110029751...
8   1.292087  1  [1.77633938532, -0.141683361957, 0.46972952154...
9   0.641987  0  [1.24802709304, 0.477527098462, -0.08751885691...
10  0.732596  2  [0.475771915314, 1.24219702097, -0.54304296895...
11  0.987054  1  [-0.879620967644, 0.657193159735, -0.093519342...
12 -1.409455  1  [1.04404325784, -0.310849157425, 0.60610368623...
13  1.063830  1  [-0.760467872808, 1.33659372288, -0.9343171844...
14  0.533835  1  [0.985463451645, 1.76471927635, -0.59160181340...
15  0.062441  1  [-0.340170594584, 1.53196133354, 0.42397775978...
16  1.458491  2  [-1.79810090668, -1.82865815817, 1.08140831482...
17 -0.886119  2  [0.281341969073, -1.3516126536, 0.775326038501...
18  0.662076  1  [1.03992509625, 1.17661862104, -0.562683934951...
19  1.216878  2  [0.0746149754367, 0.156470450639, -0.477269150...

In [86]: df.dtypes
Out[86]: 
A    float64
B      int64
C     object
dtype: object
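The object dtype of column C is what makes such a column slow. If all the arrays have the same length (an assumption; the question doesn't say), one alternative is to stack them into a single 2-D NumPy array aligned with the rows, so the per-row statistic becomes one vectorized call. A standalone sketch with made-up data shaped like the example above (separate from the session):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.standard_normal(20),
                   "B": rng.integers(0, 3, size=20),
                   "C": [rng.standard_normal(5) for _ in range(20)]})

# Stack the object column of equal-length arrays into one (20, 5) float array.
C2d = np.stack(df["C"].tolist())

# One vectorized std per row (ddof=0, like np.std) instead of a Python call per element.
df["C_std"] = C2d.std(axis=1)
```

Row i of C2d corresponds to row i of df, so the array can be kept alongside the frame (or the frame can hold only scalar summaries) rather than embedding arrays in cells.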

Apply an operation to each value of a Series (questions 2 and 3):

In [88]: df['C_std'] = df['C'].apply(np.std)
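Applied to the question's own statistic, the same idea combines apply with ordinary column arithmetic. A standalone sketch with made-up data (separate from the session), assuming the question's column names noise and resonant_frequency:

```python
import numpy as np
import pandas as pd

# Made-up rows using the question's column names ('noise' holds 1-D arrays).
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "resonant_frequency": [5.0e9, 5.1e9, 5.2e9, 5.3e9],
    "noise": [rng.standard_normal(50) for _ in range(4)],
})

# Per-row std of each noise array, then an index-aligned, vectorized division
# by the frequency column: the question's vals.std() / (1e6 * freq).
df["noise_est"] = df["noise"].apply(np.std) / (1e6 * df["resonant_frequency"])
```

This replaces the explicit zip loop; only the std of each array runs per element, and the division is a single column operation.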

Get the max of each group and broadcast it back to every row of that group (question 1):

In [91]: df['A_max_by_group'] = df.groupby('B')['A'].transform(lambda x: x.max())

In [92]: df
Out[92]: 
           A  B                                                  C  A_max_by_group     C_std
0  -0.493730  1  [-0.8790126045, -1.87366673214, 0.76227570837,...        1.487656  1.058323
1  -0.105616  2  [0.612075134682, -1.64452324091, 0.89799758012...        1.458491  0.987980
2   1.487656  1  [-0.379505426885, 1.17611806172, 0.88321152932...        1.487656  1.264522
3   0.351694  2  [0.132071242514, -1.54701609348, 1.29813626801...        1.458491  1.150026
4  -0.330538  2  [0.395383858214, 0.874419943107, 1.21124463921...        1.458491  1.045408
5   0.360041  0  [0.439133138619, -1.98615530266, 0.55971723554...        0.641987  1.355853
6  -0.505198  2  [-0.770830608002, 0.243255072359, -1.099514797...        1.458491  0.443872
7   0.631488  1  [0.676233200011, 0.622926691271, -0.1110029751...        1.487656  0.432342
8   1.292087  1  [1.77633938532, -0.141683361957, 0.46972952154...        1.487656  1.021847
9   0.641987  0  [1.24802709304, 0.477527098462, -0.08751885691...        0.641987  0.676835
10  0.732596  2  [0.475771915314, 1.24219702097, -0.54304296895...        1.458491  0.857441
11  0.987054  1  [-0.879620967644, 0.657193159735, -0.093519342...        1.487656  0.628655
12 -1.409455  1  [1.04404325784, -0.310849157425, 0.60610368623...        1.487656  0.835633
13  1.063830  1  [-0.760467872808, 1.33659372288, -0.9343171844...        1.487656  0.936746
14  0.533835  1  [0.985463451645, 1.76471927635, -0.59160181340...        1.487656  0.991327
15  0.062441  1  [-0.340170594584, 1.53196133354, 0.42397775978...        1.487656  0.700299
16  1.458491  2  [-1.79810090668, -1.82865815817, 1.08140831482...        1.458491  1.649771
17 -0.886119  2  [0.281341969073, -1.3516126536, 0.775326038501...        1.458491  0.910355
18  0.662076  1  [1.03992509625, 1.17661862104, -0.562683934951...        1.487656  0.666237
19  1.216878  2  [0.0746149754367, 0.156470450639, -0.477269150...        1.458491  0.275065
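A standalone sketch of question 1 with the question's own column names (device_index, resonant_frequency) and made-up values. Passing the string "max" to transform uses pandas' built-in grouped max instead of calling a Python lambda once per group, which is usually faster; it gives the same values as the lambda form and replaces the question's loop with one line:

```python
import pandas as pd

# Made-up measurements using the question's column names.
df = pd.DataFrame({
    "device_index": [0, 0, 1, 1, 1, 2],
    "resonant_frequency": [5.01, 5.03, 5.10, 5.12, 5.11, 5.20],
})

# transform returns a result aligned to the original rows, so each row
# receives the maximum of its own device's group.
df["max_freq"] = df.groupby("device_index")["resonant_frequency"].transform("max")
```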

Tags:

python

pandas

user3658134
