I am using a pandas DataFrame to store data from a series of experiments so that I can easily make cuts across various parameter values for the next stage of analysis. I have a few questions about how to do this most effectively.
Currently I create my DataFrame from a dictionary of lists. There is typically a few thousand rows in the DataFrame. One of the columns is a device_id which indicates which of the 20 devices that the experimental data pertains to. Other columns include info about the experimental setup, like temperature, power, etc. and measurement results, like resonant_frequency, bandwidth, etc.
So far, I've been using this DataFrame rather "naively," that is, I use it sort of like a numpy record array, and so I don't think I'm fully taking advantage of the power of the DataFrame. The following are some examples of what I'm trying to achieve.
First I want to create a new column which is the maximum resonant_frequency measured for a given device over all experiments: call it max_freq. I do this like so:
df['max_freq'] = np.zeros((data.shape[0])) # create the new column
for index in np.unique(df.device_index):
group = df[df.device_index == index]
max = group.resonant_frequency.max()
df.max_freq[df.resonator_index == index] = max
Second One of my columns contains 1-D numpy arrays of a noise measurement. I want to compute a statistic on this 1-D array and put it into a new column. Currently I do this as:
noise_est = []
for vals,freq in (df.noise,df.resonant_freq):
noise_est.append(vals.std()/(1e6*freq))
df['noise_est'] = noise_est
Third Related the the previous one: Is it possible to iterate through rows of a DataFrame where the resulting object has attribute access to the columns? I.e. something like:
for row in df:
row.noise_est = row.noise.std()/(1e6*row.resonant_freq)
I know that this instead iterates through columns. I also know there is an iterrows method, but this provides a Series which doesn't allow attribute access.
I think this should get me started for now, thanks for your time!
edited to add df.info(), df.head() as requested:
df.info() # df.head() looks the same, but 5 non-null values
<class 'pandas.core.frame.DataFrame'>
Int64Index: 9620 entries, 0 to 9619
Data columns (total 83 columns):
A_mag 9620 non-null values
A_mag_err 9620 non-null values
A_phase 9620 non-null values
A_phase_err 9620 non-null values
....
total_dac_atten 9600 non-null values
round_temp 9620 non-null values
dtypes: bool(1), complex128(4), float64(39), int64(12), object(27)
I trimmed this down because it's 83 columns, and I don't think this adds much to the example code snippets I shared, but have posted this bit in case it's helpful.
Create data. Note that storing a numpy array INSIDE a frame is generally not a good idea as its pretty inefficient.
In [84]: df = pd.DataFrame(dict(A = np.random.randn(20), B = np.random.randint(0,3,size=20), C = [ np.random.randn(5) for i in range(20) ]))
In [85]: df
Out[85]:
A B C
0 -0.493730 1 [-0.8790126045, -1.87366673214, 0.76227570837,...
1 -0.105616 2 [0.612075134682, -1.64452324091, 0.89799758012...
2 1.487656 1 [-0.379505426885, 1.17611806172, 0.88321152932...
3 0.351694 2 [0.132071242514, -1.54701609348, 1.29813626801...
4 -0.330538 2 [0.395383858214, 0.874419943107, 1.21124463921...
5 0.360041 0 [0.439133138619, -1.98615530266, 0.55971723554...
6 -0.505198 2 [-0.770830608002, 0.243255072359, -1.099514797...
7 0.631488 1 [0.676233200011, 0.622926691271, -0.1110029751...
8 1.292087 1 [1.77633938532, -0.141683361957, 0.46972952154...
9 0.641987 0 [1.24802709304, 0.477527098462, -0.08751885691...
10 0.732596 2 [0.475771915314, 1.24219702097, -0.54304296895...
11 0.987054 1 [-0.879620967644, 0.657193159735, -0.093519342...
12 -1.409455 1 [1.04404325784, -0.310849157425, 0.60610368623...
13 1.063830 1 [-0.760467872808, 1.33659372288, -0.9343171844...
14 0.533835 1 [0.985463451645, 1.76471927635, -0.59160181340...
15 0.062441 1 [-0.340170594584, 1.53196133354, 0.42397775978...
16 1.458491 2 [-1.79810090668, -1.82865815817, 1.08140831482...
17 -0.886119 2 [0.281341969073, -1.3516126536, 0.775326038501...
18 0.662076 1 [1.03992509625, 1.17661862104, -0.562683934951...
19 1.216878 2 [0.0746149754367, 0.156470450639, -0.477269150...
In [86]: df.dtypes
Out[86]:
A float64
B int64
C object
dtype: object
Apply an operation to the value of a series (2 and 3)
In [88]: df['C_std'] = df['C'].apply(np.std)
Get the max of each group and return the value (1)
In [91]: df['A_max_by_group'] = df.groupby('B')['A'].transform(lambda x: x.max())
In [92]: df
Out[92]:
A B C A_max_by_group C_std
0 -0.493730 1 [-0.8790126045, -1.87366673214, 0.76227570837,... 1.487656 1.058323
1 -0.105616 2 [0.612075134682, -1.64452324091, 0.89799758012... 1.458491 0.987980
2 1.487656 1 [-0.379505426885, 1.17611806172, 0.88321152932... 1.487656 1.264522
3 0.351694 2 [0.132071242514, -1.54701609348, 1.29813626801... 1.458491 1.150026
4 -0.330538 2 [0.395383858214, 0.874419943107, 1.21124463921... 1.458491 1.045408
5 0.360041 0 [0.439133138619, -1.98615530266, 0.55971723554... 0.641987 1.355853
6 -0.505198 2 [-0.770830608002, 0.243255072359, -1.099514797... 1.458491 0.443872
7 0.631488 1 [0.676233200011, 0.622926691271, -0.1110029751... 1.487656 0.432342
8 1.292087 1 [1.77633938532, -0.141683361957, 0.46972952154... 1.487656 1.021847
9 0.641987 0 [1.24802709304, 0.477527098462, -0.08751885691... 0.641987 0.676835
10 0.732596 2 [0.475771915314, 1.24219702097, -0.54304296895... 1.458491 0.857441
11 0.987054 1 [-0.879620967644, 0.657193159735, -0.093519342... 1.487656 0.628655
12 -1.409455 1 [1.04404325784, -0.310849157425, 0.60610368623... 1.487656 0.835633
13 1.063830 1 [-0.760467872808, 1.33659372288, -0.9343171844... 1.487656 0.936746
14 0.533835 1 [0.985463451645, 1.76471927635, -0.59160181340... 1.487656 0.991327
15 0.062441 1 [-0.340170594584, 1.53196133354, 0.42397775978... 1.487656 0.700299
16 1.458491 2 [-1.79810090668, -1.82865815817, 1.08140831482... 1.458491 1.649771
17 -0.886119 2 [0.281341969073, -1.3516126536, 0.775326038501... 1.458491 0.910355
18 0.662076 1 [1.03992509625, 1.17661862104, -0.562683934951... 1.487656 0.666237
19 1.216878 2 [0.0746149754367, 0.156470450639, -0.477269150... 1.458491 0.275065
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With