
Pandas interpolate data with units

Hi Everyone,

I've been reading Stack Overflow for a couple of years, and it has helped me so much that I never had to register before :)

But today I'm stuck on a problem using Python with Pandas and Quantities (could be unum or pint as well). I'll do my best to make this post clear, but since it's my first one, I apologize if anything is confusing; I'll correct any mistakes you find :)


I want to import data from a source and build a Pandas DataFrame as follows:

import pandas as pd
import quantities as pq

depth = [0.0,1.1,2.0] * pq.m
depth2 = [0,1,1.1,1.5,2] * pq.m

s1 = pd.DataFrame(
        {'depth' : [x for x in depth]},
        index = depth)

This gives:

s1 =
     depth
0.0  0.0 m
1.1  1.1 m
2.0  2.0 m

Now I want to extend the data to the depth2 values (obviously there is no point in interpolating depth over depth, but it's a test before things get more complicated):

s2 = s1.reindex(depth2)

This gives:

s2 =
      depth
0.0   0.0 m
1.0   NaN
1.1   1.1 m
1.5   NaN
2.0   2.0 m

So far no problem.


But when I try to interpolate the missing values with:

s2['depth'].interpolate(method='values')

I get the following error:

C:\Python27\lib\site-packages\numpy\lib\function_base.pyc in interp(x, xp, fp, left, right)
   1067         return compiled_interp([x], xp, fp, left, right).item()
   1068     else:
-> 1069         return compiled_interp(x, xp, fp, left, right)
  1070 
  1071 
TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'

I understand that NumPy's interpolation does not work on object-dtype arrays.
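To make the "rule 'safe'" part of the error a bit more concrete, here is a minimal sketch. It uses plain floats stored in an object-dtype array as a stand-in for the unit-carrying column (an assumption, so that no units package is needed), and the variable names are made up for illustration:

```python
import numpy as np

# Stand-in for the reindexed column: object dtype, with NaN for missing rows.
fp_obj = np.array([0.0, np.nan, 1.1], dtype=object)

# NumPy does not treat object -> float64 as a "safe" cast, which is the
# casting rule named in the TypeError above.
safe = np.can_cast(fp_obj.dtype, np.float64, casting="safe")
print(safe)

# An explicit conversion is allowed; after it, np.interp accepts the data.
fp = fp_obj.astype(float)
known = ~np.isnan(fp)               # rows that already have a value
xp = np.array([0.0, 1.0, 1.1])      # x coordinates (the index values)
filled = np.interp(xp, xp[known], fp[known])
print(filled)
```

So the failure comes from the dtype of the column rather than from the interpolation itself.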


But if I drop the units before interpolating the missing values, it works:

s3 = s2['depth'].astype(float).interpolate(method='values')

This gives:

s3 = 
0.0   0
1.0   1
1.1   1.1
1.5   1.5
2.0   2
Name: depth, dtype: object
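Since interpolating depth over depth hides what `method='values'` actually does, here is a small sketch on plain floats with a hypothetical temperature column (the readings are made up for illustration). It compares interpolation against the index values with the default, which treats rows as equally spaced:

```python
import numpy as np
import pandas as pd

# Hypothetical temperature readings (made-up values), indexed by depth in metres.
temps = pd.Series([20.0, 14.5, 10.0], index=[0.0, 1.1, 2.0], name="temperature")

# Reindexing adds NaN rows for the depths that were not measured.
extended = temps.reindex([0.0, 1.0, 1.1, 1.5, 2.0])

# method='values' uses the index (depth) as the x coordinate.
by_depth = extended.interpolate(method="values")

# The default linear method ignores the index and assumes equal spacing.
by_position = extended.interpolate()

print(by_depth)
print(by_position)
```

With an unevenly spaced index like this one, the two results differ at the filled rows, which is why `method='values'` is the one to use here.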

How can I get the unit back in the depth column?

I can't find any trick to put the unit back...

Any help will be greatly appreciated. Thanks

Julien asked Nov 02 '22 13:11


1 Answer

Here's a way to do what you want.

Split the quantities apart and create a pair of columns (magnitude and unit) for each quantity:

In [80]: df = pd.concat([ col.apply(lambda x: pd.Series([x.item(),x.dimensionality.string],
                          index=[c,"%s_unit" % c])) for c,col in s1.items() ])

In [81]: df
Out[81]: 
     depth depth_unit
0.0    0.0          m
1.1    1.1          m
2.0    2.0          m

In [82]: df = df.reindex([0,1.0,1.1,1.5,2.0])

In [83]: df
Out[83]: 
     depth depth_unit
0.0    0.0          m
1.0    NaN        NaN
1.1    1.1          m
1.5    NaN        NaN
2.0    2.0          m

Interpolate

In [84]: df['depth'] = df['depth'].interpolate(method='values')

Propagate the units

In [85]: df['depth_unit'] = df['depth_unit'].ffill()

In [86]: df
Out[86]: 
     depth depth_unit
0.0    0.0          m
1.0    1.0          m
1.1    1.1          m
1.5    1.5          m
2.0    2.0          m
Jeff answered Nov 09 '22 14:11