Resampling non-time-series data

Tags:

pandas

I have some data which I'm handling with pandas dataframes. Each dataframe contains about 10 000 rows and 6 columns.

The problem is that I have run several trials, and the different datasets have slightly different index values. (It's a force-length test on several materials, and of course the measurement points are not perfectly aligned.)

Now my idea was to "resample" the data using the index, which contains the length values. It seems that the resampling function in pandas is only available for datetime data types.

I tried converting the index via to_datetime and succeeded. But after resampling, I need to get back to the original scale, with some kind of from_datetime function.

Is there a way to do this, or am I on completely the wrong track and should I use functions like groupby instead?

Edit to add:

The data looks like the table below. Length is used as the index. I have several of these dataframes, so it would be really nice to align them all to the same "framerate" and then trim them, e.g. so that I can compare different datasets.

The idea I already tried was this one:

    import pandas as pd

    df_1_dt = df_1.copy()  # copy for the conversion, so df_1 keeps its length index
    df_1_dt.index = pd.to_datetime(df_1_dt.index, unit='s')  # convert it, simulating seconds.. good idea?!
    df_1_dt_rs = df_1_dt.resample(rule='s').mean()  # resample by the generated time, averaging each bin

Data:

+---------------------------------------------------+  
¦  Index (Length)   ¦    Force1     ¦    Force2     ¦  
¦-------------------+---------------+---------------¦  
¦ 8.04662074828e-06 ¦ 4.74251270294 ¦ 4.72051584721 ¦  
¦ 8.0898882798e-06  ¦ 4.72051584721 ¦ 4.72161570191 ¦  
¦ 1.61797765596e-05 ¦ 4.69851899147 ¦ 4.72271555662 ¦  
¦ 1.65476570973e-05 ¦ 4.65452528    ¦ 4.72491526604 ¦  
¦ 2.41398605024e-05 ¦ 4.67945501539 ¦ 4.72589291467 ¦  
¦ 2.42696630876e-05 ¦ 4.70438475079 ¦ 4.7268705633  ¦  
¦ 9.60953101751e-05 ¦ 4.72931448619 ¦ 4.72784821192 ¦  
¦ 0.00507703541206  ¦ 4.80410369237 ¦ 4.73078115781 ¦  
¦ 0.00513927175509  ¦ 4.87889289856 ¦ 4.7337141037  ¦  
¦ 0.00868965311878  ¦ 4.9349848032  ¦ 4.74251282215 ¦  
¦ 0.00902026197556  ¦ 4.99107670784 ¦ 4.7513115406  ¦  
¦ 0.00929150878827  ¦ 5.10326051712 ¦ 4.76890897751 ¦  
¦ 0.0291729332784   ¦ 5.14945375919 ¦ 4.78650641441 ¦  
¦ 0.0296332588857   ¦ 5.17255038023 ¦ 4.79530513287 ¦  
¦ 0.0297080942518   ¦ 5.19564700127 ¦ 4.80410385132 ¦  
¦ 0.0362595526707   ¦ 5.2187436223  ¦ 4.80850321054 ¦  
¦ 0.0370305483177   ¦ 5.24184024334 ¦ 4.81290256977 ¦  
¦ 0.0381506204153   ¦ 5.28803348541 ¦ 4.82170128822 ¦  
¦ 0.0444440795306   ¦ 5.30783069134 ¦ 4.83050000668 ¦  
¦ 0.0450121369102   ¦ 5.3177292943  ¦ 4.8348993659  ¦  
¦ 0.0453465140473   ¦ 5.32762789726 ¦ 4.83929872513 ¦  
¦ 0.0515533437013   ¦ 5.33752650023 ¦ 4.85359662771 ¦  
¦ 0.05262489708     ¦ 5.34742510319 ¦ 4.8678945303  ¦  
¦ 0.0541273847206   ¦ 5.36722230911 ¦ 4.89649033546 ¦  
¦ 0.0600755845953   ¦ 5.37822067738 ¦ 4.92508614063 ¦  
¦ 0.0607712385295   ¦ 5.38371986151 ¦ 4.93938404322 ¦  
¦ 0.0612954159368   ¦ 5.38921904564 ¦ 4.9536819458  ¦  
¦ 0.0670288249293   ¦ 5.39471822977 ¦ 4.97457891703 ¦  
¦ 0.0683640870058   ¦ 5.4002174139  ¦ 4.99547588825 ¦  
¦ 0.0703192637772   ¦ 5.41121578217 ¦ 5.0372698307  ¦  
¦ 0.0757871634772   ¦ 5.43981158733 ¦ 5.07906377316 ¦  
¦ 0.0766597757545   ¦ 5.45410948992 ¦ 5.09996074438 ¦  
¦ 0.077317850103    ¦ 5.4684073925  ¦ 5.12085771561 ¦  
¦ 0.0825991083545   ¦ 5.48270529509 ¦ 5.13295596838 ¦  
¦ 0.0841354654428   ¦ 5.49700319767 ¦ 5.14505422115 ¦  
¦ 0.0865525182528   ¦ 5.52559900284 ¦ 5.1692507267  ¦  
+---------------------------------------------------+


asked May 31 '15 17:05

dr_basst

1 Answer

It sounds like all you want to do is round the length figures to a lower precision.

If this is the case, you could just use the built-in rounding function:

(dummy data)

>>> import pandas as pd
>>> df = pd.DataFrame([[1.0000005, 4], [1.232463632, 5], [5.234652, 9], [5.675322, 10]], columns=['length', 'force'])
>>> df
     length  force
0  1.000001      4
1  1.232464      5
2  5.234652      9
3  5.675322     10
>>> df['rounded_length'] = df.length.apply(round, ndigits=0)
>>> df
     length  force  rounded_length
0  1.000001      4             1.0
1  1.232464      5             1.0
2  5.234652      9             5.0
3  5.675322     10             6.0

Then you could replicate the resample() workflow using groupby:

>>> df.groupby('rounded_length').mean().force
rounded_length
1.0     4.5
5.0     9.0
6.0    10.0
Name: force, dtype: float64
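Rounding to whole numbers is only one choice of bin width. As a rough sketch, the same idea should work for any grid spacing by snapping each length to the nearest multiple of a step before grouping. The helper name bin_by_step, the step value, and the dummy data below are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd


def bin_by_step(df, step):
    """Average all rows whose index snaps to the same multiple of `step`.

    Hypothetical helper (not a pandas function): each index value is
    rounded to the nearest multiple of `step`, then rows that share a
    rounded value are averaged with groupby.
    """
    snapped = np.round(df.index.to_numpy() / step) * step
    out = df.groupby(snapped).mean()
    out.index.name = df.index.name
    return out


# Dummy data, with length used as the index
df = pd.DataFrame(
    {"force": [4.0, 5.0, 9.0, 10.0]},
    index=pd.Index([0.001, 0.004, 0.011, 0.019], name="length"),
)

# The first two rows snap to 0.0 and are averaged; the others stay alone
binned = bin_by_step(df, step=0.01)
```

Applying the same step to every trial puts them all on the same set of bin centres, although bins that contain no measurements are simply absent from the result.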

Generally, resample is meant for datetime-like indexes only. If you're using it for something other than dates, there's probably a more elegant solution!
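If the aim is to line up several trials on the same length values, one alternative is to interpolate each trial onto a shared grid instead of averaging within bins. This is a different technique from the rounding approach, sketched here with numpy.interp. The helper name to_common_grid, the grid size, and the dummy trials are assumptions for illustration, and the sketch assumes each index is numeric and sorted in ascending order.

```python
import numpy as np
import pandas as pd


def to_common_grid(df, grid):
    """Linearly interpolate every column of `df` onto the values in `grid`.

    Hypothetical helper: assumes df.index is numeric and sorted ascending,
    which np.interp needs in order to give meaningful results.
    """
    x = df.index.to_numpy(dtype=float)
    data = {
        col: np.interp(grid, x, df[col].to_numpy(dtype=float))
        for col in df.columns
    }
    return pd.DataFrame(data, index=pd.Index(grid, name=df.index.name))


# Two dummy trials whose measurement points do not line up
trial_1 = pd.DataFrame({"force": [0.0, 2.0, 4.0]},
                       index=pd.Index([0.0, 1.0, 2.0], name="length"))
trial_2 = pd.DataFrame({"force": [1.0, 4.0]},
                       index=pd.Index([0.5, 2.0], name="length"))

# Restrict the shared grid to the range covered by both trials, because
# np.interp holds the edge values constant outside the measured range
start = max(trial_1.index.min(), trial_2.index.min())
stop = min(trial_1.index.max(), trial_2.index.max())
grid = np.linspace(start, stop, 4)

aligned_1 = to_common_grid(trial_1, grid)
aligned_2 = to_common_grid(trial_2, grid)
```

After this step both frames share one index, so they can be compared or subtracted row by row.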


answered Sep 23 '22 12:09

greg_data
