I have some data that I'm handling with pandas DataFrames. Each one contains about 10,000 rows and 6 columns.
The problem is that I have run several trials, and the different datasets have slightly different index values. (It's "force - length" testing with several materials, and of course the measurement points are not aligned perfectly.)
My idea was to "resample" the data using the index, which contains the length values. It seems that the resampling function in pandas is only available for datetime data types.
I tried converting the index via to_datetime and succeeded. But after resampling, I need to get back to the original scale, with some kind of from_datetime function.
Is there a way to do this, or am I on completely the wrong track and better off using functions like groupby?
Edit to add:
The data looks like the table below. Length is used as the index. I have several of these DataFrames, so it would be really nice to align them all to the same "frame rate" and then cut them, e.g. so that I can compare different datasets.
The idea I already tried was this one:
df_1_dt = df_1.copy()  # generate a table for the conversion (.copy() so df_1 keeps its float index)
df_1_dt.index = pd.to_datetime(df_1_dt.index, unit='s')  # convert it, simulating seconds.. good idea?!
df_1_dt_rs = df_1_dt  # generate a df for the resampling
df_1_dt_rs = df_1_dt_rs.resample(rule='s')  # resample by the generated time (returns a Resampler; an aggregation such as .mean() is still needed)
Data:
+---------------------------------------------------+
¦ Index (Length) ¦ Force1 ¦ Force2 ¦
¦-------------------+---------------+---------------¦
¦ 8.04662074828e-06 ¦ 4.74251270294 ¦ 4.72051584721 ¦
¦ 8.0898882798e-06 ¦ 4.72051584721 ¦ 4.72161570191 ¦
¦ 1.61797765596e-05 ¦ 4.69851899147 ¦ 4.72271555662 ¦
¦ 1.65476570973e-05 ¦ 4.65452528 ¦ 4.72491526604 ¦
¦ 2.41398605024e-05 ¦ 4.67945501539 ¦ 4.72589291467 ¦
¦ 2.42696630876e-05 ¦ 4.70438475079 ¦ 4.7268705633 ¦
¦ 9.60953101751e-05 ¦ 4.72931448619 ¦ 4.72784821192 ¦
¦ 0.00507703541206 ¦ 4.80410369237 ¦ 4.73078115781 ¦
¦ 0.00513927175509 ¦ 4.87889289856 ¦ 4.7337141037 ¦
¦ 0.00868965311878 ¦ 4.9349848032 ¦ 4.74251282215 ¦
¦ 0.00902026197556 ¦ 4.99107670784 ¦ 4.7513115406 ¦
¦ 0.00929150878827 ¦ 5.10326051712 ¦ 4.76890897751 ¦
¦ 0.0291729332784 ¦ 5.14945375919 ¦ 4.78650641441 ¦
¦ 0.0296332588857 ¦ 5.17255038023 ¦ 4.79530513287 ¦
¦ 0.0297080942518 ¦ 5.19564700127 ¦ 4.80410385132 ¦
¦ 0.0362595526707 ¦ 5.2187436223 ¦ 4.80850321054 ¦
¦ 0.0370305483177 ¦ 5.24184024334 ¦ 4.81290256977 ¦
¦ 0.0381506204153 ¦ 5.28803348541 ¦ 4.82170128822 ¦
¦ 0.0444440795306 ¦ 5.30783069134 ¦ 4.83050000668 ¦
¦ 0.0450121369102 ¦ 5.3177292943 ¦ 4.8348993659 ¦
¦ 0.0453465140473 ¦ 5.32762789726 ¦ 4.83929872513 ¦
¦ 0.0515533437013 ¦ 5.33752650023 ¦ 4.85359662771 ¦
¦ 0.05262489708 ¦ 5.34742510319 ¦ 4.8678945303 ¦
¦ 0.0541273847206 ¦ 5.36722230911 ¦ 4.89649033546 ¦
¦ 0.0600755845953 ¦ 5.37822067738 ¦ 4.92508614063 ¦
¦ 0.0607712385295 ¦ 5.38371986151 ¦ 4.93938404322 ¦
¦ 0.0612954159368 ¦ 5.38921904564 ¦ 4.9536819458 ¦
¦ 0.0670288249293 ¦ 5.39471822977 ¦ 4.97457891703 ¦
¦ 0.0683640870058 ¦ 5.4002174139 ¦ 4.99547588825 ¦
¦ 0.0703192637772 ¦ 5.41121578217 ¦ 5.0372698307 ¦
¦ 0.0757871634772 ¦ 5.43981158733 ¦ 5.07906377316 ¦
¦ 0.0766597757545 ¦ 5.45410948992 ¦ 5.09996074438 ¦
¦ 0.077317850103 ¦ 5.4684073925 ¦ 5.12085771561 ¦
¦ 0.0825991083545 ¦ 5.48270529509 ¦ 5.13295596838 ¦
¦ 0.0841354654428 ¦ 5.49700319767 ¦ 5.14505422115 ¦
¦ 0.0865525182528 ¦ 5.52559900284 ¦ 5.1692507267 ¦
+---------------------------------------------------+
The resampling recipe transforms time series data occurring in irregular time intervals into equispaced data. The recipe is also useful for transforming equispaced data from one frequency level to another (for example, minutes to hours).
Quoting the documentation, resample is a “Convenient method for frequency conversion and resampling of time series.” In practice, there are two main reasons to use resample: to inspect how data behaves at different resolutions or frequencies, and to join tables that have different resolutions.
Resampling involves changing the frequency of your time series observations. Two types of resampling are: Upsampling: Where you increase the frequency of the samples, such as from minutes to seconds. Downsampling: Where you decrease the frequency of the samples, such as from days to months.
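As a rough illustration of the two directions, here is a minimal sketch on made-up data. The series, its start date, and the chosen frequencies are assumptions for the example only, not part of the question's dataset.

```python
import numpy as np
import pandas as pd

# Illustrative data: one reading every 10 seconds for 2 minutes.
idx = pd.date_range("2024-01-01", periods=12, freq="10s")
series = pd.Series(np.arange(12, dtype=float), index=idx)

# Downsampling: fewer, coarser samples (10 s -> 1 min), aggregated by the mean.
downsampled = series.resample("1min").mean()

# Upsampling: more, finer samples (10 s -> 5 s). The new slots start as NaN
# and are filled here by linear interpolation.
upsampled = series.resample("5s").asfreq().interpolate()

print(downsampled)
print(upsampled.head())
```

Downsampling always needs an aggregation (mean, sum, first, ...), while upsampling needs a fill rule (interpolation, forward fill, ...), so the choice depends on what the measurements represent.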
Pandas Series: resample() function. The resample() function is a convenience method for frequency conversion and resampling of time-series data. The object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or the caller must pass the label of a datetime-like column or index level to the on or level keyword.
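Because resample() insists on a datetime-like index, a float index such as a length has to be converted first and converted back afterwards. pandas has no from_datetime function, but subtracting the epoch and dividing by a one-second Timedelta should do the same job. The sketch below assumes one length unit is treated as one second; the data and column name are invented for illustration.

```python
import pandas as pd

# Illustrative data: irregular "length" values as a float index.
df = pd.DataFrame(
    {"force": [4.0, 5.0, 9.0, 10.0]},
    index=pd.Index([0.2, 0.7, 2.1, 2.4], name="length"),
)

# Forward: treat each length unit as one second since the epoch.
as_time = df.copy()
as_time.index = pd.to_datetime(as_time.index, unit="s")

# Resample into 1-second bins and average within each bin.
resampled = as_time.resample("1s").mean()

# Backward: subtract the epoch and divide by one second to get floats again.
epoch = pd.Timestamp("1970-01-01")
seconds = (resampled.index - epoch) / pd.Timedelta(seconds=1)
resampled.index = pd.Index(seconds, name="length")

print(resampled)
```

Bins that contain no measurement come back as NaN. For data on a much smaller scale, a finer rule such as "10ms" (0.01 length units under the same assumption) would be needed.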
It sounds like all you want to do is round the length figures to a lower precision.
If this is the case, you could just use the built-in rounding:
(dummy data)
>>> import pandas as pd
>>> df = pd.DataFrame([[1.0000005, 4], [1.232463632, 5], [5.234652, 9], [5.675322, 10]], columns=['length', 'force'])
>>> df
     length  force
0  1.000001      4
1  1.232464      5
2  5.234652      9
3  5.675322     10
>>> df['rounded_length'] = df['length'].round(0)  # round to the nearest whole length unit
>>> df
     length  force  rounded_length
0  1.000001      4             1.0
1  1.232464      5             1.0
2  5.234652      9             5.0
3  5.675322     10             6.0
Then you could replicate the resample() workflow using groupby:
>>> df.groupby('rounded_length')['force'].mean()
rounded_length
1.0     4.5
5.0     9.0
6.0    10.0
Name: force, dtype: float64
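Rounding to a number of decimal places only gives bin widths that are powers of ten. If another width is needed, one option is to snap each length to the nearest multiple of a step before grouping. This is a sketch on the same dummy data; the `step` value and the `binned_length` column name are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    [[1.0000005, 4], [1.232463632, 5], [5.234652, 9], [5.675322, 10]],
    columns=["length", "force"],
)

step = 0.5  # hypothetical bin width, in length units

# Snap each length to the nearest multiple of `step`, then average per bin.
df["binned_length"] = (df["length"] / step).round() * step
binned = df.groupby("binned_length")["force"].mean()

print(binned)
```

Applying the same `step` to every trial puts them all on the same set of bin labels, which makes them straightforward to compare or join afterwards.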
Generally, resample IS just for dates. If you're using it for something other than dates, there's probably a more elegant solution!
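One candidate for such a solution, offered here as an alternative rather than as part of the answer above, is to interpolate every trial onto a shared length grid with numpy.interp. The trial data, the grid size, and the helper name `onto_grid` are all assumptions made for this sketch.

```python
import numpy as np
import pandas as pd

# Two illustrative trials whose length values do not line up.
trial_a = pd.Series([4.0, 5.0, 6.0], index=[0.0, 0.5, 1.0], name="force_a")
trial_b = pd.Series([10.0, 30.0], index=[0.1, 1.1], name="force_b")

# Common grid, restricted to the length range covered by both trials.
start = max(trial_a.index.min(), trial_b.index.min())
stop = min(trial_a.index.max(), trial_b.index.max())
grid = np.linspace(start, stop, num=4)


def onto_grid(series, grid):
    """Linearly interpolate a length-indexed series at the grid points."""
    s = series.sort_index()  # np.interp expects increasing x values
    values = np.interp(grid, s.index.to_numpy(), s.to_numpy())
    return pd.Series(values, index=grid, name=s.name)


aligned = pd.concat([onto_grid(trial_a, grid), onto_grid(trial_b, grid)], axis=1)
print(aligned)
```

Unlike binning, this keeps one value per grid point for each trial and never produces empty bins, but it assumes that linear interpolation between neighbouring measurements is acceptable for the data.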