Analyzing noisy data

I recently launched a rocket with a barometric altimeter that is accurate to roughly 10 ft (calculated from data acquired during flight). The data is recorded in 0.05-second increments per sample, and a graph of altitude vs. time looks pretty much like it should when zoomed out over the entire flight.

The problem is that when I try to calculate other values such as velocity or acceleration from the data, the accuracy of the measurements makes the calculated values pretty much worthless. What techniques can I use to smooth out the data so that I can calculate (or approximate) reasonable values for the velocity and acceleration? It is important that major events remain in place in time, most notably the 0 for the first entry and the highest point during flight (2707).

The altitude data follows and is measured in ft above ground level. The first time would be 0.00 and each sample is 0.05 seconds after the previous sample. The spike at the beginning of the flight is due to a technical problem that occurred during liftoff, and removing that spike would be ideal.

I originally tried linear interpolation (averaging nearby data points), but it took many iterations to smooth the data enough, and the flattening of the curve removed the important apogee and ground-level events.

All help is greatly appreciated. Please note this is not the complete data set and I am looking for suggestions on better ways to analyze the data, not for someone to reply with a transformed data set. It would be nice to use an algorithm on board future rockets which can predict current altitude/velocity/acceleration without knowing the full flight data, though that is not required.

00000
00000
00000
00076
00229
00095
00057
00038
00048
00057
00057
00076
00086
00095
00105
00114
00124
00133
00152
00152
00171
00190
00200
00219
00229
00248
00267
00277
00286
00305
00334
00343
00363
00363
00382
00382
00401
00420
00440
00459
00469
00488
00517
00527
00546
00565
00585
00613
00633
00652
00671
00691
00710
00729
00759
00778
00798
00817
00837
00856
00885
00904
00924
00944
00963
00983
01002
01022
01041
01061
01080
01100
01120
01139
01149
01169
01179
01198
01218
01238
01257
01277
01297
01317
01327
01346
01356
01376
01396
01415
01425
01445
01465
01475
01495
01515
01525
01545
01554
01574
01594
01614
01614
01634
01654
01664
01674
01694
01714
01724
01734
01754
01764
01774
01794
01804
01814
01834
01844
01854
01874
01884
01894
01914
01924
01934
01954
01954
01975
01995
01995
02015
02015
02035
02045
02055
02075
02075
02096
02096
02116
02126
02136
02146
02156
02167
02177
02187
02197
02207
02217
02227
02237
02237
02258
02268
02278
02278
02298
02298
02319
02319
02319
02339
02349
02359
02359
02370
02380
02380
02400
02400
01914
02319
02420
02482
02523
02461
02502
02543
02564
02595
02625
02666
02707
02646
02605
02605
02584
02574
02543
02543
02543
02543
02543
02543
02554
02543
02554
02554
02554
02554
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02533
02543
02543
02543
02543
02543
02543
02543
02543
02533
02523
02523
02523
02523
02523
02523
02523
02523
02543
02523
02523
02523
02523
02523
02523
02523
02523
02513
02513
02502
02502
02492
02482
02482
02482
02482
02482
02482
02482
02482
02482
02482
02482
02482
02482
02482
02482
02472
02472
02472
02461
02461
02461
02461
02451
02451
02451
02461
02461
02451
02451
02451
02451
02451
02451
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02431
02441
02431
02441
02431
02420
02431
02420
02420
02420
02420
02420
02420
02420
02420
02420
02420
02420
02420
02410
02420
02410
02410
02410
02410
02400
02400
02410
02400
02400
02400
02400
02400
02400
02400
02400
02400
02400
02400
02400
02390
02390
02390
02380
02380
02380
02380
02380
02380
02380
02380
02380
02380
02380
02380
02380
02370
02370
02380
02370
02359
02359
02359
02359
02359
02359
02359
02359
02359
02359
02359
02359
02359
02359
02349
02349
02349
02349
02349
02339
02339
02339
02339
02339
02339
02339
02339
02339
02339
02339
02339
02339
asked Dec 24 '09 by Nick Larsen



3 Answers

Here is my solution, using a Kalman filter. You will need to tune the parameters (possibly by orders of magnitude) if you want more or less smoothing.

#!/usr/bin/env octave

% Kalman filter to smooth measures of altitude and estimate
% speed and acceleration. The continuous time model is more or less as follows:
% derivative of altitude := speed
% derivative of speed := acceleration
% acceleration is a Wiener process

%------------------------------------------------------------
% Discretization of the continuous-time linear system
% 
%   d  |x|   | 0 1 0 | |x|
%  --- |v| = | 0 0 1 | |v|   + "noise"
%   dt |a|   | 0 0 0 | |a|
%
%   y = [1 0 0] |x|     + "measurement noise"
%               |v|
%               |a|
%
st = 0.05;    % Sampling time
A = [1  st st^2/2;
     0  1  st    ;
     0  0  1];
C = [1 0 0];

%------------------------------------------------------------
% Fine-tune these parameters! (in particular qa and R)
% The acceleration follows a "random walk". The greater is the variance qa,
% the more "reactive" the system is expected to be, i.e.
% the more the acceleration is expected to vary
% The greater is R, the more noisy is your measurement instrument
% (less "accuracy" of the barometric altimeter);
% if you increase R, you will smooth the estimate more
qx = 1.0;                      % Variance of model noise for position
qv = 1.0;                      % Variance of model noise for speed
qa = 50.0;                     % Variance of model noise for acceleration
Q  = diag([qx, qv, qa]);
R  = 100.0;                    % Variance of measurement noise
                               % (10^2, if 10ft is the standard deviation)

load data.txt  % Put your measures in this file

est_position     = zeros(length(data), 1);
est_speed        = zeros(length(data), 1);
est_acceleration = zeros(length(data), 1);

%------------------------------------------------------------
% Kalman filter
xhat = [0;0;0];     % Initial estimate
P    = zeros(3,3);  % Initial error variance
for i=1:length(data),
   y = data(i);
   xpred = A*xhat;                                    % Prediction
   Ppred = A*P*A' + Q;                                % Prediction error variance
   Lambdainv = 1/(C*Ppred*C' + R);
   xhat  = xpred + Ppred*C'*Lambdainv*(y - C*xpred);  % Update estimation
   P = Ppred - Ppred*C'*Lambdainv*C*Ppred;            % Update estimation error variance
   est_position(i)     = xhat(1);
   est_speed(i)        = xhat(2);
   est_acceleration(i) = xhat(3);
end

%------------------------------------------------------------
% Plot
figure(1);
hold on;
plot(data, 'k');               % Black: real data
plot(est_position, 'b');       % Blue:  estimated position
plot(est_speed, 'g');          % Green: estimated speed
plot(est_acceleration, 'r');   % Red:   estimated acceleration
pause
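
A note on the onboard use mentioned in the question: the filter above is already recursive, so each update needs only the previous estimate and the newest altitude sample, which makes it suitable for running on the flight computer. Below is a minimal sketch of the same update packaged as a single step (saved as a hypothetical kalman_step.m, with A, C, Q and R defined as above):

% One Kalman step: previous estimate (xhat, P) plus a new altitude sample y.
% Call once per sample onboard; no knowledge of the full flight is required.
function [xhat, P] = kalman_step(xhat, P, y, A, C, Q, R)
   xpred = A*xhat;                      % predict state one sample ahead
   Ppred = A*P*A' + Q;                  % predict error covariance
   K     = Ppred*C' / (C*Ppred*C' + R); % Kalman gain
   xhat  = xpred + K*(y - C*xpred);     % correct with the new measurement
   P     = Ppred - K*C*Ppred;           % update error covariance
end

With st = 0.05 the state estimate xhat is [altitude (ft); speed (ft/s); acceleration (ft/s^2)].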
answered by Federico A. Ramponi


You could try running the data through a low-pass filter. This will smooth out high-frequency noise; maybe a simple FIR filter would do.

Also, you could pull your major events from the raw data, but use a polynomial fit for velocity and acceleration data.
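
A rough Octave sketch of both ideas, assuming the altitude samples are in a column vector data at 0.05 s spacing (the filter length and polynomial degree are guesses to tune, and in practice you would fit piecewise, e.g. over the powered ascent only, rather than the whole flight):

dt = 0.05;                            % sample period, seconds
t  = (0:length(data)-1)' * dt;        % time axis

% Simple boxcar FIR low-pass: averages the last N samples.
N  = 9;                               % filter length; widen for more smoothing
b  = ones(N, 1) / N;                  % FIR coefficients
smoothed = filter(b, 1, data);        % note: delays the signal by about N/2 samples

% Polynomial fit; differentiating the polynomial gives smooth velocity/acceleration.
p = polyfit(t, data, 6);              % degree 6 is a guess
velocity     = polyval(polyder(p), t);            % ft/s
acceleration = polyval(polyder(polyder(p)), t);   % ft/s^2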

answered by Rob Curtis


Have you tried performing a scrolling (sliding) window average of your values? Basically, you take a window of, say, 10 values (samples 0 to 9) and calculate its average, then you slide the window one point (samples 1 to 10) and recalculate. This smooths the values while keeping the number of points essentially unchanged. Larger windows give smoother data at the price of losing more high-frequency information.

You can use the median instead of the average if your data happen to present outlier spikes.

You can also try autocorrelation.
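
A minimal Octave sketch of the sliding-window idea, assuming the altitude samples are in a column vector data (the window size W is a guess; swap mean for median to reject the liftoff spike):

W = 10;                                % window length; larger = smoother
n = length(data);
smoothed = zeros(n, 1);
for i = 1:n
   lo = max(1, i - floor(W/2));        % clamp the window at the ends of the record
   hi = min(n, i + floor(W/2));
   smoothed(i) = mean(data(lo:hi));    % use median(data(lo:hi)) to suppress spikes
end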

answered by Stefano Borini