I am trying to find a heart rate from a video of my skin. In order to do so, I am taking a cropped rectangle of pixels from my video frame and averaging the red (or green) component in all those pixels (then, of course, seeing how this average changes frame-to-frame).
I take a fast Fourier transform of this vector (the average color value of each frame's cropped section) to see which frequencies are most prominent. I am hoping to see the resting human heart rate, ~1 Hz, show up as a clear peak.
As a test, I took a video of just a wall, and of other objects that should have no periodic color change. I used a tripod and three different cameras of different brands. Each of them shows similar, if not identical, background frequency peaks, particularly at 1 Hz, 2 Hz, 5 Hz, and 10 Hz. I have shot under both natural and fluorescent light and it still occurs.
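As a sanity check on the processing itself (a sketch I have not run against the real clips; the 30 fps frame rate and noise level are made up), a synthetic, non-periodic signal can be pushed through the same filter-and-FFT steps used in the script below, to confirm the peaks are not an artifact of the math:
% Hypothetical sanity check (not part of the original script): push a
% synthetic, non-periodic signal through the same filter + FFT steps used
% below, to confirm the peaks come from the video and not the processing.
fs = 30;                                              % assumed frame rate [Hz]
n_frames = 30 * fs;                                   % 30 seconds of fake "video"
fake_brightness = 100 + 0.5 * randn(1, n_frames);     % constant level + sensor-like noise
[b, a] = butter(2, [(40/60)/fs*2, (600/60)/fs*2]);    % same passband as the script
fake_filtered = filter(b, a, fake_brightness);
fake_filtered = fake_filtered(3*fs+1:end);            % same 3 s stabilization trim
fake_fft = abs(fft(fake_filtered));
freqs = (0:length(fake_filtered)-1) * (fs / length(fake_filtered));
plot(freqs, fake_fft), axis([0 15 -inf inf])
xlabel('Frequency (Hz)'), ylabel('|Y(f)|')
title('Synthetic signal through the same pipeline (should show no sharp peaks)')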
My ultimate goal is to distinguish, with this analysis, living skin from non-vascularized skin. So understanding why I am getting these frequency peaks for inanimate objects is VITAL.
Could any of you run this code on your own videos and help explain if I am simply an idiot?
Camera shooting:
Kodak Playsport: 1920x1080, 30 fps (60i), imports as .mp4
Canon Vixia HF200: 1440x1080, 30 fps (60i), 12 Mbps bitrate, imports as .mts, which I re-encode to .mp4
Code based on:
http://www.ignaciomellado.es/blog/Measuring-heart-rate-with-a-smartphone-camera#video
clear all
close all
clc
%% pick data file name to be analyzed, set directory it is found in
dataDir = './data';
vidname = ['Filename.MP4'];
%% define path to file and pull out video
inFile = fullfile(dataDir,vidname);
video = VideoReader(inFile);
%% make 1D array with length equal to number of frames (time)
brightness = zeros(1, video.NumberOfFrames);
video_framerate = round(video.FrameRate); % some steps below need an integer frame rate; elsewhere the unrounded video.FrameRate is used directly
%% set region of interest for what you want to get average brightness of
frame = read(video, 1);
imshow(frame)
rect = getrect;
close all
xmin_pt = round(rect(1));
ymin_pt = round(rect(2));
section_width = round(rect(3));
section_height = round(rect(4));
%% select component of video (red green or blue)
component_selection = 1; % 1 = red, 2 = green, 3 = blue
%% make 1D array of ROI averages
for i = 1:video.NumberOfFrames
    frame = read(video, i);
    % crop to the selected ROI (getrect returns [xmin ymin width height])
    section_of_interest = frame(ymin_pt:ymin_pt+section_height-1, xmin_pt:xmin_pt+section_width-1, :);
    colorPlane = section_of_interest(:, :, component_selection);
    % average over the ROI pixels (normalize by the ROI size, not the full frame)
    brightness(i) = sum(colorPlane(:)) / numel(colorPlane);
end
%% Filter out non-physiological frequencies
BPM_L = 40; % Heart rate lower limit [bpm]
BPM_H = 600; % Heart rate upper limit [bpm]; deliberately set high to investigate the signal
% Butterworth frequencies must be in [0, 1], where 1 corresponds to half the sampling rate
[b, a] = butter(2, [((BPM_L / 60) / video_framerate * 2), ((BPM_H / 60) / video_framerate * 2)]);
filtBrightness = filter(b, a, brightness);
%% Trim the start of the signal to exclude the filter's stabilization (transient) period
FILTER_STABILIZATION_TIME = 3; % [seconds]
filtBrightness = filtBrightness((video_framerate * FILTER_STABILIZATION_TIME + 1):end);
%% Do FFT on filtered/trimmed signal
fftMagnitude = abs(fft(filtBrightness));
%% Plot results
figure(1)
subplot(3,1,1)
plot([1:length(brightness)]/video.FrameRate,brightness)
xlabel('Time (seconds)')
ylabel('Color intensity')
title('original signal')
subplot(3,1,2)
plot([1:length(filtBrightness)]/video.FrameRate,filtBrightness)
xlabel('Time (seconds)')
ylabel('Color intensity')
title('after butterworth filter and trim')
freq_dimension = (0:length(filtBrightness)-1) * (video_framerate / length(filtBrightness)); % frequency axis [Hz]
subplot(3,1,3)
plot(freq_dimension,fftMagnitude)
axis([0,15,-inf,inf])
xlabel('Frequency (Hz)')
ylabel('|Y(f)|')
title('FFT of filtered signal')
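One thing the script above does not yet do is turn the spectrum into a number. A minimal, untested sketch (reusing freq_dimension, fftMagnitude, and BPM_L from above; the 4 Hz / 240 bpm cap is an assumption) that picks off the dominant in-band peak and reports it in beats per minute:
% Hypothetical addition (not in the original script): report the dominant
% in-band frequency as an estimated heart rate.
in_band = freq_dimension >= BPM_L/60 & freq_dimension <= 4;   % restrict to roughly 40-240 bpm
band_freqs = freq_dimension(in_band);
[~, peak_idx] = max(fftMagnitude(in_band));
estimated_bpm = band_freqs(peak_idx) * 60;
fprintf('Dominant in-band frequency: %.2f Hz (~%.0f bpm)\n', band_freqs(peak_idx), estimated_bpm);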
Your purpose is to "find a heart rate from a video of my skin" and "... to distinguish, with this analysis, living skin from non-vascularized skin", but your approach is "take video ... crop ... average red (or green) component ... look how it changes".
I think there is a problem with the premise. Averaging means the majority signal dominates. Human vision is an (amazing) processor of visual data, and it usually takes decades of hard work to get a machine to come within even an order of magnitude of what it can do. If you can't look with your eyes at 90% of the folks around you and read off their heart rate, then an averaging approach might not be the way to go.
There are an infinite number of statistics you can take to describe how a distribution changes over time, and the mean is only one of them. If you look only at the mean, you may be missing the information.
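For example, here is a rough, untested sketch that reuses the ROI variables from your script and tracks a few distribution statistics per frame; each series could be fed to the same FFT:
% Rough sketch: per-frame statistics of the ROI pixel distribution, beyond the
% mean. Reuses video, ymin_pt, xmin_pt, section_height, section_width, and
% component_selection from the question's script.
n = video.NumberOfFrames;
roi_mean = zeros(1, n); roi_std = zeros(1, n); roi_median = zeros(1, n);
for k = 1:n
    frame = read(video, k);
    roi = double(frame(ymin_pt:ymin_pt+section_height-1, ...
                       xmin_pt:xmin_pt+section_width-1, component_selection));
    roi_mean(k)   = mean(roi(:));
    roi_std(k)    = std(roi(:));      % spread: does the variability pulse even if the mean does not?
    roi_median(k) = median(roi(:));   % robust centre, less sensitive to a few bright pixels
end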
How I would do this:
Only AFTER the human vision apparatus can pick out a good signal would I work on making the math/computer get the same signal, with the clear knowledge that the human apparatus is superior.
If I were tracking points/features that move over time, I would look at sub-pixel metrology methods. I did some of this a decade ago (in my thesis), and while I think the content might be relevant, it might also be different enough to merit a first read-through before copying and running any code. One possible starting point is sketched below.
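For concreteness, here is a sketch of one common sub-pixel approach (normalized cross-correlation with a parabolic fit around the correlation peak); the template coordinates are made up, and this is not necessarily what the thesis used:
% Sketch of sub-pixel feature localization: normalized cross-correlation plus a
% parabolic fit around the peak. The template coordinates are assumptions.
ref    = double(rgb2gray(read(video, 1)));
tmpl   = ref(200:263, 300:363);                 % assumed 64x64 feature template
target = double(rgb2gray(read(video, 2)));
c = normxcorr2(tmpl, target);                   % Image Processing Toolbox
[~, idx] = max(c(:));
[py, px] = ind2sub(size(c), idx);               % integer-pixel peak location
% Parabolic interpolation of the peak's neighbours gives a sub-pixel offset.
dx = 0.5 * (c(py, px-1) - c(py, px+1)) / (c(py, px-1) - 2*c(py, px) + c(py, px+1));
dy = 0.5 * (c(py-1, px) - c(py+1, px)) / (c(py-1, px) - 2*c(py, px) + c(py+1, px));
subpixel_peak = [px + dx, py + dy];             % track this location from frame to frame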
Best of luck.