I need to have ffmpeg decode my video (e.g. h264) using hardware acceleration. I'm using the usual way of decoding frames: read packet -> decode frame, and I'd like ffmpeg to speed up the decoding. So I've built it with --enable-vaapi and --enable-hwaccel=h264, but I don't really know what I should do next. I've tried avcodec_find_decoder_by_name("h264_vaapi"), but it returns nullptr.
Anyway, I might want to use other APIs and not just VA-API. How is one supposed to speed up ffmpeg decoding?
P.S. I didn't find any examples on the Internet that use ffmpeg with hwaccel.
After some investigation I was able to implement the necessary HW-accelerated decoding on OS X (VDA) and Linux (VDPAU). I will update the answer when I get my hands on a Windows implementation as well. So let's start with the easiest:
To get HW acceleration working on Mac OS you should just use the following:
avcodec_find_decoder_by_name("h264_vda");
Note, however, that on Mac OS FFmpeg can accelerate only h264 videos this way.
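Since the name lookup returns nullptr whenever a decoder is not compiled in, it can help to try several names in order and fall back to the software decoder. The following is only a minimal sketch: pickDecoderName is a hypothetical helper, not an FFmpeg function, and the lookup is passed in as a callable so the sketch stays independent of the FFmpeg headers.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper (not part of FFmpeg): returns the first candidate name
// for which `lookup` yields a non-null result, or an empty string if none does.
// In real code, `lookup` would wrap avcodec_find_decoder_by_name().
std::string pickDecoderName(const std::vector<std::string>& candidates,
                            const std::function<const void*(const char*)>& lookup)
{
    for (const auto& name : candidates) {
        if (lookup(name.c_str()) != nullptr) {
            return name;  // first decoder that this build actually provides
        }
    }
    return std::string();  // nothing available, caller decides what to do
}
```

With FFmpeg linked in, the callable could be a lambda that returns avcodec_find_decoder_by_name(n), and the candidate list could be something like "h264_vda", "h264_vdpau", "h264", where the last entry is assumed to be the plain software decoder.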
On Linux things are much more complicated (who is surprised?). FFmpeg has 2 HW accelerators on Linux, VDPAU (Nvidia) and VAAPI (Intel), and only one HW decoder: for VDPAU. And it may seem perfectly reasonable to use the vdpau decoder like in the Mac OS example above:
avcodec_find_decoder_by_name("h264_vdpau");
You might be surprised to find out that it doesn't change anything and you have no acceleration at all. That's because it is only the beginning: you have to write much more code to get the acceleration working. Happily, you don't have to come up with a solution on your own: there are at least 2 good examples of how to achieve that: libavg and FFmpeg itself. libavg has a VDPAUDecoder class which is perfectly clear and which I've based my implementation on. You can also consult ffmpeg_vdpau.c to get another implementation to compare. In my opinion the libavg implementation is easier to grasp, though.
The only thing both aforementioned examples lack is proper copying of the decoded frame to the main memory. Both examples use VdpVideoSurfaceGetBitsYCbCr, which killed all the performance I gained on my machine. That's why you might want to use the following procedure to extract the data from the GPU:
bool VdpauDecoder::fillFrameWithData(AVCodecContext* context,
                                     AVFrame* frame)
{
    VdpauDecoder* vdpauDecoder = static_cast<VdpauDecoder*>(context->opaque);
    // Temporary RGBA surface that receives the converted picture.
    VdpOutputSurface surface;
    if(vdp_output_surface_create(m_VdpDevice, VDP_RGBA_FORMAT_B8G8R8A8,
                                 frame->width, frame->height, &surface) != VDP_STATUS_OK)
        return false;
    // Here data[0] points at the render state that holds the decoded video surface.
    auto renderState = reinterpret_cast<vdpau_render_state*>(frame->data[0]);
    VdpVideoSurface videoSurface = renderState->surface;
    // Convert the video surface to BGRA on the GPU via the video mixer.
    auto status = vdp_video_mixer_render(vdpauDecoder->m_VdpMixer,
                                         VDP_INVALID_HANDLE,
                                         nullptr,
                                         VDP_VIDEO_MIXER_PICTURE_STRUCTURE_FRAME,
                                         0, nullptr,
                                         videoSurface,
                                         0, nullptr,
                                         nullptr,
                                         surface,
                                         nullptr, nullptr, 0, nullptr);
    bool filled = false;
    AVFrame* tmframe = (status == VDP_STATUS_OK) ? av_frame_alloc() : nullptr;
    if(tmframe)
    {
        tmframe->format = AV_PIX_FMT_BGRA;
        tmframe->width = frame->width;
        tmframe->height = frame->height;
        if(av_frame_get_buffer(tmframe, 32) >= 0)
        {
            // Read the converted pixels back into main memory.
            status = vdp_output_surface_get_bits_native(surface, nullptr,
                         reinterpret_cast<void * const *>(tmframe->data),
                         reinterpret_cast<const uint32_t *>(tmframe->linesize));
            if(status == VDP_STATUS_OK && av_frame_copy_props(tmframe, frame) == 0)
            {
                // Replace the GPU-backed frame with the BGRA copy.
                av_frame_unref(frame);
                av_frame_move_ref(frame, tmframe);
                filled = true;
            }
        }
        // Frees the frame struct and any buffers still attached to it.
        av_frame_free(&tmframe);
    }
    // The output surface is temporary, so release it on every path.
    vdp_output_surface_destroy(surface);
    return filled;
}
While it uses some "external" objects, you should be able to understand it once you have implemented the "get buffer" part (for which the aforementioned examples are of great help). Also, I've used the BGRA format, which was more suitable for my needs; maybe you will choose another.
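One detail to keep in mind when consuming the resulting BGRA frame: the buffer is allocated with an alignment of 32, so linesize[0] can be larger than width * 4 and the rows may carry padding. Below is a minimal sketch of copying such a frame into a tightly packed buffer. packBgraRows is a hypothetical helper, not an FFmpeg function; it takes the plane pointer and linesize as plain arguments (in real code these would be frame->data[0] and frame->linesize[0]).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copies a BGRA image whose rows start `linesize` bytes
// apart into a tightly packed buffer with exactly width * 4 bytes per row.
std::vector<uint8_t> packBgraRows(const uint8_t* src, int linesize,
                                  int width, int height)
{
    if (src == nullptr || width <= 0 || height <= 0) {
        return {};  // nothing to copy
    }
    const std::size_t rowBytes = static_cast<std::size_t>(width) * 4;  // 4 bytes per BGRA pixel
    std::vector<uint8_t> packed(rowBytes * static_cast<std::size_t>(height));
    for (int y = 0; y < height; ++y) {
        // Source rows are `linesize` apart, destination rows are contiguous.
        std::memcpy(packed.data() + static_cast<std::size_t>(y) * rowBytes,
                    src + static_cast<std::ptrdiff_t>(y) * linesize,
                    rowBytes);
    }
    return packed;
}
```

If the consumer accepts a stride (many rendering APIs do), passing linesize along instead of repacking avoids this extra copy.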
The problem with all of this is that you can't just get it working from FFmpeg alone; you need to understand at least the basics of the VDPAU API. I hope that my answer will aid someone in implementing HW acceleration on Linux. I spent a lot of time on it myself before I realized that there is no simple, one-line way of implementing HW-accelerated decoding on Linux.
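One of those basics is that VDPAU objects such as output surfaces are plain handles that you have to destroy yourself on every exit path. A small scope guard can take care of that. This is only a sketch: ScopeGuard is a hypothetical helper written for illustration, not something VDPAU or FFmpeg provides.

```cpp
#include <functional>
#include <utility>

// Hypothetical helper: runs `cleanup` when the guard leaves scope,
// unless dismiss() was called first.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> cleanup)
        : m_cleanup(std::move(cleanup)) {}
    ~ScopeGuard() { if (m_cleanup) m_cleanup(); }

    // Cancel the cleanup, e.g. when ownership of the handle is handed over.
    void dismiss() { m_cleanup = nullptr; }

    // Non-copyable so the cleanup cannot run twice.
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
    std::function<void()> m_cleanup;
};
```

In the procedure above, a guard created right after the surface is successfully created, with a lambda that calls vdp_output_surface_destroy(surface), would release the surface no matter which branch returns.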
Since my original question was regarding VA-API, I can't leave it unanswered.
First of all, there is no decoder for VA-API in FFmpeg, so avcodec_find_decoder_by_name("h264_vaapi") doesn't make any sense: it returns nullptr.
I don't know how much harder (or maybe simpler?) it is to implement decoding via VA-API, since all the examples I've seen were quite intimidating. So I chose not to use VA-API at all, but I still had to implement the acceleration for an Intel card. Fortunately for me, there is a VDPAU library (driver?) which works over VA-API. So you can use VDPAU on Intel cards!
I've used the following link to set it up on my Ubuntu.
Also, you might want to check the comments to the original question where @Timothy_G also mentioned some links regarding VA-API.