I'm trying to wrap a handy little piece of C++ code that generates video plus audio on Windows using VfW. The C++ library lives here, and its description says:
Uses Video for Windows (so it's not portable). Handy if you want to quickly record a video somewhere and don't feel like wading through the VfW docs yourself.
I'd like to use that C++ library from Python, so I've decided to wrap it using SWIG.
The problem is with encoding the audio: the generated video is broken, and it seems the audio has not been written properly to the video file. If I try to open the video with VLC or a similar player, I get a message saying the player can't identify the audio or video codec. The video frames are fine, so the problem is definitely in the way I'm writing the audio to the file.
I'm attaching both the SWIG interface and a little Python test that tries to be a port of the original C++ test.
aviwriter.i
%module aviwriter
%{
#include "aviwriter.h"
%}
%typemap(in) (const unsigned char* buffer) (char* buffer, Py_ssize_t length) %{
    if(PyBytes_AsStringAndSize($input,&buffer,&length) == -1)
        SWIG_fail;
    $1 = (unsigned char*)buffer;
%}
%typemap(in) (const void* buffer) (char* buffer, Py_ssize_t length) %{
    if(PyBytes_AsStringAndSize($input,&buffer,&length) == -1)
        SWIG_fail;
    $1 = (void*)buffer;
%}
%include "aviwriter.h"
test.py
import argparse
import sys
import struct
from distutils.util import strtobool
from aviwriter import AVIWriter
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-audio", action="store", default="1")
    parser.add_argument('-width', action="store",
                        dest="width", type=int, default=400)
    parser.add_argument('-height', action="store",
                        dest="height", type=int, default=300)
    parser.add_argument('-numframes', action="store",
                        dest="numframes", type=int, default=256)
    parser.add_argument('-framerate', action="store",
                        dest="framerate", type=int, default=60)
    parser.add_argument('-output', action="store",
                        dest="output", type=str, default="checker.avi")
    args = parser.parse_args()
    audio = strtobool(args.audio)
    framerate = args.framerate
    num_frames = args.numframes
    width = args.width
    height = args.height
    output = args.output
    writer = AVIWriter()
    if not writer.Init(output, framerate):
        print("Couldn't open video file!")
        sys.exit(1)
    writer.SetSize(width, height)
    data = [0]*width*height
    sampleRate = 44100
    samples_per_frame = 44100 / framerate
    samples = [0]*int(samples_per_frame)
    c1, s1, f1 = 24000.0, 0.0, 0.03
    c2, s2, f2 = 1.0, 0.0, 0.0013
    for frame in range(num_frames):
        print(f"frame {frame}")
        i = 0
        for y in range(height):
            for x in range(width):
                on = ((x + frame) & 32) ^ ((y+frame) & 32)
                data[i] = 0xffffffff if on else 0xff000000
                i += 1
        writer.WriteFrame(
            struct.pack(f'{len(data)}L', *data),
            width*4
        )
        if audio:
            for i in range(int(samples_per_frame)):
                c1 -= f1*s1
                s1 += f1*c1
                c2 += f2*s2
                s2 -= f2*c2
                val = s1 * (0.75 + 0.25 * c2)
                if(frame == num_frames - 1):
                    val *= 1.0 * (samples_per_frame - 1 - i) / \
                        samples_per_frame
                samples[i] = int(val)
                if frame==0:
                    print(f"i={i} val={int(val)}")
            writer.WriteAudioFrame(
                struct.pack(f'{len(samples)}i', *samples),
                int(samples_per_frame)
            )
    writer.Exit()
I don't think samples is being generated incorrectly: I've compared the values generated on the Python side with those generated on the C++ side, though only for the packet written for frame 0.
My suspicion is that either the typemap I've created in SWIG is wrong, or the problem lies in the line writer.WriteAudioFrame(struct.pack(f'{len(samples)}i', *samples), int(samples_per_frame)). Either way, the way I'm sending the audio buffer from Python to the C++ wrapper is evidently not right.
So, would you know how to fix the attached code so that test.py generates a video with the right audio, as the C++ test does?
When generated ok, the video will display a magic scrolling checkerboard with hypnotic sinewaves as audio backdrop :D
Additional notes:
1) It seems the above code is not calling writer.SetAudioFormat, which is needed for the functions AVIFileCreateStreamA and AVIStreamSetFormat. The problem is that I don't know how to export the WAVEFORMATEX structure it takes with SWIG, so that I could use it from Python the same way test.cpp does. From Mmreg.h, the structure looks like this:
typedef struct tWAVEFORMATEX
{
    WORD wFormatTag; /* format type */
    WORD nChannels; /* number of channels (i.e. mono, stereo...) */
    DWORD nSamplesPerSec; /* sample rate */
    DWORD nAvgBytesPerSec; /* for buffer estimation */
    WORD nBlockAlign; /* block size of data */
    WORD wBitsPerSample; /* Number of bits per sample of mono data */
    WORD cbSize; /* The count in bytes of the size of
                    extra information (after cbSize) */
} WAVEFORMATEX;
Unfortunately, I don't know how to wrap that in aviwriter.i. I've tried using %include windows.i and including the declarations directly in a %{ ... %} block, but all I got was a bunch of errors :/
2) I'd prefer not to modify aviwriter.h or aviwriter.cpp at all, as that's basically external working code.
3) Assuming I'm able to wrap WAVEFORMATEX so I can use it from Python, how would you use memset the way test.cpp does? i.e.: memset(&wfx,0,sizeof(wfx));
Two suggestions:
First, pack the audio data as short instead of int, as the C++ test does. The audio data is 16-bit, not 32-bit. Use the 'h' format character when packing, for example struct.pack(f'{len(samples)}h', *samples).
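To make the size difference concrete, here is a small sketch that packs the same samples both ways. The helper name pack_pcm16 and the clamping to the signed 16-bit range are my own additions for illustration, not part of aviwriter or test.cpp; the '<' prefix forces little-endian, standard-size packing.

```python
import struct

def pack_pcm16(samples):
    """Pack numbers as little-endian signed 16-bit PCM (hypothetical helper)."""
    # Clamp first so struct.pack does not raise on out-of-range values.
    clamped = [max(-32768, min(32767, int(s))) for s in samples]
    return struct.pack(f'<{len(clamped)}h', *clamped)

samples = [0, 719, -24000, 40000]
as_int = struct.pack(f'<{len(samples)}i', *samples)  # 4 bytes per sample
as_short = pack_pcm16(samples)                       # 2 bytes per sample
print(len(as_int), len(as_short))
```

With the 'i' format the buffer is twice as long as a 16-bit PCM stream expects, so the writer would only consume the first half and interpret it with the wrong sample width.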
Second, see the code modification below. Expose WAVEFORMATEX via SWIG by editing aviwriter.i, then call writer.SetAudioFormat(wfx) from Python.
In my tests, the memset() was not necessary. From Python you can manually set the field cbSize to zero; that should be enough. The other six fields are mandatory, so you'll be setting them anyway. It looks like this struct isn't meant to be revised in the future, because it has no struct-size field, and the semantics of cbSize (appending arbitrary data to the end of the struct) would conflict with an extension anyway.
aviwriter.i:
%inline %{
typedef unsigned short WORD;
typedef unsigned long DWORD;
typedef struct tWAVEFORMATEX
{
    WORD wFormatTag; /* format type */
    WORD nChannels; /* number of channels (i.e. mono, stereo...) */
    DWORD nSamplesPerSec; /* sample rate */
    DWORD nAvgBytesPerSec; /* for buffer estimation */
    WORD nBlockAlign; /* block size of data */
    WORD wBitsPerSample; /* Number of bits per sample of mono data */
    WORD cbSize; /* The count in bytes of the size of
                    extra information (after cbSize) */
} WAVEFORMATEX;
%}
test.py:
from aviwriter import WAVEFORMATEX
later in test.py:
wfx = WAVEFORMATEX()
wfx.wFormatTag = 1 #WAVE_FORMAT_PCM
wfx.nChannels = 1
wfx.nSamplesPerSec = sampleRate
wfx.nAvgBytesPerSec = sampleRate * 2
wfx.nBlockAlign = 2
wfx.wBitsPerSample = 16
writer.SetAudioFormat(wfx)
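The two derived fields follow from the others: for PCM, nBlockAlign is nChannels * wBitsPerSample / 8, and nAvgBytesPerSec is nSamplesPerSec * nBlockAlign. A sketch that computes them, so the values stay consistent if you change the channel count or bit depth (pcm_format_fields is a hypothetical helper of mine, and it returns a plain dict rather than the SWIG-generated WAVEFORMATEX):

```python
def pcm_format_fields(sample_rate, channels=1, bits_per_sample=16):
    """Return consistent WAVEFORMATEX field values for uncompressed PCM."""
    block_align = channels * bits_per_sample // 8  # bytes per sample frame
    return {
        "wFormatTag": 1,  # WAVE_FORMAT_PCM
        "nChannels": channels,
        "nSamplesPerSec": sample_rate,
        "nAvgBytesPerSec": sample_rate * block_align,
        "nBlockAlign": block_align,
        "wBitsPerSample": bits_per_sample,
        "cbSize": 0,  # no extra format data follows the struct
    }

fields = pcm_format_fields(44100)
print(fields["nBlockAlign"], fields["nAvgBytesPerSec"])
```

Assuming the wrapped struct exposes its fields as ordinary attributes, you could copy the values over with something like: for name, value in fields.items(): setattr(wfx, name, value).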
Notes on SWIG: Since aviwriter.h only provides a forward declaration of tWAVEFORMATEX, SWIG has no further information about the struct, which prevents get/set wrappers from being generated. You could ask SWIG to wrap a Windows header declaring the struct ... and open a can of worms, because those headers are too large and complex and expose further problems. Instead, you can define WAVEFORMATEX individually, as done above. The C++ types WORD and DWORD are still not declared, though. Including the SWIG file windows.i only creates wrappers which, for example, allow the string "WORD" in a Python script file to be understood as indicating 16-bit data in memory. But that doesn't declare the WORD type from a C++ perspective. To resolve this, adding typedefs for WORD and DWORD in the %inline statement in aviwriter.i forces SWIG to copy that code directly into the generated C++ wrapper file, making the declarations available. This also triggers get/set wrappers to be generated. Alternatively, you could put that inlined code inside aviwriter.h if you're willing to edit it.
In short, the idea is to fully enclose all types in standalone headers or declaration blocks. Remember that .i and .h files serve separate purposes (wrappers and data conversion, versus the functionality being wrapped). Similarly, notice how aviwriter.h is included twice in aviwriter.i: once to trigger the generation of the wrappers needed for Python, and once to declare types in the generated wrapper code needed for C++.
From what I saw in the code, you don't initialize the audio format. The original test.cpp does this by calling writer.SetAudioFormat(&wfx); at line 44, which sets it to mono 44.1 kHz PCM. I believe that since you don't initialize it, a blank header is written and the video player is unable to open the unknown format.
Update
As you only need to pass the binary header structure, you don't need to declare the structure in aviwriter.i. You can use the following code directly from Python:
import struct
from collections import namedtuple
WAVEFORMATEX = namedtuple('WAVEFORMATEX', 'wFormatTag nChannels nSamplesPerSec nAvgBytesPerSec nBlockAlign wBitsPerSample cbSize')
wfx = WAVEFORMATEX(
    wFormatTag = 1,  # WAVE_FORMAT_PCM
    nChannels = 1,
    nSamplesPerSec = sampleRate,
    nAvgBytesPerSec = sampleRate * 2,
    nBlockAlign = 2,
    wBitsPerSample = 16,
    cbSize = 0)
# '<' = little-endian, no padding; H = 16-bit WORD, I = 32-bit DWORD
audio_format_obj = struct.pack('<HHIIHHH', *wfx)
writer.SetAudioFormat(audio_format_obj)
This will automatically solve your second and third concerns.
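As a sanity check on the layout, the packed header should come out at 18 bytes, which is the size of WAVEFORMATEX in the Windows headers (the struct is declared with 1-byte packing). The sketch below only verifies the bytes on the Python side; whether writer.SetAudioFormat actually accepts a bytes object depends on the typemaps in your aviwriter.i, which is an assumption here rather than something I have confirmed.

```python
import struct

# Two WORDs, two DWORDs, three WORDs, little-endian with no padding.
WAVEFORMATEX_LAYOUT = '<HHIIHHH'

sample_rate = 44100
header = struct.pack(WAVEFORMATEX_LAYOUT, 1, 1, sample_rate,
                     sample_rate * 2, 2, 16, 0)
print(len(header))

# Round-trip to confirm each field lands where expected.
fields = struct.unpack(WAVEFORMATEX_LAYOUT, header)
print(fields)
```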
As for memset(&wfx,0,sizeof(wfx)); this is just an old C idiom for zeroing every field in the structure before filling it in.
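If you ever do want the same effect from Python, ctypes has a direct counterpart. This is a minimal sketch using a stand-in ctypes structure (WaveFormatEx is a hypothetical name of mine, not the SWIG-generated class). Note that ctypes structures start out zeroed anyway, so the memset only matters when you reuse an instance:

```python
import ctypes

class WaveFormatEx(ctypes.Structure):
    # Stand-in mirroring the WAVEFORMATEX fields, for illustration only.
    _fields_ = [
        ("wFormatTag", ctypes.c_uint16),
        ("nChannels", ctypes.c_uint16),
        ("nSamplesPerSec", ctypes.c_uint32),
        ("nAvgBytesPerSec", ctypes.c_uint32),
        ("nBlockAlign", ctypes.c_uint16),
        ("wBitsPerSample", ctypes.c_uint16),
        ("cbSize", ctypes.c_uint16),
    ]

wfx = WaveFormatEx()
wfx.nChannels = 2
wfx.nSamplesPerSec = 44100

# Same call shape as the C version: memset(&wfx, 0, sizeof(wfx));
ctypes.memset(ctypes.addressof(wfx), 0, ctypes.sizeof(wfx))
print(wfx.nChannels, wfx.nSamplesPerSec)
```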
P.S. As @MichaelsonBritt mentioned, your audio data format has to match the declaration in the header. But instead of converting to 16-bit short, you can declare 2 channels, so you will get stereo sound with one channel silent.
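To see what the stereo interpretation does to the bytes, the sketch below packs a few values as 32-bit ints and reads them back as interleaved 16-bit pairs (little-endian, as on Windows). It rests on some assumptions: the sample values must fit in 16 bits, and the header would need nChannels = 2 with nBlockAlign = 4 and nAvgBytesPerSec = sampleRate * 4 to match. Also, the second channel is not strictly silent, because it carries the sign extension of each value.

```python
import struct

samples = [1000, -1000, 0, -1]

# What the question's code writes: one 32-bit int per sample.
raw = struct.pack(f'<{len(samples)}i', *samples)

# What a 2-channel, 16-bit PCM reader sees in the same bytes.
pairs = struct.unpack(f'<{2 * len(samples)}h', raw)
first_channel = pairs[0::2]   # low 16 bits: the original values
second_channel = pairs[1::2]  # high 16 bits: 0 or -1 (sign extension)

print(first_channel)
print(second_channel)
```

So the second channel flips between 0 and -1 with the sign of the signal, which is one step of the 16-bit range and practically inaudible, but converting to 16-bit mono is the cleaner fix.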