How to use DSP to speed-up a code on OMAP?

Question

I'm working on a video codec for OMAP3430. I already have code written in C++, and I try to modify/port certain parts of it to take advantage of the DSP (the SDK (OMAP ZOOM3430 SDK) I have has an additional DSP).

I tried to port a small for loop which is running over a very small amount of data (~250 bytes), but about 2M times on different data. But the overload from the communication between CPU and DSP is much more than the gain (if I have any).

I assume this task is much like optimizing a code for the GPU's in normal computers. My question is porting what kind of parts would be beneficial? How do GPU programmers take care of such tasks?

edit:

GPP application allocates a buffer of size 0x1000 bytes.
GPP application invokes DSPProcessor_ReserveMemory to reserve a DSP virtual address space for each allocated buffer using a size that is 4K greater than the allocated buffer to account for automatic page alignment. The total reservation size must also be aligned along a 4K page boundary.
GPP application invokes DSPProcessor_Map to map each allocated buffer to the DSP virtual address spaces reserved in the previous step.
GPP application prepares a message to notify the DSP execute phase of the base address of virtual address space, which have been mapped to a buffer allocated on the GPP. GPP application uses DSPNode_PutMessage to send the message to the DSP.
GPP invokes memcpy to copy the data to be processed into the shared memory.
GPP application invokes DSPProcessor_FlushMemory to ensure that the data cache has been flushed.
GPP application prepares a message to notify the DSP execute phase that it has finished writing to the buffer and the DSP may now access the buffer. The message also contains the amount of data written to the buffer so that the DSP will know just how much data to copy. The GPP uses DSPNode_PutMessage to send the message to the DSP and then invokes DSPNode_GetMessage to wait to hear a message back from the DSP.

After these the execution of DSP program starts, and DSP notifies the GPP with a message when it finishes the processing. Just to try I don't put any processing inside the DSP program. I just send a "processing finished" message back to the GPP. And this still consumes a lot of time. Could that be because of the internal/external memory usage, or is it merely because of the communication overload?

Mark · Accepted Answer

The OMAP3430 does not have an on board DSP, it has a IVA2+ Video/Audio decode engine hooked to the system bus and the Cortex core has DSP-like SIMD instructions. The GPU on the OMAP3430 is a PowerVR SGX based unit. While it does have programmable shaders and i don't believe there is any support for general purpose programming ala CUDA or OpenCL. I could be wrong but I've never heard of such support

If your using the IVA2+ encode/decode engine that is on board you need to use the proper libraries for this unit and it only supports specific codecs from that I know. Are you trying to write your own library to this module?

If your using the Cortex's built in DSPish (SIMD instructions), post some code.

If your dev board has some extra DSP on it, what is the DSP and how is it connected to the OMAP?

As to the desktop GPU question, in the case of video decode you use the vender supplied function libraries to make calls to the hardware, there are several, VDAPU for Nvidia on linux, similar libraries on windows(PureViewHD I think its called). ATI also has both linux and windows libraries for their on board decode engines, i don't know the names.

How to use DSP to speed-up a code on OMAP?

Tags:

c++

c

embedded

signal-processing

omap

edit:

Can Bal

1 Answers

Mark

Recent Activity

Donate For Us

How to use DSP to speed-up a code on OMAP?

Tags:

c++

c

embedded

signal-processing

omap

edit:

Can Bal

1 Answers

Mark

Related questions

Recent Activity

Donate For Us