
GCC 4.7.2 Optimization Problems

Summary

I'm porting ST's USB OTG Library to a custom STM32F4 board using the latest version of Sourcery CodeBench Lite toolchain (GCC arm-none-eabi 4.7.2).

When I compile the code with -O0, the program runs fine. When I compile with -O1 or -O2, it fails. By "fails" I mean it just stops: no hard fault, nothing. (Obviously it is doing something, but I don't have an emulator to debug with and find out, sorry. My hard fault handler is not being called.)

Details

I'm trying to make a call to the following function...

void USBD_Init(USB_OTG_CORE_HANDLE *pdev,
           USB_OTG_CORE_ID_TypeDef coreID, 
           USBD_DEVICE *pDevice,                  
           USBD_Class_cb_TypeDef *class_cb, 
           USBD_Usr_cb_TypeDef *usr_cb);

...but it doesn't seem to make it into the function body. (Is this a symptom of "stack-smashing"?)

The structures passed to this function have the following definitions:

typedef struct USB_OTG_handle
{
  USB_OTG_CORE_CFGS    cfg;
  USB_OTG_CORE_REGS    regs;
  DCD_DEV     dev;
}
USB_OTG_CORE_HANDLE , *PUSB_OTG_CORE_HANDLE;

typedef enum
{
  USB_OTG_HS_CORE_ID = 0,
  USB_OTG_FS_CORE_ID = 1
}USB_OTG_CORE_ID_TypeDef;

typedef struct _Device_TypeDef
{
  uint8_t  *(*GetDeviceDescriptor)( uint8_t speed , uint16_t *length);  
  uint8_t  *(*GetLangIDStrDescriptor)( uint8_t speed , uint16_t *length); 
  uint8_t  *(*GetManufacturerStrDescriptor)( uint8_t speed , uint16_t *length);  
  uint8_t  *(*GetProductStrDescriptor)( uint8_t speed , uint16_t *length);  
  uint8_t  *(*GetSerialStrDescriptor)( uint8_t speed , uint16_t *length);  
  uint8_t  *(*GetConfigurationStrDescriptor)( uint8_t speed , uint16_t *length);  
  uint8_t  *(*GetInterfaceStrDescriptor)( uint8_t speed , uint16_t *length);   
} USBD_DEVICE, *pUSBD_DEVICE;

typedef struct _Device_cb
{
  uint8_t  (*Init)         (void *pdev , uint8_t cfgidx);
  uint8_t  (*DeInit)       (void *pdev , uint8_t cfgidx);
 /* Control Endpoints*/
  uint8_t  (*Setup)        (void *pdev , USB_SETUP_REQ  *req);  
  uint8_t  (*EP0_TxSent)   (void *pdev );    
  uint8_t  (*EP0_RxReady)  (void *pdev );  
  /* Class Specific Endpoints*/
  uint8_t  (*DataIn)       (void *pdev , uint8_t epnum);   
  uint8_t  (*DataOut)      (void *pdev , uint8_t epnum); 
  uint8_t  (*SOF)          (void *pdev); 
  uint8_t  (*IsoINIncomplete)  (void *pdev); 
  uint8_t  (*IsoOUTIncomplete)  (void *pdev);   
  uint8_t  *(*GetConfigDescriptor)( uint8_t speed , uint16_t *length);  
  uint8_t  *(*GetUsrStrDescriptor)( uint8_t speed ,uint8_t index,  uint16_t *length);   

} USBD_Class_cb_TypeDef;

typedef struct _USBD_USR_PROP
{
  void (*Init)(void);   
  void (*DeviceReset)(uint8_t speed); 
  void (*DeviceConfigured)(void);
  void (*DeviceSuspended)(void);
  void (*DeviceResumed)(void);  

  void (*DeviceConnected)(void);  
  void (*DeviceDisconnected)(void);    

}
USBD_Usr_cb_TypeDef;

I've tried to include all the source code relevant to this problem. If you want to see the entire source code you can download it here: http://www.st.com/st-web-ui/static/active/en/st_prod_software_internet/resource/technical/software/firmware/stm32_f105-07_f2_f4_usb-host-device_lib.zip

Solutions Attempted

I tried playing with #pragma GCC optimize ("O0"), __attribute__((optimize("O0"))), and declaring certain definitions as volatile, but nothing worked. I'd rather just modify the code to make it play nicely with the optimizer anyway.

Question

How can I modify this code to make it play nice with GCC's optimizer?

asked Mar 14 '13 00:03 by Verax


1 Answer

There doesn't seem to be anything wrong with the code you showed, so this answer will be more general.

What are typical errors with "close to hardware" code that works properly unoptimized and fails with higher optimization levels?

Think about the differences between -O0 and -O1/-O2: the optimization strategies include, among others, loop unrolling (which doesn't seem dangerous), holding values in registers as long as possible, dead code elimination, and instruction reordering.

improved register usage typically leads to problems at higher optimization levels if hardware registers that can change at any time aren't properly declared volatile (see PokyBrain's comment). The optimized code will try to hold values in CPU registers as long as possible, so your program fails to notice changes on the hardware side. Make sure to declare hardware registers volatile.
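As a minimal sketch of the polling case described above, the following uses a plain variable as a stand-in for a status register so it compiles on any host. The names STATUS_REG, READY_FLAG and wait_until_ready are hypothetical and not part of ST's library; on a real target the register would be a fixed peripheral address from the reference manual.

```c
#include <stdint.h>

/* Stand-in for a memory-mapped status register (assumption for this sketch).
   On real hardware this would be a volatile-qualified pointer to a fixed
   peripheral address instead of a variable. */
static volatile uint32_t fake_status_reg;
#define STATUS_REG  fake_status_reg
#define READY_FLAG  (1u << 0)   /* hypothetical bit position */

/* Poll until READY_FLAG is set or the poll budget is used up.
   Because STATUS_REG is volatile, every iteration performs a fresh load.
   Without volatile, an optimizing compiler may load the value once and
   keep testing the copy held in a CPU register.
   Returns 1 if the flag was seen, 0 on timeout. */
int wait_until_ready(uint32_t max_polls)
{
    while (max_polls-- > 0) {
        if (STATUS_REG & READY_FLAG) {
            return 1;
        }
    }
    return 0;
}
```

The bounded loop is a design choice for the sketch: an unbounded wait on a flag that never changes is exactly the kind of silent hang that is hard to diagnose without a debugger.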

dead code elimination will likely lead to problems if you need to read a hardware register purely for its side effect on the hardware (which the compiler knows nothing about) and don't do anything with the value you just read. Such a read can be optimized away unless it goes through a volatile-qualified lvalue; the compiler may warn about the unused value, though. Make sure dummy reads access a volatile register, and cast the result to (void) to make the intent explicit and silence the warning.
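A dummy read of that kind might look like the sketch below. DATA_REG and discard_pending_data are hypothetical names, and a plain variable again stands in for the register so the code compiles off-target; it cannot reproduce a real read-to-clear side effect.

```c
#include <stdint.h>

/* Stand-in for a data register whose read has a side effect on real
   hardware, such as clearing a "data pending" flag (assumption). */
static volatile uint32_t fake_data_reg = 0x5Au;
#define DATA_REG fake_data_reg

/* Read the register only for its side effect and discard the value.
   In C, GCC performs the load because the lvalue is volatile; the
   (void) cast documents the intent and avoids an unused-value warning. */
void discard_pending_data(void)
{
    (void)DATA_REG;
}
```

If the volatile qualifier were missing from the register definition, the statement would have no observable effect and the compiler could drop it entirely.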

instruction reordering: if you need to access different hardware registers in a certain sequence to produce the desired results, keep in mind that the compiler only preserves the order of volatile accesses relative to each other. Accesses to ordinary (non-volatile) memory, such as a buffer the peripheral is about to read, may be moved across them as the compiler sees fit, even if the hardware registers themselves are properly declared volatile. You will need to insert memory barriers into your code to enforce the required access sequence (__asm__ __volatile__("" ::: "memory");). Make sure to add memory barriers where needed.
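The barrier placement can be sketched as follows, assuming a hypothetical peripheral that reads tx_buffer once START_REG is written. COMPILER_BARRIER, tx_buffer, START_REG and start_transfer are illustrative names only, and the register is simulated with a variable so the code compiles on a host with GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* Compiler-only barrier: GCC will not move memory accesses across it. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

static uint8_t tx_buffer[8];               /* ordinary, non-volatile memory */
static volatile uint32_t fake_start_reg;   /* stand-in for a start register */
#define START_REG fake_start_reg

/* Fill the buffer, then tell the (hypothetical) peripheral to start. */
void start_transfer(const uint8_t *src, size_t len)
{
    if (len > sizeof tx_buffer) {
        len = sizeof tx_buffer;            /* clamp to the buffer size */
    }
    for (size_t i = 0; i < len; i++) {
        tx_buffer[i] = src[i];
    }
    /* Without the barrier, the non-volatile buffer stores are not ordered
       with respect to the volatile register write below. */
    COMPILER_BARRIER();
    START_REG = (uint32_t)len;
}
```

This constrains only the compiler. Where the hardware itself may reorder or buffer accesses, a CPU barrier such as the CMSIS __DSB() or __DMB() intrinsic may be needed in addition.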

Although unlikely, it might still be the case that you found a compiler bug. Optimization is not an easy job, especially when it comes close to hardware. It might be worth a peek into the gcc bug database.

If all this doesn't help, you sometimes just can't avoid digging into the generated assembly code to make sure it's doing what it is supposed to do.

answered Oct 11 '22 20:10 by mfro