Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

GCC to emit ARM idiv instructions

Tags:

gcc

arm

How can I instruct gcc to emit idiv (integer division, udiv and sdiv) instructions for arm application processors?

So far only way I can come up with is to use -mcpu=cortex-a15 with gcc 4.7.

$cat idiv.c
int test_idiv(int a, int b) {
    return a / b;
}

On gcc 4.7 (bundled with Android NDK r8e)

$gcc -O2 -mcpu=cortex-a15 -c idiv.c
$objdump -S idiv.o

00000000 <test_idiv>:
   0:   e710f110    sdiv    r0, r0, r1
   4:   e12fff1e    bx  lr

Even this one gives idiv.c:1:0: warning: switch -mcpu=cortex-a15 conflicts with -march=armv7-a switch [enabled by default] if you add -march=armv7-a next to -mcpu=cortex-a15 and doesn't emit idiv instruction.

$gcc -O2 -mcpu=cortex-a15 -march=armv7-a -c idiv.c

idiv.c:1:0: warning: switch -mcpu=cortex-a15 conflicts with -march=armv7-a switch [enabled by default]

$objdump -S idiv.o
00000000 <test_idiv>:
   0:   e92d4008    push    {r3, lr}
   4:   ebfffffe    bl  0 <__aeabi_idiv>
   8:   e8bd8008    pop {r3, pc}

On gcc 4.6 (bundled with Android NDK r8e) it doesn't emit idiv instructions at all but recognizes -mcpu=cortex-a15 also doesn't complain to -mcpu=cortex-a15 -march=armv7-a combination.

Afaik idiv is optional on armv7, so there should be a cleaner way to instruct gcc to emit them but how?

like image 668
auselen Avatar asked Apr 03 '13 08:04

auselen


1 Answers

If the instruction is not in the machine descriptions, then I doubt that gcc will emit code. Note1

You can always use inline-assembler to get the instruction if the compiler is not supporting it.Note2 Since your op-code is fairly rare/machine specific, there is probably not so much effort to get it in the gcc source. Especially, there are arch and tune/cpu flags. The tune/cpu is for a more specific machine, but the arch is suppose to allow all machines in that architecture. This op-code seems to break that rule, if I understand.

For gcc 4.6.2, it looks like thumb2 and cortex-r4 are cues to use these instructions and as you have noted with gcc 4.7.2, the cortex-a15 seems to be added to use these instructions. With gcc 4.7.2, the thumb2.md file no longer has udiv/sdiv. However, it might be included somewhere else; I am not 100% familiar with all the machine description language. It also seems that cortex-a7, cortex-a15, and cortex-r5 may enable these instructions with 4.7.2. Note3

This doesn't answer the question directly, but it does give some information/path to get the answer. You can compile the module with -mcpu=cortex-r4, although this may produce linker issues. Also, there is int my_idiv(int a, int b) __attribute__ ((__target__ ("arch=cortexe-r4")));, where you can specify on a per-function basis the machine-description used by the code generator. I haven't used any of these myself, but they are only possibilities to try. Generally you don't want to keep the wrong machine as it could generate sub-optimal (and possibly illegal) op-codes. You will have to experiment and maybe then provide the real answer.

Note1: This is for a stock gcc 4.6.2 and 4.7.2. I don't know if your Android compiler has patches.

gcc-4.6.2/gcc/config/arm$ grep [ius]div *.md
arm.md: "...,sdiv,udiv,other"
cortex-r4.md:;; We guess that division of A/B using sdiv or udiv, on average, 
cortex-r4.md:;; This gives a latency of nine for udiv and ten for sdiv.
cortex-r4.md:(define_insn_reservation "cortex_r4_udiv" 9
cortex-r4.md:       (eq_attr "insn" "udiv"))
cortex-r4.md:(define_insn_reservation "cortex_r4_sdiv" 10
cortex-r4.md:       (eq_attr "insn" "sdiv"))
thumb2.md:  "sdiv%?\t%0, %1, %2"
thumb2.md:   (set_attr "insn" "sdiv")]
thumb2.md:(define_insn "udivsi3"
thumb2.md:      (udiv:SI (match_operand:SI 1 "s_register_operand"  "r")
thumb2.md:  "udiv%?\t%0, %1, %2"
thumb2.md:   (set_attr "insn" "udiv")]
gcc-4.7.2/gcc/config/arm$ grep -i [ius]div *.md
arm.md:  "...,sdiv,udiv,other"
arm.md:  "TARGET_IDIV"
arm.md:  "sdiv%?\t%0, %1, %2"
arm.md:   (set_attr "insn" "sdiv")]
arm.md:(define_insn "udivsi3"
arm.md: (udiv:SI (match_operand:SI 1 "s_register_operand"  "r")
arm.md:  "TARGET_IDIV"
arm.md:  "udiv%?\t%0, %1, %2"
arm.md:   (set_attr "insn" "udiv")]
cortex-a15.md:(define_insn_reservation "cortex_a15_udiv" 9
cortex-a15.md:       (eq_attr "insn" "udiv"))
cortex-a15.md:(define_insn_reservation "cortex_a15_sdiv" 10
cortex-a15.md:       (eq_attr "insn" "sdiv"))
cortex-r4.md:;; We guess that division of A/B using sdiv or udiv, on average, 
cortex-r4.md:;; This gives a latency of nine for udiv and ten for sdiv.
cortex-r4.md:(define_insn_reservation "cortex_r4_udiv" 9
cortex-r4.md:       (eq_attr "insn" "udiv"))
cortex-r4.md:(define_insn_reservation "cortex_r4_sdiv" 10
cortex-r4.md:       (eq_attr "insn" "sdiv"))

Note2: See pre-processor as Assembler if gcc is passing options to gas that prevent use of the udiv/sdiv instructions. For example, you can use asm(" .long <opcode>\n"); where opcode is some token pasted stringified register encode macro output. Also, you can annotate your assembler to specify changes in the machine. So you can temporarily lie and say you have a cortex-r4, etc.

Note3:

gcc-4.7.2/gcc/config/arm$ grep -E 'TARGET_IDIV|arm_arch_arm_hwdiv|FL_ARM_DIV' *
arm.c:#define FL_ARM_DIV    (1 << 23)         /* Hardware divide (ARM mode).  */
arm.c:int arm_arch_arm_hwdiv;
arm.c:  arm_arch_arm_hwdiv = (insn_flags & FL_ARM_DIV) != 0;
arm-cores.def:ARM_CORE("cortex-a7",  cortexa7,  7A, ... FL_ARM_DIV
arm-cores.def:ARM_CORE("cortex-a15", cortexa15, 7A, ... FL_ARM_DIV
arm-cores.def:ARM_CORE("cortex-r5",  cortexr5,  7R, ... FL_ARM_DIV
arm.h:  if (TARGET_IDIV)                                \
arm.h:#define TARGET_IDIV               ((TARGET_ARM && arm_arch_arm_hwdiv) \
arm.h:extern int arm_arch_arm_hwdiv;
arm.md:  "TARGET_IDIV"
arm.md:  "TARGET_IDIV"
like image 88
artless noise Avatar answered Nov 04 '22 05:11

artless noise