Keep target address of load in register until instruction is retired

Question

I want to use Precise Event-Based Sampling (PEBS) to record all the addresses of specific events (say cache misses for example), on a XeonE5 Sandy Bridge.

However, the Performance Analysis Guide for Core^TM i7 Processor and Intel® XeonTM 5500 processors, p.24, contains the following warning:

As the PEBS mechanism captures the values of the register at completion of the instruction, the dereferenced address for the following type of load instruction (Intel asm convention) cannot be reconstructed.
MOV RAX, [RAX+const]
This kind of instruction is mostly associated with pointer chasing
mystruc = mystruc->next;
This is a significant shortcoming of this approach to capturing memory instruction addresses.

I have a number of load instructions of that form in my program according to objdump. Is there any way I can avoid those?

As this is a intel specific problem, the solution does not have to be portable in any way, it just has to work. My code is written in C, and I'm ideally looking for a compiler-level solution (gcc or icc) but any suggestions are welcome.

Some examples:

mov    0x18(%rdi),%rdi

mov    (%rcx,%rax,8),%rax

In both cases, after the instruction retired (thus when I look at the register values to figure out where I loaded to/from) the values of the addresses (respectively %rdi + 18 and %rcx + 8 * %rax in these examples) are overwritten by the result of the mov.

The only way I can think of right now is to use the & (ampersand) assembler constraint. That would mean I would have to go through my code wherever such an instruction appears, and replace every pointer dereferencing mystruc = mystruc->next; with something like:
asm volatile("mov (%1),%0" : "=&r" (mystruc) : "r" (&(mystruc->next)))

However it is a very pedestrian approach and there are probably cases that are more complicated than a pointer inside a struct. I understand this is basically increasing register pressure so something compilers are actively trying to avoid. Is there any other way to do this?

Craig Estey · Accepted Answer

What you'd like to do is transform all instructions of the form:

mov    (%rcx,%rax,8),%rax

Into:

mov    (%rcx,%rax,8),%r11
mov    %r11,%rax

This can be done much more easily by modifying the assembler source generated by the compiler. Below is a perl script that will do all the necessary transformations by reading and modifying the .s file.

Just change the build to produce .s files instead of .o files, apply the script, and then generate the .o with either as or gcc

Here's the actual script. I've tested it on a few of my own sources, following the build procedure in the comments below.

The script has the following features:

scans and locates all function definitions
identifies all registers used in a given function
locates all return points for the function
selects a temporary register to use based upon the function's register usage (i.e. it will use a temporary register that is not already in use by the function)
replaces all "troublesome" instructions with a two instruction sequence
tries to use unused temporary registers (e.g. %r11 or unused argument registers) before trying to use a callee saved register
if the selected register is callee saved, will add push to function prolog and pop to function [multiple] ret statements
maintains a log of all analysis and transformations and appends this as comments to the output .s file

#!/usr/bin/perl
# pebsfix/pebsfixup -- fix assembler source for PEBS usage
#
# command line options:
#   "-a" -- use only full 64 bit targets
#   "-l" -- do _not_ use lea
#   "-D[diff-file]" -- show differences (default output: "./DIFF")
#   "-n10" -- do _not_ use register %r10 for temporary (default is use it)
#   "-o" -- overwrite input files (can be multiple)
#   "-O<outfile>" -- output file (only one .s input allowed)
#   "-q" -- suppress warnings
#   "-T[lvl]" -- debug trace
#
# "-o" and "-O" are mutually exclusive
#
# command line script test options:
#   "-N[TPA]" -- disable temp register types [for testing]
#   "-P" -- force push/pop on all functions
#
# command line arguments:
#   1-- list of .s files to process [or directory to search]
#       for a given file "foo.s", output is to "foo.TMP"
#       if (-o is given, "foo.TMP" is renamed to "foo.s")
#
# suggested usage:
#   change build to produce .s files
#   FROM:
#     cc [options] -c foo.c
#   TO:
#     cc [options] -c -S foo.c
#     pebsfixup -o foo.s
#     cc -c foo.s
#
# suggested compiler options:
# [probably only really needed if push/pop required. use -NP to verify]
#   (1) use either of
#       -O2 -fno-optimize-sibling-calls
#       -O1
#   (2) use -mno-omit-leaf-frame-pointer
#   (3) use -mno-red-zone [probably not required in any case]
#
# NOTES:
#   (1) red zones are only really useful for leaf functions (i.e. if fncA calls
#       fncB, fncA's red zone would be clobbered)
#   (2) pushing onto the stack isn't a problem if there is a formal stack frame
#   (3) the push is okay if the function has no more than six arguments (i.e.
#       does _not_ use positive offsets from %rsp to access them)

#pragma pgmlns
use strict qw(vars subs);

our $pgmtail;

our $opt_a;
our $opt_T;
our $opt_D;
our $opt_l;
our $opt_n10;
our $opt_N;
our $opt_P;
our $opt_q;
our $opt_o;
our $opt_O;
our $opt_s;

our @reguse;
our %reguse_tobase;
our %reguse_isbase;
our $regusergx;

our @regtmplist;
our %regtmp_type;

our $diff;

our $sepflg;
our $fatal;
our @cmtprt;

master(@ARGV);
exit(0);

# master -- master control
sub master
{
    my(@argv) = @_;
    my($xfsrc);
    my($file,@files);
    my($bf);

    $pgmtail = "pebsfixup";

    optget(\@argv);

    # define all known/usable registers
    regusejoin();

    # define all registers that we may use as a temporary
    regtmpall();

    if (defined($opt_D)) {
        unlink($opt_D);
    }

    # show usage
    if (@argv <= 0) {
        $file = $0;
        open($xfsrc,"<$file") ||
            sysfault("$pgmtail: unable to open '%s' -- $!
",$file);

        while ($bf = <$xfsrc>) {
            chomp($bf);
            next if ($bf =~ /^#!/);
            last unless ($bf =~ s/^#//);
            $bf =~ s/^# ?//;
            print($bf,"
");
        }

        close($xfsrc);
        exit(1);
    }

    foreach $file (@argv) {
        if (-d $file) {
            dodir(\@files,$file);
        }
        else {
            push(@files,$file);
        }
    }

    if (defined($opt_O)) {
        sysfault("$pgmtail: -O may have only one input file
")
            if (@files != 1);
        sysfault("$pgmtail: -O and -o are mutually exclusive
")
            if ($opt_o);
    }

    foreach $file (@files) {
        dofile($file);
    }

    if (defined($opt_D)) {
        exec("less",$opt_D);
    }
}

# dodir -- process directory
sub dodir
{
    my($files,$dir) = @_;
    my($file,@files);

    @files = (`find $dir -type f -name '*.s'`);
    foreach $file (@files) {
        chomp($file);
        push(@$files,$file);
    }
}

# dofile -- process file
sub dofile
{
    my($file) = @_;
    my($ofile);
    my($xfsrc);
    my($xfdst);
    my($bf,$lno,$outoff);
    my($fixoff);
    my($lhs,$rhs);
    my($xop,$arg);
    my($ix);
    my($sym,$val,$typ);
    my(%sym_type);
    my($fnc,$fnx,%fnx_lookup,@fnxlist);
    my($retlist);
    my($uselook,@uselist,%avail);
    my($fixreg,$fixrtyp);
    my($sixlist);
    my($fix,$fixlist);
    my($fixtot);
    my(@fix);
    my(@outlist);
    my($relaxflg);
    my($cmtchr);

    undef($fatal);
    undef(@cmtprt);

    msgprt("
")
        if ($sepflg);
    $sepflg = 1;
    msgprt("$pgmtail: processing %s ...
",$file);

    $cmtchr = "#";

    cmtprt("%s
","-" x 78);
    cmtprt("FILE: %s
",$file);

    # get the output file
    $ofile = $file;
    sysfault("$pgmtail: bad suffix -- file='%s'
",$file)
        unless ($ofile =~ s/[.]s$//);
    $ofile .= ".TMP";

    # use explicit output file
    if (defined($opt_O)) {
        $ofile = $opt_O;
        sysfault("$pgmtail: output file may not be input file -- use -o instead
")
            if ($ofile eq $file);
    }

    open($xfsrc,"<$file") ||
        sysfault("$pgmtail: unable to open '%s' -- $!
",$file);

    $lno = 0;
    while ($bf = <$xfsrc>) {
        chomp($bf);
        $bf =~ s/\s+$//;

        $outoff = $lno;
        ++$lno;

        push(@outlist,$bf);

        # clang adds comments
        $ix = index($bf,"#");
        if ($ix >= 0) {
            $bf = substr($bf,0,$ix);
            $bf =~ s/\s+$//;
        }

        # look for ".type blah, @function"
        # NOTE: this always comes before the actual label line [we hope ;-)]
        if ($bf =~ /^\s+[.]type\s+([^,]+),\s*(\S+)/) {
            ($sym,$val) = ($1,$2);
            $val =~ s/^\@//;
            $sym_type{$sym} = $val;
            cmtprt("
");
            cmtprt("TYPE: %s --> %s
",$sym,$val);
            next;
        }

        # look for "label:"
        if ($bf =~ /^([a-z_A-Z][a-z_A-Z0-9]*):$/) {
            $sym = $1;
            next if ($sym_type{$sym} ne "function");

            $fnc = $sym;
            cmtprt("FUNCTION: %s
",$fnc);

            $fnx = {};
            $fnx_lookup{$sym} = $fnx;
            push(@fnxlist,$fnx);

            $fnx->{fnx_fnc} = $fnc;
            $fnx->{fnx_outoff} = $outoff;

            $uselook = {};
            $fnx->{fnx_used} = $uselook;

            $retlist = [];
            $fnx->{fnx_retlist} = $retlist;

            $fixlist = [];
            $fnx->{fnx_fixlist} = $fixlist;

            $sixlist = [];
            $fnx->{fnx_sixlist} = $sixlist;
            next;
        }

        # remember all registers used by function:
        while ($bf =~ /($regusergx)/gpo) {
            $sym = ${^MATCH};
            $val = $reguse_tobase{$sym};
            dbgprt(3,"dofile: REGUSE sym='%s' val='%s'
",$sym,$val);

            $uselook->{$sym} += 1;

            $uselook->{$val} += 1
                if ($val ne $sym);
        }

        # handle returns
        if ($bf =~ /^\s+ret/) {
            push(@$retlist,$outoff);
            next;
        }
        if ($bf =~ /^\s+rep[a-z]*\s+ret/) {
            push(@$retlist,$outoff);
            next;
        }

        # split up "movq 16(%rax),%rax" ...
        $ix = rindex($bf,",");
        next if ($ix < 0);

        # ... into "movq 16(%rax)"
        $lhs = substr($bf,0,$ix);
        $lhs =~ s/\s+$//;

        # check for "movq 16(%rsp)" -- this means that the function has/uses
        # more than six arguments (i.e. we may _not_ push/pop because it
        # wreaks havoc with positive offsets)
        # FIXME/CAE -- we'd have to adjust them by 8 which we don't do
        (undef,$rhs) = split(" ",$lhs);
        if ($rhs =~ /^(\d+)[(]%rsp[)]$/) {
            push(@$sixlist,$outoff);
            cmtprt("SIXARG: %s (line %d)
",$rhs,$lno);
        }

        # ... and "%rax"
        $rhs = substr($bf,$ix + 1);
        $rhs =~ s/^\s+//;

        # target must be a [simple] register [or source scan will blow up]
        # (e.g. we actually had "cmp %ebp,(%rax,%r14)")
        next if ($rhs =~ /[)]/);

        # ensure we have the "%" prefix
        next unless ($rhs =~ /^%/);

        # we only want the full 64 bit reg as target
        # (e.g. "mov (%rbx),%al" doesn't count)
        $val = $reguse_tobase{$rhs};
        if ($opt_a) {
            next if ($val ne $rhs);
        }
        else {
            next unless (defined($val));
        }

        # source operand must contain target [base] register
        next unless ($lhs =~ /$val/);
        ###cmtprt("1: %s,%s
",$lhs,$rhs);

        # source operand must be of the "right" type
        # FIXME/CAE -- we may need to revise this
        next unless ($lhs =~ /[(]/);
        cmtprt("NEEDFIX: %s,%s (line %d)
",$lhs,$rhs,$lno);

        # remember the place we need to fix for later
        $fix = {};
        push(@$fixlist,$fix);
        $fix->{fix_outoff} = $outoff;
        $fix->{fix_lhs} = $lhs;
        $fix->{fix_rhs} = $rhs;
    }

    close($xfsrc);

    # get total number of fixups
    foreach $fnx (@fnxlist) {
        $fixlist = $fnx->{fnx_fixlist};
        $fixtot += @$fixlist;
    }
    msgprt("$pgmtail: needs %d fixups
",$fixtot)
        if ($fixtot > 0);

    # fix each function
    foreach $fnx (@fnxlist) {
        cmtprt("
");
        cmtprt("FNC: %s
",$fnx->{fnx_fnc});

        $fixlist = $fnx->{fnx_fixlist};

        # get the fixup register
        ($fixreg,$fixrtyp) = regtmploc($fnx,$fixlist);

        # show number of return points
        {
            $retlist = $fnx->{fnx_retlist};
            cmtprt("  RET: %d
",scalar(@$retlist));
            last if (@$retlist >= 1);

            # NOTE: we display this warning because we may not be able to
            # handle all situations

            $relaxflg = (@$fixlist <= 0) || ($fixrtyp ne "P");
            last if ($relaxflg && $opt_q);

            errprt("$pgmtail: in file '%s'
",$file);
            errprt("$pgmtail: function '%s' has no return points
",
                $fnx->{fnx_fnc});
            errprt("$pgmtail: suggest recompile with correct options
");

            if (@$fixlist <= 0) {
                errprt("$pgmtail: working around because function needs no fixups
");
                last;
            }

            if ($fixrtyp ne "P") {
                errprt("$pgmtail: working around because fixup reg does not need to be saved
");
                last;
            }
        }

        # show stats on register usage in function
        $uselook = $fnx->{fnx_used};
        @uselist = sort(keys(%$uselook));
        cmtprt("  USED:
");
        %avail = %reguse_isbase;
        foreach $sym (@uselist) {
            $val = $uselook->{$sym};

            $typ = $regtmp_type{$sym};
            $typ = sprintf(" (TYPE: %s)",$typ)
                if (defined($typ));

            cmtprt("    %s used %d%s
",$sym,$val,$typ);
            $val = $reguse_tobase{$sym};
            delete($avail{$val});
        }

        # show function's available [unused] registers
        @uselist = keys(%avail);
        @uselist = sort(regusesort @uselist);
        if (@uselist > 0) {
            cmtprt("  AVAIL:
");
            foreach $sym (@uselist) {
                $typ = $regtmp_type{$sym};
                $typ = sprintf(" (TYPE: %s)",$typ)
                    if (defined($typ));
                cmtprt("    %s%s
",$sym,$typ);
            }
        }

        # skip over any functions that don't need fixing _and_ have a temp
        # register
        if (@$fixlist <= 0 && (! $opt_P)) {
            next if (defined($fixreg));
        }

        msgprt("$pgmtail: function %s
",$fnx->{fnx_fnc});

        # skip function because we don't have a fixup register but report it
        # here
        unless (defined($fixreg)) {
            $bf = (@$fixlist > 0) ? "FATAL" : "can be ignored -- no fixups needed";
            msgprt("$pgmtail: FIXNOREG (%s)
",$bf);
            cmtprt("  FIXNOREG (%s)
",$bf);
            next;
        }

        msgprt("$pgmtail: FIXREG --> %s (TYPE: %s)
",$fixreg,$fixrtyp);
        cmtprt("  FIXREG --> %s (TYPE: %s)
",$fixreg,$fixrtyp);

        foreach $fix (@$fixlist) {
            $outoff = $fix->{fix_outoff};

            undef(@fix);
            cmtprt("  FIXOLD %s
",$outlist[$outoff]);

            # original
            if ($opt_l) {
                $bf = sprintf("%s,%s",$fix->{fix_lhs},$fixreg);
                push(@fix,$bf);
                $bf = sprintf("	mov	%s,%s",$fixreg,$fix->{fix_rhs});
                push(@fix,$bf);
            }

            # use lea
            else {
                ($xop,$arg) = split(" ",$fix->{fix_lhs});
                $bf = sprintf("	lea		%s,%s",$arg,$fixreg);
                push(@fix,$bf);
                $bf = sprintf("	%s	(%s),%s",$xop,$fixreg,$fix->{fix_rhs});
                push(@fix,$bf);
            }

            foreach $bf (@fix) {
                cmtprt("  FIXNEW %s
",$bf);
            }

            $outlist[$outoff] = [@fix];
        }

        unless ($opt_P) {
            next if ($fixrtyp ne "P");
        }

        # fix the function prolog
        $outoff = $fnx->{fnx_outoff};
        $lhs = $outlist[$outoff];
        $rhs = sprintf("	push	%s",$fixreg);
        $bf = [$lhs,$rhs,""];
        $outlist[$outoff] = $bf;

        # fix the function return points
        $retlist = $fnx->{fnx_retlist};
        foreach $outoff (@$retlist) {
            $rhs = $outlist[$outoff];
            $lhs = sprintf("	pop	%s",$fixreg);
            $bf = ["",$lhs,$rhs];
            $outlist[$outoff] = $bf;
        }
    }

    open($xfdst,">$ofile") ||
        sysfault("$pgmtail: unable to open '%s' -- $!
",$ofile);

    # output all the assembler text
    foreach $bf (@outlist) {
        # ordinary line
        unless (ref($bf)) {
            print($xfdst $bf,"
");
            next;
        }

        # apply a fixup
        foreach $rhs (@$bf) {
            print($xfdst $rhs,"
");
        }
    }

    # output all our reasoning as comments at the bottom
    foreach $bf (@cmtprt) {
        if ($bf eq "") {
            print($xfdst $cmtchr,$bf,"
");
        }
        else {
            print($xfdst $cmtchr," ",$bf,"
");
        }
    }

    close($xfdst);

    # get difference
    if (defined($opt_D)) {
        system("diff -u $file $ofile >> $opt_D");
    }

    # install fixed/modified file
    {
        last unless ($opt_o || defined($opt_O));
        last if ($fatal);
        msgprt("$pgmtail: installing ...
");
        rename($ofile,$file);
    }
}

# regtmpall -- define all temporary register candidates
sub regtmpall
{

    dbgprt(1,"regtmpall: ENTER
");

    regtmpdef("%r11","T");

    # NOTES:
    # (1) see notes on %r10 in ABI at bottom -- should we use it?
    # (2) a web search on "shared chain" and "x86" only produces 28 results
    # (3) some gcc code uses it as an ordinary register
    # (4) so, use it unless told not to
    regtmpdef("%r10","T")
        unless ($opt_n10);

    # argument registers (a6-a1)
    regtmpdef("%r9","A6");
    regtmpdef("%r8","A5");
    regtmpdef("%rcx","A4");
    regtmpdef("%rdx","A3");
    regtmpdef("%rsi","A2");
    regtmpdef("%rdi","A1");

    # callee preserved registers
    regtmpdef("%r15","P");
    regtmpdef("%r14","P");
    regtmpdef("%r13","P");
    regtmpdef("%r12","P");

    dbgprt(1,"regtmpall: EXIT
");
}

# regtmpdef -- define usable temp registers
sub regtmpdef
{
    my($sym,$typ) = @_;

    dbgprt(1,"regtmpdef: SYM sym='%s' typ='%s'
",$sym,$typ);

    push(@regtmplist,$sym);
    $regtmp_type{$sym} = $typ;
}

# regtmploc -- locate temp register to fix problem
sub regtmploc
{
    my($fnx,$fixlist) = @_;
    my($sixlist);
    my($uselook);
    my($regrhs);
    my($fixcnt);
    my($coretyp);
    my($reglhs,$regtyp);

    dbgprt(2,"regtmploc: ENTER fnx_fnc='%s'
",$fnx->{fnx_fnc});

    $sixlist = $fnx->{fnx_sixlist};
    $fixcnt = @$fixlist;
    $fixcnt = 1
        if ($opt_P);

    $uselook = $fnx->{fnx_used};

    foreach $regrhs (@regtmplist) {
        dbgprt(2,"regtmploc: TRYREG regrhs='%s' uselook=%d
",
            $regrhs,$uselook->{$regrhs});

        unless ($uselook->{$regrhs}) {
            $regtyp = $regtmp_type{$regrhs};
            $coretyp = $regtyp;
            $coretyp =~ s/\d+$//;

            # function uses stack arguments -- we can't push/pop
            if (($coretyp eq "P") && (@$sixlist > 0)) {
                dbgprt(2,"regtmploc: SIXREJ
");
                next;
            }

            if (defined($opt_N)) {
                dbgprt(2,"regtmploc: TRYREJ opt_N='%s' regtyp='%s'
",
                    $opt_N,$regtyp);
                next if ($opt_N =~ /$coretyp/);
            }

            $reglhs = $regrhs;
            last;
        }
    }

    {
        last if (defined($reglhs));

        errprt("regtmploc: unable to locate usable fixup register for function '%s'
",
            $fnx->{fnx_fnc});

        last if ($fixcnt <= 0);

        $fatal = 1;
    }

    dbgprt(2,"regtmploc: EXIT reglhs='%s' regtyp='%s'
",$reglhs,$regtyp);

    ($reglhs,$regtyp);
}

# regusejoin -- get regex for all registers
sub regusejoin
{
    my($reg);

    dbgprt(1,"regusejoin: ENTER
");

    # rax
    foreach $reg (qw(a b c d))  {
        regusedef($reg,"r_x","e_x","_l","_h");
    }

    #   rdi/rsi
    foreach $reg (qw(d s)) {
        regusedef($reg,"r_i","e_i","_i","_il");
    }

    # rsp/rbp
    foreach $reg (qw(b s)) {
        regusedef($reg,"r_p","e_p");
    }

    foreach $reg (8,9,10,11,12,13,14,15) {
        regusedef($reg,"r_","r_d","r_w","r_b");
    }

    $regusergx = join("|",reverse(sort(@reguse)));

    dbgprt(1,"regusejoin: EXIT regusergx='%s'
",$regusergx);
}

# regusedef -- define all registers
sub regusedef
{
    my(@argv) = @_;
    my($mid);
    my($pat);
    my($base);

    $mid = shift(@argv);

    dbgprt(1,"regusedef: ENTER mid='%s'
",$mid);

    foreach $pat (@argv) {
        $pat = "%" . $pat;
        $pat =~ s/_/$mid/;
        $base //= $pat;
        dbgprt(1,"regusedef: PAT pat='%s' base='%s'
",$pat,$base);

        push(@reguse,$pat);
        $reguse_tobase{$pat} = $base;
    }

    $reguse_isbase{$base} = 1;

    dbgprt(1,"regusedef: EXIT
");
}

# regusesort -- sort base register names
sub regusesort
{
    my($symlhs,$numlhs);
    my($symrhs,$numrhs);
    my($cmpflg);

    {
        ($symlhs,$numlhs) = _regusesort($a);
        ($symrhs,$numrhs) = _regusesort($b);

        $cmpflg = $symlhs cmp $symrhs;
        last if ($cmpflg);

        $cmpflg = $numlhs <=> $numrhs;
    }

    $cmpflg;
}

# _regusesort -- split up base register name
sub _regusesort
{
    my($sym) = @_;
    my($num);

    if ($sym =~ s/(\d+)$//) {
        $num = $1;
        $num += 0;
        $sym =~ s/[^%]/z/g;
    }

    ($sym,$num);
}

# optget -- get options
sub optget
{
    my($argv) = @_;
    my($bf);
    my($sym,$val);
    my($dft,%dft);

    foreach $sym (qw(a l n10 P q o s T)) {
        $dft{$sym} = 1;
    }
    $dft{"N"} = "T";
    $dft{"D"} = "DIFF";

    while (1) {
        $bf = $argv->[0];
        $sym = $bf;

        last unless ($sym =~ s/^-//);
        last if ($sym eq "-");

        shift(@$argv);

        {
            if ($sym =~ /([^=]+)=(.+)$/) {
                ($sym,$val) = ($1,$2);
                last;
            }

            if ($sym =~ /^(.)(.+)$/) {
                ($sym,$val) = ($1,$2);
                last;
            }

            undef($val);
        }

        $dft = $dft{$sym};
        sysfault("$pgmtail: unknown option -- '%s'
",$bf)
            unless (defined($dft));

        $val //= $dft;

        ${"opt_" . $sym} = $val;
    }
}

# cmtprt -- transformation comments
sub cmtprt
{

    $_ = shift(@_);
    $_ = sprintf($_,@_);
    chomp($_);
    push(@cmtprt,$_);
}

# msgprt -- progress output
sub msgprt
{

    printf(STDERR @_);
}

# errprt -- show errors
sub errprt
{

    cmtprt(@_);
    printf(STDERR @_);
}

# sysfault -- abort on error
sub sysfault
{

    printf(STDERR @_);
    exit(1);
}

# dbgprt -- debug print
sub dbgprt
{

    $_ = shift(@_);
    goto &_dbgprt
        if ($opt_T >= $_);
}

# _dbgprt -- debug print
sub _dbgprt
{

    printf(STDERR @_);
}

UPDATE:

I've updated the script to fix bugs, add more checking, and more options. Note: I had to remove ABI at bottom to fit 30,000 limit.

Otherwise weird results appear on other commands with parentheses for example cmpl %ebp, (%rax,%r14) splits into lhs='cmpl %ebp, (%rax' and rhs='%r14)' which in turn causes /$rhs/ to fail.

Yes, that was a bug. Fixed.

Your $rhs =~ /%[er](.x|\d+)/ doesn't match byte or word loads to di, or ax. That's unlikely, though. Oh, also, I think it fails to match rdi / rsi. so you don't need the trailing d in r10d

Fixed. Finds all variants.

Wow, I assumed something like this would have to happen at compile time, and that doing it after the fact would be too messy.

Shameless plug: Thanks for the "Wow!". perl is great for messy jobs like this. I've written assembler "injection" scripts like this before. (e.g.) Back in the day [before compiler support] to add profiling calls.

You could mark %r10 as another call-preserved register.

After doing a few web searches, I could only find about 84 matches on "static chain" x86. The only one of relevance was the x86 ABI. And, it offered no explanation other than to mention it as a footnote. Also, some gcc code uses r10 without any save as a callee register. So, I've now defaulted the program to use r10 (with a command line option to disable it if desired).

What happens if a function already uses all the registers?

If it's truly all, then we're out of luck. The script will detect and report this and suppress fixups if it can't find a spare register.

And, it will use a "callee must preserve" register by injecting a push as the 1st inst of the function and a corresponding pop just before the ret inst [can have multiple]. This can be disabled with an option.

You can't just push/pop, because that steps on the red-zone

No, it does not. For a few reasons:

(1) Almost as a side note: The red zone is only useful in leaf functions. Otherwise, if fncA calls fncB, the mere act of doing this by fncA would step on it's own red zone. See compile options in the script's top comment block.

(2) More importantly, because of the way that the push/pop is injected. The push occurs before any other insts. The pop occurs after any other insts [just before the ret].

The red zone is still there--intact. It is just offset by -8 from where it would have been otherwise. All red zone activity is preserved because these insts use negative offsets from %rsp

This is different than a push/pop done inside an inline asm block. The usual case is red zone code doing (e.g.) mov $23,-4(%rsp). An inline asm block coming afterwards that does a push/pop would step on that.

Some functions to show this:

# function_original -- original function before pebsfixup
# RETURNS: 23
function_original:
    mov     $23,-4(%rsp)                # red zone code generated by compiler
    ...
    mov     -4(%rsp),%rax               # will still have $23
    ret

# function_pebsfixup -- pebsfixup modified
# RETURNS: 23
function_pebsfixup:
    push    %r12                        # pebsfixup injected

    mov     $23,-4(%rsp)                # red zone code generated by compiler
    ...
    mov     -4(%rsp),%rax               # will still have $23

    pop     %r12                        # pebsfixup injected
    ret

# function_inline -- function with inline asm block and red zone
# RETURNS: unknown value
function_inline:
    mov     $23,-4(%rsp)                # red zone code generated by compiler

    # inline asm block -- steps on red zone
    push    %rdx
    push    %rcx
    ...
    pop     %rcx
    pop     %rdx

    ...

    mov     -4(%rsp),%rax               # now -4(%rsp) no longer has $23

    ret

Where the push/pop does get us in trouble is if the function uses more than six arguments (i.e. args 7+ are on the stack). Access to these arguments use a positive offset from %rsp:

mov     32(%rsp),%rax

With our "trick" push, the offset will be incorrect. The correct offset would now be 8 higher:

mov     40(%rsp),%rax

The script will detect this and complain. But, it does not [yet] adjust the positive offsets as this case is low probability. It would probably take about five more lines of code to fix this. Punting for now ...

Keep target address of load in register until instruction is retired

Tags:

c

assembly

intel

cpu-registers

performancecounter

Cimbali

1 Answers

Craig Estey

Recent Activity

Donate For Us

Keep target address of load in register until instruction is retired

Tags:

c

assembly

intel

cpu-registers

performancecounter

Cimbali

1 Answers

Craig Estey

Related questions

Recent Activity

Donate For Us