I'm trying to write a tool that will take as input some C code containing structs. It will compile the code, then find and output the size and offset of any padding the compiler decides to add to structs within it. This is pretty straightforward to do by hand for a known struct using offsetof, sizeof, and some addition, but I can't figure out an easy way to do it automatically for any input struct.
If I knew how to iterate through all the members of a struct, I think I could write the tool with no problems, but as far as I know there's no way to do that. I'm hoping some StackOverflow people will know a way. That said, I'm not wedded to this approach, and I'm certainly open to alternate ways of finding padding in a struct.
In C, the sizeof operator yields the size in bytes of a variable, pointer, or data type, whether built-in or user-defined. Applying it to a structure, as in sizeof(struct foo), gives the structure's total size, including any padding.
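pad = (-size) & 3; gives the number of bytes needed to round size up to the next multiple of 4 (replace 3 with align - 1 for any power-of-two alignment). This should be the fastest.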
The sizeof of a struct is not always equal to the sum of the sizeof of each individual member. This is because of the padding the compiler adds to satisfy alignment requirements. Padding is typically inserted before a member whose alignment requirement is not met by the current offset, and at the end of the structure so that its total size is a multiple of its strictest member alignment.
Structure padding in C is the insertion of one or more unused bytes between members (or after the last member) so that each member sits at an address suitable for its type.
Isn't this what pahole does?
Say you have the following module.h:
typedef void (*handler)(void);
struct foo {
  char a;
  double b;
  int c;
};
struct bar {
  float y;
  short z;
};
A Perl program to generate unpack templates begins with the customary front matter:
#! /usr/bin/perl
use warnings;
use strict;
sub usage { "Usage: $0 header\n" }
With structs, we feed the header to ctags and from its output collect struct members. The result is a hash whose keys are names of structs and whose values are arrays of pairs of the form [$member_name, $type]. Note that it handles only a few C types.
sub structs {
  my($header) = @_;
  open my $fh, "-|", "ctags", "-f", "-", $header
    or die "$0: could not start ctags";
  my %struct;
  while (<$fh>) {
    chomp;
    my @f = split /\t/;
    next unless @f >= 5 &&
                $f[3] eq "m" &&
                $f[4] =~ /^struct:(.+)/;
    my $struct = $1;
    die "$0: unknown type in $f[2]"
      unless $f[2] =~ m!/\^\s*(float|char|int|double|short)\b!;
    # [ member-name => type ]
    push @{ $struct{$struct} } => [ $f[0] => $1 ];
  }
  wantarray ? %struct : \%struct;
}
Assuming that the header can be included by itself, generate_source generates a C program that prints offsets to the standard output, fills structs with dummy values, and writes the raw structures to the standard output preceded by their respective sizes in bytes.
sub generate_source {
  my($struct,$header) = @_;
  my $path = "/tmp/my-offsets.c";
  open my $fh, ">", $path
    or die "$0: open $path: $!";
  print $fh <<EOStart;
#include <stdio.h>
#include <stddef.h>
#include <$header>
void print_buf(void *b, size_t n) {
  char *c = (char *) b;
  printf("%zu\\n", n);
  while (n--) {
    fputc(*c++, stdout);
  }
}
int main(void) {
EOStart
  # declare one variable (a1, a2, ...) per struct, in sorted order
  my $id = "a1";
  my %id;
  foreach my $s (sort keys %$struct) {
    $id{$s} = $id++;
    print $fh "  struct $s $id{$s};\n";
  }
  # print each member's offset, then store a distinct dummy value in it
  my $value = 0;
  foreach my $s (sort keys %$struct) {
    for (@{ $struct->{$s} }) {
      print $fh <<EOLine;
  printf("%zu\\n", offsetof(struct $s,$_->[0]));
  $id{$s}.$_->[0] = $value;
EOLine
      ++$value;
    }
  }
  # separator between the offsets and the raw struct dumps
  print $fh qq{  printf("----\\n");\n};
  foreach my $s (sort keys %$struct) {
    print $fh "  print_buf(&$id{$s}, sizeof($id{$s}));\n";
  }
  print $fh <<EOEnd;
  return 0;
}
EOEnd
  close $fh or warn "$0: close $path: $!";
  $path;
}
Generate a template for unpack where the parameter $members is a value in the hash returned by structs that has been augmented with offsets (i.e., arrayrefs of the form [$member_name, $type, $offset]):
sub template {
  my($members) = @_;
  my %type2tmpl = (
    char => "c",
    double => "d",
    float => "f",
    int => "i!",
    short => "s!",
  );
  join " " =>
    map '@![' . $_->[2] . ']' . $type2tmpl{ $_->[1] } =>
    @$members;
}
Finally, we reach the main program where the first task is to generate and compile the C program:
die usage unless @ARGV == 1;
my $header = shift;
my $struct = structs $header;
my $src = generate_source $struct, $header;
(my $cmd = $src) =~ s/\.c$//;
system("gcc -I`pwd` -o $cmd $src") == 0
  or die "$0: gcc failed";
Now we read the generated program's output and decode the structs:
my @todo = map @{ $struct->{$_} } => sort keys %$struct;
open my $fh, "-|", $cmd
  or die "$0: start $cmd failed: $!";
while (<$fh>) {
  last if /^-+$/;
  chomp;
  my $m = shift @todo;
  push @$m => $_;
}
if (@todo) {
  die "$0: unfilled:\n" .
      join "" => map " - $_->[0]\n", @todo;
}
foreach my $s (sort keys %$struct) {
  chomp(my $length = <$fh> || die "$0: unexpected end of input");
  my $bytes = read $fh, my($buf), $length;
  if (defined $bytes) {
    die "$0: unexpected end of input" unless $bytes;
    print "$s: @{[unpack template($struct->{$s}), $buf]}\n";
  }
  else {
    die "$0: read: $!";
  }
}
Output:
$ ./unpack module.h
bar: 0 1
foo: 2 3 4
For reference, the C program generated for module.h is
#include <stdio.h>
#include <stddef.h>
#include <module.h>
void print_buf(void *b, size_t n) {
  char *c = (char *) b;
  printf("%zu\n", n);
  while (n--) {
    fputc(*c++, stdout);
  }
}
int main(void) {
  struct bar a1;
  struct foo a2;
  printf("%zu\n", offsetof(struct bar,y));
  a1.y = 0;
  printf("%zu\n", offsetof(struct bar,z));
  a1.z = 1;
  printf("%zu\n", offsetof(struct foo,a));
  a2.a = 2;
  printf("%zu\n", offsetof(struct foo,b));
  a2.b = 3;
  printf("%zu\n", offsetof(struct foo,c));
  a2.c = 4;
  printf("----\n");
  print_buf(&a1, sizeof(a1));
  print_buf(&a2, sizeof(a2));
  return 0;
}