Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to introspect regexes in the Perl API

I'm working on some code that needs to serialize Perl regexes, including any regex flags. Only a subset of flags are supported, so I need to detect when unsupported flags like /u are in the regex object.

The current version of the code does this:

static void serialize_regex_flags(buffer *buf, SV *sv) {
  char flags[] = {0,0,0,0,0,0};
  unsigned int i = 0, f = 0;
  STRLEN string_length;
  char *string = SvPV(sv, string_length);

Then manually processes string char-by-char to find flags.

The problem here is that the stringification of regex flags changed (I think in Perl 5.14) from e.g. (?i-xsm:foo) to (?^i:foo), which makes parsing a pain.

I could check the version of perl, or just write the parser to handle both cases, but something tells me there must be a superior method of introspection available.

like image 219
friedo Avatar asked Aug 07 '12 17:08

friedo


2 Answers

In Perl, you'd use re::regexp_pattern.

 my $re = qr/foo/i;
 my ($pat, $mods) = re::regexp_pattern($re);
 say $pat;   # foo
 say $mods;  # i

As you can see from the source of regexp_pattern, there's no function in the API to obtain that information, so I recommend that you call that function too from XS too.

perlcall covers calling Perl functions from C. I came up with the following untested code:

/* Calls re::regexp_pattern to extract the pattern
 * and flags from a compiled regex.
 *
 * When re isn't a compiled regex, returns false,
 * and *pat_ptr and *flags_ptr are set to NULL.
 *
 * The caller must free() *pat_ptr and *flags_ptr.
 */

static int regexp_pattern(char ** pat_ptr, char ** flags_ptr, SV * re) {
   dSP;
   int count;
   ENTER;
   SAVETMPS;
   PUSHMARK(SP);
   XPUSHs(re);
   PUTBACK;
   count = call_pv("re::regexp_pattern", G_ARRAY);
   SPAGAIN;

   if (count == 2) {
      /* Pop last one first. */
      SV * flags_sv = POPs;
      SV * pat_sv   = POPs;

      /* XXX Assumes no NUL in pattern */
      char * pat   = SvPVutf8_nolen(pat_sv); 
      char * flags = SvPVutf8_nolen(flags_sv);

      *pat_ptr   = strdup(pat);
      *flags_ptr = strdup(flags);
   } else {
      *pat_ptr   = NULL;
      *flags_ptr = NULL;
   }

   PUTBACK;
   FREETMPS;
   LEAVE;

   return *pat_ptr != NULL;
}

Usage:

SV * re = ...;

char * pat;
char * flags;
regexp_pattern(&pat, &flags, re);
like image 149
ikegami Avatar answered Sep 30 '22 18:09

ikegami


use Data::Dump::Streamer ':util';
my ($pattern, $flags) = regex( qr/foo/i );
print "pattern: $pattern, flags: $flags\n";
# pattern: foo, flags: i

But if you are trying to restrict more recent features, you have a lot more work to do than just checking for /u.

like image 27
ysth Avatar answered Sep 30 '22 20:09

ysth