
Split string (or regex match) at position/index of nth character in Perl?

There is a similarly worded question, but I think this is slightly different.

Basically, say I have this string:

"aa{bb{dccd"

Here I would like to split the string at the last brace {; and have the parts returned as an array. I can easily find the position (0-based index) of this character using rindex:

perl -e '
$aa="aa{bb{dccd" ;
$ri = rindex($aa, "{") ;
print "$ri\n"; '

5

... and given that I'm not a Perl coder, the first thing I think of is to use something like $str = split($aa, 3). Unfortunately, that doesn't work: split takes a pattern as its first argument (what to match on) and the string as its second. Its optional third argument limits the number of fields returned; it is not a character position.

I found posts like Perl Guru Forums: Perl Programming Help: Intermediate: split or splice string on char count?, which recommend using substr in a similar context; however, I'd have to write two substrs to populate the list as per the example above, and so I'd rather hear about alternatives to substr.

Basically, if the problem of matching the position of N-th character can be expressed as a regex match, the split could work just as well - so that would be my primary question. However, I'd also be interested in hearing if there are Perl built-in functions that could accept a list/array of integers specifying character positions, and return an array containing the split sections.

EDIT:

To summarize the above - I'd like to have the character indexes, because I'd like to print them out for debugging; and at the same time, use them for splitting a string into array - but without using substrs.

EDIT2: I just realized that I left something out of the original post: in the problem I'm working on, I first have to retrieve the character indexes (by rindex or otherwise), then do calculations on them (so they may increase or decrease), and only then split the string based on the new index values. My original example may have been too simple to express this focus on indexes/character positions (not to mention that my first thought, a split that takes character indexes, must have come from some other programming language, though I really cannot remember which one :))

sdaau asked Nov 01 '25 06:11


2 Answers

You wrote:

I'd also be interested in hearing if there are Perl built-in functions that could accept a list/array of integers specifying character positions, and return an array containing the split sections.

To create a function that takes a list of offsets and produces a list of substrings with those split positions, convert the offsets to lengths and pass these as an argument to unpack.

There’s a &cut2fmt function in Chapter 1 of the Perl Cookbook that does this very thing. Here is an excerpt, reproduced here by kind permission of the author:

Sometimes you prefer to think of your data as being cut up at specific columns. For example, you might want to place cuts right before positions 8, 14, 20, 26, and 30. Those are the column numbers where each field begins. Although you could calculate that the proper unpack format is "A7 A6 A6 A6 A4 A*", this is too much mental strain for the virtuously lazy Perl programmer. Let Perl figure it out for you. Use the cut2fmt function below:

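  sub cut2fmt {
      my(@positions) = @_;      # 1-based column numbers where fields begin
      my $template   = '';
      my $lastpos    = 1;       # the first field starts at column 1
      foreach my $place (@positions) {
          # field width = distance from the previous cut to this one
          $template .= "A" . ($place - $lastpos) . " ";
          $lastpos   = $place;
      }
      $template .= "A*";        # the last field takes the rest of the string
      return $template;
  }

  my $fmt = cut2fmt(8, 14, 20, 26, 30);
  print "$fmt\n";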

  A7 A6 A6 A6 A4 A*

So the way you would use that is like this:

$fmt = cut2fmt(8, 14, 20, 26, 30);
@list = unpack($fmt, $string);

or directly as

@list = unpack(cut2fmt(8, 14, 20, 26, 30), $string);

I believe this is what you were asking for.
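One thing to watch when combining this with rindex: cut2fmt takes 1-based column numbers, while rindex returns a 0-based index. Here is a minimal sketch of the same idea for 0-based offsets, applied to the string from the question. The helper name split_at_offsets is hypothetical (it is not from the Perl Cookbook), and it uses the "a" format instead of "A" so that trailing spaces in each piece are kept.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the Perl Cookbook): split $string at the
# given 0-based offsets, which are assumed to be in ascending order.
sub split_at_offsets {
    my ($string, @offsets) = @_;
    my $template = '';
    my $last     = 0;
    for my $offset (@offsets) {
        # "a" keeps the data as-is; "A" would strip trailing spaces.
        $template .= 'a' . ($offset - $last) . ' ';
        $last = $offset;
    }
    $template .= 'a*';    # whatever remains after the last offset
    return unpack($template, $string);
}

my $aa = "aa{bb{dccd";
my $ri = rindex($aa, "{");    # 5

# Cut just before and just after the last brace.
my @parts = split_at_offsets($aa, $ri, $ri + 1);
print join("|", @parts), "\n";    # aa{bb|{|dccd
```

Because the offsets are plain integers, they can be printed for debugging or adjusted arithmetically before being handed to the helper.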

tchrist answered Nov 04 '25 00:11


my ($pre, $post) = split /\{(?!.*\{)/s, $s;

or

my ($pre, $post) = $s =~ /^(.*)\{(.*)/s;

The second is probably better.

If you need the index of the {, use length($pre). (With the second solution, you could also use $-[2] - 1. See @- and @+ in perlvar.)
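As a rough illustration, here is the second solution run on the string from the question, followed by a split at an adjusted index. The names $n, $head and $tail, and the choice to move the index two characters to the left, are only assumptions made for the example; the .{$n} quantifier is one way to express "split at character position $n" as a regex match.

```perl
use strict;
use warnings;

my $s = "aa{bb{dccd";

# Greedy (.*) runs as far as it can, so the "{" that follows it is the last one.
my ($pre, $post) = $s =~ /^(.*)\{(.*)/s
    or die "no brace found";

# $-[2] is the offset where group 2 starts, i.e. one past the brace.
my $index = $-[2] - 1;
print "$pre | $post | $index\n";    # aa{bb | dccd | 5

# Hypothetical follow-up: adjust the index, then split at the new position.
my $n = $index - 2;
my ($head, $tail) = $s =~ /^(.{$n})(.*)/s;
print "$head | $tail\n";            # aa{ | bb{dccd
```

Note that the first match drops the brace itself, while the position-based match keeps every character, since nothing is consumed between the two groups.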

ikegami answered Nov 03 '25 22:11


