I am working on a very large text file whose lines have indentation of various widths. Acceptable lines have an indentation exactly 12 characters wide, made up of a combination of tabs and spaces. I want to find all the lines that do not have a 12-character-wide indentation, that is, lines whose indentation is anywhere from 0 to 11 characters wide, again made up of tabs and spaces. Here is my attempt:
    if $badLine !~~ m/ ^^ [ \s ** 12 ||
                            \t \s ** 4 ||
                            \s \t \s ** 3 ] / { say $badLine; }
But the problem is that when you work on a text file in a word processor, pressing the Tab key can fill anywhere from 1 to 8 character widths to reach the next tab stop. What would be a smart way to find all the unacceptable lines, those that do not have a 12-character-wide indentation?
Thanks.
For an indentation width of 12, assuming that tab stops are at positions 0, 8, 16, and so on:
    for $input.lines {
        .say if not /
            ^                             # start of line
            [" " ** 8 || " " ** 0..7 \t]  # whitespace up to first tab stop
            [" " ** 4]                    # whitespace up to position 12
            [\S | $]                      # non-space character or end of line
        /;
    }
Explanation:
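To get from the start of the line (position 0) to the first tab stop (position 8), there are two possibilities we need to match: 8 spaces, or 0 to 7 spaces followed by one tab. (The tab jumps straight to the tab stop, so it fills whatever width is left after the spaces.)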
The only way to get from the tab stop (position 8) to the indentation target (position 12) is 4 spaces. (A tab would jump past the target to the next tab stop at position 16.)
Anchoring to the start of the line, and to whatever comes after the indentation, is important so that we don't accidentally match part of a longer indentation.
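To see how the anchored pattern behaves, here is a minimal sketch that runs it against a handful of made-up lines. The `@samples` array and its contents are hypothetical, chosen only to exercise each branch; the regex itself is the one from the answer, with tab stops assumed every 8 columns.

```raku
# Hypothetical sample lines; "\t" is a real tab inside double quotes.
my @samples =
    (" " x 12) ~ "ok: twelve spaces",
    "\t" ~ (" " x 4) ~ "ok: tab, then four spaces",
    "   \t    ok: three spaces, tab, four spaces",
    "\t\tbad: two tabs reach column 16",
    (" " x 13) ~ "bad: thirteen spaces",
    "    bad: four spaces";

for @samples {
    .say if not /
        ^                             # start of line
        [" " ** 8 || " " ** 0..7 \t]  # whitespace up to first tab stop
        [" " ** 4]                    # whitespace up to position 12
        [\S | $]                      # non-space character or end of line
    /;
}
```

Only the three lines labelled "bad" should be printed. The thirteen-space line is rejected because of the trailing `[\S | $]` anchor: after 12 columns of whitespace the next character is still a space.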
The indentation matching can be factored out into a parameterized named token that can handle arbitrary widths:
    my token indent ($width) {
        [" " ** 8 || " " ** 0..7 \t] ** {$width div 8}
        " " ** {$width % 8}
    }
    .say if not /^ <indent(12)> [\S | $]/ for $input.lines;
Explanation:
The same expression as above is used to get to the first tab stop, but now it is repeated as many times as necessary to get to the last tab stop before the target. ($width div 8 times in total, where div is the integer division operator.)
Whatever distance is left between the last tab stop and the target must be filled with spaces. ($width % 8 spaces, where % is the modulo operator.)
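The split into tab-stop groups and trailing spaces can be checked with a few lines of arithmetic. This is only an illustration: the list of widths is arbitrary, and tab stops are assumed every 8 columns as in the answer.

```raku
# For each target width, show how many repetitions of the
# "up to the next tab stop" group are needed, and how many
# plain spaces must follow the last tab stop.
for 4, 8, 12, 20 -> $width {
    say "width $width: {$width div 8} tab-stop group(s), then {$width % 8} space(s)";
}
```

For a width of 12 this gives one tab-stop group followed by four spaces, which is exactly the hand-written pattern from the first example.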
The token in the above example assumes that it starts matching at a tab stop position (such as the start of the line). It can be further generalized to match a given width of tabs and spaces, no matter where in the line you call it:
    my token indent ($width) {
        :my ($before-first-stop, $number-of-stops, $after-last-stop);
        {
            $before-first-stop = min $width, 8 - $/.from % 8;
            $number-of-stops   = ($width - $before-first-stop) div 8;
            $after-last-stop   = ($width - $before-first-stop) % 8;
        }
        [" " ** {$before-first-stop} || " " ** {^$before-first-stop} \t]
        [" " ** 8 || " " ** 0..7 \t] ** {$number-of-stops}
        " " ** {$after-last-stop}
    }
Explanation:
Same principle as before, except that now we first need to match as many spaces as necessary to get from the current position in the string to the first tab stop that follows it.
The current position in the string is given by $/.from; the rest is simple arithmetic.
A few lexical variables (with hopefully descriptive names) are declared and used inside the token, to make the code easier to follow.
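The arithmetic can be tried out on its own, outside of a regex. The sketch below mirrors the three quantities computed inside the token; `indent-parts` is a hypothetical helper introduced only for this illustration, and it takes the starting position as an explicit argument instead of reading it from `$/.from`.

```raku
# Hypothetical helper: given a starting position and a target width,
# return (columns up to the next tab stop, number of full tab-stop
# groups after that, spaces needed after the last tab stop).
# Tab stops are assumed every 8 columns.
sub indent-parts(Int $from, Int $width) {
    my $before-first-stop = min $width, 8 - $from % 8;
    my $number-of-stops   = ($width - $before-first-stop) div 8;
    my $after-last-stop   = ($width - $before-first-stop) % 8;
    ($before-first-stop, $number-of-stops, $after-last-stop);
}

say indent-parts(0, 12);  # (8 0 4)
say indent-parts(3, 12);  # (5 0 7)
say indent-parts(0, 20);  # (8 1 4)
```

Starting at position 0 with a width of 12 reproduces the earlier split: 8 columns to the first tab stop, no further full groups, then 4 spaces. Starting at position 3, only 5 columns remain before the next tab stop, so 7 spaces are needed afterwards to make up the same width of 12.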