
Regex for numbers split across multiple lines in PHP

Tags: regex, php

I have a file that looks like this (yes, the line breaks are right):

39                                              9
30 30 30 31 34 30 30 32 33 32 36 30 31 38 0D 0A 00014002326018..
39 30 30 30 31 34 30 30 32 33 32 36 30 35 34 0D 900014002326054.
0A                                              .
39 30 30 30 31 34 30 30 32 33 32 36 30 39 31 0D 900014002326091.
0A                                              .
39 30 30 30 31 34 30 30 32 33 32 36 31 36 33 0D 900014002326163.
0A                                              .
39                                              9
30 30 30 31 34 30 30 32 33                      000140023
32 36 32 30 30 0D 0A                            26200..
39                                              9
30 30 30 31 34 30 30 32 33 32 36 32 30 30 0D 0A 00014002326200..
39 30 30 30 31 34 30 30 32 33 32 36 31 32 32 0D 900014002326122.
0A                                              .
39                                              9
30 30 30 31 34 30 30 32 33                      000140023
32 36 31 35 34 0D 0A                            26154..
39 30 30 30 31 34 30 30 32 33                   9000140023
32 36 31 33 31 0D 0A                            26131..
39                                              9
30 30 30 31 34 30 30 32 33                      000140023
32 36 31 30 34 0D 0A                            26104..
39 30 30 30 31 34 30 30 32 33 32 36 30 39 30 0D 900014002326090.
0A                                              .
39 30 30 30 31 34 30 30 32 33 32 36 31 39 37 0D 900014002326197.
0A                                              .
39                                              9
30 30 30 31 34 30 30 32 33 32 36 32 30 38 0D 0A 00014002326208..
39 30 30 30 31 34 30 30 32 33                   9000140023
32 36 31 31 35 0D 0A                            26115..
39                                              9
30 30 30 31 34 30 30 32 33                      000140023
32 36 31 36 34 0D 0A                            26164..
39                                              9
30 30 30 31 34 30 30 32 33                      000140023
32 36 30 31 36 0D 0A 39 30 30 30 31 34 30 30 32 26016..900014002
33                                              3
32 36 32 34 36 0D 0A                            26246..
39                                              9
30 30 30 31 34 30 30 32 33                      000140023
32 36 32 34 36 0D 0A                            26246..
39                                              9
30 30 30 31 34 30 30 32 33                      000140023
32 36 30 37 39 0D 0A                            26079..
39                                              9
30 30 30 31 34 30 30 32 33                      000140023
32 36 31 32 30 0D 0A                            26120..
39                                              9
30 30 30 31 34 30 30 32 33 32 36 32 32 38 0D 0A 00014002326228..
39 30 30 30 31 34 30 30 32 33                   9000140023
32 36 31 38 36 0D 0A                            26186..

I have this code that grabs the EID tags (the numbers that start with 9000), but I can't figure out how to make it handle numbers that are split across multiple lines.

$data = file_get_contents('tags.txt');

$pattern = "/(\d{15})/i";

preg_match_all($pattern, $data, $tags);
$count = 0;
foreach ( $tags[0] as $tag ){

    echo $tag . '<br />';
    $count++;
}

echo "<br />" . $count . " total head scanned";

For example, the first and second lines together should return 900014002326018, but they are currently ignored.

I am not good with regular expressions, so an explanation would be awesome: I'd like to learn this and stop needing help with simple regexes.

EDIT: The whole number is 15 digits, starting with 9000.

Toby Joiner asked Dec 19 '22 20:12


2 Answers

You can do this:

$result = preg_replace('~\R?(?:[0-9A-F]{2}\h+)+~', '', $data);
$result = explode('..', rtrim($result, '.'));

Pattern details:

\R?            # optional newline sequence
(?:            # open a non-capturing group
  [0-9A-F]{2}  # two hexadecimal characters
  \h+          # horizontal whitespace characters (spaces or tabs)
)+             # repeat the non-capturing group one or more times

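After this replacement, the only thing left to remove is the pair of dots that follows each number (the ASCII column's rendering of the 0D 0A bytes). Once the trailing dots are trimmed, the remaining ".." pairs serve as the delimiter for exploding the string into an array.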
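
As a quick check, here is a minimal sketch that runs both steps on a four-line sample. The sample is rebuilt with str_pad() on the assumption that the hex column is always padded to 48 characters, as in the question; the $lines and $stripped names are only illustrative.

```php
<?php
// Hypothetical sample: hex column padded to 48 characters, then the ASCII column.
$lines = [
    str_pad('39', 48) . '9',
    str_pad('30 30 30 31 34 30 30 32 33 32 36 30 31 38 0D 0A', 48) . '00014002326018..',
    str_pad('39 30 30 30 31 34 30 30 32 33 32 36 30 35 34 0D', 48) . '900014002326054.',
    str_pad('0A', 48) . '.',
];
$data = implode("\n", $lines);

// Step 1: remove every run of "two hex digits + horizontal whitespace",
// together with the newline that precedes it.
$stripped = preg_replace('~\R?(?:[0-9A-F]{2}\h+)+~', '', $data);

// Step 2: trim the trailing dots, then split on the ".." after each tag.
$tags = explode('..', rtrim($stripped, '.'));

print_r($tags);
```

Because the newline is consumed together with the hex run that follows it, the pieces of a split number end up adjacent in $stripped before the explode() call.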

Another way

Since you know there are always 48 characters before the ASCII column (the digits and dots), you can use this pattern for the replacement step instead; the rtrim() and explode() step stays the same:

$result = preg_replace('~(?:^|\R).{48}~', '', $data);
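
To see the fixed-width pattern at work, here is a small sketch on one tag that is split across three lines. The sample uses \r\n line endings to illustrate that \R consumes the pair as a unit; the 48-character padding and the $ascii name are assumptions made for the example.

```php
<?php
// Hypothetical sample: one tag spread over three lines, Windows line endings.
$data = implode("\r\n", [
    str_pad('39', 48) . '9',
    str_pad('30 30 30 31 34 30 30 32 33', 48) . '000140023',
    str_pad('32 36 32 30 30 0D 0A', 48) . '26200..',
]);

// Remove the start of the string (or a newline sequence) plus the
// 48 characters that follow it, leaving only the ASCII column.
$ascii = preg_replace('~(?:^|\R).{48}~', '', $data);

// Same final step as before: trim trailing dots, split on "..".
$tags = explode('..', rtrim($ascii, '.'));

print_r($tags);
```

This variant depends entirely on the column width, so it would need adjusting if the dump were produced with a different number of bytes per line.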

Another way, without regex

The idea is to read the file line by line and, since the length before the content is always the same (i.e. 16*3 characters -> 48 characters), extract the substring that holds the digits and dots and append it to the temporary $data variable.

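// Let fgets() detect the file's line-ending convention (deprecated as of PHP 8.1).
ini_set("auto_detect_line_endings", true);
$data = '';
$handle = @fopen("tags.txt", "r");
if ($handle) {
    while (($buffer = fgets($handle, 128)) !== false) {
        // Skip the 48-character hex column and drop the trailing newline character.
        $data .= substr($buffer, 48, -1);
    }
    if (!feof($handle)) {
        echo "Error: fgets() has failed\n";
    }
    fclose($handle);
} else {
    echo "Error opening the file\n";
}

// Trim the trailing dots, then split on the ".." that follows each tag.
$result = explode('..', rtrim($data, '.'));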

Note: if the file uses Windows line endings (\r\n), you must change the third argument of substr() to -2. If you are interested in how to detect the newline type, you can take a look at this post.
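
If you would rather not switch between -1 and -2, one possible variation is to strip the line ending with rtrim() before taking the substring. The sketch below is not the code above but a swapped-in alternative; it reads from an in-memory php://memory stream standing in for tags.txt, and the sample lines are hypothetical.

```php
<?php
// Hypothetical stand-in for tags.txt: an in-memory stream with \r\n endings.
$handle = fopen('php://memory', 'w+');
fwrite($handle, implode("\r\n", [
    str_pad('39 30 30 30 31 34 30 30 32 33', 48) . '9000140023',
    str_pad('32 36 31 33 31 0D 0A', 48) . '26131..',
]) . "\r\n");
rewind($handle);

$data = '';
while (($buffer = fgets($handle)) !== false) {
    // Strip "\n" or "\r\n" first, then skip the 48-character hex column.
    $data .= substr(rtrim($buffer, "\r\n"), 48);
}
fclose($handle);

// Trim trailing dots, then split on the ".." that follows each tag.
$result = explode('..', rtrim($data, '.'));

print_r($result);
```

With this approach the same loop handles files with either \n or \r\n endings, and a last line without a trailing newline is left intact.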

Casimir et Hippolyte answered Dec 24 '22 02:12


I don't think it's even possible to do this with a single regex; in any case, your code will be far more legible and maintainable if you approach this one step at a time.

This works, and it shouldn't be too hard to figure out how it works:

$eid_tag_src = <<<END_EID_TAGS
39                                              9
30 30 30 31 34 30 30 32 33 32 36 30 31 38 0D 0A 00014002326018..
39 30 30 30 31 34 30 30 32 33 32 36 30 35 34 0D 900014002326054.
  :
 etc.
  :
39 30 30 30 31 34 30 30 32 33                   9000140023
32 36 31 38 36 0D 0A                            26186..
END_EID_TAGS;

/* Remove the hex data in the first 48 characters of each line */
$eid_tag_src = preg_replace('/^.{48}/m', '', $eid_tag_src);

/* Remove all white space, including the line breaks */
$eid_tag_src = preg_replace('/\s+/', '', $eid_tag_src);

/* Replace each pair of dots (CR LF) with a space */
$eid_tag_src = str_replace('..', ' ', $eid_tag_src);

/* Convert to an array of EID tags */
$eid_tags = explode(' ', trim($eid_tag_src));

print_r($eid_tags);

Here's the output:

Array
(
    [0] => 900014002326018
    [1] => 900014002326054
    [2] => 900014002326091
    [3] => 900014002326163
    [4] => 900014002326200
    [5] => 900014002326200
    [6] => 900014002326122
    [7] => 900014002326154
    [8] => 900014002326131
    [9] => 900014002326104
    [10] => 900014002326090
    [11] => 900014002326197
    [12] => 900014002326208
    [13] => 900014002326115
    [14] => 900014002326164
    [15] => 900014002326016
    [16] => 900014002326246
    [17] => 900014002326246
    [18] => 900014002326079
    [19] => 900014002326120
    [20] => 900014002326228
    [21] => 900014002326186
)
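
Since the question states that every tag is 15 digits starting with 9000, a final sanity check on the extracted values might look like the sketch below. The input array is hypothetical (two well-formed tags and one deliberately truncated entry), and $valid is just an illustrative name.

```php
<?php
// Hypothetical extracted values: two valid tags and one malformed entry.
$eid_tags = ['900014002326018', '90001400232605', '900014002326091'];

// Keep only strings that are exactly 15 digits and start with 9000.
// preg_grep() preserves the original array keys.
$valid = preg_grep('/\A9000\d{11}\z/', $eid_tags);

echo count($valid) . " total head scanned\n";
```

Comparing count($valid) with count($eid_tags) is a cheap way to notice a dump whose layout did not match the 48-character assumption.
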
r3mainer answered Dec 24 '22 00:12