 

PHP regex to detect pagination

I'm rewriting a route handling class for an MVC-based site in PHP and need a regex to detect a pagination string in the URL. The pagination string is formed of three different parts:

  • Page number detection: /page/[NUMERIC]/
  • Items per page detection: /per_page/[NUMERIC]/
  • Ordering detection: /sort/[ALMOST_ANY_CHARACTER]/[asc or desc]/

Due to the way it was previously developed, these three parts can be in any order. There are a number of existing links which I need to keep working, and the code used to handle pagination has to stay as it is (no plans for a rewrite yet), so changing the pagination code to always generate a consistent URL isn't possible.

Therefore, I need to build a regex pattern to detect every possible combination of the pagination structure. I have three patterns to detect each part, which are as follows:

  • Page number detection: (page/\d+)
  • Items per page detection: (per_page/\d+)
  • Ordering detection: (sort/([a-zA-Z0-9\.\-_%=]+)/(asc|desc))

Being new to writing complex (well, this is complex to me anyway!) regex patterns, the only way I can think of doing it is to combine the three patterns I have into one alternative for each of the possible URL structures (e.g. /pagenum/ordering/perpage/, /pagenum/perpage/ordering/) and join those with the | operator as an 'or' statement.

Is there a better / more efficient way of doing this?

I am running the regex using preg_match.

gbuckingham89 asked Dec 18 '25 14:12



2 Answers

You could use lookaheads. After a lookahead has matched, the regex engine jumps back to the position where the lookahead started (that's why it's called a *look*ahead: it doesn't advance the position in the subject string or include anything in the match). Since you don't know where each part occurs, start all three lookaheads from the beginning of the string, and put .* in front of each capturing group to allow an arbitrary position:

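^(?=.*(?<!per_)(page/\d+))(?=.*(per_page/\d+))(?=.*(sort/([a-zA-Z0-9\.\-_%=]+)/(asc|desc)))

The (?<!per_) lookbehind stops the first lookahead from matching the page/ that sits inside per_page/. You can also move the capturing groups so that they hold only the values:

preg_match(
  '~^(?=.*(?<!per_)page/(\d+))(?=.*per_page/(\d+))(?=.*sort/([a-zA-Z0-9\.\-_%=]+)/(asc|desc))~',
  $input,
  $match
);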

Now the captures will be:

$match[1] => page number
$match[2] => items per page
$match[3] => sort key
$match[4] => sort order
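
As a rough illustration of the lookahead approach, here is a minimal sketch wrapped in a function. The name parsePagination() and the example URLs are made up for this sketch, and the (?<!per_) lookbehind is an addition so that the page lookahead cannot latch onto the page/ inside per_page/. Treat it as a starting point rather than a finished router:

```php
<?php
// Hypothetical helper, not part of the original code base.
function parsePagination(string $path): ?array
{
    // Three independent lookaheads, all anchored at the start of the string,
    // so the order of the parts in the URL does not matter.
    $pattern = '~^(?=.*(?<!per_)page/(\d+))'
             . '(?=.*per_page/(\d+))'
             . '(?=.*sort/([a-zA-Z0-9\.\-_%=]+)/(asc|desc))~';

    if (preg_match($pattern, $path, $m) !== 1) {
        return null; // at least one of the three parts is missing
    }

    return [
        'page'     => (int) $m[1],
        'per_page' => (int) $m[2],
        'sort_key' => $m[3],
        'sort_dir' => $m[4],
    ];
}

// The same three parts in two different orders.
$a = parsePagination('/products/page/2/per_page/10/sort/name/asc/');
$b = parsePagination('/products/sort/name/asc/per_page/10/page/2/');
```

Both calls should produce the same array, and a URL that lacks any one of the three parts returns null.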

If any of these parts can be optional, you can make its entire lookahead optional by appending ? to it.
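
A small sketch of the optional variant, assuming only page and per_page are needed. The function name, the named groups, and the defaults (page 1, 20 items) are illustrative choices, not something from the question. It passes PREG_UNMATCHED_AS_NULL (PHP 7.2+) so a skipped lookahead shows up as null instead of an empty string:

```php
<?php
// Hypothetical helper; the defaults below are assumptions for illustration.
function parseOptionalPagination(string $path): array
{
    // Each lookahead is followed by ?, so the match succeeds even when
    // that part is absent from the URL.
    $pattern = '~^(?=.*(?<!per_)page/(?<page>\d+))?'
             . '(?=.*per_page/(?<per_page>\d+))?~';

    preg_match($pattern, $path, $m, PREG_UNMATCHED_AS_NULL);

    return [
        'page'     => (int) ($m['page'] ?? 1),      // fall back to page 1
        'per_page' => (int) ($m['per_page'] ?? 20), // fall back to 20 items
    ];
}

$onlyPerPage = parseOptionalPagination('/items/per_page/50/');
$onlyPage    = parseOptionalPagination('/items/page/3/');
```

The ?? fallback covers both a null group and a group that is missing from $m entirely, which differs slightly between PHP versions.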

Martin Ender answered Dec 20 '25 08:12



You could use lookaheads, but unless I'm missing something, I don't think it's necessary here -- you probably can just use the OR operator:

(/(page/\d+)|/(per_page/\d+)|/(sort/([a-zA-Z0-9\.\-_%=]+)/(asc|desc)))+

The outer group here matches one or more repetitions of any of the three alternatives: a page part, a per_page part, or a sort part.

More URL routing tips:

This general approach may also let you simplify things a bit. Rather than defining all the rules for your route in the regex, first match certain types of actions, then handle them in code. The simplest version:

(/(page|per_page)/(\d+))+

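Now each match gives you an "action" and a "value". (With preg_match, a repeated group keeps only the captures from its last repetition, so to see every pair, run the inner pattern through preg_match_all.) Switch on the action, process the value accordingly.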
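
One way to get every action/value pair is sketched below. It swaps the repeated outer group for preg_match_all with PREG_SET_ORDER, because a repeated capturing group in a single preg_match call only reports its last repetition. The function name collectPaging() is made up for this example:

```php
<?php
// Hypothetical helper: collects page / per_page values in whatever order they appear.
function collectPaging(string $path): array
{
    $params = [];

    // No outer (...)+ here: preg_match_all finds every occurrence itself.
    preg_match_all('~/(page|per_page)/(\d+)~', $path, $sets, PREG_SET_ORDER);

    // Each set is [full match, action, value]; the full match is skipped.
    foreach ($sets as [, $action, $value]) {
        switch ($action) {
            case 'page':
                $params['page'] = (int) $value;
                break;
            case 'per_page':
                $params['per_page'] = (int) $value;
                break;
        }
    }

    return $params;
}

$paging = collectPaging('/news/per_page/25/page/4/');
```

Because the pattern requires a leading slash before the action name, the page alternative cannot match inside per_page.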

To handle sort as you've spec'd it (two value parameters instead of one), we'll add another layer. To make it more interesting, let's say you decide to add a fourth action, search, which searches a specific field for some content:

(/(page|per_page)/(\d+)|/(sort|search)/([^/]+)/([^/]+))+

Again, when evaluating your match list, check for the action first -- depending on which action it is, you'll know how many successive match values to process.
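
A sketch of that dispatch, again using preg_match_all rather than a repeated group. The function name collectActions(), the shape of the returned array, and the example URL are assumptions made for illustration:

```php
<?php
// Hypothetical helper: one-value actions (page, per_page) become integers,
// two-value actions (sort, search) become [first, second] pairs.
function collectActions(string $path): array
{
    $pattern = '~/(?:(page|per_page)/(\d+)|(sort|search)/([^/]+)/([^/]+))~';
    preg_match_all($pattern, $path, $sets, PREG_SET_ORDER);

    $actions = [];
    foreach ($sets as $m) {
        if ($m[1] !== '') {
            // First alternative matched: groups 1 and 2 hold action and value.
            $actions[$m[1]] = (int) $m[2];
        } else {
            // Second alternative matched: groups 3 to 5 hold action and two values.
            $actions[$m[3]] = [$m[4], $m[5]];
        }
    }

    return $actions;
}

$actions = collectActions('/users/sort/name/desc/page/2/search/email/gmail/');
```

Checking which action group is non-empty tells you how many value groups to read, which is the same idea as switching on the action first.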

Hope that's helpful.

Brian Lacy answered Dec 20 '25 08:12



