Web frameworks such as Rails and Django have built-in support for "slugs", which are used to generate readable, SEO-friendly URLs. A slug string typically contains only the characters a-z, 0-9, and -, and can hence be written without URL-escaping (think "foo%20bar").
I'm looking for a Perl slug function that, given any valid Unicode string, will return a slug representation (a-z, 0-9, and -).
A super trivial slug function would be something along the lines of:
$input = lc($input);
$input =~ s/[^a-z0-9-]//g;
However, this implementation would not handle internationalization and accents (I want ë to become e). One way around this would be to enumerate all special cases, but that would not be very elegant. I'm looking for something more well thought out and general.
My question: is there a well-designed, general way to write such a slug function in Perl?
The slugify filter currently used in Django translates (roughly) to the following Perl code:
use Unicode::Normalize;

sub slugify($) {
    my ($input) = @_;

    $input = NFKD($input);         # Normalize (decompose) the Unicode string
    $input =~ tr/\000-\177//cd;    # Strip non-ASCII characters (>127)
    $input =~ s/[^\w\s-]//g;       # Remove all characters that are not word characters (includes _), spaces, or hyphens
    $input =~ s/^\s+|\s+$//g;      # Trim whitespace from both ends
    $input = lc($input);           # Lowercase
    $input =~ s/[-\s]+/-/g;        # Replace all runs of spaces and hyphens with a single hyphen
    return $input;
}
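To see how the pieces fit together, here is a self-contained sketch exercising that function on a few sample inputs; it uses only core modules (Unicode::Normalize ships with Perl):

```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize;

# Django-style slugify, as above
sub slugify($) {
    my ($input) = @_;
    $input = NFKD($input);         # decompose, e.g. é -> e + combining accent
    $input =~ tr/\000-\177//cd;    # strip non-ASCII code points
    $input =~ s/[^\w\s-]//g;       # drop punctuation
    $input =~ s/^\s+|\s+$//g;      # trim
    $input = lc($input);
    $input =~ s/[-\s]+/-/g;        # collapse space/hyphen runs
    return $input;
}

print slugify("Hello, World!"), "\n";   # hello-world
print slugify("liberté"), "\n";         # liberte (only the combining accent is stripped)
print slugify("Foo --- Bar"), "\n";     # foo-bar
```

Note that punctuation and whitespace runs collapse to a single hyphen, while the decomposed accent is dropped silently, leaving the base letter behind.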
Since you also want to change accented characters to unaccented ones, throwing in a call to unidecode (defined in Text::Unidecode) before stripping the non-ASCII characters seems to be your best bet (as pointed out by phaylon). In that case, the function could look like:
use Unicode::Normalize;
use Text::Unidecode;

sub slugify_unidecode($) {
    my ($input) = @_;

    $input = NFC($input);          # Normalize (recompose) the Unicode string
    $input = unidecode($input);    # Convert non-ASCII characters to closest ASCII equivalents
    $input =~ s/[^\w\s-]//g;       # Remove all characters that are not word characters (includes _), spaces, or hyphens
    $input =~ s/^\s+|\s+$//g;      # Trim whitespace from both ends
    $input = lc($input);           # Lowercase
    $input =~ s/[-\s]+/-/g;        # Replace all runs of spaces and hyphens with a single hyphen
    return $input;
}
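For completeness, the unidecode-based variant can be exercised the same way. Note that Text::Unidecode is a CPAN module, not part of core Perl, so it must be installed first (e.g. via cpanm); the expected outputs below are those from the sample table in this answer:

```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize;
use Text::Unidecode;   # CPAN module; install with: cpanm Text::Unidecode

sub slugify_unidecode($) {
    my ($input) = @_;
    $input = NFC($input);          # recompose first, so é stays one code point
    $input = unidecode($input);    # transliterate to ASCII (北亰 -> "Bei Jing ")
    $input =~ s/[^\w\s-]//g;
    $input =~ s/^\s+|\s+$//g;
    $input = lc($input);
    $input =~ s/[-\s]+/-/g;
    return $input;
}

print slugify_unidecode("北亰"), "\n";     # bei-jing
print slugify_unidecode("liberté"), "\n";  # liberte
```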
The former works well for strings that are primarily ASCII, but falls short when the entire string is formed of non-ASCII characters, since they all get stripped out, leaving you with an empty string.
Sample output:
string       | slugify     | slugify_unidecode
----------------------------------------------
hello world  | hello-world | hello-world
北亰         |             | bei-jing
liberté      | liberte     | liberte
Note how 北亰 gets slugified to nothing (an empty string) with the Django-inspired implementation. Note also the difference the normalization form makes -- liberté becomes 'liberte' with NFKD, which strips out only the second part (the combining accent) of the decomposed character, but would become 'libert' with NFC, which strips out the entire re-composed 'é'.
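That normalization difference is easy to verify with core Unicode::Normalize alone: under NFKD, 'é' is two code points (base letter plus combining accent), so the ASCII strip leaves the base 'e' behind, while under NFC it is a single non-ASCII code point that disappears entirely:

```perl
use strict;
use warnings;
use Unicode::Normalize;

my $word = "libert\x{e9}";            # "liberté" with a precomposed é (U+00E9)

my $decomposed = NFKD($word);         # é -> "e" + U+0301 (combining acute accent)
my $composed   = NFC($decomposed);    # recomposed back to the single code point

printf "NFKD: %d code points\n", length($decomposed);  # 8
printf "NFC:  %d code points\n", length($composed);    # 7

(my $from_nfkd = $decomposed) =~ tr/\000-\177//cd;     # strip non-ASCII
(my $from_nfc  = $composed)   =~ tr/\000-\177//cd;

print "$from_nfkd\n";   # liberte  (only the combining accent was stripped)
print "$from_nfc\n";    # libert   (the whole é was stripped)
```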
Are you looking for something like Text::Unidecode?