
How to normalize a URL?

Tags:

url

node.js

I am dealing with a situation where I need users to enter various URLs (for example: for their profiles). However, users do not always insert URLs in the https://example.com format. They might insert something like:

  • example.com
  • example.com/
  • example.com/somepage
  • but something like an email address, or anything else that is not a web address, should not be accepted

How can I normalize the URLs to a format that can potentially lead to a web address? I see this behavior in web browsers. We almost always enter crappy things in a web browser's bar and they can distinguish whether that's a search or something that can be turned into a URL.

I have looked in many places, but I can't seem to find an approach to this.

I would prefer a solution written for Node if it's possible. Thank you very much!

Victor asked Jul 02 '18 21:07


2 Answers

Use Node's URL API, alongside some manual checks.

  1. Manually check that the URL has a valid protocol.
  2. Instantiate the URL.
  3. Check that the URL does not contain additional information.

Example code:

const { URL } = require('url')
let myTestUrl = 'https://user:pass@sub.host.com:8080/p/a/t/h?query=string#hash';

try {
  if (!myTestUrl.startsWith('https://') && !myTestUrl.startsWith('http://')) {
    // The following line is based on the assumption that the URL will resolve using https.
    // Ideally, after all checks pass, the URL should be pinged to verify the correct protocol.
    // Better yet, it should need to be provided by the user - there are nice UX techniques to address this.
    myTestUrl = `https://${myTestUrl}`
  }

  const normalizedUrl = new URL(myTestUrl);

  if (normalizedUrl.username !== '' || normalizedUrl.password !== '') {
    throw new Error('Username and password not allowed.')
  }

  // Do your thing
} catch (e) {
  console.error('Invalid url provided', e)
}

This example only handles http and https, to give the gist.
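The three steps above can also be wrapped in a reusable function. This is a minimal sketch, not a definitive implementation: the name `normalizeUserUrl` is hypothetical, defaulting to https is the same assumption as in the example above, and the "hostname must contain a dot" rule is an extra assumption added here to reject single words that look more like a search than a web address.

```javascript
const { URL } = require('url');

// Hypothetical helper: returns a normalized href, or null if the input is rejected.
function normalizeUserUrl(input) {
  let candidate = String(input).trim();
  if (candidate === '') return null;

  // Step 1: make sure there is an http(s) protocol.
  // Assumption: input without a protocol is treated as https.
  if (!/^https?:\/\//i.test(candidate)) {
    candidate = `https://${candidate}`;
  }

  // Step 2: instantiate the URL; the constructor throws on unparseable input.
  let url;
  try {
    url = new URL(candidate);
  } catch (e) {
    return null;
  }

  // Step 3: reject credentials. This also rejects email-like input,
  // because "user@example.com" parses with username "user".
  if (url.username !== '' || url.password !== '') return null;

  // Assumption (not part of the URL API): require a dot in the hostname.
  if (!url.hostname.includes('.')) return null;

  return url.href;
}

console.log(normalizeUserUrl('example.com'));          // 'https://example.com/'
console.log(normalizeUserUrl('example.com/somepage')); // 'https://example.com/somepage'
console.log(normalizeUserUrl('user@example.com'));     // null
```

Returning null instead of throwing keeps the calling code simple when the input comes straight from a form field.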

Straight from the docs, a nice visualisation of the API:

┌─────────────────────────────────────────────────────────────────────────────────────────────┐
│                                            href                                             │
├──────────┬──┬─────────────────────┬─────────────────────┬───────────────────────────┬───────┤
│ protocol │  │        auth         │        host         │           path            │ hash  │
│          │  │                     ├──────────────┬──────┼──────────┬────────────────┤       │
│          │  │                     │   hostname   │ port │ pathname │     search     │       │
│          │  │                     │              │      │          ├─┬──────────────┤       │
│          │  │                     │              │      │          │ │    query     │       │
"  https:   //    user   :   pass   @ sub.host.com : 8080   /p/a/t/h  ?  query=string   #hash "
│          │  │          │          │   hostname   │ port │          │                │       │
│          │  │          │          ├──────────────┴──────┤          │                │       │
│ protocol │  │ username │ password │        host         │          │                │       │
├──────────┴──┼──────────┴──────────┼─────────────────────┤          │                │       │
│   origin    │                     │       origin        │ pathname │     search     │ hash  │
├─────────────┴─────────────────────┴─────────────────────┴──────────┴────────────────┴───────┤
│                                            href                                             │
└─────────────────────────────────────────────────────────────────────────────────────────────┘
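To see how the diagram maps onto the object, here is a short sketch that parses the same sample URL and prints the properties from the lower half of the diagram (the ones exposed by the `URL` class). The variable name `parsed` is only illustrative.

```javascript
const { URL } = require('url');

const parsed = new URL('https://user:pass@sub.host.com:8080/p/a/t/h?query=string#hash');

console.log(parsed.protocol); // 'https:'
console.log(parsed.username); // 'user'
console.log(parsed.password); // 'pass'
console.log(parsed.host);     // 'sub.host.com:8080'
console.log(parsed.hostname); // 'sub.host.com'
console.log(parsed.port);     // '8080'
console.log(parsed.pathname); // '/p/a/t/h'
console.log(parsed.search);   // '?query=string'
console.log(parsed.hash);     // '#hash'
console.log(parsed.origin);   // 'https://sub.host.com:8080'
```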
nowy answered Sep 19 '22 15:09


You want the normalize-url package:

const normalizeUrl = require('normalize-url');

normalizeUrl('example.com/');
//=> 'http://example.com'

It runs a bunch of normalizations on the URL.
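For a sense of what "normalizations" means, the sketch below shows a few that Node's built-in `URL` parser applies on its own. It uses only the built-in module and is not how normalize-url is implemented; the package goes further than this. The variable names are illustrative.

```javascript
const { URL } = require('url');

// Protocol and host are lowercased, the default port is dropped,
// and "." / ".." path segments are resolved.
const messy = new URL('HTTP://EXAMPLE.COM:80/a/./b/../c').href;
console.log(messy); // 'http://example.com/a/c'

// An empty path becomes "/".
const bare = new URL('https://example.com').href;
console.log(bare); // 'https://example.com/'
```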

Sindre Sorhus answered Sep 22 '22 15:09