
Check for empty or blank links in all html files in root directory using gulp

I have a lot of HTML documents in the root of my project. Take a simple skeleton HTML document like this:

<!doctype html>
<html class="no-js" lang="">
    <head>
        <meta charset="utf-8">
        <meta http-equiv="x-ua-compatible" content="ie=edge">
        <title></title>
        <meta name="description" content="">
        <meta name="viewport" content="width=device-width, initial-scale=1">

        <link rel="shortcut icon" type="image/x-icon" href="favicon.ico">
        <!-- Place favicon.ico in the root directory -->

        <link rel="stylesheet" href="css/style.css">
    </head>
    <body>
        <!--[if lt IE 8]>
            <p class="browserupgrade">You are using an <strong>outdated</strong> browser. Please <a href="http://browsehappy.com/">upgrade your browser</a> to improve your experience.</p>
        <![endif]-->



        <a href="#">hello</a>
        <a href="">hello</a>
        <a href="#">hello</a>
        <a href="">hello</a>
        <a href="#">hello</a>


        <script src="http://code.jquery.com/jquery-1.11.3.min.js"></script>
        <script src="js/scripts.js"></script>
    </body>
</html>

Before I send all these files to the development team, I have been given the task of checking that there are no links with no href, an empty href, or an empty fragment as the href.

In other words, there cannot be links like these:

<a href="">

or

<a href="#">

or

 <a>
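
The three cases above can be written as a single predicate. This is only a minimal sketch: the name isBadHref is hypothetical, and treating a whitespace-only href as blank is an assumption that goes slightly beyond the three cases listed.

```javascript
// Hypothetical helper (not part of any plugin): returns true for the three
// cases listed above. `href` is the raw attribute value, or undefined/null
// when the <a> tag has no href attribute at all.
function isBadHref(href) {
  if (href === undefined || href === null) {
    return true; // <a> with no href
  }
  // Assumption: whitespace-only values count as blank.
  var value = String(href).trim();
  return value === '' || value === '#'; // href="" or href="#"
}

console.log(isBadHref(undefined));   // true
console.log(isBadHref(''));          // true
console.log(isBadHref('#'));         // true
console.log(isBadHref('#hash'));     // false
console.log(isBadHref('link.html')); // false
```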

I found this gulp plugin, but I have a few issues with it. Let's have a look at the gulpfile first:

gulp.task("checkDev", function(callback) {
  var options = {
    pageUrls: [
      'http://localhost:8080/Gulp-Test/index.html'
    ],
    checkLinks: true,
    summary: true
  };
  checkPages(console, options, callback);
});

Note that when you pass the option checkLinks: true, it applies not just to the a tags but to all of the tags mentioned on this page. The plugin does not complain if an <a> tag has an empty href, has just a # as its href, or has no href at all.

See what happens when I run the gulp tasks:

[Image: the result of running the gulp plugin]

What I would like instead is for only the <a> links to be checked, and if an <a> tag has no href, a blank href, or just a #, the plugin should throw an error or show it in the summary report.

Lastly, note how the sample gulpfile passes pageUrls (i.e. the pages to be checked):

 pageUrls: [
          'http://localhost:8080/Gulp-Test/index.html'
        ],

How do I instead tell this plugin to check all the .html files inside the Gulp-Test directory?
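
One possible way to avoid listing every page by hand is to build the pageUrls array from a directory listing. The sketch below uses only Node's fs and path modules. The helper name htmlPageUrls is hypothetical, and it assumes every .html file in the directory is served under the same base URL as in the snippet above.

```javascript
var fs = require('fs');
var path = require('path');

// Hypothetical helper: list the .html files directly inside `dir` (no
// recursion) and turn each file name into a URL under `baseUrl`.
function htmlPageUrls(dir, baseUrl) {
  var base = baseUrl.charAt(baseUrl.length - 1) === '/' ? baseUrl : baseUrl + '/';
  return fs.readdirSync(dir)
    .filter(function (name) {
      return path.extname(name).toLowerCase() === '.html';
    })
    .sort()
    .map(function (name) {
      return base + encodeURIComponent(name);
    });
}

// Usage sketch (assumes the gulpfile sits next to the HTML files):
// var options = {
//   pageUrls: htmlPageUrls(__dirname, 'http://localhost:8080/Gulp-Test/'),
//   checkLinks: true,
//   summary: true
// };
```

This only removes the need to hard-code the URLs; it does not change which kinds of links the plugin reports.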

So to summarize my question: how do I get this plugin to throw an error (i.e. show it in the summary report) when it sees an <a> without an href, or with an href that is blank or has a value of #, and how do I tell this plugin to check all .html files inside a directory?

Alexander Solonik asked Mar 10 '16 09:03


2 Answers

I am assigned with the task of checking that there are no links which have no href, and empty href, or have an empty fragment as an href.

If that's all you require, you don't really need any gulp plugins. And it's doubtful that you will find something that fits your specific requirements anyway.

However, you can accomplish this yourself pretty easily. All you really have to do is:

  1. Read in all the HTML files you want to validate using gulp.src().
  2. Pipe each file to a function of your own using through2.
  3. Parse each file using any HTML parser you like (e.g. cheerio).
  4. Find the bad links in the parsed HTML DOM.
  5. Log the bad links using gutil.log() so you will know what to fix.
  6. Maybe throw a gutil.PluginError so your build fails (this is optional).

Here's a Gulpfile that does exactly that (referencing the above points in comments):

var gulp = require('gulp');
var through = require('through2').obj;
var cheerio = require('cheerio');
var gutil = require('gulp-util');
var path = require('path');

var checkLinks = function() {
  return through(function(file, enc, cb) { // [2]
    var badLinks = [];
    var $ = cheerio.load(file.contents.toString()); // [3]
    $('a').each(function() {
      var $a = $(this);
      if (!$a.attr('href') || $a.attr('href') === '#') { // [4]
        badLinks.push($.html($a));
      }
    });
    if (badLinks.length > 0) {
      var filePath = path.relative(file.cwd, file.path);
      badLinks.forEach(function(badLink) {
        gutil.log(gutil.colors.red(filePath + ': ' + badLink)); // [5]
      });
      throw new gutil.PluginError( 'checkLinks',
        badLinks.length + ' bad links in ' + filePath); // [6]
    }
    cb();
  });
}

gulp.task('checkLinks', function() {
  gulp.src('Gulp-Test/**/*.html') // [1]
    .pipe(checkLinks());
});

Running gulp checkLinks with a Gulp-Test/index.html like so ...

<html>
<head><title>Test</title></head>
<body>
<a>no href</a>
<a href="">empty href</a>
<a href="#">empty fragment</a>
<a href="#hash">non-empty fragment</a>
<a href="link.html">link</a>
</body>
</html>

... results in the following output:

[20:01:08] Using gulpfile ~/example/gulpfile.js
[20:01:08] Starting 'checkLinks'...
[20:01:08] Finished 'checkLinks' after 21 ms
[20:01:08] Gulp-Test/index.html: <a>no href</a>
[20:01:08] Gulp-Test/index.html: <a href="">empty href</a>
[20:01:08] Gulp-Test/index.html: <a href="#">empty fragment</a>

/home/sven/example/gulpfile.js:22
      throw new gutil.PluginError( 'checkLinks',
      ^
Error: 3 bad links in Gulp-Test/index.html
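
For a quick check without installing cheerio, step 4 can be approximated with regular expressions. This is a rough, dependency-free sketch and not a substitute for a real HTML parser: the name findBadAnchors is hypothetical, it returns only the opening tags, and it will also match anchors inside comments or scripts and can be confused by a > inside an attribute value.

```javascript
// Hypothetical helper: return the opening <a ...> tags whose href is
// missing, empty, or just "#". Regex-based, so only an approximation.
function findBadAnchors(html) {
  var bad = [];
  var tagRe = /<a(?=[\s>])[^>]*>/gi; // opening <a> tags, not <abbr> etc.
  var hrefRe = /\shref\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]*))/i;
  var tag;
  while ((tag = tagRe.exec(html)) !== null) {
    var attr = hrefRe.exec(tag[0]);
    var value = null; // null means "no href attribute"
    if (attr) {
      value = attr[1] !== undefined ? attr[1]
            : attr[2] !== undefined ? attr[2]
            : attr[3];
    }
    var trimmed = value === null ? null : value.trim();
    if (trimmed === null || trimmed === '' || trimmed === '#') {
      bad.push(tag[0]);
    }
  }
  return bad;
}

// The same sample page as in the answer above.
var sample = [
  '<html>',
  '<head><title>Test</title></head>',
  '<body>',
  '<a>no href</a>',
  '<a href="">empty href</a>',
  '<a href="#">empty fragment</a>',
  '<a href="#hash">non-empty fragment</a>',
  '<a href="link.html">link</a>',
  '</body>',
  '</html>'
].join('\n');

console.log(findBadAnchors(sample).length); // 3
```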
Sven Schoenung answered Nov 15 '22 00:11


var gulp = require('gulp');
var jsdom = require('jsdom').jsdom;
var fs = require('fs');
var colors = require('colors');

colors.setTheme({
  error: "red",
  file: "blue",
  info: "green",
  warn: "yellow"
});

gulp.task('checkLinks', function() {
  // Read the current directory and keep only the .html files
  fs.readdir('.', function(err, files) {
    if (err)
      throw err;

    var htmlFiles = files.filter(function(c, i, a) {
      return c.substring(c.lastIndexOf('.') + 1) === "html";
    });

    htmlFiles.forEach(function(c, i, a) {
      fs.readFile(c, function(fileReadErr, data) {
        if (fileReadErr)
          throw fileReadErr;

        // Build a DOM for the file and query it with jQuery
        var doc = jsdom(data);
        var window = doc.defaultView;
        var $ = require('jquery')(window);
        var aTags = $('a').toArray();
        var k = 0; // number of bad links found in this file

        console.log(("\n\n************************Checking File " + c + "***************************").info);

        for (var i = 0; i < aTags.length; i++) {
          // A link is bad if it has no href, an empty href, or href="#"
          if (!(aTags[i].hasAttribute("href")) || aTags[i].getAttribute("href") === "" || aTags[i].getAttribute("href") === "#") {
            k++;
            console.log("BAD LINK ".error + aTags[i].outerHTML.info + " IN FILE " + c.file);
          }
        }

        console.log(("BAD-LINKS COUNT IN " + c + " is " + k).bgRed.white);
        window.close();
      });
    });
  });
});

output:

[Image: output of the script above]
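
A small note on the file filter in the script above: because it relies on lastIndexOf('.'), a file named exactly "html" (no dot) also passes, and upper-case extensions such as .HTML are skipped. A possible alternative, sketched here with Node's path.extname, is shown below. The name isHtmlFile is hypothetical, and matching the extension case-insensitively is an assumption about what is wanted.

```javascript
var path = require('path');

// Hypothetical replacement for the substring-based filter.
function isHtmlFile(name) {
  return path.extname(name).toLowerCase() === '.html';
}

console.log(isHtmlFile('index.html')); // true
console.log(isHtmlFile('INDEX.HTML')); // true
console.log(isHtmlFile('html'));       // false
console.log(isHtmlFile('notes.txt'));  // false
```

It could be passed directly to the filter call, e.g. files.filter(isHtmlFile), since path.extname only looks at its first argument.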

Udit Bhardwaj answered Nov 15 '22 00:11