
Prevent HTML Tidy from moving meta tags (schema markup)

I am facing a serious problem with HTML Tidy (latest version -- https://html-tidy.org).

In short: HTML Tidy converts this HTML

<div class="breadcrumbs" typeof="BreadcrumbList" vocab="http://schema.org/">
<div class="wrap">
    <span property="itemListElement" typeof="ListItem">
        <a property="item" typeof="WebPage" title="Codes Category" href="https://mysite.works/codes/" class="taxonomy category">
            <span property="name">Codes</span>
        </a>
        <meta property="position" content="1">
    </span>
</div>

into the following. Please look closely at where the <meta> tag ends up.

<div class="breadcrumbs" typeof="BreadcrumbList" vocab="http://schema.org/">
<div class="wrap">
    <span property="itemListElement" typeof="ListItem">
        <a property="item" typeof="WebPage" title="Codes Category" href="https://mysite.works/codes/" class="taxonomy category">
            <span property="name">Codes</span>
        </a>
    </span>
    <meta property="position" content="1">
</div>

This causes serious schema validation errors. You can check the markup here: https://search.google.com/structured-data/testing-tool/u/0/

Because of this issue, the breadcrumb navigation of my client's site (https://techswami.in) does not appear in search results.

What am I beautifying?

My client wants the website's source code to look "clean, readable and tidy".

So I am using the code below to do that.

Note: this code works without problems on the following WordPress setup:

  • Nginx with FastCGI Cache/MariaDB
  • PHP7
  • Ubuntu 18.04.1
  • Latest WordPress (the code is compatible with every cache plugin)

Code:

// Tidy the output unless a logged-in user is viewing the admin area.
if ( ! is_user_logged_in() || ! is_admin() ) {
    // Output-buffer callback: runs the whole page through HTML Tidy.
    function callback( $buffer ) {
        $tidy    = new tidy();
        $options = array(
            'indent'        => true,
            'markup'        => true,
            'indent-spaces' => 2,
            'tab-size'      => 8,
            'wrap'          => 180,
            'wrap-sections' => true,
            'output-html'   => true,
            'hide-comments' => true,
            'tidy-mark'     => false,
        );
        $tidy->parseString( $buffer, $options );
        $tidy->cleanRepair();
        // Return the repaired markup as a string rather than the tidy object.
        return tidy_get_output( $tidy );
    }
    function buffer_start() { ob_start( 'callback' ); }
    function buffer_end() { if ( ob_get_length() ) ob_end_flush(); }
    add_action( 'wp_loaded', 'buffer_start' );
    add_action( 'shutdown', 'buffer_end' );
}
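
To reproduce the behaviour outside WordPress, the fragment can be fed to Tidy directly. This is only a minimal sketch, assuming the PHP tidy extension is loaded. The helper name `tidy_fragment` and the shortened sample markup are illustrative and not part of the site's code. The `errorBuffer` property holds the warnings Tidy emitted while repairing the markup, which may hint at why the tag was moved.

```php
<?php
// Hypothetical helper (not part of the original setup): runs Tidy over an
// HTML string and returns the repaired markup plus Tidy's diagnostics.
function tidy_fragment(string $html): array
{
    $tidy = new tidy();
    $tidy->parseString($html, [
        'indent'      => true,
        'wrap'        => 180,
        'output-html' => true,
        'tidy-mark'   => false,
    ]);
    $tidy->cleanRepair();

    return [
        'html'     => tidy_get_output($tidy),
        'warnings' => (string) $tidy->errorBuffer,
    ];
}

// Shortened version of the breadcrumb markup from the question.
$sample = '<div><span property="itemListElement" typeof="ListItem">'
        . '<meta property="position" content="1"></span></div>';

// Only run when the tidy extension is available.
if (extension_loaded('tidy')) {
    $result = tidy_fragment($sample);
    echo $result['html'], "\n--- warnings ---\n", $result['warnings'], "\n";
}
```

Comparing the printed markup with the input shows whether the `<meta>` element was relocated for a given set of options.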

What help do I need from you guys?

Can you please tell me how to prevent HTML Tidy from moving the <meta> tags? I need the configuration parameters.

Thanks.

asked Aug 21 '18 08:08 by John Adam


1 Answer

According to MDN, a <meta> element that uses charset or http-equiv is only permitted inside <head>. Additionally, the HTML standard itself does not define a property attribute on the <meta> element; that attribute comes from RDFa.

These are most likely the reasons that HTML Tidy is relocating the tag.
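
If the <meta> tag cannot stay where the RDFa markup expects it, one possible workaround is to publish the same breadcrumb data as JSON-LD inside a <script> block, so that nothing depends on where Tidy places body elements. The sketch below rests on several assumptions: `breadcrumb_json_ld()` is a hypothetical helper, the crumb list is hard-coded from the question's example, and whether Tidy leaves the script contents untouched under the options in use should be checked on the actual site.

```php
<?php
// Hypothetical helper: builds a schema.org BreadcrumbList as a JSON-LD
// <script> element from an ordered map of crumb name => URL.
function breadcrumb_json_ld(array $crumbs): string
{
    $items    = [];
    $position = 1;
    foreach ($crumbs as $name => $url) {
        $items[] = [
            '@type'    => 'ListItem',
            'position' => $position++,
            'name'     => $name,
            'item'     => $url,
        ];
    }

    $data = [
        '@context'        => 'https://schema.org',
        '@type'           => 'BreadcrumbList',
        'itemListElement' => $items,
    ];

    // JSON_HEX_TAG escapes "<" and ">" so a crumb name cannot close the script.
    return '<script type="application/ld+json">'
        . json_encode($data, JSON_HEX_TAG | JSON_UNESCAPED_UNICODE)
        . '</script>';
}

// Example crumb taken from the markup in the question.
echo breadcrumb_json_ld(['Codes' => 'https://mysite.works/codes/']);
```

The position is carried as a plain JSON number here, so no <meta> element is needed in the body.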

Sources

  • https://developer.mozilla.org/en-US/docs/Web/HTML/Element/meta
  • https://www.w3schools.com/tags/tag_meta.asp
answered Oct 06 '22 19:10 by janniks