
Detect if source is CSS/HTML/JavaScript

I want to run js-beautify on some source, but I have no way to tell what type of source it is. Is there any way, crude or not, to detect whether the source is CSS, HTML, JavaScript, or none of these?

Looking at their site, they have this function, which looks like it figures out whether the source is HTML:

function looks_like_html(source) {
    // <foo> - looks like html
    // <!--\nalert('foo!');\n--> - doesn't look like html
    var trimmed = source.replace(/^[ \t\n\r]+/, '');
    var comment_mark = '<' + '!-' + '-';
    return (trimmed && (trimmed.substring(0, 1) === '<' && trimmed.substring(0, 4) !== comment_mark));
}

So I just need to tell whether it's CSS, JavaScript, or neither. This is running in Node.js.

So this code would need to tell me it's JavaScript:

var foo = {
    bar : 'baz'
};

whereas this code needs to tell me it's CSS:

.foo {
    background : red;
}

So a function to test this would return the type:

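function getSourceType(source) {
    // isJs, isHtml and isCss stand for the checks I'm asking about
    if (isJs(source)) {
        return 'js';
    }
    if (isHtml(source)) {
        return 'html';
    }
    if (isCss(source)) {
        return 'css';
    }
    return null; // none of these (e.g. Java): don't beautify
}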

There will be cases where other languages, such as Java, are used; those I need to ignore, but CSS/HTML/JS I can run through the beautifier.

asked Jun 10 '15 at 18:06 by Mitchell Simoens




1 Answer

Short answer: Almost impossible.

- Thanks to Katana's input

The reason: valid HTML can contain JS and CSS (and it usually does). JS can contain both CSS and HTML (e.g. var myContent = '<div><style>CSS-Rules<script>JS Commands';). And even CSS can contain both, in comments.

So writing a parser for this is close to impossible: you just cannot separate them easily.

Each language has rules for how it is written; what you want to do is reverse-engineer the source and check whether those rules apply. That's probably not worth the effort.


Approach 1

If the requirement is worth the effort, you could run different parsers on the source and see whether they throw errors. For example, Java source is unlikely to be valid HTML/JS/CSS, but it will be valid Java code (if written properly).
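For the JavaScript side of this, Node's built-in vm module can act as the parser: new vm.Script(source) compiles the code without running it and throws a SyntaxError if the code is not valid JavaScript. A minimal sketch, assuming Node.js with CommonJS require; the helper name parsesAsJs is hypothetical. Node ships no CSS or HTML parser, so those checks would need a third-party package (not shown).

```javascript
const vm = require('vm');

// Hypothetical helper: true if `source` compiles as a classic (non-module)
// JavaScript script. vm.Script only compiles the code; it never runs it.
function parsesAsJs(source) {
  try {
    new vm.Script(source);
    return true;
  } catch (err) {
    // Compile failures are SyntaxErrors; anything else is unexpected.
    if (err && err.name === 'SyntaxError') {
      return false;
    }
    throw err;
  }
}

console.log(parsesAsJs("var foo = {\n    bar : 'baz'\n};")); // true
console.log(parsesAsJs('.foo {\n    background : red;\n}'));  // false
console.log(parsesAsJs('<div>This div is HTML</div>'));      // false
console.log(parsesAsJs('public class Foo {}'));              // false
```

This is not watertight: some CSS happens to be valid JS, e.g. "body" on one line followed by "{ color: red; }" on the next parses as an expression statement plus a block containing a labelled statement, so parsesAsJs returns true for it.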


Approach 2 - Thanks to Bram's input

However, if you know the source very well and can assume that these mixed cases don't occur in your code, you could try the following with regular expressions.

Example

<code>&lt;div&gt;This div is HTML var i=32;&lt;/div&gt;</code>
<code>#thisiscss { margin: 0; padding: 0; }</code>
<code>.thisismorecss { border: 1px solid; background-color: #0044FF;}</code>
<code>function jsfunc(){ var i = 1; i+=1;<br>}</code>

Parsing

$("code").each(function() {
    var code = $(this).text();
    // HTML: a known void tag, or a known paired tag followed by its closing tag
    if (code.match(/<(br|basefont|hr|input|source|frame|param|area|meta|!--|col|link|option|base|img|wbr|!DOCTYPE).*?>|<(a|abbr|acronym|address|applet|article|aside|audio|b|bdi|bdo|big|blockquote|body|button|canvas|caption|center|cite|code|colgroup|command|datalist|dd|del|details|dfn|dialog|dir|div|dl|dt|em|embed|fieldset|figcaption|figure|font|footer|form|frameset|head|header|hgroup|h1|h2|h3|h4|h5|h6|html|i|iframe|ins|kbd|keygen|label|legend|li|map|mark|menu|meter|nav|noframes|noscript|object|ol|optgroup|output|p|pre|progress|q|rp|rt|ruby|s|samp|script|section|select|small|span|strike|strong|style|sub|summary|sup|table|tbody|td|textarea|tfoot|th|thead|time|title|tr|track|tt|u|ul|var|video).*?<\/\2/)) {
        $(this).after("<span>This is HTML</span>");
    }
    // CSS: [name](. or #)name { property:value; ... } with optional whitespace
    else if (code.match(/(([ \t\r\n]*)([a-zA-Z-]*)([.#]{1,1})([a-zA-Z-]*)([ \t\r\n]*)+)([{]{1,1})((([ \t\r\n]*)([a-zA-Z-]*)([:]{1,1})((([ \t\r\n]*)([a-zA-Z-0-9#]*))+)[;]{1})*)([ \t\r\n]*)([}]{1,1})([ \t\r\n]*)/)) {
        $(this).after("<span>This is CSS</span>");
    }
    // Anything else is assumed to be JS
    else {
        $(this).after("<span>This is JS</span>");
    }
});
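Since the question runs in Node.js rather than a browser, here is a hedged sketch of the same two regex tests applied to a plain string instead of the text of <code> elements, with the whitespace class written as [ \t\r\n]. The function name getSourceType is borrowed from the question; the port itself is illustrative, not the answer's own code.

```javascript
// HTML: a known void tag, or a known paired tag followed later by its closing tag.
const HTML_RE = /<(br|basefont|hr|input|source|frame|param|area|meta|!--|col|link|option|base|img|wbr|!DOCTYPE).*?>|<(a|abbr|acronym|address|applet|article|aside|audio|b|bdi|bdo|big|blockquote|body|button|canvas|caption|center|cite|code|colgroup|command|datalist|dd|del|details|dfn|dialog|dir|div|dl|dt|em|embed|fieldset|figcaption|figure|font|footer|form|frameset|head|header|hgroup|h1|h2|h3|h4|h5|h6|html|i|iframe|ins|kbd|keygen|label|legend|li|map|mark|menu|meter|nav|noframes|noscript|object|ol|optgroup|output|p|pre|progress|q|rp|rt|ruby|s|samp|script|section|select|small|span|strike|strong|style|sub|summary|sup|table|tbody|td|textarea|tfoot|th|thead|time|title|tr|track|tt|u|ul|var|video).*?<\/\2/;

// CSS: [name](. or #)name { property:value; ... } with optional whitespace.
const CSS_RE = /(([ \t\r\n]*)([a-zA-Z-]*)([.#]{1,1})([a-zA-Z-]*)([ \t\r\n]*)+)([{]{1,1})((([ \t\r\n]*)([a-zA-Z-]*)([:]{1,1})((([ \t\r\n]*)([a-zA-Z-0-9#]*))+)[;]{1})*)([ \t\r\n]*)([}]{1,1})([ \t\r\n]*)/;

function getSourceType(code) {
  if (HTML_RE.test(code)) {
    return 'html';
  }
  if (CSS_RE.test(code)) {
    return 'css';
  }
  return 'js'; // everything else is assumed to be JS, as in the snippet
}

console.log(getSourceType('<div>This div is HTML var i=32;</div>'));                            // html
console.log(getSourceType('#thisiscss { margin: 0; padding: 0; }'));                            // css
console.log(getSourceType('.thisismorecss { border: 1px solid; background-color: #0044FF;}')); // css
console.log(getSourceType('function jsfunc(){ var i = 1; i+=1;}'));                             // js
```

One caveat: the CSS pattern allows no whitespace between a property name and its colon, so the question's ".foo { background : red; }" example falls through to 'js'.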

What the snippet does: it reads the text of each <code> element and classifies it as follows.

HTML

If it contains '<' followed by br (or any of the other tag names listed above) and then '>', it's HTML. Checking for known tag names, rather than for any '<' ... '>', matters because JS code can also contain < and > as comparison operators.
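To illustrate why the tag-name check matters, here is a small sketch comparing a naive "anything between < and >" test with one that requires a known tag name. The short tag list is an illustrative subset chosen for this example, not the answer's full list.

```javascript
// Naive check: any "<...>" counts as markup.
const looksLikeTag = /<[^>]+>/;
// Stricter check: "<" (or "</") must be followed by a known tag name.
// The tag list here is a hypothetical subset for illustration.
const knownTag = /<\/?(div|span|p|br|img|a|ul|li|script|style)\b[^>]*>/i;

const jsComparison = 'if (x<y && y>z) { swap(); }';
const markup = '<div class="note">Hello</div>';

console.log(looksLikeTag.test(jsComparison)); // true (false positive on JS)
console.log(knownTag.test(jsComparison));     // false
console.log(knownTag.test(markup));           // true
```

The stricter check still only narrows the problem: JS such as "x<a && b>c" would match because "a" is a tag name.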

CSS

If it follows the pattern of an optional element name, then . or #, then an id or class name, then {, then property: value; pairs, then }, it's CSS. The pattern above also allows spaces, tabs and line breaks between these parts.
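The rule described above can also be written as a shorter, anchored regex. This is a simplified variant of my own (named CSS_RULES here, hypothetically), not the answer's pattern: it requires the whole input to be one or more rules, allows whitespace around the colon, and only handles a single simple . or # selector per rule, with no digits in names and no descendant or grouped selectors.

```javascript
// One or more rules: [name](. or #)name { property: value; ... }
// Each value starts with a non-space character so whitespace is matched
// by \s* alone, which keeps backtracking cheap.
const CSS_RULES = /^(\s*[a-zA-Z-]*[.#][a-zA-Z-]+\s*\{(\s*[a-zA-Z-]+\s*:\s*[^;{}\s][^;{}]*;)*\s*\})+\s*$/;

console.log(CSS_RULES.test('.foo {\n    background : red;\n}'));                                // true
console.log(CSS_RULES.test('#a { margin: 0; }\n.b { color: red; }'));                           // true
console.log(CSS_RULES.test('.thisismorecss { border: 1px solid; background-color: #0044FF;}')); // true
console.log(CSS_RULES.test("var foo = {\n    bar : 'baz'\n};"));                                // false
console.log(CSS_RULES.test('function jsfunc(){ var i = 1; i+=1;}'));                            // false
```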

JS

Anything else is assumed to be JS.

You could also write regexes for JS itself: if it contains '= {' or 'function...' or ' then it's JS. Also look further into regular expressions to make the checks stricter, and/or keep white- and blacklists (e.g. 'var' with no < or > around it, 'function(asdsd,asdsad){assads}', ...).
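A positive JS check along those lines might look like the sketch below. The hint list and the name looksLikeJs are hypothetical choices for illustration; HTML containing <script> would also trigger them, so an HTML check should run first, as in the snippet above. Unlike the "else it is JS" fallback, a positive check lets other languages such as Java end up as "neither".

```javascript
// Hypothetical JS hints: patterns common in JS and rare in plain CSS.
const JS_HINTS = [
  /=\s*\{/,                     // assignment of an object literal: "= {"
  /\bfunction\b\s*[\w$]*\s*\(/, // function declarations/expressions
  /\b(var|let|const)\s+[\w$]+/, // variable declarations
  /=>/                          // arrow functions
];

// True if any hint matches somewhere in the source.
function looksLikeJs(source) {
  return JS_HINTS.some(function (re) {
    return re.test(source);
  });
}

console.log(looksLikeJs("var foo = {\n    bar : 'baz'\n};"));     // true
console.log(looksLikeJs('function jsfunc(){ var i = 1; i+=1;}'));  // true
console.log(looksLikeJs('.foo {\n    background : red;\n}'));      // false
console.log(looksLikeJs('public class Foo { int x = 1; }'));       // false
```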

Bram's starting point, which I built on, was:

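$("code").each(function() {
    var code = $(this).text();
    // HTML: starts with something shaped like a tag
    if (code.match(/^<[^>]+>/)) {
        $(this).after("<span>This is HTML</span>");
    }
    // CSS: optional # or ., then anything up to the first {
    else if (code.match(/^(#|\.)?[^{]+{/)) {
        $(this).after("<span>This is CSS</span>");
    }
});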
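Bram's two checks are easy to try outside the browser. The sketch below is a hypothetical Node.js port (the name bramsCheck is made up) that returns null where the original snippet has no else branch. It shows why the longer CSS pattern was needed: "^(#|\.)?[^{]+{" matches anything that has text before a {, so the question's JS object literal and a JS function are classified as CSS too.

```javascript
// Hypothetical Node.js port of the two checks in Bram's snippet.
function bramsCheck(code) {
  if (/^<[^>]+>/.test(code)) {
    return 'html';
  }
  if (/^(#|\.)?[^{]+{/.test(code)) {
    return 'css';
  }
  return null; // the original snippet has no fallback branch
}

console.log(bramsCheck('<div>hi</div>'));                         // html
console.log(bramsCheck('#thisiscss { margin: 0; padding: 0; }')); // css
console.log(bramsCheck("var foo = {\n    bar : 'baz'\n};"));      // css (really JS)
console.log(bramsCheck('function jsfunc(){ var i = 1; i+=1;}'));  // css (really JS)
```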

For more information:

http://regexone.com is a good reference. Also check http://www.sitepoint.com/jquery-basic-regex-selector-examples/ for inspiration.

answered Oct 01 '22 at 07:10 by hogan