
Beautify / lint whitespace in JavaScript [closed]

I would like to lint (and perhaps even automatically beautify) whitespace in JavaScript. The question is whether there are any tools that can do this.

I know that JSLint and JSHint, for example, can check indentation and trailing spaces. That is fine, but these are not the only kinds of whitespace you may have.

What I would like to check as well is:

  • Are there empty lines before or after certain constructs?
  • Is there more than one empty line?
  • Are there single spaces between certain things (this is partially checked by JSLint / JSHint)?
  • ...
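
To make the "more than one empty line" and trailing-space checks concrete, here is a minimal line-based sketch in plain Node.js. The function name lintWhitespace and the maxBlank parameter are made up for this example, and this is not how JSLint or JSHint work internally. Because it does not parse the code, it cannot tell which construct an empty line belongs to.

```javascript
// Report lines with trailing whitespace and runs of more than `maxBlank`
// consecutive empty lines. Purely line-based: lines inside comments and
// template literals are checked like any other line.
function lintWhitespace(source, maxBlank = 1) {
  const problems = [];
  const lines = source.split(/\r\n|\r|\n/);
  let blankRun = 0;

  lines.forEach((line, index) => {
    const lineNumber = index + 1;

    if (/[ \t]+$/.test(line)) {
      problems.push({ line: lineNumber, message: "trailing whitespace" });
    }

    if (line.trim() === "") {
      blankRun += 1;
      // Report once per run, at the first line that exceeds the limit.
      if (blankRun === maxBlank + 1) {
        problems.push({ line: lineNumber, message: "too many empty lines" });
      }
    } else {
      blankRun = 0;
    }
  });

  return problems;
}

const sample = "var a = 1;  \n\n\nvar b = 2;\n";
console.log(lintWhitespace(sample));
```

Checks such as "an empty line before a certain construct" need a parser, which is why the line-based approach only goes so far.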

Basically, I'd like to have everything that a modern IDE such as Visual Studio plus ReSharper can do, but as a command-line tool that can be embedded into Grunt.

The only thing I found that goes in this direction is esformatter, but according to its website it "is still on early development and is missing support for many important features."

Of course it would be great if there was something more mature.

Any ideas?

Golo Roden asked Mar 07 '13 06:03


2 Answers

Update after 6 months and a lot of searching

I suggest you look at ESLint. It is built on the idea of pluggable linting rules, which is what you asked for. You can use the Grunt plugin grunt-eslint to specify linting rules and automate the checks. It is still pre-alpha but has progressed faster than esformatter, and it has a well-defined roadmap.
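
As an illustration, here is a sketch of an ESLint configuration file limited to whitespace rules. The rule names are real ESLint core rules, but they come from releases later than the pre-alpha described above, and which rules and options to enable is my own selection rather than a recommendation from the ESLint project. Recent ESLint releases have deprecated formatting rules in the core in favor of a separate stylistic plugin, so check the documentation for the version you install.

```javascript
// .eslintrc.js (sketch): whitespace-related rules only.
module.exports = {
  rules: {
    // "Is there more than one empty line?"
    "no-multiple-empty-lines": ["error", { max: 1 }],

    // "Are there empty lines before or after certain constructs?"
    "padding-line-between-statements": [
      "error",
      // Require an empty line before every return statement.
      { blankLine: "always", prev: "*", next: "return" }
    ],

    // "Are there single spaces between certain things?"
    "keyword-spacing": ["error", { before: true, after: true }],
    "space-infix-ops": "error",

    // Trailing spaces, as JSLint / JSHint already check.
    "no-trailing-spaces": "error"
  }
};
```

Several of these rules are auto-fixable, so running ESLint with its --fix option covers part of the "beautify" half of the question as well.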


Original

There is no simple answer to the question you are asking, so let me break it down and answer it in parts. You want the features of a modern IDE such as Visual Studio plus ReSharper, on the command line, in a form you can embed into Grunt. Broadly classifying the features you (may) want:

  • Compiler/Debugger
    • Tests for bugs, before/during runtime.
  • Lint/flag suspicious code
    • Follows coding standards, helps find possible flaws.
  • Formatter/Beautifier
    • Make code readable

IDEs are built with all of the above in mind, so finding a standalone formatter for Node that is as powerful as the one in an IDE is hard.

Debugger

I know it is possible to use the Eclipse debugger for Node. Check this link

Lint/Formatter

For Grunt (based on JSLint/JSHint):

  1. grunt-contrib-jshint
  2. grunt-jslint
  3. grunt-linter
  4. grunt-jsbeautifier

esformatter is powerful because it uses esprima to parse and format JavaScript. It is a formatter, not a linter, so you may have to lint the code before using it. There are some other esprima-based formatters you can look into:

codepainter: JavaScript beautifier using ECMAScript

Its supported style properties include some features you can use:

  1. Indentation: { character: '?', width: ? }
  2. LastEmptyLine: present, omitted
  3. QuoteType: single, double
  4. SpaceAfterControlStatements: present, omitted
  5. SpaceAfterAnonymousFunctions: present, omitted
  6. SpacesAroundOperators: present, omitted
  7. TrailingWhitespaces: strip
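
To show what two of these properties amount to, here is a small sketch in plain JavaScript. It does not use codepainter and is not its API: applyStyle is a hypothetical helper, and reading LastEmptyLine as "the file ends with exactly one newline" is my interpretation of the property name.

```javascript
// Hypothetical helper (not codepainter's API) that applies two of the
// style properties listed above to a source string.
function applyStyle(source, style) {
  // Normalize line endings to LF first so the regular expressions stay simple.
  let text = source.replace(/\r\n|\r/g, "\n");

  if (style.TrailingWhitespaces === "strip") {
    // With the m flag, $ matches at the end of every line.
    text = text.replace(/[ \t]+$/gm, "");
  }

  if (style.LastEmptyLine === "present") {
    // Collapse any run of final newlines (or none) into exactly one.
    text = text.replace(/\n*$/, "\n");
  } else if (style.LastEmptyLine === "omitted") {
    text = text.replace(/\n+$/, "");
  }

  return text;
}

const messy = "var a = 1;  \nvar b = 2;\t";
const clean = applyStyle(messy, {
  TrailingWhitespaces: "strip",
  LastEmptyLine: "present"
});
console.log(JSON.stringify(clean));
```

A real formatter works on tokens rather than raw text, so that whitespace inside string and template literals is left alone; this sketch does not make that distinction.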

esmangle: ECMAScript code mangler / minifier

esmangle uses esprima as its parser and escodegen as its code generator. See demo.

Many packages are being developed for ECMAScript; you can check them here

user568109 answered Oct 31 '22 05:10


"@Golo: So what you want is the ability to specify how whitespace occurs between every kind of language construct, in every kind of context? (e.g., how if-then-else is laid out inside a do loop vs. inside the top level of a function)?

Golo: That's correct :-)"

Then what you need is access to the structure of the language at each point in the code, and precise position information for each language element (starting/ending line/column). For linting, you want a way to write tests against combinations of those things. For repair, you want a way to regenerate text that meets your constraints. You obviously want all of this to be easy to configure.

The "structure" you want is what a parser produces as a syntax tree. The context is the syntax structure around the structure of interest. You don't want an abstract syntax tree, because that loses the concrete tokens whose positions you want to inspect/control, so you want a full concrete syntax tree (CST).

Parsers aren't interested in precise source positions, but a lexer (needed to break input streams into language tokens to feed to the parser) is in a position to collect this precise information. You have to worry about some complicating issues on "what constitutes column adjustments and by how much". Some examples: Tab characters: tab to the next 8-character boundary? 4 characters? To prespecified tab columns? On Linux, "LF" advances the line number and resets the column count to 1. On Windows, it is "CR/LF" as a pair. On other operating systems I have encountered, it is "CR" only; on really modern systems, the Unicode newline character should do this. So, if on Linux, how should you treat CR? How about null characters found in the text? ^Z? Other control characters (e.g., ^L [formfeed])?
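
Here is a minimal sketch of the position bookkeeping a lexer would do, in plain JavaScript. Every policy in it is an assumption chosen for illustration (tab stops every tabWidth columns; LF, CR/LF, and a lone CR all end a line; everything else is one column wide), and computePositions is a made-up name, not part of any lexer I know of.

```javascript
// Walk the text once and record the 1-based { line, column } at which each
// UTF-16 code unit starts.
// Assumed policies (all debatable, as discussed above):
//   - a tab advances to the next tab stop, spaced `tabWidth` columns apart;
//   - LF, CR LF, and a lone CR each end a line;
//   - every other code unit, control characters included, is one column wide.
function computePositions(text, tabWidth = 8) {
  const positions = [];
  let line = 1;
  let column = 1;

  for (let i = 0; i < text.length; i++) {
    positions.push({ line, column });
    const ch = text[i];

    if (ch === "\r" && text[i + 1] === "\n") {
      // CR of a CR LF pair: the LF that follows ends the line.
      column += 1;
    } else if (ch === "\n" || ch === "\r") {
      line += 1;
      column = 1;
    } else if (ch === "\t") {
      // Columns are 1-based, so tab stops sit at 1, tabWidth + 1, ...
      column += tabWidth - ((column - 1) % tabWidth);
    } else {
      column += 1;
    }
  }
  return positions;
}

const positions = computePositions("a\tb\r\nc", 4);
console.log(positions[2]); // where "b" starts
console.log(positions[5]); // where "c" starts
```

With a tab width of 4, "b" lands in column 5 of line 1 and "c" in column 1 of line 2. Changing any of the assumed policies changes those numbers, which is exactly the configuration problem described above.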

Given a source file, precisely parsed into a CST with captured source positions, now you want to check that a structure is aligned the way you want. First, you need to specify the structure; do loop? constructor? data declaration? Then you need predicates on the column position to give you precise control.

Virtually all tools that provide syntax trees do not provide any easy way to refer to such structures. Pretty much you are stuck writing classic compiler-like procedural code that knows the shape of the syntax tree and climbs over it looking for a tree node of interest, and then looking around to see if other relevant tree nodes are present. Once you are in this mode, you can recognize the trees you want, and then write more procedural code to check the spacing conventions.
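
The procedural style looks roughly like the following sketch. It assumes an ESTree-shaped tree whose nodes carry loc information (1-based lines, 0-based columns), which is the kind of tree esprima can produce. To keep the example self-contained, the tree is written out by hand and abbreviated instead of coming from a parser. The layout rule and the names walk and checkIfLayout are invented for illustration.

```javascript
// Generic depth-first walk over an ESTree-shaped object graph.
function walk(node, visit) {
  if (node === null || typeof node !== "object") return;
  if (Array.isArray(node)) {
    node.forEach((child) => walk(child, visit));
    return;
  }
  if (typeof node.type === "string") visit(node);
  for (const key of Object.keys(node)) {
    if (key !== "loc") walk(node[key], visit);
  }
}

// Example rule (arbitrary): the body of an `if` starts on its own line,
// indented further than the `if` keyword.
function checkIfLayout(ast) {
  const complaints = [];
  walk(ast, (node) => {
    if (node.type !== "IfStatement") return;
    const ifStart = node.loc.start;
    const bodyStart = node.consequent.loc.start;
    if (bodyStart.line === node.test.loc.end.line) {
      complaints.push(`line ${ifStart.line}: body of 'if' should start on its own line`);
    } else if (bodyStart.column <= ifStart.column) {
      complaints.push(`line ${bodyStart.line}: body of 'if' should be indented`);
    }
  });
  return complaints;
}

// Hand-written, abbreviated tree for the one-line program:  if (ready) go();
const ast = {
  type: "Program",
  body: [{
    type: "IfStatement",
    loc: { start: { line: 1, column: 0 }, end: { line: 1, column: 16 } },
    test: {
      type: "Identifier",
      name: "ready",
      loc: { start: { line: 1, column: 4 }, end: { line: 1, column: 9 } }
    },
    consequent: {
      type: "ExpressionStatement",
      loc: { start: { line: 1, column: 11 }, end: { line: 1, column: 16 } }
    },
    alternate: null
  }]
};

console.log(checkIfLayout(ast));
```

Note that this tree is an AST: it has no node for the "(" token, so a rule about the parenthesis would need the token list as well, which is the reason for preferring a CST.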

Program transformation systems (PTS) often provide "source-to-source" rewrites, in which you can directly write patterns using the surface syntax of the language. That's far more convenient than climbing around the tree procedurally. Some only do source-to-source pattern pairs; some offer the ability to specify just a single pattern. The PT system must also be able to parse the language of interest, and enable you to add custom checks for your specific task.

As an example, our DMS Software Reengineering Toolkit parses ECMAScript, and offers such source-pattern specifications, along with the ability to attach custom conditions and actions. For instance:

domain ECMAScript;

pattern ideal_if_statement_layout(e:expression,s:statement):statement =
     " if (\e)
          \s"  if diagnose_not_equal(column(s),parentheses_column(e));

expresses the interest in "if then" statements (you'd use a different pattern for "if then else"), and a constraint over custom column comparison functions that check the position of the statement elements. The "diagnose_not_equal" custom function would produce lint complaints. The quote marks are meta-quotes; they are part of the pattern matching language, not the underlying language. e and s are metavariables, and match any expression and any statement, respectively. Because these are being applied to the CST, they cannot mismatch their intended targets. The custom function "column" merely picks up the starting column information associated with the left-most subtree of s; the tree management APIs in DMS make this essentially trivial to get. "parentheses_column" is needed because the pattern tells you where e is; the "(" is in the tree node above the e, so some slight navigation of the tree is needed to find the "(" and then extract its rightmost column, which is also easily done with the DMS tree API.

You can build arbitrarily complex patterns; you can also make a condition in one pattern depend on the match of another. So, with a modest number of custom column extraction functions, you could write a variety of linting checks.

What this won't easily get you is a check that the "if" keyword is one space to the left of the "(" token. You could express this to some degree with additional custom checks, e.g., "statement_keyword_column", etc., but this is starting to get awkward.

You might notice the layout of the pattern; it would be nice to use that as a constraint, too. DMS doesn't provide a direct way to do this. However, it is perfectly capable of reading its own pattern descriptions as trees. Using that, one could extract the apparent layout of the pattern, and use that to check the structural layout. This requires some sophistication in the use of DMS, but is a matter of sweat, not theory or missing mechanisms.

I personally don't like linting on layout much; I'd prefer the file simply get reshaped. DMS does have prettyprinting rules that will convert your CST, whatever its layout was, into a layout controlled by its prettyprinting rules. At the moment, those rules are specific to tree nodes, and encoded with the grammar, so they are somewhat limited. One can write (in the grammar):

   stmt =  'if' expression stmt ';'
   <<PrettyPrinter>>:  { V(H('if',expression),I(stmt[1])) }

This will cause all if-then statements to be regenerated as:

    if expression
       stmt

[V means "vertical box" of two subboxes; H means "horizontal box", I means "indented box"]
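
To make the box model concrete, here is a toy version of the three combinators in plain JavaScript. This is only a sketch of the idea, not how DMS implements its prettyprinter. The single-space separator in H and the default indent width of 3 in I are assumptions picked to reproduce the layout shown above.

```javascript
// A box is either a string or an array of already laid out text lines.
const toLines = (box) => (typeof box === "string" ? [box] : box);

// H: horizontal box. Places sub-boxes side by side on one line, separated by
// single spaces. Simplification: each sub-box is expected to be one line.
const H = (...boxes) => [boxes.map((b) => toLines(b).join(" ")).join(" ")];

// V: vertical box. Stacks sub-boxes on top of each other.
const V = (...boxes) => boxes.flatMap((b) => toLines(b));

// I: indented box. Prefixes every line of the sub-box with `width` spaces.
const I = (box, width = 3) =>
  toLines(box).map((line) => " ".repeat(width) + line);

// Same shape as the grammar rule: V(H('if', expression), I(stmt[1]))
const layout = V(H("if", "expression"), I("stmt"));
console.log(layout.join("\n"));
```

Because boxes nest, an indented box inside another indented box picks up both indents, which is what makes the approach work for arbitrarily deep statement nesting.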

Careful use of such prettyprinting rules can do a pretty nice job of reformatting code. It isn't perfect, because you can't control the layout of multiple statements this way. But this is part of DMS and actually pretty easy to modify.

An ideal solution would be to use the pattern language, and to use the layout within the pattern to control the prettyprinting. This is in our plans, but alas, not yet in DMS.

I think other PTS can express patterns to some degree as above, and most of them have some way to specify prettyprinting, something like what DMS has. So the good news is that these tools do much of what you want. The not-so-good news is that it is quite an effort to pick up one of these tools and learn to use it; an afternoon doesn't cut it, by a long shot.

Ira Baxter answered Oct 31 '22 05:10