Why Are The Codebases for Modern Web Browsers so Large?

The codebases for modern web browsers like Chrome, Firefox, and Safari (WebKit) are quite large. I am curious about what specifically makes their implementations so non-trivial that they require vast amounts of code.

As a corollary question, if a hypothetical browser only supported strict HTML5 and JavaScript, to avoid compatibility hacks, would the codebase be significantly smaller?

morrog asked Dec 15 '11 06:12


2 Answers

For your first question, consider the things a modern browser needs to implement (some browsers push some of this work out to operating system services):

  1. Several parsers: XML, HTML, JavaScript, CSS, at least.
  2. At least four separate layout systems (CSS box model, flexbox, SVG, MathML).
  3. At least one graphics library; for cross-platform browsers this needs per-platform backends (IE9+ just uses the system Direct2D library; Safari on Mac just uses Quartz as far as I know).
  4. A high-performance virtual machine with a JIT, a garbage collector, a bit of a standard library (growing all the time; see typed arrays and various other recent JavaScript features).
  5. A DOM implementation, including various things like the HTML-specific and SVG-specific DOM interfaces and so forth.
  6. Audio and video processing facilities (again Safari on Mac and IE offload these to the operating system).
  7. Image processing facilities, with support for at least JPG/GIF/PNG. Again, some browsers may be able to offload parts of this to the operating system.
  8. A library for converting byte streams to Unicode characters. Again, sometimes this can be offloaded to the operating system and sometimes not.
  9. For cross-platform browsers, some sort of portability layer that abstracts away the platform-specific bits.
  10. An HTML editor with transactions and a programmable API; think contenteditable.
  11. A plaintext editor for textareas. Some of this can be shared with the HTML editor, maybe.
  12. A spellchecker, which may or may not be offloaded to the OS.
  13. A network library supporting HTTP, maybe SPDY, probably FTP, and maybe a few other protocols. Again, this may or may not be offloaded to the OS.
  14. A cryptographic library to handle SSL and various other cryptography needs. Again, this may or may not be offloaded to the OS.
  15. At least one database implementation (SQLite seems to be popular).
  16. Various code for the actual user interface and whatnot.
  17. Glue code to handle interactions between all these: code that manages calls back and forth between JavaScript and the DOM, code that manages recomputing style and layout information when the DOM changes, code that handles things like document.write injecting strings from JavaScript into the parser's input stream, and so forth. Note that the amount of glue code is generally quadratic in the number of interacting modules.
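
To make the "quadratic" remark in the last item concrete, here is a minimal sketch that counts how many pairs of modules could potentially need glue code. The module names are illustrative placeholders loosely based on the list above, not a real browser's architecture, and the count is an upper bound: not every pair actually interacts.

```python
from itertools import combinations

# Hypothetical module names, loosely based on the list above.
MODULES = [
    "HTML parser", "CSS parser", "JavaScript VM", "DOM",
    "layout", "graphics", "networking",
]

def interacting_pairs(modules):
    """Return every unordered pair of modules that might need glue code."""
    return list(combinations(modules, 2))

def pair_count(n):
    """Number of unordered pairs among n modules: n * (n - 1) / 2."""
    return n * (n - 1) // 2

if __name__ == "__main__":
    pairs = interacting_pairs(MODULES)
    print(len(MODULES), "modules ->", len(pairs), "potential pairings")
    for n in (5, 10, 17):
        print(n, "modules ->", pair_count(n), "potential pairings")
```

With the 17 items listed above treated as modules, that is up to 136 potential pairings, which is why adding one more subsystem tends to cost more than just its own code.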

I'm probably missing a few things, but that's off the top of my head.
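
As a small illustration of why even item 8 (converting byte streams to Unicode) is not trivial, here is a sketch using Python's standard `codecs` module. It assumes the data arrives in arbitrary network-sized chunks, so a multi-byte character can be split across two chunks. The function name `decode_stream` is made up for this example. A real browser additionally has to detect the encoding and support many legacy encodings, which this sketch does not attempt.

```python
import codecs

def decode_stream(chunks, encoding="utf-8"):
    """Decode an iterable of byte chunks into text pieces.

    An incremental decoder buffers incomplete multi-byte sequences
    until the remaining bytes arrive in a later chunk.
    """
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush whatever is still buffered once the stream ends.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail

if __name__ == "__main__":
    # "é" is two bytes in UTF-8 (0xC3 0xA9), split here across two chunks.
    for piece in decode_stream([b"caf\xc3", b"\xa9"]):
        print(repr(piece))
```

Decoding each chunk on its own with `bytes.decode` would fail or produce a replacement character at the split, which is the kind of detail that adds up across a browser's many input paths.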

In addition to this, at least Gecko and WebKit have their own template libraries for things like strings and arrays (because the C++ standard library versions have various drawbacks).

For the rest... at this point a lot of the "compatibility hacks" are actually part of web standards. So you can't exactly avoid them. Your scenario talks about JavaScript and HTML but not SVG or MathML or CSS. If you really just mean HTML and JavaScript but not CSS or the rest, then you could obviously cut out a bunch of code. If you include all of those, plus the audio and video capabilities of HTML5 and want your browser to perform well, then I doubt you can make it much smaller.

Boris Zbarsky answered Oct 21 '22 09:10


I think modern web browsers are complicated applications. Mainly, they have rendering engines that have to handle many kinds of HTML, the ability to deal with non-HTML formats (such as XML and RSS), CSS handlers, and JavaScript engines, sometimes with a JIT.

Apart from that, they have plugin architectures and APIs and layers that abstract away differences between platforms, and they are usually built from components that other applications also use.

This makes them quite non-trivial. As for your corollary, I think so. Lynx is quite small and doesn't support JavaScript or fancy HTML.

Noufal Ibrahim answered Oct 21 '22 09:10