Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Topojson: quantization VS simplification

What is the difference between quantization and simplification? Is quantization another way of doing simplification? Is it better to use quantization in certain situations? Or should i be using a combination of both?

like image 475
dance Avatar asked Sep 19 '13 16:09

dance


1 Answers

The total size of your geometry is controlled by two factors: the number of points and the number of digits (the precision) of each coordinate.

Say you have a large geometry with 1,000,000 points, where each two-dimensional point is represented as longitude in ±180° and latitude in ±90°:

[-90.07231180399987,29.501753271000098],[-90.06635619599979,29.499494248000133],… 

Real numbers can have arbitrary precision (in JSON; in JavaScript they are limited by the precision of IEEE 754) and thus an infinite number of digits. But in practice the above is pretty typical, so say each coordinate has 18 digits. Including extra symbols ([, ] and ,), each point takes at most 1 + 18 + 1 + 18 + 1 = 39 bytes to encode in JSON, and the entire geometry is about 39 * 1,000,000 ≈ 39MB.

Now say we convert these real numbers to integers: both longitude and latitude are reduced to integers x and y where 0 ≤ x ≤ 99 and 0 ≤ y ≤ 99. A simple mapping between real-number points ⟨λ,φ⟩ and integer coordinates ⟨x,y⟩ is:

x = floor((λ + 180) / 360 * 100); y = floor((φ + 90) / 180 * 100); 

Since each coordinate now takes at most 2 digits to encode, each point takes at most 1 + 2 + 1 + 2 + 1 = 7 bytes to encode in JSON, and the entire geometry is about 7MB; we reduced the total size by 82%.

Of course, nothing comes for free: if you remove too much information, you will no longer be able to display the geometry accurately. The rule of thumb is that the size of your grid should be at least twice as big as the largest expected display size for the entire map. For example, if you’re displaying a world map in a 960×500 pixel space, then the default 10,000×10,000 (-q 1e4) is a reasonable choice.

So, quantization removes information by reducing the precision of each coordinate, effectively snapping each point to a regular grid. This reduces the size of the generated TopoJSON file because each coordinate is represented as an integer (such as between 0 and 9,999) with fewer digits.

In contrast, simplification removes information by removing points, applying a heuristic that tries to measure the visual salience of each point and removing the least-noticeable points. There are many different methods of simplification, but the Visvalingam method used by the TopoJSON reference implementation is described in my Line Simplification article so I won’t repeat myself here.

While quantization and simplification address these two different types of information mostly independently, there’s an additional complication: quantization is applied before the topology is constructed, whereas simplification is necessarily applied after to preserve the topology. Since quantization frequently introduces coincident points ([24,62],[24,62],[24,62]…), and coincident points are removed, quantization can also remove points.

The reason that quantization is applied before the topology is constructed is that geometric inputs are often not topologically valid. For example, if you takes a shapefile of Nevada counties and combine it with a shapefile of Nevada’s state border, the coordinates in one shapefile might not exactly match the coordinates in the other shapefile. By quantizing the coordinates before constructing the topology, you snap the coordinates to a regular grid and can get a cleaner topology with fewer arcs, hopefully correctly identifying all shared arcs. (Of course, if you over-quantize, then you can cause too many coincident points and get self-intersecting arcs, which causes other problems.)

In a future release, maybe 1.5.0, TopoJSON will allow you to control the quantization before the topology is constructed independently from the quantization of the output TopoJSON file. Thus, you could use a finer grid (or no grid at all!) to compute the topology, then simplify, then use a coarser grid appropriate for a low-resolution screen display. For now, these are tied together, so I recommend using a finer grid (e.g., -q 1e6) that produces a clean topology, at the expense of a slightly larger file. Since TopoJSON also uses delta-encoded coordinates, you rarely pay the full price for all the digits anyway!

like image 112
mbostock Avatar answered Sep 21 '22 22:09

mbostock