JavaScript print all used Unicode characters

Tags: javascript, character-encoding, unicode, character

I am trying to make JavaScript print all Unicode characters. According to my research, there are 1,114,112 Unicode characters.

A script like the following could work:

// String.fromCodePoint handles code points above U+FFFF; String.fromCharCode does not.
for (let i = 0; i < 1114112; i++)
    console.log(String.fromCodePoint(i));

But I found out that only 10% of the 1,114,112 Unicode characters are used.

How can I print only the used Unicode characters?

asked Mar 29 '14 22:03

Progo

2 Answers

As Jukka said, JavaScript has no built-in way of knowing whether a given Unicode code point has been assigned a symbol yet or not.

There is still a way to do what you want, though.

I’ve written several scripts that parse the Unicode database and create separate data files for each category, property, script, block, etc. in Unicode. I’ve also created an HTTP API that allows you to programmatically get all code points (i.e. an array of numbers) in a given Unicode category, or all symbols (i.e. an array of strings, one for each character) with a given Unicode property, or a regular expression that matches any symbol in a certain Unicode script.

For example, to get an array of strings that contains one item for each Unicode code point that has been assigned a symbol in Unicode v6.3.0, you could use the following URL:

http://mathias.html5.org/data/unicode/format?version=6.3.0&property=Assigned&type=symbols&prepend=window.symbols%20%3D%20&append=%3B

Note that you can prepend and append anything you like to the output by tweaking the URL parameters, to make it easier to reuse the data in your own scripts. An example HTML page that console.log()s all these symbols, as you requested, could be written as follows:

<!DOCTYPE html>
<meta charset="utf-8">
<title>All assigned Unicode v6.3.0 symbols</title>
<script src="http://mathias.html5.org/data/unicode/format?version=6.3.0&property=Assigned&type=symbols&prepend=window.symbols%20%3D%20&append=%3B"></script>
<script>
  window.symbols.forEach(function(symbol) {
    // Do what you want to do with `symbol` here, e.g.
    console.log(symbol);
  });
</script>

Note that since this is a lot of data, you can expect your DevTools console to become slow when opening this page.

Update: Nowadays, you should use Unicode data packages such as unicode-11.0.0 instead. In Node.js, you can then do the following:

const symbols = require('unicode-11.0.0/Binary_Property/Assigned/symbols.js');
console.log(symbols);

// Or, to get the code points:
require('unicode-11.0.0/Binary_Property/Assigned/code-points.js');

// Or, to get a regular expression that only matches these characters:
require('unicode-11.0.0/Binary_Property/Assigned/regex.js');

answered Oct 31 '22 09:10

Mathias Bynens

There is no direct way in JavaScript to find out whether a code point is assigned to a character or not, which appears to be the question here. You need information extracted from suitable sources, and this information needs to be updated whenever new characters are assigned in new versions of Unicode.

There are 1,114,112 code points in Unicode. The Unicode standard assigns to each code point the property gc, General Category. If the value of this property is anything but Cs, Co, or Cn, then the code point is assigned to a character. (Code points with gc equal to Co are Private Use code points, to which no character is assigned, but they may be used for characters by private agreements.)
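In engines that implement ES2018 Unicode property escapes, the General Category test described above can be sketched without external data files, because the engine ships its own copy of the Unicode data. The result then reflects whichever Unicode version the engine was built with, so it still changes as the standard is updated. The function names below are made up for illustration.

```javascript
// Matches a single code point whose General Category is Cs (surrogate),
// Co (private use), or Cn (unassigned, including noncharacters).
const notACharacter = /^[\p{Cs}\p{Co}\p{Cn}]$/u;

// Hypothetical helper: true if the engine's Unicode data says the
// code point is assigned to a character.
function isAssignedCharacter(codePoint) {
  return !notACharacter.test(String.fromCodePoint(codePoint));
}

// Hypothetical helper: walks all 1,114,112 code points and counts
// the ones assigned to a character.
function countAssignedCharacters() {
  let count = 0;
  for (let codePoint = 0; codePoint <= 0x10ffff; codePoint++) {
    if (isAssignedCharacter(codePoint)) count++;
  }
  return count;
}
```

Calling `isAssignedCharacter(0x41)` should return `true`, while surrogates such as U+D800 and private use code points such as U+E000 are rejected.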

What you would need to do is to get a copy of some relevant files in the Unicode character database (just a collection of files in specific formats, really) and write code that reads them and generates information about assigned code points. For the purposes of printing all Unicode characters, it might be best to generate the information as an array of ranges of assigned code points. And this would need to be repeated when the standard is updated with new characters.
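As a rough sketch of that approach, the following assumes the input is the text of `UnicodeData.txt` from the Unicode character database: semicolon-separated fields with the code point in hex first, the name second, and the General Category third, sorted by code point, with large blocks given as a `<..., First>` / `<..., Last>` pair of lines. The function name and the tiny sample are made up for illustration; the sample is an abbreviated excerpt, not the real file.

```javascript
// Hypothetical helper: turns UnicodeData.txt text into sorted
// [first, last] ranges of code points assigned to characters.
function assignedRanges(unicodeDataText) {
  const excluded = new Set(['Cs', 'Co', 'Cn']);
  const ranges = [];
  let pendingFirst = null; // start of a "<..., First>" / "<..., Last>" pair

  for (const line of unicodeDataText.split('\n')) {
    if (line.trim() === '') continue;
    const fields = line.split(';');
    const codePoint = parseInt(fields[0], 16);
    const name = fields[1];
    const category = fields[2];

    if (name.endsWith(', First>')) {
      pendingFirst = codePoint; // remember the start, wait for the Last line
      continue;
    }
    let first = codePoint;
    if (name.endsWith(', Last>')) {
      first = pendingFirst;
      pendingFirst = null;
    }
    if (excluded.has(category)) continue;

    const previous = ranges[ranges.length - 1];
    if (previous && previous[1] + 1 === first) {
      previous[1] = codePoint; // extend the adjacent range
    } else {
      ranges.push([first, codePoint]);
    }
  }
  return ranges;
}

// Abbreviated sample in the UnicodeData.txt line format (illustration only).
const sample = [
  '0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;',
  '0042;LATIN CAPITAL LETTER B;Lu;0;L;;;;;N;;;;0062;',
  '0044;LATIN CAPITAL LETTER D;Lu;0;L;;;;;N;;;;0064;',
  '4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;',
  '9FFF;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;',
  'D800;<Non Private Use High Surrogate, First>;Cs;0;L;;;;;N;;;;;',
  'DB7F;<Non Private Use High Surrogate, Last>;Cs;0;L;;;;;N;;;;;',
].join('\n');

const sampleRanges = assignedRanges(sample);
```

Printing would then be a nested loop over each range, calling `String.fromCodePoint` for every code point between `first` and `last`. Storing ranges rather than individual code points keeps the generated data small, since most assigned code points sit in long contiguous runs.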

Even the rest isn’t trivial. You would need to decide what it means to print a character. Some characters are control characters that may have an effect, such as causing a newline, but lack a visible glyph. Some (spaces) have empty glyphs. Some (combining marks) are meant to be rendered as marks attached to the preceding character, though they have conventional renderings as “standalone” characters, too. Some are meant to take essentially different shapes depending on their immediate context; they may have isolated forms, too, but just writing one character after another by no means guarantees that an isolated form is used.
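To make the first three cases concrete, here is a minimal sketch that sorts a code point into one of those groups, again assuming an engine with ES2018 Unicode property escapes. The function names and the display choices (a `U+XXXX` label for control characters, a dotted circle U+25CC as the base for combining marks) are illustrative assumptions, not the only reasonable policy, and context-dependent shaping is not handled at all.

```javascript
// Hypothetical helper: classifies a code point by General Category.
function renderingClass(codePoint) {
  const ch = String.fromCodePoint(codePoint);
  if (/^\p{Cc}$/u.test(ch)) return 'control';        // e.g. newline, tab
  if (/^\p{Zs}$/u.test(ch)) return 'space';          // empty glyph
  if (/^\p{M}$/u.test(ch)) return 'combining mark';  // attaches to a base
  return 'other';
}

// Hypothetical helper: picks a string to print for a code point.
function displayForm(codePoint) {
  const ch = String.fromCodePoint(codePoint);
  switch (renderingClass(codePoint)) {
    case 'control':
      // Print a label instead of triggering the control function.
      return 'U+' + codePoint.toString(16).toUpperCase().padStart(4, '0');
    case 'combining mark':
      // Attach the mark to a dotted circle so it has a base character.
      return '\u25CC' + ch;
    default:
      return ch;
  }
}
```

With this policy, `displayForm(0x0A)` gives the label `U+000A` rather than emitting a line break, and a combining acute accent (U+0301) is shown on a dotted circle.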

Then there’s the problem of fonts. No single font can contain all Unicode characters, so you would need to find a collection of fonts that cover all of Unicode when used together, preferably so that they stylistically match somehow.

So if you are just looking for a compilation of all printable Unicode characters, consider using the Unicode code charts.

answered Oct 31 '22 07:10

Jukka K. Korpela
