Index of /data/unicode
Name Size Description
2.0.14/ -
2.1.9/ -
3.0.1/ -
3.2.0/ -
4.0.1/ -
4.1.0/ -
5.0.0/ -
5.1.0/ -
5.2.0/ -
6.0.0/ -
6.1.0/ -
6.2.0/ -
# Unicode test data for JavaScript
If you ever need JavaScript arrays of all Unicode symbols per category per
Unicode version (for testing purposes, perhaps), or JavaScript-compatible
regular expressions to match those symbols, this directory has got you
covered. Because of the way JavaScript exposes “characters”
(http://mathiasbynens.be/notes/javascript-encoding), generating this data is
trickier than it sounds, as you have to account for surrogate pairs.
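To illustrate the surrogate pair issue, here is a minimal sketch of how a code point above U+FFFF ends up as two code units in a JavaScript string. The helper name `codePointToSymbol` is made up for this example and is not necessarily what the generator scripts use.

```javascript
// Hypothetical helper: turn a code point (0x0 to 0x10FFFF) into the string
// JavaScript uses to represent it.
function codePointToSymbol(codePoint) {
  if (codePoint <= 0xFFFF) {
    // BMP code points fit in a single 16-bit code unit.
    return String.fromCharCode(codePoint);
  }
  // Astral code points are split into a high and a low surrogate.
  const offset = codePoint - 0x10000;
  const high = 0xD800 + (offset >> 10);
  const low = 0xDC00 + (offset & 0x3FF);
  return String.fromCharCode(high, low);
}

const symbol = codePointToSymbol(0x1D306);
console.log(symbol.length); // 2
console.log(symbol === '\uD834\uDF06'); // true
// Without the `u` flag, `.` matches a single code unit, not the whole symbol.
console.log(/^.$/.test(symbol)); // false
console.log(/^[\uD800-\uDBFF][\uDC00-\uDFFF]$/.test(symbol)); // true
```

This is why a regular expression meant to match astral symbols has to spell out the surrogate halves explicitly.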
For example, I’ve used a variation of this data in the following test case:
http://mathias.html5.org/tests/javascript/identifiers/
It dynamically creates and runs over 90k tests, based on the appropriate
Unicode categories and symbols.
The scripts I wrote to generate these files can be found here: http://git.io/unicode
## Tests for the generated data
The generated data is fully tested by a script that verifies that, within the
range of code points from 0x000000 to 0x10FFFF, _only_ the symbols in
`${version}/${category}-symbols.js` are matched by the regular expression in
`${version}/${category}-regex.js`. This test case is available at this URL:
http://mathias.html5.org/data/unicode/test?version=6.2.0
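The check described above could be sketched roughly as follows. This is not the actual test script: `findMismatches` and `codePointToSymbol` are hypothetical names, and the sample symbols and regular expression are stand-ins for the contents of the generated `-symbols.js` and `-regex.js` files.

```javascript
// Hypothetical helper: code point -> JavaScript string (handles surrogates).
function codePointToSymbol(codePoint) {
  if (codePoint <= 0xFFFF) {
    return String.fromCharCode(codePoint);
  }
  const offset = codePoint - 0x10000;
  return String.fromCharCode(0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF));
}

// Returns every code point on which the regex and the symbol list disagree.
// Assumption: the regex carries no flags that matter, since only its source
// is reused inside an anchored copy.
function findMismatches(symbols, regex) {
  const expected = new Set(symbols);
  const anchored = new RegExp('^(?:' + regex.source + ')$');
  const mismatches = [];
  for (let codePoint = 0x000000; codePoint <= 0x10FFFF; codePoint++) {
    const symbol = codePointToSymbol(codePoint);
    if (anchored.test(symbol) !== expected.has(symbol)) {
      mismatches.push(codePoint);
    }
  }
  return mismatches;
}

// Stand-in data: two BMP symbols and one astral symbol.
const sampleSymbols = ['a', 'b', '\uD834\uDF06'];
const sampleRegex = /[ab]|\uD834\uDF06/;
console.log(findMismatches(sampleSymbols, sampleRegex).length); // 0
```

An empty result means the regular expression matches exactly the listed symbols and nothing else in the full code point range.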
## HTTP API
There’s also an HTTP API of sorts, which lets you customize the output a
little bit. This saves you from downloading and editing the generated files if
you only need to write some quick tests. For example:
http://mathias.html5.org/data/unicode/format?version=6.2.0&category=Ll&type=symbols&prepend=window.symbols%20%3D%20&append=%3B
Available query string parameters:
* `category`: can be any Unicode category
* `script`: can be any Unicode script
* `property`: can be any Unicode property
* `block`: can be any Unicode block
* `type`: can be `code-points`, `symbols` or `regex`; defaults to `symbols`
* `version`: can be any Unicode version for which data is available; defaults to the latest available version
* `prepend`: a string to prepend to the output; defaults to the empty string
* `append`: a string to append to the output; defaults to the empty string
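As a rough sketch, a request URL can be assembled from these parameters like this. The `buildFormatUrl` helper is made up for illustration; only the base URL and the parameter names come from the description above.

```javascript
const BASE_URL = 'http://mathias.html5.org/data/unicode/format';

// Hypothetical helper: percent-encode each parameter and join them into a
// query string. Parameters that are left out fall back to the API defaults.
function buildFormatUrl(params) {
  const query = Object.keys(params)
    .map(function (key) {
      return encodeURIComponent(key) + '=' + encodeURIComponent(params[key]);
    })
    .join('&');
  return query ? BASE_URL + '?' + query : BASE_URL;
}

const url = buildFormatUrl({
  version: '6.2.0',
  category: 'Ll',
  type: 'symbols',
  prepend: 'window.symbols = ',
  append: ';'
});
console.log(url);
// http://mathias.html5.org/data/unicode/format?version=6.2.0&category=Ll&type=symbols&prepend=window.symbols%20%3D%20&append=%3B
```

Using `encodeURIComponent` for `prepend` and `append` matters because those values typically contain spaces, `=` and `;`, which would otherwise break the query string.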