Index of /data/unicode

Name                    Size  Description
Parent Directory           -
2.0.14/                    -
2.1.9/                     -
3.0.1/                     -
3.2.0/                     -
4.0.1/                     -
4.1.0/                     -
5.0.0/                     -
5.1.0/                     -
5.2.0/                     -
6.0.0/                     -
6.1.0/                     -
6.2.0/                     -
6.3.0/                     -
7.0.0/                     -

# Unicode test data for JavaScript

If you ever need JavaScript arrays of all Unicode symbols per category per
Unicode version (for testing purposes, perhaps), or JavaScript-compatible
regular expressions to match those symbols, this directory has got you
covered. Because JavaScript strings expose UTF-16 code units rather than whole
“characters” (http://mathiasbynens.be/notes/javascript-encoding), generating
this data is trickier than it sounds: every symbol above U+FFFF has to be
represented as a surrogate pair.
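
To make the surrogate pair issue concrete, here is a minimal sketch of how a code point maps to a JavaScript string. The helper name `codePointToSymbol` is made up for this illustration and is not taken from the generator scripts.

```javascript
// Convert a Unicode code point to the JavaScript string that represents it.
// Code points above 0xFFFF need two UTF-16 code units (a surrogate pair).
function codePointToSymbol(codePoint) {
  if (codePoint <= 0xFFFF) {
    return String.fromCharCode(codePoint);
  }
  var offset = codePoint - 0x10000;
  var high = 0xD800 + (offset >> 10);   // high (lead) surrogate
  var low = 0xDC00 + (offset & 0x3FF);  // low (trail) surrogate
  return String.fromCharCode(high, low);
}

var symbol = codePointToSymbol(0x1D306);
console.log(symbol.length);                     // 2
console.log(symbol.charCodeAt(0).toString(16)); // d834
console.log(symbol.charCodeAt(1).toString(16)); // df06
```

A single symbol can therefore have a `length` of 2, which is why a plain character class such as `[\u0000-\uFFFF]` cannot match it as one unit.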

For example, I’ve used a variation of this data in the following test case:
http://mathias.html5.org/tests/javascript/identifiers/
It dynamically creates and runs over 90k tests, based on the appropriate
Unicode categories and symbols.

The scripts I wrote to generate these files can be found here: http://git.io/unicode

## Tests for the generated data

The generated data is fully tested by a script that verifies that, within the
range of code points from 0x000000 to 0x10FFFF, _only_ the symbols in
`${version}/${category}-symbols.js` are matched by the regular expression in
`${version}/${category}-regex.js`. This test case is available at this URL:

http://mathias.html5.org/data/unicode/test?version=7.0.0
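
The check described above could be sketched roughly as follows. This is not the actual test script: `verify`, `sampleSymbols` and `sampleRegex` are hypothetical stand-ins, and the sketch assumes that a symbols file yields an array of strings and a regex file yields a `RegExp`. It also relies on `String.fromCodePoint` and `Set` from ES2015.

```javascript
// Return every code point for which "the regex matches" and
// "the symbol is in the list" disagree. An empty result means the
// regex matches exactly the listed symbols and nothing else.
function verify(symbols, regex) {
  var expected = new Set(symbols);
  // Anchor the pattern so it has to match the whole symbol.
  var anchored = new RegExp('^(?:' + regex.source + ')$');
  var failures = [];
  for (var codePoint = 0x000000; codePoint <= 0x10FFFF; codePoint++) {
    var symbol = String.fromCodePoint(codePoint);
    if (anchored.test(symbol) !== expected.has(symbol)) {
      failures.push(codePoint);
    }
  }
  return failures;
}

// Hypothetical sample data, far smaller than a real category.
var sampleSymbols = ['a', 'b', '\uD834\uDF06'];
var sampleRegex = /[ab]|\uD834\uDF06/;
console.log(verify(sampleSymbols, sampleRegex).length); // 0
```

Note that the loop also visits lone surrogates (0xD800 to 0xDFFF), so a regex that accidentally matches half of a surrogate pair shows up as a failure.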

## HTTP API

There’s also an HTTP API of sorts, which allows you to customize the output a
little bit. This saves you from downloading and editing the generated files if
you only need to write some quick tests.

http://mathias.html5.org/data/unicode/format?version=7.0.0&category=Ll&type=symbols&prepend=window.symbols%20%3D%20&append=%3B

Available query string parameters:

 * `category`: can be any Unicode category
 * `script`: can be any Unicode script
 * `property`: can be any Unicode property or derived core property
 * `block`: can be any Unicode block
 * `type`: can be `code-points`, `symbols` or `regex`; defaults to `symbols`
 * `version`: can be any Unicode version for which data is available; defaults to the latest available version
 * `prepend`: a string to prepend to the output; defaults to the empty string
 * `append`: a string to append to the output; defaults to the empty string
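
If you’d rather build such a URL in code than by hand, something along these lines should work. The `buildFormatUrl` helper is invented for this example; only the endpoint and the parameter names come from the list above.

```javascript
// Build a URL for the HTTP API from an object of query string parameters.
// Values are percent-encoded, so `prepend` and `append` can contain
// spaces, `=`, `;` and so on.
function buildFormatUrl(params) {
  var base = 'http://mathias.html5.org/data/unicode/format';
  var query = Object.keys(params).map(function (key) {
    return encodeURIComponent(key) + '=' + encodeURIComponent(params[key]);
  }).join('&');
  return query ? base + '?' + query : base;
}

var url = buildFormatUrl({
  version: '7.0.0',
  category: 'Ll',
  type: 'symbols',
  prepend: 'window.symbols = ',
  append: ';'
});
console.log(url);
```

With these parameters the result is the same URL as the example given earlier in this section.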