Index of /data/unicode

Name                    Size  Description
Parent Directory           -
2.0.14/                    -
2.1.9/                     -
3.0.1/                     -
3.2.0/                     -
4.0.1/                     -
4.1.0/                     -
5.0.0/                     -
5.1.0/                     -
5.2.0/                     -
6.0.0/                     -
6.1.0/                     -
6.2.0/                     -
6.3.0/                     -

# Unicode test data for JavaScript

If you ever need JavaScript arrays of all Unicode symbols per category per
Unicode version (for testing purposes, perhaps), or JavaScript-compatible
regular expressions to match those symbols, this directory has got you
covered. Because of the way JavaScript exposes “characters”, generating this
data is trickier than it sounds, as you have to account for surrogate pairs.
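
To illustrate why surrogate pairs matter, here is a minimal sketch (not the generator script itself) of how a code point above U+FFFF turns into two UTF-16 code units. The helper name `toCodeUnits` is made up for this example.

```javascript
// Hypothetical helper: split a code point into the UTF-16 code units
// that a JavaScript string uses to store it.
function toCodeUnits(codePoint) {
  if (codePoint <= 0xFFFF) {
    return [codePoint]; // BMP code points fit in a single code unit
  }
  const offset = codePoint - 0x10000;
  const high = 0xD800 + (offset >> 10);  // lead (high) surrogate
  const low = 0xDC00 + (offset & 0x3FF); // trail (low) surrogate
  return [high, low];
}

// U+1D306 lies outside the BMP, so it needs a surrogate pair.
const symbol = String.fromCharCode(...toCodeUnits(0x1D306));

console.log(symbol.length);                                   // 2
console.log(/^.$/.test(symbol));                              // false
console.log(/^[\uD800-\uDBFF][\uDC00-\uDFFF]$/.test(symbol)); // true
```

A regular expression without the `u` flag sees two “characters” here, which is why astral symbols have to be spelled out as surrogate pairs in the pattern.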

For example, I’ve used a variation of this data in a test case that
dynamically creates and runs over 90k tests, based on the appropriate Unicode
categories.

The scripts I wrote to generate these files can be found here:

## Tests for the generated data

The generated data is fully tested by a script that verifies that, within the
range of code points from 0x000000 to 0x10FFFF, _only_ the symbols in
`${version}/${category}-symbols.js` are matched by the regular expression in
`${version}/${category}-regex.js`. This test case is available at this URL:

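The check described above could look roughly like the following sketch. It is not the actual test script: the function name `findMismatches` and the tiny sample data are assumptions for illustration, standing in for the contents of a real `-symbols.js` and `-regex.js` pair.

```javascript
// Hypothetical checker: return every code point where "the regex matches"
// and "the symbol is in the list" disagree.
function findMismatches(symbols, regex) {
  const expected = new Set(symbols);
  // Anchor the pattern so it has to match the whole symbol.
  const anchored = new RegExp('^(?:' + regex.source + ')$');
  const mismatches = [];
  for (let codePoint = 0x000000; codePoint <= 0x10FFFF; codePoint++) {
    const symbol = String.fromCodePoint(codePoint);
    if (anchored.test(symbol) !== expected.has(symbol)) {
      mismatches.push(codePoint);
    }
  }
  return mismatches;
}

// Made-up sample data: one BMP symbol and one astral symbol (U+1D49C).
const sampleSymbols = ['\u00AA', '\uD835\uDC9C'];
const goodRegex = /\xAA|\uD835\uDC9C/;
const badRegex = /[\xAA\xAB]/; // wrongly matches U+00AB, misses U+1D49C

console.log(findMismatches(sampleSymbols, goodRegex)); // []
console.log(findMismatches(sampleSymbols, badRegex));  // [ 171, 119964 ]
```

An empty result means the regular expression matches exactly the listed symbols and nothing else in the whole code point range.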

There’s also an HTTP API of sorts, which allows you to customize the output a
little bit. This saves you from downloading and editing the generated files if
you only need to write some quick tests.

Available query string parameters:

 * `category`: can be any Unicode category
 * `script`: can be any Unicode script
 * `property`: can be any Unicode property
 * `block`: can be any Unicode block
 * `type`: can be `code-points`, `symbols` or `regex`; defaults to `symbols`
 * `version`: can be any Unicode version for which data is available; defaults to the latest available version
 * `prepend`: a string to prepend to the output; defaults to the empty string
 * `append`: a string to append to the output; defaults to the empty string
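
As a rough sketch of how these parameters combine, the snippet below builds a request URL. The base URL is a placeholder (the real endpoint is not given here), the helper name `buildQueryUrl` is made up, and `Lu` is only an assumed example of a category value.

```javascript
// Placeholder endpoint: replace with the actual location of the API.
const BASE_URL = 'https://example.com/data/unicode/';

// Hypothetical helper: turn an options object into a query string.
// Parameters that are left out fall back to the API's documented defaults.
function buildQueryUrl(params, base = BASE_URL) {
  const url = new URL(base);
  for (const [name, value] of Object.entries(params)) {
    if (value !== undefined) {
      url.searchParams.set(name, value); // values are URL-encoded for us
    }
  }
  return url.toString();
}

// Ask for a regular expression, wrapped in a variable declaration.
const requestUrl = buildQueryUrl({
  category: 'Lu',      // assumed example value
  type: 'regex',
  version: '6.3.0',
  prepend: 'var re = ',
  append: ';'
});

console.log(requestUrl);
```

Using `prepend` and `append` this way would make the response a complete statement that can be pasted straight into a test file.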