Unicode beyond just characters: Localization with the CLDR
Here are notes, slides, video, and additional resources for my talk on software localization with the Unicode CLDR, presented recently at OSCON and the NY Tech Localization Meetup. Code examples are in Python, Ruby, and Perl, and libraries in other popular languages are highlighted as well.
Abstract
Unicode is much more than just characters. The Unicode Consortium defines open standards for collating, parsing, and formatting data in many of the world’s languages. The Common Locale Data Repository (CLDR), the largest standard repository of locale data, provides both the data and specifications for its use, making it a powerful resource for software localization.
The Unicode CLDR has become the de facto locale standard. It is widely used by companies including Google, Apple, and Microsoft and by projects ranging from Linux distributions to Wikipedia, and it has growing support in many programming languages. This talk provides an introduction to software localization and highlights popular CLDR-based open source libraries in a variety of programming languages.
Topics include:
- Localized formatting of numbers, percents, and ranges of numbers
- Localized formatting of prices and currencies
- Localized formatting of dates and times
- Localized sorting of lists of strings
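As a rough illustration of the number-formatting topic, here is a minimal Python sketch that formats a decimal number from CLDR-style data: a decimal symbol, a grouping symbol, and a decimal pattern. The `LOCALE_DATA` table is a small hand-copied excerpt whose values are assumed to match recent CLDR releases, and `grouping_sizes` and `format_decimal` are hypothetical helpers written for this example; a real application would use a CLDR-based library rather than copying data by hand.

```python
# Minimal sketch: locale-aware decimal formatting driven by CLDR-style data.
# LOCALE_DATA is a small hand-copied excerpt (assumed to match recent CLDR
# releases); all names here are hypothetical and for illustration only.
LOCALE_DATA = {
    "en":    {"decimal": ".", "group": ",",      "pattern": "#,##0.###"},
    "de":    {"decimal": ",", "group": ".",      "pattern": "#,##0.###"},
    "fr":    {"decimal": ",", "group": "\u202f", "pattern": "#,##0.###"},  # narrow no-break space
    "en-IN": {"decimal": ".", "group": ",",      "pattern": "#,##,##0.###"},
}


def grouping_sizes(pattern):
    """Return (primary, secondary) grouping sizes from a CLDR decimal pattern."""
    groups = pattern.split(".")[0].split(",")
    primary = len(groups[-1])
    # With only one separator in the pattern, the secondary size equals the primary.
    secondary = len(groups[-2]) if len(groups) > 2 else primary
    return primary, secondary


def format_decimal(number, locale):
    """Format a non-negative number with the locale's symbols and grouping."""
    if number < 0:
        raise ValueError("this sketch handles non-negative numbers only")
    data = LOCALE_DATA[locale]
    pattern = data["pattern"]
    primary, secondary = grouping_sizes(pattern)
    # Maximum fraction digits = number of '#' placeholders after the '.'.
    max_fraction = len(pattern.split(".")[1]) if "." in pattern else 0

    text = f"{number:.{max_fraction}f}"
    integer_digits, _, fraction_digits = text.partition(".")
    fraction_digits = fraction_digits.rstrip("0")  # optional '#' digits drop trailing zeros

    # Split the integer digits from the right: one primary group, then secondary groups.
    groups = [integer_digits[-primary:]]
    rest = integer_digits[:-primary]
    while rest:
        groups.insert(0, rest[-secondary:])
        rest = rest[:-secondary]

    result = data["group"].join(groups)
    if fraction_digits:
        result += data["decimal"] + fraction_digits
    return result
```

For example, `format_decimal(1234567.891, "de")` returns `"1.234.567,891"`, while `"en-IN"` gives `"12,34,567.891"`: the Indian English pattern `#,##,##0.###` groups the last three integer digits and then pairs of digits. That difference is why the patterns, not just the separator characters, need to come from locale data.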
Slides
Direct link: https://speakerdeck.com/patch/localization-with-the-unicode-cldr
Video
Here’s the video from the first presentation of this talk at YAPC::NA 2014. Note, though, that this version was tailored to the Perl community, and CLDR-based open source libraries in many programming languages have developed considerably over the past year.
Direct link: https://youtu.be/DcPpUnlENAs
Resources
- Unicode CLDR — Common Locale Data Repository
- CLDR Survey Tool Accounts — Contribute CLDR data
- CLDR TL;DR — article I wrote for the Perl Advent Calendar
- Unicode Programming Examples — open source documentation project
Libraries
ICU
International Components for Unicode: the premier library implementing and providing access to the CLDR.
- ICU4C and ICU4J — C/C++ and Java
- ICU4C-based Libraries — note that not all are maintained!
CLDR-based Libraries
- JS: twitter-cldr-js
- Node: twitter-cldr-npm
- JS/Node: Globalize
- Ruby: twitter-cldr-rb
- Ruby: Misc. RubyGems
- Perl: CLDR::Number
- Perl: Misc. CPAN
CLDR Data
- XML — official
- JSON — official
- JavaScript: cldr.js
- Ruby: cldr
- Go: text/cldr
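To give a feel for the official JSON data, here is a minimal Python sketch that reads number symbols and the standard decimal pattern for one locale. The embedded `DE_NUMBERS_JSON` string is an abbreviated excerpt modeled on the layout of the `main/<locale>/numbers.json` files and trimmed to a few keys, so treat the exact key names as assumptions to check against the real files; `number_data` is a hypothetical helper.

```python
import json

# Abbreviated excerpt modeled on the official CLDR JSON layout for
# main/<locale>/numbers.json (assumption: trimmed to a few keys).
DE_NUMBERS_JSON = """
{
  "main": {
    "de": {
      "numbers": {
        "defaultNumberingSystem": "latn",
        "symbols-numberSystem-latn": {"decimal": ",", "group": "."},
        "decimalFormats-numberSystem-latn": {"standard": "#,##0.###"}
      }
    }
  }
}
"""


def number_data(raw_json, locale):
    """Return (decimal symbol, group symbol, standard decimal pattern) for a locale."""
    numbers = json.loads(raw_json)["main"][locale]["numbers"]
    # Symbols and patterns are keyed by numbering system, e.g. "latn" for the digits 0-9.
    system = numbers["defaultNumberingSystem"]
    symbols = numbers[f"symbols-numberSystem-{system}"]
    pattern = numbers[f"decimalFormats-numberSystem-{system}"]["standard"]
    return symbols["decimal"], symbols["group"], pattern


print(number_data(DE_NUMBERS_JSON, "de"))
```

Looking up the default numbering system first, rather than hard-coding `latn`, matters for locales whose default digits are not 0 to 9.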
Thanks to everyone who participated in the session and discussion!
Nova Patch (@novapatch) is a principal engineer at Shutterstock, specializing in internationalization, multilingual search, and building products that support the world’s languages, writing systems, and cultures. They are an open source developer, contributor to the Unicode CLDR, and member of the Unicode Consortium.