Unicode beyond just characters: Localization with the CLDR ⁕ Nova Patch

Here are notes, slides, video, and additional resources for my talk on software localization with the Unicode CLDR, presented recently at OSCON and the NY Tech Localization Meetup. Code examples are in Python, Ruby, and Perl, and libraries in other popular languages are also highlighted.

Abstract

Unicode is much more than just characters. The Unicode Consortium defines open standards for collating, parsing, and formatting data in many of the world’s languages. The Common Locale Data Repository (CLDR) is the largest standard repository of locale data along with specifications for its use, and is a powerful resource for software localization.

The Unicode CLDR has become the de facto locale standard with widespread use among companies, including Google, Apple, and Microsoft; projects ranging from Linux distributions to Wikipedia; and increasing support in many programming languages. This talk provides an introduction to software localization and highlights popular CLDR-based open source libraries in a variety of programming languages.

Topics include localized formatting of:

Numbers, percents, and ranges of numbers
Prices and currencies
Dates and times
Localized sorting of lists of strings
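
To make the first topic concrete, here is a minimal Python sketch of how locale data can drive number formatting. It is not taken from the talk and is not how any particular library is implemented. The NUMBER_DATA table is a tiny hand-copied subset in the spirit of CLDR number symbols and grouping sizes, and the names NUMBER_DATA, group_digits, and format_decimal are hypothetical. It handles only non-negative numbers, and a real application would use a CLDR-based library rather than hard-coded data.

```python
from decimal import Decimal

# Illustrative, hand-copied subset of locale number data (an assumption for
# this sketch; real code would load this from the CLDR through a library).
# "primary" is the size of the rightmost digit group, "secondary" the size
# of every group to its left.
NUMBER_DATA = {
    "en":    {"decimal": ".", "group": ",", "primary": 3, "secondary": 3},
    "de":    {"decimal": ",", "group": ".", "primary": 3, "secondary": 3},
    "en-IN": {"decimal": ".", "group": ",", "primary": 3, "secondary": 2},
}


def group_digits(digits, primary, secondary, sep):
    """Insert sep into a string of integer digits, working right to left."""
    if len(digits) <= primary:
        return digits
    head, tail = digits[:-primary], digits[-primary:]
    parts = []
    while len(head) > secondary:
        parts.insert(0, head[-secondary:])
        head = head[:-secondary]
    parts.insert(0, head)
    return sep.join(parts + [tail])


def format_decimal(value, locale):
    """Format a non-negative number using the symbols of the given locale."""
    data = NUMBER_DATA[locale]
    # The "f" format avoids scientific notation in the Decimal string.
    integer, _, fraction = format(Decimal(str(value)), "f").partition(".")
    result = group_digits(integer, data["primary"], data["secondary"], data["group"])
    if fraction:
        result += data["decimal"] + fraction
    return result


for locale in ("en", "de", "en-IN"):
    print(locale, format_decimal("1234567.891", locale))
# en 1,234,567.891
# de 1.234.567,891
# en-IN 12,34,567.891
```

The same value yields three different strings because only the locale data changes, not the code. That separation is the core idea behind the CLDR-based libraries highlighted below.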

Slides

Direct link: https://speakerdeck.com/patch/localization-with-the-unicode-cldr

Video

Here’s the video from the first presentation of this talk at YAPC::NA 2014. Note, though, that this version was tailored to the Perl community, and CLDR-based open source libraries in many programming languages have developed considerably over the last year.

Direct link: https://youtu.be/DcPpUnlENAs

Resources

Unicode CLDR — Common Locale Data Repository
CLDR Survey Tool Accounts — Contribute CLDR data
CLDR TL;DR — article I wrote for the Perl Advent Calendar
Unicode Programming Examples — open source documentation project

Libraries

ICU

International Components for Unicode: the premier library implementing and providing access to the CLDR.

ICU4C and ICU4J — C/C++ and Java
ICU4C-based Libraries — note that not all are maintained!

ICU4C-based Libraries

CLDR-based Libraries

CLDR Data

XML — official
JSON — official
JavaScript: cldr.js
Ruby: cldr
Go: text/cldr

Thanks to everyone who participated in the session and discussion!

Nova Patch (@novapatch) is a principal engineer at Shutterstock, specializing in internationalization, multilingual search, and building products that support the world’s languages, writing systems, and cultures. They are an open source developer, contributor to the Unicode CLDR, and member of the Unicode Consortium.