Here are notes, slides, video, and additional resources for my talk on software localization with the Unicode CLDR, presented recently at OSCON and the NY Tech Localization Meetup. Code examples are in Python, Ruby, and Perl, and libraries in other popular languages are also highlighted.

Abstract

Unicode is much more than just characters. The Unicode Consortium defines open standards for collating, parsing, and formatting data in many of the world’s languages. The Common Locale Data Repository (CLDR) is the largest standard repository of locale data, along with specifications for its use, and is a powerful resource for software localization.

The Unicode CLDR has become the de facto locale standard with widespread use among companies, including Google, Apple, and Microsoft; projects ranging from Linux distributions to Wikipedia; and increasing support in many programming languages. This talk provides an introduction to software localization and highlights popular CLDR-based open source libraries in a variety of programming languages.

Topics include localized formatting of:

  • Numbers, percents, and ranges of numbers
  • Prices and currencies
  • Dates and times
  • Localized sorting of lists of strings
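
To give a rough sense of what locale data does for the number formatting topic above, here is a minimal Python sketch using only the standard library. The names `NumberSymbols`, `SYMBOLS`, and `format_decimal` are hypothetical, and the table is a tiny hand-copied subset for illustration rather than the CLDR itself. It only shows the idea that the same number renders differently once separators and grouping sizes come from per-locale data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumberSymbols:
    """Per-locale number symbols (illustrative subset, not real CLDR data files)."""
    decimal: str          # decimal separator
    group: str            # grouping separator
    primary_group: int    # size of the group nearest the decimal point
    secondary_group: int  # size of every group to the left of that


# Hypothetical lookup table, hand-copied for three locales only.
SYMBOLS = {
    "en": NumberSymbols(decimal=".", group=",", primary_group=3, secondary_group=3),
    "de": NumberSymbols(decimal=",", group=".", primary_group=3, secondary_group=3),
    "en-IN": NumberSymbols(decimal=".", group=",", primary_group=3, secondary_group=2),
}


def format_decimal(value: float, locale: str, fraction_digits: int = 2) -> str:
    """Format a number using the separators and grouping sizes for a locale."""
    sym = SYMBOLS[locale]
    sign = "-" if value < 0 else ""
    integer, _, fraction = f"{abs(value):.{fraction_digits}f}".partition(".")

    # Peel groups off the right-hand side of the integer digits.
    groups = []
    size = sym.primary_group
    while len(integer) > size:
        groups.insert(0, integer[-size:])
        integer = integer[:-size]
        size = sym.secondary_group
    groups.insert(0, integer)

    result = sign + sym.group.join(groups)
    if fraction:
        result += sym.decimal + fraction
    return result


for code in ("en", "de", "en-IN"):
    print(code, format_decimal(1234567.891, code))
```

A real application would not maintain a table like this by hand. A CLDR-based library such as ICU covers far more locales and handles details this sketch ignores, including locale-specific digits, minus signs, and currency patterns.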

Slides

Direct link: https://speakerdeck.com/patch/localization-with-the-unicode-cldr

Video

Here’s the video from the first presentation of this talk at YAPC::NA 2014. Note, though, that this version was tailored to the Perl community, and CLDR-based open source libraries in many programming languages have developed considerably over the last year.

Direct link: https://youtu.be/DcPpUnlENAs

Resources

Libraries

ICU

International Components for Unicode: the premier library implementing and providing access to the CLDR.

ICU4C-based Libraries

CLDR-based Libraries

CLDR Data

Thanks to everyone who participated in the session and discussion!


Nova Patch (@novapatch) is a principal engineer at Shutterstock, specializing in internationalization, multilingual search, and building products that support the world’s languages, writing systems, and cultures. They are an open source developer, contributor to the Unicode CLDR, and member of the Unicode Consortium.