Skip to content
Snippets Groups Projects
README.md 2.53 KiB
Newer Older
# Unicode RBNF

A pure Python implementation of [rule based number formatting](https://icu-project.org/docs/papers/a_rule_based_approach_to_number_spellout/) (RBNF) using the [Unicode Common Locale Data Repository](https://cldr.unicode.org) (CLDR).

This lets you spell out numbers for a large number of locales:

``` python
from unicode_rbnf import RbnfEngine

engine = RbnfEngine.for_language("en")
Edward Betts's avatar
Edward Betts committed
assert engine.format_number(1234).text == "one thousand two hundred thirty-four"
Edward Betts's avatar
Edward Betts committed
Different formatting purposes are supported as well, depending on the locale:
Edward Betts's avatar
Edward Betts committed
from unicode_rbnf import RbnfEngine, FormatPurpose

engine = RbnfEngine.for_language("en")
Edward Betts's avatar
Edward Betts committed
assert engine.format_number(1999, FormatPurpose.CARDINAL).text == "one thousand nine hundred ninety-nine"
assert engine.format_number(1999, FormatPurpose.YEAR).text == "nineteen ninety-nine"
assert engine.format_number(11, FormatPurpose.ORDINAL).text == "eleventh"
Edward Betts's avatar
Edward Betts committed
For locales with multiple genders, cases, etc., the different texts are accessible in the result of `format_number`:

``` python
from unicode_rbnf import RbnfEngine

engine = RbnfEngine.for_language("de")
print(engine.format_number(1))
```

Result:

```
FormatResult(
  text='eins',
  text_by_ruleset={
    'spellout-numbering': 'eins',
    'spellout-cardinal-neuter': 'ein',
    'spellout-cardinal-masculine': 'ein',
    'spellout-cardinal-feminine': 'eine',
    'spellout-cardinal-n': 'einen',
    'spellout-cardinal-r': 'einer',
    'spellout-cardinal-s': 'eines',
    'spellout-cardinal-m': 'einem'
  }
)
```

The `text` property of the result holds the text of the ruleset with the shortest name (least specific).

## Supported locales

See: https://github.com/unicode-org/cldr/tree/release-44/common/rbnf

## Engine implementation

Not [all features](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classRuleBasedNumberFormat.html) of the RBNF engine are implemented. The following features are available:

* Literal text (`hundred`)
* Quotient substitution (`<<` or `←←`)
* Reminder substitution (`>>` or `→→`)
* Optional substitution (`[...]`)
* Rule substituton (`←%ruleset_name←`)
* Rule replacement (`=%ruleset_name=`)
* Special rules:
    * Negative numbers (`-x`)
    * Improper fractions (`x.x`)
    * Not a number (`NaN`)
    * Infinity (`Inf`)
    
Some features that will need to be added eventually:

* Proper fraction rules (`0.x`)
* Preceding reminder substitution (`>>>` or `→→→`)
* Number format strings (`==`)
* Decimal format patterns (`#,##0.00`)
* Plural replacements (`$(ordinal,one{st}...)`)