The character conversion module converts strings between any two supported symbol conventions, e.g., Unicode IPA and Praat symbols. It is useful for converting Praat TextGrid annotations to LaTeX (and from there to PDF), and for converting data that is not in Unicode to Unicode IPA. Here, Unicode IPA refers to IPA symbols encoded in UTF-8. The module provides the CharConverter class, whose interface consists of its constructor and the convert method.
Here’s how to create a converter from X-SAMPA (used, e.g., in Elan) to Unicode IPA:
>>> elan_converter=CharConverter('xsampa','uni')
Now convert an X-SAMPA string to Unicode IPA:
>>> elan_converter.convert('kO nEnE Oku')
'k\xc9\x94 n\xc9\x9bn\xc9\x9b \xc9\x94ku'
The interpreter echoes the result as escaped UTF-8 bytes. To see the IPA characters themselves, use a print statement:
>>> print elan_converter.convert('kO nEnE Oku')
kɔ nɛnɛ ɔku
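The escaped output and the printed output above are the same string: the escapes are the UTF-8 bytes of the IPA characters. The following check is only an illustration, written in Python 3 syntax (the session above is Python 2) and independent of CharConverter; the variable names are made up for this sketch.

```python
# Python 3 sketch: relate the escaped byte sequence to the printed IPA text.
ipa_text = 'kɔ nɛnɛ ɔku'

# Encoding the text as UTF-8 yields the bytes shown in the escaped output.
utf8_bytes = ipa_text.encode('utf-8')

# Decoding those bytes gives back the readable IPA string.
round_trip = utf8_bytes.decode('utf-8')

print(utf8_bytes)
print(round_trip)
```

Each of ɔ and ɛ occupies two bytes in UTF-8, which is why each appears as a pair of `\x` escapes.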
The converter also handles common LaTeX escape sequences.
The usage is the same:
>>> latex_converter=CharConverter('uni','latex')
>>> print latex_converter.convert('ü')
\"{u}
The data source for the character converter is a dictionary. Keys are Unicode IPA symbols (unicode objects), and values are lists of the corresponding name and symbols in the other conventions:
unicode: [informal name, tipa, praat, xsampa]
For example:
'ɖ':['vd retroflex plosive','\:d','\d.','d`']
Thus, ‘\:d’ is the tipa symbol used for the voiced retroflex plosive.
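A minimal sketch of how such a table can be queried, assuming the column order given above. The names `COLUMNS`, `IPA_TABLE` and `lookup` are hypothetical and are not part of the module; the code uses Python 3 syntax and raw strings so that the backslashes are kept literally.

```python
# Hypothetical names for illustration; not the module's actual data or API.
# Column order follows the layout described above.
COLUMNS = ['name', 'tipa', 'praat', 'xsampa']

# One row, copied from the example entry above.
IPA_TABLE = {
    'ɖ': ['vd retroflex plosive', r'\:d', r'\d.', 'd`'],
}


def lookup(symbol, convention):
    """Return the representation of a Unicode IPA symbol in a convention."""
    return IPA_TABLE[symbol][COLUMNS.index(convention)]


print(lookup('ɖ', 'tipa'))
print(lookup('ɖ', 'xsampa'))
```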
Creates an instance of a character converter. Source is a string naming the convention of the input: ‘uni’, ‘name’, ‘tipa’, ‘praat’, ‘xsampa’, or ‘latex’. Target is a string naming the convention of the output, one of the same six values.
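Parameters: | source (string) – convention of the input string; target (string) – convention of the output string |
---|---|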
Returns a new string in which each symbol of the source convention is replaced with its target value, based on the converter's map. String is the string to be converted.
Parameter: | string (string) – string to be converted |
---|---|
Return type: | unicode |
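The constructor and convert method described above can be pictured with the following sketch. It is not the module's implementation: `SketchConverter`, `symbol_for`, `COLUMNS` and `IPA_TABLE` are hypothetical names, the table holds only the example row from this document, the ‘latex’ convention is left out, and longest-match replacement with a regular expression is an assumption about one reasonable way to handle multi-character symbols. Python 3 syntax is used.

```python
import re

# Hypothetical stand-ins for the module's data; only the documented row is used.
COLUMNS = ['name', 'tipa', 'praat', 'xsampa']
IPA_TABLE = {
    'ɖ': ['vd retroflex plosive', r'\:d', r'\d.', 'd`'],
}


def symbol_for(uni, convention):
    """Return the form of a Unicode IPA symbol in the given convention."""
    if convention == 'uni':
        return uni
    return IPA_TABLE[uni][COLUMNS.index(convention)]


class SketchConverter(object):
    """Illustrative converter; an assumption, not the real CharConverter."""

    def __init__(self, source, target):
        # Build the source -> target map once, when the converter is created.
        self.mapping = {
            symbol_for(uni, source): symbol_for(uni, target)
            for uni in IPA_TABLE
        }
        # Try longer source symbols first so multi-character symbols win.
        keys = sorted(self.mapping, key=len, reverse=True)
        self.pattern = re.compile('|'.join(re.escape(k) for k in keys))

    def convert(self, string):
        # Replace every known source symbol; other characters pass through.
        return self.pattern.sub(lambda m: self.mapping[m.group(0)], string)


xsampa_to_uni = SketchConverter('xsampa', 'uni')
print(xsampa_to_uni.convert('ad`a'))

praat_to_tipa = SketchConverter('praat', 'tipa')
print(praat_to_tipa.convert(r'a\d.a'))
```

Building the map in the constructor keeps convert cheap when the same converter is applied to many strings, which matches the usage shown earlier where one converter object is created and then reused.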