Unicode allows two different representations of a letter with a diacritic: precomposed and decomposed. Text that looks identical can therefore compare as unequal, which causes problems in practice.
The source of the problem
There are two ways to encode the character ǔ: as the single precomposed code point U+01D4, or as the letter u followed by the combining caron U+030C. I will analyze both in a Python session.
>>> u1 = 'ǔ'  # precomposed: U+01D4
>>> len(u1)
1
>>> ord(u1)
468
>>> hex(ord(u1))
'0x1d4'
>>> u1.encode('utf-8')
b'\xc7\x94'
>>> u1.encode('utf-16')
b'\xff\xfe\xd4\x01'
>>> u2 = 'u\u030c'  # decomposed: 'u' + U+030C, renders as ǔ
>>> len(u2)
2
>>> ord(u2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ord() expected a character, but string of length 2 found
>>> u2.encode('utf-8')
b'u\xcc\x8c'
>>> u2.encode('utf-16')
b'\xff\xfeu\x00\x0c\x03'
>>> import unicodedata
>>> [unicodedata.normalize(std, 'ǔ') for std in ['NFC', 'NFKC', 'NFD', 'NFKD']]
['ǔ', 'ǔ', 'ǔ', 'ǔ']
>>> [len(unicodedata.normalize(std, 'ǔ')) for std in ['NFC', 'NFKC', 'NFD', 'NFKD']]
[1, 1, 2, 2]
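Learning platforms evaluate the correctness of students' responses. In the case of Duolingo and lernu, this is done via plain string comparison that does not account for the two possible representations, so an answer that looks correct can be marked wrong when its code points differ from those of the expected string.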
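The failure mode can be reproduced in a few lines. This is only a sketch of what such a comparison presumably looks like, not the platforms' actual code; the names expected, typed and code_points are made up for illustration.

```python
import unicodedata

# Hypothetical values: the stored solution and what the student typed.
expected = "\u01d4"   # precomposed: LATIN SMALL LETTER U WITH CARON
typed = "u\u030c"     # decomposed: 'u' followed by COMBINING CARON

# Both strings render identically, but == compares code points.
naive_match = expected == typed


def code_points(text):
    """Return the Unicode name of every code point in text."""
    return [unicodedata.name(ch) for ch in text]


print(naive_match)
print(code_points(expected))
print(code_points(typed))
```

Printing the code point names makes the difference visible even though the two strings look the same on screen.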
I wanted to present this as an issue with Unicode in practice. To solve the problem, apply the same Unicode normalization form (for example NFC) to both strings before comparing them.
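A minimal sketch of such a comparison, assuming NFC as the default form. The function name answers_match and its parameters are hypothetical; only unicodedata.normalize comes from the Python standard library.

```python
import unicodedata


def answers_match(expected, given, form="NFC"):
    """Compare two strings after bringing both into the same normalization form.

    answers_match and its parameters are illustrative names, not an existing API.
    """
    return unicodedata.normalize(form, expected) == unicodedata.normalize(form, given)


precomposed = "\u01d4"   # single code point
decomposed = "u\u030c"   # 'u' + combining caron

print(answers_match(precomposed, decomposed))
print(answers_match(precomposed, "u"))
```

Any of the four forms works here as long as both sides use the same one. NFKC and NFKD additionally fold compatibility characters such as ligatures, which may or may not be desirable when grading answers.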
I filed a bug report with Duolingo.