Another one on i18n… the capital “ß”

by kai on 01/07/2008

Whenever people talk about Unicode and i18n, the discussion sooner or later turns to the question of what exactly a character is. I’ll give you a typical example from German.

Besides the well-known umlauts (ä, ö, ü, Ä, Ö and Ü) there is one more character that usually causes some concern. The story behind the umlauts is that they are letters with diacritics: “normal” characters with a small mark added. In German those marks are two dots; other languages use other marks, such as horizontal bars (macrons, in Maori), various accents (French) or a small ring above the letter (Swedish).

But anyway, German has another unusual character: “ß”. It’s called “sz” (Eszett, or “sharp s”) and looks a bit like a Greek beta, but it’s a different character with a different Unicode value. That being said, the “sz” quite often launches a discussion about what exactly a character is. I can see why people are a bit puzzled by it: it doesn’t really stand for “sz” but for a particular pronunciation of “s”. Depending on whether the vowel in front of it is pronounced long or short, a word is spelled with “ß” (long) or “ss” (short).

So, is “ß” a character or just a short way of writing “ss”? The answer is that it’s a character: it has its own Unicode value, so it’s something you have to deal with when localizing an application for any German locale.
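A quick way to check this is to ask a Unicode-aware language. The following is a minimal Python sketch using only the standard library; the variable names are my own, chosen for illustration.

```python
import unicodedata

eszett = "ß"

# "ß" is a single code point with its own name, not a shorthand for two letters.
code_point = f"U+{ord(eszett):04X}"
name = unicodedata.name(eszett)

# Case folding (used for caseless comparison) does map it to "ss",
# which is why "Fußball" and "FUSSBALL" compare equal after folding.
folded = eszett.casefold()
same_word = "Fußball".casefold() == "FUSSBALL".casefold()

print(code_point, name, len(eszett), folded, same_word)
```

So the character is distinct in storage, but for caseless matching (search, for instance) it is treated as equivalent to “ss”. A localized application has to be prepared for both views.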

Now – the big news:

For whatever reason, the “ß” never had a capital version. In German every noun starts with a capital letter (as does the first word of a sentence). Well, there’s actually no word starting with “ß”, so not having a capital version of it was not too much of an issue. But there are edge cases: think about using the “ß” in a headline that is set entirely in capitals.

For example, say I’d like to write the word “Fußball” in capital letters. Up to now I had two options: either “FUßBALL” (which keeps the lowercase “ß” and looks poor) or resolving the “ß” to “SS”, giving “FUSSBALL”. The latter makes my toenails curl, because the long “U” doesn’t work at all with the “SS”. Ugh!
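Both options are easy to reproduce in code. This is a rough Python sketch: `upper_keep_eszett` is a hypothetical helper I made up for the first option, while the second option is simply what Python’s built-in `str.upper()` does.

```python
word = "Fußball"


def upper_keep_eszett(text: str) -> str:
    """Option 1: uppercase everything except "ß" (hypothetical helper)."""
    return "".join(ch if ch == "ß" else ch.upper() for ch in text)


option_1 = upper_keep_eszett(word)  # keeps the lowercase "ß"
option_2 = word.upper()             # built-in behaviour: "ß" becomes "SS"

print(option_1, option_2)
print(len(word), len(option_2))  # uppercasing makes the string one character longer
```

Note that the second option changes the length of the string, which is worth keeping in mind for fixed-width fields and buffers.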

But now this problem is going away: the DIN institute of Germany has applied to the ISO for a capital version of the “ß” to be included in ISO/IEC 10646 (UCS), the character set standard that is kept in sync with Unicode. The Unicode code point for this new character will be U+1E9E.
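Here is a small Python sketch of what working with the new code point might look like, assuming a runtime whose Unicode database already includes U+1E9E (and a font that can display it). The helper `upper_with_capital_eszett` is hypothetical, not a standard function.

```python
import unicodedata

CAPITAL_ESZETT = "\u1E9E"


def upper_with_capital_eszett(text: str) -> str:
    """Uppercase text, mapping "ß" to U+1E9E instead of "SS" (hypothetical helper)."""
    # Replace first: the built-in upper() would otherwise expand "ß" to "SS".
    return text.replace("ß", CAPITAL_ESZETT).upper()


name = unicodedata.name(CAPITAL_ESZETT)
headline = upper_with_capital_eszett("Fußball")
round_trip = CAPITAL_ESZETT.lower()  # the capital form lowercases back to "ß"

print(name, headline, round_trip)
```

The explicit replacement is needed because Python’s default `upper()` still turns “ß” into “SS”; having the code point available does not change the default case mapping.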

But that’s just the start of it. Now font designers have to start including this new character in their fonts. Official government forms will have to be modified to include the new character and checked for inconsistencies.

A funny side note: does anyone remember the days of the German Democratic Republic (GDR), also known as East Germany? People in the GDR used the capital “ß” into the 1950s and 60s before it faded away.
