ColdFusion Builder, file encodings and i18n

by kai on 25/04/2010

I recently came across a discussion on the German ColdFusion mailing list where people talked about the alleged inability of ColdFusion Builder to deal properly with German umlauts. Phrases like “this product is completely unusable” or “this is a bug that has been continuously ignored by the CF Builder team” were used.

After trying it myself with files in a variety of encodings, I can assure everyone that those statements are wrong: most of the time, the problems people run into stem from misconceptions about character and file encodings. Let’s clear up some of the confusion and myths around file encodings in CF Builder (and other Eclipse-based tools), shall we?

We all know Unicode and one of its encodings, UTF-8. It’s 2010, folks: by now everyone should be aware that the world spans more than a single country and that UTF-8 is the only decent way forward when it comes to web-based applications and the character encoding of your content, data and files.

Unfortunately, the world sometimes isn’t that easy. Even if you know about UTF-8, there are lots of legacy files around. Those files (and the same is true for database content) might be encoded in ISO-8859-1 or even US-ASCII. If you have files encoded as “Western Latin 1” or similar, you might have to deal with the same issues, not to mention the variety of encodings for Asian languages.

ColdFusion Builder’s default encoding is UTF-8, and that’s exactly what it should be. People in that discussion thread claimed that CF Builder just showed them question marks and weird characters when they opened files containing German umlauts. One participant focused on RDS, claiming that CF Builder’s RDS integration breaks the display of umlauts. A bug was logged for that; it was closed as a duplicate of another bug, and both have been fixed for the CF Builder 1.0 release (kudos to Ram from the CFB team for following that up).

So, what’s the story here? It’s surprisingly simple: if you create a file containing German umlauts (or, in fact, any other special characters) in a non-UTF-8 encoding, say ISO-8859-1, and then open that file in CF Builder (via RDS or locally, it doesn’t matter), CF Builder will not display those special characters correctly. What exactly you get to see depends on the specific encoding being used. Here is an example showing the wrong encoding (actually, the term “wrong encoding” is wrong in itself; “not matching the expected encoding” would be more accurate):

Now that we have established that we’re dealing with a non-matching encoding here, what can you do? There are multiple scenarios:

1. All or the vast majority of your files are not encoded in UTF-8: You might want to consider changing the default encoding for the whole workspace in “Preferences/General/Workspace”.

2. Just one or a few of your projects are entirely in a different encoding: change the encoding per project by doing a secondary click (right-click) on the project name in the Navigator view, picking “Properties” and then changing the encoding in the “Resource” section.

3. Just one single file is encoded differently: use “Set Encoding” in the “Edit” menu, which applies a different encoding for this instance of the editor view:

Last but not least, you can also set default encodings per file/content type. That’s possible in “Preferences/General/Content Types”; the CF-related types are listed under “Text”:
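Choosing between these options is easier if you know how many of your files are not UTF-8 in the first place. The following Python sketch walks a project directory and lists files whose bytes do not decode as strict UTF-8. The function names and the default list of suffixes are hypothetical choices, not anything CF Builder provides. Pure ASCII files always pass, and a successful strict UTF-8 decode is strong evidence rather than proof.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical default: the CFML/HTML extensions worth checking.
DEFAULT_SUFFIXES = (".cfm", ".cfc", ".html", ".htm")


def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes as strict UTF-8 (pure ASCII always passes)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_non_utf8_files(root: str | Path,
                        suffixes: tuple[str, ...] = DEFAULT_SUFFIXES) -> list[Path]:
    """Return the matching files under root that are not valid UTF-8, sorted."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file()
        and path.suffix.lower() in suffixes
        and not is_valid_utf8(path.read_bytes())
    )
```

For example, `find_non_utf8_files("/path/to/project")` (a placeholder path) returns the files to look at: if that is nearly every file, option 1 or 2 fits better; if it is a handful, option 3 will do.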

That should give everyone enough leverage to work with files in non-UTF-8 encodings. So, ColdFusion Builder is NOT broken for non-English users, even if they don’t use Unicode and UTF-8 (though you really should move to UTF-8 as soon as you can). I personally use ColdFusion Builder to work with HTML and CFML files in twenty-something languages, sometimes with twenty-something languages in a single file, and it works just fine (all Unicode-based, though).
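If you do decide to move to UTF-8, converting legacy files is mostly mechanical. Below is a minimal Python sketch, assuming the legacy files are ISO-8859-1 (pass `"cp1252"` instead if they came from Windows tools and contain characters such as “€”); the function name is hypothetical. It works on raw bytes so line endings survive, and it skips files that already decode as UTF-8, so running it twice is harmless. Back up or commit your files before trying anything like this.

```python
from pathlib import Path


def convert_to_utf8(path: Path, source_encoding: str = "iso-8859-1") -> bool:
    """Re-encode a file from source_encoding to UTF-8 in place.

    Returns True if the file was rewritten, False if it already decoded
    as UTF-8 (so running the function twice never double-encodes).
    """
    data = path.read_bytes()
    try:
        data.decode("utf-8")
        return False  # already valid UTF-8 (this includes pure ASCII)
    except UnicodeDecodeError:
        pass
    # Work on bytes rather than text mode so CRLF/LF line endings are kept.
    text = data.decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))
    return True
```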

To be fair to both sides: it would be very cool if ColdFusion Builder were able to detect the correct encoding of a file heuristically, basically making a statistically driven guess and falling back to whatever the default encoding is set to when it has no clue what to do with the file. At the end of the day, my strongest recommendation is to move to UTF-8 and, while you’re at it, to stop using RDS to talk to your ColdFusion server(s) in the first place and come up with a proper development/deployment model.
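The fall-back idea can be sketched with a much simpler heuristic than real statistical detection: honor a UTF-8 byte order mark, accept the bytes as UTF-8 if they decode strictly, and otherwise fall back to a configured default. This is a hypothetical Python helper illustrating the idea, not a CF Builder feature and not a statistical detector.

```python
from __future__ import annotations

import codecs


def guess_and_decode(data: bytes, fallback: str = "iso-8859-1") -> tuple[str, str]:
    """Return (encoding, text) using a simple, non-statistical heuristic.

    1. A UTF-8 byte order mark settles it.
    2. Otherwise, if the bytes are valid UTF-8, assume UTF-8; non-ASCII
       Latin-1 text rarely forms valid UTF-8 by accident.
    3. Otherwise fall back to the configured default encoding.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig", data.decode("utf-8-sig")
    try:
        return "utf-8", data.decode("utf-8")
    except UnicodeDecodeError:
        return fallback, data.decode(fallback, errors="replace")
```

Because every byte sequence is valid ISO-8859-1, the fallback step never fails for that encoding; `errors="replace"` only matters if you configure a stricter fallback.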

Marc-Andre December 17, 2010 at 2:27 am

Regrettably, I have to inform you that the above is not true, at least not for ColdFusion Builder. It is true for regular Eclipse.

In ColdFusion Builder, even if you specify that your entire project is in ISO-8859-1 (as you describe in step 2), ColdFusion Builder will still open every .cfm file as UTF-8 (or in the default encoding specified for that file type, as you describe in step 3).

The only way to override the content type’s default encoding is to specify the character encoding directly on the file; otherwise CF Builder completely ignores the character encoding of the enclosing resources (project or workspace).

The same goes for the Eclipse default charset (as in step 1). In CF Builder’s case, it only serves to specify the charset of newly created files (which then overrides the content type default); it does not affect existing files.
