Friday 14 November 2008

How to UTF-8 in JSP/Webapps

Although I never planned to post any code on this blog, there is one annoyance which I think deserves to be documented.
I am talking about UTF-8/Unicode in Java web apps and JSPs. In my opinion this issue should have been solved by default ten years ago, when Java was still new, so that we wouldn’t have to deal with this unnecessary problem.

Why Unnecessary

Because hardly any component in the development chain uses UTF-8 or another Unicode encoding by default, so each one has to be set up by hand! I lost a couple of days because of this.

How to Fix it?

During the development of DITO these components gave me a headache:
  • Browser/HTML
  • JSP Encoding/Post Request
  • Database
  • JDBC Connection
  • Tomcat / Java File Encoding
  • Files and input streams
  • Console
Each of these can be the cause if your Cyrillic, Greek or other special characters turn into “?”, squares or other rubbish. I will go through them one by one:
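
To make the two typical symptoms concrete, here is a minimal sketch using only the Java standard library. The class name and the sample text are made up for illustration. It shows where the “?” comes from (encoding into a charset that cannot represent the character) and where the rubbish characters come from (UTF-8 bytes decoded with the wrong charset).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names and sample text are illustrative only.
class GarbledTextDemo {

    // Encode with a charset that cannot represent the text:
    // each unmappable character is replaced by '?'.
    static String encodeAsLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Encode as UTF-8 but decode as ISO-8859-1:
    // every multi-byte character turns into several wrong characters.
    static String decodeUtf8AsLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Greek alpha, beta, gamma written as escapes so that the
        // encoding of this source file does not matter.
        String greek = "\u03b1\u03b2\u03b3";
        System.out.println(encodeAsLatin1(greek).equals("???")); // true
        System.out.println(decodeUtf8AsLatin1(greek).length());  // 6
    }
}
```

If you see “?”, the characters were already lost when the text was encoded. If you see twice as many strange characters as expected, the bytes are probably still correct and only the decoding step is wrong.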

Browser

Although I personally didn’t have problems with this in Opera, Firefox and IE, it is still recommended to put these lines at the beginning of your JSPs:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
I suppose most good tools such as NetBeans, Dreamweaver and Eclipse will add these lines at the beginning of your code.

JSP Encoding/Post Request

Two lines that will already fix most people’s problems are these:
response.setCharacterEncoding("utf-8");
request.setCharacterEncoding("utf-8");   
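They force Java to send and receive data using the UTF-8 encoding. Call request.setCharacterEncoding before you read any request parameter, otherwise it has no effect. The receive part is most useful when you receive text via a POST request. I recommend POST because the text then travels in the request body, where the user can’t see it, rather than in the URL, where UTF-8 characters can cause you trouble, with old proxy servers for instance.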
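
To illustrate why the charset on the receiving side matters, here is a small sketch that decodes the same form value twice. It uses the standard java.net.URLDecoder instead of a real servlet request, so it only approximates what the container does. The class name and the sample value are assumptions for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; it imitates the decoding step of a form POST.
class FormDecodingDemo {

    // Decode a percent-encoded form value with the given charset name.
    static String decode(String rawFormValue, String charsetName) {
        try {
            return URLDecoder.decode(rawFormValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // What a browser sends for the French word for "summer"
        // when the page is UTF-8 encoded.
        String raw = "%C3%A9t%C3%A9";
        String expected = "\u00e9t\u00e9";

        System.out.println(decode(raw, "UTF-8").equals(expected));      // true
        System.out.println(decode(raw, "ISO-8859-1").equals(expected)); // false
    }
}
```

The bytes on the wire are identical in both cases. Only the charset used for decoding decides whether you get the original text back.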

Database

I first discovered this problem while using Oracle Express. It was my worst nightmare, as Oracle uses a Latin character set by default and requires the database to be recreated when changing to UTF-8. That is too late if you already have some data stored. Unfortunately I haven’t found a command to change the character set.
Luckily most of you will use MySQL, which is much more flexible. Mainly you have to set the individual character columns to UTF-8. The easiest way to do so is with the MySQL GUI Tools, which can be downloaded from the MySQL website.

There you should set the default character set to UTF-8 when creating a table and verify this setting for every char, varchar and text column.

JDBC Connection

Even if both the streams and the database itself are set to UTF-8, the JDBC connection has to be forced to use it as well. This is done by appending two parameters to your JDBC URL:
jdbc:mysql://localhost:3306/mydb?useUnicode=true&characterEncoding=UTF-8
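
As an alternative sketch, the same two settings can be passed as connection properties instead of being appended to the URL. The class name, the user name and the password below are placeholders. The actual connection call is only shown as a comment because it needs a MySQL driver and a running database.

```java
import java.util.Properties;

// Hypothetical helper; credentials are placeholders.
class JdbcUtf8Settings {

    // Base URL from above, without the query string.
    static final String BASE_URL = "jdbc:mysql://localhost:3306/mydb";

    // Collect the login data and the two encoding settings in one place.
    static Properties utf8Properties(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("useUnicode", "true");
        props.setProperty("characterEncoding", "UTF-8");
        return props;
    }

    public static void main(String[] args) {
        Properties props = utf8Properties("dito", "secret");
        // With the MySQL driver on the classpath, the connection would be opened like this:
        // java.sql.Connection con = java.sql.DriverManager.getConnection(BASE_URL, props);
        System.out.println(props.getProperty("characterEncoding")); // UTF-8
    }
}
```

One reason to prefer this form: if the URL lives in an XML configuration file, the “&” between the parameters has to be written as “&amp;amp;”, which is easy to forget.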

Tomcat / Java File Encoding

Although embedding text is one of the purposes of JSP, you have to be very careful with hardcoded text. Most IDEs such as Eclipse and NetBeans require that you set either your project or the files themselves to UTF-8, otherwise the characters get lost. Luckily both of them pop up an error message when you try to save a file that contains unsupported characters.
This works fine for standard Java files.
Unfortunately this does not automatically apply to JSPs! The problem is that your web server first converts the JSPs into Java files. If you are using Tomcat this can give you a big headache, as Tomcat converts your nice UTF-8 files into standard ISO-encoded .java files and you lose the characters. Even worse: I have not yet found a solution for this issue. For now I recommend moving the strings out into a plain text file.
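
One possible way to keep the strings in a text file is a key=value file loaded through java.util.Properties. The sketch below is an assumption about how this could look, not the code used in DITO; the class name and the key are made up. The important detail is to load through a Reader with an explicit charset, because Properties.load(InputStream) assumes ISO-8859-1.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper for strings kept outside of the JSPs.
class Utf8Messages {

    // Read key=value pairs from a UTF-8 encoded stream.
    static Properties load(InputStream in) {
        Properties messages = new Properties();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            messages.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return messages;
    }

    public static void main(String[] args) {
        // Stands in for a file on disk that contains a Greek greeting.
        byte[] fileContent = "greeting=\u03b3\u03b5\u03b9\u03b1\n"
                .getBytes(StandardCharsets.UTF_8);
        Properties messages = load(new ByteArrayInputStream(fileContent));
        System.out.println(messages.getProperty("greeting").length()); // 4
    }
}
```

In a web app the stream would come from a real file or from the classpath instead of a byte array.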

Files and Input Streams

When moving your text out into text files, you will run into new UTF-8 related problems. One could be your editor, as WordPad and Notepad tend to wreck your encoding as soon as you save the file. In this case I can highly recommend Notepad++.
This tool is free, supports most character sets and offers a vast number of other features. But as usual this is not enough. When reading these files you also have to set the encoding on your Java readers. This code shows how to open a file using UTF-8 encoding:
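// Needs java.io.BufferedReader, java.io.FileInputStream and java.io.InputStreamReader.
// Decode the bytes of the file as UTF-8 instead of the platform default charset.
InputStreamReader frIn = new InputStreamReader(
        new FileInputStream(fileName), "UTF-8");
BufferedReader brIn = new BufferedReader(frIn);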
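
The same applies in the other direction when your program writes the file itself. The following sketch, with a made-up class name and a temporary file, writes a line with an explicit UTF-8 writer and reads it back with the matching reader.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that always uses UTF-8 for both directions.
class Utf8FileRoundTrip {

    // Write one line of text, encoded as UTF-8.
    static void writeLine(File file, String line) {
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(file), StandardCharsets.UTF_8)) {
            out.write(line);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the first line of the file, decoded as UTF-8.
    static String readFirstLine(File file) {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new FileInputStream(file), StandardCharsets.UTF_8))) {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("utf8-demo", ".txt");
        String cyrillic = "\u041f\u0440\u0438\u0432\u0435\u0442"; // a Russian greeting
        writeLine(tmp, cyrillic);
        System.out.println(readFirstLine(tmp).equals(cyrillic)); // true
        tmp.delete();
    }
}
```

As long as writer and reader name the same charset, the platform default no longer matters.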

Console

A confusing thing when debugging can be your console. The problem is that the Windows CMD does not display Unicode with its default settings. Furthermore the console within NetBeans also converts special characters into “?”. So instead of using System.out you will have to find another solution. Probably the simplest is to print your output into the HTML document or to write it to a file.
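
If you still want console-style output, one thing to try is a PrintStream with an explicit encoding. The sketch below only guarantees that UTF-8 bytes are written. Whether the console displays them correctly still depends on its code page and font, so treat this as an experiment rather than a fix. The class name is made up.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper that wraps any output stream in a UTF-8 PrintStream.
class Utf8Console {

    // Create an auto-flushing PrintStream that encodes text as UTF-8.
    static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(System.out);
        out.println("\u0416"); // Cyrillic letter Zhe, sent as the bytes D0 96
    }
}
```

The same helper also works with a FileOutputStream, which gives you the log file mentioned above.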

Conclusion

I hope this helps some of you, and maybe some developers will read this and make our lives easier by finally moving over to Unicode.

Sunday 9 November 2008

Development Continues

I have finished my degree, but that does not mean that the project is dead.

History - What is DITO

This project initially started as my Final Year Project at DIT, where I graduated in June 2008 with a Bachelor of Science in Computer Science.

Development stopped with a prototype that demonstrates the main concepts. Since then I have basically shut the project down, first for a nice summer break and then for lack of time after I started to work.
However, I think this project deserves to survive, so I forced myself to continue working on it.
Screenshot of Prototype

Work in Progress

Since the last prototype I have already done a bit of work. This mostly concerned the database, as I moved over from Oracle to MySQL. The reason for that was mostly financial.
Furthermore I slightly changed the database design, as MySQL supports a number of features that I wanted to take advantage of. In contrast to my time at DIT, the main target was, is and will be the website. So I will concentrate on that for the next while and probably continue working on the messenger after the release of DITO. But it is still planned to be released.

What's next?

Next I have to add a number of features to the website to make it competitive with all those other social networks. After that I will have to work on compatibility and some nice JavaScript.
The last step before a first release will be extensive testing and translation work.