I am talking about UTF-8/Unicode in Java web apps and JSPs. In my opinion this issue should have been resolved by default 10 years ago, when Java was still new, so that we wouldn’t have to deal with this unnecessary problem.
Why Unnecessary?
Because hardly any component involved in development uses UTF-8 or another Unicode encoding by default, so each one has to be set up by hand! I lost a couple of days because of this.
How to Fix It?
During the development of DITO these components gave me a headache:
- Browser/HTML
- JSP Encoding/Post Request
- Database
- JDBC Connection
- Tomcat / Java File Encoding
- Files and input streams
- Console
Browser
Although I personally didn’t have problems with this using Opera, FF and IE, it is still recommended to put these lines at the beginning of your JSPs:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
I suppose most good tools such as NetBeans, Dreamweaver and Eclipse will add these lines at the beginning of your code.
JSP Encoding/Post Request
Two lines that will already fix most people’s problems are these:
response.setCharacterEncoding("utf-8");
request.setCharacterEncoding("utf-8");
They force Java to send and receive information using the UTF-8 encoding. Note that request.setCharacterEncoding only has an effect if it is called before the first request parameter is read. The receiving part is most useful when you receive text via a POST request. I recommend using POST because the user can’t see the data in the URL, and UTF-8 encoded characters in a URL can cause trouble, with old proxy servers for instance.
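To illustrate why the receiving side matters, here is a minimal sketch using only the standard library, not the servlet API. It assumes the browser sent the form text as UTF-8 bytes and shows what comes out when those bytes are decoded as ISO-8859-1 instead. The class name EncodingMismatchDemo and the helper decodeBody are made up for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Hypothetical helper: decodes raw request bytes with the given charset,
    // roughly what a server does when it turns a POST body into text.
    public static String decodeBody(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // The German word for "greetings", written with Unicode escapes
        // so that this source file itself stays plain ASCII.
        String typed = "Gr\u00fc\u00dfe";

        // Assumption: the browser submitted the text as UTF-8 bytes.
        byte[] body = typed.getBytes(StandardCharsets.UTF_8);

        String right = decodeBody(body, StandardCharsets.UTF_8);
        String wrong = decodeBody(body, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(typed)); // the text survives
        System.out.println(wrong.equals(typed)); // the text is garbled
        System.out.println(wrong.length());      // one character per byte
    }
}
```

Each of the two special characters takes two bytes in UTF-8, so the wrongly decoded string ends up two characters longer than what the user typed. That is the typical garbage you see when the request encoding is not set.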
Database
I first discovered this problem when I was using Oracle Express. It was my worst nightmare, as Oracle by default uses a Latin charset and requires the database to be recreated when changing to UTF-8. That is too late if you already have some data stored. Unfortunately I haven’t found a command to change the charset.
Luckily most of you will use MySQL, which is much more flexible. Mainly you have to set the individual character columns to UTF-8. The easiest way to do so is with the MySQL GUI Tools, which can be downloaded from the MySQL website.
There you should set the default character set to UTF-8 when creating a table and verify this setting for every char/varchar and text column.
JDBC Connection
Even if both the streams and the database itself are set to UTF-8, the JDBC connection has to be forced to UTF-8 as well. This is done by adding a short suffix to your JDBC URL:
jdbc:mysql://localhost:3306/mydb?useUnicode=true&characterEncoding=UTF-8
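As a rough sketch of how the URL above could be put together in code: the class JdbcUtf8Url and its method withUtf8 are hypothetical helpers invented for this example. The host, port and database name are the ones from the URL above. No connection is opened here, so no driver is needed to run it.

```java
public class JdbcUtf8Url {

    // Hypothetical helper: appends the two encoding parameters from the
    // text to a MySQL JDBC URL, using "?" or "&" depending on whether
    // the URL already carries parameters.
    public static String withUtf8(String baseUrl) {
        String separator = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + separator + "useUnicode=true&characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        String url = withUtf8("jdbc:mysql://localhost:3306/mydb");
        System.out.println(url);
        // With a MySQL driver on the classpath, this URL would then be
        // passed to java.sql.DriverManager.getConnection(url, user, password).
    }
}
```

If the URL lives in an XML file instead of Java code, the & has to be written as &amp; there.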
Tomcat / Java File Encoding
Although embedding text in your pages is one of the purposes of JSP, you have to be very careful with hardcoded text. Most IDEs such as Eclipse and NetBeans require you to set either your project or the individual files to UTF-8 encoding, otherwise the characters get lost. Luckily both of these will pop up error messages when you try to save files that contain unsupported characters.
This works fine for standard Java files. Unfortunately it does not automatically apply to JSPs! The problem is that your web server first converts the JSPs into Java files. If you are using Tomcat this can give you a big headache, as Tomcat converts your nice UTF-8 files into standard ISO-encoded .java files and you lose the characters. Even worse: I have not yet found a solution for this issue. In that case I recommend moving the strings out into a standard text file.
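Files and input streams
When moving your text out into text files, you will run into new UTF-8 related problems. One could be your editor, as WordPad and Notepad tend to mess up your encoding as soon as you save your file. In this case I can highly recommend Notepad++.
When reading the file in Java, pass the encoding explicitly to the InputStreamReader, otherwise the platform default encoding is used:
import java.io.*;

InputStreamReader frIn;
BufferedReader brIn;
// Decode the file's bytes as UTF-8 instead of the platform default
frIn = new InputStreamReader(
        new FileInputStream(fileName), "UTF-8");
// Buffer the reader so the text can be read line by line
brIn = new BufferedReader(frIn);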
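The reading code above is only a fragment, so here is a minimal self-contained sketch of the same idea. It writes a line to a temporary file as UTF-8 and reads it back with an explicit UTF-8 reader. The class name Utf8FileRoundTrip and its two helper methods are made up for this example, and it assumes Java 7 or newer for try-with-resources and StandardCharsets.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

public class Utf8FileRoundTrip {

    // Writes one line of text to the file, encoded as UTF-8.
    public static void writeLine(File file, String line) throws IOException {
        try (BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8))) {
            out.write(line);
            out.newLine();
        }
    }

    // Reads the first line of the file, decoding its bytes as UTF-8.
    public static String readFirstLine(File file) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("utf8-demo", ".txt");
        file.deleteOnExit();

        // Unicode escapes keep this source file plain ASCII.
        String text = "Gr\u00fc\u00dfe";
        writeLine(file, text);

        // Compare instead of printing the text, since the console
        // encoding may differ from UTF-8.
        System.out.println(text.equals(readFirstLine(file)));
    }
}
```

The try-with-resources blocks close the streams even if reading or writing fails, which the short fragment above leaves to the caller.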