Thursday, May 11, 2006

UTF-8 Encoding fix for MySQL (Tomcat, JSP)

In my previous post, I talked about how to get international characters to display properly on your JSP pages.

This post covers how to make sure international characters posted through an HTML form are saved in and retrieved from the MySQL database with UTF-8 encoding.

You know the case where you submit 'alımlı' in your form, but when you check the value stored in your database table, it becomes 'al?ml?'!
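To see where the question marks come from, here is a minimal Java sketch that reproduces the symptom. It assumes the lossy step is a conversion to ISO-8859-1 (Latin-1), a charset that has no byte for the Turkish dotless 'ı'. Which layer of your own stack performs that conversion is not something this sketch can tell you, and the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {
    // Encode the text to bytes in the given charset, then decode the bytes
    // back with the same charset. Characters the charset cannot represent
    // are replaced by '?' during encoding and cannot be recovered.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // 'alımlı' written with Unicode escapes so the source file encoding
        // does not matter.
        String original = "al\u0131ml\u0131";

        // Latin-1 has no dotless i, so both of them are lost.
        System.out.println(roundTrip(original, StandardCharsets.ISO_8859_1)); // al?ml?

        // UTF-8 can represent every character, so nothing is lost.
        System.out.println(roundTrip(original, StandardCharsets.UTF_8).equals(original)); // true
    }
}
```

The point of the sketch is that the damage is permanent once it happens: after the character has been turned into '?', no later step can bring it back, which is why every hop between the browser and the database has to agree on UTF-8.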

For a great explanation of what's going on behind the scenes, read the 'CHARSET CONVERSION FROM BROWSER TO DATABASE' section on this page.

The required steps to overcome this problem are as follows:

  • Make sure you do everything explained here.

  • Make sure your database, table, or column is defined with the UTF-8 character set (utf8 in MySQL). Collation plays a role when comparing and sorting values; pick the one that fits your target language, or pick a generic one such as utf8_general_ci.

  • In {tomcat dir}/conf/server.xml, the connector configuration should have URIEncoding="UTF-8". For example:

    <Connector port="7000" maxHttpHeaderSize="8192"
    maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
    enableLookups="false" redirectPort="8443" acceptCount="100"
    connectionTimeout="20000" disableUploadTimeout="true"
    URIEncoding="UTF-8" />

    This step is required if you use GET as the form submission method, because the form values then travel in the URL. It doesn't hurt to set it in any case.

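  • Your database connection string should follow this format (note that the Connector/J property is useUnicode, not useEncoding):
    url="jdbc:mysql://localhost:3306/{database name}?autoReconnect=true&useUnicode=true&characterEncoding=UTF-8"
    If the URL lives in an XML file, such as a Tomcat context definition, write each & as &amp;.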

  • THIS IS THE MOST IMPORTANT BIT OF INFO: Make sure to start your MySQL server with the '--default-character-set=utf8' parameter. For example, on my system (Mac OS X), I start the server with './safe_mysqld --default-character-set=utf8'.
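
Once the steps above are in place, a quick way to check the connection leg is to send a non-ASCII string to the server and read it back. The following is a rough sketch rather than a definitive test: it assumes MySQL Connector/J is on the classpath and a server is listening on localhost:3306, and the class name, the helper methods, and the default database name and credentials are all placeholders. It uses useUnicode=true, which is the Connector/J property that goes together with characterEncoding.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class Utf8RoundTrip {
    // Build a JDBC URL in the format described above (hypothetical helper).
    static String buildUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?autoReconnect=true&useUnicode=true&characterEncoding=UTF-8";
    }

    // Send the value to the server as a statement parameter and read it back.
    // This exercises only the client/server connection encoding, not how a
    // particular table column stores the value.
    static boolean roundTrips(Connection conn, String value) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement("SELECT ?")) {
            ps.setString(1, value);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() && value.equals(rs.getString(1));
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        // Placeholder defaults; pass database, user and password as arguments.
        String database = args.length > 0 ? args[0] : "test";
        String user = args.length > 1 ? args[1] : "root";
        String password = args.length > 2 ? args[2] : "";

        String url = buildUrl("localhost", 3306, database);
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            String sample = "al\u0131ml\u0131"; // 'alımlı'
            System.out.println(roundTrips(conn, sample)
                    ? "round trip OK"
                    : "characters were lost on the way");
        }
    }
}
```

If this check passes but values read from your tables still come back with question marks, the character set of the table or column is the next place to look.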

And that's it! If you are still having problems, send me an email and I will try to assist you further.