Thursday, May 11, 2006

UTF-8 Encoding fix for MySQL (Tomcat, JSP)

In my previous post, I talked about how to get international characters to display properly on your jsp pages.

This post covers how to make sure international characters posted through an HTML form get saved in, and retrieved from, the MySQL database with UTF-8 encoding.

You know the case where you submit 'alımlı' in your form, but when you check the value stored in your database table, it becomes 'al?ml?'!
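
To make the symptom concrete, here is a minimal sketch of how the question marks can appear. It assumes that some layer between the form and the table only handles ISO-8859-1 (Latin-1); the class and method names are made up for illustration and are not part of any library.

```java
import java.nio.charset.StandardCharsets;

class QuestionMarkDemo {
    // Simulates a layer that can only store ISO-8859-1 text:
    // encode the string to Latin-1 bytes, then read those bytes back.
    static String roundTripLatin1(String text) {
        byte[] stored = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(stored, StandardCharsets.ISO_8859_1);
    }

    // The same round trip, but with UTF-8 on both sides.
    static String roundTripUtf8(String text) {
        byte[] stored = text.getBytes(StandardCharsets.UTF_8);
        return new String(stored, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The word from the example above, with the dotless i written as an escape
        String submitted = "al\u0131ml\u0131";

        // The dotless i has no Latin-1 code point, so Java substitutes '?'
        System.out.println(roundTripLatin1(submitted)); // prints al?ml?

        // With UTF-8 on both sides nothing is lost
        System.out.println(roundTripUtf8(submitted).equals(submitted)); // prints true
    }
}
```

Once the '?' has been written, the original character is gone for good, which is why every step below has to agree on UTF-8.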


For a great explanation of what's going on behind the scenes, read the 'CHARSET CONVERSION FROM BROWSER TO DATABASE' section on this page.

The required steps to overcome this problem are as follows:

  • Make sure you do everything explained here.
  • Make sure your database and/or table and/or field is defined with character set UTF-8. Collation plays a role when comparing values: pick the one that fits your target language, or pick the generic one.

  • In {tomcat dir}/conf/server.xml, the connector configuration should have 'URIEncoding=UTF-8'. For example:


    <Connector port="7000" maxHttpHeaderSize="8192"
    maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
    enableLookups="false" redirectPort="8443" acceptCount="100"
    URIEncoding="UTF-8"
    connectionTimeout="20000" disableUploadTimeout="true" />


    This step is required if you will use 'get' as a form submission method. But it doesn't hurt to set it in any case.


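  • Your database connection string should follow the format:
    url="jdbc:mysql://localhost:3306/{database name}?autoReconnect=true&useUnicode=true&characterEncoding=UTF-8"
    (If the URL is written inside an XML file, such as a Tomcat context definition, each '&' must be written as '&amp;'.)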

  • THIS IS THE MOST IMPORTANT BIT OF INFO: Make sure to start your mysql server with the '--default-character-set=utf8' parameter. For example, on my system (MacOSX), I start the server with './safe_mysqld --default-character-set=utf8'.
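
To see why the URIEncoding step matters for 'get' submissions, here is a small sketch that decodes the same query-string value with two different charsets. It assumes the container falls back to ISO-8859-1 when URIEncoding is not set, as Tomcat 5.5 does; the class name and the sample value are only for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class UriDecodingDemo {
    // Decodes a percent-encoded query value with the given charset name.
    static String decode(String rawQueryValue, String charsetName) {
        try {
            return URLDecoder.decode(rawQueryValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // What a browser sends for the sample word when the page is UTF-8:
        // the dotless i becomes the two bytes C4 B1.
        String raw = "al%C4%B1ml%C4%B1";

        String asUtf8 = decode(raw, "UTF-8");
        String asLatin1 = decode(raw, "ISO-8859-1");

        // Decoded as UTF-8, the six original characters come back
        System.out.println(asUtf8.equals("al\u0131ml\u0131")); // prints true

        // Decoded as Latin-1, each byte turns into its own character,
        // so the value grows to eight characters of garbage
        System.out.println(asLatin1.length()); // prints 8
    }
}
```

The second result is the kind of value that reaches your application, and then the database, when the connector decodes the query string with the wrong charset.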



And that's it! If you are still having problems, send me an email and I will try to assist you further.

Friday, May 05, 2006

UTF-8 Encoding fix (Tomcat, JSP, etc)

I spent a whole day trying to get non-ASCII characters to display properly in my JSP pages.
To save anyone from spending similarly frustrating hours, here's how to get those characters to display in your JSP page.

First please read this so that you understand what the concept of encoding is.

While trying to solve my problem I collected a couple of links; you can browse them here.

So my setup is as follows:
  • Tomcat Application Server (5.5.17)
  • Stripes web framework.
  • Front-end implementation JSP (using Stripes' layout functionality).
  • OS: MacOSX
Two main problems:
  1. Getting non-ASCII characters (e.g. ç, ğ, ö, ş, ı) to display in the JSP file when they are typed directly inside the JSP.
  2. Getting these characters to display when they are read from an application resources file (for example, StripesResources.properties for Stripes).

Ok let's begin...
First make sure you save all your files (jsps, application resources files) in UTF-8 encoding. In Dreamweaver for example, Ctrl-J (or Apple-J) will bring up the window to set that.
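
If you are not sure whether a file really was saved as UTF-8, a quick check along these lines may help. This is only a sketch: the class name is made up, and it simply tries to decode the bytes strictly as UTF-8 and reports whether that fails.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class Utf8Check {
    // Returns true if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the files to check as arguments, e.g. your JSPs and properties files
        for (String name : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(name));
            System.out.println(name + ": "
                    + (isValidUtf8(bytes) ? "valid UTF-8" : "NOT valid UTF-8"));
        }
    }
}
```

Keep in mind that a plain ASCII file also passes this check, so it catches files saved in a Latin-style encoding but it cannot prove that your editor is set up correctly.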

Solution to problem 1:

I may have overkilled here, but this setup works, so you may adopt the IIWDQ ('if it works don't question') approach.

  • Place '<%@ page language="java" pageEncoding="utf-8" contentType="text/html;charset=utf-8"%>' as the first line in 'ALL' the jsps.

    If you are using a layout manager similar to Stripes layout, you may think, 'Hey, I'll just put it in the layout page, and that way it will work for all my pages.' THINK AGAIN. IT WON'T.

    You may also say, 'Hey wait, I have a great idea: I have this include.jsp where I declare all the taglibs, so I'll place this directive in that file. All my JSPs include that file, so it will work.' To that I'll say NOPE.


  • Place <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> under <head>. This is to give browsers an idea about the content of the page so they can display the contents properly.
  • Write an Encoding filter and make sure all your requests pass through it. Not difficult at all. Here it is:




import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class EncodingFilter implements Filter {
    // Encoding applied to incoming requests; read from the "encoding" init-param.
    private String encoding;
    private FilterConfig filterConfig;

    /**
     * @see javax.servlet.Filter#init(javax.servlet.FilterConfig)
     */
    public void init(FilterConfig fc) throws ServletException {
        this.filterConfig = fc;
        this.encoding = filterConfig.getInitParameter("encoding");
    }

    /**
     * @see javax.servlet.Filter#doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse, javax.servlet.FilterChain)
     */
    public void doFilter(ServletRequest req, ServletResponse resp,
            FilterChain chain) throws IOException, ServletException {
        // This has to happen before any request parameter is read.
        // Skip the call when the init-param is missing, so null is never passed.
        if (encoding != null) {
            req.setCharacterEncoding(encoding);
        }
        chain.doFilter(req, resp);
    }

    /**
     * @see javax.servlet.Filter#destroy()
     */
    public void destroy() {
    }

}


The way you let your web application know about this filter is via the web.xml file:

<filter>
    <filter-name>EncodingFilter</filter-name>
    <filter-class>com.yourpackagestructurehere.EncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
</filter>

<filter-mapping>
    <filter-name>EncodingFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>




At this stage, if you type something along the lines of 'çanak çömlek patladı' in your jsp and run the web application, you should see it in your browser...

Are we done? Not yet. If you have something like <fmt:message key="username"/> in your JSP and your resource properties file contains username=Kullanıcı Adı, you will end up with something like 'Kullan?c? Ad?'. For that, see:

Solution to problem 2:
I know you saved your ApplicationResources.properties (or StripesResources.properties, or xxx.properties) file in UTF-8. That should display fine, right? Well, wrong. It does not. Java loads .properties files as ISO-8859-1, so anything outside that range has to be written as a \uXXXX escape, which is what native2ascii does. So it will display fine if you:
  1. Copy your ApplicationResources.properties file to something like ApplicationResources.properties.org.
  2. Run 'native2ascii -encoding UTF-8 ApplicationResources.properties.org ApplicationResources.properties'.
  3. Deploy your files...
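
The effect of the steps above can be reproduced in a few lines of Java. This is a rough sketch, not a replacement for the tool: escapeNonAscii is a made-up helper that only approximates what native2ascii does, and the property line is the username example from earlier.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class PropertiesEscapeDemo {
    // Replaces every non-ASCII character with a backslash-u escape
    // (four hex digits), leaving ASCII characters untouched.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 127) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Loads the bytes the way a properties file is read from disk:
    // Properties.load(InputStream) decodes the stream as ISO-8859-1.
    static String loadValue(byte[] fileBytes, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String expected = "Kullan\u0131c\u0131 Ad\u0131";
        String line = "username=" + expected;

        // The file as saved in UTF-8, before step 2
        byte[] rawUtf8 = line.getBytes(StandardCharsets.UTF_8);
        // The file after the escape step: plain ASCII only
        byte[] escaped = escapeNonAscii(line).getBytes(StandardCharsets.US_ASCII);

        System.out.println(loadValue(rawUtf8, "username").equals(expected)); // prints false
        System.out.println(loadValue(escaped, "username").equals(expected)); // prints true
    }
}
```

The escaped file is pure ASCII, so it survives the ISO-8859-1 read, and the loader turns the escapes back into the real characters.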


And ta-da! (At least for me it was 'ta-da' at this stage)...

Special thanks to cleverpig, mj and Rick Smith from Stripes mailing list for their help on this subject.

Ha by the way, if you are not using Stripes yet, it's time you start using it.