Friday, May 05, 2006

UTF-8 Encoding fix (Tomcat, JSP, etc)

I spent a whole day trying to get non-ASCII characters to display properly in my JSP pages.
To save anyone else from similarly frustrating hours, here is the setup that finally displayed those characters correctly.

First please read this so that you understand what the concept of encoding is.

While trying to solve my problem I collected a couple of links; you can browse them here.

So my setup is as follows:
  • Tomcat Application Server (5.5.17)
  • Stripes web framework.
  • Front-end implementation JSP (using Stripes' layout functionality).
  • OS: MacOSX
Two main problems:
  1. Displaying non-ASCII characters (e.g. ç, ğ, ö, ş, ı) in a JSP when they are typed directly into the JSP file.
  2. Displaying these characters when they are read from an application resources file (for example StripesResources.properties for Stripes).

Ok let's begin...
First make sure you save all your files (jsps, application resources files) in UTF-8 encoding. In Dreamweaver for example, Ctrl-J (or Apple-J) will bring up the window to set that.
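
To see why the file encoding matters, here is a minimal sketch of what happens when text saved as UTF-8 is read back as ISO-8859-1. The class and method names are made up for illustration, and it assumes Java 7 or later for StandardCharsets, which is newer than what a Tomcat 5.5 setup would typically run on.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original setup.
public class EncodingMismatchDemo {

    /** Encodes text as UTF-8 bytes, then decodes those bytes as ISO-8859-1 (the mismatch). */
    public static String misread(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    /** Encodes and decodes with the same charset, so the text survives intact. */
    public static String roundTrip(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "çanak", written with an escape so this source file's own encoding does not matter.
        String original = "\u00e7anak";

        // The two UTF-8 bytes of "ç" are turned into two separate characters.
        System.out.println(misread(original).length());           // 6 instead of 5
        System.out.println(roundTrip(original).equals(original)); // true
    }
}
```

Every step below is about making sure that each place where bytes become characters (the JSP compiler, the request parser, the browser, the properties loader) agrees on the charset.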

Solution to problem 1:

I may have gone overboard here, but this setup works, so you may adopt the IIWDQ ('if it works don't question') approach.

  • Place '<%@ page language="java" pageEncoding="utf-8" contentType="text/html;charset=utf-8"%>' as the first line in 'ALL' the jsps.

    If you are using a layout manager, similar to Stripes layout, you may think 'hey I'll just put it in the layout page that way it will work for all my pages'..THINK AGAIN. IT WON'T.

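    You may also say 'hey wait I have a great idea, I have this include.jsp where I declare all the taglibs, I'll place this directive in that file...All my jsps include that file, so it will work'. To that I'll say NOPE. The pageEncoding attribute only tells the container how to read the one file that physically contains the directive, so every JSP file needs its own.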


  • Place <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> under <head>. This tells browsers which encoding the page uses so they can display its contents properly.
  • Write an Encoding filter and make sure all your requests pass through it. Not difficult at all. Here it is:




import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class EncodingFilter implements Filter {
    private String encoding;
    private FilterConfig filterConfig;

    /**
     * Reads the "encoding" init-param configured in web.xml.
     * @see javax.servlet.Filter#init(javax.servlet.FilterConfig)
     */
    public void init(FilterConfig fc) throws ServletException {
        this.filterConfig = fc;
        this.encoding = filterConfig.getInitParameter("encoding");
        if (this.encoding == null) {
            // Fall back to UTF-8 if the init-param is missing from web.xml.
            this.encoding = "UTF-8";
        }
    }

    /**
     * Sets the request encoding before any parameter is read, then passes the request on.
     * @see javax.servlet.Filter#doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse, javax.servlet.FilterChain)
     */
    public void doFilter(ServletRequest req, ServletResponse resp,
            FilterChain chain) throws IOException, ServletException {
        req.setCharacterEncoding(encoding);
        chain.doFilter(req, resp);
    }

    /**
     * @see javax.servlet.Filter#destroy()
     */
    public void destroy() {
    }

}


The way you let your web application know about this filter is via the web.xml file:

<filter>
    <filter-name>EncodingFilter</filter-name>
    <filter-class>com.yourpackagestructurehere.EncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
</filter>

<filter-mapping>
    <filter-name>EncodingFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
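
Why the request encoding matters can be sketched outside the container. The following is only an illustration, not what Tomcat does internally: it uses URLEncoder and URLDecoder to imitate a browser submitting a form field from a UTF-8 page and a server decoding it, first with ISO-8859-1 (the default the servlet specification prescribes when no request encoding is set) and then with UTF-8. The class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class; it imitates form encoding and decoding without a servlet container.
public class FormDecodingDemo {

    /** Percent-encodes a value roughly the way a browser submits a field from a UTF-8 page. */
    public static String browserEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Decodes a submitted value using the given request character encoding. */
    public static String serverDecode(String submitted, String requestEncoding) {
        try {
            return URLDecoder.decode(submitted, requestEncoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String submitted = browserEncode("\u015f"); // "ş"
        System.out.println(submitted);              // %C5%9F

        // Without the filter: the two bytes are decoded as two ISO-8859-1 characters.
        System.out.println(serverDecode(submitted, "ISO-8859-1").length()); // 2

        // With the filter setting UTF-8: the original character comes back.
        System.out.println(serverDecode(submitted, "UTF-8").equals("\u015f")); // true
    }
}
```

This is also why the filter has to run before anything reads a request parameter: once the parameters have been decoded with the wrong charset, setting the encoding afterwards changes nothing.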




At this stage, if you type something along the lines of 'çanak çömlek patladı' in your jsp and run the web application, you should see it in your browser...

Are we done? Not yet. If you have something like <fmt:message key="username"/> in your JSP and your resource properties file contains username=Kullanıcı Adı, you will end up with something like 'Kullan?c? Ad?'. For that, see:

Solution to problem 2:
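I know you saved your ApplicationResources.properties (or StripesResources.properties, or xxx.properties) file in UTF-8. That should display fine, right? Wrong. It does not, because Java's Properties loader reads .properties files as ISO-8859-1, so characters outside that range have to be written as \uXXXX escapes. It will work if you: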
  1. Copy your ApplicationResources.properties file to something like ApplicationResources.properties.org.
  2. Run 'native2ascii -encoding UTF-8 ApplicationResources.properties.org ApplicationResources.properties'
  3. Deploy your files...
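
What the conversion step buys you can be sketched in a few lines of Java. This is a rough approximation of the escaping that native2ascii performs, not the tool's actual implementation, and the class and method names are invented for the example. It loads the same 'username' entry twice through java.util.Properties: once from raw UTF-8 bytes and once after escaping.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class approximating the native2ascii escaping step.
public class PropertiesEscapeDemo {

    /** Rewrites every character outside 7-bit ASCII as a backslash-u escape with four hex digits. */
    public static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 0x7F) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Loads raw file bytes through Properties.load(InputStream), which decodes them as ISO-8859-1. */
    public static String loadValue(byte[] fileBytes, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String expected = "Kullan\u0131c\u0131 Ad\u0131";
        String line = "username=" + expected;

        // Raw UTF-8 bytes: each multi-byte character is misread as two ISO-8859-1 characters.
        String raw = loadValue(line.getBytes(StandardCharsets.UTF_8), "username");
        System.out.println(raw.equals(expected)); // false

        // Escaped first: the file is pure ASCII and the loader restores the characters.
        String escaped = loadValue(
                escapeNonAscii(line).getBytes(StandardCharsets.US_ASCII), "username");
        System.out.println(escaped.equals(expected)); // true
    }
}
```

Keeping the UTF-8 original (the .org copy) as the file you edit and regenerating the escaped file from it means you never have to read or type the escapes by hand.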


And ta-da! (At least for me it was 'ta-da' at this stage)...

Special thanks to cleverpig, mj and Rick Smith from the Stripes mailing list for their help on this subject.

Ha by the way, if you are not using Stripes yet, it's time you start using it.

27 comments:

  1. Ha! it works with struts as well.
    it's funny how i had it done before, which worked only with IE, not firefox. Now it works for both.

    thx

  2. Hi, thanks so much for this write-up. I was really having problems making this encoding stuff work in my web application. After I saw your blog and checked my application against your suggestions, I started to really get an idea of what was going on and what I hadn't done right. Now things are working quite nicely.

    Thanks again.

    BR,
    Abu.

  3. Oh my goodness, thanks for your article... it solved my problem with inserting and displaying in Postgres! I spent the entire day on it until I stumbled upon your site!

  4. Nice work. One thing you left open is GET requests... they are not decoded correctly by default. I wrote an article about that too.

  5. Interesting. All this works for me, but only if I put properties files inside a jar file in WEB-INF/lib/; if I drop .properties files in /WEB-INF/classes dir it doesn't get utf encoding right. Any ideas? (I didn't code the web.xml filter; might that solve it?)

  6. Would it work with file uploads? Moreover, is it a viable solution for different browsers?

  7. thank you you saved me. i've been going crazy over this

  8. thanks man , u help a lot of people

  9. Very good job!

    Thank you.

  10. Thanks for the advice. This is quite different from the <valve/> config that Tomcat shows.

    Unfortunately, anything that calls request.setCharacterEncoding() appears to create two values for each parameter name, but only when a POST request is made, not a GET request.

    Roger

  11. Hi, I have a problem with UTF-8 when I use a text input on my web page and send the data with the POST method: I receive the special characters wrong. If I send é I receive é...
    Any ideas?

  12. Thanks! I have seen these solutions before, but yours is the best-written guide!

  13. I LOVE YOU MAN! This is an elegant, yet simple solution for a problem that has been bugging me for days. Thank you very much!

  14. Thank you, Cagan!
    Very useful article.

  15. Thank you for the solution. It really helped me!

  16. Thanks for the solutions
    It worked!!!

  17. Great!!!....I was desperately looking for a solution for this problem. Now it works fine.....

    Thanks a lot.....

  18. Thanks man! The solution can be found on google in several places in the discussion groups etc, but it contains the answer to just part of the question. You covered it all, thank you sooooooo much!!:)

  19. Good solution Thanks a lot.it helped me lot.

  20. Awesome, man, thanks a lot.

  21. Thanks man. I just fixed my problem with rendering Cyrillic symbols for tomcat under mac os x. Just applied encoding directive at the very beginning of the jsp page. This helped a lot!

    - Alex Yakima

  22. Yeah man!
    All works fine yet!

    Tks, tks, tks

  23. Thanks for this article! I spent a whole day changing the encoding here and there and nothing really worked.
    Your solution works perfectly!
    Maciek

  24. If you want to use this with jQuery.ajax() then you have to set the 'type' to 'post'.

  25. Hey ...

    Can you elaborate on why the below does not work:

    "You may also say 'hey wait I have a great idea, I have this include.jsp where I declare all the taglibs, I'll place this directive in that file...All my jsps include that file, so it will work'. To that I'll say NOPE."

    If we are confident that the first line in every JSP sets the encoding, why doesn't the above work?

    Regards
    Anubhav
