Why?s my ?text? look like this??: MySQL and Character Encodings

Have you ever seen funny characters showing on-screen?

Users copying from Word and pasting their curly quotes, em-dashes, and accented characters into your text fields?

There are a number of settings that need to be in place in order to guarantee that extended character set characters that get input through the browser safely make it through the various layers of Java and back again.

MySQL has settings that allow you to choose the following:
1. Default character set for the server
2. Character set for the table
3. Character set for particular columns (as of MySQL 4.1)

This screenshot shows the Edit Table dialog in MySQL Query Browser and the table-level setting:

In order to support extended character sets, you need to have all of these set to UTF-8. On one project, we had a situation where the character set for the table was set at UTF-8 as was the column. However, the server was installed with latin-1 (also known as ISO-8859_1) as the character set.

If you’re logged-in to MySQL using the mysql command-line, you can see the server setting by issuing the status command:


...
Server characterset: latin1
Db characterset: latin1
Client characterset: latin1
Conn. characterset: latin1
UNIX socket: /var/lib/mysql/mysql.sock
...

Turns out that when you use JDBC with MySQL, the driver determines how to convert characters from Java Unicode form to the character encoding based on the server’s setting (not sure which of the 4 characterset settings applies).

However, you can override this on the JDBC URL string, e.g.:

jdbc:mysql://localhost/some_db?otherparameters=whatever&characterEncoding=UTF-8

We did this on the project in question and it worked. Our curly quotes, em-dashes, etc. were saved properly into the database and retrieved properly as well.

Note, that through the MySQL Query Browser you may still see strange characters, probably because the MySQL Query Browser itself does not support the character set conversion. But through the Web browser it works fine.

For further reading:
http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-charsets.html

One Response to “Why?s my ?text? look like this??: MySQL and Character Encodings”

  1. JLG says:

    I should add that somewhere the default is specified for creating new tables. In the particular project we had many tables (e.g. app_user, item) that are already UTF-8, but like a half dozen (e.g. private_msg) that are latin1.

    Why the mix? I’m guessing that it has to do with how the tables were created. Maybe when we create a table with mysql admin it defaults to latin1 (don’t know where this is set), so all the tables we made in this manner are wrong (if right means UTF-8).

    I’m in the process now of converting all those tables to utf-8, which means a script to change all the tables, as well as all the text columns in those tables (since if you just change the table setting as per cmc’s screenshot, all the text columns remain latin1).

Leave a Reply