Database character sets can be quite confusing at times. For example, if your Rails app serves up pages in UTF-8 (as it should) along with the MySQL database, but you haven’t specified in the database.yml that the database connection should be UTF-8, you’ll end up sending latin1 (the default) characters, which are stored in the database as UTF-8. Very confusing. The end effect of this, is that when you look at the raw data in the database from another app, say the console or phpmyadmin, the data will be shown incorrectly with funny sets of characters representing the originals. This also makes backups and modifications really troublesome.
In the remainder of this entry, I’ll describe a few simple steps to harmonize your Rails UTF-8 application with its database.
1. Firstly, we need to stop the web server so that no more requests and data are coming through. You might also want to make a backup at this point, just in case.
2. Modify the
/config/database.yml file to set the correct encoding, something like this:
3. Dump the current database, ensuring that the default character set is latin1:
mysqldump -u user -p --default-character-set=latin1 database > database_dump.sql
4. Open the dump in your favourite editor, preferably something console based like vi as the dump is likely to be big, and edit to following line:
/*!40101 SET NAMES latin1 */;
Which should be somewhere near the top, to:
/*!40101 SET NAMES utf8 */;
Worth mentioning here, is that you should ensure that your editor is running in UTF8 mode, with vi you can force this with the following vi command:
5. We’re now ready to restore the data back to where it came from:
mysql -u user -p --default-character-set=utf8 database < database_dump.sql
Thats it! Your rails application can be started again and you can rest assured that your future backups will be protected from strange characters.