HowTo – WordPress Curing Garbage Characters In Your Page Output

Every once in a while you will run into characters in your content that do not render correctly. You may see a little box or a diamond with a question mark, and the character set mismatch behind it can lead to other problems in managing your site.

When WordPress moved from the 1.x versions to 2.x, it changed how data is stored in the database. The character set went from latin1 to utf8. This was meant to cure a lot of problems that came from WordPress being used around the world, and also from rogue (heh) plugin developers. Basically, there needed to be a standard for how you save data to the database.

You may find that there are still installs that use a different character set. This may be due to language and font differences. It may be because whoever set it up didn’t have a clue and thought it would be cool to use special characters not available in the utf8 character set with the utf8_general_ci collation. Who knows.
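If you are not sure which character set an install is configured for, one place to look is the DB_CHARSET line in wp-config.php, on installs that have one. Here is a minimal Python sketch that pulls that value out of the file text. The function name read_db_charset and the sample config text are made up for illustration, and the value only tells you what WordPress asks for, not what the tables actually use.

```python
import re

def read_db_charset(wp_config_text):
    """Return the DB_CHARSET value found in wp-config.php text, or None."""
    # Matches lines such as: define('DB_CHARSET', 'utf8');
    pattern = r"define\(\s*['\"]DB_CHARSET['\"]\s*,\s*['\"]([^'\"]*)['\"]\s*\)"
    match = re.search(pattern, wp_config_text)
    return match.group(1) if match else None

# Hypothetical sample; in practice you would read your own wp-config.php.
sample_config = """
define('DB_NAME', 'wordpress');
define('DB_CHARSET', 'utf8');
define('DB_COLLATE', '');
"""

print(read_db_charset(sample_config))
```

If this returns None, the install may predate the setting, which is a hint that the tables could still be latin1.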

You might also find that an import plugin such as an RSS reader or CSV import plugin, or even a MySQL database that was improperly dumped and restored, will generate garbage characters.
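To see how this kind of garbage comes about, here is a small Python sketch. It assumes the most common case: text that was saved as UTF-8 bytes and then read back as if it were cp1252 (which is roughly what MySQL calls latin1). The sample string is made up, and your own corruption may have a different cause.

```python
# Text as the author typed it.
original = "café naïve"

# Stored as UTF-8 bytes...
utf8_bytes = original.encode("utf-8")

# ...but read back as if the bytes were cp1252. Each accented letter
# turns into two junk characters.
garbled = utf8_bytes.decode("cp1252")
print(garbled)

# When nothing was lost along the way, reversing the two steps
# recovers the original text.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)
```

The round trip only works when the wrong decoding was applied once and no bytes were dropped. If characters were already replaced with question marks, they are gone.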

Unfortunately it is not a pretty thing to deal with.

To cure the problem you need to understand a little bit about MySQL, and you will need to follow instructions to change the database with the understanding that you could end up screwing up a lot of stuff.

Making the Fix

You must first make a backup of your site.

Back up your database through MySQL using your favorite tool. I would suggest that you make a couple of backups just in case.
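If your favorite tool happens to be mysqldump, the backup could look something like the Python sketch below. It only builds the command so you can read it before running it. The function name build_dump_command, the database name, the user and the file name are all placeholders, and which character set to pass depends on your situation, so treat this as a rough sketch and not as the Codex procedure.

```python
import shlex

def build_dump_command(database, user, outfile, charset):
    """Build a mysqldump command as an argument list (nothing is executed)."""
    return [
        "mysqldump",
        f"--user={user}",
        "--password",                           # prompt for the password
        f"--default-character-set={charset}",   # charset used for the dump
        f"--result-file={outfile}",             # write the dump to this file
        database,
    ]

# Placeholder values; substitute your own.
command = build_dump_command("wordpress", "wp_user", "backup.sql", "utf8")
print(shlex.join(command))
```

Once you are happy with the printed command, you can run it in a shell or hand the list to subprocess.run.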

Second, export your WordPress site using the WordPress Export option in your Dashboard. This file is different than a straight MySQL dump, but it is good to have.

Third, you might as well back up all your files for good measure. First dump all your cached pages in SuperCache or TotalCrapCache or whatever tool you are using, then do a cPanel backup or FTP all your stuff to your local computer, zip it, put it on a USB thumb drive and mail it to yourself in the future… heh

If you have decided to skip the backup then you are very stupid. This is probably one of the most important backups you will ever make, so don’t complain when you screw it up.

Changing Your MySQL Character Set

Ok, so this next step is fast… fast like playing Russian roulette.

Either you pull the trigger and everything is happy, or you pull the trigger and you are still alive, lying on the ground holding your head and crying…

YOU MUST READ THIS PAGE IN ITS ENTIRETY

http://codex.wordpress.org/Converting_Database_Character_Sets

In addition to reading that page, you must understand that page.

You MUST understand the commands that you will be giving to MySQL and exactly what will happen…

Using phpMyAdmin

Ok, so, now that you fully understand what you will be doing, you can get to doing it.

Personally I do not use the web interface or the command line to work on my MySQL databases. I like to remote into the DB with my favorite GUI tool, which lets me quickly find and modify table layouts and the data in my databases.

There are a few GUI tools out there. MySQL offers one for free; I have used it and it is good. I would also say Navicat makes a decent tool. It is not perfect, and I still have to resort to other tools sometimes, but Navicat is good.

You must understand how to use the tool, whether it is command line or GUI, to change the character set for your database tables.

For presentation reasons, the most important table will probably be wp_posts.

A GUI tool or web-based tool will allow you to quickly browse your data and see the corrupt characters. Hopefully your wp_options table is not corrupted, since that table contains information used to run your site. You really have to hope the wp_users table is not corrupt, because that contains your users and passwords.
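Eyeballing rows works, but if you have exported some text and want a rough automated check, a heuristic like the Python sketch below can flag likely candidates. The function name looks_like_mojibake is made up, and the check assumes the usual UTF-8-read-as-cp1252 kind of damage. It will miss other kinds of corruption and can occasionally flag text that is actually fine.

```python
def looks_like_mojibake(text):
    """Guess whether text is UTF-8 that was wrongly decoded as cp1252."""
    try:
        # Undo the suspected wrong decoding and see if valid UTF-8 appears.
        repaired = text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        # Not representable in cp1252, or not valid UTF-8 afterwards:
        # this text does not fit the pattern.
        return False
    # Plain ASCII survives the round trip unchanged, so only report
    # text that actually changes.
    return repaired != text

for sample in ["plain ascii", "café", "cafÃ©"]:
    print(repr(sample), looks_like_mojibake(sample))
```

Correctly stored accented text fails the round trip and is left alone; only the double-decoded form comes back different.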

FOLLOW THE DIRECTIONS

I am not going to even list the directions here because that would mean you did not go to the Codex and read that page I listed above.

YOU MUST READ IT…

I could tell you exactly what to do, but it is better if you read it off the Codex.

The overview is that you will need to send an ALTER command to change the table character set.

This will read the data in the tables and convert it to the new character set.
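Just so you can recognize the general shape of such a command when you see it on the Codex page, here is a Python sketch that builds one as text. The function name build_convert_statement and the default charset and collation are my assumptions, and the Codex procedure may involve extra steps for your situation, so this is not a substitute for reading it.

```python
import re

def build_convert_statement(table, charset="utf8", collation="utf8_general_ci"):
    """Build an ALTER TABLE ... CONVERT TO CHARACTER SET statement as text."""
    # Allow only simple identifiers so nothing unexpected ends up in the SQL.
    for name in (table, charset, collation):
        if not re.fullmatch(r"[A-Za-z0-9_]+", name):
            raise ValueError(f"unexpected characters in {name!r}")
    return (
        f"ALTER TABLE `{table}` "
        f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
    )

# Nothing is sent to MySQL here; the statement is only printed for review.
print(build_convert_statement("wp_posts"))
```

Printing the statement first gives you a chance to read it, understand it, and check that your backup exists before anything touches the database.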

If your tables are already set to utf8 (with the utf8_general_ci collation), then sending the command will not alter data that is already stored in that format. For other data, it may convert characters, or eat/discard characters that are outside of the target format. It may also replace those characters with a placeholder character.

This can get dangerous. You must have a backup. You must understand the command before you give it.
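Here is what "discard" and "placeholder" look like in practice, using Python's own encoder as a stand-in. This is an analogy only: MySQL's exact behavior depends on the server settings and the character sets involved, and the sample text is made up.

```python
text = "naïve 日本語"

# Placeholder behavior: characters latin-1 cannot hold become "?".
with_placeholders = text.encode("latin-1", errors="replace").decode("latin-1")
print(with_placeholders)  # naïve ???

# Discard behavior: characters latin-1 cannot hold are silently dropped.
with_dropped = text.encode("latin-1", errors="ignore").decode("latin-1")
print(repr(with_dropped))  # 'naïve '
```

Either way the original characters cannot be recovered from the result, which is why the backup matters.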

If you can get away with only processing data in your wp_posts table then count yourself lucky.

If you have corruption throughout your database, you may need to take other measures, such as exporting your data to a different format through the use of a plugin.

In other words, you might have to export to Drupal and then reimport to WordPress.

It all depends on how and why that data got corrupted.

If it is corrupted because you are importing RSS feed content into posts or using the CSV import tool, then internal damage to the management portions of your database is less likely and you probably only have a content problem.

But go slow.

And backup.

Final Note

I may sound a little dramatic about curing character mismatch problems in MySQL, but the reason is that this problem should not have happened in the first place.

Maybe it was the result of exporting from MySQL in an improper way and then doing a restore.

Or maybe it is corruption of the drives on your server, or a larger problem.

After you cure the problem you must find the reason.

Read the header information in your MySQL dumps. It will give you information about the database export.
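If the dump is large, you can pull the interesting bits out with a few lines of Python. The sketch below looks for the SET NAMES line and the DEFAULT CHARSET of each table. The sample text is a made-up fragment in the usual mysqldump layout, and the function names are my own.

```python
import re

# Hypothetical sample; in practice read the first part of your own dump file.
SAMPLE_DUMP = """\
-- MySQL dump 10.13  Distrib 5.5.62, for Linux (x86_64)
--
-- Host: localhost    Database: wordpress
/*!40101 SET NAMES utf8 */;
CREATE TABLE `wp_posts` (
  `ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
"""

def connection_charset(dump_text):
    """Return the charset named in the dump's SET NAMES line, or None."""
    match = re.search(r"SET NAMES\s+(\w+)", dump_text)
    return match.group(1) if match else None

def table_charsets(dump_text):
    """Return the sorted set of DEFAULT CHARSET values used by tables."""
    return sorted(set(re.findall(r"DEFAULT CHARSET=(\w+)", dump_text)))

print(connection_charset(SAMPLE_DUMP), table_charsets(SAMPLE_DUMP))
```

A dump that says one character set in SET NAMES and another in the table definitions, like this sample, is worth a closer look before you restore it anywhere.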

Review the settings of your plugins, and make sure your plugins are up to date and that they clean your RSS imports before storing them.
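As a rough idea of what "cleaning" an import can mean, here is a Python sketch that decodes incoming bytes with the encoding the feed declares and falls back to cp1252 when that fails. The function name to_clean_text and the cp1252 fallback are my assumptions; a real plugin may do this differently.

```python
def to_clean_text(raw, declared_encoding="utf-8"):
    """Decode feed bytes to text, falling back to cp1252 if decoding fails."""
    try:
        return raw.decode(declared_encoding)
    except (UnicodeDecodeError, LookupError):
        # Wrong or unknown declared encoding: assume cp1252 and mark
        # anything undecodable with the Unicode replacement character.
        return raw.decode("cp1252", errors="replace")

print(to_clean_text("café".encode("utf-8")))  # declared encoding is right
print(to_clean_text(b"caf\xe9"))              # really cp1252, fallback is used
```

The point is to decide on the encoding once, at the door, so that everything stored in the database is in the same character set.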