Short version: I’ve just updated a shell script, and created a Windows batch-script version of it, for un-mangling UTF-8 text that has been double-encoded by things like old MySQL dumps.

Longer version: I recently read a blog post about rescuing data from the famous UTF-8-in-old-MySQL-latin1-tables problem. It would have helped me several years ago had it existed, and reading it made me realise that I could also have helped myself back then if I had understood the underlying concepts as well as I do now. That post links to an earlier blog post which inspired it, and which provided the original script to which it added support for Greek characters. Rather than bore you by repeating what they have already explained so succinctly, I will just link to my additions and edits to their scripts (the links below include both the shell and batch scripts, plus an example test file and the resulting output file; edited on 2017/04/05 to update the download locations).
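To make the problem concrete: the bytes of a UTF-8 string get read as if they were "latin1" (which in MySQL actually means Windows-1252), and the resulting characters are then encoded as UTF-8 a second time, so "café" comes out as "cafÃ©". The Python sketch below is not the linked sed-based scripts; it only simulates that mangling and reverses one layer of it. It assumes, as MySQL's latin1 does, that the five byte values Windows-1252 leaves undefined pass through as the same-numbered code points; several Greek letters (for example ύ and ρ) produce exactly those bytes.

```python
# Sketch only: simulate and undo one layer of UTF-8 double encoding.
# Assumption: the intermediate charset behaves like MySQL's "latin1",
# i.e. Windows-1252 with its five undefined bytes mapped to the
# Unicode code points of the same number.
_CP1252_UNDEFINED = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def latin1_decode(data: bytes) -> str:
    """Read bytes one at a time as MySQL-style latin1."""
    return "".join(
        chr(b) if b in _CP1252_UNDEFINED else bytes([b]).decode("cp1252")
        for b in data
    )


def latin1_encode(text: str) -> bytes:
    """Inverse of latin1_decode: exactly one byte per character."""
    out = bytearray()
    for ch in text:
        if ord(ch) in _CP1252_UNDEFINED:
            out.append(ord(ch))
        else:
            out += ch.encode("cp1252")  # raises if ch has no cp1252 byte
    return bytes(out)


def double_encode(text: str) -> bytes:
    """Simulate the bug: UTF-8 bytes misread as latin1, re-encoded as UTF-8."""
    return latin1_decode(text.encode("utf-8")).encode("utf-8")


def fix_double_encoding(mangled: bytes) -> str:
    """Reverse exactly one layer of double encoding."""
    return latin1_encode(mangled.decode("utf-8")).decode("utf-8")
```

For example, `double_encode("café").decode("utf-8")` gives `"cafÃ©"`, and `fix_double_encoding` turns the mangled bytes back into `"café"`. Given text that was never double-encoded, such as genuine Greek, it raises a `UnicodeError` instead of guessing.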

Here’s a list of my tweaks:

  • shell-agnostic
  • edge-case handling
  • filename quoting
  • error-condition handling
  • separate sed script
  • fixed errant symbols
  • combined sed invocations into one
  • converted/adapted to a Windows batch script (in addition to the updated shell script)

If you find this useful, please tell any friends who may have the same problem; I wish someone had spared me the headache all those years ago (I ended up hand-editing a monster SQL dump…). If I’ve made any mistakes, or for any other reason, please comment below.

© 2012 rowanthorpe.wordpress.com. This RSS Feed is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Greece License. If you believe the version of this material which you are reading infringes this license, please send details to rowanthorpe(at)gmail[dot]com so legal action can be taken immediately.
