Short version: I’ve just updated a shell script, and created a Windows batch-script version of it, for un-mangling UTF-8 text that has been double-encoded by things like old MySQL dumps.
Longer version: I recently read a blog post about rescuing data from the famous UTF8-in-old-mysql-latin1-tables problem. It would have helped me several years ago if it had existed, and on reading it I realised I could also have helped myself back then if I had understood the underlying concepts as well as I do now. That post in turn links to an earlier blog post which inspired it, and which provided the original script, to which the later post added support for Greek characters. I won’t bore you by repeating what they’ve already stated so succinctly; I will just link to my additions and edits to their scripts (the links below include both the shell and batch scripts, plus an example test file and the resulting file; edited on 2017/04/05 to update download locations).
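For anyone who hasn't met the problem before, here is a minimal Python sketch of the underlying idea. It is only an illustration, not what my scripts do (they work by substituting the mangled character sequences with sed). It assumes the damage came from UTF-8 bytes being misread as Windows-1252 (which is what MySQL calls latin1) and then stored as UTF-8 again; the function names `mangle` and `unmangle` are made up for this example.

```python
def mangle(text: str) -> str:
    """Simulate the damage: UTF-8 bytes misread as cp1252 characters."""
    return text.encode("utf-8").decode("cp1252")


def unmangle(text: str) -> str:
    """Reverse one round of double-encoding, assuming cp1252 was the wrong guess.

    If the text cannot be mapped back to valid UTF-8 (for example because it
    was never mangled in the first place), it is returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        # Covers both UnicodeEncodeError and UnicodeDecodeError.
        return text


if __name__ == "__main__":
    broken = mangle("café")
    print(broken)            # the two-character mess that replaces "é"
    print(unmangle(broken))  # back to the original text
```

Note that a handful of byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) have no character assigned in Python's cp1252 codec, so text containing them would need extra handling that this sketch leaves out.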
Here’s a list of my tweaks:
- shell-agnostic
- edge-case handling
- filename quoting
- error-condition handling
- separate sed script
- fixed errant symbols
- combined sed invocations into one
- converted/adapted to Windows batch script (in addition to the updated shell-script)
If you find this useful, please tell any friends who may have the same problem. I wish someone had spared me the headache all those years ago (I ended up hand-editing a monster SQL dump…). If I’ve made any mistakes (or for any other reason), please comment below.
© 2012 rowanthorpe.wordpress.com. This RSS Feed is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Greece License. If you believe the version of this material which you are reading infringes this license, please send details to rowanthorpe(at)gmail[dot]com so legal action can be taken immediately.