Batch-converting GB2312 documents to UTF-8




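I recently found that the .srt subtitles for some old films are encoded in GB2312. Played on a non-Chinese system, they show up as garbled text. There are far too many subtitle files (more than 100) to convert by hand.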
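To make the garbling concrete, here is a minimal Python 3 sketch. It assumes the player falls back to a Western single-byte encoding (Latin-1 here, purely for illustration); the actual fallback depends on the system.

```python
# GB2312 stores each Chinese character as two bytes.
raw = "中文".encode("gb2312")

# A system that assumes a Western encoding reads each byte as one character.
garbled = raw.decode("latin-1")

# Decoding with the correct encoding recovers the text.
restored = raw.decode("gb2312")

print(garbled)   # ÖÐÎÄ
print(restored)  # 中文
```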

A quick search online showed that the Python Script plugin for Notepad++ handles this nicely.

The steps are simple. Open Notepad++, go to Plugins > Plugins Admin, and install Python Script.

Once it is installed, choose New Script from the Python Script menu, give the script a name, and paste in the following code:

import os
from Npp import notepad, console, MENUCOMMAND

# Folder to convert; change this to your own path.
filePathSrc = "f:\\Temp\\UTF8"
# Binary file types that should be left alone (matched case-insensitively).
skipExtensions = ('.jar', '.ear', '.gif', '.jpg', '.jpeg', '.xls', '.png', '.cab', '.ico')

for root, dirs, files in os.walk(filePathSrc):
    for fn in files:
        if not fn.lower().endswith(skipExtensions):
            path = os.path.join(root, fn)
            notepad.open(path)
            console.write(path + "\r\n")
            # Does not work --> notepad.runMenuCommand("Encoding", "Character sets", "Chinese", "GB2312 (Simplified)")
            # Interpret the opened file as GB2312.
            notepad.menuCommand(MENUCOMMAND.FORMAT_GB2312)
            # notepad.runMenuCommand("Encoding", "Convert to UTF-8-BOM")
            notepad.menuCommand(MENUCOMMAND.FORMAT_CONV2_UTF_8)
            # Reference: https://github.com/bruderstein/PythonScript/blob/master/PythonScript/src/NotepadPython.cpp
            notepad.save()
            notepad.close()

Change filePathSrc to the path of the folder containing the files you want to convert.

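Save the script and run it. It converts every file under that directory (apart from the skipped binary types) from GB2312 to UTF-8. One more thing: the path must not contain Chinese characters.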
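For comparison, the same conversion can be sketched without Notepad++ as a standalone Python 3 script. This is not the method used above, only a rough equivalent under some assumptions: the helper names convert_file and convert_tree and the SKIP_EXTENSIONS list are hypothetical, and the output is UTF-8 without a BOM.

```python
import os

# Hypothetical skip list mirroring the binary extensions excluded in the script above.
SKIP_EXTENSIONS = (".jar", ".ear", ".gif", ".jpg", ".jpeg", ".xls", ".png", ".cab", ".ico")


def convert_file(path, src_encoding="gb2312", dst_encoding="utf-8"):
    """Re-encode one file in place; return False if it does not decode as src_encoding."""
    with open(path, "rb") as f:
        raw = f.read()
    try:
        text = raw.decode(src_encoding)
    except UnicodeDecodeError:
        return False  # leave the file untouched
    with open(path, "wb") as f:
        f.write(text.encode(dst_encoding))
    return True


def convert_tree(top):
    """Walk `top` and convert every file whose extension is not in the skip list."""
    converted = []
    for root, _dirs, files in os.walk(top):
        for name in files:
            if name.lower().endswith(SKIP_EXTENSIONS):
                continue
            path = os.path.join(root, name)
            if convert_file(path):
                converted.append(path)
    return converted
```

Calling convert_tree("f:\\Temp\\UTF8") would return the list of files it rewrote. Because the files are overwritten in place, it is safer to run this on a copy of the folder first.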

Ref: https://pw999.wordpress.com/2013/08/19/mass-convert-a-project-to-utf-8-using-notepad/
