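How to batch-add pinyin annotations to Chinese characters in a Word document
Annotating Chinese characters (汉字注音) simply means placing the pinyin above each character.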

In Office, this feature is called "Phonetic Guide" (拼音指南).

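Phonetic Guide can process at most 30 characters at a time. No article is only 30 characters long, and well over a hundred is normal, so doing it by hand quickly becomes tiring. It therefore has to be automated, and there are two ways to do that: calling Office's own functionality, or editing the docx file directly.
Calling Office functionality
There are in turn two routes for calling Office functionality: VBA and .NET.
In fact, both routes end up calling the API that Office provides.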
VBA
I went through the VBA documentation and found three relevant APIs:
- FormatPhoneticGuide
- Range.PhoneticGuide method (Word)
- Application.GetPhonetic method (Excel)
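The approach seen most often online is the first one, running the FormatPhoneticGuide macro. I tried it and it works, but it has one major drawback: the style of the pinyin cannot be customized. It is also comparatively unstable.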
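Because Phonetic Guide takes at most 30 characters per call, anything that drives it has to cut the text into batches first. The following is a minimal Go sketch of that batching step only, not of the Word automation itself. The names `chunkRunes` and `maxPhoneticGuideChars` are made up for illustration. The one point worth noticing is that the split counts runes rather than bytes, since a Chinese character occupies three bytes in UTF-8.

```go
package main

import (
	"fmt"
	"strings"
)

// maxPhoneticGuideChars is the per-call limit described in the article.
const maxPhoneticGuideChars = 30

// chunkRunes (hypothetical helper) splits text into consecutive batches
// of at most size characters, counting runes instead of bytes.
func chunkRunes(text string, size int) []string {
	if size <= 0 {
		return nil
	}
	runes := []rune(text)
	var chunks []string
	for start := 0; start < len(runes); start += size {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
	}
	return chunks
}

func main() {
	text := strings.Repeat("汉字注音", 20) // 80 characters
	for i, batch := range chunkRunes(text, maxPhoneticGuideChars) {
		fmt.Printf("batch %d: %d characters\n", i+1, len([]rune(batch)))
	}
}
```

With 80 characters of input this yields batches of 30, 30 and 20 characters.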
' Batch-add pinyin in Word using the default style
Sub BatchAddPinYinByDefaultStyle()
    On Error Resume Next
    Selection.WholeStory
    TextLength = Selection.Characters.Count
    Selection.EndKey
    ' Walk backwards from the end of the document in batches of at most 30 characters
    For i = TextLength To 0 Step -30
        If i < 30 Then
            Selection.MoveLeft Unit:=wdCharacter, Count:=i
            Selection.MoveRight Unit:=wdCharacter, Count:=i, Extend:=wdExtend
        Else
            Selection.MoveLeft Unit:=wdCharacter, Count:=30
            Selection.MoveRight Unit:=wdCharacter, Count:=30, Extend:=wdExtend
        End If
        ' Queue an Enter keystroke so that the Phonetic Guide dialog is confirmed
        SendKeys "{Enter}"
        Application.Run "FormatPhoneticGuide"
    Next
    Selection.WholeStory
End Sub

There is also a macro for clearing the annotations, which uses the second API:
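' Batch-clear pinyin in Word
Sub CleanPinYin()
    Application.ScreenUpdating = False
    Selection.WholeStory
    TextLength = Selection.Characters.Count
    Selection.GoTo What:=wdGoToHeading, Which:=wdGoToAbsolute, Count:=1
    For i = 0 To TextLength
        With Selection
            ' An empty Text removes the annotation from the current character
            .Range.PhoneticGuide Text:=""
        End With
        Selection.MoveRight Unit:=wdCharacter, Count:=1
    Next
    Selection.WholeStory
    Application.ScreenUpdating = True
End Sub

This API can both clear and add annotations: to add one, just pass the pinyin as Text. Its advantage is that the pinyin style can be customized. The downside is that you have to work out the pinyin yourself. I did find a built-in method that computes pinyin, GetPhonetic, but it exists only in Excel and cannot be called from Word.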
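To get an equivalent of the built-in GetPhonetic, I found two implementations online:
- A pure VBA implementation, although it is incomplete: https://github.com/StinkCat/CH_TO_PY
- A RESTful server written in Go that provides the conversion as a service for VBA to call: https://github.com/yangjianhua/go-pinyin
Let's go with the second approach, which is more flexible.
First, the Go pinyin service: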
package main

import (
	"flag"
	"fmt"
	"strconv"

	"github.com/gin-gonic/gin"
	"github.com/mozillazg/go-pinyin"
)

var a pinyin.Args

func initPinyinArgs(arg int) { // arg should be pinyin.Tone, pinyin.Tone1, pinyin.Tone2, pinyin.Tone3, see go-pinyin doc
	a = pinyin.NewArgs()
	a.Style = arg
}

func getPinyin(c *gin.Context) {
	han := c.DefaultQuery("han", "")
	p := pinyin.Pinyin(han, a)
	c.JSON(200, gin.H{"code": 0, "data": p})
}

func getPinyinOne(c *gin.Context) {
	han := c.DefaultQuery("han", "")
	p := pinyin.Pinyin(han, a)
	s := ""
	if len(p) > 0 {
		s = p[0][0]
	}
	c.JSON(200, gin.H{"code": 0, "data": s})
}

func allowCors() gin.HandlerFunc {
	return func(c *gin.Context) {
		c.Writer.Header().Set("Access-Control-Allow-Origin", "*")
		c.Writer.Header().Set("Access-Control-Allow-Credentials", "true")
		c.Writer.Header().Set("Access-Control-Allow-Headers", "Content-Type, Content-Length, Accept-Encoding, X-CSRF-Token, Authorization, accept, origin, Cache-Control, X-Requested-With")
		c.Writer.Header().Set("Access-Control-Allow-Methods", "POST, OPTIONS, GET, PUT, DELETE")
		if c.Request.Method == "OPTIONS" {
			c.AbortWithStatus(204)
			return
		}
		c.Next()
	}
}

func main() {
	// init pinyin output format
	initPinyinArgs(pinyin.Tone)
	fmt.Print("\n\nDEFAULT PORT: 8080, USING '-port portnum' TO START ANOTHER PORT.\n\n")
	port := flag.Int("port", 8080, "Port Number, default 8080")
	flag.Parse()
	sPort := ":" + strconv.Itoa(*port)
	// using gin as a web output
	r := gin.Default()
	r.Use(allowCors())
	r.GET("/pinyin", getPinyin) // Call like GET http://localhost:8080/pinyin?han=我来了
	r.GET("/pinyin1", getPinyinOne)
	r.Run(sPort)
}
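To make the request and response shape of `/pinyin1` concrete, here is a hedged Go sketch of a client. It percent-encodes the `han` parameter with `net/url` and decodes the `{"code":0,"data":"..."}` body with `encoding/json`. So that it runs on its own, it talks to a stub built with `net/http/httptest` instead of the real service. The stub, its single canned reading, and the names `fetchPinyinOne`, `newStubServer` and `pinyinOneResponse` are all assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// pinyinOneResponse mirrors the JSON written by getPinyinOne.
type pinyinOneResponse struct {
	Code int    `json:"code"`
	Data string `json:"data"`
}

// fetchPinyinOne (hypothetical helper) queries baseURL/pinyin1?han=...
// and returns the "data" field of the response.
func fetchPinyinOne(baseURL, han string) (string, error) {
	query := url.Values{}
	query.Set("han", han) // Encode() percent-encodes the UTF-8 bytes
	resp, err := http.Get(baseURL + "/pinyin1?" + query.Encode())
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	var body pinyinOneResponse
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return "", err
	}
	return body.Data, nil
}

// newStubServer stands in for the real service so the sketch is self-contained.
// It knows a single character; the canned reading is illustrative only.
func newStubServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		data := ""
		if r.URL.Query().Get("han") == "汗" {
			data = "hàn"
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(pinyinOneResponse{Code: 0, Data: data})
	}))
}

func main() {
	srv := newStubServer()
	defer srv.Close()

	py, err := fetchPinyinOne(srv.URL, "汗")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(py)
}
```

Against the real service, `srv.URL` would presumably be replaced by `http://localhost:8080`, the default port in the program above.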
Next, we wrap our own GetPhonetic:

' Extract the value of the data field from a JSON string
Function getDataFromJSON(s As String) As String
    With CreateObject("VBScript.Regexp")
        .Pattern = """data"":""(.*)"""
        getDataFromJSON = .Execute(s)(0).SubMatches(0)
    End With
End Function

' Call the pinyin conversion service over HTTP and return the pinyin
Function GetPhonetic(strWord As String) As String
    Dim myURL As String
    Dim winHttpReq As Object
    Set winHttpReq = CreateObject("WinHttp.WinHttpRequest.5.1")
    myURL = "http://localhost:8080/pinyin1"
    myURL = myURL & "?han=" & strWord
    winHttpReq.Open "GET", myURL, False
    winHttpReq.Send
    GetPhonetic = getDataFromJSON(winHttpReq.responseText)
End Function

' Test GetPhonetic
Sub testGetPhonetic()
    ret = GetPhonetic("汗")
    MsgBox ret
End Sub

A helper that decides whether a character is Chinese:
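' Check whether the given Unicode code unit is a Chinese character.
' AscW returns a signed 16-bit Integer, so code units from &H8000 upward arrive as negative values;
' 19968 is &H4E00, the start of the CJK Unified Ideographs block.
' This is a coarse test: it also accepts non-Han characters above &H4E00, such as full-width punctuation.
Function isChinese(uniChar As Integer) As Boolean
    isChinese = uniChar >= 19968 Or uniChar < 0
End Function

Finally, the assembled VBA macro that generates the pinyin annotations: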
' Batch-add pinyin annotations in Word
' Alignment: alignment type, see: https://learn.microsoft.com/en-us/office/vba/api/word.wdphoneticguidealignmenttype
' Raise: offset (points)
' FontSize: font size (points)
' FontName: font
Sub BatchAddPinYin()
    Application.ScreenUpdating = False
    Dim SelectText As String
    Dim PinYinText As String
    Selection.WholeStory
    TextLength = Selection.Characters.Count
    Selection.GoTo What:=wdGoToHeading, Which:=wdGoToAbsolute, Count:=1
    For i = 0 To TextLength
        Selection.MoveRight Unit:=wdCharacter, Count:=1
        Selection.MoveLeft Unit:=wdCharacter, Count:=1, Extend:=wdExtend
        With Selection
            SelectText = .Text ' base text
            If isChinese(AscW(SelectText)) Then ' only annotate Chinese characters
                PinYinText = GetPhonetic(SelectText) ' convert the base text to pinyin
                If PinYinText <> "" Then
                    .Range.PhoneticGuide Text:=PinYinText, Alignment:=wdPhoneticGuideAlignmentCenter, _
                        Raise:=0, _
                        FontSize:=10, _
                        FontName:="等线"
                End If
            End If
        End With
        Selection.MoveRight Unit:=wdCharacter, Count:=1
    Next
    Selection.WholeStory
    Application.ScreenUpdating = True
End Sub

According to the author of the Go service code, its most obvious weakness is that it handles polyphonic characters (多音字) less well than Word's own Phonetic Guide, so the result needs manual correction afterwards.
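Manual correction is unavoidable in any case. Classical Chinese texts, for example, contain interchangeable characters (通假字) that are pronounced differently, and I doubt that even Phonetic Guide gets those right.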
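One way to make such corrections repeatable is to keep a small table of hand-checked words and consult it before falling back to the per-character reading. The Go sketch below only illustrates that idea. It is not part of the service above. The `defaults` map stands in for whatever a per-character lookup returns, and `override`, `overrides` and `annotate` are hypothetical names with a single sample entry.

```go
package main

import (
	"fmt"
	"strings"
)

// defaults stands in for a per-character lookup (one reading per character).
var defaults = map[rune]string{'银': "yín", '行': "xíng", '走': "zǒu"}

// override is a hand-checked word whose readings differ from the defaults.
type override struct {
	word     string
	readings []string // one reading per character of word
}

// overrides is consulted in order, so list longer words first.
var overrides = []override{
	{word: "银行", readings: []string{"yín", "háng"}},
}

// annotate returns one reading per character of text, preferring overrides.
// Characters without a known reading get an empty string.
func annotate(text string) []string {
	runes := []rune(text)
	out := make([]string, 0, len(runes))
	for i := 0; i < len(runes); {
		matched := false
		for _, o := range overrides {
			w := []rune(o.word)
			if i+len(w) <= len(runes) && string(runes[i:i+len(w)]) == o.word {
				out = append(out, o.readings...)
				i += len(w)
				matched = true
				break
			}
		}
		if !matched {
			out = append(out, defaults[runes[i]])
			i++
		}
	}
	return out
}

func main() {
	fmt.Println(strings.Join(annotate("行走银行"), " "))
}
```

Here 行 is read xíng in 行走 but háng in 银行, so the program prints `xíng zǒu yín háng`.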
The complete BAS file:
' Check whether the given Unicode code unit is a Chinese character
' (AscW returns negative values for code units from &H8000 upward; 19968 is &H4E00)
Function isChinese(uniChar As Integer) As Boolean
    isChinese = uniChar >= 19968 Or uniChar < 0
End Function

' Extract the value of the data field from a JSON string
Function getDataFromJSON(s As String) As String
    With CreateObject("VBScript.Regexp")
        .Pattern = """data"":""(.*)"""
        getDataFromJSON = .Execute(s)(0).SubMatches(0)
    End With
End Function

' Call the pinyin conversion service over HTTP and return the pinyin
Function GetPhonetic(strWord As String) As String
    Dim myURL As String
    Dim winHttpReq As Object
    Set winHttpReq = CreateObject("WinHttp.WinHttpRequest.5.1")
    myURL = "http://localhost:8080/pinyin1"
    myURL = myURL & "?han=" & strWord
    winHttpReq.Open "GET", myURL, False
    winHttpReq.Send
    GetPhonetic = getDataFromJSON(winHttpReq.responseText)
End Function

' Test GetPhonetic
Sub testGetPhonetic()
    ret = GetPhonetic("汗")
    MsgBox ret
End Sub

' Batch-add pinyin annotations in Word
' Alignment: alignment type, see: https://learn.microsoft.com/en-us/office/vba/api/word.wdphoneticguidealignmenttype
' Raise: offset (points)
' FontSize: font size (points)
' FontName: font
Sub BatchAddPinYin()
    Application.ScreenUpdating = False
    Dim SelectText As String
    Dim PinYinText As String
    Selection.WholeStory
    TextLength = Selection.Characters.Count
    Selection.GoTo What:=wdGoToHeading, Which:=wdGoToAbsolute, Count:=1
    For i = 0 To TextLength
        Selection.MoveRight Unit:=wdCharacter, Count:=1
        Selection.MoveLeft Unit:=wdCharacter, Count:=1, Extend:=wdExtend
        With Selection
            SelectText = .Text ' base text
            If isChinese(AscW(SelectText)) Then ' only annotate Chinese characters
                PinYinText = GetPhonetic(SelectText) ' convert the base text to pinyin
                If PinYinText <> "" Then
                    .Range.PhoneticGuide Text:=PinYinText, Alignment:=wdPhoneticGuideAlignmentCenter, _
                        Raise:=0, _
                        FontSize:=10, _
                        FontName:="等线"
                End If
            End If
        End With
        Selection.MoveRight Unit:=wdCharacter, Count:=1
    Next
    Selection.WholeStory
    Application.ScreenUpdating = True
End Sub

' Batch-add pinyin in Word using the default style
Sub BatchAddPinYinByDefaultStyle()
    Application.ScreenUpdating = False
    On Error Resume Next
    Selection.WholeStory
    TextLength = Selection.Characters.Count
    Selection.EndKey
    For i = TextLength To 0 Step -30
        If i <= 30 Then
            Selection.MoveLeft Unit:=wdCharacter, Count:=i
            SelectText = Selection.MoveRight(Unit:=wdCharacter, Count:=i, Extend:=wdExtend)
        Else
            Selection.MoveLeft Unit:=wdCharacter, Count:=30
            SelectText = Selection.MoveRight(Unit:=wdCharacter, Count:=30, Extend:=wdExtend)
        End If
        SendKeys "{Enter}"
        Application.Run "FormatPhoneticGuide"
    Next
    Selection.WholeStory
    Application.ScreenUpdating = True
End Sub

' Batch-clear pinyin annotations in Word
Sub CleanPinYin()
    Application.ScreenUpdating = False
    Selection.WholeStory
    TextLength = Selection.Characters.Count
    Selection.GoTo What:=wdGoToHeading, Which:=wdGoToAbsolute, Count:=1
    For i = 0 To TextLength
        With Selection
            .Range.PhoneticGuide Text:=""
        End With
        Selection.MoveRight Unit:=wdCharacter, Count:=1
    Next
    Selection.WholeStory
    Application.ScreenUpdating = True
End Sub

.NET
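This route also just calls the Office API, so it is essentially no different from calling the API through VBA.
VS2022 needs Visual Studio Tools for Office (VSTO) installed.
Then reference the assembly Microsoft.Office.Interop.Word in the project. VS2022 offers versions 14 and 15.
My machine has Office 16, and VS2022 did not offer a matching assembly, so I could not use it and did not explore this route any further.
According to the documentation, the Microsoft.Office.Interop.Word namespace has a Range.PhoneticGuide method whose interface looks much like the one called from VBA, so using it should be much the same.
Editing the docx file directly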
A docx file is essentially a zip-compressed OpenXML document.
Practically all mainstream office suites support this standard: Microsoft Office, Apple iWork, WPS Office, and Google Docs.
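Since the container is an ordinary zip archive, the document XML can be reached with nothing more than a zip library. The following Go sketch, built on `archive/zip`, shows the idea under stated assumptions. It builds a tiny in-memory archive containing only a `word/document.xml` part, the part that holds the main document body in a docx, and reads it back. The archive is not a valid Word file, and `buildSampleDocx` and `readPart` are hypothetical helper names.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// buildSampleDocx writes a minimal in-memory zip with one word/document.xml part.
// A real docx needs further parts (for example [Content_Types].xml), omitted here.
func buildSampleDocx(documentXML string) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	part, err := zw.Create("word/document.xml")
	if err != nil {
		return nil, err
	}
	if _, err := io.WriteString(part, documentXML); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// readPart returns the contents of the named part inside a zip archive.
func readPart(archive []byte, name string) (string, error) {
	zr, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return "", err
	}
	for _, f := range zr.File {
		if f.Name != name {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return "", err
		}
		defer rc.Close()
		data, err := io.ReadAll(rc)
		if err != nil {
			return "", err
		}
		return string(data), nil
	}
	return "", fmt.Errorf("part %q not found", name)
}

func main() {
	archive, err := buildSampleDocx("<w:t>点</w:t>")
	if err != nil {
		fmt.Println("build failed:", err)
		return
	}
	xmlText, err := readPart(archive, "word/document.xml")
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Println(xmlText)
}
```

For a file on disk, `zip.OpenReader` could be used in place of the in-memory reader.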
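In Office Open XML, the type name for Phonetic Guide is CT_Ruby.
Wikipedia explains Ruby as phonetic annotation (注音), also called 注音标识, 加注音, 标拼音, or 拼音指南 (Phonetic Guide).
The documentation can be found at: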
- https://schemas.liquid-technologies.com/OfficeOpenXML/2006/?page=ct_ruby.html
- https://learn.microsoft.com/zh-cn/dotnet/api/documentformat.openxml.wordprocessing.rubyproperties?view=openxml-2.8.1
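I looked into it briefly. The Phonetic Guide node is <w:ruby>.
It has several child nodes:
- <w:rubyPr> is the style of the Phonetic Guide,
- <w:rt> is the pinyin text of the Phonetic Guide,
- <w:rubyBase> is the base text of the Phonetic Guide.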
A fairly complete Phonetic Guide XML fragment looks like this:
<w:ruby>
  <w:rubyPr>
    <w:rubyAlign w:val="center"/>
    <w:hps w:val="26"/>
    <w:hpsRaise w:val="50"/>
    <w:hpsBaseText w:val="52"/>
    <w:lid w:val="zh-CN"/>
  </w:rubyPr>
  <w:rt>
    <w:r w:rsidR="00002ED0" w:rsidRPr="00002ED0">
      <w:rPr>
        <w:rFonts w:ascii="等线" w:eastAsia="等线" w:hAnsi="等线"/>
        <w:color w:val="333333"/>
        <w:sz w:val="26"/>
        <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
      </w:rPr>
      <w:t>diǎn</w:t>
    </w:r>
  </w:rt>
  <w:rubyBase>
    <w:r w:rsidR="00002ED0">
      <w:rPr>
        <w:rFonts w:ascii="华文楷体" w:eastAsia="华文楷体" w:hAnsi="华文楷体"/>
        <w:color w:val="333333"/>
        <w:sz w:val="52"/>
        <w:shd w:val="clear" w:color="auto" w:fill="FFFFFF"/>
      </w:rPr>
      <w:t>点</w:t>
    </w:r>
  </w:rubyBase>
</w:ruby>