Language Modernization
Revision of the International Standard ISO 7098 for the Romanization of Chinese: From WD to DIS
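[Abstract] In May 2011, at the 38th plenary meeting of ISO/TC 46 (Sydney), the Chinese delegate proposed revising ISO 7098:1991 to meet the current practical needs of Chinese Romanization in China and around the world. On 11 May 2012, the 39th plenary meeting of ISO/TC 46 (Berlin) resolved to accept the Chinese proposal as the Working Draft (WD) of ISO 7098. On 5 November 2013, the Committee Draft (CD) was approved in a ballot of the ISO/TC 46 member countries. On 1 March 2015, the Draft International Standard (Draft of International Standard, DIS) was approved in a ballot of the ISO/TC 46 member countries. On 5 June 2015, the 42nd plenary meeting of ISO/TC 46 (Beijing) resolved to hold a Committee Internal Ballot (Committee Internal Balloting, CIB) so that the DIS, edited by WG3 (Working Group 3 of ISO/TC 46) in response to the DIS ballot, could be published. The CIB was approved on 18 September 2015, and ISO 7098:2015 was formally published by ISO headquarters on 15 December 2015. This paper describes the whole revision process of ISO 7098 from WD to DIS.
[Keywords] ISO 7098; Romanization of Chinese; Committee Draft; Draft International Standard; ambiguity index of Pinyin syllables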
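In my article "An Account of the Revision of the International Standard ISO 7098 (1991) for the Romanization of Chinese", I described the earlier stages of the revision of ISO 7098:1991 [1]. In 1979, at the meeting of Technical Committee 46 of the International Organization for Standardization (ISO/TC 46) held in Warsaw, the Chinese delegate proposed adopting the Scheme for the Chinese Phonetic Alphabet (Hanyu Pinyin Fang'an) as an international standard. [2-3] In 1982, at the ISO/TC 46 meeting held in Nanjing, ISO 7098 was approved and published as the first edition. In 1991, ISO 7098 underwent some technical revisions and was published as ISO 7098:1991. In May 2011, at the 38th meeting of ISO/TC 46 held in Sydney, the Chinese delegate Feng Zhiwei proposed revising the international standard "ISO 7098 Information and documentation: Romanization of Chinese" (1991), and after the meeting formally submitted a revision proposal to ISO (proposal number: N2358). In May 2012, the 39th meeting of ISO/TC 46, held in Berlin, accepted the Chinese proposal and adopted proposal N2358 directly as the Working Draft (WD) of ISO 7098. It set up an international ISO 7098 revision working group made up of experts from five countries (China, Russia, Germany, the United States and Canada), appointed Feng Zhiwei as leader of the group, and work began on drafting the Committee Draft (CD).
This paper goes on to describe the revision of the Committee Draft (CD) and the Draft International Standard (Draft of International Standard, DIS), together with the technical content of the revision.
After the Berlin meeting, China promptly began drafting the Committee Draft (CD) of ISO 7098; the English text of the CD was written by Feng Zhiwei. The main changes in this CD were as follows: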
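Zero initial (零声母)
Finals of the hekou class (合口呼; the English text of ISO 7098 uses "Articulation B" for this class): finals beginning with u, such as u, ua, uo, uai, uei, uan, uen, uang, ueng, ong.
Finals of the qichi class (齐齿呼; the English text of ISO 7098 uses "Articulation C" for this class): finals beginning with i, such as i, ia, ie, iao, iou, ian, in, iang, ing.
Finals of the cuokou class (撮口呼; the English text of ISO 7098 uses "Articulation D" for this class): finals beginning with ü, such as ü, üe, üan, ün, iong.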
First tone (high level): ˉ
Second tone (rising): ˊ
Third tone (falling-rising): ˇ
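At present it is still quite difficult to implement word-based spelling (writing the syllables of a word together and separating words) for Hanyu Pinyin comprehensively on an international scale. Had we proposed comprehensive word-based spelling in the CD, we would probably not have obtained the support of other countries. However, a proposal that built on the WD and applied word-based spelling not only to personal names and place names but also to the names of languages, ethnic groups and religions stood a fair chance of being supported. We therefore grouped personal names, place names, language names, ethnic group names and religion names together as "named entities" and proposed "spelling rules for named entities". In addition to the spelling rules for personal names and place names already proposed in the WD, we proposed the following spelling rules for the names of languages, ethnic groups and religions:
1. Language names are written together as one word with the initial letter capitalized. For example: Hanyu (汉语, Chinese), Yingyu (英语, English), Deyu (德语, German), Fayu (法语, French), Xibanyayu (西班牙语, Spanish).
2. Names of ethnic groups and tribes are written together as one word with the initial letter capitalized. For example: Hanzu (汉族, Chinese ethnic group), Maolizu (毛利族, Maori tribe), Maonanzu (毛难族, Maonan ethnic group), Weiwu'erzu (维吾尔族, Uyghur ethnic group).
3. Religion names are written together as one word with the initial letter capitalized. For example: Fojiao (佛教, Buddhism), Jidujiao (基督教, Christianity), Tianzhujiao (天主教, Catholicism), Yisilanjiao (伊斯兰教, Islamism).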
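The 40th plenary meeting of ISO/TC 46 was held in Paris from 3 to 7 June 2013. The Chinese delegate Feng Zhiwei attended the meeting and formally submitted the CD of ISO 7098 to the ISO/TC 46 secretariat. ISO/TC 46 accepted the Chinese CD and put it to a ballot of the member countries from 5 July 2013 to 5 November 2013. In November 2013, the ISO/TC 46 secretariat announced the results in document N2452. Countries voting in favour (21): Argentina, Australia, Canada, China, the Czech Republic, Egypt, Estonia, Finland, France, Iran, Italy, Kenya, the Republic of Korea, Morocco, Russia, Spain, Switzerland, Thailand, Ukraine, the United Kingdom and the United States. Countries voting against (1): Germany. Japan abstained. Since the CD was supported by the great majority of countries, it was approved.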
We reject the rules for the wording of personal and geographic names.This is a fundamental issue for us and leads to the rejection of the CD in total.After some years of individual practice within the German libraries the library community in Germany decided in 2010 to refrain from any wording when romanizing the Chinese characters.Since then we write syllable by syllable separated by blank in our transcriptions.This procedure has been published as best practice and is widely accepted within German libraries.Besides it is a perfect foundation for automated transcription processing that has been established in our cataloguing environments.
Concerning personal names we recently are considering wording rules for our authority files (GND);a decision has not been made so far.Within the bibliographic description of books we will continue to give the syllables individually.
The suggestions for the wording of geographical names are extremely questionable.German experts mention that the Pinyin rules for wording are not clear and disambiguous enough to make sure that the same name is always transcribed and worded in the same way.Thus the exchange of bibliographical or authoritative data will be hindered.
Add the Unicode values for the tone marks.
Add for both the Chinese and the Latin marks the Unicode values.
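The Council on East Asian Libraries (CEAL) in the United States followed the revision of ISO 7098 closely. Its members studied the Chinese CD carefully and conveyed their comments through the US standardization body.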
The Council on East Asian Libraries (CEAL) supports ISO’s endeavor in this revision.The revised draft mandates most of the practices we currently follow,especially adding transliteration rules for personal and place names.It will enhance global information exchange in a data linking environment.
[美国东亚图书馆协会支持国际标准化组织在ISO 7098修订中所作的努力。这个修订草案涉及我们目前进行的大多数实践工作,特别是增加了人名和地名的转写规则。这将在数据链接的环境下大大推进全球范围内的信息交流。]
The CEAL would also like to express some concerns and suggestions about ISO rules specified in specific comments below.We hope ISO considers CEAL members’ suggestions.CEAL members deal with Romanization issues on a daily basis.Their application of Romanization and syllable aggregation guidelines are made in dealing with real life situations.Decisions need not only adhere to principles and guidelines,but are generally made in consideration of the user and orderliness in large databases such as LC.
The CEAL has concerns about hyphenated practice for double surnames.True double surnames should not be hyphenated.In addition,since not all such names following hyphenated practice,taking different approaches on hyphenated (or not) double surnames by different communities would have different results with indexing,searching,and retrieving of bibliographic data,and may subsequently become an issue for linking bibliographic data from different systems globally.
Since double surnames can be two-character or multi-character,we suggest removing “two-character” and adding an example of a double surnames with more than two-character,e.g.,项司徒文良.
Another option for better clarification is to add equivalent Chinese terms in qualifier after “traditional compound surnames”- (复姓) and “double surnames”- (双姓).EXAMPLE,Zhang Wang Shufang (张王淑芳),Xiang Situ Wenliang (项司徒文良).
[美国东亚图书馆协会关注到在双姓中连字符的使用问题。真正的双姓中没有必要使用连字符。并不是所有的双姓都使用连字符,不同的社团采用不同的方式来处理双姓的连字符问题,有的社团使用连字符,有的社团不使用。这样一来,在图书文献的索引、搜索或检索的时候,就会导致不同的结果。在把世界上不同的系统中的图书数据连接起来的时候,就会产生问题。由于双姓可以是两个字符组成的,也可以是多个字符组成的,我们建议去掉“两个字符”的提法,在双姓的例子中增加一个由两个以上的字符组成的双姓的例子:项司徒文良。为了表述得更清楚,建议在英文的术语后面加上一个等价的中文术语。双姓的例子,Zhang Wang Shufang (张王淑芳)、Xiang Situ Wenliang (项司徒文良)。]
Example “Lao Zhang tour (老张头儿,older Zhang)” introduces the syllable aggregation rule about 儿化音,i.e.,connecting other character(s) with “r” when “儿” serve as a suffix.This is a particular spoken feature and cannot be differentiated by machine transliteration.It could cause confusion and inconsistent practice among countries that don’t apply the Basic rules of the Chinese phonetic alphabet orthography as a whole.So we suggest to remove this example.
[“Lao Zhang tour (老张头儿,older Zhang)”这个例子引入了一个关于儿化音的音节连写规则,这牵涉到当把“儿”作为一个后缀时其他带“r”的汉字。这是一个特殊的口语特征,在机器自动转写时很难加以区分。在那些还没有使用《汉语拼音正词法基本规则》的国家中,这样的规则将会引起混乱并造成不一致。建议删除这个例子。]
The characters “市,省” in example “Beijing Shi (北京市,Beijing Municipality),Hebei Sheng (河北省,Hebei Province)”technically are not geographical feature names but names of jurisdiction.It should be used to avoid confusion.Provide explicit instruction for clarification on jurisdiction and the syllable aggregation rules for place names.
[从技术上说,在“Beijing Shi (北京市,Beijing Municipality),Hebei Sheng (河北省,Hebei Province)”这两个例子中的汉字“市、省”,并不是地理特征名,而是行政区划名,应当加以区分以避免混淆。建议清楚地阐述地名中的行政区划和它的音节连写规则。]
Although implied in the examples,the rule 12.14 lacks instruction on syllable aggregation for geographical proper names and geographical feature names.We suggest to rewrite as follows:Chinese place names should separate the geographical proper name from the names of jurisdiction or the geographical feature name.The multi-character geographical proper names,names of jurisdiction,or geographical feature names are written together as one word.The first letter of each element should be capitalized.More examples:Xikou Zhen (溪口镇,Xikou town),Shenzhen Tequ (深圳特区,Shenzhen Special Economic Zone),Qujiatun Cun (瞿家屯村,Qujiatun village).
[尽管在规则12.14的例子中隐含了规则的细节,但是在这条规则中仍然缺少对于地理专名和地理特征名的音节连写细节的具体说明。我们建议改写如下:“汉语地名中的专名与行政区划名或地理特征名分写。多音节的地理专名、行政区划名或地理特征名分别连写为一个单词。每一分写部分的第一个字母大写”。建议增加更多的例子:Xikou Zhen (溪口镇,Xikou town)、Shenzhen Tequ (深圳特区,Shenzhen Special Economic Zone)、Qujiatun Cun (瞿家屯村,Qujiatun village)。]
We suggest to rewrite the rule for Chinese transcription of non-Chinese personal names and place names.With bibliographic data for Chinese publications,one may not know the original form in Latin script.This instruction is not practical and will result in inconsistent practice.It’s better not making a conditional instruction on when to use pinyin and when to use original form.It may create confusion and/or barrier to users’ bibliographic search.By definition,transliterations should always match the characters they transliterate.If the original name or commonly known Roman (Latin) spelling is desirable for clarification,it can be provided in parentheses.Recording bibliographic data,transcribe/transliterate information as it appears;no research is needed to find the original name,therefore more user friendly;the original name can be provided for access points with non-original form as a variant.GB/T 16159-2012 汉语拼音正词法基本规则 6.2.3的规定:非汉语人名地名的汉字名称用汉语拼音拼写.Wulanfu (乌兰夫) Makesi (马克思).Suggested rewording:Chinese transliterations of non-Chinese personal and place names are to be spelled according to their Chinese pronunciation.If the original name or commonly known spelling in Latin characters is known,it may be provided in parentheses.
[我们建议重写关于非汉语人名、地名的译音规则。根据中文出版物的文献数据,人们不可能知道非汉语人名和地名的拉丁字母的原文书写形式。这样的规则在实际上是无法使用的,而且会造成不一致性。因此,最好不要制定这种条件性的规则,说明什么时候使用拼音,什么时候使用原文形式。这样将会造成混乱,妨碍用户进行文献搜索。根据转写的定义,转写时要始终保持转写字符之间的匹配。如果为了说清楚情况而需要原文的书写形式或者公认的罗马(拉丁)字母拼写方式,可以使用括号提供出来。在记录文献数据时,译音或转写的信息怎样出现就怎样表示,没有必要去搜索它们原来的名字,这样对于用户也会显得更加友好。当然,为了查询方便,也可以把原来的名字作为非原文形式的变体提供出来。根据GB/T 16159-2012 《汉语拼音正词法基本规则》6.2.3的规定:非汉语人名、地名的汉字名称用汉语拼音拼写。例如,Wulanfu (乌兰夫)、Makesi (马克思)。因此,我们建议重写如下:非汉语人名和地名的译音,要根据它们在汉语中的读音来拼写。如果要知道原文的名字或者公认的拉丁字母拼写形式,可以在括号中提供。]
Example for达尔文’s transliteration has no need for apostrophe according to Pinyin rule.
Abbreviation of Chinese names is not a good practice.Chinese characters cannot always be properly represented by Romanization.Abbreviation makes the situation worse.
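From 5 to 9 May 2014, Feng Zhiwei, sent by the Ministry of Education, attended the 41st plenary meeting of ISO/TC 46 in Washington, USA. At the meeting of Working Group 3 (WG3) on the morning of 7 May, he set out China's position on the revision of ISO 7098. In his speech he reviewed the development of ISO 7098, answered one by one the comments that member countries had made on the CD during the CD ballot, and explained how the forthcoming DIS would be revised in response to these comments.
The experts present described Feng Zhiwei's speech as "in depth"; it prompted a lively discussion and became the focus of that day's WG3 meeting.
Comments of the ISO/TC 46 WG3 working meeting on ISO 7098: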
Professor Feng Zhiwei gave an indepth expose about the history of Pinyin and the proposals for the Romanization of Chinese.(Slides to be circulated to attendees).After a lively discussion the following resolution was proposed:
TC 46/WG 3 instructs the editor of the revision of ISO 7098 to add a paragraph clarifying the use of separation of syllables especially for automated transcription processing,and to add some material regarding UCS code points in Romanization,and then to forward the text to the TC 46 secretariat for balloting as DIS 7098.
[冯志伟教授在会议上作了有深度的报告,介绍了汉语拼音的历史以及中文罗马字母拼写法提案的过程。他把报告的幻灯片分发给了与会者。经过热烈的讨论之后,会议作出了如下决议:TC 46/WG3要求ISO 7098的编者在文本中增加一段来阐述音节分合的使用,特别是要说明译音的自动处理技术,并且还要增加在中文罗马字母拼写中有关UCS代码的材料,然后向TC 46秘书处提交文本以便作为DIS 7098进行投票。]
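Feng Zhiwei took part in the 41st plenary meeting from beginning to end, joined the discussions actively, and had wide-ranging contacts with the delegates of various countries to win their support for the revision of ISO 7098. The meeting wrote the further revision of ISO 7098 into the resolutions of the 41st plenary meeting of ISO/TC 46 as Resolution 7.
Resolution of the 41st plenary meeting of ISO/TC 46 concerning ISO 7098: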
RESOLUTION 2014-07:Revision of ISO 7098:
ISO/TC 46 instructs the editor of the revision of ISO 7098 to add a paragraph clarifying the use of separation of syllables,particularly for the automated transcription processing,and to add some material utilizing extended Latin script,and then to forward the text to the TC 46 secretariat for balloting as DIS 7098.
Approved unanimously
RESOLUTION 2014-07 :Révision de l’ISO 7098
ISO/TC 46 demande à l’éditeur de la révision d’ISO 7098 d’ajouter un paragraphe pour clarifier l’utilisation de la séparation en syllabes,notamment pour le traitement automatisé des transcriptions,et d’ajouter des éléments d’information utilisant le jeu de caractères latins étendu,avant d’envoyer le texte au secrétariat du TC 46 pour le vote DIS du 7098.
Approuvée à l’unanimité
[决议2014-07,第7条:修订ISO 7098
TC 46要求ISO 7098修订版的编者在文本中增加一段来阐述音节分合的使用,特别是要说明译音的自动处理技术,并且还要增加扩充拉丁字符使用的材料,然后向TC 46秘书处提交文本以便作为DIS 7098进行投票。]
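During the meeting, Feng Zhiwei also exchanged views on ISO 7098 with the British and German delegates separately and gained a better understanding of their positions. The United Kingdom had voted in favour of ISO 7098 in the CD ballot. In a private conversation with Feng Zhiwei, however, the British delegate said that Wade-Giles romanization, devised by the Englishmen Thomas Wade and H. A. Giles, had been used around the world for nearly 200 years, whereas the Scheme for the Chinese Phonetic Alphabet had a history of only some 50 years, so the United Kingdom might abstain in future ballots. The German delegate said that Germany was willing to support word-based spelling for personal names but still had reservations about word-based spelling for place names. The German library community had already given up word-based spelling in cataloguing and adopted syllable-by-syllable spelling, which makes it easier to automate cataloguing, so Germany might continue to vote against in future ballots. Feng Zhiwei took these views of the British and German delegates very seriously.
On the basis of the comments made on the CD by member countries and in the spirit of the 41st plenary meeting of ISO/TC 46, we prepared the DIS. The main additions and changes were as follows: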
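Germany voted against the CD because it considered word-based Pinyin spelling to be ambiguous and ill-suited to the automation of library cataloguing. In our view, however, word-based spelling avoids ambiguity better than syllable-by-syllable spelling. To make this point, we introduced the concept of the "ambiguity index" of Pinyin in the DIS.
If tones are disregarded, there are only 405 basic Chinese syllables, and these 405 syllables can represent the pronunciation of all Chinese characters. The List of Standard Chinese Characters for General Use (《通用规范汉字表》) contains 8,105 characters for general use. In general use, therefore, one Chinese syllable has to represent more than 20 Chinese characters on average (8,105/405 = 20.01), and ambiguity is unavoidable.
In other words, if a Pinyin syllable can represent one linguistic unit, its ambiguity index is zero; if it can represent two linguistic units, its ambiguity index is 2-1 = 1; if it can represent three linguistic units, its ambiguity index is 3-1 = 2; and so on.
In Example 1, the Pinyin syllable /bei/ can represent 31 morphemes (or words), that is, 31 linguistic units, so its ambiguity index is 31-1 = 30. In Example 2, the Pinyin syllable /jing/ can represent 49 morphemes (or words), that is, 49 linguistic units, so its ambiguity index is 49-1 = 48. However, if the monosyllables /bei/ and /jing/ are combined into the disyllabic word /beijing/, the ambiguity index falls markedly, because /beijing/ can represent only three disyllabic linguistic units, that is, three words: 北京, 背景 and 背静. Its ambiguity index is reduced to 3-1 = 2.
In the spirit of the resolution of the 41st plenary meeting of ISO/TC 46, material on the use of extended Latin characters had to be added. In the DIS we therefore supplied hexadecimal Unicode code points (hexadecimal code, hex for short) for the tone marks and punctuation marks of Hanyu Pinyin.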
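First tone (high level): ˉ (hex: 0304)
Second tone (rising): ˊ (hex: 0301)
Third tone (falling-rising): ˇ (hex: 030C)
Fourth tone (falling): ˋ (hex: 0300)
Table 1 Hexadecimal codes of lowercase Chinese vowel letters with tone marks
Table 3 Correspondence table of hexadecimal codes for punctuation marks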
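The relation between the four tone-mark code points and the toned vowel letters can be illustrated with a short Python sketch. It is only an illustration, not part of the DIS: it assumes that the hex values 0304, 0301, 030C and 0300 are the Unicode combining marks, and the function names `toned_vowel` and `hex_code` are hypothetical. It relies on the standard-library `unicodedata.normalize` function to obtain the precomposed letters.

```python
import unicodedata

# Combining tone marks, keyed by tone number, using the hex values listed above.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030C", 4: "\u0300"}


def toned_vowel(vowel: str, tone: int) -> str:
    """Return the precomposed lowercase vowel carrying the given tone mark."""
    # NFC normalization merges the base letter and the combining mark
    # into a single precomposed character where Unicode defines one.
    return unicodedata.normalize("NFC", vowel + TONE_MARKS[tone])


def hex_code(char: str) -> str:
    """Return the four-digit uppercase hexadecimal code point of one character."""
    return f"{ord(char):04X}"


if __name__ == "__main__":
    # Print one row per vowel: each toned letter followed by its hex code.
    for vowel in "aoeiuü":
        row = [toned_vowel(vowel, tone) for tone in (1, 2, 3, 4)]
        print("  ".join(f"{ch} {hex_code(ch)}" for ch in row))
```

Each row of the output pairs a toned lowercase vowel with its hexadecimal code point, which is the kind of information a table of toned vowels would carry.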
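In the spirit of the resolution of the 41st plenary meeting of ISO/TC 46, the text of ISO 7098 had to address the automatic processing of transcription specifically. In the DIS we therefore added the following content: in computer-assisted documentation work there are two methods for the automatic transcription of named entities. One is fully automatic transcription syllable by syllable; the other is rule-based, semi-automatic transcription word by word.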
b.bei jing shi
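In natural language processing, word segmentation is the process of dividing a text into meaning-bearing linguistic units. In English, for example, "the white house" can be segmented into three meaningful units, "the", "white" and "house", and denotes a house that is white, whereas "the White House" corresponds to a single linguistic unit and denotes the official residence of the President of the United States. Such meaningful units are called Word Segmentation Units (WSU). For languages that put spaces between words, such as English, segmenting a text into WSUs is simple: the spaces serve as the basis for determining WSU boundaries. For languages without spaces between words, such as Chinese and Japanese, or languages that put spaces between only some words, such as Thai and Korean, segmenting a written text into WSUs requires different methods.
Many applications need to segment text into words. In translation, counting words is the main way of estimating the size of a translation job. Word segmentation is a core function of translation memory systems and Computer-Assisted Translation (CAT) tools. It also plays an important role in term extraction tools, and term management and CAT tools sometimes have to provide it as well. Most content management systems and databases search by word, and content search also requires the text to be segmented so that it can be matched against the search terms. In addition, search functions need knowledge of word boundaries, and text-to-speech systems generate speech on the basis of words, so they require word segmentation when looking words up. Natural language processing systems of all kinds must segment text into words in order to work. Lexical resources are evaluated by their size, so the number of words they contain usually has to be counted.
ISO 24614-1:201 "Language resource management: Word segmentation of written texts, Part 1: Basic concepts and general principles" sets out the basic concepts and general principles of word segmentation in natural language processing and provides language-independent guidelines for segmenting written text automatically in a reliable and reusable way.
ISO 24614-2:201 "Language resource management: Word segmentation of written texts, Part 2: Word segmentation for Chinese, Japanese and Korean" gives specific rules for identifying WSUs in Chinese, Japanese and Korean. Some of these rules are shared by the three languages, although each language also has its own special rules for identifying WSUs.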
b.bei jing shi
c.beijing shi
d.Beijing shi
e.Beijing Shi
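According to the rules, the place name 北京市 is first segmented into the three syllables /bei/, /jing/ and /shi/; /bei/ and /jing/ are then joined into /beijing/, which is kept separate from the jurisdiction name /shi/; finally, the initial letter of each part is capitalized, giving the transcription /Beijing Shi/.
If ambiguity or other problems arise during word-based transcription, the editor can consult a transcription dictionary and use Human-Computer Interaction (HCI) to find the appropriate transcription of the named entity. This is why the method is semi-automatic.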
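The two transcription methods can be contrasted in a minimal Python sketch. It is not the procedure specified in the DIS: the function names and the small `GENERIC_NAMES` set are hypothetical, the syllables are assumed to be already available in lowercase, and details such as the apostrophe rule are left out.

```python
# Hypothetical, deliberately tiny list of jurisdiction names (as syllable tuples).
GENERIC_NAMES = {("shi",), ("sheng",), ("zhen",), ("cun",), ("te", "qu")}


def transcribe_by_syllable(syllables):
    """Fully automatic method: every syllable is written separately."""
    return " ".join(syllables)


def transcribe_by_word(syllables):
    """Rule-based method: join the proper name, split off the jurisdiction
    name, and capitalize the initial letter of each part."""
    for size in (2, 1):  # try the longer jurisdiction name first
        tail = tuple(syllables[-size:])
        if len(syllables) > size and tail in GENERIC_NAMES:
            proper = "".join(syllables[:-size]).capitalize()
            generic = "".join(tail).capitalize()
            return f"{proper} {generic}"
    # No jurisdiction name recognized: write the whole name as one word.
    return "".join(syllables).capitalize()


if __name__ == "__main__":
    name = ["bei", "jing", "shi"]
    print(transcribe_by_syllable(name))  # bei jing shi
    print(transcribe_by_word(name))      # Beijing Shi
```

A rule this simple misfires on some inputs: for the bare name `["shen", "zhen"]` it would split off `zhen` as if it were a jurisdiction name. Cases of this kind are where the dictionary lookup and human-computer interaction described above come in.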
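In the spirit of the resolution of the 41st plenary meeting of ISO/TC 46, the text of ISO 7098 had to include content explaining how syllables are separated and joined. In the DIS we added the following content:
The Basic Rules of the Chinese Phonetic Alphabet Orthography (《汉语拼音正词法基本规则》, GB/T 16159-2012, national standard of the People's Republic of China, 2012) include rules for separating syllables or joining them into words, spelling rules for common words (nouns, verbs, adjectives, pronouns and so on), spelling rules for fixed phrases, spelling rules for personal names and place names, rules for marking tones, rules for using a hyphen at the end of a line, and so on.
(5) Different spellings for Chinese traditional compound surnames (复姓) and double surnames (双姓), with an added example of a double surname made up of more than two Chinese characters
In Chinese personal names the surname and the given name are written separately. The surname comes first and the given name second, and the given name is written together as one word. The initial letters of the surname and of the given name are capitalized. A compound surname is written together without a hyphen. A double surname made up of two or more Chinese characters is joined with a hyphen, and the initial letter of each part is capitalized. For example: Zhuge Kongming (诸葛孔明), Zhang-Wang Shufang (张王淑芳). An example of a double surname made up of more than two Chinese characters was added: Xiang-Situ Wenliang (项司徒文良), in which the double surname 项司徒 contains the compound surname 司徒.
The example "Lao Zhang tour (老张头儿, older Zhang)" was deleted from the entries on personal names.
The "generic name" (通名) of a place name was further divided into "name of jurisdiction" and "geographical feature name". The relevant passage was reworded as follows:
Examples in which the proper name is a polysyllabic word: Xikou Zhen (溪口镇, Xikou town), Qujiatun Cun (瞿家屯村, Qujiatun village).
An example in which both the proper name and the name of jurisdiction are polysyllabic words: Shenzhen Tequ (深圳特区, Shenzhen Special Economic Zone).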
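At an expert meeting convened in 2011 by the Department of Language Information Management of the Ministry of Education, the experts took into account that the revised ISO 7098 would be used not only in China but also abroad. For the spelling of non-Chinese personal names and place names, they therefore recommended that the rules proposed in the revised ISO 7098 should differ slightly from the Basic Rules of the Chinese Phonetic Alphabet Orthography (GB/T 16159-2012). Some experts held that non-Chinese personal names and place names should follow the principle of "names follow their owners" (名从主人) and be written in their original Roman (Latin) spelling, and that personal names and place names from non-Roman scripts should be spelled according to the romanization system of the script concerned. Foreigners would thus not be required to read their own personal names and place names in the way the Chinese renderings are read; that is, they would not be required to change their names. However, so that foreigners can know how their personal names and place names are pronounced in Chinese translation, the Chinese characters or their Pinyin may be added after the original form, and in certain contexts the Pinyin of the Chinese rendering may be given first or used alone.
Following the recommendations made by the experts at this meeting, we adopted the principle of "names follow their owners" and proposed in the CD a practice that departs from individual provisions of the Basic Rules of the Chinese Phonetic Alphabet Orthography (GB/T 16159-2012): non-Chinese personal names and place names are written in their original Roman (Latin) spelling; personal names and place names from non-Roman scripts are spelled according to the romanization system of the script concerned. For ease of reading, the Chinese characters or their Pinyin may be added after the original form, and in certain contexts the Pinyin of the Chinese characters may be given first or used alone. The initial letter is capitalized.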
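The ISO/TC 46 secretariat accepted the DIS we submitted and held a ballot from 1 December 2014 to 1 March 2015. Under ISO rules, abstentions are disregarded when the proportions in a DIS ballot are calculated. On 1 March 2015 the ISO/TC 46 secretariat announced the results in document N2519: of the 18 countries that did not abstain, 17 voted in favour (94%, above the 66.66% required by ISO) and 1 voted against (6%, below the 25% limit set by ISO), so the DIS was approved.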
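The arithmetic behind this result can be checked with a few lines of Python. The sketch simplifies the criteria as stated in the text (abstentions excluded, at least two thirds in favour, no more than one quarter against); the function name is hypothetical, and the treatment of the exact boundary values is an assumption rather than a statement of ISO procedure.

```python
def dis_ballot_passes(approve: int, disapprove: int) -> bool:
    """Check the two approval criteria described above, ignoring abstentions."""
    cast = approve + disapprove
    if cast == 0:
        raise ValueError("no votes were cast")
    # Integer arithmetic avoids floating-point rounding at the thresholds.
    enough_in_favour = 3 * approve >= 2 * cast   # at least two thirds in favour
    few_enough_against = 4 * disapprove <= cast  # at most one quarter against
    return enough_in_favour and few_enough_against


if __name__ == "__main__":
    approve, disapprove = 17, 1
    cast = approve + disapprove
    print(f"in favour: {100 * approve / cast:.0f}%")   # in favour: 94%
    print(f"against:   {100 * disapprove / cast:.0f}%")  # against:   6%
    print(dis_ballot_passes(approve, disapprove))       # True
```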
The CEAL CTP/CCM Working Group on ISO 7098 Romanization of Chinese submitted CEAL comments with suggested changes on the ISO/CD 7098 draft focusing on section 11,Transcription rules for personal names and geographic names,in October 2013.Most comments were accepted and changes are reflected in the new revised draft ISO/DIS 7098,section 12,Transcription rules for named entities.
From December 9,2014 to January 8,2015,the CEAL CTP/CCM Working Group conducted a survey among CEAL members.The survey requested input on the ISO/DIS 7098 new revised draft sections,especially on Rules 11 and 12.14- 17,in comparison to the ALA-LC Romanization table on Chinese.The survey included 13 questions that were grouped into four categories by importance.A total of 31 members participated in the survey.The majority voted to “support” or “support with modified suggestions” ISO/DIS 7098 instructions in Sections 11-12.Some voted not to support or voiced concerns.Most of the suggestions and concerns,whether members voted to support or not to support,include reasons.We have summarized members’ concerns in the table below.The entire survey results are available here.
CEAL also suggest the ISO website maintains lists of those “proper noun” names for non-Chinese places,languages,tribes,and religions and their variants and abbreviations which will be affected by new rules,if adopted.Users should be able to locate more inclusive lists in order to follow the standards correctly,and to avoid ambiguity and be consistent in practice.Besides the implications of Romanization rules,we sincerely hope that ISO considers rule change consequences that will affect scholarly communication,global information retrieval and discovery,data integrity,and the costs associated with consolidation and conversion of existing data.
[美国东亚图书馆协会CEAL CTP/CCM工作组在2013年10月就ISO 7098的CD稿,特别是对于第11节“人名、地名译音规则”部分提出了意见,提交了美国东亚图书馆协会的意见。我们的绝大多数意见都被ISO 7098的DIS 稿接受了,并且做了相应的修改,这些修改反映在DIS稿第12节“命名实体译音规则”中。
从2014年12月9日至2015年1月8日,美国东亚图书馆协会CEAL CTP/CCM工作组在CEAL成员中进行了调查。这项调查要求对ISO 7098的DIS新修改稿,特别是对于11节和12.14-17节中的规则,与ALA-LC的中文罗马字母拼写表ALA-LC Romanization table on Chinese 进行对比,然后作出回答。调查包括13个问题,按照重要程度分为4大类。共有31位成员参加了这项调查。对于ISO 7098 DIS稿中11~12节的规则,大多数人的回答是“支持”或“带有修改建议的支持”,有的回答是“不支持”,或者没有表态。我们把参加调查的成员的回答做成了一个表。全部的调查结果可以点击here 得到。]
Double surnames should not be hyphenated.When a wife adopts husband surname,she still keep her own surname intact.If hyphenated,it becomes a one new surname and no longer is true double surnames.Replace this statement:“The two-character or multi-character double surnames are to be written together with a hyphen and the initial letters of both capitalized.” with “The double surnames without philological permanence (e.g.,those linked by marriage or other conventions) are to be written separately with the initial letters of both capitalized”.Remove hyphen from examples.
Adjuncts such as “xiao”,“lao” and “da” should not be capitalized unless they appear at the beginning of a sentence.If the character is part of a given name,follow the same practice for given names.
Religion names and their followers are written as one word with the initial letter capitalized.In addition,provide and maintain an online list of religion names (and whether to include all variant forms) to make it easy to apply.
Germany disagrees for the following reasons:Giving value judgements (“…quality … is good,but …”) is not appropriate in an standard.Replace with:
Fully automated transcription procedures generating single syllables separated from each other can be used by any application or environment in which the results are regarded appropriate,especially in those that store the Latin transcription together with the original Chinese characters.
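(2) The US comment was accepted and the hyphen between the two parts of a double surname was deleted. The examples were changed to: Zhang Wang Shufang (张王淑芳), Xiang Situ Wenliang (项司徒文良).
(3) The US comment was accepted: the initial letters of the prefixes 小 (xiao), 老 (lao), 大 (da) and 阿 (a) are written in lower case, to avoid confusion with surnames such as 肖 (Xiao), 劳 (Lao) and 达 (Da). For example: xiao Liu (小刘, younger Liu), lao Qian (老钱, older Qian), da Li (大李, older Li), a Gui (阿贵, Mr. Gui).
At the WG3 meeting on the morning of 3 June, Feng Zhiwei again reported on the comments that member countries had made on the DIS of ISO 7098 and on how China had dealt with them, and restated China's position on the revision of ISO 7098.
During the meeting, Feng Zhiwei also exchanged views on ISO 7098 with the delegates of Germany, Japan, the United Kingdom and the United States separately and gained a better understanding of their positions. He said that China was willing to accept the great majority of the delegates' comments and to make the corresponding changes in the final revised text. He made a particular point of talking with the British delegate, expressing the hope that the United Kingdom would continue to support China's Scheme for the Chinese Phonetic Alphabet and would not abstain.
The plenary meeting of 5 June 2015 decided that ISO 7098 would skip the Final Draft International Standard (Final Draft of International Standard, FDIS) stage and go straight to publication. This decision was written into the resolutions of the plenary meeting as Resolutions 2 and 3, which shows how much importance the meeting attached to the revision of the international standard ISO 7098.
Resolutions of the 42nd plenary meeting of ISO/TC 46 concerning ISO 7098: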
NEW resolutions
ISO/TC 46 RESOLUTION 2015-02:Publication of revised ISO 7098
ISO/TC 46 acknowledges the positive results of voting on ISO DIS 7098 "Romanization of Chinese" (N 2519) and,in view of the technical changes required,resolves to circulate a CIB to publish the DIS edited by WG3 in response to the DIS ballot.TC 46 instructs its secretariat to include the latest revised draft with the CIB.
Approved unanimously
NOUVELLES résolutions
ISO/TC 46 RESOLUTION 2015-02 :Publication de la norme révisée ISO 7098
ISO/TC 46 prend acte des résultats positifs du vote sur ISO DIS 7098 “Romanisation du chinois" (N 2519) et décide,au regard des modifications techniques nécessaires,de distribuer un CIB pour publier le DIS édité par le GT3 en réponse au vote DIS.Le TC46 enjoint son secrétariat d’inclure la dernière proposition révisée avec le CIB.
Approuvée à l’unanimité.
[ISO/TC 46 决议2015-02,第2条:出版ISO 7098的修订稿
ISO/TC 46认识到ISO DIS 7098“中文罗马字母拼写法”的投票已经得到了正面的结果(文件N2519),为了做技术上的修改,ISO/TC 46决定进行委员会内部投票(Committee Internal Balloting,CIB),由WG3根据DIS投票结果编辑之后出版ISO 7098的DIS修订稿。TC 46将通知秘书处把CIB的意见纳入最终修订稿中。]
ISO/TC 46 RESOLUTION 2015-03:Expression of thanks to Professor Feng
ISO/TC 46 wishes to express its gratitude to Professor Feng Zhiwei for bringing his experience to the successful completion of the revision of ISO 7098 "Romanization of Chinese".
Approved unanimously
ISO/TC 46 RESOLUTION 2015-03 :Remerciements au Professeur Feng
ISO/TC 46 souhaite exprimer sa gratitude au Professeur Feng Zhiwei pour avoir contribué par son expérience à la bonne réussite de la révision de la norme ISO 7098 "Romanisation du Chinois".
Approuvée à l’unanimité
[ISO/TC 46决议2015-03,第3条:对冯志伟教授表示感谢
ISO/TC 46对冯志伟教授表示感谢,因为冯志伟教授以他丰富的经验推动了ISO 7098国际标准“中文罗马字母拼写法”的修改工作,从而使得这一工作得以卓有成效地取得成功。]
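After the meeting, we submitted the revised DIS to the ISO/TC 46 secretariat, which on 27 July 2015 circulated it to the ISO/TC 46 member countries for a Committee Internal Ballot (Committee Internal Balloting, CIB); the CIB closed on 18 September 2015. Under ISO rules, countries unfamiliar with Hanyu Pinyin could abstain in this ballot, so only the countries that did not abstain were counted. The ISO/TC 46 secretariat announced the results in document N2562: all 19 countries of ISO/TC 46 that did not abstain voted in favour, and the draft was approved unanimously.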
“Please find attached the positive results following the distribution on the 27th July 2015 of the final text of ISO 7098 Romanization of Chinese-document N2562 for electronic ballot closed on 2015-09-18.”
[请注意所附N2562号文件是在2015年7月27日发出的ISO 7098中文罗马字母拼写法的最终文本的正面投票结果,电子投票已于2015年9月18日结束。]
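Japan had abstained in the CD and DIS ballots, and the United Kingdom had abstained in the DIS ballot. After in-depth exchanges with the Japanese and British experts, both countries changed their position and voted in favour in this decisive CIB. Germany had voted against in the earlier ballots. After in-depth exchanges with the German experts and the German Institute for Standardization (DIN), Germany finally changed its position in this CIB and voted in favour. This shows that ISO 7098, drafted and revised by China, has won the support of countries around the world. It was a complete victory for us!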
If there is a room for cataloger’s judgement,one may romanize it as “Agui” considering “阿” as part of a given name “阿贵”.
Adjuncts such as “xiao”, “lao”, “da” and “a” should not be capitalized unless they appear at the beginning of a sentence. E.g., xiao Liu (小刘, younger Liu), lao Qian (老钱, older Qian), da Li (大李, older Li), a Gui (阿贵, Mr. Gui). If the character “xiao”, “lao”, “da” or “a” is part of a given name, follow the same practice for given names. E.g., Wang Xiaojuan (王小娟), Zhao Laoshan (赵老山), Li Daqin (李大勤), Lou Ashu (娄阿鼠).
[“老”“小”“大”等附加成分的首字母小写,除非它们出现在句首才大写。例如,xiao Liu (小刘,younger Liu)、lao Qian (老钱,older Qian)、da Li (大李,older Li)、a Gui (阿贵,Mr.Gui)。如果“老”“小”“大”等汉字是人名的一部分,则按照人名的拼写规则来拼写。例如,Wang Xiaojuan (王小娟) 、Zhao Laoshan (赵老山)、Li Daqin (李大勤)、Lou Ashu (娄阿鼠)。]
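We then supplied the final version to the ISO secretariat, and on 15 December 2015 the third edition of ISO 7098, ISO 7098:2015, was formally published by ISO headquarters.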
ISO 7098 and Its Application in Human-Computer Interaction*
* Keynote speech by Feng Zhiwei at the 42nd plenary meeting of ISO/TC 46 on 2015-06-02.
Feng Zhiwei
1.Challenge of Computer to Chinese Characters
We are in the information epoch. In this epoch, computers and networks play an increasingly important role in human life. Language is an effective carrier of information. In the information epoch, the computer, with a history of little more than 60 years, poses a challenge to Chinese characters, which have a history of more than 6,000 years.
Chinese characters are ideophonographic characters. An ideophonographic character is a graphic character that represents an object or a concept together with an associated sound element. Chinese characters form a very large character set, whereas most character sets in the world include only a limited number of characters. The number of characters in the character sets of different languages is as follows:
The number of Chinese characters is far greater than in the languages above. The following shows the number of Chinese characters in different Chinese dictionaries from ancient to modern China:
The number of Chinese characters in ZHONGHUA ZIHAI reaches 85,000, but some of the characters in this dictionary are merely signs without meaning or sound and cannot be considered authentic Chinese characters. In general, the number of Chinese characters is more than 60,000. It is the biggest character set in the world.
In the 20th century, some experts tried to invent a Chinese typewriter for typing Chinese characters. The Chinese typewriter is different from the Remington typewriter, which is based on the Latin alphabet; it is extremely complicated and cumbersome. One example is the Chinese typewriter invented by Wally Johnson, which is now kept in the office of Vickie Fu Doll, Chinese and Korean Studies Librarian in the East Asian Library of the University of Kansas, USA.*
* Victor Mair, Chinese typewriter, Language Log, June 30, 2009.
The main tray, which is like a typesetter's font of lead type, has about two thousand of the most frequent Chinese characters. Two thousand Chinese characters are not nearly enough for literary and scholarly purposes, so there are also a number of supplementary trays from which less frequent Chinese characters may be retrieved when necessary.
The pieces of type are tiny and all of a single metallic shade in the tray, so finding the right character becomes a maddening task for the typist. Another problem is the principle by which the characters are ordered in the tray. By radical? By total stroke count? Both of these methods would result in numerous Chinese characters under the same heading. By rough frequency? By telegraph code? Both of these methods demand a good memory of the typist. Unfortunately, nobody seems to have thought of using the easiest and most user-friendly method: arranging the Chinese characters according to their pronunciation. For all of the above reasons, using a Chinese typewriter was an excruciating experience. The following is a precious photograph of Wally Johnson working at his typewriter:
And another is the photograph of Wally Johnson taking a short break in place:
These photos vividly convey the suffering associated with using a Chinese typewriter. The computer uses a keyboard derived from the Remington typewriter for human-computer interaction. Obviously, the Chinese typewriter described above cannot be used as a computer keyboard for human-computer interaction. The design of the computer keyboard is based on the Latin alphabet. If we use the Latin alphabet to represent the pronunciation of Chinese characters, we obtain the easiest and most user-friendly method of inputting and outputting Chinese characters according to their pronunciation. Therefore, the Romanization of Chinese is very helpful for human-computer interaction.
2.Romanization of Chinese
The words in a language,which are written according to a given script (the converted system),sometimes have to be rendered according to a different system (the conversion system).The conversion is indispensable in that it permits the univocal transmission of a written message between two countries using different writing systems or exchanging a message,the writing of which is different from their own.
There are two basic methods of conversion of a system of writing:transliteration and transcription.
Transliteration is the operation which consists of representing the characters of an entirely alphabetical character or alphanumeric character system of writing by the characters of the conversion alphabet. In principle, this conversion should be made character by character: each character of the converted alphabet is rendered by one character, and only one character, of the conversion alphabet, to ensure the complete and unambiguous reversibility of the conversion alphabet into the converted alphabet (re-transliteration).
Transcription is the operation which consists of representing the characters of a language, whatever the original system of writing, by the phonetic system of letters or signs of the conversion language. A transcription system is of necessity based on the orthographical conventions of a conversion language and its alphabet. The users of a transcription system must have a knowledge of the conversion language to be able to pronounce the characters correctly. Transcription is not strictly reversible. Transcription may be used for the conversion of all writing systems. It is the only method that can be used for systems that are not entirely alphabetical and for all ideophonographic writing systems such as Chinese.
Romanization is the conversion of non-Latin writing systems to the Latin alphabet by means of transliteration or transcription.To carry out Romanization,it is possible to use either transliteration or transcription or a combination of these two methods,according to the nature of the converted system.
Many years ago, on 1958-02-11, the National People's Congress of China approved the Scheme for the Chinese Phonetic Alphabet (Hanyu Pinyin, or Pinyin)[1][2]. This scheme is based on the principle of transcription in Romanization, so we call it Chinese Romanization.
3. ISO 7098 Information and Documentation:Romanization of Chinese
In 1979,Chinese delegate proposed to take the Scheme of Chinese Phonetic Alphabet as the international standard in ISO TC46 meeting (Warsaw).In 1982,ISO 7098 Documentation and Information-Chinese Romanization was approved at ISO TC46 meeting (Nanjing) as the first edition.In 1991,ISO 7098 was technically revised.It became the second edition (ISO 7098:1991).
In China,Pinyin,the international standard for Romanization of Chinese,gives impetus to new information technique in the information epoch.In computer application and mobile communication,it is used to input and output the Chinese characters in computer,web and mobile phone.Now more than 80% Chinese used Pinyin to deal with Chinese information processing.Pinyin became an important tool for human-computer interaction.In China,Pinyin also is used in natural language processing and language technique (machine translation,information extraction,information retrieval,text data mining,etc.).
In the international level,Pinyin has been adapted by most libraries around the world.It provides access to bibliographic material of the Chinese language in documentation (including traditional documentation and computerized documentation).In the computerized documentation field,Pinyin plays active role in human-computer interaction.In the end of 20 century,Library of Congress (USA) used Pinyin to catalogue Chinese books (700,000 books) in the library.In the same tine,the Bibliothèque universitaire des langues et civilisations in Paris asked a team of sinological librarians from all over the country,including the Bibliothèque Nationale de France,to ask their opinion on Chinese word segmentation of ISO 7098,in order to establish a common guideline on Chinese word segmentation in Pinyin.The National Library of Australia also adapted Pinyin for Chinese Romanization in documentation.Now more and more people in the world learn Chinese as a foreign language by the means of Pinyin.Pinyin became an important tool for teaching and learning Chinese.In Computer-Assisted Chinese Language Learning,Pinyin is used for input and output of Chinese characters in the human-computer interaction.
These facts show that Pinyin is a useful tool in human-computer interaction not only in China but also in the rest of the world.
4. Index of ambiguity for Chinese syllables
There are only 405 basic Chinese syllables, and these 405 syllables represent the pronunciation of all Chinese characters. The List of Standard Chinese Characters for General Use (2013) includes 8,105 commonly used Chinese characters. One Chinese syllable therefore has to represent, on average, more than 20 Chinese characters in general use (8,105/405 ≈ 20.01).
In the List of Standard Chinese Characters for General Use (2013), the Pinyin syllable /bei/ can represent the following 31 Chinese characters:
北 杯 卑 背 椑 悲 碑 鹎 贝 孛 邶 狈 备 钡 倍 悖 被 琲 棓 辈 惫 焙 蓓 碚 鞁 褙 糒 鞴 鐾 呗 臂
In the List of Standard Chinese Characters for General Use (2013), the Pinyin syllable /jing/ can represent the following 49 Chinese characters:
This means that a Pinyin syllable is ambiguous in its representation of Chinese characters. The ambiguity index is a mathematical description of the degree of ambiguity of a Pinyin syllable. The ambiguity index of a Pinyin syllable (I) equals the number of Chinese linguistic units (Chinese characters, Chinese morphemes or Chinese words) represented by this Pinyin syllable (N) minus 1. The formula is as follows: I = N - 1
This formula means that if a Pinyin syllable can represent N Chinese linguistic units, its ambiguity index (I) equals N - 1. If the Pinyin syllable can represent only one Chinese linguistic unit, its ambiguity index is zero. If it can represent two Chinese linguistic units, its ambiguity index is 2 - 1 = 1. If it can represent three Chinese linguistic units, its ambiguity index is 3 - 1 = 2, and so on.
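The formula can be sketched in a few lines of Python. This is only an illustration: the function name `ambiguity_index` and the constant `BEI_CHARACTERS` are names introduced here, not part of ISO 7098, and the character data is simply the list given above for /bei/.

```python
# Illustrative sketch of I = N - 1. The names below are hypothetical
# and are not defined by ISO 7098.

# The 31 characters listed above for the Pinyin syllable /bei/.
BEI_CHARACTERS = "北杯卑背椑悲碑鹎贝孛邶狈备钡倍悖被琲棓辈惫焙蓓碚鞁褙糒鞴鐾呗臂"


def ambiguity_index(units):
    """Return I = N - 1, where N is the number of distinct linguistic units
    (characters, morphemes or words) that one Pinyin form can represent."""
    n = len(set(units))
    if n == 0:
        raise ValueError("a Pinyin form must represent at least one unit")
    return n - 1


print(ambiguity_index(BEI_CHARACTERS))  # 30
print(ambiguity_index(["北京"]))         # 0
```

Passing a string counts its individual characters, while passing a list counts whole words, so the same function covers both the character level and the word level.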
In example 1, the Pinyin syllable /bei/ can represent 31 Chinese characters, so its ambiguity index is 31 - 1 = 30; in example 2, the Pinyin syllable /jing/ can represent 49 Chinese characters, so its ambiguity index is 49 - 1 = 48. However, if we combine the two monosyllables /bei/ and /jing/ to form the bi-syllabic word /beijing/, the ambiguity index is reduced, because /beijing/ can represent only three Chinese bi-syllabic words: 北京, 背景 and 背静. The ambiguity index of /beijing/ is thus reduced to 3 - 1 = 2.
If we capitalize the first letter of /beijing/ to give /Beijing/, the ambiguity index is reduced to 1 - 1 = 0. This means that /Beijing/ is a Pinyin word without ambiguity: it has only one sense, namely the name of the capital of China, 北京.
Therefore, if we link different Pinyin monosyllables to form a polysyllabic Chinese word, the ambiguity index of the Pinyin form is reduced. Linking monosyllables into one polysyllabic Chinese word is thus an advantage.
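The reduction can be illustrated with a toy lookup table. This is a minimal sketch under stated assumptions: the dictionary `TOY_LEXICON` and the function `word_ambiguity_index` are hypothetical names, and the lexicon contains only the three words mentioned above, not a real dictionary.

```python
# Toy illustration only: the lexicon holds just the words cited in the text.
# Keys are case-sensitive Pinyin spellings; values are the Chinese words
# each spelling can represent.
TOY_LEXICON = {
    "beijing": ["北京", "背景", "背静"],
    "Beijing": ["北京"],  # capitalized form is read as a proper noun
}

# Syllable-level indices taken from the text (31 - 1 and 49 - 1).
SYLLABLE_INDEX = {"bei": 30, "jing": 48}


def word_ambiguity_index(pinyin, lexicon=TOY_LEXICON):
    """Return I = N - 1 for a Pinyin spelling found in the toy lexicon."""
    candidates = lexicon.get(pinyin)
    if not candidates:
        raise KeyError(f"{pinyin!r} is not in the toy lexicon")
    return len(set(candidates)) - 1


for form in ("beijing", "Beijing"):
    print(form, word_ambiguity_index(form))
# beijing 2
# Beijing 0
```

Compared with the syllable-level values of 30 and 48, the word-level values of 2 and 0 show how linking and capitalization narrow the set of candidates.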
However, at present there is no clear definition of the Chinese word in Chinese linguistics. It is difficult to decide the boundary (dividing line) of a Chinese word, which in turn makes it difficult to link monosyllables into a polysyllabic Chinese word. The definition of the Chinese proper noun, by contrast, is relatively clear. It is not so difficult to link different monosyllables to form a Chinese polysyllabic proper noun (personal names, geographic names, language names, ethnic names, tribe names, religion names, etc.), because the boundary of a Chinese polysyllabic proper noun is easy to decide according to the standards or regulations.
For this reason, at the 38th plenary meeting of ISO/TC 46 (6 May 2011, Sydney), the Chinese delegate proposed to update ISO 7098:1991 to reflect current Chinese Romanization practice and new developments not only in China but also in the rest of the world. At the 39th plenary meeting of ISO/TC 46 (11 May 2012, Berlin), ISO/TC 46 resolved to accept China's proposal at the Working Draft (WD) stage. On 5 November 2013, the CD ballot was approved. At the 41st plenary meeting of ISO/TC 46 (4 May 2014, Washington D.C.), the Chinese delegate submitted the Draft of International Standard (DIS), revised according to the comments received at the CD ballot stage. On 1 March 2015, the DIS ballot was approved.
In the updated version of ISO 7098, the Chinese delegate proposed transcription rules for personal names, geographic names, language names, ethnic names, tribe names and religion names in the Chinese language. We believe that this kind of transcription will be the first step towards Chinese transcription based on the Chinese word (including monosyllabic words, polysyllabic words, etc.).
Personal names and geographical names should be spelled in detail according to the regulations Spelling Rules for Chinese Personal Names and Spelling Rules for Chinese Geographical Place Names (the part of Chinese Geographical Names).
The detailed spelling rules for common words are more complex than the rules for these proper nouns (naming entities). The rules of Pinyin orthography for Chinese common words are included in the National Standard of China Basic Rules for Hanyu Pinyin Orthography (GB/T 16159-2011).
In the updated version of ISO 7098, the Chinese delegate has proposed, and will continue to propose, transcription rules for personal names, geographic names, language names, ethnic names, tribe names and religion names in the Chinese language. We believe that this kind of transcription for naming entities will be the first step towards Chinese transcription based on the Chinese word (including polysyllabic common words, polysyllabic proper nouns, etc.). Chinese Romanization will play an increasingly important role in human-computer interaction.
Updating ISO 7098 Romanization of Chinese: from WD to DIS
Feng Zhiwei
Abstract: At the 38th plenary meeting of ISO/TC 46 (Sydney, May 2011), the Chinese delegate proposed to update ISO 7098:1991 to reflect current Chinese Romanization practice and new developments not only in China but also in the rest of the world. At the 39th plenary meeting of ISO/TC 46 (Berlin, 11 May 2012), ISO/TC 46 resolved to accept China's proposal at the Working Draft (WD) stage. On 5 November 2013, the Committee Draft (CD) ballot was approved. On 1 March 2015, the Draft of International Standard (DIS) ballot was approved. At the 42nd plenary meeting of ISO/TC 46 (Beijing, 5 June 2015), ISO/TC 46 resolved that ISO 7098 would go to the publication phase after the latest revised text had been checked by the ISO/TC 46 leadership and sent for committee-internal balloting (CIB) together with the text. On 18 September 2015, the CIB ballot was approved. On 15 December 2015, ISO 7098:2015 was published by ISO. This paper introduces the procedure of updating ISO 7098 from WD to DIS.
Key words: ISO 7098; Romanization of Chinese; Committee Draft; Draft of International Standard; Ambiguity index of Chinese Pinyin syllable