王培 赵茫茫
【Abstract】This article introduces the use of AntConc, a free, installation-free (“green”) corpus analysis tool, in preparing texts for machine translation. It focuses mainly on the tool's glossary function and on analyzing the frequency and concordances of lexical chunks, which helps make translations more formal and standardized.
【Key words】AntConc; machine translation; corpus
Introduction
In machine translation, the accuracy of words and terminology directly determines the quality of the output. This article therefore focuses on handling words and lexical chunks before translation. Among the many corpus tools available, AntConc is widely used because it is easy to operate and free of charge.
1. A brief introduction to AntConc
AntConc is a toolkit for corpus analysis. Its main interface offers several commonly used tools. The most important is Concordance, a powerful function that displays search results in KWIC (key word in context) format and represents the core of modern corpus techniques. The other tools available to users are Concordance Plot, Clusters, Collocates, Word List and Keyword List. Two specific cases are presented below to show how AntConc can be applied in translation.
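To make the KWIC format concrete, the following is a minimal concordancer sketch in Python. It is not AntConc's implementation: the function name `kwic`, the fixed character window on each side and the sample sentences are assumptions made for illustration. Each hit is printed on its own line with the keyword aligned in one column, which is the layout the Concordance tool uses.

```python
import re


def kwic(text: str, keyword: str, width: int = 30) -> list[str]:
    """Return one KWIC line per hit: left context, [keyword], right context."""
    # \b anchors the match to word boundaries, so "oil" does not hit "soil"
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        # Take up to `width` characters on each side; flatten line breaks
        left = text[max(0, match.start() - width):match.start()].replace("\n", " ")
        right = text[match.end():match.end() + width].replace("\n", " ")
        # Right-align the left context so every keyword starts in the same column
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


sample = ("Rising oil prices worry investors. "
          "If oil prices keep climbing, the recovery may stall.")
for line in kwic(sample, "oil prices", width=20):
    print(line)
```

Aligning the keyword in a single column is what makes a concordance easy to scan: words that repeatedly occur to the left or right of the search term stand out visually.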
2. Two cases of application of AntConc
2.1 English to Chinese translation case
Our first example is an English article titled “Will oil be the kiss of death for recovery?”. First, let's analyze its word frequency. After spell-checking the text in Microsoft Word and saving it in .txt format, we import it into AntConc and use the Word List function to obtain an overview: the text contains 446 word types and 934 word tokens. The frequency table of the text is as follows:
Table 2-1 Word frequency analysis of the original text
Rank  Freq  Word
1     44    the
2     38    a
3     28    to
4     26    of
5     25    oil
6     17    in
7     17    prices
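The type and token counts reported above (446 types, 934 tokens) have a simple meaning: word tokens are all running words in the text, while word types are the distinct words. The sketch below computes both, together with a ranked frequency list like the one AntConc's Word List displays. The regular-expression token definition, the alphabetical tie-breaking rule and the sample sentence are assumptions; AntConc's token definition is configurable, so its counts may differ slightly from this sketch.

```python
import re
from collections import Counter


def word_list(text: str):
    """Return (number of types, number of tokens, ranked frequency list)."""
    # Assumed token definition: lowercase letter runs, optionally with one
    # apostrophe (e.g. "let's"). AntConc's own definition is configurable.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    freq = Counter(tokens)
    # Highest frequency first; ties broken alphabetically
    ranked = sorted(freq.items(), key=lambda item: (-item[1], item[0]))
    return len(freq), len(tokens), ranked


types, tokens, ranked = word_list("The oil price rose. The oil market fell.")
print(f"Word types: {types}; word tokens: {tokens}")  # Word types: 6; word tokens: 8
for rank, (word, freq) in enumerate(ranked[:3], start=1):
    print(rank, freq, word)
```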
From Table 2-1 we can see that the article “the” ranks first, but this information is of no use for translation, so we move further down the list. Ranks 2 to 4 (“a”, “to” and “of”) are likewise function words; what we need to focus on are words with substantive meaning. For these we use the Concordance tool to examine their collocations in context. In doing so we find that the phrase “oil prices” appears many times, so we can translate this phrase first and record it in a glossary. When we later run the machine translation, this record helps us translate more accurately. These words also give us a general idea of the text: it is about global oil prices in recent years, whose overall trend is upward. Translating such words in advance therefore helps a great deal later on.
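The two steps described above, skipping function words and spotting recurrent phrases such as “oil prices”, can be approximated in a few lines. The sketch below assumes a hypothetical, deliberately tiny stop list and a simple letters-only tokenizer; it filters out the stop words and then counts two-word clusters, roughly the kind of output AntConc's Clusters tool gives. It is an illustration, not AntConc's own algorithm, and the sample text is invented.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list of function words
STOP_WORDS = {"a", "an", "the", "to", "of", "in", "and", "is", "for", "on", "be", "will"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters only."""
    return re.findall(r"[a-z]+", text.lower())


def content_words(tokens: list[str]) -> Counter:
    """Frequency of tokens that are not on the stop list."""
    return Counter(t for t in tokens if t not in STOP_WORDS)


def clusters(tokens: list[str], size: int = 2) -> Counter:
    """Count every run of `size` consecutive tokens (an n-gram)."""
    return Counter(tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1))


text = ("Oil prices rose again. Higher oil prices hit consumers, "
        "and the rise in oil prices may slow the recovery.")
tokens = tokenize(text)
print(content_words(tokens).most_common(3))
print(clusters(tokens).most_common(1))  # [(('oil', 'prices'), 3)]
```

Counting clusters rather than single words is what surfaces “oil prices” as a unit worth entering in the glossary, instead of two separate entries for “oil” and “prices”.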
2.2 Chinese to English translation case
We now show how AntConc can help in translating Chinese into English, taking a Chinese medical text as the example. Because written Chinese does not separate words with spaces, the text is first tokenized (segmented into words) and then imported into AntConc. After adjusting the settings (such as the character encoding) for Chinese, we use the Word List function again; this time the text contains 215 word types and 234 word tokens. The frequency table of the text is as follows:
Table 2-2 Word frequency analysis of the Chinese text
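Rank  Freq  Word
1     18    类风湿性关节炎 (rheumatoid arthritis)
2     15    的 (structural particle)
3     15    下丘脑-垂体-肾上腺 (hypothalamic-pituitary-adrenal)
4     12    在 (in, at)
5     11    肾阳 (kidney yang)
6     11    炎症 (inflammation)
7     9     慢性的 (chronic)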
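The text above does not say which tool was used for the tokenization step. As one illustration, the sketch below implements forward maximum matching, a classic dictionary-based method for segmenting Chinese: at each position it takes the longest word found in the dictionary. The mini-dictionary and the sample sentence are hypothetical; practical segmenters use much larger lexicons and usually statistical models as well.

```python
# Hypothetical mini-dictionary; real segmenters rely on large lexicons.
DICTIONARY = {"类风湿性关节炎", "关节炎", "下丘脑-垂体-肾上腺", "肾阳", "炎症", "慢性的"}


def fmm_segment(text: str, dictionary: set[str]) -> list[str]:
    """Forward maximum matching: at each position, take the longest dictionary word."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink it; a single character
        # is always accepted, so the loop always advances.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


sentence = "类风湿性关节炎是慢性的炎症"
# Joining with spaces yields word-delimited text that a corpus tool can count
print(" ".join(fmm_segment(sentence, DICTIONARY)))  # 类风湿性关节炎 是 慢性的 炎症
```

Because the dictionary also contains 关节炎 (arthritis), the example shows the “maximum” part of the method: the longer term 类风湿性关节炎 is matched first, so it is counted as a single word type rather than being split.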
The table shows at a glance that this is a medical text about “关节炎” (arthritis), specifically rheumatoid arthritis, and its treatment, which helps us grasp the meaning of the text as a whole. More specifically, the list contains several medical terms that computer-aided translation may handle poorly, so that we would otherwise have to fall back on human translation at the end. We should therefore translate these high-frequency terms first, reducing the difficulty for computer-aided translation. This step greatly improves translation efficiency and avoids repeated translation and revision later.
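Translating high-frequency terms first only pays off if the machine translation system actually uses those translations. One common way to enforce this, which the article does not itself describe, is to replace each glossary term with a placeholder before translation and restore the fixed rendering afterwards. In the sketch below, the glossary entries, the placeholder format and the simulated MT output are all hypothetical.

```python
# Hypothetical glossary drawn from the high-frequency terms in Table 2-2;
# the English renderings are illustrative, not the authors' translations.
GLOSSARY = {
    "类风湿性关节炎": "rheumatoid arthritis",
    "下丘脑-垂体-肾上腺": "hypothalamic-pituitary-adrenal",
    "肾阳": "kidney yang",
    "炎症": "inflammation",
}


def protect_terms(text: str, glossary: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Replace glossary terms with numbered placeholders before MT."""
    mapping = {}
    # Longest terms first, so a long term is never broken up by a shorter one
    for n, term in enumerate(sorted(glossary, key=len, reverse=True)):
        if term in text:
            placeholder = f"__TERM{n}__"
            text = text.replace(term, placeholder)
            mapping[placeholder] = glossary[term]
    return text, mapping


def restore_terms(translated: str, mapping: dict[str, str]) -> str:
    """Put the fixed glossary translations back in place of the placeholders."""
    for placeholder, target in mapping.items():
        translated = translated.replace(placeholder, target)
    return translated


source = "肾阳不足与类风湿性关节炎的炎症有关"
protected, mapping = protect_terms(source, GLOSSARY)
print(protected)  # __TERM2__不足与__TERM1__的__TERM3__有关

# Stand-in for real MT output; an actual engine would be called here
mt_output = "__TERM2__ deficiency is related to the __TERM3__ of __TERM1__"
print(restore_terms(mt_output, mapping))
```

Some MT engines may alter or reorder placeholders, so the restored output still needs to be checked by the translator.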
3. Conclusion
The analysis above has demonstrated several very useful functions of AntConc. They are of real practical value when processing the source text before machine translation.
All in all, both human translation and machine translation have their advantages and disadvantages. If we want to make good use of machine translation and make it serve us better, corpus software such as AntConc can help a great deal.