READING BETWEEN THE LINES

2018-03-06 20:18:01 | By Laurence Coulton
Beijing Review 2018, Issue 7

Chinese technology and e-commerce giant Alibaba has developed artificial intelligence (AI) software capable of outperforming humans on a rigorous global reading comprehension test, according to the company's English-language news platform Alizila. The breakthrough marks the first time machines have surpassed people on language-based tests and could have a major impact on the future of employment worldwide.

On January 5, a machine-learning model developed by Alibaba's AI research branch scored higher than humans on the Stanford Question Answering Dataset (SQuAD), a widely used benchmark for the evaluation of machine reading. The achievement was confirmed in a tweet from one of the test's creators, Pranav Rajpurkar, on January 11. Alibaba scored 82.44, while the figure for human performance registered by Stanford University stood marginally lower at 82.304. Microsoft achieved a similar feat, with its entry registering a score of 82.65 a day later than Alibaba's.

Man and machine

Competition between humans and AI is nothing new, with computers long able to challenge people in strategically complex games such as chess. The ancient Chinese board game Go (weiqi), however, had always been considered a much greater trial for machines to overcome, with more potential board configurations than there are atoms in the universe, at least until Google took up the challenge in 2014. The company's AlphaGo research project was created to test how well a neural network using deep learning could compete at Go.

AlphaGo beat its first human opponent in 2015, relying on mechanisms inspired by the human brain to improve its play from experience, before defeating a world champion over five highly publicized games in Seoul in 2016. However, according to experts, games such as Go and chess rely more heavily on memory and computing power, which computers have in abundance, while language-based tests of the kind evaluated by SQuAD are far more difficult for AI to contend with.

SQuAD, the system used for the evaluation of Alibaba's technology, was created by researchers at Stanford University as a means of testing the ability of machine-learning models developed by corporations, individuals and academic institutions. Launched in 2016, it consists of over 100,000 questions on more than 500 Wikipedia articles, where the answer to every question is a segment of text from a reading passage. It is significantly larger than previous reading comprehension datasets and correspondingly more challenging. According to the developers, SQuAD is unique among benchmarks because its means of testing is hidden: teams submit their code, which is then run on a test set that is not publicly readable, a measure crucial to preserving the integrity of the results.
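
To make the format concrete, the sketch below shows what a SQuAD-style record might look like and how a predicted answer could be checked against it. The field names and the simplified exact-match routine are illustrative assumptions, not the official SQuAD schema or evaluation script.

```python
# A minimal sketch of a SQuAD-style record, assuming illustrative field names.
# The defining property: the answer is always a span of text taken verbatim
# from the accompanying reading passage.

record = {
    "context": ("The Stanford Question Answering Dataset (SQuAD) was released "
                "in 2016 and contains over 100,000 questions posed on more "
                "than 500 Wikipedia articles."),
    "question": "How many questions does SQuAD contain?",
    "answer_text": "over 100,000",
    "answer_start": 82,  # character offset of the answer within the context
}

# The answer span must appear verbatim in the passage at the stated offset.
assert record["context"][record["answer_start"]:].startswith(record["answer_text"])

def exact_match(predicted: str, gold: str) -> bool:
    """Simplified exact-match check: normalize case and whitespace, then compare.
    The official SQuAD evaluation also strips punctuation and articles; this is
    a reduced stand-in for illustration only."""
    normalize = lambda s: " ".join(s.lower().split())
    return normalize(predicted) == normalize(gold)

# A model that extracts the correct span from the passage scores an exact match.
print(exact_match("Over 100,000", record["answer_text"]))  # True
```

Because submitted systems are scored against a test set they never see, a high leaderboard result reflects genuine generalization rather than memorization of the published questions.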