By Edd Gent (艾德·金特); translated by Zhang Yahui (张雅晖)
The AI-powered chatbot ChatGPT is taking the Internet by storm with its impressive language capabilities, helping to draw up legal contracts as well as write fiction. But it turns out that the underlying technology could also help spot the early signs of Alzheimer's disease, potentially making it possible to diagnose the debilitating condition sooner.
人工智能聊天机器人ChatGPT正凭借其惊人的语言能力风靡互联网,它可以帮助起草法律合同,也能帮忙写小说。但事实证明,这项基础技术还能帮助发现阿尔茨海默病的早期症状,从而更快确诊这种令人衰弱的病症。
Catching Alzheimer's early can significantly improve treatment options and give patients time to make lifestyle changes that could slow progression. Diagnosing the disease typically requires brain imaging or lengthy cognitive evaluations, though, which can be both expensive and time-consuming and therefore unsuitable for widespread screening, says Hualou Liang, a professor of biomedical engineering at Drexel University in Philadelphia.
尽早发现阿尔茨海默病可以极大提高治疗方案的选择空间,并给患者时间去改变生活方式,进而延缓病情发展。费城德雷塞尔大学生物医学工程的梁化楼教授说,诊断这种疾病通常需要做脑部成像或长期的认知评估,可能昂贵且耗时,因此不适用于广泛筛查。
A promising avenue for early detection of Alzheimer's is automated speech analysis. One of the most common and noticeable symptoms of the disease is problems with language, such as grammatical mistakes, pausing, repetition, or forgetting the meaning of words, says Liang. This has led to growing interest in using machine learning to spot early signs of the disease in the way people talk.
自动语音分析是早期检测阿尔茨海默病的一个途径,很有发展前景。梁教授说,这种疾病最常见和最明显的症状之一就是语言出现问题,比如语法错误、停顿、重复或忘记语词含义。因此,运用机器学习来检测人们说话方式中隐现的疾病早期迹象已经引起日益广泛的关注。
Normally this relies on purpose-built models, but Liang and his colleagues wanted to see if they could repurpose the technology behind ChatGPT, OpenAI's large language model GPT-3, to spot the telltale signs of Alzheimer's. They discovered it could discriminate between transcripts of speech from Alzheimer's patients and healthy volunteers well enough to predict the disease with 80 percent accuracy, which represents state-of-the-art performance.
通常情况下,机器学习要依靠专门构建的模型,但梁教授和他的同事们想看看能否将ChatGPT的底层技术(OpenAI的大语言模型GPT-3)另作他用,用来检测阿尔茨海默病的警示迹象。他们发现,GPT-3可以很好地区分阿尔茨海默病患者和健康实验志愿者的语音转录文本,预测该病的准确率达到80%,这展现了其最先进的性能。
“These large language models like GPT-3 are so powerful they can pick up these kinds of subtle differences,” says Liang. “If the subject has some kind of issue [involving] Alzheimer's, and that's already reflected in the language, the hope is that we can use machine learning to pick up these kinds of signals that allow us to do early diagnostics.”
“像GPT-3这样的大语言模型非常强大,足以捕捉到那些细微差异。”梁教授说,“如果研究对象有某种(涉及到)阿尔茨海默病的问题,且这种问题已经反映在语言之中,我们就有望能够利用机器学习来捕捉到这些信号,从而得以进行早期诊断。”
The researchers tested their approach on a collection of 237 audio recordings taken from healthy volunteers and Alzheimer's patients, which were converted to text using a pre-trained speech recognition model. To enlist the help of GPT-3, the researchers made use of one of its less well-known capabilities. Its API makes it possible to feed a chunk of text into the model and get it to spit out what is known as an “embedding”: a numerical representation of a piece of text that encodes its meaning and can be used to assess its similarity to other text.
研究人员以收集到的237份健康志愿者和阿尔茨海默病患者的录音作为样本,检验了他们的方法,这些录音由预先训练好的语音识别模型转换成文本。研究人员利用GPT-3不太起眼的一个功能来寻得帮助。GPT-3的API可以先将大段文本输入至模型,然后使其输出所谓的“嵌入”,即由数字表达的一段文本,它对文本含义进行编码,可用于评估其与其他文本的相似性。
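To make the idea of comparing embeddings concrete, here is a minimal sketch. It assumes cosine similarity as the measure (one common choice; the article does not say which measure is used) and uses tiny made-up vectors in place of real GPT-3 embeddings, which have far more dimensions. The function name and the example vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 4-dimensional "embeddings", invented for illustration only.
emb_a = [0.9, 0.1, 0.3, 0.0]
emb_b = [0.8, 0.2, 0.4, 0.1]    # points in roughly the same direction as emb_a
emb_c = [-0.1, 0.9, -0.2, 0.7]  # points in a very different direction

sim_ab = cosine_similarity(emb_a, emb_b)
sim_ac = cosine_similarity(emb_a, emb_c)
print(f"a vs b: {sim_ab:.3f}")
print(f"a vs c: {sim_ac:.3f}")
```

Vectors that point in similar directions score close to 1, while unrelated or opposed ones score near or below 0, which is what lets an embedding stand in for the meaning of a text when texts are compared.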
While most machine learning models deal with word embeddings, one of the novel features of GPT-3, says Liang, is that it's powerful enough to produce embeddings for entire paragraphs. And because of the model's vast size and the huge amount of data used to train it, it is able to produce very rich representations of the text.
梁教授说,大多数机器学习模型处理的是词嵌入,而GPT-3的一个新特点是,它强大到可以生成整个段落的嵌入。凭借巨大的模型规模和海量训练数据,它能够生成非常丰富的文本表征。
The researchers used this capability to create embeddings for all of the transcripts from both Alzheimer's patients and healthy individuals. They then took a selection of these embeddings, combined with labels to say which group they came from, and used them to train machine-learning classifiers to distinguish between the two groups. When tested on unseen transcripts, the best classifier achieved an accuracy of 80.3 percent, as reported in a paper in PLOS Digital Health.
研究人员利用该性能为阿尔茨海默病患者和健康个体的所有语音转录文本创建了嵌入。之后,他们对这些嵌入进行了筛选,加标签明示分组,并用它们训练机器学习分类器来区分这两类人群。正如《科学公共图书馆·数字健康》上的一篇论文所称,在对未见过的转录文本进行测试时,最优分类器达到了80.3%的准确率。
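The train-then-test workflow described above can be sketched in a few lines. This is an illustration under stated assumptions, not the researchers' pipeline: the "embeddings" are synthetic random vectors, and a simple nearest-centroid classifier stands in for whatever classifiers the paper actually used. All function names, labels, and parameters here are hypothetical.

```python
import random


def make_toy_embeddings(n_per_group, dim, rng):
    """Synthetic stand-ins for transcript embeddings: (vector, label) pairs."""
    data = []
    for label, center in (("control", 0.0), ("patient", 1.0)):
        for _ in range(n_per_group):
            vec = [center + rng.gauss(0.0, 0.1) for _ in range(dim)]
            data.append((vec, label))
    return data


def fit_centroids(train):
    """Average the training vectors of each label into one centroid."""
    sums, counts = {}, {}
    for vec, label in train:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((x - y) ** 2 for x, y in zip(vec, centroid))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


rng = random.Random(0)  # fixed seed so the split is reproducible
data = make_toy_embeddings(n_per_group=50, dim=8, rng=rng)
rng.shuffle(data)

# Hold out 20 percent of the labeled embeddings as "unseen" test data.
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

centroids = fit_centroids(train)
correct = sum(predict(centroids, vec) == label for vec, label in test)
accuracy = correct / len(test)
print(f"held-out accuracy on toy data: {accuracy:.2f}")
```

Because the toy groups are deliberately well separated, the accuracy printed here says nothing about real speech data. The point is only the shape of the procedure: labeled embeddings go in, a classifier is fit on one portion, and accuracy is measured on the portion it never saw.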
That was significantly better than the 74.6 percent the researchers achieved when they applied a more conventional approach to the speech data, which relies on acoustic features that have to be painstakingly identified by experts. They also compared their technique to several cutting-edge machine-learning approaches that use large language models too but include an extra step in which the model is laboriously fine-tuned using some of the transcripts from the training data. They matched the performance of the top model and outperformed the other two.
这明显优于研究人员采用更传统方法处理语音数据所达到的74.6%的准确率,而传统方法必须靠专家费力识别声学特征。他们还将自己的技术与另外几种尖端的机器学习方法进行了比较,这些方法也使用大型语言模型,但却多了一个步骤,即使用训练数据的一些转录文本对模型进行劳力费神的微调。该技术的表现与其中最顶级的模型不相上下,赢过了另外两种。
Interestingly, when the researchers tried fine-tuning, the GPT-3 model's performance actually dropped. This might seem counter-intuitive, but Liang points out that this is probably due to the mismatch in size between the vast amount of data used to train GPT-3 and the small amount of domain-specific training data available for fine-tuning.
有趣的是,研究人员尝试微调后,GPT-3模型的性能反而下降了。这看似有悖常理,但梁教授指出,这可能是用于训练GPT-3的大量数据和可用于微调的特定领域少量训练数据间的大小不匹配所致。
While the team does achieve state-of-the-art results, Frank Rudzicz, an associate professor of computer science at the University of Toronto, says relying on privately owned models to carry out this kind of research does raise some problems. “Part of the reason these closed APIs are limiting is that we also can't inspect or deeply modify the internals of those models or do a more complete set of experiments that would help elucidate potential sources of error that we need to avoid or correct,” he says.
虽然该团队的确取得了一些最先进的成果,但多伦多大学计算机科学副教授弗兰克·鲁基奇表示,依赖私有模型进行此类研究确实会带来一些问题。“这些封闭的API存在局限的部分原因是,我们不能检查或深入修改这些模型的内部构建,也不能执行一套更为完整的实验来帮助阐明需要避免或纠正的潜在错误源。”他如是分析。
Liang is also open about the limitations of the approach. The model is nowhere near accurate enough to properly diagnose Alzheimer's, he says, and any real-world deployment of this kind of technology would be as an initial screening step designed to direct people toward a specialist for a full medical evaluation. As with many AI-based approaches, it's also hard to know exactly what the model is picking up on when it detects Alzheimer's, which may be a problem for medical staff. “The doctor, very naturally, would ask why you get these results,” says Liang. “They want to know what feature is really important.”
梁教授对该方法的局限性也开诚布公。他说,该模型目前还远不足以精确诊断出阿尔茨海默病,这种技术的任何实际应用将仅限于作为最初的筛查手段,旨在引导人们向专家寻求全面的医学评估。同许多基于人工智能的方法一样,很难准确知道该模型在检测出阿尔茨海默病时捕捉到了什么,这对医疗人员来说可能是个问题。“医生自然而然会问你这些结果是怎么得来的。”梁教授说,“他们想知道什么特征是真正重要的。”
Nonetheless, Liang thinks the approach holds considerable promise, and he and his colleagues are planning to build an app that can be used at home or in a doctor's office to simplify screening of the disease.
尽管如此,梁教授认为这一方法前景相当好,他和同事正计划开发一款可以在家里或医生诊室使用的应用程序,以简化阿尔茨海默病的筛查过程。
(Translator's affiliation: University of International Business and Economics)