
2024-12-31 YU Hao, ZHAO Chaoqun, YANG Jianping
Journal of Zhejiang Sci-Tech University, 2024, No. 11

A semi-parametric estimation method for pAUC based on the density ratio model and its application

YU Hao^a, ZHAO Chaoqun^a, YANG Jianping^b

(a. School of Computer Science and Technology; b. School of Science, Zhejiang Sci-Tech University, Hangzhou 310018, China)

Abstract: To further improve the estimation accuracy of the pAUC (partial area under the curve) and the accuracy of medical diagnostic tests, a semi-parametric estimation method for the pAUC based on the density ratio model is proposed, and its properties are studied through both theory and simulation. First, under the density ratio model, a semi-parametric estimator of the pAUC is obtained by semi-parametric maximum likelihood estimation, and its statistical properties are analyzed using large-sample theory. Then, the performance of the semi-parametric method in practical settings is examined by simulation and compared with an existing, relatively accurate non-parametric pAUC estimation method. The semi-parametric pAUC estimator is found to be consistent and asymptotically normal, and to have higher asymptotic efficiency and accuracy than the existing non-parametric pAUC estimator. The method is applied to the screening of breast cancer diagnosis models, yielding a new breast cancer diagnosis model with higher prediction accuracy, which shows that the method can improve the accuracy of medical diagnostic tests in practice.

Key words: pAUC; semi-parametric estimation; density ratio model; asymptotic normality; medical diagnosis

CLC number: O212.1

Document code: A

Article ID: 1673-3851(2024)11-0867-09

Reference format: YU Hao, ZHAO Chaoqun, YANG Jianping. A semi-parametric estimation method for pAUC based on the density ratio model and its application[J]. Journal of Zhejiang Sci-Tech University (Natural Sciences), 2024, 51(6): 867-875.

0 Introduction

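The ROC (receiver operating characteristic) curve is the graph obtained on a test dataset by evaluating a diagnostic test at different thresholds, with the false positive rate (FPR) on the horizontal axis and the true positive rate (TPR) on the vertical axis[1]. Let X and Y denote the diseased and non-diseased populations, with distribution functions F(x) and G(x) respectively; the corresponding ROC curve is {(p, R(p)), p∈(0,1)}, where R(p)=G(F^{-1}(p)). Researchers often compute the full area under the ROC curve, the AUC (area under the curve), to assess the accuracy of a diagnostic test[2-4]. In the diagnosis of some diseases the FPR must be kept low, so clinicians need only the area under that part of the ROC curve, which is called the pAUC[5]. If the FPR takes values in the interval [p0, p1], the corresponding pAUC is defined as the area under the ROC curve over that interval:

$$\mathrm{pAUC}=\int_{p_0}^{p_1}R(p)\,\mathrm{d}p.$$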
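As a concrete illustration of the definition, the following minimal sketch computes a purely empirical (non-parametric) pAUC from two samples. It is not the estimator proposed in this paper. It assumes that larger test values indicate disease, and the function names `roc_points` and `partial_auc` are illustrative only.

```python
import numpy as np

def roc_points(diseased, healthy):
    """Empirical ROC points, ASSUMING larger scores indicate disease."""
    diseased = np.asarray(diseased, dtype=float)
    healthy = np.asarray(healthy, dtype=float)
    # Candidate thresholds: every observed score, from high to low.
    thresholds = np.unique(np.concatenate([diseased, healthy]))[::-1]
    fpr = np.array([(healthy >= c).mean() for c in thresholds])
    tpr = np.array([(diseased >= c).mean() for c in thresholds])
    # Prepend the (0, 0) corner; the lowest threshold already gives (1, 1).
    return np.concatenate([[0.0], fpr]), np.concatenate([[0.0], tpr])

def partial_auc(fpr, tpr, p0, p1, grid_size=2001):
    """Trapezoidal area under the linearly interpolated ROC curve on [p0, p1]."""
    grid = np.linspace(p0, p1, grid_size)
    r = np.interp(grid, fpr, tpr)  # R(p) evaluated on the grid
    return float(np.sum((r[1:] + r[:-1]) * np.diff(grid)) / 2.0)

# Toy data (made up for illustration): the two samples are fully separated.
fpr, tpr = roc_points([5.0, 6.0, 7.0], [1.0, 2.0, 3.0, 4.0])
print(partial_auc(fpr, tpr, 0.05, 0.2))
```

Because the toy samples do not overlap, R(p)=1 on the whole interval and the printed value equals the interval length p1-p0; an uninformative test with R(p)=p would give (p1^2-p0^2)/2 instead.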





1 Semi-parametric estimation of pAUC and confidence intervals

1.1 Semi-parametric estimation of F(x) and G(x)
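For orientation, the density ratio model is commonly written as an exponential tilt linking the two densities, and the semi-parametric maximum likelihood estimators of F and G are then weighted empirical distribution functions on the pooled sample. The sketch below shows this commonly used form only. The basis function q(x), the choice of F as the baseline distribution, and the sample sizes n_0 (from F) and n_1 (from G) are assumptions and may differ from the paper's own notation.

```latex
\begin{align*}
% Assumed density ratio model: g is an exponential tilt of f.
g(x) &= \exp\{\alpha + \beta^{\top} q(x)\}\, f(x), \\
% Pooled sample t_1,\dots,t_n with n = n_0 + n_1 and \rho = n_1 / n_0.
\hat p_i &= \frac{1}{n_0}\cdot\frac{1}{1 + \rho \exp\{\hat\alpha + \hat\beta^{\top} q(t_i)\}}, \\
\hat F(x) &= \sum_{i=1}^{n} \hat p_i \, I(t_i \le x), \qquad
\hat G(x) = \sum_{i=1}^{n} \hat p_i \exp\{\hat\alpha + \hat\beta^{\top} q(t_i)\}\, I(t_i \le x).
\end{align*}
```

Here (\hat\alpha, \hat\beta) maximize the dual log-likelihood \sum_{j=1}^{n_1}\{\alpha+\beta^{\top}q(y_j)\}-\sum_{i=1}^{n}\log[1+\rho\exp\{\alpha+\beta^{\top}q(t_i)\}], where y_1,\dots,y_{n_1} is the sample from G. The score equation for \alpha guarantees that the weights \hat p_i sum to one.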










3 Application analysis



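The dataset used in this paper is the Wisconsin prognostic breast cancer (WPBC) dataset created by Mangasarian et al.[20]. It contains 47 diseased samples and 151 non-diseased samples, with 32 biomarkers such as Radius_mean and Area_se. For ease of recording and discussing the results, the 32 biomarkers are denoted V1,…,V32.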



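Tables 5 and 6 show that the breast cancer model built with B and forward stepwise selection (Model 1) and the model built with AUC and forward stepwise selection (Model 2) do not contain exactly the same biomarkers. Model 1 contains three biomarkers: Radius_mean (V1), Perimeter_mean (V3) and Area_se (V15). Model 2 contains four: Fractal_dimension_worst (V31), Perimeter_worst (V24), Area_worst (V25) and Area_se (V15). In addition, Model 1 has a lower model deviance than Model 2. Model deviance also measures the accuracy of a model, and a lower value indicates a more accurate model. Therefore, in practical applications, selection based on B can yield a more accurate diagnostic model; that is, the semi-parametric pAUC estimation method proposed in this paper helps to select high-accuracy medical diagnostic models.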
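The forward stepwise screening described above can be sketched generically as a greedy search over biomarkers. The following is a minimal illustration under stated assumptions, not the paper's procedure. The stand-in criterion `sum_score_auc` (the empirical AUC of a plain sum of the selected columns), the stopping tolerance `min_gain`, and the toy data are all hypothetical; in practice any scalar criterion, such as a pAUC estimate of a fitted model score, could be passed in.

```python
import numpy as np

def empirical_auc(scores, labels):
    """Mann-Whitney estimate of the AUC; tied pairs count one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]  # all diseased/non-diseased pairs
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def forward_stepwise(X, labels, criterion, min_gain=1e-3):
    """Greedy forward selection: repeatedly add the column that most
    improves the criterion, stopping when the gain falls below min_gain."""
    X = np.asarray(X, dtype=float)
    selected, best = [], -np.inf
    remaining = list(range(X.shape[1]))
    while remaining:
        # Score every candidate model that adds one more column.
        trials = [(criterion(X[:, selected + [j]], labels), j) for j in remaining]
        score, j = max(trials)
        if score - best < min_gain:
            break  # no candidate improves the criterion enough
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected, best

def sum_score_auc(X_sub, labels):
    """Hypothetical criterion: AUC of the plain sum of the selected columns."""
    return empirical_auc(X_sub.sum(axis=1), labels)

# Toy data (made up): column 0 separates the classes, columns 1 and 2 do not.
X = [[3, 0, 1], [4, 1, 1], [5, 0, 1], [0, 1, 1], [1, 0, 1], [2, 1, 1]]
y = [1, 1, 1, 0, 0, 0]
print(forward_stepwise(X, y, sum_score_auc))
```

On the toy data the search stops after selecting column 0, because no further column raises the criterion by at least `min_gain`.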

4 Conclusion




