Weekly. Founded January 1997 (cumulative issue 261). Volume 11, Issue 5. Published February 4, 2007.


Application of data mining techniques in computer-aided diagnosis of lung cancer*☆

Chen Hui1, Wang Xiaohua2


Application of data mining in computer-aided diagnosis of lung cancer*☆

Abstract

AIM: To analyze several classification methods in data mining and compare their diagnostic performance when used in a computer-aided diagnosis system.

METHODS: Two hundred cases of solitary pulmonary nodules, confirmed by pathology from surgery or needle biopsy at Beijing Friendship Hospital and Beijing Institute of Tuberculosis and Thoracic Tumor between June 1998 and December 2004, were collected, comprising 135 peripheral lung cancers and 65 benign nodules. Two clinical features (age and presence or absence of blood-streaked sputum) and 5 thin-slice CT signs of each nodule were determined and quantified. The 200 valid samples were randomly divided into training samples and test samples at a ratio of 7:3. Diagnostic classifiers were built with Fisher linear discriminant analysis, Logistic regression, decision tree and neural network models, and validated on the test samples. Sensitivity and specificity were used to evaluate the accuracy of the classifiers, and the area under the ROC curve was used to compare their diagnostic performance.
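The random 7:3 division described in the methods can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the randomization was implemented, and the function name `split_train_test`, the fixed seed, and the use of integer sample IDs are all hypothetical.

```python
import random

def split_train_test(sample_ids, train_parts=7, test_parts=3, seed=0):
    """Randomly partition sample IDs into a training list and a test list.

    The seed is fixed only so the sketch is reproducible (an assumption;
    the source does not describe its randomization procedure).
    """
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding in the split size.
    n_train = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:n_train], shuffled[n_train:]

# 200 hypothetical sample IDs standing in for the 200 nodules.
train_ids, test_ids = split_train_test(range(200))
print(len(train_ids), len(test_ids))  # 140 60
```

With 200 samples, a 7:3 ratio gives 140 training and 60 test cases, which matches the 60 test cases reported in the results.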

RESULTS: ① On the 60 test cases, the sensitivities of the four classifiers were 84.6%, 87.2%, 87.2% and 87.2%, and their specificities were 85.7%, 81.0%, 76.2% and 81.0%, respectively. ② The areas under the ROC curve for the four classifiers were 0.918, 0.918, 0.939 and 0.942; no pairwise comparison showed a significant difference (P = 0.8982, 0.1576, 0.3495, 0.2857, 0.4319 and 0.9868).
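The evaluation measures named above can be sketched in a few lines. Two caveats apply. First, the abstract reports only percentages; the counts used below (39 malignant and 21 benign test cases, with 33 and 18 classified correctly) are back-calculated assumptions that happen to reproduce the first classifier's 84.6% and 85.7%, not figures taken from the paper. Second, the `auc` function uses the pairwise-comparison definition of the area under the ROC curve on invented scores; it is not the statistical procedure the authors used to compare the curves.

```python
def sensitivity(tp, fn):
    """Fraction of malignant cases that the classifier labels malignant."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of benign cases that the classifier labels benign."""
    return tn / (tn + fp)

def auc(pos_scores, neg_scores):
    """Area under the ROC curve as the probability that a randomly chosen
    positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# ASSUMPTION: counts back-calculated from the reported percentages,
# supposing 39 malignant and 21 benign nodules among the 60 test cases.
fisher_sens = sensitivity(tp=33, fn=6)
fisher_spec = specificity(tn=18, fp=3)
print(f"{fisher_sens:.1%} {fisher_spec:.1%}")  # 84.6% 85.7%

# Hypothetical classifier scores, invented only to exercise auc().
toy_auc = auc([0.9, 0.8, 0.6, 0.4], [0.7, 0.3, 0.2])
print(round(toy_auc, 3))  # 0.833
```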

CONCLUSION: The methods were compared on classification accuracy, interpretability of the classifier, and usefulness for clinical diagnosis. Logistic regression and the back-propagation (BP) neural network have higher diagnostic accuracy; discriminant analysis, Logistic regression and the decision tree are easier to interpret; the BP neural network offers better guidance for actual diagnostic decisions. All of these methods can be applied in a computer-aided diagnosis system.

Chen H, Wang XH. Application of data mining in computer-aided diagnosis of lung cancer. Zhongguo Zuzhi Gongcheng Yanjiu yu Linchuang Kangfu 2007;11(5):879-881,885 (China) [www.zglckf.com/zglckf/ejournal/upfiles/07-5/5k-879(ps)pdf]


1School of Biomedical Engineering, Capital Medical University, Beijing 100069, China; 2Department of Radiology, Beijing Friendship Hospital, Capital Medical University, Beijing 100053, China

Chen Hui☆, doctoral candidate, associate professor, School of Biomedical Engineering, Capital Medical University, Beijing 100069, China
chen9364@163.com

Supported by: the Basic Clinical Cooperative Program of Capital Medical University, No. 2003JL03*

Received: 2006-04-26
Accepted: 2006-09-10

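Abstract (Chinese-language version)
OBJECTIVE: To analyze the methods commonly used for classification problems in data mining and to compare their performance when applied in a computer-aided diagnosis system.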
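METHODS: A total of 200 cases of solitary pulmonary nodules (135 malignant, 65 benign), confirmed by pathology from surgery or needle biopsy at Beijing Friendship Hospital and Beijing Institute of Tuberculosis and Thoracic Tumor between June 1998 and December 2004, were collected. Two clinical indicators (age and presence or absence of blood-streaked sputum) and five thin-slice CT indicators were recorded, and the samples were assigned to a training set and a test set at a ratio of 7:3 using the random number method. Diagnostic classifiers were built with Fisher linear discriminant analysis, Logistic regression analysis, decision tree and neural network methods, and each classifier was validated on the test samples. Sensitivity and specificity were used to evaluate classifier accuracy, and the ROC curve and the area under it were used to compare the overall diagnostic performance of the classifiers.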
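RESULTS: ① In diagnostic testing on 60 samples, the sensitivities of the four methods were 84.6%, 87.2%, 87.2% and 87.2%, and the specificities were 85.7%, 81.0%, 76.2% and 81.0%, respectively. ② The areas under the ROC curve for the four methods were 0.918, 0.918, 0.939 and 0.942; for every pair of methods, the difference in area under the curve was not statistically significant (P = 0.8982, 0.1576, 0.3495, 0.2857, 0.4319 and 0.9868).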
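CONCLUSION: The methods were compared on three aspects: classification accuracy of the algorithm, interpretability of the classifier, and value as guidance for diagnosis. Logistic regression and the neural network method have higher diagnostic classification accuracy; discriminant analysis, Logistic regression analysis and the decision tree method produce more interpretable models; the neural network based on the BP algorithm provides better guidance for actual diagnosis. All of them can be used in a computer-aided diagnosis system.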
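Keywords: diagnosis, computer-assisted; lung neoplasms; classification; decision trees; neural networks (computer); regression analysis; discriminant analysis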

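Chen H, Wang XH. Application of data mining techniques in computer-aided diagnosis of lung cancer [J]. Zhongguo Zuzhi Gongcheng Yanjiu yu Linchuang Kangfu, 2007, 11(5): 879-881, 885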
[www.zglckf.com/zglckf/ejournal/upfiles/07-5/5k-879(ps)pdf]

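1School of Biomedical Engineering, Capital Medical University, Beijing 100069; 2Department of Radiology, Beijing Friendship Hospital, Capital Medical University, Beijing 100053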

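Chen Hui☆, female, born in 1968 in Beijing, Han nationality, doctoral candidate and associate professor at Capital Medical University, mainly engaged in research in biomedical engineering.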
chen9364@163.com

Basic Clinical Cooperative Program of Capital Medical University (2003JL03)*

Chinese Library Classification number: R318.04  Document code: B
Article ID: 1673-8225(2007)05-00879-03

Received: 2006-04-26
Revised: 2006-09-10
(06-50-4-3716/S·LL)

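Background: In computer-aided diagnosis, a computer quantitatively analyzes the relevant imaging data and the physician consults its output when making the final judgment, which makes the diagnosis more objective and more accurate. Studies have shown that computer-aided diagnosis helps improve diagnostic accuracy and reduce missed diagnoses. Quantified imaging findings and relevant clinical indicators are fed into a classifier to form a diagnostic system that classifies lesions and thereby distinguishes between different kinds of lesion, which amounts to diagnosing the disease.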

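Application points: Among the many data mining techniques, those used for classification mainly comprise inductive learning methods such as decision trees, statistical methods such as regression analysis and discriminant analysis, and neural network methods. This article selects representative data mining algorithms, applies them to distinguishing benign from malignant pulmonary nodules, and evaluates and compares how each method affects the performance of a computer-aided diagnosis system, providing a reference for the future research and development of such systems.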
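To make the inductive-learning family concrete, the sketch below fits a one-level decision tree (a decision stump) on a single numeric feature. It is a deliberately simplified stand-in, not the decision tree algorithm used in the study, which the abstract does not specify. The function name `fit_stump`, the ages and the labels are all hypothetical and invented for illustration.

```python
def fit_stump(values, labels):
    """Find the threshold on one numeric feature that minimizes training errors.

    The stump predicts 1 (malignant) when value >= threshold, else 0 (benign).
    Returns (best_threshold, number_of_training_errors).
    """
    best_threshold, best_errors = None, len(labels) + 1
    for t in sorted(set(values)):
        # Count cases where the prediction disagrees with the label.
        errors = sum((v >= t) != bool(y) for v, y in zip(values, labels))
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold, best_errors

# Hypothetical toy data: age in years and label (1 = malignant, 0 = benign).
ages = [35, 42, 48, 51, 58, 63, 67, 72]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

threshold, errors = fit_stump(ages, labels)
print(threshold, errors)  # 51 1
```

A full decision tree repeats this kind of split recursively over several features. The appeal noted in the conclusions is that the resulting rule ("age at or above a threshold suggests malignancy" in this toy case) can be read directly by a clinician.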

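Peer review: Using a computer system, the article applies Fisher linear discriminant analysis, Logistic regression analysis, decision tree and neural network methods to the computer-aided diagnosis of lung cancer. It makes full use of a range of clinical data and existing data mining techniques, helps improve the accuracy of lung cancer diagnosis and reduce misdiagnosis and missed diagnosis, and can better assist physicians in reaching a final diagnosis. It is a useful attempt at applying computer and bioinformatics techniques to tumor diagnosis. The research content is relatively novel, and the article is reasonably well supported, rigorously structured and clearly written.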

 


Editorial Office of Zhongguo Zuzhi Gongcheng Yanjiu yu Linchuang Kangfu
Address: P.O. Box 1200, Shenyang  Postal code: 110004  Fax: +86 24 23394178