Samy A Azer
Abstract
Key words: Deep learning; Convolutional neural network; Hepatocellular carcinoma; Liver masses; Liver cancer; Medical imaging; Classification; Segmentation; Artificial intelligence; Computer-aided diagnosis
Significant progress has been made in image recognition, primarily due to the recent revival of deep learning, particularly the convolutional neural network (CNN), a class of artificial neural networks that has been widely used in biomedical and clinical research[1]. For example, the potential of CNNs has been shown in:
- detecting gastrointestinal bleeding in wireless capsule endoscopy images using handcrafted and CNN features[2];
- diagnosing Helicobacter pylori infection from endoscopy images[3,4];
- detecting gastrointestinal polyps in endoscopy images[5,6].

There is also a surge of interest in the potential of CNNs in radiology research[1,7] and in cellular and histopathological examinations[8]. Several studies have shown that CNN algorithms can perform four main tasks.

(1) Lesion detection. This is a common task in which endoscopists, radiologists, and pathologists look for abnormalities in medical images. Examples include detecting colonic polyps, lesions on radiological images, and malignant histopathological changes on biopsy images[1,5-9].

(2) Classification. Here the CNN takes the target lesions depicted in medical images and assigns them to classes, for example lesion vs normal or malignant vs benign. Other examples include the classification of precancerous gastric disease[10] and of skin cancer[11]. The task is therefore to determine “optimal” boundaries that separate the classes in the multi-dimensional feature space formed by the input features. At least three major techniques have been described for using CNNs in medical image classification:
- training the CNN from scratch[12];
- using “off-the-shelf” CNN features as complementary information channels to existing hand-crafted image features[13];
- performing unsupervised pre-training on natural or medical images, followed by fine-tuning of the deep learning model[14].

(3) Segmentation of organs or anatomical structures. This is an important image processing technique for analysing medical images, used for example in the quantitative evaluation of clinical parameters and in computer-aided diagnosis systems[15].

(4) Image reconstruction. This may include obtaining a noiseless computed tomography (CT) image reconstructed from a subsampled sinogram[16].

With the above information in mind, this study aims to review and identify the applications and uses of CNNs in the interpretation of liver cancers, including hepatocellular carcinoma (HCC), liver metastasis (secondaries), and other liver masses.
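To make the core operation concrete, the following minimal sketch in plain Python slides a 3 × 3 kernel over a small image and applies a ReLU. This is the building block that a CNN stacks in layers. The sketch is an illustration only, not code from any reviewed study. The image and the hand-set vertical-edge kernel are hypothetical. In a real CNN the kernel weights are learned during training, and libraries typically implement this operation as cross-correlation with optional stride and padding.

```python
def conv2d_valid(image, kernel):
    """Cross-correlate a 2D kernel over a 2D image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the kh x kw neighbourhood anchored at (i, j)
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


def relu(feature_map):
    """Element-wise ReLU: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Hypothetical 5 x 5 image: a bright vertical band in column 2
image = [[0, 0, 9, 0, 0] for _ in range(5)]
# Hypothetical hand-set kernel responding where the left side is brighter than the right
kernel = [[1, 0, -1] for _ in range(3)]

activated = relu(conv2d_valid(image, kernel))
print(activated)  # [[0.0, 0.0, 27.0], [0.0, 0.0, 27.0], [0.0, 0.0, 27.0]]
```

Only the position where the band's right edge falls under the kernel survives the ReLU. In this sense the feature map "detects" one kind of edge.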
Primary liver cancer, mainly HCC, is the fifth most common cancer in men and the seventh most common in women. It is also the third leading cause of cancer-related death worldwide[17]. The disease is generally less common in women. In most areas of the world, liver cancer rates are two- to three-fold higher in men than in women. Possible reasons include the higher prevalence of risk factors in men, differences in sex steroid hormones, and perhaps epigenetic factors[18]. Studies have shown an increasing incidence of HCC worldwide. This may be related partly to hepatitis B virus and hepatitis C virus infections, obesity, diabetes, metabolic syndrome, and non-alcoholic fatty infiltration of the liver[19].
However, the burden of HCC varies by geographical location; in the Asia-Pacific region, for example, it is a significant public health problem[20]. The liver is a common site of metastasis from cancers of other organs, mainly colorectal, gastric, pancreatic, breast, and lung cancers. Liver secondaries therefore add to the burden of liver cancer[21].
Current guidelines of the American and European liver societies and the World Gastroenterology Organisation recommend ultrasound for HCC surveillance. CT and magnetic resonance imaging (MRI) are indicated to characterise a focal lesion suspected in the liver[22-24]. The diagnosis of HCC relies on either MRI or contrast-enhanced CT, which can identify up to 65% of small nodules < 2 cm in size[22-25]. However, the detection of small nodules depends on the vascular dynamic enhancement pattern throughout the different phases of the study[26]. There is also inter-operator variability caused by visual qualitative assessment[27]. A computer-aided diagnosis framework may therefore help to resolve these limitations and improve the diagnostic performance of these radiological modalities.
Based on these findings, this study aims to evaluate the use of CNNs in images of HCC, liver metastasis (secondaries), and other liver masses. The rationale for the study was threefold:
- to assess the current status of CNNs and their applications to liver oncology images;
- to identify gaps and deficiencies in ongoing research in this new field;
- to discuss future directions and research priorities that may maximise the applications of this research in hepatic oncology.

Our research questions were therefore: (1) What is the current status of research output on the use of CNNs in assessing images of HCC, liver metastases (secondaries), or other liver masses? and (2) What is the accuracy of CNN-based deep learning systems in lesion detection, classification, or segmentation of these images?
This manuscript was reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines[28].
The databases PubMed, EMBASE, and Web of Science were searched for studies on CNNs applied to images of liver cancer and liver masses. Research books that published full papers from conferences and scientific meetings were also searched. The search covered studies published up to January 2019. Only studies in the English language and conducted on humans were included; studies on animals or animal models were excluded. We searched for articles using combinations of subject headings and the following key words: “Cancer”, “Liver”, “Hepatocellular carcinoma”, “HCC”, “Liver mass”, “Metastasis”, “Hepatic”, “Secondaries in liver”, “Radiology”, “Pathology”, “Histology”, “Histopathology”, “Malignancy”, “Primary liver cancer”, “Ultrasound”, “Computed tomography”, and “Magnetic resonance images”. To maximise the yield, we also manually searched the reference lists of the primary articles and reviews to identify studies not found by the database search[29].
We also searched the journals listed in the 2017 Journal Citation Reports of the Web of Science under the following categories: Gastroenterology and Hepatology (n = 32 journals), Oncology (n = 41 journals), Radiology (n = 6 journals), Pathology (n = 14 journals), Computer Sciences and Engineering (n = 18 journals), and Medical Informatics (n = 7 journals).
To identify target studies, we created a PICOS (Population, Intervention, Comparison, Outcome, Studies) framework for inclusion and exclusion; Table 1 summarises the framework used. Studies that reported data on the use of CNNs in images of liver cancers (HCC or liver metastasis/other liver masses) were included. Full research papers from conferences and scientific meetings were also considered if they fulfilled the research purpose. The search was limited to studies in the English language and conducted on humans; studies on animals or animal models were excluded. Reviews, editorials, commentaries, letters to the editor, and abstracts published in conference proceedings were also excluded.
Two researchers (the author and a research assistant) independently reviewed the titles and abstracts of all citations identified by the literature search. The full texts of potentially relevant articles were then retrieved and reviewed in detail against the selection criteria. Studies that matched the criteria were selected. Any disagreement between the two evaluators was discussed between them. Reviewers were not blinded to authors' names or institutions.
Data were extracted independently by the two researchers using a predefined extraction form. The form recorded: (1) first author's name; (2) year of publication; (3) objectives/research question; (4) method used; (5) liver cancer/liver masses investigated; (6) main results; (7) accuracy, sensitivity, and specificity of the method used; and (8) institute, university, city, and country where the study was conducted. Reported statistical associations were also evaluated, as were comparisons of the results with those obtained using other methods. Inter-rater agreement between the evaluators was measured with Cohen's kappa coefficient using SPSS software (Armonk, NY, United States)[30].
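The kappa statistic mentioned above compares observed agreement with the agreement expected by chance from each rater's label frequencies: κ = (p_o - p_e) / (1 - p_e). The sketch below illustrates that formula in plain Python on hypothetical screening decisions. It is not the SPSS procedure used in the review, and the labels are invented for illustration.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    p_expected = sum(freq_a[k] * freq_b[k] for k in labels) / (n * n)
    if p_expected == 1.0:
        # Both raters used one and the same label throughout: kappa is undefined
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical include/exclude decisions on 10 records (not the review's data)
a = ["in", "in", "out", "out", "out", "in", "out", "out", "in", "out"]
b = ["in", "out", "out", "out", "out", "in", "out", "out", "in", "out"]
kappa = cohen_kappa(a, b)
print(round(kappa, 3))  # 0.783
```

Here the raters agree on 9 of 10 records (p_o = 0.9), but chance alone would give p_e = 0.54. This is why κ is noticeably lower than the raw agreement.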
Figure 1 is a flow diagram summarising the search results and the article selection process. In total, 129 potentially relevant publications were identified through the search of the three databases and research books. After removal of duplicates, 78 articles remained, of which 42 did not meet the inclusion criteria. The remaining 36 full-text articles were assessed for eligibility. Finally, 11 articles were identified that met our selection criteria and were consistent with the aims of the systematic review[31-41].
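The counts above imply a few intermediate numbers that the text does not state explicitly. This small sketch derives them from the reported figures only. The variable names are hypothetical labels for the PRISMA stages.

```python
# Counts reported in the text
identified = 129             # records from the three databases and research books
after_deduplication = 78
excluded_on_screening = 42
included = 11

# Derived counts (not stated explicitly in the text)
duplicates_removed = identified - after_deduplication
full_text_assessed = after_deduplication - excluded_on_screening
excluded_at_full_text = full_text_assessed - included

print(duplicates_removed, full_text_assessed, excluded_at_full_text)  # 51 36 25
```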
Table 2 summarises details of the 11 included studies[31-41]. The studies demonstrated the ability of CNN models to analyse images of liver cancers in the following tasks:
- classification of liver masses into five categories[31]: category A, classic HCC; category B, malignant liver tumours other than HCC; category C, indeterminate masses such as early HCC, dysplastic nodules, or benign liver masses; category D, haemangiomas; category E, cysts;
- detection of small metastases in the liver[32];
- discrimination between primary liver cancer (HCC) and liver secondaries[33];
- differentiation between chronic liver diseases such as cirrhosis and HCC arising on a background of cirrhosis[34];
- grading and segmentation of HCC nuclei on pathology images[35,36];
- classification of liver lesions[31,33,34,37,41];
- detection of liver tumours or liver masses and identification of their types and phases[38-40].

Most studies examined liver CT images[31,32,37-40], ultrasound images[34], or 3D multi-parameter MRI images[33,41]. Cellular and histopathological images were also used[35,36].
Geographically, these studies were performed in Japan (2), China (3), the United States (2), India (1), Greece (1), and Israel (4); some papers had authors from two countries. The leading universities, hospitals, and research institutes in this research were:
- Department of Radiology, the University of Tokyo Hospital, Tokyo, Japan;
- Medical Image Processing Laboratory, Department of Biomedical Engineering, Faculty of Engineering, Tel Aviv University, Israel;
- Department of Informatics Engineering, Technological Educational Institute of Crete, Greece;
- Thapar Institute of Engineering and Technology, Patiala, India;
- Manipal Hospital, Bangalore, India;
- Software College, Northeastern University, Shenyang, China;
- Department of Electrical and Computer Engineering, Stevens Institute of Technology, NJ, United States;
- Department of Diagnostic Imaging, the Chaim Sheba Medical Centre, Israel;
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel;
- Information Science and Engineering, Ritsumeikan University, Shiga, Japan;
- Medical School, Zhejiang University, Hangzhou, China;
- Department of Biomedical Engineering, Yale University, CT, United States;
- Department of Electrical Engineering, Yale University, CT, United States.
Table 1 PICOS framework to identify studies for inclusion
Description of reported methods: The methods sections of these studies gave considerable detail about several aspects[14,31-41]:
- the dataset;
- the development of the CNN algorithm;
- the architecture used;
- the experiments carried out to assess system performance;
- system evaluation;
- the accuracy of automatic liver segmentation or classification.

However, most studies gave no details of clinical information, such as the number of patients included, the sources and number of images used, and the clinical procedures carried out. This imbalance may be related to the background of the authors and to the journals that published the studies. The 11 studies were published by 58 authors. Five were from radiology departments, one was from a pathology department, and two possibly had a medical background. The remaining 50 authors were non-medical, from engineering, computer science, and medical image processing laboratories. Two articles appeared in clinical journals: Radiology[31] and Ultrasonic Imaging[34]. The rest were published in journals specialising in computer science and biomedical informatics, such as Neurocomputing[32,36,37], IEEE Journal of Biomedical and Health Informatics[33], Computers in Biology and Medicine[35], Medical and Biological Engineering and Computing[38], and International Journal of Computer Assisted Radiology and Surgery[39]. These two factors may explain why technical, computer-science information dominated the methods sections.
Datasets and number of patients: The number of patients and images used varied considerably between studies.

Ben-Cohen et al[32] reported that their study involved 20 patients with 68 lesions in total. Their method was tested on a testing set of CT examinations from 14 patients with a total of 55 lesions[32].

Another CT study, by Frid-Adar et al[37], reported a limited dataset of 182 liver lesions (53 cysts, 64 metastases, and 65 haemangiomas). The authors highlighted the difficulty of obtaining datasets. Although medical datasets are available online, they felt these are still limited and applicable only to specific medical problems[37].

The study by Todoroki et al[40] used 3D multi-phase contrast-enhanced liver CT images from 75 patients. The cases covered five types of lesion: cysts, focal nodular hyperplasia, HCC, haemangioma, and metastases[40].

The study by Yasaka et al[31] was the fourth to use contrast-enhanced CT images of the liver. This retrospective study used images of liver masses over three phases (non-contrast-enhanced, arterial, and delayed). The masses were grouped into the five categories described above[31].

By contrast, Trivizakis et al[33] examined upper abdominal MRI scans of 134 patients (37.7% primary liver masses and 62.3% metastatic lesions). Zhang et al[41] also used MRI images, from 20 patients with HCC, from which 1700 non-overlapping patches were generated.

Only one study used liver ultrasound images, acquired from 94 patients[34]. The image dataset comprised 48 normal liver, 50 chronic liver disease, 50 cirrhosis, and 41 HCC-on-cirrhosis images.

Other datasets comprised experimental data involving 127 liver pathology images[35,36].
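Several datasets were built from image patches rather than whole images, for example the 1700 non-overlapping patches of Zhang et al[41]. As a hedged illustration of what "non-overlapping" means, the sketch below tiles a 2D slice on a grid whose stride equals the patch size. The slice and the patch size are hypothetical. The reviewed studies' actual patch sizes and any restriction of patches to a tumour region are not reported here.

```python
def extract_patches(image, size):
    """Split a 2D image (list of rows) into non-overlapping size x size patches.

    Patches are taken on a regular grid with stride equal to `size`, so no
    pixel is shared between two patches; border strips narrower than `size`
    are dropped.
    """
    n_rows, n_cols = len(image), len(image[0])
    patches = []
    for top in range(0, n_rows - size + 1, size):
        for left in range(0, n_cols - size + 1, size):
            patches.append([row[left:left + size] for row in image[top:top + size]])
    return patches


# Hypothetical 10 x 10 slice whose pixel values encode their own position
slice_2d = [[10 * r + c for c in range(10)] for r in range(10)]
patches = extract_patches(slice_2d, size=4)
print(len(patches))   # 4  (a 2 x 2 grid; the last 2 rows and columns are dropped)
print(patches[1][0])  # [4, 5, 6, 7]
```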
Methods and study design: The methods used to apply CNN models to HCC and other liver masses (secondaries, metastases, or other liver masses) varied considerably across the 11 studies.

Ben-Cohen et al[32] combined global context from a fully convolutional network with a local patch-level analysis using superpixel sparse-based classification.

Trivizakis et al[33] proposed a CNN comprising four consecutive strided 3D convolutional layers with a 3 × 3 × 3 kernel size and ReLU activation functions. These were followed by a fully connected layer with 2048 neurons and a softmax layer for binary classification. A dataset of 130 diffusion-weighted MRI scans was used for training and validation[33].

The method used by Bharti et al[34] examined ultrasound images through visual interpretation of liver images.
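The effect of stacking strided 3D convolutions, as in the architecture of Trivizakis et al[33], can be checked with the standard output-size formula: out = floor((n + 2p - k) / s) + 1. The sketch below assumes a stride of 2, padding of 1, and a hypothetical 32 × 32 × 32 input, none of which are stated in the text. It also shows the softmax that converts the final two class scores into probabilities. It is an arithmetic illustration, not a reimplementation of the published model.

```python
import math


def conv_output_size(n, kernel=3, stride=2, padding=1):
    """Size along one axis after a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


def feature_map_sizes(input_size, n_layers=4, **conv_args):
    """Per-axis size after each of `n_layers` identical strided convolutions."""
    sizes = [input_size]
    for _ in range(n_layers):
        sizes.append(conv_output_size(sizes[-1], **conv_args))
    return sizes


def softmax(logits):
    """Turn raw class scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 32 x 32 x 32 input volume; stride 2 and padding 1 are assumptions
sizes = feature_map_sizes(32)
print(sizes)                         # [32, 16, 8, 4, 2]
# Hypothetical logits for the two classes produced by the final layer
probs = softmax([2.0, 0.0])
print([round(p, 3) for p in probs])  # [0.881, 0.119]
```

Because the same formula applies to each of the three axes, four stride-2 layers shrink every dimension by a factor of 16 under these assumptions. The 32-voxel cube becomes 2 × 2 × 2 per channel before the fully connected layer.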
Figure 1 A PRISMA flowchart showing articles searched on the use of convolutional neural networks in gastrointestinal and liver cancer images.
The accuracy measures used in these studies varied and can be summarised as follows:
(1) assessing system performance and system evaluation[32,34];
(2) assessing the accuracy of automatic liver segmentation and lesion detection[32,34];
(3) generating receiver operating characteristic curves and precision-recall curves[33];
(4) comparing the outcomes of the method with results obtained from other CNNs[35,37,41];
(5) measuring sensitivity, specificity, and accuracy[37,38];
(6) comparing the accuracy of the deep CNN method with that of a Bayesian model and the benchmark method[40,41];
(7) comparing the outcomes with visual inspection by expert radiologists (precision and recall rates)[37];
(8) comparing the performance of the CNN-based system with non-network state-of-the-art methods for liver lesion classification[37].

Some studies did not measure sensitivity or specificity[34,40,41].
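Most of the measures listed above derive from a confusion matrix or from ranked classifier scores. The plain-Python sketch below computes sensitivity, specificity, precision (positive predictive value), and accuracy from hypothetical counts. It also computes the area under the ROC curve through its rank-based (Mann-Whitney) formulation. It is illustrative only and uses no data from the reviewed studies.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # recall / true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": tp / (tp + fp),    # positive predictive value (PPV)
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation:
    the probability that a random positive scores above a random negative,
    counting ties as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts and scores, for illustration only
m = confusion_metrics(tp=45, fp=10, tn=40, fn=5)
print(m["sensitivity"], m["specificity"], m["accuracy"])  # 0.9 0.8 0.85
print(round(m["precision"], 3))                           # 0.818
auc = roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(round(auc, 3))                                      # 0.889
```

Reporting sensitivity and specificity alongside PPV matters because, as the toy numbers show, a system can have high sensitivity while a noticeable share of its positive calls are false.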
Yasaka et al[31] reported a median accuracy of 0.84 for the differential diagnosis of liver masses in the test data. The area under the receiver operating characteristic curve was 0.92.

Ben-Cohen et al[32] reported satisfactory results. Using 3-fold cross-validation, their experiments yielded a true positive rate of 94.6% with 2.9 false positives per case.

Trivizakis et al[33] reported a classification performance of 83% for the 3D CNN vs 69.6% and 65.2% for two 2D CNNs. This indicates that the 3D approach improved tissue classification accuracy.

Bharti et al[34] built a CNN system that classified four liver stages with an accuracy of 96.6%.

Other studies showed that the proposed methods performed better than related techniques[35,36,40,41]. Li et al[35] externally validated their proposed model, a multiple fully connected CNN combined with an extreme learning machine, using Hep-2 cells. The results indicated that the method can be generalised to grading HCC nuclei.

The work of Vivanti et al[38] showed an overlap error of 17% (SD = 11.2) and a surface distance of 2.1 mm (SD = 1.8), far better than stand-alone segmentation. This system did not require large annotated training datasets, which is an advantage over other systems[38].

Another study by Vivanti et al[39] achieved a true positive detection rate for new tumours of 86% vs 72% with stand-alone detection. The tumour burden volume overlap error was 16%. This means that the method, applied to follow-up CT scans, enables not only the detection of new tumours but also the estimation of tumour burden volume. Both are required in the diagnosis and management of liver tumours[39].
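Two kinds of figure above, overlap error and detection rate with false positives per case, can be illustrated on toy data. The sketch below assumes the commonly used definition of volume overlap error, 1 - |A ∩ B| / |A ∪ B| (one minus the Jaccard index), on sets of voxel coordinates. Whether each reviewed study used exactly this definition is not stated in the text, and all numbers are hypothetical.

```python
def volume_overlap_error(segmentation, reference):
    """VOE = 1 - |A ∩ B| / |A ∪ B| for two sets of voxel coordinates."""
    union = segmentation | reference
    if not union:
        return 0.0  # both empty: treat as perfect agreement
    return 1.0 - len(segmentation & reference) / len(union)


def detection_summary(detected, total_lesions, false_positives, n_cases):
    """True positive (detection) rate and mean false positives per case."""
    return detected / total_lesions, false_positives / n_cases


# Hypothetical 10 x 10 x 10 reference tumour and a segmentation shifted by one voxel
reference = {(x, y, z) for x in range(10) for y in range(10) for z in range(10)}
segmentation = {(x + 1, y, z) for (x, y, z) in reference}
print(round(volume_overlap_error(segmentation, reference), 3))  # 0.182

# Hypothetical detection counts: 18 of 20 lesions found, 6 false positives over 4 cases
print(detection_summary(18, 20, 6, 4))  # (0.9, 1.5)
```

A one-voxel shift of a 10-voxel-wide object already gives an error of about 18%. This is a reminder that overlap errors in the mid-teens can correspond to fairly close delineations for small lesions.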
Table 2 Studies on convolutional neural networks and liver masses and liver cancers
CNNs: Convolutional neural networks; CT: Computed tomography.
The inter-rater agreement between evaluators had overall κ scores in the range of 0.779-0.894.
During the last 5-6 years, a significant shift in research trends in image processing has taken place. The use of artificial intelligence in the field has moved from hand-crafted algorithms to deep learning architectures[14]. In this study, our aim was to assess the use of CNNs for deep learning analysis of images of liver cancer (HCC and liver secondaries or other liver masses). Eleven studies were identified. Heterogeneous data, gaps in the reported results, and variability in study design made it impossible to conduct a meta-analysis. Nevertheless, several points were evident from these studies.

First, deep learning architectures, particularly CNNs, have been usefully implemented in the medical imaging domain. The studies have shown the accuracy of CNNs in analysing, segmenting, and classifying radiology images (CT, ultrasound, and MRI scans) and histopathological and cellular images of liver cancers. The number of original full articles on each cancer was small, and all 11 studies were published in 2017 and 2018. This reflects the fact that deep learning with CNNs, and its application in the medical sciences, is a recently developed discipline[42].

Second, technical information, such as test data preparation and the development of training and evaluation algorithms, is vital. However, information about the sources of images, the patients involved in the study, and the clinical information collected is equally important. In this review, the methods used varied and were not standardised. These differences might be related to the focus of the journals that published the studies and to the background of the authors (medical vs engineering). There is therefore a need for studies whose authorship equally represents medicine, computer science, and engineering. Such studies should address both aspects in their methods and results, to maximise readability and applicability across the disciplines involved.

Third, it would be of interest to compare the performance of different studies validated on the same datasets. However, most studies did not give enough information about the exact sources of their datasets, so it was not possible to trace which studies used the same testing protocol[43]. Such information is vital for comparison and assessment. We hope that journals interested in publishing studies on deep learning, CNNs, and artificial intelligence will develop standardised guidelines that require authors to state such information, including the sources of datasets and the testing protocols used.
This study, however, is not free from limitations. The studies varied in patient type, study design, design of the CNN models, diseases included, and details of the methods used[44]. Given this diversity, the findings must be interpreted with caution. Publication bias may have prevented negative studies from being published. Only studies published in English were included, and relevant articles on this topic may have been published in other languages. However, this study presents what has been accomplished in this area in the English-language literature.
This study highlights a number of future directions for research using CNNs to interpret liver cancer images.

First, improvements in CNN design have made relatively small training sets sufficient, as in the models presented by Vivanti et al[38,39]. Even so, there is a need for multi-institute and multicentre collaborations on studies that include a large number of patients. These should cover patients with cirrhosis of different aetiologies and patients with HCC on a background of cirrhosis, liver secondaries, or other liver masses. Such collaborations could resolve concerns about the insufficient amount of training data in the medical image domain. They would also enable measurement of the accuracy and performance of CNN algorithms. Several researchers reported difficulties in obtaining images. This makes the direct application of machine learning algorithms to medical datasets inappropriate and hence limits the capacity to perform image classification or segmentation with high accuracy[42,44].

Second, differentiation between primary and secondary (metastatic) liver tumours or other liver masses is difficult on the basis of radiological imaging. The studies included in this review showed that CT- and MRI-based deep learning methods can enable the categorisation of liver metastases[32,33]. Although such differentiation may be clinically useful, other priorities need further exploration:
(1) examination of HCC arising in cirrhotic liver, to characterise HCC in cirrhotic and non-cirrhotic liver tissue[34];
(2) longitudinal liver CT scan studies comparing outcomes with existing stand-alone and follow-up methods.
We hypothesise that longitudinal studies could enable researchers to compare changes with the baseline scan and thus offer better detection of new small tumours[38,39].

Third, we need case-control studies in which CNNs are compared with manual assessment of images by experts such as radiologists, hepatologists, and pathologists. One of the major challenges with CNNs is choosing discriminant features that represent the clinical characteristics. These features must then serve as key features in the segmentation and classification functions of the CNN algorithm. Again, this goal cannot be achieved without collaboration between medical experts (radiologists and pathologists), computer engineers and programmers, and the experts who design these systems.

Fourth, future studies should give more attention to assessing the accuracy and sensitivity of CNNs when evaluating system performance, and to calculating positive predictive values. Ideally, a study should apply two or three different methods to the same set of images and compare their accuracy parameters. Such studies are currently lacking in the literature. Any comparison of accuracy is therefore suboptimal, because many variables differ between the reported methods/models[45].
With increasing research applying CNNs to liver cancer images, there is a need to evaluate their accuracy carefully and to define future research directions. Current studies have covered the major liver cancers, but the number of studies conducted so far is small. More research is needed to answer questions about the accuracy and sensitivity of CNN algorithms. CNNs have demonstrated abilities in segmentation, classification, and lesion detection in radiological and anatomical pathology images of common cancers. However, several deficiencies in current studies were observed. In most studies, the methods sections were unbalanced between three components: the description of the patients involved, the medical component, and the technical, computer-related component. Furthermore, CNNs need to be compared with other models, particularly with regard to the accuracy and sensitivity of each model on the same set of images. Future studies addressing these areas should therefore be multi-institute, multicentre collaborations that include a large number of patients. This is particularly important in view of the growing demand for CNNs in liver oncology.
This study highlighted several aspects of convolutional neural networks (CNNs). First, CNNs have potential for identifying HCC and differentiating it from other liver masses with high accuracy. Second, CNNs can perform several functions concerning liver cancer, including lesion detection, classification, and segmentation. Third, the use of CNNs in liver cancer is not limited to radiological images; they are also of value in pathological and cellular studies. However, the study identified several limitations in the literature:
- the small number of studies on the topic;
- the lack of multicentre studies;
- the lack of longitudinal liver computed tomography (CT) scan studies comparing outcomes with existing stand-alone and follow-up methods.

Such longitudinal studies could allow researchers to compare changes with the baseline scan and thus offer better detection of new small liver tumours.
Interest in deep learning has recently increased in research, particularly in CNNs, a class of artificial neural networks that has been widely used in biomedical research. This study reviews the current literature on the use of CNNs in assessing hepatocellular carcinoma (HCC) and liver masses. It also considers how such advanced technology can help improve clinical diagnosis.
The study focuses on an evolving field in gastroenterology and oncology with promising outcomes. However, several researchers reported difficulties in obtaining images. This makes the direct application of machine learning algorithms to medical datasets inappropriate and hence limits the capacity to perform image classification with high accuracy. Two developments are therefore needed: improvements in CNN design, and multi-institute, multicentre collaborations. These collaborations should include a large number of patients with cirrhosis of different aetiologies and patients with HCC on a background of cirrhosis or with liver secondaries.
The study aimed to assess the use of CNNs in examining images of HCC and liver masses for the diagnosis of cancer, and to evaluate the accuracy and performance of CNNs.
Several databases, including PubMed, EMBASE, and Web of Science, were systematically searched for studies that applied CNNs to pathological anatomy, cellular, or radiological images of HCC or liver masses. Data were extracted according to a predefined extraction protocol. The accuracy and performance of the CNNs in detecting cancer or early stages of cancer were then analysed. The primary outcomes were the type of cancer or liver mass investigated and the type of images that showed optimum accuracy in cancer detection.
A small number of studies were identified. The studies demonstrated the ability of CNNs to:
- differentiate liver masses;
- differentiate HCC from other liver lesions and from cirrhosis;
- detect the development of new tumours.

Two studies focused on grading or segmentation of HCC nuclei. The studies addressed lesion detection, classification, and segmentation, and in these studies CNNs showed satisfactory levels of accuracy. Several methods were used to assess the accuracy of the CNN models.
Current studies have covered liver cancers, but the number of studies conducted so far is small. More research is needed to answer questions about the accuracy and sensitivity of CNN algorithms. CNNs demonstrated abilities in segmentation, classification, and lesion detection in radiological and anatomical pathology images of common cancers. However, several deficiencies in current studies were observed.
A large multicentre trial is needed to evaluate carefully the use of CNNs and their clinical applications in HCC and liver masses. Differentiation between primary and secondary (metastatic) liver tumours or other liver masses is difficult on the basis of radiological imaging. The studies included in this review showed that CT- and MRI-based deep learning methods could distinguish liver metastases from primary liver cancers. Future studies should give more attention to assessing the accuracy and sensitivity of CNNs when evaluating system performance, and to calculating positive predictive values.
ACKNOWLEDGEMENTS
The author would like to thank Dr Sarah Azer of St Vincent Hospital, University of Melbourne, for her help in writing this research article.
World Journal of Gastrointestinal Oncology, 2019, Issue 12