Thomas Whish-Wilson *, Jo-Lynn Tan, William Cross, Lih-Ming Wong, Tom Sutherland
a Department of Surgery, St Vincent’s Hospital Melbourne, 41 Victoria Pde, Fitzroy VIC, Australia
b Department of Surgery, The University of Melbourne, Melbourne VIC, Australia
c Faculty of Medicine, The University of Melbourne, Melbourne VIC, Australia
d Medical Imaging Department, St Vincent’s Hospital Melbourne, 41 Victoria Pde, Fitzroy VIC, Australia
KEYWORDS Prostate cancer; Magnetic resonance imaging; Prostate Imaging-Reporting and Data System; Intrareader; Prostate biopsy
Abstract Objective: To measure the intraobserver concordance of an experienced genitourinary radiologist’s reporting of multiparametric magnetic resonance imaging of the prostate (mpMRIp) scans over time. Methods: An experienced genitourinary radiologist re-reported his original 100 consecutive mpMRIp scans using Prostate Imaging-Reporting and Data System version 2 (PI-RADS v2) after 5 years of further experience comprising >1000 scans. Intraobserver agreement was measured using Cohen’s kappa. Sensitivity, specificity, negative predictive value (NPV), positive predictive value (PPV), and accuracy were calculated, and comparison of sensitivity was performed using McNemar’s test. Results: Ninety-six mpMRIp scans were included in our final analysis. Of the 96 patients, 53 (55.2%) underwent subsequent biopsy (n=43) or prostatectomy (n=15), with 73 lesions targeted. Moderate agreement (Cohen’s kappa 0.55) was seen in the number of lesions identified at initial reporting and on re-reading (81 vs. 39 total lesions; and 71 vs. 37 PI-RADS ≥3 lesions). For clinically significant prostate cancer, re-reading demonstrated an increase in specificity (from 43% to 89%) and PPV (from 62% to 87%), but a decrease in sensitivity (from 94% to 72%, p=0.01) and NPV (from 89% to 77%). Conclusion: The intraobserver agreement for a novice to experienced radiologist reporting mpMRIp using PI-RADS v2 is moderate. Reduced sensitivity is offset by improved specificity and PPV, which validate mpMRIp as a gold standard for prebiopsy screening.
Multiparametric magnetic resonance imaging of the prostate (mpMRIp) has altered practice around diagnosis and management of prostate cancer (PCa). Multiple studies have demonstrated that mpMRIp performed prior to biopsy has greater sensitivity compared to non-targeted systematic transrectal ultrasound (TRUS) prostate biopsy, high negative predictive value (NPV) (PROMIS study), and improved detection of clinically significant PCa (csPCa) with less insignificant cancer found (PRECISION study) [1-4]. In the PROMIS study, mpMRIp targeted biopsies had higher sensitivity in detecting PCa than systematic TRUS biopsies. PROMIS showed that when using mpMRIp as a screening tool, 27% of patients would avoid biopsy and there would be a 5% reduction in clinically insignificant cancer diagnoses [2].
There has been a universal acceptance and incorporation of mpMRIp into PCa diagnostic algorithms [5,6]. To maintain quality, the increased demand and volume of scans must be balanced against the highly specialised nature of mpMRIp reporting; otherwise, reports lose their value to clinicians.
A criticism of Prostate Imaging-Reporting and Data System version 2 (PI-RADS v2) reporting, which improves on PI-RADS v1 with the introduction of a dominant sequence and improved stratification of PI-RADS Grade 3 lesions, is that there is wide variation in performance outside of highly specialised centres. Comparison between specialised centres and lower volume centres reports a wide range of accuracies (44%-87%) and NPV (63%-98%) [1,3,7]. mpMRIp reporting specificity has been shown to be related to radiologist experience, declining from 84.0% among experienced radiologists to 55.2% among radiologists with lower levels of experience [8]. In addition to radiologist experience, the wide variation seen in reporting may be partly attributed to the subjective nature of interpretation of apparent diffusion coefficient signal intensities and dynamic contrast enhanced sequences [9]. Unsurprisingly, interobserver agreement improves with increased radiologist experience, as seen at large specialised tertiary hospitals [10], but reduces with statistical significance (kappa score 0.41) in lower volume centres [11].
At our tertiary hospital in Melbourne, Australia, mpMRIp was introduced in 2013. Since then our radiologists have reported over 1000 scans. We aimed to study the early performance of an Australian tertiary centre by quantifying the initial performance of a now very experienced genitourinary radiologist. Our study hypothesis was that there would be a poor to moderate concordance, and higher accuracy, sensitivity, specificity, positive predictive value (PPV), and NPV on re-reading. Despite the high-volume use of mpMRIp in PCa diagnosis and management in Australia, this is to our knowledge the first Australian study examining these outcomes.
Electronic clinical patient records were cross-referenced with an internally maintained database of prostate MRIs in accordance with local institution policy. An experienced genitourinary radiologist was blinded to the original reports of the first 100 consecutive scans that he had ever reported, from August 29, 2013 to October 20, 2015. For the scans and biopsies analysed at our institution, urologists performed the biopsies without radiologist involvement. The radiologist improved his learning and skill acquisition through feedback from pathology reports and ongoing professional development; there was no formalised training pathway other than real world experience. The original scans were re-reported by the radiologist after 5 years of further experience. All scans were obtained using a Skyra MRI scanner (Siemens Healthcare, Erlangen, Germany) with a 3 T magnet and an external body receiver coil, with high b-values of >1400 to fulfil the guidelines of the European Society of Urogenital Radiology [12]. Scans were performed following intravenous administration of 10 mg hyoscine butylbromide (IQVIA Inc., Durham, NC, USA) and a single 7.5 mL dose of Gadovist contrast medium (Bayer AG, Leverkusen, Germany) followed by 20 mL of normal saline. Four scans were considered nondiagnostic due to artefacts and were therefore excluded, leaving 96 eligible studies. Lesions identified in the original reports were retrospectively converted from PI-RADS v1 to PI-RADS v2 by a single author to enable comparison. The original radiologist was blinded to original reports and histology results before re-reporting the 96 scans, giving a median washout period of 5 years (interquartile range: 4-6 years). The project was given approval (LRR 040/15) by the Low Risk Research Sub-committee of the Human Research Ethics Committee (St Vincent’s Hospital, Melbourne, Victoria, Australia).
Variables were collected including age, indication, prostate-specific antigen (PSA), radiological result (prostate volume, number of lesions, overall PI-RADS score of lesions, extraprostatic extension, and seminal vesicle invasion), and histology (biopsy and/or prostatectomy). When available, prostatectomy histology was used preferentially over biopsy histology.
Intraobserver agreement between the original and re-reading scans was recorded and Cohen’s kappa was calculated. Agreement was recorded on a per scan and per lesion basis. If both reads recorded no lesion, it was counted as one agreement; if one read recorded one lesion but the other recorded two, it was recorded as 1/2. The Cohen’s kappa scores were calculated for all lesions of International Society of Urological Pathology (ISUP) Grade ≥1, ISUP ≥2 (of any core length), and for all MRI lesions. PI-RADS ≥3 lesions were considered to be a positive indication for biopsy. To analyse secondary outcomes of accuracy, sensitivity, and specificity, mpMRI lesions were compared to histology from either prostate TRUS-guided biopsy or radical prostatectomy when available. Histology results obtained greater than 1 year after MRI were excluded.
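As an illustration of the agreement statistic described above, the following is a minimal sketch of unweighted Cohen’s kappa for two reads of the same scans. The function name and the example labels are hypothetical and are not study data; the sketch also does not reproduce the fractional (1/2) per-lesion credit described above, which would require the per-lesion records.

```python
from collections import Counter


def cohens_kappa(first_read, second_read):
    """Unweighted Cohen's kappa for two paired lists of categorical labels."""
    if len(first_read) != len(second_read) or not first_read:
        raise ValueError("reads must be non-empty and of equal length")
    n = len(first_read)

    # Observed agreement: proportion of scans given the same label twice.
    observed = sum(a == b for a, b in zip(first_read, second_read)) / n

    # Chance agreement: product of the marginal label frequencies of each read.
    counts_1 = Counter(first_read)
    counts_2 = Counter(second_read)
    labels = set(counts_1) | set(counts_2)
    expected = sum(counts_1[c] * counts_2[c] for c in labels) / (n * n)

    if expected == 1.0:
        # Both reads used a single identical label; agreement is complete.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical example (not study data): 10 scans labelled positive
# (PI-RADS >=3) or negative on an original read and a re-read.
original = ["pos"] * 6 + ["neg"] * 4
reread = ["pos"] * 4 + ["neg"] * 2 + ["neg"] * 3 + ["pos"] * 1
kappa = cohens_kappa(original, reread)
print(round(kappa, 2))
```

In this made-up example the two reads agree on 7 of 10 scans, chance agreement is 0.5, and kappa is therefore 0.4.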
Sensitivity, specificity, PPV, NPV, and accuracy were calculated for both rounds. Statistical measures were calculated under two conditions of clinical significance: ISUP ≥1 and ISUP ≥2. These were done for all PCa and csPCa (ISUP ≥2 of any core length). McNemar’s test was used to compare the sensitivity of both reads.
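For readers less familiar with these measures, the sketch below shows how they follow from a 2×2 table of MRI result against histology, and how an exact (binomial) McNemar’s test can be computed from the discordant pairs of two reads. The function names and all counts are hypothetical, and the exact two-sided variant is an assumption, since the specific form of McNemar’s test used is not stated here.

```python
from math import comb


def diagnostic_metrics(tp, fp, fn, tn):
    """Standard accuracy measures from a 2x2 table (assumes non-zero margins)."""
    return {
        "sensitivity": tp / (tp + fn),  # cancers correctly called positive
        "specificity": tn / (tn + fp),  # non-cancers correctly called negative
        "ppv": tp / (tp + fp),          # positive calls that were cancer
        "npv": tn / (tn + fn),          # negative calls that were not cancer
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant pair counts.

    b: cancers detected on read 1 but missed on read 2
    c: cancers missed on read 1 but detected on read 2
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) tail probability of the smaller discordant count.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts (not study data).
metrics = diagnostic_metrics(tp=18, fp=2, fn=6, tn=24)
p_value = mcnemar_exact(b=7, c=1)
print(metrics["sensitivity"], metrics["ppv"], p_value)
```

Because sensitivity is compared within the same set of histology-confirmed cancers, only the cancers on which the two reads disagree contribute to the test.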
Ninety-six mpMRIp studies were included in the total eligible sample. Of the 96 patients included, 53 (55.2%) underwent a cognitive fusion TRUS biopsy (n=43) or prostatectomy (n=15); this corresponded to 73 of the detected lesions targeted (Tables 1 and 2). This included two PI-RADS 2 lesion biopsies, 10 PI-RADS 3 lesion biopsies, 36 PI-RADS 4 lesion biopsies, and 25 PI-RADS 5 lesion biopsies.
Indications for the mpMRI were: clinical suspicion of PCa (n=55, 57.3%), ongoing active surveillance for known low grade PCa (ISUP 1) (n=27, 28.1%), and preoperative planning (n=14, 14.6%) in patients who had a prior diagnosis of csPCa and were imaged for operative planning or further lesion detection after incidental diagnosis post transurethral resection of the prostate for bladder outlet obstruction.
Table 1 Patient demographics (n=96).
Table 2 Number of lesions detected on original versus reread.
In our study sample, not all biopsies proceeded to prostatectomy, and not all prostatectomies were preceded by a biopsy (e.g., MRI following a transurethral resection of the prostate for staging purposes, or referred from external sources after an original biopsy). A total of 73 lesions were targeted. These comprised 71 targets that were PI-RADS ≥3, and two lesions that were PI-RADS 2. Both PI-RADS 2 lesions were still targeted with a biopsy as they were secondary lesions in a patient who had an index lesion for targeting.
Original reporting identified 81 lesions, with 71 lesions graded as PI-RADS ≥3. On the second reading, only 39 lesions were interpreted as significant for reporting, and 37 lesions were PI-RADS ≥3. This demonstrated moderate agreement in reporting outcomes for all scans (Cohen’s kappa score 0.55). The intraobserver reliability for both ISUP ≥1 and ISUP ≥2 lesions showed moderate agreement (kappa score: 0.49 vs. 0.47). Seventy-three of the originally reported lesions (representing 53 patients) underwent subsequent biopsy or prostatectomy.
Sensitivity was higher for all PCa detection and csPCa for the initial read (PCa: 87% vs. 55%; csPCa: 94% vs. 72%) (Table 3), but came at the expense of specificity, which was significantly higher for csPCa with the re-reporting (89% vs. 43%) (Table 3).
In this series assessing intraobserver agreement, moderate agreement (kappa score=0.55) was found. This was a series comparing the initial learning curve of 96 mpMRIs reported, to re-reporting by the same radiologist with a subsequent 5 years of experience and over 1000 more studies reported. Given the median 5-year interval between review of scans, the moderate agreement was better than expected but consistent with published agreements by Smith et al. [13], who reported moderate to substantial (kappa scores=0.43-0.67) intraobserver agreement. It was expected that when re-reading mpMRIp after such a significant learning curve, there would be a level of agreement analogous to a novice versus expert comparison (kappa score=0.55), which was the case [14]. The sensitivity of the original read was significantly better for detection of all PCa (0.87 vs. 0.55, p=0.01), but there was no difference in the sensitivity for detection of csPCa (0.94 vs. 0.72, p=0.28) (Table 3). This may be due to the radiologist’s initial inexperience with the imaging modality, which may have resulted in a tendency for more conservative reporting to avoid missing more serious cancers, thus leading to more lesions being reported and resulting in better sensitivity but poorer specificity.
Table 3 The accuracy in magnetic resonance imaging reporting.
When performing a sub-analysis of the csPCa false negatives reported on the second read, three were for active surveillance (one was a non-diagnostic scan due to motion artefact; another was an occult cancer; and the third was <0.5 mm); three were for preoperative staging, which called the blinding into question as the original reads were privy to clinical information (one was too soon after biopsy with a large amount of blood causing artefact; another was occult; and the last one was reader error); two were for clinical suspicion of PCa (one was an equivocal lesion, and the other was after a recent bout of prostatitis in a 39-year-old with a low pretest probability). Only one of these patients would have had a missed biopsy in a real world setting. This result confirms that PI-RADS v2 is useful as a tool for reporting but meaningful inferences can only be made when using clinical information.
Experience greatly improved the PPV of mpMRI for all PCa (82% vs. 90%), and more importantly csPCa (62% vs. 87%). This gives urologists a modicum of certainty when discussing the possibility of prostate biopsy with a patient. It also confirms performance outcomes equivalent to the PROMIS trial, which reported mpMRIp NPV for any Gleason score ≥3+4 of 76% [2]. Greer et al. [8] conducted a multicentre (eight) and multinational (six) study assessing performance of PI-RADS v2 in real world conditions with nine radiologists reporting 163 scans; they reported an overall sensitivity of 80.9%, which is slightly better than our result (72%). Greer et al. [8] found that specificity was dependent on experience (84.0% vs. 55.2%), which was confirmed in this study. Their results suggested that radiologists inexperienced (defined as <500 cases in 2 years) in mpMRIp were more likely to miss lesions. On the contrary, our study found that inexperience resulted in a more cautious approach, resulting in over-reporting of higher grade lesions giving a higher sensitivity with lower specificity, and thus a lower PPV. This suggests that some radiologists early in the mpMRIp learning curve may have a tendency to over-report higher grade lesions, perhaps as a result of not wanting to miss more serious cancers. Despite re-reporting having a higher rate of false negatives, our analysis showed that this would have had minimal impact on patient outcomes or treatment options. This is in agreement with findings by Mohammadian Bajgiran et al. [15], which demonstrated that the majority of lesions missed on mpMRIp but later diagnosed on whole-mount histopathology are clinically insignificant.
A prospective study by Kim et al. [16] of 295 patients with 478 lesions showed a sensitivity of 90% and specificity of 80% for detecting csPCa (defined as ISUP ≥2) using PI-RADS 4 as a cut-off, giving PPV of 83.3% and NPV of 81.8%. When the clinical threshold for performing a biopsy was lowered to PI-RADS 3, accuracy decreased to 68.6% but sensitivity increased to 94.6% with a specificity of 58.7%. Importantly, PPV decreased to 51.6% and NPV 85.7% [16]. Similarly, this study involved a single radiologist reporting on their mpMRIp outcomes. Using PI-RADS ≥3 lesions as a cut-off, which is a surrogate for real-world decision making, we have demonstrated NPV and PPV equivalent to using a PI-RADS 4 as a cut-off. This suggests that with growing experience in mpMRIp reporting in Australia, the accuracy of MRI outcomes is increasing, allowing for greater certainty for urologists when counselling patients about management options such as biopsy or surgery.
Importantly, radiologists early in the mpMRIp learning curve tend to diagnose more indeterminate PI-RADS 3 lesions, resulting in more false positives. Stolk et al. [17] quantified false positives by lesion type, reporting a false positive rate of 100% for inexperienced radiologists (defined as <100 cases), compared to only 71% false positives for experienced radiologists (defined as >500 cases). Results of this study corroborate this higher PI-RADS 3 determination with the initial read, reporting 10 of these lesions compared to five. Our study demonstrates that after 5 years of experience in mpMRIp, radiologists’ reporting accuracy was significantly higher.
Consistency and agreement remain major concerns for mpMRIp, with reproduction of scans being inconsistent. Pre-biopsy scans repeated 4 weeks after initial scan reported by Müller et al. [18] showed poor concordance. However, compared to PSA screening alone, which has a high false positive rate of 75.9%, conferring a 5.5% risk of biopsy due to a false positive PSA screen, mpMRIp is still the gold standard pre-biopsy screen [19]. A study involving subspecialists re-reporting the scans of 158 patients originally reported by regional radiologists revealed that the subspecialists were more likely to call a study negative (41% vs. 20%); all studies were biopsied and the number of Gleason scores ≥4+3 that a subspecialist would miss was 3% compared to 16% [20]. These results are largely consistent with our study. This way of improving the poor concordance of mpMRIp by use of a second reader was demonstrated by Luzzago et al. [21], showing that 53% of scores were changed, avoiding 33.5% of total prostate biopsies.
This study highlights the shortcomings of human interpretation of mpMRIp using the PI-RADS system, but it also confirms its utility as an effective screening tool. PI-RADS has not been validated as a diagnostic tool, which is why confirmatory targeted biopsies continue to be required. The clinical utility and strength of PI-RADS lie in the information that it gives clinicians and patients in terms of NPV and PPV. In this study, the radiologist significantly improved specificity, which directly improved NPV and PPV. At the beginning of a learning curve, it can be inferred that the novice called many lesions positive, providing a much better sensitivity to the detriment of accuracy. The experienced read did not have the same sensitivity, but when looking for clinically significant cancer it showed much higher PPV and equivalent NPV, greatly improving the accuracy and thus utility in clinical practice. This is key when counselling patients about results and treatment options such as surgery or active surveillance. Furthermore, these results confirm mpMRIp as an appropriate pre-biopsy test, with sufficient testing accuracy outside of the high volume centres where practice changing papers such as PROMIS were reported [2].
An important limitation of this study is that it is a retrospective study commencing in 2013, when prebiopsy mpMRIp was not as established as it is today. It is also not possible to comment on the total accuracy of the reads with only 54% of patients having available histology. The blinded nature of the repeat reporting makes comparison challenging as most radiologists incorporate clinical information into their decision making when reporting using PI-RADS. However, the false negatives in the re-reporting would have limited impact on patient outcomes, and there is no doubt that a more complete clinical picture incorporating known pathological features, such as PSA, PSA-density, or indication, would have improved accuracy and widened the gap caused by the learning curve.
While accuracy remains an important focus of discussion, consistency is more important for clinicians. In recent years, advances in technology and artificial intelligence have made computer-assisted diagnosis (CAD) a logical adjunct for mpMRIp interpretation. CAD improves not only sensitivity but also agreement across a range of experience levels [22]. Where it requires improvement is specificity, with performance below that of an experienced radiologist, analogous to the improved specificity shown here. The inherent lack of intraobserver reliability in the PI-RADS reporting system remains. There is obvious value in the development and pilot testing of algorithms used in CAD. Ferriero et al. [23] demonstrated that CAD improves detection rate of csPCa when combined with fusion prostate biopsy.
Conclusion
In the current era of men receiving earlier diagnoses of PCa and more men undergoing active surveillance protocols, the potential of greater mpMRIp sensitivity that is conveyed by experienced radiologists’ reports means these men may possibly avoid multiple biopsies over their lifetime.
Author contributions
Study design: Tom Sutherland, Lih-Ming Wong, Thomas Whish-Wilson.
Data acquisition: Thomas Whish-Wilson, Jo-Lynn Tan.
Data analysis: William Cross.
Drafting of manuscript: Thomas Whish-Wilson, Jo-Lynn Tan.
Critical revision of the manuscript: Jo-Lynn Tan, Lih-Ming Wong, Tom Sutherland, Thomas Whish-Wilson.
Conflicts of interest
The authors declare no conflict of interest.
Acknowledgement
This research has been kindly supported by a grant from the St Vincent’s Research Endowment Fund (approval number 55.2014).
Asian Journal of Urology, 2023, Issue 4