Comparing diagnostic tests: verification bias
An article in this week's Archives of Internal Medicine discusses a
limitation of study design and execution that can creep into
comparisons of diagnostic testing options, an issue known as
verification bias.
The scenario:
You have a new diagnostic test, Exciting Test A, that may be an option
for seeing if patients have Awful Disease X.
You also have Old-Standby Test B, the existing "gold standard" test
for diagnosing Awful Disease X ("gold standard" means that Old-Standby
Test B is the best tool you've had up to now for figuring out whether
someone has Awful Disease X).
You want to set up a study to see if Exciting Test A is an accurate
test for diagnosing this disease, in comparison to the Old-Standby.
There are lots of pitfalls in designing this kind of study (the
Bandolier site has a really good discussion of the most common
potential limitations of diagnostic studies).
An example of one of these pitfalls - verification bias:
The study by Lauer et al. in this week's Archives estimates the impact
of verification bias. This kind of bias happens when everyone in the
study gets Exciting Test A, but not everyone gets Old-Standby Test B -
i.e., the "truth" of the Test A results is not verified across the
whole set of patients by Test B, the test that defines true disease
status.
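To see why this matters, here is a minimal sketch with invented numbers (nothing to do with the study below): if the gold-standard test is applied mostly to patients whose Test A result was positive, the diseased patients that Test A misses rarely show up as false negatives in the verified group, so the sensitivity you observe is inflated and the specificity is deflated.

```python
# Hypothetical cohort of 1,000 patients -- all numbers invented for illustration.
# True status: 300 have Awful Disease X, 700 do not.
# Suppose Exciting Test A truly has 80% sensitivity and 60% specificity here.

true_pos  = 240   # 300 * 0.80: diseased patients with a positive Test A
false_neg = 60    # diseased patients with a negative Test A
false_pos = 280   # 700 * 0.40: healthy patients with a positive Test A
true_neg  = 420   # healthy patients with a negative Test A

# Now suppose verification with Old-Standby Test B depends on the Test A result:
# 90% of Test-A positives get verified, but only 20% of Test-A negatives do.
p_verify_pos, p_verify_neg = 0.90, 0.20

v_tp = true_pos  * p_verify_pos   # verified true positives
v_fp = false_pos * p_verify_pos   # verified false positives
v_fn = false_neg * p_verify_neg   # verified false negatives
v_tn = true_neg  * p_verify_neg   # verified true negatives

# Sensitivity and specificity computed only on the verified subset
observed_sens = v_tp / (v_tp + v_fn)
observed_spec = v_tn / (v_tn + v_fp)

print(f"true sensitivity 0.80, observed {observed_sens:.2f}")   # ~0.95 (inflated)
print(f"true specificity 0.60, observed {observed_spec:.2f}")   # ~0.25 (deflated)
```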
The reference: Lauer MS, Murthy SC, Blackstone EH, Okereke IC, Rice
TW. [18F]Fluorodeoxyglucose Uptake by Positron Emission Tomography for
Diagnosis of Suspected Lung Cancer: Impact of Verification Bias. Arch
Intern Med. 2007;167:161-165 (abstract).
What this study looked at:
- The patient population: 534 patients with suspected lung cancer
(Awful Disease X)
- Exciting Test A: PET scan
- Old-Standby Test B: tissue diagnosis (including mediastinoscopy,
transbronchial biopsy, thoracotomy, percutaneous fine needle
aspiration, or thoracentesis)
- 419 patients (78%) underwent both PET scan and tissue diagnosis. In
this group, the sensitivity of PET scanning (the proportion of people
with the disease who test positive) was 95% and the specificity (the
proportion of people without the disease who test negative) was 31%
(both figures relate to the test's ability to detect cancer at any
site; the first sketch after this list walks through the arithmetic
with invented counts).
- The authors used two methods to adjust for verification bias (since
115 patients underwent only PET scanning): the Diamond method
(relatively simple) and the Begg-Greenes method (a more complex
formula; the second sketch after the list shows how it works).
- Using the Diamond method, the adjusted sensitivity was 87% and the
adjusted specificity was 55%. The Begg-Greenes method yielded a
sensitivity of 85% and a specificity of 51%. So, with each method of
adjustment, sensitivity went down (a lower percentage of people with
lung cancer actually tested positive) and specificity went up (a
higher percentage of people without lung cancer actually tested
negative).
- "Real world" meaning of these estimates -- a higher proportion of
diagnoses of lung cancer were probably missed by PET scanning when it
was not accompanied by tissue diagnosis -- so a greater number of lung
cancer cases were missed by the PET-scan-only approach than the
results would indicate if you didn't account for verification bias
(i.e. if you ignore the potential impact of verification bias, PET
scanning looks better than it actually is for diagnosing lung cancer).
- The authors conclude that verification bias in this case has a
substantial impact on the measures of diagnostic accuracy for PET in
assessing cases of suspected lung cancer, and suggest that clinicians
should "lower their threshold for proceeding to definitive tissue
diagnosis in the setting of negative PET scan findings."
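As promised above, here's the basic sensitivity/specificity arithmetic. The 2x2 counts are invented, chosen only so the output lands near the unadjusted 95% / 31% figures reported above; they are not the paper's actual table.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Invented counts for the verified patients (PET result crossed with tissue diagnosis)
sens, spec = sensitivity_specificity(tp=200, fp=90, fn=11, tn=40)
print(f"unadjusted sensitivity {sens:.0%}, unadjusted specificity {spec:.0%}")
```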
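And here is a sketch of the Begg-Greenes correction itself (Begg and Greenes, 1983). The idea: assume the decision to verify depends only on the index-test result; estimate the probability of disease given each PET result from the verified patients, estimate the probability of each PET result from the whole cohort, and recombine the pieces with Bayes' theorem to get bias-corrected sensitivity and specificity. This is not the authors' code, and the unverified counts below are invented, like the verified counts in the previous sketch.

```python
def begg_greenes(v_tp, v_fp, v_fn, v_tn, unverified_pos, unverified_neg):
    """Verification-bias-corrected sensitivity and specificity (Begg & Greenes, 1983).

    v_*            -- counts among patients verified by the gold standard
                      (index test +/- crossed with disease +/-)
    unverified_pos -- index-test-positive patients who were never verified
    unverified_neg -- index-test-negative patients who were never verified

    Assumes the decision to verify depends only on the index-test result.
    """
    # Probability of disease given each index-test result, from verified patients
    p_dis_given_pos = v_tp / (v_tp + v_fp)
    p_dis_given_neg = v_fn / (v_fn + v_tn)

    # Probability of each index-test result, from the whole cohort
    n_pos = v_tp + v_fp + unverified_pos
    n_neg = v_fn + v_tn + unverified_neg
    p_pos = n_pos / (n_pos + n_neg)
    p_neg = 1.0 - p_pos

    # Bayes' theorem, reassembled from the pieces estimated above
    sens = (p_dis_given_pos * p_pos) / (
        p_dis_given_pos * p_pos + p_dis_given_neg * p_neg)
    spec = ((1 - p_dis_given_neg) * p_neg) / (
        (1 - p_dis_given_neg) * p_neg + (1 - p_dis_given_pos) * p_pos)
    return sens, spec

# Same invented verified counts as above, plus invented unverified counts
sens, spec = begg_greenes(v_tp=200, v_fp=90, v_fn=11, v_tn=40,
                          unverified_pos=30, unverified_neg=85)
print(f"corrected sensitivity {sens:.0%}, corrected specificity {spec:.0%}")
# Corrected sensitivity comes out lower and corrected specificity higher --
# the same direction of change the study reports.
```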
Another prominent evaluation of verification bias:
Punglia RS, D'Amico AV, Catalona WJ, Roehl KA, Kuntz KM. Effect of
verification bias on screening for prostate cancer by measurement of
prostate-specific antigen. N Engl J Med. 2003;349:335-342. (full-text)
Labels: diagnosis, research methods, verification bias
posted by Becky @ 12:49 PM