ega Vignette

Daniel Schmolze

2017-03-20


Welcome

Welcome to the ega package! This vignette will explain the functionality of the package via hands-on examples, after first discussing error grids in general.

Introduction to Error Grid Analysis

This section explains the basic concepts of error grids for analyzing glucose data. If you’re already familiar with the Clarke and Parkes error grids, feel free to skip ahead.

When a glucose meter is being validated for regulatory or clinical purposes, a method comparison study is usually performed. This entails testing patients using both the reference method and the new meter, and comparing the results. An initial approach might be to simply plot the paired values. Let’s do this for the built-in dataset glucose_data:

library(ega)
library(ggplot2)

ggplot(glucose_data, aes(ref, test)) + geom_point()

There seems to be a fair bit of scatter, which we can quantify with a correlation coefficient:

cor(glucose_data$ref, glucose_data$test)
## [1] 0.8343016

A basic problem with this approach is that it fails to capture the clinical context of the discrepancies. For example, the pair (300, 550) represents a large numerical discrepancy but is unlikely to result in an adverse clinical outcome, since in either case the patient will probably receive insulin. On the other hand, a discrepancy of 70 vs. 110 could have serious clinical consequences, since treatment for hypoglycemia may be administered in the former case, possibly erroneously.
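
To make the contrast concrete, the following base R sketch computes the absolute and relative discrepancy for the two pairs mentioned above. The helper relative_discrepancy is hypothetical (it is not part of ega), and treating the first number of each pair as the reference value is an assumption made only for illustration.

```r
# Hypothetical helper (not part of ega): discrepancy of the test value
# relative to the reference value, as a percentage.
relative_discrepancy <- function(ref, test) {
  abs(test - ref) / ref * 100
}

# The two pairs discussed above; the first value is assumed to be the reference.
pairs <- data.frame(ref = c(300, 70), test = c(550, 110))
pairs$abs_diff <- abs(pairs$test - pairs$ref)
pairs$rel_diff <- relative_discrepancy(pairs$ref, pairs$test)
pairs
```

By both numerical measures the (300, 550) pair looks worse, even though the (70, 110) pair is the one more likely to change treatment. This is the gap that error grids are meant to close.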

Error grids address this problem by placing paired values into “zones” defined by the clinical impact of the discrepancy. There are two major systems in use: the Clarke error grid, and the Parkes, or consensus, error grid. Both systems place paired reference/test values into one of five zones (“A”, “B”, “C”, “D” or “E”) based on the expected clinical impact of the discrepancy.

In the Clarke system, pairs are considered clinically accurate and assigned to zone A if there is no more than a 20% discrepancy (for glucose values greater than 70 mg/dl). Zone B contains pairs with greater than 20% discrepancy, but with no expected adverse clinical consequence. Zone C discrepancies may lead to errors in treatment, but with low risk of adverse clinical outcomes. Together, discrepancies in zones A-C can be considered clinically benign. Zone D discrepancies may lead to erroneous failures to treat, while zone E represents potential for inappropriate treatment.
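
As a rough illustration of the zone A criterion, the base R sketch below checks the 20% rule on a few made-up pairs in mg/dl. The function is_clarke_zone_a is hypothetical and is not how ega assigns zones. Treating pairs where both values are below 70 mg/dl as zone A is an assumption about how the low range is handled, so the original Clarke paper should be consulted for the exact boundaries.

```r
# Hypothetical helper, not the ega implementation: flags pairs that meet the
# zone A criterion as described above.
is_clarke_zone_a <- function(ref, test) {
  # Test value within 20% of the reference value.
  within_20_percent <- abs(test - ref) <= 0.2 * ref
  # Assumption: pairs with both values below 70 mg/dl also count as zone A.
  both_below_70 <- ref < 70 & test < 70
  within_20_percent | both_below_70
}

# Three made-up pairs: 15% off, 30% off, and both values below 70 mg/dl.
is_clarke_zone_a(ref = c(100, 100, 60), test = c(115, 130, 45))
# TRUE FALSE TRUE
```

For real analyses, getClarkeZones should be used instead, since it covers all five zones.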

The Clarke error grid, while based on sound clinical reasoning, contains arbitrary cut-offs. For example, the 20% cut-off for zone A has proven somewhat too lenient, especially for regulatory purposes. The grid lines are also discontinuous, which makes construction and interpretation more difficult.

The Parkes (consensus) error grid builds on the Clarke system, but with various improvements. First, as the name indicates, the grid was constructed by asking 100 diabetes experts to place discrepancies into one of five zones. The zones were defined as follows: A = no effect on clinical action, B = altered clinical action with little to no effect on clinical outcome, C = altered clinical action, likely to affect clinical outcome, D = altered clinical action, could have significant medical risk, E = altered clinical action, could have dangerous consequences. Separate zone assignments were recorded for Type 1 and Type 2 diabetes, and an error grid with continuous zone lines was calculated based on all the responses (the method is discussed in detail in the references below).

Using ega

ega provides two basic functions for each of the Clarke and Parkes error grids. The first assigns zones (“A”, “B”, “C”, “D” or “E”) to paired reference/test values, while the second plots the respective error grid using the ggplot2 package.

The built-in dataset glucose_data contains sample paired glucose values (in mg/dl) for 5072 patients. The function getClarkeZones will assign Clarke zones:

zones <- getClarkeZones(glucose_data$ref, glucose_data$test)

head(zones)
## [1] "A" "B" "B" "B" "B" "B"

If units of mmol/l are desired, this can be achieved by setting the unit parameter to "mol". The built-in dataset is in mg/dl, so its values first need to be divided by 18.

zones <- getClarkeZones(glucose_data$ref/18, glucose_data$test/18, unit="mol")

head(zones)
## [1] "A" "B" "B" "B" "B" "B"

The return value is a simple character vector, of the same length as the input data, with a zone assignment corresponding to each pair. We can convert it to a factor and use the table function to summarize the distribution of zones:

zones <- factor(zones)

# counts
table(zones)
## zones
##    A    B    C    D    E 
## 3652 1171   53  180   16
# percentages
table(zones)/length(zones)*100
## zones
##          A          B          C          D          E 
## 72.0031546 23.0875394  1.0449527  3.5488959  0.3154574

For this particular dataset, 72+23+1 = 96% of pairs fall within the clinically acceptable zones A-C.
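
The same summary can be computed directly rather than by adding rounded percentages. The sketch below is self-contained: it rebuilds a zone vector from the Clarke counts printed above instead of calling getClarkeZones, and the variable names are illustrative.

```r
# Rebuild a zone vector from the Clarke counts reported above.
zone_counts <- c(A = 3652, B = 1171, C = 53, D = 180, E = 16)
zones <- factor(rep(names(zone_counts), times = zone_counts))

# Percentage of pairs in each zone, rounded to one decimal place.
zone_pct <- round(prop.table(table(zones)) * 100, 1)
zone_pct

# Share of pairs falling in zones A-C.
pct_a_to_c <- mean(zones %in% c("A", "B", "C")) * 100
pct_a_to_c
```

With a zones vector obtained from getClarkeZones, only the last two steps are needed.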

Let’s plot the data on a Clarke error grid using the function plotClarkeGrid:

plotClarkeGrid(glucose_data$ref, glucose_data$test)

The plot can be tweaked via various parameters:

plotClarkeGrid(glucose_data$ref, glucose_data$test, 
               pointsize=1.5, 
               pointalpha=0.6, 
               linetype="dashed")

Additionally, the return value from plotClarkeGrid (a ggplot object) can be stored and modified:

ceg <- plotClarkeGrid(glucose_data$ref, glucose_data$test)

ceg + theme_gray() + 
  theme(plot.title = element_text(size = rel(2), colour = "blue"))

The ggplot2 documentation for theme should be consulted for additional possibilities.
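
Because the return value is an ordinary ggplot object, other ggplot2 layers and helpers should work on it too. As a minimal sketch, the example below relabels the axes with labs and writes the result to disk with ggsave. The label text and the file name are illustrative choices, not package defaults.

```r
library(ega)
library(ggplot2)

ceg <- plotClarkeGrid(glucose_data$ref, glucose_data$test)

# Replace the axis labels; the wording here is illustrative only.
ceg_labelled <- ceg +
  labs(x = "Reference glucose (mg/dl)", y = "Meter glucose (mg/dl)")
ceg_labelled

# Save the plot to a PNG file (hypothetical file name), sized in inches.
ggsave("clarke_grid.png", ceg_labelled, width = 6, height = 6, dpi = 300)
```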

Analogous functions are provided for the Parkes error grid. For example, getParkesZones will assign Parkes zones to paired glucose values:

zones <- getParkesZones(glucose_data$ref, glucose_data$test)

zones <- factor(zones)

# counts
table(zones)
## zones
##    A    B    C    D    E 
## 3906  951  165   48    2
# percentages
table(zones)/length(zones)*100
## zones
##           A           B           C           D           E 
## 77.01104101 18.75000000  3.25315457  0.94637224  0.03943218
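
Since both functions return one zone per pair, the two systems can be compared directly. The following sketch cross-tabulates the Clarke and Parkes assignments for the built-in dataset. It uses only the functions shown above together with base R, and the variable names are illustrative.

```r
library(ega)

# Zone assignments for the same pairs under both systems.
clarke <- getClarkeZones(glucose_data$ref, glucose_data$test)
parkes <- getParkesZones(glucose_data$ref, glucose_data$test)

# Rows are Clarke zones, columns are Parkes zones.
zone_xtab <- table(Clarke = clarke, Parkes = parkes)
zone_xtab

# Percentage of pairs that receive the same letter in both systems.
mean(clarke == parkes) * 100
```

Cells off the diagonal show where the two grids disagree, which can help when deciding which system to report.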

And for plotting:

plotParkesGrid(glucose_data$ref, glucose_data$test)

A similar set of arguments can be specified to control basic plotting properties, and the return value can likewise be stored and modified.

Units of mmol/l can be used in the plotting functions as well.

plotParkesGrid(glucose_data$ref/18, glucose_data$test/18, unit="mol")
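
If conversions are needed in several places, it may be convenient to wrap the factor of 18 in small helpers. The functions below are hypothetical and not part of ega. The factor of 18 is the rounded value used in this vignette, and a more precise factor based on the molar mass of glucose is about 18.016.

```r
# Hypothetical helpers (not part of ega), using the factor of 18 from above.
mgdl_to_mmol <- function(x) x / 18
mmol_to_mgdl <- function(x) x * 18

# Convert two mg/dl values to mmol/l, and one mmol/l value back to mg/dl.
mgdl_to_mmol(c(70, 180))
mmol_to_mgdl(10)
```

The converted vectors can then be passed to the zone and plotting functions together with unit="mol".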

Wrapping up

This concludes the vignette. For additional information, please consult the reference manual. For detailed discussion of the Clarke error grid, the original paper by Clarke et al. should be consulted:

Clarke, W. L., D. Cox, L. A. Gonder-Frederick, W. Carter, and S. L. Pohl. “Evaluating Clinical Accuracy of Systems for Self-Monitoring of Blood Glucose.” Diabetes Care 10, no. 5 (September 1, 1987): 622-28.

For the Parkes (consensus) error grid, the original paper by Parkes et al. discusses the system in general terms:

Parkes, J. L., S. L. Slatin, S. Pardo, and B. H. Ginsberg. “A New Consensus Error Grid to Evaluate the Clinical Significance of Inaccuracies in the Measurement of Blood Glucose.” Diabetes Care 23, no. 8 (August 2000): 1143-48.

For a more technical discussion, including the details necessary for drawing the grid lines, the following reference is useful:

Pfützner, Andreas, David C. Klonoff, Scott Pardo, and Joan L. Parkes. “Technical Aspects of the Parkes Error Grid.” Journal of Diabetes Science and Technology 7, no. 5 (September 2013): 1275-81.