Diagnosis and Evaluation

Reviewed on July 01, 2024

Introduction

Atopic Dermatitis (AD) is a heterogeneous disorder with a wide spectrum of clinical signs, symptoms, severity and clinical course. Consequently, there is no single tool that comprehensively assesses all aspects of the disease. There are presently >60 measures that have been used to assess the severity of AD. These assessments vary considerably with respect to content, scale, instructions, validity and concordance. Numerous scales are also used to assess the signs, symptoms and quality of life (QOL) disturbance of AD. This module will review the common diagnosis criteria and clinical outcome assessments used in AD. The properties, strengths and weaknesses of these assessments will be discussed, as well as their utility for assessing AD in controlled trials and clinical practice.

Diagnostic Criteria

The diagnosis of AD is made clinically based on patient medical history, morphology and distribution of skin lesions and the presence of associated comorbidities (e.g., food allergies, asthma, allergic rhinitis). Several groups have devised formal sets of criteria to aid in classification of AD; the most notable are discussed below. A systematic review of randomized controlled trials (RCTs) with a pharmacologic intervention from 2007 to 2016 found that the Hanifin and Rajka criteria were most commonly used (41.0%), followed by the UK refinement of the Hanifin and Rajka criteria (9.0%), Japanese Dermatological Association criteria (4.2%), and American Academy of Dermatology (AAD) criteria (3.8%).

Hanifin and Rajka Criteria

The 1980 Hanifin and Rajka criteria are the original and most commonly employed diagnostic criteria for AD (Table 5-1). A diagnosis of AD under the Hanifin and Rajka criteria requires that patients meet at least three of four major criteria and at least three of 23 minor criteria. Although comprehensive and commonly used in clinical trials, these criteria may not be practical in clinical practice for several reasons: 1) the large number of criteria is cumbersome; 2) some criteria are nonspecific, such as pityriasis alba; and 3) other criteria are uncommon in AD despite being fairly specific, such as upper lip cheilitis. As such, several groups have proposed modifications to address these limitations.

The UK Working Party Criteria

The UK Working Party simplified the Hanifin and Rajka criteria, requiring patients to meet a single mandatory condition (i.e., an itchy skin condition) plus at least three of five minor criteria (Table 5-2). No laboratory testing is required, making these criteria better suited for epidemiologic and population-based studies, in addition to clinical practice. Both the Hanifin and Rajka criteria and the UK criteria have been validated in multiple studies and in a range of populations. Although revisions have been proposed to include infants, the original UK criteria cannot be applied to very young children.
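The counting rules of the two criteria sets described so far can be written as simple predicates. The following is a minimal sketch, not a diagnostic tool. The function and parameter names are hypothetical, "three of" is read as "at least three", and the individual criteria (Tables 5-1 and 5-2) are assumed to have been assessed by the clinician and passed in as counts.

```python
def meets_hanifin_rajka(major_met: int, minor_met: int) -> bool:
    """Hanifin and Rajka: at least 3 of 4 major and at least 3 of 23 minor criteria."""
    if not 0 <= major_met <= 4 or not 0 <= minor_met <= 23:
        raise ValueError("major_met must be 0-4 and minor_met must be 0-23")
    return major_met >= 3 and minor_met >= 3


def meets_uk_working_party(itchy_skin: bool, minor_met: int) -> bool:
    """UK Working Party: mandatory itchy skin condition plus at least 3 of 5 minor criteria."""
    if not 0 <= minor_met <= 5:
        raise ValueError("minor_met must be 0-5")
    return itchy_skin and minor_met >= 3


# Hypothetical patient: itchy skin with 3 UK minor criteria, but only 2 Hanifin-Rajka major criteria.
print(meets_uk_working_party(True, 3))   # True
print(meets_hanifin_rajka(2, 10))        # False
```

The sketch makes the practical difference visible: the UK rule needs six yes/no answers, whereas the Hanifin and Rajka rule needs 27.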

AAD Criteria

In 2003, the AAD proposed a set of revised Hanifin and Rajka criteria (Table 5-3). These streamlined criteria are applicable to all age groups, making them potentially better suited for the clinical setting.

New Chinese Criteria for Childhood Atopic Dermatitis

In 2019, new diagnostic criteria were developed and validated for use in China following the publication of a large study in children aged 1-7 years, which suggested that a milder form of AD was prevalent among this population, and that the Hanifin and Rajka and UK Working Party criteria were not sensitive enough in this population. According to these new criteria, AD is diagnosed in children based on the presence of three essential features: 1) pruritus; 2) ‘typical morphology and distribution’ (flexural dermatitis) or ‘atypical morphology and distribution with xerosis’; and 3) a chronic or chronically relapsing course.

Distinguishing AD from Other Disorders

During diagnosis, AD should be differentiated from a variety of other causes of cutaneous inflammation.

  • In infancy, AD has a predilection for the face and may be difficult to distinguish from seborrheic dermatitis. While children with AD may develop diaper rash secondary to irritant contact dermatitis, AD per se usually spares the groin and axillary regions. In contrast, seborrheic dermatitis and inverse or napkin psoriasis do not. Seborrheic dermatitis and psoriasis are usually not pruritic, although some can experience substantial pruritus.
  • AD may be difficult to distinguish from cutaneous T cell lymphoma (CTCL), which can present with eczematous lesions. CTCL should be considered in the evaluation of patients with non-classical eczematous lesions, e.g., psoriasiform and follicular lesions, and a non-flexural distribution of eczematous lesions.
  • In adulthood, AD has a predilection for the head, neck, hands and feet and may be difficult to distinguish from irritant, allergic and/or airborne contact dermatitis, photosensitivity and photoallergic contact dermatitis. These etiologies should be considered in the evaluation of patients with adult-onset dermatitis.
  • If patients do not respond adequately to treatment, consider reevaluating the diagnosis to include other disorders, including allergic contact dermatitis and CTCL. Other AAD recommendations regarding the diagnosis and assessment of AD are shown in Table 5-4.

Biomarkers

Biomarkers can be used for a variety of purposes, including confirmation of clinical diagnosis, objective measurement of disease severity, and prediction of treatment response. However, there are currently no biomarkers that accurately reflect the severity of AD or its symptoms, or distinguish AD from other diseases. As such, the diagnosis and severity assessment of AD remain clinical.

IgE

Elevated total and/or allergen-specific serum immunoglobulin E (IgE) is the most commonly considered biomarker in AD. Although total IgE does tend to increase with disease severity, many individuals with severe AD have normal IgE levels. Elevated IgE is also absent in approximately 20% to 50% of patients with AD. Such findings led to the designation of “extrinsic” and “intrinsic” groups of patients based on the presence or absence of elevated IgE levels, respectively. However, it remains controversial whether such groups represent true subsets of AD patients. For instance, some patients without elevated IgE will later develop elevated IgE levels. Advances in understanding the pathogenesis of AD also suggest that elevated IgE may be a secondary phenomenon, and therefore plays less of a role in AD than in other atopic diseases. Elevated IgE is also nonspecific, as it is elevated in multiple nonatopic conditions (e.g., parasitic infections, certain cancers and autoimmune diseases). More recent paradigms have placed greater emphasis on the upstream T cells and Th2 inflammation, including IL-4 and IL-13, rather than on downstream IgE.

Skin Prick Test (SPT) Positivity

Patients and parents often request allergy testing to identify potential food and/or airborne triggers to avoid in the hopes of curing or reducing the severity of AD. Although patients with AD are at higher risk of food and environmental allergies, AD per se is not typically caused or worsened by allergens. Positive allergy tests may only indicate sensitization rather than a causal connection to AD severity or disease course. Clinical assessment for personal and family history of allergies should be done during history taking to establish the presence of true food allergies. Skin prick test (SPT) and serum-specific IgE measurements are typically performed to assess for immediate/type I hypersensitivity reactions. Negative predictive value of these tests is high (>95%), but specificity and positive predictive value are low (40% to 60%). Thus, negative tests are helpful to rule out allergy, but false-positive tests are common; positive tests require clinical correlation to confirm the presence of allergic disease and the type of allergic response.
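The gap between a high negative predictive value and a low positive predictive value follows from Bayes' rule. The sketch below illustrates the arithmetic only. The sensitivity, specificity and prevalence passed in are hypothetical values chosen for illustration, not figures from the studies cited above, and the function name is an assumption.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    for value in (sensitivity, specificity, prevalence):
        if not 0 < value < 1:
            raise ValueError("all inputs must be strictly between 0 and 1")
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)  # P(allergy | positive test)
    npv = true_neg / (true_neg + false_neg)  # P(no allergy | negative test)
    return ppv, npv


# Hypothetical inputs: a sensitive but poorly specific test in a population
# where 30% of those tested have a true allergy.
ppv, npv = predictive_values(sensitivity=0.95, specificity=0.50, prevalence=0.30)
print(f"PPV = {ppv:.0%}, NPV = {npv:.0%}")
```

With these assumed inputs, a negative result is reassuring, while fewer than half of the positive results correspond to true allergy, which is why positive tests require clinical correlation.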

The AAD recommends against routine food allergy testing in AD patients. Limited food allergy testing (cow’s milk, eggs, wheat, soy, and peanut) may be considered in children <5 years of age who have moderate to severe and persistent AD despite optimized management with trigger avoidance and topical therapy, and/or a reliable history of an immediate allergic reaction after ingesting a specific food. Controlled food challenges are the gold standard for verifying a positive skin test. Even if a food allergy is present, allergen avoidance is unlikely to cure AD, so effective treatment centered around good skin care and topical and/or systemic therapies remains crucial.

Other Biomarkers

Numerous other biomarkers have been evaluated for utility in the diagnosis and/or treatment of AD. These include serum levels of CD30, macrophage-derived chemoattractant (MDC), IL-12, IL-16, IL-18, IL-31, thymus and activation-regulated chemokine (TARC), eosinophil cationic protein (ECP), E-selectin, vitamin D, cutaneous T cell-attracting chemokine (CTACK), lactate dehydrogenase (LDH), tissue mast cell count and peripheral eosinophil count. Some of these biomarkers have shown a correlation with disease severity. One systematic review and meta-analysis of several biomarkers found IgE to be weakly correlated with AD severity. In contrast, TARC, a key chemokine involved in homing CCR4-expressing T cells to the skin, was the most reliable biomarker and significantly correlated with AD severity in both longitudinal (r=0.60; 95% confidence interval (CI), 0.48-0.70) and cross-sectional studies (r=0.64; 95% CI, 0.57-0.70); however, even TARC is only moderately correlated with AD severity. No biomarker has yet shown sufficiently reliable sensitivity or specificity to support its use in clinical practice and trials.

Disease Severity Scales

AD has a wide spectrum of lesional morphology, including acute oozing and crusting, subacute lesions with dryness and scaling, chronic lesions with lichenification and/or prurigo nodules, and erythema, excoriations and dryness occurring at all stages of disease. AD lesions may vary from being mild and barely perceptible to severe and profoundly inflamed. AD lesions may be limited to the flexural areas or cover the entire body. Moreover, severity and intrusiveness of itch and localization of lesions to certain anatomical distributions (e.g., face, hands and feet) may have particularly negative impact on QOL. These aspects are important considerations for assessing the severity of AD in clinical trials and practice.

There is no gold standard for evaluating the severity of AD. Since there are no reliable laboratory tests or biomarkers to assess the severity of AD, clinicians must rely upon clinical assessments of disease parameters that can be subjective and difficult to standardize. Standardized measures have been developed to quantify the clinical disease burden at baseline and the effectiveness of a treatment regimen. The first objective scoring systems—Rajka & Langeland and the Simple Scoring System—were developed in 1989. Since then, many additional instruments were developed, and there are now >60 different named and unnamed AD severity measures in the published literature. However, the validity, reliability, sensitivity to change and acceptability have not been adequately elucidated in many of these studies. The use of inadequately validated instruments and heterogeneity of instruments may impede comparison between different clinical trials using different instruments.

Given the increasing number of outcome measures developed for AD, the Harmonizing Outcome Measures in Eczema (HOME) international consensus group was assembled to promote the standardization of use of AD outcome measures in clinical trials internationally. HOME performed multiple systematic reviews to examine the validity of currently available measures. Rigorous validation of a severity measure requires the assessment of an array of measurement properties: (1) truth (e.g., content validity and construct validity), (2) discrimination (e.g., internal consistency, inter-observer reliability, intra-observer reliability, responsiveness/sensitivity to change, floor and ceiling effects and interpretability), and (3) feasibility (e.g., acceptability, ease of use and time to perform and interpret). Despite considerable progress in harmonizing outcome measures, the landscape of severity measures remains muddled.

With the recent explosion of research and development of novel treatments for AD, it is imperative for clinicians to understand the various severity assessments used in past and future clinical trials. Moreover, as more evidence emerges about the validity, strengths and weaknesses of different measures, some measures may even emerge as being feasible and relevant for clinical practice. This section will describe, compare, and contrast the most commonly used AD severity assessment systems and highlight their role in clinical trials and, potentially, in clinical practice.

SCORing Atopic Dermatitis (SCORAD)

SCORAD was introduced in 1993 by the European Task Force on AD, and is the most commonly used objective assessment in RCTs for AD. SCORAD incorporates clinician assessment of lesional severity and extent, as well as patient-reported symptoms of pruritus and sleep loss. Six signs of AD are assessed: erythema, edema/papulation, oozing/crusting, excoriations, xerosis and lichenification/prurigo nodules. Each sign is given a score of 0 (not present), 1 (barely perceptible), 2 (clearly perceptible), or 3 (very prominent). A representative lesion (i.e., neither the best nor the worst) is selected for scoring of erythema, edema/papulation, oozing/crusting, excoriations and lichenification/prurigo. A representative area of non-lesional skin is used to score xerosis.

Lesional extent is assessed via an estimation of body surface area (BSA) using the “rule of nines” or palmar method and ranges from 0 to 100%. SCORAD relies on the assumption that extent and severity are linearly correlated, but this relationship only exists in persons with ≤30% BSA. Symptoms of itch and sleep loss over the past 3 days are each assessed on a scale of 0 to 10 using a 10-cm visual analogue scale. Total SCORAD is calculated as follows: BSA/5 + 7*(sum of intensity scores)/2 + subjective symptom scores, and has a range of 0 to 103. That is, lesional severity comprises 61.2% of the total score, whereas extent and symptoms each comprise only 19.4%. Ideally, when SCORAD is used, the sign and symptom components should also be reported to distinguish between the objective and subjective components of AD severity. However, the vast majority of clinical trials (>90%) do not report the individual sign and symptom components of SCORAD.
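The SCORAD formula above can be sketched in a few lines. This is an illustrative implementation under stated assumptions: the function name, the sign keys and the input validation are hypothetical, and only the arithmetic (BSA/5 + 7 × intensity sum/2 + itch + sleep loss) comes from the text.

```python
SIGNS = (
    "erythema",
    "edema_papulation",
    "oozing_crusting",
    "excoriations",
    "xerosis",
    "lichenification",
)


def scorad(bsa_percent: float, intensity: dict, itch: float, sleep_loss: float) -> float:
    """Total SCORAD = extent/5 + 7 * (sum of six sign scores)/2 + itch + sleep loss."""
    if not 0 <= bsa_percent <= 100:
        raise ValueError("bsa_percent must be 0-100")
    if set(intensity) != set(SIGNS):
        raise ValueError("intensity must contain exactly the six SCORAD signs")
    if any(score not in (0, 1, 2, 3) for score in intensity.values()):
        raise ValueError("each sign is scored 0, 1, 2 or 3")
    if not (0 <= itch <= 10 and 0 <= sleep_loss <= 10):
        raise ValueError("itch and sleep_loss are scored 0-10")
    extent_part = bsa_percent / 5
    severity_part = 7 * sum(intensity.values()) / 2
    symptom_part = itch + sleep_loss
    return extent_part + severity_part + symptom_part


# Worst possible case: 100% BSA, every sign scored 3, itch and sleep loss both 10.
worst = scorad(100, dict.fromkeys(SIGNS, 3), 10, 10)
print(worst)                           # 103.0
print(round(63 / worst * 100, 1))      # 61.2 (share contributed by lesional severity)
print(round(20 / worst * 100, 1))      # 19.4 (share contributed by extent, and by symptoms)
```

Keeping the three parts as separate intermediate values also makes it straightforward to report the sign and symptom components individually.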

Twenty-six validation studies have been performed on SCORAD, and found good convergent and divergent construct validity, internal consistency, responsiveness, adequate inter-observer reliability, and no floor or ceiling effects. However, there is inter-observer variability in evaluating lichenification and disease extent.

Objective SCORAD (oSCORAD)

oSCORAD includes the assessments of lesional severity and extent, but not symptoms. oSCORAD ranges from 0 to 83, unless disfiguring lesions on the hands or face or functionally limiting lesions are present. In that scenario, 10 bonus points are added and the upper limit of oSCORAD increases to 93.

oSCORAD is internally consistent with good intra-rater reliability, but its inter-observer reliability is still unclear. Severity strata for oSCORAD were proposed based on author consensus (<15=mild; 15-40=moderate; >40=severe) without any formal testing. These strata were extended to SCORAD by adding 10 points to the oSCORAD thresholds (<25=mild; 25-50=moderate; >50=severe), again without any formal testing. An interpretability study identified different severity bands for oSCORAD and SCORAD in adults with AD using an anchor-based approach (oSCORAD: 0-7.9=clear; 8.0-23.9=mild; 24.0-37.9=moderate; 38.0-83=severe; SCORAD: 0-9.9=clear; 10-28.9=mild; 29.0-48.9=moderate; 49.0-103=severe).
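The anchor-based bands can be applied with a small lookup. The following is a sketch only: the names are hypothetical, each band is treated as starting at its lower cutoff (so a value such as 7.95 falls in "clear"), and the 10 bonus points for oSCORAD are not handled.

```python
from bisect import bisect_right

LABELS = ("clear", "mild", "moderate", "severe")
# Lower cutoffs of the mild, moderate and severe bands from the interpretability study.
LOWER_CUTOFFS = {"oSCORAD": (8.0, 24.0, 38.0), "SCORAD": (10.0, 29.0, 49.0)}
MAXIMUM = {"oSCORAD": 83, "SCORAD": 103}


def severity_band(measure: str, score: float) -> str:
    """Map an oSCORAD or SCORAD score to its anchor-based severity band."""
    if measure not in LOWER_CUTOFFS:
        raise ValueError("measure must be 'oSCORAD' or 'SCORAD'")
    if not 0 <= score <= MAXIMUM[measure]:
        raise ValueError(f"{measure} must be between 0 and {MAXIMUM[measure]}")
    return LABELS[bisect_right(LOWER_CUTOFFS[measure], score)]


print(severity_band("oSCORAD", 22.7))  # mild
print(severity_band("SCORAD", 52))     # severe
```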

Eczema Area and Severity Index (EASI)

EASI was developed as a modification of the well-established Psoriasis Area and Severity Index (PASI). EASI assesses both lesional intensity and extent. Four signs–erythema, papulation/edema, excoriation and lichenification–are assessed using a scale of 0 to 3, similar to SCORAD. However, unlike SCORAD, EASI assesses the average lesional intensity within four body regions—head and neck, upper limbs, trunk and lower limbs. Lesional extent is evaluated by estimating the surface area involved of those four body regions (1-9%, 10-29%, 30-49%, 50-69%, 70-89%, 90-100%). Lesional intensity is multiplied by the surface area involved in that region and summed across regions, yielding a total score ranging from 0 to 72.
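The EASI calculation can be sketched as follows. Two details are assumptions not stated in the text above: the regional multipliers (0.1, 0.2, 0.3 and 0.4, the values conventionally used for patients aged 8 years and older) and an area score of 0 for an uninvolved region. The function and variable names are hypothetical.

```python
from bisect import bisect_right

# Assumed regional multipliers (conventional values for patients aged >= 8 years).
REGION_WEIGHTS = {"head_neck": 0.1, "upper_limbs": 0.2, "trunk": 0.3, "lower_limbs": 0.4}
# Lower bounds of area scores 2-6: 10-29%, 30-49%, 50-69%, 70-89%, 90-100%.
AREA_LOWER_CUTOFFS = (10, 30, 50, 70, 90)


def area_score(percent_involved: float) -> int:
    """Convert the percentage of a region involved into an area score of 0-6."""
    if not 0 <= percent_involved <= 100:
        raise ValueError("percent_involved must be 0-100")
    if percent_involved == 0:
        return 0
    return 1 + bisect_right(AREA_LOWER_CUTOFFS, percent_involved)


def easi(regions: dict) -> float:
    """regions maps each region name to (four sign scores of 0-3, percent involved)."""
    total = 0.0
    for name, weight in REGION_WEIGHTS.items():
        signs, percent = regions[name]
        if len(signs) != 4 or any(not 0 <= s <= 3 for s in signs):
            raise ValueError("each region needs four sign scores between 0 and 3")
        total += weight * sum(signs) * area_score(percent)
    return total


# A single tiny lesion (<1%) and lesions covering 9% of a region get the same area score.
print(area_score(0.5), area_score(9))  # 1 1
```

Because any involvement below 10% maps to the same area score, the sketch also shows why discrimination is limited at the low end of the scale.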

Nine validation studies of EASI were performed and demonstrated validity, internal consistency, adequate intra-observer reliability, intermediate inter-observer reliability, and adequate responsiveness. Due to moderate inter-observer reliability, it is recommended that the same clinician perform both baseline and follow-up assessments. However, there is a wider range of EASI scores for severe vs mild or moderate disease, and there are floor and ceiling effects. Changes in score correlate to larger clinical changes at low vs high EASI values. These differences have been suggested to be due to the absence of itch, impact on QOL, and involvement of high visibility areas in scoring.

We have observed that EASI is poorly responsive in the mild disease range because of limited discrimination between patients with a broad range of extent at the lowest range. That is, the presence of a 1-cm lesion affecting <1% and numerous lesions affecting 9% of a body region would be scored similarly. Thus, EASI may not be an ideal measure for studies assessing patients with mild AD owing to potentially insufficient responsiveness.

Severity strata were previously defined for EASI using an anchor-based approach (0=clear; 0.1-1.0=almost clear; 1.1-7.0=mild; 7.1-21.0=moderate; 21.1-50=severe; and 50.1-72=very severe). Another study found similar, but slightly different severity strata in adults with AD (0=clear; 0.1-5.9=mild; 6-22.9=moderate; 23-72=severe).
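Because the two sets of strata use different cutoffs, the same EASI score can land in different bands. The sketch below encodes both sets as published above. The function names are hypothetical, and scores falling between listed bounds (for example 5.95) are assigned to the lower band as a simplifying assumption.

```python
from bisect import bisect_left, bisect_right


def easi_band_six_level(score: float) -> str:
    """Six-level strata: 0, 0.1-1.0, 1.1-7.0, 7.1-21.0, 21.1-50, 50.1-72."""
    if not 0 <= score <= 72:
        raise ValueError("EASI must be between 0 and 72")
    upper_bounds = (0.0, 1.0, 7.0, 21.0, 50.0, 72.0)
    labels = ("clear", "almost clear", "mild", "moderate", "severe", "very severe")
    return labels[bisect_left(upper_bounds, score)]


def easi_band_adult(score: float) -> str:
    """Adult strata: 0, 0.1-5.9, 6-22.9, 23-72."""
    if not 0 <= score <= 72:
        raise ValueError("EASI must be between 0 and 72")
    if score == 0:
        return "clear"
    return ("mild", "moderate", "severe")[bisect_right((6.0, 23.0), score)]


# The same score is "mild" under one set of strata and "moderate" under the other.
print(easi_band_six_level(6.5), easi_band_adult(6.5))  # mild moderate
```

This is one reason a trial report should state which strata were used when describing baseline severity.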

Modified EASI (mEASI)

The modified EASI (mEASI) is identical to EASI except that it adds a patient assessment of itch. Itch is included in scoring because it is considered a primary symptom of AD. Scores range from 0 to 90, and severity strata are defined as follows: 0-0.9=clear; 1-8.9=mild; 9.0-29.9=moderate; 30.0-90=severe. Unlike EASI, mEASI has never been validated in the literature.

Global Assessments

Global assessments are quick and simple “snapshots” of disease severity. There are several different approaches used for global assessment of AD. Assessments may be static (severity at a fixed point of time) or dynamic (comparison of current severity to baseline); however, static assessments are preferred because they are considered more accurate and less subjective. There is extensive variability with respect to the use of global assessments.

One approach is a gestalt assessment of overall disease severity by a clinician. In this scenario, no explicit scoring instructions are required to perform an evaluation, and classically, only signs are assessed. While this approach has been used as the standard by which other instruments are validated or anchored, it has not been formally validated. It is expected to perform quite well and show good intra- and inter-observer reliability. Nevertheless, this approach may be influenced by the clinical experience of the observer and the most severe cases previously encountered. The gestalt severity assessment is simple and feasible for clinical practice. At the very least, AD severity should be assessed and documented at every clinical encounter using a gestalt severity assessment.

In addition, a number of global assessments have been used with specific instructions on how to assess the intensity of a representative or target lesion. Although “Investigator Global Assessment (IGA)” and “Physician Global Assessment (PGA)” are the two most common names for global assessments, there are eighteen other names for these global scales. Moreover, there are 23 unique scoring systems ranging from 4 to 7 points, which vary with respect to size, content, instructions and analysis. The six-point scale is most common and is typically defined as follows: 0=clear; 1=almost clear; 2=mild; 3=moderate; 4=severe; 5=very severe. The most commonly assessed content includes erythema and papulation/edema, with or without oozing/weeping. Less commonly, crusting, excoriation, scaling and lichenification are assessed. Despite there being no studies assessing the validity or responsiveness of IGA, the United States Food and Drug Administration (FDA) generally mandates that IGA be used for RCTs in AD. Global assessments may have less inter-observer reliability than other measures of AD signs, and should be performed by the same clinician from visit to visit.

Comparisons between global assessments and other severity measures are prevalent in the literature, as global assessments often serve as a proxy gold standard for validation or anchoring. Items noticeably absent from IGA evaluations include disease extent, involvement of high-visibility or functionally significant areas, and symptoms, all of which are important aspects of AD severity. For example, a patient with a moderate AD lesion that is 1 cm in diameter is very different from one with moderate lesions covering 90% BSA. Thus, IGA evaluations that do not assess BSA are inadequate for both the clinical and trial settings.

Furthermore, global assessments often do not assess excoriations, which most correlate with itch and disease severity according to patients. Moreover, IGA demonstrates reduced responsiveness compared with other validated AD measures and may be influenced by the response to previous treatments, patient compliance and the doctor-patient relationship. For all these reasons, IGA is not an ideal measure for AD.

To overcome some of the limitations of IGA, particularly the lack of standardization and harmonization of disease assessment in clinical trials, an international group of 24 pediatric and adult dermatologists developed a validated IGA scale for AD (vIGA-AD) in 2018. The vIGA-AD is a 5-point scale (0=clear; 1=almost clear; 2=mild; 3=moderate; 4=severe) in which the score is defined using morphological characteristics of the lesions (erythema, lichenification, induration/papulation and oozing/crusting), with indeterminate cases being decided by lesional extent. vIGA-AD demonstrated high intra- and inter-rater reliability (intraclass correlation >0.8) and was officially reviewed and accepted by the FDA as an appropriate instrument for assessing the efficacy of AD medications.

Rajka-Langeland (R-L)

R-L was one of the earliest AD severity instruments to be developed. Scoring comprises intensity (sleep disturbance due to itch), extent (“rule of nines”), and disease course (number of months the patient was affected by eczema in the previous year). The assessment of disease course is unique to R-L in comparison to other commonly used severity measures. Scores range from 3 to 9 and severity strata are as follows: mild=3-4; moderate=4.5-7.5; severe=8-9. This instrument has adequate content validity, very good inter- and intra-observer reliability, adequate divergent construct validity, sensitivity to change and time consumption. In a prospective, dermatology-based validation study, R-L demonstrated good concurrent validity with EASI, SCORAD, oSCORAD and BSA (Spearman’s ϱ=0.51-0.63), good convergent validity with worst and average NRS-itch, POEM, and DLQI (ϱ=0.53-0.60), and moderate-to-good convergent validity with the PROMIS itch questionnaire (ϱ=0.35-0.55). R-L also showed good discriminant validity, moderate-to-good reliability and fair responsiveness, and did not exhibit floor or ceiling effects. R-L is not the objective AD outcome measure preferred by the HOME group, and consequently will likely not be used much in AD trials. However, it may be a useful measure of AD severity in clinical practice because it is simple to collect and score and accounts for symptoms, signs and long-term disease control.
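The R-L total and its strata can be sketched as below. The text gives only the total range (3 to 9) and the strata, so the per-item scoring used here (each of the three items scored from 1 to 3 in half-point steps) is an assumption inferred from that range and from the 4.5 and 7.5 cutoffs. The function name is hypothetical.

```python
def rajka_langeland(intensity: float, extent: float, course: float):
    """Return (total, severity band) for the three R-L items.

    Assumption: each item is scored 1-3 in steps of 0.5.
    """
    items = (intensity, extent, course)
    for item in items:
        if not 1 <= item <= 3 or (item * 2) % 1 != 0:
            raise ValueError("each item must be 1-3 in half-point steps")
    total = sum(items)
    if total <= 4:
        band = "mild"        # 3-4
    elif total <= 7.5:
        band = "moderate"    # 4.5-7.5
    else:
        band = "severe"      # 8-9
    return total, band


print(rajka_langeland(2, 2, 2))  # (6, 'moderate')
```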

Six Area Six Sign AD (SASSAD)

The SASSAD severity score was introduced in 1996 and consists of the assessment of six signs (erythema, exudation, excoriation, xerosis, cracking and lichenification) at six different areas (head/neck, buttocks/trunk, arms, legs, hands and feet). Each sign is graded on a scale of 0 to 3 at each of the six areas, with a total range of scores from 0 to 108. Although the six areas are not similar in size, they are deemed similar in importance to the patient. This scoring system simplifies assessments of extent by avoiding estimations of BSA. SASSAD has been used for the study of both adults and children. Compared to the other scales, fewer validation studies have been performed for SASSAD.
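Since SASSAD is an unweighted sum of 36 grades, it can be sketched very compactly. The key names and the nested-dictionary input format below are hypothetical conventions chosen for the example.

```python
SIGNS = ("erythema", "exudation", "excoriation", "xerosis", "cracking", "lichenification")
AREAS = ("head_neck", "trunk_buttocks", "arms", "legs", "hands", "feet")


def sassad(grades: dict) -> int:
    """Sum the 0-3 grade of each of six signs at each of six areas (range 0-108)."""
    total = 0
    for area in AREAS:
        for sign in SIGNS:
            grade = grades[area][sign]
            if grade not in (0, 1, 2, 3):
                raise ValueError(f"{sign} at {area} must be graded 0, 1, 2 or 3")
            total += grade
    return total


# Every sign graded 3 at every area gives the maximum score.
worst_grades = {area: dict.fromkeys(SIGNS, 3) for area in AREAS}
print(sassad(worst_grades))  # 108
```

Note that no area weighting appears anywhere in the calculation, which reflects the design choice of treating the six areas as equally important to the patient.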

AD Severity Index (ADSI)

The ADSI was first described and used in 1998 in a study evaluating the effectiveness of topical ascomycin macrolactam. ADSI grades pruritus, erythema, exudation, excoriation and lichenification of a target lesion on a scale of 0-3, with total scores ranging from 0 to 15. To date, there have been no studies validating its use. Still, it was employed in RCTs to evaluate the safety and efficacy of crisaborole topical ointment. An interpretability study determined severity strata (0-1.9=clear; 2-5.9=mild; 6.0-8.9=moderate; 9.0-15=severe), which may help clarify its use.
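ADSI scoring and the proposed strata can be sketched together. The names are hypothetical. Because the total is an integer sum of five grades, the decimal band edges reported above reduce to the lower cutoffs 2, 6 and 9.

```python
from bisect import bisect_right

SIGNS = ("pruritus", "erythema", "exudation", "excoriation", "lichenification")
LABELS = ("clear", "mild", "moderate", "severe")
LOWER_CUTOFFS = (2, 6, 9)  # starts of the mild, moderate and severe bands


def adsi(grades: dict) -> int:
    """Sum the 0-3 grades of the five signs of a target lesion (range 0-15)."""
    if set(grades) != set(SIGNS):
        raise ValueError("grades must contain exactly the five ADSI signs")
    if any(grade not in (0, 1, 2, 3) for grade in grades.values()):
        raise ValueError("each sign is graded 0, 1, 2 or 3")
    return sum(grades.values())


def adsi_band(score: float) -> str:
    """Map an ADSI score to the severity strata from the interpretability study."""
    if not 0 <= score <= 15:
        raise ValueError("ADSI must be between 0 and 15")
    return LABELS[bisect_right(LOWER_CUTOFFS, score)]


score = adsi(dict.fromkeys(SIGNS, 2))
print(score, adsi_band(score))  # 10 severe
```

The example also shows a limitation of a target-lesion score: a single lesion with every sign graded 2 is classed as severe regardless of how much skin is involved.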

EASI vs SCORAD

Erythema, edema/papulation, excoriation and lichenification are assessed in both EASI and SCORAD. However, SCORAD also assesses oozing/crusting and xerosis of non-lesional skin. Previous studies found strong correlations between EASI and oSCORAD. However, a nonlinear relationship between EASI and oSCORAD was recently shown (Figure 5-1). In particular, EASI scores ≤5 were unable to distinguish between fairly broad ranges of oSCORAD when moderate-severe xerosis was present or when moderate-severe lesions were localized to the face, eyelid, neck, flexural areas, hands and feet. Yet, for EASI scores >5, there was a linear relationship with oSCORAD. Thus, there is a complex relationship between EASI and oSCORAD and limited discriminative ability of low EASI scores compared with SCORAD. Moreover, unlike SCORAD, which only evaluates the intensity of itch, mEASI weights severity of itch according to the extent of affected BSA.

Figure 5-1: Mean BSA was 23.8% ± 27.6%, EASI was 8.7 ± 11.3, and oSCORAD was 22.7 ± 15.0. Source: Chopra R, et al. J Allergy Clin Immunol. 2017;140(6):1708-1710.e1.

Lesional Intensity vs Extent

Clinicians and patients alike agree that both intensity and extent of lesions are important for the assessment of AD severity. However, there is considerable variability in how these are measured and weighted, stemming from a fundamental debate over which of these matters more for AD severity. BSA assesses extent, but not intensity. IGA assesses intensity, but not extent. EASI assesses both, but heavily weights scores by extent (~50%). oSCORAD/SCORAD is mostly driven by intensity (~60%), with a smaller weighting for extent (~20%). Across outcome measures, the contribution of intensity varies from 33% to 100% and the contribution of extent varies from 19% to 100%. Moreover, there is a nonlinear relationship between lesional intensity and extent.

It is important for clinicians and investigators to recognize that different AD severity assessments have inherently different constructs and measure different things. An interpretability study determined severity strata for multiple measures of AD signs, and found that measures including extent (BSA, EASI, SCORAD) correlated better with overall disease severity than those with intensity but no extent (ADSI). To recap the above example, a patient with a moderate AD lesion that is 1 cm in diameter is very different from one with moderate lesions covering 90% BSA. The former may be successfully treated with relatively minimal amounts of topical therapy, whereas the latter may require systemic therapy and/or hospitalization. It is therefore essential that clinicians perform a full-body skin exam to assess the true extent of AD lesions, rather than assessing only a few representative lesions.

Patient-Reported Outcomes (PRO) and Quality of Life (QOL) Measures

The large symptom burden of AD, including the sequelae of itch, pain, sleep and mental health disturbance, may not be accurately depicted by objective measures of disease severity. Rather, these symptoms are likely better reflected by validated PROs aimed at standardizing the assessment of these subjective symptoms. Further, the profoundly negative impact of AD on QOL, including physical and psychosocial wellbeing, warrants disease assessments that characterize patients’ QOL.

There is an increasing trend towards including PROs and QOL measures in clinical studies, particularly for disorders with low mortality and high disability. Further, clinician assessments of disease severity do not necessarily align with PROs. This was highlighted in a systematic review of QOL impairment in pruritus, the hallmark symptom of AD. Physical manifestations of itch were the most frequently reported concern in the clinical literature, but only the fifth most important concern to patients. Further, functional limitations and relationship/social effects related to itch were the second and third most commonly reported concerns by patients and only the sixth and eighth most commonly reported concerns in the clinical literature. Thus, changes in these PROs and QOL measures may be better indicators of the patient burden of AD and of treatment response than physical exam alone. However, research on PROs and QOL measures in AD is still lacking.

There is currently a lack of standardization in outcome measures used in clinical trials and practice for AD. This variability limits interpretability and comparability of clinical trial results and thus impairs evidence-based practice, which ultimately affects patient outcomes. As a result, the HOME international consensus group was developed to define a core outcome set that should be implemented in all future AD trials. At the HOME II meeting, the group determined that four domains should be measured in every AD trial: clinical signs, patient-reported symptoms, long-term control and QOL. The addition and recognition of PROs and QOL assessments signifies the recognized importance of these measures in assessing the burden of AD. At the HOME III meeting, the HOME group defined a symptom as any feature that is observed by the patient, thus broadening the outcomes that need to be captured. However, the HOME group set out to make this task feasible by working to recommend core instruments that can be used to assess PROs and QOL. The following section will review common instruments used to assess PROs and QOL.

Patient-Reported Global Severity of AD

No single objective assessment or PRO fully captures all aspects of AD disease severity. Moreover, there is a lack of simple and validated severity assessments that are feasible for clinical practice and epidemiologic research. One approach worth considering is the use of a single-question assessment for patient-reported global severity of AD. A single question for classifying the self- or caregiver-reported severity of AD would be short, intuitive and efficient. From the perspective of patient-centered research and care, patients’ own impression of their disease would be the gold standard AD severity assessment. The validity of this single-question approach was investigated in a prospective dermatology practice-based study using questionnaires and evaluation by a dermatologist.

Using a simple scoring system of mild, moderate, and severe, the study demonstrated that patient-reported AD severity correlated well with the objective AD assessments oSCORAD and EASI at baseline and follow-up (P <0.0001 for both). EASI and SCORAD have been extensively validated and used frequently to define AD severity in outcome measures and inclusion criteria. The HOME group found that only oSCORAD and EASI were adequately validated and assessed the four essential signs of AD. Patient-reported AD severity also correlated well with multiple validated PROs, including NRS-itch, POEM and DLQI (P <0.0001 for all). In 2017, POEM was selected as the preferred assessment for AD symptoms in clinical trials, while DLQI and CDLQI were selected as the preferred instruments to measure quality of life in 2019. Altogether, patient-reported global AD severity appears to be a simple, feasible and valid tool for assessing AD severity.

Skin Pain

Patients with AD and chronic itch may scratch their skin, resulting in skin barrier disruption and painful erosions. Skin pain has been associated with profoundly harmful effects on patient mental health and all aspects of QOL in many disease states. A prospective, dermatology practice-based study was performed using questionnaires and evaluation by a dermatologist to investigate whether AD is associated with increased skin pain, and its effect on patient mental health and QOL. At baseline, 42.7% of patients reported pain in the previous week, with 13.8% reporting severe or very severe pain. According to patients, 16.8% thought the skin pain was part of their itch, 11.2% thought the pain was due to scratching, and 72.0% believed it to be a combination of both. Patients who experienced skin pain were more likely to describe their itch using terms that resembled neuropathic pain. Skin pain was associated with increased AD severity and affected all aspects of patients’ QOL. Skin pain severity was most strongly correlated with POEM, followed by ItchyQOL, 5-dimensions of itch scale, DLQI, numeric rating scale for itch and sleep, Patient Health Questionnaire 9, patient-reported global AD severity, EASI and oSCORAD (P <0.001 for all). Patients with both severe itch and pain had significant increases in these measures compared to patients with only one or neither symptom. As such, the authors recommend that skin pain severity be routinely assessed along with itch severity in all AD patients. Moreover, skin pain may be an important endpoint for monitoring treatment response.

The need for suitable PRO measures for pain led to the development of an 11-point Skin Pain Numerical Rating Scale (NRS), with scores ranging from 0 (“no pain”) to 10 (“worst pain imaginable”). The Skin Pain NRS demonstrated good test-retest reliability (week 1 to week 2 Average ICC=0.97; day 7 to day 14 ICC=0.91) and concurrent validity with other PROs (POEM week 1/2 ICC=0.75/0.77; PGIS week 1/2 ICC=0.68/0.65; DLQI Summary Score ICC=0.57–0.73; CDLQI Summary Score ICC=0.77–0.91). Based on patient feedback, skin pain descriptors such as discomfort or soreness were included in the final Skin Pain NRS, which improved patient understanding of the instrument. Another novel NRS for skin pain exhibited comparable measurement properties to an existing validated NRS for overall pain. In a study of its validity and reliability, NRS skin-pain exhibited good concurrent validity, showing moderate correlations with average overall-pain NRS (Spearman’s ϱ=0.65), POEM (ϱ=0.56), DLQI (ϱ=0.46), and NRS worst-itch (ϱ=0.43) and weak correlations with SCORAD (ϱ=0.39), NRS average-itch (ϱ=0.36), vIGA*BSA (ϱ=0.34), EASI (ϱ=0.32), and oSCORAD (ϱ=0.30). It also demonstrated good divergent validity, good discriminant validity, good-to-excellent reliability and fair-to-good responsiveness.

Visual Analogue Scale (VAS)-Pruritus

The VAS-pruritus is a unidimensional instrument that allows patients to mark the intensity of their itch on a 10-cm/100-mm ruler-shaped scale. The endpoints of the scale are marked with 0 (“no itch”) and 10 (“worst imaginable itch”), and thus scoring ranges from 0 to 10. The VAS has been well validated for rating pruritus, as well as other symptoms, most notably pain.

The VAS has demonstrated convergent validity with good correlation to the numerical rating scale (NRS) and verbal rating scale (VRS), content validity, and test-retest reliability. Additionally, a systematic review by the HOME group found that of 289 AD RCTs where itch was assessed, the VAS was the most commonly used measure to assess itch (76%). The VAS was noted to be more sensitive to change compared to the NRS and VRS; however, the VAS had the most missing data of the three during validation. Additionally, it should be noted that as a unidimensional instrument, the VAS only provides information about itch intensity. Moreover, there is a dearth of information as to the best practices for VAS-itch, including assessing worst vs average itch intensity, recall period, frequency of assessment and best time(s) to assess itch.

Reich and coworkers performed a detailed analysis to create a banding system as follows: 0=no pruritus; 1-3=mild pruritus; 4-6=moderate pruritus; 7-8=severe pruritus; and ≥9 points=very severe pruritus. The authors also recommended that a decrease of 2-3 points should constitute a minimal clinically important difference (MCID) for clinical itch improvement. Kido-Nakahara and colleagues recommended a slightly different banding system based on the results of a study of Japanese patients: 0=no pruritus; 1-2=mild pruritus; 3-6=moderate pruritus; 7-8=severe pruritus; and ≥9=very severe pruritus.
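The two banding systems described above can be expressed as a small lookup. The following Python sketch is illustrative only: the function name `vas_itch_band` and the system labels `"reich"` and `"kido_nakahara"` are hypothetical. It assumes that a fractional VAS score stays in the lower band until it reaches the next band's lower bound (for example, 3.5 is treated as mild under the Reich bands) and that scores of 9 and above are very severe in both systems.

```python
# Exclusive upper cut-offs for the mild, moderate and severe bands.
# The keys are hypothetical labels for the two banding systems in the text.
VAS_CUTOFFS = {
    "reich": (4, 7, 9),          # 1-3 mild, 4-6 moderate, 7-8 severe
    "kido_nakahara": (3, 7, 9),  # 1-2 mild, 3-6 moderate, 7-8 severe
}


def vas_itch_band(score, system="reich"):
    """Map a VAS-pruritus score (0 to 10) to a severity band."""
    if not 0 <= score <= 10:
        raise ValueError("VAS score must be between 0 and 10")
    if score == 0:
        return "no pruritus"
    mild_end, moderate_end, severe_end = VAS_CUTOFFS[system]
    if score < mild_end:
        return "mild"
    if score < moderate_end:
        return "moderate"
    if score < severe_end:
        return "severe"
    # Assumption: 9 and above is very severe in both systems.
    return "very severe"


# The same score can fall into different bands depending on the system.
print(vas_itch_band(3, "reich"))          # mild
print(vas_itch_band(3, "kido_nakahara"))  # moderate
```

A score of 3 illustrates the practical consequence of the differing cut-offs: it is mild under one system and moderate under the other.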

Numerical Rating Scale (NRS)-Pruritus

The NRS-pruritus is also a unidimensional, visual instrument that allows patients to rate their itch intensity. The NRS consists of 11 visually listed numbers ranging consecutively from 0 (“no itch”) to 10 (“worst imaginable itch”), and patients choose the number that describes the intensity of their itch. The NRS has been well validated for rating pruritus, as well as pain. An interpretability study found that the previously reported optimal bands for VAS-itch (0-3=mild; 4-6=moderate; 7-10=severe) were also optimal for NRS-itch. Reich and colleagues recommended that a decrease of 2-3 points should constitute an MCID for clinical itch improvement in NRS, similar to the VAS.
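As a rough illustration of how the three-band interpretation and the MCID might be applied to a pair of visits, consider the Python sketch below. The function names are hypothetical, and the default MCID of 2 points is an assumption chosen from the lower end of the 2-3 point range quoted above; it can be overridden.

```python
def nrs_itch_band(score):
    """Map an integer NRS-itch score (0 to 10) to the 3-band interpretation."""
    if score not in range(11):
        raise ValueError("NRS-itch score must be an integer from 0 to 10")
    if score <= 3:
        return "mild"
    if score <= 6:
        return "moderate"
    return "severe"


def itch_improved(baseline, follow_up, mcid=2):
    """Return True if the decrease in NRS-itch meets the chosen MCID.

    Assumption: mcid defaults to 2; the text gives a range of 2-3 points.
    """
    return (baseline - follow_up) >= mcid


baseline, follow_up = 8, 5
print(nrs_itch_band(baseline), "->", nrs_itch_band(follow_up))  # severe -> moderate
print(itch_improved(baseline, follow_up))                       # True
print(itch_improved(8, 6, mcid=3))                              # False
```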

The NRS has demonstrated convergent validity, with good correlation to the VAS and VRS, and test-retest reliability. Reich and associates also showed that there was less missing data for the NRS compared to the VAS, and that patients indicated preference for the NRS over the VAS. In a large, population-based validation study of five PRO instruments, NRS-itch showed moderate-to-strong correlation with PO-SCORAD (and its objective and subjective subscores), POEM, and DLQI, weak correlation with HADS-anxiety and HADS-depression, and weak negative correlation with SF-12. The aforementioned systematic review by the HOME group found that, of 289 AD RCTs that assessed itch, the NRS, along with the VRS, was the second most commonly used measure to assess itch (28% of trials). However, the NRS is also a unidimensional instrument and only provides information about itch intensity. Additionally, NRS scores were found to be significantly higher than corresponding VAS scores, so the two scores should not be used interchangeably.

Several itch severity assessments, including NRS, VRS and frequency of itch, are included in the PROMIS Itch Questionnaire (PIQ, see below). In a validation study, the NRS, VRS and frequency of itch items from the PIQ demonstrated good construct validity, responsiveness, reliability, and feasibility. Scores from all three items correlated strongly with each other (Spearman correlations P<0.001) and exhibited weak-to-moderate correlations with POEM, EASI, SCORAD and DLQI (Spearman correlations P<0.001). In qualitative interviews, most patients found that a 7-day recall period for the PIQ itch severity items best reflected their experience of itch and was more clinically relevant than a 24-hour recall period. A 1-day recall period may be better suited for clinical trials of novel agents because it allows for evaluation of rapid itch responses.

Verbal Rating Scale (VRS)-Pruritus

The VRS-pruritus is also a unidimensional instrument where patients verbally rate the intensity of their itch. The VRS has scoring system variability, as a four-point scale (none, mild, moderate and severe pruritus) and five-point scale (none, mild, moderate, severe and very severe pruritus) have both been used and validated. The VRS has been well validated and has demonstrated convergent validity, with good correlation to the VAS and NRS, and test-retest reliability. Reich and colleagues also showed that there was less missing data for the VRS compared to the VAS. As mentioned, the VRS, along with the NRS, was found to be the second most commonly used measure to assess itch (28% of trials). Like the VAS and NRS, the VRS only provides information about itch intensity. Additionally, the scoring system variability limits comparisons between studies, and the limited number of response options can also increase variability and limit sensitivity to change.

Short Form-36 Health Survey (SF-36)

The SF-36 was first developed in 1992, with the revised version developed in 2000. The SF-36 was designed to assess patient-reported health-related quality-of-life (HRQOL), and can be used in general and clinical populations, thus allowing health comparability between disease states and populations. The SF-36 is a 36-item questionnaire covering eight domains: physical functioning, role limitations due to physical problems, social functioning, bodily pain, general mental health, role limitations due to emotional problems, vitality and general health perceptions. Total score ranges from 0 (worst) to 100 (best), broken into Physical Component Score (PCS) and Mental Component Score (MCS). The questionnaire takes approximately 5 to 10 minutes to complete, although shorter versions exist. The great international efforts that went into the testing and validation of the SF-36 survey have led to its widespread use in health-related research. One review study in 2000 found it to be the global HRQOL instrument of choice. More than 50 versions of the SF-36 currently exist in various languages.

The SF-36 has been a widely used instrument in AD research as a generic QOL measurement. It has been well validated and demonstrated to have high internal validity, construct validity, content validity and convergent validity. However, the questionnaire has questionable test-retest reliability, responsiveness and discriminant validity. Additionally, some patients have had difficulty with question interpretation.

Patient-Reported Outcomes Measurement Information System (PROMIS) Global Health (PGH)

The PGH is a non-disease-specific instrument for HRQOL evaluation comprising five primary domains (physical function, fatigue, pain, emotional distress and social health). Distinct scores can be generated from the four PGH items related to mental health (PGH-M4) and the four related to physical health (PGH-P4), and briefer 2-item versions (PGH-M2 and PGH-P2) have also been developed.

In a dermatological practice-based validation study, physical and mental PGH 4-item and 2-item scores correlated strongly or very strongly with each other (Spearman’s ϱ, 0.59 to 0.94) and exhibited moderate-to-strong correlations with PHQ9 (ϱ, -0.57 to -0.65), PROMIS sleep disturbance (ϱ, -0.42 to -0.53) and related-impairment (ϱ, -0.45 to -0.57), PO-SCORAD (ϱ, -0.40 to -0.52), EASI (ϱ, -0.31 to -0.41), and oSCORAD (ϱ, -0.30 to -0.39) and weak-to-moderate correlations with SCORAD (ϱ, -0.25 to -0.33), POEM (ϱ, -0.23 to -0.32), and NRS worst-itch (ϱ, -0.20 to -0.31) and average-itch (ϱ, -0.20 to -0.31). Compared to DLQI, PGH scores had stronger correlations with PHQ9 and PROMIS sleep disturbance and related-impairment but weaker correlations with POEM, PO-SCORAD, NRS worst-itch and average-itch, EASI, and SCORAD.

Short Form-12 Health Survey (SF-12)

The SF-12 is a set of 12 generic (non-disease-specific) adult quality of life measures developed from the longer SF-36 in order to streamline large-scale measurement and monitoring efforts. It includes questions on overall health and the impact of health status (physical and mental) and pain on daily activities, including work and social activities. In a cross-sectional, population-based validation study of SF-12 in AD patients, SF-12 showed good internal consistency. The SF-12 mental component score (MCS) exhibited good discriminant validity and moderate inverse Pearson correlations with several common patient-oriented AD severity measures, including POEM, PO-SCORAD, and NRS-pain. By contrast, the SF-12 physical component score (PCS) demonstrated low discriminant validity and weak correlations with the abovementioned AD severity measures. SF-12 was also found to perform worse than DLQI in assessing the burden of AD, with DLQI exhibiting a stronger correlation with AD severity. Similar results were reported in a large validation study of five PRO instruments, where SF-12 was shown to moderately negatively correlate with PO-SCORAD and its subscores, NRS-itch, and POEM, with the MCS exhibiting a stronger negative correlation (Pearson r=-0.32 to r=-0.51) with these measures than the PCS (r=-0.07 to r=-0.21). Despite its limitations compared to disease-specific instruments, as an instrument that is not skin-specific, SF-12 has the advantage of allowing comparisons across health conditions and disease states.

Patient-Oriented Eczema Measure (POEM)

The POEM, developed in 2004, was designed as a patient-reported outcome to measure frequency of AD activity. The creators felt that this would provide a more holistic assessment of the patient’s disease and treatment response. The POEM is a 7-item questionnaire that assesses the frequency of symptom occurrence. Each item is scored on a 5-point scale (0 to 4), thus the overall score ranges from 0 (best) to 28 (worst). The questionnaire takes 1 to 2 minutes to complete. The minimal clinically important difference for POEM has been stated by two key studies to be 3.4, and approximately 3 in young children. The creators also proposed an anchor-based banding system in 2013 for POEM scores as follows: 0-2=clear/almost clear; 3-7=mild; 8-16=moderate; 17-24=severe; and 25-28=very severe. An interpretability study has confirmed these strata for mild, moderate and severe classifications in patients with AD.
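The scoring and banding just described can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than an official scoring tool: the function names and the example item responses are hypothetical, and the sketch assumes all seven items have been answered.

```python
# Inclusive upper bound of each band from the 2013 anchor-based banding.
POEM_BANDS = [
    (2, "clear/almost clear"),
    (7, "mild"),
    (16, "moderate"),
    (24, "severe"),
    (28, "very severe"),
]


def poem_total(item_scores):
    """Sum seven POEM items, each scored 0 to 4, into a 0 to 28 total."""
    if len(item_scores) != 7:
        raise ValueError("POEM has exactly 7 items")
    if any(score not in range(5) for score in item_scores):
        raise ValueError("each POEM item must be an integer from 0 to 4")
    return sum(item_scores)


def poem_band(total):
    """Map a POEM total score to its severity band."""
    if total not in range(29):
        raise ValueError("POEM total must be an integer from 0 to 28")
    for upper, label in POEM_BANDS:
        if total <= upper:
            return label


# Hypothetical responses for one patient over the previous week.
responses = [2, 3, 1, 0, 4, 2, 1]
total = poem_total(responses)
print(total, poem_band(total))  # 13 moderate
```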

The questionnaire has been well validated and shown to demonstrate construct validity, convergent validity, divergent validity, internal consistency, sensitivity to change and test-retest reliability. Additionally, the questionnaire has been shown to have good correlation with the Dermatology Life Quality Index (DLQI) and Children’s Dermatology Life Quality Index (CDLQI) instruments. A systematic review assessing AD outcome measures also deemed POEM to be one of the few recommended outcome measurements for AD due to its validation and domain coverage. Of note, POEM was chosen by the HOME group at the fourth international consensus meeting as the preferred core instrument to consistently assess the core domain of patient-reported symptoms in future AD trials.

Patient-Oriented SCORAD (PO-SCORAD)

PO-SCORAD is a patient self-assessment score developed in 2009 on the basis of objective and subjective evaluation criteria from SCORAD. It is organized in three parts, covering: 1) disease extent; 2) severity; and 3) subjective symptoms (including itch and sleep disturbance). An accompanying illustrated document aids the patient in assessing the severity of AD lesions. PO-SCORAD is a validated instrument that correlates well with SCORAD both in scores (r=0.67-0.79) and absolute changes from baseline (r=0.71). One dermatology practice-based study assessed the measurement properties of PO-SCORAD and compared them to POEM. PO-SCORAD and POEM showed moderate correlation with each other (Spearman’s ϱ=0.56) and weak-to-good correlations (ϱ=0.31-0.71) with other PRO instruments, including NRS itch, DLQI, ItchyQOL, PHQ-9, PROMIS sleep and EASI. POEM significantly outperformed PO-SCORAD in correlation with DLQI (ϱ=0.59 vs ϱ=0.52, P<0.001), ItchyQOL (ϱ=0.71 vs ϱ=0.62, P=0.02) and EASI (ϱ=0.52 vs ϱ=0.31, P<0.03). A validation study of 5 PRO instruments showed that PO-SCORAD and its objective and subjective subscores correlated well with each other (Pearson’s r=0.59-0.97), NRS-itch (r=0.59-0.82), POEM (r=0.55-0.71) and DLQI (r=0.53-0.71), moderately with HADS-anxiety (r=0.46-0.52) and HADS-depression (r=0.39-0.46), moderately negatively with SF-12 MCS (r=-0.40 to -0.51) and weakly negatively with SF-12 PCS (r=-0.07 to -0.21). Compared with POEM, PO-SCORAD had stronger correlations with DLQI (P=0.003), SF-12 MCS (P=0.002), SF-6D (P=0.006), HADS-anxiety (P=0.001) and HADS-depression (P=0.005), but not SF-12 PCS (P=0.57).

Patient-Reported Outcomes Measurement Information System (PROMIS) Itch Questionnaire (PIQ)

The PIQ was developed in 2019 to complement existing instruments and provide a multi-dimensional assessment of the impact of itch on health-related quality of life. The PIQ comprises four item banks with 63 items in total, covering the effects of itch on: 1) general concerns; 2) mood and sleep; 3) clothing and physical activity; and 4) scratching behavior. From these item banks, PIQ short forms can be developed for specific needs. Several PIQ short forms (SFs), including 8-item SFs from banks 1-3 and a 5-item SF from bank 4, were validated in a prospective, dermatology practice-based study in adults with AD. The SFs exhibited good content validity and concurrent validity, with all item-bank T-scores showing moderate-to-strong correlations with each other. They also had good convergent validity, with all item-bank T-scores showing moderate correlations with other assessments of itch severity, moderate-to-strong correlations with POEM and weak-to-moderate correlations with EASI and oSCORAD. While all item banks demonstrated fair discriminant validity, none were able to distinguish the lowest 2 or highest 2 levels of AD symptoms and itch severity. All item banks also showed good internal consistency (Cronbach’s α, 0.91-0.95). Floor effects but no ceiling effects were observed in all item banks. The SFs demonstrated good feasibility, with a median completion time of 2 minutes.

Dermatology Life Quality Index (DLQI)

The DLQI tool, developed in 1994, is one of the most commonly used assessments in dermatology and clinical trials. It was developed to measure the impact that skin disease has on the lives of adult patients. The DLQI is a 10-item questionnaire that assesses the following domains: skin symptoms, feelings of embarrassment, day-to-day activities and working and social life. Each item is scored from 0 to 3, giving a total score ranging from 0 (best) to 30 (worst). It takes approximately 1 to 3 minutes to complete. The interpretation of the DLQI score is as follows: 0-1=no effect on patient’s life; 2-5=small effect; 6-10=moderate effect; 11-20=very large effect; and 21-30=extremely large effect. An interpretability study confirmed that these bands are indeed optimal in adults with AD. For general inflammatory skin conditions, a change in DLQI score of at least 4 points was found to be clinically important.
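To make the arithmetic concrete, the Python sketch below totals ten DLQI item scores, looks up the interpretation band, and checks a change between two visits against the 4-point threshold. The function names and the example item scores are hypothetical, and the sketch assumes every item has been answered; it does not attempt to handle unanswered items.

```python
# Inclusive upper bound of each DLQI interpretation band.
DLQI_BANDS = [
    (1, "no effect"),
    (5, "small effect"),
    (10, "moderate effect"),
    (20, "very large effect"),
    (30, "extremely large effect"),
]


def dlqi_total(item_scores):
    """Sum ten DLQI items, each scored 0 to 3, into a 0 to 30 total."""
    if len(item_scores) != 10:
        raise ValueError("DLQI has exactly 10 items")
    if any(score not in range(4) for score in item_scores):
        raise ValueError("each DLQI item must be an integer from 0 to 3")
    return sum(item_scores)


def dlqi_band(total):
    """Map a DLQI total score to its interpretation band."""
    if total not in range(31):
        raise ValueError("DLQI total must be an integer from 0 to 30")
    for upper, label in DLQI_BANDS:
        if total <= upper:
            return label


def dlqi_change_is_important(baseline, follow_up, threshold=4):
    """Return True if the absolute change meets the 4-point threshold."""
    return abs(baseline - follow_up) >= threshold


# Hypothetical item scores at baseline, and a hypothetical follow-up total.
baseline = dlqi_total([2, 2, 1, 3, 1, 2, 1, 0, 1, 1])
follow_up = 9
print(baseline, dlqi_band(baseline))    # 14 very large effect
print(follow_up, dlqi_band(follow_up))  # 9 moderate effect
print(dlqi_change_is_important(baseline, follow_up))  # True
```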

The DLQI is an extensively validated dermatology-specific questionnaire that has been translated into more than 40 languages. It is also one of the most frequently used instruments in AD studies. At the HOME VII meeting in 2019, the DLQI was chosen as the recommended instrument to measure skin-specific quality of life in adult patients. While the instrument is dermatology-specific and not disease-specific, it has been found to be highly specific for assessing QOL changes in AD patients. It has high internal consistency, construct validity and test-retest reliability. It has also shown responsiveness to changes in social, emotional and physical function. The DLQI correlates with disease severity and the SCORAD assessment in AD patients. It has also been found to correlate with other PRO measures, including PO-SCORAD and its subscores (Pearson’s r=0.53-0.71), NRS-itch (r=0.48) and POEM (r=0.62). Limitations of the DLQI include inadequate assessment of mild disease, item bias and a unidimensional nature. Additionally, some studies showed differential reporting in AD studies by gender when using the DLQI.

Children’s Dermatology Life Quality Index (CDLQI)

The CDLQI questionnaire was developed in 1995 to measure how skin disease affects the QOL of children aged 4-16 years. It has been found to be the most commonly used questionnaire for assessing QOL in children affected by skin disease. Like the DLQI, the CDLQI is commonly used in AD studies to assess QOL in children, and was recommended as the preferred instrument for the measurement of skin-specific quality of life in children with AD at the HOME VII meeting. Also like the DLQI, the CDLQI is a 10-item questionnaire with each item scored from 0 to 3; thus, the total score also ranges from 0 (best) to 30 (worst). The child completes the questionnaire with adult assistance if needed, and the domains covered include: itching, sleep loss, friendships, bullying, school performance, sports participation and enjoyment of vacation. The validated severity banding for CDLQI scores is as follows: 0-1=no effect on child’s life; 2-6=small effect; 7-12=moderate effect; 13-18=very large effect; and 19-30=extremely large effect.

A meta-analysis found that, over 38 studies, the mean CDLQI score in children with AD was 8.5 (95% CI 7.1-9.8), indicating that AD has a moderate effect on QOL in children. The CDLQI has been well validated, takes 1 to 2 minutes to complete, and has been shown to demonstrate test-retest reliability, internal consistency, content validity and responsiveness. A cartoon version of the CDLQI was created that was found to be quicker to complete and was preferred by both children and parents. Limitations of the CDLQI include questionable correlation with AD severity, as well as an inability to compare results with the DLQI, due to both differences in patient populations and reported correlations. It also lacks validation of its use in patients aged <4 and >16 years. Of note, CDLQI and DLQI scores are distinct measures and their scores cannot be compared or used interchangeably.
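The step from a mean score of 8.5 to "moderate effect" follows from the CDLQI severity banding (0-1, 2-6, 7-12, 13-18 and 19-30). The short Python sketch below shows one way to perform that lookup using the standard library `bisect` module. The function name is hypothetical, and because the published bands are defined for whole-number totals, the sketch assumes that a fractional value such as a group mean is assigned to the band whose lower bound it has reached.

```python
from bisect import bisect_right

# Lower bounds of the second through fifth CDLQI bands.
CDLQI_LOWER_BOUNDS = [2, 7, 13, 19]
CDLQI_LABELS = [
    "no effect",
    "small effect",
    "moderate effect",
    "very large effect",
    "extremely large effect",
]


def cdlqi_band(score):
    """Map a CDLQI score (0 to 30) to its severity band.

    Assumption: fractional values (for example, a mean across patients)
    stay in the lower band until they reach the next band's lower bound.
    """
    if not 0 <= score <= 30:
        raise ValueError("CDLQI score must be between 0 and 30")
    return CDLQI_LABELS[bisect_right(CDLQI_LOWER_BOUNDS, score)]


# The meta-analysis mean reported in the text falls in the 7-12 band.
print(cdlqi_band(8.5))  # moderate effect
```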

Dermatitis Family Impact (DFI)

The DFI, developed in 1998, assesses the extent to which a child’s AD affects the QOL of the child’s family. The DFI is a 10-item questionnaire with each item scored from 0 to 3, so the total score can range from 0 (best) to 30 (worst). The questionnaire is to be completed by a parent or caregiver of the child. The domains assessed include housework, food preparation and feeding, sleep, family leisure activity, shopping, expenditure, fatigue, emotional distress and relationships. The DFI, which takes approximately 2 minutes to complete, has been well validated and has been shown to demonstrate high test-retest reliability, internal consistency, sensitivity to change and convergent validity. Many studies have also shown the DFI to correlate well with AD severity. Additionally, one review found the DFI to be the second most common QOL measure used in AD RCTs.

Concordance of AD Signs and Symptoms

The concordance between instruments that assess signs vs symptoms varies considerably between studies. According to Kim and colleagues, the objective domains of SCORAD and R-L are well correlated to the subjective domains of the QOL measures IDQOL, CDLQI and DLQI. On the other hand, according to Haeck and associates, SCORAD and DLQI are not significantly correlated at baseline, but are moderately correlated with respect to response to treatment. Zhao and co-workers found a low correlation between patient-reported and physician-assessed disease severity as demonstrated by a lack of correlation of oSCORAD, EASI, SASSAD and Three-Item Severity (TIS) score with POEM, DLQI and Skindex-29, with SASSAD and Skindex-29 being the only measures with moderate correlation. Together, these studies highlight that objective assessments and PROs measure different aspects of AD and the need for assessments of both AD signs and symptoms to completely capture the broad manifestations and burden of AD.

Concordance of Clinician and Patient Assessments

Patient-oriented SCORAD (PO-SCORAD) and self-administered EASI (SA-EASI) were developed as PRO analogs of SCORAD and EASI. PO-SCORAD utilizes an illustrated tutorial for patients to assess the severity and extent of their disease. SA-EASI allows the caregiver to assess extent via a body silhouette that estimates BSA and intensity of redness, thickness, dryness, number of scratches and itchiness of the child’s average AD lesion via five 100-mm VAS. Although these measures are analogous to their parent clinician assessments, they differ significantly in their review, content and measurement properties. These differences preclude comparisons between instruments and prevent these objective and subjective scores from being used interchangeably.

Clinical Outcome Assessments Feasible for Clinical Practice

Ideally, the same outcome domains should be assessed in a patient visit as in clinical trials to promote the effective translation of trial results to general practice. However, the objectives in the two settings are different, and the same assessments may be inappropriate due to variations in core outcome domains. For an instrument to be useful to the general practitioner, it should have demonstrated validity in clinical practice, require minimal training, be time-efficient, and seamlessly integrate into day-to-day practice. EASI requires extensive training to improve intra- and inter-rater reliability, has no demonstrated validity or feasibility in the clinical setting, and takes a considerable amount of time to complete; it is therefore likely a poor measure in the clinical practice setting.

The most critical aspect to implementing an instrument in routine care is the time it takes to perform. To be acceptable or adequate for use in clinical practice, a severity score should not take longer than 3 minutes or 3-5 minutes, respectively. However, SCORAD and EASI took 3-10 and 2-6 minutes to perform across different studies, respectively. Completion-time of these assessments is user-dependent and varies based on experience and training. Most studies that specifically addressed completion-time for an untrained dermatologist reported that it took longer than 3 minutes. Thus, SCORAD and EASI are most likely neither acceptable nor adequate for routine clinical practice.

A patient’s assessment of his/her symptoms may be a more accurate and more clinically relevant reflection of patient burden. Assessment of itch and sleeplessness alone is insufficient to fully measure disease severity according to HOME. However, due to the abundance of AD-related symptoms, it was not possible for HOME to develop a comprehensive short-list of important symptoms that need to be assessed by an instrument. Nevertheless, dryness, redness/inflamed skin, and irritated skin, and to a lesser extent pain/soreness, were all considered essential.

POEM is the only non-symptom-specific PRO sufficiently validated to be recommended by the HOME collaboration. POEM is a self-reported eczema-specific tool that assesses frequency of signs and symptoms important to patients, including itch, sleep loss, bleeding, weeping/oozing, skin cracking, flaking and xerosis over the previous week. It is simple, patient-centered, easily interpreted and takes most patients 1 to 2 minutes to complete. It was developed in the primary and secondary care settings, supporting its use in both community and hospital settings. Moreover, patients can complete questionnaires before seeing their dermatologist either at home or in the waiting room. However, uncertainties about POEM’s structural validity, cross-cultural validity, reliability and measurement error, together with its lack of assessment of symptom intensity, raise major concerns about its use in clinical practice. The most feasible PRO in clinical practice is patient-reported global AD severity of mild, moderate, or severe. The most feasible objective AD assessment is the clinician’s gestalt global assessment of mild, moderate, or severe. However, the validity of this approach has yet to be established. NRS- and VRS-itch are also simple PRO measures that can be used to measure itch intensity, but may not characterize the full severity of AD.

 

References

  • Silverberg JI. Clinical Management of Atopic Dermatitis. 2nd ed. Professional Communications Inc. 2022.
  • Abrams BB. Atopic dermatitis: elements in clinical study design and analysis. Acta Derm Venereol Suppl (Stockh). 1989;144:15-19.
  • Aziah MS, Rosnah T, Mardziah A, et al. Childhood atopic dermatitis: a measurement of quality of life and family impact. Med J Malaysia. 2002;57:329-339.
  • Badia X, Mascaro JM, Lozano R. Measuring health-related quality of life in patients with mild to moderate eczema and psoriasis: clinical validity, reliability and sensitivity to change of the DLQI. The Cavide Research Group. Br J Dermatol. 1999;141:698-702.
  • Bahmer FA, Schafer J, Schubert HJ. Quantification of the extent and the severity of atopic dermatitis: the ADASI score. Arch Dermatol. 1991;127:1239-1240.
  • Berth-Jones J, Finlay AY, Zaki I, et al. Cyclosporine in severe childhood atopic dermatitis: a multicenter study. J Am Acad Dermatol. 1996;34(6):1016-1021.
  • Berth-Jones J, Graham-Brown RA, Marks R, et al. Long-term efficacy and safety of cyclosporin in severe adult atopic dermatitis. Br J Dermatol. 1997;136:76-81.
  • Blome C, Radtke MA, Eissing L, et al. Quality of life in patients with atopic dermatitis: disease burden, measurement, and treatment benefit. Am J Clin Dermatol. 2016;17:163-169.
  • Bock SA, Lee WY, Remigio L, et al. Appraisal of skin tests with food extracts for diagnosis of food hypersensitivity. Clin Allergy. 1978;8:559-564.
  • Brazier J, Roberts J, Deverill M. The estimation of a preference-based measure of health from the SF-36. J Health Econ. 2002;21:271-292.
  • Chalmers JR, Schmitt J, Apfelbacher C, et al. Report from the third international consensus meeting to harmonise core outcome measures for atopic eczema/dermatitis clinical trials (HOME). Br J Dermatol. 2014;171(6):1318-1325.
  • Chalmers JR, Simpson E, Apfelbacher CJ, et al. Report from the fourth international consensus meeting to harmonize core outcome measures for atopic eczema/dermatitis clinical trials (HOME initiative). Br J Dermatol. 2016;175:69-79.
  • Charman C, Williams H. Outcome measures of disease severity in atopic eczema. Arch Dermatol. 2000;136:763-769.
  • Charman CR, Venn AJ, Ravenscroft JC, et al. Translating Patient-Oriented Eczema Measure (POEM) scores into clinical practice by suggesting severity strata derived using anchor-based methods. Br J Dermatol. 2013;169:1326-1332.
  • Charman CR, Venn AJ, Williams H. Measuring atopic eczema severity visually: which variables are most important to patients? Arch Dermatol. 2005;141:1146-1151.
  • Charman CR, Venn AJ, Williams HC. The patient-oriented eczema measure: development and initial validation of a new tool for measuring atopic eczema severity from the patients’ perspective. Arch Dermatol. 2004;140:1513-1519.
  • Cheng R, Zhang H, Zong W, et al. Development and validation of new diagnostic criteria for atopic dermatitis in children of China. J Eur Acad Dermatol Venereol. 2020;34(3):542-548.
  • Chopra R, Vakharia P, Sacotte R, et al. Relationship between EASI and SCORAD severity assessments for atopic dermatitis. J Allergy Clin Immunol. 2017;140(6):1708-1710.e1.
  • Chopra R, Vakharia P, Sacotte R, et al. Severity strata for EASI, mEASI, oSCORAD, SCORAD, ADSI and BSA in adolescents and adults with atopic dermatitis. Br J Dermatol. 2017;177(5):1316-1321.
  • Chren MM, Lasek RJ, Quinn LM, et al. Skindex, a quality-of-life measure for patients with skin disease: reliability, validity, and responsiveness. J Invest Dermatol. 1996;107:707-713.
  • Coutanceau C, Stalder JF. Analysis of correlations between patient-oriented SCORAD (PO-SCORAD) and other assessment scores of atopic dermatitis severity and quality of life. Dermatology. 2014;229:248-255.
  • Dodington SR, Basra MK, Finlay AY, et al. The Dermatitis Family Impact questionnaire: a review of its measurement properties and clinical application. Br J Dermatol. 2013;169:31-46.
  • Eichenfield LF, Hanifin JM, Luger TA, et al. Consensus conference on pediatric atopic dermatitis. J Am Acad Dermatol. 2003;49:1088-1095.
  • Eichenfield LF, Tom WL, Chamlin SL, et al. Guidelines of care for the management of atopic dermatitis: section 1. Diagnosis and assessment of atopic dermatitis. J Am Acad Dermatol. 2014;70(2):338-351.
  • Feldman SR, Krueger GG. Psoriasis assessment tools in clinical trials. Ann Rheum Dis. 2005;64(suppl 2):ii65-ii68.
  • Futamura M, Leshem YA, Thomas KS, et al. A systematic review of Investigator Global Assessment (IGA) in atopic dermatitis (AD) trials: many options, no standards. J Am Acad Dermatol. 2016;74:288-294.
  • Guo Y, Li P, Tang J, et al. Prevalence of atopic dermatitis in Chinese children aged 1-7 ys. Sci Rep. 2016;6:29751.
  • Haeck IM, ten Berge O, van Velsen SG, et al. Moderate correlation between quality of life and disease activity in adult patients with atopic dermatitis. J Eur Acad Dermatol Venereol. 2012;26:236-241.
  • Hanifin JM, Thurston M, Omoto M, et al. The eczema area and severity index (EASI): assessment of reliability in atopic dermatitis. EASI Evaluator Group. Exp Dermatol. 2001;10:11-18.
  • Harmonizing Outcome Measures for Eczema (HOME) initiative. HOME VII Meeting. HOME for eczema website. Accessed March 1, 2021. http://www.homeforeczema.org/meetings-and-events/home-vii-meeting-2019.aspx
  • Hays RD, Bjorner JB, Revicki DA, Spritzer KL, Cella D. Development of physical and mental health summary scores from the patient-reported outcomes measurement information system (PROMIS) global items. Qual Life Res. 2009;18(7):873-880.
  • Hays RD, Schalet BD, Spritzer KL, Cella D. Two-item PROMIS® global physical and mental health scales. J Patient Rep Outcomes. 2017;1(1):2.
  • Hill MK, Kheirandish Pishkenari A, Braunberger TL, et al. Recent trends in disease severity and quality of life instruments for patients with atopic dermatitis: a systematic review. J Am Acad Dermatol. 2016;75:906-917.
  • Holm EA, Esmann S, Jemec GB. Does visible atopic dermatitis affect quality of life more in women than in men? Gend Med. 2004;1:125-130.
  • Holm EA, Wulf HC, Stegmann H, et al. Life quality assessment among patients with atopic eczema. Br J Dermatol. 2006;154:719-725.
  • Holm EA, Wulf HC, Thomassen L, et al. Assessment of atopic eczema: clinical scoring and noninvasive measurements. Br J Dermatol. 2007;157:674-680.
  • Holme SA, Man I, Sharpe JL, et al. The Children’s Dermatology Life Quality Index: validation of the cartoon version. Br J Dermatol. 2003;148:285-290.
  • Hon KL, Kam WY, Lam MC, et al. CDLQI, SCORAD and NESS: are they correlated? Qual Life Res. 2006;15:1551-1558.
  • Kabashima K. New concept of the pathogenesis of atopic dermatitis: interplay among the barrier, allergy, and pruritus as a trinity. J Dermatol Sci. 2013;70:3-11.
  • Kantor R, Dalal P, Cella D, et al. Research letter: Impact of pruritus on quality of life-A systematic review. J Am Acad Dermatol. 2016;75:885-886.e4.
  • Kido-Nakahara M, Katoh N, Saeki H, et al. Comparative cut-off value setting of pruritus intensity in visual analogue scale and verbal rating scale. Acta Dermato-venereologica. 2015;95:345-346.
  • Kim DH, Li K, Seo SJ, et al. Quality of life and disease severity are correlated in patients with atopic dermatitis. J Korean Med Sci. 2012;27:1327-1332.
  • Kosinski M, Keller SD, Hatoum HT, et al. The SF-36 Health Survey as a generic outcome measure in clinical trials of patients with osteoarthritis and rheumatoid arthritis: tests of data quality, scaling assumptions and score reliability. Med Care. 1999;37:10-22.
  • Kunz B, Oranje AP, Labreze L, et al. Clinical validation and guidelines for the SCORAD index: consensus report of the European Task Force on Atopic Dermatitis. Dermatology. 1997;195:10-19.
  • Lawson V, Lewis-Jones MS, Finlay AY, et al. The family impact of childhood atopic dermatitis: the Dermatitis Family Impact Questionnaire. Br J Dermatol. 1998;138:107-113.
  • Le Cleach L, Chassany O, Levy A, et al. Poor reporting of quality of life outcomes in dermatology randomized controlled clinical trials. Dermatology. 2008;216:46-55.
  • Leshem YA, Hajar T, Hanifin JM, et al. What the eczema area and severity index score tells us about the severity of atopic dermatitis: an interpretability study. Br J Dermatol. 2015;172(5):1353-1357.
  • Lewis V, Finlay AY. 10 years experience of the Dermatology Life Quality Index (DLQI). J Investig Dermatol Symp Proc. 2004;9:169-180.
  • Lewis-Jones MS, Finlay AY. The Children’s Dermatology Life Quality Index (CDLQI): initial validation and practical use. Br J Dermatol. 1995;132:942-949.
  • Linde L, Sorensen J, Ostergaard M, et al. Health-related quality of life: validity, reliability, and responsiveness of SF-36, 15D, EQ-5D [corrected] RAQoL, and HAQ in patients with rheumatoid arthritis. J Rheumatol. 2008;35:1528-1537.
  • Mallinson S. Listening to respondents: a qualitative assessment of the Short-Form 36 Health Status Questionnaire. Soc Sci Med. 2002;54:11-21.
  • McCarberg BH, Nicholson BD, Todd KH, et al. The impact of pain on quality of life and the unmet needs of pain management: results from pain sufferers and physicians participating in an Internet survey. Am J Ther. 2008;15:312-320.
  • McHorney CA, Ware JE, Lu JF, et al. The MOS 36-item Short-Form Health Survey (SF-36): III. Tests of data quality, scaling assumptions, and reliability across diverse patient groups. Med Care. 1994;32:40-66.
  • Murat-Susic S, Lipozencic J, Zizic V, et al. Serum eosinophil cationic protein in children with atopic dermatitis. Int J Dermatol. 2006;45:1156-1160.
  • Newton L, DeLozier AM, Griffiths PC, et al. Exploring content and psychometric validity of newly developed assessment tools for itch and skin pain in atopic dermatitis. J Patient Rep Outcomes. 2019;3(1):42.
  • Oranje AP. Practical issues on interpretation of scoring atopic dermatitis: SCORAD Index, objective SCORAD, patient-oriented SCORAD and Three-Item Severity score. Curr Probl Dermatol. 2011;41:149-155.
  • Phan NQ, Blome C, Fritz F, et al. Assessment of pruritus intensity: prospective study on validity and reliability of the visual analogue scale, numerical rating scale and verbal rating scale in 471 patients with chronic pruritus. Acta Dermato-venereologica. 2012;92:502-507.
  • Pincus T, Bergman M, Sokka T, et al. Visual analog scales in formats other than a 10 centimeter horizontal line to assess pain and other clinical data. J Rheumatol. 2008;35:1550-1558.
  • Rajka G, Langeland T. Grading of the severity of atopic dermatitis. Acta Derm Venereol Suppl (Stockh). 1989;144:13-14.
  • Rehal B, Armstrong AW. Health outcome measures in atopic dermatitis: a systematic review of trends in disease severity and quality-of-life instruments 1985-2010. PLoS One. 2011;6:e17520.
  • Reich A HJ, Ramus M, Ständer S, Szepietowski J. New data on the validation of VAS and NRS in pruritus assessment: minimal clinically important difference and itch frequency measurement. Acta Dermato-venereologica. 2011;91:636.
  • Reich A, Heisig M, Phan NQ, et al. Visual analogue scale: evaluation of the instrument for the assessment of pruritus. Acta Dermato-venereologica. 2012;92:497-501.
  • Reich A, Riepe C, Anastasiadou Z, et al. Itch assessment with visual analogue scale and numerical rating scale: determination of minimal clinically important difference in chronic itch. Acta Dermato-venereologica. 2016;96:978-980.
  • Reitamo S, Wollenberg A, Schopf E, et al. Safety and efficacy of 1 year of tacrolimus ointment monotherapy in adults with atopic dermatitis. The European Tacrolimus Ointment Study Group. Arch Dermatol. 2000;136:999-1006.
  • Rudzki E, Samochocki Z, Rebandel P, et al. Frequency and significance of the major and minor features of Hanifin and Rajka among patients with atopic dermatitis. Dermatology. 1994;189:41-46.
  • Salek MS, Jung S, Brincat-Ruffini LA, et al. Clinical experience and psychometric properties of the Children’s Dermatology Life Quality Index (CDLQI), 1995-2012. Br J Dermatol. 2013;169:734-759.
  • Sampson HA, Albergo R. Comparison of results of skin tests, RAST, and double-blind, placebo-controlled food challenges in children with atopic dermatitis. J Allergy Clin Immunol. 1984;74:26-33.
  • Schmitt J, Langan S, Deckert S, et al. Assessment of clinical signs of atopic dermatitis: a systematic review and recommendation. J Allergy Clin Immunol. 2013;132:1337-1347.
  • Schmitt J, Langan S, Williams HC, et al. What are the best outcome measurements for atopic eczema? A systematic review. J Allergy Clin Immunol. 2007;120:1389-1398.
  • Schmitt J, Spuls P, Boers M, et al. Towards global consensus on outcome measures for atopic eczema research: results of the HOME II meeting. Allergy. 2012;67:1111-1117.
  • Schmitt J, Spuls PI, Thomas KS, et al. The Harmonising Outcome Measures for Eczema (HOME) statement to assess clinical signs of atopic eczema in trials. J Allergy Clin Immunol. 2014;134:800-807.
  • Schmitt J, Williams H; HOME Development Group. Harmonising Outcome Measures for Eczema (HOME). Report from the First International Consensus Meeting (HOME 1), 24 July 2010, Munich, Germany. Br J Dermatol. 2010;163:1166-1168.
  • Schulte-Herbruggen O, Folster-Holst R, von Elstermann M, et al. Clinical relevance of nerve growth factor serum levels in patients with atopic dermatitis and psoriasis. Int Arch Allergy Immunol. 2007;144:211-216.
  • Schwartzman G, Lei D, Yousaf M, et al. Validity and reliability of Patient-Reported Outcomes Measurement Information System Global Health scale in adults with atopic dermatitis. J Am Acad Dermatol. 2021:S0190-9622(21)00180-8.
  • Severity scoring of atopic dermatitis: the SCORAD index. Consensus Report of the European Task Force on Atopic Dermatitis. Dermatology. 1993;186:23-31.
  • Silverberg JI, Gelfand JM, Margolis DJ, et al. Validation and interpretation of Short Form 12 and comparison with Dermatology Life Quality Index in atopic dermatitis in adults. J Invest Dermatol. 2019;139(10):2090-2097.e3.
  • Silverberg JI, Lai JS, Kantor RW, et al. Development, validation, and interpretation of the PROMIS Itch questionnaire: a patient-reported outcome measure for the quality of life impact of itch. J Invest Dermatol. 2020;140(5):986-994.e6.
  • Silverberg JI, Lai JS, Patel KR, et al. Measurement properties of the Patient-Reported Outcomes Information System (PROMIS®) Itch Questionnaire: itch severity assessments in adults with atopic dermatitis. Br J Dermatol. 2020;183(5):891-898.
  • Silverberg JI, Lai JS, Vakharia PP, et al. Measurement properties of the Patient-Reported Outcomes Measurement Information System Itch Questionnaire item banks in adults with atopic dermatitis. J Am Acad Dermatol. 2020;82(5):1174-1180.
  • Silverberg JI, Lei D, Yousaf M, et al. Comparison of Patient-Oriented Eczema Measure and Patient-Oriented Scoring Atopic Dermatitis vs Eczema Area and Severity Index and other measures of atopic dermatitis: A validation study. Ann Allergy Asthma Immunol. 2020;125(1):78-83.
  • Silverberg JI, Lei D, Yousaf M, et al. Measurement properties of the Rajka-Langeland severity score in children and adults with atopic dermatitis. Br J Dermatol. 2021;184(1):87-95.
  • Silverberg JI, Margolis DJ, Boguniewicz M, et al. Validation of five patient-reported outcomes for atopic dermatitis severity in adults. Br J Dermatol. 2020;182(1):104-111.
  • Silverberg JI. Validity and reliability of a novel numeric rating scale to measure skin-pain in adults with atopic dermatitis. Arch Dermatol Res. 2021 Feb 6.
  • Simpson E, Bissonnette R, Eichenfield LF, et al. The Validated Investigator Global Assessment for Atopic Dermatitis (vIGA-AD): The development and reliability testing of a novel clinical outcome measurement instrument for the severity of atopic dermatitis. J Am Acad Dermatol. 2020;83(3):839-846.
  • Spuls PI, Gerbens LA, Simpson E, et al. POEM a core instrument to measure symptoms in clinical trials: a HOME statement. Br J Dermatol. 2017;176:979-984.
  • Stalder JF, Barbarot S, Wollenberg A, et al; PO-SCORAD Investigators Group. Patient-Oriented SCORAD (PO-SCORAD): a new self-assessment scale in atopic dermatitis validated in Europe. Allergy. 2011;66(8):1114-1121.
  • Thijs J, Krastev T, Weidinger S, et al. Biomarkers for atopic dermatitis: a systematic review and meta-analysis. Curr Opin Allergy Clin Immunol. 2015;15(5):453-460.
  • Udkoff J, Silverberg JI. Validation of scratching severity as an objective assessment for itch. J Invest Dermatol. 2018;138(5):1062-1068.
  • Vakharia PP, Chopra R, Sacotte R, et al. Burden of skin pain in atopic dermatitis. Ann Allergy Asthma Immunol. 2017;119(6):548-552.e3.
  • Vakharia PP, Chopra R, Sacotte R, et al. Validation of patient-reported global severity of atopic dermatitis in adults. Allergy. 2018;73(2):451-458.
  • Vakharia PP, Chopra R, Silverberg JI. Systematic review of diagnostic criteria used in atopic dermatitis randomized controlled trials. Am J Clin Dermatol. 2018;19(1):15-22.
  • Vakharia PP, Sacotte R, Patel N, et al. Severity strata for five patient-reported outcomes in adults with atopic dermatitis. Br J Dermatol. 2018;178(4):924-930.
  • van Geel MJ, Maatkamp M, Oostveen AM, et al. Comparison of the Dermatology Life Quality Index and the Children’s Dermatology Life Quality Index in assessment of quality of life in patients with psoriasis aged 16-17 years. Br J Dermatol. 2016;174:152-157.
  • Van Leent EJ, Graber M, Thurston M, et al. Effectiveness of the ascomycin macrolactam SDZ ASM 981 in the topical treatment of atopic dermatitis. Arch Dermatol. 1998;134:805-809.
  • Verwimp JJ, Bindels JG, Barents M, et al. Symptomatology and growth in infants with cow’s milk protein intolerance using two different whey-protein hydrolysate based formulas in a Primary Health Care setting. Eur J Clin Nutr. 1995;49(suppl 1):S39-S48.
  • von Baeyer CL, Spagrud LJ, McCormick JC, et al. Three new datasets supporting use of the Numerical Rating Scale (NRS-11) for children’s self-reports of pain intensity. Pain. 2009;143:223-227.
  • Vourc’h-Jourdain M, Barbarot S, Taieb A, et al. Patient-oriented SCORAD: a self-assessment score in atopic dermatitis. A preliminary feasibility study. Dermatology. 2009;218(3):246-251.
  • Ware J Jr, Kosinski M, Keller SD. A 12-Item Short-Form Health Survey: construction of scales and preliminary tests of reliability and validity. Med Care. 1996;34(3):220-233.
  • Ware JE, Gandek B. SF-36 Health Survey: Manual and Interpretation Guide. Lincoln (RI): Quality Metric. 2005.
  • Williams HC, Burney PG, Hay RJ, et al. The U.K. Working Party’s Diagnostic Criteria for Atopic Dermatitis. I. Derivation of a minimum set of discriminators for atopic dermatitis. Br J Dermatol. 1994;131(3):383-396.
  • Wolkerstorfer A, de Waard van der Spek FB, Glazenburg EJ, et al. Scoring the severity of atopic dermatitis: three item severity score as a rough system for daily practice and as a pre-screening tool for studies. Acta Dermato-venereologica. 1999;79:356-359.
  • Zhao CY, Tran AQ, Lazo-Dizon JP, et al. A pilot comparison study of four clinician-rated atopic dermatitis severity scales. Br J Dermatol. 2015;173:488-497.