
Advances in Neurodevelopmental Disorders 
https://doi.org/10.1007/s41252-025-00452-2

REVIEW

A Scoping Review of Attention‑Deficit/Hyperactivity Disorder 
Assessment and Diagnosis: Tools, Practices, and Sex Bias

Sasha L. Crocker1 · Anja Roemer2 · Sarah Strohmaier3  · Grace Y. Wang4 · Oleg N. Medvedev1

Accepted: 2 June 2025 
© The Author(s) 2025

Abstract
Objectives Accurately diagnosing attention-deficit/hyperactivity disorder (ADHD) is challenging due to the overlap of 
symptoms with other mental health conditions. This scoping review evaluated the reliability and validity of prevalent 
diagnostic scales and investigated potential obstacles to ADHD assessment and diagnosis, including potential sex bias.
Method Following the PRISMA-ScR guidelines, 11 widely used diagnostic scales were identified and included. All scales 
were evaluated based on their psychometric quality and alignment with DSM-5 diagnostic criteria for ADHD.
Results The Attention Deficit Disorders Evaluation Scale emerged as the most reliable among the 11 scales, with the 
Symptom Checklist-4 ranking as the least reliable. No single assessment tool was adequate for ADHD diagnosis; additional 
testing was required for accurate conclusions. The literature revealed sex and age biases in some of the assessments. It was 
discovered that girls were diagnosed with ADHD less often than boys, yet their likelihood of misdiagnosis was notably lower.
Conclusions This review emphasizes the necessity of comprehensive, multi-method assessment approaches for accurate 
ADHD diagnosis, as no single tool demonstrated sufficient diagnostic precision. Effective clinical assessment design must 
incorporate strong psychometric measures, address sex-based diagnostic disparities, and emphasize the importance of evalu-
ating behavioural changes over time and their functional impact across settings.

Keywords ADHD · Assessment · Diagnosis · Sex differences · Reliability · Validity

Attention-deficit/hyperactivity disorder (ADHD) is a neu-
rodevelopmental condition involving attention and organi-
zational difficulties, increased impulsivity, and hyperactivity 
(American Psychiatric Association, 2013). Although ADHD 
can be diagnosed at any age, the disorder is typically 
identified in childhood and can negatively affect a person’s 
life into adulthood. Thus, early diagnosis and intervention 
for children are essential, as they can positively influence the 
child’s life trajectory (McGoey et al., 2002).

In the absence of diagnostically specific biomarkers, current 
diagnostic criteria focus primarily on behavioural symptoms 
(Feldman & Reiff, 2014). The Diagnostic and Statistical Manual 
of Mental Disorders, 5th Edition (DSM-5; American Psychiatric 
Association, 2013) provides an extensive list of these symptoms, 
including a persistent pattern of inattention and/or 
hyperactivity-impulsivity that interferes with functioning or 
development; fidgeting, tapping, excessive talking, and 
struggling to stay seated; and making careless mistakes with 
work, failing to follow instructions, being easily distracted, 
and struggling with self-organization 
(American Psychiatric Association, 2013, p. 59). However, 
many of these symptoms could be characteristics of typical 
development in children and adolescents. Thus, only symp-
toms that are severe, persistent, and out of proportion to 
expectations for the child’s age or developmental level, and 
without appropriate alternative explanations count for the 
diagnosis of ADHD (Feldman & Reiff, 2014). Furthermore, 
the criteria do not differ for girls and boys, which 
implicitly assumes limited sex bias.

 * Oleg N. Medvedev 
 oleg.medvedev@waikato.ac.nz

1 School of Psychological and Social Sciences, University 
of Waikato, Hamilton, New Zealand

2 School of Psychology, Massey University, Palmerston North, 
New Zealand

3 Department of Psychology, Institute for Health and Sport, 
Victoria University Melbourne, Melbourne, Australia

4 School of Psychology and Wellbeing, University of Southern 
Queensland, Ipswich, Australia

http://orcid.org/0000-0002-2569-8447



At present, there are indications of both underdiagnosis 
and overdiagnosis of ADHD in children (Hamed et al., 
2015; Kazda et al., 2019; Paris et al., 2015). The causes 
of overdiagnosis may include growing awareness of mental 
disorders and associated reduction in stigmatization, changes 
in diagnostic thresholds, poor clinical judgement, and adver-
tising by the pharmaceutical industry, while the causes of 
underdiagnosis could be related to the attitude, knowledge, 
and partnerships between schools, teachers, and children 
and diagnostic complexity caused by other comorbid psy-
chiatric disorders (Quinn & Madhoo, 2014). Furthermore, 
a significant sex disparity in ADHD misdiagnosis was also 
found. Bruchmüller et al. (2012) examined sex differences 
in ADHD misdiagnoses and found that 157 out of 231 boys 
had been misdiagnosed compared to only 73 out of 231 girls.

Accurate diagnosis is vital, as misdiagnosis can result 
in inappropriate medication or treatment, with adverse long-term 
consequences for mental health and education (Nussbaum, 
2012). Ford-Jones (2015) argued that ADHD misdiagnoses 
can negatively impact children’s home life and education, 
potentially leading to fewer employment opportunities and 
a reduced social life in adulthood. In contrast, a diagnosis 
has been shown to help not only the child but also the people 
in their lives, such as parents, siblings, teachers, and healthcare 
professionals, to better understand the child’s difficulties and 
how best to help them (Hamed et al., 2015). ADHD is generally 
diagnosed using assessment tools; however, to date no comparison 
of different ADHD assessment tools has been made to determine 
whether they are similarly useful or differ significantly for 
diagnostic purposes. Specifically, there is a lack of consoli-
dated evidence assessing how these tools perform in terms of 
psychometric robustness, practical application, and potential 
biases. This scoping review aimed to evaluate the reliabil-
ity and validity of prevalent ADHD diagnostic scales while 
investigating potential obstacles to accurate assessment and 
diagnosis, including sex bias.

Method

This scoping review followed the PRISMA-ScR guidelines 
(Tricco et al., 2018) to evaluate established ADHD diagnos-
tic tools with proven clinical and research utility. Our aim 
was to map diagnostic instruments for ADHD in children 
and adolescents, examining their psychometric properties, 
limitations, and applicability across diverse populations 
(Peters et al., 2015).

Search Strategy

A systematic search of PsycINFO, MEDLINE, ERIC, 
and Web of Science was conducted between January and 
March 2024, covering literature published from 1980 to 
2024. Search terms included combinations of (“ADHD” 
OR “Attention deficit hyperactivity disorder” OR “ADD”) 
AND (“assessment” OR “diagnosis” OR “evaluation” OR 
“screening”) AND (“tools” OR “scales” OR “measures” OR 
“rating scales” OR “questionnaires”).

Study Selection 

Two reviewers (first and last authors) independently screened 
titles and abstracts against predetermined criteria. Inclusion 
criteria for scale selection consisted of (1) alignment with 
DSM-5 diagnostic criteria for ADHD; (2) assessment of 
ADHD symptoms across attention-deficit and hyperactivity-
impulsivity domains; (3) publication in peer-reviewed jour-
nals; (4) demonstrated psychometric properties including 
Cronbach’s alpha coefficients ≥ 0.80, and validation studies 
showing strong convergent validity; (5) documented usage 
in at least five peer-reviewed studies within the past dec-
ade; and (6) evidence of clinical implementation in practice 
settings. Measures were limited to English-language tools 
for assessing ADHD in individuals under 18. Emerging or 
highly specialized tools were excluded to maintain focus on 
established measures with broad applicability.

Data Extraction and Synthesis

Initial database searches extracted relevant articles describ-
ing and evaluating ADHD assessment tools. The selection 
process involved identifying key diagnostic scales that met 
all criteria, ensuring our review focused on tools with docu-
mented reliability, validity, and clinical utility. Data regard-
ing psychometric properties, diagnostic accuracy, and poten-
tial biases were extracted and synthesized for each included 
measure.

Results

After screening and review according to our inclusion cri-
teria, we identified a set of 11 ADHD diagnostic scales 
that align with DSM criteria and offer comprehensive data 
on ADHD symptoms, reflecting a targeted focus on tools 
with proven usage in both research and practice and omit-
ting measures with limited validation or niche applications. 
Internal consistency as assessed through Cronbach’s alpha 
as well as test–retest reliability coefficients were extracted 
to evaluate the scales’ reliability. Identified scales alongside 
their characteristics and psychometric properties are pre-
sented in Table 1.
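
As an illustration of the two reliability indices extracted here, the following minimal Python sketch computes Cronbach's alpha, using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores), and a test–retest Pearson correlation. The function names and the small ratings matrix are hypothetical and are not data from any of the reviewed scales.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score columns.

    items: list of k lists, each holding one item's scores
    for the same respondents, in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score per respondent (sum across items).
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))


def pearson_r(x, y):
    """Pearson correlation, used here as a test-retest coefficient."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical data: 3 items rated 0-3 for 5 children.
ratings = [
    [0, 1, 2, 3, 3],
    [0, 1, 2, 2, 3],
    [1, 1, 2, 3, 3],
]
alpha = cronbach_alpha(ratings)

# Hypothetical total scores for the same children at two time points.
time1 = [1, 3, 6, 8, 9]
time2 = [2, 3, 5, 8, 9]
retest = pearson_r(time1, time2)

print(f"alpha = {alpha:.2f}, test-retest r = {retest:.2f}")
```

Published coefficients are usually reported as ranges across subscales or age groups, which is why most entries in Table 1 are intervals rather than single values.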



Characteristics of Commonly Used ADHD Rating 
Scales

Most rating scales require input from both parents and 
teachers, allowing for behaviour assessment across home 
and school settings (Narad et al., 2015). For example, the 
Attention Deficit Disorders Evaluation Scale (ADDES; 
Adesman, 1991) collects consistent reports from both par-
ents and teachers to evaluate a child’s behaviour relative 
to ADHD diagnostic criteria, demonstrating particularly 
strong reliability (α = 0.96–0.99, test–retest r = 0.88–0.97). 
This dual-reporter design is beneficial in identifying 
ADHD symptoms that manifest across different environ-
ments, enhancing diagnostic validity. Each scale aligns 
with DSM-5 criteria, translating behaviours from the inat-
tentive and hyperactive-impulsive subtypes into test ques-
tions, thereby aiding psychologists in assessing ADHD 
likelihood based on DSM-5 standards. For a diagnosis, a 
score above the 93rd percentile cut-off typically predicts 
the inattentive subtype, while scores above the 90th percentile 
indicate the hyperactive-impulsive subtype. In 
research contexts, a more stringent 98th percentile cut-off 
is sometimes applied to ensure specificity.

Among the included scales, most met the acceptable 
cut-off of 0.70 for test–retest reliability, with notable varia-
tions. The ADDES, ADHD-IV, and VADRS demonstrated 
the strongest psychometric properties, with consistent reli-
ability across settings. Other scales showed more variable 
results. The Conners’ Rating Scale Revised (CRS-R; Conners, 
1997) displayed a wide range of test–retest coefficients 
(0.47 to 0.92), which suggests variability across settings 
or populations and may limit the consistency of repeated assessments. 
Some scales, such as the Swanson, Nolan, and Pelham 
Rating Scale (SNAP-IV; Swanson et al., 2012) and the 
ADD-H Comprehensive Teacher’s Rating Scale (ACTeRS; 
Ullmann et al., 1984), did not report temporal stability, 
which limits the ability to confirm their reliability in 
repeated applications.

Table 1  Summary of ADHD diagnostic scales: structure, reliability, validity, and scholarly use

| Scale | Reference | Items (subscales) | Cronbach’s alpha (α) | Test–retest reliability (r) | Google Scholar citations |
|---|---|---|---|---|---|
| ADHD-IV | DuPaul et al. (1997); DuPaul et al. (1998); McGoey et al. (2007) | P = 18 (2); T = 18 (2); T = 18 (2) | 0.86–0.92; 0.88–0.96; 0.78–0.90 | 0.78–0.86; 0.88–0.90 | 437; 486; 148 (1071) |
| VADRS | Wolraich et al. (2003) | P = 55 (8); T = 43 (3) | 0.95; 0.90 | 0.80; 0.80 | 651 |
| CRS3 | Conners (2008) | Short P = 45; Short T = 41; Short SR = 41; Long P = 99; Long SR = 99 | 0.77–0.97 | 0.71–0.98 | 207 |
| ACTeRS | Ullmann et al. (1984) | P = 25 (4); T = 24 (4) | 0.78–0.96; 0.92–0.97 | Not observed; Not observed | 195 |
| CRS; CRS-R | Conners (1989); Conners (1997); Goyette et al. (1978) | Short P = 27 (7); Short T = 28 (6); Long P = 80 (7); Long T = 59 (6) | 0.72–0.94; 0.77–0.95; 0.87–0.94; 0.90–0.96 | 0.47–0.85; 0.47–0.92; 0.47–0.85; 0.47–0.92 | 186 |
| SC-4 | Gadow and Sprafkin (1986) | P = 50 (4); T = 50 (4) | 0.93–0.95; 0.92–0.95 | 0.75–0.82; 0.70–0.89 | 83 |
| TTEF | Huang (2009) | 3 | 0.86 | 0.85–0.94 | 2 |
| BASC-M | Kamphaus and Reynolds (1998) | P = 46 (4); T = 47 (4) | 0.57–0.84; 0.77–0.93 | 0.60–0.90; 0.72–0.93 | 36 |
| TOVA | Leark et al. (2004) | Time based | Not observed | 0.70 | 39 |
| ADDES | McCarney (1994) | P = 46 (2); T = 60 (2) | 0.96–0.98; 0.98–0.99 | 0.88–0.91; 0.88–0.97 | 24 |
| SNAP-IV | Swanson (1981) | P = 18; T = 18 | 0.94; 0.97 | Not observed; Not observed | 39 |

T teacher; P parent; SR self-report; Short short form; Long long form; ADHD-IV ADHD Rating Scale-IV; BASC-M BASC Monitor for ADHD; 
CRS Conners’ Rating Scale; CRS3 Conners’ Rating Scale 3rd Edition; CRS-R Conners’ Rating Scale Revised; SC-4 ADHD Symptom Checklist-4; 
ADDES Attention Deficit Disorders Evaluation Scale; ACTeRS ADD-H Comprehensive Teacher’s Rating Scale; TOVA Test of Variables 
of Attention; TTEF Target Tests of Executive Functioning; SNAP-IV Swanson, Nolan and Pelham Rating Scale; VADRS Vanderbilt ADHD Rating Scale.



Internal consistency, as measured by Cronbach’s alpha, 
was generally satisfactory across scales, although two scales 
raised concerns. The BASC Monitor for ADHD reported 
alphas ranging from 0.57 to 0.84, indicating low to moder-
ate internal consistency. The Test of Variables of Attention 
(TOVA), which is based on participant response times, did 
not report internal consistency due to its single-item nature, 
limiting its reliability assessment.

Limitations of the Rating Scales

A critical limitation observed in some scales is low reli-
ability, which may compromise diagnostic accuracy. When 
test–retest reliability is inconsistent, as seen in the CRS-R 
and unreported in others like the SNAP-IV, the potential for 
differing results across repeated tests increases, potentially 
leading to misdiagnosis. Low or unreported internal consistency, as with 
the BASC-M and TOVA, similarly reduces confidence in the 
scale’s ability to measure ADHD symptoms consistently.

The ADHD Symptom Checklist (SC-4; Gadow & 
Sprafkin, 1997), using a 4-point Likert scale, was noted for 
its limited response range (0 = not at all to 3 = very much). 
This scale lacks nuanced options for symptom intensity, 
which may force respondents to choose responses that do 
not fully represent symptom severity. For example, a respondent 
rating a moderately severe symptom may struggle to choose 
between “pretty much” and “very much,” potentially leading 
to over- or under-reporting and limiting measurement 
precision. Additional response options could improve 
this scale’s reliability and validity.

In summary, while several scales display satisfactory reli-
ability and alignment with DSM-5 criteria, issues of reliabil-
ity and response range limit some scales’ diagnostic utility. 
Improvements, particularly in scales like the SC-4, CRS-
R, and BASC Monitor, could enhance diagnostic consist-
ency and accuracy. Future refinement of these scales, with 
attention to comprehensive validity measures and response 
options, will improve their applicability in both clinical and 
research settings.

Summary of Scales Evaluation

Among the 11 scales evaluated, the Attention Deficit Disor-
ders Evaluation Scale (ADDES) emerged as the most reliable 
diagnostic instrument, with exceptionally strong psychomet-
ric properties (internal consistency α = 0.96–0.99, test–retest 
reliability r = 0.88–0.97). This scale demonstrated consistent 
reliability across both parent and teacher versions, making 
it particularly valuable for comprehensive assessment. In 
contrast, the ADHD Symptom Checklist-4 (SC-4) ranked as 
the least reliable among the evaluated scales, primarily due 
to its limited response range and inability to assess symptom 
duration, onset age, or functional impairment as required by 
DSM-5 criteria. Importantly, our analysis revealed that no 
single assessment tool was adequate for a definitive ADHD 
diagnosis. Each scale presented specific limitations in scope, 
reliability across settings, or alignment with comprehensive 
diagnostic criteria. This finding emphasizes the necessity of 
employing multiple assessment methods and supplementary 
testing for accurate diagnostic conclusions.

Sex and Age Bias

The literature also revealed notable sex and age biases in 
ADHD assessment. Several studies documented that girls 
were diagnosed with ADHD significantly less often than 
boys, despite similar symptom presentations. This dispar-
ity appears particularly pronounced in classroom settings, 
where boys’ more externalized hyperactive symptoms 
received greater attention than girls’ predominantly inattentive 
presentations. Interestingly, while girls were underdiagnosed, 
their likelihood of misdiagnosis was notably lower 
than that of boys, with Bruchmüller et al. (2012) finding that 157 
out of 231 boys had been misdiagnosed compared to only 
73 out of 231 girls. Age-related biases were also evident, 
with assessment tools often failing to account for develop-
mental differences across childhood and adolescence, poten-
tially contributing to misdiagnosis, particularly in younger 
children.
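
To make the sex disparity concrete, the counts attributed above to Bruchmüller et al. (2012) can be converted to proportions. The short sketch below is plain arithmetic on those two counts; the variable names are illustrative only.

```python
# Counts as cited in the text from Bruchmüller et al. (2012).
boys_misdiagnosed, boys_total = 157, 231
girls_misdiagnosed, girls_total = 73, 231

boys_rate = boys_misdiagnosed / boys_total
girls_rate = girls_misdiagnosed / girls_total
rate_ratio = boys_rate / girls_rate

print(f"boys: {boys_rate:.1%}, girls: {girls_rate:.1%}, ratio: {rate_ratio:.2f}")
```

On these counts, roughly 68% of boys and 32% of girls were misdiagnosed, a ratio of about 2.15 to 1.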

Discussion

This section synthesizes the key findings of this 
scoping review by evaluating the psychometric proper-
ties, strengths, and limitations of commonly used ADHD 
assessment scales. Given the diverse approaches to ADHD 
diagnosis, this section is structured to first provide an over-
arching review of assessment reliability and validity, fol-
lowed by a detailed evaluation of individual scales. The 
order of discussion is based on the overall reliability of 
the scales as identified in the “Results” section, with the 
most robustly supported tools discussed first, followed by 
those with greater limitations or concerns regarding valid-
ity and bias. This ordering facilitates a progressive critique, 
moving from stronger measures to those requiring caution in 
interpretation. The discussion then addresses broader issues 
such as sex bias in ADHD diagnosis before concluding with 
recommendations for future research and practice.

Highly Reliable and Widely Used Scales

ADHD Rating Scale‑IV (ADHD‑IV)

The ADHD-IV is a comprehensive questionnaire that 
assesses children’s behaviour over the previous 6 months, 
using DSM-5 criteria for both inattentive and hyperactive-
impulsive subtypes. It effectively captures behavioural pat-
terns across both school and home settings through parallel 
parent and teacher versions, allowing for identification of 
context-specific behaviours. For diagnostic screening, scores 
above the 93rd percentile suggest inattentive subtype, while 
scores above the 90th percentile indicate hyperactive-impul-
sive subtype; research studies often require the 98th percen-
tile (Pappas, 2006). While the scale shows strong reliability 
in identifying potential ADHD cases, significant limitations 
include inadequate cultural adaptation and unclear socio-
economic representation in the normative sample. These 
validity concerns mean the ADHD-IV cannot stand alone 
for diagnosis but serves as an effective initial screening tool 
that must be supplemented with comprehensive psychologi-
cal evaluation (McGoey et al., 2007; Pappas, 2006).
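
The percentile rule described above can be sketched as a small screening function. This is a hypothetical illustration only: it assumes the percentile ranks for each subscale have already been read from the scale's normative tables, and the function and parameter names are not part of the ADHD-IV.

```python
def screen_adhd_iv(inattention_pct, hyperactivity_pct, research=False):
    """Return the subscales flagged by the percentile cut-offs cited in the text.

    Clinical screening: above the 93rd percentile (inattentive) and
    above the 90th percentile (hyperactive-impulsive).
    Research use: the stricter 98th percentile for both subscales
    (an assumption about how the research cut-off is applied).
    """
    inatt_cut = 98 if research else 93
    hyper_cut = 98 if research else 90
    flags = []
    if inattention_pct > inatt_cut:
        flags.append("inattentive")
    if hyperactivity_pct > hyper_cut:
        flags.append("hyperactive-impulsive")
    return flags


# Hypothetical child at the 95th and 91st percentiles.
clinical = screen_adhd_iv(95, 91)
strict = screen_adhd_iv(95, 91, research=True)
print(clinical, strict)
```

A flag indicates only that further evaluation is warranted, in keeping with the scale's role as a screening tool rather than a stand-alone diagnostic instrument.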

Vanderbilt ADHD Diagnostic Rating Scale (VADRS)

The VADRS (National Institute for Children’s Health Qual-
ity, 2002) employs separate parent (55 items) and teacher 
(43 items) versions to assess children aged 6–12 across 
multiple domains, including ADHD symptoms, academic 
performance, and relationships. Both scales use a 0–3 Likert 
scale for symptoms (Never to Very Often) and 1–5 for per-
formance ratings (Excellent to Problematic). The 6-month 
assessment period helps capture persistent behaviours rather 
than daily fluctuations. With strong psychometric properties 
(temporal reliability r = 0.80; internal consistency α = 0.95 
parent, α = 0.90 teacher), the VADRS remains valid despite 
using DSM-IV criteria, as DSM-5 made no significant 
changes to the symptom criteria (National Institute for Children’s Health Quality, 
2002; American Academy of Pediatrics, 2014). However, 
the scale’s complex scoring system requires meeting specific 
thresholds across different subscales—for example, scoring 
2–3 on at least 6 of 9 items for ADHD subtypes or 4 of 8 
items for oppositional defiant disorder. This structural com-
plexity may deter referrals for comprehensive assessment, as 
practitioners must navigate multiple scoring rules for accu-
rate interpretation.
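
The symptom-count rules quoted above (ratings of 2 or 3 on at least 6 of 9 items, or on 4 of 8 items for oppositional defiant disorder) can be sketched as follows. The sketch covers only these counting rules, not the full VADRS scoring system with its performance items, and the function name and example ratings are hypothetical.

```python
def meets_symptom_count(ratings, min_rating=2, min_items=6):
    """True if at least `min_items` ratings are `min_rating` or higher (0-3 scale)."""
    if any(r not in (0, 1, 2, 3) for r in ratings):
        raise ValueError("ratings must be integers from 0 to 3")
    positive = sum(1 for r in ratings if r >= min_rating)
    return positive >= min_items


# Hypothetical ratings for two nine-item ADHD subscales.
inattention = [3, 2, 2, 1, 3, 2, 0, 2, 1]    # six items rated 2 or 3
hyperactivity = [1, 2, 0, 1, 3, 2, 0, 1, 1]  # three items rated 2 or 3

# Hypothetical ratings for the eight oppositional defiant disorder items.
odd = [2, 0, 3, 1, 2, 2, 0, 1]               # four items rated 2 or 3

print(meets_symptom_count(inattention))       # 6 of 9 rule
print(meets_symptom_count(hyperactivity))     # 6 of 9 rule
print(meets_symptom_count(odd, min_items=4))  # 4 of 8 rule
```

Even in this reduced form, each subscale needs its own threshold, which illustrates why practitioners may find the complete scoring procedure burdensome.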

Attention Deficit Disorders Evaluation Scale (ADDES)

The ADDES (McCarney, 1994) offers separate parent 
(46 items) and teacher (60 items) versions for broad age 
ranges—teachers assess children 4–19 years while parents 
evaluate ages 3–19. Using a 0–4 rating scale (from “does 
not engage” to “several times an hour”), it captures both 
frequency and duration of behaviours. Raw scores convert 
to subscale standard scores and percentiles, closely aligning 
with DSM-5 ADHD criteria for both inattentive and hyper-
active-impulsive symptoms (Demaray et al., 2003). While 
higher scores suggest greater ADHD likelihood, the absence 
of specific cut-off scores creates diagnostic ambiguity. The 
computerized scoring system generates treatment recom-
mendations, but clinicians must interpret what constitutes 
a “high” score without clear thresholds, limiting the scale’s 
practical application in making diagnostic decisions.

Moderately Reliable Scales with Some Limitations

Conners’ Rating Scales–Revised (CRS‑R)

The CRS-R (Conners, 1997) evaluates problematic behav-
iours through separate parent and teacher reports for children 
aged 3–17, offering both long forms for diagnostic assess-
ment and short forms for screening or repeated use. The 
Teacher version includes six subscales assessing oppositional 
behaviour, cognitive problems/inattention, hyperactivity-impulsivity, 
social difficulties, anxiety/shyness, and perfectionism 
(Purpura & Lonigan, 2009). The Parent version 
contains fewer items but more subscales, adding psychoso-
matic symptoms while including home-specific behaviours 
absent from the teacher form, such as mealtime behaviour 
and social exclusion. These dual perspectives provide valu-
able context for psychologists to identify setting-specific 
behaviours, though final interpretation requires professional 
evaluation (Zelnik et al., 2012). Despite acceptable internal 
consistency above 0.70, test–retest reliability varies sub-
stantially from 0.47 to 0.92 (Table 1), indicating temporal 
instability.

For the CRS-R, cut-off scores vary by both age and sex, 
creating some ambiguity in interpretation. The Parent Rating 
Scale uses age-based cut-offs: a score of 50 for children aged 
3–9 years and 43 for those aged 10–17 years. The Teacher 
Rating Scale is more complex, with both age-based and sex-
based criteria. For age, the cut-offs are 48 for children aged 
3–9 years and 38 for those aged 10–17 years. However, the 
Teacher scale also specifies sex-based cut-offs that differ 
from these age standards: 38 for males and 47 for females. 
This dual system presents a challenge, as Deb et al. (2008) 
note, since it remains unclear whether clinicians should pri-
oritize age or sex criteria when these cut-offs lead to differ-
ent diagnostic conclusions.
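
The conflict between the age-based and sex-based Teacher cut-offs can be made concrete with a short sketch. It uses only the cut-off values quoted above; the function names are hypothetical, and treating a score at or above the cut-off as a positive screen is an assumption, since the text does not state the direction of comparison.

```python
def teacher_age_cutoff(age):
    """Age-based Teacher cut-off as quoted in the text."""
    if 3 <= age <= 9:
        return 48
    if 10 <= age <= 17:
        return 38
    raise ValueError("CRS-R covers ages 3-17")


def teacher_sex_cutoff(sex):
    """Sex-based Teacher cut-off as quoted in the text."""
    return {"male": 38, "female": 47}[sex]


def teacher_screen(score, age, sex):
    """Compare one score against both cut-offs and report whether they agree."""
    by_age = score >= teacher_age_cutoff(age)  # assumed direction: >= is positive
    by_sex = score >= teacher_sex_cutoff(sex)
    return {"by_age": by_age, "by_sex": by_sex, "conflict": by_age != by_sex}


# Hypothetical case: an 8-year-old boy with a Teacher score of 42.
result = teacher_screen(42, age=8, sex="male")
print(result)
```

In this hypothetical case the score falls below the age-based cut-off (48) but above the sex-based cut-off (38), so the two criteria point to different conclusions.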

The CRS-R effectively assesses ADHD symptoms 
through behavioural questions that align well with DSM-5 
criteria. Parents and teachers are ideal raters because they 
observe children over time, capturing patterns like losing 
personal items or having few friends—behaviours that 
cannot be evaluated in a single session. While the scale 
demonstrates good psychometric properties, its limitations 
include potential rater bias and, critically, confusing cut-off 
scores that differ not only between parent and teacher ver-
sions but also by age and sex. This ambiguity in scoring 
criteria complicates diagnosis, as clinicians must navigate 
conflicting cut-off standards without clear guidance on 
which to prioritize.

Conners’ 3 Rating Scale

The Conners’ 3 (Conners, 2008) rating scale updated norma-
tive data from previous versions and introduced a self-report 
measure. While parent and teacher forms assess children 
aged 6–18, the self-report is limited to ages 8–18 (Conners 
et al., 2011). This revision aligned with DSM-IV-TR criteria, 
though minimal changes were made from the CRS-R. 
Responses use a four-point Likert scale and are converted 
to standardized T-scores, which have a mean of 50 and 
standard deviation of 10. This standardization allows com-
parison across age groups and sex. T-scores of 65–69 are 
considered elevated, while scores ≥ 70 indicate clinically 
significant symptoms (Morales-Hidalgo et al., 2017). These 
standardized scores provide clearer interpretation than raw 
scores, as they show how a child’s symptoms compare to age 
and sex norms. All three versions assess daily functioning 
and behavioural patterns, with final T-scores determining 
whether further evaluation is warranted. The scales dem-
onstrate acceptable internal consistency and test–retest reli-
ability (Table 1).
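
The T-score conversion described above follows the standard linear transformation T = 50 + 10z, where z is the raw score standardized against the relevant age and sex norm group. In the sketch below, the normative mean and standard deviation are invented for illustration; real values come from the Conners 3 norm tables, and the function names are hypothetical.

```python
def t_score(raw, norm_mean, norm_sd):
    """Convert a raw score to a T-score (mean 50, SD 10) against a norm group."""
    z = (raw - norm_mean) / norm_sd
    return 50 + 10 * z


def interpret(t):
    """Bands cited in the text: 65-69 elevated, 70 or above clinically significant."""
    if t >= 70:
        return "clinically significant"
    if t >= 65:
        return "elevated"
    return "below elevated range"


# Hypothetical norm group: mean raw score 12, standard deviation 4.
t = t_score(raw=20, norm_mean=12, norm_sd=4)
print(t, interpret(t))
```

Because the same raw score maps to different T-scores in different norm groups, the interpretation bands stay fixed while the underlying raw thresholds vary by age and sex.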

Behaviour Assessment System for Children Monitor 
for ADHD (BASC‑M)

The BASC-M comprises four subscales: attention 
problems, hyperactivity, internalizing problems, and adap-
tive skills (Kamphaus & Reynolds, 1998). This test is used to 
screen children aged 4 to 18 years old (Angello et al., 2003). 
The items in this scale are based on the behaviours expected 
to be seen in a child with ADHD. The scale is based on 
DSM-IV, as this was the current DSM available at the time 
of the scale development. This scale has been updated to the 
BASC-3, which is aligned with DSM-5; however, there have 
been no significant changes made to the behaviours required 
for an ADHD diagnosis in the DSM-5, making this scale 
still relevant (American Psychiatric Association, 2013). The 
cut-off score for the BASC-M is 59.9, and any child that 
scores above this is suspected to have ADHD, while a score 
below this cut-off does not suggest ADHD (Ostrander et al., 
1998). The BASC-M demonstrates acceptable internal con-
sistency and test–retest reliability, though the wide range 
of coefficients (Table 1) suggests inconsistent reliability 
across subscales. Documentation for this 27-year-old meas-
ure is scarce, particularly regarding its scoring system, as 
research has shifted to the current BASC-3 version. This 
limited availability of information and the test’s age signifi-
cantly constrain its utility in contemporary clinical practice 
(Reynolds et al., 2011).

ADHD Symptom Checklist‑4 (SC‑4)

The SC-4 (Gadow & Sprafkin, 1997) was designed to assess 
ADHD along with Oppositional Defiant Disorder (ODD). There 
are 50 items total in this scale, and all items are relevant to 
the DSM-IV, as this was the current DSM at the time of the 
scale release, although, as mentioned above, there were no 
changes to the behaviour criteria between the DSM-IV and 
the DSM-5. The SC-4 uses two scoring methods, “symp-
tom count” and “symptom severity,” across four subscales: 
ADHD symptoms, ODD symptoms, Peer Conflict Scale, 
and Stimulant Side Effects Checklist. Symptoms marked as 
“often” or “very often” are clinically relevant (scored 1), 
while “never” or “sometimes” score 0. If the symptom count 
meets or exceeds DSM criteria, diagnosis may be warranted; 
if not, no diagnosis is made. The cut-off is binary (Yes/No) 
rather than numeric. However, the SC-4 has limitations: it 
does not assess symptom duration, age of onset, or func-
tional impairment as required by the DSM-5. While the 
scale shows excellent internal consistency and acceptable 
test–retest reliability (Table 1), clinicians must separately 
evaluate these additional DSM criteria, including whether 
symptoms significantly impact daily functioning.
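
The symptom count method described above can be sketched as a simple dichotomization. The response labels follow the text; the threshold of six symptoms per domain reflects the DSM-5 requirement for children and is supplied here as an assumption, since the text does not state the number applied by the SC-4. The function names and example responses are hypothetical.

```python
# Responses counted as clinically relevant under the symptom count method.
RELEVANT = {"often", "very often"}
VALID = {"never", "sometimes"} | RELEVANT


def symptom_count(responses):
    """Score each response 1 if 'often' or 'very often', else 0, and sum."""
    for r in responses:
        if r not in VALID:
            raise ValueError(f"unexpected response: {r!r}")
    return sum(1 for r in responses if r in RELEVANT)


def meets_count_criterion(responses, threshold=6):
    """Binary (Yes/No) outcome; threshold=6 is an assumed DSM-5 child threshold."""
    return symptom_count(responses) >= threshold


# Hypothetical responses to nine inattention items.
inattention = ["often", "never", "very often", "often", "sometimes",
               "often", "very often", "often", "never"]
count = symptom_count(inattention)
print(count, meets_count_criterion(inattention))
```

A positive result covers the symptom count only; duration, age of onset, and functional impairment still require separate clinical evaluation, as noted above.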

Swanson, Nolan, and Pelham (SNAP) Rating Scale

The SNAP rating scale (Swanson et al., 2012) consists of 
two subscales, inattentive and hyperactive-impulsive. It 
adheres to the DSM-IV-R and is aimed at children who 
are currently in school. The objective of this scale is to aid in 
identifying children with ADHD by noting their behaviours 
at school. Items are rated on a Likert scale ranging 
from 0 to 3 (0 = Not at all, 1 = Just a little, 2 = Pretty much, 
3 = Very much). Information such as age, ethnicity, school 
year, type of class, and class size is also collected. 
There is no single cut-off score; cut-offs depend 
on age and gender. For example, if ratings of 2 (Pretty much) 
are given eight times or more for a boy over 8 years old or a 
girl of any age, this strongly suggests that the child 
has ADHD. For boys under the age of 8 years, the cut-off 
is 2.5, meaning that their ratings must include multiple 
2s and 3s (Swanson et al., 2012). Both parents and 
teachers complete the SNAP-IV form, and the test considers 
both perspectives for the final scores. The questions are general 
enough for parents and teachers to apply to their own 
settings; however, the test does not specify an observation period. 
The evaluation is set in the present moment, so it would be 
hard for a teacher who does not know the student well to 
answer the questions accurately (Bussing et al., 2008). The 
scale has excellent internal consistency; however, test–retest 
reliabilities have not been reported (Table 1), so it is not 
clear whether assessments would be consistent over time. 
Furthermore, the evaluation does not state symptoms from 
the DSM-5, so it would be hard to accurately diagnose using 
this evaluation, but it can be used as a predictor of ADHD 
based on the child’s behaviour.

Scales with Significant Limitations or Bias Concerns

Test of Variables of Attention (TOVA)

The TOVA (Leark et al., 2004) assesses attention and 
impulse control through a computerized test, but its vali-
dation study raises methodological concerns. The study 
required 31 participants to complete the test four times: 
twice in one day with a 90-min interval and twice more a 
week later. This repeated administration design introduces 
potential confounds, as factors like fatigue, mood fluctua-
tions, or practice effects could influence performance across 
sessions. For example, a child’s changed emotional state 
between testing weeks might affect scores independently of 
actual attention abilities. Leark et al. (2004) did not address 
these limitations, which undermines confidence in the 
TOVA’s reliability for clinical assessment.

ADD‑H Comprehensive Teacher’s Rating Scale (ACTeRS)

The ACTeRS (Angello et al., 2003) assesses attention dis-
orders in children aged 5 to 12 using 24 items across four 
subscales: hyperactivity, attention, oppositional behaviour, 
and social skills. While both parents and teachers complete 
the scale, only teacher scores determine outcomes, with 
T-scores > 61 triggering further testing. The scale demon-
strates good internal consistency, though parent version reli-
ability varies more than teacher version (Table 1); test–retest 
reliability remains unreported. Several limitations under-
mine the ACTeRS’ validity: separate but unexplained sex-
specific scales introduce potential bias, a 5-year-old receives 
the same test as a 12-year-old despite vast developmental 
differences, and teacher ratings alone determine outcomes 
without parental input, risking subjective bias. The scale 
is not openly available, which prevents verification against 
DSM-5 criteria, further questioning its diagnostic utility 
(Carlini & Parks, 1993).

The Target Tests of Executive Functioning (TTEF)

The TTEF (Huang, 2009), part of the Pediatric Attention Disorders 
Diagnostic Screener, assesses working memory and execu-
tive functioning through three computer-based tasks: target 
recognition, target sequencing, and target tracking. The first 
task tests attention to detail and emotional modulation by 
showing five coloured squares that must be matched after 
disappearing in 1.5 s, repeated 153 times. Target sequencing 
evaluates distractibility and organization by requiring chil-
dren to remember the sequence of coloured circles matched 
with appearing squares. The final tracking task measures 
instruction recall and focus by having children replicate 
shape movements between rows. While the test lacks spe-
cific symptom metrics, it captures observable behaviours 
during administration that can be evaluated against DSM-5 
criteria. Strong internal consistency and test–retest reliabil-
ity make the TTEF a reliable assessment tool for both hyper-
active-impulsive and inattentive symptoms (Huang, 2009). 
However, as with the other scales reviewed, the TTEF alone 
does not suffice for ADHD diagnosis; multiple measures are 
necessary due to varying reliability and potential biases 
across instruments.

Difficulties of Screening for ADHD in the Context of Research Studies

ADHD screening faces several key challenges. First, the 
DSM-5 criteria encompass symptoms that often overlap with 
other diagnoses, such as depression (Newson et al., 2021). 
Second, diagnosis heavily relies on parent and teacher per-
ceptions, which can be inconsistent (Zelnik et al., 2012). 
Research indicates that objective, performance-based tests 
like TOVA, while promising, show low specificity—in one 
study identifying 78.4% false positives in a sample of 179 
children (Zelnik et al., 2012). Furthermore, temporal fac-
tors significantly impact diagnosis, as indicated by the study 
conducted by Morrow et al. (2012), which included 938,000 
Canadian children and showed that those born later in the 
academic year were more likely to receive ADHD diagnoses 
and medication. The study found lower IQ scores (averag-
ing 86) among 366 diagnosed children, though this finding 
may reflect sampling bias. Environmental and contextual 
factors also influence symptom presentation, with children 
potentially modifying behaviour in response to rewards, 
complicating consistent assessment (Morrow et al., 2012; 
Whitely, 2015).
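
For readers less familiar with the term, specificity is the share of children without the condition whom a test correctly clears, and the false positive rate is its complement. The sketch below shows the arithmetic with invented counts; the numbers and function names are assumptions for illustration and are not taken from Zelnik et al. (2012).

```python
def specificity(true_negatives: int, false_positives: int) -> float:
    """Share of children without the condition whom the test correctly clears."""
    without_condition = true_negatives + false_positives
    if without_condition == 0:
        raise ValueError("no children without the condition in the sample")
    return true_negatives / without_condition


def false_positive_rate(true_negatives: int, false_positives: int) -> float:
    """Complement of specificity: share wrongly flagged by the test."""
    return 1.0 - specificity(true_negatives, false_positives)


# Hypothetical counts: of 100 children without ADHD, the test clears 30
# and wrongly flags 70 (illustrative numbers only).
hypothetical_tn = 30
hypothetical_fp = 70

spec = specificity(hypothetical_tn, hypothetical_fp)
fpr = false_positive_rate(hypothetical_tn, hypothetical_fp)

print(f"specificity = {spec:.2f}")          # specificity = 0.30
print(f"false positive rate = {fpr:.2f}")   # false positive rate = 0.70
```

A test with low specificity flags many children who do not have the condition, which is why such a test is usually paired with other measures.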

Potential Sex Bias in the Assessment of ADHD

Sex-based diagnostic disparities represent a significant con-
cern in ADHD assessment. Research consistently shows 
higher diagnosis rates in boys, attributed largely to their 
more visible hyperactive symptoms compared to girls’ pre-
dominantly inattentive presentations (Berry & Brunet, 2021; 
Einarsson & Granström, 2002). Bruchmüller et al. (2012) 
examined this disparity, finding 157 out of 231 boys had 
been misdiagnosed, compared to only 73 out of 231 girls. 
Their study of 473 psychotherapists revealed that clinicians 
tended to diagnose boys more frequently even when present-
ing identical symptoms.
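
To make the comparison concrete, the proportions implied by the counts quoted above can be worked out directly. The sketch below uses only the figures as reported in this section (157 of 231 boys and 73 of 231 girls); the variable names are illustrative, and the ratio is a simple descriptive quantity rather than a statistic reported by Bruchmüller et al. (2012).

```python
from fractions import Fraction

# Counts as quoted in the text above.
boys_misdiagnosed = Fraction(157, 231)
girls_misdiagnosed = Fraction(73, 231)

# Ratio of the two proportions; the shared denominator cancels (157/73).
relative_rate = boys_misdiagnosed / girls_misdiagnosed

print(f"boys:  {float(boys_misdiagnosed):.1%}")   # boys:  68.0%
print(f"girls: {float(girls_misdiagnosed):.1%}")  # girls: 31.6%
print(f"ratio: {float(relative_rate):.2f}")       # ratio: 2.15
```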

Clinical bias compounds these issues. Girls often present 
with inattentive symptoms that may be mistaken for day-
dreaming or quiet behaviour (Hill, 2021; Steer & Bilbow, 
2021). As Ivens (2021) documented, seemingly compliant 
behaviours like quietly drawing or appearing to pay attention 
while not listening can mask ADHD symptoms. Quinn and 
Madhoo (2014) noted that depressed mothers were more 
likely to over-report problematic behaviours, potentially 
contributing to diagnostic inconsistencies.

The impact of non-binary gender identities on ADHD 
assessment remains understudied, with most research focus-
ing on binary gender differences. Recent literature suggests 
the need for more inclusive diagnostic approaches that 
consider diverse gender expressions and their influence on 
symptom presentation (Clay et al., 2024; Johansson et al., 
2022).

Limitations of ADHD Assessment and Suggestions for Improvements

The reviewed assessment tools reveal several significant 
limitations. Most notably, many scales lack sufficient valid-
ity and reliability data, which are essential for accurate 
diagnosis. All diagnostic tools should demonstrate strong 
psychometric properties to ensure acceptable diagnostic 
outcomes. Another key limitation is that research valida-
tion often uses only children with existing ADHD diagnoses, 
excluding undiagnosed children from the statistical analysis. 
This sampling bias potentially skews results and limits the 
generalizability to broader populations.

Among the evaluated scales, the ADDES appears most 
promising for reliable diagnosis due to its comprehensive 
structure and dual parent-teacher approach. Its response 
scale, ranging from “multiple times an hour” to “multiple 
times a month,” effectively captures behavioural frequency 
over time without requiring repeated administration. The 
scale collects demographic data (age and sex) that do not 
influence scoring, which raises questions about their 
necessity. Nevertheless, the ADDES demonstrates robust 
psychometric properties, including test–retest reliability 
(r = 0.88–0.97) and internal consistency (r = 0.96–0.99), 
supporting its reliability and validity for ADHD assessment.
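
As background on internal consistency, one commonly used index is Cronbach's alpha. The sketch below computes it for a small set of invented ratings. It is an assumption that alpha is a suitable stand-in here, since the review does not state which coefficient underlies the ADDES figures; the data and the function name `cronbach_alpha` are hypothetical.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of respondents, each a list of k item ratings."""
    if len(ratings) < 2:
        raise ValueError("need at least two respondents")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("each respondent needs the same k >= 2 item ratings")
    # Sum of the sample variances of each item (column).
    item_variance_sum = sum(variance(column) for column in zip(*ratings))
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in ratings])
    if total_variance == 0:
        raise ValueError("total scores must vary across respondents")
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical ratings: five respondents, three items (illustrative only).
hypothetical_ratings = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 5, 5],
    [1, 2, 1],
    [3, 3, 4],
]

alpha = cronbach_alpha(hypothetical_ratings)
print(f"alpha = {alpha:.2f}")
```

Values close to 1 indicate that the items vary together across respondents; they say nothing on their own about whether the scale measures the intended construct.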

To conclude, the presented scales need improvement in 
adhering more closely to the DSM-5 diagnostic criteria and in 
accounting for changes in behaviour over time. Additionally, 
none of these tests can serve as the sole diagnostic test for 
ADHD, which makes the process cumbersome for the client, their 
family, and the psychologist. Finally, given the sex bias in 
diagnosis, each of these tests should be reviewed in greater 
depth to understand how it might specifically contribute to 
that bias.

Limitations of the Current Review

This review has several methodological limitations that 
should be considered. First, our search strategy, while 
comprehensive, was limited to English-language measures, 
potentially missing relevant assessment tools from other lan-
guages and cultures. Second, comparing scales developed 
across different time periods presented challenges, particu-
larly regarding alignment with evolving DSM criteria. While 
we focused on current DSM-5 relevance, some older scales 
required careful interpretation of their diagnostic frame-
works. Additionally, access to comprehensive psychometric 
data varied across scales, with some having limited pub-
lished reliability or validity information. These constraints 
highlight the need for ongoing validation studies of ADHD 
assessment tools.

Directions for Future Research

Reviewing these limitations suggested several directions for 
further study. Future research should investigate why 
diagnostic bias between boys and girls is so pronounced in 
ADHD and which factors contribute to girls being 
underdiagnosed compared to boys, even when presenting the same 
symptoms. This is an important factor in diagnosis, as shown 
by Bruchmüller et al. (2012), who demonstrated that boys were 
more likely to be misdiagnosed than girls. Further studies 
could also aim to create a single test battery with 
reliability and validity rates of 95%. This would make it 
easier to diagnose ADHD and remove bias and human error. 
Finally, more resources should be dedicated to educating 
psychologists on sex bias in ADHD diagnosis and how to 
overcome it, including for people of non-binary genders. ADHD 
is a complex disorder to diagnose, and further research on 
biases and better testing can help to improve rates of 
accurate diagnosis.

Conclusion

This review highlights several critical aspects of ADHD 
assessment. While multiple diagnostic tools exist, their util-
ity varies significantly, with the ADDES emerging as the 
most reliable among current options. However, no single 
tool provides comprehensive diagnostic certainty, empha-
sizing the need for multiple assessment methods. Sex-based 
diagnostic disparities and screening difficulties remain sig-
nificant challenges, suggesting the need for more inclusive 
and objective assessment approaches. Future development 
of ADHD assessment tools should focus on addressing 
these limitations while maintaining strong psychometric 
properties and clinical utility. Additionally, greater atten-
tion to cultural sensitivity and gender diversity in diagnostic 
criteria could improve assessment accuracy across diverse 
populations.


Acknowledgements The authors acknowledge the founding editor 
of the Advances in Neurodevelopmental Disorders, Professor Nirbay 
Singh, who inspired the authors to conduct this scoping review.

Funding Open Access funding enabled and organized by CAUL and 
its Member Institutions.

Declarations 

Ethical Approval Not required.

Conflict of Interest None.

Open Access This article is licensed under a Creative Commons Attri-
bution 4.0 International License, which permits use, sharing, adapta-
tion, distribution and reproduction in any medium or format, as long 
as you give appropriate credit to the original author(s) and the source, 
provide a link to the Creative Commons licence, and indicate if changes 
were made. The images or other third party material in this article are 
included in the article’s Creative Commons licence, unless indicated 
otherwise in a credit line to the material. If material is not included in 
the article’s Creative Commons licence and your intended use is not 
permitted by statutory regulation or exceeds the permitted use, you will 
need to obtain permission directly from the copyright holder. To view a 
copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Adesman, A. R. (1991). The Attention Deficit Disorders Evaluation Scale. Journal of Developmental and Behavioral Pediatrics, 12(1), 65–66. https://doi.org/10.1097/00004703-199102000-00012

American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.). https://doi.org/10.1176/appi.books.9780890425596

Angello, L. M., Volpe, R. J., DiPerna, J. C., Gureasko-Moore, S. P., Gureasko-Moore, D. P., Nebrig, M. R., & Ota, K. (2003). Assessment of attention-deficit/hyperactivity disorder: An evaluation of six published rating scales. School Psychology Review, 32(2), 241–262. https://doi.org/10.1080/02796015.2003.12086196

Berry, C., & Brunet, J. (2021). Assessment of ADHD in girls: Unlocking hidden superpowers through understanding ADHD. In J. Steer & A. Bilbow (Eds.), Understanding ADHD in girls and women (pp. 35–72). Jessica Kingsley Publishers.

Bruchmüller, K., Margraf, J., & Schneider, S. (2012). Is ADHD diagnosed in accord with diagnostic criteria? Overdiagnosis and influence of client gender on diagnosis. Journal of Consulting and Clinical Psychology, 80(1), 128–138. https://doi.org/10.1037/a0026582

Bussing, R., Fernandez, M., Harwood, M., Hou, W., Garvan, C. W., Eyberg, S. M., & Swanson, J. M. (2008). Parent and teacher SNAP-IV ratings of attention deficit hyperactivity disorder symptoms: Psychometric properties and normative ratings from a school district sample. Assessment, 15(3), 317–328. https://doi.org/10.1177/1073191107313888

Clay, T., Callen, E. F., Alai, J., Goodman, D. W., Adler, L. A., & Faraone, S. V. (2024). Measuring quality care for adult ADHD patients: How much does gender and gender identity matter? Journal of Attention Disorders, 28(3), 364–376. https://doi.org/10.1177/10870547231218449

Conners, C. (1997). Conners’ rating scales–Revised. Multi-Health Systems.

Conners, C. K. (1989). Conners’ Rating Scales manual. Multi-Health Systems.

Conners, C. K. (2008). Conners third edition (Conners 3). Western Psychological Services.

Conners, C. K., Pitkanen, J., & Rzepa, S. R. (2011). Conners 3rd Edition (Conners 3; Conners 2008). In J. S. Kreutzer, J. DeLuca, & B. Caplan (Eds.), Encyclopedia of clinical neuropsychology. Springer. https://doi.org/10.1007/978-0-387-79948-3_1534

Carlini, R. J., & Parks, T. W. (1993). ADD-H comprehensive teacher’s rating scale. Journal of Psychoeducational Assessment, 11(1), 95–97. https://doi.org/10.1177/073428299301100114

Deb, S., Dhaliwal, A. J., & Roy, M. (2008). The usefulness of Conners’ Rating Scales-Revised in screening for attention deficit hyperactivity disorder in children with intellectual disabilities and borderline intelligence. Journal of Intellectual Disability Research, 52(11), 950–965. https://doi.org/10.1111/j.1365-2788.2007.01035.x

DuPaul, G. J., Anastopoulos, A. D., Power, T. J., Reid, R., Ikeda, M. J., & McGoey, K. E. (1998). Parent ratings of attention-deficit/hyperactivity disorder symptoms: Factor structure and normative data. Journal of Psychopathology and Behavioral Assessment, 20, 83–102.

DuPaul, G. J., Power, T. J., Anastopoulos, A. D., Reid, R., McGoey, K. E., & Ikeda, M. J. (1997). Teacher ratings of attention deficit hyperactivity disorder symptoms: Factor structure and normative data. Psychological Assessment, 9(4), 436.

Demaray, M. K., Elting, J., & Schaefer, K. (2003). Assessment of attention-deficit/hyperactivity disorder (ADHD): A comparative evaluation of five, commonly used, published rating scales. Psychology in the Schools, 40(4), 341–361. https://doi.org/10.1002/pits.10112

Einarsson, C., & Granström, K. (2002). Gender-biased interaction in the classroom: The influence of gender and age in the relationship between teacher and pupil. Scandinavian Journal of Educational Research, 46(2), 117–127. https://doi.org/10.1080/00313830220142155

Feldman, H. M., & Reiff, M. I. (2014). Attention deficit–hyperactivity disorder in children and adolescents. New England Journal of Medicine, 370(9), 838–846. https://doi.org/10.1056/NEJMcp1307215

Ford-Jones, P. C. (2015). Misdiagnosis of attention deficit hyperactivity disorder: “Normal behaviour” and relative maturity. Paediatrics & Child Health, 20(4), 200–202. https://doi.org/10.1093/pch/20.4.200

Gadow, K. D., & Sprafkin, J. (1986). Stony Brook child psychiatric checklist-3. State University of New York at Stony Brook.

Gadow, K. D., & Sprafkin, J. (1997). Child Symptom Inventory-4 norms manual. Checkmate Plus.

Goyette, C. H., Conners, C. K., & Ulrich, R. F. (1978). Normative data on revised Conners parent and teacher rating scales. Journal of Abnormal Child Psychology, 6, 221–236.

Hamed, A. M., Kauer, A. J., & Stevens, H. E. (2015). Why the diagnosis of attention deficit hyperactivity disorder matters. Frontiers in Psychiatry, 6, 168.

Hill, P. (2021). Treatment of ADHD in girls. In J. Steer & A. Bilbow (Eds.), Understanding ADHD in girls and women (pp. 73–106). Jessica Kingsley Publishers.

Huang, L. V. (2009). Test and product review: Pediatric attention disorders diagnostic screener. Journal of Attention Disorders, 13(3), 310–314. https://doi.org/10.1177/1087054709346681

Ivens, V. (2021). Coaching girls with ADHD. In J. Steer & A. Bilbow (Eds.), Understanding ADHD in girls and women (pp. 173–210). Jessica Kingsley Publishers.

Johansson, C., Kullgren, C., Bador, K., & Kerekes, N. (2022). Gender non-binary adolescents’ somatic and mental health throughout 2020. Frontiers in Psychology, 13, Article 993568. https://doi.org/10.3389/fpsyg.2022.993568

Kazda, L., Bell, K., Thomas, R., McGeechan, K., & Barratt, A. (2019). Evidence of potential overdiagnosis and overtreatment of attention deficit hyperactivity disorder (ADHD) in children and adolescents: Protocol for a scoping review. British Medical Journal Open, 9(11), Article e032327. https://doi.org/10.1136/bmjopen-2019-032327

Leark, R. A., Wallace, D. R., & Fitzgerald, R. (2004). Test-retest reliability and standard error of measurement for the Test of Variables of Attention (T.O.V.A.) with healthy school-age children. Assessment, 11(4), 285–289. https://doi.org/10.1177/1073191104269186

Kamphaus, R., & Reynolds, C. (1998). BASC monitor for ADHD. American Guidance Service.

McCarney, S. (1994). Attention deficit disorders intervention manual (2nd ed.). Hawthorne Educational Services.

McGoey, K. E., Eckert, T. L., & DuPaul, G. J. (2002). Early intervention for preschool-age children with ADHD: A literature review. Journal of Emotional and Behavioral Disorders, 10(1), 14–28. https://doi.org/10.1177/106342660201000103

McGoey, K. E., DuPaul, G. J., Haley, E., & Shelton, T. L. (2007). Parent and teacher ratings of attention-deficit/hyperactivity disorder in preschool: The ADHD rating scale-IV preschool version. Journal of Psychopathology and Behavioral Assessment, 29(4), 269–276.

National Institute for Children’s Health Quality. (2002). NICHQ Vanderbilt Assessment Scales. Retrieved April 19, 2025, from https://nichq.org/downloadable/nichq-vanderbilt-assessment-scales/

Morales-Hidalgo, P., Hernández-Martínez, C., Vera, M., Voltas, N., & Canals, J. (2017). Psychometric properties of the Conners-3 and Conners Early Childhood Indexes in a Spanish school population. International Journal of Clinical and Health Psychology, 17(1), 85–96. https://doi.org/10.1016/j.ijchp.2016.07.003

Morrow, R. L., Garland, E. J., Wright, J. M., Maclure, M., Taylor, S., & Dormuth, C. R. (2012). Influence of relative age on diagnosis and treatment of attention-deficit/hyperactivity disorder in children. Canadian Medical Association Journal, 184(7), 755–762. https://doi.org/10.1503/cmaj.111619

Narad, M. E., Garner, A. A., Peugh, J. L., Antonini, T. N., Kingery, K. M., Simon, J. O., & Epstein, J. N. (2015). Parent–teacher agreement on ADHD symptoms across development. Psychological Assessment, 27(1), 239–248. https://doi.org/10.1037/a0037864

Newson, J. J., Pastukh, V., & Thiagarajan, T. C. (2021). Poor separation of clinical symptom profiles by DSM-5 disorder criteria. Frontiers in Psychiatry, 12, Article 775762. https://doi.org/10.3389/fpsyt.2021.775762

Nussbaum, N. L. (2012). ADHD and female specific concerns: A review of the literature and clinical implications. Journal of Attention Disorders, 16(2), 87–100. https://doi.org/10.1177/1087054711416909

Ostrander, R., Weinfurt, K. P., Yarnold, P. R., & August, G. J. (1998). Diagnosing attention deficit disorders with the Behavioral Assessment System for Children and the Child Behavior Checklist: Test and construct validity analyses using optimal discriminant classification trees. Journal of Consulting and Clinical Psychology, 66(4), 660–672. https://doi.org/10.1037/0022-006X.66.4.660

Pappas, D. (2006). ADHD Rating Scale-IV: Checklists, norms, and clinical interpretation. Journal of Psychoeducational Assessment, 24(2), 172–178. https://doi.org/10.1177/0734282905285792

Paris, J., Bhat, V., & Thombs, B. (2015). Is adult attention-deficit hyperactivity disorder being overdiagnosed? The Canadian Journal of Psychiatry, 60(7), 324–328. https://doi.org/10.1177/070674371506000705

Peters, M. D. J., Godfrey, C. M., Khalil, H., McInerney, P., Parker, D., & Soares, C. B. (2015). Guidance for conducting non-systematic scoping reviews. International Journal of Evidence-Based Healthcare, 13(3), 141–146. https://doi.org/10.1097/XEB.0000000000000050

Purpura, D. J., & Lonigan, C. J. (2009). Conners’ Teacher Rating Scale for preschool children: A revised, brief, age-specific measure. Journal of Clinical Child and Adolescent Psychology, 38(2), 263–272. https://doi.org/10.1080/15374410802698446

Reynolds, C. R., Kamphaus, R. W., & Vannest, K. J. (2011). Behavior Assessment System for Children (BASC). In J. S. Kreutzer, J. DeLuca, & B. Caplan (Eds.), Encyclopedia of clinical neuropsychology. Springer. https://doi.org/10.1007/978-0-387-79948-3_1524

Quinn, P. O., & Madhoo, M. (2014). A review of attention-deficit/hyperactivity disorder in women and girls: Uncovering this hidden diagnosis. Primary Care Companion for CNS Disorders, 16(3), PCC.13r01596. https://doi.org/10.4088/PCC.13r01596

Steer, J., & Bilbow, A. (2021). Understanding ADHD in girls and women. Jessica Kingsley Publishers.

Swanson, J. M. (1981). The SNAP Rating Scale for the diagnosis of the attention deficit disorder. ERIC Document Reproduction Service. https://eric.ed.gov/?id=ED217047

Swanson, J. M., Schuck, S., Porter, M. M., Carlson, C., Hartman, C. A., Sergeant, J. A., Clevenger, W., Wasdell, M., McCleary, R., Lakes, K., & Wigal, T. (2012). Categorical and dimensional definitions and evaluations of symptoms of ADHD: History of the SNAP and the SWAN rating scales. The International Journal of Educational and Psychological Assessment, 10(1), 51–70.

Tricco, A. C., Lillie, E., Zarin, W., O’Brien, K. K., Colquhoun, H., Levac, D., Moher, D., Peters, M. D. J., Horsley, T., Weeks, L., Hempel, S., Akl, E. A., Chang, C., McGowan, J., Stewart, L., Hartling, L., Aldcroft, A., Wilson, M. G., Garritty, C., … Straus, S. E. (2018). PRISMA extension for scoping reviews (PRISMA-ScR): Checklist and explanation. Annals of Internal Medicine, 169(7), 467–473. https://doi.org/10.7326/M18-0850

Ullmann, R. K., Sleator, E. K., & Sprague, R. L. (1984). ADD-H Comprehensive Teacher Rating Scale (ACTeRS) [Database record]. APA PsycTests. https://doi.org/10.1037/t08014-000

Whitely, M. (2015). Attention deficit hyperactive disorder diagnosis continues to fail the reliability and validity tests. Australian and New Zealand Journal of Psychiatry, 49(6), 497–498. https://doi.org/10.1177/0004867415579921

Wolraich, M. L., Lambert, W., Doffing, M. A., Bickman, L., Simmons, T., & Worley, K. (2003). Psychometric properties of the Vanderbilt ADHD diagnostic parent rating scale in a referred population. Journal of Pediatric Psychology, 28(8), 559–568. https://doi.org/10.1093/jpepsy/jsg046

Zelnik, N., Bennett-Back, O., Miari, W., Goez, H. R., & Fattal-Valevski, A. (2012). Is the test of variables of attention reliable for the diagnosis of attention-deficit hyperactivity disorder (ADHD)? Journal of Child Neurology, 27(6), 703–707. https://doi.org/10.1177/0883073811423821

Publisher's Note Springer Nature remains neutral with regard to 
jurisdictional claims in published maps and institutional affiliations.
