Why Measurement Matters

A framework that cannot be measured is a philosophy. That is not a criticism of philosophy — philosophy is valuable. But this thesis makes a specific claim: that emotional wellness is a set of developable skills, that people operate at identifiable stages, and that targeted intervention can produce genuine change. If those claims are true, they must be testable. If they are testable, there must be an instrument.

This is the chapter where the framework meets the data.

Everything in the preceding chapters — the Emotional State Model, the six stages, the eight components of emotional wellness, the values/anti-values directional analysis — describes a theoretical architecture. Theory is necessary but insufficient. A therapist who understands the architecture but cannot assess where a particular client sits within it has an elegant map and no coordinates. A coach who grasps the developmental sequence but cannot identify which specific component is underdeveloped in a specific person has direction without a starting point. An organisation that adopts the framework as aspirational language without measuring its people has exchanged one motivational poster for another.

Growth requires a starting point. Without honest assessment, frameworks remain aspirational. The model is only useful if it can be operationalised.

The Generalized Resting Emotional Awareness Test — the GREAT — was designed to provide that operationalisation. It is a 40-item psychometric instrument that measures where a person sits on the emotional wellness spectrum, which specific components are developed and which are underdeveloped, and — critically — it provides a baseline against which change can be tracked. Take the GREAT before an intervention. Take it again after. If the framework is right and the intervention is effective, the numbers should move. If they do not move, either the framework is wrong, the intervention is ineffective, or both. That accountability is the point.

This chapter presents the instrument’s design rationale, the diagnostic trial that validated it, the statistical evidence for its reliability, what it measures and what it does not, its limitations, and the directions in which it needs to develop. I have tried to write it with the rigour it deserves while keeping the statistics accessible to a practitioner audience that may not have taken a research methods course since university. Every statistical concept is explained in plain language. Skip the explanations if you do not need them.


Instrument Design Rationale

The Theoretical Basis: Eight Components of Emotional Wellness

The GREAT was built on the eight-component model of emotional wellness described in Chapter 4. To recap briefly: the model defines emotional wellness as the ability to understand and manage our emotions and emotional state at will. This is not a single skill. It is eight skills, arranged in a developmental sequence from most fundamental to most advanced:

  1. Emotional Expression — the ability to externalise what you feel, to others and to yourself
  2. Reflective Analysis — examining emotional patterns after they occur to understand them
  3. Reflective Identification — naming emotions accurately in the moment they arise
  4. Situational Emotional Awareness — understanding how context, environment, and relationships affect your state
  5. Self-Control — managing emotional responses through conscious choice (not willpower-based suppression)
  6. Self-Empathy — compassion toward your own emotional experience
  7. Emotional Feedback — using emotions as information for decision-making, not obstacles to overcome
  8. Mood Management — deliberately shifting emotional state to match what the situation requires

The sequence matters. You cannot practise Reflective Analysis (Component 2) if you cannot first express emotions at all (Component 1). You cannot identify emotions in real time (Component 3) if you have not yet learned to examine them retrospectively (Component 2). You cannot manage your mood deliberately (Component 8) if you do not yet use emotions as feedback signals (Component 7). Each component builds on the preceding ones, forming a developmental ladder.

This architecture determined how items were written. The GREAT needed to sample all eight components, with items distributed across the full spectrum — not clustered at the bottom (which would create a ceiling effect for emotionally mature respondents) and not clustered at the top (which would create a floor effect for those in earlier developmental stages). The goal was an instrument that could differentiate across the entire range, from a person who cannot identify their own emotions to a person who can shift emotional state at will.

Item Construction

The initial item pool contained 50 items. Each item was written to capture a specific component, with multiple items per component to ensure reliable measurement. Items were phrased as self-descriptive statements — “I can recognise the way I felt previously,” “It’s difficult to identify my own emotions,” “I take time to reflect on how my mood affects others” — because the instrument measures self-reported emotional awareness, and self-report is the appropriate method for assessing subjective experience.

Several items were reverse-coded. A reverse-coded item is one where agreement indicates lower emotional wellness rather than higher. For example, “It’s difficult to identify my own emotions” (Item 1) and “I cannot decide on a way to improve my mood” (Item 9) are reverse-coded: a high score on these items indicates poor emotional awareness. Reverse coding serves two purposes. First, it disrupts acquiescence bias — the tendency to agree with statements regardless of content. A respondent who simply marks “4” on everything will produce contradictory data (scoring high on both “Expressing emotion is easy” and “It’s difficult to identify my own emotions”), which flags the response as unreliable. Second, reverse-coded items capture the absence of a skill, which is psychometrically useful: sometimes the clearest way to assess whether someone has a capacity is to ask directly whether they lack it.

The Response Scale

All items were rated on a 5-point Likert scale from 0 to 4:

  • 0 — Strongly Disagree
  • 1 — Disagree
  • 2 — Neutral
  • 3 — Agree
  • 4 — Strongly Agree

A 5-point scale was chosen over a 7-point or 10-point scale for practical and psychometric reasons. Practically, the instrument was designed for use with working adults in organisational settings, not research laboratories. The cognitive load of distinguishing between “slightly agree,” “agree,” and “strongly agree” on a 7-point scale adds precision in theory but noise in practice, particularly when the respondent is taking the assessment as one part of a larger emotional wellness programme. A 5-point scale provides sufficient granularity for component-level differentiation while remaining cognitively manageable.

Psychometrically, a 5-point Likert scale has been shown to produce reliable factor structures with sample sizes in the 100-200 range, which was our expected recruitment target. Wider scales require larger samples to produce stable factor solutions. The choice was pragmatic: the scale matched the sample we could realistically recruit.
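Because the scale runs from 0 to 4, reverse-coding an item is a simple subtraction from 4. The sketch below illustrates that scoring step and a basic check for a respondent who gives the same rating to every item. It is an illustration under stated assumptions, not the scoring code used in the trial: the function names are hypothetical, and only Items 1 and 9 are listed as reverse-coded because those are the two named in the text above.

```python
SCALE_MAX = 4  # ratings run from 0 (Strongly Disagree) to 4 (Strongly Agree)

# Only the two reverse-coded items named in the text; the full list is assumed
# to be longer and is not reproduced here.
REVERSE_ITEMS = {1, 9}


def reverse_score(raw):
    """Flip a rating so that a high value always means higher emotional wellness."""
    if not 0 <= raw <= SCALE_MAX:
        raise ValueError(f"rating {raw} is outside the 0-{SCALE_MAX} scale")
    return SCALE_MAX - raw


def total_score(responses, reverse_items=REVERSE_ITEMS):
    """Sum the ratings in `responses` (item number -> raw rating),
    flipping the reverse-coded items first."""
    return sum(
        reverse_score(rating) if item in reverse_items else rating
        for item, rating in responses.items()
    )


def is_straight_lined(responses):
    """True when every item received the same raw rating, a pattern worth
    reviewing because it contradicts itself once reverse-coded items are flipped."""
    return len(set(responses.values())) == 1
```

For example, a respondent who marks 4 on Items 1, 2, and 9 contributes 0 + 4 + 0 = 4 to the total rather than 12, and is flagged by `is_straight_lined`.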


Methodology: The Diagnostic Trial

Design

The GREAT was validated through a diagnostic trial conducted in 2018 under the auspices of Undelusional Technologies Pte. Ltd., with validation analysis conducted by James Lim C. H. (B.Sc., Life Science — Biomedical Sciences) and Julian A. G. Lim M. H. (M.A., Social Science — Sociology). The validation report was prepared for Undelusional Technologies and for Wendy Han, Assistant Development Partner at Enterprise Singapore, who oversaw the grant framework under which the trial was partially funded.

Participants

One hundred and twenty-three participants were recruited from metropolitan Singapore. The sample comprised mainly working adults with a sizable proportion of students. Of those who reported gender, 75 identified as women and 48 as men. The average age group fell within the 18-35 bracket (S.D. approximately 0.596 on the age-group measure). Participants included civil service members, entrepreneurs, and company trainers — a cross-section of Singapore’s professional population, though not a random sample of the general population (a limitation I will address below).

Procedure

The 50-item pilot version of the GREAT was hosted on an online survey platform (initially Google Survey, subsequently Typeform). Participants completed the assessment over a four-day data collection window, before attending a series of talks on emotional wellness that I delivered. This sequencing was intentional: participants completed the GREAT before any exposure to the framework, which removed the risk that the talks would prime their responses. What they reported reflected their resting emotional awareness (what they actually do, day to day, with their emotional experience) rather than what they had just been taught they should do.

The four-day window was chosen for practical reasons (coordinating across multiple organisations and schedules) but had a psychometric benefit: it reduced the likelihood that a shared transient mood state (a public holiday, a news event, a weather pattern) would uniformly inflate or deflate responses. Responses collected across multiple days are more likely to reflect stable patterns than responses collected in a single sitting.

Ethical Considerations

Participation was voluntary. The assessment was presented as part of a broader emotional wellness initiative, with no adverse consequences for non-participation. Data was anonymised for validation analysis — the researchers received response data without identifying information. Gender and age group were collected for demographic analysis but were not linked to individual identities.


Validation Results

The validation analysis employed standard psychometric procedures to answer three questions: Is the GREAT measuring something real? Is it measuring one thing or many things? Is it measuring reliably?

Sampling Adequacy: Is the Data Suitable for Factor Analysis?

Before conducting factor analysis, two preliminary tests were run to determine whether the data was suitable for that analysis.

Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy: 0.813

The KMO statistic indicates how much variance in the data might be caused by underlying factors, as opposed to variance that is unique to individual items or is simply noise. It ranges from 0 to 1. Values below 0.5 indicate that factor analysis is inappropriate — the items are not sufficiently inter-correlated to warrant searching for underlying factors. Values above 0.8 are classified as “meritorious” in Kaiser’s (1974) taxonomy. A KMO of 0.813 means that the correlations between GREAT items are strong enough that underlying factors almost certainly exist, and a factor analysis is a legitimate way to find them.

In plain terms: the items hang together. They are measuring something shared, not 50 unrelated things.
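For readers who want to see the mechanics, here is a minimal sketch of the overall KMO statistic in plain Python. It assumes the standard formula: the sum of squared correlations between different items, divided by that sum plus the sum of squared partial correlations. The partial correlations come from the inverse of the correlation matrix. The function names are illustrative, and this is not the software used in the validation analysis.

```python
from math import sqrt


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure for a correlation matrix."""
    n = len(corr)
    inv = invert(corr)
    sum_r2 = 0.0  # squared correlations between different items
    sum_q2 = 0.0  # squared partial correlations between different items
    for i in range(n):
        for j in range(n):
            if i != j:
                partial = -inv[i][j] / sqrt(inv[i][i] * inv[j][j])
                sum_r2 += corr[i][j] ** 2
                sum_q2 += partial ** 2
    return sum_r2 / (sum_r2 + sum_q2)


# Toy example: three items that all correlate 0.5 with each other.
toy = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
print(round(kmo(toy), 3))  # 0.692
```

The toy value is lower than the trial's 0.813 partly because three items leave little room for shared variance beyond each pair; the statistic rises as more items correlate with one another through common factors.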

Bartlett’s Test of Sphericity: Chi-Square = 2644.513, df = 780, p < .001

Bartlett’s test asks: is the correlation matrix significantly different from an identity matrix? An identity matrix is what you would get if every item correlated with itself perfectly but with no other item — that is, if nothing in the dataset was related to anything else. A significant Bartlett’s test (p < .001, as obtained) means the correlation matrix is not an identity matrix — items are correlated with each other, and those correlations are not due to chance. With a chi-square of 2644.513 on 780 degrees of freedom, the result is unambiguous: the items are measuring shared constructs.

In plain terms: the inter-item relationships in the GREAT are real, not random.
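The degrees of freedom in Bartlett's test depend only on the number of items, since they count the distinct item pairs. The sketch below shows that calculation alongside the usual form of the test statistic. The function names are illustrative, and the log-determinant argument is assumed to come from whatever statistics package produced the correlation matrix. One thing the arithmetic shows: 780 degrees of freedom corresponds to 40 items, whereas 50 items would give 1225. That suggests the reported test was run on the 40 retained items, though this is an inference from the formula rather than something stated in the validation report.

```python
def bartlett_df(n_items):
    """Degrees of freedom: the number of distinct item pairs, p(p - 1) / 2."""
    return n_items * (n_items - 1) // 2


def bartlett_chi_square(n_respondents, n_items, log_det_corr):
    """Bartlett's statistic: -((n - 1) - (2p + 5) / 6) * ln|R|,
    where ln|R| is the natural log of the correlation matrix determinant."""
    return -((n_respondents - 1) - (2 * n_items + 5) / 6) * log_det_corr


print(bartlett_df(40))  # 780
print(bartlett_df(50))  # 1225
```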

Factor Structure: What Is the GREAT Measuring?

A principal-components analysis with orthogonal (varimax) rotation was conducted on the 50-item dataset. Principal-components analysis is a statistical method that looks for clusters of items that co-vary, that is, items that tend to go up and down together across respondents. Each cluster is called a “factor” or “component.” Orthogonal rotation keeps the factors uncorrelated with one another, and the varimax criterion rotates the solution so that each item loads strongly on as few factors as possible, which makes the structure easier to read.

The analysis extracted eight factors with eigenvalues greater than 1.0. An eigenvalue represents the amount of total variance in the dataset explained by a given factor. An eigenvalue of 1.0 means the factor explains as much variance as a single item would on its own — the conventional threshold below which a factor is considered too weak to retain.

The eight factors and their eigenvalues:

Factor   Eigenvalue   Variance Explained (qualitative)
1        10.429       Dominant
2        4.014        Moderate
3        2.789        Moderate
4        1.937        Modest
5        1.877        Modest
6        1.745        Modest
7        1.316        Minimal
8        1.262        Minimal

The pattern here is instructive. Factor 1 is overwhelmingly dominant, with an eigenvalue (10.429) more than two and a half times that of Factor 2 (4.014) and more than eight times that of Factor 8 (1.262). Of the 50 initial items, 40 loaded on Factor 1 at the 0.5 threshold, meaning that a substantial share of the variance in each of those 40 items was explained by a single underlying construct.
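The retention rule and the dominance of Factor 1 can be checked directly from the reported eigenvalues. The short sketch below does that arithmetic; the function names are illustrative. The `variance_share` helper assumes a principal-components analysis on standardised items, where each eigenvalue divided by the number of items analysed gives that factor's proportion of total variance. It takes the item count as a parameter because the text does not report percentages.

```python
EIGENVALUES = [10.429, 4.014, 2.789, 1.937, 1.877, 1.745, 1.316, 1.262]


def kaiser_retained(eigenvalues, threshold=1.0):
    """Keep factors whose eigenvalue exceeds the conventional threshold of 1.0."""
    return [value for value in eigenvalues if value > threshold]


def dominance_ratio(eigenvalues, factor_number):
    """How many times larger Factor 1's eigenvalue is than the given factor's."""
    return eigenvalues[0] / eigenvalues[factor_number - 1]


def variance_share(eigenvalue, n_items):
    """Proportion of total variance for one factor, given the number of items analysed."""
    return eigenvalue / n_items


print(len(kaiser_retained(EIGENVALUES)))          # 8
print(round(dominance_ratio(EIGENVALUES, 2), 2))  # 2.6
print(round(dominance_ratio(EIGENVALUES, 8), 2))  # 8.26
```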

Using a loading threshold of 0.5 means that at least 25% of an item’s variance is shared with the factor. This is a moderately conservative threshold — liberal analyses use 0.3 (9% shared variance) and strict analyses use 0.7 (49% shared variance). The 0.5 threshold represents a reasonable balance between inclusivity and rigour: items retained at this level share enough variance with the factor to be meaningfully part of the construct, without the threshold being so lenient that noisy items are included.
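The link between a loading and shared variance is simply the square of the loading, which is where the 9%, 25%, and 49% figures come from. The sketch below shows that relationship and how a threshold is applied to a set of loadings. The item labels and loading values are hypothetical, invented for illustration only; they are not the GREAT's actual loadings.

```python
def shared_variance(loading):
    """Proportion of an item's variance shared with the factor (loading squared)."""
    return loading ** 2


def retain_items(loadings, threshold=0.5):
    """Return the items whose loading meets the threshold in absolute value.
    The absolute value is used so that an item loading negatively (for example,
    one left un-reversed) is judged by the strength of the relationship, not its sign."""
    return sorted(item for item, value in loadings.items() if abs(value) >= threshold)


# Hypothetical loadings for three made-up items.
example_loadings = {"item_a": 0.62, "item_b": 0.41, "item_c": -0.55}

for cutoff in (0.3, 0.5, 0.7):
    print(cutoff, round(shared_variance(cutoff), 2))

print(retain_items(example_loadings))  # ['item_a', 'item_c']
```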

Factors 2 through 8, while statistically present (eigenvalues above 1.0), did not produce item loadings that were conceptually distinct from Factor 1. That is, the items that loaded on Factors 2-8 were not measuring something different from the items that loaded on Factor 1 — they were measuring aspects or facets of the same underlying construct. This is consistent with the theoretical model: the eight components of emotional wellness are not eight independent traits. They are eight facets of a single capacity that develops in a specific sequence. You would expect them to be correlated, and you would expect a general factor to dominate.

This is sometimes called a “general factor” or “g-factor” structure. Intelligence research shows a similar pattern: while there are distinguishable facets of intelligence (verbal, spatial, processing speed), a single general factor dominates the variance. The GREAT shows the same structure for emotional wellness: while there are distinguishable components (Expression, Analysis, Identification, etc.), a single general factor — overall emotional wellness — dominates.

The 40 items that loaded on Factor 1 at 0.5 or above were retained for the final GREAT. The 10 items that did not meet this threshold were discarded. This is standard practice in scale development: you write more items than you need, test them all, and retain only those that contribute meaningfully to the construct you are measuring.

Internal Consistency: Is the GREAT Measuring Reliably?

Cronbach’s Alpha: 0.916 (overall), 0.919 (standardised)

Cronbach’s alpha is the most widely used measure of internal consistency — the degree to which items in a scale are measuring the same construct. It ranges from 0 to 1. The conventional benchmarks, drawn from George and Mallery (2003) and widely used in psychometric practice, are:

  • 0.9 and above: Excellent
  • 0.8 - 0.9: Good
  • 0.7 - 0.8: Acceptable
  • 0.6 - 0.7: Questionable
  • Below 0.6: Poor

The GREAT’s overall alpha of 0.916 falls in the “Excellent” range. This means that the 40 items are measuring the same underlying construct with very high consistency. If you removed any single item, the remaining 39 would still produce a highly similar total score. A respondent’s score on one half of the items is a strong predictor of their score on the other half. The instrument is internally coherent.

The standardised alpha of 0.919 adjusts for any differences in item variances and is virtually identical to the unstandardised value, confirming that the items have similar distributional properties — no single item is operating on a radically different scale than the others.
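For practitioners who want to see what sits behind these two numbers, here is a minimal sketch of both calculations in plain Python. It assumes the textbook formulas: the unstandardised alpha compares the sum of the item variances with the variance of the total score, and the standardised alpha is built from the mean inter-item correlation. The function names and the four-respondent toy dataset are illustrative, not data from the trial.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(rows):
    """Unstandardised alpha. `rows` holds one list of item scores per respondent,
    with reverse-coded items already flipped."""
    k = len(rows[0])
    columns = list(zip(*rows))
    item_variances = [variance(column) for column in columns]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def pearson(x, y):
    """Pearson correlation between two equally long sequences of scores."""
    mean_x, mean_y = mean(x), mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(spread_x * spread_y)


def standardised_alpha(rows):
    """Standardised alpha from the mean inter-item correlation."""
    columns = list(zip(*rows))
    k = len(columns)
    mean_r = mean(pearson(a, b) for a, b in combinations(columns, 2))
    return k * mean_r / (1 + (k - 1) * mean_r)


# Toy data: four respondents, two items with equal variances.
toy = [[1, 2], [2, 1], [3, 4], [4, 3]]
print(round(cronbach_alpha(toy), 2))      # 0.75
print(round(standardised_alpha(toy), 2))  # 0.75
```

In the toy data the two versions agree exactly because both items have the same variance; they drift apart only when item variances differ.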

To put 0.916 in context: the NEO-PI-R (the gold-standard Big Five instrument) reports domain-level alphas ranging from 0.86 to 0.92. The Beck Depression Inventory (BDI-II), one of the most widely used clinical instruments in psychology, reports alphas typically between 0.90 and 0.93. The GREAT’s internal consistency sits comfortably within the range of established, validated instruments.

Gender Invariance: Does the GREAT Work for Everyone?

A critical question for any psychometric instrument is whether it functions equivalently across demographic groups. An instrument that measures emotional wellness in women but something else in men (or vice versa) would be useless as a general diagnostic tool.

The validation analysis computed Cronbach’s alpha separately by gender:

Group    N (valid)   N (missing)   Cronbach’s Alpha
Male     47          1             0.929
Female   72          3             0.903

Both values are in the “Excellent” range. The male alpha (0.929) is slightly higher than the female alpha (0.903), but both are above 0.9, and the difference is not substantively meaningful. The GREAT is measuring the same construct with comparable reliability across genders. This matters because emotional expression and emotional awareness are culturally gendered in ways that could easily contaminate an instrument — men socialised to suppress emotion might respond to items about emotional expression differently than women socialised to express emotion freely. The fact that internal consistency is approximately equal across genders suggests that the GREAT is measuring capacity for emotional wellness, not cultural permission to display it.

In technical terms: comparable reliability across gender groups is consistent with measurement invariance, although it does not by itself demonstrate it. On the evidence reported here, a male respondent who scores 120 and a female respondent who scores 120 can be treated as broadly comparable. This is not a trivial finding. Many emotional and personality instruments show differential item functioning across genders (certain items behave differently for men and women), which introduces bias into cross-gender comparisons. Subgroup alphas cannot rule that pattern out on their own; confirming full invariance would require a formal item-level test, such as a multi-group factor analysis.


What the GREAT Measures and What It Does Not

What It Measures

The GREAT measures emotional wellness — specifically, a person’s resting-state capacity for emotional awareness, expression, analysis, identification, self-control, self-empathy, emotional feedback utilisation, and mood management. The term “resting state” is deliberate. The GREAT asks about a person’s typical, day-to-day emotional functioning — not how they perform under laboratory conditions, not how they respond to a specific scenario, and not how they believe they should function. It captures the baseline — the level of emotional capacity a person brings into any given situation before that situation’s demands either support or overwhelm their functioning.

This is analogous to the distinction between resting heart rate and maximum heart rate. Your resting heart rate tells a physician something meaningful about your cardiovascular fitness as a whole-system indicator. It does not tell them how you perform on a treadmill. The GREAT provides the emotional equivalent: a whole-system indicator of emotional fitness that has diagnostic and developmental value precisely because it measures the foundation rather than the performance.

The 40 items sample across all eight components. Some items are straightforward and face-valid — “I can recognise the way I felt previously” (Component 3: Reflective Identification), “I have people that I can talk with to discuss my moods” (contributing to Component 6: Self-Empathy, through social support for emotional processing). Others capture the absence of a component — “I do not know why I feel certain emotions” (reverse-coded, tapping Component 3), “It is not important to manage my mood” (reverse-coded, tapping Component 8: Mood Management). The combination of positively-framed and reverse-coded items provides a more complete picture than either would alone.

The total score provides an overall emotional wellness classification. The component-level analysis provides the developmental roadmap. A person who scores high overall but low on Self-Empathy (Component 6) needs a different intervention than a person who scores low overall because Emotional Expression (Component 1) is blocked. The GREAT identifies both the position and the specific deficits. This is its practical value: not just a label, but a map.
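The difference between the overall position and the component-level map can be sketched in a few lines. The example below averages item scores within each component and reports the weakest one. The item-to-component mapping and the scores are entirely hypothetical, since the actual assignment of the 40 items to components is not reproduced in this chapter, and the function names are illustrative.

```python
def component_means(scored_items, item_map):
    """Average item score per component.
    `scored_items` maps item number -> score after reverse-coding (0 to 4);
    `item_map` maps component name -> the item numbers assigned to it."""
    return {
        component: sum(scored_items[item] for item in items) / len(items)
        for component, items in item_map.items()
    }


def weakest_component(means):
    """Name of the component with the lowest average score."""
    return min(means, key=means.get)


# Hypothetical mapping and scores for two components, for illustration only.
example_map = {
    "Emotional Expression": [101, 102],
    "Self-Empathy": [103, 104],
}
example_scores = {101: 4, 102: 3, 103: 1, 104: 2}

profile = component_means(example_scores, example_map)
print(profile)                     # {'Emotional Expression': 3.5, 'Self-Empathy': 1.5}
print(weakest_component(profile))  # Self-Empathy
```

Averaging rather than summing keeps components comparable when they contain different numbers of items.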

What It Does Not Measure

The GREAT does not measure personality traits. It does not tell you whether someone is introverted or extraverted, agreeable or disagreeable, open or closed to experience. A person high in Neuroticism on the Big Five might score high or low on the GREAT, depending on whether their emotional reactivity has been met with developmental work. The GREAT measures the skills. The Big Five measures the patterns.

The GREAT does not measure emotional intelligence directly. Goleman’s model describes competences (self-awareness, self-regulation, empathy, social skills, motivation). The GREAT measures the foundation on which those competences are built. A person at the Muted stage of the Emotional State Model may have been trained in “self-regulation techniques” and may report using them, but the GREAT will reveal the brittleness of that regulation — the degree to which it depends on willpower rather than genuine emotional integration. The GREAT measures capacity. Emotional intelligence frameworks measure application.

The GREAT does not diagnose clinical conditions. A low score on the GREAT does not indicate depression, anxiety disorder, borderline personality disorder, or any clinical diagnosis. It indicates low emotional wellness — which may or may not co-occur with a clinical condition. A person with clinical depression may score low on the GREAT because their depression suppresses emotional engagement. A person without any clinical condition may also score low on the GREAT because they have simply never developed emotional awareness skills. The GREAT is a developmental instrument, not a clinical one. It identifies where someone sits and which skills need development. Clinical diagnosis is a different task, requiring different instruments and different professional competences.

The GREAT does not measure domain-specific emotional functioning. It measures the resting state — the baseline capacity a person carries across contexts. A person who has done extensive therapeutic work in the domain of intimate relationships but none in the domain of professional conflict will show a higher level of emotional functioning in relationships than at work. The GREAT captures the aggregate, not the domain-specific. This is both a strength (it provides a general indicator) and a limitation (it misses domain variance). I return to this in the section on future directions.


Limitations

Every psychometric instrument has limitations, and the intellectually honest move is to state them clearly rather than bury them in footnotes. The GREAT has six that I consider significant.

1. Sample Characteristics: Singapore, Metropolitan, Self-Selected

The 123-person sample was recruited from metropolitan Singapore. Singapore is a multicultural, highly educated, English-speaking city-state — broadly classifiable as WEIRD-adjacent (Western, Educated, Industrialised, Rich, Democratic), even though it is not geographically Western. The sample over-represents urban, educated, English-literate working adults and students. It does not include rural populations, populations with limited formal education, or populations from radically different cultural contexts (Sub-Saharan Africa, Indigenous communities, non-English-speaking populations).

This does not invalidate the results. It bounds them. The GREAT has been validated for use with metropolitan Singaporean working adults and students. Extending its use to other populations requires further validation — not because the underlying construct (emotional wellness) is culturally specific, but because the items may be culturally specific. The statement “I like to be alone when I am reflecting on my emotions” (Item 26) may function differently in a collectivist culture where solitude is unusual than in an individualist culture where it is normative. Until the instrument is validated cross-culturally, claims about its universality should be made cautiously.

2. Self-Report Bias

The GREAT is a self-report instrument. Respondents report their own emotional awareness, expression, and management. This creates two sources of potential bias.

First, social desirability bias: respondents may over-report emotional awareness because they perceive high emotional awareness as socially valued. The reverse-coded items partially mitigate this (a respondent who endorses both “Expressing emotion is easy” and “It’s difficult to identify my own emotions” is flagged for inconsistency), but self-report instruments are inherently vulnerable to impression management.

Second, the Dunning-Kruger problem: people with low emotional awareness may, by definition, lack the awareness to accurately report their low awareness. A person at the Distracted stage of the Emotional State Model may not know what they do not know — they may genuinely believe their emotional awareness is adequate because they have never experienced what developed emotional awareness feels like. This means the GREAT may underestimate the gap between low-scoring and high-scoring respondents. The low scorers may be lower than their scores indicate.

No self-report instrument fully solves this problem. Behavioural observation, informant reports, and physiological measures each capture different aspects of the construct but introduce their own biases and practical constraints. The GREAT accepts self-report as the most practical and scalable method while acknowledging its limitations.

3. Age Skew

The majority of participants fell in the 18-35 age bracket, with a standard deviation of approximately 0.596 on the age-group measure. This means the sample is skewed toward younger adults. The trial data cannot tell us whether the GREAT works as effectively for older adults (35+, 50+, 65+) as it does for younger ones.

There are theoretical reasons to expect that it would. Emotional wellness is a developmental construct — it should be measurable at any age. But there are also theoretical reasons to expect that item functioning might differ across age groups. Older adults have had more life experience with emotional regulation (Carstensen’s socioemotional selectivity theory suggests emotional regulation improves with age), which might shift the response distribution on certain items. Until the GREAT is validated with a more age-diverse sample, its applicability to populations significantly older than 35 remains an open question.

The age skew likely reflects a self-selection effect: the topic of emotional wellness may hold higher priority for younger adults navigating identity formation, early career challenges, and relationship development. Older adults may be less likely to self-select into an emotional wellness assessment unless it is presented in a context that resonates with their life stage (leadership development, retirement transition, grandparenting). Future validation studies should actively recruit across age groups rather than relying on self-selection.

4. Gender Skew and Missing Populations

The sample comprised approximately 61% women and 39% men, which puts the share of women roughly 10 percentage points above that in the general population. While the gender-specific alphas (0.929 male, 0.903 female) show comparable reliability across these two groups, the skew should be noted. More significantly, no participants identified as intersex, and the binary gender categorisation may not capture the experience of non-binary, genderfluid, or transgender individuals. The GREAT cannot be confirmed as valid for these populations without explicit inclusion in future validation samples.

This is not a small caveat. Gender identity intersects with emotional expression norms in complex ways, and a person whose gender identity is itself a source of emotional labour may respond to items about emotional awareness and expression differently than cisgender respondents. Until the instrument is validated with gender-diverse samples, this remains a genuine gap.

5. Self-Report vs. Observed Behaviour

The GREAT measures what people report about their emotional functioning. It does not measure what they do. A person may accurately report that they take time to reflect on how their mood affects others (Item 23) while an observer might note that their reflections rarely translate into changed behaviour. Self-reported awareness and enacted awareness are related but not identical constructs. The GREAT captures the former. Capturing the latter would require behavioural observation methods that are more expensive, less scalable, and introduce their own biases (observer effects, context dependence).

The practical implication is that the GREAT should be interpreted as a starting point for developmental conversation, not as a definitive assessment. A high score is not a certificate of emotional wellness. It is an indication that the person reports high emotional awareness — which may reflect genuine capacity, or may reflect sophisticated self-narrative without corresponding practice. The clinical or coaching conversation that follows the assessment is where the distinction becomes clear.

6. Resting State vs. Domain Specificity

As noted above, the GREAT measures the resting state — the general baseline. It does not differentiate between domains. A person might have high emotional wellness in their professional life (where they have done extensive development) and low emotional wellness in intimate relationships (where trauma patterns remain active). The GREAT would produce a blended score that obscures this clinically important variance.

This limitation is inherent to any general instrument. The solution is not to criticise the general measure but to develop domain-specific complements — an idea I address in the final section of this chapter.


Future Directions

The GREAT, as validated in 2018, represents the first generation of a measurement approach. It demonstrates that emotional wellness can be measured reliably and that the eight-component model produces a coherent factor structure. But first-generation instruments are, by definition, starting points. The following directions represent where the measurement programme needs to go.

Cross-Cultural Validation

The most pressing need is validation across culturally diverse populations. Singapore’s multiculturalism (Chinese, Malay, Indian, and expatriate communities) provides some cultural breadth, but it is not a substitute for validation in radically different cultural contexts. Key questions include:

  • Do the 40 items function equivalently across cultures, or do some items show differential item functioning (DIF) — behaving differently in different cultural contexts even after controlling for the underlying trait?
  • Does the eight-component structure replicate, or do some cultures carve emotional wellness along different joints?
  • Are there culturally specific components that the current model misses — for example, relational emotional awareness (how emotions function between people, not just within them) that cross-cultural psychologists like Mesquita (2022) have identified as central in collectivist cultures?

Cross-cultural validation is not simply about translating the items into other languages. It requires cultural adaptation — working with local practitioners and researchers to ensure that items capture the construct as it manifests in each cultural context, not as it manifests in metropolitan Singapore.

Domain-Specific Versions

The resting-state GREAT provides a general baseline. But the model predicts, and clinical experience confirms, that people function at different levels of emotional wellness in different life domains. A domain-specific assessment battery would provide considerably finer-grained diagnostic information:

  • GREAT-Relationship: Measuring emotional wellness in the context of intimate partnerships, family relationships, and close friendships. Items would assess emotional expression with a partner, reflective analysis of relational patterns, self-control during conflict, and mood management in domestic settings.
  • GREAT-Financial: Measuring emotional wellness in relation to money, financial decisions, and economic stress. Financial behaviour is heavily emotion-driven (fear, greed, shame, status anxiety), and a domain-specific instrument could identify where emotional hijacking enters financial decision-making.
  • GREAT-Creative: Measuring emotional wellness in creative practice — the ability to tolerate the vulnerability of creative expression, to use emotional states as raw material for creative work, and to manage the emotional rollercoaster of creative projects.
  • GREAT-Professional: Measuring emotional wellness in workplace settings — emotional expression with colleagues and superiors, self-control under organisational pressure, reflective analysis of professional conflicts, and mood management during high-stakes work.

Each domain-specific version would share the eight-component structure but adapt the items to the context. The resting-state GREAT would remain the foundation; the domain-specific versions would layer diagnostic precision on top.

Larger and More Representative Samples

A sample of 123 is adequate for initial validation but insufficient for advanced psychometric analyses. Future validation studies should aim for:

  • N > 500 for confirmatory factor analysis (CFA) to rigorously test the eight-component structure
  • Stratified sampling across non-overlapping age groups (18-25, 26-35, 36-50, 51-65, 66+) to establish age norms and test whether item functioning varies across the lifespan
  • Balanced gender recruitment and explicit inclusion of non-binary, transgender, and intersex participants
  • Multi-site recruitment across countries and cultural contexts

Larger samples would also enable item response theory (IRT) analysis, which provides item-level diagnostics more precise than classical test theory. IRT can identify which items are most informative at which levels of the underlying trait. That in turn would allow the development of an adaptive version of the GREAT that selects items based on the respondent’s emerging score, producing a shorter, more efficient assessment with little loss of precision.
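
To make the idea of item information concrete, here is a minimal sketch using the two-parameter logistic (2PL) model. The 2PL treats each item as dichotomous, which is a simplification I am assuming for illustration; a graded response model may suit the GREAT’s actual item format better. The item names and the discrimination (a) and location (b) parameters are hypothetical, not estimates from the validation data.

```python
import math


def p_endorse(theta, a, b):
    """2PL probability of endorsing an item at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def next_item(theta, item_bank, administered):
    """Pick the unused item that is most informative at the current estimate.

    item_bank maps an item name to its (a, b) parameters.
    Returns None when every item has already been administered.
    """
    candidates = [name for name in item_bank if name not in administered]
    if not candidates:
        return None
    return max(candidates, key=lambda name: item_information(theta, *item_bank[name]))


# Hypothetical three-item bank with equal discrimination.
bank = {"item_low": (1.0, -2.0), "item_mid": (1.0, 0.0), "item_high": (1.0, 2.0)}
print(next_item(0.0, bank, administered=set()))
```

Under this model an item is most informative where theta equals its location b, which is why an adaptive test keeps choosing items close to the respondent’s current estimate. A full adaptive procedure would also re-estimate theta after each response and apply a stopping rule, both of which are omitted here.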

Longitudinal Studies

The GREAT’s greatest potential value — and the claim most in need of empirical verification — is that it can track change over time. If the Emotional State Model is correct that emotional wellness is developmental, then effective interventions should produce measurable increases in GREAT scores, and those increases should correspond to observable changes in behaviour.

Longitudinal studies would administer the GREAT at multiple time points: pre-intervention, mid-intervention, post-intervention, and at follow-up intervals (3 months, 6 months, 12 months). This design would answer several questions:

  • Does the GREAT show test-retest reliability? (Do scores remain stable in the absence of intervention, for example in a waitlist or comparison group?)
  • Does the GREAT show sensitivity to change? (Do scores increase following effective emotional wellness interventions?)
  • Do changes in GREAT scores predict changes in other outcomes — job performance, relationship satisfaction, mental health indicators, decision-making quality?
  • Is there a minimum intervention dose required to produce measurable change?
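
The first two questions reduce to simple statistics once paired scores exist. The sketch below is illustrative only: it uses a Pearson correlation between two administrations as the test-retest index and the standardised response mean (mean change divided by the standard deviation of change) as one possible index of sensitivity to change. Both choices, and the helper names, are my assumptions; other indices such as the intraclass correlation would be equally defensible.

```python
import math
from statistics import mean, stdev


def pearson_r(first, second):
    """Pearson correlation between two administrations for the same people."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    m1, m2 = mean(first), mean(second)
    cov = sum((x - m1) * (y - m2) for x, y in zip(first, second))
    var1 = sum((x - m1) ** 2 for x in first)
    var2 = sum((y - m2) ** 2 for y in second)
    if var1 == 0 or var2 == 0:
        raise ValueError("correlation undefined when scores do not vary")
    return cov / math.sqrt(var1 * var2)


def standardized_response_mean(pre, post):
    """Mean pre-to-post change divided by the SD of the change scores."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    changes = [after - before for before, after in zip(pre, post)]
    spread = stdev(changes)
    if spread == 0:
        raise ValueError("undefined when every participant changes equally")
    return mean(changes) / spread
```

In use, pearson_r would be applied to two administrations from participants who received no intervention, and standardized_response_mean to pre- and post-intervention scores from those who did. Comparing the two groups is what separates genuine change from ordinary fluctuation.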

These studies would also address a question the current validation cannot answer: is the resting-state baseline truly stable, or does it fluctuate with life circumstances? A person going through a divorce might score lower than their true baseline. A person in a supportive new relationship might score higher. Understanding the GREAT’s temporal stability is essential for its use as a developmental tracking instrument.

Behavioural Validation

The GREAT’s self-report limitation could be addressed through convergent validation studies that correlate GREAT scores with behavioural measures:

  • Physiological indicators: Heart rate variability (HRV) is associated with emotional regulation capacity. Do high GREAT scorers show higher resting HRV?
  • Behavioural observation: In structured emotional challenge tasks (receiving critical feedback, watching emotionally evocative stimuli, navigating interpersonal disagreement), do high GREAT scorers show more adaptive emotional responses as rated by trained observers?
  • Informant reports: Do partners, close friends, or colleagues rate high GREAT scorers as more emotionally aware, expressive, and well-regulated?

Convergent validity — the degree to which the GREAT correlates with other measures of the same construct — would significantly strengthen the evidence base and address the self-report limitation directly.

Normative Data and Interpretation Guidelines

The current GREAT provides raw scores and component-level profiles. What it does not yet provide is normative data — a reference distribution that tells the respondent where their score falls relative to the population. “Your Emotional Expression score is 14” is less useful than “Your Emotional Expression score is 14, which places you at the 35th percentile for your age and gender group — below average, suggesting this is a priority area for development.”

Developing normative tables requires the larger, more representative samples described above. Once available, norms would transform the GREAT from a research instrument into a practical clinical and coaching tool with standardised interpretation guidelines.
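
The conversion from a raw score to a percentile is mechanically simple once a reference sample exists. The following sketch assumes a mid-rank definition of percentile rank (ties count as half below) and uses a small invented norm group; no real GREAT norms are implied, and the function name is hypothetical.

```python
def percentile_rank(score, norm_scores):
    """Percentage of the norm group scoring below `score`, counting ties as half."""
    if not norm_scores:
        raise ValueError("norm group is empty")
    below = sum(1 for s in norm_scores if s < score)
    equal = sum(1 for s in norm_scores if s == score)
    return 100.0 * (below + 0.5 * equal) / len(norm_scores)


# Invented component scores for one age-and-gender norm group.
illustrative_norm_group = [10, 12, 14, 16, 18]
print(percentile_rank(14, illustrative_norm_group))  # 50.0
```

A production scoring tool would look up the respondent’s age and gender group first and would need far larger norm groups than this before any percentile could be interpreted with confidence.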


Conclusion

The GREAT is the empirical anchor of this thesis. The theoretical architecture — the six stages of the Emotional State Model, the eight components of emotional wellness, the values/anti-values distinction, the Thought Action Paradigm — describes how human emotional processing works. The GREAT provides the means to test it.

With a Cronbach’s alpha of 0.916, a KMO of 0.813, a dominant general factor that accounts for the lion’s share of variance, and gender-invariant reliability, the GREAT demonstrates that emotional wellness can be measured as a coherent, internally consistent construct. It is not a perfect instrument — no instrument is. Its limitations are real: a single-population sample, self-report methodology, age skew, missing populations, resting-state generality. I have stated them plainly because intellectual honesty is non-negotiable in measurement, and because knowing the boundaries of what an instrument can tell you is as important as knowing what it does tell you.

But within those boundaries, the GREAT does something that existing instruments do not. The Big Five measures the patterns of personality without explaining their mechanism. The MBTI classifies preference without assessing development. Goleman identifies emotional competences without measuring the foundation they depend on. The GREAT measures that foundation — the resting-state capacity for emotional awareness, expression, analysis, and management that determines how effectively a person can deploy any emotional skill.

My position is that measurement is not optional. If we claim that emotional wellness is developable, we must be willing to measure it — to provide a starting point, to identify specific deficits, to track change, and to hold our interventions accountable for producing results. The GREAT is the first step in that accountability chain. It is not the last. The future directions outlined above describe a programme of work that would take years and require collaboration across cultures, institutions, and disciplines.

The instrument is ready. The framework is ready. The question now is whether the practitioners who use them — therapists, coaches, educators, leaders — will do the harder work: not just measuring where people are, but creating the conditions for them to develop.

That is the work of the remaining chapters.