A Hormetic Approach to the Value-Loading Problem: Preventing the Paperclip Apocalypse

Henry NIN; Pedersen M; Williams M; Martin JLB; Donkin L

dc.citation.issue	7
dc.citation.volume	6
dc.contributor.author	Henry NIN
dc.contributor.author	Pedersen M
dc.contributor.author	Williams M
dc.contributor.author	Martin JLB
dc.contributor.author	Donkin L
dc.date.accessioned	2025-10-22T19:24:13Z
dc.date.available	2025-10-22T19:24:13Z
dc.date.issued	2025-10-06
dc.description.abstract	The value-loading problem is a major obstacle to creating Artificial Intelligence (AI) systems that align with human values and preferences. Central to this problem is the establishment of safe limits for repeatable AI behaviors. We introduce hormetic alignment, a paradigm to regulate the behavioral patterns of AI, grounded in the concept of hormesis, where low frequencies or repetitions of a behavior have beneficial effects, while high frequencies or repetitions are harmful. By modeling behaviors as allostatic opponent processes, we can use either Behavioral Frequency Response Analysis (BFRA) or Behavioral Count Response Analysis (BCRA) to quantify the safe and optimal limits of repeatable behaviors. We demonstrate how hormetic alignment solves the ‘paperclip maximizer’ scenario, a thought experiment where an unregulated AI tasked with making paperclips could end up converting all matter in the universe into paperclips. Our approach may be used to help create an evolving database of ‘values’ based on the hedonic calculus of repeatable behaviors with decreasing marginal utility. Hormetic alignment offers a principled solution to the value-loading problem for repeatable behaviors, augmenting current techniques by adding temporal constraints that reflect the diminishing returns of repeated actions. It further supports weak-to-strong generalization – using weaker models to supervise stronger ones – by providing a scalable value system that enables AI to learn and respect safe behavioral bounds. This paradigm opens new research avenues for developing computational value systems that govern not only single actions but the frequency and count of repeatable behaviors.
dc.description.confidential	false
dc.edition.edition	October 2025
dc.identifier.citation	Henry NIN, Pedersen M, Williams M, Martin JLB, Donkin L. (2025). A Hormetic Approach to the Value-Loading Problem: Preventing the Paperclip Apocalypse. SN Computer Science, 6(7).
dc.identifier.doi	10.1007/s42979-025-04369-4
dc.identifier.eissn	2661-8907
dc.identifier.elements-type	journal-article
dc.identifier.issn	2662-995X
dc.identifier.number	872
dc.identifier.uri	https://mro.massey.ac.nz/handle/10179/73718
dc.language	English
dc.publisher	Springer Nature Singapore Pte Ltd
dc.publisher.uri	https://link.springer.com/article/10.1007/s42979-025-04369-4
dc.relation.isPartOf	SN Computer Science
dc.rights	(c) The author/s	en
dc.rights.license	CC BY	en
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	en
dc.subject	Artificial intelligence
dc.subject	Machine learning
dc.subject	Value-loading
dc.subject	Alignment
dc.subject	Hormesis
dc.subject	Allostasis
dc.title	A Hormetic Approach to the Value-Loading Problem: Preventing the Paperclip Apocalypse
dc.type	Journal article
pubs.elements-id	503778
pubs.organisational-group	Other