A Hormetic Approach to the Value-Loading Problem: Preventing the Paperclip Apocalypse

dc.citation.issue: 7
dc.citation.volume: 6
dc.contributor.author: Henry NIN
dc.contributor.author: Pedersen M
dc.contributor.author: Williams M
dc.contributor.author: Martin JLB
dc.contributor.author: Donkin L
dc.date.accessioned: 2025-10-22T19:24:13Z
dc.date.available: 2025-10-22T19:24:13Z
dc.date.issued: 2025-10-06
dc.description.abstract: The value-loading problem is a major obstacle to creating Artificial Intelligence (AI) systems that align with human values and preferences. Central to this problem is the establishment of safe limits for repeatable AI behaviors. We introduce hormetic alignment, a paradigm to regulate the behavioral patterns of AI, grounded in the concept of hormesis, where low frequencies or repetitions of a behavior have beneficial effects, while high frequencies or repetitions are harmful. By modeling behaviors as allostatic opponent processes, we can use either Behavioral Frequency Response Analysis (BFRA) or Behavioral Count Response Analysis (BCRA) to quantify the safe and optimal limits of repeatable behaviors. We demonstrate how hormetic alignment solves the ‘paperclip maximizer’ scenario, a thought experiment where an unregulated AI tasked with making paperclips could end up converting all matter in the universe into paperclips. Our approach may be used to help create an evolving database of ‘values’ based on the hedonic calculus of repeatable behaviors with decreasing marginal utility. Hormetic alignment offers a principled solution to the value-loading problem for repeatable behaviors, augmenting current techniques by adding temporal constraints that reflect the diminishing returns of repeated actions. It further supports weak-to-strong generalization – using weaker models to supervise stronger ones – by providing a scalable value system that enables AI to learn and respect safe behavioral bounds. This paradigm opens new research avenues for developing computational value systems that govern not only single actions but the frequency and count of repeatable behaviors.
dc.description.confidential: false
dc.edition.edition: October 2025
dc.identifier.citation: Henry NIN, Pedersen M, Williams M, Martin JLB, Donkin L. (2025). A Hormetic Approach to the Value-Loading Problem: Preventing the Paperclip Apocalypse. SN Computer Science. 6. 7.
dc.identifier.doi: 10.1007/s42979-025-04369-4
dc.identifier.eissn: 2661-8907
dc.identifier.elements-type: journal-article
dc.identifier.issn: 2662-995X
dc.identifier.number: 872
dc.identifier.uri: https://mro.massey.ac.nz/handle/10179/73718
dc.language: English
dc.publisher: Springer Nature Singapore Pte Ltd
dc.publisher.uri: https://link.springer.com/article/10.1007/s42979-025-04369-4
dc.relation.isPartOf: SN Computer Science
dc.rights: (c) The author/s [en]
dc.rights.license: CC BY [en]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [en]
dc.subject: Artificial intelligence
dc.subject: Machine learning
dc.subject: Value-loading
dc.subject: Alignment
dc.subject: Hormesis
dc.subject: Allostasis
dc.title: A Hormetic Approach to the Value-Loading Problem: Preventing the Paperclip Apocalypse
dc.type: Journal article
pubs.elements-id: 503778
pubs.organisational-group: Other
Files
Original bundle
Name: 503778 PDF.pdf
Size: 4.38 MB
Format: Adobe Portable Document Format
Description: Published version.pdf
License bundle
Name: license.txt
Size: 9.22 KB
Format: Plain Text