Repository logo
  • English
  • Català
  • Čeština
  • Deutsch
  • Español
  • Français
  • Gàidhlig
  • Latviešu
  • Magyar
  • Nederlands
  • Polski
  • Português
  • Português do Brasil
  • Suomi
  • Svenska
  • Türkçe
  • Қазақ
  • বাংলা
  • हिंदी
  • Ελληνικά
  • Yкраї́нська
  • Log In
    New user? Click here to register using a personal email and password.Have you forgotten your password?
Repository logo
    Info Pages
    Content PolicyCopyright & Access InfoDepositing to MRODeposit LicenseDeposit License SummaryFile FormatsTheses FAQDoctoral Thesis Deposit
  • Communities & Collections
  • All of MRO
  • English
  • Català
  • Čeština
  • Deutsch
  • Español
  • Français
  • Gàidhlig
  • Latviešu
  • Magyar
  • Nederlands
  • Polski
  • Português
  • Português do Brasil
  • Suomi
  • Svenska
  • Türkçe
  • Қазақ
  • বাংলা
  • हिंदी
  • Ελληνικά
  • Yкраї́нська
  • Log In
    New user? Click here to register using a personal email and password.Have you forgotten your password?
  1. Home
  2. Browse by Author

Browsing by Author "Henry NIN"

Now showing 1 - 1 of 1
Results Per Page
Sort Options
  • Loading...
    Thumbnail Image
    Item
    A Hormetic Approach to the Value-Loading Problem: Preventing the Paperclip Apocalypse
    (Springer Nature Singapore Pte Ltd, 2025-10-06) Henry NIN; Pedersen M; Williams M; Martin JLB; Donkin L
    The value-loading problem is a major obstacle to creating Artificial Intelligence (AI) systems that align with human values and preferences. Central to this problem is the establishment of safe limits for repeatable AI behaviors. We introduce hormetic alignment, a paradigm to regulate the behavioral patterns of AI, grounded in the concept of hormesis, where low frequencies or repetitions of a behavior have beneficial effects, while high frequencies or repetitions are harmful. By modeling behaviors as allostatic opponent processes, we can use either Behavioral Frequency Response Analysis (BFRA) or Behavioral Count Response Analysis (BCRA) to quantify the safe and optimal limits of repeatable behaviors. We demonstrate how hormetic alignment solves the ‘paperclip maximizer’ scenario, a thought experiment where an unregulated AI tasked with making paperclips could end up converting all matter in the universe into paperclips. Our approach may be used to help create an evolving database of ‘values’ based on the hedonic calculus of repeatable behaviors with decreasing marginal utility. Hormetic alignment offers a principled solution to the value-loading problem for repeatable behaviors, augmenting current techniques by adding temporal constraints that reflect the diminishing returns of repeated actions. It further supports weak-to-strong generalization – using weaker models to supervise stronger ones – by providing a scalable value system that enables AI to learn and respect safe behavioral bounds. This paradigm opens new research avenues for developing computational value systems that govern not only single actions but the frequency and count of repeatable behaviors.

Copyright © Massey University  |  DSpace software copyright © 2002-2025 LYRASIS

  • Contact Us
  • Copyright Take Down Request
  • Massey University Privacy Statement
  • Cookie settings