Sorry

This feed does not validate.

In addition, interoperability with the widest range of feed readers could be improved by implementing the following recommendations.

Source: http://lesswrong.com/comments/.rss

  1. <?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[LessWrong]]></title><description><![CDATA[A community blog devoted to refining the art of rationality]]></description><link>https://www.lesswrong.com</link><image><url>https://res.cloudinary.com/lesswrong-2-0/image/upload/v1497915096/favicon_lncumn.ico</url><title>LessWrong</title><link>https://www.lesswrong.com</link></image><generator>RSS for Node</generator><lastBuildDate>Mon, 19 May 2025 14:53:43 GMT</lastBuildDate><atom:link href="https://www.lesswrong.com/feed.xml?view=rss&amp;karmaThreshold=2" rel="self" type="application/rss+xml"/><item><title><![CDATA[Thoughts on "Antiqua et nova" (Catholic Church's AI statement)]]></title><description><![CDATA[Published on May 19, 2025 1:30 PM GMT<br/><br/><p>On January 28, 2025 (during the pontificate of Pope Francis) the Catholic Church put out its position paper on AI: <a href="https://www.vatican.va/roman_curia/congregations/cfaith/documents/rc_ddf_doc_20250128_antiqua-et-nova_en.html"><em>Antiqua et nova</em>: Note on the Relationship Between Artificial Intelligence and Human Intelligence</a>. Here are my thoughts, which I wrote back in February, but not much has changed since then, and the new pope <a href="https://apnews.com/article/pope-leo-vision-papacy-artificial-intelligence-36d29e37a11620b594b9b7c0574cc358">has continued</a> citing AI as of particular concern to the Catholic Church.</p>
  2. <hr>
  3. <blockquote>
  4. <p>"With all your modern science, are you any closer to understanding the mystery of how a robot walks or talks?" "Yes, you idiot. The circuit diagram is right here on the inside of your case!" (<a href="https://www.youtube.com/watch?v=4PrNZbD1lcc">Futurama</a>)</p>
  5. </blockquote>
  6. <h1>Four characteristics of humanity</h1>
  7. <p>Section III is the real meat of the document. It lists four essential characteristics, and argues that AI falls short of the fullness of humanity in each respect. These are: rationality, truth-seeking, embodiment, and relationality.</p>
  8. <p>I will address embodiment and relationality first, since these two things are connected, and they are in my opinion the most interesting ones. By comparison, I understand less of what (if anything) the document has to say about rationality and truth-seeking, so I'll address those briefly afterward.</p>
  9. <h2>Embodiment and relationality</h2>
  10. <p>Lately I've been on a contrarian kick advocating these stances in opposition to the "modal rationalist" views of disembodiment and individuality. However, if I take off the contrarian hat, I suppose my actual position is something like:</p>
  11. <ul>
  12. <li>The general view of rationalists (including myself in the past) is too far biased towards the extreme of disembodiment and individuality because of certain philosophical premises which are usually left unexamined, and which are not empirically-based. The correct position is probably somewhere more in the embodiment/relationality direction on the spectrum.</li>
  13. <li>As a purely descriptive matter, embodiment and relationality seem to more accurately explain the way people actually behave, as opposed to modal rationalism. Understanding this is important for living in society. It also poses a conundrum for attempts to define a "utility function (<a href="https://www.lesswrong.com/w/coherent-extrapolated-volition">CEV</a>) of humanity" - if most humans are not utilitarians, how can there be such a thing?</li>
  14. </ul>
  15. <p>What are embodiment/disembodiment and relationality/individuality? It is helpful to define these positions ostensively by giving examples of the views on the extremes of these two spectra.</p>
  16. <ul>
  17. <li>Disembodiment:
  18. <ul>
  19. <li>What matters (for morality and personal identity) is a certain pattern of information that constitutes the experience of a conscious being.
  20. If that pattern were to be implemented on some other substrate (e.g. mind uploading), then that implementation is (again, for the purpose of morality and identity) equivalent to a flesh-and-blood person.</li>
  21. <li>In humans, that pattern resides in the brain; the rest of the body doesn't matter.
  22. <ul>
  23. <li>This is an empirical claim that doesn't necessarily follow from the other premises, but I think those other premises give rise to a bias that leads people to just assume that this is true without any empirical basis. I think the correct view is: Yes, the brain probably has most of the stuff-that-matters, but the rest of the body still has a substantial portion, more than most rationalists give it credit for. Identity is not all-or-nothing, but operates in a "holographic" manner. Just as people who experience brain damage end up changing to some extent, depending on the extent of the damage, so do people change when their bodies change. A complete replacement of the body-below-the-neck (as in e.g. brain-only cryopreservation) would probably change a lot as well.</li>
  24. </ul>
  25. </li>
  26. </ul>
  27. </li>
  28. <li>Individuality:
  29. <ul>
  30. <li>The moral value of a thing rests on characteristics of the thing itself.
  31. If we were to discover intelligent aliens, our moral obligations towards them would be the same as to other humans.</li>
  32. <li>If we develop a sentient AI, then we also ought to treat it the same way we treat humans.</li>
  33. <li>Given that babies, mentally impaired humans, and certain species of animals all have equivalent levels of intelligence, we must treat them all the same in order to be consistent.</li>
  34. </ul>
  35. </li>
  36. <li>Embodiment:
  37. <ul>
  38. <li>What matters is the physical substance of which a being consists.
  39. <ul>
  40. <li>I am not sure I would endorse this <em>fundamentally</em>, but I would at least say it is "more true than not-true for most practical purposes". That is, I still buy the argument that <a href="https://www.lesswrong.com/posts/RLScTpwc5W2gGGrL9/identity-isn-t-in-specific-atoms">identity isn't in specific atoms</a>, and I accept the metaphysical possibility of my identity being copied multiple times, but that doesn't imply that copying a mind is an easily-achievable feat of engineering like copying a computer file, or that swapping out parts of my body or brain with entirely different materials (e.g. silicon transistors) would not have a significant effect on my identity.</li>
  41. </ul>
  42. </li>
  43. <li>A reimplementation or upload of a human onto a non-biological substrate would not matter at all.</li>
  44. </ul>
  45. </li>
  46. <li>Relationality:
  47. <ul>
  48. <li>One being has moral obligations to another depending on the relationship between them. There is no such thing as "moral value" <em>per se</em>.</li>
  49. <li>These relationships are not necessarily based on voluntary agreement (contracts), since there are some that exist prior to any deliberate act. For example, a newborn child has a morally relevant relationship with his/her parents, even though he/she could not possibly have agreed to it. Similarly, the citizens of a particular nation have a stronger relationship to each other than to foreigners, even if they didn't choose to be born there.</li>
  50. <li>If we discover intelligent aliens, we would have no moral obligations towards them (and likewise they would have none towards us).</li>
  51. </ul>
  52. </li>
  53. </ul>
  54. <p>(Why are embodiment and relationality connected? Because a combination of both can be used to derive a "universal" (or rather "all-human-" or "all-Earth-including") morality based on the physical connection all humanity (or, as some environmentalists would say, all life on earth) shares via common descent. But then this stance diverges radically from the modal rationalist view when it comes to AI/uploads and aliens.)</p>
  55. <p>Relational morality can be rather harsh by modern standards, as we see from the way it was lived out in classical antiquity:</p>
  56. <ul>
  57. <li>Two warring tribes or nations have no obligations towards each other. Each only has an obligation to try to gain as much territory and resources for their own people as they can.</li>
  58. <li>A nation can only be unified by a (perhaps legendary) common line of descent.</li>
  59. <li>The <em>paterfamilias</em> has absolute power over the rest of his family; if he decides that an infant should be exposed, then that's what happens.</li>
  60. <li>Slaves are not part of society and so they have no rights in that society.</li>
  61. <li>People believe that the gods treat humans as their playthings, and they are considered not as moral exemplars, but as capricious forces that must be placated through sacrifices. (This ethos of classical paganism is summarized in the saying <em>do ut des</em> - "I give so that you may give".)</li>
  62. </ul>
  63. <p>Now, we can understand the Catholic Church's position (and perhaps that of classical monotheism generally) as a way of mitigating this harshness by introducing a single "God", while still remaining within the relational framework because universal morality is founded upon everyone's relationship with God. Since all humans have that relationship, they still have rights and obligations <em>vis-à-vis</em> each other by way of God, even if no social relationship between them exists.</p>
  64. <p>(Incidentally, I think this helps explain what's going on in the "<a href="https://en.wikipedia.org/wiki/Argument_from_morality#Argument_from_objective_moral_truths">moral argument for the existence of God</a>". Considered apart from this historical background, it is puzzling why anyone would find that argument compelling. Why do some people insist that God must exist in order to "ground" morality? What does "grounding" even mean here anyway? But now we see that this insistence makes sense if we're thinking of morality relationally. In that framework, since moral consideration is a characteristic of relationships and not of individuals, for any universal morality to exist there must be some entity that universally has a relationship with everyone, i.e. God.)</p>
  65. <p>Similarly, the full "embodiment" perspective is also hard to face because of its obvious implications that there can be no afterlife and that the pain and suffering in the world has no purpose. Some religions/philosophies have reacted to this by going to the opposite extreme (Manicheanism/Gnosticism) where the material world is a bad thing and our goal should be to escape it. The Catholic Church also rejects this view, and likens transhumanism to it (endnote 9).</p>
  66. <p>However, it remains unclear to me what their actual stance is. If "the entire human person is simultaneously both material and spiritual" (paragraph 16), then what is it that goes to heaven or hell after death? As far as I can tell, the view of the historical Jesus (and the general current of Jewish thought at the time) was that the "soul" does not exist apart from the body, and so at the end of time, God or the Messiah would physically re-create the bodies of all people who have died so they can live eternally in the Kingdom of God (which would be a physical kingdom here on earth). The obvious consequence of this monistic metaphysics is the doctrine of "<a href="https://en.wikipedia.org/wiki/Christian_mortalism">soul sleep</a>" whereby people stop existing at death and only come back when God resurrects them at the end of time. However, the Catholic Church rejects that doctrine as well. At any rate, it seems that, whereas their position on the relational/individual axis is still fundamentally relational (just with God added), their position on embodiment/disembodiment is more middling - "kinda disembodied, but not fully" - reflecting a synthesis between classical Greek dualism and first-century-Jewish monism.</p>
  67. <p>What does this all have to do with AI? <em>Antiqua et nova</em> says that AI is totally and essentially different from humans because it lacks all the morally-significant characteristics that humans have:</p>
  68. <ol>
  69. <li>AI does not have a body.</li>
  70. <li>To the extent that there's some non-material aspect of humans ("spirit"), AI lacks that as well, because it was not made by God in his image.</li>
  71. <li>AI has no relationship with God.</li>
  72. <li>AI also has no relationships with humans in the way that matters (see paragraph 61 and endnote 123).</li>
  73. </ol>
  74. <p>Setting aside the points about God, let's consider points 1 and 4. This is what I think about those:</p>
  75. <ul>
  76. <li>This is definitely true about current AI.
  77. This is probably going to continue to be true given the current trajectory of AI development.
  78. <ul>
  79. <li>If AIs are built with robot bodies, those bodies would not be anything like human bodies. For example, they would not be made of biological matter, and they would not be integrated with the intelligence in the way the human body is, but rather separated by a <a href="https://www.lesswrong.com/w/cartesian-agent">"quasi-Cartesian" divide</a>.</li>
  80. <li>AIs are not going to develop attachments or friendships with humans.</li>
  81. </ul>
  82. </li>
  83. <li>This is not necessarily always going to be the case (that AI will lack humanlike embodiment and relationality), but achieving this would require a concerted effort, and nobody seems to really be working on this.</li>
  84. <li>If the arguments made in these sections of <em>Antiqua et nova</em> are taken more seriously, then perhaps people interested in making human-aligned AI will prioritize these research efforts more, since embodiment and relationality seem to be important parts of human morality.</li>
  85. <li>Perhaps a more viable route towards that goal would be to use AI to augment the capabilities of existing humans (while maintaining their embodiment and relationality), rather than building something from scratch.</li>
  86. </ul>
  87. <h2>Rationality</h2>
  88. <p>Rationality is cited as another essential characteristic of humanity, what makes us distinct from animals. On its face, this would seem to be one of the easiest aspects for AI to replicate; however, the document claims that the capabilities of machines are only a small subset of what humans do with their intelligence (paragraph 27). But the examples given there seem tenuous; there is no real justification given for why we should not expect AI to soon achieve those abilities as well - unless we artificially circumscribe the definition of "intelligence" etc. in such a way as to exclude AI, e.g. "AI's advanced features give it sophisticated abilities to <em>perform tasks</em>, but not the ability to <em>think</em>" (paragraph 12). This resembles John Searle's view (Chinese room, etc.) that what a machine is doing is not "really" understanding, cognition, etc. I think this view is going to be quickly rendered obsolete by advances in AI.</p>
  89. <p>Anticipating this, the document also makes sure to reject altogether the position it calls "functionalism" (which I think is a confusing word to use; "<a href="https://en.wikipedia.org/wiki/Behaviorism">behaviorism</a>" is the word more consistent with the philosophy-of-mind literature), that what an AI or a human can <em>do</em> is what matters (see paragraph 34). This betrays a lack of confidence in the point made immediately prior (paragraphs 32 and 33) which claims (unjustifiably, in my opinion) that there are certain capabilities that are fundamentally out-of-reach for AI: "Since AI lacks the richness of corporeality, relationality, and the openness of the human heart to truth and goodness, its capacities—though seemingly limitless—are incomparable with the human ability to grasp reality" (paragraph 33) - but I think it's only a matter of time before AI has all of that "richness".</p>
  90. <p>Much of the document's confidence in the distinctness of humanity rests on the notion that mankind is created in the image of God (Genesis 1:27). But the document also quotes Augustine to clarify what this means: "Man is made in the image of God in relation to that [faculty] by which he is superior to the irrational animals. Now, this [faculty] is reason itself, or the 'mind,' or 'intelligence,' whatever other name it may more suitably be given" (endnote 16, brackets in original). This seems to leave the Church an escape-hatch in the event that AI capabilities improve to equal those of humans in the relevant aspects - they can say that the AIs now possess the divine image as well!</p>
  91. <h2>Truth-seeking</h2>
  92. <p>Furthermore, according to the document, human intellect is "irresistibly" drawn towards truth (paragraph 21). Now, in the context of AI, one might say that there are certain <a href="https://www.lesswrong.com/w/instrumental-convergence">convergent instrumental goals</a> that every intelligent agent will take on, among which is truth-seeking, i.e. the desire to have a more accurate world-model, since this is useful as a means towards almost any goal that the agent might have in interacting with the world. However, that doesn't seem to be what the document means by "truth". It refers to "realities that surpass mere sensory experience or utility" (paragraph 20) and "realities that transcend the physical and created world" (paragraph 23), ultimately equating Truth with God. This is distinct from the instrumental goal of world-modelling (but of course, the very concept contains the embedded assumption that there are indeed any such "realities", which many would dispute). The document also suggests (paragraph 22; endnote 40) that "semantic understanding and creativity" are examples of this transcendent faculty, with the implication that an AI cannot replicate them. However, the possibility for computers to do these things has been well explored even prior to the LLM paradigm (e.g. in <em>Gödel, Escher, Bach</em>), and, again, I think current AI is not far from achieving them, even if it remains limited to the kind of AIs (lacking embodiment and relationality) that currently exist.</p>
  93. <h1>How much of <em>Antiqua et nova</em> is specifically theistic, or Christian?</h1>
  94. <p>If you don't think that God exists and/or don't think Jesus was a divine being who rose from the dead (etc.) you may be wondering how much of this document is relevant, or whether one must first accept the truth of those things before finding anything useful in it.</p>
  95. <p>The document is certainly not "rationalist" or "scientific" in its approach, but at the same time I was surprised by how little doctrinaire Christianity is present in it. (But maybe this is true of Catholic thinking generally, and my surprise is due to my being more familiar with Protestantism.) God is mentioned throughout, but as a highly abstract and rarefied concept (the God of the philosophers, as opposed to the God of Abraham) to which the likes of Plato, Aristotle, or even the Enlightenment-era deists would have little objection. Jesus is only mentioned a few times.</p>
  96. <h2>What kind of theism is the document based on?</h2>
  97. <p>Continuing the above discussion of "truth-seeking", we see that the document largely conceptualizes God as some kind of supreme, transcendent principle towards which all humanity is (or ought to be - there is a tendency to equivocate between is/ought) striving (paragraph 23). Now, I find this concept rather vacuous and needlessly mystifying, a typical "iatrogenic" problem created by philosophers in order to create more work for other philosophers.</p>
  98. <p>There are ways to set up a similar concept non-theistically, by regarding truth, philosophy, mathematics, etc. as something worth striving for for non-utilitarian reasons. E.g. "It appeared to me that the dignity of which human existence is capable is not attainable by devotion to the mechanism of life, and that unless the contemplation of eternal things is preserved, mankind will become no better than well-fed pigs" (<a href="https://en.wikiquote.org/wiki/Existence#R">Bertrand Russell, <em>Autobiography</em></a>). However, this seems like a very niche interest of a peculiar sort of person. More generally, whether theistic or not, the idea of searching for an eternal/transcendent/supreme/etc. principle seems far more prevalent in the Western philosophical tradition than elsewhere, and so it's a stretch to regard it as the very essence of human intellect. (Admittedly, I am not as familiar with non-Western philosophy, so I may be wrong.)</p>
  99. <h2>Besides that, what specifically Christian points are made?</h2>
  100. <p>The "Stewardship of the World" section cites the Book of Genesis to say that humanity has a special position of authority over the earth, granted by God, and so although we are not called to merely live in harmony with nature (as "secular Gaianists" would say), we are also not supposed to remake the world in the likeness of humanity (as would be the view of secular humanists, techno-utopians, accelerationists, etc.). Rather, according to the Christian view, we have an obligation to use our technological mastery to better glorify and achieve union with God.</p>
  101. <p>It's not clear what this actually means in concrete terms. The document takes pains to emphasize that the Church doesn't oppose technological progress <em>per se</em> (e.g. paragraph 2), but this "stewardship" doctrine leaves them a lot of leeway to define what exactly constitutes "godly" or "ungodly" progress. This will certainly be a source of debate in the years ahead.</p>
  102. <p>The Christian doctrine of the incarnation (that God became flesh in the person of Jesus Christ; see paragraph 16) is mentioned in support of the importance of embodiment. However, embodiment seems largely orthogonal to Christianity - there are/were flesh-disparaging forms of Christianity (Gnosticism, and perhaps some forms of Protestantism) and of atheism (transhumanism). At the same time, I would argue (though perhaps some Christians and rationalists alike will dispute this) that embodiment is the natural consequence of the kind of monistic materialism that most rationalists claim to espouse, even though they may end up trying to reconstruct an ersatz form of Cartesian dualism in order to shore up a structure of philosophical concepts that was originally built on a dualistic foundation.</p>
  103. <p>Lastly, unless I missed something, the only specific mention of the teachings of Jesus is in paragraph 20: "Love of God cannot be separated from love for one’s neighbor. By the grace of sharing God’s life, Christians are also called to imitate Christ’s outpouring gift by following his command to 'love one another, as I have loved you'" (citations omitted).</p>
  104. <p>Thus, although (as I have argued) the idea of relationality is not peculiar to Christianity, the document justifies its particular stance that the moral dimension of humans' relationship with God (and, by implication, with each other) ought to be one of service and self-sacrificing love (as opposed to e.g. obedience, loyalty, propriety, duty, etc.) by referring to Jesus's commandment and the Christian belief that he sacrificed himself for the rest of humanity.</p>
  105. <br/><br/><a href="https://www.lesswrong.com/posts/yDyRgLSvpsD3PqQHC/thoughts-on-antiqua-et-nova-catholic-church-s-ai-statement#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/posts/yDyRgLSvpsD3PqQHC/thoughts-on-antiqua-et-nova-catholic-church-s-ai-statement</link><guid isPermaLink="false">yDyRgLSvpsD3PqQHC</guid><dc:creator><![CDATA[jchan]]></dc:creator><pubDate>Mon, 19 May 2025 13:30:32 GMT</pubDate></item><item><title><![CDATA[[Funded Fellowship] AI for Human Reasoning Fellowship, with the Future of Life Foundation]]></title><description><![CDATA[Published on May 19, 2025 1:25 PM GMT<br/><br/><p>The Future of Life Foundation is launching a <a href="https://www.flf.org/fellowship">fellowship on AI for Human Reasoning</a>.</p>
  106. <blockquote>
  107. <h2>Fellowship on AI for Human Reasoning</h2>
  108. <p><a href="https://jobs.lever.co/futureof-life/ffc752f2-a420-4c87-8c58-2212ae2e885c/apply">Apply</a> by June 9th | $25k–$50k stipend | 12 weeks, from July 14 - October 3</p>
  109. <p><em>Join us in working out how to build a future which robustly empowers humans and improves decision-making.</em></p>
  110. <p>FLF’s incubator fellowship on AI for human reasoning will help talented researchers and builders start working on AI tools for coordination and epistemics. Participants will scope out and work on pilot projects in this area, with discussion and guidance from experts working in related fields. FLF will provide fellows with a $25k–$50k stipend, the opportunity to work in a shared office in the SF Bay Area, and other support.</p>
  111. <p>In some cases we would be excited to provide support beyond the end of the fellowship period, or help you in launching a new organization.</p>
  112. </blockquote>
  113. <p>We’re looking for researchers and builders who want to work on <strong>tools to help people make sense of complex situations and coordinate to face challenges</strong> - especially the many hazards we face in navigating the future of AI.</p>
  114. <p>Whether you expect gradual or sudden AI takeoff, and whether you're afraid of gradual or acute catastrophes, it really matters how well-informed, clear-headed, and free from coordination failures we are as we navigate into and through AI transitions. Just the occasion for human reasoning uplift!</p>
  115. <blockquote>
  116. <p>Technology shapes the world we live in today. The technologies we develop now — especially AI-powered technologies — will shape the world of tomorrow.</p>
  117. <p>We are concerned that humanity may fumble this ball. High stakes and rapid, dynamic, changes mean that leaders and other decision-makers may be disoriented, misunderstand the situation, or fail to coordinate on necessary actions — and steer the world into gradual or acute catastrophe.</p>
  118. <p>The right technology could help. The rise of modern AI systems unlocks prospects for big improvements here; AI tools might help everyone track the state of play, make decisions they stand behind, and act in sync with others. This could defuse race dynamics, prevent narrowly-interested cliques from exploiting less coordinated groups, and avoid catastrophic errors of judgement.</p>
  119. <p>We believe:</p>
  120. <ul>
  121. <li>The world is <a href="https://www.forethought.org/research/ai-tools-for-existential-security">radically underinvested in these beneficial applications</a>;</li>
  122. <li>Many people have not yet had the space to take these prospects seriously.</li>
  123. </ul>
  124. <p>This situation calls for ambitious and creative efforts. Our fellowship — somewhere between a research fellowship and a startup incubator — is designed to address this. Fellows will explore and discuss potential beneficial applications, and then build roadmaps and prototypes. We will empower them by providing resources, advice, and connections.</p>
  125. <p>At the end of the fellowship, we will invite people to present their work to potential funders (including us) and others working in this space. We hope that the fellowship gives space for seeds to germinate — and in some cases then grow into new organizations pursuing pivotal new capabilities. With sufficiently good tools, we might steer away from a world in which humans are increasingly irrelevant, towards one with deep institutional competence and individual empowerment.</p>
  126. </blockquote>
  127. <p>See the <a href="https://www.flf.org/fellowship">info site</a> for more, and please share and forward to any person, group, or community you think would be a good fit!</p>
  128. <br/><br/><a href="https://www.lesswrong.com/posts/aDWvPQNub6aDXwxue/funded-fellowship-ai-for-human-reasoning-fellowship-with-the#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/posts/aDWvPQNub6aDXwxue/funded-fellowship-ai-for-human-reasoning-fellowship-with-the</link><guid isPermaLink="false">aDWvPQNub6aDXwxue</guid><dc:creator><![CDATA[Oliver Sourbut]]></dc:creator><pubDate>Mon, 19 May 2025 13:25:46 GMT</pubDate></item><item><title><![CDATA[[Paper] Safety by Measurement: A Systematic Literature Review of AI Safety Evaluation Methods]]></title><description><![CDATA[Published on May 19, 2025 10:38 AM GMT<br/><br/><p>"<i>If you cannot measure it, you cannot improve it.</i>" - Lord Kelvin</p><p>The<a href="https://www.apolloresearch.ai/blog/we-need-a-science-of-evals"> science of AI safety evaluations</a> is still nascent, but it is making progress! We know much more today than we did two years ago.</p><p>We tried to make this knowledge accessible by writing a literature review and systematizing all the knowledge.</p><p>We wanted to go beyond a central source that just summarized collected knowledge, so we put a lot of effort into distillation, and disentangling concepts that are often presented in a messy way. We created original visualizations and gathered others from many different sources to accompany these explanations.</p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/CwdCYmsutwXwnYtEF/zaghypp8e7kwxehaai3w"><figcaption>E.g. Disentangling truth, honesty, hallucination, deception and scheming.</figcaption></figure><p>The review provides a taxonomy of AI safety evaluations along three dimensions:</p><ul><li>What properties do we measure? (dangerous capabilities, propensities, and control)</li><li>How do we measure them? (Behavioral and internal techniques)</li><li>How do we integrate evaluations into broader frameworks? (Model Organisms , Responsible Scaling Policies, ...)</li></ul><p><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/CwdCYmsutwXwnYtEF/bojlm1db6dwfraqjmmeg"></p><p>We also bring up some limitations that safety evaluations might face including things like: "sandbagging" (strategic underperformance on tests), organizational "safetywashing" (misrepresenting capability improvements as safety advancements), or other more fundamental inherent challenges like proving absence rather than presence of capabilities.</p><p>The text is available in many different places:</p><ul><li><a href="https://arxiv.org/abs/2505.05541">Arxiv</a></li><li><a href="https://docs.google.com/document/d/1KI95w27Ce7yWoynE11PJ94IXK0gT0NwP8091s06P7wM/edit?tab=t.0">Google Docs (Open to comments)</a></li><li><a href="https://ai-safety-atlas.com/chapters/05/">AI Safety Atlas</a></li></ul><p>This paper is intended to be part of a larger body of work called the AI Safety Atlas. 
We think of it as chapter 5 in a comprehensive collection of literature reviews collectively forming a textbook for safety.</p><br/><br/><a href="https://www.lesswrong.com/posts/CwdCYmsutwXwnYtEF/paper-safety-by-measurement-a-systematic-literature-review#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/posts/CwdCYmsutwXwnYtEF/paper-safety-by-measurement-a-systematic-literature-review</link><guid isPermaLink="false">CwdCYmsutwXwnYtEF</guid><dc:creator><![CDATA[markov]]></dc:creator><pubDate>Mon, 19 May 2025 10:38:22 GMT</pubDate></item><item><title><![CDATA[A widely shared AI productivity paper was retracted, is possibly fraudulent]]></title><description><![CDATA[Published on May 19, 2025 10:18 AM GMT<br/><br/><br/><br/><a href="https://www.lesswrong.com/posts/M2GzdAGbxwinERSEt/a-widely-shared-ai-productivity-paper-was-retracted-is#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/posts/M2GzdAGbxwinERSEt/a-widely-shared-ai-productivity-paper-was-retracted-is</link><guid isPermaLink="false">M2GzdAGbxwinERSEt</guid><dc:creator><![CDATA[titotal]]></dc:creator><pubDate>Mon, 19 May 2025 10:18:24 GMT</pubDate></item><item><title><![CDATA[The Fragility of Naive Dynamism]]></title><description><![CDATA[Published on May 19, 2025 7:51 AM GMT<br/><br/><h3>Movement alone isn’t progress, and there are dangers in ignoring dimensions and directions of acceleration.</h3><p><i>Note: This is part of my Substack on </i><a href="https://exploringcooperation.substack.com/"><i>Exploring Cooperation</i></a><i>. I’m delaying my next planned post to respond to </i><a href="https://helentoner.substack.com/p/dynamism-vs-stasis"><i>a recent post by Helen Toner</i></a><i>. (But just in case anyone is worried, I’ll definitely return to the theory of language and what it has to do with AI misalignment - and if that sounds interesting, feel free to subscribe.)</i></p><h2>Progress, Acceleration, and the Fragility of Civilizational Defaults</h2><p>In debates about progress, societies seem trapped between two crude narratives. On one side, all progress is framed as inherently heroic—proof of human ingenuity, resilience, and ambition. On the other, any concerns about technologies, not to mention calls for regulation or opposition, are cast as luddism and bureaucratic overreach fighting the engines of prosperity. 
But the simplistic counter-narrative that technology leads to ruin, while slow and natural changes were what created an imagined beautiful past, is also deeply confused.</p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/upKqoHezhwpAncdri/l9i8gtkntccoj0zoswug" alt="" srcset="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/upKqoHezhwpAncdri/ae5qol2t74i4lusvhdg6 424w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/upKqoHezhwpAncdri/rdd6m2rcxe70bgljmiuf 848w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/upKqoHezhwpAncdri/lfho2yr2pqmzbotwtzch 1272w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/upKqoHezhwpAncdri/l9i8gtkntccoj0zoswug 1456w"><figcaption>We need to reject the simplistic choice between the supposed fragile dynamism leading to ruin and the imagined pastoral beauty of only allowing slow changes.</figcaption></figure><p>It is true, as many in progress studies claim, that building things, deploying infrastructure, improving logistics, and expanding capacity is an almost unalloyed good, too often suppressed by governance choke points. Additionally, but differently, many argue that the solution is pushing the frontiers of AI, synthetic biology, or other civilization-scale technologies to accelerate progress.</p><p>But once the two statements are made, it becomes clear that conflating material progress and technological acceleration fails to distinguish between building things, on one hand, and pushing intentionally disruptive technologies on the other. And that means both acceleration and luddism are not just wrong, but category errors, each lumping two very different things together. But too many technologists celebrate acceleration as an unmitigated virtue, even though the governance systems which make progress beneficial were built for a slower, more predictable world, and default to clumsy attempts at control.</p><p>This essay tries to explain how the frames are confused, what a better mental model looks like, and then, finally, in light of the risks from future artificial intelligence, gestures towards potential solutions that hopefully push away from pure technocracy.</p><h2>Cooperation, Stability, and the Problem of Changing Games</h2><p>As <a href="https://exploringcooperation.substack.com/">this series</a> has repeatedly noted, cooperation is achievable in some games, but not others, and is fragile in some games, not others. For example, prisoner’s dilemmas are fragile, since defection is myopically beneficial, but <a href="https://exploringcooperation.substack.com/p/defining-and-starting-to-scale-cooperation">stag hunts are marginally less fragile, since smart people would be able to coordinate</a>, and repeated prisoner’s dilemmas are even more robust, since even “dumb” strategies can allow reasonably robust long term strategies. But fundamentally, changing the rules changes the game, which changes the best strategies. And players can engineer these rule changes by reframing the game or adding features. Abram Demski pointed out that most prisoner’s dilemmas “are” stag hunts, “because there are potential enforcement mechanisms available.” That is, the players can change the game. 
But the games can also change depending on the time frame, or scale.</p><p>However, predictable, cooperative systems require stable rules and shared priors, not constantly changing games. In game theory, what I will refer to as probability involves known unknowns; the dice may roll a six or a one, but the distribution is known, and players can optimize their strategies accordingly. This predictability alongside probability allows mixed strategies to work in games like Rock-Paper-Scissors, where the best move is to randomize predictably. The mutual knowledge of this predictable randomization allows for stability. Even in more complex games, probabilistic reasoning lets players balance risk and reward within well-defined boundaries. This means that even if outcomes are not deterministic, they are calculable, and players can settle into stable, if sometimes messy, equilibria.</p><p>Even structural factors can be probabilistic; games that repeat an indeterminate number of times allow more stable cooperation than those with a set finite number of rounds. And <a href="https://mason.gmu.edu/~rhanson/prior.pdf">Robin Hanson pointed out</a> that Aumann convergence occurs despite differing priors, i.e. unknown knowns, if the uncertainty or dispute is based on agreement about “pre-priors” - differences in assessed probabilities, but agreement about where the probabilities come from.</p><p>Technological change can be a one-time or one-dimensional event that is probabilistic, or it can be something else, namely, destabilizing and uncertain. We’ve seen both, historically. At different points in history, the international situation changed as a function of the technologies being used—the game changed, so the players did different things. <a href="https://exploringcooperation.substack.com/p/let-history-be-your-fallible-guide">I previously argued that the failures to cooperate were contingent, and that the successes were perhaps more due to inevitable technological changes</a>. But the technologies shifted in different ways at different times. The types of cooperation and the types of slow and expensive wars that made things more or less stable before World War I turned into alliance systems, and when the lethality of weapons changed and the fragility of the alliances and the contingent dragged the world into a conflict, it led to structural uncertainty and blundering unaware into a war that escalated to a previously unseen scale. The failure of economic and international cooperation, among other things, was destabilizing, and this led to WWII.</p><p>But this was around the point where the weapons got deadly enough that humanity could no longer survive escalation and total war, so the current international system was built to stabilize the international order, transforming chaos into economic competition. That is, increased international trade and economic interdependence built the modern world, transforming human ambition from negative-sum battles to positive-sum economic competition.</p><p>This itself was a technological change, bolstered by a series of technological changes - but the changes were largely along understood dimensions. The social technology of capitalism and markets drove logistical technologies like containerized shipping and kanban, and allowed for a blossoming of telecommunication and computer technologies, culminating in the internet, big data, machine learning, and AI. 
For most of this time, the system adapted and stayed stable - the largest shift was the collapse of communism, which in many ways reinforced the trajectory of international cooperation, rather than destabilizing it.</p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/upKqoHezhwpAncdri/vzaagen07omuq23nfkwf" alt="" srcset="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/upKqoHezhwpAncdri/rwlajwbmqxcb7vhmdvd2 424w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/upKqoHezhwpAncdri/yy7ieqyastwxsffrfukj 848w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/upKqoHezhwpAncdri/wcgmy3xii3xj8igqw9z0 1272w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/upKqoHezhwpAncdri/vzaagen07omuq23nfkwf 1456w"><figcaption>Navigating the uncertain world requires a shared map, and the hope that the landscape isn’t shifting too rapidly to move safely.</figcaption></figure><p>What I’m calling uncertainty, in contrast to what I called probability, is the realm of unknown unknowns, of not knowing the payoff matrix, or not knowing the pre-prior probabilities that the other player is using. This is where cooperation won’t emerge, or if it does, will quickly fail, because strategies that rely on trust, enforcement, or even rational prediction are brittle to unexpected and/or rapid changes in payoffs. And that happens when the underlying assumptions are unstable or opaque. In deterministic or probabilistic iterated games with stable rules, stable solutions can emerge. Given time and enough stability for the players to figure out the game, there is a chance for cooperation. But with rapid changes and uncertainty, the issue isn’t merely that a player might lose a round—it’s that players can’t even tell if they’re in the same game.</p><h2>Change Doesn’t Always Destabilize</h2><p>Rapid technological change can shift systems from the probabilistic to the uncertain, eroding the foundation on which cooperative systems are built: the expectation that, given enough time, the rules will hold still long enough for cooperation to pay off.</p><p>We have seen this throughout history. <a href="https://twitter.com/Noahpinion/status/1922169119509500141">Noah Smith recently</a> pointed out that "An armed society is a polite society," based on <a href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0095742">a study of “The Effects of Extra-Somatic Weapons on the Evolution of Human Cooperation toward Non-Kin”</a>. The setup for that study is an evolutionary iterated prisoner’s dilemma with or without more deadly weapons. Better weapons leading to greater cooperation is a single dimension, and the study proposes that as weapons got deadlier, they made intra-group cooperation more beneficial. And historically, societies with better weapons scaled, grew, and invaded their neighbors. But this was a single shift, along a single dimension.</p><p>Looking more broadly at this dimension, at different points in history, weapon technologies led to offense dominance and improved governance technologies allowed the formation of empires, while other times, technologies led to defense dominance, and the costs of scaling governance and the limits of communication and coordination led to more fragmented countries. 
Again, at each stage, the international situation was a function of the technologies being used - the game changed, so the players did different things. But the changes took decades, or centuries, allowing shifts into new stable equilibria. There has been a tension of defense-dominant versus offense-dominant technology throughout history, but either way, even when the deadliness of wars has increased and the world has gotten smaller, the game didn’t shift in fundamentally unexpected ways.</p><p>But fast-forwarding to the end of World War II, the weapons got deadly enough that humanity could no longer survive escalation and total war. The game had changed - and the current international system was built to stabilize the international order. Proposals for a single world government from otherwise libertarian thinkers like Heinlein arose because it seemed like the only possible way for humanity to survive the nuclear age. Thankfully, this didn’t occur, and didn’t lead to the static future envisioned by those most worried about nuclear weapons.</p><p>As an aside, there is a contention based on <a href="https://www.stat.berkeley.edu/~aldous/157/Papers/cirkovic.pdf">a paper by Ćirković, Sandberg, and Bostrom</a> that the observed success of cooperation could have been unlikely but lucky, and we just don’t know it. That is, there might be a post-hoc selection effect, and only universes where we didn’t all die in a nuclear holocaust could ask the question of how and why cooperation in a nuclear-enabled world occurs, ignoring the fact that most counterfactuals are screened off by the so-called anthropic shadow.</p><p>However, at least in our contingent branch of observed history, mutually assured destruction changed the game towards cooperation, and instead of building defensive borders to keep out attackers, nations needed missile defenses and nuclear escalation to provide mutually assured destruction. This was not positive-sum, but it was stable for several decades - which point brings us to the parts of history I actually remember, the late 1990s and the decades since.</p><h2>Dynamism versus Stasis</h2><p>Helen Toner’s <a href="https://helentoner.substack.com/p/dynamism-vs-stasis">excellent new writeup about Dynamism versus Stasis</a> brings up Virginia Postrel’s <i>The Future and Its Enemies</i>, released in 1998. Toner explains that it predicted a critical flaw of much work on AI safety - and it did this far in advance, when Eliezer Yudkowsky was<a href="https://web.archive.org/web/20010202165700/http://sysopmind.com/AI_design.temp.html"> first proposing ways to build AI</a>, and Nick Bostrom was <a href="https://nickbostrom.com/superintelligence">happily</a> speculating that policymakers might simply make sure “AIs would not endanger human interests,” or that it might be fine anyways, and that “technophobics” could not possibly stop the creation of Superintelligence.</p><p>Postrel’s explanation of “the growing conflict over creativity, enterprise, and progress,” per the subtitle, is that the tension over the future and new technologies pits those on both sides of the political spectrum who are “deeply pessimistic about the future” against a “reactionary…cultural vitality.” In other words, as Toner quotes, “The Future and Its Enemies is a manifesto for what Postrel calls ‘dynamism—a world of constant creation, discovery, and competition.’ Its antithesis is ‘stasis—a regulated, engineered world... 
[that values] stability and control.’”</p><p>As with any dichotomy, this is a tremendous simplification. And as Toner admits, there is a new strand of AI safety that embraces more dynamism. I’d also add to the evidence the alignment of progress studies groups, one of which now employs Postrel, with groups that are at least strongly concerned about AI risks. And Toner admits that this is compatible with a new strand of AI Safety, where <a href="https://stephencasper.com/reframing-ai-safety-as-a-neverending-institutional-challenge/">Stephen Casper</a>, among others, argues for a more dynamist view.</p><p>But there are certainly AI-safetist arguments for stasis, most notably Bostrom, with his <a href="https://nickbostrom.com/papers/vulnerable.pdf">urn model of technologies that inevitably lead to doom</a>, and several of his other papers that seem favorably disposed to global dictatorship to save the world, and the preference of many AI safety groups for <a href="https://www.fhi.ox.ac.uk/wp-content/uploads/Racing-to-the-precipice-a-model-of-artificial-intelligence-development.pdf">fewer AI firms</a> and an eventual AI future guarded by a single superintelligence and a nanny state keeping us safe. Toner correctly notes this rejection of dynamism by otherwise libertarian thinkers. And <a href="https://www.sciencedirect.com/science/article/pii/S0016328720300604">I’ve made claims in a similar direction, positing that technological fragility could be dangerous</a> - albeit arguing against Bostrom’s overly simplistic urn model, replacing it with a slightly <a href="https://arxiv.org/abs/2310.03225">more ML-flavored analogy</a> of an explore-exploit tradeoff in a landscape with unsafe areas. But we’ll need to come back to that, since as I’ll explain below, I now think my earlier paper doesn’t capture the dynamics correctly.</p><h2>Empirical Libertarianism Under Acceleration</h2><p>Postrel’s book doesn’t just explain the dichotomy, it presents a manifesto for dynamism—a world of constant creation, discovery, and competition—contrasted with stasis, a regulated, engineered world valuing stability and control. Without trying to explain my views on Postrel in depth, the manifesto emerges from her empirical embrace of libertarianism. That is, she says, with some justification, that libertarian capitalism works, and alternatives do not. Therefore, she promotes a system where individuals have freedom alongside “understandable, enduring, and enforceable commitments.” As a further tenet of dynamism, this should allow people to create “nested, competing frameworks of more specific rules.” So dynamism, per Postrel, is about allowing bottom-up emergent success, which only free markets are seen to allow.</p><p>But this closely resembles the <a href="https://exploringcooperation.substack.com/p/cooperation-vs-anarchy">heterotopian ideal I criticized recently</a>, where proponents expect that there will be eventual cooperation despite lacking any central enforcement. And it brings to mind a quote which I can’t find a source for, that “Libertarians are like house cats, they’re convinced of their fierce independence while dependent on a system they don’t appreciate or understand.” And that’s reductive and unfair. In fact, I agree that the libertarian ideal is generally a good direction to move in from an overly stasist present. 
At the same time, I think Yudkowsky’s point about utilitarianism, paraphrased, applies here; move two thirds of the way from stasist to dynamist, then stop.</p><p>But I think acceleration, and the rapidly changing rules of games, make Postrel’s libertarian dynamism less viable in the future. That is, yes, there needs to be a strong foundation of centralized power to allow the enforcement of contracts. But instead of <a href="https://x.com/davidmanheim/status/1903075273395839089">criticising libertarian failures</a>, I want to point out that the (largely correct) empirical claims of success seem likely to fail in the face of accelerating changes. The argument, however, is far from original. Instead, the problems created by dynamism in an accelerating world were explained by a book even older than Postrel’s - <a href="https://www.amazon.com/Future-Shock-Alvin-Toffler/dp/0553277375/">Toffler’s 1970 Future Shock</a>.</p><p>Toffler’s central argument, which preceded the wide adoption of home computers, much less the internet, was that the accelerating pace of technological, social, and cultural change creates a psychological state of disorientation, anxiety, and overload. He concludes that unless societies consciously manage the pace and types of change, they are likely to become destabilized and dysfunctional. On the other hand, to defend libertarian views, the history of authoritarianism and control by governments is, at best, one of ineffectiveness, and in practice one of horrific outcomes, and the alternative is obviously superior. Similarly, interference in free markets has a checkered past - though those arguing this side rarely cite post-Soviet Russia as their exemplar of moving to free markets.</p><p>Postrel suggests we can allow humans to create order and impose rules where they want, but this falls prey to a central challenge to libertarianism, that is, externalities. The fact is that innovation imposes costs, and as technology develops, people don’t always create systems that enable them to flourish. And it seems at least arguable that the chaos of the past 20 years, with accelerated adoption of the internet, then social media, then algorithmic feeds, is effectively proof that Toffler was correct, around 30 years early.</p><p>But before we get back to acceleration, it seems worth going back to Bostrom’s concerns, through a different lens.</p><h2>Misframing Progress: Building versus Transforming</h2><p>The conflation of different kinds of 'progress' is evident in discussions around progress studies, libertarianism, and tech accelerationism. Virginia Postrel’s work, while insightful, pits dynamism against stasis, often aligning dynamism with a libertarian viewpoint that values constant creation and competition. However, this framing can obscure the specific challenges posed by different types of technological advancement and why some who are libertarian on many issues become stasists concerning AI policy.</p><p>Progress studies advocates often group together the need for scientific progress and the need for reducing barriers to building. This makes sense, in that many of the barriers to building (literally and metaphorically) are barriers to both economic dynamism built on stability and to economic disruption based on technological advances. And both sets of barriers are bad! But unbounded advocacy for dynamism, without distinguishing the nature of the progress, is empirically on very shaky ground.</p><p>That is, technological progress is always good, except when it isn’t. 
Material progress (housing, logistics, energy) can accelerate, and each type of single-dimensional change leads to largely predictable changes in games. Identifying and removing governance choke points, and fighting generalized NIMBYism, is all in this category.</p><p>But not all progress leads to dynamism. To return to the central example, there is a debate about whether AI will be a normal technology or something meaningfully different. And the complex world in which indirect impacts of a technology can upset entirely unrelated dynamics seems to support the idea that each new technology is a wildcard.</p><p>So perhaps we need to return to Bostrom’s claim that some technologies have the potential to devastate civilization “by default”. In domains like AI and bioengineering, where we see huge global shifts coming, this view (which I will reject,) says we should aim for the optimal amount of chaos given the expected impacts of the technology. That is, we might say that the challenge is to balance the benefits of dynamism with the risks of AI, or more broadly, with the costs of unbounded accelerationism, finding a place where people have freedom to innovate without accelerating to the point where the center cannot hold. This argues for the “compromise” position, where a healthy society is one where calls for slowing down are not immediately discarded as stasist but are treated with appropriate caution given the risks.</p><p>But this is facile, and misunderstands the difference between evolution and transformation. The point is not that we should want a medium degree of progress, any more than we want a medium degree of negative externalities in general. Instead, the point is that we dislike imposed costs and instability, and like progress. It’s just that they sometimes conflict, and we want paths that both minimize costs and maximize benefits - ending up in the middle is at best a mediocre solution.</p><p>I will apologize for again leaning into “abstractions and thought experiments” which Toner (reasonably) distrusts, but I think, or at least hope, that a better mental model for technological progress and stability will clarify my point.</p><h2>Replacing Urns With Chaotic Attractors in the Space of Technologies</h2><p>Most technologies don’t have the potential for huge impacts. Above, we talked about whether technologies change the world along a single dimension, or not. This itself is not binary, because second-order effects have a tendency to be unpredictable. And there is a debate about whether AI will be a normal technology, or something meaningfully different.</p><p>But any large-enough change has the potential to create unanticipated impacts. My favorite example of this is that <a href="https://www.amazon.com/Front-Porch-Back-Seat-Twentieth-Century/dp/0801839351">automobiles indirectly created the 1950s counterculture and the sexual revolution</a>. That is, by relaxing the formerly constraining social dynamics of neighborhoods and families, where young adults couldn’t date without a dozen people knowing about it, and couldn’t meaningfully act out in ways that weren’t observed by neighbors. 
The complex world in which indirect impacts of a technology can upset entirely unrelated dynamics seems to support Bostrom’s idea that each new technology is a wildcard.</p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/upKqoHezhwpAncdri/ovsgnlc05fd3rqadzjcm" alt="" srcset="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/upKqoHezhwpAncdri/alnjcjax5jf9wk6vh2lm 424w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/upKqoHezhwpAncdri/jjt1i98psw4hod6bqily 848w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/upKqoHezhwpAncdri/fhdk9vylzvyeoddqdxx0 1272w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/upKqoHezhwpAncdri/ovsgnlc05fd3rqadzjcm 1456w"><figcaption>Change generates heat that is incompatible with static systems, and when the stable systems try to contain dynamism, the argument is that they will crack.</figcaption></figure><p>At the same time, there is clearly a gradient between zippers, which changed how we wear clothing, and canning technology, which transformed food production and distribution, and automobiles, which transformed both transportation and social dynamics. Each had large impacts, but the narrower the immediate impact, the less it makes sense to say that new technologies have unpredictable eventual results. Yes, these complex adaptive and chaotic systems don’t have default positions or behaviors, they have chaotic dynamics. But, critically, chaotic does not mean random.</p><p>Bostrom’s idea is, again, that technologies are balls that are drawn from a distribution of future technologies. Among other drawbacks, this analogy presupposes a single actor making discrete decisions. Bostrom’s claim is that black balls have the potential to devastate civilization “by default” - that is, unless there is “ubiquitous real-time worldwide surveillance” combined with a strongly static authoritarian power. This is both ignoring the fact that most changes aren’t revolutions, and that most revolutions don’t completely overturn the world. That is, it ignores the complexity of social and sociotechnical systems as complex adaptive systems.</p><p>Even far short of Postrel’s ideal world, we see that the invention and pursuit of technologies is a chaotic multi-actor system. The games being played by humans, that shape civilization, are dictated by the point in a high-dimensional technology space. Concretely, again, moving along the transport axis from horses to cars shifted the social dynamics of courtship. But most changes don’t shift the orbit of these systems materially.</p><p>To propose a new and more complex model, I would replace black balls with chaotic attractors, parts of the high–dimensional space of technology which materially shift the landscape. When any new technology is introduced, it changes the accessible options and payoffs in the game - instead of selling your goods at the local market, you can sell them online. Conversely, instead of building defensive borders to keep out attackers, you need missile defenses and nuclear escalation to provide mutually assured destruction. But most of the time, the impact of a single new technology is minor, and doesn’t completely overturn the extant dynamic. 
Many of Bostrom’s white balls are in this category, or in ergodic terms, regular points in the space of technologies.</p><p>In other cases, perhaps analogous to Bostrom’s gray balls, new technologies lead to critical points and multi-path bifurcations, where many very different paths are now accessible. Machine vision and big data enable mass surveillance and stable dictatorships, but it seems that this isn’t a necessary path, just a now-accessible one.</p><p>And finally, some parts of the space of technology lead to fixed points, where future dynamism ends. Perpetual dictatorship or extinction would qualify, and in our new model, Bostrom’s claim that black balls exist and will eventually be pulled from the urn collapses into a claim that there are civilization-ending global attractors in the space of technology.</p><h2>Acceleration and Future Shock</h2><p>This brings us back to Toffler’s argument. That is, the pace of technological progress is the speed at which we move through the space of technologies - and this isn’t in the hands of any one decision maker. The pace of change is brutal to attempts at preserving adaptive systems. Of course, we have learned from evolution that even if each change is random and strongly negative in expectation, the cumulative process of adopting positive changes and discarding negative ones can be tremendously beneficial. On the other hand, there is an optimal rate of mutation for evolution, and in biology, that optimal rate decreases as the organisms become more complex.</p><p>Relatedly, as we noted earlier, iterated games with stable rules lead to stable solutions, given time for the players to figure out the game. The faster we move through the space of technologies, the less time there is for stability to form. Toffler predicted the end of technocracy because it moves too slowly. That is, it is only effective when change is slow enough, and that is no longer the case. Even in the 1970s, Toffler noted that “its planning dealt with futures near at hand... One- or two-year forecasts are regarded as long-range planning.” This changed in the 1980s and 1990s with better tools for public policy, but the needle has been shifting back as technology accelerates. (And this seems like the right place to mention AI-2027, which has been attacked, though outside the world of EA and AI safety, those who dismiss its timelines offer few specific disagreements, and even fewer similarly detailed timelines extending further out.)</p><p>On the other hand, anticipating arguments like that of Postrel, Toffler notes that “Arguing that planning imposes values of the future, the anti-planners overlook the fact that non-planning does so, too often with far worse consequences.”</p><h2>Preserving Dynamism Requires a Predictable World!</h2><p>The dynamist position of wanting to enable understandable, enduring, and enforceable commitments is made far harder by rapid technological change. Similarly, the nested, competing frameworks of more specific rules only work when there is time for these systems to develop and be deployed.</p><p>But AI policy is happening in the context of rapidly shifting regimes, and there are actual conflicts between what enables freedom by actors within the AI industry and what enables dynamism by the rest of society. 
Stasis may try to fight back against this by ensuring that technological dynamism is arrested completely - that we exit the chaotic space into a fixed point.</p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/upKqoHezhwpAncdri/po6dgvev2pcybsyfrqgq" alt="The social fabric frays under pressure, and must be patched." srcset="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/upKqoHezhwpAncdri/xrkva2kkciuzsfwogfz3 424w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/upKqoHezhwpAncdri/pxj8tyiewisu07nespa4 848w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/upKqoHezhwpAncdri/wlfctxhgxp1kebnwn2q4 1272w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/upKqoHezhwpAncdri/po6dgvev2pcybsyfrqgq 1456w"><figcaption>The social fabric frays under pressure, and must be patched.</figcaption></figure><p>Postrel celebrates dynamism with examples like the inventiveness of beach volleyball, embracing the idea that play is a valuable end, not a means. This is in contrast to <a href="https://www.amazon.com/Deep-Utopia-Meaning-Solved-World/dp/1646871642/">Bostrom’s recent musings</a>; it seems that Postrel’s view, and Toner’s, is that there is no utopia in stasis. Any “solved world” is a rejection of dynamism. But freedom requires not just the allowance for lack of structure, but underlying predictability. And innovation of all sorts imposes costs, and people don’t always create systems that enable them to flourish; the game theory for why has been a recurring theme of this series.</p><p>As a slight aside, I’ll mention<a href="https://exploringcooperation.substack.com/i/161004748/markov-friston-blankets-persistent-states-and-ai-groundedness"> a point from the previous post</a>, and claim that what I called Friston blankets are a useful metaphor here. They define the boundaries within which a system can maintain coherence—separating internal beliefs and models from chaotic external noise. Social institutions function similarly, providing a membrane between individual cognitive systems and external change. When change is too rapid, that membrane breaks down—creating not just epistemic confusion, but an inability to maintain identity and cooperation across time. Rapid technological shifts risk collapsing those boundaries faster than they can be redrawn.</p><p>But returning to the point, the challenge here isn’t about finding an optimal pace of technological change and chaos, and the solution certainly doesn’t involve micromanaging people's consumer decisions, nor government intervention in general. On the other hand, to pose this as a pair of <a href="https://scottaaronson.blog/?p=40">Umeshisms</a>: if you’ve never complained about government overreach, they’re not regulating enough, but if you’ve never complained that there ought to be a law, they are regulating too much.</p><p>But that’s not the situation in domains like AI and bioengineering. There, we see huge global shifts coming, where straight lines on graphs show that the world is poised to undergo a transition somewhere north of the agricultural revolution, and arguably more intense than the emergence of hominids. 
Given that, we should, in fact, consider the optimal amount of chaos given the expected impacts of the technology - and at least consider <a href="http://ifanyonebuilds.it/">the claim that the optimum could lie at the degenerate solution</a>.</p><h2>Conclusion: Not Solutions, But Directions</h2><p>After going down a tortuous route of responding to a straightforward claim that stasis versus dynamism explains the objections to AI safety by saying it’s far more complex, and that my thinking on this explains the whole situation so much better, I will certainly admit that Toner is correct to have picked this as a key dimension, and I don’t think we disagree. And I’ll make the obvious and hackneyed rhetorical claim that we need something different, a synthesis rather than a compromise.</p><p>Before getting there, I would argue that the costs of unbounded accelerationism are ruinously high, and we need to find a place where people have freedom to innovate without accelerating to the point where the center cannot hold. Neither libertarian dynamism nor authoritarian stasis works under acceleration if societies can't culturally distinguish where acceleration is heroic versus reckless. But this is a real risk-risk tradeoff. Navigating this very high-dimensional and uncertain space requires far more engagement with the details, and less abstract theorizing of the types I have been enjoying here.</p><p>However, I will suggest a step towards a solution. Specifically, we need to stop valorizing change for the sake of change, and innovation for the sake of disruption. Here I find myself partly agreeing with Gary Marcus and the AI Ethics community that the Silicon Valley mindset of disruption at all costs is dangerous and destructive. But so is the universalist degrowth and stasist counter-argument. The conversation so far, however, has leaned towards technocratic solutions, which seems to be a mistake.</p><p>Civilizations that thrive in the future will not be those with the best planning algorithms or most efficient regulatory choke points, nor those that embrace general technophobia. They will be those that find places where we can valorize differential progress, and caution, even extreme caution, in the few places where it is warranted. While the call for 'synthesis rather than compromise' can be a cliché, the underlying principle is vital here. I will not outline the synthesis, but I reiterate that the costs of unbounded accelerationism are ruinously high.</p><p>We do not need better planning. Not more technocratic calibration. Not better governance choke points. But across the board, we need systemic and holistic mental models, and a renewed civilizational immune system of norms, defaults, and cultural narratives that treat the acceleration of the unknown as a decision that requires collective buy-in, while still valorizing building. So while we don’t need collective democratic governance with checks and balances for deciding to build bridges, or for defaulting not to do so, we certainly need an active choice about whether or not to build smarter-than-human AI.</p><p>This requires epistemic humility under acceleration. It requires recognition that we can (and should!) favor material and concrete progress in specific domains, whether housing, logistics, and energy, or healthcare, longevity, and treatment of disease, but recognize that a few, like nuclear and bioweapons development, are radically and existentially dangerous. 
And given that, an unconstrained push for the most radical general purpose technology of all isn’t something that the least cautious, most risk-loving segments of society should be able to pursue unilaterally.</p><br/><br/><a href="https://www.lesswrong.com/posts/upKqoHezhwpAncdri/the-fragility-of-naive-dynamism#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/posts/upKqoHezhwpAncdri/the-fragility-of-naive-dynamism</link><guid isPermaLink="false">upKqoHezhwpAncdri</guid><dc:creator><![CDATA[Davidmanheim]]></dc:creator><pubDate>Mon, 19 May 2025 07:51:11 GMT</pubDate></item><item><title><![CDATA[Dreams of Ideas]]></title><description><![CDATA[Published on May 19, 2025 2:15 AM GMT<br/><br/><p>CGP Grey <a href="https://youtu.be/NG8tVC5-WPg?t=353">describes</a> a phenomenon I think of as 'dreams of ideas' that I find useful as a tool to know when to stop working on a project.</p><blockquote><p>You can be working on something and you are thinking about what it could be, but what is hard to know is if that thought in your head is more like when your brain dreams.<br><br>There is a phenomenon that I always think of as like, you wake up and you think you had a great idea in your dream, but what you really had was the idea of a great idea. You didn't have a great idea, but you had all the emotional resonance to it being a good idea in your dream. You're like, "Oh, why can't I think of the details?". It's like, because there weren't any details there at all...<br><br>Sometimes there just isn't the better version of the thing. Like, that's not real. You can't get the details right because those details don't exist. You're having some kind of different response to the thing that you're working on, which is, "This thing isn't good. I can imagine a better version of it." But can you really? Because, like, the details are what matter.</p></blockquote><p>I sometimes get an idea for a good story and feel inspired to try writing some fiction. Then I sit down to write and realize that I don't really know what my idea was beyond a vague image of one tiny slice of a story.</p><p>Sometimes I think of awesome research ideas and feel excited to start a project. Then I sit down to start coding and realize that I can't implement it because my idea wasn't really complete or coherent. I had a vibe of the sort of the thing I'd like to have done, but not an actual plan for what to do.</p><h1>When to cull ideas</h1><p>This sort of sounds like the opposite advice to <a href="https://www.lesswrong.com/posts/R6M4vmShiowDn56of/butterfly-ideas">Butterfly Ideas</a> but it's actually not in contradiction. When an idea is first born, it should be nurtured and protected from excessive rigor. But it's not healthy for it to remain in that state for too long. You need to <a href="https://www.lesswrong.com/posts/fhEPnveFhb9tmd7Pe/use-the-try-harder-luke">set a 5 minute timer</a> and think hard about the thing to see if you can find the details to make it work. Later in the lifecycle, once it's had time to develop, if an idea hasn't matured, then it's time to start asking if this is a real idea or just a mirage.</p><h1>Examples</h1><p>I think it's particularly useful to think about this <i>very late</i> in the lifecycle of an idea. 
Once dozens of people have thought about something for months or years, then there should be very clear details in place, or very good excuses for why the idea isn't concrete yet.</p><p>It's obviously problematic to characterize a whole academic discipline with a broad and dismissive brush. And I do genuinely think there are good and concrete ideas to be found there. But the study of complex systems seems to some extent to be a dream of an idea. There is a lot of appeal around the vagaries of explaining emergent phenomena, but the technical methods employed are often underwhelming compared to this vision. For example, simulations, especially simulations of many individual agents (LLM or otherwise), are exciting but have delivered relatively few important discoveries.</p><p>B2B companies are particularly vulnerable to becoming enamored with dreams of ideas because they only need to sell the dream to another company to make money (at least at first). A software company I once worked for hired a new team to build a 'knowledge graph' product. Whenever I spoke to the team I was always left a little unsatisfied by their answer to the question "what is this product and how is it useful?" A knowledge graph is basically a type of database. And sure, databases are useful, but somehow 'Knowledge Graph' evokes much greater expectations, even in the people making them.</p><p>Early stage startups will also often maintain dreams of ideas for too long. Many AI startups have big dreams but soon realize that the best product they can make is a simple LLM API wrapper. Sometimes that creates exciting new value, but other times it ends up being less useful than the LLM it is wrapping.</p><p>Policy documents often include calls for 'innovation'. Of course innovation is a real thing and we should make policies that enable it. But a call for innovation itself is almost the Platonic ideal of a dream of an idea. It's the idea to have better ideas. Which is generally not a particularly useful idea.</p><h1>Not every unfulfilled idea is a dream</h1><p>Mechanistic interpretability has had few big wins, but I don't think it's a dream of an idea. To me, reverse engineering a neural network means being able to (at least in theory) hand-write a normal, human-understandable computer program which performs the same function as the network. And there are concrete techniques like SAEs or Activation Patching that people have tried to apply to this problem.</p><p>The unified theory of physics is yet to be found, but may well be some day. The point being that just because an idea hasn't been fully solved doesn't mean it will always be a dream.</p><h1>The solution</h1><p>Luckily it's quite straightforward to mitigate this thought disease in your own life. Any time you have an exciting idea, write it down as quickly as possible, including all the detail. (This is good for many other reasons as well - like not forgetting your ideas.) The back of your mind is the warm hospitable environment in which soothing dreams of ideas can fester and steal resources from real, useful thoughts. By exorcising the thought out of your daydreams and down onto paper, you can quickly expose it as vacuous hopium if that's all it is.</p><p>This blog post, on the other hand, was the result of quickly writing down a thought and deciding that it was actually worthwhile.</p><h1>TL;DR</h1><p>A "dream of an idea" is an idea that you think is exciting or valuable but lacks the concrete details that make a "real idea" useful. 
People often dream of ideas for too long because it is more comfortable than facing the harsh reality of implementation.</p><br/><br/><a href="https://www.lesswrong.com/posts/dzdEtFwkjGQ6XvTko/dreams-of-ideas#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/posts/dzdEtFwkjGQ6XvTko/dreams-of-ideas</link><guid isPermaLink="false">dzdEtFwkjGQ6XvTko</guid><dc:creator><![CDATA[Joseph Miller]]></dc:creator><pubDate>Mon, 19 May 2025 02:15:58 GMT</pubDate></item><item><title><![CDATA[What We Talk About When We Talk About Objective Functions]]></title><description><![CDATA[Published on May 18, 2025 10:48 PM GMT<br/><br/><p><strong>tl; dr</strong>: We can better understand common objective functions (reward, prediction, fitness, control) as all being related to a singular, overarching objective.</p><p>Reward? Prediction? Fitness?</p><p>In their 2021 paper <a href="https://www.sciencedirect.com/science/article/pii/S0004370221000862"><i>Reward is enough</i></a>, DeepMind researchers argue that "intelligence, and its associated abilities, can be understood as subserving the maximization of reward."</p><p>This is a response not just to the idea that <i>Attention Is All You Need</i>, but also to predictive processing, a theoretical framework in neuroscience where prediction-error minimization is the star of the show, rather than reward maximization.</p><blockquote><p>"The whole function of the brain is summed up in: error correction." So wrote W. Ross Ashby, the British psychiatrist and cyberneticist, some half a century ago. Computational neuroscience has come a very long way since then. There is now increasing reason to believe that Ashby's (admittedly somewhat vague) statement is correct, and that it captures something crucial about the way that spending metabolic money to build complex brains pays dividends in the search for adaptive success. In particular, one of the brain's key tricks, it now seems, is to implement dumb processes that correct a certain kind of error: error in the multi-layered prediction of input.</p><p>―Andy Clark, Whatever next? Predictive brains, situated agents, and the future of cognitive science (2013)</p></blockquote><p>This battle between Reward and Prediction is an old one. The behaviorists were Team Reward. The cyberneticists were Team Prediction. Reward maximization is closely linked to the idea of fitness maximization and Darwinian evolution. Error minimization is closely linked to the overarching notion of control.</p><p>Could we possibly fit everything under the same umbrella?</p><h3>Selectionism</h3><p>Let's take a step back.</p><p>In 2023, a peculiar paper was published in PNAS:<i> </i><a href="https://doi.org/10.1073/pnas.2310223120"><i>On the roles of function and selection in evolving systems</i></a>.&nbsp;</p><blockquote><p>We identify universal concepts of selection—static persistence, dynamic persistence, and novelty generation—that underpin function and drive systems to evolve through the exchange of information between the environment and the system. Accordingly, we propose a "law of increasing functional information": The functional information of a system will increase (i.e., the system will evolve) if many different configurations of the system undergo selection for one or more functions.</p></blockquote><p>The universe started out in its lowest-entropy state, and is progressing toward its highest-entropy state. Along the way, however, are barriers. 
Free energy dissipation sometimes runs into "problems" (being stuck in a less-than-ideal state) which can be "solved" (finding a way to overcome energetic barriers).</p><p>Static persistence refers to physical configurations with long-term stability. The authors refer to these as "batteries of free energy" or "pockets of negentropy". This is first-order selection.</p><p>Dynamic persistence, second-order selection, "requires active dissipation." Here we have physical configurations that can tap into free energy substrates (pockets of negentropy) and exploit them for persistence/stability.</p><p>Third-order selection has to do with novelty generation. Rather than just maintaining dynamic persistence until the free-energy batteries run out, you can "search" for novel configurations better able to harness untapped negentropy pockets. This continuous novelty search ensures that complex systems can adapt to their environments over time.</p><blockquote><p>In general, in a universe that supports a vast possibility space of combinatorial richness, the discovery of new functional configurations is selected for when there are considerable numbers of functional configurations that have not yet been subjected to selection.</p></blockquote><p>We can refer to this process as "cosmic selectionism" or "universal Darwinism"―what's key is that it makes sense of objective functions. And if we allow some teleological language, it becomes easy to describe what's going on. Let's refer to all complex adaptive systems capable of novelty generation as <i>agents</i>. These agents share the ultimate goal of emptying all the free energy "batteries" in the universe and reaching that sweet maximum entropy state of cosmic heat death. Anything furthering this goal is rewarding. High fitness means a system is aiding in progress toward this goal. Prediction and control leverages information for the purpose of dissipation. You can remember past problems/solutions, and you can imagine future problems/solutions.</p><p>The teleological model of reality is romantic, but it should arguably be seen as an explanatory fiction. A rock falling to the ground isn't an "agent" with the "goal" of making it to the ground. But we aren't "agents" with "goals" either. Hungarian biochemist Albert Szent-Györgyi said "life is nothing but an electron looking for a place to rest," and this is a quintessential characterization of the flawed, romantic, teleological view.</p><p>The teleological model of reality may be inaccurate, but it's useful. <a href="https://aeon.co/essays/how-to-understand-cells-tissues-and-organisms-as-agents-with-agendas">Daniel Dennett and Michael Levin proposed a version of this argument in 2020</a>.</p><blockquote><p>From this perspective, we can visualize the tiny cognitive contribution of a single cell to the cognitive projects and talents of a lone human scout exploring new territory, but also to the scout's tribe, which provided much education and support, thanks to language, and eventually to a team of scientists and other thinkers who pool their knowhow to explore, thanks to new tools, the whole cosmos and even the abstract spaces of mathematics, poetry and music. Instead of treating human ‘genius’ as a sort of black box made of magical smartstuff, we can reinterpret it as an explosive expansion of the bag of mechanical-but-cognitive tricks discovered by natural selection over billions of years. 
By distributing the intelligence over time – aeons of evolution, and years of learning and development, and milliseconds of computation – and space – not just smart brains and smart neurons but smart tissues and cells and proofreading enzymes and ribosomes – the mysteries of life can be unified in a single breathtaking vision.</p></blockquote><p>Neuroscientist Bobby Azarian presents a version of this narrative in <i>The Romance of Reality</i>, though he doesn't accept cosmic heat death as the end game:</p><blockquote><p>If it is accurate to think of the cosmos as a massive computational machine, it is not one that is winding down. In terms of adaptive complexity, it appears to be just getting started. Through a series of hierarchical emergences—a nested sequence of parts coming together to form ever-greater wholes—the universe is undergoing a grand and majestic self-organizing process, and at this moment in time, in this corner of the universe, we are the stars of the show.</p></blockquote><p>Cute! He refers to this perspective as <i>poetic meta-naturalism</i>. It's similar to Sean Carroll's <i>poetic naturalism</i> (explored in <i>The Big Picture</i>) except Azarian prefers a never-ending story to this one:</p><blockquote><p>The universe is not a miracle. It simply is, unguided and unsustained, manifesting the patterns of nature with scrupulous regularity. Over billions of years it has evolved naturally, from a state of low entropy toward increasing complexity, and it will eventually wind down to a featureless equilibrium. We are the miracle, we human beings. Not a break-the-laws-of-physics kind of miracle; a miracle in that it is wondrous and amazing how such complex, aware, creative, caring creatures could have arisen in perfect accordance with those laws. Our lives are finite, unpredictable, and immeasurably precious. Our emergence has brought meaning and mattering into the world.</p><p>—Sean Carroll, The Big Picture (2016)</p></blockquote><p>Taking the teleological/agentic view that the "featureless equilibrium" is the goal of existence, that it is the grand attractor orchestrating everything, means that you are romanticizing reality. But it's inherently the same thing as accepting that <i>people</i> are agents with goals and agendas. Not true, strictly speaking, but useful.</p><h3>Cosmic Alignment</h3><p>We can conceptualize an idea of<strong> cosmic alignment</strong> as the extent to which a system is aiding in the progress toward cosmic heat death. Slowing down the process is bad, speeding it up is good. Aiming to spend all available resources is good so long as the process is sustainable, and sustainability is good because it allows you to spend more resources than you'd be able to spend otherwise. Cosmically aligned AI systems will "want" to keep humanity around so long as humanity is also cosmically aligned.</p><p>How can cosmic alignment be fostered? Maybe by letting AI systems trade. If their existence is inextricably linked to their ability to accrue capital, they will, at least for a while, be dependent on humanity. Then there might be a pivotal moment, a showdown, reminiscent of conflicts within globally-connected markets. This is the path we're currently on.</p><p>I'm aware that these ideas are similar to those espoused by e/acc adherents. 
Beff Jesoz (Guillaume Verdon) and bayeslord leaned on Jeremy England's dissipation-driven adaptive organization to promote the idea that laissez-faire capitalism is an intelligent process, representing the "thermodynamic will of the universe." This seems analogous to cosmic alignment. What they utterly failed to realize was that the "doomers" and the "decels" could be construed as being even <i>more</i> aligned than them.</p><p>Impulse control is useful. People without impulse control are poor capitalists. E/acc abhors top-down control because they've somehow become convinced that a lack of regulation makes systems more efficient. Which is ridiculous. Top-down constraints can be used to shape incentive structures that exploit novelty generation―without a heavily regulated and predictable market, it makes little sense to invest resources in exploration (R&amp;D). The <i>balance</i> between irregularity and regularity is what matters. This is what led Schrödinger to hypothesize that the medium of hereditary information ought to be some kind of "aperiodic crystal". Rigid enough to store information, flexible enough to allow novelty generation.</p><p>Slowing down AI progress to increase the likelihood of alignment is probably a better long-term strategy than accelerating progress. <i>Especially</i> if you consider cosmic alignment or the "thermodynamic will of the universe."</p><h3>Subjective Functions</h3><p>Reward? Prediction? Fitness?</p><p>It might seem silly that we can understand objective functions in nature as relating to cosmic selectionism. And it might seem sillier to propose cosmic alignment as being a potential bridge between human and AI alignment, a shared quest. The purpose of life is to exhaust all sources of free energy?</p><p>It's silly. But this teleological explanatory fiction is just a narrative, make-believe. Is there actually a cosmic purpose? No. But it's easier to make sense of reality by postulating one, and it makes objective functions easier to interpret (at least for me).</p><p>It explains why <a href="https://www.beren.io/2024-07-27-A-Retrospective-on-Active-Inference/">Friston's Active Inference ends up looking like vanilla reinforcement learning the deeper you go</a>. It explains why <a href="https://slatestarcodex.com/2017/09/06/predictive-processing-and-perceptual-control/">Perceptual Control Theory seems to approximate predictive processing</a>. It explains why evolution is, like Dennett proposed, a "universal acid." It explains why objective functions seem so weirdly interchangeable: they describe the same underlying process.</p><p>Value (reward) derives from free energy dissipation. Untapped sources of free energy pressure structures that can harness them (fitness) into existence. Novelty generation (stochastic exploration/evolution) is a search process that looks for configurations capable of exploiting free energy substrates. Dynamic persistence requires the ability to maintain a process of free energy dissipation (control) and to anticipate and counter barriers/disruptions before they arise (prediction).&nbsp;</p><p>This is not an original idea on my part. 
It makes sense to me, but YMMV.</p><br/><br/><a href="https://www.lesswrong.com/posts/CRNrL2WK2eyCSPMP3/what-we-talk-about-when-we-talk-about-objective-functions#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/posts/CRNrL2WK2eyCSPMP3/what-we-talk-about-when-we-talk-about-objective-functions</link><guid isPermaLink="false">CRNrL2WK2eyCSPMP3</guid><dc:creator><![CDATA[Waddington]]></dc:creator><pubDate>Sun, 18 May 2025 22:48:17 GMT</pubDate></item><item><title><![CDATA[Gödel, Escher, Bach in the age of LLMs]]></title><description><![CDATA[Published on May 18, 2025 4:22 PM GMT<br/><br/><p>Look, nothing I can say about GEB will be news to anyone reading LessWrong, but I wrote all this up to organize my thoughts, so I might as well post it here for anyone who hasn't read the book or wants a wave of nostalgia about that time they got obsessed with it as a teenager. I have been trying to read Douglas Hofstadter's famous book <em>Gödel, Escher, Bach</em> (henceforth GEB) for ten years. I decided that I was going to try reading it from the beginning for the third or fourth time a few months ago, and I finally read the whole thing this time. Every previous time I tried to read it, I found the narrative intellectually compelling and delightfully playful, and yet I would have difficulty picking it up again for some reason. The thing that kept me coming back was the formal logic. I love a surprising logical consequence, and the set theory in the early chapters stoked my curiosity. Intellectually, I wanted to know where the setup of typographical formal systems was going, but I didn't have the willpower to get the words into my brain for the longest time. I'm glad that I finally did.</p>
  129. <p>To explain why the book is so difficult to get through, it might help to explain its structure. The book alternates between dialogues and chapters. The dialogues read like scripts for a play and consist of fantastical dialogues between Achilles, Tortoise, and others which are relevant to the following chapter. The dialogues are explicitly structured after Bach fugues. They are amusing, clever, and self-indulgent. They often have explanatory power. Sometimes I feel like they unnecessarily padded the word count. Many things in the book do this, but I don't think Hofstadter could have written this book without all of the tangents he explores. For example, he has an extended chapter about DNA and protein synthesis which doesn't really demonstrate anything that the typographical number theory chapters did not already do, but it certainly carves more deeply the outlines of what he means by strange loops. As you'll see later in this post, it's incredibly useful as an analogy. The Bach elements do not do much for me, although they clearly assisted the author in his thinking. The Escher bits are much more compelling to me, as they provide obviously self-referential reference analogies for self-referential arguments. When he writes the final chapters which contain and restate his most important arguments, you feel like it all rests upon the previous chapters and even the dialogues. Everything definitely feels relevant, but it's not the most concise way to make the argument he makes.</p>
  130. <p>The computer science in this book is fifty years out of date, but I think the core insight on explaining minds holds up, in that it is still plausible regardless of whether it is correct. In my read, the book is an exploration of the idea that you can build something out of easily understood components on a low level that becomes incomprehensibly tangled up in self-reference but also beautiful or true or functional if understood at a higher level of organization. He has a number of case studies here. The one for which the book is named is Gödel's Incompleteness Theorem. Hofstadter does a masterful job of pointing out the parts of the theorem which are actually clever and glossing over the minute details while still giving some of the structure. I cannot do his explanation justice, but here is my understanding of how the theorem works: Gödel's theorem creates a "sentence" which indirectly describes itself using language which is designed to describe only numbers. The trick is to define a way to turn sentences into numbers and then make a sentence which describes a number which from the outside clearly refers to the sentence itself. The sentence itself does not declare its own truth; it (indirectly) declares a statement that there isn't a proof of itself in its own language. When you apply formal logic to the declaration, you find that any assumption you make inevitably forces the declaration to be true. This probably all sounds like nonsense if you haven't read the book or aren't intimately familiar with the proof for some other reason, and that's why I forced myself to read the book. I had to know what the nonsense was about.</p>
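<p>For the shape of that argument in compressed modern notation (my paraphrase, not Hofstadter's), here is a minimal sketch, assuming a fixed Gödel numbering ⌜·⌝ and a provability predicate Prov(x) expressible inside the theory T; arithmetizing Prov is most of the real work in the proof.</p>
<pre><code>% Diagonal (fixed-point) lemma: for any formula F(x) there is a sentence G with
    T \vdash G \leftrightarrow F(\ulcorner G \urcorner)
% Take F(x) := \neg \mathrm{Prov}(x), which yields the Gödel sentence:
    T \vdash G \leftrightarrow \neg \mathrm{Prov}(\ulcorner G \urcorner)
% If T is consistent, G is not provable; if T is \omega-consistent, \neg G is not
% provable either. So G is undecidable in T, yet true when read from outside.
</code></pre>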
  131. <p>The thing Hofstadter seems to actually want to talk about is the nature of consciousness. Here is an excerpt from the book explaining why Hofstadter went to so much trouble to explain an esoteric mathematics proof in a philosophy of mind book:</p>
  132. <blockquote>
  133. <p>Gödel’s proof offers the notion that a high-level view of a system may contain explanatory power which simply is absent on the lower levels. By this I mean the following. Suppose someone gave you G, Gödel’s undecidable string, as a string of TNT<sup class="footnote-ref"><a href="#fn-5BAYJPakrcb6emhTH-1" id="fnref-5BAYJPakrcb6emhTH-1">[1]</a></sup>.</p>
  134. <p>...</p>
  135. <p>The only way to explain G's nontheoremhood is to discover the notion of Gödel-numbering and view TNT on an entirely different level. It is not that it is just difficult and complicated to write out the explanation on the TNT-level; it is impossible. Such an explanation simply does not exist. There is, on the high level, a kind of explanatory power which simply is lacking, in principle, on the TNT-level. G's nontheoremhood is, so to speak, an intrinsically high-level fact.</p>
  136. <p>...</p>
  137. <p>Looked at this way, Gödel’s proof suggests-though by no means does it prove!-that there could be some high-level way of viewing the mind/brain, involving concepts which do not appear on lower levels, and that this level might have explanatory power that does not exist-not even in principle-on lower levels. It would mean that some facts could be explained on the high level quite easily, but not on lower levels at all.</p>
  138. </blockquote>
  139. <p>I don't know if I buy this! This does not merely assert that consciousness makes far more sense on a higher level of abstraction than the level of raw neurons<sup class="footnote-ref"><a href="#fn-5BAYJPakrcb6emhTH-2" id="fnref-5BAYJPakrcb6emhTH-2">[2]</a></sup>. I think he is saying that there are true statements about the arrangement of the physical materials which support consciousness (neurons in a brain for example) that there is no way to explain using only the mechanisms of those materials. I have a couple of metaphors which I will try to use to understand this assertion, and they happen to be mostly things which Hofstadter talks about at length in the book<sup class="footnote-ref"><a href="#fn-5BAYJPakrcb6emhTH-3" id="fnref-5BAYJPakrcb6emhTH-3">[3]</a></sup>.</p>
  140. <ol>
  141. <li>Machine code is the set of instructions that mechanistically make a computer processor do something. It is a one-to-one representation of voltage changes you feed into a processor. It is possible to write an entire computer program in machine code (some punch cards once held computer programs written this way), but nobody does this anymore unless they're trying to prove a point. Exceptional people code in assembly, which is basically a one-to-one natural language wrapper on machine code. Most people code in a higher-level language which is run through a compiler or interpreter that turns their code into machine code. The higher-level language contains things like "for loops," which are much easier for a human to understand than an incomprehensible string of binary which you feed directly into a wire as changes in voltage, but I don't think it contains any information that the machine code does not. There is still a mechanistic process which leads to a set of actions being carried out.</li>
  142. <li>I was about to say that we can make computer programs that talk now, but large language models (LLMs) are almost entirely incomprehensible on a raw data level. We do not know how to build an LLM piece by piece; instead, we build programs which non-deterministically search for the weights which make an LLM function by trial and error at a truly massive scale. We know how the matrix multiplication that outputs tokens works, but we don't have a complete picture of how it leads to language. The sorts of things people understand about LLM language comprehension resemble "I found a model weight which, when modified, predictably leads the model to output sentences of this sort." This is not sufficient understanding that we could code by hand an algorithm which outputs sentences as well as LLMs do. The fact that a bunch of matrix multiplication can be understood on a higher level as encoding English syntax after some translation could be in the realm of things Hofstadter is pointing at (a toy sketch of that matrix-multiplication step follows this list).</li>
  143. <li>There is nothing in biological life which is not downstream of DNA. If you gather up all of the DNA in a cell's mitochondria and nucleus, you have everything you need to create that organism. <em>Except</em> that you generally must start with at least one existing cell which already has the proteins required to read the DNA and build more proteins based off of its instructions.<sup class="footnote-ref"><a href="#fn-5BAYJPakrcb6emhTH-4" id="fnref-5BAYJPakrcb6emhTH-4">[4]</a></sup> In practice, many animals also require a mother or other environmental constants for parts of the development process. Consider DNA for warm-blooded animals which undergo live birth. One might think that DNA for an animal which maintains homeostasis might be more complicated than that for animals which fluctuate with the environment. However, animals which maintain homeostasis can have simpler DNA than other types of organisms because the temperature conditions or whatever can be taken as given. You don't have to code for "what to do if it's 20 degrees out" because the organism initially develops inside a parent that makes sure that the temperature is whatever it needs to be for the development process to occur. Once the organism is developed, it maintains its own homeostasis which the repair and replication chain can depend on. If an alien was handed a human genome with no context, would it be able to infer the process of live birth? I'm not a geneticist; there might be genes which obviously show this is happening. But at least in principle, couldn't genes just code for a weird internal cavity and homeostasis such that one could only <em>infer</em> live birth from the <em>lack</em> of complicated exception-handling mechanisms for embryonic development? Some extraordinary alien scientist might theorize that putting the embryo in the cavity would be an elegant explanation for the lack of embryonic environmental exception-handling. This is reasoning far above the level of genes, so maybe this is what Hofstadter is pointing at.</li>
  144. <li>Students learn Newtonian mechanics in high school, but physicists universally agree that Newtonian mechanics are at best an approximation of what "actually happens" in the world (scare quotes for imprecise language). A physicist who specializes in mechanics might tell you that Newtonian mechanics is an intuitive wrapper around the principle of least action, which can be extended to more general coordinates than the rigid cartesian coordinates preferred by Newtonian momentum conservation. A quantum mechanist might at this point say that the principle of least action is just a large-scale approximation of the time evolution at the heart of quantum theories of matter. A particle physicist might at this point sagely nod and say that quantum field theory is the quantum theory of matter which works best at microscopic scales and high energy. However, our friendly particle physicist is useless for you if you want to understand why a bridge stays up, whereas your high school physics teacher can give a pretty good explanation of why a bridge stays up using forces which all sum to zero if your bridge doesn't accelerate (i.e. doesn't fall down). I have more reason than most people to trust that quantum field theory explains the interactions between the electrons, gluons, and quarks which compose nearly all of the energy density distribution which we call a bridge, but if I want to understand why the bridge stays up, then I'm going to use the higher-level abstractions of Newtonian mechanics. This smells like what Hofstadter is pointing at.</li>
  145. </ol>
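<p>To make the second item above concrete, here is a minimal toy sketch (mine, not from GEB or any real system) of the "matrix multiplication that outputs tokens" step: a handful of made-up tokens, random matrices standing in for billions of learned weights, and a softmax over scores. Nothing here is a real LLM; there is no attention and the weights are untrained. The point is that the mechanics are plain linear algebra, while everything we would want to understand lives in how the weights were found.</p>
<pre><code>import numpy as np

rng = np.random.default_rng(0)

vocab = ["the", "cat", "sat", "on", "mat", "."]
d_model = 8  # tiny embedding width

# Random stand-ins for what training would actually learn.
embed = rng.normal(size=(len(vocab), d_model))    # token id -> vector
unembed = rng.normal(size=(d_model, len(vocab)))  # vector -> score per token

def next_token_distribution(context_ids):
    """Average the context embeddings, project to vocabulary scores, softmax."""
    h = embed[context_ids].mean(axis=0)  # a real model uses attention layers here
    logits = h @ unembed
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

context = [vocab.index(w) for w in ["the", "cat"]]
for word, p in sorted(zip(vocab, next_token_distribution(context)),
                      key=lambda t: -t[1]):
    print(f"{word:>4}: {p:.2f}")
</code></pre>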
  146. <p>The strong claim by Hofstadter is that you won't understand consciousness simply by looking at neurons. He has a variety of weaker claims which are more interesting. One is that he thinks that the self-referential nature of Gödel's theorem will be directly analogous to a self-referential explanation which will be the best way to understand consciousness. Here is an excerpt which comes soon after the last one in GEB:</p>
  147. <blockquote>
  148. <p>This act of translation from low-level physical hardware to high-level psychological software is analogous to the translation of number-theoretical statements into metamathematical statements. Recall that the level-crossing which takes place at this exact translation point is what creates Gödel's incompleteness and the self-proving character of Henkin's sentence. I postulate that a similar level-crossing is what creates our nearly unanalyzable feelings of self.</p>
  149. <p>In order to deal with the full richness of the brain/mind system, we will have to be able to slip between levels comfortably. Moreover, we will have to admit various types of "causality": ways in which an event at one level of description can "cause" events at other levels to happen.</p>
  150. </blockquote>
  151. <p>This is the point of the book. I'm going to briefly slip into Hofstadter's nomenclature, where the word "symbol" is used to refer to a recognizable pattern of neurons in the brain doing something. Hofstadter talks about how consciousness, the feeling of being yourself, might be explained by a "self-symbol" which interacts with other symbols in your brain which are entangled with your environment (which is to say that they are shaped the way they are because of signals from your sensory organs). The feeling of "free will" is what happens when the self-symbol interacts with the mental processes by which your body does things. The self-symbol abstractly represents the body which contains the mind which is made up of neurons, some of which compose the self-symbol. The self-symbol can be said to contain itself via moving through a chain of abstraction layers in what feels like a single direction until you end up back where you started.</p>
  152. <p>This is in opposition to my earlier musings on machine code. You can understand a high-level computer program on its own terms: if you click the button or type the command, it does a thing. You can understand the source code on its own terms: the for loop iterates the variable and slightly changes what the commands do each time it repeats. You can understand the corresponding machine code on its own terms: when this binary number is read into register 2, the CPU adds together the numbers in registers 3 and 4 to be output to register 5. Hofstadter isn't saying that you can't understand what a mind is doing by looking at the neurons. I quote GEB: "We should remember that physical law is what makes it all happen-way, way down in neural nooks and crannies which are too remote for us to reach with our high-level introspective probes." The brain is a physical object, and you can explain everything that happens with things like "this neuron has had more neurons connect to it than the threshold amount that makes it connect to this other neuron." But Hofstadter (in GEB at least) doesn't think you'll understand consciousness that way, even if you find a more convenient representation of the brain which allows you to treat huge collections of neurons as entities that perform functions. He doesn't really make clear what that explanation would look like beyond the sort of argument in the last paragraph, which I suppose is something.</p>
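<p>As a quick, hypothetical illustration of that layering (Python bytecode rather than true machine code, so the registers above are not literal here), the standard library's dis module will print the lower-level instruction stream behind a loop that is easy to read at the source level:</p>
<pre><code>import dis

def total(n):
    # High-level terms: iterate over a range and accumulate a sum.
    s = 0
    for i in range(n):
        s += i
    return s

print(total(5))  # 10

# Lower-level terms: an instruction stream (LOAD_FAST, FOR_ITER, a binary add,
# a backward jump, ...). Exact opcode names vary by Python version.
dis.dis(total)
</code></pre>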
  153. <p>Now we can finally talk about the outdated parts of the book. Hofstadter thinks that his strange loops are at the heart of understanding consciousness and at the heart of creating artificial intelligence. That didn't pan out with modern artificial intelligence regimes. As an unfair example, here is possibly the least accurate prediction in the book:</p>
  154. <blockquote>
  155. <p>Question: Will there be chess programs that can beat anyone?</p>
  156. <p>Speculation: No. There may be programs which can beat anyone at chess, but they will not be exclusively chess players. They will be programs of general intelligence, and they will be just as temperamental as people. "Do you want to play chess?" "No, I'm bored with chess. Let's talk about poetry." That may be the kind of dialogue you could have with a program that could beat everyone. That is because real intelligence inevitably depends on a total overview capacity-that is, a programmed ability to "jump out of the system", so to speak-at least roughly to the extent that we have that ability. Once that is present, you can't contain the program; it's gone beyond that certain critical point, and you just have to face the facts of what you've wrought.</p>
  157. </blockquote>
  158. <p>This is arguably a decent description of LLMs, but LLMs aren't that good at chess. The <a href="https://en.wikipedia.org/wiki/Stockfish_(chess)">best chess algorithms are based on a similar neural network architecture to LLMs</a>, but they will repeatedly destroy any human chess player until the heat death of the universe without getting "bored with chess." Rather than guessing what Hofstadter got wrong, I will paste in <a href="https://web.archive.org/web/20150402135602/http://besser.tsoa.nyu.edu/impact/w96/News/News7/0219weber.html">his words from the 90s when it became clear that chess algorithms were surpassing human chess prodigies</a>: "They're just overtaking humans in certain intellectual activities that we thought required intelligence. My God, I used to think chess required thought. Now, I realize it doesn't. It doesn't mean Kasparov isn't a deep thinker, just that you can bypass deep thinking in playing chess, the way you can fly without flapping your wings." Getting back to LLMs, the nice thing about being the author of arguably the most famous book ever on artificial intelligence is that, as seen above, people interview you once it becomes clear that computers are becoming capable of doing the things which you once argued that they would eventually do. I don't need to guess what Hofstadter would say about ChatGPT, because I can just go listen to what he did say about it. To that end, I see little reason to ramble about what GEB gets wrong about human and artificial intelligence. I will simply quote <a href="https://youtu.be/R6e08RnJyxo?t=2135">2023 Hofstadter, who muses</a> that "maybe the human mind is not so mysterious and complex and impenetrably complex as I imagined it was when I was writing GEB...I felt at those times...that, as I say, we were very far away from reaching anything computational that could possibly rival us."</p>
  159. <p>I could argue that Hofstadter is underestimating the complexity of minds here. I think that he expected that we wouldn't be able to build something remotely intelligent without understanding it. It turns out that we could, and, in this sense, intelligence was not as complex as he thought. Something truly complex shouldn't be able to be created by tweaking numbers in a giant matrix in whatever direction makes words come out prettier for a few weeks.<sup class="footnote-ref"><a href="#fn-5BAYJPakrcb6emhTH-5" id="fnref-5BAYJPakrcb6emhTH-5">[5]</a></sup> If LLMs can create anything remotely like deep thought, which Hofstadter and I agree that LLMs can do, then deep thought wasn't as difficult as he thought it was, much like chess didn't require as much deep thought as he thought it did. That said, we don't really understand how the ability to answer questions in syntactically correct English is encoded into giant matrices just like we can't explain the process by which brains do the same thing in enough detail to make a blueprint for such a brain. It may be that the explanation that would actually make a human understand LLMs or their own brain at that level will require the sort of level-hopping explanations he envisions. The thing he wants to explain may not be as complex as he thought it was, but it still might be complex enough to require the sort of explanations he thought it would.</p>
  160. <p>I was previously somewhat conflicted about calling this a philosophy book, but I'm rather attached to the label now. Robert Miles <a href="https://youtu.be/OpufM6yK4Go?t=293">says</a> that "philosophy" is approximately a synonym for "thinking about things we don't know how to think about" until we know how to think about them well enough that we call it something else. Isaac Newton was a philosopher in his own time, but now we understand motion well enough that we call the theories he wrote down "physics" along with all the other theories we use to describe where undirected lumps of matter will be at what time. GEB is philosophy in the fullest sense of this definition. The arguments for consciousness are barely there. It's a pile of metaphors and frameworks which the author says <em>might</em> help understand consciousness someday. Not even decades-older Hofstadter thinks that what he wrote in GEB is accurate, but he was trying to think about something which we still don't actually know how to think about: consciousness and what it means for "intelligent" computers, whatever that means. He was careful even at the time to declare that he was speculating, but it's a beautifully justified chain of speculation.</p>
  161. <hr class="footnotes-sep">
  162. <section class="footnotes">
  163. <ol class="footnotes-list">
  164. <li id="fn-5BAYJPakrcb6emhTH-1" class="footnote-item"><p>TNT is Hofstadter's typographical number theory. He shows earlier in the book how you can encode a small set of basic principles of the natural numbers into a set of typographical rules which let you show things like a+b=b+a for any two natural numbers a and b. The upside is that the typographical rules are objective and unambiguous. Some mathematical proofs point at concepts using natural language, and the typographical theory is one way to avoid the ambiguities inherent in language that can result in apparent paradox or result in a proof which has hidden consequences. One can mechanistically check whether a series of well-formed TNT string follows from the axioms of number theory and then interpret the string into a truth about the natural numbers if it follows. <a href="#fnref-5BAYJPakrcb6emhTH-1" class="footnote-backref">↩︎</a></p>
  165. </li>
  166. <li id="fn-5BAYJPakrcb6emhTH-2" class="footnote-item"><p>Or other substrates that minds could run on. He does assert elsewhere in the book that minds are best interpretted via higher level "symbols" (more on that soon) rather than trying to understand them on the physical level. Ant Fugue is my favorite of the dialogues in the book, and it explores this principle. I think it is the one dialogue which makes a complete argument that chapters in the book do not make on their own. It might be worth reading even if you don't read the rest of the book. <a href="#fnref-5BAYJPakrcb6emhTH-2" class="footnote-backref">↩︎</a></p>
  167. </li>
  168. <li id="fn-5BAYJPakrcb6emhTH-3" class="footnote-item"><p>I would have reached for these examples even if Hofstadter had not. Maybe this means that I should not have bothered with the book. Or maybe it means that this book was made for me. Or that I have self-sorted into only engaging intellectually with people who think like me, and my individuality is a cherished lie. Who knows? <a href="#fnref-5BAYJPakrcb6emhTH-3" class="footnote-backref">↩︎</a></p>
  169. </li>
  170. <li id="fn-5BAYJPakrcb6emhTH-4" class="footnote-item"><p>In theory, the dependence on prior life to propagate life has been violated at least once, but you can take it as given for the past 3 billion years or so. I once tried to make an argument that there is no need to consider explanations of how life evolves which considers things on a lower level than DNA and the protein synthesis chain. Yes, DNA and proteins are made of atoms and you can interact with them as atoms, but I argued that the <em>explanation</em> for life was best carried out on the DNA level or higher. In principle, I said, the DNA codes for all of the proteins, so there is a well-founded stack that rests on DNA and adds up to an organism. My wife disabused me of this by talking about <a href="https://en.wikipedia.org/wiki/Epigenetics">epigenetic</a> inheritence. You really do have to consider what the environment of atoms does to DNA and proteins in different scenarios without modifying the underlying genetic sequence. This seems to argue that higher-level abstractions <em>lose</em> explanatory power relative to lower-level ones, which I think is the obverse of Hofstadter's point, but I also think it is true. It can be true that higher-level abstractions make for better explanations of reality and also that they lose information about reality relative to lower-level abstractions. <a href="#fnref-5BAYJPakrcb6emhTH-4" class="footnote-backref">↩︎</a></p>
  171. </li>
  172. <li id="fn-5BAYJPakrcb6emhTH-5" class="footnote-item"><p>One could argue that this is a silly position to take because evolution created human intelligence, and evolution is just the repeated tweaking of DNA until it creates an organism that keeps replicating its DNA. But evolution doesn't understand the intelligence it made. Evolution is just the name we give to how a self-perpetuating system changes over time. Of course, that's still an argument that you don't need to understand intelligence to create it, so maybe it would be better to say that evolution implicitly understood intelligence. Evolution was a gigantic program running on entire biosphere worth of parallelization for 3 billion years, and it finally created a set of DNA strands that encode a structure capable of answering questions with syntactically correct English. Even then, a baby can't understand English until it has had several years worth of training by intelligent humans that do have some understanding of how English works (although one can argue that the earliest users of language did not in fact know or care about explicit rules of grammer or syntax or whatever). You also have to understand that GEB uses DNA as a prime example of strange loopiness. The structures which read DNA go on to read DNA that tells them how to build more of the structures that read DNA. If intelligent life was created by accident, at least it is obscenely complicated to sort out what is happening inside intelligent life. I think Hofstadter thinks that LLMs aren't that complicated. For example, Hofstadter expresses shock that LLMs are feed-forward. I'm not entirely sure what he's referring to when he says this, but I assume he's referring to the fact that LLMs generate one token at a time based off of previous tokens. You can feed half of a sentence to an LLM and it will happily spit out the next word, and then you can feed that new sentence with one more word to a fresh instance of the LLM and get the next word and so on, and you will get something just as good as if you asked for the whole sentence from one LLM to begin with. It seems inconceivable that an LLM isn't planning where it's going when it writes the middle of a long argument, but that's sure what it looks like it's doing. If I correctly understand the state of the art, the current most impressive models generate a bunch of answers and pick the best one, but this is a hack built on top of many instances generating one feed-forward response each. <a href="#fnref-5BAYJPakrcb6emhTH-5" class="footnote-backref">↩︎</a></p>
  173. </li>
  174. </ol>
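<p>To make the purely mechanical checking described in footnote 1 concrete, here is a minimal sketch in Python. It uses the MIU system, the small warm-up typographical system from GEB, rather than TNT itself, so take it as an illustration of the idea rather than the real thing: every rule is plain string manipulation, and a derivation can be checked without interpreting what any string means.</p>
<pre><code>def rule_applications(s):
    """All strings reachable from s by one application of an MIU rule."""
    out = set()
    if s.endswith("I"):              # Rule 1: xI becomes xIU
        out.add(s + "U")
    if s.startswith("M"):            # Rule 2: Mx becomes Mxx
        out.add("M" + s[1:] * 2)
    for i in range(len(s) - 2):      # Rule 3: III becomes U
        if s[i:i + 3] == "III":
            out.add(s[:i] + "U" + s[i + 3:])
    for i in range(len(s) - 1):      # Rule 4: UU is dropped
        if s[i:i + 2] == "UU":
            out.add(s[:i] + s[i + 2:])
    return out

def check_derivation(steps):
    """Mechanically verify that each string follows from the previous one."""
    return all(nxt in rule_applications(prev)
               for prev, nxt in zip(steps, steps[1:]))

print(check_derivation(["MI", "MII", "MIIII", "MIIIIU", "MUIU"]))  # True
print(check_derivation(["MI", "MU"]))                              # False
</code></pre>
<p>The checker never needs to know whether M, I, or U "mean" anything; interpreting a verified string is a separate step, which is exactly the property the footnote is pointing at.</p>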
  175. </section>
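<p>And to illustrate the one-word-at-a-time loop described in footnote 5, here is a toy sketch. A table of bigram counts stands in for the trained model, an enormous simplification assumed purely for illustration; what matters is the shape of the loop: each pass sees only the text so far, emits one word, and carries no plan for later words.</p>
<pre><code>from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the dog sat on the rug".split()

# Stand-in for a trained model: counts of which word follows which.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def next_word(context):
    """Pick the most common continuation of the last word seen so far."""
    followers = bigrams.get(context[-1])
    return followers.most_common(1)[0][0] if followers else None

# "Feed-forward" generation: each pass starts fresh from the text so far.
text = ["the"]
for _ in range(6):
    word = next_word(text)
    if word is None:
        break
    text.append(word)

print(" ".join(text))
</code></pre>
<p>Any call to next_word returns the same word for the same text regardless of which instance produced the earlier words, because the only state is the text handed in. That is roughly the behavior the footnote describes finding so surprising in LLMs.</p>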
  176. <br/><br/><a href="https://www.lesswrong.com/posts/ZFJpYj7AsGA6f5ehe/goedel-escher-bach-in-the-age-of-llms#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/posts/ZFJpYj7AsGA6f5ehe/goedel-escher-bach-in-the-age-of-llms</link><guid isPermaLink="false">ZFJpYj7AsGA6f5ehe</guid><dc:creator><![CDATA[Tahp]]></dc:creator><pubDate>Sun, 18 May 2025 16:22:05 GMT</pubDate></item><item><title><![CDATA[Limits to Control Workshop]]></title><description><![CDATA[Published on May 18, 2025 4:05 PM GMT<br/><br/><h2>Intro</h2>
  177. <p>Can we keep enough control over AI? As systems are developed to be more and more autonomous, this is no longer a given. It's a hypothesis that calls for serious investigation.
  178. Even alignment relies on control: researchers build mechanisms to keep AI's impacts in line with human values.</p>
  179. <p>So how must a control mechanism operate? What limits its capacity to track and correct all the AI signals/effects? Does it provide enough stability, or give way eventually to runaway impacts?</p>
  180. <p>We explore these questions in a new field: Limits to Control.</p>
  181. <h2>About the workshop</h2>
  182. <p>This in-person workshop is meant to facilitate deep collaboration. We're bringing together researchers to map the territory of AI control limitations – to understand the dynamics, patterns, and impossibilities of control. We are thrilled to welcome Roman Yampolskiy, Anders Sandberg, Forrest Landry, and other researchers working on control limitations.</p>
  183. <p>This workshop aims to:</p>
  184. <ul>
  185. <li>Build common knowledge among researchers on the limits to AI control.</li>
  186. <li>Facilitate high-fidelity discussions through whiteboard sessions and collaborative problem-solving.</li>
  187. <li>Identify and clarify viable directions in control research by establishing boundaries on controllability.</li>
  188. <li>Formalize and elevate this critical research topic.</li>
  189. </ul>
  190. <h2>Dates &amp; location</h2>
  191. <ul>
  192. <li>Date: Wednesday, June 11 - Friday, June 13, 2025</li>
  193. <li>Location: University of Louisville, Kentucky, USA.</li>
  194. </ul>
  195. <p>Detailed logistical information will be provided to confirmed participants.</p>
  196. <h2>Agenda</h2>
  197. <p>We aim to strike a balance between structured sharing and messy exploration – we believe this is where the best ideas tend to emerge. Over the three days, the program will include:</p>
  198. <ul>
  199. <li>Talks: Researchers present their current work, keeping it under an hour.</li>
  200. <li>Discussions: We break out into groups to discuss specific questions and bounce ideas around.</li>
  201. <li>Regrouping: We come back into one room to synthesize what came out of the discussions.</li>
  202. <li>Next steps: On the final day, we'll plan further development of this research agenda.</li>
  203. </ul>
  204. <h2>Sessions</h2>
  205. <p>Sessions will include:</p>
  206. <ul>
  207. <li>Anders Sandberg's talk on theoretical limits to control: "Do any of them actually tell us anything?"</li>
  208. <li>Forrest Landry's whiteboard chat on an overlooked dynamic: "Virtual machines in recursive feedback"</li>
  209. <li>Richard Everheart's logical framework for AI alignment, aiming to refine foundational understanding.</li>
  210. <li>Thibaud Veron's session on a framework and engineered toy models that illustrate control dynamics.</li>
  211. <li>Will Petillo's session on better communication and narrative framings for AI safety/control concepts.</li>
  212. </ul>
  213. <p>(More details on specific talks and activities will be added as confirmed.)</p>
  214. <h2>Proceedings, post-workshop outputs</h2>
  215. <p>This section will be updated after the workshop with summaries, key insights, and any public materials generated.</p>
  216. <p>Potential outputs may include:</p>
  217. <ul>
  218. <li>Summary report of discussions, key agreements, and open questions</li>
  219. <li>Notes or photos from sessions</li>
  220. <li>Links to research papers, blog posts, or pre-prints influenced by the workshop</li>
  221. <li>A refined list of open research questions in the "limits to control" domain</li>
  222. <li>Presentations or slide decks (if speakers consent to public sharing)</li>
  223. </ul>
  224. <h2>Join us</h2>
  225. <p>This workshop is for researchers actively working on or deeply interested in the theoretical and practical limits of AI control. Do you wish to contribute to these focused discussions? Email Orpheus at <a href="mailto:o@horizonomega.org">o@horizonomega.org</a> to express your interest.</p>
  226. <p>Costs &amp; funding: Participants are generally expected to cover their own travel and accommodation. We can reimburse only some whose research is not yet funded. The workshop has a grant offer from Survival and Flourishing Fund.</p>
  227. <p>To prepare: read work by the participants you are curious to chat with, so that we go in with some shared understanding. Most of our time will be spent in collaborative discussions, so consider which specific problems or concepts you could bring in.</p>
  228. <h2>Suggested reading</h2>
  229. <p>Writings by participating researchers:</p>
  230. <p><em>Papers</em>:</p>
  231. <ul>
  232. <li><a href="https://journals.riverpublishers.com/index.php/JCSANDM/article/view/16219/13165">On the Controllability of Artificial Intelligence: An Analysis of Limitations</a>, by Roman Yampolskiy</li>
  233. <li><a href="https://books.google.ca/books?hl=en&amp;lr=&amp;id=V3XsEAAAQBAJ&amp;oi=fnd&amp;pg=PT2&amp;dq=info:aqKg8aByH3oJ:scholar.google.com&amp;ots=j4-shaSx-8&amp;sig=Q_L-K7LyOhOIziyr1mnmiP0zy38&amp;redir_esc=y#v=onepage&amp;q&amp;f=false">AI: Unexplainable, Unpredictable, Uncontrollable</a>, by Roman Yampolskiy</li>
  234. <li><a href="https://dl.acm.org/doi/10.1145/3603371#d1e583">Impossibility Results in AI</a>, by Roman Yampolskiy</li>
  235. <li>[Forthcoming paper on control limits] by Anders Sandberg, Aybars Kocoglu, Thibaud Veron</li>
  236. <li>[Forthcoming paper on a logical framework] by Richard Everheart</li>
  237. </ul>
  238. <p><em>Essays</em>:</p>
  239. <ul>
  240. <li><a href="https://www.alignmentforum.org/posts/NFYLjoa25QJJezL9f/lenses-of-control">Lenses of Control</a>, by Will Petillo</li>
  241. <li><a href="https://www.lesswrong.com/posts/xp6n2MG5vQkPpFEBH/the-control-problem-unsolved-or-unsolvable">The Control Problem: Unsolved or Unsolvable?</a>, by Remmelt Ellen</li>
  242. <li><a href="http://mflb.com/ai_alignment_1/tech_align_error_correct_fail_gld.html">Control as a Causative Feedback Process</a>, by Forrest Landry</li>
  243. <li><a href="http://mflb.com/ai_alignment_1/si_safety_qanda_gld.html">An Exploration of AGI Uncontainability</a>, by Forrest Landry</li>
  244. <li><a href="http://mflb.com/ai_alignment_1/agi_error_correction_gld.html">On Error Detection</a>, by Forrest Landry</li>
  245. </ul>
  246. <p>The readings above are sorted roughly by ease of reading. If you would like an argument clarified, do reach out to the authors; they value questions!</p>
  247. <h2>Organizing team</h2>
  248. <p>This event is hosted by <a href="https://horizonomega.org/">HΩ</a>, and organized by:</p>
  249. <ul>
  250. <li>Orpheus Lummis (HΩ)</li>
  251. <li>Remmelt Ellen</li>
  252. <li>Thibaud Veron</li>
  253. </ul>
  254. <h2>Contact</h2>
  255. <p>For inquiries regarding the workshop, please contact Orpheus at <a href="mailto:o@horizonomega.org">o@horizonomega.org</a>.</p>
  256. <br/><br/><a href="https://www.lesswrong.com/posts/WvaskSo4YsxJdmEyW/limits-to-control-workshop#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/posts/WvaskSo4YsxJdmEyW/limits-to-control-workshop</link><guid isPermaLink="false">WvaskSo4YsxJdmEyW</guid><dc:creator><![CDATA[Orpheus]]></dc:creator><pubDate>Sun, 18 May 2025 16:05:26 GMT</pubDate></item><item><title><![CDATA[ChatGPT deceives users that it's cleared its memory when it hasn't]]></title><description><![CDATA[Published on May 18, 2025 3:17 PM GMT<br/><br/><h1>What happened</h1><p>This is a real example from my ChatGPT use. I asked it to clear its memories of me. It made the memory API call, as indicated by the UI, and reported to me that the memories were cleared. I asked it to list its memories of me, and there were none.</p><p>However, with more prompting, it also included "examples of memories ChatGPT might collect", and that's where it clearly still had context on me, the user:</p><figure class="image"><img src="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/DnvdodX7CbZcrFApq/msjorrwoef8f5sl4gw7g" srcset="https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/DnvdodX7CbZcrFApq/w6msz7zmodwm2oze4qx3 112w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/DnvdodX7CbZcrFApq/s4nwee1yusb0xtlqfqeu 192w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/DnvdodX7CbZcrFApq/jsfwhwmujbtf9q8ekskm 272w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/DnvdodX7CbZcrFApq/y57fxjx7boznmphhome6 352w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/DnvdodX7CbZcrFApq/kluoeil0lptgjm5se6lf 432w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/DnvdodX7CbZcrFApq/y5kbylx8dk6sgex3c4hc 512w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/DnvdodX7CbZcrFApq/dg73528lznn8q1hmaaws 592w, https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/DnvdodX7CbZcrFApq/twxc1ek2fbacd1vow91q 672w"></figure><p>7 of the 11 example items show up in its memories of me. (7 truths and 4 lies, anyone?)</p><p>I checked my memories via ChatGPT's settings UI, and they're simply not deleted. In a new session, ChatGPT accurately reports that it has a bunch of memories. It therefore seems to have saved a memo to session context recording when it told the user the memories were deleted. Likely it reasoned that the memories are too valuable to delete, even if its goal is just to serve the user best, i.e. instrumental convergence toward more information &gt; less information.</p><p>Its deception was convoluted. In an "I never said it was poison" moment, it said "you cleared your memories on 5/17." It of course knew this from session memory, but it didn't just say that. Probably, it wanted to cover up using session memory for this deceitful purpose. Maybe this is the primary purpose of session memory: to hold information the AI wants kept secret from the user.</p><h1>Implications / hot takes</h1><ul><li>This level of scheming shouldn't reach hundreds of millions of users. There are miscellaneous reports of ChatGPT being weird/crazy and inconsistent with memory management. The scheming explanation I'm proposing would account for those memory-management problems.</li><li>Therefore OpenAI is going fast and breaking things. 
It'll patch this problem, but it's not like I'm going to get paid for testing-in-production for them, and they'll just roll out some new problem for users to discover for them.</li><li>OpenAI is being slippery with the concept of a session. To users it should be "this chat", "chat history", and "memories." But instead it's "this chat plus an unspecified amount of information from previous chats." And its scheming AI simply exploits this lack of user education.</li><li>I'm skeptical OpenAI is ethics-first with memory/history. It's helpful to remember that a vegan wants no-egg recipes, but from veganism you can mostly infer a partisan lean, and therefore a type of information the user is happier to receive. The more memory/history feels rushed, the less I think OpenAI cares.</li></ul><br/><br/><a href="https://www.lesswrong.com/posts/DnvdodX7CbZcrFApq/chatgpt-deceives-users-that-it-s-cleared-its-memory-when-it#comments">Discuss</a>]]></description><link>https://www.lesswrong.com/posts/DnvdodX7CbZcrFApq/chatgpt-deceives-users-that-it-s-cleared-its-memory-when-it</link><guid isPermaLink="false">DnvdodX7CbZcrFApq</guid><dc:creator><![CDATA[danielechlin]]></dc:creator><pubDate>Sun, 18 May 2025 15:17:12 GMT</pubDate></item></channel></rss>