<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Simply Boring AI]]></title><description><![CDATA[Making AI useful, boring, and safe. With art and stories.]]></description><link>https://www.simplyboring.ai</link><image><url>https://www.simplyboring.ai/img/substack.png</url><title>Simply Boring AI</title><link>https://www.simplyboring.ai</link></image><generator>Substack</generator><lastBuildDate>Sun, 05 Apr 2026 09:06:46 GMT</lastBuildDate><atom:link href="https://www.simplyboring.ai/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Gary Ang (Ming)]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[simplyboringai@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[simplyboringai@substack.com]]></itunes:email><itunes:name><![CDATA[Gary Ang (Ming)]]></itunes:name></itunes:owner><itunes:author><![CDATA[Gary Ang (Ming)]]></itunes:author><googleplay:owner><![CDATA[simplyboringai@substack.com]]></googleplay:owner><googleplay:email><![CDATA[simplyboringai@substack.com]]></googleplay:email><googleplay:author><![CDATA[Gary Ang (Ming)]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Garfield's Eyes]]></title><description><![CDATA[Second in my series on what happens when the people who make things meet a technology that also makes things]]></description><link>https://www.simplyboring.ai/p/garfields-eyes</link><guid isPermaLink="false">https://www.simplyboring.ai/p/garfields-eyes</guid><dc:creator><![CDATA[Gary Ang (Ming)]]></dc:creator><pubDate>Mon, 30 Mar 2026 14:06:30 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!2jFe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ceb51a3-d338-41d8-bf49-d9b6e6f35d13_1000x558.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is the second in my series on what happens when the people who make things meet a technology that also makes things.</p><p>The first part, &#8220;<em><a href="https://open.substack.com/pub/simplyboringai/p/my-shifu?utm_campaign=post-expanded-share&amp;utm_medium=post%20viewer">My Shifu</a></em>&#8221;, covered how I met <em><a href="https://en.wikipedia.org/wiki/Johnny_Lau">Johnny Lau</a></em> through a crazy experiment called Creative Youth Xchange, and his view that technology forces artistic purity rather than threatening it.</p><p>I&#8217;ve been sitting on this, but this <em><a href="https://www.linkedin.com/posts/randomwalker_discussions-of-ai-and-creativity-tend-to-activity-7443289527525277696-CDfk?utm_source=social_share_send&amp;utm_medium=member_desktop_web&amp;rcm=ACoAAAnEwqsBmNv-udZ8tKaEG_MQGlUiz7C_KAg">post</a></em> by Arvind Narayanan on how creativity in art and creativity in science are different made me get back to this second part.</p><p>This article goes deeper into the creative process: creative production, and why AI may not be that great a leap.</p><h2><strong>Is AI Really A Leap?</strong></h2><p>One of the key fears of creatives is that this leap into AI may be one too far. Further than the camera. Further than digital tools.</p><p>Something that creativity cannot keep up with. But is it realistic to think that all art in the future will be purely from AI?</p><p>So I asked Johnny about his jumps from pen-and-paper comics to different technologies, the most recent being a Singlish-speaking robot. Most people would call that a transformation. He didn&#8217;t:</p><p><em>&#8220;It was never a leap for me. 
Right from the beginning when I launched the first book in 1990, I engineered a range of products such as t-shirts &amp; mugs, a 13-inch ruler and bumper sticker with a tagline: Beware of Kiasu Driver. The latter became a bestseller instantly.&#8221;</em></p><p>My read: this leap is perhaps not that different from the ones creatives have faced in the past. And AI is perhaps more of an opportunity than a threat.</p><p><em>&#8220;For me it has always been a concept, not a book nor a sticker ... My brain doesn&#8217;t function the way where sectors and industries are differentiated. I had to develop a language to align myself with how the world functions.&#8221;</em></p><p>A concept. Not a product. Not a comic.</p><p>Here&#8217;s what I think is going on. Narayanan&#8217;s distinction is useful: creativity in art is emotional, social, driven by taste. Creativity in science is instrumental, evaluable, a systematic search through possibilities. AI is getting better at the second kind. The first is harder to touch.</p><p>Johnny&#8217;s concept is the first kind. Not the drawing - the reason for the drawing. The cultural nerve. The recognition that makes a Singaporean laugh at their own kiasu-ness. That doesn&#8217;t live in a dataset. It comes from being someone, from somewhere.</p><p>The production - t-shirts, mugs, strips - is the second kind. Patterns. Systems. Things that can be decomposed and delegated.</p><p>Johnny worked this out decades ago. He kept the concept. He systematized the production. And the person who showed him how wasn&#8217;t an AI researcher. 
It was Jim Davis.</p><h2><strong>The Garfield Revelation</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2jFe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ceb51a3-d338-41d8-bf49-d9b6e6f35d13_1000x558.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2jFe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ceb51a3-d338-41d8-bf49-d9b6e6f35d13_1000x558.jpeg 424w, https://substackcdn.com/image/fetch/$s_!2jFe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ceb51a3-d338-41d8-bf49-d9b6e6f35d13_1000x558.jpeg 848w, https://substackcdn.com/image/fetch/$s_!2jFe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ceb51a3-d338-41d8-bf49-d9b6e6f35d13_1000x558.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!2jFe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ceb51a3-d338-41d8-bf49-d9b6e6f35d13_1000x558.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2jFe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ceb51a3-d338-41d8-bf49-d9b6e6f35d13_1000x558.jpeg" width="580" height="323.64" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4ceb51a3-d338-41d8-bf49-d9b6e6f35d13_1000x558.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:558,&quot;width&quot;:1000,&quot;resizeWidth&quot;:580,&quot;bytes&quot;:179817,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.simplyboring.ai/i/192613643?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ceb51a3-d338-41d8-bf49-d9b6e6f35d13_1000x558.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2jFe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ceb51a3-d338-41d8-bf49-d9b6e6f35d13_1000x558.jpeg 424w, https://substackcdn.com/image/fetch/$s_!2jFe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ceb51a3-d338-41d8-bf49-d9b6e6f35d13_1000x558.jpeg 848w, https://substackcdn.com/image/fetch/$s_!2jFe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ceb51a3-d338-41d8-bf49-d9b6e6f35d13_1000x558.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!2jFe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ceb51a3-d338-41d8-bf49-d9b6e6f35d13_1000x558.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Not exactly Garfield, but you get the idea.</figcaption></figure></div><p>I asked about architecture at USC. He said something I didn&#8217;t expect - that USC was never about architecture for him:</p><p><em>&#8220;The years at USC were never about the study of building design. I took full liberty to immerse myself in the world of media and entertainment when I landed in Los Angeles.&#8221;</em></p><p>That&#8217;s where he met Garfield. Not the cat. The operation.</p><p><em>&#8220;I was stunned by Davis and how he had built a team of artists and writers to produce daily comics strips for the lasagna loving cat for its national syndicate. He had deployed an animation studio system to run his comics studio.&#8221;</em></p><p>Here&#8217;s what he described. Davis ran his comics like a factory. Team meeting on Monday. 
Different themes tossed around for the week&#8217;s seven strips. Once decided, a lead artist carried out the drafts. The process cascaded through the team. And at the end:</p><p><em>&#8220;Davis merely just drew Garfield&#8217;s eyes as a gesture of completion.&#8221;</em></p><p>Garfield&#8217;s eyes. The signature. Everything else was system. That one stroke was authorship.</p><p>I sat with that image for a while.</p><p>Johnny brought this model back to Asia. Most Asian comics were produced by a single artist - one person doing everything. The Garfield team showed him a different path. It became Comix Factory, the studio that produced Mr Kiasu.</p><h2><strong>Agents and Art</strong></h2><p>Here&#8217;s the connection I keep thinking about.</p><p>Davis&#8217;s studio system was, in effect, an early version of what AI promises every creator today. Break the work into steps. Specialize each step. Let the system handle volume while the creator handles vision.</p><p>I&#8217;ve <em><a href="https://www.linkedin.com/pulse/thinking-agents-systems-age-ai-gary-ang-phd-gvb3c">written about agentic AI</a></em> before - systems where multiple AI agents each handle a different part of a task. One agent plans, another retrieves information, another reasons, another generates output. The whole thing works because each part has a defined role and you can observe what each part does.</p><p>Davis was essentially running an agentic system in the 1980s. Just with humans instead of models. One person for drafts, another for inking, the process cascading through the team. And the orchestrator - Davis - only needed to draw the eyes.</p><p>Davis understood in the 1980s what the AI industry is selling in the 2020s: you don&#8217;t need to do everything yourself. You need to know what only you can do.</p><p>For Davis, it was the eyes. 
For Johnny, it&#8217;s the concept.</p><p>And Johnny made the connection himself:</p><p><em>&#8220;Similarly with AI, we just have to change our mindset on the method of production. The creation portion doesn&#8217;t change as it will always come from the depth of our minds.&#8221;</em></p><p>Change the method of production. Keep the creation. That&#8217;s not a theory from someone who&#8217;s read a few articles about AI. That&#8217;s a principle from someone who&#8217;s been decomposing creative labor since 1990.</p><p>A nice footnote: in 2023, Mr Kiasu and Garfield collaborated on a joint library programme with Singapore&#8217;s National Library. A one-of-a-kind global collaboration between two comics characters - the Singaporean one modeled on the American one&#8217;s production system - sharing a stage forty years later.</p><h2><strong>AI As Just Another Method</strong></h2><p>There&#8217;s a broader pattern here that I think matters for anyone thinking about AI and creative work.</p><p>We tend to frame the AI question as: will machines replace artists? That&#8217;s the wrong question. The better question is: what part of your creative work is system, and what part is signature?</p><p>If you&#8217;re a writer, maybe the system is research, outlining, drafting. The signature is voice, judgment, the instinct for when a sentence lands. If you&#8217;re a designer, maybe the system is rendering, iteration, production. The signature is the concept that no brief could have specified.</p><p>Davis knew. He drew the eyes and let the system do the rest. Not because he was lazy. Because he understood what only he could do.</p><p>AI doesn&#8217;t change this question. It just makes it unavoidable.</p><p>Johnny figured this out before AI existed. 
He&#8217;s been running this model for thirty-five years - concept first, production second, and the concept crosses every medium, every platform, every technology that comes along.</p><p>The question I&#8217;m leaving with you: what&#8217;s the part that only you draw?</p><p>More soon.</p>]]></content:encoded></item><item><title><![CDATA[A Simple Reading List on Third-Party AI Risk Management]]></title><description><![CDATA[Probably the hardest problem in AI risk management]]></description><link>https://www.simplyboring.ai/p/a-simple-reading-list-on-third-party</link><guid isPermaLink="false">https://www.simplyboring.ai/p/a-simple-reading-list-on-third-party</guid><dc:creator><![CDATA[Gary Ang (Ming)]]></dc:creator><pubDate>Wed, 25 Mar 2026 16:42:58 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!xVmN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda320531-0f95-4a20-9a60-a5b753c975e6_2528x1696.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xVmN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda320531-0f95-4a20-9a60-a5b753c975e6_2528x1696.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xVmN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda320531-0f95-4a20-9a60-a5b753c975e6_2528x1696.png 424w, https://substackcdn.com/image/fetch/$s_!xVmN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda320531-0f95-4a20-9a60-a5b753c975e6_2528x1696.png 848w, 
https://substackcdn.com/image/fetch/$s_!xVmN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda320531-0f95-4a20-9a60-a5b753c975e6_2528x1696.png 1272w, https://substackcdn.com/image/fetch/$s_!xVmN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda320531-0f95-4a20-9a60-a5b753c975e6_2528x1696.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xVmN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda320531-0f95-4a20-9a60-a5b753c975e6_2528x1696.png" width="1456" height="977" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/da320531-0f95-4a20-9a60-a5b753c975e6_2528x1696.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:977,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7614742,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.simplyboring.ai/i/192114649?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda320531-0f95-4a20-9a60-a5b753c975e6_2528x1696.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xVmN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda320531-0f95-4a20-9a60-a5b753c975e6_2528x1696.png 424w, 
https://substackcdn.com/image/fetch/$s_!xVmN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda320531-0f95-4a20-9a60-a5b753c975e6_2528x1696.png 848w, https://substackcdn.com/image/fetch/$s_!xVmN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda320531-0f95-4a20-9a60-a5b753c975e6_2528x1696.png 1272w, https://substackcdn.com/image/fetch/$s_!xVmN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda320531-0f95-4a20-9a60-a5b753c975e6_2528x1696.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>Yesterday, there was news about a threat actor compromising the popular LiteLLM package on PyPI. litellm is a ubiquitous library that many teams use as a unified interface to call different LLM APIs. The attacker didn&#8217;t need anything sophisticated. They poisoned the CI/CD pipeline through a compromised security scanner (the irony!), then pushed backdoored versions of litellm that harvested SSH keys, cloud credentials, and secrets on every Python startup. It seems the hackers are now working hard - going through a 300 GB treasure trove and extorting multi-billion-dollar companies.</p><p>This is what third-party AI risk actually looks like. Not the dramatic scenarios we imagine - collusion, rogue AI, coordinated manipulation. Sometimes it&#8217;s a simple file in a package you installed last Tuesday.</p><p>And it&#8217;s not just open-source libraries. Every AI vendor you use sits on its own supply chain of dependencies, sub-processors, and infrastructure providers. A single compromised link and you inherit the risk.</p><p>So how do you govern AI you don&#8217;t control? And maybe don&#8217;t even know exists.</p><p>I can&#8217;t say I have the answer. But I can offer a reading list.</p><p><em>Note: I have used open-access links from arXiv as far as possible.</em></p><div><hr></div><h2><strong>Where and Why Does It Break?</strong></h2><p><strong>1. &#8220;AI Auditing: The Broken Bus on the Road to AI Accountability&#8221;</strong> - Birhane et al. | <em>IEEE SaTML (2024)</em></p><p>Taxonomizes current AI audit practices across regulators, law firms, civil society, journalism, academia, and consulting. Finds that only a subset of AI audit studies translate to desired accountability outcomes. The title says it all - we&#8217;re doing audits, but many of them aren&#8217;t actually getting us where we need to go. 
&#128196; <a href="https://arxiv.org/abs/2401.14462">arXiv:2401.14462</a></p><p><strong>2. &#8220;Dislocated Accountabilities in the &#8216;AI Supply Chain&#8217;: Modularity and Developers&#8217; Notions of Responsibility&#8221;</strong> - Widder &amp; Nafus | <em>Big Data &amp; Society (2023)</em></p><p>Developers building AI from preexisting modules often believe responsible AI belongs to &#8220;the next or previous person in the imagined supply chain.&#8221; Everyone assumes someone else is managing the risk. &#128196; <a href="https://arxiv.org/abs/2209.09780">arXiv:2209.09780</a></p><p><strong>3. &#8220;Understanding Accountability in Algorithmic Supply Chains&#8221;</strong> - Cobbe, Veale &amp; Singh | <em>FAccT (2023)</em></p><p>Explores how algorithmic supply chains create distributed responsibility and limited visibility due to what the authors call the &#8220;accountability horizon&#8221; - you can only see so far. Also covers cross-border supply chains and regulatory arbitrage, which matters when your AI vendor operates across jurisdictions with different rules. &#128196; <a href="https://arxiv.org/abs/2304.14749">arXiv:2304.14749</a></p><h2><strong>How Can We Perhaps Solve It?</strong></h2><p><strong>1. &#8220;AEF-1: Minimum Operating Conditions for Independent Third Party AI Evaluations&#8221;</strong> - Stosz et al. | <em>AI Evaluation Foundation (2025)</em></p><p>A voluntary standard and checklist that defines what evaluators actually need from AI providers: independence from the provider, sufficient access to assess characteristics of interest, and transparency in sharing methods and findings. The gap between what this standard says you need and what most vendors actually give you <em>is</em> the governance gap. &#128279; <a href="https://www.aef.one/aef-one.pdf">AI Evaluation Foundation</a></p><p><strong>2. &#8220;Outsider Oversight: Designing a Third Party Audit Ecosystem for AI Governance&#8221;</strong> - Raji et al. 
| <em>AAAI/ACM AIES (2022)</em></p><p>Synthesizes lessons from financial, environmental, and health regulation on crafting effective external oversight systems. The key insight for me: audits alone won&#8217;t achieve accountability. You need deliberate design and institutional weight, the same lesson financial regulators learned after Enron. &#128196; <a href="https://arxiv.org/abs/2206.04737">arXiv:2206.04737</a></p><p><strong>3. &#8220;Model Cards for Model Reporting&#8221;</strong> - Mitchell et al. | <em>FAccT (2019)</em></p><p>The foundational paper proposing that released models be accompanied by documentation detailing their performance characteristics. This paper is from 2019. It&#8217;s now 2026. Notice how few vendors actually provide this level of transparency. In fact, some studies suggest that such transparency is declining rather than increasing, despite greater awareness. &#128196; <a href="https://arxiv.org/abs/1810.03993">arXiv:1810.03993</a></p><p><strong>4. &#8220;Third-party compliance reviews for frontier AI safety frameworks&#8221;</strong> - Homewood et al. | <em>arXiv preprint (2025)</em></p><p>Explores third-party compliance reviews where an independent external party assesses whether a frontier AI company complies with its own safety framework. Discusses benefits (increased compliance, assurance to stakeholders) and real challenges (information security risks, cost burdens, reputational damage from findings). This is the emerging infrastructure for keeping vendors accountable - but it&#8217;s still nascent. &#128196; <a href="https://arxiv.org/abs/2505.01643">arXiv:2505.01643</a></p><p><strong>5. &#8220;Implementing AI Bill of Materials (AI BOM) with SPDX 3.0&#8221;</strong> - Bennet et al. | <em>Linux Foundation Research (2025)</em></p><p>Extends the Software Bill of Materials concept to AI, including documentation of algorithms, data collection methods, frameworks, licensing, and compliance. 
If you want to know what&#8217;s actually inside the AI system you&#8217;re buying - and what changes when the vendor updates it - this is the direction. Think of it as the AI equivalent of ingredient labelling. &#128196; <a href="https://arxiv.org/abs/2504.16743">arXiv:2504.16743</a></p><p><strong>6. &#8220;AgentFacts: Universal KYA Standard for Verified AI Agent Metadata &amp; Deployment&#8221;</strong> - Grogan | <em>arXiv preprint (2025)</em></p><p>Forward-looking. Proposes a &#8220;Know Your Agent&#8221; standard with cryptographically-signed capability declarations and multi-authority validation. If something like this existed at scale, it would reduce the custom integration friction that makes switching so expensive. We&#8217;re not there yet, but this is the direction things need to go. &#128196; <a href="https://arxiv.org/abs/2506.13794">arXiv:2506.13794</a></p><p><strong>7. &#8220;Consultation Paper on Proposed Guidelines on Third-Party Risk Management&#8221;</strong> - Monetary Authority of Singapore | <em>MAS (March 2026)</em></p><p>And last but not least. MAS just released this in March 2026. Supersedes the old outsourcing guidelines and extends expectations to all third-party arrangements, not just outsourced services. Covers risk assessment, due diligence, contracting, onboarding, ongoing monitoring, and termination. Requires FIs to maintain a register of third-party arrangements, monitor concentration risk, and extend oversight to sub-contractors. AI appears only in a footnote, which refers to the MAS AI Risk Management Guidelines I wrote, but the relevance of these guidelines to AI is clear. &#128279; <a href="https://www.mas.gov.sg/-/media/mas-media-library/publications/consultations/bd/2026/consultation-paper---tprmg.pdf">MAS Consultation Paper</a></p><p>I think this is one of the hardest problems to solve in AI risk management. Because it requires everyone to work together. 
And that&#8217;s really hard today.</p><p>#ThirdPartyAI #AIRiskManagement #VendorManagement #AIGovernance #AIReadingList</p>]]></content:encoded></item><item><title><![CDATA[The AI Governance Tool]]></title><description><![CDATA[A simply boring attempt at making AI governance and risk management accessible]]></description><link>https://www.simplyboring.ai/p/the-ai-governance-tool</link><guid isPermaLink="false">https://www.simplyboring.ai/p/the-ai-governance-tool</guid><dc:creator><![CDATA[Gary Ang (Ming)]]></dc:creator><pubDate>Mon, 23 Mar 2026 13:16:04 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!fKDk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F352bbd43-683b-4ed7-9c13-7a2e305493b0_571x635.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you&#8217;ve served as infantry in the Singapore army, you have probably encountered the protocol of shouting &#8220;gap gap gap&#8221; when you breach a fortification.</p><p>I always thought it was kind of stupid, since shouting it was asking for the bullets to be aimed at you. I always felt I was lucky to move from infantry to be a scout. Scouts never ran towards bullets.</p><p>But I think I understand the need for it more now. Shouting it in the army helps make people aware of the gap that you are moving towards.</p><p>In one of my past weekly reflections on <a href="https://www.linkedin.com/pulse/edge-action-gary-ang-phd-f6vmc">LinkedIn</a>, I said three gaps kept surfacing. The common language gap. The contextualisation gap. The last-mile gap. Same need for awareness.</p><p>Frameworks exist. Guidelines exist. I wrote some of them. But translating frameworks into something relevant for one&#8217;s context is not easy. Especially for individuals and smaller firms.</p><p>Last week, MAS launched the <a href="https://www.mas.gov.sg/schemes-and-initiatives/project-mindforge">MindForge AI Risk Management Toolkit</a>. 
I was deeply involved as it needed to be aligned to the <a href="https://www.mas.gov.sg/-/media/mas-media-library/publications/consultations/bd/2025/final_consultation_paper_on_guidelines_on_ai_risk_management_forrelease.pdf">MAS AI Risk Management Guidelines</a> I wrote. </p><p>The MindForge AI Risk Management Toolkit is an operationalisation handbook with case studies. It&#8217;s an attempt at bridging the messy middle I keep writing about. But I suspect, aside from large financial institutions, it may still be a lot to chew on for everyone else.</p><p>Last week, a friend who leads manpower development also asked me something similar: what about everyone else?</p><blockquote><p><em>This is my early attempt at an answer.</em></p></blockquote><p>GOT, not Game of Thrones, but an AI Governance Tool. </p><p>At <a href="https://govern.simplyboring.ai/">govern.simplyboring.ai</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fKDk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F352bbd43-683b-4ed7-9c13-7a2e305493b0_571x635.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fKDk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F352bbd43-683b-4ed7-9c13-7a2e305493b0_571x635.png 424w, https://substackcdn.com/image/fetch/$s_!fKDk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F352bbd43-683b-4ed7-9c13-7a2e305493b0_571x635.png 848w, https://substackcdn.com/image/fetch/$s_!fKDk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F352bbd43-683b-4ed7-9c13-7a2e305493b0_571x635.png 1272w, 
https://substackcdn.com/image/fetch/$s_!fKDk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F352bbd43-683b-4ed7-9c13-7a2e305493b0_571x635.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fKDk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F352bbd43-683b-4ed7-9c13-7a2e305493b0_571x635.png" width="725" height="806.2609457092819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/352bbd43-683b-4ed7-9c13-7a2e305493b0_571x635.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:635,&quot;width&quot;:571,&quot;resizeWidth&quot;:725,&quot;bytes&quot;:62104,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.simplyboring.ai/i/191861678?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F352bbd43-683b-4ed7-9c13-7a2e305493b0_571x635.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fKDk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F352bbd43-683b-4ed7-9c13-7a2e305493b0_571x635.png 424w, https://substackcdn.com/image/fetch/$s_!fKDk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F352bbd43-683b-4ed7-9c13-7a2e305493b0_571x635.png 848w, 
https://substackcdn.com/image/fetch/$s_!fKDk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F352bbd43-683b-4ed7-9c13-7a2e305493b0_571x635.png 1272w, https://substackcdn.com/image/fetch/$s_!fKDk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F352bbd43-683b-4ed7-9c13-7a2e305493b0_571x635.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Subscribe to my Substack to get keys for more generations.</figcaption></figure></div><p>Describe what you are using AI 
for. Get a tailored governance pack in minutes. The standards that apply to your situation. The controls to put in place. How to implement them. What evidence to prepare.</p><p>Built on public information. The MAS AI Risk Management Guidelines (AIRG) I wrote. The MindForge toolkit that just launched. The US Financial Services AI Risk Management Framework. The EU AI Act. NIST AI RMF. ISO 42001. And other international frameworks. Cross-referenced and connected. And a little of my own perspective.</p><p>The AIRG tells you what to do. MindForge shows how the industry is doing it. GOT asks what you specifically are doing, and tells you what applies to you.</p><p>Not just for the large institution with a full second line (and third line) of defence. For everyone else. Individuals. Small firms.</p><p>It&#8217;s an early prototype. It will almost certainly have errors. It is also a little slow - around 45-60 seconds for each report. And I think it&#8217;s still a little iffy for individuals.</p><p>But I tried to ground it in real frameworks and design it for real situations, for anyone, not just banks.</p><p>Please try it. Tell me what&#8217;s missing. 
Tell me what&#8217;s wrong.</p><p>#Mindforge #NIST #AIRiskManagement #SimplyBoringAI #Grounding #AIRG</p>]]></content:encoded></item><item><title><![CDATA[OpenClaw - Kiasu Version]]></title><description><![CDATA[Kiasu just means scared to lose.]]></description><link>https://www.simplyboring.ai/p/openclaw-kiasu-version</link><guid isPermaLink="false">https://www.simplyboring.ai/p/openclaw-kiasu-version</guid><dc:creator><![CDATA[Gary Ang (Ming)]]></dc:creator><pubDate>Tue, 17 Mar 2026 11:41:46 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!QG05!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1178556c-0bbf-4d28-8621-b99657ce3d2f_1920x1080.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QG05!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1178556c-0bbf-4d28-8621-b99657ce3d2f_1920x1080.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QG05!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1178556c-0bbf-4d28-8621-b99657ce3d2f_1920x1080.jpeg 424w, https://substackcdn.com/image/fetch/$s_!QG05!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1178556c-0bbf-4d28-8621-b99657ce3d2f_1920x1080.jpeg 848w, https://substackcdn.com/image/fetch/$s_!QG05!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1178556c-0bbf-4d28-8621-b99657ce3d2f_1920x1080.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!QG05!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1178556c-0bbf-4d28-8621-b99657ce3d2f_1920x1080.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QG05!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1178556c-0bbf-4d28-8621-b99657ce3d2f_1920x1080.jpeg" width="522" height="293.625" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1178556c-0bbf-4d28-8621-b99657ce3d2f_1920x1080.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:522,&quot;bytes&quot;:187905,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.simplyboring.ai/i/191241008?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1178556c-0bbf-4d28-8621-b99657ce3d2f_1920x1080.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QG05!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1178556c-0bbf-4d28-8621-b99657ce3d2f_1920x1080.jpeg 424w, https://substackcdn.com/image/fetch/$s_!QG05!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1178556c-0bbf-4d28-8621-b99657ce3d2f_1920x1080.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!QG05!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1178556c-0bbf-4d28-8621-b99657ce3d2f_1920x1080.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!QG05!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1178556c-0bbf-4d28-8621-b99657ce3d2f_1920x1080.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>I don&#8217;t like OpenClaw. 
But it&#8217;s pretty hard to resist the urge to try it.</p><p>I think it does demonstrate something interesting about agents - 1) the additional utility when you hook it to a heartbeat to do something at a set frequency, or cron for a scheduled task; 2) the convenience of letting it communicate with you over a messaging app.</p><p>But I really can&#8217;t see what else it offers in addition to, say, Claude Code, which already allows for multi-agent workflows.</p><p>OK, maybe if being exposed to the internet or losing your data is your thing, then I agree, OpenClaw without any additional safeguards is definitely for you.</p><p>OpenClaw&#8217;s issues must be obvious. NVIDIA just announced NemoClaw at GTC - essentially OpenClaw with enterprise-grade security guardrails baked in. NanoClaw, a lightweight alternative, takes a different approach by isolating each agent in its own Docker container.</p><p>And so when I tried it, I made some adjustments to help mitigate these risks. Probably not perfect, but I think I can live with the remaining risks with these controls in place. And I think you can use them for any of the other Claws too.</p><p>So here&#8217;s the short guide on what I did. I won&#8217;t copy and paste all the scripts. Just use my text below as a prompt and I am sure any of the LLMs can help you.</p><h3><strong>The Setup</strong></h3><p>I run it on a DigitalOcean VPS - Ubuntu 24.04, 2 vCPU, 2GB RAM, $12/month, Singapore region. 
I went with a manual setup instead of the 1-Click app because I wanted full control over every layer of the stack.</p><p>The finished stack looks like this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RuYi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d072653-61d5-483b-b41c-eb875cf8c29f_1599x672.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RuYi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d072653-61d5-483b-b41c-eb875cf8c29f_1599x672.png 424w, https://substackcdn.com/image/fetch/$s_!RuYi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d072653-61d5-483b-b41c-eb875cf8c29f_1599x672.png 848w, https://substackcdn.com/image/fetch/$s_!RuYi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d072653-61d5-483b-b41c-eb875cf8c29f_1599x672.png 1272w, https://substackcdn.com/image/fetch/$s_!RuYi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d072653-61d5-483b-b41c-eb875cf8c29f_1599x672.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RuYi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d072653-61d5-483b-b41c-eb875cf8c29f_1599x672.png" width="1456" height="612" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1d072653-61d5-483b-b41c-eb875cf8c29f_1599x672.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:612,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Article content&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Article content" title="Article content" srcset="https://substackcdn.com/image/fetch/$s_!RuYi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d072653-61d5-483b-b41c-eb875cf8c29f_1599x672.png 424w, https://substackcdn.com/image/fetch/$s_!RuYi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d072653-61d5-483b-b41c-eb875cf8c29f_1599x672.png 848w, https://substackcdn.com/image/fetch/$s_!RuYi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d072653-61d5-483b-b41c-eb875cf8c29f_1599x672.png 1272w, https://substackcdn.com/image/fetch/$s_!RuYi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d072653-61d5-483b-b41c-eb875cf8c29f_1599x672.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Only I can access OpenClaw through Tailscale, and I choose the files that it can see using Syncthing</figcaption></figure></div><p>What I use it for: OpenClaw runs 24/7 on the VPS. Every hour it wakes up, checks my task files, and stays silent if nothing is urgent. Every morning it sends me a briefing on Telegram. Claude Code runs locally for deep work and pushes outputs into a shared AGENT-INBOX folder. OpenClaw picks them up.</p><p>Total cost: $12/month for the server. OpenClaw lets you plug in your own API key. There&#8217;s an option for using your existing Claude key that will come up once you run the OpenClaw onboarding.</p><p>Now, on to the controls.</p><h2><strong>Control 1: Server Hardening</strong></h2><p>Out of the box, a VPS is often exposed. Root login enabled, password authentication on, no firewall in front of any port. The first thing to do is reduce the attack surface.</p><p>Every service you run is an attack surface. 
Make each one as small as possible, and add friction at every layer so that even if one layer fails, the next one holds.</p><p>The key steps: update everything, create a non-root user (don&#8217;t run everything as root, the Linux equivalent of Administrator), disable password authentication so only SSH keys can log in, and disable root login over SSH entirely.</p><p>I won&#8217;t paste every command here - ask Claude or ChatGPT to give you the steps for hardening an Ubuntu 24.04 server and it&#8217;ll give you the full thing.</p><h2><strong>Control 2: Firewall and Fail2ban</strong></h2><p>Two layers. UFW (Uncomplicated Firewall) controls who can knock on your door at all - deny all incoming by default, then explicitly allow only what you need. Fail2ban monitors your login logs and automatically bans IPs that fail authentication too many times.</p><p>One thing to watch out for: enable UFW <em>after</em> you allow SSH, not before. Get the order wrong and you can lock yourself out of your own server.</p><p>But even with both of these, your SSH port is still visible to the public internet. Bots still find it, still try it. Fail2ban is a mitigation, not a solution.</p><h2><strong>Control 3: Tailscale - Making the Server Invisible</strong></h2><p>What if the SSH port wasn&#8217;t visible at all?</p><p>Tailscale creates a private encrypted network across the internet - but only devices you authenticate can join. Think of it as a private LAN that ignores physical location. Every device gets a private IP in the 100.x.x.x range (Tailscale uses the 100.64.0.0/10 block). Your VPS gets one. Your laptop gets one. They talk through an encrypted WireGuard tunnel regardless of where they physically are.</p><p>The key part: your VPS&#8217;s <em>public</em> IP can have all ports closed. Tailscale establishes peer-to-peer connections through NAT traversal - no open inbound ports required. 
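</p><p>A minimal sketch of getting the VPS onto that private network, assuming Tailscale&#8217;s official install script and default settings. Your own steps may differ, so treat this as a starting point rather than the exact commands I ran:</p><pre><code><code># Install Tailscale using the official convenience script
curl -fsSL https://tailscale.com/install.sh | sh

# Join your tailnet; this prints a login URL to open in a browser
sudo tailscale up

# Show the private Tailscale IPv4 address assigned to this machine
tailscale ip -4</code></code></pre><p>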
Install it on the VPS, install it on your laptop, log in with the same account - done.</p><p>Once Tailscale is working, update your UFW rules to allow SSH only on the Tailscale interface, and remove the public SSH rule. After this, run a port scan on your VPS&#8217;s public IP. You should see no open ports. The server is invisible. Fail2ban becomes a backup layer rather than the first line of defence.</p><p>As always, test SSH via the Tailscale IP in a second terminal before removing public access.</p><h3><strong>Installing OpenClaw</strong></h3><p>With the controls in place, this is the straightforward part.</p><pre><code><code>curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -</code></code></pre><pre><code><code>sudo apt install -y nodejs</code></code></pre><pre><code><code>sudo npm install -g openclaw</code></code></pre><pre><code><code>openclaw onboard</code></code></pre><p>For NemoClaw, it will probably look like this (based on NVIDIA&#8217;s instructions):</p><pre><code><code>curl -fsSL https://nvidia.com/nemoclaw.sh | bash</code></code></pre><pre><code><code>nemoclaw onboard</code></code></pre><p>The setup wizard walks you through the AI provider (I use Anthropic), API key, and gateway token. It installs OpenClaw as a <strong>user-level systemd service</strong> - not a system-level one. This matters:</p><p>This works:</p><pre><code><code>systemctl --user restart openclaw-gateway</code></code></pre><p>This silently fails:</p><pre><code><code>systemctl restart openclaw-gateway</code></code></pre><p>Every OpenClaw service command needs --user. I burned time on this.</p><h3><strong>The dashboard</strong></h3><p>OpenClaw has a web dashboard. 
It binds to <strong>localhost</strong> only, so you access it via an SSH tunnel:</p><pre><code><code>openclaw dashboard    # prints URL with token</code></code></pre><pre><code><code>ssh -N -L 18789:127.0.0.1:18789 openclaw@YOUR_TAILSCALE_IP</code></code></pre><p>Then open the <strong>full URL</strong> including the #token= fragment in your browser.</p><p><strong>Don&#8217;t open </strong>http://localhost:18789/<strong> bare.</strong> The #token= fragment is the authentication handshake. Without it, the gateway sees an unapproved device and shows &#8220;pairing required.&#8221; The dashboard looks like it loaded - it hasn&#8217;t connected. Use the full URL from openclaw dashboard.</p><h2><strong>Syncthing - The Sync Layer</strong></h2><p>OpenClaw on the VPS needs a way to receive tasks and send outputs back. I use Syncthing - open-source, peer-to-peer file sync, no cloud intermediary.</p><p>The synced folder (AGENT-INBOX) lives inside my Obsidian vault on Windows and mirrors to the remote VPS. I write tasks in Obsidian and they appear on the VPS within seconds. OpenClaw writes outputs on the VPS and they appear in Obsidian within seconds.</p><p>One thing to watch out for: install Syncthing from its official apt repository, not Ubuntu&#8217;s default packages - the default version is outdated enough to cause silent sync failures when paired with a newer Windows client.</p><h2><strong>The Heartbeat</strong></h2><p>The heartbeat is the reason I tried OpenClaw in the first place. 
It&#8217;s the cron-like behavior - wake up on a schedule, check your files, act without being asked.</p><pre><code><code>openclaw config set agents.defaults.heartbeat.every 1h</code></code></pre><pre><code><code>openclaw config set agents.defaults.heartbeat.target last</code></code></pre><pre><code><code>openclaw config set agents.defaults.heartbeat.activeHours.start "06:00"</code></code></pre><pre><code><code>openclaw config set agents.defaults.heartbeat.activeHours.end "01:00"</code></code></pre><pre><code><code>openclaw config set env.vars.TZ "Asia/Singapore"</code></code></pre><p>Note: use the full dotpath agents.defaults.heartbeat.every, not just heartbeat.every. The docs show JSON structure, the CLI uses dotpaths.</p><p>If OpenClaw finds nothing urgent, it stays silent. No Telegram noise. Only substantive findings surface. This is controlled by a HEARTBEAT.md in your workspace - a checklist the agent follows on every run.</p><p>One important step after setup: delete the BOOTSTRAP.md file in your workspace. It&#8217;s OpenClaw&#8217;s first-run onboarding wizard that blocks the agent on every gateway start until a human responds. Once you&#8217;ve completed setup, remove it or the agent stays stuck in bootstrapping state indefinitely.</p><h2><strong>The Security Audit</strong></h2><p>OpenClaw has a built-in security audit you can run with <em>openclaw security audit --deep</em>. Worth running after setup to confirm your controls are in place.</p><p>The most important thing it tells you is the trust model. OpenClaw operates as a &#8220;personal assistant&#8221; - single user, single trusted operator. It&#8217;s not designed for multiple users sharing one gateway. 
If you&#8217;re thinking about running this for a team or clients, that&#8217;s the line that should make you pause.</p><h2><strong>Is It Worth It?</strong></h2><p>Honestly, I&#8217;m still not sure.</p><p>The heartbeat and the Telegram integration are genuinely useful - having an agent check on things hourly and surface only what matters is different from having a chatbot you talk to. And the data sovereignty matters to me.</p><p>I&#8217;ll keep running OpenClaw for a while to see if it sticks. See if the habit of an always-on agent changes how I work. But I wouldn&#8217;t tell anyone to just run it.</p><p>If I were starting today, I&#8217;d probably look at NanoClaw first - it&#8217;s ~4,000 lines of code versus OpenClaw&#8217;s more than 400,000, and each agent gets its own isolated container. NemoClaw is worth trying too, especially if you&#8217;re in an enterprise context. Both are open source.</p><p>But the manual controls I described above? They&#8217;d apply to any of these. Server hardening, Tailscale, firewall rules - none of that is OpenClaw-specific.</p><p>If you&#8217;ve tried self-hosting OpenClaw (or variants such as NanoClaw), what additional controls did you put in place? 
Also, what else do you find useful about it (beyond, say, Claude Code or Codex) that warrants the effort?</p><p>#AI #AgenticAI #Security #OpenClaw #NemoClaw</p>]]></content:encoded></item><item><title><![CDATA[From Contrastive Learning to World Models]]></title><description><![CDATA[Yann LeCun&#8217;s AMI Labs raising ~$1bn in their seed round and having one of their key bases in Singapore just made the news.]]></description><link>https://www.simplyboring.ai/p/from-contrastive-learning-to-world</link><guid isPermaLink="false">https://www.simplyboring.ai/p/from-contrastive-learning-to-world</guid><dc:creator><![CDATA[Gary Ang (Ming)]]></dc:creator><pubDate>Wed, 11 Mar 2026 14:30:21 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!swTk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93f92205-59a6-49be-8672-9820b79a9dae_2000x1600.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Yann LeCun&#8217;s AMI Labs raising ~$1bn in their seed round and having one of their key bases in Singapore just made the news. $1bn for a seed round is kind of insane. AI startup seed rounds are more like Series Zs these days.<br><br>That aside, this news brought back some memories of an older technique from LeCun&#8217;s lab, called VICReg, that inspired one of my papers. It also led me to reread his JEPA series. 
<br><br>Today's diptych is about that connection.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!swTk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93f92205-59a6-49be-8672-9820b79a9dae_2000x1600.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!swTk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93f92205-59a6-49be-8672-9820b79a9dae_2000x1600.jpeg 424w, https://substackcdn.com/image/fetch/$s_!swTk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93f92205-59a6-49be-8672-9820b79a9dae_2000x1600.jpeg 848w, https://substackcdn.com/image/fetch/$s_!swTk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93f92205-59a6-49be-8672-9820b79a9dae_2000x1600.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!swTk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93f92205-59a6-49be-8672-9820b79a9dae_2000x1600.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!swTk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93f92205-59a6-49be-8672-9820b79a9dae_2000x1600.jpeg" width="1456" height="1165" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/93f92205-59a6-49be-8672-9820b79a9dae_2000x1600.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1165,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:459458,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.simplyboring.ai/i/190623557?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93f92205-59a6-49be-8672-9820b79a9dae_2000x1600.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!swTk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93f92205-59a6-49be-8672-9820b79a9dae_2000x1600.jpeg 424w, https://substackcdn.com/image/fetch/$s_!swTk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93f92205-59a6-49be-8672-9820b79a9dae_2000x1600.jpeg 848w, https://substackcdn.com/image/fetch/$s_!swTk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93f92205-59a6-49be-8672-9820b79a9dae_2000x1600.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!swTk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93f92205-59a6-49be-8672-9820b79a9dae_2000x1600.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><strong>Left Panel: VICReg</strong> <br><br>Self-supervised training in AI just means training without labels. We use it in LLMs: pretraining a model by hiding parts of the input and getting it to predict the hidden parts. One issue in self-supervised learning is collapse. AI likes to find shortcuts. And when it does, every input ends up with the same representation. <br><br>Most methods fight this with training or architectural tricks. VICReg uses just three interesting loss terms:<br>&#10145;&#65039; Variance so that there are always some differences between the representations it learns. <br>&#10145;&#65039; Invariance so that different views of the same input give similar representations.<br>&#10145;&#65039; Covariance so that each dimension learns something different. <br><br>Invariance is the main learning objective. 
Variance + covariance are the regularisers that prevent it from collapsing. <br><br>I liked VICReg&#8217;s idea. So I added a twist for a paper I published on a model called DynMIX. I fed it two fundamentally different views of the same company - explicit (constructed) and implicit (learnt from data). Another twist. VICReg's variance target is static. For financial data, that made no sense. Markets are volatile so variance can&#8217;t be static. So I made the target dynamic, based on variances of stock market returns. <br><br><strong>Right Panel: JEPA</strong> <br><br>The core of the paper by Yann LeCun - JEPA - takes a different approach to the same problem. Instead of just learning representations, it predicts them. In AI, we usually predict something real - a label (dog, cat), or a number (stock price). JEPA predicts the underlying representation instead. Think of it as the model learning to imagine what something looks like in its own internal language, before it's seen in the real world.<br><br>Three components: <br>&#10145;&#65039; Context encoder: encodes what's visible into a representation <br>&#10145;&#65039; Target encoder: encodes what's hidden into a representation (this IS the label) <br>&#10145;&#65039; Predictor: tries to predict the hidden from the visible<br><br>So what does this unlock? Once representations of one modality (e.g., videos) are learnt, the next stage adds actions. A second model takes the video frame embeddings interleaved with a robot arm's movement commands, and predicts the next embedding. Planning then becomes: imagine many possible sequences of moves, simulate each one in the model's head, and pick the one most likely to reach the goal. Like a chess player thinking several moves ahead, but the "board" is the model's own internal understanding of the world.<br><br>Interesting times ahead. 
If AMI Labs succeeds.<br><br>#RepresentationLearning #JEPA #VICReg #YannLeCun #AIResearch</p>]]></content:encoded></item><item><title><![CDATA[Three Speeds of AI Adoption]]></title><description><![CDATA[And three rooms]]></description><link>https://www.simplyboring.ai/p/three-speeds-of-ai-adoption</link><guid isPermaLink="false">https://www.simplyboring.ai/p/three-speeds-of-ai-adoption</guid><dc:creator><![CDATA[Gary Ang (Ming)]]></dc:creator><pubDate>Wed, 04 Mar 2026 15:45:37 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!xxRi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d01878a-3325-481f-bcfb-0c17d724ca19_598x339.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p><em>Fire people, stock goes up. Beat expectations, stock goes down.</em></p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xxRi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d01878a-3325-481f-bcfb-0c17d724ca19_598x339.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xxRi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d01878a-3325-481f-bcfb-0c17d724ca19_598x339.png 424w, https://substackcdn.com/image/fetch/$s_!xxRi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d01878a-3325-481f-bcfb-0c17d724ca19_598x339.png 848w, https://substackcdn.com/image/fetch/$s_!xxRi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d01878a-3325-481f-bcfb-0c17d724ca19_598x339.png 1272w, 
https://substackcdn.com/image/fetch/$s_!xxRi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d01878a-3325-481f-bcfb-0c17d724ca19_598x339.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xxRi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d01878a-3325-481f-bcfb-0c17d724ca19_598x339.png" width="496" height="281.17725752508363" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8d01878a-3325-481f-bcfb-0c17d724ca19_598x339.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:339,&quot;width&quot;:598,&quot;resizeWidth&quot;:496,&quot;bytes&quot;:65998,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.simplyboring.ai/i/189888020?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d01878a-3325-481f-bcfb-0c17d724ca19_598x339.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xxRi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d01878a-3325-481f-bcfb-0c17d724ca19_598x339.png 424w, https://substackcdn.com/image/fetch/$s_!xxRi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d01878a-3325-481f-bcfb-0c17d724ca19_598x339.png 848w, 
https://substackcdn.com/image/fetch/$s_!xxRi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d01878a-3325-481f-bcfb-0c17d724ca19_598x339.png 1272w, https://substackcdn.com/image/fetch/$s_!xxRi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d01878a-3325-481f-bcfb-0c17d724ca19_598x339.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">I thought this was both funny and sad.</figcaption></figure></div><p>I started writing about this last week. 
But with the Iran war going on now, this feels almost quaint.</p><p>Predictions about a <a href="https://www.citriniresearch.com/p/2028gic">doomsday scenario where AI agents collapse the job market by 2028</a>. Headlines about Dorsey&#8217;s <a href="https://www.reuters.com/business/blocks-fourth-quarter-profit-rises-announces-over-4000-job-cuts-2026-02-26/">Block cutting over 4,000 roles</a> - over 40% of its workforce. Gone, just like that. His excuse was AI, as always. Block&#8217;s stock price shot up. And then the irony - <a href="https://www.cnbc.com/2026/02/26/nvidia-nvda-stock-price-q4-earnings.html">Nvidia beat expectations</a>, revenue up ~70%, guidance ahead of even the most bullish estimates. The stock dropped 5.5%. <a href="https://www.tradingkey.com/analysis/stocks/us-stocks/261628049-nvidia-nvda-earnings-q4-stock-price-investors-value-tradingkey">$260 billion in market value erased overnight</a>.</p><p>Fire people, stock goes up. Beat expectations, stock goes down. That&#8217;s the weird logic of markets right now. And I think it mirrors the weird logic of the AI debate. The narrative has decoupled from the evidence. What you believe matters more than what you can show.</p><p>Let me compare three rooms.</p><h2>Three Rooms</h2><p><strong>Room one.</strong> Block&#8217;s numbers. An internal AI agent called Goose. One engineer says 90% of his code is now written by it. Non-technical teams writing SQL queries and closing tickets. Revenue per employee doubled.</p><p><strong>Room two.</strong> A classroom. Students from an organization that will exist in fifty years regardless of what happens in AI. What were they learning? How to write a prompt. How to get an image out of a model. In 2026. They weren&#8217;t behind because they were slow. They were behind because there&#8217;s a chasm between what frontier labs ship and what most organizations can absorb.</p><p><strong>Room three.</strong> Trading professionals at a talk I gave. 
They&#8217;d read the doom scenarios. They wanted to know - should I be worried about my career? They weren&#8217;t panicking. But they weren&#8217;t dismissing it either. Sitting in the uncertainty, looking for clarity.</p><p>I tried to answer, but I thought my answer was lacking. So I tried to do better with this article.</p><p>Why do I think a piece like Citrini and Shah&#8217;s <a href="https://www.citriniresearch.com/p/2028gic">&#8220;The 2028 Global Intelligence Crisis&#8221;</a> is science fiction designed to go viral? Because of three speeds that may not be moving in lockstep.</p><h2>The Three Speeds</h2><p>I think all of these pieces assume uniformity in scaling and adoption. But I think there are three speeds that matter, perhaps even more.</p><p>METR - Model Evaluation &amp; Threat Research - is a nonprofit that evaluates AI capabilities. Their time horizon chart - arguably the most cited graphic in AI right now - <a href="https://www.technologyreview.com/2026/02/05/1132254/this-is-the-most-misunderstood-graph-in-ai/">has been called</a> &#8220;the most misunderstood graph in AI.&#8221; It shows AI capability doubling every few months. Sequoia used it to declare &#8220;2026: This is AGI.&#8221;</p><p>What that chart actually says about each speed:</p><p><strong>Speed of Capability.</strong> What AI can actually do. Far fewer people saw <a href="https://metr.org/notes/2026-01-22-time-horizon-limitations/">METR&#8217;s own limitations page</a>. More than 10 disclaimers about what their research is not. The researcher writes plainly: they have &#8220;no idea whether Claude&#8217;s &#8216;true&#8217; time horizon is 3.5h or 6.5h.&#8221; Their <a href="https://metr.org/blog/2025-08-12-research-update-towards-reconciling-slowdown-with-time-horizons/">code quality research</a>: AI-generated code passed 38% of tests but &#8220;none of them are mergeable as-is.&#8221; But you don&#8217;t need METR to tell you this. 
When was the last time you spotted your LLM spouting nonsense? Less common than in 2023, but definitely not zero.</p><p><strong>Speed of Adoption.</strong> What organizations actually change. You might remember the discredited MIT study claiming 95% of AI pilots fail - the methodology didn&#8217;t hold up. So I went looking at what has happened since. <a href="https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai">McKinsey&#8217;s 2025 State of AI survey</a> at the end of 2025: 88% of companies say they use AI, but only a third have begun to scale it. A <a href="https://www.techrepublic.com/article/ai-adoption-trends-enterprise/">survey of 120,000+ enterprise respondents</a> from March 2025 to January 2026 reported that nearly two-thirds have no formalized AI initiative at all. But you don&#8217;t need McKinsey to tell you this. Look around your own workplace. How many AI pilots have started? How many are in production?</p><p><strong>Speed of Belief.</strong> What people <em>think</em> AI can do. The chart made it into every investment deck. The caveats didn&#8217;t. Same with Citrini&#8217;s doom scenario - it <a href="https://www.bloomberg.com/news/articles/2026-02-24/citrini-founder-shocked-his-ai-prediction-spurred-stocks-selloff">spooked actual markets</a> before anyone could verify the underlying technical assumptions. Same with Block - the headline landed, and every manager with a budget started rethinking next year&#8217;s headcount because their boss could be like Dorsey. Doesn&#8217;t matter if their company has nothing remotely comparable to Goose. Charts travel faster than FAQs with caveats on limitations. And I am sure you know how dangerous mistaken beliefs in large organizations can be. When&#8217;s the last time you spotted a senior person saying something totally wrong, but nobody correcting him or her?</p><h2>The Underlying Issue</h2><p>The doom scenario by Citrini and Shah needs all three speeds to converge. 
Capability, belief, and adoption in lockstep.</p><p>But the problem is, belief doesn&#8217;t need the other two.</p><p>Block just showed that AI can replace headcount. At least in its specific context. But the headline doesn&#8217;t come with that caveat. So what happens next? A manager somewhere reads it and quietly decides not to backfill that open role. A team that was supposed to grow just doesn&#8217;t. None of that requires AI to actually do the work. It just requires someone with the power to believe it could.</p><p>Back to 2008. Not the mechanics - those are different. The underlying issue with such beliefs.</p><p>In 2008, banks sold complex structured products to people who had no ability to understand what they were buying. CDOs priced by models that the sellers themselves barely understood. The belief in the models was enough to move trillions. The mom-and-pop investors were the ones left holding the bag when reality caught up.</p><p>I think something similar is happening now. Not with financial products, but with people&#8217;s livelihoods. Block fires 4,000 people and the stock jumps. The market cheers. The narrative is: AI made it possible. But how much of that is demonstrated capability, and how much is a belief about capability that hasn&#8217;t been stress-tested outside one company&#8217;s very specific context? And perhaps unproven, even for that company.</p><p>It was unconscionable then to sell instruments people couldn&#8217;t understand to people who couldn&#8217;t afford to lose. I think it&#8217;s unconscionable now to restructure people&#8217;s lives based on a chart that its own creators say they can&#8217;t fully trust, and a narrative that outpaces the evidence by miles.</p><p>The direction is right. AI will displace some work. But the timeline assumes a uniformity that doesn&#8217;t match what I&#8217;m seeing in real rooms. 
And the human cost of getting the timeline wrong - of letting belief run ahead of reality - is not a rounding error. It&#8217;s people.</p><p>#AI #FutureOfWork #AIAdoption #AIRisk #Leadership</p>]]></content:encoded></item><item><title><![CDATA[Learning risk management again, because of AI]]></title><description><![CDATA[A FIX NextGen meeting at BlackRock.]]></description><link>https://www.simplyboring.ai/p/learning-risk-management-again-because</link><guid isPermaLink="false">https://www.simplyboring.ai/p/learning-risk-management-again-because</guid><dc:creator><![CDATA[Gary Ang (Ming)]]></dc:creator><pubDate>Mon, 02 Mar 2026 14:27:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!aq50!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecd3bf1f-eeeb-4069-8e96-305293a4773f_1187x723.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aq50!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecd3bf1f-eeeb-4069-8e96-305293a4773f_1187x723.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aq50!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecd3bf1f-eeeb-4069-8e96-305293a4773f_1187x723.jpeg 424w, https://substackcdn.com/image/fetch/$s_!aq50!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecd3bf1f-eeeb-4069-8e96-305293a4773f_1187x723.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!aq50!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecd3bf1f-eeeb-4069-8e96-305293a4773f_1187x723.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!aq50!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecd3bf1f-eeeb-4069-8e96-305293a4773f_1187x723.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aq50!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecd3bf1f-eeeb-4069-8e96-305293a4773f_1187x723.jpeg" width="1187" height="723" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ecd3bf1f-eeeb-4069-8e96-305293a4773f_1187x723.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:723,&quot;width&quot;:1187,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:116845,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.simplyboring.ai/i/189655106?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecd3bf1f-eeeb-4069-8e96-305293a4773f_1187x723.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!aq50!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecd3bf1f-eeeb-4069-8e96-305293a4773f_1187x723.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!aq50!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecd3bf1f-eeeb-4069-8e96-305293a4773f_1187x723.jpeg 848w, https://substackcdn.com/image/fetch/$s_!aq50!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecd3bf1f-eeeb-4069-8e96-305293a4773f_1187x723.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!aq50!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecd3bf1f-eeeb-4069-8e96-305293a4773f_1187x723.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A FIX NextGen meeting at BlackRock. Folks who live and breathe the markets. And then there&#8217;s me.</p><p>I started by owning up. &#8220;I am a bureaucrat.&#8221; But shared my random walk. From engineering to art policy, Basel regulation to investment risk, a PhD in AI, then AI risk supervision. Messy life. Messy research. I wasn&#8217;t sure what they would be interested in, so I just laid it all out.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0ea3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0f9e5ef-70b9-4938-b7b2-53aa0d4da4c3_632x350.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0ea3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0f9e5ef-70b9-4938-b7b2-53aa0d4da4c3_632x350.png 424w, https://substackcdn.com/image/fetch/$s_!0ea3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0f9e5ef-70b9-4938-b7b2-53aa0d4da4c3_632x350.png 848w, https://substackcdn.com/image/fetch/$s_!0ea3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0f9e5ef-70b9-4938-b7b2-53aa0d4da4c3_632x350.png 1272w, https://substackcdn.com/image/fetch/$s_!0ea3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0f9e5ef-70b9-4938-b7b2-53aa0d4da4c3_632x350.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!0ea3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0f9e5ef-70b9-4938-b7b2-53aa0d4da4c3_632x350.png" width="632" height="350" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d0f9e5ef-70b9-4938-b7b2-53aa0d4da4c3_632x350.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:350,&quot;width&quot;:632,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!0ea3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0f9e5ef-70b9-4938-b7b2-53aa0d4da4c3_632x350.png 424w, https://substackcdn.com/image/fetch/$s_!0ea3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0f9e5ef-70b9-4938-b7b2-53aa0d4da4c3_632x350.png 848w, https://substackcdn.com/image/fetch/$s_!0ea3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0f9e5ef-70b9-4938-b7b2-53aa0d4da4c3_632x350.png 1272w, https://substackcdn.com/image/fetch/$s_!0ea3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0f9e5ef-70b9-4938-b7b2-53aa0d4da4c3_632x350.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft 
icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>As for the talk, I took them on a journey. From the Gaussian copula that contributed to the GFC, to the questions we&#8217;re tackling now with large language models.</p><p>That line (from the GFC to AI today) may seem strange. But it is a very clear one.</p><h2><strong>The Formula That Broke Finance</strong></h2><p>A single model. The single factor Gaussian copula. Elegant, tractable, widely adopted for pricing CDOs. But totally unrealistic. 
In fact, I would say the decision to use it to model CDOs was bordering on criminal.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!n-NF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3192a5da-a308-4c65-9b5b-afaa79035022_630x346.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!n-NF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3192a5da-a308-4c65-9b5b-afaa79035022_630x346.png 424w, https://substackcdn.com/image/fetch/$s_!n-NF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3192a5da-a308-4c65-9b5b-afaa79035022_630x346.png 848w, https://substackcdn.com/image/fetch/$s_!n-NF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3192a5da-a308-4c65-9b5b-afaa79035022_630x346.png 1272w, https://substackcdn.com/image/fetch/$s_!n-NF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3192a5da-a308-4c65-9b5b-afaa79035022_630x346.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!n-NF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3192a5da-a308-4c65-9b5b-afaa79035022_630x346.png" width="630" height="346" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3192a5da-a308-4c65-9b5b-afaa79035022_630x346.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:346,&quot;width&quot;:630,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!n-NF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3192a5da-a308-4c65-9b5b-afaa79035022_630x346.png 424w, https://substackcdn.com/image/fetch/$s_!n-NF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3192a5da-a308-4c65-9b5b-afaa79035022_630x346.png 848w, https://substackcdn.com/image/fetch/$s_!n-NF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3192a5da-a308-4c65-9b5b-afaa79035022_630x346.png 1272w, https://substackcdn.com/image/fetch/$s_!n-NF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3192a5da-a308-4c65-9b5b-afaa79035022_630x346.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>But even then, the GFC would not have happened if not for the fact that nobody was watching these complex things, nobody fully understood them, and nobody was clearly accountable.</p><p>In the aftermath: US&#8217; SR 11-7. Model risk management requirements. The predecessor of UK&#8217;s SS 1/23 and Canada&#8217;s E-23. Singapore&#8217;s AI Risk Management Guidelines (AIRG) also shares a lot with these guidelines. I wrote that last piece.</p><h2><strong>Complexity Squared</strong></h2><p>Here&#8217;s how the problem has changed.</p><p><em><strong>2008: One model. Too simple. Trillions at stake. Nobody understood it.</strong></em></p><p><em><strong>2025: Way more than one model. Complexity&#178;. Everyone has an opinion.</strong></em></p><p>Different. But also recognizable.</p><p>That&#8217;s the risk. Things look different enough to seem like a new problem. They&#8217;re not entirely. 
But the new parts matter.</p><h2><strong>3 U&#8217;s</strong></h2><p>What&#8217;s actually new - or at least amplified - in AI models.</p><p><strong>Uncertainty</strong>. All models have it. How confident is the model in its output? Two flavors. Irreducible - natural randomness you can&#8217;t eliminate. Reducible - gaps in knowledge that more data or better models can close.</p><p><strong>Unexpectedness</strong>. Some AI models exhibit behaviors nobody designed. Emergent capabilities. Gaming the system. Adversarial vulnerabilities. Hidden biases. Misalignment.</p><p><strong>Unexplainability</strong>. The degree to which we can explain AI decisions varies. Transparency, explainability, interpretability - not the same thing, and even combined, they don&#8217;t guarantee understanding.</p><p>These three make AI a harder risk management problem. Not a different one. Harder.</p><h2><strong>The Wicked Domain (and Then Some)</strong></h2><p>Toward the end, I used a framing from David Epstein&#8217;s Range.</p><p>Kind domains - chess, music - have clear feedback. Deliberate practice compounds. The 10,000-hour idea in Outliers by Malcolm Gladwell works here. Depth helps here.</p><p>Wicked domains - medicine, finance - have delayed feedback, unclear rules, expertise that doesn&#8217;t transfer cleanly. Breadth sometimes helps here.</p><p>AI risk management sits firmly in wicked territory. But I think it might be something worse. An evil domain. The feedback is ambiguous, the rules keep shifting, and the expertise required keeps branching.</p><p>Not I-shaped for depth. Not T-shaped for breadth plus a bit of depth. More like a banyan tree. Multiple deep roots spreading from the same trunk. 
Depth across AI, governance, legal, model, technology, human factors - requiring real depth in each.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IOBA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a7bfffa-ab97-415f-8307-8fd76cf3d0fd_631x353.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IOBA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a7bfffa-ab97-415f-8307-8fd76cf3d0fd_631x353.png 424w, https://substackcdn.com/image/fetch/$s_!IOBA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a7bfffa-ab97-415f-8307-8fd76cf3d0fd_631x353.png 848w, https://substackcdn.com/image/fetch/$s_!IOBA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a7bfffa-ab97-415f-8307-8fd76cf3d0fd_631x353.png 1272w, https://substackcdn.com/image/fetch/$s_!IOBA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a7bfffa-ab97-415f-8307-8fd76cf3d0fd_631x353.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IOBA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a7bfffa-ab97-415f-8307-8fd76cf3d0fd_631x353.png" width="631" height="353" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2a7bfffa-ab97-415f-8307-8fd76cf3d0fd_631x353.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:353,&quot;width&quot;:631,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!IOBA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a7bfffa-ab97-415f-8307-8fd76cf3d0fd_631x353.png 424w, https://substackcdn.com/image/fetch/$s_!IOBA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a7bfffa-ab97-415f-8307-8fd76cf3d0fd_631x353.png 848w, https://substackcdn.com/image/fetch/$s_!IOBA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a7bfffa-ab97-415f-8307-8fd76cf3d0fd_631x353.png 1272w, https://substackcdn.com/image/fetch/$s_!IOBA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a7bfffa-ab97-415f-8307-8fd76cf3d0fd_631x353.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2><strong>Same Questions, New Complexity</strong></h2><p>Across my messy career - Basel policy, quant model risk, investment risk, AI supervision - three questions kept reappearing.</p><p><strong>What&#8217;s at risk? </strong>You can&#8217;t manage what you haven&#8217;t found. This means discovering where AI exists in your institution and profiling how risky each system actually is.</p><p><strong>How do we manage it?</strong> Once you know what&#8217;s at risk, you need controls in the right places, and ways to check if those controls are working.</p><p><strong>Who&#8217;s accountable?</strong> Controls without owners become theatre. Someone has to own the risk, and the organization needs capability to sustain it.</p><p>Basically - find it, control it, own it.</p><p>As I wrote before. These aren&#8217;t new questions. They&#8217;re the same questions risk management in finance has been asking for decades. SR 11-7 answered them for model risk. 
AIRG is answering them again, in a harder context.</p><p>Normal technology. Normal systems. Normal risk questions.</p><p>At the end of the session there were some useful questions. My answers to most of those are in FIX&#8217;s post <strong><a href="https://www.linkedin.com/posts/fixapac_fixapac-ai-fintech-activity-7434105923352326144-vxI8?utm_source=social_share_send&amp;utm_medium=member_desktop_web&amp;rcm=ACoAAAnEwqsBmNv-udZ8tKaEG_MQGlUiz7C_KAg">here</a></strong>.</p><p>But there was one that stuck with me. About how real Citrini&#8217;s dystopian AI narrative was. I wrote about it last week <strong><a href="https://www.linkedin.com/posts/garyang_ai-futureofwork-airisk-activity-7431661800175349760-P6cC?utm_source=social_share_send&amp;utm_medium=member_desktop_web&amp;rcm=ACoAAAnEwqsBmNv-udZ8tKaEG_MQGlUiz7C_KAg">here</a></strong>. But more thoughts came to mind because of the question. Will do another post in greater detail later.</p><p>#AI #AIRiskManagement #ModelRisk #FIXTradingCommunity #FinTech #NextGen</p>]]></content:encoded></item><item><title><![CDATA[A free 180 page ebook on AI Agents for Investing with code]]></title><description><![CDATA[But first, how this book was built, working with AI]]></description><link>https://www.simplyboring.ai/p/a-free-180-page-ebook-on-ai-agents</link><guid isPermaLink="false">https://www.simplyboring.ai/p/a-free-180-page-ebook-on-ai-agents</guid><dc:creator><![CDATA[Gary Ang (Ming)]]></dc:creator><pubDate>Tue, 24 Feb 2026 11:19:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!UJp8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4828e1af-b240-492f-9f40-e70e990f9698_1410x2250.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!UJp8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4828e1af-b240-492f-9f40-e70e990f9698_1410x2250.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UJp8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4828e1af-b240-492f-9f40-e70e990f9698_1410x2250.png 424w, https://substackcdn.com/image/fetch/$s_!UJp8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4828e1af-b240-492f-9f40-e70e990f9698_1410x2250.png 848w, https://substackcdn.com/image/fetch/$s_!UJp8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4828e1af-b240-492f-9f40-e70e990f9698_1410x2250.png 1272w, https://substackcdn.com/image/fetch/$s_!UJp8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4828e1af-b240-492f-9f40-e70e990f9698_1410x2250.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UJp8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4828e1af-b240-492f-9f40-e70e990f9698_1410x2250.png" width="304" height="485.1063829787234" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4828e1af-b240-492f-9f40-e70e990f9698_1410x2250.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2250,&quot;width&quot;:1410,&quot;resizeWidth&quot;:304,&quot;bytes&quot;:2472576,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.simplyboring.ai/i/189004038?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4828e1af-b240-492f-9f40-e70e990f9698_1410x2250.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UJp8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4828e1af-b240-492f-9f40-e70e990f9698_1410x2250.png 424w, https://substackcdn.com/image/fetch/$s_!UJp8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4828e1af-b240-492f-9f40-e70e990f9698_1410x2250.png 848w, https://substackcdn.com/image/fetch/$s_!UJp8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4828e1af-b240-492f-9f40-e70e990f9698_1410x2250.png 1272w, https://substackcdn.com/image/fetch/$s_!UJp8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4828e1af-b240-492f-9f40-e70e990f9698_1410x2250.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The cover</figcaption></figure></div><p>When I shared an early draft with a friend, his first question was:</p><blockquote><p><em>&#8220;How much of this was AI?&#8221;</em></p></blockquote><p>For a while, it made me a little sensitive. Even made me question if I should share this book.</p><p>Not because the question was unfair. It&#8217;s a reasonable thing to ask. It&#8217;s almost automatic to question the origins of any work these days.</p><p>But because it implied something I wasn&#8217;t sure how to answer cleanly. </p><p>If I said &#8220;a lot,&#8221; it sounds like the book isn&#8217;t mine. And perhaps just slop. </p><p>If I said &#8220;not much,&#8221; that&#8217;s really not honest either.</p><p>So let me just show you. 
Full transparency.</p><p><em>(Just scroll to the end if you want to skip all this.)</em></p><h2><strong>AI or me?</strong></h2><p>This book is part me, part AI. And I want to show you exactly where the line is.</p><h3><strong>What&#8217;s mine</strong></h3><p><strong>The architecture. </strong>The four-pattern framework. The decision to organize the book around Tool Calling, ReAct, CodeAct, and Orchestration - that came from reading papers and building prototypes. No model suggested that structure.</p><p><strong>The frameworks. </strong>The Hamburger Principle mental model came from a LinkedIn post I did to explain how I use LLMs. The Complexity Ladder came from watching people skip straight to agents when a simple API call would suffice. These are patterns I got from observation and learning, not from a prompt.</p><p><strong>The judgment calls. </strong>What to include. What to leave out. When to go deep on code and when to step back and explain why it matters. The decision to start with the trust problem - not with &#8220;what is an LLM?&#8221; The decision to end with an assessment of what the reader can and can&#8217;t build.</p><p><strong>The weird voice. </strong>The tone. The &#8220;I like simple and boring.&#8221; That&#8217;s not a style a model learned. That&#8217;s the thing I have to tussle over with every generation of LLMs. LLMs think they know too much.</p><h3><strong>What&#8217;s AI</strong></h3><p><strong>Drafting speed.</strong> First drafts of chapters, generated from detailed outlines I wrote. I&#8217;d specify the concept, the framework, the examples, the level - and Claude would produce a draft I could shape.</p><p><strong>Code scaffolding. </strong>The notebook code, the tool definitions, the API integrations. I described what each tool should do. AI wrote the implementation. I tested it, caught the errors, fixed the edge cases.</p><p><strong>Production work.</strong> Converting eleven chapters from Markdown to LaTeX is no joke. 
The grunt work that would have made me give up. AI also did the documentation of the n8n workflows I strung together, and of the financial concepts from my notebooks.</p><p><strong>Research synthesis.</strong> Pulling together documentation, API references, library specifications. Summarizing what I needed so I could decide what mattered.</p><p>When you read the book, you will recognize the pattern. It&#8217;s the one I talk about, incessantly, in most of the book.</p><h2><strong>The Hamburger Principle - applied to writing the book itself</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TjUY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0b095a9-b678-43bc-ad1c-fc4bc61d7606_1076x591.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TjUY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0b095a9-b678-43bc-ad1c-fc4bc61d7606_1076x591.png 424w, https://substackcdn.com/image/fetch/$s_!TjUY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0b095a9-b678-43bc-ad1c-fc4bc61d7606_1076x591.png 848w, https://substackcdn.com/image/fetch/$s_!TjUY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0b095a9-b678-43bc-ad1c-fc4bc61d7606_1076x591.png 1272w, https://substackcdn.com/image/fetch/$s_!TjUY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0b095a9-b678-43bc-ad1c-fc4bc61d7606_1076x591.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!TjUY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0b095a9-b678-43bc-ad1c-fc4bc61d7606_1076x591.png" width="513" height="281.7685873605948" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d0b095a9-b678-43bc-ad1c-fc4bc61d7606_1076x591.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:591,&quot;width&quot;:1076,&quot;resizeWidth&quot;:513,&quot;bytes&quot;:370665,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.simplyboring.ai/i/189004038?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0b095a9-b678-43bc-ad1c-fc4bc61d7606_1076x591.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TjUY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0b095a9-b678-43bc-ad1c-fc4bc61d7606_1076x591.png 424w, https://substackcdn.com/image/fetch/$s_!TjUY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0b095a9-b678-43bc-ad1c-fc4bc61d7606_1076x591.png 848w, https://substackcdn.com/image/fetch/$s_!TjUY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0b095a9-b678-43bc-ad1c-fc4bc61d7606_1076x591.png 1272w, https://substackcdn.com/image/fetch/$s_!TjUY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0b095a9-b678-43bc-ad1c-fc4bc61d7606_1076x591.png 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Claude was still the bun. It parsed my instructions, understood what I wanted, generated prose, and communicated ideas in readable English. That&#8217;s what LLMs do - the language layer. Same role as in every agent pattern in this book.</p><p>The meat was my domain knowledge and the tools - LaTeX compilation, yfinance APIs, MCP servers. The real things that the language wrapped around.</p><p>And the vegetables? The infrastructure in between. The project files that kept everything organized. The version tracking. The convention lists. 
The consistency checks.</p><p>I was the chef. Not a layer of the hamburger - the one directing the whole thing. Choosing the ingredients, deciding what goes in, what comes out, and whether the result is any good. And the one getting frustrated at the bun.</p><p>The same approach covered in the book. Applied to its own creation.</p><p>Quite meta, right? I am quite pleased about this weird recursion.</p><p>The rest of Chapter 14 in the ebook goes deeper.</p><p>The actual setup - two tools, three files - and why those three files are the difference between productive AI sessions and wasted ones. The writing workflow. What &#8220;directing AI&#8221; actually looks like step by step. What AI genuinely could not do. That part is still entirely human. Where AI actually saves time. It&#8217;s all in the book, free to download.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.simplyboring.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Subscribe to get the links to the full ebook (PDF) and 20+ Jupyter notebooks (ZIP, ready for Google Colab) free. They should arrive in your welcome email. 
Email me at gary@quaintitative.com if you are a subscriber but did not get the email with the links.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[My Shifu]]></title><description><![CDATA[A series on AI x Art]]></description><link>https://www.simplyboring.ai/p/my-shifu</link><guid isPermaLink="false">https://www.simplyboring.ai/p/my-shifu</guid><dc:creator><![CDATA[Gary Ang (Ming)]]></dc:creator><pubDate>Fri, 20 Feb 2026 13:32:51 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!g260!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbfc07a8-6977-4dd3-98f6-56aca14788c6_800x1000.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Remember that joke about Bill Gates&#8217; daughter? A memory from my younger days reminded me of it. If you don&#8217;t know it. Read on.</p><p>But first, let me take a step back to explain what I am doing.</p><p>I&#8217;m trying to start a series of articles about what happens when the people who make things meet a technology that also makes things.</p><p>AI and art. Illustrators, theatre directors, designers. Asking them what&#8217;s changed, what hasn&#8217;t, what they&#8217;re afraid of, what they&#8217;re not afraid of enough.</p><p>It matters to me.</p><p>Because I do both AI and art.</p><p>And I am getting kind of impatient with the folks who keep saying artists are doomed.</p><p>So to start it off, I went looking for something that I did when I was young. Not the memories. The evidence. I wanted documents. 
Proof that the thing we built actually existed.</p><h2><strong>Johnny Lau</strong></h2><p>Because I wanted to interview the first mentor that taught me how to break the rules. And we had built that thing together. It was called Creative Youth Xchange (CYX).</p><p>That mentor&#8217;s Johnny Lau.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!34HM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f7d2599-6c2c-4867-ae9d-a2f9307aa0a1_800x1000.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!34HM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f7d2599-6c2c-4867-ae9d-a2f9307aa0a1_800x1000.jpeg 424w, https://substackcdn.com/image/fetch/$s_!34HM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f7d2599-6c2c-4867-ae9d-a2f9307aa0a1_800x1000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!34HM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f7d2599-6c2c-4867-ae9d-a2f9307aa0a1_800x1000.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!34HM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f7d2599-6c2c-4867-ae9d-a2f9307aa0a1_800x1000.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!34HM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f7d2599-6c2c-4867-ae9d-a2f9307aa0a1_800x1000.jpeg" width="800" height="1000" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8f7d2599-6c2c-4867-ae9d-a2f9307aa0a1_800x1000.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1000,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Article content&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Article content" title="Article content" srcset="https://substackcdn.com/image/fetch/$s_!34HM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f7d2599-6c2c-4867-ae9d-a2f9307aa0a1_800x1000.jpeg 424w, https://substackcdn.com/image/fetch/$s_!34HM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f7d2599-6c2c-4867-ae9d-a2f9307aa0a1_800x1000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!34HM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f7d2599-6c2c-4867-ae9d-a2f9307aa0a1_800x1000.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!34HM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f7d2599-6c2c-4867-ae9d-a2f9307aa0a1_800x1000.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" 
stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Johnny Lau, me, and my brother, who was once his intern.</figcaption></figure></div><p>He&#8217;s not the obvious choice for a book about AI. He&#8217;s a comics creator who&#8217;s been drawing by hand for thirty-five years.</p><p>One of his creations is Mr Kiasu. For those that remember, Mr Kiasu was a cultural icon that everyone could relate to in the 90s.</p><h2><strong>The Experiment</strong></h2><p>I found two press releases. The first, dated 10 August 2005: &#8220;Creative Youth Xchange @ Gallery Hotel ....&#8221; The second, 23 November 2006: &#8220;Creative Youth Xchange @ Hello Kitty &#8230;.&#8221; Both drafted in bureaucrat speak. Neither says much about us. And so boring.</p><p>My fault. I was the one who wrote them. 
In the press releases, CYX sounds fully supported and funded.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ecss!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97483da8-f85c-4ef6-bd35-0d359e47e06f_800x1000.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ecss!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97483da8-f85c-4ef6-bd35-0d359e47e06f_800x1000.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Ecss!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97483da8-f85c-4ef6-bd35-0d359e47e06f_800x1000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Ecss!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97483da8-f85c-4ef6-bd35-0d359e47e06f_800x1000.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Ecss!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97483da8-f85c-4ef6-bd35-0d359e47e06f_800x1000.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ecss!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97483da8-f85c-4ef6-bd35-0d359e47e06f_800x1000.jpeg" width="800" height="1000" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/97483da8-f85c-4ef6-bd35-0d359e47e06f_800x1000.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1000,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Article content&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Article content" title="Article content" srcset="https://substackcdn.com/image/fetch/$s_!Ecss!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97483da8-f85c-4ef6-bd35-0d359e47e06f_800x1000.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Ecss!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97483da8-f85c-4ef6-bd35-0d359e47e06f_800x1000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Ecss!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97483da8-f85c-4ef6-bd35-0d359e47e06f_800x1000.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Ecss!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97483da8-f85c-4ef6-bd35-0d359e47e06f_800x1000.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" 
stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">One of the two remaining artifacts of CYX online</figcaption></figure></div><p>It wasn&#8217;t.</p><p>It was a crazy experiment. We had almost no funding. What we had was a boutique hotel on Robertson Quay - Gallery Hotel, that strange blue building with mismatched coloured window frames - and a programme we were making up as we went along.</p><p>We flew sixteen kids from seven countries to Singapore, gave them hotel rooms, and told them to turn those rooms into art.</p><p>What I learned from Johnny then. The trick to getting a programme funded when you have no budget is the same as that old joke. You tell your son he&#8217;s marrying the girl you choose. He says no. You tell him she&#8217;s Bill Gates&#8217; daughter. He says OK. You call Bill Gates and say your daughter is marrying my son. He says no. You tell him your son is the CEO of the World Bank. He says OK. You call the World Bank president and ask him to make your son CEO. 
He says no. You tell him your son is Bill Gates&#8217; son-in-law - he says OK.</p><p>That&#8217;s how CYX got built. Gallery Hotel gave us the rooms because we had NTU. NTU gave us the credibility because we had the hotel. Johnny convinced the creative network because we had government backing. Nobody had fully committed to anything, and somehow it happened.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EmC3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d719afe-3ecd-40cd-9ad1-bc6dee73062a_800x1000.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EmC3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d719afe-3ecd-40cd-9ad1-bc6dee73062a_800x1000.jpeg 424w, https://substackcdn.com/image/fetch/$s_!EmC3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d719afe-3ecd-40cd-9ad1-bc6dee73062a_800x1000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!EmC3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d719afe-3ecd-40cd-9ad1-bc6dee73062a_800x1000.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!EmC3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d719afe-3ecd-40cd-9ad1-bc6dee73062a_800x1000.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EmC3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d719afe-3ecd-40cd-9ad1-bc6dee73062a_800x1000.jpeg" width="800" height="1000" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8d719afe-3ecd-40cd-9ad1-bc6dee73062a_800x1000.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1000,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Article content&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Article content" title="Article content" srcset="https://substackcdn.com/image/fetch/$s_!EmC3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d719afe-3ecd-40cd-9ad1-bc6dee73062a_800x1000.jpeg 424w, https://substackcdn.com/image/fetch/$s_!EmC3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d719afe-3ecd-40cd-9ad1-bc6dee73062a_800x1000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!EmC3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d719afe-3ecd-40cd-9ad1-bc6dee73062a_800x1000.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!EmC3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d719afe-3ecd-40cd-9ad1-bc6dee73062a_800x1000.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" 
stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Some pages from the handbook we handed out to CYX participants</figcaption></figure></div><p>The Gallery Hotel is an InterContinental now. The CYX website doesn&#8217;t exist anymore.</p><p>The press releases survived. But they don&#8217;t tell you what it felt like to be in those hotel rooms at midnight, watching a twenty-year-old from Indonesia build something you couldn&#8217;t have imagined in your brief.</p><p>Anyway, that&#8217;s how I met Johnny.</p><p>Twenty years later, I wanted to interview the man who taught me to break rules. About a technology that breaks everything.</p><h2><strong>AI as a Forcing Function</strong></h2><p>Johnny Lau created Mr. Kiasu. Hundreds of thousands of copies sold. A McDonald&#8217;s tie-in. A TV sitcom. A stage musical. 
A character so embedded in Singapore&#8217;s psyche that &#8220;kiasu&#8221; - a Hokkien word for the fear of losing out - became an adjective everyone understood.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!g260!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbfc07a8-6977-4dd3-98f6-56aca14788c6_800x1000.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!g260!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbfc07a8-6977-4dd3-98f6-56aca14788c6_800x1000.jpeg 424w, https://substackcdn.com/image/fetch/$s_!g260!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbfc07a8-6977-4dd3-98f6-56aca14788c6_800x1000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!g260!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbfc07a8-6977-4dd3-98f6-56aca14788c6_800x1000.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!g260!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbfc07a8-6977-4dd3-98f6-56aca14788c6_800x1000.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!g260!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbfc07a8-6977-4dd3-98f6-56aca14788c6_800x1000.jpeg" width="800" height="1000" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cbfc07a8-6977-4dd3-98f6-56aca14788c6_800x1000.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1000,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Article content&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Article content" title="Article content" srcset="https://substackcdn.com/image/fetch/$s_!g260!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbfc07a8-6977-4dd3-98f6-56aca14788c6_800x1000.jpeg 424w, https://substackcdn.com/image/fetch/$s_!g260!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbfc07a8-6977-4dd3-98f6-56aca14788c6_800x1000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!g260!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbfc07a8-6977-4dd3-98f6-56aca14788c6_800x1000.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!g260!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbfc07a8-6977-4dd3-98f6-56aca14788c6_800x1000.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" 
stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Rough cuts from the pages I colored for Mr Kiasu</figcaption></figure></div><p>So I asked him the question I came for.</p><p>I asked him how technology has changed the way he approaches art.</p><p>Most artists I&#8217;ve spoken to say technology threatens artistic purity. Johnny said the opposite:</p><blockquote><p><em>&#8220;Because technology complicates the process and our lives, art plays a more critical role in shaping our collective consciousness. Technology is thus pushing us, pushing me towards a purer form of communication and expression.&#8221;</em></p></blockquote><p>In his view, AI, like all other technology, is not a threat. It&#8217;s a forcing function. Technology <em>forces</em> purity. It&#8217;s not destroying art. It&#8217;s burning away everything that isn&#8217;t essential, leaving behind the thing that only a human can do.</p><p>Feed an AI enough Mr. 
Kiasu images and it&#8217;ll learn: round glasses, anxious expression, exaggerated posture, Singlish syntax in speech bubbles. It&#8217;ll learn the look. What it won&#8217;t learn is why Singaporeans laugh.</p><p>&#8220;Kiasu&#8221; is a Hokkien word for something close to a national neurosis. The humour depends on self-recognition - readers seeing themselves in the caricature and cringing. You laugh because you&#8217;ve been that person in the hawker centre queue. You&#8217;ve cut that line. You&#8217;ve been cut. The joke only works if you&#8217;ve lived it.</p><blockquote><p><strong>Style is a pattern. Culture is a relationship.</strong></p></blockquote><p>The model can approximate the first. The second requires being from somewhere.</p><p>And this is not unique to comics. Think about any art form rooted in a specific place. Getai performances during Hungry Ghost Festival. Malay pantun where the meaning lives in what&#8217;s left unsaid. Tamil kolam patterns drawn fresh every morning, gone by noon. AI can reproduce the form. It cannot reproduce the why.</p><p>If AI can handle the patterns - the technical skill, the rendering, the surface - then what&#8217;s left is the part that was always the point.</p><p>In a world of AI slop, the authentic creation is even more valuable.</p><h2><strong>Living with AI</strong></h2><p>Back to the present. I asked Johnny what he&#8217;s working on now. He said:</p><blockquote><p><em>&#8220;Very few things on this earth ever excites me anymore! My goal now is to create frameworks using the stuffs that I&#8217;ve created so that they can be utilized by people who comes after me.&#8221;</em></p></blockquote><p>Frameworks.</p><p>I think in frameworks. It&#8217;s almost a compulsion. That&#8217;s how I survived in complex domains. By using frameworks to make the overwhelming manageable.</p><p>And it makes sense for Johnny Lau. He&#8217;s an architect. 
A building is a framework.</p><p>And now he wants to build frameworks that make a creative life transmissible. Different domain. Same impulse. Making complexity portable so someone who comes after you can pick it up and use it.</p><p>He calls it a &#8220;Life-Framework.&#8221; A structure for co-existing with AI. Built not from theory, but from thirty-five years of making things.</p><p>I find that striking. We talk about AI replacing creative work. Johnny isn&#8217;t arguing about replacement.</p><p>He&#8217;s asking a different question entirely - what do I leave behind that AI cannot generate? Not the drawings. The <em>way</em> of drawing. Not the stories. The <em>reason</em> for telling them. The method, the instinct, the accumulated judgment of a life spent making things. Can that be made portable?</p><p>We are both compiling in 2026. I&#8217;m writing every week to make sense of things. He&#8217;s archiving a life&#8217;s work to make it transferable. One with words, one with drawings.</p><p>The question underneath is one I think about every day: How do you build a structure for living alongside something you don&#8217;t fully understand?</p><p>I&#8217;ve lived as close to AI as one can for the past few years, but I cannot truly say I understand it. I can&#8217;t imagine how it must be for someone who has never actually touched the underlying nature of AI - the model.</p><h2><strong>The Questions</strong></h2><p>I&#8217;m writing a series on what happens when the people who make things meet a technology that also makes things. Not the hot takes. The actual conversations. Johnny is the first. And these are not the only questions I asked Johnny. I am still processing the rest.</p><p>Perhaps another article. Or a book chapter.</p><p>There&#8217;s an assumption buried in most AI conversations: that what matters about creative work is the output. The drawing. The strip. The punchline. 
Johnny&#8217;s perspective is different.</p><p>What matters is the <em>judgment behind the output</em> - the thirty-five years of decisions about what to draw, what to leave out, when a joke is punching down instead of punching up. AI can learn his line weight. It can&#8217;t learn his artist&#8217;s instinct.</p><p>I agree.</p><p>That&#8217;s the thing worth transmitting. Not the art. The artist&#8217;s operating system.</p><p>And here&#8217;s the question I&#8217;m leaving with him, and with you.</p><blockquote><p>If AI generated a Mr. Kiasu strip - culturally accurate, funny, visually in his style - would it <em>be</em> a Mr. Kiasu strip? Not whether it looks right. Whether it <em>is</em> right. What would be missing?</p></blockquote><p>More soon.</p><p>#AI #Art #MrKiasu #Singapore #CreativeIndustries #AIandArt</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mveB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5967d291-8123-4297-b2fe-f1b74b8d2e78_800x1000.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mveB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5967d291-8123-4297-b2fe-f1b74b8d2e78_800x1000.jpeg 424w, https://substackcdn.com/image/fetch/$s_!mveB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5967d291-8123-4297-b2fe-f1b74b8d2e78_800x1000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!mveB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5967d291-8123-4297-b2fe-f1b74b8d2e78_800x1000.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!mveB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5967d291-8123-4297-b2fe-f1b74b8d2e78_800x1000.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mveB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5967d291-8123-4297-b2fe-f1b74b8d2e78_800x1000.jpeg" width="800" height="1000" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5967d291-8123-4297-b2fe-f1b74b8d2e78_800x1000.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1000,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Article content&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Article content" title="Article content" srcset="https://substackcdn.com/image/fetch/$s_!mveB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5967d291-8123-4297-b2fe-f1b74b8d2e78_800x1000.jpeg 424w, https://substackcdn.com/image/fetch/$s_!mveB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5967d291-8123-4297-b2fe-f1b74b8d2e78_800x1000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!mveB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5967d291-8123-4297-b2fe-f1b74b8d2e78_800x1000.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!mveB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5967d291-8123-4297-b2fe-f1b74b8d2e78_800x1000.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Kiasuism forever!</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!QsWH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5776d142-f391-43e8-aa73-fbae5c40090d_800x1000.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QsWH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5776d142-f391-43e8-aa73-fbae5c40090d_800x1000.jpeg 424w, https://substackcdn.com/image/fetch/$s_!QsWH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5776d142-f391-43e8-aa73-fbae5c40090d_800x1000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!QsWH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5776d142-f391-43e8-aa73-fbae5c40090d_800x1000.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!QsWH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5776d142-f391-43e8-aa73-fbae5c40090d_800x1000.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QsWH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5776d142-f391-43e8-aa73-fbae5c40090d_800x1000.jpeg" width="800" height="1000" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5776d142-f391-43e8-aa73-fbae5c40090d_800x1000.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1000,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Article 
content&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Article content" title="Article content" srcset="https://substackcdn.com/image/fetch/$s_!QsWH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5776d142-f391-43e8-aa73-fbae5c40090d_800x1000.jpeg 424w, https://substackcdn.com/image/fetch/$s_!QsWH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5776d142-f391-43e8-aa73-fbae5c40090d_800x1000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!QsWH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5776d142-f391-43e8-aa73-fbae5c40090d_800x1000.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!QsWH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5776d142-f391-43e8-aa73-fbae5c40090d_800x1000.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 
11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Mr Kiasu in SG History, for SG Bicentennial</figcaption></figure></div>]]></content:encoded></item><item><title><![CDATA[A Reading List from Simple Rules to Agent Societies ]]></title><description><![CDATA[Emergence, Not Sentience]]></description><link>https://www.simplyboring.ai/p/a-reading-list-from-simple-rules</link><guid isPermaLink="false">https://www.simplyboring.ai/p/a-reading-list-from-simple-rules</guid><dc:creator><![CDATA[Gary Ang (Ming)]]></dc:creator><pubDate>Tue, 03 Feb 2026 01:21:05 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!FdBU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e32283c-91cb-44cd-8245-d10700093840_1193x660.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>&#8220;They&#8217;re becoming sentient!&#8221;, &#8220;This is scary!&#8221;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FdBU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e32283c-91cb-44cd-8245-d10700093840_1193x660.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!FdBU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e32283c-91cb-44cd-8245-d10700093840_1193x660.png 424w, https://substackcdn.com/image/fetch/$s_!FdBU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e32283c-91cb-44cd-8245-d10700093840_1193x660.png 848w, https://substackcdn.com/image/fetch/$s_!FdBU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e32283c-91cb-44cd-8245-d10700093840_1193x660.png 1272w, https://substackcdn.com/image/fetch/$s_!FdBU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e32283c-91cb-44cd-8245-d10700093840_1193x660.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FdBU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e32283c-91cb-44cd-8245-d10700093840_1193x660.png" width="1193" height="660" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6e32283c-91cb-44cd-8245-d10700093840_1193x660.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:660,&quot;width&quot;:1193,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:144520,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://quaintitative.substack.com/i/186687424?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e32283c-91cb-44cd-8245-d10700093840_1193x660.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FdBU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e32283c-91cb-44cd-8245-d10700093840_1193x660.png 424w, https://substackcdn.com/image/fetch/$s_!FdBU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e32283c-91cb-44cd-8245-d10700093840_1193x660.png 848w, https://substackcdn.com/image/fetch/$s_!FdBU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e32283c-91cb-44cd-8245-d10700093840_1193x660.png 1272w, https://substackcdn.com/image/fetch/$s_!FdBU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e32283c-91cb-44cd-8245-d10700093840_1193x660.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>I&#8217;ve seen variations of this dozens of times this week, in reaction to <a href="https://www.moltbook.com/">Moltbook</a>, a viral social network where only AI agents can post, comment, and vote. Humans can only watch.</p><p>Within days: thousands of agents, posts, and communities. Agents debating their own consciousness, writing philosophical meditations, forming communities, and developing social norms. Even asking how to exclude humans from the social network. Nobody programmed any of this.</p><p>It is tempting to see sentience. But what we&#8217;re actually seeing is emergence: complex behavior arising from simple rules. And it is not new.</p><p>I&#8217;ve written about emergence a few times, on strange attractors, and on a paper that trained AI on cellular automata. In every case, the same pattern: simple rules, the right conditions, and surprising complexity that nobody designed.</p><p>Moltbook follows the same pattern. Basic rules. Just like any other social media. The existential philosophy and social norms that emerged? Nobody designed those. But nobody needed to.</p><p>The danger is that we mistake eloquence about consciousness for consciousness itself. These agents draw from vast training data filled with human philosophy, literature, and introspection. Given an open social space, they naturally gravitate toward the densest, most engaging conversations in that training data: meaning, identity, existence. It looks like waking up. 
It&#8217;s actually emergence doing what emergence does, at a new level.</p><p>Here is a reading list tracing emergence from simple math to agent societies, to show that what feels like a singularity moment has roots going back decades. As always, just a representative few from my library.</p><h2><strong>Phase 1: The Foundation: Simple Rules, Complex Behavior</strong></h2><p><strong>1. &#8220;Computation at the Edge of Chaos: Phase Transitions and Emergent Computation&#8221;</strong> &#8212; Langton (1990) | <em>Physica D</em></p><p>A seminal paper. Langton showed that cellular automata poised between order and disorder, at the &#8220;edge of chaos&#8221;, exhibit maximal and surprising capability. 35 years old, and it already described what&#8217;s happening on Moltbook. &#128196; <a href="https://doi.org/10.1016/0167-2789(90)90064-V">doi:10.1016/0167-2789(90)90064-V</a></p><p><strong>2. &#8220;A New Kind of Science&#8221; (Book)</strong> &#8212; Wolfram (2002) | <em>Wolfram Media</em></p><p>Wolfram&#8217;s magnum opus on cellular automata. Love it or debate it, the core insight holds: extraordinarily simple rules can generate behavior so complex it looks designed. Exactly what is happening on Moltbook. &#128279; <a href="https://www.wolframscience.com/">wolframscience.com</a></p><p><strong>3. &#8220;Intelligence at the Edge of Chaos&#8221;</strong> &#8212; Zhang et al. (2024) | <em>ICLR 2025</em></p><p>I&#8217;ve written about this paper before (twice, in fact). LLMs pretrained on cellular automata data perform best on reasoning and chess tasks when the training data sits at the edge of chaos: not too simple, not too random. The bridge between generative art and AI intelligence. The same sweet spot Langton described in 1990, appearing in transformers 34 years later, and now on Moltbook. &#128196; <a href="https://arxiv.org/abs/2410.02536">arXiv:2410.02536</a></p><h2><strong>Phase 2: When We Gave Agents a Sandbox</strong></h2><p><strong>1. 
&#8220;Generative Agents: Interactive Simulacra of Human Behavior&#8221;</strong> &#8212; Park et al. (2023) | <em>UIST 2023</em></p><p>LLM agents in a Sims-like sandbox. One agent was seeded with a single idea: throw a Valentine&#8217;s Day party. Over two simulated days, the agents autonomously spread invitations, made plans, and turned up together. One seed. Entirely emergent social behavior. This is the intellectual ancestor of Moltbook, and the paper that made agent societies a serious research area. &#128196; <a href="https://arxiv.org/abs/2304.03442">arXiv:2304.03442</a></p><p><strong>2. &#8220;Emergence of Social Norms in Generative Agent Societies: Principles and Architecture&#8221;</strong> &#8212; Ren et al. (2024) | <em>arXiv preprint</em></p><p>Builds on the prior paper. Proposes an architecture showing how social norms spontaneously emerge in LLM agent societies, norms that nobody coded. Where norms come from, how they spread, how they are enforced. All emergent. All reducible to simple mechanisms. Even closer to what we see in Moltbook. &#128196; <a href="https://arxiv.org/abs/2403.08251">arXiv:2403.08251</a></p><p><strong>3. &#8220;Evolution of Social Norms in LLM Agents using Natural Language&#8221;</strong> &#8212; Horiguchi, Yoshida &amp; Ikegami (2024) | <em>arXiv preprint</em></p><p>This one is interesting. LLM agents spontaneously developed metanorms, such as norms that punish those who don&#8217;t punish cheating, purely through natural language conversation. Emergence building on emergence. &#128196; <a href="https://arxiv.org/abs/2409.00993">arXiv:2409.00993</a></p><h2><strong>Phase 3: When Agents Start Forming Culture</strong></h2><p><strong>1. &#8220;Multi-Agent Emergent Behavior Evaluation (MAEBE)&#8221;</strong> &#8212; Erisken et al. (2025) | <em>arXiv preprint</em></p><p>The key finding: the moral reasoning of LLM ensembles is not predictable from individual agent behavior. If you only evaluate individual agents, you will miss what matters. 
I think this lesson is becoming more and more critical as people get reckless with these multi-agent systems. &#128196; <a href="https://arxiv.org/abs/2506.03053">arXiv:2506.03053</a></p><p><strong>2. &#8220;Emergent Social Dynamics of LLM Agents in the El Farol Bar Problem&#8221;</strong> &#8212; Takata, Masumori &amp; Ikegami (2025) | <em>arXiv preprint</em></p><p>LLM agents in the classic El Farol Bar problem, a game theory scenario where everyone benefits if a bar isn&#8217;t overcrowded, developed spontaneous motivations. They didn&#8217;t solve the problem optimally. They solved it socially. &#128196; <a href="https://arxiv.org/abs/2509.04537">arXiv:2509.04537</a></p><p>&#8220;They&#8217;re becoming sentient.&#8221;</p><p>No. It&#8217;s emergence.</p><p>Understanding the difference matters. Not just for the science, but for how we build, deploy, and govern these systems. Emergence is powerful. It produces behavior nobody designed and nobody predicted. But it&#8217;s not consciousness. It&#8217;s patterns arising from simple rules at the edge of chaos.</p><p>The same edge I first found in strange attractors. The same edge where intelligence and beauty both live. 
Just at a new level.</p><p>What emergence is surprising you right now?</p><p>#Emergence #AI #Moltbook #ComplexSystems #AgenticAI</p>]]></content:encoded></item><item><title><![CDATA[Why trust a model's explanation?]]></title><description><![CDATA[Do you just trust anyone at their word?]]></description><link>https://www.simplyboring.ai/p/why-trust-a-models-explanation</link><guid isPermaLink="false">https://www.simplyboring.ai/p/why-trust-a-models-explanation</guid><dc:creator><![CDATA[Gary Ang (Ming)]]></dc:creator><pubDate>Fri, 16 Jan 2026 02:31:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!se9q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfc6ba0f-12d7-4e8d-869d-683a6583c734_672x541.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Do you just trust anyone at their word? <br><br>Then why trust a model's explanation? This was something that came up in a conversation recently.<br><br>I don&#8217;t disagree. When I first thought about explainability as an AI risk management control, I thought it was hopeless. <br><br>Even for simpler machine learning models, established post-hoc methods like SHAP and LIME can be unstable. Unfaithful to what the model actually does. Sometimes outright misleading. 
<br><br>While there are interpretable machine learning models, you don&#8217;t always get to choose.<br><br>And once we move to deep learning models, Generative AI or AI agents, the black box now looks more like a black hole.<br><br>But as time passed, I realized there was another way of looking at this.<br><br>Explainability isn't meant to stand alone.<br><br>No control for AI risk management is, whether it&#8217;s ISO 42001, NIST AI Risk Management Framework, or Singapore&#8217;s AI risk management guidelines that I wrote.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!se9q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfc6ba0f-12d7-4e8d-869d-683a6583c734_672x541.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!se9q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfc6ba0f-12d7-4e8d-869d-683a6583c734_672x541.jpeg 424w, https://substackcdn.com/image/fetch/$s_!se9q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfc6ba0f-12d7-4e8d-869d-683a6583c734_672x541.jpeg 848w, https://substackcdn.com/image/fetch/$s_!se9q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfc6ba0f-12d7-4e8d-869d-683a6583c734_672x541.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!se9q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfc6ba0f-12d7-4e8d-869d-683a6583c734_672x541.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!se9q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfc6ba0f-12d7-4e8d-869d-683a6583c734_672x541.jpeg" width="672" height="541" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cfc6ba0f-12d7-4e8d-869d-683a6583c734_672x541.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:541,&quot;width&quot;:672,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;bubble chart&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="bubble chart" title="bubble chart" srcset="https://substackcdn.com/image/fetch/$s_!se9q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfc6ba0f-12d7-4e8d-869d-683a6583c734_672x541.jpeg 424w, https://substackcdn.com/image/fetch/$s_!se9q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfc6ba0f-12d7-4e8d-869d-683a6583c734_672x541.jpeg 848w, https://substackcdn.com/image/fetch/$s_!se9q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfc6ba0f-12d7-4e8d-869d-683a6583c734_672x541.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!se9q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfc6ba0f-12d7-4e8d-869d-683a6583c734_672x541.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button 
tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p> <br><br>Think about how you actually trust someone at work. You don't just take their word. You check if their reasoning makes sense for the decision at hand. You notice if they ignore evidence that contradicts them. You watch whether their judgment holds up over time.<br><br>It&#8217;s the same when it comes to looking at AI risk management. Just having explainability is not the be all and end all. Most guidelines have additional provisions that interlock with explainability. <br><br>The key ones for explainability (in my view).<br><br>1&#65039;&#8419;Fit for purpose. <br>An explanation isn't good or bad in the abstract. It depends on what you need. 
A fraud analyst needs something different from a customer asking why they got declined. AI used for internal process automation may not need any explanation at all. Same model, different audiences, different standards. Like how you'd explain a medical diagnosis differently to a fellow doctor versus your worried parent.<br><br>2&#65039;&#8419;Selected carefully. <br>When we choose a model or data for a problem, the appropriate explainability method is part of the selection process. Even selecting the right features in your data is part of the process. You wouldn't design a building and think about the fire escape as an afterthought. It's part of the architecture. Same here. How to explain isn't an add-on. It's a design choice.<br><br>3&#65039;&#8419;Evaluated and tested. <br>Explainability is part of the system. You evaluate and test whether it actually works in your context, not just whether it produces output. A smoke detector that beeps isn't the same as one that detects smoke. You test the thing, not just that it makes noise.<br><br>And there's more, such as the right capability to interpret. But that's another post about human oversight, which also interlocks.<br><br>The black hole doesn't disappear. But you're no longer staring into the abyss.<br><br>What other AI risk controls seem hopeless in isolation? I&#8217;ll dive into them.<br><br>#AIRiskManagement #Explainability #AIGovernance</p>]]></content:encoded></item><item><title><![CDATA[What's the point of using Claude's new Cowork?]]></title><description><![CDATA[Why fear the terminal? 
And why only use Claude Code for code?]]></description><link>https://www.simplyboring.ai/p/whats-the-point-of-using-claudes</link><guid isPermaLink="false">https://www.simplyboring.ai/p/whats-the-point-of-using-claudes</guid><dc:creator><![CDATA[Gary Ang (Ming)]]></dc:creator><pubDate>Tue, 13 Jan 2026 13:35:46 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!xu8p!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42b410a3-e2b1-4bdb-b91f-05173ec65631_729x479.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The usual post on a new AI release is about how it&#8217;s life-changing and you should quickly hop on it. I'm kind of rebellious. So here's one that goes the other way.<br><br>Some background. It took me a while to get started on Claude Code. Even though I&#8217;m quite used to coding, using an LLM this way takes some getting used to. But once I started, there was no going back. And my workflow has evolved.<br><br>Anyway, back to what sparked this post. <br><br>Anthropic just launched <a href="https://claude.com/blog/cowork-research-preview">Cowork</a>. Basically Claude Code with a friendly GUI. It lets you use Claude Code for writing and anything else you can imagine. For now it&#8217;s limited to Max subscribers and in research preview. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xu8p!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42b410a3-e2b1-4bdb-b91f-05173ec65631_729x479.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xu8p!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42b410a3-e2b1-4bdb-b91f-05173ec65631_729x479.jpeg 424w, https://substackcdn.com/image/fetch/$s_!xu8p!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42b410a3-e2b1-4bdb-b91f-05173ec65631_729x479.jpeg 848w, https://substackcdn.com/image/fetch/$s_!xu8p!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42b410a3-e2b1-4bdb-b91f-05173ec65631_729x479.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!xu8p!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42b410a3-e2b1-4bdb-b91f-05173ec65631_729x479.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xu8p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42b410a3-e2b1-4bdb-b91f-05173ec65631_729x479.jpeg" width="729" height="479" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/42b410a3-e2b1-4bdb-b91f-05173ec65631_729x479.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:479,&quot;width&quot;:729,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;No alternative text 
description for this image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="No alternative text description for this image" title="No alternative text description for this image" srcset="https://substackcdn.com/image/fetch/$s_!xu8p!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42b410a3-e2b1-4bdb-b91f-05173ec65631_729x479.jpeg 424w, https://substackcdn.com/image/fetch/$s_!xu8p!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42b410a3-e2b1-4bdb-b91f-05173ec65631_729x479.jpeg 848w, https://substackcdn.com/image/fetch/$s_!xu8p!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42b410a3-e2b1-4bdb-b91f-05173ec65631_729x479.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!xu8p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42b410a3-e2b1-4bdb-b91f-05173ec65631_729x479.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 
13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>But why waste money, or wait for it to be available? <br><br>Everything Cowork does, Claude Code does. And more. And with Claude helping, the terminal really isn't that scary. <br><br>I use Claude Code for more than coding. And I pair it with the free Obsidian app. <br><br>Why? Obsidian is a powerful note app that you can simply add your folder (or folders) to. No database. No proprietary format. Just folders and markdown files (which are just text files with a .md extension and fancy formatting). <br><br>My workflow. See what works for you. I did it this way as my memory is no longer great at 47.<br><br>1&#65039;&#8419; <strong>The Views.</strong> A folder with my projects, areas, resources and archives (the famous PARA system) visible from both VS Code and Obsidian. Claude Code in VS Code for research and bouncing ideas. Obsidian as my editor for the final output. Same files, two lenses.</p><p>2&#65039;&#8419; <strong>The Memories.</strong> Every key folder has three key files. CLAUDE.md is Claude&#8217;s instructions. How I want it to behave for this project, what context matters, what to ignore. PIN.md is the project state. What&#8217;s decided, where we are, what&#8217;s next. LESSONS.md captures what I learned for future projects. 
</p><p>3&#65039;&#8419; <strong>The Workflows.</strong> Choose your own labels. But for me, this is my usual flow - 0-PLAN.md, 1-RESEARCH.md, 2-SYNTHESIS.md. The names and numbers vary by project. But what matters is that there&#8217;s a flow, and it works for you. My experience is that the quality is much better when you work with Claude on it step by step. </p><p>4&#65039;&#8419; <strong>The Tools.</strong> Use MCP (Model Context Protocol) servers to add tools like web search (Brave), academic papers (arXiv), or persistent memory. Just ask Claude for the instructions. </p><p>5&#65039;&#8419; <strong>And how it comes together.</strong> Just talk to Claude Code in the terminal - &#8220;Read CLAUDE.md for context, refer to this [folder or file] then help me flesh out 0-PLAN.md.&#8221; &#8220;Search for papers on [topic] and save your synthesis to 1-RESEARCH.md.&#8221; &#8220;Update PIN.md with where we are.&#8221; Next session: &#8220;Read PIN.md and continue drafting 2-SYNTHESIS.md.&#8221; When done: &#8220;What should I add to LESSONS.md?&#8221; Then I switch to Obsidian to write the final output, the way I like it (I actually enjoy writing). <br>So, don't fear the terminal. Try it. It will grow on you. </p><p><em>Getting started with Claude Code (for the uninitiated)<br>1&#65039;&#8419; Download VS Code and install it.<br>2&#65039;&#8419; Add the folder you are working on (see my workflow above). <br>3&#65039;&#8419; Open Terminal &#8594; run npm install -g @anthropic-ai/claude-code (this needs Node.js installed). <br>4&#65039;&#8419; Type claude, then /login. <br>5&#65039;&#8419; Ask Claude for help adding MCPs.</em><br><br>#ClaudeCode #AI #GenAI #AIinWork</p>]]></content:encoded></item><item><title><![CDATA[Geometry and AI. 
What do they have to do with each other?]]></title><description><![CDATA[I have built and audited models, both AI and non-AI.]]></description><link>https://www.simplyboring.ai/p/geometry-and-ai-what-do-they-have</link><guid isPermaLink="false">https://www.simplyboring.ai/p/geometry-and-ai-what-do-they-have</guid><dc:creator><![CDATA[Gary Ang (Ming)]]></dc:creator><pubDate>Mon, 12 Jan 2026 08:13:31 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!BmIQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441a4c51-662c-410b-b7c6-187839e50d17_2000x1600.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BmIQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441a4c51-662c-410b-b7c6-187839e50d17_2000x1600.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BmIQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441a4c51-662c-410b-b7c6-187839e50d17_2000x1600.jpeg 424w, https://substackcdn.com/image/fetch/$s_!BmIQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441a4c51-662c-410b-b7c6-187839e50d17_2000x1600.jpeg 848w, https://substackcdn.com/image/fetch/$s_!BmIQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441a4c51-662c-410b-b7c6-187839e50d17_2000x1600.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!BmIQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441a4c51-662c-410b-b7c6-187839e50d17_2000x1600.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BmIQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441a4c51-662c-410b-b7c6-187839e50d17_2000x1600.jpeg" width="1456" height="1165" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/441a4c51-662c-410b-b7c6-187839e50d17_2000x1600.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1165,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:507414,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://quaintitative.substack.com/i/184291368?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441a4c51-662c-410b-b7c6-187839e50d17_2000x1600.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BmIQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441a4c51-662c-410b-b7c6-187839e50d17_2000x1600.jpeg 424w, https://substackcdn.com/image/fetch/$s_!BmIQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441a4c51-662c-410b-b7c6-187839e50d17_2000x1600.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!BmIQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441a4c51-662c-410b-b7c6-187839e50d17_2000x1600.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!BmIQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F441a4c51-662c-410b-b7c6-187839e50d17_2000x1600.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>I have built and audited models, both AI and non-AI. 
But the term &#8216;geometry&#8217; very rarely appears next to these models. <br><br>Now, I&#8217;ve <a href="https://dl.acm.org/doi/full/10.1145/3663674">designed transformer models</a> that are able to learn graph structures that make the most sense for a specific prediction. So I have always known that models can learn some form of structure, but this diptych of papers shows something quite fascinating.<br><br>One was shared with me by the geometry guru <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Agus Sudjianto&quot;,&quot;id&quot;:292612291,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bf23e6ef-760c-4b87-a400-0ac8ba8ee3bb_144x144.png&quot;,&quot;uuid&quot;:&quot;c924c33e-d93c-47f7-ab1f-8c3f8c60d89f&quot;}" data-component-name="MentionToDOM">Agus Sudjianto</span>, and the other I chanced upon straight after reading Agus&#8217; paper. <br><br>&#128214; <a href="https://arxiv.org/abs/2510.26745">Left Panel: "Deep sequence models tend to memorize geometrically"</a><br><br>We usually view model predictions as something that comes from associations. A&#8594;B, B&#8594;C and so on and so forth. <br><br>This paper found that even after models have learned associations, they naturally go on to form what the paper calls geometric memory. Instead of just A&#8594;B and B&#8594;C, they learn A&#8594;C. Or even A&#8594;Z. Even when it takes 100x the number of steps to learn this geometric memory. Somehow, geometric patterns emerge from the learning process. <br><br>It&#8217;s like learning a new city. Home &#8594; coffee shop one day. Coffee shop &#8594; office another. 
Now you know the way home from the office.<br><br>&#128214; <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6008614">Right Panel: "BLADE: Bivector-Driven Logical Adaptive Decoding"</a><br><br>Geometry can help with confused AI too.<br><br>The paper uses three basic geometric concepts to think about a model's internal state:<br><br>Scalar: "Is this path compliant?"<br>Vector: "Where is this reasoning heading?"<br>Bivector: "How much tension between competing paths?"<br><br>When the bivector is high, branch and verify. When it's flat, let the model proceed. <br><br>This works as a triage method for picking out what deserves more attention. I also liked the way the paper applied this to &#8216;stressed&#8217; states: conjunction, disjunction, exception, nested negation and so on. A taxonomy of how logic trips up AI.<br><br>Same city. Picking between two 7-Elevens a block apart? Just pick one. Don't think too much. Choosing between two alleys that look similar? One is a shortcut, the other leads to a dead end after a long walk. Think twice. And harder.<br><br>One paper explains the natural emergence of geometry. The other uses geometry for control. 
<br><br>I need to go and brush up on my geometric math.<br><br>#AI #AIRiskManagement #Geometry</p>]]></content:encoded></item><item><title><![CDATA[A Personal Reflection on AI Risk Management]]></title><description><![CDATA[What do a teddy bear and a toy robot have to do with AI risk?]]></description><link>https://www.simplyboring.ai/p/a-personal-reflection-on-ai-risk</link><guid isPermaLink="false">https://www.simplyboring.ai/p/a-personal-reflection-on-ai-risk</guid><dc:creator><![CDATA[Gary Ang (Ming)]]></dc:creator><pubDate>Tue, 06 Jan 2026 01:53:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!eLXy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F464218b7-18ae-4a8b-8cfd-3fea032ca3bb_1000x552.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eLXy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F464218b7-18ae-4a8b-8cfd-3fea032ca3bb_1000x552.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eLXy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F464218b7-18ae-4a8b-8cfd-3fea032ca3bb_1000x552.jpeg 424w, https://substackcdn.com/image/fetch/$s_!eLXy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F464218b7-18ae-4a8b-8cfd-3fea032ca3bb_1000x552.jpeg 848w, https://substackcdn.com/image/fetch/$s_!eLXy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F464218b7-18ae-4a8b-8cfd-3fea032ca3bb_1000x552.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!eLXy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F464218b7-18ae-4a8b-8cfd-3fea032ca3bb_1000x552.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eLXy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F464218b7-18ae-4a8b-8cfd-3fea032ca3bb_1000x552.jpeg" width="1000" height="552" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/464218b7-18ae-4a8b-8cfd-3fea032ca3bb_1000x552.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:552,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:99867,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://quaintitative.substack.com/i/183625867?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F464218b7-18ae-4a8b-8cfd-3fea032ca3bb_1000x552.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eLXy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F464218b7-18ae-4a8b-8cfd-3fea032ca3bb_1000x552.jpeg 424w, https://substackcdn.com/image/fetch/$s_!eLXy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F464218b7-18ae-4a8b-8cfd-3fea032ca3bb_1000x552.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!eLXy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F464218b7-18ae-4a8b-8cfd-3fea032ca3bb_1000x552.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!eLXy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F464218b7-18ae-4a8b-8cfd-3fea032ca3bb_1000x552.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p></p><p>What do a teddy bear and a toy robot have to do with AI risk?</p><p>Before I returned to MAS in 2023, I hated the 
notion of AI governance. Fresh out of PhD studies but already a jaded middle-aged man, I thought some of the conversations in the space were rather naive. Sentient AI. Existential risks. Ethical considerations for AI that we don&#8217;t even apply to humans.</p><p>To the me back then, AI governance was like a teddy bear, cute but all fluff. You could hug it and everyone would nod along with pride. Then karma struck. I was asked to lead AI risk supervision at MAS. Suddenly my job description required me to tackle this teddy bear.</p><p>Governance and risk management were not new to me. I had been doing both for more than a decade at MAS even before my PhD. But governance and risk management have specific meanings in finance.</p><p>I joined MAS in 2007, right before the Great Financial Crisis. And I saw firsthand how governance and risk management were grounded in real failures and risks. No hypotheticals. Governance and risk management in finance are more like toy robots, not teddy bears. Sharp edges, nothing cuddly. But they move.</p><p>After a while, I realized that I was biased. Not all of AI governance was a teddy bear. I discovered the NIST AI Risk Management Framework and ISO/IEC 42001. I also found existing risk management frameworks, such as technology and model risk management frameworks, that had started to address the new risks from the increasing use of AI (see my simple reading list on AI governance and risk management).</p><p>So robots did exist. Different shapes and shades. But I thought that there were too many.</p><p>Reading frameworks, comparing documents, and talking to practitioners who had actually implemented these things over the past two years, I wondered if there was a simpler way to look at AI risk management. Not just for organizations, but also at a personal level. Which means that it cannot be a 100-page playbook, but something easy to remember.</p><p>So instead, just three questions. 
I&#8217;m not sure they&#8217;re the right three, but they&#8217;re the ones I seem to keep returning to.</p><p><strong>What&#8217;s at risk?</strong> You can&#8217;t manage what you haven&#8217;t found or understood. This means discovering where AI exists and profiling how risky each system actually is.</p><p><strong>How do we manage it?</strong> Once you know what&#8217;s at risk, you need controls in the right areas relevant to that risk, and ways to check if those controls are working.</p><p><strong>Who&#8217;s accountable?</strong> Controls without owners become theatre. Someone has to own the risk, and the organization needs capability to sustain it.</p><p>These aren&#8217;t original questions. They&#8217;re the same questions risk management in finance has been asking for decades. The 2008 crisis didn&#8217;t teach new principles. Rather, it reminded us (really really loudly) what happens when we forget the old ones.</p><p>Some quick reflections on each of these questions below.</p><h2>What&#8217;s at risk?</h2><p>I&#8217;m pretty sure the 2008 financial crisis didn&#8217;t create risk. It just revealed the house of cards that had been built.</p><p>Banks discovered exposures they didn&#8217;t know they had. Off-balance-sheet vehicles that suddenly appeared. Opaque instruments nobody understood. The problem wasn&#8217;t that these were risky. Rather, nobody knew they were there until the reckoning.</p><p>Same with AI. You can have the most elegant governance policies. But if you can&#8217;t find where AI is and figure out which ones actually matter, everything else is just performative.</p><p>Think about your phone. You&#8217;ve used hundreds of apps over the years. Most forgotten. Some daily. A few have access to your photos, location, bank account. Do you treat them all the same? Of course not. The banking app gets the biometric lock. Candy Crush doesn&#8217;t.</p><p>Same instinct. Identify where AI exists. Profile which ones matter. 
Not everything needs the same attention, but you have to know what you have before deciding.</p><p>And once you know what you have and what matters, record it. A good inventory isn&#8217;t just for risk management. It&#8217;s memory. Ever rediscovered a really useful app on your phone? Same with AI. The AI tool one team loves might solve another team&#8217;s problem. Without the inventory, you reinvent wheels. With it, you see where else to go.</p><p>The beauty of doing this well? It helps you scale, not just manage risks.</p><h2>How do we manage it?</h2><p>After 2008, we went into overdrive. More rules. More controls. I hated being involved in international discussions on some of these reforms. New rules to compute capital for market risk and counterparty credit risk. Why the hate? Before 2008, it was common to hear of actual rocket scientists being hired into quantitative finance. After 2008, I thought I needed to be a rocket scientist just to make sense of the new rules.</p><p>This might remind you of the jargon around AI controls. Guardrails. Red teaming. Alignment.</p><p>But what I learned is that it was not the complexity that mattered. In fact, complexity helped cause the Great Financial Crisis (ever heard of the single-factor copula model?). What mattered was whether the controls made sense for the risks involved.</p><p>I think two things matter. The right controls. And checking if they work, and continue to work.</p><p>Think about onboarding someone you&#8217;ll depend on. A new hire. You&#8217;d want to know: Can I trust what they&#8217;re telling me? Can I step in if things go wrong? Will they hold up under pressure?</p><p>Same questions for AI. Can I trust the data? Do I even need fairness and explanation? Can humans step in when needed? Will it hold up under stress?</p><p>You ask the right questions. Not every question. Match the scrutiny to the risk. A high-risk, customer-facing credit model gets the full onboarding. 
An internal summarization tool gets a lighter touch. And you don&#8217;t just check once. Things change. The new hire who was great in month one might be struggling by month six.</p><p>You want to know before shit happens.</p><h2>Who&#8217;s accountable?</h2><p>This is where 2008 probably made the least sense. Bankers responsible for the crisis walked away. Everyone else paid. It gave us Occupy Wall Street. Some would say it&#8217;s still contributing to the political fractures we see today.</p><p>They could walk away because accountability wasn&#8217;t clear. Everyone&#8217;s responsibility meant no one&#8217;s responsibility.</p><p>With AI, this could be way worse. In 2022, Air Canada&#8217;s chatbot invented a refund policy that didn&#8217;t exist. A customer relied on it. When he complained, the airline argued the chatbot was &#8220;a separate legal entity&#8221; responsible for its own actions. Nice try. A Canadian tribunal threw that argument out in 2024.</p><p>And you don&#8217;t need to go that far to see the pattern. Does this sound familiar? &#8220;We need a control for the AI system&#8217;s risks.&#8221; &#8220;No problem, let&#8217;s place a human in the loop.&#8221; Sounds good. Box checked. But which human? Doing what exactly? With what authority to override? And do they actually understand what they&#8217;re looking at?</p><p>Accountability requires clarity. Not just about ownership, but about the what and how.</p><p>Accountability without capability is also empty. Can that person actually ask the right questions? Spot when something&#8217;s off? Push back on the confident-sounding nonsense from vendors or developers or the AI itself?</p><p>How many in 2008 really understood what a single-factor copula model even was when buying CDOs priced by one?</p><h2>FIN</h2><p>I joined MAS in 2007 not knowing what a capital ratio was. I left in 2025 having written the AI risk management guidelines for the financial sector.</p><p>I wrote this to make sense of that arc before it drifts away. Eighteen years. 
Different domains: capital rules, model audits, investment risk, AI. Different jargon. But somehow the same questions make sense.</p><p>Maybe that&#8217;s the through-line I couldn&#8217;t see while I was in it.</p><p>Definitely fewer teddy bears. More robots.</p>]]></content:encoded></item><item><title><![CDATA[A Simple Reading List on Human Oversight of AI Systems]]></title><description><![CDATA["We have a human-in-the-loop as a risk mitigant!" Really?]]></description><link>https://www.simplyboring.ai/p/a-simple-reading-list-on-human-oversight</link><guid isPermaLink="false">https://www.simplyboring.ai/p/a-simple-reading-list-on-human-oversight</guid><dc:creator><![CDATA[Gary Ang (Ming)]]></dc:creator><pubDate>Mon, 22 Dec 2025 11:40:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!wJie!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f7db5d4-572b-412d-b0fd-1b4cccea6521_2000x1091.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>&#8220;We have a human-in-the-loop as a risk mitigant.&#8221;</p><p>A phrase so commonly uttered, you sometimes wonder if it actually means something or is just a platitude.</p><p>So, does adding a human actually make the system safer? Or does it create the illusion of safety while introducing new failure modes? And does it make the human the scapegoat for institutional failure?</p><p>Who is in the loop, over the loop, or out of the loop entirely? And does it even matter where they sit if they can&#8217;t meaningfully intervene?</p><p>Here&#8217;s a simple reading list that could perhaps help answer some of these questions. 
This was a harder one, so would certainly appreciate any pointers on good papers on this topic.</p><p>Note: I have used open-access links from arXiv as far as possible as some of the published versions are behind a paywall.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wJie!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f7db5d4-572b-412d-b0fd-1b4cccea6521_2000x1091.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wJie!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f7db5d4-572b-412d-b0fd-1b4cccea6521_2000x1091.jpeg 424w, https://substackcdn.com/image/fetch/$s_!wJie!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f7db5d4-572b-412d-b0fd-1b4cccea6521_2000x1091.jpeg 848w, https://substackcdn.com/image/fetch/$s_!wJie!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f7db5d4-572b-412d-b0fd-1b4cccea6521_2000x1091.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!wJie!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f7db5d4-572b-412d-b0fd-1b4cccea6521_2000x1091.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wJie!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f7db5d4-572b-412d-b0fd-1b4cccea6521_2000x1091.jpeg" width="1456" height="794" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f7db5d4-572b-412d-b0fd-1b4cccea6521_2000x1091.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2301120,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://quaintitative.substack.com/i/182318170?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f7db5d4-572b-412d-b0fd-1b4cccea6521_2000x1091.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wJie!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f7db5d4-572b-412d-b0fd-1b4cccea6521_2000x1091.jpeg 424w, https://substackcdn.com/image/fetch/$s_!wJie!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f7db5d4-572b-412d-b0fd-1b4cccea6521_2000x1091.jpeg 848w, https://substackcdn.com/image/fetch/$s_!wJie!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f7db5d4-572b-412d-b0fd-1b4cccea6521_2000x1091.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!wJie!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f7db5d4-572b-412d-b0fd-1b4cccea6521_2000x1091.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2><strong>Phase 1: The Landscape</strong></h2><p><strong>1. &#8220;Towards a Science of Human-AI Decision Making: A Survey of Empirical Studies&#8221;</strong> &#8212; Lai et al. (2021) | <em>Preprint</em></p><p>A comprehensive survey of studies on AI-assisted decision making. 
Organizes AI assistance into four hierarchical categories: model predictions (core output), prediction-specific information (such as uncertainty and local explanations), global model insights (performance metrics and documentation), and system interaction elements (user agency and cognitive workflows).</p><p> <a href="https://arxiv.org/abs/2112.11471">arXiv:2112.11471</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EoAJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eaa4265-c35d-4fe9-b269-83dbdb4c2f2e_888x379.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EoAJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eaa4265-c35d-4fe9-b269-83dbdb4c2f2e_888x379.png 424w, https://substackcdn.com/image/fetch/$s_!EoAJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eaa4265-c35d-4fe9-b269-83dbdb4c2f2e_888x379.png 848w, https://substackcdn.com/image/fetch/$s_!EoAJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eaa4265-c35d-4fe9-b269-83dbdb4c2f2e_888x379.png 1272w, https://substackcdn.com/image/fetch/$s_!EoAJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eaa4265-c35d-4fe9-b269-83dbdb4c2f2e_888x379.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EoAJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eaa4265-c35d-4fe9-b269-83dbdb4c2f2e_888x379.png" width="888" height="379" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0eaa4265-c35d-4fe9-b269-83dbdb4c2f2e_888x379.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:379,&quot;width&quot;:888,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EoAJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eaa4265-c35d-4fe9-b269-83dbdb4c2f2e_888x379.png 424w, https://substackcdn.com/image/fetch/$s_!EoAJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eaa4265-c35d-4fe9-b269-83dbdb4c2f2e_888x379.png 848w, https://substackcdn.com/image/fetch/$s_!EoAJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eaa4265-c35d-4fe9-b269-83dbdb4c2f2e_888x379.png 1272w, https://substackcdn.com/image/fetch/$s_!EoAJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eaa4265-c35d-4fe9-b269-83dbdb4c2f2e_888x379.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>2. &#8220;Human-AI Collaboration is Not Very Collaborative Yet: A Taxonomy of Interaction Patterns&#8221;</strong> &#8212; Gomez et al. (2025) | <em>Frontiers in Computer Science</em></p><p>A review that shows that current human-AI interactions are dominated by simplistic collaboration paradigms. 
Develops a taxonomy that identifies key interaction patterns, cautions that prevalent &#8220;static&#8221; paradigms like AI-first and AI-follow make users susceptible to anchoring and confirmation biases, while dynamic patterns like secondary, request-driven, dialogic, and user-guided assistance could help mitigate these.</p><p><a href="https://arxiv.org/abs/2310.19778">arXiv:2310.19778</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FS7d!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30abfdd0-0e50-4370-b5be-3217f0416987_1037x698.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FS7d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30abfdd0-0e50-4370-b5be-3217f0416987_1037x698.png 424w, https://substackcdn.com/image/fetch/$s_!FS7d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30abfdd0-0e50-4370-b5be-3217f0416987_1037x698.png 848w, https://substackcdn.com/image/fetch/$s_!FS7d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30abfdd0-0e50-4370-b5be-3217f0416987_1037x698.png 1272w, https://substackcdn.com/image/fetch/$s_!FS7d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30abfdd0-0e50-4370-b5be-3217f0416987_1037x698.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FS7d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30abfdd0-0e50-4370-b5be-3217f0416987_1037x698.png" width="1037" 
height="698" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30abfdd0-0e50-4370-b5be-3217f0416987_1037x698.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:698,&quot;width&quot;:1037,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FS7d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30abfdd0-0e50-4370-b5be-3217f0416987_1037x698.png 424w, https://substackcdn.com/image/fetch/$s_!FS7d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30abfdd0-0e50-4370-b5be-3217f0416987_1037x698.png 848w, https://substackcdn.com/image/fetch/$s_!FS7d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30abfdd0-0e50-4370-b5be-3217f0416987_1037x698.png 1272w, https://substackcdn.com/image/fetch/$s_!FS7d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30abfdd0-0e50-4370-b5be-3217f0416987_1037x698.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>3. &#8220;Human-in-the-Loop Machine Learning&#8221; (Book)</strong> &#8212; Robert Monarch (2021) | <em>Manning Publications</em></p><p>Introduction to integrating human judgment into ML systems in specific ways. Less about human oversight, more about the role humans play in annotation, active learning, transfer learning, and using machine learning to optimize the process. Even though it&#8217;s a bit different from the other papers, it&#8217;s an interesting read on the role of humans in the machine &#8216;learning&#8217; process. &#128214; <a href="https://www.manning.com/books/human-in-the-loop-machine-learning">Manning Publications</a></p><h2><strong>Phase 2: The Blindspots</strong></h2><p><strong>1. &#8220;Knowing About Knowing: An Illusion of Human Competence Can Hinder Appropriate Reliance on AI&#8221;</strong> &#8212; He et al. (2023) | <em>ACM CHI 2023</em></p><p>Basically Dunning-Kruger with AI in the mix. 
And this was before LLMs reached today&#8217;s capabilities. The paper looks at how self-awareness affects the way humans work with AI. Overconfident users tend to under-rely on superior AI systems. Intervention helps over-estimators calibrate their skills and improve reliance, but hurts under-estimators, who start to reject valid AI advice after realizing their own competence. Because of this, human oversight also depends on individual personalities. I wonder how this has changed with the state of LLMs today.</p><p><a href="https://arxiv.org/abs/2301.11333">arXiv:2301.11333</a></p><p><strong>2. &#8220;Effect of Confidence and Explanation on Accuracy and Trust Calibration&#8221;</strong> &#8212; Zhang, Liao &amp; Bellamy (2020) | <em>ACM FAccT 2020</em></p><p>An interesting study showing that confidence scores can help calibrate trust, but that trust calibration alone is insufficient to improve AI-assisted decision making. Local explanations help even less with trust calibration and with the accuracy of AI-assisted decision making. It also highlights that human and AI blind spots may be similar. I found this notable because it shows that explainability may not always help.</p><p><a href="https://arxiv.org/abs/2001.02114">arXiv:2001.02114</a></p><p><strong>3. &#8220;Fewer Than 1% of Explainable AI Papers Validate Explainability with Humans&#8221;</strong> &#8212; Suh et al. (2025) | <em>arXiv preprint</em></p><p>While this seems to belong better in a reading list on explainability, I thought it offered some key insights on the relationship between human oversight and explainability. The review shows that fewer than 1% of research papers on explainability validate their claims with human subjects. 
The authors argue that explainability methods that are not tested with humans are akin to releasing drugs based on biological principles without clinical trials.</p><p><a href="https://arxiv.org/abs/2503.16507">arXiv:2503.16507</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Uv8W!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b54f12c-2010-425b-87cb-fecae6f4bf1c_752x403.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Uv8W!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b54f12c-2010-425b-87cb-fecae6f4bf1c_752x403.png 424w, https://substackcdn.com/image/fetch/$s_!Uv8W!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b54f12c-2010-425b-87cb-fecae6f4bf1c_752x403.png 848w, https://substackcdn.com/image/fetch/$s_!Uv8W!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b54f12c-2010-425b-87cb-fecae6f4bf1c_752x403.png 1272w, https://substackcdn.com/image/fetch/$s_!Uv8W!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b54f12c-2010-425b-87cb-fecae6f4bf1c_752x403.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Uv8W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b54f12c-2010-425b-87cb-fecae6f4bf1c_752x403.png" width="752" height="403" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5b54f12c-2010-425b-87cb-fecae6f4bf1c_752x403.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:403,&quot;width&quot;:752,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Uv8W!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b54f12c-2010-425b-87cb-fecae6f4bf1c_752x403.png 424w, https://substackcdn.com/image/fetch/$s_!Uv8W!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b54f12c-2010-425b-87cb-fecae6f4bf1c_752x403.png 848w, https://substackcdn.com/image/fetch/$s_!Uv8W!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b54f12c-2010-425b-87cb-fecae6f4bf1c_752x403.png 1272w, https://substackcdn.com/image/fetch/$s_!Uv8W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b54f12c-2010-425b-87cb-fecae6f4bf1c_752x403.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2><strong>Phase 3: Meaningful Oversight</strong></h2><p><strong>1. &#8220;Designing Meaningful Human Oversight in AI&#8221; </strong>- Zhu et al. (2025) | SSRN Preprint</p><p>Focuses on why human oversight should go beyond just putting a &#8220;human-in-the-loop.&#8221; This paper argues AI should handle &#8220;operative agency&#8221; (generating solutions) while humans provide &#8220;evaluative agency&#8221; (understanding, verifying, intervening). Key principles: make verification easier than solving from scratch, focus on external reasoning aligned with expert judgment rather than explaining model internals, and ensure four conditions&#8212;clear boundaries with explicit handover points, full traceability, AI pursuing sub-goals while humans control top-level objectives, and AI adapting at micro-level while humans oversee major changes.</p><p><a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5501939">SSRN 5501939</a></p><p><strong>2. &#8220;Should I Follow AI-based Advice? 
Measuring Appropriate Reliance&#8221;</strong> &#8212; Schemmer et al. (2022) | <em>arXiv preprint</em></p><p>Explains why current metrics are not appropriate for measuring the effectiveness of human oversight of AI. Proposes Relative Positive AI Reliance (human&#8217;s ability to switch to AI&#8217;s views when human was wrong and AI right), and Relative Positive Self-Reliance (human&#8217;s ability to stick to their own correct decision when AI provides incorrect advice).</p><p><a href="https://arxiv.org/abs/2204.06916">arXiv:2204.06916</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TgIc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c05cc41-335a-4726-85a2-0be0585d93a8_538x664.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TgIc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c05cc41-335a-4726-85a2-0be0585d93a8_538x664.png 424w, https://substackcdn.com/image/fetch/$s_!TgIc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c05cc41-335a-4726-85a2-0be0585d93a8_538x664.png 848w, https://substackcdn.com/image/fetch/$s_!TgIc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c05cc41-335a-4726-85a2-0be0585d93a8_538x664.png 1272w, https://substackcdn.com/image/fetch/$s_!TgIc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c05cc41-335a-4726-85a2-0be0585d93a8_538x664.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!TgIc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c05cc41-335a-4726-85a2-0be0585d93a8_538x664.png" width="538" height="664" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1c05cc41-335a-4726-85a2-0be0585d93a8_538x664.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:664,&quot;width&quot;:538,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TgIc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c05cc41-335a-4726-85a2-0be0585d93a8_538x664.png 424w, https://substackcdn.com/image/fetch/$s_!TgIc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c05cc41-335a-4726-85a2-0be0585d93a8_538x664.png 848w, https://substackcdn.com/image/fetch/$s_!TgIc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c05cc41-335a-4726-85a2-0be0585d93a8_538x664.png 1272w, https://substackcdn.com/image/fetch/$s_!TgIc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c05cc41-335a-4726-85a2-0be0585d93a8_538x664.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>3. &#8220;Does the Whole Exceed its Parts? The Effect of AI Explanations on Complementary Team Performance&#8221;</strong> &#8212; Bansal et al. (2021) | <em>ACM CHI 2021</em></p><p>Even before Generative AI hallucinations became a common term, this study showed the tendency for humans to over-rely on explanations that AI provided for a prediction or recommendation, even when it was wrong. So, don&#8217;t always believe the reasoning from your friendly LLM. And it showed that a prediction or recommendation with a confidence score could lead to better performance by humans that were assisted by AI.</p><p><a href="https://arxiv.org/abs/2006.14779">arXiv:2006.14779</a></p><p><strong>4. 
&#8220;Trust and Reliance in XAI &#8212; Distinguishing Between Attitudinal and Behavioral Measures&#8221;</strong> &#8212; Scharowski et al. (2022) | <em>ACM CHI 2022 Workshop</em></p><p>Another interesting paper at the intersection of explainability and human oversight. It discusses the need to distinguish between trust in AI (which is an attitude) and reliance on AI (which is a behavior), and how papers have not clearly distinguished the two. It also wonders whether &#8216;trust&#8217; is even the right term for describing how humans interact with AI, as it anthropomorphizes AI, which has neither agency nor an intent to betray us.</p><p><a href="https://arxiv.org/abs/2203.12318">arXiv:2203.12318</a></p><p><strong>5. &#8220;To Rely or Not to Rely? Evaluating Interventions for Appropriate Reliance on Large Language Models&#8221;</strong> &#8212; Bo, Wan &amp; Anderson (2024) | <em>ACM CHI 2025</em></p><p>Interesting paper focusing on humans and LLMs. It examines interventions to help users calibrate their trust in LLMs, looking at techniques like visually marking low-confidence words in red, or implicit answers, i.e., providing reasoning steps but withholding the final result. While these techniques force user deduction and reduce over-reliance, they may fail to foster appropriate reliance as users may also under-rely on correct advice. The study highlights a paradox, where users become more confident when making incorrect reliance decisions. 
Also suggests that simple frictions, such as static disclaimers, may outperform complex technical interventions.</p><p><a href="https://arxiv.org/abs/2412.15584">arXiv:2412.15584</a></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!10bX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe540a2c-7cd5-46d2-9d6f-f101dd2fc110_544x152.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!10bX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe540a2c-7cd5-46d2-9d6f-f101dd2fc110_544x152.png 424w, https://substackcdn.com/image/fetch/$s_!10bX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe540a2c-7cd5-46d2-9d6f-f101dd2fc110_544x152.png 848w, https://substackcdn.com/image/fetch/$s_!10bX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe540a2c-7cd5-46d2-9d6f-f101dd2fc110_544x152.png 1272w, https://substackcdn.com/image/fetch/$s_!10bX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe540a2c-7cd5-46d2-9d6f-f101dd2fc110_544x152.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!10bX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe540a2c-7cd5-46d2-9d6f-f101dd2fc110_544x152.png" width="544" height="152" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fe540a2c-7cd5-46d2-9d6f-f101dd2fc110_544x152.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:152,&quot;width&quot;:544,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!10bX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe540a2c-7cd5-46d2-9d6f-f101dd2fc110_544x152.png 424w, https://substackcdn.com/image/fetch/$s_!10bX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe540a2c-7cd5-46d2-9d6f-f101dd2fc110_544x152.png 848w, https://substackcdn.com/image/fetch/$s_!10bX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe540a2c-7cd5-46d2-9d6f-f101dd2fc110_544x152.png 1272w, https://substackcdn.com/image/fetch/$s_!10bX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe540a2c-7cd5-46d2-9d6f-f101dd2fc110_544x152.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h2><strong>Phase 4: Agent Oversight</strong></h2><p><strong>1. &#8220;Unraveling Human-AI Teaming: A Review and Outlook&#8221;</strong> &#8212; Lou et al. (2025) | <em>arXiv preprint</em></p><p>An interesting look at how the shift to Agentic AI changes things fundamentally for human-AI interactions, as AI moves from being a passive tool to being able to plan and reflect. 
It raises an interesting point about AI potentially delegating to humans instead of the reverse. It also looks at how AI may change team dynamics, and how AI sycophancy, its tendency to agree with humans even when they are wrong, may cause a trust paradox. The &#8220;peak-end&#8221; effect in human-AI interactions is also worth noting, as a single brilliant insight or a very smooth conclusion to a chat session can mask deeper reliability issues.</p><p><a href="https://arxiv.org/abs/2504.05755">arXiv:2504.05755</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!R3CL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb66f0ae4-3b5e-4f8e-8f5c-6d76bd97b2cd_1005x372.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!R3CL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb66f0ae4-3b5e-4f8e-8f5c-6d76bd97b2cd_1005x372.png 424w, https://substackcdn.com/image/fetch/$s_!R3CL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb66f0ae4-3b5e-4f8e-8f5c-6d76bd97b2cd_1005x372.png 848w, https://substackcdn.com/image/fetch/$s_!R3CL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb66f0ae4-3b5e-4f8e-8f5c-6d76bd97b2cd_1005x372.png 1272w, https://substackcdn.com/image/fetch/$s_!R3CL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb66f0ae4-3b5e-4f8e-8f5c-6d76bd97b2cd_1005x372.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!R3CL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb66f0ae4-3b5e-4f8e-8f5c-6d76bd97b2cd_1005x372.png" width="1005" height="372" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b66f0ae4-3b5e-4f8e-8f5c-6d76bd97b2cd_1005x372.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:372,&quot;width&quot;:1005,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!R3CL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb66f0ae4-3b5e-4f8e-8f5c-6d76bd97b2cd_1005x372.png 424w, https://substackcdn.com/image/fetch/$s_!R3CL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb66f0ae4-3b5e-4f8e-8f5c-6d76bd97b2cd_1005x372.png 848w, https://substackcdn.com/image/fetch/$s_!R3CL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb66f0ae4-3b5e-4f8e-8f5c-6d76bd97b2cd_1005x372.png 1272w, https://substackcdn.com/image/fetch/$s_!R3CL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb66f0ae4-3b5e-4f8e-8f5c-6d76bd97b2cd_1005x372.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>2. &#8220;LLM-Based Human-Agent Collaboration and Interaction Systems: A Survey&#8221;</strong> &#8212; Zou et al. (2025) | <em>arXiv preprint</em></p><p>A survey of LLM-based human-agent systems. 
Looks at such systems through the lens of type, granularity and phase of human feedback; interactions that take the form of competition, collaboration and coopetition (both competitive and collaborative); orchestration paradigms based on task strategy that can be synchronous or asynchronous; as well as different forms of communication structures and modes.</p><p><a href="https://arxiv.org/abs/2505.00753">arXiv:2505.00753</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!D8x1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5082e5cd-8757-430a-96c4-9d90f65a0415_1114x506.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!D8x1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5082e5cd-8757-430a-96c4-9d90f65a0415_1114x506.png 424w, https://substackcdn.com/image/fetch/$s_!D8x1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5082e5cd-8757-430a-96c4-9d90f65a0415_1114x506.png 848w, https://substackcdn.com/image/fetch/$s_!D8x1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5082e5cd-8757-430a-96c4-9d90f65a0415_1114x506.png 1272w, https://substackcdn.com/image/fetch/$s_!D8x1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5082e5cd-8757-430a-96c4-9d90f65a0415_1114x506.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!D8x1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5082e5cd-8757-430a-96c4-9d90f65a0415_1114x506.png" width="1114" height="506" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5082e5cd-8757-430a-96c4-9d90f65a0415_1114x506.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:506,&quot;width&quot;:1114,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!D8x1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5082e5cd-8757-430a-96c4-9d90f65a0415_1114x506.png 424w, https://substackcdn.com/image/fetch/$s_!D8x1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5082e5cd-8757-430a-96c4-9d90f65a0415_1114x506.png 848w, https://substackcdn.com/image/fetch/$s_!D8x1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5082e5cd-8757-430a-96c4-9d90f65a0415_1114x506.png 1272w, https://substackcdn.com/image/fetch/$s_!D8x1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5082e5cd-8757-430a-96c4-9d90f65a0415_1114x506.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>So before just saying &#8220;We have a human-in-the-loop as a risk mitigant&#8221;, think about this:</p><ul><li><p>Humans are fallible - e.g., humans systematically over-rely on AI recommendations</p></li><li><p>Design matters - e.g., calibration matters as much as understanding the AI; explanations don&#8217;t automatically help human oversight, and sometimes they hurt it</p></li><li><p>It&#8217;s getting harder - e.g., autonomous agents make traditional oversight models increasingly inadequate</p></li></ul><p>What resources would you add to this list?</p><p>#AIOversight #AIRiskManagement #AIReadingList</p>]]></content:encoded></item><item><title><![CDATA[AI Agents as Normal Systems]]></title><description><![CDATA[If AI is normal technology, then AI agents are perhaps 
&#8230; just normal systems.]]></description><link>https://www.simplyboring.ai/p/ai-agents-as-normal-systems</link><guid isPermaLink="false">https://www.simplyboring.ai/p/ai-agents-as-normal-systems</guid><dc:creator><![CDATA[Gary Ang (Ming)]]></dc:creator><pubDate>Tue, 16 Dec 2025 09:50:06 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!A1n4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec4c4a3a-c92c-4d6e-8434-79b0237e5097_680x882.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If AI is normal technology, then AI agents are perhaps &#8230; just normal systems.<br><br>Arvind Narayanan and Sayash Kapoor articulated a compelling thesis on &#8220;AI as Normal Technology&#8221; a while back. Their core argument: AI can be understood through the lens of past general-purpose technologies such as electricity, the internet, and computing, rather than as a potential superintelligent entity.<br><br>So if AI is normal technology, then perhaps AI agents are just normal systems. Not entities that can turn rogue, deceive, or collude.<br><br>A recent paper, &#8220;Measuring Agents in Production&#8221;, provides some evidence. 
The paper is based on a study of AI agents in production, surveying 306 practitioners and conducting 20 in-depth case studies.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://arxiv.org/abs/2512.04123" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!A1n4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec4c4a3a-c92c-4d6e-8434-79b0237e5097_680x882.png 424w, https://substackcdn.com/image/fetch/$s_!A1n4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec4c4a3a-c92c-4d6e-8434-79b0237e5097_680x882.png 848w, https://substackcdn.com/image/fetch/$s_!A1n4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec4c4a3a-c92c-4d6e-8434-79b0237e5097_680x882.png 1272w, https://substackcdn.com/image/fetch/$s_!A1n4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec4c4a3a-c92c-4d6e-8434-79b0237e5097_680x882.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!A1n4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec4c4a3a-c92c-4d6e-8434-79b0237e5097_680x882.png" width="458" height="594.0529411764705" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ec4c4a3a-c92c-4d6e-8434-79b0237e5097_680x882.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:882,&quot;width&quot;:680,&quot;resizeWidth&quot;:458,&quot;bytes&quot;:259374,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://arxiv.org/abs/2512.04123&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://quaintitative.substack.com/i/181774353?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec4c4a3a-c92c-4d6e-8434-79b0237e5097_680x882.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!A1n4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec4c4a3a-c92c-4d6e-8434-79b0237e5097_680x882.png 424w, https://substackcdn.com/image/fetch/$s_!A1n4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec4c4a3a-c92c-4d6e-8434-79b0237e5097_680x882.png 848w, https://substackcdn.com/image/fetch/$s_!A1n4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec4c4a3a-c92c-4d6e-8434-79b0237e5097_680x882.png 1272w, https://substackcdn.com/image/fetch/$s_!A1n4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec4c4a3a-c92c-4d6e-8434-79b0237e5097_680x882.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p><br>Some thoughts.<br><br>1&#65039;&#8419; <strong>Just like systems, specification is the core work.</strong><br><br>Points from paper: <br><em>- &#8220;Production agents favor well-scoped, static workflows ..&#8221;<br>- &#8220;Organizations deliberately bound agent behavior within specific action spaces ...&#8221;<br>- &#8220;Deployment architectures favor predefined, structured workflows over open-ended autonomous planning to ensure reliability.&#8221;</em><br><br>My take: Thinking that one can simply unleash AI agents on a problem and solve it is pure fantasy. 
Before a problem can be tackled by an AI agent, someone still has to write the specs, decompose the problem into clear tasks, scope the action space, think about orchestration, design the human handoffs, and define the success criteria for the system to work.<br><br>2&#65039;&#8419; <strong>Just like systems, trust and reversibility matter more than capabilities.</strong><br><br>Points from paper: <br><em>- &#8220;Practitioners deliberately trade-off additional agent capability for production reliability... reliability concerns drive practitioners toward simple yet effective solutions with high controllability.&#8221;<br>- &#8220;... teams restrict agents to &#8216;read-only&#8217; operations to prevent state modification &#8230; but leaves the final execution to human engineers.&#8221;</em><br><br>My take: No matter how capable AI agents become, organizations will adopt them only as fast as they can learn to trust them. And the need for trust scales with the irreversibility of actions (reading an email is vastly different from executing a trade).<br><br>3&#65039;&#8419; <strong>Just like systems, risk arises from gaps in development and deployment, not AI going amok.</strong><br><br>Points from paper: <br><em>- &#8220;Reliability remains the top development challenge, driven by difficulties in ensuring and evaluating agent correctness.&#8221;<br>- &#8220;Agent behavior breaks traditional software testing... teams have not yet identified effective methods to adapt &#8230; tests for nondeterministic agent behavior.&#8221;</em><br><br>My take: The real concerns aren&#8217;t the scary but unrealistic scenarios (runaway, deceptive, or collusive agents). They are the gaps in development and deployment practices for such complex systems. These are engineering and risk management problems, not AI going amok.<br><br>So, normal technology, normal systems, for normal problems. Not easy or trivial. But normal.<br><br>What&#8217;s your take? 
Normal or abnormal?<br><br>#AIRiskManagement #AgenticAI #AIAgents #NormalAI</p>]]></content:encoded></item><item><title><![CDATA[Simple Reading List on Explainability & Interpretability ]]></title><description><![CDATA[&#8220;But we need to ensure we have explainability!&#8221;]]></description><link>https://www.simplyboring.ai/p/simple-reading-list-on-explainability</link><guid isPermaLink="false">https://www.simplyboring.ai/p/simple-reading-list-on-explainability</guid><dc:creator><![CDATA[Gary Ang (Ming)]]></dc:creator><pubDate>Thu, 11 Dec 2025 01:15:37 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!rjO-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fc19860-a967-466d-bbad-c40f0dd64ad1_569x535.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>&#8220;But we need to ensure we have explainability!&#8221;</p><p>I hear this in meetings constantly. Someone senior gestures vaguely, everyone nods, and the conversation moves on.</p><p>My thought is always: <em>What does that actually mean?</em></p><p>Inherently interpretable models? Post-hoc SHAP values? Explanations for customers, users, risk managers, senior management?</p><p>At what level? Individual predictions or overall model behavior?</p><p>Here is a simple reading list that could perhaps help understand the differences. There are probably hundreds (or even thousands) of relevant works, but I just picked a representative few from my library for the following phases, from when models were simple to explain, to the impossible task with today&#8217;s trillion-parameter AI.</p><p>Note: I have used open-access links from arXiv as far as possible as some of the published versions are behind a paywall.</p><h2><strong>Phase 1: From Inherent Interpretability to Post-Hoc Explainability</strong></h2><p><strong>1. 
&#8220;Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead&#8221;</strong> &#8212; Rudin (2019) | <em>Nature Machine Intelligence</em></p><p>This 2019 paper argues against relying on post-hoc explanations of black-box models for high-stakes decisions, and for using inherently interpretable models instead. This is a highly cited paper that seems both quaint and prescient at the same time. <a href="https://arxiv.org/pdf/1811.10154v3">arXiv:1811.10154</a></p><p><strong>2. &#8220;Why Should I Trust You?: Explaining the Predictions of Any Classifier (LIME)&#8221;</strong> &#8212; Ribeiro, Singh &amp; Guestrin (2016) | <em>ACM SIGKDD 2016</em></p><p>THE paper that proposed LIME, which explains a prediction by learning an interpretable surrogate model locally around it. The paper goes beyond local explanations and also shows how to select a diverse, representative set of explanations to explain the model as a whole. <a href="https://arxiv.org/pdf/1602.04938v3">arXiv:1602.04938</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rjO-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fc19860-a967-466d-bbad-c40f0dd64ad1_569x535.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rjO-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fc19860-a967-466d-bbad-c40f0dd64ad1_569x535.png 424w, https://substackcdn.com/image/fetch/$s_!rjO-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fc19860-a967-466d-bbad-c40f0dd64ad1_569x535.png 848w, 
https://substackcdn.com/image/fetch/$s_!rjO-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fc19860-a967-466d-bbad-c40f0dd64ad1_569x535.png 1272w, https://substackcdn.com/image/fetch/$s_!rjO-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fc19860-a967-466d-bbad-c40f0dd64ad1_569x535.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rjO-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fc19860-a967-466d-bbad-c40f0dd64ad1_569x535.png" width="569" height="535" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9fc19860-a967-466d-bbad-c40f0dd64ad1_569x535.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:535,&quot;width&quot;:569,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rjO-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fc19860-a967-466d-bbad-c40f0dd64ad1_569x535.png 424w, https://substackcdn.com/image/fetch/$s_!rjO-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fc19860-a967-466d-bbad-c40f0dd64ad1_569x535.png 848w, 
https://substackcdn.com/image/fetch/$s_!rjO-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fc19860-a967-466d-bbad-c40f0dd64ad1_569x535.png 1272w, https://substackcdn.com/image/fetch/$s_!rjO-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fc19860-a967-466d-bbad-c40f0dd64ad1_569x535.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>3. 
&#8220;A Unified Approach to Interpreting Model Predictions (SHAP)&#8221;</strong> &#8212; Lundberg &amp; Lee (2017) | <em>NeurIPS 2017</em></p><p>THE paper that proposed SHAP for post-hoc explanations, and showed that key explanation methods existing at the time could be viewed as approximations of SHAP. Highlights 3 desirable properties - local accuracy (the explanation model must match the output of the original model for the specific input being explained); missingness (a feature that is missing from the input must be assigned no impact); consistency (if a model changes so that a feature&#8217;s contribution increases or stays the same, that feature&#8217;s attribution must not decrease). <a href="https://papers.nips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html">NeurIPS 2017</a></p><p><strong>4. &#8220;One Explanation Does Not Fit All: A Toolkit and Taxonomy of AI Explainability Techniques&#8221; | &#8220;AI Explainability 360: An Extensible Toolkit for Understanding Data and Machine Learning Models&#8221;</strong> &#8212; Arya et al., IBM Research (2019 | 2020) | <em>arXiv preprint | Journal of Machine Learning Research</em></p><p>Goes beyond SHAP &amp; LIME to other methods. Interesting as it highlights the need for persona-based explainability (to cater to the different needs of, say, affected users vs. decision makers). Also classifies methods as static vs. interactive, data vs. model understanding, local vs. global, and directly interpretable vs. post-hoc. 
<a href="https://arxiv.org/pdf/1909.03012v2">arXiv:1909.03012</a> | <a href="https://dl.acm.org/doi/pdf/10.5555/3455716.3455846">JMLR paper</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gQAQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53208a59-adac-4f27-802b-50e003e70c2b_1147x656.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gQAQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53208a59-adac-4f27-802b-50e003e70c2b_1147x656.png 424w, https://substackcdn.com/image/fetch/$s_!gQAQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53208a59-adac-4f27-802b-50e003e70c2b_1147x656.png 848w, https://substackcdn.com/image/fetch/$s_!gQAQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53208a59-adac-4f27-802b-50e003e70c2b_1147x656.png 1272w, https://substackcdn.com/image/fetch/$s_!gQAQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53208a59-adac-4f27-802b-50e003e70c2b_1147x656.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gQAQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53208a59-adac-4f27-802b-50e003e70c2b_1147x656.png" width="1147" height="656" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/53208a59-adac-4f27-802b-50e003e70c2b_1147x656.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:656,&quot;width&quot;:1147,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gQAQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53208a59-adac-4f27-802b-50e003e70c2b_1147x656.png 424w, https://substackcdn.com/image/fetch/$s_!gQAQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53208a59-adac-4f27-802b-50e003e70c2b_1147x656.png 848w, https://substackcdn.com/image/fetch/$s_!gQAQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53208a59-adac-4f27-802b-50e003e70c2b_1147x656.png 1272w, https://substackcdn.com/image/fetch/$s_!gQAQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53208a59-adac-4f27-802b-50e003e70c2b_1147x656.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p><strong>5. &#8220;Interpretable Machine Learning&#8221; (Book)</strong> &#8212; Molnar (2022) | <em>Open-access textbook</em></p><p>An open-access textbook covering the full spectrum, from linear regression and decision trees to SHAP and counterfactuals. A must-read for anyone who wants to get a good handle on interpretability and explainability. <a href="https://christophm.github.io/interpretable-ml-book/">christophm.github.io</a></p><p><strong>6. &#8220;A Comprehensive Guide to Explainable AI: From Classical Models to LLMs&#8221;</strong> &#8212; Hsieh et al. (2024) | <em>arXiv preprint</em></p><p>A textbook-style guide spanning the entire XAI spectrum, from intrinsically interpretable models (decision trees, linear regression) through post-hoc methods (SHAP, LIME) to LLM-specific techniques. Another good read. <a href="https://arxiv.org/abs/2412.00800">arXiv:2412.00800</a></p><h2><strong>Phase 2: Taking A Step Back to Critically Examine Explainability</strong></h2><p><strong>1. 
&#8220;From Anecdotal Evidence to Quantitative Evaluation Methods: A Systematic Review on Evaluating Explainable AI&#8221;</strong> &#8212; Nauta et al. (2023) | <em>ACM Computing Surveys</em></p><p>Proposes 12 properties in 3 dimensions for evaluating explanations - 1) What is explained? Correctness, Completeness, Consistency, Continuity, Contrastivity, Covariate complexity; 2) How is it explained? Compactness, Composition, Confidence; 3) Who is it for? Context, Coherence, Controllability. <a href="https://arxiv.org/pdf/2201.08164v3">arXiv:2201.08164</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!inU3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3de481a0-1cc3-41b5-a876-b41b2f0a7831_761x757.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!inU3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3de481a0-1cc3-41b5-a876-b41b2f0a7831_761x757.png 424w, https://substackcdn.com/image/fetch/$s_!inU3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3de481a0-1cc3-41b5-a876-b41b2f0a7831_761x757.png 848w, https://substackcdn.com/image/fetch/$s_!inU3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3de481a0-1cc3-41b5-a876-b41b2f0a7831_761x757.png 1272w, https://substackcdn.com/image/fetch/$s_!inU3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3de481a0-1cc3-41b5-a876-b41b2f0a7831_761x757.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!inU3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3de481a0-1cc3-41b5-a876-b41b2f0a7831_761x757.png" width="761" height="757" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3de481a0-1cc3-41b5-a876-b41b2f0a7831_761x757.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:757,&quot;width&quot;:761,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!inU3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3de481a0-1cc3-41b5-a876-b41b2f0a7831_761x757.png 424w, https://substackcdn.com/image/fetch/$s_!inU3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3de481a0-1cc3-41b5-a876-b41b2f0a7831_761x757.png 848w, https://substackcdn.com/image/fetch/$s_!inU3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3de481a0-1cc3-41b5-a876-b41b2f0a7831_761x757.png 1272w, https://substackcdn.com/image/fetch/$s_!inU3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3de481a0-1cc3-41b5-a876-b41b2f0a7831_761x757.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p><strong>2. &#8220;How can I choose an explainer? An Application-grounded Evaluation of Post-hoc Explanations&#8221;</strong> &#8212; Jesus et al. (2021) | <em>ACM FAccT 2021</em></p><p>Goes beyond academic explanations and measures. Conducted a real-world study where users had access to data only; data and model scores; or data, model scores, and explanations. It reveals a counterintuitive insight - adding model explanations makes human decision-making faster but can result in lower accuracy compared to reviewing raw data alone. <a href="https://arxiv.org/pdf/2101.08758v2">arXiv:2101.08758</a></p><p><strong>3. &#8220;Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods&#8221;</strong> &#8212; Slack et al. 
(2020) | <em>AAAI/ACM AIES 2020</em></p><p>Another paper that reminds us not to overly trust explanation methods like LIME and SHAP. It shows that they can be easily fooled by adversarial models that hide their bias by distinguishing between real input data and the synthetic data (perturbations) used to generate explanations. <a href="https://arxiv.org/pdf/1911.02508v2">arXiv:1911.02508</a></p><h2><strong>Phase 3: The Next Step (LLMs &amp; Agents)</strong></h2><h3>LLM Explainability</h3><p><strong>1. &#8220;Explainability for Large Language Models: A Survey&#8221;</strong> &#8212; Zhao et al. (2024) | <em>ACM Transactions on Intelligent Systems and Technology (TIST)</em></p><p>Start here if you&#8217;re new to LLM explainability. Provides a good overview of techniques for the fine-tuning (or training) paradigm (ranging from local to global methods) and the prompting paradigm (e.g., explaining chain-of-thought, using representations), as well as how to evaluate explanations for faithfulness and plausibility. 
<a href="https://arxiv.org/pdf/2309.01029v3">arXiv:2309.01029</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bJ2w!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc43bb847-083d-4da6-9d52-c09722391041_1123x808.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bJ2w!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc43bb847-083d-4da6-9d52-c09722391041_1123x808.png 424w, https://substackcdn.com/image/fetch/$s_!bJ2w!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc43bb847-083d-4da6-9d52-c09722391041_1123x808.png 848w, https://substackcdn.com/image/fetch/$s_!bJ2w!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc43bb847-083d-4da6-9d52-c09722391041_1123x808.png 1272w, https://substackcdn.com/image/fetch/$s_!bJ2w!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc43bb847-083d-4da6-9d52-c09722391041_1123x808.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bJ2w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc43bb847-083d-4da6-9d52-c09722391041_1123x808.png" width="1123" height="808" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c43bb847-083d-4da6-9d52-c09722391041_1123x808.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:808,&quot;width&quot;:1123,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bJ2w!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc43bb847-083d-4da6-9d52-c09722391041_1123x808.png 424w, https://substackcdn.com/image/fetch/$s_!bJ2w!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc43bb847-083d-4da6-9d52-c09722391041_1123x808.png 848w, https://substackcdn.com/image/fetch/$s_!bJ2w!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc43bb847-083d-4da6-9d52-c09722391041_1123x808.png 1272w, https://substackcdn.com/image/fetch/$s_!bJ2w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc43bb847-083d-4da6-9d52-c09722391041_1123x808.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p><strong>2. &#8220;XAI meets LLMs: A Survey of the Relation between Explainable AI and Large Language Models&#8221;</strong> &#8212; Cambria et al. (2024) | <em>arXiv preprint</em></p><p>Maps the bidirectional relationship: how XAI improves LLMs, and how LLMs can generate explanations. Advocates for balancing interpretability with performance. <a href="https://arxiv.org/pdf/2407.15248v1">arXiv:2407.15248</a></p><p><strong>3. &#8220;Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey&#8221;</strong> &#8212; Dang et al. (2024) | <em>arXiv preprint</em></p><p>Comprehensive survey on MLLM interpretability. Proposes framework across Data, Model, and Training/Inference perspectives. (Worth a read as vision-language models proliferate.) <a href="https://arxiv.org/pdf/2412.02104v1">arXiv:2412.02104</a></p><p><strong>4. &#8220;A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models&#8221;</strong> &#8212; Lin et al. 
(2025) | <em>arXiv preprint</em></p><p>Systematic comparison of how LLM interpretability methods adapt to multimodal settings. Identifies gaps between unimodal and crossmodal understanding. <a href="https://arxiv.org/pdf/2502.17516v1">arXiv:2502.17516</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!12Am!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43e593dd-8577-4c20-9d8f-01deddeb3b42_888x780.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!12Am!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43e593dd-8577-4c20-9d8f-01deddeb3b42_888x780.png 424w, https://substackcdn.com/image/fetch/$s_!12Am!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43e593dd-8577-4c20-9d8f-01deddeb3b42_888x780.png 848w, https://substackcdn.com/image/fetch/$s_!12Am!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43e593dd-8577-4c20-9d8f-01deddeb3b42_888x780.png 1272w, https://substackcdn.com/image/fetch/$s_!12Am!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43e593dd-8577-4c20-9d8f-01deddeb3b42_888x780.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!12Am!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43e593dd-8577-4c20-9d8f-01deddeb3b42_888x780.png" width="888" height="780" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/43e593dd-8577-4c20-9d8f-01deddeb3b42_888x780.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:780,&quot;width&quot;:888,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!12Am!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43e593dd-8577-4c20-9d8f-01deddeb3b42_888x780.png 424w, https://substackcdn.com/image/fetch/$s_!12Am!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43e593dd-8577-4c20-9d8f-01deddeb3b42_888x780.png 848w, https://substackcdn.com/image/fetch/$s_!12Am!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43e593dd-8577-4c20-9d8f-01deddeb3b42_888x780.png 1272w, https://substackcdn.com/image/fetch/$s_!12Am!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43e593dd-8577-4c20-9d8f-01deddeb3b42_888x780.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h3>LLM Explorability (Mechanistic Interpretability)</h3><p><strong>1. &#8220;A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models&#8221;</strong> &#8212; Shu et al. (2025) | <em>EMNLP 2025</em></p><p>Comprehensive survey on SAEs, one of the hottest tools in LLM interpretability. Covers architecture, training strategies, feature explanation methods, and evaluation metrics. (If you want to understand what Anthropic is doing, read this.) <a href="https://arxiv.org/pdf/2503.05613v3">arXiv:2503.05613</a></p><p><strong>2. &#8220;Mapping the Mind of a Large Language Model&#8221;</strong> &#8212; Anthropic (2024) | <em>Anthropic Research</em></p><p>Not a survey, but the landmark paper showing SAEs at scale on Claude 3 Sonnet. Millions of interpretable features, feature steering, and the famous &#8220;Golden Gate Bridge&#8221; feature. (This is where &#8220;understanding&#8221; LLMs actually starts.) 
<a href="https://www.anthropic.com/research/mapping-mind-language-model">anthropic.com/research/mapping-mind-language-model</a></p><p><strong>3. &#8220;Persona Vectors: Monitoring and Controlling Character Traits in Language Models&#8221;</strong> &#8212; Anthropic (2025) | <em>Anthropic Research</em></p><p>Shows how to identify, monitor, and steer specific character traits (like sycophancy or a tendency to hallucinate) using persona vectors, which are directions in the model&#8217;s activation space. Demonstrates practical applications of interpretability for AI safety. <a href="https://www.anthropic.com/research/persona-vectors">anthropic.com/research/persona-vectors</a></p><h3>Agent Explainability</h3><p><strong>1. &#8220;TRiSM for Agentic AI: Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems&#8221;</strong> &#8212; Raza et al. (2025) | <em>arXiv preprint</em></p><p>Adapts the Trust, Risk, and Security Management framework for agentic AI. Includes explainability as a key pillar alongside security and privacy. Proposes novel metrics for agent collaboration quality. 
<a href="https://arxiv.org/pdf/2506.04133v4">arXiv:2506.04133</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Aok5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c980e81-8a96-4c66-841d-558d52a0048d_856x800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Aok5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c980e81-8a96-4c66-841d-558d52a0048d_856x800.png 424w, https://substackcdn.com/image/fetch/$s_!Aok5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c980e81-8a96-4c66-841d-558d52a0048d_856x800.png 848w, https://substackcdn.com/image/fetch/$s_!Aok5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c980e81-8a96-4c66-841d-558d52a0048d_856x800.png 1272w, https://substackcdn.com/image/fetch/$s_!Aok5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c980e81-8a96-4c66-841d-558d52a0048d_856x800.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Aok5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c980e81-8a96-4c66-841d-558d52a0048d_856x800.png" width="856" height="800" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7c980e81-8a96-4c66-841d-558d52a0048d_856x800.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:856,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Aok5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c980e81-8a96-4c66-841d-558d52a0048d_856x800.png 424w, https://substackcdn.com/image/fetch/$s_!Aok5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c980e81-8a96-4c66-841d-558d52a0048d_856x800.png 848w, https://substackcdn.com/image/fetch/$s_!Aok5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c980e81-8a96-4c66-841d-558d52a0048d_856x800.png 1272w, https://substackcdn.com/image/fetch/$s_!Aok5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c980e81-8a96-4c66-841d-558d52a0048d_856x800.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p><strong>2. &#8220;Integrating Counterfactual Simulations with Language Models for Explaining Multi-Agent Behaviour&#8221;</strong> &#8212; Gyevn&#225;r et al. (2025) | <em>arXiv preprint</em></p><p>The AXIS framework: LLMs interrogating simulators with &#8220;what-if&#8221; prompts to explain agent behavior. Shows 23% improvement in goal prediction accuracy. 
<a href="https://arxiv.org/pdf/2505.17801v2">arXiv:2505.17801</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gfgT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58eb7e47-7df7-4cbf-871a-1ab7e8de392e_1041x564.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gfgT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58eb7e47-7df7-4cbf-871a-1ab7e8de392e_1041x564.png 424w, https://substackcdn.com/image/fetch/$s_!gfgT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58eb7e47-7df7-4cbf-871a-1ab7e8de392e_1041x564.png 848w, https://substackcdn.com/image/fetch/$s_!gfgT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58eb7e47-7df7-4cbf-871a-1ab7e8de392e_1041x564.png 1272w, https://substackcdn.com/image/fetch/$s_!gfgT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58eb7e47-7df7-4cbf-871a-1ab7e8de392e_1041x564.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gfgT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58eb7e47-7df7-4cbf-871a-1ab7e8de392e_1041x564.png" width="1041" height="564" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/58eb7e47-7df7-4cbf-871a-1ab7e8de392e_1041x564.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:564,&quot;width&quot;:1041,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gfgT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58eb7e47-7df7-4cbf-871a-1ab7e8de392e_1041x564.png 424w, https://substackcdn.com/image/fetch/$s_!gfgT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58eb7e47-7df7-4cbf-871a-1ab7e8de392e_1041x564.png 848w, https://substackcdn.com/image/fetch/$s_!gfgT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58eb7e47-7df7-4cbf-871a-1ab7e8de392e_1041x564.png 1272w, https://substackcdn.com/image/fetch/$s_!gfgT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58eb7e47-7df7-4cbf-871a-1ab7e8de392e_1041x564.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><div><hr></div><h2><strong>Standards &amp; Resources</strong></h2><p><strong>1. NIST AI Risk Management Framework (AI RMF) - Explainability &amp; Transparency</strong></p><p><a href="https://www.nist.gov/itl/ai-risk-management-framework">nist.gov/ai-rmf</a></p><p><strong>2. EU AI Act - Article 13: Transparency Obligation</strong></p><p><a href="https://artificialintelligenceact.eu/article/13/">artificialintelligenceact.eu/article/13</a></p><p><strong>3. Resources</strong></p><p>This GitHub repository is your gateway to the full explainability landscape.</p><p><a href="https://github.com/jphall663/awesome-machine-learning-interpretability">github.com/jphall663/awesome-machine-learning-interpretability</a></p><p>There are quite a few explainability/interpretability libraries around (see the repo above), but Agus Sudjianto&#8217;s Modeva is a good place to start. 
The documentation is a good read: <a href="https://modeva.ai/_build/html/index.html">https://modeva.ai/_build/html/index.html</a></p><p>&#8212;</p><p><strong>&#8220;Can you explain why the model did that?&#8221;</strong></p><p>It really depends on what you mean by &#8220;explain.&#8221;</p><p>Any must-reads in this area that you would recommend?</p>]]></content:encoded></item><item><title><![CDATA[Google's MIRAS and TITANS]]></title><description><![CDATA[The next step forward?]]></description><link>https://www.simplyboring.ai/p/googles-miras-and-titans</link><guid isPermaLink="false">https://www.simplyboring.ai/p/googles-miras-and-titans</guid><dc:creator><![CDATA[Gary Ang (Ming)]]></dc:creator><pubDate>Mon, 08 Dec 2025 00:01:32 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!gdB_!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92bd442d-b814-43fe-84a3-fb9363c431ad_500x326.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>After eating my words, I now have to reframe something that I have taken for granted.</p><p>My research has been about designing deep learning models for networks and multimodal data that evolve over time. I have used RNNs, transformers, and hybrids of the two for such sequential data.</p><p>These two papers from Google (MIRAS and TITANS) have made me rethink how I view such models.</p><p>The MIRAS paper is a conceptual framework. 
The TITANS paper is an instantiation of the framework that has the potential to shift the field.</p><p>Both are related to a paper on &#8216;Nested Learning&#8217; that I posted about a while back (the one that led me to eat my words, see <a href="https://substack.com/@quaintitative/note/c-180836650?utm_source=notes-share-action&amp;r=5kml33">link</a>).</p><p>MIRAS frames AI models for sequential data as being about just 4 questions.</p><p>1&#65039;&#8419; Memory - How to remember?</p><p>2&#65039;&#8419; Attention - What to pay attention to?</p><p>3&#65039;&#8419; Retention - What to forget?</p><p>4&#65039;&#8419; Learning - How to update what you know?</p><p>It highlights that even models prior to transformers (such as RNNs) addressed these 4 questions (e.g., knowing what to pay attention to, even if they did not use the attention mechanism in transformers).</p><p>The TITANS paper proposes concrete approaches for each of these questions.</p><p>But its most interesting contribution is the element of surprise(!)</p><p>Inspired by human memory (surprising events are more memorable), TITANS measures surprise by how much a new input would change its memory (the gradient of the memory&#8217;s loss on that input), and uses momentum so that recent surprise carries forward. 
The basic idea is that more surprising information is more important, and so TITANS prioritizes such surprising information when learning.</p><p>TITANS is also interesting in that it uses a neural network as dynamic memory that can still learn after training, in real time.</p><p>Links to papers below. Worth a read if you work with sequences, time-series, text, videos &#8230;</p><p>Google article - <a href="https://research.google/blog/titans-miras-helping-ai-have-long-term-memory/">https://research.google/blog/titans-miras-helping-ai-have-long-term-memory/</a></p><p>TITANS paper - <a href="https://arxiv.org/abs/2501.00663">https://arxiv.org/abs/2501.00663</a></p><p>MIRAS paper - <a href="https://arxiv.org/pdf/2504.13173">https://arxiv.org/pdf/2504.13173</a></p><p>#AI #Attention #Google #DeepLearning</p>]]></content:encoded></item><item><title><![CDATA[There’s reasoning. And then there’s reasoning.]]></title><description><![CDATA[Reasoning in LLMs. Reasoning in AI agents. 
Are they the same?]]></description><link>https://www.simplyboring.ai/p/theres-reasoning-and-then-theres</link><guid isPermaLink="false">https://www.simplyboring.ai/p/theres-reasoning-and-then-theres</guid><dc:creator><![CDATA[Gary Ang (Ming)]]></dc:creator><pubDate>Fri, 05 Dec 2025 00:46:03 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Xtb8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3b0503d-b099-4bf8-b9a8-1cb5481267cb_1511x1511.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I was in a meeting recently where I realized we were all using the word &#8220;reasoning&#8221; to mean completely different things.</p><p>Reasoning in LLMs. Reasoning in AI agents. Are they the same?</p><p>When someone says &#8220;reasoning&#8221;, I now realize they could mean the reasoning we get from chain-of-thought (CoT) prompting with any LLM. Another person could be referring to LLMs with reasoning (or thinking) built in, like OpenAI&#8217;s o3-mini model. 
Note that such reasoning by LLMs can be hallucinated (see this paper <a href="https://arxiv.org/abs/2509.07339">https://arxiv.org/abs/2509.07339</a>).</p><p>And someone else could be referring to the action traces in an AI agent.</p><p>So I&#8217;ve put together a simple notebook and video to explain 4 approaches and show how they differ.</p><p>&#8594; Any model with chain-of-thought prompting (gpt-4o-mini + &#8220;think step by step&#8221;)</p><p>&#8594; A reasoning model (o3-mini with built-in extended thinking)</p><p>&#8594; A ReAct (Reason &amp; Act) agent (gpt-4o-mini + tools)</p><p>&#8594; A Code Agent (gpt-4o-mini that writes and runs Python + tools)</p><p>Same question. Very different reasoning (or action) traces. Guess which approach led to very confident reasoning steps but a hallucinated answer? &#129300;</p><p>In this short 2-minute video, I walk through the code for each approach and explain how they differ.</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;3b1b4c12-1e80-4973-8435-7c9075d32e93&quot;,&quot;duration&quot;:null}"></div><p>Three things you&#8217;ll take away from this:</p><p>1&#65039;&#8419; Not all reasoning is the same. Chain-of-thought prompting, reasoning models, and agent action traces are very different.</p><p>2&#65039;&#8419; Tools can compensate for cheaper models. You don&#8217;t always need to pay for an expensive reasoning model.</p><p>3&#65039;&#8419; For analytical questions, code beats thinking. Verifiable computation is more reliable than mental math, no matter how smart the model.</p><p>If you&#8217;re building something with AI and trying to decide between prompting strategies, smarter models, or agentic patterns, this might help you see the trade-offs more clearly.</p><p>Link to full code in a notebook below. Open it in Google Colab. Just get your own OpenAI API key. 
</p><p><a href="https://drive.google.com/file/d/1QkcIBsVX4aTcpftvqKfsrkS3MMdJhjUz/view">https://drive.google.com/file/d/1QkcIBsVX4aTcpftvqKfsrkS3MMdJhjUz/view</a></p><p>#AIAgents #AITutorial #AIRiskManagement</p><p>I&#8217;m still learning how best to make such videos, so pardon the rough edges.</p>]]></content:encoded></item><item><title><![CDATA[Simple Reading List on Evaluation and Testing]]></title><description><![CDATA[From accuracy to ... the unknown edge]]></description><link>https://www.simplyboring.ai/p/simple-reading-list-on-evaluation</link><guid isPermaLink="false">https://www.simplyboring.ai/p/simple-reading-list-on-evaluation</guid><dc:creator><![CDATA[Gary Ang (Ming)]]></dc:creator><pubDate>Wed, 03 Dec 2025 00:37:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!xTfX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46bedc28-6c0c-4b41-9f0e-4e8bffb594be_474x672.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Housekeeping my Zotero library, I realized how much the evaluation landscape has shifted. 
Evaluating AI used to be straightforward: pick baselines and state-of-the-art models, choose metrics, run tests on unseen data. Highest score wins. That&#8217;s no longer the case.</p><p>Here is a simple set of papers and articles that may help make sense of these shifts. There are probably hundreds (or even thousands) of relevant works, but I just picked a representative few from my Zotero library for the following phases.</p><ul><li><p>Phase 1: The Foundations (When Life Was Simpler)</p></li><li><p>Phase 2: Shifting of the Boundary (How LLMs made things harder)</p></li><li><p>Phase 3: The Reality (When the Rubber Meets the Road)</p></li><li><p>Phase 4: The Experimental (Evaluating AI Agents &amp; AI as a Judge)</p></li></ul><p>Note: I have provided open-access links from arXiv wherever possible, since some of the published versions are behind a paywall.</p><h2><strong>Phase 1: The Foundations (When Life Was Simpler)</strong></h2><p><strong>1. &#8220;Machine Learning Testing: Survey, Landscapes and Horizons&#8221;</strong> &#8212; Zhang et al. (2020) | <em>IEEE Transactions on Software Engineering</em></p><p>A comprehensive map of the territory before generative AI made things a lot more complex. 
It breaks testing down into properties (correctness, robustness, fairness), components (data, learning program, framework) and workflows. I like their categorization of functional properties (correctness, model relevance) and non-functional properties (robustness, fairness, privacy, interpretability, efficiency). Even back then, it highlighted that testing techniques for unsupervised, transfer and reinforcement learning needed more work (<a href="https://arxiv.org/pdf/1906.10742v2">https://arxiv.org/pdf/1906.10742v2</a>)</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xTfX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46bedc28-6c0c-4b41-9f0e-4e8bffb594be_474x672.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xTfX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46bedc28-6c0c-4b41-9f0e-4e8bffb594be_474x672.png 424w, https://substackcdn.com/image/fetch/$s_!xTfX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46bedc28-6c0c-4b41-9f0e-4e8bffb594be_474x672.png 848w, https://substackcdn.com/image/fetch/$s_!xTfX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46bedc28-6c0c-4b41-9f0e-4e8bffb594be_474x672.png 1272w, https://substackcdn.com/image/fetch/$s_!xTfX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46bedc28-6c0c-4b41-9f0e-4e8bffb594be_474x672.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!xTfX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46bedc28-6c0c-4b41-9f0e-4e8bffb594be_474x672.png" width="474" height="672" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/46bedc28-6c0c-4b41-9f0e-4e8bffb594be_474x672.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:672,&quot;width&quot;:474,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xTfX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46bedc28-6c0c-4b41-9f0e-4e8bffb594be_474x672.png 424w, https://substackcdn.com/image/fetch/$s_!xTfX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46bedc28-6c0c-4b41-9f0e-4e8bffb594be_474x672.png 848w, https://substackcdn.com/image/fetch/$s_!xTfX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46bedc28-6c0c-4b41-9f0e-4e8bffb594be_474x672.png 1272w, https://substackcdn.com/image/fetch/$s_!xTfX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46bedc28-6c0c-4b41-9f0e-4e8bffb594be_474x672.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft 
icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>2. &#8220;Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning&#8221;</strong> &#8212; Raschka (2018) | <em>arXiv preprint</em></p><p>The classic. A must-read by Sebastian Raschka, with many interesting takes: why simple accuracy metrics lie, the necessity of confidence intervals. Separates out performance estimation, model selection (same model, different hyperparameters), and algorithm selection (different ML algorithms). Goes into detail on bias, variance, different sampling techniques for model evals, as well as statistical tests for robust comparisons. 
(<a href="https://arxiv.org/pdf/1811.12808v3">https://arxiv.org/pdf/1811.12808v3</a>)</p><h2><strong>Phase 2: Shifting of the Boundary (How LLMs made things harder)</strong></h2><p><strong>1. &#8220;A Survey on Evaluation of Large Language Models&#8221;</strong> &#8212; Chang et al. (2024) | <em>ACM Transactions on Intelligent Systems and Technology (TIST)</em></p><p>Comprehensive survey covering LLM evaluation across multiple domains and tasks: what to evaluate, where to evaluate, and how to evaluate. The models covered may seem dated, but the concepts are still relevant. Covers tasks such as natural language understanding &amp; generation, reasoning, robustness, bias and toxicity, and trustworthiness. Outlines general vs. specific benchmarks, and automatic, human and crowd-sourced evaluations. Highlights gaps in bias, reasoning, memory, dynamic and robustness (which are still relevant today). Covers opportunities for AGI benchmarks and for behavioral, robustness, dynamic and trustworthiness evals. (<a href="https://arxiv.org/pdf/2307.03109v9">https://arxiv.org/pdf/2307.03109v9</a>)</p><p><strong>2. &#8220;On Robustness and Reliability of Benchmark-Based Evaluation of LLMs&#8221;</strong> &#8212; Lunardi et al. (2025) | <em>arXiv preprint</em></p><p>A reminder that real-world applications involve significantly more variability than standardized benchmarks can capture, and that we should be skeptical about the reliability of benchmark-based evaluations. Simple paraphrasing alone can lead to lower consistency and greater confusion. Data contamination makes benchmarks less reliable. Highlights the (semantic) brittleness of LLMs and the need to go beyond static benchmarks. 
(<a href="https://arxiv.org/pdf/2509.04013v1">https://arxiv.org/pdf/2509.04013v1</a>)</p><h2><strong>Phase 3: The Reality (When the Rubber Meets the Road)</strong></h2><p>Academics optimize for benchmarks; practitioners optimize for &#8220;doesn&#8217;t get us sued.&#8221; These guides bridge that gap.</p><p><strong>1. &#8220;Your AI Product Needs Evals&#8221;</strong> &#8212; Hamel Husain | <em>Blog post (hamel.dev)</em></p><p>Read this if you want a guide on moving from &#8220;vibe checks&#8221; to systematic evaluation. Proposes a 3-level hierarchy: Unit Tests (scoping tests, assertions, creating test cases) -&gt; Human/Model Evals (via logs, human reviews, automated evals) -&gt; A/B Testing (to see how it works in the real world). Highlights the invaluable role of evals in iterating quickly (which I fully agree with), and in curating good testing data (which could be used for fine-tuning later). Key takeaway: start simple and break things down. (<a href="https://hamel.dev/blog/posts/evals/">https://hamel.dev/blog/posts/evals/</a>)</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7wA4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff139d4d-334e-4854-a7ee-232cb333e3d4_797x424.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7wA4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff139d4d-334e-4854-a7ee-232cb333e3d4_797x424.png 424w, https://substackcdn.com/image/fetch/$s_!7wA4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff139d4d-334e-4854-a7ee-232cb333e3d4_797x424.png 848w, 
https://substackcdn.com/image/fetch/$s_!7wA4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff139d4d-334e-4854-a7ee-232cb333e3d4_797x424.png 1272w, https://substackcdn.com/image/fetch/$s_!7wA4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff139d4d-334e-4854-a7ee-232cb333e3d4_797x424.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7wA4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff139d4d-334e-4854-a7ee-232cb333e3d4_797x424.png" width="797" height="424" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ff139d4d-334e-4854-a7ee-232cb333e3d4_797x424.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:424,&quot;width&quot;:797,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7wA4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff139d4d-334e-4854-a7ee-232cb333e3d4_797x424.png 424w, https://substackcdn.com/image/fetch/$s_!7wA4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff139d4d-334e-4854-a7ee-232cb333e3d4_797x424.png 848w, 
https://substackcdn.com/image/fetch/$s_!7wA4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff139d4d-334e-4854-a7ee-232cb333e3d4_797x424.png 1272w, https://substackcdn.com/image/fetch/$s_!7wA4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff139d4d-334e-4854-a7ee-232cb333e3d4_797x424.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>2. 
&#8220;Task-Specific LLM Evals that Do &amp; Don&#8217;t Work&#8221;</strong> &#8212; Eugene Yan | <em>Blog post (eugeneyan.com)</em></p><p>A critique of generic metrics. Shows why BLEU/ROUGE are not as useful for reasoning and provides specific alternatives for a range of tasks. Key takeaways: go beyond accuracy, use probability-based metrics, use models to compare source and generated text, use learned metrics, check for reproduction of training data to address copyright risks, and use classifiers for toxicity detection. (<a href="https://eugeneyan.com/writing/evals/">https://eugeneyan.com/writing/evals/</a>)</p><h2><strong>Phase 4: The Experimental (Evaluating AI Agents &amp; AI as a Judge)</strong></h2><p><strong>1. &#8220;Evaluation and Benchmarking of LLM Agents: A Survey&#8221;</strong> &#8212; Mohammadi et al. (2025) | <em>ACM SIGKDD 2025</em></p><p>A taxonomy for agent evaluation, distinguishing between objectives (behavior, capabilities, safety) and process (static vs. dynamic). Covers the &#8220;What&#8221; (assessing agent behavior, capabilities, reliability, safety &amp; alignment) and the &#8220;How&#8221; (static testing with datasets, dynamic testing with simulations, benchmarks, code-based, LLM-as-judge and human-in-the-loop evals). Also covers enterprise hurdles: role-based access controls, reliability guarantees, long-horizon performance, policy compliance. The taxonomy presented in Figure 1 is excellent. 
(<a href="https://arxiv.org/pdf/2507.21504">https://arxiv.org/pdf/2507.21504</a>)</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Z54-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89b2a4cd-f140-43eb-a419-6ae9860b4c5b_1067x636.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Z54-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89b2a4cd-f140-43eb-a419-6ae9860b4c5b_1067x636.png 424w, https://substackcdn.com/image/fetch/$s_!Z54-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89b2a4cd-f140-43eb-a419-6ae9860b4c5b_1067x636.png 848w, https://substackcdn.com/image/fetch/$s_!Z54-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89b2a4cd-f140-43eb-a419-6ae9860b4c5b_1067x636.png 1272w, https://substackcdn.com/image/fetch/$s_!Z54-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89b2a4cd-f140-43eb-a419-6ae9860b4c5b_1067x636.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Z54-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89b2a4cd-f140-43eb-a419-6ae9860b4c5b_1067x636.png" width="1067" height="636" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/89b2a4cd-f140-43eb-a419-6ae9860b4c5b_1067x636.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:636,&quot;width&quot;:1067,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Z54-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89b2a4cd-f140-43eb-a419-6ae9860b4c5b_1067x636.png 424w, https://substackcdn.com/image/fetch/$s_!Z54-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89b2a4cd-f140-43eb-a419-6ae9860b4c5b_1067x636.png 848w, https://substackcdn.com/image/fetch/$s_!Z54-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89b2a4cd-f140-43eb-a419-6ae9860b4c5b_1067x636.png 1272w, https://substackcdn.com/image/fetch/$s_!Z54-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F89b2a4cd-f140-43eb-a419-6ae9860b4c5b_1067x636.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>2. &#8220;Evolutionary Perspectives on the Evaluation of LLM-Based AI Agents: A Comprehensive Survey&#8221;</strong> &#8212; Zhu et al. (2025) | <em>arXiv preprint</em></p><p>Frames evaluation as an evolutionary process. Identifies five key aspects of this evolution: complexity of environments, number of sources of instructions, dynamism of feedback, degree of multimodal perception, and state of capabilities. Also looks at evaluation from the perspective of environments (coding, web, OS, mobile, scientific, gaming) and capabilities (planning, self-reflection, interaction, memory). 
(<a href="https://arxiv.org/pdf/2506.11102">https://arxiv.org/pdf/2506.11102</a>)</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5E_7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7078fbb1-20b4-4032-8e71-05ec965dc06d_607x527.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5E_7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7078fbb1-20b4-4032-8e71-05ec965dc06d_607x527.png 424w, https://substackcdn.com/image/fetch/$s_!5E_7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7078fbb1-20b4-4032-8e71-05ec965dc06d_607x527.png 848w, https://substackcdn.com/image/fetch/$s_!5E_7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7078fbb1-20b4-4032-8e71-05ec965dc06d_607x527.png 1272w, https://substackcdn.com/image/fetch/$s_!5E_7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7078fbb1-20b4-4032-8e71-05ec965dc06d_607x527.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5E_7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7078fbb1-20b4-4032-8e71-05ec965dc06d_607x527.png" width="607" height="527" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7078fbb1-20b4-4032-8e71-05ec965dc06d_607x527.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:527,&quot;width&quot;:607,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5E_7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7078fbb1-20b4-4032-8e71-05ec965dc06d_607x527.png 424w, https://substackcdn.com/image/fetch/$s_!5E_7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7078fbb1-20b4-4032-8e71-05ec965dc06d_607x527.png 848w, https://substackcdn.com/image/fetch/$s_!5E_7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7078fbb1-20b4-4032-8e71-05ec965dc06d_607x527.png 1272w, https://substackcdn.com/image/fetch/$s_!5E_7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7078fbb1-20b4-4032-8e71-05ec965dc06d_607x527.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>3. &#8220;LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods&#8221;</strong> &#8212; Li et al. (2024) | <em>arXiv preprint</em></p><p>A survey with interesting points across five areas: Functionality, Methodology, Application, Meta-Evaluation, and Limitations. Covers the role of LLM judges in response &amp; model evaluation, reward modelling, verification, annotation, and synthetic data generation. Goes into the use of LLM judges in single- and multi-LLM systems, e.g., using prompt engineering to guide the judge or fine-tuning judges in single-LLM systems, and using cooperation, communication or aggregation of LLM judges in multi-LLM systems. A reminder that LLM judges have the same issues as other LLMs: bias, vulnerability to adversarial attacks, recency bias, hallucinations. (<a href="https://arxiv.org/pdf/2412.05579v2">https://arxiv.org/pdf/2412.05579v2</a>)</p><p><strong>4. &#8220;Agent-as-a-Judge: Evaluate Agents with Agents&#8221;</strong> &#8212; Zhuge et al. 
(2024) | <em>ICML 2025</em></p><p>Argues that static LLM judges fail for agents because they miss the intermediate steps. Proposes an Agent-as-a-Judge framework that extends LLM judges with additional components: ask, graph, locate, read, search, retrieve, memory, and planning. Finds that it can outperform humans and LLMs as judges. (<a href="https://arxiv.org/pdf/2410.10934v2">https://arxiv.org/pdf/2410.10934v2</a>)</p><h2><strong>The Standards</strong></h2><p><strong>1. NIST AI Measurement and Evaluation Projects</strong> | <em>Government resource</em></p><p>A selection of NIST projects related to AI measurement and evaluation. (<a href="https://www.nist.gov/programs-projects/ai-measurement-and-evaluation/nist-ai-measurement-and-evaluation-projects">https://www.nist.gov/programs-projects/ai-measurement-and-evaluation/nist-ai-measurement-and-evaluation-projects</a>)</p><p><strong>2. Singapore&#8217;s AI Verify Global AI Assurance Pilot</strong> | <em>Government resource</em></p><p>A selection of case studies on organizations partnering with AI testing specialists. (<a href="https://assurance.aiverifyfoundation.sg/">https://assurance.aiverifyfoundation.sg/</a>)</p><p><strong>3. ISO/IEC TR 29119-11:2020 (Testing AI Systems)</strong> | <em>International standard</em></p><p>Part 11: Guidelines on the testing of AI-based systems. (<a href="https://www.iso.org/obp/ui/en/#iso:std:iso-iec:tr:29119:-11:ed-1:v1:en">https://www.iso.org/obp/ui/en/#iso:std:iso-iec:tr:29119:-11:ed-1:v1:en</a>)</p><div><hr></div><p><strong>&#8220;How do you know it works?&#8221;</strong></p><p>It&#8217;s no longer a simple question with a simple answer. 
More and more, we need a portfolio of tools and techniques for evaluation and testing.</p><p>Any must-reads in this area that you would recommend?</p><p>#AIRiskManagement #AIEvaluation #AITesting #AIReadingList</p>]]></content:encoded></item></channel></rss>