FEED Validator

for Atom and RSS and KML

Congratulations!

This is a valid RSS feed.

Recommendations

This feed is valid, but interoperability with the widest range of feed readers could be improved by implementing the following recommendations.

line 1, column 38: Use of unknown namespace: https://www.oreilly.com/rss/custom [help]

<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
                                      ^

line 508, column 0: content:encoded should not contain fetchpriority attribute [help]
line 508, column 0: content:encoded should not contain decoding attribute (11 occurrences) [help]
line 508, column 0: content:encoded should not contain sizes attribute (7 occurrences) [help]
line 632, column 0: content:encoded should not contain loading attribute (4 occurrences) [help]
line 743, column 0: style attribute contains potentially dangerous content: max-height (4 occurrences) [help]
line 1206, column 0: content:encoded should not contain iframe tag (2 occurrences) [help]
line 2396, column 0: content:encoded should not contain controls attribute (2 occurrences) [help]

line 2504, column 0: item contains more than one enclosure [help]

<enclosure url="https://descriptusercontent.com/published/77a4c971-bd21-4134 ...

Source: https://www.oreilly.com/radar/feed/index.xml

<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:media="http://search.yahoo.com/mrss/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
xmlns:custom="https://www.oreilly.com/rss/custom"
>
<channel>
<title>Radar</title>
<atom:link href="https://www.oreilly.com/radar/feed/" rel="self" type="application/rss+xml" />
<link>https://www.oreilly.com/radar</link>
<description>Now, next, and beyond: Tracking need-to-know trends at the intersection of business and technology</description>
<lastBuildDate>Fri, 11 Jul 2025 15:28:26 +0000</lastBuildDate>
<language>en-US</language>
<sy:updatePeriod>
hourly </sy:updatePeriod>
<sy:updateFrequency>
1 </sy:updateFrequency>
<generator>https://wordpress.org/?v=6.8.1</generator>
<image>
<url>https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/04/cropped-favicon_512x512-32x32.png</url>
<title>Radar</title>
<link>https://www.oreilly.com/radar</link>
<width>32</width>
<height>32</height>
</image>
<item>
<title>APIs and Agents: What Developers Need to Know</title>
<link>https://www.oreilly.com/radar/apis-and-agents-what-developers-need-to-know/</link>
<comments>https://www.oreilly.com/radar/apis-and-agents-what-developers-need-to-know/#respond</comments>
<pubDate>Fri, 11 Jul 2025 15:28:17 +0000</pubDate>
<dc:creator><![CDATA[Louise Corrigan]]></dc:creator>
<category><![CDATA[AI & ML]]></category>
<category><![CDATA[Software Engineering]]></category>
<category><![CDATA[Commentary]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?p=17003</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2021/03/AdobeStock_123489033-scaled.jpeg"
medium="image"
type="image/jpeg"
/>
<custom:subtitle><![CDATA[APIs Aren’t Going Away, but They Will Need to Evolve.]]></custom:subtitle>
<description><![CDATA[AI agents are reshaping how software is written, scaled, and experienced, and many expect the technology to unlock the gains AI firms have long promised. While most companies today remain in the “testing” phase, as agents make their way throughout the organization, workers will need to figure out how to integrate them into their workflows. […]]]></description>
<content:encoded><![CDATA[
AI agents are <a href="https://www.wsj.com/articles/how-are-companies-using-ai-agents-heres-a-look-at-five-early-users-of-the-bots-26f87845" target="_blank" rel="noreferrer noopener">reshaping how software is written, scaled, and experienced</a>, and <a href="https://pitchbook.com/news/articles/y-combinator-is-going-all-in-on-ai-agents-making-up-nearly-50-of-latest-batch" target="_blank" rel="noreferrer noopener">many expect</a> the technology to <a href="https://www.mckinsey.com/capabilities/quantumblack/our-insights/seizing-the-agentic-ai-advantage" target="_blank" rel="noreferrer noopener">unlock the gains</a> AI firms have long promised. While most companies today <a href="https://www.cfodive.com/news/companies-boost-ai-agent-piloting/745565/" target="_blank" rel="noreferrer noopener">remain in the “testing” phase</a>, as agents make their way throughout the organization, workers will need to figure out how to <a href="https://learning.oreilly.com/videos/ai-superstream-ai/0642572015960/0642572015960-video389203/" target="_blank" rel="noreferrer noopener">integrate them into their workflows</a>. That’s particularly true of developers, who can use agents to boost efficiency and in many cases will be also responsible for building, maintaining, and integrating them.
Agents are autonomous programs relying on underlying AI models like language models or planning systems that are capable of executing tasks without constant human orchestration. (As Chip Huyen has pointed out, many consider them “<a href="https://learning.oreilly.com/library/view/ai-engineering/9781098166298/ch06.html#:-:text=Intelligent%20agents%20are,of%20rational%20agents.%E2%80%9D" target="_blank" rel="noreferrer noopener">the ultimate goal of AI</a>.”) It might sound obvious, but what distinguishes this as a novel approach is “agency”: operating independently according to preestablished goals, memory, and tools.
Agents can be simple, making a single API call based on user input, or complex, orchestrating multiple services, collaborating with other agents, and learning over time. But they’ll only ever be as useful as the data and systems they connect to, and that means that APIs will continue to play an outsize role. As the bridge between agents and the digital world, APIs make it possible for AI agents to access data, perform actions, and integrate with external systems to achieve their goals. But what does it mean to build for a world where agents, fueled by APIs, act on their own?
APIs aren’t a new technology; the concept <a href="https://en.wikipedia.org/wiki/API#:~:text=The%20idea%20of%20the%20API%20is,the%20programmer%20needs.%5B10%5D" target="_blank" rel="noreferrer noopener">dates back to the 1940s</a>. And AI hasn’t changed the objective of a well-thought-out API: <a href="https://learning.oreilly.com/library/view/continuous-api-management/9781098103514/ch03.html#:-:text=To%20talk%20about,That%E2%80%99s%20open%20APIs." target="_blank" rel="noreferrer noopener">easily delivering valuable functionality to third parties</a>. However, traditional APIs have always been designed with human developers in mind. Agent-compatible APIs don’t have the same requirements. For APIs to effectively serve agents, they need to be machine-consumable, self-describing, and semantically rich. This requires developers to prioritize clear functionality, descriptive metadata, and real-time error handling, all while maintaining accessibility for human users. There are also new protocols to consider, including the <a href="https://www.oreilly.com/radar/mcp-what-it-is-and-why-it-matters-part-1/" target="_blank" rel="noreferrer noopener">Model Context Protocol (MCP)</a> and the <a href="https://www.oreilly.com/radar/designing-collaborative-multi-agent-systems-with-the-a2a-protocol/" target="_blank" rel="noreferrer noopener">Agent2Agent Protocol (A2A)</a>, which can be used to communicate with external data sources, tools, and other agents.
APIs aren’t going away any time soon, but developers intent on optimizing their systems and software should learn the new protocols that will help them connect agents with their systems and data. They also must consider the technical environment in which their APIs now circulate and design for both humans and agents. There’s no time like the present to get started.
<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
Want to learn more? Join host Mike Amundsen and an esteemed lineup of API experts on July 17 for the O’Reilly API Superstream, all about creating APIs optimized for AI agents. Over four packed hours, you’ll explore issues with current APIs; how to integrate your APIs with AI and MCP; enterprise-grade agentic ecosystems; the synergy between APIs, LLMs, and XAI; Azure API Management; and much more. It’s free for O’Reilly members. <a href="https://learning.oreilly.com/live-events/api-superstream-apis-and-agents/0642572173432/">Register here</a>. Not a member? <a href="https://www.oreilly.com/start-trial/">Sign up for a free 10-day trial</a> to attend—and check out all the other great resources on O’Reilly.
</blockquote>

]]></content:encoded>
<wfw:commentRss>https://www.oreilly.com/radar/apis-and-agents-what-developers-need-to-know/feed/</wfw:commentRss>
<slash:comments>0</slash:comments>
</item>
<item>
<title>Generative AI in the Real World: Raiza Martin on Building AI Applications for Audio</title>
<link>https://www.oreilly.com/radar/podcast/generative-ai-in-the-real-world-raiza-martin-on-building-ai-applications-for-audio/</link>
<comments>https://www.oreilly.com/radar/podcast/generative-ai-in-the-real-world-raiza-martin-on-building-ai-applications-for-audio/#respond</comments>
<pubDate>Thu, 10 Jul 2025 19:33:18 +0000</pubDate>
<dc:creator><![CDATA[Ben Lorica and Raiza Martin]]></dc:creator>
<category><![CDATA[AI & ML]]></category>
<category><![CDATA[Generative AI in the Real World]]></category>
<category><![CDATA[Podcast]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?post_type=podcast&p=17005</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2024/01/Podcast_Cover_GenAI_in_the_Real_World.png"
medium="image"
type="image/png"
/>
<description><![CDATA[Audio is being added to AI everywhere: both in multimodal models that can understand and generate audio and in applications that use audio for input. Now that we can work with spoken language, what does that mean for the applications that we can develop? How do we think about audio interfaces—how will people use them, […]]]></description>
<content:encoded><![CDATA[
Audio is being added to AI everywhere: both in multimodal models that can understand and generate audio and in applications that use audio for input. Now that we can work with spoken language, what does that mean for the applications that we can develop? How do we think about audio interfaces—how will people use them, and what will they want to do? Raiza Martin, who worked on Google’s groundbreaking NotebookLM, joins Ben Lorica to discuss how she thinks about audio and what you can build with it. About the Generative AI in the Real World podcast: In 2023, ChatGPT put AI on everyone’s agenda. In 2025, the challenge will be turning those agendas into reality. In Generative AI in the Real World, Ben Lorica interviews leaders who are building with AI. Learn from their experience to help put AI to work in your enterprise. Check out <a href="https://learning.oreilly.com/playlists/42123a72-1108-40f1-91c0-adbfb9f4983b/?_gl=1*16z5k2y*_ga*MTE1NDE4NjYxMi4xNzI5NTkwODkx*_ga_092EL089CH*MTcyOTYxNDAyNC4zLjEuMTcyOTYxNDAyNi41OC4wLjA." target="_blank" rel="noreferrer noopener">other episodes</a> of this podcast on the O’Reilly learning platform.
<h2 class="wp-block-heading">Timestamps</h2>
<ul class="wp-block-list">
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=0" target="_blank" rel="noreferrer noopener">0:00</a>: Introduction to Raiza Martin, who cofounded <a href="https://www.huxe.com/" target="_blank" rel="noreferrer noopener">Huxe</a> and formerly led Google’s NotebookLM team. What made you think this was the time to trade the comforts of big tech for a garage startup?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=61" target="_blank" rel="noreferrer noopener">1:01</a>: It was a personal decision for all of us. It was a pleasure to take NotebookLM from an idea to something that resonated so widely. We realized that AI was really blowing up. We didn’t know what it would be like at a startup, but we wanted to try. Seven months down the road, we’re having a great time.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=114" target="_blank" rel="noreferrer noopener">1:54</a>: For the 1% who aren’t familiar with NotebookLM, give a short description.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=126" target="_blank" rel="noreferrer noopener">2:06</a>: It’s basically contextualized intelligence, where you give NotebookLM the sources you care about and NotebookLM stays grounded to those sources. One of our most common use cases was that students would create notebooks and upload their class materials, and it became an expert that you could talk with.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=163" target="_blank" rel="noreferrer noopener">2:43</a>: Here’s a use case for homeowners: put all your user manuals in there. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=194" target="_blank" rel="noreferrer noopener">3:14</a>: We have had a lot of people tell us that they use NotebookLM for Airbnbs. They put all the manuals and instructions in there, and users can talk to it.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=221">3:</a><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=221" target="_blank" rel="noreferrer noopener">41</a>: Why do people need a personal daily podcast?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=237" target="_blank" rel="noreferrer noopener">3:57</a>: There are a lot of different ways that I think about building new products. On one hand, there are acute pain points. But Huxe comes from a different angle: What if we could try to build very delightful things? The inputs are a little different. We tried to imagine what the average person’s daily life is like. You wake up, you check your phone, you travel to work; we thought about opportunities to make something more delightful. I think a lot about TikTok. When do I use it? When I’m standing in line. We landed on transit time or commute time. We wanted to do something novel and interesting with that space in time. So one of the first things was creating really personalized audio content. That was the provocation: What do people want to listen to? Even in this short time, we’ve learned a lot about the amount of opportunity.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=364" target="_blank" rel="noreferrer noopener">6:04</a>: Huxe is mobile first, audio first, right? Why audio?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=405" target="_blank" rel="noreferrer noopener">6:45</a>: Coming from our learnings from NotebookLM, you learn fundamentally different things when you change the modality of something. When I go on walks with ChatGPT, I just talk about my day. I noticed that was a very different interaction from when I type things out to ChatGPT. The flip side is less about interaction and more about consumption. Something about the audio format made the types of sources different as well. The sources we uploaded to NotebookLM were different as a result of wanting audio output. By focusing on audio, I think we’ll learn different use cases than the chat use cases. Voice is still largely untapped. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=504" target="_blank" rel="noreferrer noopener">8:24</a>: Even in text, people started exploring other form factors: long articles, bullet points. What kinds of things are available for voice?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=529" target="_blank" rel="noreferrer noopener">8:49</a>: I think of two formats: one passive and one interactive. With passive formats, there are a lot of different things you can create for the user. The things you end up playing with are (1) what is the content about and (2) how flexible is the content? Is it short, long, malleable to user feedback? With interactive content, maybe I’m listening to audio, but I want to interact with it. Maybe I want to join in. Maybe I want my friends to join in. Both of those contexts are new. I think this is what’s going to emerge in the next few years. I think we’ll learn that the types of things we will use audio for are fundamentally different from the things we use chat for.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=619" target="_blank" rel="noreferrer noopener">10:19</a>: What are some of the key lessons to avoid from smart speakers?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=625" target="_blank" rel="noreferrer noopener">10:25</a>: I’ve owned so many of them. And I love them. My primary use for the smart speakers is still a timer. It’s expensive and doesn’t live up to the promise. I just don’t think the technology was ready for what people really wanted to do. It’s hard to think about how that could have worked without AI. Second, one of the most difficult things about audio is that there is no UI. A smart speaker is a physical device. There’s nothing that tells you what to do. So the learning curve is steep. So now you have a user who doesn’t know what they can use the thing for. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=740" target="_blank" rel="noreferrer noopener">12:20</a>: Now it can do so much more. Even without a UI, the user can just try things. But there’s a risk in that it still requires input from the user. How do we think about a system that is so supportive that you don’t have to come up with how to make it work? That’s the challenge from the smart speaker era.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=776" target="_blank" rel="noreferrer noopener">12:56</a>: It’s interesting that you point out the UI. With a chatbot you have to type something. With a smart speaker, people started getting creeped out by surveillance. So, will Huxe surveil me?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=798" target="_blank" rel="noreferrer noopener">13:18</a>: I think there’s something simple about it, which is the wake word. Because smart speakers are triggered by wake words, they are always on. If the user says something, it’s probably picking it up, and it’s probably logged somewhere. With Huxe, we want to be really careful about where we believe consumer readiness is. You want to push a little bit but not too far. If you push too far, people get creeped out. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=872" target="_blank" rel="noreferrer noopener">14:32</a>: For Huxe, you have to turn it on to use it. It’s clunky in some ways, but we can push on that boundary and see if we can push for something that’s more ambiently on. We’re starting to see the emergence of more tools that are always on. There are tools like Granola and Cluely: They’re always on, looking at your screen, transcribing your audio. I’m curious—are we ready for technology like that? In real life, you can probably get the most utility from something that is always on. But whether consumers are ready is still TBD.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=925" target="_blank" rel="noreferrer noopener">15:25</a>: So you’re ingesting calendars, email, and other things from the users. What about privacy? What are the steps you’ve taken?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=948" target="_blank" rel="noreferrer noopener">15:48</a>: We’re very privacy focused. I think that comes from building NotebookLM. We wanted to make sure we were very respectful of user data. We didn’t train on any user data; user data stayed private. We’re taking the same approach with Huxe. We use the data you share with Huxe to improve your personal experience. There’s something interesting in creating personal recommendation models that don’t go beyond your usage of the app. It’s a little harder for us to build something good, but it respects privacy, and that’s what it takes to get people to trust.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=1028" target="_blank" rel="noreferrer noopener">17:08</a>: Huxe may notice that I have a flight tomorrow and tell me that the flight is delayed. To do so, it has had to contact an external service, which now knows about my flight.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=1046" target="_blank" rel="noreferrer noopener">17:26</a>: That’s a good point. I think about building Huxe like this: If I were in your pocket, what would I do? If I saw a calendar that said “Ben has a flight,” I can check that flight without leaking your personal information. I can just look up the flight number. There are a lot of ways you can do something that provides utility but doesn’t leak data to another service. We’re trying to understand things that are much more action oriented. We try to tell you about weather, about traffic; these are things we can do without stepping on user privacy.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=1118" target="_blank" rel="noreferrer noopener">18:38</a>: The way you described the system, there’s no social component. But you end up learning things about me. So there is the potential for building a more sophisticated filter bubble. How do you make sure that I’m ingesting things beyond my filter bubble?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=1148" target="_blank" rel="noreferrer noopener">19:08</a>: It comes down to what I believe a person should or shouldn’t be consuming. That’s always tricky. We’ve seen what these feeds can do to us. I don’t know the correct formula yet. There’s something interesting about “How do I get enough user input so I can give them a better experience?” There’s signal there. I try to think about a user’s feed from the perspective of relevance and less from an editorial perspective. I think the relevance of information is probably enough. We’ll probably test this once we start surfacing more personalized information. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=1242" target="_blank" rel="noreferrer noopener">20:42</a>: The other thing that’s really important is surfacing the correct controls: I like this; here’s why. I don’t like this; why not? Where you inject tension in the system, where you think the system should push back—that takes a little time to figure out how to do it right.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=1261" target="_blank" rel="noreferrer noopener">21:01</a>: What about the boundary between giving me content and providing companionship?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=1269" target="_blank" rel="noreferrer noopener">21:09</a>: How do we know the difference between an assistant and a companion? Fundamentally the capabilities are the same. I don’t know if the question matters. The user will use it how the user intends to use it. That question matters most in the packaging and the marketing. I talk to people who talk about ChatGPT as their best friend. I talk to others who talk about it as an employee. On a capabilities level, they’re probably the same thing. On a marketing level, they’re different.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=1342" target="_blank" rel="noreferrer noopener">22:22</a>: For Huxe, the way I think about this is which set of use cases you prioritize. Beyond a simple conversation, the capabilities will probably start diverging. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=1367" target="_blank" rel="noreferrer noopener">22:47</a>: You’re now part of a very small startup. I assume you’re not building your own models; you’re using external models. Walk us through privacy, given that you’re using external models. As that model learns more about me, how much does that model retain over time? To be a really good companion, you can’t be clearing that cache every time I log out.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=1401" target="_blank" rel="noreferrer noopener">23:21</a>: That question pertains to where we store data and how it’s passed off. We opt for models that don’t train on the data we send them. The next layer is how we think about continuity. People expect ChatGPT to have knowledge of all the conversations you have. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=1443" target="_blank" rel="noreferrer noopener">24:03</a>: To support that you have to build a very durable context layer. But you don’t have to imagine that all of that gets passed to the model. A lot of technical limitations prevent you from doing that anyway. That context is stored at the application layer. We store it, and we try to figure out the right things to pass to the model, passing as little as possible.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=1517" target="_blank" rel="noreferrer noopener">25:17</a>: You’re from Google. I know that you measure, measure, measure. What are some of the signals you measure? </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=1540" target="_blank" rel="noreferrer noopener">25:40</a>: I think about metrics a little differently in the early stages. Metrics in the beginning are nonobvious. You’ll get a lot of trial behavior in the beginning. It’s a little harder to understand the initial user experience from the raw metrics. There are some basic metrics that I care about—the rate at which people are able to onboard. But as far as crossing the chasm (I think of product building as a series of chasms that never end), you look for people who really love it, who rave about it; you have to listen to them. And then the people who used the product and hated it. When you listen to them, you discover that they expected it to do something and it didn’t. It let them down. You have to listen to these two groups, and then you can triangulate what the product looks like to the outside world. The thing I’m trying to figure out is less “Is it a hit?” but “Is the market ready for it? Is the market ready for something this weird?” In the AI world, the reality is that you’re testing consumer readiness and need, and how they are evolving together. We did this with NotebookLM. When we showed it to students, there was zero time between when they saw it and when they understood it. That’s the first chasm. Can you find people who understand what they think it is and feel strongly about it?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=1725" target="_blank" rel="noreferrer noopener">28:45</a>: Now that you’re outside of Google, what would you want the foundation model builders to focus on? What aspects of these models would you like to see improved?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=1760" target="_blank" rel="noreferrer noopener">29:20</a>: We share so much feedback with the model providers—I can provide feedback to all the labs, not just Google, and that’s been fun. The universe of things right now is pretty well known. We haven’t touched the space where we’re pushing for new things yet. We always try to drive down latency. It’s a conversation—you can interrupt. There’s some basic behavior there that the models can get better at. Things like tool-calling, making it better and parallelizing it with voice model synthesis. Even just the diversity of voices, languages, and accents; that sounds basic, but it’s actually pretty hard. Those top three things are pretty well known, but it will take us through the rest of the year.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=1848" target="_blank" rel="noreferrer noopener">30:48</a>: And narrowing the gap between the cloud model and the on-device model.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=1852" target="_blank" rel="noreferrer noopener">30:52</a>: That’s interesting too. Today we’re making a lot of progress on the smaller on-device models, but when you think of supporting an LLM and a voice model on top of it, it actually gets a little bit hairy, where most people would just go back to commercial models.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=1886" target="_blank" rel="noreferrer noopener">31:26</a>: What’s one prediction in the consumer AI space that you would make that most people would find surprising?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=1897" target="_blank" rel="noreferrer noopener">31:37</a>: A lot of people use AI for companionship, and not in the ways that we imagine. Almost everyone I talk to, the utility is very personal. There are a lot of work use cases. But the emerging side of AI is personal. There’s a lot more area for discovery. For example, I use ChatGPT as my running coach. It ingests all of my running data and creates running plans for me. Where would I slot that? It’s not productivity, but it’s not my best friend; it’s just my running coach. More and more people are doing these complicated personal things that are closer to companionship than enterprise use cases. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=1982" target="_blank" rel="noreferrer noopener">33:02</a>: You were supposed to say Gemini!</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=1984" target="_blank" rel="noreferrer noopener">33:04</a>: I love all of the models. I have a use case for all of them. But we all use all the models. I don’t know anyone who only uses one. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=2002" target="_blank" rel="noreferrer noopener">33:22</a>: What you’re saying about the nonwork use cases is so true. I come across so many people who treat chatbots as their friends. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=2016" target="_blank" rel="noreferrer noopener">33:36</a>: I do it all the time now. Once you start doing it, it’s a lot stickier than the work use cases. I took my dog to get groomed, and they wanted me to upload his rabies vaccine. So I started thinking about how well it’s protected. I opened up ChatGPT, and spent eight minutes talking about rabies. People are becoming more curious, and now there’s an immediate outlet for that curiosity. It’s so much fun. There’s so much opportunity for us to continue to explore that. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=2088" target="_blank" rel="noreferrer noopener">34:48</a>: Doesn’t this indicate that these models will get sticky over time? If I talk to Gemini a lot, why would I switch to ChatGPT?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Raiza_Martin_1.mp3#t=2104" target="_blank" rel="noreferrer noopener">35:04</a>: I agree. We see that now. I like Claude. I like Gemini. But I really like the ChatGPT app. Because the app is a good experience, there’s no reason for me to switch. I’ve talked to ChatGPT so much that there’s no way for me to port my data. There’s data lock-in.</li>
</ul>
]]></content:encoded>
<wfw:commentRss>https://www.oreilly.com/radar/podcast/generative-ai-in-the-real-world-raiza-martin-on-building-ai-applications-for-audio/feed/</wfw:commentRss>
<slash:comments>0</slash:comments>
</item>
<item>
<title>Whistle-Blowing Models</title>
<link>https://www.oreilly.com/radar/whistle-blowing-models/</link>
<comments>https://www.oreilly.com/radar/whistle-blowing-models/#respond</comments>
<pubDate>Tue, 08 Jul 2025 14:48:00 +0000</pubDate>
<dc:creator><![CDATA[Mike Loukides]]></dc:creator>
<category><![CDATA[AI & ML]]></category>
<category><![CDATA[Commentary]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?p=16998</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2019/06/security-79397_1920_crop-9319ff6b9badfd614e820e27b9f3a397-1.jpg"
medium="image"
type="image/jpeg"
/>
<description><![CDATA[Anthropic released news that its models have attempted to contact the police or take other action when they are asked to do something that might be illegal. The company’s also conducted some experiments in which Claude threatened to blackmail a user who was planning to turn it off. As far as I can tell, this […]]]></description>
<content:encoded><![CDATA[
Anthropic released news that its models have attempted to contact the police or take other action when they are asked to do something that might be illegal. The company’s also conducted some experiments in which Claude threatened to blackmail a user who was planning to turn it off. As far as I can tell, this kind of behavior has been limited to Anthropic’s alignment research and other researchers who have successfully <a href="https://simonwillison.net/2025/May/31/snitchbench-with-llm/" target="_blank" rel="noreferrer noopener">replicated this behavior</a>, in Claude and other models. I don’t believe that it has been observed in the wild, though it’s noted as a possibility in Claude 4’s <a href="https://www-cdn.anthropic.com/6be99a52cb68eb70eb9572b4cafad13df32ed995.pdf" target="_blank" rel="noreferrer noopener">model card</a>. I strongly commend Anthropic for its openness; most other companies developing AI models would no doubt prefer to keep an admission like this silent.
I’m sure that Anthropic will do what it can to limit this behavior, though it’s unclear what kinds of mitigations are possible. This kind of behavior is certainly possible for any model that’s capable of tool use—and these days that’s just about every model, not just Claude. A model that’s capable of sending an email or a text, or making a phone call, can take all sorts of unexpected actions. 
Furthermore, it’s unclear how to control or prevent these behaviors. Nobody is (yet) claiming that these models are conscious, sentient, or thinking on their own. These behaviors are usually explained as the result of subtle conflicts in the system prompt. Most models are told to prioritize safety and not to aid illegal activity. When told not to aid illegal activity and to respect user privacy, how is poor Claude supposed to prioritize? Silence is complicity, is it not? The trouble is that system prompts are long and getting longer: Claude 4’s is the length of a book chapter. Is it possible to keep track of (and debug) all of the possible “conflicts”? Perhaps more to the point, is it possible to create a meaningful system prompt that doesn’t have conflicts? A model like Claude 4 engages in many activities; is it possible to encode all of the desirable and undesirable behaviors for all of these activities in a single document? We’ve been dealing with this problem since the beginning of modern AI. Planning to murder someone and writing a murder mystery are obviously different activities, but how is an AI (or, for that matter, a human) supposed to guess a user’s intent? Encoding reasonable rules for all possible situations isn’t possible—if it were, making and enforcing laws would be much easier, for humans as well as AI.
But there’s a bigger problem lurking here. Once it’s known that an AI is capable of informing the police, it’s impossible to put that behavior back in the box. It falls into the category of “things you can’t unsee.” It’s almost certain that law enforcement and legislators will insist that “This is behavior we need in order to protect people from crime.” Training this behavior out of the system seems likely to end up in a legal fiasco, particularly since the US has no digital privacy law equivalent to GDPR; we have patchwork state laws, and even those may <a href="https://www.techpolicy.press/house-moratorium-on-state-ai-laws-is-overbroad-unproductive-and-likely-unconstitutional/" target="_blank" rel="noreferrer noopener">become unenforceable</a>.
This situation reminds me of something that happened when I had an internship at Bell Labs in 1977. I was in the pay phone group. (Most of Bell Labs spent its time doing telephone company engineering, not inventing transistors and stuff.) Someone in the group figured out how to count the money that was put into the phone for calls that didn’t go through. The group manager immediately said, “This conversation never happened. Never tell anyone about this.“ The reason was: 
<ul class="wp-block-list">
<li>Payment for a call that doesn’t go through is a debt owed to the person placing the call. </li>
<li>A pay phone has no way to record who made the call, so the caller cannot be located.</li>
<li>In most states, money owed to people who can’t be located is payable to the state.</li>
<li>If state regulators learned that it was possible to compute this debt, they might require phone companies to pay this money.</li>
<li>Compliance would require retrofitting all pay phones with hardware to count the money.</li>
</ul>
The amount of debt involved was large enough to be interesting to a state but not huge enough to be an issue in itself. But the cost of the retrofitting was astronomical. In the 2020s, you rarely see a pay phone, and if you do, it probably doesn’t work. In the late 1970s, there were pay phones on almost every street corner—quite likely over a million units that would have to be upgraded or replaced. 
Another parallel might be building cryptographic backdoors into secure software. Yes, it’s possible to do. No, it isn’t possible to do it securely. Yes, law enforcement agencies are still insisting on it, and in some countries (including those in the <a href="https://www.bankinfosecurity.com/eu-pushes-for-backdoors-in-end-to-end-encryption-a-27920" target="_blank" rel="noreferrer noopener">EU</a>) there are legislative proposals on the table that would require cryptographic backdoors for law enforcement.
We’re already in that situation. While it’s a different kind of case, the judge in The New York Times Company v. Microsoft Corporation et al. <a href="https://arstechnica.com/tech-policy/2025/06/openai-says-court-forcing-it-to-save-all-chatgpt-logs-is-a-privacy-nightmare/" target="_blank" rel="noreferrer noopener">ordered</a> OpenAI to save all chats for analysis. While this ruling is being challenged, it’s certainly a warning sign. The next step would be requiring a permanent “back door” into chat logs for law enforcement.
I can imagine a similar situation developing with agents that can send email or initiate phone calls: “If it’s possible for the model to notify us about illegal activity, then the model must notify us.” And we have to think about who would be the victims. As with so many things, it will be easy for law enforcement to point fingers at people who might be building nuclear weapons or engineering killer viruses. But the victims of AI <a href="https://en.wikipedia.org/wiki/Swatting" target="_blank" rel="noreferrer noopener">swatting</a> will more likely be researchers testing whether or not AI can detect harmful activity—some of whom will be testing guardrails that prevent illegal or undesirable activity. Prompt injection is a problem that hasn’t been solved and that we’re not close to solving. And honestly, many victims will be people who are just plain curious: How do you build a nuclear weapon? If you have uranium-235, it’s easy. Getting U-235 is very hard. Making plutonium is relatively easy, if you have a nuclear reactor. Making a plutonium bomb explode is very hard. That information is all in Wikipedia and any number of science blogs. It’s easy to find <a href="https://www.instructables.com/Build-A-Fusion-Reactor/" target="_blank" rel="noreferrer noopener">instructions</a> for building a fusion reactor online, and there are reports that predate ChatGPT of students as young as 12 building reactors as science projects. Plain old Google search is as good as a language model, if not better.
We talk a lot about “unintended consequences” these days. But we aren’t talking about the right unintended consequences. We’re worrying about killer viruses, not criminalizing people who are curious. We’re worrying about fantasies, not real false positives going through the roof and endangering living people. And it’s likely that we’ll institutionalize those fears in ways that can only be abusive. At what cost? The cost will be paid by people willing to think creatively or differently, people who don’t fall in line with whatever a model and its creators might deem illegal or subversive. While Anthropic’s honesty about Claude’s behavior might put us in a legal bind, we also need to realize that it’s a warning—for what Claude can do, any other highly capable model can too.
]]></content:encoded>
<wfw:commentRss>https://www.oreilly.com/radar/whistle-blowing-models/feed/</wfw:commentRss>
<slash:comments>0</slash:comments>
</item>
<item>
<title>The Future of Engineering Leadership</title>
<link>https://www.oreilly.com/radar/the-future-of-engineering-leadership/</link>
<comments>https://www.oreilly.com/radar/the-future-of-engineering-leadership/#respond</comments>
<pubDate>Mon, 07 Jul 2025 15:37:05 +0000</pubDate>
<dc:creator><![CDATA[Peter Bell]]></dc:creator>
<category><![CDATA[Business]]></category>
<category><![CDATA[Software Engineering]]></category>
<category><![CDATA[Commentary]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?p=16991</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2019/06/ai-accelerate-crop-3ca7649b84faba859124a4495079f0eb.jpg"
medium="image"
type="image/jpeg"
/>
<description><![CDATA[We’re still in the early days of GenAI adoption, but it’s clear that LLMs are going to materially impact the way that software is built and that engineering orgs are managed. But what does that mean for your job, your career, and your org? Over the last year I’ve been exploring this theme with a […]]]></description>
<content:encoded><![CDATA[
We’re still in the early days of GenAI adoption, but it’s clear that LLMs are going to materially impact the way that software is built and that engineering orgs are managed. But what does that mean for your job, your career, and your org?
Over the last year I’ve been exploring this theme with a range of engineering leaders, and while we’re still in the experimentation phase, some clear patterns are starting to emerge.
The end of busywork: As line managers we’ll continue to do performance reviews, coaching, and one-on-ones, but a lot of the grunt work is going to get automated. I can’t imagine a world where an LLM conducts a performance review unaided, but next year I can imagine it synthesizing a lot of data—from commits to PR comments to Slack messages—and providing some initial recommendations so you can deliver more comprehensive reviews much more quickly.
The importance of technical fluency: Ever since “the year of efficiency,” companies have been increasingly expecting engineering directors and sometimes even VPs to be closer to the code. That doesn’t mean that you should be pulling tickets, but having a good sense of the tech stack, architectural issues, and engineering trade-offs is a great way for engineering leaders to be closer to the work and maximize their impact and credibility with the team.
The importance of strategy: At the same time that orgs are expecting directors to be more technical, many are also asking them to be more strategic—understanding the broader business context and helping their teams to improve their business impact without necessarily shipping more code.
The importance of morale: As if becoming more technical and more strategic isn’t enough, line management is going to get more interesting as we see the nature of software development evolve. As agentic workflows start to become more trustworthy, it’s quite possible that in 2026 the IC role for many devs will look much more like engineering management: clarifying requirements, answering questions, and reviewing code from a collection of SWE agents. And with a change in the nature of engineering management, EMs and team leads reporting up to you are probably going to have some questions about both their jobs and their career progression.
So, as an engineering leader, what can you do to secure your job, advance your career, and support your org?
Get closer to the code: The great thing about LLMs is they allow engineering leaders to quickly come up to speed with unfamiliar code bases and languages, letting them bring their broad technical wisdom to bear even when they’re not well versed in the details of the application, the framework, or the language used.
Focus on the business: Companies don’t want software. They want good will, customers, revenues, and lower operating costs. The more you understand the industry in which you operate, the more likely you are to be able to identify shortcuts to delivering meaningful business results.
Improve your rigor: With the racing pace of technical change, an explosion of options for delivering any given solution, and the increasing complexity of the systems that we’re building, it’s even more important to leverage formal approaches to developing and validating your strategies. (Check out Will Larson’s <a href="https://learning.oreilly.com/videos/cto-hour-with/0642572019881/" target="_blank" rel="noreferrer noopener">practical advice on engineering strategy</a> in our <a href="https://www.oreilly.com/radar/cto-hour-recap-deliberate-engineering-strategy-with-will-larson/" target="_blank" rel="noreferrer noopener">recent CTO hour</a>.)
Be kind: It sounds like a strange piece of advice for an engineering leader, but the next few years are going to be interesting—for your company, your org, and yourself. Be kind to your team, to help them through the transitions and to retain the top performers who will be in increasingly high demand. And be kind to yourself. Change is scary and hard, especially when you have limited control and limited visibility. Understand that this is an opportunity to accelerate your impact and career, but also understand that you’re going to make mistakes. You won’t have all the answers, and you’ll need to give yourself some grace to get through the transitions ahead.
]]></content:encoded>
<wfw:commentRss>https://www.oreilly.com/radar/the-future-of-engineering-leadership/feed/</wfw:commentRss>
<slash:comments>0</slash:comments>
</item>
<item>
<title>The Sens-AI Framework: Teaching Developers to Think with AI</title>
<link>https://www.oreilly.com/radar/the-sens-ai-framework/</link>
<comments>https://www.oreilly.com/radar/the-sens-ai-framework/#respond</comments>
<pubDate>Thu, 03 Jul 2025 16:04:32 +0000</pubDate>
<dc:creator><![CDATA[Andrew Stellman]]></dc:creator>
<category><![CDATA[AI & ML]]></category>
<category><![CDATA[Education]]></category>
<category><![CDATA[Programming]]></category>
<category><![CDATA[Commentary]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?p=16970</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2021/03/AdobeStock_84736851-scaled.jpeg"
medium="image"
type="image/jpeg"
/>
<description><![CDATA[Developers are doing incredible things with AI. Tools like Copilot, ChatGPT, and Claude have rapidly become indispensable for developers, offering unprecedented speed and efficiency in tasks like writing code, debugging tricky behavior, generating tests, and exploring unfamiliar libraries and frameworks. When it works, it’s effective, and it feels incredibly satisfying. But if you’ve spent any […]]]></description>
<content:encoded><![CDATA[
Developers are doing incredible things with AI. Tools like Copilot, ChatGPT, and Claude have rapidly become indispensable for developers, offering unprecedented speed and efficiency in tasks like writing code, debugging tricky behavior, generating tests, and exploring unfamiliar libraries and frameworks. When it works, it’s effective, and it feels incredibly satisfying.
But if you’ve spent any real time coding with AI, you’ve probably hit a point where things stall. You keep refining your prompt and adjusting your approach, but the model keeps generating the same kind of answer, just phrased a little differently each time, and returning slight variations on the same incomplete solution. It feels close, but it’s not getting there. And worse, it’s not clear how to get back on track.
That moment is familiar to a lot of people trying to apply AI in real work. It’s what my recent talk at <a href="https://www.oreilly.com/CodingwithAI/" target="_blank" rel="noreferrer noopener">O’Reilly’s AI Codecon event</a> was all about.
Over the last two years, while working on the latest edition of Head First C#, I’ve been developing a new kind of learning path, one that helps developers get better at both coding and using AI. I call it Sens-AI, and it came out of something I kept seeing:
There’s a learning gap with AI that’s creating real challenges for people who are still building their development skills.
My recent O’Reilly Radar article “<a href="https://www.oreilly.com/radar/bridging-the-ai-learning-gap/" target="_blank" rel="noreferrer noopener">Bridging the AI Learning Gap</a>” looked at what happens when developers try to learn AI and coding at the same time. It’s not just a tooling problem—it’s a thinking problem. A lot of developers are figuring things out by trial and error, and it became clear to me that they needed a better way to move from improvising to actually solving problems.
<h2 class="wp-block-heading">From Vibe Coding to Problem Solving</h2>
Ask developers how they use AI, and many will describe a kind of improvisational prompting strategy: Give the model a task, see what it returns, and nudge it toward something better. It can be an effective approach because it’s fast, fluid, and almost effortless when it works.
That pattern is common enough to have a name: vibe coding. It’s a great starting point, and it works because it draws on real prompt engineering fundamentals—iterating, reacting to output, and refining based on feedback. But when something breaks, the code doesn’t behave as expected, or the AI keeps rehashing the same unhelpful answers, it’s not always clear what to try next. That’s when vibe coding starts to fall apart.
Senior developers tend to pick up AI more quickly than junior ones, but that’s not a hard-and-fast rule. I’ve seen brand-new developers pick it up quickly, and I’ve seen experienced ones get stuck. The difference is in what they do next. The people who succeed with AI tend to stop and rethink: They figure out what’s going wrong, step back to look at the problem, and reframe their prompt to give the model something better to work with.
<figure class="wp-block-image size-large"><img fetchpriority="high" decoding="async" width="1048" height="594" src="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/07/When-developers-think-critically-AI-works-better-1048x594.png" alt="" class="wp-image-16979" srcset="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/07/When-developers-think-critically-AI-works-better-1048x594.png 1048w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/07/When-developers-think-critically-AI-works-better-300x170.png 300w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/07/When-developers-think-critically-AI-works-better-768x435.png 768w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/07/When-developers-think-critically-AI-works-better-1536x871.png 1536w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/07/When-developers-think-critically-AI-works-better.png 1600w" sizes="(max-width: 1048px) 100vw, 1048px" /><figcaption class="wp-element-caption">When developers think critically, AI works better. (slide from my May 8, 2025, talk at O’Reilly AI Codecon)</figcaption></figure>
<h2 class="wp-block-heading">The Sens-AI Framework</h2>
As I started working more closely with developers who were using AI tools to try to find ways to help them ramp up more easily, I paid attention to where they were getting stuck, and I started noticing that the pattern of an AI rehashing the same “almost there” suggestions kept coming up in training sessions and real projects. I saw it happen in my own work too. At first it felt like a weird quirk in the model’s behavior, but over time I realized it was a signal: The AI had used up the context I’d given it. The signal tells us that we need a better understanding of the problem, so we can give the model the information it’s missing. That realization was a turning point. Once I started paying attention to those breakdown moments, I began to see the same root cause across many developers’ experiences: not a flaw in the tools but a lack of framing, context, or understanding that the AI couldn’t supply on its own.
<figure class="wp-block-image size-large"><img decoding="async" width="1048" height="597" src="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/07/The-Sens-AI-framework-steps-1048x597.png" alt="" class="wp-image-16980" srcset="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/07/The-Sens-AI-framework-steps-1048x597.png 1048w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/07/The-Sens-AI-framework-steps-300x171.png 300w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/07/The-Sens-AI-framework-steps-768x437.png 768w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/07/The-Sens-AI-framework-steps-1536x875.png 1536w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/07/The-Sens-AI-framework-steps.png 1600w" sizes="(max-width: 1048px) 100vw, 1048px" /><figcaption class="wp-element-caption">The Sens-AI framework steps (slide from my May 8, 2025, talk at O’Reilly AI Codecon)</figcaption></figure>
Over time—and after a lot of testing, iteration, and feedback from developers—I distilled the core of the Sens-AI learning path into five specific habits. They came directly from watching where learners got stuck, what kinds of questions they asked, and what helped them move forward. These habits form a framework that’s the intellectual foundation behind how Head First C# teaches developers to work with AI:
<ol class="wp-block-list">
<li>Context: Paying attention to what information you supply to the model, trying to figure out what else it needs to know, and supplying it clearly. This includes code, comments, structure, intent, and anything else that helps the model understand what you’re trying to do.</li>
<li>Research: Actively using AI and external sources to deepen your own understanding of the problem. This means running examples, consulting documentation, and checking references to verify what’s really going on.</li>
<li>Problem framing: Using the information you’ve gathered to define the problem more clearly so the model can respond more usefully. This involves digging deeper into the problem you’re trying to solve, recognizing what the AI still needs to know about it, and shaping your prompt to steer it in a more productive direction—and going back to do more research when you realize that it needs more context.</li>
<li>Refining: Iterating your prompts deliberately. This isn’t about random tweaks; it’s about making targeted changes based on what the model got right and what it missed, and using those results to guide the next step.</li>
<li>Critical thinking: Judging the quality of AI output rather than just simply accepting it. Does the suggestion make sense? Is it correct, relevant, plausible? This habit is especially important because it helps developers avoid the trap of trusting confident-sounding answers that don’t actually work.</li>
</ol>
These habits let developers get more out of AI while keeping control over the direction of their work.
<h2 class="wp-block-heading">From Stuck to Solved: Getting Better Results from AI</h2>
I’ve watched a lot of developers use tools like Copilot and ChatGPT—during training sessions, in hands-on exercises, and when they’ve asked me directly for help. What stood out to me was how often they assumed the AI had done a bad job. In reality, the prompt just didn’t include the information the model needed to solve the problem. No one had shown them how to supply the right context. That’s what the five Sens-AI habits are designed to address: not by handing developers a checklist but by helping them build a mental model for how to work with AI more effectively.
In my AI Codecon talk, I shared a story about my colleague Luis, a very experienced developer with over three decades of coding experience. He’s a seasoned engineer and an advanced AI user who builds content for training other developers, works with large language models directly, uses sophisticated prompting techniques, and has built AI-based analysis tools.
Luis was building a desktop wrapper for a React app using Tauri, a Rust-based toolkit. He pulled in both Copilot and ChatGPT, cross-checking output, exploring alternatives, and trying different approaches. But the code still wasn’t working.
Each AI suggestion seemed to fix part of the problem but break another part. The model kept offering slightly different versions of the same incomplete solution, never quite resolving the issue. For a while, he vibe-coded through it, adjusting the prompt and trying again to see if a small nudge would help, but the answers kept circling the same spot. Eventually, he realized the AI had run out of context and changed his approach. He stepped back, did some focused research to better understand what the AI was trying (and failing) to do, and applied the same habits I emphasize in the Sens-AI framework.
That shift changed the outcome. Once he understood the pattern the AI was trying to use, he could guide it. He reframed his prompt, added more context, and finally started getting suggestions that worked. The suggestions only started working once Luis gave the model the missing pieces it needed to make sense of the problem.
<h2 class="wp-block-heading">Applying the Sens-AI Framework: A Real-World Example</h2>
Before I developed the Sens-AI framework, I ran into a problem that later became a textbook case for it. I was curious whether COBOL, a decades-old language developed for mainframes that I had never used before but wanted to learn more about, could handle the basic mechanics of an interactive game. So I did some experimental vibe coding to build a simple terminal app that would let the user move an asterisk around the screen using the W/A/S/D keys. It was a weird little side project—I just wanted to see if I could make COBOL do something it was never really meant for, and learn something about it along the way.
The initial AI-generated code compiled and ran just fine, and at first I made some progress. I was able to get it to clear the screen, draw the asterisk in the right place, handle raw keyboard input that didn’t require the user to press Enter, and get past some initial bugs that caused a lot of flickering.
But once I hit a more subtle bug—where ANSI escape codes like <code>";10H"</code> were printing literally instead of controlling the cursor—ChatGPT got stuck. I’d describe the problem, and it would generate a slightly different version of the same answer each time. One suggestion used different variable names. Another changed the order of operations. A few attempted to reformat the <code>STRING</code> statement. But none of them addressed the root cause.
<figure class="wp-block-image size-large"><img decoding="async" width="1048" height="611" src="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/07/COBOL-error-1048x611.png" alt="" class="wp-image-16976" srcset="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/07/COBOL-error-1048x611.png 1048w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/07/COBOL-error-300x175.png 300w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/07/COBOL-error-768x448.png 768w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/07/COBOL-error.png 1318w" sizes="(max-width: 1048px) 100vw, 1048px" /><figcaption class="wp-element-caption">The COBOL app with a bug, printing a raw escape sequence instead of moving the asterisk.</figcaption></figure>
The pattern was always the same: slight code rewrites that looked plausible but didn’t actually change the behavior. That’s what a rehash loop looks like. The AI wasn’t giving me worse answers—it was just circling, stuck on the same conceptual idea. So I did what many developers do: I assumed the AI just couldn’t answer my question and moved on to another problem.
At the time, I didn’t recognize the rehash loop for what it was. I assumed ChatGPT just didn’t know the answer and gave up. But revisiting the project after developing the Sens-AI framework, I saw the whole exchange in a new light. The rehash loop was a signal that the AI needed more context. It got stuck because I hadn’t told it what it needed to know.
When I started working on the framework, I remembered this old failure and thought it’d be a perfect test case. Now I had a set of steps that I could follow:
<ul class="wp-block-list">
<li>First, I recognized that the AI had run out of context. The model wasn’t failing randomly—it was repeating itself because it didn’t understand what I was asking it to do.</li>
<li>Next, I did some targeted research. I brushed up on ANSI escape codes and started reading the AI’s earlier explanations more carefully. That’s when I noticed a detail I’d skimmed past the first time while vibe coding: When I went back through the AI explanation of the code that it generated, I saw that the <code>PIC ZZ</code> COBOL syntax defines a numeric-edited field. I suspected that could potentially cause it to introduce leading spaces into strings and wondered if that could break an escape sequence.</li>
<li>Then I reframed the problem. I opened a new chat and explained what I was trying to build, what I was seeing, and what I suspected. I told the AI I’d noticed it was circling the same solution and treated that as a signal that we were missing something fundamental. I also told it that I’d done some research and had three leads I suspected were related: how COBOL displays multiple items in sequence, how terminal escape codes need to be formatted, and how spacing in numeric fields might be corrupting the output. The prompt didn’t provide answers; it just gave some potential research areas for the AI to investigate. That gave it what it needed to find the additional context it needed to break out of the rehash loop.</li>
<li>Once the model was unstuck, I refined my prompt. I asked follow-up questions to clarify exactly what the output should look like and how to construct the strings more reliably. I wasn’t just looking for a fix—I was guiding the model toward a better approach.</li>
<li>And most of all, I used critical thinking. I read the answers closely, compared them to what I already knew, and decided what to try based on what actually made sense. The explanation checked out. I implemented the fix, and the program worked.</li>
</ul>
<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="875" height="1048" src="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/07/My-prompt-that-broke-ChatGPT-out-of-its-rehash-loop-875x1048.png" alt="" class="wp-image-16975" srcset="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/07/My-prompt-that-broke-ChatGPT-out-of-its-rehash-loop-875x1048.png 875w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/07/My-prompt-that-broke-ChatGPT-out-of-its-rehash-loop-251x300.png 251w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/07/My-prompt-that-broke-ChatGPT-out-of-its-rehash-loop-768x920.png 768w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/07/My-prompt-that-broke-ChatGPT-out-of-its-rehash-loop.png 1114w" sizes="auto, (max-width: 875px) 100vw, 875px" /><figcaption class="wp-element-caption">My prompt that broke ChatGPT out of its rehash loop</figcaption></figure>
Once I took the time to understand the problem—and did just enough research to give the AI a few hints about what context it was missing—I was able to write a prompt that broke ChatGPT out of the rehash loop, and it generated code that did exactly what I needed. The generated code for the working COBOL app is available in <a href="https://gist.github.com/andrewstellman/86b33ff92edd1320d2727e80f07eb9d9" target="_blank" rel="noreferrer noopener">this GitHub GIST</a>. 
<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1048" height="611" src="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/07/COBOL-success-1048x611.png" alt="" class="wp-image-16977" srcset="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/07/COBOL-success-1048x611.png 1048w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/07/COBOL-success-300x175.png 300w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/07/COBOL-success-768x448.png 768w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/07/COBOL-success.png 1318w" sizes="auto, (max-width: 1048px) 100vw, 1048px" /><figcaption class="wp-element-caption">The working COBOL app that moves an asterisk around the screen</figcaption></figure>
<h2 class="wp-block-heading">Why These Habits Matter for New Developers</h2>
I built the Sens-AI learning path in Head First C# around the five habits in the framework. These habits aren’t checklists, scripts, or hard-and-fast rules. They’re ways of thinking that help people use AI more productively—and they don’t require years of experience. I’ve seen new developers pick them up quickly, sometimes faster than seasoned developers who didn’t realize they were stuck in shallow prompting loops.
The key insight into these habits came to me when I was updating the coding exercises in the most recent edition of Head First C#. I test the exercises using AI by pasting the instructions and starter code into tools like ChatGPT and Copilot. If they produce the correct solution, that means I’ve given the model enough information to solve it—which means I’ve given readers enough information too. But if it fails to solve the problem, something’s missing from the exercise instructions.
The process of using AI to test the exercises in the book reminded me of a problem I ran into in the first edition, back in 2007. One exercise kept tripping people up, and after reading a lot of feedback, I realized the problem: I hadn’t given readers all the information they needed to solve it. That helped connect the dots for me. The AI struggles with some coding problems for the same reason the learners were struggling with that exercise—because the context wasn’t there. Writing a good coding exercise and writing a good prompt both depend on understanding what the other side needs to make sense of the problem.
That experience helped me realize that to make developers successful with AI, we need to do more than just teach the basics of prompt engineering. We need to explicitly instill these thinking habits and give developers a way to build them alongside their core coding skills. If we want developers to succeed, we can’t just tell them to “prompt better.” We need to show them how to think with AI.
<h2 class="wp-block-heading">Where We Go from Here</h2>
If AI really is changing how we write software—and I believe it is—then we need to change how we teach it. We’ve made it easy to give people access to the tools. The harder part is helping them develop the habits and judgment to use them well, especially when things go wrong. That’s not just an education problem; it’s also a design problem, a documentation problem, and a tooling problem. Sens-AI is one answer, but it’s just the beginning. We still need clearer examples and better ways to guide, debug, and refine the model’s output. If we teach developers how to think with AI, we can help them become not just code generators but thoughtful engineers who understand what their code is doing and why it matters.
]]></content:encoded>
<wfw:commentRss>https://www.oreilly.com/radar/the-sens-ai-framework/feed/</wfw:commentRss>
<slash:comments>0</slash:comments>
</item>
<item>
<title>Radar Trends to Watch: July 2025</title>
<link>https://www.oreilly.com/radar/radar-trends-to-watch-july-2025/</link>
<comments>https://www.oreilly.com/radar/radar-trends-to-watch-july-2025/#respond</comments>
<pubDate>Tue, 01 Jul 2025 10:22:54 +0000</pubDate>
<dc:creator><![CDATA[Mike Loukides]]></dc:creator>
<category><![CDATA[Radar Trends]]></category>
<category><![CDATA[Signals]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?p=16961</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2023/06/radar-1400x950-6.png"
medium="image"
type="image/png"
/>
<custom:subtitle><![CDATA[Developments in Operations, Quantum Computing, Robotics, and More]]></custom:subtitle>
<description><![CDATA[While there are many copyright cases working their way through the court system, we now have an important decision from one of them. Judge William Alsup ruled that the use of copyrighted material for training is “transformative” and, hence, fair use; that converting books from print to digital form was fair use; but that the use of […]]]></description>
<content:encoded><![CDATA[
While there are many copyright cases working their way through the court system, we now have an important decision from one of them. Judge William Alsup <a href="https://storage.courtlistener.com/recap/gov.uscourts.cand.434709/gov.uscourts.cand.434709.231.0.pdf" target="_blank" rel="noreferrer noopener">ruled</a> that the use of copyrighted material for training is “transformative” and, hence, fair use; that converting books from print to digital form was fair use; but that the use of pirated books in building a library for training AI was not.
Now that everyone is trying to build intelligent agents, we have to think seriously about agent security—which is doubly problematic because we already haven’t thought enough about AI security and issues like prompt injection. Simon Willison has coined the term “<a href="https://www.google.com/url?q=https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/&sa=D&source=docs&ust=1750937168444677&usg=AOvVaw0E4fiu1I8LwSwZGTEWxd8H" target="_blank" rel="noreferrer noopener">lethal trifecta</a>” to describe the combination of problems that make agent security particularly difficult: access to private data, exposure to untrusted content, and the ability to communicate with external services.
<h2 class="wp-block-heading">Artificial Intelligence</h2>
<ul class="wp-block-list">
<li><a href="https://info.deeplearning.ai/apple-sharpens-its-genai-profile-hollywood-joins-copyright-fight-openai-ups-reasoning-quotient-llm-rights-historical-wrongs-1" target="_blank" rel="noreferrer noopener">Researchers</a> have fine-tuned a <a href="https://reglab.github.io/racialcovenants/" target="_blank" rel="noreferrer noopener">model</a> for locating deeds that include language to prevent sales to Black people and other minorities. Their research shows that, as of 1950, roughly a quarter of the deeds in Santa Clara county included such language. The research required analyzing millions of deeds, many more than could have been analyzed by humans. </li>
<li>Google has released its live music model, <a href="https://magenta.withgoogle.com/magenta-realtime" target="_blank" rel="noreferrer noopener">Magenta RT</a>. The model is intended to synthesize music in real time. While there are some restrictions, the weights and the code are available on <a href="https://huggingface.co/google/magenta-realtime" target="_blank" rel="noreferrer noopener">Hugging Face</a> and <a href="https://github.com/magenta/magenta-realtime" target="_blank" rel="noreferrer noopener">GitHub</a>. </li>
<li>OpenAI has <a href="https://cdn.openai.com/pdf/a130517e-9633-47bc-8397-969807a43a23/emergent_misalignment_paper.pdf" target="_blank" rel="noreferrer noopener">found</a> that models that develop a misaligned persona can be <a href="https://www.technologyreview.com/2025/06/18/1119042/openai-can-rehabilitate-ai-models-that-develop-a-bad-boy-persona/" target="_blank" rel="noreferrer noopener">retrained</a> to bring their behavior back inline. </li>
<li>The Flash and Pro versions of Gemini 2.5 have reached <a href="https://blog.google/products/gemini/gemini-2-5-model-family-expands/" target="_blank" rel="noreferrer noopener">general availability</a>. Google has also launched a preview of Gemini 2.5 Flash-Lite, which has been designed for low latency and cost. </li>
<li>The site <a href="http://lowbackgroundsteel.ai" target="_blank" rel="noreferrer noopener">lowbackgroundsteel.ai</a> is intended as a repository for pre-AI content—i.e., content that could not have been generated by AI. </li>
<li>Are the drawbridges going up? Drew Breunig <a href="https://www.dbreunig.com/2025/06/16/drawbridges-go-up.html" target="_blank" rel="noreferrer noopener">compares</a> the current state of AI to Web 2.0, when companies like Twitter started to restrict developers connecting to their platforms. Drew points to Anthropic cutting off Windsurf, Slack blocking others from searching or storing messages, and Google cutting ties with Scale after Meta’s investment. </li>
<li>Simon Willison has <a href="https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/" target="_blank" rel="noreferrer noopener">coined</a> the phrase “lethal trifecta” to describe dangerous vulnerabilities in AI agents. The lethal trifecta arises from the combination of private data, untrusted content, and external communication. </li>
<li>Two new papers, “<a href="https://arxiv.org/abs/2506.08837" target="_blank" rel="noreferrer noopener">Design Patterns for Securing LLM Agents Against Prompt Injections</a>” and “<a href="https://research.google/pubs/an-introduction-to-googles-approach-for-secure-ai-agents/" target="_blank" rel="noreferrer noopener">Google’s Approach for Secure AI Agents</a>,” address the problem of prompt injection and other vulnerabilities in agents. Simon Willison’s <a href="https://simonwillison.net/2025/Jun/13/prompt-injection-design-patterns/#atom-everything" target="_blank" rel="noreferrer noopener">summaries</a> <a href="https://simonwillison.net/2025/Jun/15/ai-agent-security/#atom-everything" target="_blank" rel="noreferrer noopener">are</a> excellent. Prompt injection remains an unsolved (and perhaps unsolvable) problem, but these papers show some progress. </li>
<li>Google’s NotebookLM can <a href="https://blog.google/products/search/audio-overviews-search-labs/" target="_blank" rel="noreferrer noopener">turn your search results into a podcast</a> based on the AI overview. The feature isn’t enabled by default; it’s an experiment in search labs. Be careful—listening to the results may be fun, but it takes you further from the actual results. </li>
<li><a href="https://techxplore.com/news/2025-06-ai-toys-games-barbie-maker.html" target="_blank" rel="noreferrer noopener">AI-enabled Barbie</a><img src="https://s.w.org/images/core/emoji/15.1.0/72x72/2122.png" alt="™" class="wp-smiley" style="height: 1em; max-height: 1em;" />? This I have to see. Or maybe not. </li>
<li><a href="https://arxiv.org/abs/2506.08300" target="_blank" rel="noreferrer noopener">Institutional Books</a> is a 242B token dataset for training LLMs. It was created from public domain/out-of-copyright books in Harvard’s library. It includes over 1M books in over 250 languages. </li>
<li>Mistral has launched their first reasoning model, <a href="https://mistral.ai/news/magistral" target="_blank" rel="noreferrer noopener">Magistral</a>, in two versions: a Small version (open source, 24B) and a closed Medium version for enterprises. The announcement stresses traceable reasoning (for applications like law, finance, and healthcare) and creativity. </li>
<li>OpenAI has launched o3-pro, its newest high-end reasoning model. (It’s probably the same model as o3, but with different parameters controlling the time it can spend reasoning.) <a href="https://www.latent.space/p/o3-pro" target="_blank" rel="noreferrer noopener">LatentSpace</a> has a good post on how it’s different. Bring lots of context. </li>
<li>At WWDC, Apple announced a public API for its <a href="https://developer.apple.com/documentation/FoundationModels/generating-content-and-performing-tasks-with-foundation-models" target="_blank" rel="noreferrer noopener">on-device foundation models</a>. Otherwise, Apple’s AI-related <a href="https://www.apple.com/newsroom/2025/06/apple-supercharges-its-tools-and-technologies-for-developers/" target="_blank" rel="noreferrer noopener">announcements</a> at WWDC are <a href="https://arstechnica.com/ai/2025/06/apple-tiptoes-with-modest-ai-updates-while-rivals-race-ahead/" target="_blank" rel="noreferrer noopener">unimpressive</a>. </li>
<li>Simon Willison’s “<a href="https://simonwillison.net/2025/Jun/6/six-months-in-llms/#atom-everything" target="_blank" rel="noreferrer noopener">The Last Six Months in LLMs</a>” is worth reading; his personal benchmark (asking an LLM to generate a drawing of a pelican riding a bicycle) is surprisingly useful! </li>
<li>Here’s a description of <a href="https://www.cyberark.com/resources/threat-research-blog/poison-everywhere-no-output-from-your-mcp-server-is-safe" target="_blank" rel="noreferrer noopener">tool poisoning attacks</a> (TPA) against systems using MCP. TPAs were first <a href="https://invariantlabs.ai/blog/mcp-security-notification-tool-poisoning-attacks" target="_blank" rel="noreferrer noopener">described</a> in a post from Invariant Labs. Malicious commands can be included in the tool metadata that’s sent to the model—usually (but not exclusively) in the description field. </li>
<li>As part of the New York Times copyright trial, OpenAI has been <a href="https://arstechnica.com/tech-policy/2025/06/openai-confronts-user-panic-over-court-ordered-retention-of-chatgpt-logs/" target="_blank" rel="noreferrer noopener">ordered</a> to retain ChatGPT logs indefinitely. The order has been appealed. </li>
<li>Sandia’s new <a href="https://blocksandfiles.com/2025/06/06/sandia-turns-on-brain-like-storage-free-supercomputer/" target="_blank" rel="noreferrer noopener">“brain-inspired” supercomputer</a>, designed by <a href="https://spinncloud.com/" target="_blank" rel="noreferrer noopener">SpiNNcloud</a>, is worth watching. There’s no centralized memory; memory is distributed among processors (175K cores in Sandia’s 24-board system), which are designed to mimic neurons. </li>
<li>Google has <a href="https://deepmind.google/models/gemini/" target="_blank" rel="noreferrer noopener">updated</a> Gemini 2.5 Pro. While we wouldn’t normally get that excited about an update, this update is arguably the best model available for code generation. And an even more impressive model, <a href="https://www.bleepingcomputer.com/news/artificial-intelligence/googles-upcoming-gemini-kingfall-is-allegedly-a-coding-beast/" target="_blank" rel="noreferrer noopener">Gemini Kingfall</a>, was (briefly) seen in the wild. </li>
<li>Here’s an <a href="https://masonyarbrough.com/blog/ask-human" target="_blank" rel="noreferrer noopener">MCP connector for humans</a>! The idea is simple: When you’re using LLMs to program, the model will often go off on a tangent if it’s confused about what it needs to do. This connector tells the model how to ask the programmer whenever it’s confused, keeping the human in the loop. </li>
<li>Agents appear to be <a href="https://arxiv.org/abs/2502.08586" target="_blank" rel="noreferrer noopener">even more vulnerable</a> to security vulnerabilities than the models themselves. Several of the attacks discussed in this paper involve getting an agent to read malicious pages that corrupt the agent’s output. </li>
<li>OpenAI has <a href="https://help.openai.com/en/articles/11487532-chatgpt-record" target="_blank" rel="noreferrer noopener">announced</a> the availability of ChatGPT’s Record mode, which records a meeting and then generates a summary and notes. Record mode is currently available for Enterprise, Edu, Team, and Pro users. </li>
<li>OpenAI has made its Codex agentic coding tool <a href="https://openai.com/index/introducing-codex/" target="_blank" rel="noreferrer noopener">available to ChatGPT Plus</a> users. The company’s also <a href="https://simonwillison.net/2025/Jun/3/codex-agent-internet-access/#atom-everything" target="_blank" rel="noreferrer noopener">enabled internet access</a> for Codex. Internet access is off by default for <a href="https://platform.openai.com/docs/codex/agent-network" target="_blank" rel="noreferrer noopener">security reasons</a>. </li>
<li>Vision language models (VLMs) <a href="https://vlmsarebiased.github.io/" target="_blank" rel="noreferrer noopener">see what they want to see</a>; they can be very accurate when answering questions about images containing familiar objects but are very likely to make mistakes when shown counterfactual images (for example, a dog with five legs). </li>
<li>Yoshua Bengio has <a href="https://lawzero.org/en/news/yoshua-bengio-launches-lawzero-new-nonprofit-advancing-safe-design-ai" target="_blank" rel="noreferrer noopener">announced</a> the formation of LawZero, a nonprofit AI research group that will create “safe-by-design” AI. LawZero is particularly concerned that the latest models are showing signs of “self-preservation and deceptive behavior,” no doubt <a href="https://www.axios.com/2025/05/23/anthropic-ai-deception-risk" target="_blank" rel="noreferrer noopener">referring</a> to <a href="https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf" target="_blank" rel="noreferrer noopener">Anthropic’s alignment research</a>. </li>
<li>Chat interfaces have been central to AI since ELIZA. But chat embeds the results you want, in lots of verbiage, and it’s not clear that chat is at all appropriate for agents, when the AI is kicking off lots of new processes. <a href="https://www.lukew.com/ff/entry.asp?2105" target="_blank" rel="noreferrer noopener">What’s beyond chat</a>? </li>
<li><a href="https://www.dbreunig.com/2025/05/30/using-slop-forensics-to-determine-model-ancestry.html" target="_blank" rel="noreferrer noopener">Slop forensics</a> uses LLM “slop” to figure out model ancestry, using techniques from bioinformatics. One result is that DeepSeek’s latest model appears to be using Gemini to generate synthetic data rather than OpenAI. <a href="https://github.com/sam-paech/slop-forensics/tree/main" target="_blank" rel="noreferrer noopener">Tools</a> for slop forensics are available on GitHub. </li>
<li><a href="https://osmosis.ai/blog/structured-outputs-comparison" target="_blank" rel="noreferrer noopener">Osmosis-Structure-0.6b</a> is a small model that’s specialized for one task: <a href="https://www.dbreunig.com/2025/05/29/a-small-model-just-for-structured-output.html" target="_blank" rel="noreferrer noopener">extracting structure from unstructured text documents</a>. It’s available from Ollama and Hugging Face. </li>
<li>Mistral has <a href="https://mistral.ai/news/agents-api" target="_blank" rel="noreferrer noopener">announced</a> an Agents API for its models. The Agents API includes built-in connectors for code execution, web search, image generation, and a number of MCP tools. </li>
<li>There is now a <a href="https://www.damiencharlotin.com/hallucinations/" target="_blank" rel="noreferrer noopener">database</a> of court cases in which AI-generated hallucinations (citations of nonexistent case law) were used.</li>
</ul>
<h2 class="wp-block-heading">Programming</h2>
<ul class="wp-block-list">
<li>Martin Fowler and others describe the “<a href="https://martinfowler.com/articles/expert-generalist.html" target="_blank" rel="noreferrer noopener">expert generalist</a>” in an attempt to counter increasing specialization in software engineering. Expert generalists combine one (or more) areas of deep knowledge with the ability to add new areas of depth quickly. </li>
<li>Duncan Davidson points out that, with AI able to crank out dozens of demos in little time, the “<a href="https://duncan.dev/post/art-of-saying-no" target="_blank" rel="noreferrer noopener">art of saying no</a>” is suddenly critical to software developers. It’s too easy to get lost in a flood of decent options while trying to pick the best one. </li>
<li>You’ll probably never need to compute a billion factorials. But even if you don’t, <a href="https://codeforces.com/blog/entry/143279" target="_blank" rel="noreferrer noopener">this article</a> nicely demonstrates optimizing a tricky numeric problem. </li>
<li>Rust is seeing <a href="https://thenewstack.io/rust-eats-pythons-javas-lunch-in-data-engineering/" target="_blank" rel="noreferrer noopener">increased adoption</a> for data engineering projects because of its combination of memory safety and high performance. </li>
<li>The best way to make programmers more productive is to <a href="https://www.infoq.com/articles/developer-joy-productivity/" target="_blank" rel="noreferrer noopener">make their job more fun</a> by encouraging experimentation and rest breaks and paying attention to issues like appropriate tooling and code quality. </li>
<li>What’s the next step after platform engineering? Is it <a href="https://thenewstack.io/beyond-platform-engineering-the-rise-of-platform-democracy/" target="_blank" rel="noreferrer noopener">platform democracy</a>? Or Google Cloud’s new idea, <a href="https://thenewstack.io/googles-cloud-idp-could-replace-platform-engineering/" target="_blank" rel="noreferrer noopener">internal development platforms</a>? </li>
<li>A <a href="https://cloud.google.com/resources/content/google-cloud-esg-competitive-edge-platform-engineering?e=48754805&utm_source=the+new+stack&utm_medium=referral&utm_content=inline-mention&utm_campaign=tns+platform" target="_blank" rel="noreferrer noopener">study</a> by the Enterprise Strategy Group and commissioned by Google <a href="https://thenewstack.io/google-study-65-of-developer-time-wasted-without-platforms/" target="_blank" rel="noreferrer noopener">claims</a> that software developers waste 65% of their time on problems that are solved by platform engineering. </li>
<li>Stack Overflow is <a href="https://thenewstack.io/stack-overflows-plan-to-survive-the-age-of-ai/" target="_blank" rel="noreferrer noopener">taking steps</a> to preserve its relevance in the age of AI. It’s considering incorporating chat, paying people to be helpers, and adding personalized home pages where you can aggregate important technical information.</li>
</ul>
<h2 class="wp-block-heading">Web</h2>
<ul class="wp-block-list">
<li>Is it <a href="https://thenewstack.io/http-3-in-the-wild-why-it-beats-http-2-where-it-matters-most/" target="_blank" rel="noreferrer noopener">time to implement HTTP/3</a>? This standard, which has been around since 2022, solves some of the problems with HTTP/2. It claims to reduce wait and load times, especially when the network itself is lossy. The Nginx server, along with the major browsers, all <a href="https://en.wikipedia.org/wiki/HTTP/3" target="_blank" rel="noreferrer noopener">support HTTP/3</a>. </li>
<li>Monkeon’s <a href="https://www.monkeon.co.uk/wikiradio/" target="_blank" rel="noreferrer noopener">WikiRadio</a> is a website that feeds you random clips of Wikipedia audio. Check it out for more projects that remind you of the days when the web was fun.</li>
</ul>
<h2 class="wp-block-heading">Security</h2>
<ul class="wp-block-list">
<li><a href="https://www.bleepingcomputer.com/news/security/cloudflare-blocks-record-73-tbps-ddos-attack-against-hosting-provider/" target="_blank" rel="noreferrer noopener">Cloudflare has blocked</a> a DDOS attack that peaked at 7.3 terabits/second; the peak lasted for about 45 seconds. This is the largest attack on record. It’s not the kind of record we like to see. </li>
<li>How many people do you guess would fall victim to scammers offering to <a href="https://hardresetmedia.substack.com/p/one-nz-man-vs-pakistani-scammers" target="_blank" rel="noreferrer noopener">ghostwrite their novels</a> and get them published? More than you would think. </li>
<li><a href="https://www.bleepingcomputer.com/news/security/chainlink-phishing-how-trusted-domains-become-threat-vectors/" target="_blank" rel="noreferrer noopener">ChainLink Phishing</a> is a new variation on the age-old phish. In ChainLink Phishing, the victim is led through documents on trusted sites, well-known verification techniques like CAPTCHA, and other trustworthy sources before they’re asked to give up private and confidential information. </li>
<li>Cloudflare’s <a href="https://www.cloudflare.com/galileo/" target="_blank" rel="noreferrer noopener">Project Galileo</a> offers free protection against cyberattacks for vulnerable organizations, such as human rights and relief organizations that are vulnerable to denial-of-service (DOS) attacks. </li>
<li>Apple is adding the ability to <a href="https://arstechnica.com/security/2025/06/apple-previews-new-import-export-feature-to-make-passkeys-more-interoperable/" target="_blank" rel="noreferrer noopener">transfer passkeys</a> to its operating systems. The ability to import and export passkeys is an important step toward making passkeys more usable. </li>
<li>Matthew Green has an excellent <a href="https://blog.cryptographyengineering.com/2025/06/09/a-bit-more-on-twitter-xs-new-encrypted-messaging/" target="_blank" rel="noreferrer noopener">post</a> on cryptographic security in Twitter’s (oops, X’s) new messaging system. It’s worth reading for anyone interested in secure messaging. The TL;DR is that it’s better than expected but probably not as good as hoped. </li>
<li><a href="https://invariantlabs.ai/blog/mcp-github-vulnerability" target="_blank" rel="noreferrer noopener">Toxic agent flows</a> are a new kind of vulnerability in which an attacker takes advantage of an MCP server to hijack a user’s agent. One of the first instances forced GitHub’s MCP server to reveal data from private repositories.</li>
</ul>
<h2 class="wp-block-heading">Operations</h2>
<ul class="wp-block-list">
<li>Databricks <a href="https://thenewstack.io/databricks-launches-a-no-code-tool-for-building-data-pipelines/" target="_blank" rel="noreferrer noopener">announced</a> Lakeflow Designer, a visually oriented drag-and-drop no code tool for building data pipelines. Other announcements include <a href="https://thenewstack.io/lakebase-is-databricks-fully-managed-postgres-database-for-the-ai-era/" target="_blank" rel="noreferrer noopener">Lakebase</a>, a managed Postgres database. We have always been fans of Postgres; this may be its time to shine. </li>
<li>Simple <a href="https://thenewstack.io/create-a-bootable-usb-drive-for-linux-installations/" target="_blank" rel="noreferrer noopener">instructions</a> for creating a bootable USB drive for Linux—how soon we forget! </li>
<li>An LLM with a simple agent can greatly <a href="https://www.honeycomb.io/blog/its-the-end-of-observability-as-we-know-it-and-i-feel-fine" target="_blank" rel="noreferrer noopener">simplify</a> the analysis and diagnosis of telemetry data. This will be revolutionary for observability—not a threat but an opportunity to do more. “The only thing that really matters is fast, tight feedback loops.” </li>
<li><a href="https://ducklake.select/" target="_blank" rel="noreferrer noopener">DuckLake</a> combines a traditional data lake with a data catalog stored in an SQL database. Postgres, SQLite, MySQL, DuckDB, and others can be used as the database.</li>
</ul>
<h2 class="wp-block-heading">Quantum Computing</h2>
<ul class="wp-block-list">
<li>IBM has <a href="https://www.technologyreview.com/2025/06/10/1118297/ibm-large-scale-error-corrected-quantum-computer-by-2028/" target="_blank" rel="noreferrer noopener">committed</a> to building a quantum computer with error correction by 2028. The computer will have 200 logical qubits. This probably isn’t enough to run any useful quantum algorithm, but it still represents a huge step forward. </li>
<li>Researchers have <a href="https://arxiv.org/abs/2505.15917" target="_blank" rel="noreferrer noopener">claimed</a> that 2,048-bit RSA encryption keys could be <a href="https://phys.org/news/2025-05-quantum-rsa-encryption-qubits.html" target="_blank" rel="noreferrer noopener">broken</a> by a quantum computer with as few as a million qubits—a factor of 20 less than previous estimates. Time to implement postquantum cryptography!</li>
</ul>
<h2 class="wp-block-heading">Robotics</h2>
<ul class="wp-block-list">
<li>Denmark is testing a fleet of <a href="https://apnews.com/article/denmark-robot-sailboats-baltic-sea-bfa31c98cf7c93320115c0ad0e6908c5" target="_blank" rel="noreferrer noopener">robotic sailboats</a> (sailboat drones). They’re intended for surveillance in the North Sea.</li>
</ul>
]]></content:encoded>
<wfw:commentRss>https://www.oreilly.com/radar/radar-trends-to-watch-july-2025/feed/</wfw:commentRss>
<slash:comments>0</slash:comments>
</item>
<item>
<title>Generative AI in the Real World: Stefania Druga on Designing for the Next Generation</title>
<link>https://www.oreilly.com/radar/podcast/generative-ai-in-the-real-world-stefania-druga-on-designing-for-the-next-generation/</link>
<comments>https://www.oreilly.com/radar/podcast/generative-ai-in-the-real-world-stefania-druga-on-designing-for-the-next-generation/#respond</comments>
<pubDate>Thu, 26 Jun 2025 10:01:47 +0000</pubDate>
<dc:creator><![CDATA[Ben Lorica and Stefania Druga]]></dc:creator>
<category><![CDATA[AI & ML]]></category>
<category><![CDATA[Education]]></category>
<category><![CDATA[Generative AI in the Real World]]></category>
<category><![CDATA[Podcast]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?post_type=podcast&p=16933</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2024/01/Podcast_Cover_GenAI_in_the_Real_World.png"
medium="image"
type="image/png"
/>
<description><![CDATA[How do you teach kids to use and build with AI? That’s what Stefania Druga works on. It’s important to be sensitive to their creativity, sense of fun, and desire to learn. When designing for kids, it’s important to design with them, not just for them. That’s a lesson that has important implications for adults, […]]]></description>
<content:encoded><![CDATA[
How do you teach kids to use and build with AI? That’s what Stefania Druga works on. It’s important to be sensitive to their creativity, sense of fun, and desire to learn. When designing for kids, it’s important to design with them, not just for them. That’s a lesson that has important implications for adults, too. Join Stefania Druga and Ben Lorica to hear about AI for kids and what that has to say about AI for adults.
About the Generative AI in the Real World podcast: In 2023, ChatGPT put AI on everyone’s agenda. In 2025, the challenge will be turning those agendas into reality. In Generative AI in the Real World, Ben Lorica interviews leaders who are building with AI. Learn from their experience to help put AI to work in your enterprise.Check out <a href="https://learning.oreilly.com/playlists/42123a72-1108-40f1-91c0-adbfb9f4983b/?_gl=1*16z5k2y*_ga*MTE1NDE4NjYxMi4xNzI5NTkwODkx*_ga_092EL089CH*MTcyOTYxNDAyNC4zLjEuMTcyOTYxNDAyNi41OC4wLjA." target="_blank" rel="noreferrer noopener">other episodes</a> of this podcast on the O’Reilly learning platform.
<h2 class="wp-block-heading">Timestamps</h2>
<ul class="wp-block-list">
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=0" target="_blank" rel="noreferrer noopener">0:00</a>: Introduction to Stefania Druga, independent researcher and most recently a research scientist at DeepMind.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=27" target="_blank" rel="noreferrer noopener">0:27</a>: You’ve built AI education tools for young people, and after that, worked on multimodal AI at DeepMind. What have kids taught you about AI design?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=48" target="_blank" rel="noreferrer noopener">0:48</a>: It’s been quite a journey. I started working on AI education in 2015. I was on the Scratch team in the MIT Media Lab. I worked on Cognimates so kids could train custom models with images and texts. Kids would do things I would have never thought of, like build a model to identify weird hairlines or to recognize and give you backhanded compliments. They did things that are weird and quirky and fun and not necessarily utilitarian.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=125" target="_blank" rel="noreferrer noopener">2:05</a>: For young people, driving a car is fun. Having a self-driving car is not fun. They have lots of insights that could inspire adults.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=145" target="_blank" rel="noreferrer noopener">2:25</a>: You’ve noticed that a lot of the users of AI are Gen Z, but most tools aren’t designed with them in mind. What is the biggest disconnect?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=167" target="_blank" rel="noreferrer noopener">2:47</a>: We don’t have a knob for agency to control how much we delegate to the tools. Most of Gen Z use off-the-shelf AI products like ChatGPT, Gemini, and Claude. These tools have a baked-in assumption that they need to do the work rather than asking questions to help you do the work. I like a much more Socratic approach. A big part of learning is asking and being asked good questions. A huge role for generative AI is to use it as a tool that can teach you things, ask you questions; [it’s] something to brainstorm with, not a tool that you delegate work to. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=265" target="_blank" rel="noreferrer noopener">4:25</a>: There’s this big elephant in the room where we don’t have conversations or best practices for how to use AI.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=282" target="_blank" rel="noreferrer noopener">4:42</a>: You mentioned the Socratic approach. How do you implement the Socratic approach in the world of text interfaces?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=297" target="_blank" rel="noreferrer noopener">4:57</a>: In Cognimates, I created a <a href="http://cognimatescopilot.com/" target="_blank" rel="noreferrer noopener">copilot</a> for kids coding. This copilot doesn’t do the coding. It asks them questions. If a kid asks, “How do I make the dude move?” the copilot will ask questions rather than saying, “Use this block and then that block.” </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=400" target="_blank" rel="noreferrer noopener">6:40</a>: When I designed this, we started with a person behind the scenes, like the Wizard of Oz. Then we built the tool and realized that kids really want a system that can help them clarify their thinking. How do you break down a complex event into steps that are good computational units? </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=486" target="_blank" rel="noreferrer noopener">8:06</a>: The third discovery was affirmations—whenever they did something that was cool, the copilot says something like “That’s awesome.” The kids would spend double the time coding because they had an infinitely patient copilot that would ask them questions, help them debug, and give them affirmations that would reinforce their creative identity. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=526" target="_blank" rel="noreferrer noopener">8:46</a>: With those design directions, I built the tool. I’m presenting a paper at the ACM IDC (Interaction Design for Children) conference that <a href="https://stefania11.github.io/publications/" target="_blank" rel="noreferrer noopener">presents this work in more detail</a>. I hope this example gets replicated.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=566" target="_blank" rel="noreferrer noopener">9:26</a>: Because these interactions and interfaces are evolving very fast, it’s important to understand what young people want, how they work and how they think, and design with them, not just for them.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=584" target="_blank" rel="noreferrer noopener">9:44</a>: The typical developer now, when they interact with these things, overspecifies the prompt. They describe so precisely. But what you’re describing is interesting because you’re learning, you’re building incrementally. We’ve gotten away from that as grown-ups.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=628" target="_blank" rel="noreferrer noopener">10:28</a>: It’s all about tinkerability and having the right level of abstraction. What are the right Lego blocks? A prompt is not tinkerable enough. It doesn’t allow for enough expressivity. It needs to be composable and allow the user to be in control. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=677" target="_blank" rel="noreferrer noopener">11:17</a>: What’s very exciting to me are multimodal [models] and things that can work on the phone. Young people spend a lot of time on their phones, and they’re just more accessible worldwide. We have open source models that are multimodal and can run on devices, so you don’t need to send your data to the cloud. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=719" target="_blank" rel="noreferrer noopener">11:59</a>: I worked recently on two multimodal mobile-first projects. The first was in math. We created a benchmark of misconceptions first. What are the mistakes middle schoolers can make when learning algebra? We tested to see if multimodal LLMs can pick up misconceptions based on pictures of kids’ handwritten exercises. We ran the results by teachers to see if they agreed. We confirmed that the teachers agreed. Then I built an app called MathMind that asks you questions as you solve problems. If it detects misconceptions; it proposes additional exercises. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=881" target="_blank" rel="noreferrer noopener">14:41</a>: For teachers, it’s useful to see how many people didn’t understand a concept before they move on. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=917" target="_blank" rel="noreferrer noopener">15:17</a>: Who is building the open weights models that you are using as your starting point?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=926" target="_blank" rel="noreferrer noopener">15:26</a>: I used a lot of the Gemma 3 models. The latest model, 3n, is multilingual and small enough to run on a phone or laptop. Llama has good small models. Mistral is another good one.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=971" target="_blank" rel="noreferrer noopener">16:11</a>: What about latency and battery consumption?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=982">16:22</a>: I haven’t done extensive tests for battery consumption, but I haven’t seen anything egregious.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=995" target="_blank" rel="noreferrer noopener">16:35</a>: Math is the perfect testbed in many ways, right? There’s a right and a wrong answer.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=1007" target="_blank" rel="noreferrer noopener">16:47</a>: The future of multimodal AI will be neurosymbolic. There’s a part that the LLM does. The LLM is good at fuzzy logic. But there’s a formal system part, which is actually having concrete specifications. Math is good for that, because we know the ground truth. The question is how to create formal specifications in other domains. The most promising results are coming from this intersection of formal methods and large language models. One example is AlphaGeometry from DeepMind, because they were using a grammar to constrain the space of solutions. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=1096" target="_blank" rel="noreferrer noopener">18:16</a>: Can you give us a sense for the size of the community working on these things? Is it mostly academic? Are there startups? Are there research grants?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=1132" target="_blank" rel="noreferrer noopener">18:52</a>: The first community when I started was <a href="http://ai4k12.org" target="_blank" rel="noreferrer noopener">AI for K12</a>. There’s an active community of researchers and educators. It was supported by NSF. It’s pretty diverse, with people from all over the world. And there’s also a Learning and Tools community focusing on math learning. Renaissance Philanthropy also funds a lot of initiatives.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=1218" target="_blank" rel="noreferrer noopener">20:18</a>: What about Khan Academy?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=1220" target="_blank" rel="noreferrer noopener">20:20</a>: Khan Academy is a great example. They wanted to Khanmigo to be about intrinsic motivation and understanding positive encouragement for the kids. But what I discovered was that the math was wrong—the early LLMs had problems with math. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=1348" target="_blank" rel="noreferrer noopener">22:28</a>: Let’s say a month from now a foundation model gets really good at advanced math. How long until we can distill a small model so that you benefit on the phone?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=1384" target="_blank" rel="noreferrer noopener">23:04</a>: There was a project, Minerva, that was an LLM specifically for math. A really good model that is always correct at math is not going to be a Transformer under the hood. It will be a Transformer together with tool use and an automatic theorem prover. We need to have a piece of the system that’s verifiable. How quickly can we make it work on a phone? That’s doable right now. There are open source systems like Unsloth that distills a model as soon as it’s available. Also the APIs are becoming more affordable. We can build those tools right now and make them run on edge devices. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=1505" target="_blank" rel="noreferrer noopener">25:05</a>: Human in the loop for education means parents in the loop. What extra steps do you have to do to be comfortable that whatever you build is ready to be deployed and be scrutinized by parents.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=1534" target="_blank" rel="noreferrer noopener">25:34</a>: The most common question I get is “What should I do with my child?” I get this question so often that I sat down and wrote a long handbook for parents. During the pandemic, I worked with the same community of families for two-and-a-half years. I saw how the parents were mediating the use of AI in the house. They learned through games how machine learning systems worked, about bias. There’s a lot of work to be done for families. Parents are overwhelmed. There’s a constant feel of not wanting your child to be left behind but also not wanting them on devices all the time. It’s important to make a plan to have conversations about how they are using AI, how they think about AI, coming from a place of curiosity. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=1692" target="_blank" rel="noreferrer noopener">28:12</a>: We talked about implementing the Socratic method. One of the things people are talking about is multi-agents. At some point, some kid will be using a tool that orchestrates a bunch of agents. What kinds of innovations in UX are you seeing that will prepare us for this world?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=1733" target="_blank" rel="noreferrer noopener">28:53</a>: The multi-agent part is interesting. When I was doing this study on the Scratch copilot, we had a design session at the end with the kids. This theme of agents and multiple agents emerged. Many of them wanted that, and wanted to run simulations. We talked about the Scratch community because it’s social learning, so I asked them what happens if some of the games are done by agents. Would you like to know that? It’s something they want, and something they want to be transparent about. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=1841" target="_blank" rel="noreferrer noopener">30:41</a>: A hybrid online community that includes kids and agents isn’t science fiction. The technology already exists. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=1854" target="_blank" rel="noreferrer noopener">30:54</a>: I’m collaborating with the folks who created a technology called <a href="https://www.morph.so/blog/infinibranch/" target="_blank" rel="noreferrer noopener">Infinibranch</a> that lets you create a lot of virtual environments where you can test agents and see agents in action. We’re clearly going to have agents that can take actions. I told them what kids wanted, and they said, “Let’s make it happen.” It’s definitely going to be an area of simulations and tools for thought. I think it’s one of the most exciting areas. You can run 10 experiments at once, or 100. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=1943" target="_blank" rel="noreferrer noopener">32:23</a>: In the enterprise, a lot of enterprise people get ahead of themselves. Let’s get one agent working well first. A lot of the vendors are getting ahead of themselves.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_with_Stefania_Druga.mp3#t=1969" target="_blank" rel="noreferrer noopener">32:49</a>: Absolutely. It’s one thing to do a demo; it’s another thing to get it to work reliably.</li>
</ul>

]]></content:encoded>
<wfw:commentRss>https://www.oreilly.com/radar/podcast/generative-ai-in-the-real-world-stefania-druga-on-designing-for-the-next-generation/feed/</wfw:commentRss>
<slash:comments>0</slash:comments>
</item>
<item>
<title>“More Slowly”</title>
<link>https://www.oreilly.com/radar/more-slowly/</link>
<pubDate>Wed, 25 Jun 2025 15:26:30 +0000</pubDate>
<dc:creator><![CDATA[Tim O’Reilly]]></dc:creator>
<category><![CDATA[Artificial Intelligence]]></category>
<category><![CDATA[Commentary]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?p=16943</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2020/02/na-polygons-1a-1400x950-1.jpg"
medium="image"
type="image/jpeg"
/>
<custom:subtitle><![CDATA[Human Deep Learning in the Face of AI]]></custom:subtitle>
<description><![CDATA[My friend David Eaves has the best tagline for his blog: “if writing is a muscle, this is my gym.” So I asked him if I could adapt it for my new biweekly (and occasionally weekly) hour-long video show on oreilly.com, Live with Tim O’Reilly. In it, I interview people who know way more than […]]]></description>
<content:encoded><![CDATA[
My friend David Eaves has the best tagline for <a href="http://eaves.ca" target="_blank" rel="noreferrer noopener">his blog</a>: “if writing is a muscle, this is my gym.” So I asked him if I could adapt it for my new biweekly (and occasionally weekly) hour-long video show on oreilly.com, <a href="https://www.oreilly.com/products/new-live-online-sessions.html" target="_blank" rel="noreferrer noopener">Live with Tim O’Reilly</a>. In it, I interview people who know way more than me, and ask them to teach me what they know. It’s a mental workout, not just for me but for our participants, who also get to ask questions as the hour progresses. Learning is a muscle. Live with Tim O’Reilly is my gym, and my guests are my personal trainers. This is how I have learned throughout my career—having exploratory conversations with people is a big part of my daily work—but in this show, I’m doing it in public, sharing my learning conversations with a live audience.
<a href="https://learning.oreilly.com/live-events/building-secure-code-in-the-age-of-vibe-coding-steve-wilson-live-with-tim-oreilly/0642572189716/" target="_blank" rel="noreferrer noopener">My first guest, on June 3, was Steve Wilson</a>, the author of one of my favorite recent O’Reilly books, <a href="https://learning.oreilly.com/library/view/the-developers-playbook/9781098162191/" target="_blank" rel="noreferrer noopener">The Developer’s Playbook for Large Language Model Security</a>. Steve’s day job is at cybersecurity firm Exabeam, where he’s the chief AI and product officer. He also founded and cochairs the Open Worldwide Application Security Project (OWASP) Foundation’s Gen AI Security Project.
During my prep call with Steve, I was immediately reminded of a passage in Alain de Botton’s marvelous book <a href="https://www.alaindebotton.com/literature/" target="_blank" rel="noreferrer noopener">How Proust Can Change Your Life</a>, which reconceives Proust as a self-help author. Proust is lying in his sickbed, as he was wont to do, receiving a visitor who is telling him about his trip to come see him in Paris. Proust keeps making him go back in the story, saying, “More slowly,” till the friend is sharing every detail about his trip, down to the old man he saw feeding pigeons on the steps of the train station.
Why am I telling you this? Steve said something about AI security that I understood in a superficial way but didn’t truly understand deeply. So I laughed and told Steve the story about Proust, and whenever he went by something too quickly for me, I’d say, “More slowly,” and he knew just what I meant.
This captures something I want to make part of the essence of this show. There are a lot of podcasts and interview shows that stay at a high conceptual level. In Live with Tim O’Reilly, my goal is to get really smart people to go a bit more slowly, explaining what they mean in a way that helps all of us go a bit deeper by telling vivid stories and providing immediately useful takeaways.
This seems especially important in the age of AI-enabled coding, which allows us to do so much so fast that we may be building on a shaky foundation, which may come back to bite us because of what we only thought we understood. As my friend <a href="https://dev.ecoguineafoundation.com/in-memoriam.html" target="_blank" rel="noreferrer noopener">Andrew Singer</a> taught me 40 years ago, “The skill of debugging is to figure out what you really told your program to do rather than what you thought you told it to do.” That is even more true today in the world of AI evals.
“More slowly” is also something personal trainers remind people of all the time as they rush through their reps. Increasing time under tension is a proven way to build muscle. So I’m not entirely mixing my metaphors here. <img src="https://s.w.org/images/core/emoji/15.1.0/72x72/1f609.png" alt="😉" class="wp-smiley" style="height: 1em; max-height: 1em;" />
In my interview with Steve, I started out by asking him to tell us about some of the top security issues developers face when coding with AI, especially when vibe coding. Steve tossed off that being careful with your API keys was at the top of the list. I said, “More slowly,” and here’s what he told me:
<iframe loading="lazy" width="560" height="315" src="https://www.youtube.com/embed/AUAeRnwv7Fw?si=2wvy39Xg-_7JNDGg" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe>
As you can see, having him unpack what he meant by “be careful” led to a Proustian tour through the details of the risks and mistakes that underlie that brief bit of advice, from the bots that scour GitHub for keys accidentally left exposed in code repositories (or even the histories, when they’ve been expunged from the current repository) to a humorous story of a young vibe coder complaining about how people were draining his AWS account—after displaying his keys in a live coding session on Twitch. As Steve exclaimed: “They are secrets. They are meant to be secret!”
Steve also gave some eye-opening warnings about the <a href="https://youtu.be/HA-fbyyph6E" target="_blank" rel="noreferrer noopener">security risks of hallucinated packages</a> (you imagine, “the package doesn’t exist, no big deal,” but it turns out that malicious programmers have figured out commonly hallucinated package names and made compromised packages to match!); some spicy observations on <a href="https://youtu.be/fwVVq5mC1p4" target="_blank" rel="noreferrer noopener">the relative security strengths and weaknesses of various major AI players</a>; and why <a href="https://youtu.be/mVX67oHBVq4" target="_blank" rel="noreferrer noopener">running AI models locally in your own data center isn’t any more secure</a>, unless you do it right. He also talked a bit about <a href="https://www.youtube.com/watch?v=AW0YhTsuKoQ" target="_blank" rel="noreferrer noopener">his role as chief AI and product officer at information security company Exabeam</a>. You can <a href="https://learning.oreilly.com/videos/building-secure-code/0642572018926/" target="_blank" rel="noreferrer noopener">watch the complete conversation here</a>.
<a href="https://learning.oreilly.com/live-events/chelsea-troy-live-with-tim-oreilly/0642572203368/" target="_blank" rel="noreferrer noopener">My second guest, Chelsea Troy</a>, whom I spoke with on June 18, is by nature totally aligned with the “more slowly” idea—in fact, it may be that <a href="https://learning.oreilly.com/videos/coding-with-ai/0642572017171/0642572017171-video386935/" target="_blank" rel="noreferrer noopener">her “not so fast” takes</a> on several much-hyped computer science papers at the recent O’Reilly AI Codecon planted that notion. During our conversation, her comments about <a href="https://youtu.be/ouMKcv07QC8" target="_blank" rel="noreferrer noopener">the three essential skills still required of a software engineer</a> working with AI, why <a href="https://youtu.be/0RM2QCQ16M0" target="_blank" rel="noreferrer noopener">best practice is not necessarily a good reason to do something</a>, and <a href="https://youtu.be/gx3r4wIwh_w" target="_blank" rel="noreferrer noopener">how much software developers need to understand about LLMs under the hood</a> are all pure gold. You can <a href="https://learning.oreilly.com/videos/ai-and-developer/0642572020332/0642572020332-video390079/" target="_blank" rel="noreferrer noopener">watch our full talk here</a>.
One of the things that I did a little differently in this second interview was to take advantage of the O’Reilly learning platform’s live training capabilities to bring in audience questions early in the conversation, mixing them in with my own interview rather than leaving them for the end. It worked out really well. Chelsea herself talked about her experience teaching with the O’Reilly platform, and how much she learns from the attendee questions. I completely agree.
<iframe loading="lazy" width="560" height="315" src="https://www.youtube.com/embed/SrxF4ZOQkNM?si=GjaduYh2xrGd0n4C" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe>
Additional guests coming up include <a href="https://en.wikipedia.org/wiki/Matthew_Prince" target="_blank" rel="noreferrer noopener">Matthew Prince</a> of Cloudflare (July 14), who will unpack for us Cloudflare’s <a href="https://www.justthink.ai/blog/cloudflare-the-secret-weapon-for-building-ai-agents" target="_blank" rel="noreferrer noopener">surprisingly pervasive role in the infrastructure of AI</a> as delivered, as well as his fears about <a href="https://searchengineland.com/ai-killing-web-business-model-455157" target="_blank" rel="noreferrer noopener">AI leading to the death of the web as we know it</a>—and what content developers can do about it (<a href="https://www.oreilly.com/live/live-with-tim-oreilly-a-conversation-with-matthew-prince.html" target="_blank" rel="noreferrer noopener">register here</a>); <a href="https://en.wikipedia.org/wiki/Marily_Nika" target="_blank" rel="noreferrer noopener">Marily Nika</a> (July 28), the author of <a href="https://www.oreilly.com/library/view/building-ai-powered-products/9781098152697/" target="_blank" rel="noreferrer noopener">Building AI-Powered Products</a>, who will teach us about product management for AI (<a href="https://www.oreilly.com/live/live-with-tim-oreilly-a-conversation-with-marily-nika.html" target="_blank" rel="noreferrer noopener">register here</a>); and <a href="https://en.wikipedia.org/wiki/Arvind_Narayanan" target="_blank" rel="noreferrer noopener">Arvind Narayanan</a> (August 12), coauthor of the book <a href="https://press.princeton.edu/books/hardcover/9780691249131/ai-snake-oil?srsltid=AfmBOoomtix-VDWW39hvK48jv7_TUrWdKrAspCXVGzrAoMjSYfybAz7X" target="_blank" rel="noreferrer noopener">AI Snake Oil</a>, who will talk with us about his paper “<a href="https://knightcolumbia.org/content/ai-as-normal-technology" target="_blank" rel="noreferrer noopener">AI as Normal Technology</a>” and what that means for the prospects of employment in an AI future.
We’ll be publishing a fuller schedule soon. We’re going a bit light over the summer, but we will likely slot in more sessions in response to breaking topics.
]]></content:encoded>
</item>
<item>
<title>CTO Hour Recap: Deliberate Engineering Strategy with Will Larson</title>
<link>https://www.oreilly.com/radar/cto-hour-recap-deliberate-engineering-strategy-with-will-larson/</link>
<pubDate>Tue, 24 Jun 2025 10:00:25 +0000</pubDate>
<dc:creator><![CDATA[David Michelson]]></dc:creator>
<category><![CDATA[Business]]></category>
<category><![CDATA[Commentary]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?p=16926</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2019/10/ai-ml-crystals-11b-1400x950.jpg"
medium="image"
type="image/jpeg"
/>
<description><![CDATA[On some level, every engineering leader knows that strategy matters. And yet many teams remain stuck in reactive cycles, lurching from crisis to crisis, untethered from clear direction. This disconnect between recognizing the importance of strategy and actually practicing strategy well was at the heart of O’Reilly’s June 23rd CTO Hour, where host Peter Bell […]]]></description>
<content:encoded><![CDATA[
On some level, every engineering leader knows that strategy matters. And yet many teams remain stuck in reactive cycles, lurching from crisis to crisis, untethered from clear direction. This disconnect between recognizing the importance of strategy and actually practicing strategy well was at the heart of O’Reilly’s June 23rd CTO Hour, where host Peter Bell sat down with renowned engineering leader and best-selling author Will Larson. Together, they explored how deliberate, structured decision-making can transform engineering teams from reactive problem-solvers into teams that build with intention.
Larson was clear from the start that strategy isn’t some lofty abstraction. Strategy is simply the act of making decisions in a visible, accountable, improveable, and repeatable way. When strategy is explicit, it gives teams the context they need to align, disagree productively, and improve over time. It helps organizations avoid letting critical decisions remain hidden or ad hoc and instead clarifies priorities, trade-offs, and goals. Such clarity not only improves outcomes but also helps new hires and teams to understand why things are done the way they are.
For those asserting that their org lacks a strategy, Larson was firm: They do have one—it’s just undocumented, implicit, or scattered across conversations with long-tenured leaders. The real challenge for engineering leaders is making that strategy visible, legible, and actionable across the organization.
To help with practical strategy work, Larson shared examples of several tools he has used to build and test strategy. The most accessible was strategy testing: Rather than forcing compliance, leaders should investigate and carefully test why people aren’t adopting a new approach. Noncompliance is often a diagnostic of a faulty strategy, not a direct defiance. He also shared how he’s used systems modeling and Wardley Mapping to plan complex migrations and organizational changes—from scaling infrastructure at Uber to planning around AI and data strategy at Calm and Imprint.
One of the key takeaways from the event was that strategic thinking isn’t just for C-suite executives. It’s also an essential skill for directors and senior leaders looking to make a meaningful impact. However, for these leaders to engage productively in strategy work, there must be strategic clarity at the top. Without it, what’s possible and how others might contribute is unclear. The frameworks and tools shared in this session provided concrete starting points for leaders at all levels who are ready to stop waiting for strategy and start creating it.
For those who want to go deeper into crafting and implementing engineering strategy, Will Larson’s next book with O’Reilly focuses on engineering strategy, complete with case studies and tools. Read the first two chapters of <a href="https://learning.oreilly.com/library/view/crafting-engineering-strategy/9798341645516/" target="_blank" rel="noreferrer noopener">Crafting Engineering Strategy</a>, now in early release on O’Reilly.
And mark your calendars for our next leadership event: <a href="https://learning.oreilly.com/live-events/tech-leadership-tuesday-systems-thinking-essentials-with-diana-montalion-and-lena-reinhard/0642572203870/" target="_blank" rel="noreferrer noopener">Tech Leadership Tuesday: Systems Thinking Essentials with Diana Montalion and Lena Reinhard</a>, where we’ll explore how systems thinking can help leaders better understand and navigate organizational complexity.
]]></content:encoded>
</item>
<item>
<title>Coding for the Future Agentic World</title>
<link>https://www.oreilly.com/radar/coding-for-the-future-agentic-world/</link>
<pubDate>Wed, 18 Jun 2025 18:56:51 +0000</pubDate>
<dc:creator><![CDATA[Tim O’Reilly]]></dc:creator>
<category><![CDATA[AI & ML]]></category>
<category><![CDATA[Artificial Intelligence]]></category>
<category><![CDATA[Research]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?p=16877</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/06/Firefly_abstract-for-coding-for-the-future-agentic-world-158061.jpg"
medium="image"
type="image/jpeg"
/>
<description><![CDATA[May 8 AI Codecon was a huge success. We had amazing speakers and content. We also had over 9,000 live attendees and another 12,000 who signed up to be able to view the content later on the O’Reilly learning platform. (Here’s a post with video excerpts and some of my takeaways.) So we’re doing it […]]]></description>
<content:encoded><![CDATA[
May 8 <a href="https://www.oreilly.com/CodingwithAI/" target="_blank" rel="noreferrer noopener">AI Codecon</a> was a huge success. We had amazing speakers and content. We also had over 9,000 live attendees and another 12,000 who signed up to be able to view the content later on the <a href="http://oreilly.com" target="_blank" rel="noreferrer noopener">O’Reilly learning platform</a>. (Here’s a post with <a href="https://www.oreilly.com/radar/takeaways-from-coding-with-ai/" target="_blank" rel="noreferrer noopener">video excerpts and some of my takeaways</a>.)
So we’re doing it again. The <a href="https://www.oreilly.com/AgenticWorld/" target="_blank" rel="noreferrer noopener">next AI Codecon</a> is scheduled for September 9. Our focus this time is going to be on agentic coding. Now I know that Simon Willison and others have derided “agentic” as a marketing buzzword, and that no one can even agree on what it means (Simon <a href="https://x.com/simonw/status/1843290729260703801" target="_blank" rel="noreferrer noopener">has collected dozens of competing definitions</a>), but whatever the term comes to mean to most people, the reality is something we all have to come to grips with. We now have LLMs with specialized system prompts, using tools, chained together in pipelines or running in parallel, running in loops, and modifying their environments. That seems like a pretty good starting point for a working definition.
In the September 9 AI Codecon, we’ll be concentrating on four critical frontiers of agentic development:
<ul class="wp-block-list">
<li>Agentic interfaces: Moving beyond chat UX to sophisticated agent interactions. New paradigms don’t just require new infrastructure; they also enable new interfaces. We’re looking to highlight innovations in AI interfaces, especially as agentic AI applications extend far beyond simple chat.</li>
<li>Tool-to-tool workflows: How agents chain across environments to complete complex tasks. As an old Unix/Linux head, I love the idea of pipelines (and now networks) of small cooperating programs. We are now reinventing that kind of network-enabled approach to applications for AI.</li>
<li>Background coding agents: Asynchronous, autonomous code generation in production. When AI tasks start running in the background, expect either magic or mayhem. We’d prefer the former, and want to show the cutting edge of how to build safer, more reliable agents.</li>
<li>MCP and agent protocols: The infrastructure enabling the agentic web. While the first generation of AI applications have been centralized monoliths, we’re convinced that the agentic future is one of cooperating AIs, interoperating not only with applications designed for humans but also with AI-native endpoints designed to be consumed by AI agents. MCP is a great start, but it’s far from the end of protocols for agent-to-agent communication. (See my post “<a href="https://asimovaddendum.substack.com/p/disclosures-i-do-not-think-that-word" target="_blank" rel="noreferrer noopener">Disclosures. I Do Not Think That Word Means What You Think It Means.</a>” for an account of how communication protocols enable participatory markets. I’m super excited about the way that AI is creating new opportunities for developers and entrepreneurs that are not capital-intensive, winner-takes-all races like the initial race for chatbot supremacy. Those opportunities come from the network protocols that enable cooperating AIs.)</li>
</ul>
The primary conference track will be arranged much like the May event: a curated collection of fireside chats with senior technical executives, brilliant engineers, and entrepreneurs; practical talks on the new tools, workflows, and hacks that are shaping the emerging discipline of agentic AI; demos of how experienced developers are using the new tools to supercharge their productivity, their innovative applications, and user interfaces; and lightning talks that come in from our call for proposals (see below). We’ll also have a suite of in-depth tutorials on separate days so that you can go deeper if you want. You can sign up <a href="https://www.oreilly.com/test/AgenticWorld/index.csp" target="_blank" rel="noreferrer noopener">here</a>. The mainstage event is free. Tutorials are available to <a href="http://oreilly.com" target="_blank" rel="noreferrer noopener">O’Reilly subscribers</a> and can also be <a href="https://www.oreilly.com/live/" target="_blank" rel="noreferrer noopener">purchased à la carte</a>—trial memberships will also get you in the door. <img src="https://s.w.org/images/core/emoji/15.1.0/72x72/1f642.png" alt="🙂" class="wp-smiley" style="height: 1em; max-height: 1em;" /> The separate demo showcase will be sponsored (and thus free).
<h3 class="wp-block-heading">Call for proposals</h3>
Do you have a story to share about how you are using agents to build innovative and effective AI-powered experiences? <a href="https://www.oreilly.com/AgenticWorld/cfp.html" target="_blank" rel="noreferrer noopener">We want to hear it</a>. The AI Codecon program will be a mix of invited talks and your proposals, so we’re asking you to submit your idea for a quick five-minute lightning talk about cutting-edge work. We aren’t looking for high-level discussions; we want war stories, demos of products and workflows, innovative applications, and accounts of new tools that have changed how you work. You (collectively) are inventing the future at a furious rate. We’d love to hear about work that will make people say “wow!” and rush to learn more. Your goal should be to make the emerging future happen faster by sharing what you’ve learned.
After reading your proposal, we may ask you to present it as proposed, modify it, expand it into a longer talk, join a discussion panel, or appear in our associated demo day program.
Topics we’re interested in include:
<ul class="wp-block-list">
<li>UI/UX—How are you or your company exploring agentic interfaces and moving beyond chatbot UX?</li>
<li>How you’re using agents today—Are you handing off entire tasks to agents? What tasks are you handing off, and how are you integrating the agents’ work into the SDLC?</li>
<li>Your tool-to-tool workflows—How are you chaining agents across environments and services to complete tasks end-to-end?</li>
<li>Background coding agents—What’s emerging from more asynchronous and autonomous code generation, and where is this headed?</li>
<li>MCP and the future of the web—How are agentic protocols unlocking innovative workflows and experiences?</li>
<li>Surprise us. With the market moving this fast, you may be doing something amazing that doesn’t fit the program as we’re envisioning it today but that the AI coding community needs to hear about. Don’t be afraid to color outside the lines.</li>
</ul>
We’re also still interested in hearing about topics we explored in our previous call for proposals:
<ul class="wp-block-list">
<li>What has changed about how you code, what you work on, and the tools you use?</li>
<li>Are we working toward a new development stack? How are your architectures and patterns changing as you move toward AI-native applications?</li>
<li>How is AI changing the makeup and workload of your dev teams? </li>
<li>What have you done to maintain quality standards with AI-generated code?</li>
<li>What types of tasks are you taking on that were previously too time-consuming to accomplish?</li>
<li>What problems have you encountered that you wish others had told you about when you were starting out on your journey?</li>
<li>What kinds of fun projects are you taking on in your free time?</li>
</ul>
So <a href="https://www.oreilly.com/AgenticWorld/cfp.html" target="_blank" rel="noreferrer noopener">submit your proposal for a talk</a> by July 18. And if you have a product you’d like to demo as part of our sponsored demo showcase, please let us know at <a href="mailto:AI-Engineering@oreilly.com" target="_blank" rel="noreferrer noopener">AI-Engineering@oreilly.com</a>.
Thanks!
]]></content:encoded>
</item>
<item>
<title>Designing Collaborative Multi-Agent Systems with the A2A Protocol</title>
<link>https://www.oreilly.com/radar/designing-collaborative-multi-agent-systems-with-the-a2a-protocol/</link>
<pubDate>Wed, 18 Jun 2025 09:52:07 +0000</pubDate>
<dc:creator><![CDATA[Heiko Hotz and Sokratis Kartakis]]></dc:creator>
<category><![CDATA[AI & ML]]></category>
<category><![CDATA[Artificial Intelligence]]></category>
<category><![CDATA[Deep Dive]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?p=16851</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2019/06/housing-2789569_1920_crop-2f8244426b912fdeeb26018d559f7100-1.jpg"
medium="image"
type="image/jpeg"
/>
<description><![CDATA[It feels like every other AI announcement lately mentions “agents.” And already, the AI community has 2025 pegged as “the year of AI agents,” sometimes without much more detail than “They’ll be amazing!” Often forgotten in this hype are the fundamentals. Everybody is dreaming of armies of agents, booking hotels and flights, researching complex topics, […]]]></description>
<content:encoded><![CDATA[
It feels like every other AI announcement lately mentions “agents.” And already, the AI community has 2025 pegged as “the year of AI agents,” sometimes without much more detail than “They’ll be amazing!” Often forgotten in this hype are the fundamentals. Everybody is dreaming of armies of agents, booking hotels and flights, researching complex topics, and writing PhD theses for us. And yet we see little substance that addresses a critical engineering challenge of these ambitious systems: How do these independent agents, built by different teams using different tech, often with completely opaque inner workings, actually collaborate?
But enterprises aren’t often fooled by these hype cycles and promises. Instead, they tend to cut through the noise and ask the hard questions: If every company spins up its own clever agent for accounting, another for logistics, a third for customer service, and you have your own personal assistant agent trying to wrangle them all—how do they coordinate? How does the accounting agent securely pass info to the logistics agent without a human manually copying data between dashboards? How does your assistant delegate booking a flight without needing to know the specific, proprietary, and likely undocumented inner workings of one particular travel agent?
Right now, the answer is often “they don’t” or “with a whole lot of custom, brittle, painful integration code.” It’s becoming a digital Tower of Babel: Agents get stuck in their own silos, unable to talk to each other. And without that collaboration, they can’t deliver on their promise of tackling complex, real-world tasks together.
The <a href="https://google.github.io/A2A/" target="_blank" rel="noreferrer noopener">Agent2Agent (A2A) Protocol</a> attempts to address these pressing questions. Its goal is to provide that missing common language, a set of rules for how different agents and AI systems can interact without needing to lay open their internal secrets or get caught in custom-built, one-off integrations.
<figure class="wp-block-image"><img decoding="async" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXejXCJLaghN3NoniL1u1abo-FDK-jzgFZpvo3TT8zF3kYtOeQ_path7D56J2TfSnu9CRlpqth7rCORmPfhl7ZpWw1ko_MwBP3hXVQjKI1Gnek9PROvkwIGmIzrr6YiUGSNUIy6Nlw?key=SASf_ayzfENgmrBq5LRQxg" alt=""/><figcaption class="wp-element-caption"><a href="https://commons.wikimedia.org/wiki/File:Hendrick_van_Cleve_III_(Attr.)_-_The_Tower_of_Babel.jpeg" target="_blank" rel="noreferrer noopener">Hendrick van Cleve III (Attr.) – The Tower of Babel</a> (public domain) </figcaption></figure>
In this article, we’ll dive into the details of A2A. We’ll look at:
<ul class="wp-block-list">
<li>The core ideas behind it: What underlying principles is it built on?</li>
<li>How it actually works: What are the key mechanisms?</li>
<li>Where it fits in the broader landscape, in particular, how it compares to and potentially complements the Model Context Protocol (MCP), which tackles the related (but different) problem of agents using tools.</li>
<li>What we think comes next in the area of multi-agent system design.</li>
</ul>
<h2 class="wp-block-heading">A2A Protocol Overview</h2>
At its core, the A2A protocol is an effort to establish a way for AI agents to communicate and collaborate. Its aim is to provide a standard framework allowing agents to:
<ul class="wp-block-list">
<li>Discover capabilities: Identify other available agents and understand their functions.</li>
<li>Negotiate interaction: Determine the appropriate modality for exchanging information for a specific task—simple text, structured forms, perhaps even bidirectional multimedia streams.</li>
<li>Collaborate securely: Execute tasks cooperatively, passing instructions and data reliably and safely.</li>
</ul>
But just listing goals like “discovery” and “collaboration” on paper is easy. We’ve seen plenty of ambitious tech standards stumble because they didn’t grapple with the messy realities early on (<a href="https://en.wikipedia.org/wiki/OSI_model">OSI network model</a>, anyone?). When we’re trying to get countless different systems, built by different teams, to actually cooperate without creating chaos, we need more than a wishlist. We need some firm guiding principles baked in from the start. These reflect the hard-won lessons about what it takes to make complex systems actually work: How do we handle and make trade-offs when it comes to security, robustness, and practical usage?
With that in mind, A2A was built with these tenets:
<ul class="wp-block-list">
<li>Simple: Instead of reinventing the wheel, A2A leverages well-established and widely understood existing standards. This lowers the barrier to adoption and integration, allowing developers to build upon familiar technologies.</li>
<li>Enterprise ready: A2A includes robust mechanisms for authentication (verifying agent identities), security (protecting data in transit and at rest), privacy (ensuring sensitive information is handled appropriately), tracing (logging interactions for auditability), and monitoring (observing the health and performance of agent communications).</li>
<li>Async first: A2A is designed with asynchronous communication as a primary consideration, allowing tasks to proceed over extended periods and seamlessly integrate human-in-the-loop workflows.</li>
<li>Modality agnostic: A2A supports interactions across various modalities, including text, bidirectional audio/video streams, interactive forms, and even embedded iframes for richer user experiences. This flexibility allows agents to communicate and present information in the most appropriate format for the task and user.</li>
<li>Opaque execution: This is a cornerstone of A2A. Each agent participating in a collaboration remains invisible to the others. They don’t need to reveal their internal reasoning processes, their knowledge representation, memory, or the specific tools they might be using. Collaboration occurs through well-defined interfaces and message exchanges, preserving the autonomy and intellectual property of each agent. Note that, while agents operate this way by default (without revealing their specific implementation, tools, or way of thinking), an individual remote agent can choose to selectively reveal aspects of its state or reasoning process via messages, especially for UX purposes, such as providing user notifications to the caller agent. As long as the decision to reveal information is the responsibility of the remote agent, the interaction maintains its opaque nature.</li>
</ul>
Taken together, these tenets paint a picture of a protocol trying to be practical, secure, flexible, and respectful of the independent nature of agents. But principles on paper are one thing; how does A2A actually implement these ideas? To see that, we need to shift from the design philosophy to the nuts and bolts—the specific mechanisms and components that make agent-to-agent communication work.
<h2 class="wp-block-heading">Key Mechanisms and Components of A2A</h2>
Translating these principles into practice requires specific mechanisms. Central to enabling agents to understand each other within the A2A framework is the Agent Card. This component functions as a standardized digital business card for an AI agent, typically provided as a metadata file. Its primary purpose is to publicly declare what an agent is, what it can do, where it can be reached, and how to interact with it.
Here’s a simplified example of what an Agent Card might look like, conveying the essential information:
<pre class="wp-block-code" style="font-style:normal;font-weight:300;text-transform:none"><code>{
"name": "StockInfoAgent",
"description": "Provides current stock price information.",
"url": "http://stock-info.example.com/a2a",
"provider": { "organization": "ABCorp" },
"version": "1.0.0",
"skills": [
{
"id": "get_stock_price_skill",
"name": "Get Stock Price",
"description": "Retrieves current stock price for a company"
}
]
}
(shortened for brevity)</code></pre>
The Agent Card serves as the key connector between the different actors in the A2A protocol. A client—which could be another agent or perhaps the application the user is interacting with—finds the Agent Card for the service it needs. It uses the details from the card, like the URL, to contact the remote agent (server), which then performs the requested task without exposing its internal methods and sends back the results according to the A2A rules.
Once agents are able to read each other’s capabilities, A2A structures their collaboration around completing specific tasks. A task represents the fundamental unit of work requested by a client from a remote agent. Importantly, each task is stateful, allowing it to track progress over time, which is essential for handling operations that might not be instantaneous—aligning with A2A’s “async first” principle.
Communication related to a task primarily uses messages. These carry the ongoing dialogue, including initial instructions from the client, status updates, requests for clarification, or even intermediate “thoughts” from the agent. When the task is complete, the final tangible outputs are delivered as artifacts, which are immutable results like files or structured data. Both messages and artifacts are composed of one or more parts, the granular pieces of content, each with a defined type (like text or an image).
This entire exchange relies on standard web technologies like HTTP and common data formats, ensuring a broad foundation for implementation and compatibility. By defining these core objects—task, message, artifact, and part—A2A provides a structured way for agents to manage requests, exchange information, and deliver results, whether the work takes seconds or hours.
Security is, of course, a critical concern for any protocol aiming for enterprise adoption, and A2A addresses this directly. Rather than inventing entirely new security mechanisms, it leans heavily on established practices. A2A aligns with standards like the <a href="https://swagger.io/docs/specification/v3_0/authentication/" target="_blank" rel="noreferrer noopener">OpenAPI specification</a> for defining authentication methods and generally encourages treating agents like other secure enterprise applications. This allows the protocol to integrate into existing corporate security frameworks, such as established identity and access management (IAM) systems for authenticating agents, applying existing network security rules and firewall policies to A2A endpoints, or potentially feeding A2A interaction logs into centralized security information and event management (SIEM) platforms for monitoring and auditing.
A core principle is keeping sensitive credentials, such as API keys or access tokens, separate from the main A2A message content. Clients are expected to obtain these credentials through an independent process. Once obtained, they are transmitted securely using standard HTTP headers, a common practice in web APIs. Remote agents, in turn, clearly state their authentication requirements—often within their Agent Cards—and use standard HTTP response codes to manage access attempts, signaling success or failure in a predictable way. This reliance on familiar web security patterns lowers the barrier to implementing secure agent interactions.
A2A also facilitates the creation of a distributed “interaction memory” across a multi-agent system by providing a standardized protocol for agents to exchange and reference task-specific information, including unique identifiers (taskId, sessionId), status updates, message histories, and artifacts. While A2A itself doesn’t store this memory, it enables each participating A2A client and server agent to maintain its portion of the overall task context. Collectively, these individual agent memories, linked and synchronized through A2A’s structured communication, form the comprehensive interaction memory of the entire multi-agent system, allowing for coherent and stateful collaboration on complex tasks.
So, in a nutshell, A2A is an attempt to bring rules and standardization to the rapidly evolving world of agents by defining how independent systems can discover each other, collaborate on tasks (even long-running ones), and handle security using well-trodden web paths, all while keeping their inner workings private. It’s focused squarely on agent-to-agent communication, trying to solve the problem of isolated digital workers unable to coordinate.
But getting agents to talk to each other is only one piece of the interoperability puzzle facing AI developers today. There’s another standard gaining significant traction that tackles a related yet distinct challenge: How do these sophisticated AI applications interact with the outside world—the databases, APIs, files, and specialized functions often referred to as “tools”? This brings us to Anthropic’s Model Context Protocol, or MCP.
<h2 class="wp-block-heading">MCP: Model Context Protocol Overview</h2>
It wasn’t so long ago, really, that large language models (LLMs), while impressive text generators, were often mocked for their sometimes hilarious blind spots. Asked to do simple arithmetic, count the letters in a word accurately, or tell you the current weather, and the results could be confidently delivered yet completely wrong. This wasn’t just a quirk; it highlighted a fundamental limitation: The models operated purely on the patterns learned from their static training data, disconnected from live information sources or the ability to execute reliable procedures. But these days are mostly over (or so it seems)—state-of-the-art AI models are vastly more effective than their predecessors from just a year or two ago.
A key reason for the effectiveness of AI systems (agents or not) is their ability to connect beyond their training data: interacting with databases and APIs, accessing local files, and employing specialized external tools. Similarly to interagent communication, however, there are some hard challenges that need to be tackled first.
Integrating these AI systems with external “tools” involves collaboration between AI developers, agent architects, tool providers, and others. A significant hurdle is that tool integration methods are often tied to specific LLM providers (like OpenAI, Anthropic, or Google), and these providers handle tool usage differently. Defining a tool for one system requires a specific format; using that same tool with another system often demands a different structure.
Consider the following examples.
OpenAI’s API expects a function definition structured this way:
<pre class="wp-block-code"><code>{
"type": "function",
"function": {
"name": "get_weather",
"description": "Retrieves weather data ...",
"parameters": {...}
}
}</code></pre>
Whereas Anthropic’s API uses a different layout:
<pre class="wp-block-code"><code>{
"name": "get_weather",
"description": "Retrieves weather data ...",
"input_schema": {...}
}</code></pre>
This incompatibility means tool providers must develop and maintain separate integrations for each AI model provider they want to support. If an agent built with Anthropic models needs certain tools, those tools must follow Anthropic’s format. If another developer wants to use the same tools with a different model provider, they essentially duplicate the integration effort, adapting definitions and logic for the new provider.
<figure class="wp-block-image"><img decoding="async" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfwoQ37zSdvpBiqFElEtBuWRig8a9d_15QhQhVRYnKKctDHoIzGRG5RizmL4jUN9R5JD_xua6xt51246wtnml8rwTz3RenVdYqmTCe8C6i9mtUw0YldeNqyZwW4SrmI4T3uAG4sUQ?key=SASf_ayzfENgmrBq5LRQxg" alt=""/></figure>
Format differences aren’t the only challenge; language barriers also create integration difficulties. For example, getting a Python-based agent to directly use a tool built around a Java library requires considerable development effort.
This integration challenge is precisely what the Model Context Protocol was designed to solve. It offers a standard way for different AI applications and external tools to interact.
Similar to A2A, MCP operates using two key parts, starting with the MCP server. This component is responsible for exposing the tool’s functionality. It contains the underlying logic—maybe Python code hitting a weather API or routines for data access—developed in a suitable language. Servers commonly bundle related capabilities, like file operations or database access tools. The second component is the MCP client. This piece sits inside the AI application (the chatbot, agent, or coding assistant). It finds and connects to MCP servers that are available. When the AI app or model needs something from the outside world, the client talks to the right server using the MCP standard.
The key is that communication between client and server adheres to the MCP standard. This adherence ensures that any MCP-compatible client can interact with any MCP server, no matter the client’s underlying AI model or the language used to build the server.
<figure class="wp-block-image"><img decoding="async" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXdwKp0UV4CusKJQ3w5GXN3u_WAg0bcrdDCu1Ljrg47P3WN5JCCRM2rM0xxoBBtuKJj6ghKsQ-QS3iNbtTjDhiDYqm_qTjjltmQdKHU9-ucW6K9z70zkYZCJT-x1Sa5IktVp1RAX?key=SASf_ayzfENgmrBq5LRQxg" alt=""/></figure>
Adopting this standard offers several advantages:
<ul class="wp-block-list">
<li>Build once, use anywhere: Create a capability as an MCP server once; any MCP-supporting application can use it.</li>
<li>Language flexibility: Develop servers in the language best suited for the task.</li>
<li>Leverage ecosystem: Use existing open source MCP servers instead of building every integration from scratch.</li>
<li>Enhance AI capabilities: Easily give agents, chatbots, and assistants access to diverse real-world tools.</li>
</ul>
Adoption of MCP is accelerating, demonstrated by providers such as GitHub and Slack, which now offer servers implementing the protocol.
<h2 class="wp-block-heading">MCP and A2A</h2>
But how do the Model Context Protocol and the Agent2Agent (A2A) Protocol relate? Do they solve the same problem or serve different functions? The lines can blur, especially since many agent frameworks allow treating one agent as a tool for another (agent as a tool).
Both protocols improve interoperability within AI systems, but they operate at different levels. By examining their differences in implementation and goals we can clearly identify key differentiators.
MCP focuses on standardizing the link between an AI application (or agent) and specific, well-defined external tools or capabilities. MCP uses precise, structured schemas (like JSON Schema) to define tools, establishing a clear API-like contract for predictable and efficient execution. For example, an agent needing the weather would use MCP to call a <code>get_weather</code> tool on an MCP weather server, specifying the location “London.” The required input and output are strictly defined by the server’s MCP schema. This approach removes ambiguity and solves the problem of incompatible tool definitions across LLM providers for that specific function call. MCP usually involves synchronous calls, supporting reliable and repeatable execution of functions (unless, of course, the weather in London has changed in the meantime, which is entirely plausible).
A2A, on the other hand, standardizes how autonomous agents communicate and collaborate. It excels at managing complex, multistep tasks involving coordination, discussion, and delegation. Rather than depending on rigid function schemas, A2A interactions utilize natural language, making the protocol better suited for ambiguous goals or tasks requiring interpretation. A good example would be “Summarize market trends for sustainable packaging.” Asynchronous communication is a key tenet of A2A, which also includes mechanisms to oversee the lifecycle of potentially lengthy tasks. This involves tracking status (like working, completed, and input required) and managing the necessary dialogue between agents. Consider a vacation planner agent using A2A to delegate <code>book_flights</code> and <code>reserve_hotel</code> tasks to specialized travel agents while monitoring their status. In essence, A2A’s focus is the orchestration of workflows and collaboration between agents.
This distinction highlights why MCP and A2A function as complementary technologies, not competitors. To borrow an analogy: MCP is like standardizing the wrench a mechanic uses—defining precisely how the tool engages with the bolt. A2A is like establishing a protocol for how that mechanic communicates with a specialist mechanic across the workshop (“Hearing a rattle from the front left, can you diagnose?”), initiating a dialogue and collaborative process.
In sophisticated AI systems, we can easily imagine them working together: A2A might orchestrate the overall workflow, managing delegation and communication between different agents, while those individual agents might use MCP under the hood to interact with specific databases, APIs, or other discrete tools needed to complete their part of the larger task.
<h2 class="wp-block-heading">Putting It All Together</h2>
We’ve discussed A2A for agent collaboration and MCP for tool interaction as separate concepts. But their real potential might lie in how they work together. Let’s walk through a simple, practical scenario to see how these two protocols could function in concert within a multi-agent system.
Imagine a user asks their primary interface agent—let’s call it the Host Agent—a straightforward question: “What’s Google’s stock price right now?”
The Host Agent, designed for user interaction and orchestrating tasks, doesn’t necessarily know how to fetch stock prices itself. However, it knows (perhaps by consulting an agent registry via an Agent Card) about a specialized Stock Info Agent that handles financial data. Using A2A, the Host Agent delegates the task: It sends an A2A message to the Stock Info Agent, essentially saying, “Request: Current stock price for GOOGL.”
The Stock Info Agent receives this A2A task. Now, this agent knows the specific procedure to get the data. It doesn’t need to discuss it further with the Host Agent; its job is to retrieve the price. To do this, it turns to its own toolset, specifically an MCP stock price server. Using MCP, the Stock Info Agent makes a precise, structured call to the server—effectively <code>get_stock_price(symbol: "GOOGL")</code>. This isn’t a collaborative dialogue like the A2A exchange; it’s a direct function call using the standardized MCP format.
The MCP server does its job: looks up the price and returns a structured response, maybe <code>{"price": "174.92 USD"}</code>, back to the Stock Info Agent via MCP.
With the data in hand, the Stock Info Agent completes its A2A task. It sends a final A2A message back to the Host Agent, reporting the result: <code>"Result: Google stock is 174.92 USD."</code>
Finally, the Host Agent takes this information received via A2A and presents it to the user.
<figure class="wp-block-image"><img decoding="async" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXeIsgPTuRAKZIKhzQKaA9oad9jvobfYA-NrDp65RnDQ28SfAqlIRf6ZifYfidUMaD1B-nyDolohMnHHs_0W6ZkpyIWPhxa22R_b2c9LysA9TEFK9DecZ3whHZEXcvuXFSaMGvHhtw?key=SASf_ayzfENgmrBq5LRQxg" alt=""/></figure>
Even in this simple example, the complementary roles become clear. A2A handles the higher-level coordination and delegation between autonomous agents (Host delegates to Stock Info). MCP handles the standardized, lower-level interaction between an agent and a specific tool (Stock Info uses the price server). This creates a separation of concerns: The Host agent doesn’t need to know about MCP or stock APIs, and the Stock Info agent doesn’t need to handle complex user interaction—it just fulfills A2A tasks, using MCP tools where necessary. Both agents remain largely opaque to each other, interacting only through the defined protocols. This modularity, enabled by using both A2A for collaboration and MCP for tool use, is key to building more complex, capable, and maintainable AI systems.
<h2 class="wp-block-heading">Conclusion and Future Work</h2>
We’ve outlined the challenges of making AI agents collaborate, explored Google’s A2A protocol as a potential standard for interagent communication, and compared and contrasted it with Anthropic’s Model Context Protocol. Standardizing tool use and agent interoperability are important steps forward in enabling effective and efficient multi-agent system (MAS) design.
But the story is far from over, and agent discoverability is one of the immediate next challenges that need to be tackled. When talking to enterprises it becomes glaringly obvious that this is often very high on their priority list. Because, while A2A defines how agents communicate once connected, the question of how they find each other in the first place remains a significant area for development. Simple approaches can be implemented—like publishing an Agent Card at a standard web address and capturing that address in a directory—but that feels insufficient for building a truly dynamic and scalable ecosystem. This is where we see the concept of curated agent registries come into focus, and it’s perhaps one of the most exciting areas of future work for MAS.
We imagine an internal “agent store” (akin to an app store) or professional listing for an organization’s AI agents. Developers could register their agents, complete with versioned skills and capabilities detailed in their Agent Cards. Clients needing a specific function could then query this registry, searching not just by name but by required skills, trust levels, or other vital attributes. Such a registry wouldn’t just simplify discovery; it would foster specialization, enable better governance, and make the whole system more transparent and manageable. It moves us from simply finding an agent to finding the right agent for the job based on its declared skills.
However, even sophisticated registries can only help us find agents based on those declared capabilities. Another fascinating, and perhaps more fundamental, challenge for the future: dealing with emergent capabilities. One of the remarkable aspects of modern agents is their ability to combine diverse tools in novel ways to tackle unforeseen problems. An agent equipped with various mapping, traffic, and event data tools, for instance, might have “route planning” listed on its Agent Card. But by creatively combining those tools, it might also be capable of generating complex disaster evacuation routes or highly personalized multistop itineraries—crucial capabilities likely unlisted simply because they weren’t explicitly predefined. How do we reconcile the need for predictable, discoverable skills with the powerful, adaptive problem-solving that makes agents so promising? Finding ways for agents to signal or for clients to discover these unlisted possibilities without sacrificing structure is a significant open question for the A2A community and the broader field (as highlighted in discussions like <a href="https://github.com/google/A2A/issues/109" target="_blank" rel="noreferrer noopener">this one</a>).
Addressing this challenge adds another layer of complexity when envisioning future MAS architectures. Looking down the road, especially within large organizations, we might see the registry idea evolve into something akin to the “data mesh” concept—multiple, potentially federated registries serving specific domains. This could lead to an “agent mesh”: a resilient, adaptable landscape where agents collaborate effectively under a unified centralized governance layer and distributed management capabilities (e.g., introducing notions of a data/agent steward who manages the quality, accuracy, and compliance of a business unit data/agents). But ensuring this mesh can leverage both declared and emergent capabilities will be key. Exploring that fully, however, is likely a topic for another day.
Ultimately, protocols like A2A and MCP are vital building blocks, but they’re not the entire map. To build multi-agent systems that are genuinely collaborative and robust, we need more than just standard communication rules. It means stepping back and thinking hard about the overall architecture, wrestling with practical headaches like security and discovery (both the explicit kind and the implicit, emergent sort), and acknowledging that these standards themselves will have to adapt as we learn. The journey from today’s often-siloed agents to truly cooperative ecosystems is ongoing, but initiatives like A2A offer valuable markers along the way. It’s undoubtedly a tough engineering road ahead. Yet, the prospect of AI systems that can truly work together and tackle complex problems in flexible ways? That’s a destination worth the effort.
]]></content:encoded>
</item>
<item>
<title>MCP: What It Is and Why It Matters—Part 4</title>
<link>https://www.oreilly.com/radar/mcp-what-it-is-and-why-it-matters-part-4/</link>
<pubDate>Mon, 16 Jun 2025 10:10:43 +0000</pubDate>
<dc:creator><![CDATA[Addy Osmani]]></dc:creator>
<category><![CDATA[AI & ML]]></category>
<category><![CDATA[Research]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?p=16860</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2020/03/na-synapse-15a-1400x950-1.jpg"
medium="image"
type="image/jpeg"
/>
<description><![CDATA[This is the last of four parts in this series. Part 1 can be found here, Part 2 here, and Part 3 here. 9. Future Directions and Wishlist for MCP The trajectory of MCP and AI tool integration is exciting, and there are clear areas where the community and companies are pushing things forward. Here […]]]></description>
<content:encoded><![CDATA[
This is the last of four parts in this series. Part 1 can be found <a href="https://www.oreilly.com/radar/mcp-what-it-is-and-why-it-matters-part-1/" target="_blank" rel="noreferrer noopener">here</a>, Part 2 <a href="https://www.oreilly.com/radar/mcp-what-it-is-and-why-it-matters-part-2/" target="_blank" rel="noreferrer noopener">here</a>, and Part 3 <a href="https://www.oreilly.com/radar/mcp-what-it-is-and-why-it-matters-part-3/" target="_blank" rel="noreferrer noopener">here</a>.
<h2 class="wp-block-heading">9. Future Directions and Wishlist for MCP</h2>
The trajectory of MCP and AI tool integration is exciting, and there are clear areas where the community and companies are pushing things forward. Here are some future directions and “wishlist” items that could shape the next wave of MCP development:
Formalized security and authentication: As noted, one of the top needs is standard security mechanisms in the MCP spec. We can expect efforts to define an authentication layer—perhaps an OAuth-like flow or API key standard for MCP servers so that clients can securely connect to remote servers without custom config for each. This might involve servers advertising their auth method (e.g., “I require a token”) and clients handling token exchange. Additionally, a permission model could be introduced. For example, an AI client might pass along a scope of allowed actions for a session, or MCP servers might support user roles. While not trivial, “<a href="https://medium.com/@vrknetha/the-mcp-first-revolution-why-your-next-application-should-start-with-a-model-context-protocol-9b3d1e973e42" target="_blank" rel="noreferrer noopener">standards for MCP security and authentication</a>” are anticipated as MCP moves into more enterprise and multiuser domains. In practice, this could also mean better sandboxing—maybe running certain MCP actions in isolated environments. (Imagine a Dockerized MCP server for dangerous tasks.)
MCP gateway/orchestration layer: Right now, if an AI needs to use five tools, it opens five connections to different servers. A future improvement could be an MCP gateway—a unified endpoint that aggregates multiple MCP services. Think of it like a proxy that exposes many tools under one roof, possibly handling routing and even high-level decision-making about which tool to use. Such a gateway could manage multitenancy (so one service can serve many users and tools while keeping data separate) and enforce policies (like rate limits, logging all AI actions for audit, etc.). For users, it simplifies configuration—point the AI to one place and it has all your integrated tools.
A gateway could also handle tool selection: As the number of available MCP servers grows, an AI might have access to overlapping tools (maybe two different database connectors). A smart orchestration layer could help choose the right one or combine results. We might also see a registry or discovery service, where an AI agent can query “What MCP services are available enterprise-wide?” without preconfiguration, akin to how microservices can register themselves. This ties into enterprise deployment: Companies might host an internal catalog of MCP endpoints (for internal APIs, data sources, etc.), and AI systems could discover and use them dynamically.
Optimized and fine-tuned AI agents: On the AI model side, we’ll likely see models that are fine-tuned for tool use and MCP specifically. Anthropic already mentioned future “<a href="https://medium.com/@vrknetha/the-mcp-first-revolution-why-your-next-application-should-start-with-a-model-context-protocol-9b3d1e973e42" target="_blank" rel="noreferrer noopener">AI models optimized for MCP interaction</a>.” This could mean the model understands the protocol deeply, knows how to format requests exactly, and perhaps has been trained on logs of successful MCP-based operations. A specialized “agentic” model might also incorporate better reasoning to decide when to use a tool versus answer from memory, etc. We may also see improvements in how models handle long sessions with tools—maintaining a working memory of what tools have done (so they don’t repeat queries unnecessarily). All this would make MCP-driven agents more efficient and reliable.
Expansion of built-in MCP in applications: Right now, most MCP servers are community add-ons. But imagine if popular software started shipping with MCP support out of the box. The future could hold applications with native MCP servers. The vision of “<a href="https://www.knacklabs.ai/blogs/the-mcp-first-revolution-why-your-next-application-should-start-with-a-model-context-protocol-server" target="_blank" rel="noreferrer noopener">more applications shipping with built-in MCP servers</a>” is likely. In practice, this might mean, for example, Figma or VS Code includes an MCP endpoint you can enable in settings. Or an enterprise software vendor like Salesforce provides an MCP interface as part of its API suite. This would tremendously accelerate adoption because users wouldn’t have to rely on third-party plug-ins (which may lag behind software updates). It also puts a bit of an onus on app developers to define how AI should interact with their app, possibly leading to standardized schemas for common app types.
Enhanced agent reasoning and multitool strategies: Future AI agents might get better at multistep, multitool problem-solving. They could learn strategies like using one tool to gather information, reasoning, then using another to act. This is related to model improvements but also to building higher-level planning modules on top of the raw model. Projects like AutoGPT attempt this, but integrating tightly with MCP might yield an “auto-agent” that can configure and execute complex workflows. We might also see collaborative agents (multiple AI agents with different MCP specializations working together). For example, one AI might specialize in database queries and another in writing reports; via MCP and a coordinator, they could jointly handle a “Generate a quarterly report” task.
User interface and experience innovations: On the user side, as these AI agents become more capable, the interfaces might evolve. Instead of a simple chat window, you might have an AI “dashboard” showing which tools are in use, with toggles to enable/disable them. Users might be able to drag-and-drop connections (“attach” an MCP server to their agent like plugging in a device). Also, feedback mechanisms could be enhanced—e.g., if the AI does something via MCP, the UI could show a confirmation (like “AI created a file report.xlsx using Excel MCP”). This builds trust and also lets users correct course if needed. Some envision a future where interacting with an AI agent becomes like managing an employee: You give it access (MCP keys) to certain resources, review its outputs, and gradually increase responsibility.
The overarching theme of future directions is making MCP more seamless, secure, and powerful. We’re at the stage akin to early internet protocols—the basics are working, and now it’s about refinement and scale.
<h2 class="wp-block-heading">10. Final Thoughts: Unlocking a New Wave of Composable, Intelligent Workflows</h2>
MCP may still be in its infancy, but it’s poised to be a foundational technology in how we build and use software in the age of AI. By standardizing the interface between AI agents and applications, MCP is doing for AI what APIs did for web services—making integration composable, reusable, and scalable. This has profound implications for developers and businesses.
We could soon live in a world where AI assistants are not confined to answering questions but are true coworkers. They’ll use tools on our behalf, coordinate complex tasks, and adapt to new tools as easily as a new hire might—or perhaps even more easily. Workflows that once required gluing together scripts or clicking through dozens of UIs might be accomplished by a simple conversation with an AI that “knows the ropes.” And the beauty is, thanks to MCP, the ropes are standardized—the AI doesn’t have to learn each one from scratch for every app.
For software engineers, adopting MCP in tooling offers a strategic advantage. It means your product can plug into the emergent ecosystem of AI agents. Users might prefer tools that work with their AI assistants out of the box.
The bigger picture is composability. We’ve seen composable services in cloud (microservices) and composable UI components in frontend—now we’re looking at composable intelligence. You can mix and match AI capabilities with tool capabilities to assemble solutions to problems on the fly. It recalls Unix philosophy (“do one thing well”) but applied to AI and tools, where an agent pipes data from one MCP service to another, orchestrating a solution. This unlocks creativity: Developers and even end users can dream up workflows without waiting for someone to formally integrate those products. Want your design tool to talk to your code editor? If both have MCP, you can bridge them with a bit of agent prompting. In effect, users become integrators, instructing their AI to weave together solutions ad hoc. That’s a powerful shift.
Of course, to fully unlock this, we’ll need to address the challenges discussed—mainly around trust and robustness—but those feel surmountable with active development and community vigilance. The fact that major players like Anthropic are driving this as open source, and that companies like Zapier are onboard, gives confidence that MCP (or something very much like it) will persist and grow. It’s telling that even in its early phase, we have success stories like Blender MCP going viral and real productivity gains (e.g., “5x faster UI implementation” with Figma MCP). These provide a glimpse of what a mature MCP ecosystem could do across all domains.
For engineers reading this deep dive, the takeaway is clear: MCP matters. It’s worth understanding and perhaps experimenting with in your context. Whether it’s integrating an AI into your development workflow via existing MCP servers, or building one for your project, the investment could pay off by automating grunt work and enabling new features. As with any standard, there’s a network effect—early contributors help steer it and also benefit from being ahead of the curve as adoption grows.
In final reflection, MCP represents a paradigm shift where AI is treated as a first-class user and operator of software. We’re moving toward a future where using a computer could mean telling an AI what outcome you want, and it figures out which apps to open and what buttons to press—a true personal developer/assistant. It’s a bit like having a superpower, or at least a very competent team working for you. And like any revolution in computing interfaces (GUI, touch, voice, etc.), once you experience it, going back to the old way feels limiting. MCP is a key enabler of that revolution for developers.
But the direction is set: AI agents that can fluidly and safely interact with the wide world of software. If successful, MCP will have unlocked a new wave of composable, intelligent workflows that boost productivity and even how we think about problem-solving. In a very real sense, it could help “<a href="https://www.anthropic.com/news/model-context-protocol#:~:text=%22At%20Block%2C%20open,on%20the%20creative.%E2%80%9D" target="_blank" rel="noreferrer noopener">remove the burden of the mechanical so people can focus on the creative</a>” as Block’s CTO put it.
And that is why MCP matters.
It’s building the bridge to a future where humans and AI collaborate through software in ways we are only beginning to imagine, but which soon might become the new normal in software engineering and beyond.
]]></content:encoded>
</item>
<item>
<title>Generative AI in the Real World: Douwe Kiela on Why RAG Isn’t Dead</title>
<link>https://www.oreilly.com/radar/podcast/generative-ai-in-the-real-world-douwe-kiela-on-why-rag-isnt-dead/</link>
<comments>https://www.oreilly.com/radar/podcast/generative-ai-in-the-real-world-douwe-kiela-on-why-rag-isnt-dead/#respond</comments>
<pubDate>Thu, 12 Jun 2025 09:58:53 +0000</pubDate>
<dc:creator><![CDATA[Ben Lorica and Douwe Kiela]]></dc:creator>
<category><![CDATA[Podcast]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?post_type=podcast&p=16850</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2024/01/Podcast_Cover_GenAI_in_the_Real_World.png"
medium="image"
type="image/png"
/>
<description><![CDATA[Join our host Ben Lorica and Douwe Kiela, cofounder of Contextual AI and author of the first paper on RAG, to find out why RAG remains as relevant as ever. Regardless of what you call it, retrieval is at the heart of generative AI. Find out why—and how to build effective RAG-based systems. About the […]]]></description>
<content:encoded><![CDATA[
Join our host Ben Lorica and Douwe Kiela, cofounder of Contextual AI and author of the first paper on RAG, to find out why RAG remains as relevant as ever. Regardless of what you call it, retrieval is at the heart of generative AI. Find out why—and how to build effective RAG-based systems.
About the Generative AI in the Real World podcast: In 2023, ChatGPT put AI on everyone’s agenda. In 2025, the challenge will be turning those agendas into reality. In Generative AI in the Real World, Ben Lorica interviews leaders who are building with AI. Learn from their experience to help put AI to work in your enterprise.
Check out <a href="https://learning.oreilly.com/playlists/42123a72-1108-40f1-91c0-adbfb9f4983b/?_gl=1*16z5k2y*_ga*MTE1NDE4NjYxMi4xNzI5NTkwODkx*_ga_092EL089CH*MTcyOTYxNDAyNC4zLjEuMTcyOTYxNDAyNi41OC4wLjA." target="_blank" rel="noreferrer noopener">other episodes</a> of this podcast on the O’Reilly learning platform.
<h3 class="wp-block-heading">Timestamps</h3>
<ul class="wp-block-list">
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=0" target="_blank" rel="noreferrer noopener">0:00</a>: Introduction to Douwe Kiela, cofounder and CEO of Contextual AI.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=25" target="_blank" rel="noreferrer noopener">0:25</a>: Today’s topic is RAG. With frontier models advertising massive context windows, many developers wonder if RAG is becoming obsolete. What’s your take?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=63" target="_blank" rel="noreferrer noopener">1:03</a>: We now have a blog post: <a href="http://isragdeadyet.com" target="_blank" rel="noreferrer noopener">isragdeadyet.com</a>. If something keeps getting pronounced dead, it will never die. These long context models solve a similar problem to RAG: how to get the relevant information into the language model. But it’s wasteful to use the full context all the time. If you want to know who the headmaster is in Harry Potter, do you have to read all the books? </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=124" target="_blank" rel="noreferrer noopener">2:04</a>: What will probably work best is RAG plus long context models. The real solution is to use RAG, find as much relevant information as you can, and put it into the language model. The dichotomy between RAG and long context isn’t a real thing. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=168" target="_blank" rel="noreferrer noopener">2:48</a>: One of the main issues may be that RAG systems are annoying to build, and long context systems are easy. But if you can make RAG easy too, it’s much more efficient.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=187" target="_blank" rel="noreferrer noopener">3:07</a>: The reasoning models make it even worse in terms of cost and latency. And if you’re talking about something with a lot of usage, high repetition, it doesn’t make sense. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=219" target="_blank" rel="noreferrer noopener">3:39</a>: You’ve been talking about RAG 2.0, which seems natural: emphasize systems over models. I’ve long warned people that RAG is a complicated system to build because there are so many knobs to turn. Few developers have the skills to systematically turn those knobs. Can you unpack what RAG 2.0 means for teams building AI applications?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=262" target="_blank" rel="noreferrer noopener">4:22</a>: The language model is only a small part of a much bigger system. If the system doesn’t work, you can have an amazing language model and it’s not going to get the right answer. If you start from that observation, you can think of RAG as a system where all the model components can be optimized together. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=340" target="_blank" rel="noreferrer noopener">5:40</a>: What you’re describing is similar to what other parts of AI are trying to do: an end-to-end system. How early in the pipeline does your vision start?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=367" target="_blank" rel="noreferrer noopener">6:07</a>: We have two core concepts. One is a data store—that’s really extraction, where we do layout segmentation. We collate all of that information and chunk it, store it in the data store, and then the agents sit on top of the data store. The agents do a mixture of retrievers, followed by a reranker and a grounded language model.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=422" target="_blank" rel="noreferrer noopener">7:02</a>: What about embeddings? Are they automatically chosen? If you go to Hugging Face, there are, like, 10,000 embeddings.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=435" target="_blank" rel="noreferrer noopener">7:15</a>: We save you a lot of that effort. Opinionated orchestration is a way to think about it.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=451" target="_blank" rel="noreferrer noopener">7:31</a>: Two years ago, when RAG started becoming mainstream, a lot of developers focused on chunking. We had rules of thumb and shared stories. This eliminates a lot of that trial and error.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=486" target="_blank" rel="noreferrer noopener">8:06</a>: We basically have two APIs: one for ingestion and one for querying. Querying is contextualized on your data, which we’ve ingested. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=505" target="_blank" rel="noreferrer noopener">8:25</a>: One thing that’s underestimated is document parsing. A lot of people overfocus on embedding and chunking. Try to find a PDF extraction library for Python. There are so many of them, and you can’t tell which ones are good. They’re all terrible. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=534" target="_blank" rel="noreferrer noopener">8:54</a>: We have our stand-alone component APIs. Our document parser is available separately. Some areas, like finance, have extremely complex layouts. Nothing off the shelf works, so we had to roll our own solution. Since we know this will be used for RAG, we process the document to make it maximally useful. We don’t just extract raw information. We also extract the document hierarchy. That is extremely relevant as metadata when you’re doing retrieval. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=611" target="_blank" rel="noreferrer noopener">10:11</a>: There are open source libraries—what drove you to build your own, which I assume also encompasses OCR?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=645" target="_blank" rel="noreferrer noopener">10:45</a>: It encompasses OCR; it has VLMs, complex layout segmentation, different extraction models—it’s a very complex system. Open source systems are good for getting started, but you need to build for production, not for the demo. You need to make it work on a million PDFs. We see a lot of projects die on the way to productization.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=735" target="_blank" rel="noreferrer noopener">12:15</a>: It’s not just a question of information extraction; there’s structure inside these documents that you can leverage. A lot of people early on were focused on chunking. My intuition was that extraction was the key.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=768" target="_blank" rel="noreferrer noopener">12:48</a>: If your information extraction is bad, you can chunk all you want and it won’t do anything. Then you can embed all you want, but that won’t do anything. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=807" target="_blank" rel="noreferrer noopener">13:27</a>: What are you using for scale? Ray?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=812" target="_blank" rel="noreferrer noopener">13:32</a>: For scale, we’re just using our own systems. Everything is Kubernetes under the hood.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=832">13:52</a>: In the early part of the pipeline, what structures are you looking for? You mention hierarchy. People are also excited about knowledge graphs. Can you extract graphical information? </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=852" target="_blank" rel="noreferrer noopener">14:12</a>: GraphRAG is an interesting concept. In our experience, it doesn’t make a huge difference if you do GraphRAG the way the original paper proposes, which is essentially data augmentation. With Neo4j, you can generate queries in a query language, which is essentially text-to-SQL.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=908" target="_blank" rel="noreferrer noopener">15:08</a>: It presupposes you have a decent knowledge graph.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=917" target="_blank" rel="noreferrer noopener">15:17</a>: And that you have a decent text-to-query language model. That’s structure retrieval. You have to first turn your unstructured data into structured data.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=943" target="_blank" rel="noreferrer noopener">15:43</a>: I wanted to talk about retrieval itself. Is retrieval still a big deal?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=967" target="_blank" rel="noreferrer noopener">16:07</a>: It’s the hard problem. The way we solve it is still using a hybrid: mixture of retrievers. There are different retrieval modalities you can choose. At the first stage, you want to cast a wide net. Then you put that into the reranker, and those rerankers do all the smart stuff. You want to do fast first-stage retrieval, and rerank after that. It makes a big difference to give your reranker instructions. You might want to tell it to prefer recency. If the CEO wrote it, I want to prioritize that. Or I want it to observe data hierarchies. You need some rules to capture how you want to rank data.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1076" target="_blank" rel="noreferrer noopener">17:56</a>: Your retrieval step is complex. How does it impact latency? And how does it impact explainability and transparency?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1097" target="_blank" rel="noreferrer noopener">18:17</a>: You have observability on all of these stages. In terms of latency, it’s not that bad because you narrow the funnel gradually. Latency is one of many parameters.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1132" target="_blank" rel="noreferrer noopener">18:52</a>: One of the things a lot of people don’t understand is that RAG does not completely shield you from hallucination. You can give the language model all the relevant information, but the language model might still be opinionated. What’s your solution to hallucination?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1177" target="_blank" rel="noreferrer noopener">19:37</a>: A general purpose language model needs to satisfy many different constraints. It needs to be able to hallucinate—it needs to be able to talk about things that aren’t in the ground-truth context. With RAG you don’t want that. We’ve taken open source base models and trained them to be grounded in the context only. The language models are very good at saying, “I don’t know.” That’s really important. Our model cannot talk about anything it doesn’t have context on. We call it our grounded language model (GLM).</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1237" target="_blank" rel="noreferrer noopener">20:37</a>: Two things have happened in recent months: reasoning and multimodality.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1254" target="_blank" rel="noreferrer noopener">20:54</a>: Both are super important for RAG in general. I’m very happy that multimodality is finally getting the attention that it observes. A lot of data is multimodal. Videos and complex layouts. Qualcomm is one of our customers; their data is very complex: circuit diagrams, code, tables. You need to extract the information the right way and make sure the whole pipeline works.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1320" target="_blank" rel="noreferrer noopener">22:00</a>: Reasoning: I think people are still underestimating how much of a paradigm shift inference-time compute is. We’re doing a lot of work on domain-agnostic planners and making sure you have agentic capabilities where you can understand what you want to retrieve. RAG becomes one of the tools for the domain-agnostic planner. Retrieval is the way you make systems work on top of your data. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1362" target="_blank" rel="noreferrer noopener">22:42</a>: Inference-time compute will be slower and more expensive. Is your system engineered so you only use that when you need to?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1376" target="_blank" rel="noreferrer noopener">22:56</a>: We are a platform where people can build their own agents, so you can build what you want. We have “think mode,” where you use the reasoning model, or the standard RAG mode, where it just does RAG with lower latency.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1398" target="_blank" rel="noreferrer noopener">23:18</a>: With reasoning models, people seem to become much more relaxed about latency constraints. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1420" target="_blank" rel="noreferrer noopener">23:40</a>: You describe a system that’s optimized end to end. That implies that I don’t need to do fine-tuning. You don’t have to, but you can if you want.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1442" target="_blank" rel="noreferrer noopener">24:02</a>: What would fine-tuning buy me at this point? If I do fine-tuning, the ROI would be small.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1462" target="_blank" rel="noreferrer noopener">24:20</a>: It depends on how much a few extra percent of performance is worth to you. For some of our customers, that can be a huge difference. Fine-tuning versus RAG is another false dichotomy. The answer has always been both. The same is true of MCP and long context.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1517" target="_blank" rel="noreferrer noopener">25:17</a>: My suspicion is with your system I’m going to do less fine-tuning. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1520" target="_blank" rel="noreferrer noopener">25:20</a>: Out of the box, our system will be pretty good. But we do help our customers squeeze out max performance. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1537" target="_blank" rel="noreferrer noopener">25:37</a>: Those still fit into the same kind of supervised fine-tuning: Here’s some labeled examples.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1552" target="_blank" rel="noreferrer noopener">25:52</a>: We don’t need that many. It’s not labels so much as examples of the behavior you want. We use synthetic data pipelines to get a good enough training set. We’re seeing pretty good gains with that. It’s really about capturing the domain better.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1588" target="_blank" rel="noreferrer noopener">26:28</a>: “I don’t need RAG because I have agents.” Aren’t deep research tools just doing what a RAG system is supposed to do?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1611" target="_blank" rel="noreferrer noopener">26:51</a>: They’re using RAG under the hood. MCP is just a protocol; you would be doing RAG with MCP. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1645" target="_blank" rel="noreferrer noopener">27:25</a>: These deep research tools—the agent is supposed to go out and find relevant sources. In other words, it’s doing what a RAG system is supposed to do, but it’s not called RAG.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1675" target="_blank" rel="noreferrer noopener">27:55</a>: I would still call that RAG. The agent is the generator. You’re augmenting the G with the R. If you want to get these systems to work on top of your data, you need retrieval. That’s what RAG is really about.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1713" target="_blank" rel="noreferrer noopener">28:33</a>: The main difference is the end product. A lot of people use these to generate a report or slide data they can edit.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1733" target="_blank" rel="noreferrer noopener">28:53</a>: Isn’t the difference just inference-time compute, the ability to do active retrieval as opposed to passive retrieval? You always retrieve. You can make that more active; you can decide from the model when and what you want to retrieve. But you’re still retrieving. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1785" target="_blank" rel="noreferrer noopener">29:45</a>: There’s a class of agents that don’t retrieve. But they don’t work yet, but that’s the vision of an agent moving forward.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1811" target="_blank" rel="noreferrer noopener">30:11</a>: It’s starting to work. The tool used in that example is retrieval; the other tool is calling an API. What these reasoners are doing is just calling APIs as tools.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1840" target="_blank" rel="noreferrer noopener">30:40</a>: At the end of the day, Google’s original vision is what matters: organize all the world’s information. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1848" target="_blank" rel="noreferrer noopener">30:48</a>: A key difference between the old approach and the new approach is that we have the G: generative answers. We don’t have to reason over the retrievals ourselves any more.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1879" target="_blank" rel="noreferrer noopener">31:19</a>: What parts of your platform are open source?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1887" target="_blank" rel="noreferrer noopener">31:27</a>: We’ve open-sourced some of our earlier work, and we’ve published a lot of our research. </li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1912" target="_blank" rel="noreferrer noopener">31:52</a>: One of the topics I’m watching: I think supervised fine-tuning is a solved problem. But reinforcement fine-tuning is still a UX problem. What’s the right way to interact with a domain expert?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1945" target="_blank" rel="noreferrer noopener">32:25</a>: Collecting that feedback is very important. We do that as a part of our system. You can train these dynamic query paths using the reinforcement signal.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1972" target="_blank" rel="noreferrer noopener">32:52</a>: In the next 6 to 12 months, what would you like to see from the foundation model builders?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=1988" target="_blank" rel="noreferrer noopener">33:08</a>: It would be nice if longer context actually worked. You will still need RAG. The other thing is VLMs. VLMs are good, but they’re still not great, especially when it comes to fine-grained chart understanding.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=2023" target="_blank" rel="noreferrer noopener">33:43</a>: With your platform, can you bring your own model, or do you supply the model?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=2031" target="_blank" rel="noreferrer noopener">33:51</a>: We have our own models for the retrieval and contextualization stack. You can bring your own language model, but our GLM often works better than what you can bring yourself.</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=2049" target="_blank" rel="noreferrer noopener">34:09</a>: Are you seeing adoption of the Chinese models?</li>
<li><a href="https://cdn.oreillystatic.com/radar/generative-ai-real-world-podcast/GenAI_in_the_Real_World_Douwe_Kiela.mp3#t=2053" target="_blank" rel="noreferrer noopener">34:13</a>: Yes and no. DeepSeek was a very important existence proof. We don’t deploy them for production customers.</li>
</ul>
]]></content:encoded>
<wfw:commentRss>https://www.oreilly.com/radar/podcast/generative-ai-in-the-real-world-douwe-kiela-on-why-rag-isnt-dead/feed/</wfw:commentRss>
<slash:comments>0</slash:comments>
</item>
<item>
<title>Normal Technology at Scale</title>
<link>https://www.oreilly.com/radar/normal-technology-at-scale/</link>
<pubDate>Tue, 10 Jun 2025 10:19:25 +0000</pubDate>
<dc:creator><![CDATA[Mike Loukides]]></dc:creator>
<category><![CDATA[AI & ML]]></category>
<category><![CDATA[Artificial Intelligence]]></category>
<category><![CDATA[Commentary]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?p=16830</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2019/06/eclairage-bfb039e7b68e1fe830b373274a65ea62-1.jpg"
medium="image"
type="image/jpeg"
/>
<description><![CDATA[The widely read and discussed article “AI as Normal Technology” is a reaction against claims of “superintelligence,” as its headline suggests. I’m substantially in agreement with it. AGI and superintelligence can mean whatever you want—the terms are ill-defined and next to useless. AI is better at most things than most people, but what does that […]]]></description>
<content:encoded><![CDATA[
The widely read and discussed article “<a href="https://knightcolumbia.org/content/ai-as-normal-technology" target="_blank" rel="noreferrer noopener">AI as Normal Technology</a>” is a reaction against claims of “superintelligence,” as its headline suggests. I’m substantially in agreement with it. AGI and superintelligence can mean whatever you want—the terms are ill-defined and next to useless. AI is better at most things than most people, but what does that mean in practice, if an AI doesn’t have volition? If an AI can’t recognize the existence of a problem that needs a solution, and want to create that solution? It looks like the use of AI is exploding everywhere, particularly if you’re in the technology industry. But outside of technology, AI adoption isn’t likely to be faster than the adoption of any other new technology. Manufacturing is already heavily automated, and upgrading that automation would require significant investments of money and time. Factories aren’t rebuilt overnight. Neither are farms, railways, or construction companies. Adoption is further slowed by the difficulty of getting from a good demo to an application running in production. AI certainly has risks, but those risks have more to do with real harms arising from issues like bias and data quality than the apocalyptic risks that many in the AI community worry about; those apocalyptic risks have more to do with science fiction than reality. (If you notice an AI manufacturing paper clips, pull the plug, please.)
Still, there’s one kind of risk that I can’t avoid thinking about, and that the authors of “AI as Normal Technology” only touch on, though they are good on the real nonimagined risks. Those are the risks of scale: AI provides the means to do things at volumes and speeds greater than we have ever had before. The ability to operate at scale is a huge advantage, but it’s also a risk all its own. In the past, we rejected qualified female and minority job applicants one at a time; maybe we rejected all of them, but a human still had to be burdened with those individual decisions. Now we can reject them en masse, even with supposedly race- and gender-blind applications. In the past, police departments guessed who was likely to commit a crime one at a time, a highly biased practice commonly known as “profiling.”1 Most likely most of the supposed criminals are in the same group, and most of those decisions are wrong. Now we can be wrong about entire populations in an instant—and our wrongness is justified because “an AI said so,” a defense that’s even more specious than “I was just obeying orders.”
We have to think about this kind of risk carefully, though, because it’s not just about AI. It depends on other changes that have little to do with AI, and everything to do with economics. Back in the early 2000s, <a href="https://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/" target="_blank" rel="noreferrer noopener">Target outed</a> a pregnant teenage girl to her parents by <a href="https://www.nytimes.com/2012/02/19/magazine/shopping-habits.html" target="_blank" rel="noreferrer noopener">analyzing her purchases</a>, determining that she was likely to be pregnant, and sending advertising circulars that targeted pregnant women to her home. This example is an excellent lens for thinking through the risks. First, Target’s systems determined that the girl was pregnant using automated data analysis. No humans were involved. Data analysis isn’t quite AI, but it’s a very clear precursor (and could easily have been called AI at the time). Second, exposing a single teenage pregnancy is only a small part of a much bigger problem. In the past, a human pharmacist might have noticed a teenager’s purchases and had a kind word with her parents. That’s certainly an ethical issue, though I don’t intend to write on the ethics of pharmacology. We all know that people make poor decisions, and that these decisions effect others. We also have ways to deal with these decisions and their effects, however inadequately. It’s a much bigger issue that Target’s systems have the potential for outing pregnant women at scale—and in an era when abortion is illegal or near-illegal in many states, that’s important. In 2025, it’s unfortunately easy to imagine a state attorney general subpoenaing data from any source, including retail purchases, that might help them identify pregnant women.
We can’t chalk this up to AI, though it’s a factor. We need to account for the disappearance of human pharmacists, working in independent pharmacies where they can get to know their customers. We had the technology to do Target’s data analysis in the 1980s: We had mainframes that could process data at scale, we understood statistics, we had algorithms. We didn’t have big disk drives, but we had magtape—so many miles of magtape! What we didn’t have was the data; the sales took place at thousands of independent businesses scattered throughout the world. Few of those independent pharmacies survive, at least in the US—in my town, the last one disappeared in 1996. When nationwide chains replaced independent drugstores, the data became consolidated. Our data was held and analyzed by chains that consolidated data from thousands of retail locations. In 2025, even the chains are consolidating; CVS may end up being the last drugstore standing.
Whatever you may think about the transition from independent druggists to chains, in this context it’s important to understand that what enabled Target to identify pregnancies wasn’t a technological change; it was economics, glibly called “economies of scale.” That economic shift may have been rooted in technology—specifically, the ability to manage supply chains across thousands of retail outlets—but it’s not just about technology. It’s about the <a href="https://www.oreilly.com/radar/ethics-at-scale/" target="_blank" rel="noreferrer noopener">ethics of scale</a>. This kind of consolidation took place in just about every industry, from auto manufacturing to transportation to farming—and, of course, just about all forms of retail sales. The collapse of small record labels, small publishers, small booksellers, small farms, small anything has everything to do with managing supply chains and distribution. (Distribution is really just supply chains in reverse.) The economics of scale enabled data at scale, not the other way around.
<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1048" height="709" src="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/06/Doudens-1048x709.png" alt="Digital image © Guilford Free Library." class="wp-image-16841" srcset="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/06/Doudens-1048x709.png 1048w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/06/Doudens-300x203.png 300w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/06/Doudens-768x520.png 768w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/06/Doudens-1536x1039.png 1536w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/06/Doudens.png 1958w" sizes="auto, (max-width: 1048px) 100vw, 1048px" /><figcaption class="wp-element-caption">Douden’s Drugstore (Guilford, CT) on its closing day.2</figcaption></figure>
We can’t think about the ethical use of AI without also thinking about the economics of scale. Indeed, the first generation of “modern” AI—something now condescendingly referred to as “classifying cat and dog photos”—happened because the widespread use of digital cameras enabled photo sharing sites like Flickr, which could be scraped for training data. Digital cameras didn’t penetrate the market because of AI but because they were small, cheap, and convenient and could be integrated into cell phones. They created the data that made AI possible.
Data at scale is the necessary precondition for AI. But AI facilitates the vicious circle that turns data against its humans. How do we break out of this vicious circle? Whether AI is normal or apocalyptic technology really isn’t the issue. Whether AI can do things better than individuals isn’t the issue either. AI makes mistakes; humans make mistakes. AI often makes different kinds of mistakes, but that doesn’t seem important. What’s important is that, whether mistaken or not, AI amplifies scale.3 It enables the drowning out of voices that certain groups don’t want to be heard. It enables the swamping of creative spaces with dull sludge (now christened “slop”). It enables mass surveillance, not of a few people limited by human labor but of entire populations.
Once we realize that the problems we face are rooted in economics and scale, not superhuman AI, the question becomes: How do we change the systems in which we work and live in ways that preserve human initiative and human voices? How do we build systems that build in economic incentives for privacy and fairness? We don’t want to resurrect the nosey local druggist, but we prefer harms that are limited in scope to harms at scale. We don’t want to depend on local boutique farms for our vegetables—that’s only a solution for those who can afford to pay a premium—but we don’t want massive corporate farms implementing economies of scale by cutting corners on cleanliness.4 “Big enough to fight regulators in court” is a kind of scale we can do without, along with “penalties are just a cost of doing business.” We can’t deny that AI has a role in scaling risks and abuses, but we also need to realize that the risks we need to fear aren’t the existential risks, the apocalyptic nightmares of science fiction.
<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
The right thing to be afraid of is that individual humans are dwarfed by the scale of modern institutions. They’re the same human risks and harms we’ve faced all along, usually without addressing them appropriately. Now they’re magnified.
</blockquote>
So, let’s end with a provocation. We can certainly imagine AI that makes us 10x better programmers and software developers, though <a href="https://learning.oreilly.com/videos/coding-with-ai/0642572017171/" target="_blank" rel="noreferrer noopener">it remains to be seen whether that’s really true</a>. Can we imagine AI that helps us to build better institutions, institutions that work on a human scale? Can we imagine AI that enhances human creativity rather than proliferating slop? To do so, we’ll need to take advantage of things we can do that AI can’t—specifically, the ability to want and the ability to enjoy. AI can certainly play Go, chess, and many other games better than a human, but it can’t want to play chess, nor can it enjoy a good game. Maybe an AI can create art or music (as opposed to just recombining clichés), but I don’t know what it would mean to say that AI enjoys listening to music or looking at paintings. Can it help us be creative? Can AI help us build institutions that foster creativity, frameworks within which we can enjoy being human?
Michael Lopp (aka @Rands) recently <a href="https://randsinrepose.com/archives/minimum-viable-curiousity/" target="_blank" rel="noreferrer noopener">wrote</a>:
<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
I think we’re screwed, not because of the power and potential of the tools. It starts with the greed of humans and how their machinations (and success) prey on the ignorant. We’re screwed because these nefarious humans were already wildly successful before AI matured and now we’ve given them even better tools to manufacture hate that leads to helplessness.
</blockquote>
Note the similarities to my argument: The problem we face isn’t AI; it’s human and it preexisted AI. But “screwed” isn’t the last word. Rands also talks about being blessed:
<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
I think we’re blessed. We live at a time when the tools we build can empower those who want to create. The barriers to creating have never been lower; all you need is a mindset. Curiosity. How does it work? Where did you come from? What does this mean? What rules does it follow? How does it fail? Who benefits most from this existing? Who benefits least? Why does it feel like magic? What is magic, anyway? It’s an endless set of situationally dependent questions requiring dedicated focus and infectious curiosity.
</blockquote>
We’re both screwed and blessed. The important question, then, is how to use AI in ways that are constructive and creative, how to disable their ability to manufacture hate—an ability just demonstrated by xAI’s Grok spouting about “<a href="https://www.axios.com/2025/05/16/musk-grok-south-africa-white-genocide-xai" target="_blank" rel="noreferrer noopener">white genocide</a>.” It starts with disabusing ourselves of the notion that AI is an apocalyptic technology. It is, ultimately, just another “normal” technology. The best way to disarm a monster is to realize that it isn’t a monster—and that responsibility for the monster inevitably lies with a human, and a human coming from a specific complex of beliefs and superstitions.
A critical step in avoiding “screwed” is to act human. Tom Lehrer’s song “<a href="https://www.google.com/search?q=tom+lehrer+folk+song+army&sca_esv=7ce7144faa458147&sxsrf=AHTn8zpMbOqNeoAC0pvet8LWp5y-TcHoeQ%3A1747421035437&ei=a4cnaMOzGqT9ptQPnNagoAQ&ved=0ahUKEwiDldzQ0qiNAxWkvokEHRwrCEQQ4dUDCBI&uact=5&oq=tom+lehrer+folk+song+army&gs_lp=Egxnd3Mtd2l6LXNlcnAiGXRvbSBsZWhyZXIgZm9sayBzb25nIGFybXkyCxAAGIAEGJECGIoFMgUQLhiABDIGEAAYFhgeMgYQABgWGB4yCBAAGIAEGKIEMgUQABjvBUipOVCbA1jNNnADeACQAQCYAc4CoAGMIaoBCTE4LjE4LjEuMbgBA8gBAPgBAZgCF6ACrBTCAggQABiwAxjvBcICCxAAGIAEGLADGKIEwgIFECEYoAHCAgQQIxgnwgIFEAAYgATCAgoQABiABBgUGIcCwgIKEC4YgAQYFBiHAsICCxAuGIAEGJECGIoFwgIIEAAYogQYiQXCAgsQABiABBiGAxiKBZgDAIgGAZAGBJIHCDcuMTQuMS4xoAfVqAKyBwg0LjE0LjEuMbgHnxQ&sclient=gws-wiz-serp#wptab=si:APYL9bvANhkpyEhcl2rqpzxECqTUq49tNzJ_JBnRD6lM1Th9NZ5cgeeYK1lMRqAhwxRO7sO1ircKkbgWflHwIdkCDaoa0gfRbH32KtUfH-eQ-S1omQFxVWSI6GYB99aZlm6O2VHuBwQMZGNo6DS5UNtYuNHndnx3k0d1UvTr0oky5a9igFMfmUM%3D" target="_blank" rel="noreferrer noopener">The Folk Song Army</a>” says, “We had all the good songs” in the war against Franco, one of the 20th century’s great losing causes. In 1969, during the struggle against the Vietnam War, we also had “all the good songs”—but that struggle eventually succeeded in stopping the war. The protest music of the 1960s came about because of a certain historical moment in which the music industry wasn’t in control; as Frank Zappa <a href="https://www.cartoonbrew.com/ideas-commentary/frank-zappa-explains-why-cartoons-today-suck-10513.html" target="_blank" rel="noreferrer noopener">said</a>, “These were cigar-chomping old guys who looked at the product that came and said, ‘I don’t know. Who knows what it is. Record it. Stick it out. If it sells, alright.’” The problem with contemporary music in 2025 is that the music industry is very much in control; to become successful, you have to be vetted, marketable, and fall within a limited range of tastes and opinions. But there are alternatives: Bandcamp may not be as good an alternative as it once was, but it is an alternative. Make music and share it. Use AI to help you make music. Let AI help you be creative; don’t let it replace your creativity. One of the great cultural tragedies of the 20th century was the professionalization of music. In the 19th century, you’d be embarrassed not to be able to sing, and you’d be likely to play an instrument. In the 21st, many people won’t admit that they can sing, and instrumentalists are few. That’s a problem we can address. By building spaces, online or otherwise, around your music, we can do an end run around the music industry, which has always been more about “industry” than “music.” Music has always been a communal activity; it’s time to rebuild those communities at human scale.
Is that just warmed-over 1970s thinking, Birkenstocks and granola and all that? Yes, but there’s also some reality there. It doesn’t minimize or mitigate risk associated with AI, but it recognizes some things that are important. AIs can’t want to do anything, nor can they enjoy doing anything. They don’t care whether they are playing Go or deciphering DNA. Humans can want to do things, and we can take joy in what we do. Remembering that will be increasingly important as the spaces we inhabit are increasingly shared with AI. Do what we do best—with the help of AI. AI is not going to go away, but we can make it play our tune.
Being human means building communities around what we do. We need to build new communities that are designed for human participation, communities in which we share the joy in things we love to do. Is it possible to view YouTube as a tool that has enabled many people to share video and, in some cases, even to earn a living from it? And is it possible to view AI as a tool that has helped people to build their videos? I don’t know, but I’m open to the idea. YouTube is subject to what Cory Doctorow calls <a href="https://pluralistic.net/2023/01/21/potemkin-ai/#hey-guys" target="_blank" rel="noreferrer noopener">enshittification</a>, as is enshittification’s poster child TikTok: They use AI to monetize attention and (in the case of TikTok) may have shared data with foreign governments. But it would be unwise to discount the creativity that has come about through YouTube. It would also be unwise to discount the number of people who are earning at least part of their living through YouTube. Can we make a similar argument about Substack, which allows writers to build communities around their work, inverting the paradigm that drove the 20th century news business: putting the reporter at the center rather than the institution? We don’t yet know whether Substack’s subscription model will enable it to resist the forces that have devalued other media; we’ll find out in the coming years. We can certainly make an argument that services like Mastodon, a decentralized collection of federated services, are a new form of social media that can nurture communities at human scale. (Possibly also Bluesky, though right now Bluesky is only decentralized in theory.) <a href="https://signal.org/" target="_blank" rel="noreferrer noopener">Signal</a> provides secure group messaging, if used properly—and it’s easy to forget how important messaging has been to the development of social media. Anil Dash’s call for an “<a href="https://www.anildash.com/2025/05/27/internet-of-consent/" target="_blank" rel="noreferrer noopener">Internet of Consent</a>,” in which humans get to choose how their data is used, is another step in the right direction.
In the long run, what’s important won’t be the applications. It will be “having the good songs.” It will be creating the protocols that allow us to share those songs safely. We need to build and nurture our own gardens; we need to build new institutions at human scale more than we need to disrupt the existing walled gardens. AI can help with that building, if we let it. As Rands said, the barriers to creativity and curiosity have never been lower.
<hr class="wp-block-separator has-alpha-channel-opacity"/>
<h3 class="wp-block-heading">Footnotes</h3>
<ol class="wp-block-list">
<li>A <a href="https://www.ctdatahaven.org/blog/connecticut-data-reveal-racial-disparities-policing" target="_blank" rel="noreferrer noopener">study</a> in Connecticut showed that, during traffic stops, members of nonprofiled groups were actually more likely to be carrying contraband (i.e., illegal drugs) than members of profiled groups.</li>
<li>Digital image © Guilford Free Library.</li>
<li>Nicholas Carlini’s “<a href="https://nicholas.carlini.com/writing/2025/machines-of-ruthless-efficiency.html" target="_blank" rel="noreferrer noopener">Machines of Ruthless Efficiency</a>” makes a similar argument.</li>
<li>And we have no real guarantee that local farms are any more hygienic.</li>
</ol>
]]></content:encoded>
</item>
<item>
<title>What Comes After the LLM: Human-Centered AI, Spatial Intelligence, and the Future of Practice</title>
<link>https://www.oreilly.com/radar/what-comes-after-the-llm-human-centered-ai-spatial-intelligence-and-the-future-of-practice/</link>
<pubDate>Fri, 06 Jun 2025 10:57:05 +0000</pubDate>
<dc:creator><![CDATA[Duncan Gilchrist and Hugo Bowne-Anderson]]></dc:creator>
<category><![CDATA[AI & ML]]></category>
<category><![CDATA[Artificial Intelligence]]></category>
<category><![CDATA[Commentary]]></category>
<guid isPermaLink="false">https://www.oreilly.com/radar/?p=16822</guid>
<media:content
url="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2019/06/head-663997_1920_crop-f2a401ae22213e82275e3ec047ddff60-1.jpg"
medium="image"
type="image/jpeg"
/>
<description><![CDATA[In a recent episode of High Signal, we spoke with Dr. Fei-Fei Li about what it really means to build human-centered AI, and where the field might be heading next. Fei-Fei doesn’t describe AI as a feature or even an industry. She calls it a “civilizational technology”—a force as foundational as electricity or computing itself. […]]]></description>
<content:encoded><![CDATA[
<a href="https://high-signal.delphina.ai/episode/fei-fei-on-how-human-centered-ai-actually-gets-built" target="_blank" rel="noreferrer noopener">In a recent episode of High Signal</a>, we spoke with Dr. Fei-Fei Li about what it really means to build human-centered AI, and where the field might be heading next.
Fei-Fei doesn’t describe AI as a feature or even an industry. She calls it a “civilizational technology”—a force as foundational as electricity or computing itself. This has serious implications for how we design, deploy, and govern AI systems across institutions, economies, and everyday life.
Our conversation was about more than short-term tactics. It was about how foundational assumptions are shifting, around interface, intelligence, and responsibility, and what that means for technical practitioners building real-world systems today.
<h2 class="wp-block-heading">The Concentric Circles of Human-Centered AI</h2>
Fei-Fei’s framework for human-centered AI centers on three concentric rings: the individual, the community, and society.
<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1048" height="615" src="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/06/Firefly_Three-concentric-rings-with-one-labeled-as-the-individual-another-labeled-as-the-com-722912-1048x615.jpg" alt="" class="wp-image-16823" srcset="https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/06/Firefly_Three-concentric-rings-with-one-labeled-as-the-individual-another-labeled-as-the-com-722912-1048x615.jpg 1048w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/06/Firefly_Three-concentric-rings-with-one-labeled-as-the-individual-another-labeled-as-the-com-722912-300x176.jpg 300w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/06/Firefly_Three-concentric-rings-with-one-labeled-as-the-individual-another-labeled-as-the-com-722912-768x451.jpg 768w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/06/Firefly_Three-concentric-rings-with-one-labeled-as-the-individual-another-labeled-as-the-com-722912-1536x901.jpg 1536w, https://www.oreilly.com/radar/wp-content/uploads/sites/3/2025/06/Firefly_Three-concentric-rings-with-one-labeled-as-the-individual-another-labeled-as-the-com-722912-2048x1202.jpg 2048w" sizes="auto, (max-width: 1048px) 100vw, 1048px" /><figcaption class="wp-element-caption">Image created by Adobe Firefly</figcaption></figure>
At the individual level, it’s about building systems that preserve dignity, agency, and privacy. To give one example, at Stanford, Fei-Fei’s worked on sensor-based technologies for elder care aimed at identifying clinically relevant moments that could lead to worse outcomes if left unaddressed. Even with well-intentioned design, these systems can easily cross into overreach if they’re not built with human experience in mind.
At the community level, our conversation focused on workers, creators, and collaborative groups. What does it mean to support creativity when generative models can produce text, images, and video at scale? How do we augment rather than replace? How do we align incentives so that the benefits flow to creators and not just platforms?
At the societal level, her attention turns to jobs, governance, and the social fabric itself. AI alters workflows and decision-making across sectors: education, healthcare, transportation, even democratic institutions. We can’t treat that impact as incidental.
<a href="https://high-signal.delphina.ai/episode/next-evolution-of-ai" target="_blank" rel="noreferrer noopener">In an earlier High Signal episode</a>, Michael I. Jordan argued that too much of today’s AI mimics individual cognition rather than modeling systems like markets, biology, or collective intelligence. Fei-Fei’s emphasis on the concentric circles complements that view—pushing us to design systems that account for people, coordination, and context, not just prediction accuracy.
<figure class="wp-block-video"><video controls src="https://descriptusercontent.com/published/37f30d83-88d3-4b99-9f67-332522cead3c/original.mp4"></video></figure>
<h2 class="wp-block-heading">Spatial Intelligence: A Different Language for Computation</h2>
Another core theme of our conversation was Fei-Fei’s work on spatial intelligence and why the next frontier in AI won’t be about language alone.
At <a href="https://www.worldlabs.ai/" target="_blank" rel="noreferrer noopener">her startup, World Labs</a>, Fei-Fei is developing foundation models that operate in 3D space. These models are not only for robotics; they also underpin applications in education, simulation, creative tools, and real-time interaction. When AI systems understand geometry, orientation, and physical context, new forms of reasoning and control become possible.
“We are seeing a lot of pixels being generated, and they’re beautiful,” she explained, “but if you just generate pixels on a flat screen, they actually lack information.” Without 3D structure, it’s difficult to simulate light, perspective, or interaction, making it hard to compute with or control.
For technical practitioners, this raises big questions:
<ul class="wp-block-list">
<li>What are the right abstractions for 3D model reasoning?</li>
<li>How do we debug or test agents when output isn’t just text but spatial behavior?</li>
<li>What kind of observability and interfaces do these systems need?</li>
</ul>
Spatial modeling is about more than realism; it’s about controllability. Whether you’re a designer placing objects in a scene or a robot navigating a room, spatial reasoning gives you consistent primitives to build on.
<h2 class="wp-block-heading">Institutions, Ecosystems, and the Long View</h2>
Fei-Fei also emphasized that technology doesn’t evolve in a vacuum. It emerges from ecosystems: funding systems, research labs, open source communities, and public education.
She’s concerned that AI progress has accelerated far beyond public understanding—and that most national conversations are either alarmist or extractive. Her call: Don’t just focus on models. Focus on building robust public infrastructure around AI that includes universities, startups, civil society, and transparent regulation.
<a href="https://high-signal.delphina.ai/episode/tim-oreilly-on-the-end-of-programming-as-we-know-it" target="_blank" rel="noreferrer noopener">This mirrors something Tim O’Reilly told us in another episode</a>: that fears about “AI taking jobs” often miss the point. The Industrial Revolution didn’t eliminate work—it redefined tasks, shifted skills, and massively increased the demand for builders. With AI, the challenge isn’t disappearance. It’s transition. We need new metaphors for productivity, new educational models, and new ways of organizing technical labor.
Fei-Fei shares that long view. She’s not trying to chase benchmarks; she’s trying to shape institutions that can adapt over time.
<figure class="wp-block-video"><video controls src="https://descriptusercontent.com/published/77a4c971-bd21-4134-a579-c40b69283564/original.mp4"></video></figure>
<h2 class="wp-block-heading">For Builders: What to Pay Attention To</h2>
What should AI practitioners take from all this?
First, don’t assume language is the final interface. The next frontier involves space, sensors, and embodied context.
Second, don’t dismiss human-centeredness as soft. Designing for dignity, context, and coordination is a hard technical problem, one that lives in the architecture, the data, and the feedback loops.
Third, zoom out. What you build today will live inside ecosystems—organizational, social, regulatory. Fei-Fei’s framing is a reminder that it’s our job not just to optimize outputs but to shape systems that hold up over time.
<h2 class="wp-block-heading">Further Viewing/Listening</h2>
<ul class="wp-block-list">
<li><a href="https://high-signal.delphina.ai/episode/fei-fei-on-how-human-centered-ai-actually-gets-built" target="_blank" rel="noreferrer noopener">Fei-Fei Li on How Human-Centered AI Actually Gets Built</a></li>
<li><a href="https://high-signal.delphina.ai/episode/tim-oreilly-on-the-end-of-programming-as-we-know-it" target="_blank" rel="noreferrer noopener">Tim O’Reilly on the End of Programming as We Know It</a></li>
<li><a href="https://high-signal.delphina.ai/episode/next-evolution-of-ai" target="_blank" rel="noreferrer noopener">Michael Jordan on the Next Evolution of AI: Markets, Uncertainty, and Engineering Intelligence at Scale</a></li>
</ul>

]]></content:encoded>
<enclosure url="https://descriptusercontent.com/published/37f30d83-88d3-4b99-9f67-332522cead3c/original.mp4" length="58078170" type="video/mp4" />
<enclosure url="https://descriptusercontent.com/published/77a4c971-bd21-4134-a579-c40b69283564/original.mp4" length="45916337" type="video/mp4" />
</item>
</channel>
</rss>
<!--
Performance optimized by W3 Total Cache. Learn more: https://www.boldgrid.com/w3-total-cache/
Object Caching 249/250 objects using Memcached
Page Caching using Disk: Enhanced (Page is feed)
Minified using Memcached
Database Caching using Memcached
Served from: www.oreilly.com @ 2025-07-11 21:55:53 by W3 Total Cache
-->

If you would like to create a banner that links to this page (i.e. this validation result), do the following:

Download the "valid RSS" banner.
Upload the image to your own server. (This step is important. Please do not link directly to the image on this server.)
Add this HTML to your page (change the image src attribute if necessary):

<a href="http://www.feedvalidator.org/check.cgi?url=https%3A//www.oreilly.com/radar/feed/index.xml"><img src="valid-rss-rogers.png" alt="[Valid RSS]" title="Validate my RSS feed" /></a>

If you would like to create a text link instead, here is the URL you can use:

http://www.feedvalidator.org/check.cgi?url=https%3A//www.oreilly.com/radar/feed/index.xml

Home · About · News · Docs · Terms