Congratulations!

This is a valid Atom 1.0 feed.

Recommendations

This feed is valid, but interoperability with the widest range of feed readers could be improved by implementing the following recommendations.

Source: http://www.mnot.net/blog/index.atom

  1. <?xml version="1.0" encoding="utf-8"?>
  2. <feed xmlns="http://www.w3.org/2005/Atom">
  3.  <title>mark nottingham</title>
  4.  <link rel="alternate" type="text/html" href="https://www.mnot.net/blog/" />
  5.  <link rel="self" type="application/atom+xml" href="https://www.mnot.net/blog/index.atom" />
  6.  <id>tag:www.mnot.net,2010-11-11:/blog//1</id>
  7.  <updated>2024-05-03T01:54:00Z</updated>
  8.  <subtitle></subtitle>
  9.  
  10.  <entry>
  11.    <title>No One Should Have That Much Power</title>
  12.    <link rel="alternate" type="text/html" href="https://www.mnot.net/blog/2024/04/29/power" />
  13.    <id>https://www.mnot.net/blog/2024/04/29/power</id>
  14.    <updated>2024-04-29T00:00:00Z</updated>
  15.    <author>
  16.        <name>Mark Nottingham</name>
  17.        <uri>https://www.mnot.net/personal/</uri>
  18.    </author>
  19.    <summary>It’s a common spy thriller trope. There’s a special key that can unlock something critical – business records, bank vaults, government secrets, nuclear weapons, maybe all of the above, worldwide.</summary>
  20.    
  21. <category term="Internet and Web" />
  22.    
  23.    <content type="html" xml:lang="en" xml:base="https://www.mnot.net/blog/2024/04/29/power">
  24.    <![CDATA[<p class="intro">It’s a common spy thriller trope. There’s a special key that can unlock something critical – business records, bank vaults, government secrets, nuclear weapons, maybe all of the above, worldwide.</p>
  25.  
  26. <p>Our hero has to stop this key from falling into bad people’s hands, or recover it before it’s too late. Perhaps at one point they utter something like the title of this post. You walk out of the theatre two hours later entertained but wondering why someone would be silly enough to create such a powerful artefact.</p>
  27.  
  28. <p>In a surprising move, law enforcement officials are once again <a href="https://www.europol.europa.eu/cms/sites/default/files/documents/EDOC-%231384205-v1-Joint_Declaration_of_the_European_Police_Chiefs.PDF">calling for such a thing to be created</a>. <a href="https://www.theage.com.au/national/australia-news-live-coalition-pushes-compulsory-age-limits-for-social-media-australians-hit-by-tax-surge-20240424-p5fm4z.html?post=p55wk1#p55wk1">Repeatedly</a>.</p>
  29.  
  30. <p>These authorities and their proxies say that they must have access to encrypted communications to keep us safe. They have been doing so for years – at first bluntly, now in a more subtle way. Encryption backdoors aren’t politically viable, so they take pains to say that they don’t want them while at the same time asking for a level of access that cannot be achieved except through backdooring encryption.</p>
  31.  
  32. <p>If you create a way to recover messages sent through a service, that’s a backdoor. If you run some code that evaluates messages on the endpoints and flags them if they meet some criteria, that isn’t an improvement; it’s a backdoor that can be <a href="https://youtu.be/DplqxrH6Xbg?t=3471">abused in myriad ways</a>. Centralising access to encrypted content <a href="https://www.rfc-editor.org/rfc/rfc9518.html">creates unavoidable systemic risks</a>.</p>
  33.  
  34. <p>This means that any such mechanism has to be handled like weapons-grade plutonium: losing control is a disaster of epic (or even existential) proportions. The few national governments that have nuclear capability <a href="https://www.nti.org/analysis/articles/overview-of-the-cns-global-incidents-and-trafficking-database/">struggle greatly to manage that risk</a>; why would we intentionally entrust something so powerful to every government in the world, or potentially even every local police department? Or will it be just a privileged few governments that have access?</p>
  35.  
  36. <p class="hero">The current crop of suggestions seems to concede that governments shouldn’t have direct access. Instead, they want services to backdoor themselves and act as gatekeepers to law enforcement. That’s not an improvement; it’s still centralised, and it makes these companies responsible for any misuse of the data that they have access to, requiring everyone on the planet to trust a few big tech companies with our private and most intimate conversations – hardly a direction that society wants to take in 2024. ‘Trust me, I’m in charge’ is a poor model of governance or security.</p>
  37.  
  38. <p>These ‘solutions’ also ignore the reality that the ‘bad guys’ will just use other tools to communicate; information is information. That will leave law-abiding people giving up their privacy and security for little societal gain.</p>
  39.  
  40. <p class="hero">Law enforcement has more power than ever before because of digital technology. They are able to collect, process, summarise and track much more efficiently and at much greater scale. Genuinely new insights and capabilities are possible. So, when they want access to encrypted data because things have ‘gone dark’, it’s reasonable to ask ‘as compared to what?’</p>
  41.  
  42. <p>No one should have that much power, because messaging and other encrypted services have become people’s memories, their casual hallway chats, their intimate whispers. Yes, there is longstanding legal precedent for searching someone’s papers and home, but the barriers to doing so are considerable – not just those imposed by law, but also <em>physics</em>. There are few such inherent limits on a key that can trivially enable access to what amounts to anyone’s mind or identify anyone who thinks about a particular topic. Law enforcement struggles to solve real and serious problems, but the power they’re asking for is too vast and too easily misused, and they are failing to appreciate how it would operate on a global Internet.</p>
  43.  
  44. <p>One of the assumptions built into these calls is that if the tech community would <em><a href="https://www.eff.org/deeplinks/2023/07/uk-government-very-close-eroding-encryption-worldwide">just nerd harder</a></em>, a solution could somehow magically be found that preserved privacy and security while letting the ‘good guys’ have access. With all respect to the valuable work that law enforcement does to protect society, it’s equally valid to ask them to <em>just police harder</em>.</p>]]>
  45.    </content>
  46.  </entry>
  47.  
  48.  <entry>
  49.    <title>Considerations for AI Opt-Out</title>
  50.    <link rel="alternate" type="text/html" href="https://www.mnot.net/blog/2024/04/21/ai-control" />
  51.    <id>https://www.mnot.net/blog/2024/04/21/ai-control</id>
  52.    <updated>2024-04-21T00:00:00Z</updated>
  53.    <author>
  54.        <name>Mark Nottingham</name>
  55.        <uri>https://www.mnot.net/personal/</uri>
  56.    </author>
  57.    <summary>Creating a Large Language Model (LLM) requires a lot of content – as implied by the name, LLMs need voluminous input data to be able to function well. Much of that content comes from the Internet, and early models have been seeded by crawling the whole Web.</summary>
  58.    
  59. <category term="Tech Regulation" />
  60.    
  61.    <content type="html" xml:lang="en" xml:base="https://www.mnot.net/blog/2024/04/21/ai-control">
  62.    <![CDATA[<p class="intro">Creating a Large Language Model (LLM) requires a <em>lot</em> of content – as implied by the name, LLMs need <a href="https://www.nytimes.com/2024/04/06/technology/tech-giants-harvest-data-artificial-intelligence.html">voluminous input data to be able to function well</a>. Much of that content comes from the Internet, and early models have been seeded by crawling the whole Web.</p>
  63.  
  64. <p>This now widespread practice of ingestion without consent is contentious, to put it mildly. Content creators feel that they should be compensated or at least have a choice about how their content is used; AI advocates caution that without easy access to input data, their ability to innovate will be severely limited, thereby curtailing the promised benefits of AI.</p>
  65.  
  66. <h3 id="the-policy-context">The Policy Context</h3>
  67.  
  68. <p>In the US, the Copyright Office has launched <a href="https://copyright.gov/ai">an initiative</a> to examine this and other issues surrounding copyright and AI. So far, they have avoided addressing the ingestion issue, but nevertheless it has come up repeatedly in their <a href="https://copyright.gov/ai/listening-sessions.html">public proceedings</a>:</p>
  69. <blockquote>
  70.  <p>“The interests of those using copyrighted materials for AI ingestion purposes must not be prioritized over the rights and interests of creators and copyright owners.” – <em>Keith Kupferschmid, Copyright Alliance</em></p>
  71. </blockquote>
  72.  
  73. <blockquote>
  74.  <p>“Training of AI language models begins with copying, which we believe has infringed our copyrights and has already deprived us of hundreds of millions of dollars in rightful revenues.  The additional violation of our moral right of attribution makes it impossible to tell which of our works have been copied to train AI and thus frustrates redress for either the economic infringement or the violation of our moral right to object to use of our work to train AI to generate prejudicial content. […] OpenAI, for example, has received a billion dollars in venture capital, none of which has been passed on to the authors of the training corpus even though, without that training corpus, chatGPT would be worthless.” – <em>Edward Hasbrouck, National Writers Union</em></p>
  75. </blockquote>
  76.  
  77. <p>It’s uncertain when (or if) the Copyright Office will provide more clarity on this issue. Also relevant in the US are the outcomes of cases like <a href="https://www.courtlistener.com/docket/66788385/getty-images-us-inc-v-stability-ai-inc/">Getty Images (US), Inc. v. Stability AI, Inc.</a></p>
  78.  
  79. <p>However, Europe has been more definitive about the ingestion issue. <a href="https://eur-lex.europa.eu/eli/dir/2019/790/oj">Directive 2019/790</a> says:</p>
  80. <blockquote>
  81.  <p>The [exception for copyright] shall apply on condition that the use of works and other subject matter referred to in that paragraph has not been expressly reserved by their rightholders in an appropriate manner, such as machine-readable means in the case of content made publicly available online.<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></p>
  82. </blockquote>
  83.  
  84. <p>This is reinforced by the <a href="https://www.europarl.europa.eu/news/en/press-room/20240308IPR19015/artificial-intelligence-act-meps-adopt-landmark-law">recently</a> adopted <a href="https://www.europarl.europa.eu/doceo/document/A-9-2023-0188-AM-808-808_EN.pdf">AI Act</a>:</p>
  85. <blockquote>
  86.  <p>Any use of copyright protected content requires the authorisation of the rightsholder concerned unless relevant copyright exceptions and limitations apply. Directive (EU) 2019/790 introduced exceptions and limitations allowing reproductions and extractions of works or other subject matter, for the purpose of text and data mining, under certain conditions. Under these rules, rightsholders may choose to reserve their rights over their works or other subject matter to prevent text and data mining, unless this is done for the purposes of scientific research. Where the rights to opt out has been expressly reserved in an appropriate manner, providers of general-purpose AI models need to obtain an authorisation from rightsholders if they want to carry out text and data mining over such works.</p>
  87. </blockquote>
  88.  
  89. <p>In other words, European law is about to require commercial AI crawlers to support an opt-out. However, it does not specify a particular mechanism: it only says that rights must be ‘expressly reserved in an appropriate manner.’</p>
  90.  
  91. <p>So, what might that opt-out signal look like?</p>
  92.  
  93. <h3 id="robotstxt-as-an-opt-out">Robots.txt as an Opt-Out</h3>
  94.  
  95. <p>Since most of the publicly available content on the Internet is accessed over the Web, it makes sense to consider how an opt-out might be expressed there as a primary mechanism. The Web already has a way for sites to opt out of automated crawling: the <code class="language-plaintext highlighter-rouge">robots.txt</code> file, now specified by an <a href="https://www.rfc-editor.org/rfc/rfc9309.html">IETF Standards-Track RFC</a>.</p>
  96.  
  97. <p>At first glance, robots.txt intuitively maps to what’s required: a way to instruct automated crawlers on how to treat a site with some amount of granularity, including opting out of crawling altogether. Some LLM operators have already latched onto it; for example, OpenAI <a href="https://platform.openai.com/docs/gptbot">allows their crawler to be controlled by it</a>.</p>
  98.  
  99. <p>There are a lot of similarities between gathering Web content for search and gathering it for an LLM: the actual crawler software is very similar (if not identical), crawling the whole Web requires significant resources, and both uses create enormous potential value not only for the operators of the crawlers, but also for society.</p>
  100.  
  101. <p>However, it is questionable whether merely reusing robots.txt as the opt-out mechanism is sufficient to allow rightsholders to fully express their reservation. Despite the similarities listed above, it is hard to ignore the ways that LLM ingest is different.</p>
  102.  
  103. <p>That’s because Web search can be seen as a service to sites; it makes them more discoverable on the Web, and is thus symbiotic – both parties benefit. LLM crawling, on the other hand, offers no corresponding benefit to the content owner, and may be perceived as harming them.</p>
  104.  
  105. <p>Through the lenses of those different purposes and their associated power dynamics, a few issues become apparent.</p>
  106.  
  107. <h3 id="1-usability-and-ecosystem-impact">1. Usability and Ecosystem Impact</h3>
  108.  
  109. <p>Robots.txt allows sites to target directives to bots in two different ways: by path on the site (e.g., <code class="language-plaintext highlighter-rouge">/images</code> vs. <code class="language-plaintext highlighter-rouge">/users</code>) and by User-Agent. The User-Agent identifies the bot, allowing sites to specify things like “I allow Google to crawl my site, but not Bing.” Or, “I don’t allow any bots.”</p>
  110.  
  111. <p>That might be adequate for controlling how your site appears in search engines, but problematic when applied to AI. Let’s look at an example.</p>
  112.  
  113. <p>To stop OpenAI from crawling your site, you can add:</p>
  114.  
  115. <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>User-Agent: GPTBot
  116. Disallow: /
  117. </code></pre></div></div>
  118.  
  119. <p>However, that directive doesn’t apply to Google, Mistral, or any other LLM-in-waiting out there; you’d have to target each individual one (and some folks are <a href="https://www.20i.com/blog/how-to-prevent-ai-from-scraping-your-website/">already advising on how to do that</a>).</p>
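The per-crawler matching described above can be exercised with Python’s standard-library robots.txt parser. This is a minimal sketch – the rules and URLs are illustrative, and ‘Google-Extended’ simply stands in for any crawler the site forgot to list:

```python
from urllib.robotparser import RobotFileParser

# A policy that blocks only OpenAI's crawler, as in the example above.
rules = [
    "User-Agent: GPTBot",
    "Disallow: /",
]

rp = RobotFileParser()
rp.parse(rules)

# GPTBot is disallowed everywhere on the site...
print(rp.can_fetch("GPTBot", "https://example.com/post"))          # False
# ...but any crawler not named in the file is allowed by default.
print(rp.can_fetch("Google-Extended", "https://example.com/post"))  # True
```

Because matching is per-User-Agent, an unlisted crawler sails through – which is exactly the enumeration problem described above.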
  120.  
  121. <p>If you miss one, that’s your fault, and it’ll be in that model forever, so careful (or just frustrated) people might decide to just ban everything:</p>
  122.  
  123. <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>User-Agent: *
  124. Disallow: /
  125. </code></pre></div></div>
  126.  
  127. <p>But that has the downside of disallowing AI <em>and</em> search crawlers – even though presence in search engines is often critical to sites. To avoid that, you would have to enumerate all of the search engines and other bots that you want to allow, creating more work.</p>
  128.  
  129. <p>Significantly, doing so could also have a negative effect on the Web ecosystem: if sites have a stronger incentive to disallow unknown bots thanks to AI, it would be much harder to responsibly introduce new crawler-based services to the Web. That would tilt the table even further in favour of already-established ‘big tech’ actors.</p>
  130.  
  131. <p>There are two easy ways to fix these issues. One would be to define a special User-Agent that applies to <em>all</em> AI crawlers. For example:</p>
  132.  
  133. <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>User-Agent: AI-Ingest
  134. Disallow: /
  135. </code></pre></div></div>
  136.  
  137. <p>The other approach would be to create a new <a href="https://www.rfc-editor.org/rfc/rfc8615.html">well-known location</a> just for AI – for example <code class="language-plaintext highlighter-rouge">/.well-known/ai.txt</code>. That file might have the same syntax as <code class="language-plaintext highlighter-rouge">robots.txt</code>, or its notoriously quirky syntax could be ditched for something more modern.</p>
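For concreteness, a hypothetical <code class="language-plaintext highlighter-rouge">/.well-known/ai.txt</code> might look like the following – the filename and the choice to keep the robots.txt grammar are assumptions here, since no such file has been standardised:

```text
# Applies only to AI/LLM ingestion, not to search crawling.
User-Agent: *
Disallow: /
Allow: /press/
```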
  138.  
  139. <p>Either solution above would make it easy for a site to opt out of AI crawling of any sort without enumerating all of the potential AI crawlers in the world, and without impacting their search engine coverage or creating ecosystem risk.</p>
  140.  
  141. <p>I suspect that many have been assuming that one of these things will happen; they’re fairly obvious evolutions of existing practice. However, at least two more issues are still unaddressed.</p>
  142.  
  143. <h3 id="2-previously-crawled-content">2. Previously Crawled Content</h3>
  144.  
  145. <p>Web search and LLMs also differ in how they relate to time.</p>
  146.  
  147. <p>A search engine crawler has a strong interest in assuring that its index reflects the <em>current</em> Web. LLM crawlers, on the other hand, are ravenous for content without regard to its age or current availability on the Web. Once ingested, content forms part of a model and adds value to that model for the lifetime of its use – and the model often persists for months or even years after the content was obtained. Furthermore, that content might be reused to create future models, indefinitely.</p>
  148.  
  149. <p>That means that a content owner who isn’t aware of the LLM crawler <em>at crawl time</em> doesn’t have any recourse. From the Copyright Office sessions:</p>
  150.  
  151. <blockquote>
  152.  <p>We believe that writers should be compensated also for past training since it appears that the massive training that has already occurred for GPT and Bard to teach the engines to think and to write has already occurred[.] – <em>Mary Rasenberger, The Authors Guild</em></p>
  153. </blockquote>
  154.  
  155. <p>This shortcoming could be addressed by a relatively simple measure: stating that the policy for a given URL applies to any use of content obtained from that URL at model creation time, <em>regardless of when it was obtained</em>.</p>
  156.  
  157. <p>A significant amount of detail would need to be specified to make this work, of course. It would also likely necessitate some sort of grandfathering or transition period for existing models.</p>
  158.  
  159. <p>Needless to say, the impact of this kind of change <em>could</em> be massive: if 90% of the sites in the world opt out in this fashion (à la <a href="https://www.theverge.com/2021/10/31/22756135/apple-app-tracking-transparency-policy-snapchat-facebook-twitter-youtube-lose-10-billion">App Tracking Transparency</a>), it would be difficult to legally construct a new model (or at least market or use such a model in Europe, under the forthcoming rules).</p>
  160.  
  161. <p>On the other hand, if that many people don’t want to allow LLMs to use their content when offered a genuine chance to control it, shouldn’t their rights be honoured? Ultimately, if that’s the outcome, society will need to go back to the drawing board and figure out what it values more: copyright interests or the development of LLMs.</p>
  162.  
  163. <h3 id="3-control-of-metadata">3. Control of Metadata</h3>
  164.  
  165. <p>Another issue with reusing robots.txt is how that file itself is controlled. As a site-wide metadata mechanism, there is only one controller for robots.txt: the site administrator.</p>
  166.  
  167. <p>That means that on Facebook, Meta will decide whether your photos can be used to feed AI (theirs or others’), not you. On GitHub, Microsoft will decide how your repositories will be treated. And so on.</p>
  168.  
  169. <p>While robots.txt is great for single-owner sites (like this one), it doesn’t meet the needs of a concentrated world – it hands a small number of platform owners the power to decide policy for all of their users.</p>
  170.  
  171. <p>Avoiding that outcome means that users need to be able to express their preference in the content itself, so that it persists no matter where it ends up. That means it’s necessary to be able to embed policy in things like images, videos, audio files, document formats like PDF, Office, and ePub, containers like ZIP files, file system paths for things like git repos, and so on. Assuming that a robots.txt-like approach is also defined, their relative precedence will also need to be specified.</p>
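As a sketch of the embedded-in-content approach, here is one way a preference could travel inside a ZIP container using Python’s standard library – the <code class="language-plaintext highlighter-rouge">ai-ingest: disallow</code> label is an invented convention for illustration, not any standardised vocabulary:

```python
import io
import zipfile

buf = io.BytesIO()

# Write an archive and attach the (hypothetical) policy as the
# archive-level comment, so it travels with the file wherever it goes.
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("chapter1.txt", "Some copyrighted prose.")
    zf.comment = b"ai-ingest: disallow"  # invented label, for illustration

# Any consumer can read the preference back before ingesting.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    print(zf.comment.decode())
```

Real deployments would more likely use each format’s native metadata fields (EXIF, XMP, and so on); the point is only that the signal rides with the content rather than with the site.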
  172.  
  173. <p>Luckily, this is not a new requirement – our industry has considerable experience in embedding such metadata into file formats, for use cases like <a href="https://c2pa.org/specifications/specifications/2.0/specs/C2PA_Specification.html">content provenance</a>. It just needs to be specified for AI control.</p>
  174.  
  175. <h3 id="whats-next">What’s Next?</h3>
  176.  
  177. <p>Policy decisions like that just made by Europe might be the drivers of change in LLM ingest practices, but I hope I’ve shown that the technical details of that ‘appropriate manner’ of opting out can significantly shift the balance of power between AI companies and content owners.</p>
  178.  
  179. <p>Notably, while the worldwide copyright regime is explicitly opt-in (i.e., you have to explicitly offer a license for someone to legally use your material, unless fair use applies), the European legislation changes this to opt-out for AI.<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> Given that, offering content owners a genuine opportunity to do so is important, in my opinion.</p>
  180.  
  181. <p>I’ve touched on a few aspects that influence that opportunity above; I’m sure there are more.<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> As I implied at the start, getting the balance right is going to take careful consideration and perhaps most importantly, sunlight.</p>
  182.  
  183. <p>However, it’s not yet clear where or how this work will happen. Notably, the <a href="https://ec.europa.eu/transparency/documents-register/detail?ref=C(2023)3215&amp;lang=en">standardisation request to the European Standardisation Organisations in support of safe and trustworthy artificial intelligence</a> does not mention copyright at all. Personally, I think that’s a good thing – worldwide standards need to be developed in open international standards bodies like the IETF, not regionally fragmented.</p>
  184.  
  185. <p>In that spirit, the IETF has recently created a <a href="https://www.ietf.org/mailman/listinfo/ai-control">mailing list to discuss AI control</a>. That’s likely the best place to follow up if you’re interested in discussing these topics.</p>
  186.  
  187. <div class="footnotes" role="doc-endnotes">
  188.  <ol>
  189.    <li id="fn:1" role="doc-endnote">
  190.      <p>See also <a href="https://eur-lex.europa.eu/eli/dir/2019/790/oj#rct_18">Recital 18</a>. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
  191.    </li>
  192.    <li id="fn:2" role="doc-endnote">
  193.      <p>And I suspect other jurisdictions might follow the same approach; time will tell. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
  194.    </li>
  195.    <li id="fn:3" role="doc-endnote">
  196.      <p>For example, some of the input to the Copyright Office mentioned group licensing regimes. An opt-out mechanism could be adapted to support that. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
  197.    </li>
  198.  </ol>
  199. </div>]]>
  200.    </content>
  201.  </entry>
  202.  
  203.  <entry>
  204.    <title>There Are No Standards Police</title>
  205.    <link rel="alternate" type="text/html" href="https://www.mnot.net/blog/2024/03/13/voluntary" />
  206.    <id>https://www.mnot.net/blog/2024/03/13/voluntary</id>
  207.    <updated>2024-03-13T00:00:00Z</updated>
  208.    <author>
  209.        <name>Mark Nottingham</name>
  210.        <uri>https://www.mnot.net/personal/</uri>
  211.    </author>
  212.    <summary>It happens fairly often. Someone brings a proposal to a technical standards body like the IETF and expects that just because it becomes an RFC, people will adopt it. Or they’ll come across a requirement in an RFC and expect it to be enforced, perhaps with some kind of punishment. Or they’ll get angry that people don’t pay attention to an existing standard and do their own thing. This is so common that there’s a ready response widely used by IETF people in these situations:</summary>
  213.    
  214. <category term="Standards" />
  215.    
  216. <category term="Tech Regulation" />
  217.    
  218.    <content type="html" xml:lang="en" xml:base="https://www.mnot.net/blog/2024/03/13/voluntary">
  219.    <![CDATA[<p class="intro">It happens fairly often. Someone brings a proposal to a technical standards body like the IETF and expects that just because it becomes an RFC, people will adopt it. Or they’ll come across a requirement in an RFC and expect it to be enforced, perhaps with some kind of punishment. Or they’ll get angry that people don’t pay attention to an existing standard and do their own thing. This is so common that there’s a ready response widely used by IETF people in these situations:</p>
  220.  
  221. <p class="intro">“There are no standards police.”</p>
  222.  
  223. <p>In other words, even if you do consider Internet standards to be <a href="https://www.mnot.net/blog/2023/11/01/regulators">a regulatory force</a>, there is no <em>enforcement mechanism</em>. One of their key characteristics is that they’re <strong>voluntary</strong>. No one forces you to adopt them. No one can penalise you for violating a MUST; you have to want to conform.</p>
  224.  
  225. <p>Of course, you can still <em>feel</em> compelled to do so. If an interoperability standard gets broad adoption and everyone you want to communicate with expects you to honour it, you don’t have many options. For example, if you want to have a Web site, you need to interoperate with browsers; most of the time, they write down what they do in standards documents, and so you’ll need to conform to them.</p>
  226.  
  227. <p>But that’s the successful path. For every HTTP or HTML or TCP, there are hundreds of IETF RFCs, W3C Recommendations, and other standards documents that haven’t caught on – presumably much to their authors’ dismay. Adopting and using those documents was optional, and the market spoke: there wasn’t interest.</p>
  228.  
  229. <p class="hero">This aspect of the Internet’s standards has been critical to its success. If people were forced to adopt a specification just because some body had blessed it, it would place immense pressure on whatever process was used to create it. The stakes would be high because the future of the Internet would be on the line: businesses would play dirty; trolls would try to subvert the outcomes; governments would try to steer the results.</p>
  230.  
  231. <p>Of course, all of those things already happen in Internet standards; it’s just that the stakes are much lower.</p>
  232.  
  233. <p>So, voluntary adoption is a <em>proving function</em> – it means that not all of the weight of getting things right is on the standardisation process, and that process can be lighter than, for example, that used by governments or the United Nations (I’ll get back to that in a minute). That’s important, because it turns out that it’s already incredibly difficult to create useful, successful, secure, private, performant, scalable, architecturally aligned technical specifications that change how the Internet works within all of the other natural constraints encountered; it’s threading-the-needle kind of stuff. And we need to be able to fail.</p>
  234.  
  235. <p class="hero">Historically, voluntary standards have been encouraged by governments in their purchasing and competition policies – for example, <a href="https://www.federalregister.gov/documents/2016/01/27/2016-01606/revision-of-omb-circular-no-a-119-federal-participation-in-the-development-and-use-of-voluntary">OMB Circular A-119</a>, <a href="https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32012R1025">EU Regulation 1025/2012</a>, and the <a href="https://competition-policy.ec.europa.eu/system/files/2023-07/2023_revised_horizontal_guidelines_en.pdf">EC guidelines on horizontal agreements</a>. Standards bodies are a ‘safe space’ where competitors can cooperate without risking competition enforcement, so long as they follow a set of rules – and one of the biggest rules is that adoption should be voluntary, not mandatory or coerced (at least by those setting the standard).</p>
  236.  
  237. <p>But it’s no secret that the policy landscape for the Internet has changed drastically. Now, there is increasing interest in using interoperability standards as a mechanism to steer the Internet. Academics are <a href="https://hackcur.io/whats-wrong-with-loud-men-talking-loudly-the-ietfs-culture-wars/">diving deep into the cultures and mechanisms of technical standards</a>. Civil society folks are coming to technical standards bodies and <a href="https://datatracker.ietf.org/rg/hrpc/about/">trying to figure out how to incorporate human rights goals</a>.  <a href="https://themarket.ch/interview/tech-is-becoming-a-regulated-industry-ld.7520">Regulation is coming</a>, and policy experts are <a href="https://datatracker.ietf.org/doc/draft-hoffmann-gendispatch-policy-stakeholders/">trying to figure out how to get involved too</a>.</p>
  238.  
  239. <p>This influx has caused concern that these relative newcomers are mistakenly focusing on standards as a locus of power when, in fact, the power is expressed in the <em>adoption</em> of a standardised technology. For example, Geoff Huston recently wrote an <a href="https://blog.apnic.net/2024/03/06/opinion-digital-sovereignty-and-standards/">opinion piece</a> along these lines.</p>
  240.  
  241. <p>I have no doubt that some still come to the IETF and similar bodies with such misapprehensions; we still have to remind people that ‘there are no standards police’ on a regular basis. However, I suspect that at least the policy people (including regulators) largely understand that it’s not that simple.</p>
  242.  
  243. <p>That’s because modern regulators are very aware that there are many influences on a regulatory space. They want to learn about the other forces acting on their target, as well as <a href="http://johnbraithwaite.com/responsive-regulation/">persuade and inform</a>. Similarly, those who are involved in policymaking are intensely aware of the diffuse nature of power. In short, their world view is more sophisticated than people give them credit for.</p>
  244.  
  245. <p>(All that said, I’m still interested and a bit nervous to see what <a href="https://www.un.org/techenvoy/global-digital-compact">Global Digital Compact</a> contains when it becomes public.)</p>
  246.  
  247. <p class="hero">Another concern is that governments might try to influence Internet standards to suit their purposes, and then exert pressure to make the results mandatory – short-circuiting the proving function of voluntary standards.</p>
  248.  
  249. <p>Avoiding that requires separating the legal requirement from the standards effort, to give the latter a chance to fail. For example, <a href="https://datatracker.ietf.org/group/mimi/about/">MIMI</a> may or may not succeed in satisfying the DMA requirement for messaging interop. It is an attempt to establish voluntary standards that, if successful in the market, could satisfy legal regulatory requirements without a regulator preselecting the standards venue.</p>
  250.  
  251. <p>Of course, that pattern is not new – for example, <a href="https://www.w3.org/WAI/">accessibility work in the W3C</a> is the basis of many regulatory requirements now, but wasn’t considered (AFAIK) by regulators until many years after its establishment.</p>
  252.  
  253. <p>Because of the newly intense focus on regulating technology, there’s likely to be increasing pressure on such efforts: both the pace and volume of standardisation will need to increase to meet the requirements that standards bodies want to address. I suspect that aligning the timelines and risk appetites of standards bodies and regulators will be among the biggest challenges we’ll face if we want more successes.</p>
  254.  
  255. <p>So right now I believe the best way forward is to create ‘rails’ for interactions with legal regulators – e.g., improved communication, aligned expectations, and ways for an effort to be declined or to fail without disastrous consequences. Doing that will require some capacity building on the parts of standards bodies, but no fundamental changes to their models or decision-making processes.</p>
  256.  
  257. <p>This approach will not address everything. There are some areas where at least some regulators and the Internet standards community are unlikely to agree. Standards-based interoperability may not be realistically achievable in some instances, because of how entrenched a proprietary solution is. Decentralising a proprietary solution can face <a href="https://www.rfc-editor.org/rfc/rfc9518.html">many pitfalls</a>, and may be completely at odds with a centralized solution that already has broad adoption. And, most fundamentally, parties that are not inclined to cooperate can easily subvert a voluntary consensus process.</p>
  258.  
  259. <p>However, if things are arranged so that conforming to a voluntary consensus standard that has seen wide review and market adoption is considered <em>prima facie</em> evidence of conformance to a regulatory requirement, perhaps we <em>do</em> sometimes have standards police – in the sense that legal requirements can be used to help kickstart standards-based interoperability where it otherwise wouldn’t get a chance to form.</p>]]>
  260.    </content>
  261.  </entry>
  262.  
  263.  <entry>
  264.    <title>RFC 9518 - What Can Internet Standards Do About Centralisation?</title>
  265.    <link rel="alternate" type="text/html" href="https://www.mnot.net/blog/2023/12/19/standards-and-centralization" />
  266.    <id>https://www.mnot.net/blog/2023/12/19/standards-and-centralization</id>
  267.    <updated>2023-12-19T00:00:00Z</updated>
  268.    <author>
  269.        <name>Mark Nottingham</name>
  270.        <uri>https://www.mnot.net/personal/</uri>
  271.    </author>
  272.    <summary>RFC 9518: Centralization, Decentralization, and Internet Standards has been published after more than two years of review, discussion, and revision.</summary>
  273.    
  274. <category term="Internet and Web" />
  275.    
  276. <category term="Standards" />
  277.    
  278. <category term="Tech Regulation" />
  279.    
  280.    <content type="html" xml:lang="en" xml:base="https://www.mnot.net/blog/2023/12/19/standards-and-centralization">
  281.    <![CDATA[<p class="intro"><a href="https://www.rfc-editor.org/rfc/rfc9518.html">RFC 9518: Centralization, Decentralization, and Internet Standards</a> has been published after more than two years of review, discussion, and revision.</p>
  282.  
  283. <p>It’s no secret that many people have been increasingly concerned about Internet centralization over the last decade or so. Having one party (or a small number of them) with a choke hold over any important part of the Internet is counter to its nature: as a ‘network of networks’, the Internet is about fostering relationships between <em>peers</em>, not allowing power to accrue to a few.</p>
  284.  
  285. <p>As I’ve <a href="/blog/2023/11/01/regulators">discussed previously</a>, Internet standards bodies like the <a href="https://www.ietf.org/">IETF</a> and <a href="https://www.w3.org/">W3C</a> can be seen as a kind of regulator, in that they constrain the behaviour of others. So it’s natural to wonder whether they can help avoid or mitigate Internet centralization.</p>
  286.  
  287. <p>I started drafting a document that explored these issues when I was a member of the <a href="https://iab.org/">Internet Architecture Board</a>. That eventually became <a href="https://datatracker.ietf.org/doc/draft-nottingham-avoiding-internet-centralization/">draft-nottingham-avoiding-internet-centralization</a>, which became an Independent Stream RFC today.</p>
  288.  
  289. <p>But it was a long journey. I started this work optimistic that standards could make a difference, in part because Internet standards bodies are (among many things) communities of people who are deeply invested in the success of the Internet, with a set of shared <a href="https://www.rfc-editor.org/rfc/rfc8890.html">end user-focused</a> values.</p>
  290.  
  291. <p>That optimism was quickly tempered. After digging into the mechanisms that we have available, the way that the markets work, and the incentives on the various actors, it became apparent that it was unrealistic to expect that standards documents – which of course don’t have any intrinsic power or authority if no one implements them – are up to the task of controlling centralization.</p>
  292.  
  293. <p>Furthermore, centralization is inherently difficult to eradicate: while you can reduce or remove some forms of it, it has a habit of popping up elsewhere.</p>
  294.  
  295. <p>That doesn’t mean that standards bodies should ignore centralization, or that there isn’t anything they can do to improve the state of the world regarding it (the RFC explores several); rather, that we should not expect standards to be sufficient to effectively address it on their own.</p>
  296.  
  297. <p>You can read <a href="https://www.rfc-editor.org/rfc/rfc9518.html">the RFC</a> for the full details. It covers what centralization is, how it can be both beneficial and harmful, the decentralization strategies we typically use to control it, and finally what Internet standards bodies can do.</p>
  298.  
  299. <p>One final note: I’d be much less satisfied with the result if I hadn’t had the excellent reviews that Eliot Lear (the Independent Submissions Editor) sourced from <a href="https://www.apnic.net/about-apnic/team/geoff-huston/">Geoff Huston</a> and <a href="https://www.internetgovernance.org/people/milton-mueller/">Milton Mueller</a>. Many thanks to them and everyone else who contributed.</p>]]>
  300.    </content>
  301.  </entry>
  302.  
  303.  <entry>
  304.    <title>How to Run an Australian Web Site in 2024</title>
  305.    <link rel="alternate" type="text/html" href="https://www.mnot.net/blog/2023/11/27/esafety-industry-standards" />
  306.    <id>https://www.mnot.net/blog/2023/11/27/esafety-industry-standards</id>
  307.    <updated>2023-11-27T00:00:00Z</updated>
  308.    <author>
  309.        <name>Mark Nottingham</name>
  310.        <uri>https://www.mnot.net/personal/</uri>
  311.    </author>
  312.    <summary>A while back, the eSafety Commissioner declined to register the proposed Industry Codes that I’ve previously written about. Now, they’ve announced a set of Industry Standards that, after a comment period, will likely be law.</summary>
  313.    
  314. <category term="Australia" />
  315.    
  316.    <content type="html" xml:lang="en" xml:base="https://www.mnot.net/blog/2023/11/27/esafety-industry-standards">
  317.    <![CDATA[<p class="intro">A while back, the eSafety Commissioner declined to register the proposed Industry Codes that <a href="https://www.mnot.net/blog/2022/09/11/esafety-industry-codes">I’ve previously written about</a>. Now, they’ve announced a set of <a href="https://www.esafety.gov.au/industry/codes/standards-consultation">Industry Standards</a> that, after a comment period, will likely be law.</p>
  318.  
  319. <p>If you run an online service that’s accessible to Australians, these Standards will apply to you. Of course, if you don’t live here, don’t do business here, and don’t want to come here, you can <em>probably</em> ignore them.</p>
  320.  
  321. <p>Assuming you do fall into one of those buckets, this post tries to walk through the implications, as a list of questions you’ll need to ask yourself.</p>
  322.  
  323. <p>I’m going to try to focus on the practical implications, rather than “showing my work” by deep-diving into the text of the standards and <a href="https://www.legislation.gov.au/Details/C2022C00052">supporting legislation</a>. This is based only upon my reading of the documents and a minuscule dollop of legal education; if there are things that I get wrong, corrections and suggestions are gladly taken. Note that this is not legal advice, and the Standards might change before they’re registered.</p>
  324.  
  325. <h3 id="does-the-standard-apply-to-your-service">Does the Standard Apply to Your Service?</h3>
  326. <p>The first question to answer is whether your service is covered by the <a href="https://www.esafety.gov.au/sites/default/files/2023-11/Draft%20Online%20Safety%20%28Designated%20Internet%20Services-Class%201A%20and%20Class%201B%20Material%29%20Industry%20Standard%202024.pdf">Online Safety (Designated Internet Services – Class 1A and Class 1B Material) Industry Standards 2024</a>.</p>
  327.  
  328. <p>The short answer is “yes, even <em>that</em> one.”</p>
  329.  
  330. <p>A Designated Internet Service (DIS) is one that allows “end-users to access material using an Internet carriage service.” This is a very broad definition that explicitly applies to Web sites. For simplicity, the remainder of this article will assume your service is a Web site, even though other information services can be a DIS.</p>
  331.  
  332. <p>In a nutshell, if “none of the material on the service is accessible to, or delivered to, one or more end-users in Australia”, your site is exempt. Otherwise, it’s covered (unless one of the other Codes or Standards takes precedence; see below).</p>
  333.  
  334. <p>So whether you’re Elon Musk or you have a personal Web site with no traffic, this standard applies to you, so long as it’s available to one Australian person – even if none actually visit. Don’t be fooled by “Industry” in the title. That default page that your Web server comes up with when your new Linux box boots for the first time? Covered. Note that it doesn’t even need to be on the <em>public</em> Internet; things like corporate Intranet sites are covered, as are content-free static sites like those used to park domains.</p>
  335.  
  336. <p>Given how broadly the legislation and standard are written, combined with how prevalent HTTP and similar protocols are on today’s Internet, it’s also reasonable to say that APIs are covered; there are no inherent restrictions on formats or protocols in the eSafety standards – in fact, the definition of <em>material</em> in the Act includes “data”.</p>
  337.  
  338. <p>So, to be safe, <em>any</em> server available on the Internet is covered by the eSafety scheme, so long as it <em>can</em> be accessed by Australians.</p>
  339.  
  340. <h3 id="do-you-need-a-risk-assessment">Do You Need a Risk Assessment?</h3>
  341. <p>Assuming that your site is covered by the Standard, your next step is to figure out whether you need to perform a risk assessment.</p>
  342.  
  343. <p>Assuming that you’re not running a large commercial web site, a (ahem) “high impact” service (i.e., one that specialises in porn, violent content, and similar), or an AI-flavoured service, there are two interesting categories that might get you out of performing a risk assessment.</p>
  344.  
  345. <p>The first is a “pre-assessed general purpose DIS.” You can qualify for this if you don’t allow users in Australia to post any material (including comments), or if posting is “to review or provide information on products, services, or physical points of interest or locations made available on the service.” It’s also OK if they are “sharing […] with other end-users for a business, informational, or government service or support purpose.”<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></p>
  346.  
  347. <p>Does it seem like your site qualifies? Not so fast; that only covers “pre-assessment.” A <em>general purpose DIS</em> is a</p>
  348.  
  349. <blockquote>
  350.  <p>website or application that […] primarily provides information for business, commerce, charitable, professional, health, reporting news, scientific, educational, academic research, government, public service, emergency, or counselling and support service purposes.</p>
  351. </blockquote>
  352.  
  353. <p>Unless your site falls cleanly into one of those categories, you don’t have a general purpose DIS.<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></p>
  354.  
  355. <p>The second is an “enterprise DIS.” This is a site where “the account holder […] is an organisation (and not an individual).” Basically, if your users are companies or other organisations and not individual people, you don’t have to do an assessment.</p>
  356.  
  357. <h3 id="what-does-your-risk-assessment-contain">What Does Your Risk Assessment Contain?</h3>
  358. <p>Assuming you need a risk assessment (spoiler: you probably do, to be safe), you</p>
  359.  
  360. <blockquote>
  361.  <p> must formulate in writing a plan, and a methodology, for carrying out the assessment that ensure that the risks mentioned in subsection 8(1) in relation to the service are accurately assessed.</p>
  362. </blockquote>
  363.  
  364. <p>The risk referred to is that class 1A or class 1B material will be “generated or accessed by, or distributed by or to, end-users in Australia using the service.” Storage of such material is also included (even if it isn’t accessed).</p>
  365.  
  366. <p>To answer your next question, class 1A material is “child sexual exploitation material”, “pro-terror material”, or “extreme crime and violence material.” Class 1B material is “crime and violence material” and “drug-related material.” There are long definitions of each of these kinds of material in the standard; I won’t repeat them here.</p>
  367.  
  368. <p>Your risk assessment must “undertake a forward-looking analysis” of what’s likely to change both inside and outside of your service, along with the impact of those changes. It’s also required to “specify the principle matters to be taken into account”, including eleven factors such as “the ages of end-users and likely end-users”, “safety by design guidance”, AI risks, terms of use, and so forth.</p>
  369.  
  370. <p>Your risk assessment has to be written down in detail. You must also “ensure that [it] is carried out by persons with the relevant skills, experience, and expertise” – although it’s not yet clear what that means in practice or how it will be enforced.<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup></p>
  371.  
  372. <h3 id="whats-your-risk-profile">What’s Your Risk Profile?</h3>
  373. <p>Once you’ve done a risk assessment, you’ll have a risk profile – one of Tier 1, Tier 2, or Tier 3.</p>
  374.  
  375. <p>Let’s assume your site has no user-generated content, and you only upload very… normal… content – like this site.<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup> You’re likely to be Tier 3.</p>
  376.  
  377. <p>If so, congratulations! Your work is just about done. Sections 34, 40, and 41 of the Standard apply to you – basically, the eSafety Commissioner can demand that you provide them with your risk assessment and how you arrived at it. You also have to investigate complaints, and keep records.</p>
  378.  
  379. <p>If you’re not Tier 3 – for example, you blog about drugs or crime, or you allow user uploads or comments – there’s a whole slew of requirements you’ll need to conform to, which are well out of scope for this blog entry (since I’m mostly interested in the impact of regulation on small, non-commercial sites). Tip: get some professional help, quickly.</p>
  380.  
  381. <h3 id="what-other-standards-will-apply">What Other Standards Will Apply?</h3>
  382. <p>Keep in mind that we’ve gone through just one of the proposed Standards above. The other one is about <a href="https://www.esafety.gov.au/sites/default/files/2023-11/Draft%20Online%20Safety%20%28Relevant%20Electronic%20Services%20-%20Class%201A%20and%20Class%201B%20Material%29%20Industry%20Standard%202024%20_0.pdf">e-mail and chat services</a>, so if you run a mail server (of any flavour – maybe even on your infrastructure?), a chat server (e.g., Prosody, jabberd), or Mastodon server, buckle up.</p>
  383.  
  384. <p>There is also another set of <a href="https://www.esafety.gov.au/industry/codes/register-online-industry-codes-standards">Industry Codes</a> that cover things like hosting services, app stores, social media, search engines, and operating systems, if you happen to provide one of those.</p>
  385.  
  386. <p>Keep in mind that if you change anything on your site that impacts risk (e.g., adding a comment form), you’ll need to re-assess your risk (and likely conform to new requirements for reporting, etc.).</p>
  387.  
  388. <h3 id="what-does-enforcement-look-like">What Does Enforcement Look Like?</h3>
  389. <p>There are a <em>lot</em> of small Internet services out there – there are a lot of IP addresses and ports, after all.  I suspect many people running them will ignore these requirements – either because they don’t know about them, think they’re too small, assume the eSafety Commissioner won’t care about their site, or are willing to run the risk.</p>
  390.  
  391. <p>What <em>is</em> the risk, though?</p>
  392.  
  393. <p>Section 146 of the Online Safety Act 2021 sets the penalty for not complying with an Industry Standard at 500 penalty units – currently, AU$156,500 (a bit more than US$100,000).</p>
  394.  
  395. <p>In practice, the eSafety Commissioner is unlikely to come after any site if its content isn’t problematic in their eyes. Whether you want to rely upon that is up to you. Because the legislation and standard don’t have any exemptions for small services – even with limited audiences – you are relying upon their discretion if you don’t have a risk assessment ready for them.</p>
  396.  
  397. <h3 id="what-do-you-really-think">What Do You Really Think?</h3>
  398. <p>Improving online safety is an important task that needs more focus from society, and I’m proud that Australia is trying to improve things in this area. I’m critical of the eSafety Industry Codes and now Standards not because of their objective, but because of their unintended side effects.</p>
  399.  
  400. <p>Both the enabling instrument and this delegated legislation are written without consideration for the chilling effects and regulatory burden they create on parties that are arguably not their target. Requiring professional risk assessment raises costs for everyone, and creates incentives to just use big tech commercial services rather than self-host – pushing us further toward an Internet run by a few big companies.</p>
  401.  
  402. <p>Moreover, if a small personal site is distributing child porn or inciting terrorism, its operator isn’t going to be caught because the site lacks a properly considered risk assessment ready to produce on demand – the eSafety Commissioner already has a range of other powers they can use in that case. They don’t have the resources to go after the countless small services out there for compliance issues, so all that will remain is the lingering chilling effects of these pointless requirements.</p>
  403.  
  404. <p>I get that most people will ignore these requirements, and the eSafety Commissioner is presumably relying upon that to give them the leeway to go after the people they need to target. I just think that creating laws that can be applied with so much discretion – where technically everyone is in violation, and the regulator can pick who they prosecute – is a shitty way to run a democracy.</p>
  405.  
  406. <div class="footnotes" role="doc-endnotes">
  407.  <ol>
  408.    <li id="fn:1" role="doc-endnote">
  409.      <p>Is it just me, or is “informational” a hole big enough to drive a truck through here? <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
  410.    </li>
  411.    <li id="fn:2" role="doc-endnote">
  412.      <p>Notably, the site you’re reading this on doesn’t clearly qualify for any of them, and so when these codes are registered, I’ll likely be doing a risk assessment (and posting it), even though it doesn’t allow comments any more (because, spam). <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
  413.    </li>
  414.    <li id="fn:3" role="doc-endnote">
  415.      <p>This seems to foretell the establishment of a new industry. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
  416.    </li>
  417.    <li id="fn:4" role="doc-endnote">
  418.      <p>Although it’s always tempting to write a blog entry that <em>depicts, expresses or otherwise deals with matters of drug misuse or addiction in such a way that the material offends against the standards of morality, decency and propriety generally accepted by reasonable adults to the extent that the material should be classified RC</em>. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
  419.    </li>
  420.  </ol>
  421. </div>]]>
  422.    </content>
  423.  </entry>
  424.  
  425. </feed>
  426.  
