FEED Validator

for Atom and RSS and KML

Congratulations!

This is a valid RSS feed.

Recommendations

This feed is valid, but interoperability with the widest range of feed readers could be improved by implementing the following recommendations.

line 36, column 0: content:encoded should not contain fetchpriority attribute [help]

										<content:encoded><![CDATA[<p><a href="http://redmonk.com/sogrady/f ...

line 36, column 0: content:encoded should not contain decoding attribute (11 occurrences) [help]
```
										<content:encoded><![CDATA[<a href="http://redmonk.com/sogrady/f ...
```
line 36, column 0: content:encoded should not contain sizes attribute (10 occurrences) [help]
```
										<content:encoded><![CDATA[<a href="http://redmonk.com/sogrady/f ...
```
line 75, column 0: content:encoded should not contain loading attribute (7 occurrences) [help]
```
										<content:encoded><![CDATA[<a href="http://redmonk.com/sogrady/f ...
```

Source: http://redmonk.com/sogrady/feed/

<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
xmlns:georss="http://www.georss.org/georss"
xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
>
<channel>
<title>tecosystems</title>
<atom:link href="https://redmonk.com/sogrady/feed/" rel="self" type="application/rss+xml" />
<link>https://redmonk.com/sogrady</link>
<description>because technology is just another ecosystem</description>
<lastBuildDate>Wed, 03 Sep 2025 00:54:52 +0000</lastBuildDate>
<language>en-US</language>
<sy:updatePeriod>
hourly </sy:updatePeriod>
<sy:updateFrequency>
1 </sy:updateFrequency>
<generator>https://wordpress.org/?v=6.8.2</generator>
<item>
<title>DocumentDB and the Future of Open Source</title>
<link>https://redmonk.com/sogrady/2025/09/02/documentdb/</link>
<dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
<pubDate>Tue, 02 Sep 2025 16:44:28 +0000</pubDate>
<category><![CDATA[Databases]]></category>
<category><![CDATA[Open Source]]></category>
<guid isPermaLink="false">https://redmonk.com/sogrady/?p=6088</guid>
<description><![CDATA[Daniel Case, CC BY-SA 3.0, via Wikimedia Commons It can be difficult to remember in the wake of PostgreSQL’s ongoing renaissance, but MySQL was for decades the default open source relational database, which in those days meant it was the default open source database. While the former’s evolution from Ingres was unfolding slowly over the]]></description>
<content:encoded><![CDATA[<a href="http://redmonk.com/sogrady/files/2025/09/1024px-Bayer_Aspirin_and_store-brand_generic_on_Canadian_drugstore_shelf.jpg"><img fetchpriority="high" decoding="async" src="http://redmonk.com/sogrady/files/2025/09/1024px-Bayer_Aspirin_and_store-brand_generic_on_Canadian_drugstore_shelf.jpg" alt="" width="1024" height="857" class="aligncenter size-full wp-image-6089" srcset="https://redmonk.com/sogrady/files/2025/09/1024px-Bayer_Aspirin_and_store-brand_generic_on_Canadian_drugstore_shelf.jpg 1024w, https://redmonk.com/sogrady/files/2025/09/1024px-Bayer_Aspirin_and_store-brand_generic_on_Canadian_drugstore_shelf-300x251.jpg 300w, https://redmonk.com/sogrady/files/2025/09/1024px-Bayer_Aspirin_and_store-brand_generic_on_Canadian_drugstore_shelf-768x643.jpg 768w, https://redmonk.com/sogrady/files/2025/09/1024px-Bayer_Aspirin_and_store-brand_generic_on_Canadian_drugstore_shelf-480x402.jpg 480w, https://redmonk.com/sogrady/files/2025/09/1024px-Bayer_Aspirin_and_store-brand_generic_on_Canadian_drugstore_shelf-749x627.jpg 749w" sizes="(max-width: 1024px) 100vw, 1024px" /></a>
<a href="https://commons.wikimedia.org/wiki/File:Bayer_Aspirin_and_store-brand_generic_on_Canadian_drugstore_shelf.jpg">Daniel Case</a>, <a href="https://creativecommons.org/licenses/by-sa/3.0">CC BY-SA 3.0</a>, via Wikimedia Commons
It can be difficult to remember in the wake of PostgreSQL’s ongoing renaissance, but MySQL was for decades the default open source relational database, which in those days meant it was the default open source database. While the former’s evolution from Ingres was unfolding slowly over the late 1980’s and early 1990’s, MySQL was developed in 1994, released in 1995 and capitalized on the growing popularity of open source to become nearly ubiquitous. It was the default database in the most popular open source tech stack of the time, LAMP, and most developer texts assumed MySQL as the database.
While both MySQL and PostgreSQL were both open source databases, however, their development models were quite distinct. MySQL was built on a <a href="https://redmonk.com/sogrady/2008/03/16/open-source-licensing-obsolete-or-of-importance/">dual license</a> model, which allows customers to buy their way out of the licensing terms of the GPL they cannot or choose not to comply with. To be able to issue a dual license, however, MySQL has to own copyright to all of the code outright. In practice, this meant that the development burden was almost entirely on MySQL, because few commercial organizations are willing to write code just so another commercial entity can exclusively monetize it. This was and is the single entity open source model.
PostgreSQL, by contrast, was originally released not under a more restrictive copyleft license like the GPL, but the vanity PostgreSQL license with terms similar to the permissive MIT. Because the license imposed essentially no restrictions on usage of the project, any and all parties were free to use, modify and distribute the software as they saw fit – even if that meant closing the project off and making it proprietary. This licensing choice involved a trade off. On the one hand, no one vendor could exclusively monetize the software, but on the other, it allowed for the growth of a wider ecosystem with more participants in the project. Over time, PostgreSQL became a canonical example along with others like Linux of a multi-entity open source project, where multiple parties – even competitors – come together to collaborate on a project collectively.
For the better part of a decade after MySQL’s release, and for a wide variety of reasons, it dominated PostgreSQL from a visibility standpoint. Since that time, however, its dominance has steadily eroded as Google Trends documents here.
<a href="http://redmonk.com/sogrady/files/2025/09/CleanShot-2025-09-02-at-11.59.44@2x.png"><img decoding="async" src="http://redmonk.com/sogrady/files/2025/09/CleanShot-2025-09-02-at-11.59.44@2x-1024x747.png" alt="" width="1024" height="747" class="aligncenter size-large wp-image-6090" srcset="https://redmonk.com/sogrady/files/2025/09/CleanShot-2025-09-02-at-11.59.44@2x-1024x747.png 1024w, https://redmonk.com/sogrady/files/2025/09/CleanShot-2025-09-02-at-11.59.44@2x-300x219.png 300w, https://redmonk.com/sogrady/files/2025/09/CleanShot-2025-09-02-at-11.59.44@2x-768x561.png 768w, https://redmonk.com/sogrady/files/2025/09/CleanShot-2025-09-02-at-11.59.44@2x-480x350.png 480w, https://redmonk.com/sogrady/files/2025/09/CleanShot-2025-09-02-at-11.59.44@2x-859x627.png 859w, https://redmonk.com/sogrady/files/2025/09/CleanShot-2025-09-02-at-11.59.44@2x.png 1118w" sizes="(max-width: 1024px) 100vw, 1024px" /></a>
Eroded to the point, in fact, that MySQL has been surpassed by PostgreSQL at least in public visibility over the last five years.
<a href="http://redmonk.com/sogrady/files/2025/09/CleanShot-2025-09-02-at-12.19.27@2x.png"><img decoding="async" src="http://redmonk.com/sogrady/files/2025/09/CleanShot-2025-09-02-at-12.19.27@2x-1024x755.png" alt="" width="1024" height="755" class="aligncenter size-large wp-image-6091" srcset="https://redmonk.com/sogrady/files/2025/09/CleanShot-2025-09-02-at-12.19.27@2x-1024x755.png 1024w, https://redmonk.com/sogrady/files/2025/09/CleanShot-2025-09-02-at-12.19.27@2x-300x221.png 300w, https://redmonk.com/sogrady/files/2025/09/CleanShot-2025-09-02-at-12.19.27@2x-768x566.png 768w, https://redmonk.com/sogrady/files/2025/09/CleanShot-2025-09-02-at-12.19.27@2x-480x354.png 480w, https://redmonk.com/sogrady/files/2025/09/CleanShot-2025-09-02-at-12.19.27@2x-107x80.png 107w, https://redmonk.com/sogrady/files/2025/09/CleanShot-2025-09-02-at-12.19.27@2x-850x627.png 850w, https://redmonk.com/sogrady/files/2025/09/CleanShot-2025-09-02-at-12.19.27@2x.png 1104w" sizes="(max-width: 1024px) 100vw, 1024px" /></a>
MySQL is still an enormously popular database, to be clear. But it is no longer the dominant project, nor the default. It has ceded that title to PostgreSQL. Where, for example, did Databricks and Snowflake turn when their growth from data lake to more broadly capable data platforms required relational database capabilities? To PostgreSQL vendors – Neon and Crunchy Data respectively. When developers are building applications today, more often than not the assumption is that the database will be PostgreSQL, not MySQL.
There are many reasons for this change in fortunes, and no single project characteristic is responsible for it. But PostgreSQL advocates and vendors typically point to the advantages of its broader community, which again is enabled by its liberal license. Multi-entity open source means more commercial options, which large enterprises favor because competition limits vendors’ leverage with respect to pricing and it limits their risk to vendor behavior and changes. It can also mean broader project support and faster innovation, as evidenced by the speed at which PostgreSQL has been able to adapt to emerging market demands by adding the ability to handle new workloads from JSON to vector.
At present, then, the market is favoring a multi-entity solution for its relational database needs. Redis, meanwhile, dominated the in-memory key-value category for years, but a licensing change created a rift in that community and opened a path for a multi-entity Redis alternative in <a href="https://redmonk.com/sogrady/2024/07/16/post-valkey-world/">Valkey</a>. The jury is still out on which path the industry will choose in that case.
All of which brings us to DocumentDB.
Not, oddly enough, the AWS product of that name. Microsoft, apparently, has its own project called DocumentDB which it has donated to the Linux Foundation. This raises an immediate and obvious question about the DocumentDB trademark, but without any answers available that will have to be set aside for the moment other than to note that the LF now owns it.
Microsoft’s DocumentDB is a project built, in essence, to layer MongoDB API compatibility on to a PostgreSQL database. Unlike MongoDB, which is licensed under the non-open source Server Side Public License (SSPL), DocumentDB has been released under an MIT license. That licensing choice, along with its donation to a neutral foundation, is presumably why it was able to attract support from AWS, Cockroach, Crunchy, Google, Supabase, Yugabyte and others joining Microsoft in the launch.
This is not, of course, the first attempt at providing a MongoDB compatible alternative database. Microsoft’s Cosmos DB has been offering this for years, as has AWS with DocumentDB and more recently Google announced MongoDB compatibility for its Firestore product. Percona, meanwhile, offers vanilla MongoDB support. Lastly FerretDB, for its part, has also loudly and prominently positioned itself as a drop-in MongoDB alternative, to the extent that the latter party took exception and sued the former.
Prior to April 2021, MongoDB’s claims likely would have included copyright violations, but in the wake of <a href="https://redmonk.com/sogrady/2025/06/09/open-source-apis/">Google v Oracle</a>, those would seem unlikely charges to be sustained. Instead, MongoDB has alleged patent infringement, misuse of its trademark and unfair competition. As that litigation is still pending, there’s not much to take away from it and it could well be a simple tactic rather than a genuine attempt to prove the claims.
What we know, however, is that MongoDB – unlike some other examples here like Redis – is a single entity project. MongoDB is responsible for the entirety of their codebase, which is why they were able to unilaterally relicense the project from the AGPL to the SSPL in 2018. The SSPL, in fact, is an attempt to preserve MongoDB’s exclusivity by increasing the friction to offering the database as a service.
On the one hand, all of this attention and competition can in some sense be regarded as flattering to MongoDB. Out of all of the possible database projects and document databases, the industry has de facto agreed on MongoDB’s API as the industry standard. It is also possible that being that de facto standard will increase – potentially dramatically – the size of the addressable market, thereby offering MongoDB a larger opportunity to target.
On the other hand, the company clearly believes, as many of its database peer companies do judging by the licensing trends, that exclusivity is key to its present and future success. What’s difficult to see, however, is how that’s achievable in a world in which the Supreme Court has strongly suggested that copyright does not apply to APIs and all three large hyperscalers, the largest open source foundation and a selection of startups all feel comfortable from a legal standpoint publicly invoking MongoDB’s name.
One potential response might be found in the consumer packaged goods sector. It has been common practice for years to have a store brand, and a premium brand. In many cases they are the same, or at least a highly similar, product, and yet the premium brands survive because consumers are willing to pay a premium for an item perceived to be higher value.
The biggest and best opportunity for this model in the technology sector, coincidentally, also came from Microsoft. Many years ago, it was argued in this space that Microsoft – whose .NET stack and C# language were highly regarded technically but virtually non-existent outside of its own Windows ecosystem – could and should have given both of those products a chance to better compete via the Mono project. Originally the brainchild of Ximian, Miguel de Icaza and Nat Friedman’s open source startup, Mono was an alternative, open source friendly .NET runtime. Simply by offering the project a patent amnesty, and thereby removing the Damoclean sword of potential IP litigation, Microsoft could have overnight given itself a credible .NET story for the flood of new Linux servers arriving in market every day. Importantly, it could also have sold effectively against the alternative stack by positioning itself as the premium brand to the store brand. Unfortunately, however, this model was never tested as open source itself was anathema to Microsoft at the time, and a third rail issue that would never go anywhere strategically.
However MongoDB the company navigates the DocumentDB project announcement, it will be worth tracking more broadly the performance of single entity open source projects versus the growing popularity of multi-entity alternatives. Project and product selection is rarely if ever likely to come down to that single characteristic and product decisions are inherently more complicated, but the industry – buyers and sellers of technology alike – is increasingly investing into open source communities that are licensed in such a way that they cannot be controlled by one single entity. What the returns on those investments will be in future will have an enormous impact on the direction of the industry and the health of open source itself.
Disclosure: AWS, Crunchy Data, Google, Microsoft, MongoDB, Oracle (MySQL) and Percona are RedMonk customers. Cockroach, Databricks/Neon, Redis, Supabase and Yugabyte are not currently customers.
]]></content:encoded>
</item>
<item>
<title>The Cyber Resilience Act: A Five Alarm Fire</title>
<link>https://redmonk.com/sogrady/2025/08/06/cra-five-alarm-fire/</link>
<dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
<pubDate>Wed, 06 Aug 2025 13:34:13 +0000</pubDate>
<category><![CDATA[Security]]></category>
<guid isPermaLink="false">https://redmonk.com/sogrady/?p=6085</guid>
<description><![CDATA[On October 21, 2016, CNN’s website was knocked offline. So was the BBC and Guardian’s. Amazon, Etsy and Shopify too, along with Quora, Reddit, and Twitter – among others. Huge swaths of the internet were taken down by a series of attacks on the DNS provider Dyn. These Distributed Denial of Service (DDoS) attacks were]]></description>
<content:encoded><![CDATA[<a href="http://redmonk.com/sogrady/files/2025/08/Five-alarm_fire_at_Metro_HQ_May_27_2020_120.jpg"><img loading="lazy" decoding="async" src="http://redmonk.com/sogrady/files/2025/08/Five-alarm_fire_at_Metro_HQ_May_27_2020_120.jpg" alt="" width="802" height="1024" class="aligncenter size-full wp-image-6086" srcset="https://redmonk.com/sogrady/files/2025/08/Five-alarm_fire_at_Metro_HQ_May_27_2020_120.jpg 802w, https://redmonk.com/sogrady/files/2025/08/Five-alarm_fire_at_Metro_HQ_May_27_2020_120-235x300.jpg 235w, https://redmonk.com/sogrady/files/2025/08/Five-alarm_fire_at_Metro_HQ_May_27_2020_120-768x981.jpg 768w, https://redmonk.com/sogrady/files/2025/08/Five-alarm_fire_at_Metro_HQ_May_27_2020_120-480x613.jpg 480w, https://redmonk.com/sogrady/files/2025/08/Five-alarm_fire_at_Metro_HQ_May_27_2020_120-491x627.jpg 491w" sizes="auto, (max-width: 802px) 100vw, 802px" /></a>
On October 21, 2016, CNN’s website was knocked offline. So was the BBC and Guardian’s. Amazon, Etsy and Shopify too, along with Quora, Reddit, and Twitter – among others. Huge swaths of the internet were taken down by a series of attacks on the DNS provider Dyn. These Distributed Denial of Service (DDoS) attacks were conducted by legions of bots, which is to say tens of thousands of internet connected devices that had been compromised by malware called Mirai. It wasn’t an army of servers, or at least not just servers: it included cameras, printers, routers and even baby monitors. All of these devices from the Internet of Things (IoT) had been harnessed and retasked by Mirai, in this case in service of making a substantial portion of the internet unavailable.
The early versions of Mirai principally relied on unchanged default settings; later variations more directly attacked known vulnerabilities in the software running these internet enabled devices.
This attack was possible for two basic reasons.
<ol>
<li>First, and most obviously, software is hard to secure and protect. All software is vulnerable. Even the best and most dedicated software authors in the world are not able to produce perfectly secure software. </li>
<li>Making matters more complicated was (and is) the fact that there was little if any economic incentive for many providers of these IoT devices to protect them as experts like Bruce Schneier have been <a href="https://www.schneier.com/blog/archives/2021/02/router-security.html">saying</a> for years. It is hard to write secure software and keep it up to date and patched, and activities that are hard to do are by definition expensive. Nor were there any real penalties for not investing in producing secure artifacts, which is why most IoT vendors slapped together some cheap hardware and software, shipped it and called it good. </li>
</ol>
Even for those vendors that might feel obligated either ethically or financially to try and maintain their devices over time there was another critical problem: they didn’t write some or even most of the software on the devices they shipped – open source developers did. This meant that in many cases the device manufacturers did not understand in any meaningful way the software they relied on. It also meant that vendors often had no practical mechanism to get a given piece of open source patched short of begging – or worse, badgering – the developers in question. Developers who were already burning out in huge numbers because of unreasonable demands from users of the project to fix critical issues for free.
The industry reality, therefore, had (and has) a lot in common with a house of cards and was perfectly captured by Randall Monroe <a href="https://xkcd.com/2347/">here</a>. The European Union, to its credit, looked at this and said “maybe we shouldn’t be building on a house of cards.” Thus was the mandate for the Cyber Resilience Act (CRA) born.
The CRA’s initial remit was to try and tackle the IoT problem. But then came Log4shell. Present from 2013 on, the critical vulnerability in Log4j wasn’t discovered and disclosed until November of 2021. Then, thanks to the ubiquity of Log4j, all hell broke loose. Jen Easterly, the director of the United States Cybersecurity and Infrastructure Security Agency (CISA), called the exploit “one of the most serious I’ve seen in my entire career, if not the most serious.”
In the wake of the catastrophic incident, the EU decided to reconsider and revise the CRA’s mandate. No longer would it be strictly focused on software powering IoT devices. Instead it would encompass software, full stop.
On paper, the CRA makes sense. The world of software, after all, cannot be built indefinitely on a foundation that includes Munroe’s “project thanklessly maintained by a single developer from Nebraska since 2003” as a load bearing component. Change was and is necessary, as are new incentives – and penalties.
The question isn’t, therefore, whether or not something like the CRA is necessary and inevitable. The question is whether or not the CRA as it is written today is the appropriate tool for the job. And after multiple briefings on the subject, it seems safe to say that the jury is still very much out on that subject.
The CRA introduces a broad set of mandates for parties involved in the production of software, with the specific responsibilities varying depending on role. Among the many goals and requirements of the CRA are:
<ul>
<li>Shipping software free from known exploitable vulnerabilities </li>
<li>Shipping in a secure by default state</li>
<li>Ensuring that vulnerabilities can be patched easily</li>
<li>Making security updates available “for a minimum of 10 years or the remainder of the support period, whichever is longer”</li>
<li>Notifications of actively exploited vulnerabilities within 24 hours and general vulnerabilities within 72 hours both of which should include plans for mitigation and workarounds </li>
</ul>
The good news is that the CRA in 2025 is much improved from its initial incarnations in 2022. Those were characterized by one participant in the discussions as a “near-death experience for open source.” Indeed, without some of the current carveouts for open source developers, one plausible and even likely outcome would have been geo-fracturing of open source, specifically via the introduction of licenses that prohibited the usage of projects within the EU’s borders. That horrific outcome is potentially still on the table, but we’ll come back to that later.
For now it’s enough to know that open source foundations like Apache, Eclipse, Linux, Mozilla and many others all lobbied hard on behalf of open source and its developers to try and protect them from some of the act’s more far reaching requirements and penalties – it’s almost, as an aside, as if open source foundations could be <a href="https://redmonk.com/jgovernor/2024/09/13/open-source-foundations-considered-helpful/">considered helpful</a>. Their collective efforts have tempered some of the CRA’s more problematic provisions from an open source perspective, but questions still remain, and a host of implementation details and specifications have yet to be finalized so evaluating the act is challenging.
As an example, many of these details are currently being worked on in the Open Regulatory Compliance Working Group (ORC WG), which has a very useful FAQ <a href="https://github.com/orcwg/cra-hub/blob/main/faq.md">here</a>. It defines the new term “open source steward,” codified for the first time in this document, and its responsibilities. When it comes to discussing how a steward can demonstrate that its met its reporting and attestation obligations, or what happens if it does not, the FAQ’s answer is “No answer yet.” Likewise, it’s clear that if you’re the creator of an open source project but do not monetize it, you are under no obligations under the CRA. If you monetize it, though, the act’s implications are less clear, and specifically at what thresholds obligations are triggered as the current wording includes legally vague conditions like “the intention of making a profit.” Mere contributors to open source, at least, are specifically exempt from CRA requirements.
If the implications for open source are less dire than in the initial draft, there is one thing that is inarguable: the CRA is a veritable five alarm fire for manufacturers.
At present, and as described above, manufacturers currently rely heavily on a wide variety of open source projects to produce devices of all shapes and sizes. From databases to operating systems to runtimes, open source is the foundation on which everything from baby monitors to cars to lunar rovers rests. Effectively zero manufacturers have commercial relationships with the producer of every project, framework or library they’re incorporating, which means that they are likely to struggle with some of the reporting and attestation requirements. And the penalties if they do are quite severe.
The penalty for non-compliance, for example, is a fine of up to 15,000,000 EUR or up to 2.5% of its total worldwide annual turnover for the preceding financial year, whichever is higher. The penalty for incomplete or inaccurate information, meanwhile, is a fine of up to 5,000,000 EUR or up to 1% of its total worldwide annual turnover for the preceding financial year, whichever is higher.
Based on the financial downsides here, optimistic observers are concluding that the CRA could be a game changer in open source economics. As companies digest the potential penalties involved, they will be obligated to establish commercial relationships with open source projects they currently rely on at no cost. That means more money going from vendors relying on open source to those producing it, which would be a boon for developers. It also raises the question of what happens to product prices when manufacturers are compelled to pay for software they have to date consumed at no cost, but that’s outside the scope of this exercise.
Pessimists evaluating the potential impacts of the CRA on open source software, on the other hand, see a world in which greater commercial interest and monetization is more than offset by a sea of manufacturers flooding maintainers of popular projects with requests – or more likely, given the penalties involved, demands – for project related services to meet the CRA mandated reporting obligations. It also could result in less open source software overall as manufacturers bring some software back in house, or it could produce the aforementioned geo-fracturing as open source developers who do not wish to have anything to do with the CRA either abandon their projects – which the aforementioned ORCWG FAQ felt compelled to <a href="https://github.com/orcwg/cra-hub/blob/main/faq.md#faq-tmp-133a">discourage</a> – or attempt to prohibit usage of their software within the EU’s jurisdiction.
The CRA’s requirements notably do not take effect until 2027, which is perhaps why it has received so little attention to date. But given the scale and scope of the effort required to comply here, which dwarf those made for GDPR and are perhaps more comparable to Y2K remediation efforts, any manufacturer not already planning for the CRA is behind and likely to be facing a mad scramble in the years ahead.
]]></content:encoded>
</item>
<item>
<title>AI Tooling, Evolution and The Promiscuity of Modern Developers</title>
<link>https://redmonk.com/sogrady/2025/07/09/promiscuity-of-modern-developers/</link>
<dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
<pubDate>Wed, 09 Jul 2025 13:05:24 +0000</pubDate>
<category><![CDATA[AI]]></category>
<guid isPermaLink="false">https://redmonk.com/sogrady/?p=6079</guid>
<description><![CDATA[Zhixin Sun, Fangchen Zhao, Han Zeng, Cui Luo, Heyo Van Iten, Maoyan Zhu, CC BY 4.0, via Wikimedia Commons Historically, there have been two constants with developer tools. First, that their users were loyal to them. This was in part attributable to simple baby duck syndrome, but there were practical considerations as well such as]]></description>
<content:encoded><![CDATA[<a href="http://redmonk.com/sogrady/files/2025/07/1024px-Life_on_the_platform_margin_of_the_Miaolingian_sea_North_China.png"><img loading="lazy" decoding="async" src="http://redmonk.com/sogrady/files/2025/07/1024px-Life_on_the_platform_margin_of_the_Miaolingian_sea_North_China.png" alt="" width="1024" height="596" class="aligncenter size-full wp-image-6080" srcset="https://redmonk.com/sogrady/files/2025/07/1024px-Life_on_the_platform_margin_of_the_Miaolingian_sea_North_China.png 1024w, https://redmonk.com/sogrady/files/2025/07/1024px-Life_on_the_platform_margin_of_the_Miaolingian_sea_North_China-300x175.png 300w, https://redmonk.com/sogrady/files/2025/07/1024px-Life_on_the_platform_margin_of_the_Miaolingian_sea_North_China-768x447.png 768w, https://redmonk.com/sogrady/files/2025/07/1024px-Life_on_the_platform_margin_of_the_Miaolingian_sea_North_China-480x279.png 480w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a>
<a href="https://commons.wikimedia.org/wiki/File:Life_on_the_platform_margin_of_the_Miaolingian_sea,_North_China.png">Zhixin Sun, Fangchen Zhao, Han Zeng, Cui Luo, Heyo Van Iten, Maoyan Zhu</a>, <a href="https://creativecommons.org/licenses/by/4.0">CC BY 4.0</a>, via Wikimedia Commons
<hr />
Historically, there have been two constants with developer tools. First, that their users were loyal to them. This was in part attributable to simple <a href="https://en.wikipedia.org/wiki/Imprinting_(psychology)#Baby_duck_syndrome">baby duck syndrome</a>, but there were practical considerations as well such as keybindings and shortcuts. Many developers were unwilling to invest in retraining their muscle memory to an entirely different context and set of commands, and thus stuck with their text editor or IDE of choice even as a given tool aged and began to lag from a feature standpoint. There were exceptions, of course: Sublime Text attracted a sizable number of former Emacs and vi users, as did VS Code after it. But in general, migration away from popular tools were the exceptions that proved the rule.
The second thing that has been a given for developer tools is that they were free. Those with back problems or who need reading glasses to read a menu might object citing examples like Borland, but the reality is that it’s been decades since developers needed to find real budget to access quality developer tooling. The aforementioned Emacs and vi text editors were free, as was VS Code, and IDEs would soon follow. The open sourcing of NetBeans in June of 2000 by Sun was followed by the formation of the consortium a year later by IBM that eventually became the Eclipse Foundation, which meant developers not only had a free IDE at their disposal, but a choice between them. This industry trend was so strong, in fact, that up until very recently it was effectively impossible for startups in the developer tooling space to attract venture funding. When the competition is both quality and costs nothing, the returns on invested capital are far from certain.
Fast forward to four years ago last month.
On June 29, 2021 – a little less than a year and a half before OpenAI launched ChatGPT – GitHub introduced a brand new product called Copilot. Driven by talent acquired with the Semmle team, Copilot was regarded as a revelation at the time. There were precedents, to be sure: Tabnine, for one, has roots going back to 2013. But the combination of Copilot’s unrestricted access to the corpus of data that is GitHub and its timing proved transformational. With AI just beginning to accelerate in the wake of the publication of Google’s “Attention is All You Need” paper – which introduced the transformer architecture that so much of today’s AI relies on – Copilot lit the software world on fire, triggering dreams of hyper-productive software engineers while simultaneously stoking fears of wide scale developer unemployment.
GitHub was the first company in decades to buck precedent, and prove conclusively that it was possible – given the right product – to charge for developer tooling. Within two years, in fact, it was a $100M ARR business – an unimaginable figure given how reluctant developers and enterprises alike had been to pay literally anything for the primary tool of a developer’s trade.
If the second foundational assumption was shattered, however, surely the first would hold. It was widely assumed that developers, who already had had a long term affinity for GitHub itself, would demonstrate their characteristic loyalty to the coding assistant they had first imprinted on.
Except they did not, and do not.
What GitHub did in two years, in fact, Cursor did in 12 months: a year in, the company – with reportedly zero marketing – hit a hundred million run rate. Many of its users were former – and potentially future, as we’ll come back to – Copilot users.
Almost everything we knew, then – or thought we knew – about the developer tools space turned out to be wrong. Promiscuity has replaced loyalty, but the good news is that the budget is no longer anchored to zero dollars. There is no single reason for these developments, as there are a number of contributing factors.
Arguably the most important is the degree to which AI is inherently a transformational technology. With its ability to ingest, process and act on natural language, for example, inputs are often now a prompt or a spec – for neither of which keybindings or shortcuts matter particularly. AI has also has heralded a massive era of experimentation as vendors and projects seek creative new ways to apply the technology to the task of developing software. There is, at present, no consensus, no dominant approach, and there may never be. Some developers prefer the more free-wheeling “vibe code” approach via prompts; others prefer the more deliberate spec driven development – in many respects emulating the divide between authors of fiction who go by the seat of their pants versus those who rely on predetermined plots. In some cases developers want to be gradually stepped through proposed steps or changes; in others they just want the machine to come back when it’s done. Some tools merely propose changes, others like the recently relaunched Aboard will combine development with a full stack including a database. Some tools retain the UI elements of traditional IDEs; others are nothing more than a text field.
In short, we are in the midst of a Cambrian explosion of developer tools, and a dizzying array of approaches are currently being tested for their evolutionary fitness. Consider even an abbreviated, absolutely non-exhaustive list of related tools: Aboard, Bolt, Cline, Copilot, Cursor, ChatGPT / Codex, Claude / Code, Gemini / CLI, Factory, Lovable, Poolside, Replit, Same.dev, vibes.diy, v0, watsonX and Windsurf. Not all of these will succeed, and indeed <a href="https://bsky.app/profile/edzitron.com/post/3ltipuwuti22q">some argue</a> that all of these are doomed because the economic footing they’re built on is fatally unsound. That argument is built on two core assumptions, however: that vendor costs will never come down and that user costs can not be raised – neither of which seems entirely safe. More likely is that some of these options emerge and fundamentally change the way the industry builds software moving forward. Others, meanwhile, will be abandoned as dead ends in a manner consistent with both biological evolution and technical innovation. Developers, regardless, are far more willing to experiment with new tools, because they are fundamentally differentiated from one another in ways that past generations of developer tools have typically not been.
Also fueling the willingness to flit from tool to tool are cost and token inventory concerns. While developers are now objectively willing to pay for tools, they still appreciate free tiers and will use up whatever resources are made available to them at no cost. For paid plans, meanwhile, developers are frequently outstripping their allotted inventory of tokens at their given paid tier, and simply move on to the next tool with available credits. Whatever their tooling preferences, therefore, in some cases costs lead them to use deploy multiple distinct developer tools in development on the same application – again, a practice which would have been unthinkable even a few short years ago.
Lastly, there is the firmly engrained idea amongst developers that these tools are useful. The organizational metrics <a href="https://redmonk.com/rstephens/2024/11/26/dora2024/">may argue</a> otherwise, but a clear majority of individual developers feel more productive. There are, again, many who would argue that the tools are inherently unusable, inherently uneconomic, inherently immoral or some combination of all three. But that is, at this point, the minority opinion and as stated eloquently <a href="https://bsky.app/profile/lizthegrey.com/post/3lsomiy6ln22t">here</a> it seems extremely unlikely the tools will be uninvented. As such, developers will both keep using the tools, and will be willing – if reluctantly – to pay for them – likely even if the costs escalate, which creates the worrying potential for a <a href="https://bsky.app/profile/taylorbar.net/post/3ltisup4fsc2g">greater economic divide</a>. To that end, some developers are willing to pay to the degree that <a href="https://redmonk.com/jgovernor">my colleague</a> recently reported that in contrast to some skeptical enterprises that balked at the idea of paying $20-$40 a month per developer, a developer acquaintance of his recently stated that if one wasn’t spending hundreds of dollars a month on AI tooling they were not a serious developer. Which, again, raises the spectre of haves and have nots. A world in which the best developer tools cost nothing was a world with fewer barriers to entry, after all.
But if it’s clear that the rules have changed with respect to developer tools, the implications of this are more opaque. A few conclusions suggest themselves, however.
<ul>
<li>It’s Not Too Late: if it wasn’t too late for Cursor in the wake of Copilot’s explosive growth, it’s not too late for the next Cursor. Which, given that that was Windsurf, which was valued at $3B, seems to demonstrate the point adequately. There will remain opportunities for these businesses for the foreseeable future. There is room for experimentation, for user acquisition and for vendors that charge for developer tools. So while rumors suggest as one example that AWS has a new tool on the way – purportedly called <a href="https://techcrunch.com/2025/05/07/amazon-is-working-on-an-ai-code-generating-tool/">Kiro</a> – its window would still be open. There is also, importantly, opportunity for products that have “lost” users to gain them back, as developer promiscuity cuts both ways. </li>
<li>Partnerships Will be Important: under appreciated currently, at least as far as enterprise usage is concerned, is the white space that remains. All of the tools take novel approaches to accelerating software development. The majority, however, are narrowly focused on some aspect of the application development process, and like GitHub once upon a time, leave anything else as a problem to be solved by some combination of customers and partners. Tool vendors and downstream build, test, observation and deployment targets alike would be wise to start integrating to <a href="https://redmonk.com/sogrady/2020/10/06/developer-experience-gap/">close the gaps</a> in their developer experience. Importantly, however, this will require AI companies willing to engage with third parties, something many of them have seemed too busy to do to date. </li>
<li>Lack of an Approach Consensus Will Slow Enterprise Adoption: speaking of gaps in the developer experience, after the publication of the linked piece, RedMonk heard from dozens of organizations who perceived the same issue and were attacking the problem with a new product and/or company. The challenge was that they all took slightly different approaches to addressing the developer experience gap. As a result, enterprises struggled to compare the proverbial apples to oranges and the market for tools that would impact the problem lagged. AI tooling will be less susceptible to this, but the sheer variety of different approaches will suppress adoption to a degree as businesses are forced to wade through the variety of approaches the tools represent in an effort to decide which will be most impactful for their particular needs. </li>
</ul>
Evolution, at its core, is always a messy, non-linear process, and AI tooling will be no exception. But it inexorably hammers, reshapes and refines models, producing output that is ever more fit for purpose. And in that, too, AI will be no exception.
Disclosure: AWS (Kiro), GitHub (Copilot), Google (Gemini / CLI) and IBM (watsonX) are RedMonk customers. Aboard, Anthropic (Claude), Bolt, Cline, Cursor, Factory, Lovable, OpenAI (ChatGPT), Poolside, Replit, Same.dev, Tabnine, vibes.diy, Vercel, and Windsurf are not currently RedMonk customers.
]]></content:encoded>
</item>
<item>
<title>The RedMonk Programming Language Rankings: January 2025</title>
<link>https://redmonk.com/sogrady/2025/06/18/language-rankings-1-25/</link>
<dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
<pubDate>Wed, 18 Jun 2025 17:10:20 +0000</pubDate>
<category><![CDATA[Programming Languages]]></category>
<guid isPermaLink="false">https://redmonk.com/sogrady/?p=6076</guid>
<description><![CDATA[This iteration of the RedMonk programming Language Rankings is brought to you by Amazon Web Services. AWS manages a variety of developer communities where you can join and learn more about building modern applications in your preferred language. Even by our standards, dropping the Q1 programming language rankings the same month we run the Q3]]></description>
<content:encoded><![CDATA[<blockquote>
This iteration of the RedMonk programming Language Rankings is brought to you by Amazon Web Services. AWS manages a variety of <a href="https://aws.amazon.com/developer/community">developer communities</a> where you can join and learn more about building modern applications in your preferred language.
</blockquote>
Even by our standards, dropping the Q1 programming language rankings the same month we run the Q3 numbers is quite the delay. While the usual travel and school vacation delays have applied, however, the drawn out process in this case is deliberate on our part. As has been discussed in recent iterations of these rankings, the arrival of AI has had a significant and accelerating impact on Stack Overflow, which comprises one half of the data used to both plot and rank languages twice a year.
My colleague Rachel has been studying this impact in detail and has more on it <a href="https://redmonk.com/rstephens/2025/06/18/stackoverflow">here</a>, but for our purposes it’s enough to know that Stack Overflow’s value from an observational standpoint is not what it once was, and that has a tangible impact as we’ll see. Still to be determined on our end is whether Stack Overflow should continue to be used, and if not what a reasonable alternative might be. Stay tuned for more details on that front when we get to the Q3 rankings, which will presumably be in Q1 of next year.
In the meantime, however, as a reminder, this work is a continuation of the work originally performed by Drew Conway and John Myles White late in 2010. While the specific means of collection has changed, the basic process remains the same: we extract language rankings from GitHub and Stack Overflow, and combine them for a ranking that attempts to reflect both code (GitHub) and discussion (Stack Overflow) traction. The idea is not to offer a statistically valid representation of current usage, but rather to correlate language discussion and usage in an effort to extract insights into potential future adoption trends.
<h2>Our Current Process</h2>
The data source used for the GitHub portion of the analysis is the GitHub Archive. We query languages by pull request in a manner similar to the one GitHub used to assemble the State of the Octoverse. Our query is designed to be as comparable as possible to the previous process.
<ul>
<li>Language is based on the base repository language. While this continues to have the caveats outlined below, it does have the benefit of cohesion with our previous methodology.</li>
<li>We exclude forked repos.</li>
<li>We use the aggregated history to determine ranking (though based on the table structure changes this can no longer be accomplished via a single query.)</li>
<li>For Stack Overflow, we simply collect the required metrics using their useful data explorer tool.</li>
</ul>
With that description out of the way, please keep in mind the other usual caveats.
<ul>
<li>To be included in this analysis, a language must be observable within both GitHub and Stack Overflow. If a given language is not present in this analysis, that’s why.</li>
<li>No claims are made here that these rankings are representative of general usage more broadly. They are nothing more or less than an examination of the correlation between two populations we believe to be predictive of future use, hence their value.</li>
<li>There are many potential communities that could be surveyed for this analysis. GitHub and Stack Overflow are used here first because of their size and second because of their public exposure of the data necessary for the analysis. We encourage, however, interested parties to perform their own analyses using other sources.</li>
<li>All numerical rankings should be taken with a grain of salt. We rank by numbers here strictly for the sake of interest. In general, the numerical ranking is substantially less relevant than the language’s tier or grouping. In many cases, one spot on the list is not distinguishable from the next. The separation between language tiers on the plot, however, is generally representative of substantial differences in relative popularity.</li>
<li>In addition, the further down the rankings one goes, the less data available to rank languages by. Beyond the top tiers of languages, depending on the snapshot, the amount of data to assess is minute, and the actual placement of languages becomes less reliable the further down the list one proceeds.</li>
<li>Languages that have communities based outside of Stack Overflow such as Mathematica will be under-represented on that axis. It is not possible to scale a process that measures one hundred different community sites, both because many do not have public metrics available and because measuring different community sites against one another is not statistically valid.</li>
</ul>
With that, here is the first quarter plot for 2025.
<a href="http://redmonk.com/sogrady/files/2025/06/lang.rank_.125.wm_.png"><img loading="lazy" decoding="async" src="http://redmonk.com/sogrady/files/2025/06/lang.rank_.125.wm_-1024x844.png" alt="" width="1024" height="844" class="aligncenter size-large wp-image-6077" srcset="https://redmonk.com/sogrady/files/2025/06/lang.rank_.125.wm_-1024x844.png 1024w, https://redmonk.com/sogrady/files/2025/06/lang.rank_.125.wm_-300x247.png 300w, https://redmonk.com/sogrady/files/2025/06/lang.rank_.125.wm_-768x633.png 768w, https://redmonk.com/sogrady/files/2025/06/lang.rank_.125.wm_-1536x1266.png 1536w, https://redmonk.com/sogrady/files/2025/06/lang.rank_.125.wm_-2048x1688.png 2048w, https://redmonk.com/sogrady/files/2025/06/lang.rank_.125.wm_-480x396.png 480w, https://redmonk.com/sogrady/files/2025/06/lang.rank_.125.wm_-761x627.png 761w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a>
1 JavaScript 
2 Python 
3 Java 
4 PHP 
5 C# 
6 TypeScript 
7 CSS 
7 C++ 
9 Ruby 
10 C 
11 Swift 
12 Go 
12 R 
14 Shell 
14 Kotlin 
14 Scala 
17 Objective-C 
18 PowerShell 
19 Rust 
20 Dart
If you’re tracking from our last iteration of the rankings – and Rachel has the entire history of the Top 20 rankings charted <a href="https://redmonk.com/rstephens/2025/06/18/top20-jan2025">here</a>, the only change within our Top 20 languages is Dart dropping from a tie with Rust at 19 into sole possession of 20. In the decade and a half that we have been ranking these languages, this is by far the least movement within the top 20 that we have seen. While this is to some degree attributable to a general stasis that has settled over the rankings in recent years, the extraordinary lack of movement is likely also in part a manifestation of Stack Overflow’s decline in query volume. As that long time developer site sees fewer questions, it becomes less impactful in terms of driving volatility on its half of the rankings axis, and potentially less suggestive of trends moving forward. As mentioned above, we’re not yet at a point where Stack Overflow’s role in our rankings has been deprecated, but the conversations at least are happening behind the scenes.
With that, some results of note:
<ul>
<li>TypeScript (6): even acknowledging the general lack of movement within, it’s notable that TypeScript has effectively stalled just outside the Top 10. On the one hand, it can piggyback on the ubiquity of JavaScript while offering important safety provisions, but on the other, it has a reputation of not scaling particularly well. This reputation, in fact, has led Microsoft to reimplement the TypeScript compiler and tools in Go. The question now is whether this reimplementation will lead to greater performance, leading to greater adoption and more usage, or whether the fact that Microsoft felt it needed to be reimplemented in the first place could throw shade on the language. It will be interesting to watch, assuming we have data enough to observe any potential impact.
</li>
<li>
Kotlin (14) / Scala (14): both the JVM-based languages held their gains from our last ranking and it’s unclear what their prospects are for moving up more significantly. In 2015, when Go entered our rankings at 17, Scala was at 14 and jumped up briefly to 11 two years later. In 2023, however, Go passed Scala – having already been ranked above Kotlin – and has maintained that role ever since. And with Go finding new fans in companies like Microsoft and Rust making gains among other server side workloads, particularly those with security concerns, Kotlin and Scala’s growth paths are not assured.
</li>
<li>
Dart (20) / Rust (19): while Dart technically dropped one spot, that far down the rankings the actual differences are marginal at best. These two languages, which have little to nothing in common and are aimed at very different users and workloads, have tended to move in lockstep and this quarter’s run does not represent much of an exception in that regard.
</li>
<li>
Ballerina (64) / Bicep (79) / Grain / Moonbit / Zig (86): among the “languages we’re paying attention to” set, there was little more movement than within our Top 20, and for the most part movement among them was down. Grain and Moonbit remained unranked, while Ballerina dropped from 61 to 64, Bicep dropped from 78 to 79. Zig, however, did manage to jump, if only one spot from 87 to 86 – it probably does not hurt that Mitchell Hashimoto is a <a href="https://x.com/mitchellh/status/1841167210896900266?lang=en">major fan</a>. It is worth noting for these emerging languages, however, that they may be disproportionately impacted by Stack Overflow’s decline. In every case where the languages are ranked, they perform better within our GitHub rankings than they do within Stack Overflow’s: 62 vs 66 for Ballerina, 69 vs 73 for Bicep and 70 vs 83 for Zig. In Zig’s case in particular, then, it is possible that faster growth in code as measured by GitHub is being dragged down by the steep decline in query volume on Stack Overflow. Which is yet another reason why we’re carefully evaluating our options moving forward, but in the meantime we’ll keep all of these languages on our “to watch” list.
</li>
</ul>
Credit: My colleague Rachel Stephens wrote the queries that are responsible for the GitHub axis in these rankings. She is also responsible for the query design for the Stack Overflow data.
]]></content:encoded>
</item>
<item>
<title>Beyond Code: APIs as the Next OSS Battleground</title>
<link>https://redmonk.com/sogrady/2025/06/09/open-source-apis/</link>
<dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
<pubDate>Mon, 09 Jun 2025 19:55:36 +0000</pubDate>
<category><![CDATA[Open Source]]></category>
<guid isPermaLink="false">https://redmonk.com/sogrady/?p=6073</guid>
<description><![CDATA[On August 13th, 2010, Oracle sued Google over copyright and patent infringement claims relating to the reimplementation of the Java runtime within its Android platform. The suit took over a decade to resolve, and had several major twists and turns, but ultimately the Supreme Court decided in Google’s favor on April 5th, 2021. Among the]]></description>
<content:encoded><![CDATA[<a href="http://redmonk.com/sogrady/files/2025/06/pexels-roman-odintsov-11025279.jpg"><img loading="lazy" decoding="async" src="http://redmonk.com/sogrady/files/2025/06/pexels-roman-odintsov-11025279-683x1024.jpg" alt="" width="683" height="1024" class="aligncenter size-large wp-image-6074" srcset="https://redmonk.com/sogrady/files/2025/06/pexels-roman-odintsov-11025279-683x1024.jpg 683w, https://redmonk.com/sogrady/files/2025/06/pexels-roman-odintsov-11025279-200x300.jpg 200w, https://redmonk.com/sogrady/files/2025/06/pexels-roman-odintsov-11025279-768x1151.jpg 768w, https://redmonk.com/sogrady/files/2025/06/pexels-roman-odintsov-11025279-1025x1536.jpg 1025w, https://redmonk.com/sogrady/files/2025/06/pexels-roman-odintsov-11025279-480x720.jpg 480w, https://redmonk.com/sogrady/files/2025/06/pexels-roman-odintsov-11025279-418x627.jpg 418w" sizes="auto, (max-width: 683px) 100vw, 683px" /></a>
On August 13th, 2010, Oracle sued Google over copyright and patent infringement claims relating to the reimplementation of the Java runtime within its Android platform. The suit took over a decade to resolve, and had several major twists and turns, but ultimately the Supreme Court decided in Google’s favor on April 5th, 2021. Among the items at stake in this trial were the question of whether APIs were copyrightable, which is another way of saying the immediate future of the technology industry hung in the balance.
In its decision, the Supreme Court did not declare APIs immune from copyright, but rather held that Google’s use of the Java APIs constituted fair use. While it was not a total victory for those who would see APIs explicitly walled off from such concerns, it significantly raised the bar for legal challenges based on competitive usage of APIs. This was immediately relevant, as a loss would have almost certainly led to a widespread chilling effect across APIs industry-wide.
But Google vs Oracle is also critical to what may be the next front in the ongoing conflict between open source and commercial open source: APIs.
Those who have tracked popular open source projects such as PostgreSQL have likely heard a familiar observation amongst authors of the original project: that a database, for example, with Postgres API compatibility is not the same as a Postgres database. Databases that offer Postgres compatibility like AWS’ Aurora or Google’s AlloyDB, these fans argue, may not be fully compatible because of slight differences between the implementations, feature additions or omissions and more.
What cannot be argued is that the API for a large, successful and widely adopted software project is an enormously valuable asset. What might be argued is that it is possible, in certain cases, that the API is more valuable than the underlying code it represents. The underlying code for an API can and has been reimplemented in clean room settings, while the API must be a fixed point for developers.
With large projects that are maintained by multiple third parties such as Postgres, the potential friction from API reimplementations is minimal. By virtue of being a project worked on by many commercial vendors, there is no real exclusivity offered or claimed by the API.
The dynamics for single entity projects or open source projects developed primarily or solely by a single vendor, however, are quite another matter.
For many years now, open source projects and database projects specifically have developed a pattern or lifecycle from a licensing standpoint. Initial development is conducted under a typically permissive open source license, in which control is traded for usage and distribution growth. Once certain usage thresholds are met, and attract commensurate funding – venture or otherwise – permissive licenses are discarded in favor of licenses offering much stronger protections, up to and past the edge of what the definition of open source permits. These licensing “rug pulls” may have eased somewhat, in that commercial vendors <a href="https://redmonk.com/sogrady/2025/05/06/oss-forward-back/">appear to be pulling back</a> from source available licenses and finding an equilibrium around the strongest copyleft license in the AGPL, but the justification is the same: exclusivity.
In short, whether it’s the AGPL or non-open source, source available alternatives, the end goal for relicensing is to try and capture the vast majority or entirety of the revenue associated with a given open source project rather than share it with other vendors, particularly large hyperscalers. Many source available licenses explicitly forbid other companies from monetizing the licensed code. The AGPL, meanwhile, does not forbid third parties from monetizing a given codebase, but it does require them to share any changes or fixes they make – a practice that many avoid as a rule. Thus a project can be technically open, but practically speaking only monetized by the original author of a given project.
But what about their APIs?
In January of 2019, AWS released a long suspected new database, DocumentDB. It was, as might be guessed, a document database, and one specifically that offered some MongoDB compatibility. MongoDB had, one quarter prior, relicensed its database from the AGPL to the much more expansive SSPL. This was ostensibly an effort to thwart competition from the likes of AWS, but the timing made it clear that AWS wanted no part of even the less protective AGPL and had instead done a clean room reimplementation of MongoDB’s API to offer a datastore theoretically compatible with Mongo, but built on their own stack not subject to the requirements of the AGPL – or the SSPL for that matter. .
This all having taken place almost two years before the landmark Google v Oracle decision, however, AWS was very careful to state that its API compatibility was only up to the version last licensed as the AGPL. No one at the time had any real legal certainty on whether APIs were copyrightable and thus proprietary.
In the years since, as discussed, the industry does not have certainty, precisely, but it has made assumptions in the wake of the trial, one of them being that APIs are for all intents and purposes non-proprietary.
Which brings us to <a href="https://www.mongodb.com/blog/post/building-for-developers-not-imitators">this news</a> from late May, in which MongoDB announced that they had asked FerretDB to “stop engaging in unfair business practices.” Their claims are based on assertions that Ferret:
<ul>
<li>Misleads and deceives developers by falsely claiming that its product is a “replacement” for MongoDB “in every possible way and</li>
<li>FerretDB has infringed upon MongoDB’s patents.</li>
</ul>
Two things stand out immediately. First, that Mongo’s claims ultimately reduce to trademark and patent infringement matters, and second that neither API nor copyright are mentioned once. Setting aside the relative merits or lackthereof of these claims, which are best left to those with legal backgrounds, courts or both, the important question is whether this case is a one off or the shape of things to come.
Commercial open source projects have struggled to maximize their revenue exclusivity for years, primarily through the aforementioned series of relicensing efforts. Those efforts, however, are based on copyright as it applies to source code. If copyright doesn’t apply to APIs, or if the bar for fair use is low enough to be easily achievable from a legal standpoint, that may suggest a future in which competitive third parties “<a href="https://en.wikipedia.org/wiki/Leapfrogging_(strategy)">island hop</a>” the source code and go straight for the APIs. Given the size and usage base of some of the commercial open source projects, the economic incentives to do so are substantial indeed. APIs are ultimately a door for developers, and if that door can open to your products as easily as the original author’s, that will likely be of interest regardless of what the license on the original source code might be.
The upside to the Google v Oracle ruling was clear, in that an industry in which every last programming interface was considered proprietary would be a tectonic, systemic shock. The downside, though, is that we now have to hope that we don’t see a resurgence in interest in “<a href="https://en.wikipedia.org/wiki/Embrace,_extend,_and_extinguish">embrace, extend, extinguish</a>” efforts from large third parties trying to co-opt open source projects and user bases.
Either way, it seems likely that the next wave of conflict won’t be over licenses pertaining to code, but the APIs they implement.
Disclosure: AWS, Google, MongoDB and Oracle are RedMonk customers. FerretDB is not currently a RedMonk customer.
]]></content:encoded>
</item>
<item>
<title>Everyone Gets a Database</title>
<link>https://redmonk.com/sogrady/2025/06/06/data-consolidation/</link>
<dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
<pubDate>Fri, 06 Jun 2025 16:50:39 +0000</pubDate>
<category><![CDATA[Databases]]></category>
<guid isPermaLink="false">https://redmonk.com/sogrady/?p=6072</guid>
<description><![CDATA[Once upon a time, there were software categories called Application Performance Monitoring (APM) and Logging. They each involved the collection of large volumes of telemetry data, used among other things for the purpose of understanding and attacking problems at varying layers of the enterprise application stack. As time passed and infrastructure grew more distributed and]]></description>
<content:encoded><![CDATA[<a href="http://redmonk.com/sogrady/files/2021/10/pxfuel.com_-scaled.jpg"><img loading="lazy" decoding="async" src="http://redmonk.com/sogrady/files/2021/10/pxfuel.com_-1024x683.jpg" alt="Pendulum" width="1024" height="683" class="aligncenter size-large wp-image-5924" srcset="https://redmonk.com/sogrady/files/2021/10/pxfuel.com_-1024x683.jpg 1024w, https://redmonk.com/sogrady/files/2021/10/pxfuel.com_-300x200.jpg 300w, https://redmonk.com/sogrady/files/2021/10/pxfuel.com_-768x512.jpg 768w, https://redmonk.com/sogrady/files/2021/10/pxfuel.com_-1536x1024.jpg 1536w, https://redmonk.com/sogrady/files/2021/10/pxfuel.com_-2048x1365.jpg 2048w, https://redmonk.com/sogrady/files/2021/10/pxfuel.com_-480x320.jpg 480w, https://redmonk.com/sogrady/files/2021/10/pxfuel.com_-941x627.jpg 941w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a>
Once upon a time, there were software categories called Application Performance Monitoring (APM) and Logging. They each involved the collection of large volumes of telemetry data, used among other things for the purpose of understanding and attacking problems at varying layers of the enterprise application stack.
As time passed and infrastructure grew more distributed and applications more complex, a new software category gradually emerged: observability. Its aim was to provide those charged with running applications and infrastructure a more nuanced, granular and integrated view of software problems that might be known or unknown, ephemeral and/or involve multiple layers of a given stack. This approach proving effective, the category attracted more attention, and consequently money – both investment and revenue – began to flow more freely.
Unsurprisingly, then, vendors in the APM and Logging categories concluded that this newly emerging adjacent market represented both a logical extension to their existing capabilities as well as a potentially lucrative growth opportunity. Rather than leave money on the table, many vendors in these spaces grew sideways into the observability market competing with native observability players.
This is, in its own way, both a case study in market consolidation as well as just another in a long line of such cases. As is, increasingly, the consolidation we see within the database and data platforms market.
<hr />
Almost four years ago, it <a href="https://redmonk.com/sogrady/2021/10/26/general-purpose-database/">became apparent</a> that the pendulum that had swung away from general purpose databases and towards an array of specialized datastores had reversed and was well into its return trajectory. As captivating as the idea and abilities of bespoke databases built for a singular purpose was, the reality of both developing to and continually operating multiple databases (as well as significantly expanding the vendor procurement footprint) had set in. Enterprises and developers alike, though for reasons that had little in common, increasingly advantaged databases that could handle multiple workloads through a single engine and interface.
In the years since, that directional shift has not slowed. If anything, it’s accelerated. Database consolidation continues apace, and single workload databases are increasingly the exception rather than the rule.
There is a larger question facing the data sector, however: where and how will data lakes(houses) and databases collide? Recent events are suggestive, but the history is contradictory.
Five years ago, MongoDB – born as a document database but having since added the ability to handle workloads well beyond that including search, stream and vector – announced a new set of capabilities including a data lake product. This was an early effort to begin to converge the database with large scale data stores underneath them. Two years after that, they followed up that announcement with refinements on both the object storage and analytical query fronts.
Large scale data storage and databases, it seemed, would follow the macro trend and converged towards a single interface with the added bonus that procurement would only have to deal with one vendor. The trajectory seemed clear.
Clear, except that last fall MongoDB announced that it was deprecating its data lake offering, and that it would be end-of-lifed within a year, or three months from today. The question then became whether MongoDB’s deprecated effort to merge database with data platform was the outlier, or the shape of things to come.
That answer won’t be evident for some time, but two notable acquisitions suggest that convergence may yet be on the way.
<ul>
<li>One month ago on May 14th, Databricks acquired the database company Neon – a serverless Postgres database vendor that was well thought of amongst friends of RedMonk. The acquisition cost was $1B. </li>
<li>Earlier this week, meanwhile, its biggest rival Snowflake agreed to acquire another Postgres database vendor, Crunchy Data, for $250M. </li>
</ul>
These two dueling is, of course, nothing new. See, for example, their competition over the DBRX (Databricks) and Arctic (Snowflake) models, or the tug of war over Tabular – ultimately acquired by Databricks for $2B.
Both companies obviously see a future in which AI plays a if not the critical role with respect to data, which is logical given that AI is built on and from a high volume of data and that AI advantages existing data incumbents for both trust and data gravity reasons. But then again every software category today is making enormous bets on AI.
It is notable, however, that both vendors just as clearly see traditional database capabilities – PostgreSQL capabilities in particular – as likely to become, if they are not already, table stakes. Convergence, put simply, is the goal.
The challenge for these data platforms, as it was for MongoDB when it launched its data lake product, however, is market permission. While it makes all the sense in the world on paper for data platforms and databases to come together, markets do not always follow what makes sense on paper and do not always embrace a new product in a market new to the vendor. Enterprises are cautious about investing in products offered by vendors whose fundamental DNA lies in an entirely distinct market with different expectations. And from the seller’s side of the equation, vendors need to learn how to go to market and sell to different users and different buyers with differing sets of concerns.
It is too early to say whether or not Databricks and Snowflake will be granted permission to compete directly in database markets, or that they have or will acquire the ability to do so efficiently. But they’re collectively betting a billion and a quarter dollars that MongoDB had the right of it back in 2020, not in 2024, and that the market wants data lakes and databases offered by the same, single supplier.
They’re making the same bet, in other words, that APM and Logging companies made when they implicitly argued that observability should be a feature of their existing product rather than a brand new market of its own.
Disclosure: Crunchy Data and MongoDB are RedMonk customers. Databricks and Snowflake are not currently RedMonk customers.
]]></content:encoded>
</item>
<item>
<title>OSS: Two Steps Forward, One Step Back</title>
<link>https://redmonk.com/sogrady/2025/05/06/oss-forward-back/</link>
<dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
<pubDate>Tue, 06 May 2025 15:20:21 +0000</pubDate>
<category><![CDATA[Open Source]]></category>
<guid isPermaLink="false">https://redmonk.com/sogrady/?p=6070</guid>
<description><![CDATA[In October of 2018, MongoDB relicensed its previously open source database to a new source available license of its own creation. Up until that point, the license governing the project had been the AGPL, an OSI approved open source license that took the copyleft provisions of the GPL and extended them into applications hosted on]]></description>
<content:encoded><![CDATA[<a href="http://redmonk.com/sogrady/files/2018/12/Ouroboros-simple.svg_-1.png"><img loading="lazy" decoding="async" src="http://redmonk.com/sogrady/files/2018/12/Ouroboros-simple.svg_-1-1024x1022.png" alt="" width="1024" height="1022" class="aligncenter size-large wp-image-5955" srcset="https://redmonk.com/sogrady/files/2018/12/Ouroboros-simple.svg_-1-1024x1022.png 1024w, https://redmonk.com/sogrady/files/2018/12/Ouroboros-simple.svg_-1-300x300.png 300w, https://redmonk.com/sogrady/files/2018/12/Ouroboros-simple.svg_-1-150x150.png 150w, https://redmonk.com/sogrady/files/2018/12/Ouroboros-simple.svg_-1-768x767.png 768w, https://redmonk.com/sogrady/files/2018/12/Ouroboros-simple.svg_-1-480x479.png 480w, https://redmonk.com/sogrady/files/2018/12/Ouroboros-simple.svg_-1-628x627.png 628w, https://redmonk.com/sogrady/files/2018/12/Ouroboros-simple.svg_-1.png 1026w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a>
In October of 2018, MongoDB relicensed its previously open source database to a new source available license of its own creation. Up until that point, the license governing the project had been the AGPL, an OSI approved open source license that took the copyleft provisions of the GPL and extended them into applications hosted on networks.
In simple terms, where the GPL’s reciprocal provisions included software packaged and distributed in binary fashion, they did not apply to modified GPL projects hosted and made available over networks. If you wanted to distribute a new version of the Linux kernel, for example, you were required to make those changes available under the same terms. If you hosted applications on the internet running on a modified Linux kernel, however, you did not.
The AGPL was explicitly designed to close this so-called loophole. Historically, however, usage of the license has been rare relative to other open source licenses of both the copyleft and permissive varieties. This was in part because large internet companies strictly forbade its usage for fear of running afoul of the extended protections.
For MongoDB, however, the protections afforded by the AGPL did not go far enough. As a result, they set out to craft a license that followed in the AGPL’s footsteps by not applying reciprocal provisions to software hosted in a network fashion, but dramatically expanding the scope of these protections beyond the boundaries of the protected software itself and into adjacent software.
Specifically, the license says the following (emphasis added):
<blockquote>
“Service Source Code” means the Corresponding Source for the Program or the modified version, and the Corresponding Source for all programs that you use to make the Program or modified version available as a service, including, without limitation, management software, user interfaces, application program interfaces, automation software, monitoring software, backup software, storage software and hosting software, all such that a user could run an instance of the service using the Service Source Code you make available.
</blockquote>
The AGPL strictly governs a given project’s codebase, then, while the SSPL extends itself to any immediately adjacent software. If large internet companies – and cloud providers in particular – were averse to the AGPL, then, the SSPL was a non-starter. In practical terms it is nearly impossible to comply with the terms of the license, and the reach is clearly at odds with if not totally irreconcilable with the ninth requirement of the <a href="https://opensource.org/osd">open source definition</a>.
<blockquote>
The license must not place restrictions on other software that is distributed along with the licensed software. For example, the license must not insist that all other programs distributed on the same medium must be open source software.
</blockquote>
MongoDB initially attempted to work with the OSI to have the SSPL accepted and approved as an open source license, but the combination of a flawed review process and the license’s fundamental nature led to the eventual abandonment of those efforts. The license instead remains source available, not open source.
Other commercial vendors seeking exclusivity, however, began to take up the source available license at the expense of open source alternatives. In January of 2021, Elastic moved away from the permissive Apache license to a dual license strategy, one license of which was the SSPL. In March of 2024, Redis did the same.
Recently, however, the SSPL’s trajectory appears to have been altered. In September of last year, Elastic added the AGPL as another licensing option – effectively deprecating the more restrictive SSPL. Then this year on May 1st, Redis again followed in Elastic’s footsteps and added the AGPL as an option. In describing the thought process behind the move, Salvatore Sanfilippo – the original author of Redis – <a href="https://antirez.com/news/151">said</a>:
<blockquote>
My feeling was that the SSPL, in practical terms, failed to be accepted by the community. The OSI wouldn’t accept it, nor would the software community regard the SSPL as an open license. In little time, I saw the hypothesis getting more and more traction, at all levels within the company hierarchy.
</blockquote>
No mention was made of the unique Redis fork <a href="https://redmonk.com/sogrady/2024/07/16/post-valkey-world/">Valkey</a> in that post, but it appears in the comments and the idea that it had no role in the internal discussions on the license choice is implausible. Regardless of the motivation, the decision to return to an open source license was a consequential one and further evidence of a changing trajectory for the license and for open source.
Nor is it just SSPL projects moving to the AGPL. Grafana and MinIO previously moved from Apache licenses to the AGPL in April and May of 2021, respectively. The Zitadel project, meanwhile, did the same in March.
What do these moves collectively suggest, then, about the health of open source? Two things, at least, are implied.
<ul>
<li>First, that commercial open source vendors continue to seek stronger protections for code they have authored. In the last decade plus, permissive licenses saw <a href="https://redmonk.com/sogrady/2017/01/13/the-state-of-open-source-licensing/">substantial jumps</a> in their usage, and because licensing has historically been more fashion statement than rigorous analysis, each new permissively licensed project encouraged the next. In recent years and amongst commercially backed projects, however, there has been a backlash. Company after company leveraged permissive licenses initially in a bid to gain ubiquity, then ratcheted up protections with new licenses as they transitioned from a focus on growth to exclusivity of value capture.
That much has been apparent for years. The real question was where the equilibrium would be found: in OSI approved open source licenses, or in source available alternatives? The available evidence at this time suggests that the AGPL may be that equilibrium, combining OSI-approval with staunch protections and disincentives for large clouds, among other potential competitors.
</li>
<li>
And speaking of protections and large clouds, the second implication is that the AGPL is now viewed as sufficiently protective. A large part of the original justification for the SSPL and other source available alternatives was the need for protections from large, hyperscale clouds picking up permissively licensed projects and using their greater resources and distribution to advantage themselves over the original authors. It was even claimed by individuals outside of MongoDB, in fact, that AWS’ DocumentDB – which replicated MongoDB’s API without leveraging the MongoDB codebase – was a demonstration of the power and necessity of the SSPL. The timeline, however, does not support this argument. The SSPL was first applied to MongoDB in October 2018. DocumentDB, meanwhile, was released in January of 2019. Even at the speed AWS operates, they did not write, stand up, test and release a new database service in two months. It’s clear that for AWS, at least, the AGPL was sufficient disincentive to avoid the codebase. Questions remain as to whether the AGPL is a deterrent strong enough for <a href="https://bsky.app/profile/msw.bsky.social/post/3lo6zd3awxs2v">Chinese cloud providers</a>, then, but rightly or wrongly the market seems to be settling on the AGPL as the commercial open source license of choice, not the SSPL.
Which, even if some are disappointed that the AGPL is being used effectively as a proprietary software license, represents a win for open source. It also suggests that the next competitive frontier, thanks to <a href="https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_America,_Inc.?wprov=sfti1">Google v Oracle</a>, will not be codebases but APIs, but that’s a subject for another time.
</li>
</ul>
For now, it’s necessary to examine what represented a clear loss for open source, which was the drama surrounding the CNCF, NATS and Synadia. In late April, a conflict bubbled to the surface that would better have been kept private. In short, there were allegations that Synadia – the principal authors of NATS – wanted to withdraw its donation from the CNCF, including the trademarks. All is now putatively <a href="https://www.cncf.io/announcements/2025/05/01/cncf-and-synadia-align-on-securing-the-future-of-the-nats-io-project/">well</a> between the two organizations, and NATS and its trademark will remain with the CNCF with Synadia’s stewardship.
But the flareup starkly revealed traditional fault lines in the wider open source community around the role of foundations. For many, this situation provided an opportunity not to protest the alleged about face but rather to attack foundations generally and the CNCF specifically for their shortcomings, both perceived and real.
Generally, critiques of foundations are appropriate. While RedMonk’s view is generally that foundations are <a href="https://redmonk.com/jgovernor/2024/09/13/open-source-foundations-considered-helpful/">useful</a> and indeed vital, they are imperfect institutions that often struggle to balance the competing needs of vendors, enterprises and individual developers. Attention on where and how these “worst solutions except everything else that’s been tried,” to paraphrase Churchill, can be improved and refined is an important exercise.
The time for that exercise, here at least, is not when their existence is existentially threatened. Whatever else one may think of them, objectively speaking foundations exist as an external home for software, one in which certain guarantees and commitments are held sacrosanct, and in which commercial entities can trust and, optionally, collaborate. Vendors that choose to donate projects to foundations do so understanding, or at least should, that donation is a one way door. If commercial organizations can commit a project to this neutral third party, and then unilaterally withdraw from the foundation whenever it suits them, the guarantees of neutrality that foundations provide are immediately rendered worthless. The idea, then, the CNCF or any other foundation could blithely let a disaffected project depart was genuinely appalling to hear, akin to condoning the secession of states. However one feels about how a foundation has assisted or not assisted a member project, the idea that the irrevocable promise companies knowingly and willingly make when donating a project and its trademark to a foundation is actually, depending on the circumstances, revocable was genuinely shocking. It was also a sign that widespread bitterness towards and distaste for foundations remains a stronger force than may be commonly understood or appreciated.
While open source appears to have made progress towards a consensus around open source licenses that are commercially acceptable, then, the NATS storm was a black eye for open source broadly. Between the simmering antagonism towards foundations at scale, the direct attacks on the CNCF specifically and the revelations about NATS’ performance and Synadia’s alleged behaviors, it was not a great week for open source.
Two steps forward, one step back.
Disclosure: AWS and MongoDB are RedMonk customers. The CNCF, Elastic, Grafana, MinIO and Redis are not currently RedMonk customers.
]]></content:encoded>
</item>
<item>
<title>Nothing Permanent Except Change</title>
<link>https://redmonk.com/sogrady/2025/04/16/kelly/</link>
<dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
<pubDate>Wed, 16 Apr 2025 19:25:09 +0000</pubDate>
<category><![CDATA[RedMonk Miscellaneous]]></category>
<guid isPermaLink="false">https://redmonk.com/sogrady/?p=6068</guid>
<description><![CDATA[Thanks to some divided attention on our part from having Monki Gras, Kubecon and Google Next back to back to back – not to mention a child that suddenly and unexpectedly got sick at school this week – this post is coming out a few days later than we’d originally planned, but it is time]]></description>
<content:encoded><![CDATA[Thanks to some divided attention on our part from having Monki Gras, Kubecon and Google Next back to back to back – not to mention a child that suddenly and unexpectedly got sick at school this week – this post is coming out a few days later than we’d originally planned, but it is time to let our community know that Kelly Fitzpatrick is leaving RedMonk.
When she joined us seven years ago, our hope was that Kelly would bring in equal parts a modern Georgia Tech-honed tech comms professor with her training as a medieval historian to the tech field, and that’s exactly what she did. From her passion for good documentation to her interviews and instructor-quality explanations of various tech trends across all forms of media, Kelly helped audiences – both developer and executive – understand technology better. And given her crazy travel schedule over the years and her enthusiasm for hosting our traditional RedMonk beers, many of you have probably hoisted a pint or two with her as well. Our clients have enjoyed working with her, and so have we.
But now it’s time for her to continue to grow in a new role, and while her news is not ours to share, suffice it to say that she’s not going too far.
We’ll undoubtedly be tackling the difficult job of finding someone to fill her shoes at some point in the near future, but we’re not opening the hiring floodgates immediately because we need some time to recover from a couple of very busy months and because, well, [gestures broadly at everything]. When we and the world around us have both had a chance to catch our breath, we’ll begin thinking about who the next monk might be in earnest, but until then please join us in wishing Kelly a fond farewell. Thanks for all the hard work, Kelly, and best of luck in your next stop.
]]></content:encoded>
</item>
<item>
<title>DeepSeek and the Enterprise</title>
<link>https://redmonk.com/sogrady/2025/01/27/deepseek-and-the-enterprise/</link>
<dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
<pubDate>Mon, 27 Jan 2025 22:16:55 +0000</pubDate>
<category><![CDATA[AI]]></category>
<guid isPermaLink="false">https://redmonk.com/sogrady/?p=6065</guid>
<description><![CDATA[A little over one month ago on December 25th, a Chinese AI lab little known in the US dropped some new weights, following with the model card and paper the next day. Four days ago, NVIDIA was worth around $3.6T. At one point today, it was down to $2.9T – still an astronomical sum, to]]></description>
<content:encoded><![CDATA[<a href="http://redmonk.com/sogrady/files/2025/01/image-from-rawpixel-id-5950552-original-scaled.jpg"><img decoding="async" src="http://redmonk.com/sogrady/files/2025/01/image-from-rawpixel-id-5950552-original-1024x685.jpg" alt="" class="size-large wp-image-6066" /></a>
A little over one month ago on December 25th, a Chinese AI lab little known in the US dropped some <a href="https://huggingface.co/deepseek-ai/DeepSeek-V3-Base">new weights</a>, following with the <a href="https://github.com/deepseek-ai/DeepSeek-V3/blob/main/README.md">model card</a> and paper the next day.
Four days ago, NVIDIA was worth around $3.6T. At one point today, it was down to $2.9T – still an astronomical sum, to be sure, but a market capitalization representing an almost equally epic market correction.
Just what happened in those thirty-three days?
The Chinese lab, of course, was DeepSeek. Keen observers <a href="https://simonwillison.net/2024/Dec/26/deepseek-v3/">realized</a> immediately that its qualities aside, the real import were the economics it represented. Per Willison’s numbers, DeepSeek v3 was a model some 40% larger than Meta’s Llama 3.1, but trained on roughly 9% as many GPU hours.
This matters because GPUs are both expensive and difficult to source. The best hardware, in fact, is theoretically unavailable in China because of the United States government’s chip ban. The DeepSeek team responded to this challenge by deeply seeking efficiencies, and apparently found them.
Seven days ago, the DeepSeek team released a new model, R1, a reasoning model comparable to OpenAI’s o1. That’s when things began to move quickly, because not only were DeepSeek’s training costs transformative economically, it was now bumping up against the performance of the best models the US had to offer. And unlike those models, DeepSeek’s were open. All carried the MIT license, which would theoretically make them legitimate open source software, but certain versions were trained on Llama, which is not open source software, and which in turn means that models trained on it cannot be considered open source. Likewise, we don’t have the original training data.
Regardless of whether they meet the technical definition, however, DeepSeek dropped models that were or claimed to be truly open, highly capable and game changers from an efficiency and thus cost standpoint. It took a couple of days for the market to evaluate some of these claims, and while there is much that we still don’t know about the models, engineers who’ve taken them apart in detail have come away impressed – to the point that there have been rumors of near panic within the leaders of those engineers at large, public AI shops on sites like Blind.
Which is why, when the market opened today, the bottom fell out for anything tangentially related to AI, with the NASDAQ closing down 3.1%. NVIDIA was far from the only tech company hammered by the market’s <a href="https://sherwood.news/markets/quick-and-dirty-timeline-of-markets-deepseek-freak/">freak out</a> today, but they were the most prominent because they are the proverbial 800 pound gorilla in the market for AI chips. With great success comes great visibility, for better and for worse.
While this decimation will likely prove to be a short term overreaction, and more temperate market corrections should be forthcoming in the coming days, if DeepSeek’s claims continue to be validated, this is an inflection point from an industry standpoint. Many have been examining the higher level industry and geopolitical implications from this news – if you’re a Stratechery subscriber, as one example, Ben Thompson has a good FAQ <a href="https://stratechery.com/2025/deepseek-faq/">here</a> – and we’ll all continue to sift through the fallout for weeks and months to come.
One aspect that has not seemed to attract much attention, however, are the implications for enterprise buyers and their relationships with the large, existing model providers such as Anthropic, AWS, Google, Microsoft and OpenAI.
While enterprise AI efforts’ biggest problem to date has not been the technology but understanding where and how best to apply it, they have had two critical concerns with respect to AI’s large, broadly capable foundational models.
<ul>
<li>First, and most obviously, is trust. Enterprises recognize that to maximize the benefit from AI, they need to be able to grant access to their own, internal data. Generally speaking, however, they have been unwilling to do this at scale. Vendor promises notwithstanding, one common pattern of adoption has been a limited proof of concept executed on a public model, and then going to production with a privately hosted and managed equivalent that has the data access it requires.
</li>
<li>
Second has been cost. Enterprises have been shocked, in many cases, at the unexpected costs – and unclear returns – from some scale investments in AI. While bringing AI back in house has offered some hope of cost reductions, internal capabilities are expensive and GPUs, as mentioned, have been difficult to acquire. This is presumably why AWS CEO Matt Garman said at reInvent, “On prem data environments are not well suited for AI. They’re just not.”
</li>
</ul>
Enterprises that want to embrace AI, in other words, have reasons to want to do so on their own infrastructure. But that has posed its own set of challenges, challenges which have led many enterprises to scale back their ambitions and turn their eyes from large, expensive foundational models to small, more cost efficient and easily trained alternatives. An approach which has been compelling to users who are employing AI tactically to solve a narrow, discrete problem rather than strategically.
DeepSeek, however, challenges these core assumptions.
<ul>
<li>What if enterprises didn’t have to rely on closed, private models for leading edge capabilities? </li>
<li>What if training costs could be reduced by an order of magnitude or more? </li>
<li>What if they did not require expensive, state of the art hardware to run their models? </li>
</ul>
DeepSeek’s most advanced model has been available for seven days. And as stated, there is a great deal of testing and experimentation ahead – and doubtless many enterprises will have concerns for geopolitical reasons about a model trained in China on unknown data sources. But if DeepSeek’s technical and efficiency promises hold, the challenge for AI vendors may not just be for ultimate model supremacy, but for the enterprise market they’ll need to justify their sky high valuations.
Disclosure: AWS, Google and Microsoft are RedMonk customers. Anthropic, DeepSeek, OpenAI and NVIDIA are not currently customers.
]]></content:encoded>
</item>
<item>
<title>The Dream of Hadoop is Alive in AI</title>
<link>https://redmonk.com/sogrady/2025/01/15/dream-of-hadoop-ai/</link>
<dc:creator><![CDATA[Stephen O'Grady]]></dc:creator>
<pubDate>Wed, 15 Jan 2025 15:51:34 +0000</pubDate>
<category><![CDATA[AI]]></category>
<category><![CDATA[Big Data]]></category>
<category><![CDATA[Data]]></category>
<category><![CDATA[Databases]]></category>
<guid isPermaLink="false">https://redmonk.com/sogrady/?p=6062</guid>
<description><![CDATA[Nineteen years ago come April, Yahoo allowed two developers to release a project called Hadoop as open source software. Based on the Google File System and MapReduce papers from Google, it was designed to enable querying operations on large scale datasets using commodity hardware. Importantly, in contrast to the standard relational databases of the time,]]></description>
<content:encoded><![CDATA[<a href="http://redmonk.com/sogrady/files/2025/01/IMG_2745.png"><img loading="lazy" decoding="async" src="http://redmonk.com/sogrady/files/2025/01/IMG_2745-1024x706.png" alt="" width="1024" height="706" class="aligncenter size-large wp-image-6063" srcset="https://redmonk.com/sogrady/files/2025/01/IMG_2745-1024x706.png 1024w, https://redmonk.com/sogrady/files/2025/01/IMG_2745-300x207.png 300w, https://redmonk.com/sogrady/files/2025/01/IMG_2745-768x529.png 768w, https://redmonk.com/sogrady/files/2025/01/IMG_2745-1536x1059.png 1536w, https://redmonk.com/sogrady/files/2025/01/IMG_2745-2048x1412.png 2048w, https://redmonk.com/sogrady/files/2025/01/IMG_2745-480x331.png 480w, https://redmonk.com/sogrady/files/2025/01/IMG_2745-910x627.png 910w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a>
Nineteen years ago come April, Yahoo allowed two developers to release a project called Hadoop as open source software. Based on the Google File System and MapReduce papers from Google, it was designed to enable querying operations on large scale datasets using commodity hardware. Importantly, in contrast to the standard relational databases of the time, it could handle structured, semi-structured and unstructured data. The dream of Hadoop for many enterprises was opening their vast stores of accumulated data, which variedly widely in structure and normalization, to both routine and ad hoc querying. The ability to easily ask questions of data independent of its scale represented a nirvana for organizations always seeking to operate with better and more real time intelligence.
There were several obstacles to achieving this, however and today, while Hadoop is still around and in use within many enterprises, it has largely been leapfrogged by a variety of other on premise and cloud based alternatives.
One of the first barriers many organizations encountered was the querying itself. Writing a query in Hadoop required an engineer to understand both Java the language and the principles of MapReduce. Many outside of Google, in fact, were surprised when the company – which tended to be secretive and protective of its technology at the time – chose to release the MapReduce paper publicly at all. As it turned out, part of the justification was to simplify the on ramp for external hires; with the paper public, Google could hire talent already familiar with its principles rather than having to spend internal time and money familiarizing themselves with the concept.
So complex, in fact, was the task of writing MapReduce jobs that multiple organizations wrote their own alternative query interfaces; two of the most popular were Hive, created by Facebook, and Pig, a product of Yahoo. Both embraced a SQL-like interface, because it was simpler to hire engineers with SQL experience than with Java and MapReduce skills. IBM, for its part, tried to graft on a spreadsheet-like interface called BigSheets to Hadoop to enable even non-programmers to leverage Hadoop’s underlying scale to query very large scale datasets – what used to be called Big Data.
None of these alternative interfaces took off, however, and for that and a variety of other reasons including its lack of suitability for streaming workloads, number of moving parts and the ready availability of alternative managed services like AWS’ EMR/Redshift, Google BigQuery, Microsoft’s HDInsight / Synapse Analytics or – eventually – Databricks and Snowflake, Hadoop’s traction slipped.
The dream it offered, however, has never been closer.
The problem in recent years has not been the scale of data to be queried. While certain classes of data workloads remain expensive and difficult to operate on, over the last two decades advances in both hardware and software have made operating on large scale data both easier and, relatively speaking at least, more cost effective.
Instead the primary challenge has been the query interface itself. Whatever the language and frameworks selected, SQL-like otherwise, narrowed the funnel of potential users down to employees with the requisite set of technical skills. But as even modest users of today’s LLM systems are aware, querying datasets is now trivial if not a totally solved problem.
Anyone who’s taken the time to upload a set of data – be it public corporate earnings, a climate science dataset or even personal utility consumption data – into a consumer grade LLM can test this out. Gone is the need to write complex queries or carefully refine charts and dashboards. Instead, the interface is simple, natural language questions:
<ol>
<li>What does this balance sheet suggest about the overall health of the business?</li>
<li>What are the year on year trends with respect to temperature, humidity and windspeed within this dataset?</li>
<li>What are the seasonal fluctuations in my electricity consumption and how have they varied over the past three years?</li>
</ol>
There are caveats, of course, most notably the models propensity to make basic errors and the delta between an individual’s dataset and an enterprise’s. But the absolute lack of any friction whatsoever from question to answer is absolutely transformative. While most of the industry’s attention at present has been on AI for <a href="https://redmonk.com/kholterhoff/2023/11/01/10-things-developers-want-from-ai-code-assistants/">code assistance</a>, query assistance is likely to be at least as useful for the average enterprise employee. The benefit to the enterprise from query assistants, in fact, may be substantially greater than code assistants if some of the <a href="https://redmonk.com/rstephens/2024/11/26/dora2024/">counterintuitive findings from the DORA report</a> prove accurate.
Very few enterprises, of course, will be willing to feed the corporate data they once crawled with Hadoop to public models such as ChatGPT, Claude or Gemini. Regardless of promises made on the part of the public models, there is at least for the present a major gap in trust surrounding the potential for – and potential risks of – data exfiltration.
Which explains several things. First, why Snowflake is currently valued at over $55B and Databricks closed a round one month ago valuing the company at $62B. Second, it explains why the two companies have competed fiercely around their respective in house models Arctic and DBRX. And lastly, it helps explain the massive importance of and standardization on Apache Iceberg, which one of my colleagues will be covering in a soon to be released piece.
It’s about the dream of Hadoop, after all. It is well understood that AI <a href="https://redmonk.com/sogrady/2024/05/29/ai-patterns/">advantages incumbents</a>; all other points being equal, most enterprises would prefer to operate models on their data in place rather than have to trust new platforms and third parties, let alone migrate data. Databricks, Snowflake -along with the hyperscalers, obviously – are incumbents already trusted with large scale data from a large number of enterprises; that provides opportunity. Opportunity that they need to unlock with native, existing LLM interfaces – hence their respective investments in models. Iceberg, for its part, is fast becoming the Kubernetes of tables, which is to say the standard substrate on which everything is built across all of the above.
Enterprises have been migrating away from specialized datastores and towards multi-modal, general purpose datastores <a href="https://redmonk.com/sogrady/2021/10/26/general-purpose-database/">for years now</a>, to be sure. AI is just the latest workload they’re expected to handle natively. AI models, in fact, may offer the cleanest path forward towards <a href="https://redmonk.com/sogrady/2022/03/21/vertical-integration/">vertically integrating</a> application-like functionality into the database. It’s more straightforward than acquiring and integrating an independent application platform, certainly. Data vendors may or may not have the market permission to absorb one of the various <a href="https://redmonk.com/sogrady/2023/02/01/ai-paas/">PaaS-like</a> players, but they are already trusted to run AI-workloads – AI workloads that overlap, sometimes significantly, with traditional application workloads. There’s a reason vendors in the space refer to themselves as data platforms: that’s exactly what they are, and are becoming.
The dream of Hadoop isn’t here today, to be clear. Even if the technology were fully ready, questions about security, compliance, access control and more remain. And as always, there are concerns about model hallucinations. But thanks to AI, the financial markets clearly believe it to be closer than it’s ever been. And after using models to query a variety of datasets of varying size and scope, it’s hard to argue the point.
Disclosure: AWS, Google, IBM and Microsoft are RedMonk customers. Databricks, OpenAI and Snowflake are not currently RedMonk customers.
]]></content:encoded>
</item>
</channel>
</rss>

If you would like to create a banner that links to this page (i.e. this validation result), do the following:

Download the "valid RSS" banner.
Upload the image to your own server. (This step is important. Please do not link directly to the image on this server.)
Add this HTML to your page (change the image src attribute if necessary):

<a href="http://www.feedvalidator.org/check.cgi?url=http%3A//redmonk.com/sogrady/feed/"><img src="valid-rss-rogers.png" alt="[Valid RSS]" title="Validate my RSS feed" /></a>

If you would like to create a text link instead, here is the URL you can use:

http://www.feedvalidator.org/check.cgi?url=http%3A//redmonk.com/sogrady/feed/

Home · About · News · Docs · Terms