<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<id>https://paul.querna.org/</id>
<title>Paul's Journal</title>
<updated>2020-08-30T03:59:36Z</updated>
<generator>artisanal-by-pq</generator>
<author>
<name>Paul Querna</name>
<email>journal@paul.querna.org</email>
<uri>https://twitter.com/pquerna</uri>
</author>
<link rel="alternate" href="https://paul.querna.org/"/>
<link rel="self" href="https://paul.querna.org/atom.xml"/>
<subtitle>This is my personal feed!</subtitle>
<icon>https://paul.querna.org/favicon.ico</icon>
<entry>
<title type="html"><![CDATA[Understanding and Exploiting Go’s DSA Verify Vulnerability]]></title>
<id>https://paul.querna.org/articles/2019/10/24/dsa-verify-poc/</id>
<link href="https://paul.querna.org/articles/2019/10/24/dsa-verify-poc/">
</link>
<updated>2019-10-24T17:20:00Z</updated>
<summary type="html"><![CDATA[Last week the Go project announced version 1.13.2.
It contained a fix for a bug in the dsa.Verify function. The bug is considered a security…]]></summary>
<content type="html"><![CDATA[<p>Last week the <a href="https://groups.google.com/forum/#!topic/golang-announce/lVEm7llp0w0">Go project announced version 1.13.2</a>.
It contained a fix for a bug in the <code class="language-text">dsa.Verify</code> function. The bug is considered a security vulnerability and was assigned the name
CVE-2019-17596. Using CVSS to score the vulnerability, it would likely be classified as a <code class="language-text">MEDIUM</code>, because the attack vector is over the network and requires no authentication.</p>
<div class="alert alert-info" role="alert">
<a href="https://github.com/pquerna/poc-dsa-verify-CVE-2019-17596">Just take me to the code on github!</a>
</div>
<p>The Go language has a <a href="https://www.cvedetails.com/vendor/14185/Golang.html">good track record from a security point of view</a>. Vulnerabilities have historically been in the developer toolchain (e.g., affecting <code class="language-text">go get</code>), or logical errors. This vulnerability is different. It is a null pointer dereference causing a panic. Perhaps more important, it could be exploited in many “pre-authentication” contexts, because public key cryptographic algorithms like the Digital Signature Algorithm (DSA) are used as authentication mechanisms. Thankfully, due to the design of the Go language, this vulnerability is limited to crashing the process, and does not appear to be a mechanism to trigger remote code execution or have a more serious impact.</p>
<p>A few years ago I was discussing network level pre-authentication exploits with <a href="https://twitter.com/marcwrogers">Marc Rogers</a>. I made a ridiculous statement about how they just aren’t going to happen that often — and how this is an important component for the vision of Zero Trust architectures. Marc responded with this:</p>
<blockquote class="blockquote text-center">
<p class="mb-0">Anything man makes, man can break</p>
<footer class="blockquote-footer">Marc Rogers</footer>
</blockquote>
<p>And today, Marc is right. DSA is math at its heart, but the <strong>implementation</strong> is still man made, and it was broken. I hope these kinds of vulnerabilities remain rare, since we need solid building blocks to build systems upon, but because this one is more interesting than most, I thought it would be fun to dive into how it works and see if we can build an exploit.</p>
<h2>Background and initial research</h2>
<p>The release announcement email said “Invalid DSA public keys can cause a panic in dsa.Verify”. Sounds simple enough, although the Go project did not provide any examples of what an invalid public key looks like. The next step is to look at the fix, as <a href="https://github.com/golang/go/commit/4cabf6992e98f74a324e6f814a7cb35e41b05f25">committed to git in the 1.13 release branch:</a></p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text"> w := new(big.Int).ModInverse(s, pub.Q)
+ if w == nil {
+     return false
+ }</code></pre></div>
<p>The <code class="language-text">math/big</code> package deals with very large numbers, and <a href="https://golang.org/pkg/math/big/#Int">the <code class="language-text">big.Int</code> type has a different design pattern</a> than many parts of the Go standard library: Many functions in the package return new copies of <code class="language-text">*big.Int</code> for an operation, and if that operation has an error, they return nil instead. The <code class="language-text">big.Int.ModInverse</code> function is documented as doing this. If we look further along in the <code class="language-text">dsa.Verify</code> function, we can see that w is used without checking whether <code class="language-text">ModInverse</code> failed. The commit to fix the bug is a simple guard, checking the return value of <code class="language-text">ModInverse</code>, and failing verification if it failed.</p>
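<p>To make that failure mode concrete, here is a small standalone sketch (my own example, not code from the standard library or the patch) showing <code class="language-text">ModInverse</code> returning <code class="language-text">nil</code> when its arguments are not relatively prime:</p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">package main

import (
	"fmt"
	"math/big"
)

func main() {
	g := big.NewInt(6)
	n := big.NewInt(9)
	// 6 and 9 share a factor of 3, so there is no inverse and
	// ModInverse returns nil rather than an error.
	w := new(big.Int).ModInverse(g, n)
	fmt.Println(w == nil) // true
	// The unpatched dsa.Verify passed w straight into further big.Int
	// math, which dereferences the nil pointer and panics.
}</code></pre></div>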
<h2>Breaking ModInverse</h2>
<p>Since the release announcement mentioned invalid public keys could cause the panic, it seems clear we just need to make a <code class="language-text">pub.Q</code> that causes the ModInverse function to return <code class="language-text">nil</code>. The documentation for ModInverse describes its failure conditions:</p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">// ModInverse sets z to the multiplicative inverse of g in the ring ℤ/nℤ
// and returns z. If g and n are not relatively prime, g has no multiplicative
// inverse in the ring ℤ/nℤ. In this case, z is unchanged and the return value
// is nil.</code></pre></div>
<p>I didn’t take time to really comprehend what the documentation was explaining, instead thinking I needed to construct a seemingly valid pub.Q that was slightly invalid somehow. I dove in headfirst with an ignorant fuzzing phase. I started with a pub.Q from valid DSA key parameters and thought I could increment it by one until ModInverse failed. I made a small test case and let my laptop run for a minute trying higher values, but it did not work.</p>
<p>I paused and took time to read the code, to understand what <code class="language-text">ModInverse</code> is doing. The critical error condition path is this:</p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text"> d.GCD(&x, nil, g, n)
// if and only if d==1, g and n are relatively prime
if d.Cmp(intOne) != 0 {
return nil
}</code></pre></div>
<p><small class="text-muted"><a href="https://github.com/golang/go/blob/go1.13.1/src/math/big/int.go#L773-L778">src/math/big/int.go#L758-L788</a></small></p>
<p>We just need the Greatest Common Divisor (GCD) of the two numbers to not be the integer 1.</p>
<p>The other piece of information I realized at this point was that the r parameter in the <code class="language-text">dsa.Verify</code> function comes from the DSA signature. In many cases, an attacker is in a position to provide both the public key and the signature to verify. After staring at very large numbers like <code class="language-text">1289233352290115814210005730521570412018870172097</code> for a while, I decided to use the smallest numbers possible that could cause a GCD of more than one.</p>
<p>When you reduce the problem down to this, you can use the number two (2) for <code class="language-text">r</code> and four (4) for <code class="language-text">pub.Q</code>; since the greatest common divisor of these numbers is 2, the condition returns nil:</p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text"> r := new(big.Int).SetInt64(2)
q := new(big.Int).SetInt64(4)
d := new(big.Int).GCD(nil, nil, r, q)</code></pre></div>
<p><small class="text-muted"><a href="https://play.golang.org/p/fuVrBIUUzux">full example</a></small></p>
<h2>Back to DSA Verify</h2>
<p>Now that we have numbers that cause <code class="language-text">ModInverse</code> to return nil, we need to construct a test case that can cause <code class="language-text">dsa.Verify</code> to crash. But when I tried these numbers out, I saw that <code class="language-text">dsa.Verify</code> returned false instead of crashing. Going back to the unpatched function, we see this:</p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text"> if r.Sign() < 1 || r.Cmp(pub.Q) >= 0 {
return false
}
if s.Sign() < 1 || s.Cmp(pub.Q) >= 0 {
return false
}
w := new(big.Int).ModInverse(s, pub.Q)
n := pub.Q.BitLen()
if n&7 != 0 {
return false
}</code></pre></div>
<p><small class="text-muted"><a href="https://github.com/golang/go/blob/go1.13.1/src/crypto/dsa/dsa.go#L274-L286">src/crypto/dsa/dsa.go#L274-L286</a></small></p>
<p>There are three conditionals we must pass before crashing, in addition to <code class="language-text">ModInverse</code> returning <code class="language-text">nil</code>. The first two conditions are simple enough: we cannot use a negative r or s, and pub.Q must be greater than r and s. Our choices of 2 and 4 work fine. The last conditional is a little different. It checks how many bits it would take to represent pub.Q in binary, and it only passes when <code class="language-text">BitLen()</code> is a multiple of 8. With a value of 4, the <code class="language-text">BitLen()</code> is only 3. The smallest value with a <code class="language-text">BitLen()</code> of 8 is 128.</p>
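<p>To make that last check concrete, here is a quick look at the <code class="language-text">BitLen()</code> values involved (just a fragment, using only <code class="language-text">math/big</code>):</p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">	fmt.Println(big.NewInt(4).BitLen())   // 3: 3&7 != 0, so Verify returns false early
	fmt.Println(big.NewInt(128).BitLen()) // 8: 8&7 == 0, so the check passes</code></pre></div>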
<p>Setting <code class="language-text">s=2</code>, <code class="language-text">r=2</code>, and <code class="language-text">pub.Q=128</code>, we are able to crash <code class="language-text">dsa.Verify</code>:</p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text"> r.SetInt64(2)
s.SetInt64(2)
priv.PublicKey.Q.SetInt64(128)
dsa.Verify(&priv.PublicKey, hashed, r, s)</code></pre></div>
<p><small class="text-muted"><a href="https://github.com/pquerna/poc-dsa-verify-CVE-2019-17596/blob/master/dsa_test.go">complete dsa_test.go</a></small></p>
<h2>Exploiting dsa.Verify over SSH</h2>
<p>Making a local test case that crashes is trivial even when there isn’t a security vulnerability; what makes this crash interesting is whether we can trigger it over a network protocol. Many protocols can use DSA to verify the identity of the other peer. I wanted to demonstrate the vulnerability in a protocol that many people use, but in a proof of concept that is not directly weaponizable. Breaking SSH clients seemed like a good target, since it would require a man-in-the-middle position for most attackers, and in the worst case is just a client crash. I’m going to leave exploiting this vulnerability via TLS Client Certificates as an exercise for the reader…</p>
<p>In the SSH-2 protocol, there is a Key Exchange phase. One of the messages from the server to the client is signed with its “host key”, and as part of the protocol, the client must run the <code class="language-text">dsa.Verify</code> function on this signed data. For this exploit, all we need to do is inject our bad values for r, s, and pub.Q into the SSH key exchange.</p>
<p>The <a href="https://github.com/gliderlabs/ssh">gliderlabs/ssh</a> package makes it easy to construct a mock SSH server that we can use to try to crash an SSH client. On the server, the first step is to construct a crypto.Signer that returns our evil values:</p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text"> priv.PublicKey.Q.SetInt64(128)
fs := &fakeSigner{
R: new(big.Int).SetInt64(2),
S: new(big.Int).SetInt64(2),
public: priv.PublicKey,
}</code></pre></div>
<p><small class="text-muted"><a href="https://github.com/pquerna/poc-dsa-verify-CVE-2019-17596/blob/master/ssh_test.go#L19-L24">ssh_test.go#L19-L24</a></small></p>
<p>The <code class="language-text">crypto/ssh</code> package uses a different interface for its <code class="language-text">Signers</code>, but there is a helper function, <code class="language-text">ssh.NewSignerFromSigner</code>, to convert a crypto.Signer into the interface the ssh package needs. We then add the evil signer to the mock SSH server as a host key.</p>
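<p>The full <code class="language-text">fakeSigner</code> lives in the PoC repository; what follows is only a minimal sketch of what such a signer could look like. It assumes the ssh wrapper accepts an ASN.1-encoded (r, s) pair from a DSA crypto.Signer and re-encodes it into the SSH wire format:</p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">// Sketch only — the real implementation is in the PoC repo.
// (imports: crypto, crypto/dsa, encoding/asn1, io, math/big)
type fakeSigner struct {
	R, S   *big.Int
	public dsa.PublicKey
}

func (f *fakeSigner) Public() crypto.PublicKey { return &f.public }

func (f *fakeSigner) Sign(rand io.Reader, digest []byte, opts crypto.SignerOpts) ([]byte, error) {
	// Ignore the digest entirely and always return our evil r and s,
	// ASN.1 encoded as a DSA signature.
	return asn1.Marshal(struct{ R, S *big.Int }{f.R, f.S})
}</code></pre></div>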
<p>On the client, we just call ssh.Dial with a default configuration:</p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text"> conn, err := gossh.Dial("tcp", addr, clientConfig)
require.NoError(t, err)
defer conn.Close()</code></pre></div>
<p><small class="text-muted"><a href="https://github.com/pquerna/poc-dsa-verify-CVE-2019-17596/blob/master/ssh_test.go#L47-L51">ssh_test.go#L47-L51</a></small></p>
<p>Running this with Go 1.13.1, we get a crash:</p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x536c9c]
goroutine 6 [running]:
math/big.(*Int).Mul(0xc000043bb8, 0xc000043bd8, 0x0, 0xc000016460)
math/big/int.go:168 +0xdc
crypto/dsa.Verify(0xc00000e460, 0xc000016460, 0x14, 0x20, 0xc000043cc0, 0xc000043ca0, 0xc000120280)
crypto/dsa/dsa.go:289 +0x214
golang.org/x/crypto/ssh.(*dsaPublicKey).Verify(0xc00000e460, 0xc000016440, 0x20, 0x20, 0xc00011a2a0, 0x0, 0x0)
golang.org/x/crypto@v0.0.0-20191011191535-87dc89f01550/ssh/keys.go:474 +0x367
golang.org/x/crypto/ssh.verifyHostKeySignature(0x807f00, 0xc00000e460, 0xc00011e580, 0x807f00, 0xc00000e460)
golang.org/x/crypto@v0.0.0-20191011191535-87dc89f01550/ssh/client.go:124 +0xd9</code></pre></div>
<p>Another interesting part of this crash is that, because of how the SSH client library uses goroutines for processing, it is not possible to use the <code class="language-text">recover()</code> function to recover from the crash. <code class="language-text">ssh.Dial</code> creates a goroutine for the connection, and when this verify fails, the panic happens in a goroutine without a recover function, meaning the Go runtime has no choice but to exit the process. This design and use of goroutines in the <code class="language-text">ssh.Client</code> is not a good pattern, since callers of Dial are unable to recover from errors. <a href="https://github.com/golang/go/issues/34960">Issue #34960</a> describes how the effect on <code class="language-text">net/http.Server</code> is limited, because it internally <code class="language-text">recover()</code>s the panic in its connection handling goroutine.</p>
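<p>A tiny standalone program (my own illustration, unrelated to the ssh package) shows why: <code class="language-text">recover()</code> only catches panics raised in the goroutine where it was deferred:</p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">package main

import (
	"fmt"
	"time"
)

func main() {
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered:", r) // never runs
		}
	}()
	go func() {
		// A panic in another goroutine is invisible to the recover above;
		// the runtime prints the stack trace and exits the process.
		panic("nil pointer dereference elsewhere")
	}()
	time.Sleep(time.Second)
}</code></pre></div>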
<p>Test cases using a Docker container with an old and vulnerable version of Go are on GitHub at <a href="https://github.com/pquerna/poc-dsa-verify-CVE-2019-17596">pquerna/poc-dsa-verify-CVE-2019-17596</a>.</p>
<h2>Conclusion</h2>
<p>This type of vulnerability is a good example of one that could be identified using static analysis or fuzzing. Daniel Mandragona and regilero were credited with discovering and reporting the issue, but I have not seen any mention of how they found the bug. Even with static analysis, it would take some work to fully understand if the error conditions could actually be exploited, which leads to many static analysis issues being ignored.</p>
<p>Finally, as a general statement, DSA itself should not be used anymore. The only reason to enable it is to support a legacy system. OpenSSH for example removed support for DSA in recent releases. This class of vulnerability isn’t isolated to DSA, but every code path has potential vulnerabilities — so if you can disable DSA completely in your systems, you should.</p>]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Mission vs Strategy: Github and Open Source]]></title>
<id>https://paul.querna.org/articles/2018/06/04/mission-vs-strategy/</id>
<link href="https://paul.querna.org/articles/2018/06/04/mission-vs-strategy/">
</link>
<updated>2018-06-04T18:45:00Z</updated>
<summary type="html"><![CDATA[Today, Microsoft announced its 7.5 billion dollar acquisition of Github. I want to start with a congratulations to my friends at Github — I…]]></summary>
<content type="html"><![CDATA[<p>Today, <a href="https://news.microsoft.com/2018/06/04/microsoft-to-acquire-github-for-7-5-billion/">Microsoft announced its 7.5 billion dollar acquisition of Github</a>. I want to start with a congratulations to my friends at Github — I think this is a great outcome, and Microsoft is probably one of the best long term homes for Github.</p>
<p>I want to address some of the negativity about the acquisition. Parts of the open source community are worried that Microsoft is going to ruin Github. I think these concerns are misplaced at a tactics and strategy level. However, at a mission level, I think it is important to compare Github to an open source foundation like the Apache Software Foundation<sup><a href="#foundation-1">1</a></sup>.</p>
<p>I’ve been a long term user of Github and an advocate for using it for many years. I’m also a member of the Apache Software Foundation. These are two of the juggernauts of the modern open source movement. They are both trying to encourage contributions to existing projects, and they are both trying to get communities to live and grow on their platforms.</p>
<p>I think parts of the community are missing something important: The ASF and Github are alike in so many ways, but they have massively different missions.</p>
<h2>Different Missions</h2>
<p>Github was a venture backed, for-profit corporation. The tactics and strategy Github used to create returns were based on growing open source communities. This means that as an open source community, you had a short term synergistic relationship with Github. The Github product helped your community grow and be productive. This is great. Github also had other strategies that leveraged its open source popularity to create business services and an enterprise on-premise product.</p>
<p>Github’s mission, as a for-profit corporation, is to generate a financial return. VC backing, which also mandates the generation of a return, only reinforces this mission.</p>
<p>Apache is a non-profit 501(c)(3) foundation. Its tactics and strategy are to provide services and support for open source communities. Seemingly, not that different from a community point of view in the short term.</p>
<p>Apache’s mission however, is to provide software for the public good. <a href="https://www.apache.org/foundation/">It’s literally the first line on the ASF About Page</a>.</p>
<p>A mission is important. When people in a place change, so will the tactics and strategy.</p>
<h2>100 years later</h2>
<p>In 100 years, I hope the ASF is still a relevant way to provide software for the public good. I think there is a decent chance of this.</p>
<p>In 100 years, I don’t know if even the Microsoft brand will exist, let alone Github.</p>
<p>Github was never a replacement for the ASF, and at the same time, the <strong>ASF should learn from it</strong>. Github massively widened who contributes to open source. They made contributions easier. They innovated on what open source even means. They built an amazing product that I use every day.</p>
<p>The communities I’m part of will, I believe, outlive Github. Communities can benefit from these for-profit endeavors. Synergy between their needs and the tactics of a for-profit company is good for the community. But as a community we must understand that we are part of the product; there is a benefit to the company for helping.</p>
<p><strong>Github’s strategy included building an amazing product, but don’t confuse the missions.</strong></p>
<hr/>
<p><a name="foundation-1">[1]</a>: I used the ASF as the primary example in this post, but you can swap ASF for Node Foundation, Linux Foundation, Free Software Foundation, etc. They all have broadly similar missions around producing software and supporting communities.</p>]]></content>
</entry>
<entry>
<title type="html"><![CDATA[My next journey, ScaleFT]]></title>
<id>https://paul.querna.org/articles/2015/05/11/scaleft/</id>
<link href="https://paul.querna.org/articles/2015/05/11/scaleft/">
</link>
<updated>2015-05-11T21:30:00Z</updated>
<summary type="html"><![CDATA[I’m excited to announce that I’m co-founding a startup: ScaleFT Datacenter Knowledge: ScaleFT Wants To Help Ops Teams Tackle Complexity Of…]]></summary>
<content type="html"><![CDATA[<p>I’m excited to announce that I’m co-founding a startup: <a href="https://www.scaleft.com/">ScaleFT</a></p>
<ul>
<li>Datacenter Knowledge: <a href="http://www.datacenterknowledge.com/archives/2015/05/11/scaleft-wants-help-ops-teams-tackle-complexity-running-public-clouds/">ScaleFT Wants To Help Ops Teams Tackle Complexity Of Running On Public Clouds</a></li>
<li>Fortune: <a href="http://fortune.com/2015/05/11/rackspace-backed-startup-security/">Rackspace-backed startup seeks to boost security across clouds</a></li>
<li>TechCrunch: <a href="http://techcrunch.com/2015/05/11/scaleft-wants-to-make-managing-public-clouds-safer-raises-800k-seed-round-led-by-rackspace/">ScaleFT Wants To Make Managing Public Clouds Safer, Raises $800K Seed Round</a></li>
<li>VentureBeat: <a href="http://venturebeat.com/2015/05/11/scaleft-launches-with-800k-from-cloudkick-founders-and-rackspace/">ScaleFT launches with $800K from Cloudkick founders and Rackspace</a></li>
</ul>
<h2>Reflection</h2>
<p>In the last six and a half years, I’ve seen so much: from a <a href="http://geoff.greer.fm/photos/cloudkick_office/">rag tag startup</a>, to <a href="https://journal.paul.querna.org/articles/2010/12/16/rackspace-acquires-cloudkick/">being acquired</a>, to building <a href="https://journal.paul.querna.org/articles/2014/07/02/putting-teeth-in-our-public-cloud/">products at a publicly traded company</a>. I was also lucky enough to meet my wife Kristy through this process. I have no regrets about it. This was one of those good runs, a stretch of time during which I met many interesting people who will forever shape my life.</p>
<p>But the time has come: To branch out, to explore, to define something as my own — and that is why I’m excited, to create, to push, to learn, to be a founder.</p>
<p>I find it amusing that Cloudkick’s original mission was to make sysadmins’ lives better: Cloudkick started with the basics, like visualizing your servers and monitoring. Cloudkick was acquired before we got much further. I look at <a href="https://coreos.com/">CoreOS as a continuation</a>, iterating on what it means to be an operating system. ScaleFT has the same basic domain, with this tilt: how can we iterate on how a team of humans operates a software system?</p>
<p>For example, I often see actions taken in production reported after the fact — “Hey, I just changed X on the load balancer” in an email to the team. This is a common experience for operations teams. I think we can do better. I think we can make the actions a person takes in production reflected in many places, instantly, accurately, and in a way that augments teams to achieve their fullest potential.</p>
<p>I’ve personally lived in the space between operations and software development. I want to make this a better world, an efficient world, a safe world — so that is why I’m creating <a href="https://www.scaleft.com/">ScaleFT</a> — to push the boundaries, to create a company dedicated to this, to iterating on what production operations itself means.</p>]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Putting Teeth in Our Public Cloud]]></title>
<id>https://paul.querna.org/articles/2014/07/02/putting-teeth-in-our-public-cloud/</id>
<link href="https://paul.querna.org/articles/2014/07/02/putting-teeth-in-our-public-cloud/">
</link>
<updated>2014-07-02T18:18:18Z</updated>
<summary type="html"><![CDATA[Less than a year ago my team started building Rackspace OnMetal — the project was a whirlwind. Below I have outlined some of our major…]]></summary>
<content type="html"><![CDATA[<p>Less than a year ago my team started building <a href="http://www.rackspace.com/cloud/servers/onmetal/">Rackspace OnMetal</a> — the project was a whirlwind. Below I have outlined some of our major decisions as the product progressed, along with my own experiences. I am very thankful to the many different Rackers that took part in this project.</p>
<h1>Why would you build OnMetal?</h1>
<p>There was no master plan from 5 years ago to build OnMetal. A handful of advocates within Rackspace pushed to create the OnMetal product, because of our joint experiences in building and using infrastructure.</p>
<p>When Rackspace acquired <a href="http://readwrite.com/2010/12/15/another-giant-buys-another-sex">Cloudkick 3.5 years ago</a> we were running hundreds of cloud instances — we were spending huge amounts of engineering effort working around their flaws and unpredictability. The technical challenge was fun. We got to know the internals of <a href="http://paul.querna.org/slides/cassandra-summit-cloudkick.pdf">Apache Cassandra really well</a>. We would spend weeks rewriting systems, an eternity in startup-time, just because a cloud server with 8 gigabytes of RAM was falling over.</p>
<p>Once acquired, we escaped virtualization and entered the supposed nirvana of designing custom servers. We customized servers with an extra 32 gigabytes of RAM or a different hard disk model. Because of our different generations of data centers, we had to vary cabinet density based on power and cooling. We also had to build capacity models for our network switches and pick different models, but why was I doing this? I just want to build products. I do not want to worry about when we should be buying a larger piece of network equipment. I also definitely do not care how many kilowatts per cabinet I can put into one data-center but not the other.</p>
<h1>Starting a team</h1>
<p>By the summer of 2013 I was looking for a new project. I had spent the last 12 months as part of the Corporate Development team working mostly on partnerships and <a href="http://www.rackspace.com/blog/newsarticles/objectrocket/">acquisitions</a>. My role involved bringing technical expertise to bear on external products and judging how they could match our internal priorities. It was a fun experience and I enjoyed meeting startup founders and learning more about business, but I wanted to make products again.</p>
<p><a href="https://twitter.com/kontsevoy">Ev, one of the Mailgun founders</a>, had recently moved to San Antonio and was also looking for a new project. Ev and I both wanted to build an exciting and impactful product. We had both experienced building infrastructure products on top of virtualized clouds and colocation. We saw opportunities for improvement of multi-tenancy in a public cloud, and at the same time we could attack the complexities found with colocation. After a couple brainstorming sessions, we agreed about the basics of the idea: Deliver workload optimized servers without a hypervisor via an API. We called this project “Teeth”. Teeth is an aggressive word; we wanted our cloud to project a more aggressive point of view to the world. We also knew that the code name of Teeth was so ridiculous that no one from marketing would let us use it as the final name.</p>
<h1>Teeth is Born!</h1>
<p><img src="/assets/posts/putting-teeth-in-our-public-cloud/teeth-logo.png"></p>
<p><em>The Teeth team logo. <a href="https://twitter.com/fredland">@fredland</a> sketched it on our team whiteboard, we adopted it.</em></p>
<p>The Teeth project started as part of our Corporate Strategy group — a miniature startup — outside of our regular product development organization. This removed most of the organizational reporting and structure, and gave our day to day a more startup-like feeling. This let us get prototypes going very quickly, but definitely had trade-offs in other areas. We found that while we were building a control plane for servers, integration with other teams like Supply Chain or Datacenter operations was critical — but because we were not in the normal product organization we had to use new processes to work with these teams.</p>
<p>As we kicked off Teeth, it was just a team of two: Ev and myself. We had gotten signoff at the highest levels, but we were still just two people in a 5,500 person company. Getting hardware in a datacenter is easy at Rackspace, but it was clear that for our first hardware the project needed a more lab-like environment. I wanted to be able to crash a server’s <a href="http://searchnetworking.techtarget.com/definition/baseboard-management-controller">baseboard management controller (BMC)</a> and not have to file a ticket for someone to physically power cycle the server. I was working out of the Rackspace San Francisco office, and unlike our headquarters we didn’t have a real hardware lab with extra hardware lying around.</p>
<p>We put in a request for hardware through our standard channels. We were told a timeline measured in weeks. We waited a few weeks, but the priority of our order was put behind other projects. This was a legitimate reaction from internal groups: they had much larger projects on tighter timelines, more important than two engineers wanting to fool around in a lab. After conferring with Ev, we did what any reasonable startup would do: I went on to Newegg.com and bought 3 servers to be our first test kit. My only requirement for the servers was that they had working <a href="http://en.wikipedia.org/wiki/Intelligent_Platform_Management_Interface">BMCs with IPMI</a>, so I ordered 3 of the cheapest SuperMicro servers on Newegg. They arrived in the office 48 hours later.</p>
<h1>Using Open Compute</h1>
<p>Rackspace knew it does not create value from proprietary server designs; we create value from reducing the complexity of computing for our customers. However, before Teeth, we had tinkered, but hadn’t yet deployed a large scale system using OCP servers.</p>
<p>Rackspace has supported the <a href="http://www.opencompute.org/">Open Compute Project (OCP)</a> from the <a href="http://www.rackspace.com/blog/what-the-open-compute-project-means-to-me/">beginning of the project</a>. Our team is mostly <em>software-people</em>, and we love open source. We knew it was risky and could take longer to build Teeth on top of OCP, but we believed fundamentally that OCP is how servers should be built.</p>
<p>Once we picked the OCP platform, we received OCP servers into our lab environment and we iterated on the BIOS and Firmwares with our vendors. We required specific security enhancements and changes to how the BMC behaved for Teeth.</p>
<p>Using OCP has been a great experience. We were able to achieve a high density, acquire specific hardware configurations, customize firmwares and still have a low cost per server. As the OnMetal product continues to mature, I want our team to push back our learnings to the OCP community, especially around the BIOS, Firmware, and BMCs.</p>
<h1>The move to OpenStack</h1>
<p>OpenStack Nova has had a <a href="https://wiki.openstack.org/wiki/Baremetal">baremetal driver since the Grizzly release</a>, but the project has always documented the driver as experimental. It remained experimental because it had only a few developers and <a href="https://bugs.launchpad.net/nova/+bugs?field.tag=baremetal+">many bugs</a>. The Nova community also realized that a baremetal driver had a scope much larger than all of its other drivers. It had to manage physical hardware, BMCs, top of rack switches, and use many low level protocols to do this. This <a href="https://wiki.openstack.org/wiki/BaremetalSplitRationale">realization by the community</a> led to OpenStack Ironic being created as a standalone baremetal management project. The goal is to have a small driver in Nova, and the majority of the complexities can be handled in Ironic.</p>
<h2>Researching OpenStack Ironic</h2>
<p>When the team started building our Teeth prototype, OpenStack Ironic did not seem finished and we weren’t sure how quickly it would progress. Researching the project we found that the default PXE deployment driver was the main focus of development.</p>
<p><img src="/assets/posts/putting-teeth-in-our-public-cloud/ironic-sequence-pxe-deploy.png"></p>
<p>The Ironic PXE deployment method works by running a TFTP server on each Ironic Conductor and, as the node boots, serving out a custom configuration for each baremetal node. Once the baremetal node is booted into the deployment image, the deployment image exports its local disks via iSCSI back to the Ironic Conductor. The <a href="https://www.google.com/url?q=https%3A%2F%2Fblueprints.launchpad.net%2Fironic%2F%2Bspec%2Fpxe-mount-and-dd&sa=D&sntz=1&usg=AFQjCNFPbcjJ6BEeKRZa7qVltCnnhCiXEA">Ironic Conductor can then write out the requested image</a> using dd. Once this is complete, the TFTP configuration is rewritten to reference the user image, and then the baremetal node is rebooted.</p>
<p>As we researched the existing Ironic PXE deployment method we were unhappy for these reasons:</p>
<ul>
<li>Minimum of two power cycles to provision a node, increasing the time to provision a baremetal node significantly.</li>
<li>The Deployment Image had limited functionality, and we had difficulty extending the Ramdisk mode of <a href="https://github.com/openstack/diskimage-builder">diskimage-builder</a>. We wanted to accomplish tasks like flashing firmwares and using SATA secure erase.</li>
<li>The Ironic Conductors rewrite the TFTP configurations multiple times for different baremetal node states, increasing complexity and introducing systems that are harder to make highly available.</li>
<li>The Ironic Conductors being responsible for writing disk images over iSCSI presented a performance bottleneck.</li>
</ul>
<p>Because of these reasons, we started the Teeth prototype outside of OpenStack Ironic. We still wanted Nova integration, so we built our prototype as a Nova driver and a separate control plane, conceptually similar to Ironic’s architecture.</p>
<h2>Back into OpenStack</h2>
<p>By early 2014 we saw that the control plane we were building mirrored Ironic closely. We were solving the same problem, and we wanted our users to use the same Nova public API. Looking at what we had built and looking at Ironic again, we saw we only needed to change how Ironic deployments themselves worked. We decided to attend the Ironic mid-cycle meetup in February 2014. At the meetup our team explained how our Teeth prototype used an <a href="https://blueprints.launchpad.net/ironic/+spec/agent-driver">“Agent” based model</a>, where a long-running Agent running in a RAM disk can take commands from the control plane. This Agent based approach eventually was renamed Ironic Python Agent (IPA, yes, the team was excited to name their software after beer).</p>
<p>The Ironic Python Agent presents an HTTP API that the Ironic Conductor can interact with. For IPA, we decided early to build upon two architectural pillars:</p>
<ol>
<li>Use HTTP for as much as possible, bootstrapping out of old and low level protocols.</li>
<li>Do as much work before an instance is actually deployed, amortizing the costs of rebooting or flashing firmware to happen before a tenant ever asks for a baremetal node.</li>
</ol>
<p><img src="/assets/posts/putting-teeth-in-our-public-cloud/ironic-sequence-ipa-deploy.png"></p>
<p>With IPA the DHCP, PXE and TFTP configurations become static for all baremetal nodes, reducing complexity. Once running, the Agent sends a heartbeat to the Ironic Conductors with hardware information. Then the Conductors can order the Agent to take different actions. For example, in the case of provisioning an instance, the Conductor sends an HTTP POST to prepare_image with the URL for an image, and the Agent downloads and writes it to disk itself, keeping the Ironic Conductor out of the data plane for an image download. Once the image is written to disk, the Ironic Conductor simply reboots the baremetal node, and it boots from disk, removing a runtime dependency on a DHCP or TFTP server.</p>
<p>After the successful mid-cycle meetup and the welcoming attitude we saw, we decided to become an active participant with the community. We abandoned our proprietary prototype, and have been contributing to the <a href="https://wiki.openstack.org/wiki/Ironic-python-agent">Ironic Python Agent</a> deployment method and the Ironic control plane inside the OpenStack community.</p>
<h1>Integrating back into Rackspace</h1>
<p>As our small team progressed in developing Teeth, we began to see a need to integrate into existing Rackspace products and organizational processes. For example, we wanted the OnMetal flavors to show up in the standard Nova API, alongside all of our other flavors. To implement this, we needed our Ironic system to be integrated with Nova. We did this by creating a <a href="http://developer.rackspace.com/blog/how-we-run-ironic-and-you-can-too.html">new Nova cell just for the OnMetal flavor types</a>. The top level cell only needs basic information about our instances, and then the nova-compute instances in our cell load the Ironic virt driver where all the hard work happens.</p>
<p>As we integrated software systems, our startup behaviors and structures were less valuable. We needed to reduce confusion and tension with the rest of the company. Once we moved to an integration mode, we moved the engineering team back into our normal product development organization. The teams quickly started working together closely and we hit our execution targets. In some ways it was like a mini-startup being acquired and quickly integrating into a larger company.</p>
<h1>Announcing OnMetal</h1>
<p>We wanted to announce Teeth to the public this summer. We considered the <a href="https://www.openstack.org/summit/openstack-summit-atlanta-2014/">OpenStack Summit in Atlanta</a> — we believe the combination of OpenStack software with Open Compute hardware is a great message for the community. But instead of announcing a product, we preferred to focus our discussion with the community at the <a href="http://www.rackspace.com/blog/just-rebels-or-a-rebel-alliance-openstack-summit-keynote-video/">OpenStack Summit on the Rebel Alliance vision</a>.</p>
<p>The <a href="http://events.gigaom.com/structure-2014/">Structure Conference</a> presented a great opportunity to show our message. Our message is that platforms should be open. That offerings should be specialized for their workloads. <a href="http://www.rackspace.com/blog/onmetal-the-right-way-to-scale/">That using Containers and OnMetal are another way we can reduce complexity from running large applications</a>. That we are not stuck on a virtualization only path. That our customers find value from reducing complexity and having a best fit infrastructure.</p>
<p>After working on the Teeth project it felt great to see our message has been well received by both <a href="http://www.techrepublic.com/article/rackspaces-onmetal-service-eliminates-noisy-neighbor-problems/">the press</a> and <a href="https://twitter.com/eastdakota/status/479803835884396544">Twitter commentary</a>. Interest in the OnMetal product offering has been overwhelming, and now our team is focusing on fixing as many bugs as possible, onlining more cabinets of OCP servers for capacity, and preparing for the general availability phase of the product.</p>
<p><em>Thanks to <a href="https://twitter.com/ahaislip">Alexander Haislip</a>, <a href="https://twitter.com/cioj">Robert Chiniquy</a> and <a href="https://twitter.com/comstud">Chris Behrens</a> for reviewing drafts of this post.</em></p>]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Announcing Rackspace OnMetal]]></title>
<id>https://paul.querna.org/articles/2014/06/20/rackspace-onmetal-launched/</id>
<link href="https://paul.querna.org/articles/2014/06/20/rackspace-onmetal-launched/">
</link>
<updated>2014-06-20T20:20:20Z</updated>
<summary type="html"><![CDATA[My secret project has been announced: Rackspace OnMetal Cloud Servers. What is OnMetal? OnMetal is bare metal servers via the OpenStack Nova…]]></summary>
<content type="html"><![CDATA[<p><img src="/assets/posts/onmetal-launched/onmetal.png"></p>
<p>My secret project has been announced: <a href="http://www.rackspace.com/cloud/servers/onmetal/">Rackspace OnMetal Cloud Servers</a>.</p>
<h1>What is OnMetal?</h1>
<p>OnMetal is bare metal servers via the OpenStack Nova API. These servers contain no hypervisor or other abstraction when your operating system is running on them — they are bare metal, and available over an API. They have utility billing and other attributes of cloud servers, but are single tenant up to the top of rack switch. The underlying product is built on top of <a href="https://wiki.openstack.org/wiki/Ironic">OpenStack Ironic</a> and <a href="http://www.opencompute.org/">Open Compute Servers</a>.</p>
<p>I have spent the last decade building software deployed into combinations of colocation and cloud infrastructures. They all sucked in their own special ways. This product is about taking the dynamic advantages of cloud and combining it with the performance and economics of colocation.</p>
<h1>I want to learn more!</h1>
<ul>
<li><a href="http://www.rackspace.com/blog/onmetal-the-right-way-to-scale/">OnMetal: The Right Way To Scale</a></li>
<li><a href="http://developer.rackspace.com/blog/how-we-run-ironic-and-you-can-too.html">How OnMetal uses OpenStack Ironic</a></li>
<li><a href="http://www.rackspace.com/onmetal">Rackspace OnMetal product website</a>.</li>
</ul>
<h1>What are the OnMetal instance specifications?</h1>
<p>Each instance is specialized for a specific task:</p>
<br/>
<table border="1">
<tr style="background-color: #CCCCCC">
<th style="width: 200px">Instance Type</th>
<th>CPU</th>
<th>RAM</th>
<th>IO</th>
</tr>
<tr>
<td>OnMetal IO v1</td>
<td>2x 2.8Ghz 10 core <a href="http://ark.intel.com/products/75277">E5-2680v2 Xeon</a></td>
<td>128GB</td>
<td>2x <strong><a href="http://www.lsi.com/products/flash-accelerators/pages/nytro-warpdrive-blp4-1600.aspx">LSI Nytro WarpDrive BLP4-1600(1.6TB)</a></strong> and Boot device (32G SATADOM).</td>
</tr>
<tr>
<td>OnMetal Memory v1</td>
<td>2x 2.6Ghz 6 core <a href="http://ark.intel.com/products/75790/">E5-2630v2 Xeon</a></td>
<td><strong>512GB</strong></td>
<td>Boot device only. (32G SATADOM)</td>
</tr>
<tr>
<td>OnMetal Compute v1</td>
<td>1x 2.8Ghz 10 core <a href="http://ark.intel.com/products/75277">E5-2680v2 Xeon</a></td>
<td>32GB</td>
<td>Boot device only. (32G SATADOM)</td>
</tr>
</table>
<br/>
<p>While I am a believer in the eventual winning of <a href="http://mesos.apache.org/">Mesos-like scheduling systems</a>, the reality of today is that developers want extreme mixes of server profiles. OnMetal provides this with an IO instance with 3.2TB of Flash Storage, a Memory instance with 512GB of RAM, and an economical compute instance with 10 fast cores and lots of network.</p>
<p>Additionally each instance has dual 10 gigabit network connections in a <a href="http://en.wikipedia.org/wiki/Link_aggregation">high availability MLAG</a>.</p>
<h1>When can I get it?</h1>
<p>OnMetal is currently in an “early access” program. General Availability is expected by the end of July 2014. </p>
<h1>How do I use it?</h1>
<p>The instances are just another flavor type in the Rackspace Public Cloud API — you just pass in <code class="language-text">onmetal-io-v1</code> instead of <code class="language-text">performance2-120</code> as the flavor type, and it shows up, just like a virtualized cloud server would.</p>
<h1>How much do these instances cost?</h1>
<p>We are not releasing pricing yet. Soon™.</p>]]></content>
</entry>
<entry>
<title type="html"><![CDATA[SSH Proxy Commands that use `sed`]]></title>
<id>https://paul.querna.org/articles/2014/06/09/ssh-proxy-using-sed/</id>
<link href="https://paul.querna.org/articles/2014/06/09/ssh-proxy-using-sed/">
</link>
<updated>2014-06-09T17:17:17Z</updated>
<summary type="html"><![CDATA[If you find yourself using a Bastion or Jump Server very often, you quickly become familiar with man ssh_config. One trick I’ve recently…]]></summary>
<content type="html"><![CDATA[<p>If you find yourself using a <a href="http://en.wikipedia.org/wiki/Bastion_host">Bastion</a> or <a href="http://en.wikipedia.org/wiki/Jump_Server">Jump Server</a> very often, you quickly become familiar with <a href="http://www.openbsd.org/cgi-bin/man.cgi?query=ssh_config&sektion=5">man ssh_config</a>.</p>
<p>One trick I’ve recently figured out is using <code class="language-text">sed</code> with a <code class="language-text">ProxyCommand</code> — this lets me optionally use a bastion host by just appending <code class="language-text">.bast</code> to a hostname. Most examples of using <code class="language-text">ProxyCommand</code> apply it to all hosts, or a specific sub-domain, but this configuration allows you to <strong>late</strong> decide if you want to use the bastion or not.</p>
<p>Examples:</p>
<div class="gatsby-highlight" data-language="bash"><pre class="language-bash"><code class="language-bash"><span class="token comment"># uses bastion:</span>
<span class="token function">ssh</span> myserver.example.com.bast
<span class="token comment"># goes directly to myserver:</span>
<span class="token function">ssh</span> myserver.example.com</code></pre></div>
<p>Place the following in your <code class="language-text">.ssh/config</code>, with the appropriate changes for your environment:</p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">Host bastion
Hostname bastion-server.example.com
ProxyCommand none
User paul.examplesurname
ControlMaster auto
ControlPath ~/.ssh/master-%r@%h:%p
Host *.bast
ProxyCommand ssh -aY bastion 'nc -w 900 `echo %h | sed s/\\.bast$//` %p'
ForwardAgent yes
TCPKeepAlive yes
ServerAliveInterval 300</code></pre></div>
<p>Any hostname that ends in <code class="language-text">.bast</code> will now use the bastion as its proxy, but on the bastion it will resolve the DNS without the <code class="language-text">.bast</code> in the hostname. Additionally because the bastion host has <a href="http://en.wikibooks.org/wiki/OpenSSH/Cookbook/Multiplexing">SSH Multiplexing configured</a>, after the first connection to the bastion, all others are very quick to become established.</p>]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Rackspace Interview Process]]></title>
<id>https://paul.querna.org/articles/2014/04/27/rackspce-interview-process/</id>
<link href="https://paul.querna.org/articles/2014/04/27/rackspce-interview-process/">
</link>
<updated>2014-04-27T19:30:16Z</updated>
<summary type="html"><![CDATA[I believe we have built a reasonable and consistent set of best practices for on site interviews at the Rackspace San Francisco office. We…]]></summary>
<content type="html"><![CDATA[<p>I believe we have built a reasonable and consistent set of best practices for on site interviews at the Rackspace San Francisco office. We learned the initial structure from the Rackspace office in Blacksburg. One of the aspects I like about our process is that it can work for both technical and non-technical roles. We believe it isn’t perfect and try to iterate on improvements. I would love to hear from other companies about their interview process and ideas for how to improve ours. </p>
<p>I am only describing the on site interview structure in this post, and not all aspects of the hiring process.</p>
<p><em>Note: We are always iterating on our interviews. We also do not have a top down approach for the whole company. If you interview at other Rackspace locations or in the future, don’t fret if the process is different.</em></p>
<h1>Objectives</h1>
<p>In just a few hours, interviews attempt to ascertain a candidate’s aptitude, fit, knowledge, potential, and more. This is difficult. Human interactions are hard to measure, and the pressure of an interview does not lend itself to consistent results. In our interviews we try to achieve the following goals:</p>
<ul>
<li><strong>Consistency</strong>: We want to find potential Rackers on the first pass. A bad or inconsistent interview panel wastes both the time of a candidate and our teams.</li>
<li><strong>Targeted</strong>: We want to make sure we are accomplishing specific goals in each part of the interview. Interviews are always too short, but if all interviewers ask the same question, we will not have a well rounded understanding of the candidate.</li>
<li><strong>Culture, not just code</strong>: Rackspace is well known for its distinctive culture. We want to expose candidates to this early and often.</li>
<li><strong>Flexible for multiple roles</strong>: I believe interviewing a Product Manager under the same general structure as a Software Developer leads to teams being more effective in their execution of the interview.</li>
<li><strong>Highlight the importance of interviews</strong>: Through structure we force our teams to dedicate time and focus to interviews. Interviews take significant time from busy people, but we do not want to reduce the quality of the interviews or produce inconsistent results. </li>
<li><strong>Trained</strong>: We have an internal class called “H.I.R.E” which covers this hiring process, and educates potential interviewers.</li>
</ul>
<h1>The “Prehuddle”</h1>
<p>The day before an on-site interview we schedule a 15-30 minute meeting with all of our interviewers. The hiring manager drives the meeting and supplies the interviewers with an interview guide that changes for each position. <a href="/assets/posts/interview-process/interview-guide-example.pdf">Here is a recent example of an interview guide used by Mailgun (PDF)</a>. The objective is to outline in writing the position we are hiring for, how they will fit into the team, and what each interview panel is trying to achieve. The Prehuddle gives time for interviewers to ask questions of the hiring manager so that they can come prepared to their interview panels. It also gives interviewers time to coordinate on who is covering each topic. We want to avoid situations where one interview panel assumes another panel will ask certain questions.</p>
<h1>On Site Interview Day</h1>
<p>On the day of the interview we begin by giving the candidate a short tour of the office, ending in the conference room that will be used for the interview. On a white board in the conference room we have the schedule for the rest of the day written up. We try to keep the candidate in the same conference room for the whole day, to avoid losing 10 minutes of every panel to moving or finding the candidate.</p>
<p>An example schedule for a Software Development position:</p>
<ul>
<li>10:00 to 11:00: Background and context</li>
<li>11:00 to 12:00: Computer Science and Algorithms</li>
<li>12:00 to 1:00: Lunch</li>
<li>1:00 to 2:00: Distributed Systems</li>
<li>2:00 to 3:00: Linux Systems</li>
</ul>
<p>The interview consists of 4-5 panels of 2 interviewers. Each panel is generally 1 hour long. We consider it a best practice to also schedule an informal lunch with 2 more Rackers in-between the panels, but depending on the times of the panels this doesn’t always happen. After the panel interviews, we want the hiring manager to have a few final words with the candidate and to escort the candidate out of the office.</p>
<h2>Interview Panel Selection</h2>
<p>When selecting interviewers for a panel, we consider the following:</p>
<ul>
<li><strong>Cross-Team:</strong> We want to have at least 2 interviewers who are on different teams than the hiring team. This encourages cross pollination of best practices and gives teams more context on the quality of their candidate.</li>
<li><strong>Experience levels and time at company:</strong> We try to mix new hires in with long time Rackers to encourage learning by new hires.</li>
<li><strong>Always someone in the room</strong>: We want to avoid on site interviews over video conference units. Because we are a remote office, some teams and roles will interact cross-location, or interviewers are traveling and a VC interview is unavoidable. In these situations we pair the remote interviewer with a local interview partner.</li>
</ul>
<h2>Interview Questions</h2>
<ul>
<li>We encourage our interviewers to use what we call the “SARA” model when interviewing candidates. SARA stands for Situations, Actions, Results, and Application. SARA is similar to a <a href="http://en.wikipedia.org/wiki/Situation,_Task,_Action,_Result">STAR</a> or <a href="http://en.wikipedia.org/wiki/SOARA">SOARA</a> interview technique.</li>
<li>In the past we have tried to build a repository of questions or problems for interviewers to share with each other, but this has generally fallen out of our active practices. We have also tried code reviews, pair programming and other structures but have not been happy with the execution.</li>
<li>We have discouraged <a href="http://www.businessinsider.com/15-google-interview-questions-that-will-make-you-feel-stupid-2009-11?op=1">“Google Style” questions</a>, but don’t provide enough training or guidance on what the alternative is. This is an area I believe we must iterate and improve upon.</li>
</ul>
<h1>Feedback Session</h1>
<p>We try to schedule a feedback session immediately after the candidate has left. Waiting until the next day will dull memories. We assemble all of the interviewers, and the hiring manager drives the meeting. We have the interviewers recall their interview in reverse seniority, with the hiring manager going last. Each interviewer has the floor for 2-3 minutes. Clarifying questions can be asked by other interviewers. After all of the other interviewers, the hiring manager shares their thoughts and then asks for any final conversation. Once this is done, we conduct an anonymous vote:</p>
<ul>
<li>+2 Hard Yes</li>
<li>+1 Soft Yes</li>
<li>-2 Soft No</li>
<li>-4 Hard No</li>
</ul>
<p>If the total is positive the hiring manager has the <em>option</em> to continue with the candidate. The hiring manager may still do other things like referral checking. If the total is zero or negative we will not hire the candidate.</p>
<h1>Future Iteration Ideas</h1>
<p>Here are some ideas I have been thinking about for continuing to iterate on our interviews:</p>
<ul>
<li><strong>Collaborative Coding</strong>: We currently use a white board for most of the coding exercises. I am not satisfied with this because it penalizes different ways of thinking and communicating. I would like to experiment with using a tool like <a href="https://floobits.com/">Floobits</a>. Floobits is more interesting than straight pair programming, as it could enable candidates to use their native editor. </li>
<li><strong>Mixed length panels</strong>: I would like to experiment with making some panels 1 hour 30 minutes long, and others only 30 minutes. I believe this would allow stronger and deeper opinions about a candidate in these longer sessions, and the shorter panels would focus on smaller topics and meeting potential team members.</li>
</ul>]]></content>
</entry>
<entry>
<title type="html"><![CDATA[ffjson: faster JSON serialization for Golang]]></title>
<id>https://paul.querna.org/articles/2014/03/31/ffjson-faster-json-in-go/</id>
<link href="https://paul.querna.org/articles/2014/03/31/ffjson-faster-json-in-go/">
</link>
<updated>2014-03-31T14:50:14Z</updated>
<summary type="html"><![CDATA[ffjson is a project I have been hacking on for making JSON serialization faster in the Go programming language. ffjson works by generating static…]]></summary>
<content type="html"><![CDATA[<p><a href="https://github.com/pquerna/ffjson">ffjson</a> is a project I have been hacking on for making JSON serialization faster in the Go programming language. <code class="language-text">ffjson</code> works by generating static code for Go’s JSON serialization interfaces. Fast binary serialization frameworks like <a href="http://kentonv.github.io/capnproto/">Cap’n Proto</a> or <a href="https://code.google.com/p/gogoprotobuf/">Protobufs</a> also use this approach of generating code. Because <code class="language-text">ffjson</code> is serializing to JSON, it will never be as fast as some of these other tools, but it can beat the builtin <a href="http://golang.org/pkg/encoding/json/">encoding/json</a> easily.</p>
<h1>Benchmarks</h1>
<h2>goser</h2>
<p>The first example benchmark is a <a href="https://github.com/pquerna/ffjson/blob/master/tests/goser/ff/goser.go#L18">Log structure</a> that <a href="http://www.cloudflare.com/">CloudFlare</a> uses. CloudFlare open sourced these benchmarks under the <a href="https://github.com/cloudflare/goser">cloudflare/goser</a> repository, which benchmarks several different serialization frameworks.</p>
<p><img src="/assets/posts/ffjson/goser.png"></p>
<p>Under this benchmark <code class="language-text">ffjson</code> is <strong>1.91x faster</strong> than <code class="language-text">encoding/json</code>.</p>
<h2>go.stripe</h2>
<p><a href="https://github.com/drone/go.stripe">go.stripe</a> contains a complicated <a href="https://github.com/pquerna/ffjson/blob/master/tests/go.stripe/ff/customer.go#L7">structure for its Customer object</a> which contains many sub-structures.</p>
<p><img src="/assets/posts/ffjson/gostripe.png"></p>
<p>For this benchmark <code class="language-text">ffjson</code> is <strong>2.11x faster</strong> than <code class="language-text">encoding/json</code>.</p>
<h1>Try it out</h1>
<p>If you have a Go source file named <code class="language-text">myfile.go</code>, and your <code class="language-text">$GOPATH</code> environment variable is set to a reasonable value, trying out <code class="language-text">ffjson</code> is easy:</p>
<div class="gatsby-highlight" data-language="sh"><pre class="language-sh"><code class="language-sh">go get -u github.com/pquerna/ffjson
ffjson myfile.go</code></pre></div>
<p><code class="language-text">ffjson</code> will generate a <code class="language-text">myfile_ffjson.go</code> file which contains implementations of <code class="language-text">MarshalJSON</code> for any structures found in <code class="language-text">myfile.go</code>.</p>
<h1>Background: Serialization @ GoSF</h1>
<p>At the last <a href="http://www.meetup.com/golangsf/">GoSF</a> meetup, <a href="https://twitter.com/fullung">Albert Strasheim</a> from CloudFlare gave a presentation on <a href="http://www.slideshare.net/albertstrasheim/serialization-in-go">Serialization in Go</a>. The presentation was great — it showed how efficient binary serialization can be in Go. But what made me unhappy was how <strong>slow</strong> JSON was:</p>
<p><img src="/assets/posts/ffjson/goser-benchmarks.png"></p>
<h2>Why is JSON so slow in Go?</h2>
<p>All of the competing serialization tools generate static code to handle data. On the other hand, Go’s <code class="language-text">encoding/json</code> uses <a href="http://golang.org/pkg/reflect/">runtime reflection</a> to iterate over the members of a <code class="language-text">struct</code> and detect their types. The binary serializers generate static code for the exact type of each field, which is much faster. In CPU profiling of <code class="language-text">encoding/json</code> it is easy to see that significant time is spent in reflection.</p>
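<p>To make the difference concrete, here is a toy comparison (illustrative only, not the internals of <code class="language-text">encoding/json</code> or of ffjson’s generated output) of what the two approaches do for a hypothetical one-field struct:</p>
<div class="gatsby-highlight" data-language="go"><pre class="language-go"><code class="language-go">package main

import (
	"fmt"
	"reflect"
	"strconv"
)

// Foo is a hypothetical one-field struct used only to contrast the approaches.
type Foo struct{ Bar string }

// viaReflection inspects the struct with package reflect on every call,
// roughly the shape of work a reflection-based encoder performs.
func viaReflection(v interface{}) string {
	rv := reflect.ValueOf(v)
	rt := rv.Type()
	out := "{"
	for i := 0; i &lt; rt.NumField(); i++ {
		if i > 0 {
			out += ","
		}
		out += strconv.Quote(rt.Field(i).Name) + ":" + strconv.Quote(rv.Field(i).String())
	}
	return out + "}"
}

// viaStatic is the shape of generated code: it touches the field directly,
// with no type inspection at runtime.
func viaStatic(f Foo) string {
	return `{"Bar":` + strconv.Quote(f.Bar) + `}`
}

func main() {
	f := Foo{Bar: "hello"}
	fmt.Println(viaReflection(f)) // {"Bar":"hello"}
	fmt.Println(viaStatic(f))     // {"Bar":"hello"}
}</code></pre></div>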
<p>The reflection based approach taken by <code class="language-text">encoding/json</code> is great for fast development iteration. However, I often find myself building programs that serialize millions of objects with the same structure type. For these kinds of cases, accepting a more brittle code generation approach is worth the 2x or more speedup. The downside is that when using a code generation based serializer, if your structure changes, you need to regenerate the code.</p>
<p>Last week we had a hack day at work, and I decided to take a stab at making my own code generator for JSON serialization. I am not the first person to look into this approach for Go. <a href="https://twitter.com/benbjohnson">Ben Johnson</a> created <a href="https://github.com/benbjohnson/megajson">megajson</a> several months ago, but it has limited type support and doesn’t implement the existing <code class="language-text">MarshalJSON</code> interface.</p>
<h1>Background: Leveraging existing interfaces</h1>
<p>Go’s <code class="language-text">encoding/json</code> defines an interface; if a type implements it, that implementation is used to serialize the type to JSON:</p>
<div class="gatsby-highlight" data-language="go"><pre class="language-go"><code class="language-go"><span class="token keyword">type</span> Marshaler <span class="token keyword">interface</span> <span class="token punctuation">{</span>
<span class="token function">MarshalJSON</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">(</span><span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token builtin">byte</span><span class="token punctuation">,</span> <span class="token builtin">error</span><span class="token punctuation">)</span>
<span class="token punctuation">}</span>
<span class="token keyword">type</span> Unmarshaler <span class="token keyword">interface</span> <span class="token punctuation">{</span>
<span class="token function">UnmarshalJSON</span><span class="token punctuation">(</span><span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token builtin">byte</span><span class="token punctuation">)</span> <span class="token builtin">error</span>
<span class="token punctuation">}</span></code></pre></div>
<p>As a goal for <code class="language-text">ffjson</code> I wanted users to get improved performance without having to change any other parts of their code. The easiest way to do this is by adding a <code class="language-text">MarshalJSON</code> method to a structure, and then <code class="language-text">encoding/json</code> would be able to find it via reflection.</p>
<h2>Example</h2>
<p>The simplest example of implementing <code class="language-text">Marshaler</code> would be something like the following, given a type <code class="language-text">Foo</code> with a single member:</p>
<div class="gatsby-highlight" data-language="go"><pre class="language-go"><code class="language-go"><span class="token keyword">type</span> Foo <span class="token keyword">struct</span> <span class="token punctuation">{</span>
Bar <span class="token builtin">string</span>
<span class="token punctuation">}</span></code></pre></div>
<p>You could have a <code class="language-text">MarshalJSON</code> like the following:</p>
<div class="gatsby-highlight" data-language="go"><pre class="language-go"><code class="language-go"><span class="token keyword">func</span> <span class="token punctuation">(</span>f <span class="token operator">*</span>Foo<span class="token punctuation">)</span> <span class="token function">MarshalJSON</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">(</span><span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token builtin">byte</span><span class="token punctuation">,</span> <span class="token builtin">error</span><span class="token punctuation">)</span> <span class="token punctuation">{</span>
<span class="token keyword">return</span> <span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token function">byte</span><span class="token punctuation">(</span><span class="token string">`{"Bar":`</span> <span class="token operator">+</span> f<span class="token punctuation">.</span>Bar <span class="token operator">+</span> <span class="token string">`}`</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token boolean">nil</span>
<span class="token punctuation">}</span></code></pre></div>
<p>This example has many potential bugs, like <code class="language-text">.Bar</code> not being escaped properly, but it would automatically be used by <code class="language-text">encoding/json</code>, and avoids many reflection calls.</p>
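<p>One way to address the quoting and escaping problem by hand, still without any code generation, is to let <code class="language-text">encoding/json</code> encode the single string field while writing the rest of the object yourself. This is only a sketch using the same hypothetical <code class="language-text">Foo</code> type:</p>
<div class="gatsby-highlight" data-language="go"><pre class="language-go"><code class="language-go">package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

type Foo struct {
	Bar string
}

// MarshalJSON writes the object framing by hand, but lets encoding/json
// handle quoting and escaping of the string field.
func (f *Foo) MarshalJSON() ([]byte, error) {
	var buf bytes.Buffer
	buf.WriteString(`{"Bar":`)
	b, err := json.Marshal(f.Bar)
	if err != nil {
		return nil, err
	}
	buf.Write(b)
	buf.WriteByte('}')
	return buf.Bytes(), nil
}

func main() {
	out, _ := json.Marshal(&amp;Foo{Bar: `say "hi"`})
	fmt.Println(string(out)) // {"Bar":"say \"hi\""}
}</code></pre></div>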
<h1>First Attempt: Using go/ast</h1>
<p>During our hack day I started by using the <a href="http://golang.org/pkg/go/ast/">go/ast</a> module as a way to extract information about structures. This allowed rapid progress, and at my demo for the hack day I had a working prototype. This version was about 25% faster than <code class="language-text">encoding/json</code>. However, I quickly found that the AST interface was too limiting. For example, a type is just represented as a simple string in the AST module. Determining if that type implements a specific interface is not easily possible. Because types are just strings to the AST module, complex types like <code class="language-text">map[string]CustomerType</code> were up to me to parse by hand.</p>
<p>The day after the hack day I was frustrated with the situation. I started thinking about alternatives. Runtime reflection has many advantages. One of the most important is how easily you can tell what a type implements, and make code generation decisions based on it. In other languages you can do code generation at runtime, and then load that code into a virtual machine. Because Go is statically compiled, this isn’t possible. In C++ you could use templates for many of these types of problems too, but Go doesn’t have an equivalent. I needed a way to do runtime reflection, but at compile time.</p>
<p>Then I had an idea. Inception: <strong>Generate code to generate more code.</strong></p>
<h1>Inception</h1>
<p>I wanted to keep the simple user experience of just invoking <code class="language-text">ffjson</code>, and still generate static code, but somehow use reflection to generate that code. After much rambling in IRC, I conjured up this workflow:</p>
<p><img src="/assets/posts/ffjson/flow.png"></p>
<ol>
<li>User executes <code class="language-text">ffjson</code></li>
<li><code class="language-text">ffjson</code> parses input file using <code class="language-text">go/ast</code>. This decodes the package name and structures in the file.</li>
<li><code class="language-text">ffjson</code> generates a temporary <code class="language-text">inception.go</code> file which imports the package and structures previously parsed.</li>
<li><code class="language-text">ffjson</code> executes <code class="language-text">go run</code> with the temporary <code class="language-text">inception.go</code> file.</li>
<li><code class="language-text">inception.go</code> uses runtime reflection to decode the users structures.</li>
<li><code class="language-text">inception.go</code> generates the final static code for the structures.</li>
</ol>
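<p>Steps 5 and 6 are where the interesting work happens. The sketch below is illustrative only, not ffjson’s actual generator: a throwaway program inspects a struct with <code class="language-text">reflect</code>, which is exactly the information a generator needs to decide what static marshalling code to emit for each field.</p>
<div class="gatsby-highlight" data-language="go"><pre class="language-go"><code class="language-go">package main

import (
	"fmt"
	"reflect"
)

// Foo stands in for a user structure that the temporary inception.go would
// import from the parsed package.
type Foo struct {
	Bar string
	N   int64
}

func main() {
	t := reflect.TypeOf(Foo{})
	fmt.Printf("// would generate MarshalJSON for %s\n", t.Name())
	for i := 0; i &lt; t.NumField(); i++ {
		f := t.Field(i)
		// A real generator would switch on f.Type.Kind() here and emit
		// the fastest encoding path for the field; we only report it.
		fmt.Printf("//   field %s: kind %s\n", f.Name, f.Type.Kind())
	}
}</code></pre></div>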
<h1>It worked!</h1>
<p>The inception approach worked well. The more powerful <code class="language-text">reflect</code> module allowed deep introspection of types, and it was much easier to add support for things like Maps, Arrays and Slices.</p>
<h1>Performance Improvements</h1>
<p>After figuring out the inception approach, I spent some time looking for quick performance gains with <a href="http://blog.golang.org/profiling-go-programs">the profiler</a>.</p>
<h2>Alternative Marshal interface</h2>
<p>I observed poor performance on JSON structs that contained other structures. I found this was because the <code class="language-text">MarshalJSON</code> interface returns a <code class="language-text">[]byte</code>, which the caller generally appends to its own <code class="language-text">bytes.Buffer</code>. I created a new interface that allows structures to append to a <code class="language-text">bytes.Buffer</code>, avoiding many temporary allocations:</p>
<div class="gatsby-highlight" data-language="go"><pre class="language-go"><code class="language-go"><span class="token keyword">type</span> MarshalerBuf <span class="token keyword">interface</span> <span class="token punctuation">{</span>
<span class="token function">MarshalJSONBuf</span><span class="token punctuation">(</span>buf <span class="token operator">*</span>bytes<span class="token punctuation">.</span>Buffer<span class="token punctuation">)</span> <span class="token builtin">error</span>
<span class="token punctuation">}</span></code></pre></div>
<p>This landed in <a href="https://github.com/pquerna/ffjson/pull/3">PR#3</a>, and increased performance by 18% for the <code class="language-text">goser</code> structure. <code class="language-text">ffjson</code> will use this interface on structures if it is available, if not it can still fall back to the standard interface.</p>
<h2>FormatBits that works with bytes.Buffer</h2>
<p>When converting an integer into a string, the <code class="language-text">strconv</code> module has functions like <a href="http://golang.org/pkg/strconv/#AppendInt">AppendInt</a>. These functions require a temporary <code class="language-text">[]byte</code> or a string allocation. By creating a <code class="language-text">FormatBits</code> function that can convert integers and append them directly into a <code class="language-text">*bytes.Buffer</code>, these allocations can be reduced or removed.</p>
<p>This landed in <a href="https://github.com/pquerna/ffjson/pull/5">PR#5</a>, and gave a 21% performance improvement for the <code class="language-text">goser</code> structure.</p>
<h1>What’s next for ffjson</h1>
<p>I welcome feedback from the community about what they would like to see in <code class="language-text">ffjson</code>. What exists now is usable, but I know there are a few more key items to make <code class="language-text">ffjson</code> great:</p>
<ol>
<li><strong>Embedded Anonymous Structures:</strong> <code class="language-text">ffjson</code> doesn’t currently handle embedded structures perfectly. I have a plan to fix it, I just need some time to implement it.</li>
<li><strong>More real world structures for benchmarks</strong>: If you have an app doing high volume JSON, I would love to include an example in <code class="language-text">ffjson</code> to highlight any performance problems with real world structures as much as possible. </li>
<li><strong>Unmarshal Support</strong>: This will require writing a custom scanner/lexer, which is a larger project.</li>
</ol>
<p>If you have any other ideas, I am happy to discuss them on the <a href="https://github.com/pquerna/ffjson">Github project page for ffjson</a>.</p>]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Security of Infrastructure Secrets]]></title>
<id>https://paul.querna.org/articles/2013/11/09/security-of-infrastructure-secrets/</id>
<link href="https://paul.querna.org/articles/2013/11/09/security-of-infrastructure-secrets/">
</link>
<updated>2013-11-09T23:23:23Z</updated>
<summary type="html"><![CDATA[Credit Cards and Personally identifiable information (PII) have compliance standards for their storage and transportation. PCI-DSS being…]]></summary>
<content type="html"><![CDATA[<p>Credit Cards and <a href="http://en.wikipedia.org/wiki/Personally_identifiable_information">Personally identifiable information (PII)</a> have compliance standards for their storage and transportation. <a href="http://en.wikipedia.org/wiki/Payment_Card_Industry_Data_Security_Standard">PCI-DSS</a> being one of the most commonly referenced compliance standards. There is an entire industry of payment processors like <a href="https://stripe.com/">Stripe</a> or <a href="https://www.braintreepayments.com/">Braintree</a> who use making PCI-DSS compliance easier as a major selling point. These standards aren’t perfect, but they present a reasonable bar that companies can measure themselves by.</p>
<p>I see a class of data that is not well covered by existing standards. I call them <em>“Infrastructure Secrets”</em>. Infrastructure Secrets are credentials or secrets that are commonly used to build or deploy applications and that are often shared with third party services. Examples include:</p>
<ul>
<li>Github API keys</li>
<li>AWS IAM identities</li>
<li>Heroku API Tokens</li>
<li>Rackspace Cloud API Keys</li>
<li>SSH Private Keys</li>
</ul>
<p>Many times systems using these secrets are running without direct human supervision, and the credentials they use are not locked down to single use cases. Many of these infrastructure providers do not have RBAC or limited use tokens. Even if the provider has an excellent system for limiting the scope of a credential, properly locked down tokens are still rarely used.</p>
<p>More than 4 years ago at <a href="http://en.wikipedia.org/wiki/Cloudkick">Cloudkick</a> we were dealing with these Infrastructure Secrets, including AWS API Keys and SSH keys to customer servers. I believe we took a paranoid approach to securing these secrets. Cloudkick used all of the techniques that I outline below, and more.</p>
<h1>Recent Events</h1>
<p><a href="http://www.mongohq.com/home">MongoHQ</a> is a startup providing MongoDB as a service. <a href="http://security.mongohq.com/notice">Last week they were hacked</a>, based on a compromised password. CircleCI is another startup providing continuous integration and deployment. They happened to be using MongoHQ as their database provider. <a href="http://blog.circleci.com/mongohq-security-incident-response/">CircleCI stored many Infrastructure Secrets given to them by their customers</a>. The diagram bellow outlines the steps in the attack:</p>
<p><img src="/assets/posts/infrastructure-secrets/mongohq-hack.png"></p>
<p>This attack is a great example of privilege escalation across multiple providers and systems. Starting from just a compromised email password, the attacker escalated to SSH Keys, EC2 IAM credentials and more.</p>
<h1>Threats</h1>
<p>An <a href="http://en.wikipedia.org/wiki/Threat_model">in-depth threat model</a> is application specific, but I want to outline some common threats that I see across many companies interacting with Infrastructure Secrets.</p>
<h3>Bad Passwords</h3>
<p>Individuals pick good passwords, but <em>people</em> pick bad ones. This is not going to change.</p>
<h3>Support Tools</h3>
<p>Support tools are critical components of a SaaS business, and most are built to allow an employee to do anything a customer can do — sometimes even with a direct impersonation feature so that the support team can see exactly what a customer would see. They tend to have poor security precautions or logging, and at the same time have access to every customer’s information.</p>
<h3>Databases</h3>
<p>Databases are backed up, they are left on employee hard disks, and in the case of CircleCI, even hosted with 3rd-parties. Newer NoSQL data stores tend to have simple access controls, and often companies use a single set of credentials for all access.</p>
<h3>Server Compromise</h3>
<p>Applications are being developed as quickly as possible using a relatively small set of server technologies. For example, in January there was an exploit against <a href="http://blog.codeclimate.com/blog/2013/01/10/rails-remote-code-execution-vulnerability-explained/">Ruby on Rails that allowed remote code execution</a>. If an attacker were to uncover similar exploits in the future, attacking someone who stores infrastructure secrets could be much more lucrative than other targets.</p>
<h1>Defensive Layers for Infrastructure Secrets</h1>
<p><a href="http://en.wikipedia.org/wiki/Defense_in_depth">Defense in depth</a> is a common approach to information security. I view security as a series of white picket fences. Jumping over a single fence might be easy, but jumping over 40 isn’t. If a system is partially compromised, having side effects that alert you to abnormal behaviors is also important.</p>
<h3>Encryption using Keyczar</h3>
<p>Don’t type the word <code class="language-text">AES</code>; use Keyczar. <a href="http://www.keyczar.org/">Keyczar</a> is a series of libraries built to have a simple API and sane defaults. The <em>best</em> cryptography algorithm will always change, but Keyczar provides file formats and mechanisms for changing these defaults over the lifetime of an application. To support data of various lengths, the <a href="https://code.google.com/p/keyczar/wiki/OperationSessions">Keyczar Sessions API should generally be used</a>. At Cloudkick we used Django and developed a <code class="language-text">KeyczarField</code> that overrode <code class="language-text">models.TextField</code> serialization to the database. We could then set the fields as if they were normal ORM fields in Django. If a backend service wanted to use the secrets, it had to explicitly decrypt them.</p>
<h3>Isolated Services with specific APIs</h3>
<p>If you use the <a href="https://code.google.com/p/keyczar/wiki/OperationSessions">Keyczar Sessions API</a> you can put the public keys on all servers, but only put the private keys on specific backend servers. This can let a user update a credential from your web server, but only a specific backend service can view the decrypted value.</p>
<p>These backend servers should be on isolated networks and only provide exact operations over their communication channel.</p>
<p>For example, if you are storing SSH Keys to deploy code:</p>
<ul>
<li>DO NOT: Have a service with an exported API like <code class="language-text">run_command(host, command)</code>.</li>
<li>DO: Have a service with a specific operation like: <code class="language-text">deploy_project_to_host(host, project)</code></li>
</ul>
<p>In the event of a web server being compromised the attacker can only force more deploys to happen — this still might be bad, but it is one more white picket fence for them to jump over.</p>
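<p>Sketched in Go for brevity (the names are hypothetical, not from any Cloudkick system), the narrow service boundary looks like this:</p>
<div class="gatsby-highlight" data-language="go"><pre class="language-go"><code class="language-go">package deploysketch

// Deployer is the only surface the web tier can reach. The SSH keys stay
// inside the backend service that implements it; a compromised caller can
// trigger deploys, but can never run arbitrary commands or read the keys.
type Deployer interface {
	// DeployProjectToHost deploys a known project to a known host.
	DeployProjectToHost(host, project string) error
}</code></pre></div>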
<h3>Notifications on special events</h3>
<p>For Cloudkick, in addition to our normal logging when an employee activated impersonation, we would send an additional email to our root@ alias. The alias got a few emails every day, but an attacker abusing it would have been quickly noticed.</p>
<h3>Multi-Factor Authentication</h3>
<p>Multi-Factor authentication is a critical and cheap method to protect against the most common threat: bad passwords.
For Cloudkick we used <a href="http://www.yubico.com/products/yubikey-hardware/">YubiKeys</a> extensively. You don’t need an expensive <a href="http://en.wikipedia.org/wiki/SecurID">RSA SecurID</a> system; today there are many more options, <a href="https://code.google.com/p/google-authenticator/">including standards compliant HOTP/TOTP</a> or startups like <a href="https://www.duosecurity.com/">Duo Security</a>.</p>
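<p>To show how small the standards-based option is, here is a minimal TOTP sketch in Go, assuming a raw shared secret and the common 30 second step with 6 digit codes; a real deployment should use a vetted library:</p>
<div class="gatsby-highlight" data-language="go"><pre class="language-go"><code class="language-go">package main

import (
	"crypto/hmac"
	"crypto/sha1"
	"encoding/binary"
	"fmt"
	"time"
)

// totp computes an RFC 6238 style code: HMAC-SHA1 over the 30-second counter,
// dynamic truncation, then the low 6 decimal digits.
func totp(secret []byte, t time.Time) int {
	var msg [8]byte
	binary.BigEndian.PutUint64(msg[:], uint64(t.Unix()/30))
	mac := hmac.New(sha1.New, secret)
	mac.Write(msg[:])
	sum := mac.Sum(nil)
	off := sum[len(sum)-1] &amp; 0x0f
	code := binary.BigEndian.Uint32(sum[off:off+4]) &amp; 0x7fffffff
	return int(code % 1000000)
}

func main() {
	fmt.Printf("%06d\n", totp([]byte("12345678901234567890"), time.Now()))
}</code></pre></div>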
<h1>Be Paranoid</h1>
<p>The ideas I outlined above are a random collection. Everything from Firewalls, Log collection, patching, secure software designs and more are needed. You can’t protect against all possible threats, like the <a href="http://www.washingtonpost.com/blogs/the-switch/wp/2013/11/04/how-we-know-the-nsa-had-access-to-internal-google-and-yahoo-cloud-data/">NSA tapping Google shows</a>, but a reasonable set of white picket fences can stop many threats.</p>
<p><a href="https://www.owasp.org">OSWAP</a> have created several guides and recommendations for common pitfalls of web applications. But I have not seen any content covering these types of Infrastructure Secrets. I believe a standard for storing, transporting and using Infrastructure Secrets is needed, and would love to see one evolve.</p>]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Fixing the Docker Ubuntu image on Rackspace Cloud]]></title>
<id>https://paul.querna.org/articles/2013/10/15/docker-ubuntu-on-rackspace/</id>
<link href="https://paul.querna.org/articles/2013/10/15/docker-ubuntu-on-rackspace/">
</link>
<updated>2013-10-15T15:15:15Z</updated>
<summary type="html"><![CDATA[Docker is an awesome tool to build, manage and operate Linux containers. Docker has been gaining momentum for various use cases, and if you…]]></summary>
<content type="html"><![CDATA[<p><a href="http://docker.io/">Docker</a> is an awesome tool to build, manage and operate Linux containers. Docker has been gaining momentum for various use cases, and if you want to learn more I recommend <a href="https://www.docker.io/learn_more/">checking out the Docker.io website</a>.</p>
<p>I have been using Docker on and off for months, but recently started to deploy services using it onto the <a href="http://www.rackspace.com/cloud/">Rackspace Public Cloud</a>. Unfortunately, not everything went smoothly, and this is the story of getting it all working.</p>
<h1>tl;dr: Updated Ubuntu Images.</h1>
<p>I have published updated Ubuntu 12.04 LTS images for Docker under the <a href="https://index.docker.io/u/racker/precise-with-updates/">racker/precise-with-updates</a> repository. Get the images by running <code class="language-text">docker pull racker/precise-with-updates</code>. The <code class="language-text">latest</code> tag is automatically updated every day with the latest Ubuntu updates and security patches. The <code class="language-text">Dockerfile</code> for building this is on <a href="https://github.com/racker/docker-ubuntu-with-updates">racker: docker-ubuntu-with-updates</a> if you want to tweak it for your own uses.</p>
<h1>Illegal instruction!?</h1>
<p>Following the basic Docker instructions to get it <a href="http://docs.docker.io/en/latest/installation/ubuntulinux/#ubuntu-precise">running on Ubuntu 12.04</a>, I installed a new kernel, rebooted, and <code class="language-text">dockerd</code> was running successfully.</p>
<p>I started building a new <code class="language-text">Dockerfile</code> from the <code class="language-text">ubuntu:12.04</code> base image. Everything was going OK, but then <code class="language-text">apt-get upgrade</code> crashed with:</p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">Illegal instruction</code></pre></div>
<p>After some prodding, I found <a href="https://github.com/dotcloud/docker/issues/1984">Docker issue #1984</a> — at least I’m not alone in my sorrow.</p>
<p>Digging through the links, you come to <a href="https://bugs.launchpad.net/ubuntu/+source/eglibc/+bug/956051">LP: eglibc/+bug/956051</a>, a wonderful bug about glibc. glibc made the incorrect assumption that if the <a href="http://en.wikipedia.org/wiki/FMA_instruction_set">FMA4 instruction set</a> is available, the <a href="http://en.wikipedia.org/wiki/Advanced_Vector_Extensions">Advanced Vector Extensions (AVX)</a> would also be available.
In <a href="https://bugs.launchpad.net/ubuntu/+source/eglibc/+bug/979003">LP: eglibc/+bug/979003</a> a patch for this bug was pushed to all recent Ubuntu versions, so why did it crash with Docker?</p>
<p>This affects Docker on Rackspace for two reasons:</p>
<ol>
<li>Docker’s base Ubuntu 12.04 image has not applied any patches.</li>
<li>Many of Rackspace’s Public Cloud servers run an <code class="language-text">AMD Opteron(tm) 4332</code> processor under Xen, which has <code class="language-text">FMA4</code> instructions, but the <code class="language-text">AVX</code> instructions do not work.</li>
</ol>
<p>Since the bug has already been fixed upstream, creating an updated Ubuntu image to use as a base seems like the easiest way to fix this. Little did I know that creating an updated image was another rabbit hole.</p>
<h1>Building an updated Ubuntu image.</h1>
<p>I naively started by trying to just run <code class="language-text">apt-get dist-upgrade</code> from the <code class="language-text">ubuntu:12.04</code> base image. It didn’t work at all. Packages tried to mess with <code class="language-text">upstart</code>, and something went horribly wrong with <code class="language-text">/dev/shm</code>.</p>
<p>After taking a break to <a href="https://twitter.com/pquerna/status/385797196026634240">propose to my fiancé and vacation in Hawaii</a>, I built up the following <a href="https://github.com/racker/docker-ubuntu-with-updates/blob/master/precise/Dockerfile">Dockerfile</a>:</p>
<div class="gatsby-highlight" data-language="sh"><pre class="language-sh"><code class="language-sh">FROM ubuntu:12.04</code></pre></div>
<p>I started with the <code class="language-text">ubuntu:12.04</code> base image. It is possible to start from scratch using <code class="language-text">debootstrap</code>, but it takes much longer to build and doesn’t provide any advantages.</p>
<p>To allow automated installation of new packages, we set the <a href="http://manpages.ubuntu.com/manpages/lucid/man7/debconf.7.html">debconf(7)</a> frontend to <code class="language-text">noninteractive</code>, which never prompts the user for choices on installation/configuration of packages:</p>
<div class="gatsby-highlight" data-language="sh"><pre class="language-sh"><code class="language-sh">ENV DEBIAN_FRONTEND noninteractive</code></pre></div>
<p>One of the updated packages is <a href="http://packages.ubuntu.com/precise-updates/initramfs-tools">initramfs-tools</a>. In the post-install hooks, it tries to update the <a href="https://wiki.ubuntu.com/Initramfs">initramfs</a> for the machine and even tries to run <code class="language-text">grub</code> or <code class="language-text">lilo</code>. Since we are inside Docker we don’t want to do this. <a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=594189">Debian bug #594189</a> contains many more details about these issues, but by setting the <code class="language-text">INITRD</code> environment variable we can skip these steps:</p>
<div class="gatsby-highlight" data-language="sh"><pre class="language-sh"><code class="language-sh">ENV INITRD No</code></pre></div>
<p><code class="language-text">ischroot</code> is a command used by many post install scripts in Debian to determine how to treat the machine. Unfortunately as discussed in in <a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=685034">Debian bug #685034</a> it tends to not work correctly. Inside a Docker container we almost always want <code class="language-text">ischroot</code> to return true. Because of this, I made a <a href="https://github.com/racker/docker-ubuntu-with-updates/blob/master/precise/src/ischroot">new ischroot</a> which if the <code class="language-text">FAKE_CHROOT</code> environment variable is set, always exits 0:</p>
<div class="gatsby-highlight" data-language="sh"><pre class="language-sh"><code class="language-sh">ENV FAKE_CHROOT 1
RUN mv /usr/bin/ischroot /usr/bin/ischroot.original
ADD src/ischroot /usr/bin/ischroot</code></pre></div>
<p>This replacement <code class="language-text">ischroot</code> allows updates to the <code class="language-text">initscripts</code> package to install successfully without breaking <code class="language-text">/dev/shm</code>, which works around <a href="https://bugs.launchpad.net/launchpad/+bug/974584">LP Bug #974584</a>.</p>
<p><a href="http://people.debian.org/~hmh/invokerc.d-policyrc.d-specification.txt">policy-rc.d</a> provides a method of controlling of the init scripts of all packages. By exiting with a <code class="language-text">101</code>, the init scripts for a package will not be ran, since <code class="language-text">101</code> stands for <code class="language-text">action forbidden by policy</code>:</p>
<div class="gatsby-highlight" data-language="sh"><pre class="language-sh"><code class="language-sh">ADD src/policy-rc.d /usr/sbin/policy-rc.d</code></pre></div>
<p>Before installing any packages, I added a new <a href="https://github.com/racker/docker-ubuntu-with-updates/blob/master/precise/src/sources.list">sources.list</a> that has the <code class="language-text">-updates</code> and <code class="language-text">-security</code> repositories enabled and uses <a href="http://mirror.rackspace.com/">mirror.rackspace.com</a>. Rackspace maintains <code class="language-text">mirror.rackspace.com</code> as a GeoDNS address to pull from the nearest Rackspace data center, and it is generally much faster than <code class="language-text">archive.ubuntu.com</code>.</p>
<div class="gatsby-highlight" data-language="sh"><pre class="language-sh"><code class="language-sh">ADD src/sources.list /etc/apt/sources.list</code></pre></div>
<p>Normally <code class="language-text">dpkg</code> will call fsync after every package is installed, but when building an image we don’t need to worry about individual fsyncs, so we can use the <code class="language-text">force-unsafe-io</code> option:</p>
<div class="gatsby-highlight" data-language="sh"><pre class="language-sh"><code class="language-sh">RUN echo 'force-unsafe-io' | tee /etc/dpkg/dpkg.cfg.d/02apt-speedup</code></pre></div>
<p>Since we want to make the image as small as possible, we add a <code class="language-text">Post-Invoke</code> hook to dpkg which deletes cached <code class="language-text">deb</code> files after installation:</p>
<div class="gatsby-highlight" data-language="sh"><pre class="language-sh"><code class="language-sh">RUN echo 'DPkg::Post-Invoke {"/bin/rm -f /var/cache/apt/archives/*.deb || true";};' | tee /etc/apt/apt.conf.d/no-cache</code></pre></div>
<p>Finally we run <code class="language-text">dist-upgrade</code>:</p>
<div class="gatsby-highlight" data-language="sh"><pre class="language-sh"><code class="language-sh">RUN apt-get update -y && apt-get dist-upgrade -y</code></pre></div>
<p>After all the upgrades have been applied, we want to clean out a few more cached files:</p>
<div class="gatsby-highlight" data-language="sh"><pre class="language-sh"><code class="language-sh">RUN apt-get clean
RUN rm -rf /var/cache/apt/* && rm -rf /var/lib/apt/lists/mirror.rackspace.com*</code></pre></div>
<p>These final steps bring the image down to under 90 megabytes, which is smaller than the default <code class="language-text">ubuntu:12.04</code> image.</p>
<p>As one last trick, I flatten the image into a single layer by <a href="https://github.com/racker/docker-ubuntu-with-updates/blob/master/bin/builder.py#L51-L68">exporting and importing the image</a>.</p>
<h1>Automatically updated every day.</h1>
<p>Ubuntu gets many updates and security patches on a regular basis. Rather than building a one-off image, I hooked up Docker to a builder in Jenkins. The <code class="language-text">latest</code> tag is rebuilt every day and pushed to the public registry under <a href="https://index.docker.io/u/racker/precise-with-updates/">racker/precise-with-updates</a>.</p>
<p>This means you can use <code class="language-text">racker/precise-with-updates:latest</code> in a <code class="language-text">Dockerfile</code>, or in your interactive terminals:</p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">docker run -i -t racker/precise-with-updates /bin/bash</code></pre></div>
<p>Using <code class="language-text">docker pull</code> you can bring the images down locally for other operations:</p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">docker pull racker/precise-with-updates
docker images racker/precise-with-updates</code></pre></div>
<p>I have also published the <a href="https://github.com/racker/docker-ubuntu-with-updates">source for the image on Github</a>, and would welcome any PRs or feedback.</p>]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Adoption of TLS Extensions]]></title>
<id>https://paul.querna.org/articles/2012/09/06/adoption-of-tls-extensions/</id>
<link href="https://paul.querna.org/articles/2012/09/06/adoption-of-tls-extensions/">
</link>
<updated>2012-09-07T17:17:17Z</updated>
<summary type="html"><![CDATA[TLS extensions expand the SSL/TLS protocols. The extensions have many uses, like adding more features, supporting more scalable patterns or…]]></summary>
<content type="html"><![CDATA[<p>TLS extensions expand the SSL/TLS protocols. The extensions have many uses, like adding more features, supporting more scalable patterns or making the protocol more secure. However, adoption has been disappointingly slow until the last few years as the reignited browser wars have kicked client vendors into action. I was unable to find recent statistics about the adoption of TLS extensions, so I went about figuring it out.</p>
<p>There are about <a href="http://www.iana.org/assignments/tls-extensiontype-values/tls-extensiontype-values.xml">15-20 TLS extensions</a> in specifications. Many, however, are rarely used; some of the most common and important extensions are:</p>
<ul>
<li><strong>Server Name Indication (SNI)</strong>: Standardized in 2003, this extension enables browsers to send the hostname they intend to connect to, which solves the Virtual Host Problem in SSL. Without this extension every SSL certificate needs its own IP address to work. In <a href="http://journal.paul.querna.org/articles/2005/04/24/tls-server-name-indication/">2005 I implemented SNI support in mod_gnutls</a>, and in <a href="https://issues.apache.org/bugzilla/show_bug.cgi?id=34607">2007 mod_ssl added support for SNI</a>. Browser support lagged behind the servers, but is now thought to be widespread, excluding <a href="http://blog.jgc.org/2012/04/microsoft-is-holding-back-secure-web.html">clients running on Windows XP</a>. More: <a href="http://en.wikipedia.org/wiki/Server_Name_Indication">Wikipedia: Server Name Indication</a>, <a href="http://tools.ietf.org/html/rfc6066">RFC 6066</a></li>
<li><strong>Session Tickets</strong>: The base SSL/TLS protocol includes Session Caching to reduce the number of expensive cryptographic operations a server needs to do if a client has previously visited it. This session caching relies upon the client sending a session ID, and the server storing data about that session. This model however is difficult to implement in a large scale server environment with many endpoints terminating SSL, as they would all need a shared caching infrastructure. Session tickets solve this by having the server give the client an encrypted ‘ticket’, which contains all of the information needed to resume the session, without the server needing an additional shared cache. Client support is widespread in Chrome and Firefox, but realistic server deployments still appear to be rare. In late 2011 <a href="http://svn.apache.org/viewvc?view=revision&revision=1200040">I patched mod_ssl trunk to support configuration of session tickets</a>, but the feature hasn’t been back ported to a release branch. More: <a href="http://vincent.bernat.im/en/blog/2011-ssl-session-reuse-rfc5077.html">Vincent Bernat: Speeding up SSL: enabling session reuse</a>, <a href="http://www.ietf.org/rfc/rfc5077">RFC 5077</a></li>
<li><strong>Next Protocol Negotiation (NPN)</strong>: Used by the SPDY protocol to reduce round trips, this lets both the client and server agree on the protocol to run inside the encrypted connection. Without this extension, SPDY would require the use of additional round trips to upgrade from HTTP to SPDY. The extension is not yet an RFC, but has seen widespread adoption as Chrome and Firefox have both implemented support. Apache HTTP server <a href="https://issues.apache.org/bugzilla/show_bug.cgi?id=52210">added native support just 4 months ago</a>. More: <a href="http://tools.ietf.org/html/draft-agl-tls-nextprotoneg">IETF Draft: Next Protocol Negotiation Extension </a></li>
<li><strong>Renegotiation Indication</strong>: A <a href="http://www.educatedguesswork.org/2009/11/understanding_the_tls_renegoti.html">TLS protocol bug was discovered in 2009</a> that allowed attackers to inject data into the stream read by the client. This extension was crafted to prevent this attack. A common mitigation was to disable all renegotiation once a connection was established, so the lack of this extension doesn’t necessarily indicate that a client is vulnerable, but it is a good indication of the age of the SSL/TLS stack being used by the Client. More: <a href="http://tools.ietf.org/html/rfc5746">RFC 5746</a></li>
</ul>
<h1>User Agents Drive TLS Adoption</h1>
<p>The adoption of TLS features and extensions is directly tied to the User Agent. While consumer websites and web browsers are important, I believe there has not been enough attention focused on Web Service API User Agents. Consumer browsers are now on much faster upgrade cycles, but many servers are not going to follow the same upgrade curves.</p>
<p>For this reason, I’ve collected samples from 3 different data sources:</p>
<ul>
<li><strong><a href="https://monitoring.api.rackspacecloud.com/">monitoring.api.rackspacecloud.com</a></strong>: The API endpoint for the <a href="http://www.rackspace.com/cloud/public/monitoring/">Rackspace Cloud Monitoring</a> product. Most traffic is from Python, Java, and Ruby based API clients.</li>
<li><strong><a href="https://svn.apache.org/">svn.apache.org</a></strong>: Primary version control site for the ASF. The majority of the clients are using Subversion Clients, but there is a smaller mix of browsers and other agents.</li>
<li><strong><a href="https://issues.apache.org">issues.apache.org</a></strong>: The most browser focused site that I could easily sample. It hosts the ASF JIRA and Bugzilla which are primarily used by consumer browsers.</li>
</ul>
<p>If someone out there could sample a popular consumer site (Google? Facebook? Yahoo?) and post the results I would be very interested in seeing them.</p>
<h1>Collecting Samples</h1>
<p>It seemed too difficult to modify the existing server software to log all of the information that I wanted, and because in some cases the TLS termination is done in devices like a load balancer, I decided to build a tool to decode the information from a <a href="http://en.wikipedia.org/wiki/Pcap">packet capture</a>. All of the extensions I am interested in are sent by the Client in its ClientHello message. This means I didn’t need to do any cryptographic operations to decode it, just parse the TLS packet.</p>
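<p>To make that concrete, here is a tiny sketch in Go (the tool described below is Python, and this is not its code) of the first check such a parser makes on a captured record before digging into the extensions:</p>
<div class="gatsby-highlight" data-language="go"><pre class="language-go"><code class="language-go">package main

import "fmt"

// isClientHello checks the 5-byte TLS record header (content type, version,
// length) and the first handshake byte to confirm the record carries a
// ClientHello, which is all that is needed before parsing extensions.
func isClientHello(rec []byte) bool {
	const (
		recordTypeHandshake  = 22 // TLS record content type "handshake"
		handshakeClientHello = 1  // handshake message type "client_hello"
	)
	if len(rec) &lt; 6 {
		return false
	}
	return rec[0] == recordTypeHandshake &amp;&amp; rec[5] == handshakeClientHello
}

func main() {
	// A fabricated 6-byte prefix of a TLSv1.0 handshake record.
	fmt.Println(isClientHello([]byte{22, 3, 1, 0, 200, 1})) // true
}</code></pre></div>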
<p>I started by using the <a href="http://code.google.com/p/dpkt/">excellent dpkt library</a> to dissect my packet captures, but quickly figured out it didn’t actually parse any of the TLS extensions. A little <a href="https://github.com/pquerna/tls-client-hello-stats/commit/60306d27d1d71485fa145587aad5a86f6d4fe9bd#third_party/dpkt/dpkt/ssl.py">patching later and I had it parsing TLS extensions</a>.</p>
<p>The <a href="https://github.com/pquerna/tls-client-hello-stats/blob/master/parser.py">script I wrote</a> handles the common issues I’ve seen, but still could be improved to do TCP stream re-assembly, but in practice with all the captures I made, the TLS Client Hello messages were in a single TCP packet.</p>
<p>If you want to try collecting and analyzing your own samples:</p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">git clone git://github.com/pquerna/tls-client-hello-stats.git
cd tls-client-hello-stats
# Let tcpdump run for awhile, press ctrl+c to stop capturing.
sudo tcpdump -i eth0 -s 0 -w port443.cap port 443
python parser.py port443.cap</code></pre></div>
<h1>Observations</h1>
<h2>SSL/TLS Versions</h2>
<p><img src="/wp-content/uploads/2012/09/tls-versions.png"></p>
<p>TLS 1.0 is the version advertised by most clients and servers. The version spread for <code class="language-text">issues</code> and <code class="language-text">monitoring</code> is about what I would expect, but I was surprised to see that <code class="language-text">svn.apache.org</code> was still seeing over 23% of its clients reporting SSLv3 as their highest supported version.</p>
<h2>Deflate Support</h2>
<p>While not an <em>extension</em>, <code class="language-text">deflate</code> compression has to be advertised by both sides in order to support it. If used, it also <a href="http://journal.paul.querna.org/articles/2011/04/05/openssl-memory-use/">imposes increased memory usage requirements</a> on both the client and server, so I was interested in seeing if clients are advertising support for it.</p>
<p><img src="/wp-content/uploads/2012/09/deflate-support.png"></p>
<p>OpenSSL enables the <code class="language-text">deflate</code> compression by default, and until recent versions it was difficult to disable. I suspect that most of the <code class="language-text">monitoring</code> traffic is using a default OpenSSL client library, and the more sophisticated browser user agents are explicitly disabling it. Since HTTP and SPDY both support compression inside their protocols, enabling deflate at the TLS layer would commonly lead to content being double compressed.</p>
<h2>Number of Total Extensions Sent</h2>
<p><img src="/wp-content/uploads/2012/09/extensions-sent.png"></p>
<p>It is interesting that most of the API centric clients send so few extensions. This seems to indicate potentially both the age of the TLS software stack being used, and the complexity of how it is configured by the developer.</p>
<h2>SNI Support</h2>
<p><img src="/wp-content/uploads/2012/09/sni-support.png"></p>
<p>I was disappointed to find a massive gap between consumer browsers and API consumers for SNI. This can be traced to common libraries not setting the SNI extension until recently. For example, only <a href="http://bugs.python.org/issue5639">Python 3.2 or newer sends the SNI extension</a>, and because <em>”<a href="http://bugs.python.org/issue5639#msg141913">Python 2 only receives bug fixes</a>”</em>, it will never be back ported for the most commonly deployed versions of the Python language.</p>
<h2>Session Tickets</h2>
<p><img src="/wp-content/uploads/2012/09/session-tickets-support.png"></p>
<p>Session Tickets seem to have a more reasonable usage by non-browser user agents, but the consumer browsers are again leading adoption.</p>
<h2>NPN Support</h2>
<p><img src="/wp-content/uploads/2012/09/npn-support.png"></p>
<p>NPN support has been driven by the adoption of SPDY in Chrome and Firefox, so it isn’t surprising that for <code class="language-text">monitoring</code> we see almost no support from clients.</p>
<h2>Renegotiation Indication Support</h2>
<p><img src="/wp-content/uploads/2012/09/renegotiation-support.png"></p>
<p>While the Renegotiation Indication extension is sent by a significant number of clients on <code class="language-text">issues</code>, its use is extremely low on both <code class="language-text">svn</code> and <code class="language-text">monitoring</code>. This again shows how browsers are leading the charge in upgrading, but since the Renegotiation attacks <a href="http://en.wikipedia.org/wiki/Man-in-the-middle_attack">require a man-in-the-middle</a>, it would generally be a lower priority for server-to-server software.</p>
<h2>Raw Data</h2>
<p>I’ve <a href="https://gist.github.com/2ff9bdc9bf83057d4d7b">posted a gist with the raw data</a> for my three samples, if you wanted to look at the information for a more rarely seen extensions.</p>
<h1>Conclusion</h1>
<p>I think the data I’ve seen so far says a few things:</p>
<ul>
<li>Server Name Indication: I was hopeful that SNI could soon be used in prime time, but I believe this is still years away with the usage numbers I’ve seen.</li>
<li>Session Tickets: I believe it is reasonable to put effort into using Session Tickets if you have more than one server doing SSL/TLS termination. Reducing the need to use distributed Session Caching eases implementation, and should result in a generally faster user experience.</li>
</ul>
<p>I think it is great that browser vendors like Chrome and Firefox are driving the use of newer features and extensions in TLS. It is obvious, however, that because API clients are commonly built by a more diverse set of developers, and those developers are less specialized in SSL/TLS security issues, their adoption of the newest extensions is lagging. I hope this could change quickly if HTTP/2.0 and SPDY start driving the need to use NPN, and I hope that this would get developers to upgrade their SSL/TLS stacks.</p>]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Upgrades: SPDY, IPv6, FreeBSD, Jekyll]]></title>
<id>https://paul.querna.org/articles/2012/09/05/upgraded-spdy-ipv6-freebsd9-open-cloud-server/</id>
<link href="https://paul.querna.org/articles/2012/09/05/upgraded-spdy-ipv6-freebsd9-open-cloud-server/">
</link>
<updated>2012-09-05T17:17:17Z</updated>
<summary type="html"><![CDATA[Upgrades: FreeBSD on Cloud Servers: This site is now being served from a Rackspace Open Cloud Server running FreeBSD 9. This includes using…]]></summary>
<content type="html"><![CDATA[<p>Upgrades:</p>
<ul>
<li>FreeBSD on Cloud Servers: This site is now being served from a <a href="http://www.rackspace.com/cloud/public/servers/">Rackspace Open Cloud Server</a> running <a href="http://www.rackspace.com/blog/rackspace-cloud-servers-to-support-centos-6-3-freebsd-9/">FreeBSD 9</a>. This includes using a PF firewall, ZFS root, etc.</li>
<li>HTTPS: The <a href="https://journal.paul.querna.org/">site now supports HTTPS</a>. Sensitive business here on the blog ya know.</li>
<li>SPDY: Powered by <a href="https://github.com/indutny/node-spdy">node-spdy</a>, the site is now available over HTTPS with the SPDY protocol.</li>
<li>IPv6: All new Rackspace Cloud servers include IPv6, so I went ahead and added an <code class="language-text">AAAA</code> record.</li>
<li>100% Static: I migrated a few months ago to a <a href="https://github.com/mojombo/jekyll">Jekyll</a> based blogging system.</li>
<li>Monitoring: I’m checking if the site is up using <a href="http://www.rackspace.com/cloud/public/monitoring/">Rackspace Cloud Monitoring</a>, both over IPv4 and IPv6.</li>
</ul>
<p>Details:</p>
<ul>
<li><a href="https://gist.github.com/e5775fd52f1feed60593">/etc/pf.conf</a>: Allow inbound ports 22, 80 and 443, allow all outgoing.</li>
<li><a href="https://gist.github.com/9aad1ff4de19aecae7af">/etc/sysctl.conf</a>: Sets <code class="language-text">net.inet.ip.portrange.reservedhigh</code> to <code class="language-text">0</code>, letting non-root users bind to ports bellow 1024. This lets me run my Node.js server without <code class="language-text">root</code>, and without needing to figure out dropping privileges later, mostly because I’m being lazy and its my blog.</li>
<li><a href="https://github.com/pquerna/journal.paul.querna.org/blob/master/server.js">Node.js Server</a>: Binds to both IPv4 and IPv6, HTTPS/SPDY and HTTP, and a few simple redirects. I’m logging to stdout, and using <a href="http://smarden.org/runit/">runit</a> to keep it up.</li>
<li><a href="http://www.freshports.org/www/node">Node.js from Ports</a>: At first I was going to compile Node.js from scratch, but then I noticed that the FreeBSD ports collection provides it, and was pleasantly surprised to see it is well maintained — so I went with using it.</li>
<li><a href="https://gist.github.com/bad5d9e1ba89141cb285">ZFS Root</a>: I haven’t setup anything cool with ZFS yet, but I’m thinking about how to do a <a href="http://developers.sun.com/solaris/articles/storage_utils.html">ZFS Send to Cloud Files</a>.</li>
</ul>]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Retaliatory Only Patents]]></title>
<id>https://paul.querna.org/articles/2012/03/13/retaliatory-only-patents/</id>
<link href="https://paul.querna.org/articles/2012/03/13/retaliatory-only-patents/">
</link>
<updated>2012-03-13T00:30:30Z</updated>
<summary type="html"><![CDATA[Today Yahoo launched a patent lawsuit against Facebook. Yahoo has always said they collect patents for defensive purposes only. Then Yahoo…]]></summary>
<content type="html"><![CDATA[<p>Today <a href="http://allthingsd.com/20120312/breaking-yahoo-sues-facebook-for-patent-infringement/">Yahoo launched a patent lawsuit against Facebook</a>.</p>
<p>Yahoo has always said they collect patents for defensive purposes only. Then Yahoo’s newest CEO, Scott Thompson, is brought in, and gives choice <a href="http://www.forbes.com/sites/jeffbercovici/2012/01/04/new-yahoo-ceo-scott-thompson-well-be-back-to-innovation/">quotes like this one, just 3 months ago</a>:</p>
<blockquote>
<p>“We’ll be back to innovation, we’ll be back to disruptive concepts,” he added. “I wouldn’t be here if I didn’t believe that was possible.”</p>
</blockquote>
<p>Suing Facebook with patents is a <em>disruptive concept</em>. Yahoo just broke the <a href="http://www.npr.org/blogs/money/2011/08/15/139639032/google-escalates-patent-arms-race">patent mutually assured destruction</a> stalemate in the valley. This also signals to all engineers that Yahoo is not interested in building disruptive products, and is instead a sinking ship.</p>
<p>I believe the patent system as it exists today is broken. Software patents have major issues. There are many things I would like to change, but I cannot. I also believe reform of the system as a whole is unlikely. Previously, I have chosen to try to ignore patents as much as possible. The trouble is if you ignore the patents your own company is at significant risk. Other companies don’t have the same moral beliefs about patents, and will use patents against you.</p>
<h1>Retaliatory Only Patents</h1>
<p>Many companies say they have a defensive only patent policy. But control of companies changes. Policies change and <a href="http://www.iusmentis.com/patents/faq/general/#term">patents are granted for up to 20 years</a>. Most technology companies also provide some sort of cash or other incentive to employees for filing patents on behalf of the company. Just imagine if you were an engineer at Yahoo in 2005. You came up with a cool new patentable idea, and went down the path of getting it patented. In 2010, you left Yahoo, as most sane people did, and could have even joined Facebook. Then 2 years after you left Yahoo, your patent, which you thought was going to be used for defensive purposes only, is used in an offensive suit against Facebook.</p>
<p>This kind of situation is exactly why I’ve tried to ignore patents for so long.</p>
<p>I think there is a better approach to motivating engineers, besides a bonus for patents.</p>
<p>If during the patent filing process, a company created a binding legal agreement to only use the new patent for defensive or retaliatory purposes, I would personally find this highly motivating. I am sure that the legal definition of “defensive or retaliatory” would take 20 pages of text to define, but I trust that lawyers can figure out the details. This kind of policy would make me feel much better about putting effort into filing patents for a company. If the company later changes control, they could change this policy, but it would only apply to new patents after that change in control.</p>
<p>I am not a lawyer, but what is stopping something like this from happening?</p>]]></content>
</entry>
<entry>
<title type="html"><![CDATA[March 2012]]></title>
<id>https://paul.querna.org/articles/2012/03/01/march-2012/</id>
<link href="https://paul.querna.org/articles/2012/03/01/march-2012/">
</link>
<updated>2012-03-01T10:31:38Z</updated>
<summary type="html"><![CDATA[Vacation: March 2 to 15: Japan March 16 to 19: San Francisco March 19 to 30: Chile and Argentina I don’t expect to be reading much email. I…]]></summary>
<content type="html"><![CDATA[<p>Vacation:</p>
<ul>
<li>March 2 to 15: Japan</li>
<li>March 16 to 19: San Francisco</li>
<li>March 19 to 30: Chile and Argentina</li>
</ul>
<p>I don’t expect to be reading much email.</p>
<p>I’m still undecided about where/how to post pictures.</p>]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Designing Network Protocols]]></title>
<id>https://paul.querna.org/articles/2012/02/22/designing-network-protocols/</id>
<link href="https://paul.querna.org/articles/2012/02/22/designing-network-protocols/">
</link>
<updated>2012-02-22T18:24:10Z</updated>
<summary type="html"><![CDATA[Hacker News user peterwwillis started a discussion about a new network protocol introduced by the mod_heartbeat module in Apache 2.4: It…]]></summary>
<content type="html"><![CDATA[<p>Hacker News user <a href="http://news.ycombinator.com/user?id=peterwwillis">peterwwillis</a> started <a href="http://news.ycombinator.com/item?id=3617247">a discussion about a new network protocol</a> introduced by the <a href="http://httpd.apache.org/docs/2.4/mod/mod_heartbeat.html">mod_heartbeat</a> module in Apache 2.4:</p>
<blockquote>
<p>It frustrates me when people use ASCII instead of packed bitmaps for things like this (packet transmitted once a second from potentially hundreds or thousands of nodes, that each frontend proxy has to parse into a binary form anyway before using it). Maybe it’s a really small amount of CPU but it’s just one of many things which could easily be more efficient.</p>
</blockquote>
<p>This thread on HN continued with dozens of other posts from many authors, with <code class="language-text">peterwwillis</code> holding his ground on his original point.</p>
<p>I disagree with the belief that a binary format should have been used and will attempt to show why the chosen network protocol for <code class="language-text">mod_heartbeat</code> was both reasonable and correct.</p>
<h2>Background</h2>
<p><a href="http://mail-archives.apache.org/mod_mbox/httpd-announce/201202.mbox/%3C2922160F-CBF2-4633-8B1E-C5045CC35918%40apache.org%3E">Apache 2.4 was released this week</a>, 6 years <a href="http://journal.paul.querna.org/articles/2005/12/02/httpd-2-2-0-released/">after 2.2 was released</a>. Compared to the 2.2 development cycle, where I was the Release Manager, I have not been as active in 2.4. However, one of the few features I did write for 2.4 was the <code class="language-text">mod_heartbeat</code> module. <a href="http://httpd.apache.org/docs/2.4/mod/mod_heartbeat.html">mod_heartbeat</a> is a method for distributing server load information via multicast. While I wrote <a href="http://svn.apache.org/viewvc?view=revision&revision=721952">mod_heartbeat 3 years ago</a>, many other Apache HTTP Server developers have added features and bug fixes since then.</p>
<p>The primary use case is the <a href="http://httpd.apache.org/docs/2.4/mod/mod_lbmethod_heartbeat.html">mod_lbmethod_heartbeat module</a>, which directs traffic to the least loaded server in a reverse proxy pool.</p>
<p>The <code class="language-text">mod_heartbeat</code> code and design was derived from a project at <a href="http://en.wikipedia.org/wiki/Joost">Joost</a>. After stopping development of our thick client and peer to peer systems, we were moving to a HTTP based distribution of video content. We had a pool of super cheap storage nodes, which liked to die far too often. We built a system to have the storage nodes heartbeat with what content they had available, and a reverse proxy that would send clients to the correct storage server.</p>
<p>This enabled a low operational overhead around configuration of both our storage nodes and of the reverse proxy. Operations would just bring on a new storage node, put content on it, and it would automatically begin serving traffic. If the storage node died, traffic would be directed to other nodes still online.</p>
<h2>Understand your goals</h2>
<p><code class="language-text">mod_heartbeat</code>’s primary goal is: <strong>Enable flexible load balancing for reverse proxy servers</strong>.</p>
<p>For Joost we had good switches since we were already set up for high packet rate peer-to-peer traffic. We had also previously used multicast for other projects. We chose to use a simple UDP multicast heartbeat as our server communication medium.</p>
<p>When designing the content of this heartbeat packet, I was thinking about the following issues:</p>
<ul>
<li><strong>10 to 200 servers</strong>: If you only have 10 nodes, you can do everything by hand. If you have hundreds of nodes, you are most likely building a hierarchical distribution of load. In my experience it is not a common configuration to have 10,000 application servers behind a single load balancer. I believe the sweet spot for this automatic configuration via multicast is pools between 10 and 200 servers.</li>
<li><strong>Multiple Implementers</strong>: The Apache HTTP server is all about being the flexible centerpiece of internet architectures, with many diverse producers, consumers, and interfaces. We must have a network protocol that is easily implemented in any programming language or environment, without adding additional dependencies.</li>
<li><strong>Extensibility</strong>: At Joost we embedded the available video content catalogs into the heartbeat advertisements. We needed a protocol that would be open to proprietary extensions without causing pain.</li>
<li><strong>Limited Network Impact</strong>: In a clustered system you do not want the overhead of the cluster communication to negatively affect your application. It is important here to understand that many systems will actually hit <a href="http://www.cisco.com/web/about/security/intelligence/network_performance_metrics.html">packet-per-second limits before raw bandwidth limits</a>. We also assumed at this point in time all systems have gigabit internal networking. In my experience the difference between a 20 byte packet and an 8 byte packet that is being multicasted once a second is not a relevant issue on modern LANs. Even with 1000 servers emitting packets, this is 19.53 KB/s of bandwidth. How efficient this network flow is will depend on your exact multicast configuration and your specific switches, but in most configurations it is a non-issue.</li>
<li><strong>Operability / Debug-ability</strong>: <a href="http://www.wireshark.org/">Wireshark</a> and packet dumps are the best friends of a Network Admin. When people are doing packet dumps, they are looking for problems. A simple ASCII encoding of data will be easy for these people to see when they are in times of stress. Decoding a more complex binary encoding might get added as a feature to Wireshark someday, but it is yet another barrier.</li>
<li><strong>Design for the long term</strong>: Design all public network protocols to be around for 10 years or longer. Include a versioning scheme. Don’t assume that 10 years from now your encoding system will still be around. I love <a href="http://msgpack.org/">msgpack</a> for internal applications, but on these time scales for a public protocol, nothing beats straight up ASCII bytes.</li>
</ul>
<h2>What I did in 2007</h2>
<p>Given the above considerations in 2007 at Joost, I started sketching out the possible formats for the multicast packet.</p>
<p>I considered using a binary format, but the immediate problem was having extendable fields. This meant we would need more than a few simple bytes. To create an extensible binary format, I started looking at serialization frameworks like <a href="http://thrift.apache.org/">Apache Thrift</a>. At this time in 2007 <a href="http://blog.facebook.com/blog.php?post=2261927130">Thrift had only been open sourced a few months</a>, and it really wasn’t a stable project. It also didn’t have a pure C implementation, and instead would have added a C++ dependency to Apache HTTP server, which is unacceptable. Since 2007 the number of binary object formats like <a href="http://bsonspec.org/">BSON</a>, <a href="http://code.google.com/apis/protocolbuffers/">Google Protocol Buffers</a>, <a href="http://avro.apache.org/">Apache Avro</a>, and <a href="http://msgpack.org/">Msgpack</a> has exploded, but just 4 years ago there really weren’t any good standardized choices or formats for a pure-C project. The only existing choice would have been to use <a href="http://en.wikipedia.org/wiki/ASN.1">ASN.1 DER</a>, which would have implied a large external dependency, in addition to <a href="http://luca.ntop.org/Teaching/Appunti/asn1.html">just being too complex</a>. Because of this, and the other goals around debug-ability, I decided to pursue an ASCII-based encoding of the content.</p>
<p>The choices for non-binary formats were:</p>
<ul>
<li><strong>XML</strong>: While XML is everywhere, and almost all languages have good bindings, it would be the most verbose choice. I also felt that it is <em>too</em> extendable. Someone later would add namespaces and other features that would make implementing a consumer much more difficult.</li>
<li><strong>JSON</strong>: Easier to consume, and <em>today</em> there are libraries for all languages. A major problem was that in 2007, there were no good JSON parsers in pure C. I know this because at the same time I was working on <a href="http://code.google.com/p/libjsox/">libjsox</a>, a pure C JSON parser with Rici Lake, and it was incomplete. (As an aside, <a href="http://lloyd.github.com/yajl/">YAJL is an excellent JSON parsing library</a> for C that you should use nowadays). Like XML, JSON would also mean consumers would potentially have to handle more complex objects, rather than a simple key value pair.</li>
<li><strong>Query parameters</strong>: <a href="http://tools.ietf.org/html/rfc3986">RFC 3986</a> defined URLs, including the structure of <a href="http://en.wikipedia.org/wiki/Query_string">query parameters</a>. This format is understood by every component in a web server stack, and Apache already included examples of parsing this type of format. The format is also easy to build without external libraries, meaning reimplementation in any language is very easy. The use of a key and value system also means implementers can use simple data structures like a linked list or hash for interacting with their representation.</li>
</ul>
<p>I made the decision to use query string style parameters as the best compromise for the content of the multicast packet.</p>
<p>In the open source version of <code class="language-text">mod_heartbeat</code>, there are two fields that are exposed today:</p>
<ul>
<li><strong>ready</strong>: The number of worker processes that are ready to accept new connections.</li>
<li><strong>busy</strong>: The number of worker processes that are currently servicing requests.</li>
</ul>
<p>Adding the version string <code class="language-text">v=1</code>, and then encoding the fields above we get something like this:</p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">v=1&ready=75&busy=0</code></pre></div>
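<p>For illustration only, here is a minimal sketch of how a consumer might parse this payload. The <code class="language-text">v</code>, <code class="language-text">ready</code>, and <code class="language-text">busy</code> fields come from the format above; the function name and error handling are assumptions, not part of <code class="language-text">mod_heartbeat</code> or <code class="language-text">mod_lbmethod_heartbeat</code>:</p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">// Sketch: split a heartbeat payload such as "v=1&ready=75&busy=0"
// into key/value pairs, check the version, and pull out the two fields.
function parseHeartbeat(payload) {
  var fields = {};
  payload.split('&').forEach(function(pair) {
    var idx = pair.indexOf('=');
    fields[decodeURIComponent(pair.slice(0, idx))] =
        decodeURIComponent(pair.slice(idx + 1));
  });
  if (fields.v !== '1') {
    throw new Error('unsupported heartbeat version: ' + fields.v);
  }
  return { ready: parseInt(fields.ready, 10),
           busy: parseInt(fields.busy, 10) };
}

// parseHeartbeat('v=1&ready=75&busy=0')  =>  { ready: 75, busy: 0 }</code></pre></div>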
<h2>What would I change today?</h2>
<p>If I were to need to implement the same system today, there are a few things I might change, but I don’t think any of them are critical mistakes given the original design constraints:</p>
<ul>
<li><strong>Consider using Gossip:</strong> <a href="http://en.wikipedia.org/wiki/Gossip_protocol">Gossip based systems are more complex</a>, but with more and more systems moving to Cloud based infrastructure, multicast communication is not a viable choice. Additionally, in some infrastructures, multicast can be problematic if not well configured, or if you have too many hosts joining and leaving the multicast group.</li>
<li><strong>Consider using JSON</strong>: JSON is a more verbose format, but the availability of parsers in all languages, including C, has significantly improved. I still do not think Thrift or Protocol Buffers are ubiquitous enough to anoint one of them as the only way Apache HTTP Server transports data.</li>
</ul>
<h2>Conclusion</h2>
<p>Binary encodings of information can be both smaller and faster, but sometimes a simple ASCII encoding is sufficient, and should not be overlooked. The decision should consider the real world impact of the choice. In the last few years we have seen the emergence of Thrift and Protocol Buffers, which are great for internal systems communication, but are still questionable when considering protocols implemented by many producers and consumers. For products like the Apache HTTP server, we also do not want to be encumbered by large dependencies, which rules out many of these projects. I believe that the choice of ASCII strings, using query string encoded keys and values, is an excellent balance for <code class="language-text">mod_heartbeat</code>’s needs, and will stand the test of time.</p>]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Rackspace Open Sources Dreadnot, a Continuous Deployment tool]]></title>
<id>https://paul.querna.org/articles/2012/01/05/dreadnot-continuous-deployment/</id>
<link href="https://paul.querna.org/articles/2012/01/05/dreadnot-continuous-deployment/">
</link>
<updated>2012-01-05T13:55:40Z</updated>
<summary type="html"><![CDATA[Today we open sourced Dreadnot, our take on a Continuous Deployment tool. Details are posted over on the Rackspace Cloud Blog Source is up…]]></summary>
<content type="html"><![CDATA[<p>Today we open sourced Dreadnot, our take on a Continuous Deployment tool. Details are <a href="http://www.rackspace.com/cloud/blog/2012/01/05/rackspace-open-sources-dreadnot/">posted over on the Rackspace Cloud Blog</a></p>
<p>Source is up on <a href="https://github.com/racker/dreadnot">github.com/racker/dreadnot</a>.</p>]]></content>
</entry>
<entry>
<title type="html"><![CDATA[2011 Timecards]]></title>
<id>https://paul.querna.org/articles/2011/12/31/2011-timecards/</id>
<link href="https://paul.querna.org/articles/2011/12/31/2011-timecards/">
</link>
<updated>2011-12-31T05:11:20Z</updated>
<summary type="html"><![CDATA[Work Project:
Hobby Project:
2012 Goal Finish the hobby project. Created using Dustin’s git-timecard.]]></summary>
<content type="html"><![CDATA[<h2>Work Project:</h2>
<p><a href="http://chart.apis.google.com/chart?cht=s&chs=800x300&chd=e:CkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639bCkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639bCkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639bCkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639bCkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639bCkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639bCkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639bCkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639b,IAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAn.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.................................................,BZAWAAAAAWBDCyCyGnJDNOIsHqOQJZLIOQHTIWHqE4GRBvG9DIAWAABZE4E4DfELELE4ELE4GREhHTFOHTE4D1CcD1BZDIBvD1D1CyAtBDIAHTL1QAZvemqFlNsK0KlkgAUhF6DfE4BDF6CGCyFkD1AtE4ELG9LeYAj0kLl6mm1j..l6osgAIWKcIsF6GnIAIsHTDIEhLeIAFORDd6jerIosshz0tNtjkhVOM3FOG9HTHqLeIWELAtBvDfFkOQU3lkrIzIyFpY6b5Y70sKmQMLHTJvMhO9ELBvD1DIBvCcLINkPTd6chwWkhn.2m16yxv.n.RDIsIAGnEhHTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA&chxt=x,y&chxl=0:%7c%7c0%7c1%7c2%7c3%7c4%7c5%7c6%7c7%7c8%7c9%7c10%7c11%7c12%7c13%7c14%7c15%7c16%7c17%7c18%7c19%7c20%7c21%7c22%7c23%7c%7c1:%7c%7cSun%7cSat%7cFri%7cThu%7cWed%7cTue%7cMon%7c&chm=o,333333,1,1.0,25,0&chds=-1,24,-1,7,0,20">
<img src="http://chart.apis.google.com/chart?cht=s&chs=800x300&chd=e:CkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639bCkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639bCkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639bCkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639bCkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639bCkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639bCkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639bCkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639b,IAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAn.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.................................................,BZAWAAAAAWBDCyCyGnJDNOIsHqOQJZLIOQHTIWHqE4GRBvG9DIAWAABZE4E4DfELELE4ELE4GREhHTFOHTE4D1CcD1BZDIBvD1D1CyAtBDIAHTL1QAZvemqFlNsK0KlkgAUhF6DfE4BDF6CGCyFkD1AtE4ELG9LeYAj0kLl6mm1j..l6osgAIWKcIsF6GnIAIsHTDIEhLeIAFORDd6jerIosshz0tNtjkhVOM3FOG9HTHqLeIWELAtBvDfFkOQU3lkrIzIyFpY6b5Y70sKmQMLHTJvMhO9ELBvD1DIBvCcLINkPTd6chwWkhn.2m16yxv.n.RDIsIAGnEhHTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA&chxt=x,y&chxl=0:%7c%7c0%7c1%7c2%7c3%7c4%7c5%7c6%7c7%7c8%7c9%7c10%7c11%7c12%7c13%7c14%7c15%7c16%7c17%7c18%7c19%7c20%7c21%7c22%7c23%7c%7c1:%7c%7cSun%7cSat%7cFri%7cThu%7cWed%7cTue%7cMon%7c&chm=o,333333,1,1.0,25,0&chds=-1,24,-1,7,0,20"></a></p>
<h2>Hobby Project:</h2>
<p><a href="http://chart.apis.google.com/chart?cht=s&chs=800x300&chd=e:CkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639bCkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639bCkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639bCkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639bCkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639bCkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639bCkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639bCkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639b,IAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAn.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.................................................,MzAACkCkHrFIAAAAFIPXZmcKXCrhhR..euKPcKAACkFIFIMzFIKPAAAAAAAAAAAAKPXCcKhRrhUehRZmj1HrAAAAAACkAAXCFIFIFIAAAAAAAAAAFICkAAAAAAAAAAAAAAAAAAAAAAKPKPAAAACkAAAAAAAAAAAAAAAAFIAAAAAAAAAAAAAAAAAACkAAAAMzHrAAAAAAAAAAAAAAAACkAAAAAAAAAAAAAAAAAAAAAACkPXCkFIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACkUeR7HrCkAAAAAAAAAAHrAACkKPFIAAFIHrj1UeFIAAAAAACkAAhRo9AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA&chxt=x,y&chxl=0:%7c%7c0%7c1%7c2%7c3%7c4%7c5%7c6%7c7%7c8%7c9%7c10%7c11%7c12%7c13%7c14%7c15%7c16%7c17%7c18%7c19%7c20%7c21%7c22%7c23%7c%7c1:%7c%7cSun%7cSat%7cFri%7cThu%7cWed%7cTue%7cMon%7c&chm=o,333333,1,1.0,25,0&chds=-1,24,-1,7,0,20">
<img src="http://chart.apis.google.com/chart?cht=s&chs=800x300&chd=e:CkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639bCkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639bCkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639bCkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639bCkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639bCkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639bCkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639bCkFIHrKPMzPXR7UeXCZmcKeuhRj1mZo9rhuEwozM1w4U639b,IAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAIAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAQAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAYAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAgAn.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.v.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.................................................,MzAACkCkHrFIAAAAFIPXZmcKXCrhhR..euKPcKAACkFIFIMzFIKPAAAAAAAAAAAAKPXCcKhRrhUehRZmj1HrAAAAAACkAAXCFIFIFIAAAAAAAAAAFICkAAAAAAAAAAAAAAAAAAAAAAKPKPAAAACkAAAAAAAAAAAAAAAAFIAAAAAAAAAAAAAAAAAACkAAAAMzHrAAAAAAAAAAAAAAAACkAAAAAAAAAAAAAAAAAAAAAACkPXCkFIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACkUeR7HrCkAAAAAAAAAAHrAACkKPFIAAFIHrj1UeFIAAAAAACkAAhRo9AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA&chxt=x,y&chxl=0:%7c%7c0%7c1%7c2%7c3%7c4%7c5%7c6%7c7%7c8%7c9%7c10%7c11%7c12%7c13%7c14%7c15%7c16%7c17%7c18%7c19%7c20%7c21%7c22%7c23%7c%7c1:%7c%7cSun%7cSat%7cFri%7cThu%7cWed%7cTue%7cMon%7c&chm=o,333333,1,1.0,25,0&chds=-1,24,-1,7,0,20"></a></p>
<h1>2012 Goal</h1>
<p>Finish the hobby project.</p>
<p>Created using <a href="http://dustin.github.com/2009/01/11/timecard.html">Dustin’s git-timecard</a>.</p>]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Write Logs for Machines, use JSON]]></title>
<id>https://paul.querna.org/articles/2011/12/26/log-for-machines-in-json/</id>
<link href="https://paul.querna.org/articles/2011/12/26/log-for-machines-in-json/">
</link>
<updated>2011-12-26T15:10:15Z</updated>
<summary type="html"><![CDATA[Logging for Humans A printf style format string is the de facto method of logging for almost all software written in the last 20 years…]]></summary>
<content type="html"><![CDATA[<h2>Logging for Humans</h2>
<p>A <a href="http://en.wikipedia.org/wiki/Printf_format_string">printf style format string</a> is the de facto method of logging for almost all software written in the last 20 years. This style of logging crosses almost all programing language boundaries. <a href="http://logging.apache.org/index.html">Many libraries</a> build upon this, adding log levels and various transports, but they are still centered around a formated string.</p>
<p>I believe the widespread use of format strings in logging is based on two presumptions:</p>
<ol>
<li>The first level consumer of a log message is a human.</li>
<li>The programmer knows what information is needed to debug an issue.</li>
</ol>
<p>I believe these presumptions are <strong>no longer correct</strong> in server side software.</p>
<h2>An example of the problem</h2>
<p>An example is this classic error message inside the <a href="http://httpd.apache.org/">Apache HTTP Server</a>. The following code is called any time a client hits a URL that doesn’t exist on the file system:</p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, r,
              "File does not exist: %s", r->filename);</code></pre></div>
<p>This would generate a log message like the following in your <code class="language-text">error.log</code>:</p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">[Mon Dec 26 09:14:46 2011] [info] [client 50.57.61.4] File does not exist: /var/www/no-such-file</code></pre></div>
<p>This is fine for human consumption, and for decades people have been writing Perl scripts to munge it into fields for a computer to understand too. However, the first time you add a field, for example the HTTP <code class="language-text">User-Agent</code> header, it would break most of those perl scripts. This is one example of where building a log format that is optimized for computer consumption starts to make sense.</p>
<p>Another problem is when you are writing these format string log messages, you don’t always know what information people will need to debug the issue. Since you are targeting them for human consumption you try to reduce the information overload, and you make a few guesses, like the path to the file, or the source IP address, but this process is error prone. From my experience in the Apache HTTP server this would mean opening <code class="language-text">GDB</code> to trace what is happening. Once you figure out what information is relevant, you modify the log message to improve the output for future users with the relevant information.</p>
<h2>What if we logged everything into JSON?</h2>
<p>If we produced a JSON object which contained the same message, it might look something like this:</p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">{
  "timestamp": 1324830675.076,
  "status": "404",
  "short_message": "File does not exist: /var/www/no-such-file",
  "host": "ord1.product.api0",
  "facility": "httpd",
  "errno": "ENOENT",
  "remote_host": "50.57.61.4",
  "remote_port": "40100",
  "path": "/var/www/no-such-file",
  "uri": "/no-such-file",
  "level": 4,
  "headers": {
    "user-agent": "BadAgent/1.0",
    "connection": "close",
    "accept": "*/*"
  },
  "method": "GET",
  "unique_id": ".rh-g2Tm.h-ord1.product.api0.r-axAIO3bO.c-9210.ts-1324830675.v-24e946e"
}</code></pre></div>
<p>This example gives a much richer picture of information about the error. We now have data like the <code class="language-text">User-Agent</code> in an easily consumable form, so we could much more easily figure out that <code class="language-text">BadAgent/1.0</code> is the cause of our 404s. Other information like the source server and a <a href="http://httpd.apache.org/docs/2.2/mod/mod_unique_id.html">mod_unique_id</a> hash can be used to correlate multiple log entries across the lifetime of a request.</p>
<p>This information is also expandable. As the knowledge of what our product needs to log increases, it is easy to add more data, and we can safely do this without breaking our System Admins’ precious Perl scripts.</p>
<h2>Why now?</h2>
<p>This idea is <a href="http://www.asynchronous.org/blog/archives/2006/01/25/logging-in-json">not new</a>, it has just never been so easily accessible. Windows has had <a href="http://en.wikipedia.org/wiki/Event_Viewer">“Event Logs” for a decade</a>, but in the more recent versions it uses XML. The emergence of JSON as a relatively compact serialization format that can be generated and parsed from almost any programming language makes it a great lightweight interchange format.</p>
<p>Paralleling the <a href="http://www.pcworld.com/businesscenter/article/246941/big_data_analytics_get_even_bigger_hotter_in_2012.html">big data explosion</a>, is a growth in machine and infrastructure size. This means logging and the ability to spot errors in a distributed system has become even more valuable.</p>
<p>Logging objects instead of a format string enables you to more easily index and trace operations across hundreds of different machines and different software systems. Traditional format strings are fail deadly: it is too easy for the programmer to leave out information that a later operator will need to trace an operation.</p>
<h2>Generating JSON with Log Magic</h2>
<p><a href="https://github.com/pquerna/node-logmagic">Log Magic is a small and fast logging library for Node.js</a> that I wrote early on for our needs at Rackspace. It only has a few features, and it is only about 300 lines of code.</p>
<p>Log Magic has the concept of a local logger instance, which is used by a single module for logging. A logger instance automatically populates information like the <code class="language-text">facility</code> in a log entry. Here is an example of creating a logger instance for a module named <code class="language-text">'myapp.api.handler'</code> and using it:</p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">var log = require('logmagic').local('myapp.api.handler');

exports.badApiHandler = function(req, res) {
  log.dbg("Something is wrong", {request: req});
  res.end();
};</code></pre></div>
<p>The second feature that Log Magic provides is what I call a “Log Rewriter”. This enables the programmer to just consistently pass in the <code class="language-text">request</code> object, and we will take care of picking out the fields we really want to log. In this example, we ensure the logged object always has <code class="language-text">accountId</code> and <code class="language-text">txnId</code> fields set:</p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">var logmagic = require('logmagic');
logmagic.addRewriter(function(modulename, level, msg, extra) {
  if (extra.request) {
    if (extra.request.account) {
      extra.accountId = extra.request.account.getKey();
    }
    else {
      /* unauthenticated user */
      extra.accountId = null;
    }
    extra.txnId = extra.request.txnId;
    delete extra.request;
  }
  return extra;
});</code></pre></div>
<p>The final feature of Log Magic is dynamic routes and sinks. For the purposes of this article, we are mostly interested in the <code class="language-text">graylog2-stderr</code>, which outputs a <a href="http://www.graylog2.org/about/gelf">GELF JSON format</a> message to <code class="language-text">stderr</code>:</p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">var logmagic = require('logmagic');
logmagic.route('__root__', logmagic['DEBUG'], "graylog2-stderr");</code></pre></div>
<p>With this configuration, if we ran that <code class="language-text">log.dbg</code> example from above, we would get a message like the following:</p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">{
  "version": "1.0",
  "host": "product-api0",
  "timestamp": 1324936418.221,
  "short_message": "Something is wrong",
  "full_message": null,
  "level": 7,
  "facility": "myapp.api.handler",
  "_accountId": "ac42",
  "_txnId": ".rh-3dT5.h-product-api0.r-pVDF7IRM.c-0.ts-1324936588828.v-062c3d0"
}</code></pre></div>
<h3>Other implementations</h3>
<p>There are many other libraries that are starting to emerge that can output logs in a JSON or GELF format:</p>
<ul>
<li><a href="https://github.com/flatiron/winston">winston</a>: (Node.js) A more complete (or complex?) logging module compared to Log Magic, but the prolific crew at <a href="http://nodejitsu.com/">Nodejitsu</a> have done a great job.</li>
<li><a href="http://pypi.python.org/pypi/graypy">graypy</a>: (Python) A graylog2 logger that interacts with the standard Python logging module.</li>
<li><a href="https://github.com/pstehlik/gelf4j">gelf4j</a>: (Java) We use a modified version of this library that logs to <code class="language-text">stderr</code> instead of using UDP.</li>
</ul>
<h2>The Transaction Id</h2>
<p>One field we added very early on to our system was what we called the “Transaction Id” or <code class="language-text">txnId</code> for short. In retrospect, we could have picked a better name, but this is essentially a unique identifier that follows a request across all of our services. When a User hits our API we generate a new <code class="language-text">txnId</code> and attach it to our <code class="language-text">request</code> object. Any requests to a backend service also include the <code class="language-text">txnId</code>. This means you can clearly see how a web request is tied to multiple backend service requests, or what frontend request caused a specific Cassandra query.</p>
<p>We also send the <code class="language-text">txnId</code> to our users in our 500 error messages and in the <code class="language-text">X-Response-Id</code> header, so if a user reports an issue, we can quickly see all of the related log entries.</p>
<p>While we treat the <code class="language-text">txnId</code> as an opaque string, we do encode a few pieces of information into it. By putting the current time and the origin machine into the <code class="language-text">txnId</code>, even if we can’t figure out what went wrong from searching for the <code class="language-text">txnId</code>, we have a place to start deeper debugging.</p>
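<p>The exact layout of our <code class="language-text">txnId</code> is internal and treated as opaque, but as a rough sketch of the idea (the helper name and field layout below are assumptions for illustration, not our production format):</p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">var os = require('os');
var crypto = require('crypto');

// Illustrative only: build an otherwise opaque id that still embeds the
// origin host and a millisecond timestamp, so an operator has somewhere
// to start even if searching for the id turns up nothing.
function makeTxnId() {
  var random = crypto.randomBytes(6).toString('hex');
  return 'h-' + os.hostname() + '.ts-' + Date.now() + '.r-' + random;
}

// e.g. "h-product-api0.ts-1324936588828.r-9f3c01ab42de"</code></pre></div>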
<h2>Transporting Logs</h2>
<p>Since our product spans multiple data centers, and we don’t trust our LAN networking, our primary goal is that all log entries hit disk on their origin machine first. Some people have been using UDP or HTTP for their first level logging, and I believe this is a mistake. I believe having a disk default that consistently works is critical in a logging system. Once our messages have been logged locally, we stream them to an aggregator which then backhauls the log entries to various collection and aggregation tools.</p>
<p>Since all of our services run under <a href="http://smarden.org/runit/">runit</a>, our programs simply log their JSON to <code class="language-text">stderr</code>, and <a href="http://smarden.org/runit/svlogd.8.html">svlogd</a> takes care of getting the data into a local file. Then we use a custom tool written in Node.js that is like running a <code class="language-text">tail -F</code> on the log file, sending this data to a local <a href="https://github.com/facebook/scribe">Scribe</a> instance. The Scribe instance is then responsible for transporting the logs to our log analyzing services.</p>
<p>For locally examining the log files generated by <code class="language-text">svlogd</code>, we also made a tool called <code class="language-text">gelf-chainsaw</code>. Since JSON strings cannot contain a raw newline, the log format is easy to parse: split the file on <code class="language-text">\n</code> and try to <code class="language-text">JSON.parse</code> each line. This is useful for our systems engineers when they are on a single machine, trying to debug an issue.</p>
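<p>This is not <code class="language-text">gelf-chainsaw</code> itself, just a minimal sketch of the split-and-parse approach described above; the filtering predicate in the example is hypothetical:</p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">var fs = require('fs');

// Read a file of newline-delimited JSON log entries, skip lines that do
// not parse (for example a partially written last line), and return the
// entries matching a caller-supplied predicate.
function grepLog(path, predicate) {
  var entries = [];
  fs.readFileSync(path, 'utf8').split('\n').forEach(function(line) {
    if (!line) { return; }
    var entry;
    try {
      entry = JSON.parse(line);
    } catch (e) {
      return;
    }
    if (predicate(entry)) {
      entries.push(entry);
    }
  });
  return entries;
}

// Example: find every 404 caused by BadAgent/1.0
// grepLog('current', function(e) {
//   return e.status === '404' &&
//          e.headers && e.headers['user-agent'] === 'BadAgent/1.0';
// });</code></pre></div>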
<h2>Collecting, Indexing, Searching</h2>
<p>Once the logs are crossing machines, there are many options for processing them. Some examples that can all accept JSON as their input format:</p>
<ul>
<li>Perl Scripts (Hah! Did you think Perl would <em>ever</em> go away?)</li>
<li><a href="http://www.graylog2.org/">Graylog2</a> (open source)</li>
<li><a href="http://logstash.net/">LogStash</a> (open source)</li>
<li><a href="http://loggly.com/">Loggly</a> (SaaS)</li>
<li><a href="http://www.splunk.com/">Splunk</a> (Proprietary Software, <a href="http://splunk-base.splunk.com/apps/22337/jsonutils">can do JSON with an extra tool</a>)</li>
</ul>
<p>For <a href="http://www.rackspace.com/cloud/blog/2011/12/15/announcing-rackspace-cloud-monitoring-private-beta/">Rackspace Cloud Monitoring</a> we are currently using Graylog2 with a <a href="https://github.com/Graylog2/graylog2-server/pull/52">patch to support Scribe as a transport</a> written by <a href="https://twitter.com/wirehead">@wirehead</a>.</p>
<p>Below is an example of searching for a specific <code class="language-text">txnId</code> in our system in Graylog2:</p>
<p><a href="/wp-content/uploads/2011/12/graylog-txnId-search.png"><img src="/wp-content/uploads/2011/12/graylog-txnId-search.png"></a></p>
<p>While this example is simple, we have some situations where a single <code class="language-text">txnId</code> spans multiple services, and the ability to trace all of them transparently is critical in a distributed system.</p>
<h2>Conclusion</h2>
<p>Write your logs for machines to process. Build tooling around those logs to transform them into something that is consumable by a human. Humans cannot process information in the massive flows that are created by concurrent and distributed systems. This means you should store the data from these systems in a format that enables innovative and creative ways for it to be processed. Right now, the best way to do that is to log in JSON. Stop logging with format strings.</p>]]></content>
</entry>
<entry>
<title type="html"><![CDATA[The Switch: Python to Node.js]]></title>
<id>https://paul.querna.org/articles/2011/12/18/the-switch-python-to-node-js/</id>
<link href="https://paul.querna.org/articles/2011/12/18/the-switch-python-to-node-js/">
</link>
<updated>2011-12-18T01:33:06Z</updated>
<summary type="html"><![CDATA[In my previous post, I glossed over our team switching from Python to Node.js. I kept it brief because the switch wasn’t the focus of the…]]></summary>
<content type="html"><![CDATA[<p>In <a href="http://journal.paul.querna.org/articles/2011/12/17/technology-cloud-monitoring/">my previous post</a>, I glossed over our team switching from Python to Node.js. I kept it brief because the switch wasn’t the focus of the post, but since I believe I am being misunderstood, I will explain it in depth:</p>
<blockquote>
<p>Cloudkick was primarily written in Python. Most backend services were written in <a href="http://www.twistedmatrix.com/">Twisted Python</a>. The API endpoints and web server were written in <a href="https://www.djangoproject.com/">Django</a>, and used <a href="http://code.google.com/p/modwsgi/">mod_wsgi</a>. We felt that while we greatly value the asynchronous abilities of Twisted Python, and they matched many of our needs well, we were unhappy with our ability to maintain Twisted Python based services. Specifically, the deferred programming model is difficult for developers to quickly grasp and debug. It tended to be ‘fail’ deadly, in that if a developer didn’t fully understand Twisted Python, they would make many innocent mistakes. Django was mostly successful for our needs as an API endpoint, however we were unhappy with our use of the Django ORM. It created many dependencies between components that were difficult to unwind later. Cloud Monitoring is primarily written in <a href="http://www.nodejs.org/">Node.js</a>. Our team still loves Python, and much of our secondary tooling in Cloud Monitoring uses Python.</p>
</blockquote>
<p>This attracted a few tweets, <a href="https://twitter.com/#!/g0rm/status/148284022181732354">making various accusations about our developers,</a> but I want to explore the topic in depth, and 140 characters just isn’t going to cut it.</p>
<h2>Just how much Python did Cloudkick have?</h2>
<p>We had about 140,000 lines of Python in Cloudkick. We had 40 <a href="http://twistedmatrix.com/documents/current/core/howto/plugin.html">Twisted Plugins</a>. Each Plugin roughly corresponds to a backend service. About 10 of them are random DevOps tools like IRC bots and the like, leaving about 30 backend services that dealt with things in production. We built most of this code over 2.5 years, growing the team from the 3 founders to about a dozen different developers. I know there are larger Twisted Python code bases out there, but I do believe we had a large corpus of experiences to build our beliefs upon.</p>
<p>This wasn’t just a weekend hack project and a blog post about how I don’t like deferreds, this was 2.5 years of building real systems.</p>
<h2>It worked.</h2>
<p><a href="http://www.rackspace.com/information/newsroom/pressreleases/rackspace-acquires-cloudkick-to-provide-powerful-server-management-tools-for-the-cloud-computing-era/">We were acquired.</a></p>
<p>Our Python code got the job done. We built a product amazingly quickly, built our users up, and were able to iterate quickly. I meant it when I said our team <strong>still loves Python</strong>.</p>
<p>What I didn’t mention in the original post, is that after the acquisition, the Cloudkick team was split into two major projects — Cloud Monitoring, which the previous post was about, and another unannounced product team. This other product is being built in Django and Twisted Python. Cloud Monitoring has very different requirements moving forward — our goals are to survive and keep working after <a href="http://www.datacenterknowledge.com/archives/2007/11/13/truck-crash-knocks-rackspace-offline/">a truck drives into our data centers</a>, and this is very different from how the original Cloudkick product was built.</p>
<h2>What happened to Python then?</h2>
<p>Simply put, our requirements changed. These new requirements for Cloud Monitoring included:</p>
<ul>
<li>Multi-Region availability / durability</li>
<li>Multiple order of magnitude increases in servers monitored</li>
<li>Scalable system that can still be used 5 years from now. (Remember that Rackspace Cloud <a href="http://seekingalpha.com/article/306015-rackspace-hosting-s-ceo-discusses-q3-2011-results-earnings-call-transcript">grew 89% year over year</a>)</li>
</ul>
<p>Cloudkick was built as a startup. We took shortcuts. It scaled pretty damn well, but even if we changed nothing in our technology stack, it was clear we needed to refresh our architecture and how we modeled data.</p>
<p>The mixing of both blocking-world Django and Twisted Python also created complications. We would have utility code that could be called from both environments. This meant extensive use of <code class="language-text">deferToThread</code> in order to not block Twisted’s reactor thread. It created an overhead: every programmer had to understand both how Twisted worked and how Django worked, even if their project in theory only involved the web application layer. Later on, we did build enough tooling with function decorators to reduce the impact of these multiple environments, but the damage was done.</p>
<p>I believe our single biggest mistake from a technical side was not reining in our use of the Django ORM earlier in our application’s life. We had Twisted services running huge Django ORM operations inside of the Twisted thread pool. It was very easy to get going, but as our services grew, not only was this not very performant, it was also extremely hard to debug. We had a series of memory leaks, places where we would reference a QuerySet and hold on to it forever. The Django ORM also tended to have us accumulate large amounts of business logic on the model objects, which made building strong service contracts even harder.</p>
<p>These were our problems. We dug our own grave. We should’ve used <a href="http://www.sqlalchemy.org/">SQLAlchemy</a>. We should’ve built stronger service separations. But we didn’t. Blame us, blame Twisted, blame Django, blame whatever you like, but that’s where we were.</p>
<p>We knew by April 2011 that the combination of new requirements and a legacy code base meant we needed to make some changes, but we also didn’t want to fall into “Version 2.0” syndrome and over-engineer every component.</p>
<h2>Picking the Platform.</h2>
<p>We wanted some <em>science</em> behind this kind of decision, but unfortunately this decision is about programming languages, and everyone had their own opinions. </p>
<p>We wanted to avoid “just playing with new things”, because at the time half our team was enamored with <a href="http://golang.org/">Go Lang</a>. We were also very interested in <a href="http://www.gevent.org/">Python Gevent</a>, since OpenStack Nova had recently switched to it from Twisted Python.</p>
<p>We decided to make a <a href="https://docs.google.com/spreadsheet/ccc?key=0AvBGESHWxhk2dHJ2Q0lWRFF3dkxLZmFiMVVGRElQaEE">spreadsheet of the possible environments</a> we would consider using for our next generation product. The inputs were:</p>
<ul>
<li>Community</li>
<li>Velocity</li>
<li>Correctness (aka, static typing-like things)</li>
<li>Debuggability/Tooling</li>
<li>Downtime/Compile Time</li>
<li>Libraries (Standard/External)</li>
<li>Testability</li>
<li>Team Experience</li>
<li>Performance</li>
<li>Production</li>
</ul>
<p>We set up the spreadsheet so we could change the weight of each category. This let us play with our feelings: what if we only cared about developer velocity? What if we only cared about testability?</p>
<p>Our conclusion was that it came down to a choice between the JVM platform and Node.js. It is obvious that the JVM platform is one of the best ways to build large distributed systems right now. Look at everything <a href="https://github.com/twitter">Twitter</a>, <a href="http://engineering.linkedin.com/tags/sna">LinkedIn</a> and others are doing. I <a href="http://journal.paul.querna.org/articles/2010/10/12/java-trap-2010-edition/">personally have serious reservations</a> about investing on top of the JVM, and Oracle’s recent behavior (<a href="https://news.ycombinator.com/item?id=3294783">here</a>, <a href="https://news.ycombinator.com/item?id=3357623">here</a>) isn’t encouraging.</p>
<p>After much humming and hawing, we picked Node.js.</p>
<p>After picking Node.js, other choices like using Apache Cassandra for all data storage were side effects — there was nothing like SQLAlchemy for Node.js at the time, so we were on our own either way, and Cassandra gave us definite improvements in operational overhead compared to running a large number of MySQL servers in a master/slave configuration.</p>
<h2>Node.js? It has nested callbacks everywhere, that’s ugly!</h2>
<p>I think this is one of the first complaints people lob at Node.js when they are just starting. It is a regular occurrence on the users mailing list — people think they want coroutines, generators or fibers.</p>
<p>I believe they are wrong.</p>
<p><strong>The zen of Node.js is its minimalist core</strong>, both in size and in features. You can read the core lib Javascript in a day, and one more day for the C++. Don’t venture into v8 itself, that is a rabbit hole, but you can pretty quickly understand how Node.js itself works.</p>
<p>Our experience was that we just needed to pick one good tool to contain callback flows, and use it everywhere.</p>
<p>We use <a href="https://twitter.com/Caolan">@Caolan’s</a> excellent <a href="https://github.com/caolan/async">Async library</a>. Our code is not nested callbacks 5 levels deep.</p>
<p>We currently have about 45,000 lines of Javascript in our main repository. In this code base, we have used the <code class="language-text">async</code> library as our only flow control library. Our current use of the library in our code base:</p>
<ul>
<li><code class="language-text">async.waterfall</code>: 74</li>
<li><code class="language-text">async.forEach</code>: 55</li>
<li><code class="language-text">async.forEachSeries</code>: 21</li>
<li><code class="language-text">async.series</code>: 8</li>
<li><code class="language-text">async.parallel</code>: 4</li>
<li><code class="language-text">async.queue</code>: 3</li>
</ul>
<p>I highly suggest, that if you are unsure about Node.js and are going to do an experiment project, make sure you use <a href="https://github.com/caolan/async">Async</a>, <a href="https://github.com/creationix/step">Step</a>, or one of the other flow control modules for your experiment. It will help you better understand how most larger Node.js applications are built.</p>
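<p>As a minimal sketch of what containing a callback flow with <code class="language-text">async.waterfall</code> looks like (the helper functions and names here are hypothetical, not from our code base):</p>
<div class="gatsby-highlight" data-language="text"><pre class="language-text"><code class="language-text">var async = require('async');

// Each step receives the results of the previous step plus a callback;
// any error short-circuits straight to the final handler.
function createCheck(account, params, callback) {
  async.waterfall([
    function(callback) {
      validateParams(params, callback);             // hypothetical helper
    },
    function(validated, callback) {
      db.insertCheck(account, validated, callback); // hypothetical helper
    },
    function(check, callback) {
      scheduler.enqueue(check, function(err) {      // hypothetical helper
        callback(err, check);
      });
    }
  ],
  function(err, check) {
    // one place to handle an error from any step above
    callback(err, check);
  });
}</code></pre></div>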
<h2>Closing</h2>
<p>In the end, we had new requirements. We re-evaluated what platforms made sense for us to build a next generation product on. Node.js came out on top. We all have our biases, and our preferences, but I do believe we made a reasonable choice. Our goal in the end is still to move our product forward, and improve our business. Everything else is just a distraction, so pick your platform, and get real work done.</p>
<p>PS: If you haven’t already read it, read SubStack’s great <a href="http://substack.net/posts/b96642/the-node-js-aesthetic">the node.js aesthetic</a> post.</p>]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Technology behind Rackspace Cloud Monitoring]]></title>
<id>https://paul.querna.org/articles/2011/12/17/technology-cloud-monitoring/</id>
<link href="https://paul.querna.org/articles/2011/12/17/technology-cloud-monitoring/">
</link>
<updated>2011-12-17T18:28:03Z</updated>
<summary type="html"><![CDATA[Earlier this week we announced a new product: Rackspace Cloud Monitoring. It is just starting as a (free) private beta, so if you want to…]]></summary>
<content type="html"><![CDATA[<p>Earlier this week we <a href="http://www.rackspace.com/cloud/blog/2011/12/15/announcing-rackspace-cloud-monitoring-private-beta/">announced a new product: Rackspace Cloud Monitoring</a>. It is just starting as a (free) private beta, so if you want to try it out, be sure to <a href="https://surveys.rackspace.com/Survey.aspx?s=e08d057768e04f09a8cb7811d47b82da">sign up via the survey here</a>.</p>
<h2>Transition from Cloudkick Technology</h2>
<p>Rackspace Cloud Monitoring is based on technology built originally for the <a href="https://www.cloudkick.com/features/monitoring">Cloudkick product</a>. Some core concepts and parts of the architecture originated from Cloudkick, but many changes were made to enable Rackspace’s scalability needs, improve operational support, and focus the Cloud Monitoring product as an API driven Monitoring as a Service, rather than all of Cloudkick’s Management and Cloud Server specific features.</p>
<p>For this purpose, Cloudkick’s product was successful in vetting many parts of the basic architecture, and serving as a basis on which to make a reasonable second generation system. We tried to make specific changes in technology and architecture that would get us to our goals, but without falling into an overengineering trap.</p>
<p>Cloudkick was primarily written in Python. Most backend services were written in <a href="http://www.twistedmatrix.com/">Twisted Python</a>. The API endpoints and web server were written in <a href="https://www.djangoproject.com/">Django</a>, and used <a href="http://code.google.com/p/modwsgi/">mod_wsgi</a>. We felt that while we greatly value the asynchronous abilities of Twisted Python, and they matched many of our needs well, we were unhappy with our ability to maintain Twisted Python based services. Specifically, the deferred programming model is difficult for developers to quickly grasp and debug. It tended to be ‘fail’ deadly, in that if a developer didn’t fully understand Twisted Python, they would make many innocent mistakes. Django was mostly successful for our needs as an API endpoint, however we were unhappy with our use of the Django ORM. It created many dependencies between components that were difficult to unwind later. Cloud Monitoring is primarily written in <a href="http://www.nodejs.org/">Node.js</a>. Our team still loves Python, and much of our secondary tooling in Cloud Monitoring uses Python. <code class="language-text">[</code>EDIT: See standalone post: <a href="http://journal.paul.querna.org/articles/2011/12/18/the-switch-python-to-node-js/">The Switch: Python to Node.js</a><code class="language-text">]</code></p>
<p>Cloudkick was reliant upon a <a href="http://www.mysql.com/">MySQL</a> master and slaves for most of its configuration storage. This severely limited scalability, performance, and multi-region durability. These issues aren’t necessarily a property of MySQL, but Cloudkick’s use of the Django ORM made it very difficult to use MySQL radically differently. The use of MySQL was not continued in Cloud Monitoring, where metadata is stored in Apache Cassandra.</p>
<p>Cloudkick used <a href="http://cassandra.apache.org/">Apache Cassandra</a> primarily for metrics storage. This was a key element in keeping up with metrics processing, and providing a high quality user experience, with fast loading graphs. Cassandra’s role was expanded in Cloud Monitoring to include both configuration data and metrics storage.</p>
<p>Cloudkick used the <a href="http://esper.codehaus.org/">ESPER engine</a> and a small set of EPL queries for its Complex Event Processing. These were used to trigger alerts on a monitoring state change. ESPER’s use and scope was expanded in Cloud Monitoring.</p>
<p>Cloudkick used the <a href="http://labs.omniti.com/labs/reconnoiter">Reconnoiter</a> <code class="language-text">noitd</code> program for its poller. We have contributed patches to the open source project as needed. Cloudkick borrowed some other parts of Reconnoiter early on, but over time replaced most of the Event Processing and data storage systems with customized solutions. Reconnoiter’s <code class="language-text">noitd</code> poller is used by Cloud Monitoring.</p>
<p>Cloudkick used <a href="http://www.rabbitmq.com/">RabbitMQ</a> extensively for inter-service communication and for parts of our Event Processing system. We have had mixed experiences with RabbitMQ. RabbitMQ has improved greatly in the last few years, but when it breaks we are at a severe debugging disadvantage, since it is written in Erlang. RabbitMQ itself also does not provide many primitives we felt we needed when going to a fully multi-region system, and we felt we would need to invest significantly in building systems and new services on top of RabbitMQ to fill this gap. RabbitMQ is not used by Cloud Monitoring. Its use cases are being filled by a combination of <a href="http://zookeeper.apache.org/">Apache Zookeeper</a>, point to point REST or Thrift APIs, state storage in Cassandra and changes in architecture.</p>
<p>Cloudkick used an internal fork of <a href="https://github.com/facebook/scribe">Facebook’s Scribe</a> for transporting certain types of high volume messages and data. Scribe’s simple configuration model and API made it easy to extend for our bulk messaging needs. Cloudkick extended Scribe to include a write ahead journal and other features to improve durability. Cloud Monitoring continues to use Scribe for some of our event processing flows.</p>
<p>Cloudkick used <a href="http://thrift.apache.org/">Apache Thrift</a> for some RPC and cross-process serialization. Later in Cloudkick, we started using more JSON. Cloud Monitoring continues to use Thrift when we need strong contracts between services, or are crossing a programming language boundary. We use JSON however for many data types that are only used within Node.js based systems.</p>
<h2>Node.js ecosystem</h2>
<p>We have been very happy with our choice of using Node.js. When we started this project, I considered it one of our biggest risks to being successful — what if 6 months in we are just mired in a new language and platform, wishing we had stuck with the known evil of Twisted Python? The exact opposite happened. Node.js has been an awesome platform to build our product on. This is in no small part due to the many modules the community has produced.</p>
<p>Here is the list of NPM modules we have used in Cloud Monitoring, straight from our package.json:</p>
<ul>
<li><a href="http://search.npmjs.org/#/async">async</a> (rackers patched it)</li>
<li><a href="http://search.npmjs.org/#/cassandra-client">cassandra-client</a> (rackers wrote it)</li>
<li><a href="http://search.npmjs.org/#/cloudfiles">cloudfiles</a></li>
<li><a href="http://search.npmjs.org/#/command-parser">command-parser</a> (rackers wrote it)</li>
<li><a href="http://search.npmjs.org/#/elementtree">elementtree</a> (rackers wrote it)</li>
<li><a href="http://search.npmjs.org/#/express">express</a></li>
<li><a href="http://search.npmjs.org/#/ipv6">ipv6</a> (rackers patched it)</li>
<li><a href="http://search.npmjs.org/#/jade">jade</a></li>
<li><a href="http://search.npmjs.org/#/logmagic">logmagic</a> (rackers wrote it)</li>
<li><a href="http://search.npmjs.org/#/long-stack-traces">long-stack-traces</a> (rackers patched it)</li>
<li><a href="http://search.npmjs.org/#/magic-templates">magic-templates</a> (rackers wrote it)</li>
<li><a href="http://search.npmjs.org/#/metrics">metrics</a></li>
<li><a href="http://search.npmjs.org/#/node-dev">node-dev</a></li>
<li><a href="http://search.npmjs.org/#/node-int64">node-int64</a></li>
<li><a href="http://search.npmjs.org/#/node-uuid">node-uuid</a></li>
<li><a href="http://search.npmjs.org/#/nodelint">nodelint</a></li>
<li><a href="http://search.npmjs.org/#/optimist">optimist</a></li>
<li><a href="http://search.npmjs.org/#/sax">sax</a></li>
<li><a href="http://search.npmjs.org/#/showdown">showdown</a></li>
<li><a href="http://search.npmjs.org/#/simplesets">simplesets</a></li>
<li><a href="http://search.npmjs.org/#/strtok">strtok</a></li>
<li><a href="http://search.npmjs.org/#/swiz">swiz</a> (rackers wrote it)</li>
<li><a href="http://search.npmjs.org/#/terminal">terminal</a> (rackers wrote it)</li>
<li><a href="http://search.npmjs.org/#/thrift">thrift</a> (rackers patched it)</li>
<li><a href="http://search.npmjs.org/#/whiskey">whiskey</a> (rackers wrote it)</li>
<li><a href="http://search.npmjs.org/#/zookeeper">zookeeper</a> (rackers patched it)</li>
</ul>
<p>Now that our product is announced, I’m hoping to find a little more time for writing. I will try to do more posts about how we are using Node.js, and the internals of Rackspace Cloud Monitoring’s architecture.</p>
<p><em>PS: as always, <a href="http://rackertalent.com/san-francisco/">we are hiring</a> at our sweet new office in San Francisco, if you are interested, <a href="mailto:paul.querna@rackspace.com">drop me a line</a>.</em></p>]]></content>
</entry>
</feed>