A new adventure at the New York Times

It's with mixed emotions that I post this today.

I've really enjoyed my 2 years with Datadog. I've loved working with the team, I love the product, I've learned a ton and I've made some great friendships.

But I had an opportunity drop into my lap and needed to take it.

On September 6, I start at the New York Times as a Lead SRE on the Delivery Engineering team. My focus will be on monitoring, and I'm really excited to put some of the things I've learned over the last 2 years into practice.

Working at the NY Times means my family and I have to relocate to New York - that's been in process for a couple of months and will likely take a couple more.

I'm super excited for the future - but I'm tremendously thankful to the team at Datadog who saw potential in me and believed that hiring a remote SRE from the Great White North was a good idea.

A huge thanks goes out to Alexis Lê-Quôc, Mike Fiedler and Amit Agarwal who took a risk on an unknown and helped me to grow and learn tons.

PS: Since this was only finalized last week, I'm still looking for a place to stay for September and October. If you know of a place for rent (a 1-bedroom apartment with decent subway access to Manhattan), please send it my way at darron@froese.org.

Running Consul at Scale - Journey from RFC to Production

I had the pleasure of presenting this at SREcon16 in Santa Clara, California.

I've embedded the slides and source code below:

Slides at Speakerdeck

kvexpress demo source code

kvexpress source code

Video from the talk is posted on the USENIX website.

kvexpress - transporting configuration through Consul

As discussed in my Consul service discovery talk at Scale14x on Saturday, the technique we figured out for using Consul's KV store to move configuration files around has been a pleasant surprise.

We released kvexpress, a small tool that:

  1. Uploads data into Consul's KV store and prepares it for distribution - usually on a single node.
  2. Downloads that data from Consul's KV store onto a client node, verifies it, writes it to a file and then runs an optional handler.

This usually happens in one of two ways:

  1. Kicked off from a Consul watch - this makes delivery very quick. It takes a little more work to set up, but after that it's pretty hands off.
  2. In an ad-hoc manner - when you need to put something on a bunch of nodes quickly.
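To make the upload and download/verify halves concrete, here is a minimal Go sketch of the idea. It is not kvexpress's actual code. It assumes a plain map standing in for Consul's KV store, a SHA-256 hex checksum, and a `kvexpress/<key>/data` plus `kvexpress/<key>/checksum` key layout (the checksum key mirrors the `kvexpress/hosts/checksum` key used in this post). `kvIn` and `kvOut` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// kvStore stands in for Consul's KV store (assumption: a plain map).
type kvStore map[string][]byte

// checksum returns the hex-encoded SHA-256 of data (assumed checksum format).
func checksum(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// kvIn (hypothetical) stores data and its checksum under an assumed
// kvexpress/<key>/data and kvexpress/<key>/checksum layout.
func kvIn(kv kvStore, key string, data []byte) {
	kv["kvexpress/"+key+"/data"] = data
	kv["kvexpress/"+key+"/checksum"] = []byte(checksum(data))
}

// kvOut (hypothetical) reads the data back and refuses it unless it
// hashes to the stored checksum.
func kvOut(kv kvStore, key string) ([]byte, error) {
	data, ok := kv["kvexpress/"+key+"/data"]
	if !ok {
		return nil, fmt.Errorf("no data stored for %q", key)
	}
	want := string(kv["kvexpress/"+key+"/checksum"])
	if got := checksum(data); got != want {
		return nil, fmt.Errorf("checksum mismatch for %q: got %s, want %s", key, got, want)
	}
	return data, nil
}

func main() {
	kv := kvStore{}
	kvIn(kv, "hosts", []byte("10.0.0.1 web1\n10.0.0.2 web2\n"))
	data, err := kvOut(kv, "hosts")
	if err != nil {
		fmt.Println("refusing to write:", err)
		return
	}
	fmt.Printf("verified %d bytes, safe to write\n", len(data))
}
```

One plausible reason to keep the checksum under its own small key is that a watch can fire on that short value, and a client can refuse to write any data that doesn't hash to it.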

Here's a quick demo of how it works using the Consul watch. It shows how removing a node from Consul's service catalog updates a hosts file that's inserted and delivered by kvexpress:

We can see a few things from the graphs:

  1. The files on all 1188 nodes are updated quite quickly - most of them in under 300 milliseconds.
  2. There's one node that takes between 4 and 5 seconds consistently - I think it's an overloaded logging node.

The insertion happens when Consul Template notices the bunk service is disabled and rebuilds the template - Consul Template then hands off the final rendered template to kvexpress for insertion.

After the file is inserted, it replicates through Consul's KV store, and the Consul watches on the key kvexpress/hosts/checksum notice the change. That kicks off the kvexpress out process, which double-checks the file, writes the new file and reloads dnsmasq.

An example Consul watch would look like this:

{
  "watches": [
    {
      "type": "key",
      "key": "/kvexpress/hosts/checksum",
      "handler": "kvexpress out -k hosts -f /etc/hosts.consul -l 10 -c 00644 -e 'sudo pkill -HUP dnsmasq'"
    }
  ]
}

All of the commands we have used, with example versions of each, are located here.
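If you'd rather write your own handler than call kvexpress out, it helps to know that Consul passes a key watch's current state to the handler as JSON on stdin, with the value base64-encoded. Here is a minimal Go sketch of decoding that payload. `kvPair` only declares the fields used here, and `decodeWatch` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// kvPair declares the subset of the key-watch payload used here. Consul
// also sends CreateIndex, LockIndex, Flags and Session.
type kvPair struct {
	Key         string
	ModifyIndex uint64
	Value       []byte // encoding/json decodes the base64 string into raw bytes
}

// decodeWatch (hypothetical helper) parses one key-watch payload.
func decodeWatch(payload []byte) (kvPair, error) {
	var p kvPair
	err := json.Unmarshal(payload, &p)
	return p, err
}

func main() {
	payload, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading stdin:", err)
		os.Exit(1)
	}
	p, err := decodeWatch(payload)
	if err != nil {
		fmt.Fprintln(os.Stderr, "bad watch payload:", err)
		os.Exit(1)
	}
	fmt.Printf("%s changed at index %d: %s\n", p.Key, p.ModifyIndex, p.Value)
}
```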

Here's another quick demo of how it works in ad-hoc mode.

In this demo, I am going to show:

  1. Grabbing a URL from a gist - a 600-line configuration file.
  2. Installing that config file on 1200 nodes.
  3. Removing the file in the same action - normally you would restart a daemon or HUP a process instead.
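The first step of that ad-hoc run, pulling a config file from a URL, is a good place for a sanity check before anything reaches 1200 nodes. Here is a minimal Go sketch. It assumes a minimum line count is a good enough guard against a truncated download or an error page. `fetchConfig` and `checkLines` are hypothetical names, and the threshold of 10 is arbitrary.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
)

// checkLines rejects data containing fewer than minLines newline characters.
func checkLines(data []byte, minLines int) error {
	if n := strings.Count(string(data), "\n"); n < minLines {
		return fmt.Errorf("got %d lines, want at least %d", n, minLines)
	}
	return nil
}

// fetchConfig (hypothetical) downloads url and returns the body only if the
// request succeeded and the body passes checkLines.
func fetchConfig(url string, minLines int) ([]byte, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("GET %s: %s", url, resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if err := checkLines(body, minLines); err != nil {
		return nil, fmt.Errorf("GET %s: %w", url, err)
	}
	return body, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: fetchconfig <raw-url>")
		os.Exit(2)
	}
	body, err := fetchConfig(os.Args[1], 10) // 10 is an arbitrary threshold
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	os.Stdout.Write(body)
}
```

Pass the gist's raw URL as the only argument. Only a file that passes the check gets handed on for insertion.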

kvexpress can help you use Consul's KV store to make very quick changes to your cluster's configuration with safety and precision. There's additional kvexpress-specific information in Saturday's talk: it starts at 44:30 in the video and at slide 83 in the slides.

Service Discovery in the Cloud with Consul

I had the pleasure of presenting this at Scale14x in Pasadena, California.

I've embedded the slides and a link to YouTube below:

Talk on YouTube - the talk starts at around 4:30.

Slides at Speakerdeck

kvexpress source code

Push through it.

80 days ago, I decided that I would put real effort into learning to program in Go.

I had been working on something I had written in Ruby to replace an original Bash script, so I knew the problem space very well and I had my first potential project. As I finished the Ruby version, I realized that even though it was "correct," I had overlooked part of the problem space and would need to extend it if I truly wanted a comprehensive solution.

I didn't want to re-architect the Ruby version - I also didn't want to deal with adding the gems and a Ruby 2.x runtime to 1000 machines - so I thought I'd take a quick spike to see how quickly I could write it in Go. I'd written code in several languages with similar syntax - how hard could it be?

I created a private repo in my own account on GitHub and started hacking on Thursday night. After a cross-country flight on Friday and some free time over the weekend, I had binaries that were close to the same level of functionality as the Ruby version. I was very excited.

Some background might help here. During almost 2 decades of working with computers, I had worked with all sorts of technologies and written code in many different languages - but I am not a developer. I'm much closer to a sysadmin / ops guy, and I don't have any formal CS training. I studied theology and philosophy at school, but the web ended up being my true calling.

When trying out a new programming language, I would sometimes buy a book, start reading and then try to "do it the right way". Gotta have tests! And those tests need to be mocked properly so that you can test without network access. And you need to make sure to write it in the style that the language is known for.

Nope - not this time - at least not at first.

I've half-learned all sorts of technology that way, getting overwhelmed by details that never quite came together, and this time I was going to do things a little differently. I was not going to get stuck and give up.

Please don't misunderstand me - it's not that tests aren't valuable or that "doing things the right way" isn't a laudable goal. But I wasn't about to derail learning this tool because I couldn't put out perfect, tested and modular code right away. I will get there - but I need to read and write lots of code first.

I found some libraries to use and planned to build with a couple of pieces of reference material. I bought some books - but neither of them was actually available yet, and one of them still isn't finished.

I looked through some Go intros and got the basics, but more importantly, I started to write code - because I learn by doing.

And the code compiled, came together and worked. It was understandable and could easily be reasoned about. It was simple and organized into logical chunks and it functioned! The binary was significantly smaller and less cumbersome than my Ruby version, especially with all of its dependencies. I was able to refactor quite easily and so I did when it made sense.

I was pretty excited - this was fun again - but I was also freaked out about when I would actually have to show it to other people. I work with some of the smartest people on the planet and I knew:

  1. I write code, but I am not really a developer.
  2. I didn't use some of the distinctive features of Golang because I hadn't needed to yet. As somebody who reviewed my code early said - this was more like C code written in Go.
  3. There was obvious refactoring that I could see - but what about the things I couldn't see yet? How many of those would I miss?
  4. There were no tests (yet). I didn't want to fall down that rabbit hole and not be able to climb out.
  5. We had talked internally about releasing my first Go project as an open source tool after my talk in January - scary.

That fear of failure, of "not doing it the right way," had blocked me in the past, but I was not going to let it stop me this time.

I needed to push past that fear of failure - that fear of not looking like I knew everything - because I needed to learn. I needed to go back to the beginning and be the student. How else do you learn? How else do you grow? I needed to not care about what Internet randos think about my coding style - or lack thereof. I need to be free of that as a concern in general.

I have no illusions that my code is the fastest, the best or the shortest. But I don't really care right now. I'm going to continue to learn, continue to get better and understand more - but I'm not ashamed of where I am at this very moment.

Because 80 short days ago, I had just picked up a new set of tools.

80 days later, we've deployed 3 of my creations into production where they perform their duty quite well.

80 days later, my newest project is being built with unit and integration tests from the start.

And I'm looking forward to the next 80 days of growing, learning and getting better at my craft.

I have a lifetime to learn new things - and I'm just getting started with Go.

Push through the fear - leave it behind - it's worth it.