Ops, Entrepreneurism, Tools, and Thoughts.

Failure Hurts


The trick is not minding that it hurts

The trick, William Potter, is not minding that it hurts.

T.E. Lawrence and David 8, in Lawrence of Arabia and Prometheus

As the Internet, technology and consumers collide we often hear, ‘What are you going to do to ensure this never happens again?’ For a while, this really bothered me, and that one sentence was enough to take me from my zenlike calm to ultra rage. While I am sure we all endeavour to keep failure from occurring, it does. Thermodynamics tells us that closed systems always seek a state of maximum entropy, and so it goes in Web Operations. This is why our entire existence is formed around MTTR (mean time to recovery). One day, in my state of annoyance with my inability to project certainty, I came across this tweet from Mark Burgess:

The trick is to create meaning from failure.

I was missing the point. All along, I thought there was nothing meaningful in failure (and no one needs nothing) but as it turns out, there is a great deal of meaning if you can get past the hurt. Here’s the pattern I employ:


  1. There is some type of metrics collection in use so you can measure response to inputs
  2. There is a willingness to employ a continuous improvement loop

The ‘Lawrence of Arabia’ Pattern

blameless post mortem

You need to understand what happened and why it hurts. It’s not enough to say the internet is hard and move on. As much as possible, quantify your pain. Can you reduce an outage to dollars per minute? Can you measure the impact on your brand reputation? To get meaning from failure you simply must know what impact the failure has. Here are some examples of how to enact the blameless postmortem:
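One way to put numbers on the pain is sketched below: it turns a list of outages into MTTR and a rough dollar figure. The outage times and the revenue-per-minute value are made up for illustration, and lost revenue is only one slice of the impact (it says nothing about brand reputation).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Outage:
    started: datetime   # when the service became unavailable
    resolved: datetime  # when service was restored

    @property
    def minutes(self) -> float:
        return (self.resolved - self.started).total_seconds() / 60


def mttr_minutes(outages):
    """Mean time to recovery: average outage duration in minutes."""
    return sum(o.minutes for o in outages) / len(outages)


def outage_cost(outage, revenue_per_minute):
    """Rough direct cost: minutes of downtime times revenue per minute."""
    return outage.minutes * revenue_per_minute


# Hypothetical numbers, for illustration only.
outages = [
    Outage(datetime(2012, 7, 1, 9, 0), datetime(2012, 7, 1, 9, 30)),
    Outage(datetime(2012, 7, 9, 14, 0), datetime(2012, 7, 9, 14, 10)),
]
print(mttr_minutes(outages))           # 20.0
print(outage_cost(outages[0], 150.0))  # 4500.0
```

Tracking these two numbers from one postmortem to the next gives the improvement loop something concrete to move.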

one human correction

Once you have a bead on the pain, the next step is to enact your continuous improvement loop. With my team we always set the intention that we will make (at least) one improvement based on human behaviour in the wake of an outage event. We don’t mind that it hurt. We do want to make sure that the pain we feel next time is new and different. Some exercises we use to identify that human-based improvement:

  • What business assumptions have changed?
  • What technology assumptions have changed?
  • What could we do to make it easier to do the right thing?

Once upon a time we suffered pain from hysterical stop-the-world GC events. The human improvement was to find a way to post to a dashboard when these hysterical GCs were occurring so that, as operators, we could quickly see that it was a GC loop and act accordingly. A little graphite and a small daemon go a long way to minimizing MTTR.
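A minimal sketch of that kind of daemon follows. It is not the one we ran. It assumes the JVM writes HotSpot-style GC log lines that contain `[Full GC` and end with `real=N.NN secs]`, and that a Carbon listener accepts Graphite's plaintext protocol (`path value timestamp`) on TCP port 2003. The host name, metric path, and log path are hypothetical.

```python
import re
import socket
import time

# Assumed log shape (HotSpot with GC details enabled); adjust the pattern
# to whatever your JVM actually writes.
FULL_GC = re.compile(r"\[Full GC.*real=(\d+\.\d+) secs\]")


def full_gc_pause(line):
    """Return the wall-clock pause in seconds for a Full GC line, else None."""
    match = FULL_GC.search(line)
    return float(match.group(1)) if match else None


def graphite_line(path, value, timestamp):
    """Format one metric in Graphite's plaintext protocol."""
    return f"{path} {value} {int(timestamp)}\n"


def send_to_graphite(line, host="graphite.example.com", port=2003):
    """Send one plaintext metric line to a Carbon listener over TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


def follow(path):
    """Yield lines as they are appended to a file, like `tail -f`."""
    with open(path) as handle:
        handle.seek(0, 2)  # start at the end of the file
        while True:
            line = handle.readline()
            if line:
                yield line
            else:
                time.sleep(1)


def run(log_path, metric="ops.jvm.full_gc_pause_secs"):
    """Watch a GC log and post every Full GC pause to Graphite."""
    for line in follow(log_path):
        pause = full_gc_pause(line)
        if pause is not None:
            send_to_graphite(graphite_line(metric, pause, time.time()))
```

Calling `run("/var/log/app/gc.log")` (a hypothetical path) under your process supervisor of choice is enough to put a spike on the dashboard every time the heap falls over.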

one technology correction

Everything changes and changes fast. What assumptions have you made about your technology collection in your complex system? Are the queues queuing well? Is the JVM tuned for the way people use your system? Can you do anything to catch this and make the feedback loop shorter? I encourage the introspection so that one improvement is made on the technology side.


Let’s stay on the GC example for fun. After torpedoing the heap, the obvious tuning parameter is heap size: merely add more. Indeed, that was our first approach to buy some time. Our introspection kicked in and we realized that our user base had grown and their habits had evolved. We then decided to look at a different JVM that would match more closely to how people used our software.


If you were to take one thing away from this pattern I hope that it is the inspiration to find meaning in failure. Like Lawrence of Arabia, there is beauty in the desert of failure and if you can withstand the pain it’s a wonderful way to improve what you do.

Jeet Kune DevOps


When I am trying to build a bridge between Dev, Ops, QA, and a business unit I find that I reference Bruce Lee a great deal. Not only was Bruce Lee a superb actor and amazing martial artist, he was a great philosopher. Click through to read about Jeet Kune DevOps!


| Comments

Welcome to my first post in the series regarding people and how they are the most important part of the system. I have decided to tackle cooperation so here are my thoughts (and thoughts from others) on how cooperation trumps competition. Be warned, this is a lengthy topic and this is only my initial essay on the matter. Expect more. Much more. As ever, click on for a good read.

Why are you reading this?

Chances are that you:

  • Keep a corner of the Internet running or some critical piece of information technology operating in spite of itself
  • Walk a fine line between innovation and stability
  • Might have this restless feeling that things can be done better

If so, welcome to the club. Before we begin to get incredibly deep into the details here are a few things that won’t be happening in this post:

  • There will be no defining of DevOps. There are people way better than me figuring that out
  • There will be less focus on tools, again, way better people out there for that
  • This will be focused on soft things like people, feelings, culture, behaviour hacks and how we focus on doing things better

My goal with the post is to leave you with the feeling that cooperation, on different levels, is better for us, all of us, than competition. I’d also like to posit some behaviour hacks that might aid in the transition to cooperation. Lastly, at the end of this post you should be left considering where competition is present in your team/organization and if it is worthwhile or not.

What’s Wrong?

Lately, some of the DevOps heavy hitters (@botchagalupe, @therealgenekim) have been discussing the teachings and achievements of a scholar by the name of Deming. He puts it best:

“the present style of management is a modern invention and represents “a prison created by the way in which people interact.” The present system includes competition between people, teams, departments, divisions, students, schools and universities. Although economists have taught that competition will solve our problems, we now know that competition is destructive. A better approach is for everyone to work together as a system. The solution to problems comes from cooperation, not competition.”
-Deming, W. E. 1993. The New Economics For Industry, Government & Education. Cambridge: Massachusetts Institute of Technology Center for Advanced Engineering Study.

I could not put it better myself! So, I have no desire to live out my career in a veritable prison so let us shift gears to talk about cooperation in greater detail.

Cooperation Interesting Facts

Some of the information I am about to share with you came from a wonderful documentary titled “I am” directed by Tom Shadyac. It is worth a watch so check it out here or here.

Biology is a wonderful science to draw from when looking to understand cooperation. Here, as introduced by the documentary I am, the tale of ‘group based consensus decision making’ is demonstrated wonderfully by a herd of Red Deer. When a herd is in the meadow eating it is faced with a difficult choice about when the herd should move towards the watering hole to hydrate. Go too soon, and some herd members will be undernourished. Go too late and some herd members will be dehydrated. In either instance, the herd as a whole will be compromised in its ability to adapt to threats. So how do the deer solve this problem? The answer, it turns out, is democracy. In nature it is not the alpha deer that decides when the herd will move, it is the majority. Simply put, once the majority of the deer are looking up from the grass and in the direction of the watering hole, that is the signal for the collective to move on to the watering hole. Again, I’m an engineer so don’t take my word for it. Read up on it for yourself: Red Deer. Some would call this emergent behaviour, others democracy, but the animals call it staying alive. Funny that…

Now before you rail that I am a pinko commie bastard and point out that capitalism is the best invention since sliced bread, bear with me here. I know that deer are prey animals and humans are predators, and it would be reasonable to suggest that prey based decision economics differ from predator based decision economics. If you think humans are not predators, take a look at the spacing of a human’s eyes compared to those of a deer (binocular vs monocular). Humans have depth perception where deer do not. Conversely, deer have an exceedingly large field of view compared to humans. Focus and depth perception are predatory traits. Humans need to be able to pounce, judge distances, throw things and generally kill other animals as part of our diet to survive, so we are most certainly predators. Deer need to be able to see said humans and run away. There is no doubt about it, humans are the apex predator of the planet in spite of there being many other predators that are superior to us in many ways. See below for an example.

[image]

On any given day, I am going to say that T-Rex wins this encounter. I saw Jurassic Park and I fear only two things: Tyrannosaurus Rex and Aliens of the James Cameron/Ridley Scott variety. Ergo, I consider myself an expert in all things T-Rex-ish and extremely grateful for the fact that humans can cooperate. Why? Because with the help of my friends and some technology, I feel confident that we could handle a T-Rex. Please see below:

[image]

Therein is my subtle point. Even predators, who have competitive tendencies, excel at cooperation. Orcas in pods, hyenas, wolves, and honey badgers in packs, crows in murders, owls in parliaments and yes, T-Rexes in gangs. So face it human being, you are wired to cooperate even though you are a savage predator! In fact, there is plenty of evidence that working cooperatively in groups towards a goal makes humans feel good. Conversely, a group of humans being coerced to do something makes them feel like crap. Don’t believe me? I’ll find someone to micromanage you for a week.

The DevOps landscape is littered with wonderful examples of teams working together well. @Allspaw’s beer test is a wonderful measure of a team working together. Still, all these examples of working well in a crisis are fine-n-dandy and well described but no one is talking about the normal day and how you get a group focused on a singular purpose. Because, let’s face it, nothing gets done without cooperation or coercion (which is merely forced cooperation).

Cooperation is hard and poorly understood by many of us in management. I suggest to you that there are 4 levels at which cooperation needs to be nourished for a person/team/organization/society to flourish. The challenge is that as you empower cooperation through the levels, the difficulty increases exponentially. At the start you have the cooperation between people, then on a team, then groups of teams, then between organizations. The difficulty is represented in the truly scientific diagram below.

[image]

Remember, you are here because you do not scare easily and you have a strong sense of perseverance. Of course you do! You’re still reading. Fear not, dear reader, there are ways to help cooperation become a staple in the existence of your team, teams, and organizations in spite of the convincingly displayed challenges.

Cooperation at Level 1

Cooperation at the very base is two people working together. That is all. This can also be the hardest cooperation to achieve. The best way to get two people working together is to give them a common goal, a purpose, a why, a reason to combine their talents and produce something from their joint effort. Let us look at an example:

[image]

This is a rather entertaining example yet it illustrates the point that is crucial: two people: one goal. In this case the goal was to make an amazing video about the ever challenging topic of ‘poor life choices’. I think we can all agree, mission accomplished.

In conclusion, to foster cooperation at level 1 the behaviour hack is simple to state, harder to achieve: Provide a Common Goal

Cooperation at Level 2

Level 2 focuses on fostering cooperation within a team. Essentially we’re looking at groups larger than 2 and constrained by a single functional business unit and a single upline report. There are several crucial ideas to build cooperation within a team:

  • Remove Hierarchy and Ranking ^1
    Ranking and pecking orders are horrible practices that directly promote competition. A wonderful example of this terrible practice is Microsoft. Read about it here. Instead, set your team up as a system and understand how it fits into the overall system that is your business. Judge the output of the team as it relates to the system and tune it from there.
  • Decentralize Decision Making ^2
    I suggest that if a team knows why it exists, has been empowered and has strong communication it will not need to seek approval to do the right things for the business. This transformation is facilitated by the manager of the team.
  • Leadership instead of Management ^3
    Management should be focused on the system of the business and less on turf and visibility. Leadership emerges when ego is surrendered and in its place you find a strong listener with the ability to manage and understand the interdependencies between people, teams, business units, and customers.

So where would we look for a shining example of cooperation for level 2? Well you could always call the A-Team but why don’t we look at Bradley Wiggins’ Tour de France team for 2012? Team Sky built the team it fielded for the 2012 tour well in advance and gave them a clear single purpose.

“It was programmed from last year (2011) to win the Tour in July. They started working on this project more than a year before the Tour started. They got everyone on board on the team to make sure the riders were in the best shape, with the best material, the best coaching and training. They left nothing to chance.” Bernhard Eisel, Rider for Team Sky

Communication was clear (everything for Brad), and the results are impressive.

The behaviour hack for team based cooperation is simple to discuss but challenging to implement: Transparent Communication.

Cooperation at Level 3

Cooperation between teams! Is this not the heart of the DevOps matter? Honestly, I cannot think of a concept that is more obvious and yet hardly present in the construct of North American business. Here are some of the crucial ideas to empower different teams to cooperate with each other.

  • Destroy any and all inter-team competition:
    When teams within a business compete it destroys any hope of building quality into the organization. A sales team will throw crappy leads at an account management team to drive numbers and look good and leave the account managers holding the bag. Dev will throw code over the wall to QA that is garbage. Ops will be obstructionist if they are optimized for stability and not change.
  • The business is a system! Articulate how each team is connected, it is never ‘us versus them’ as the output from one team is the input for another.
  • Elimination of Fear
    Just Culture advocates provide the strongest case for the elimination of fear within a business. I recommend Sidney Dekker’s work, or just start here to begin a journey that will transform your business.

Inter-team cooperation is a huge and career defining journey. An interesting example of a business that is thriving with this type of a model is GitHub. Check it out here. Keep in mind that what GitHub terms as anarchy is just the application of zero hierarchy and true democracy.

In the previous sections I have left you with a behaviour hack that might perpetuate the concept we have discussed. For inter-team cooperation the most important behaviour hack is Executive Air Cover. There is no way to be successful in transforming teams to work together without strong sponsorship from the top.

Cooperation at Level 4

If getting people, teams, groups of teams cooperating is not enough of a challenge for you then how about getting separate organizations cooperating? Welcome to cooperation level 4; ladies and gentlemen strap in because this is nearly impossible with business as we know it today. Well, almost impossible. Here the DevOps movement has been bucking this trend quite well.

Velocity, DevOpsDays, and all the individual meetups that take place all over the globe are the only example I need to present on inter-organization cooperation. Yet, there are so many more: Netflix open sourcing the Simian army, Etsy and StatsD/Deployinator, Ruby on Rails …

The example I’d like to leave you all with for inter-organizational cooperation is this: The Etsy-Twitter engineering exchange program. Here we have two completely separate organizations exchanging technical talent so they can learn from each other. That’s amazeballs. I don’t purport to have a clue on how to encourage that kind of behaviour other than sharing this example with you and hopefully encouraging my organization to do something similar.


My goal was to present supporting evidence that cooperation is a behaviour that returns more value than competition. I hope I have achieved that goal and entertained you in the process. If I could leave you with one thought it would be this: Operational teams have a wonderful opportunity to transform the way things are done in all industries so long as we consider people in addition to technology. People are the most important part of the system.

Ops and a New Series of Posts


In proper Ops fashion I have finally set up comments and analytics on the blog! I’m measuring everything so it should be entertaining. Comments should be entertaining. I’m not interested in moderating so be warned and review this handy chart. In other news I wanted to cover three things with this post: what is Ops to me, the DevOps Calgary meetup, and a new series of posts about my position on people in systems. As ever, click through if interested.

Why My Team Exists*


*This is a posthumous post that I meant to get out before my tenure ended with my former team. Still, it’s an important post because it highlights some of the extremely powerful elements of culture that a strong team can harness and share. As ever, click through if you’re interested.


| Comments

Barcamp was hosted by Startup Calgary today and it was awesome. Many great talks, chats, and people. I want to cover some of the high points, and with such a jam-packed day this post will be a touch longer. Below is the summary; read on if you can handle the awesome.

  • Opening speakers
  • Mobile development Awesomeness (@johncarpenter)
  • Get Cash Money for your LBS startup! (@arpadb)
  • Pair Programming (@mmazur)
  • Government Money for R&D (@tktechnow and @boastcapital)
  • Brand Pyramid (@tktechnow)

New Flavour of Lifesaver: FPM


I’ve been hunting for a good reason to get familiar with @jordansissel’s wonderful tool FPM. I have spent more time grovelling with rpm spec files than I ever wanted to but I haven’t been able to break out of the habit. Until today that is…

My problem is simple: despite setting up a sweet series of internet accessible yum repos to deploy product, some people insist on a standalone installation method. I can’t deny, this is aggravating as snot. Alas, it boils down to two questions:

  1. How can I get a yum repo deliverable by media?
  2. How can I keep the install procedure as similar as possible to the one used with the publicly available yum repos?

How to solve this issue using FPM and some cool features of yum:
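One yum feature that fits both questions is a repo definition whose `baseurl` is a `file://` path, so the same `yum install` workflow runs against mounted media. What follows is only a sketch of that idea, not necessarily the approach taken in the full post. It assumes the media already holds the RPMs plus repo metadata generated by `createrepo`. The repo id, name, and mount point are hypothetical.

```python
def media_repo_stanza(repo_id, name, mount_point):
    """Render a yum .repo stanza whose baseurl is a local directory."""
    return "\n".join([
        f"[{repo_id}]",
        f"name={name}",
        f"baseurl=file://{mount_point}",
        "enabled=1",
        # Assumption: unsigned packages. Prefer gpgcheck=1 plus a gpgkey
        # entry if your RPMs are signed.
        "gpgcheck=0",
        "",
    ])


# Hypothetical values, for illustration only.
print(media_repo_stanza("product-media", "Product (install media)", "/media/product"))
```

Dropping the rendered text into a file under `/etc/yum.repos.d/` lets an offline install use the same package names as the internet-facing repos.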


| Comments


We use zendesk and I will admit, I’m a huge fan. We had the opportunity to retool our support system and a subtle suggestion from @ericchernuka led us to check out zendesk and what they were cooking. Of course, if I’m going to rave about something I’m going to tell you why, in depth, because that’s what I do! So, read on if you want to know about my favorite bits of zendesk in detail. As a preview my love for zendesk is in part due to: SaaS, iOS/Android app, triggers, macros and integration!

Graylog2 and Dreamy Ocelots


Logging! A repeat topic here these days. I have found myself in a position where there are multiple instances of our product running and logging the snot out of everything. Point of interest: we cannot virtualize our product at this point due to an OpenGL dependency and the need for some serious GPU power. So, back to the main topic, how do you deal with logs all over the place? Logstash and Graylog2 (though @lusis is doing some crazy stuff with logstash and 0mq) are your best bet. For our testing we did a graylog2 implementation on Ubuntu 11.10 and logstash running on RHEL 5.5 piping the logs to our graylog2 instance. I want to cover our setup (there are some learnings that are worth noting as most people are ripping Ubuntu 11.04) and I am leaning towards setting up a chef cookbook soon.
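To make the moving parts a little more concrete, here is a rough sketch of the kind of message a shipper hands to Graylog2: a GELF document, which is JSON compressed with zlib and sent over UDP. This is a stand-in for illustration, not our logstash configuration. The server name, the host name, the `instance` field, and the default version string are assumptions, so check them against the GELF version your server speaks.

```python
import json
import socket
import time
import zlib


def gelf_payload(host, short_message, level=6, version="1.1", **extra):
    """Build a zlib-compressed GELF message; extra fields get a '_' prefix."""
    message = {
        "version": version,
        "host": host,
        "short_message": short_message,
        "timestamp": time.time(),
        "level": level,  # syslog severity, 6 = informational
    }
    message.update({f"_{key}": value for key, value in extra.items()})
    return zlib.compress(json.dumps(message).encode("utf-8"))


def send_gelf(payload, server="graylog.example.com", port=12201):
    """Send one unchunked GELF datagram over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (server, port))
```

Something like `send_gelf(gelf_payload("render01", "frame dropped", instance="a"))` tags each message with the product instance it came from, which is the piece that matters when several instances log to one place.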