DevOps Aphorisms
Table of Contents
- DevOps Aphorisms
- Aphorisms
- Software's Soft
- Make it fail
- Everyone's a genius, but nobody is a full-time genius
- It's never the complicated stuff
- A bug in the documentation is still a bug
- The person that did this did the best they could with the resources they had available, now you do the best you can with the resources you have available
- Always remember the NPB
- Is your code bus ready?
- If we take our hands off the keyboard now
- Always understand one layer deeper than you work
- If you're arguing, you're both losers
- Postmortems are blameless
- It's always OK to ask for help, but help the helper
- Technically correct is not the best kind of correct
- One crime at a time
- Communications checklist
- If you don't know, say those words in that order
- Sterile Cockpit
- Walk, don't run
- Aphorisms
DevOps Aphorisms
These are things I find myself trying to repeat and make normative in projects I'm working in. Some are pretty much "trad. arr." other might be a little more idiosyncratic. They mostly live in a little file on my computer, but someone else might like them, so here they are. You can hit me up on @djm62@beige.party on Fediverse if you find them to be unbearably wrong or somewhat useful, or you have your own version - I like to see what people pick up over the years.
Aphorisms
Software's Soft
If you can possibly just try an experiment, do so without worrying about reading. Bugs can be fixed, configurations can be reverted. Don't stress.
"Let's backup the config and try it out"
Make it fail
When you're troubleshooting, do things that you expect to fail or result in an error: it's a good check on your hypotheses.
"If I'm right, this won't work…"
Everyone's a genius, but nobody is a full-time genius
"Always pay attention and never make mistakes" is not a reasonable strategy: consider how to make your system less error-prone.
"How can we make this easier?"
It's never the complicated stuff
Maybe there's a subtle bug in the API server causing packet loss, but it's more likely you forgot to whitelist your IP. Ask every stupid question.
"Humour me: are you logged in to the right account?"
A bug in the documentation is still a bug
If one of your colleagues or clients can't get your code running by reading your documentation, that's a bug.
"If these steps don't work for you, please let me know on codemonkey@consulting.co"
The person that did this did the best they could with the resources they had available, now you do the best you can with the resources you have available
Your time is rarely well-spent asking "who wrote this piece of crap"? You're not going to write it perfectly this time, either.
"What can we improve about this?"
Always remember the NPB
Someone, at some point, is going to come face to face with your work in order to modify it. Avoid being tricky or clever if at all possible. Don't do stuff that only works once.
"That would work today, but I would hate to be the sysadmin next year"
Is your code bus ready?
If you get hit by a bus after work, can someone else pick it up? What if you have a family emergency? Just want a holiday? Don't make yourself a single point of failure: it's a miserable life, and bad for business. Some people find this phrasing a little gruesome and talk about "lottery island".
"The docs are up to date, everything you need is in README.org and CONTRIBUTING.md. Time to push the code and hit the road!"
If we take our hands off the keyboard now
This is a close cousin to the above. If some operational task is done-done, if I walk across the room, collect my camping gear, and disappear into the woods, taking absolutely no further actions, what's the first thing that will fail? If the answer is genuinely that nothing will fail, we are good and happy.
Always understand one layer deeper than you work
If you're dealing with Spark, understand the VMs. If you're dealing with APIs, understand TLS/HTTP. If you're talking about business migration, understand the services and components. You might never need to use it, but if you do, it's there.
"Is there something going on in the network layer? Let's dig in to the logs"
If you're arguing, you're both losers
We don't need to agree, we do need to discuss and decide, and there are no winners or losers ever.
"There are a few ways to do this: which factors are important in this case?"
Postmortems are blameless
"How did this happen?" never "What did you do?". Nobody breaks things deliberately, and people are never at their most effective while trying to cover their own asses.
"What happened first?"
It's always OK to ask for help, but help the helper
Before you call, try to anticipate the first three questions the person you call will ask (probably error message, logs, recent configuration changes). Get the answers before you call, and there's a 50-50 chance you won't need to.
"I switched from SSL to SASL_SSL and now I get No credentials found when I try to connect and the logs say No Kerberos ticket, oh, it works after I kinit, thanks for your help!"
"…"
Technically correct is not the best kind of correct
It's good to do exactly what's required, but never forget to check that whoever is using your work has actually received something of value.
"I've opened port 5439 inbound as requested. Does your connection work properly now?"
"Still timing out"
One crime at a time
Never change more than one thing while tuning or debugging. If you do, you have no way of knowing what fixed it.
"Let's try opening port 443 and allowing untrusted certificates" "Wait! Let's do one of those things…"
Communications checklist
Is this polite? Accurate? Helpful? It's hard to go wrong if you get all of those three right.
"Is this correct? Is it polite? Does it help? Hit send!"
If you don't know, say those words in that order
This is an important part of being honest, and without it you run the risk of either people seeing that you're bullshitting, or worse: people mistakenly thinking that you know what's going on.
"I don't know. It probably isn't this but it might be…"
Sterile Cockpit
When debugging, it's really easy to get sidetracked, start fixing the problem before fully identifying it, reminisce over vaguely related things etc etc. If you are genuinely just waiting for something, stay and chill and chat. If you are doing something, "sterile cockpit" is a polite reminder to keep communications focussed on the immediate issue.
"if they'd implemented this in Terraform-" "sterile cockpit, let's see what the immediate issue was with that API call"
Walk, don't run
If you have a good, logical, pliable, codebase, and a skilled and helpful developer you might find a user suggestion to be "dead easy, let me just". It probably serves everyone better to take some time, ask it properly, and implement it with a cool head.