PRE-ROLL: Whether you're working on a personal project or managing enterprise infrastructure, you deserve simple, affordable, and accessible cloud computing solutions that allow you to take your project to the next level. Simplify your cloud infrastructure with Linode's Linux virtual machines and develop, deploy, and scale your modern applications faster and easier. Get started on Linode today with $100 in free credit for listeners of Greater Than Code. You can find all the details at linode.com/greaterthancode. Linode has 11 global data centers and provides 24/7/365 human support with no tiers or hand-offs regardless of your plan size. In addition to shared and dedicated compute instances, you can use your $100 in credit on S3-compatible object storage, Managed Kubernetes, and more. Visit linode.com/greaterthancode and click on the "Create Free Account" button to get started.

REIN: Welcome to Episode 220 of Greater Than Code. I’m your co-host, Rein Henrichs, and I’m here with my friend, Mando Escamilla.

MANDO: Thanks, Rein. I’m here with our friend, Josh Thompson, who’s also our guest today. Josh Thompson writes words for people and computers. Before getting into software development, he worked in climbing gyms, customer support/success, and inbound B2B SaaS sales. He likes how he gets to use all of this prior experience in the teams he works with and the problems he solves. In the words of a former coworker, “Josh brings a rare mix of skills and dispositions to the problems he solves.” Welcome to the podcast, Josh.

JOSH: Thank you so much for having me.

MANDO: I know. Well, we like to start off the podcast by asking every guest the same question and I think you might know what that is already. But why don’t you tell us what your superpower is and how you acquired it?

JOSH: Yeah, it's a common superpower, I think, for many guests that I've heard on the show: it's around teaching and learning.
I thought about it and my answer was: I can teach nearly anything I know to almost anyone. The way I got it? Well, basically, I can learn something, it'll take me – I'm not actually that fast of a learner, as obsessed as I am with learning – but then once I learn something well, I can turn around and teach it to someone else in a lot less time than it took me to learn that given thing. There'll often be a lot less pain and suffering and difficulty along the way for that other person. The way I started seeing the shape of this skill and then honing it was, I worked at a large climbing gym on the East Coast of the United States in maybe 2011, 2012 and really enjoyed all of the work that I got to do there and I did a lot of teaching. One of my favorite things to teach was this advanced lead climbing course, which helps people who are rock climbing and trying to get into this certain kind of climbing, which sometimes involves long falls and can be dangerous, and people can be hurt. As people get into rock climbing and try to get better at this, they find that their fear, which is a very reasonable fear, holds them back more and more and more as they are trying to progress into harder levels, and a lot of the fun of rock climbing is trying hard things, so it's very reasonable to have failures and to fall and all of that. Long story short, I ended up rebuilding the course and started leading with fear remediation by way of building trust and confidence in safety systems. Then once that had been really carefully explained and everyone was feeling on board, and checking in with everybody, then moving on to the stuff that they tended to be more scared of – and we had incredible success.
Like, I was able to, in one or two sessions, take people who'd been crippled by fear for years and have them climbing with way more fun, much less danger, less risk, and their actual climbing skills would skyrocket in just a week or two as they were dealing with that fear. That's when I started looking at, oh: taking knowledge, squeezing it into a smaller and less painful package and then handing it off. Obviously, that applies to many different skills related to software and many other things, but that's where I got my start.

REIN: You sent us a photo and it looks like, in that photo, you're a few hundred feet up a wall. Which one is that?

JOSH: Yeah, it's one of my favorite photos. It's because of the context in which it was taken. I was with a very dear friend and we were probably 300 feet off the ground in the Shawangunks, New York. We'd gone out for a morning climbing session and it was just a delightful day with a good friend and a lot of that fear management. Just having fun, but doing safe things – because when you're hundreds of feet off the ground, things could go wrong and the outcomes are more severe than when you're lying in your bed, but it can be done almost as safely as going to bed. So that was a really wonderful route in New York.

REIN: So I've got a question for you and it's not on the list of things we were going to talk about, but I'm really interested because I've been studying safety science for some years now. Usually, it's studied in the context of factories, hospitals, planes, trains, automobiles, things like that. The question is, how do you stay safe as a rock climber?

JOSH: That is such a good question and safety science is really interesting to me as well. I'll segue into this. Every year, The American Alpine Club, which is kind of the guiding body of American alpinism, as you might imagine, releases an accident report. It's always a thick journal and it's a number of stories.
Sometimes it's multiple pages and a lot of details; sometimes it's just a paragraph or two about a variety of accidents. Sometimes it's just close calls and nothing injurious happened to the party; sometimes it's much worse and there's a number of fatalities and very real accidents. I consider it not to be light or pleasant reading, but very important reading, for the same reason that studying safety science isn't necessarily pleasant when you're reading about plane crashes, train derailments and things like that. But a common thread tends to emerge. To answer how you stay safe in rock climbing: build the baseline skills, because most accidents happen to people that are either new to the field or very experienced in the field. The accident rate goes down dramatically once people are smart enough to not make beginner mistakes, but not so experienced that they're lazy, or they're like, “Ah, whatever. It's not going to happen to me.” Now I've been climbing for probably 15 years so, I consider myself to be kind of – my risk is going back up in that I could get lazy and think like, “Oh, this is easy,” or “I've done this before and nothing bad can happen.” There's a number of famous accidents that have happened just like that: someone forgets to do something very basic and then a world-class rock climber dies and everyone is just like, “Oh, it was so preventable.” So intuition, for lack of a better word: once you've built a baseline of skill, you should be very sensitive to intuitions, those moments when you're doing something that, ah, doesn't quite feel right, but you can usually justify it into not being a problem.
But then if circumstances compound and you face another one of those, like, “Well, the weather's changed, I didn't pack an extra layer.” “Oh, I'm climbing on someone else's rope and did they mention that they cut the ends or not, or that they trimmed them or not?” You should listen to those little things because they compound and can rapidly turn into something that you can't easily extricate yourself from. I was once on a day trip up Longs Peak. There's a very tall piece of rock on the back of Longs Peak in Colorado. It's the highest elevation alpine climbing and I was with a friend who is ten times the rock climber I will ever be. Long story short, we had finished a very difficult portion of the climbing and we were about to get to the part that he was really excited about and I said, “I'm sorry, Nathan. We’ve got to go home. I feel like I've exceeded my margin and it's a good day, we're doing fine, but we've drunk more of our water. This has been harder than I expected.” There's a history of him being so strong that he doesn't know that other people can't just effortlessly keep up with him and I didn't want to find us in a spot where bad things could happen. So I intuitively canceled the trip, even though nothing bad happened. Not canceled the trip, but caused us to go back down. That's a very long way of saying – there's as many ways to stay safe as there are ways to be injured, but I recommend that people listen to their intuitions.

REIN: I think when most people think of rock climbing accidents, they think of rappelling off the end of the rope, or protection that fails, or anchors that fail. But correct me if I'm wrong, the most common cause of rock climbing injury is roped falls. Just normal falls.

JOSH: Yeah, it depends.
It kind of depends on the layer of analysis, because there's soft tissue issues, sprains. A very common injury is straining a very small tendon in the finger, which, if you were like, “I was out rock climbing and I got injured,” and I was like, “Oh, what happened?” and you were like, “I strained my finger!” – no one expects that to be the injury. But that kind of thing is very common and if you were to lump together all of the incidents in climbing that could count as injuries, it's usually soft tissue kinds of things, muscular strains, that kind of thing, and those can happen when falling. There's broadly two different domains of rock climbing. There's bouldering, where you're not using a rope and when you fall, you usually have spotters and you land on crash pads and that can be a 2-inch fall. I'm a very timid boulderer so my favorite kind of bouldering is where my butt is dragging along the ground and if I fall, I fall about that far – like nothing, I love it. I've got other friends that are extreme; they're like, “Oh, I love to climb 20 feet off the ground because it's invigorating,” and I'm like, “I would wet myself.” So I don't do that. But then with rope falls, specifically lead climbing, usually… You know what, I don't know by volume what the most common form of injury is because you tend to segregate into significant injury and then all the little stuff. For instance, golf has a higher injury rate per thousand participant hours than rock climbing, but most of it is not hitting someone with a golf club or hitting them with a golf ball; it's the same kind of musculoskeletal injury.

REIN: The thing that I think is really interesting about this is the way it's categorized in these incident report statistics: routine roped falls, because it's just normal climbing.

JOSH: Yeah. That could be it, because falling is so routine.
That's a really good point, falling being routine. With sport climbing or trad, there's certain kinds of climbing where you're not supposed to fall in certain situations, and if you fell there, it wouldn't be that surprising if there were unpleasant outcomes. But with modern sport climbing, which is what you do in a climbing gym – if you go to a climbing gym, all you're doing is sport climbing or top roping, which is much safer – that kind of injury, this is exactly what I taught about. We're getting dangerously close to something I could lecture on for hours and have taught to many people, because it's a very near and dear topic to my heart. When someone falls on a wall, say they're 5 feet above their last piece of protection. They fall 5 feet and now there's 5 feet of slack, so they fall another 5 feet and then the rope starts pulling tight. So it's very common to get 15-to-20-foot falls when you're outside or inside. If the belayer does not do their job really well, it's possible that the falling climber has an unpleasant degree of force exerted upon their body when the fall is all said and done, and injuries can come from there – sprained ankles, sprained wrists – if you don't have time to orient your body in space correctly.

REIN: You're leading; you need slack in the rope at certain times.

JOSH: Yeah.

REIN: The time that you need slack in the rope is when you're furthest from your previous protection because that's when you're putting in the next protection. To fall at that point, you're in for a ride.

JOSH: Yes! Incidentally, if you could put a force measuring device on the top piece of protection, long falls tend to have surprisingly low maximum forces on them because the rope acts as a bungee cord; it stretches a couple of percentage points, maybe 10%.
Then as you're falling faster, the belayer, if they're doing their job correctly, should, for lack of a better word, go with the fall. So they kind of – there's situations where you wouldn't. Everything in rock climbing has giant asterisks every time you make a blanket statement, but all else equal, you'd want the belayer to step into the wall and maybe hop up the wall with the falling climber. That makes for a very pleasant deceleration for the falling climber. And then this is where it's really important to process fear and anxiety for both parties when they're climbing, because if you watch someone rock climbing, you're like, “Wow, they might be really scared. That's very impressive.” If their belayer is scared for them, the belayer is so much more likely to handle a fall badly and then project their fear into the universe and make it true – make their fears justified by failing. Because when someone is falling, if you're a belayer and you're scared, you tighten up, sometimes you take slack out of the system, you might even try to walk backwards. Because it can take a while for someone to finish falling and if you're like, “Oh my gosh, they're still falling. They're still falling! This is terrible!” But all of that happens in a split second. It's a reflex level response and so if you fight it, you can make their fall dangerous. That's why, when trying to be safe, there's actually elements of just: how are we feeling right now? Check in with your body. Do you feel tense? Do you feel anxious? If so, let's keep working on that, and you can safely give people exercises to train the correct responses in a certain situation. And then weight differences change everything. If someone is belaying for someone that's much lighter than them or much heavier, that changes everything. That's probably where most injuries do come from, like sprained ankles and sprained wrists.
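The fall arithmetic above (5 feet above the piece, another 5 feet of slack, plus rope stretch) can be sketched in a few lines. This is a rough back-of-the-envelope illustration, not climbing guidance: the 50 feet of rope out is an invented figure, and the 10% stretch is Josh's own "maybe 10%" estimate.

```python
# Back-of-the-envelope fall arithmetic: a climber 5 feet above their last
# piece falls that distance twice (once down to the piece, once past it on
# the slack), plus however much the dynamic rope stretches.
def fall_length(height_above_pro_ft: float, rope_out_ft: float,
                stretch: float = 0.10) -> float:
    free_fall = 2 * height_above_pro_ft       # 5 ft above the piece -> 10 ft of free fall
    return free_fall + rope_out_ft * stretch  # plus ~10% rope stretch

# Fall factor (fall length / rope out) is the standard rough measure of how
# hard the catch is: more rope out means more stretch available to absorb it.
def fall_factor(fall_ft: float, rope_out_ft: float) -> float:
    return fall_ft / rope_out_ft

rope_out = 50.0                    # hypothetical: 50 ft of rope between belayer and climber
fall = fall_length(5.0, rope_out)  # lands in the "15-to-20-foot falls" range mentioned above
print(fall, fall_factor(fall, rope_out))
```

Note how the same 10-foot free fall produces a gentler catch the more rope is out, which is why long falls can have surprisingly low peak forces.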
REIN: The part of this that interests me especially is that one of the lessons of safety science is that accidents are the result of normal work, so the inherent variability in normal work is what produces accidents.

JOSH: Yes!

MANDO: When I was at my last job, I worked on a dev ops team. Bunch of operators doing stuff, and one of the things that our boss really drilled into us was that we would spend all day typing in commands that could have disastrous outcomes if they were the wrong command, or you were in the wrong environment, or you typo-ed. So part of the work is to try and build in whatever guardrails you have in the systems to keep those worst-case scenarios from happening by accident. Another part of the work is to be okay with that as a potential outcome, because nobody's perfect. If you're running a couple of hundred commands a day, say, or over a week or a month or a year, expecting every one of those to be 100% correct is something you can't ask of anyone. I really like that idea, Rein, of accidents happening in the course of normal work. You cannot expect perfection all the time, right?

REIN: Yeah.

JOSH: Yeah. If your environment requires perfection, you're probably more likely to get imperfection. I perform worse when I'm anxious. When working, if I'm on a production environment and doing anything that has – I get anxious and I would ask someone to look over my shoulder, like, “Hey, can you make sure that we're doing the right things to this production database?” Or that we're not on production. There's lots of different ways to mitigate individual specific harms. I don't know how famous this is. It's a very short paper, just a couple of pages, called How Complex Systems Fail. It's 18 bullet-pointed paragraphs. For instance, the first one is—I just pulled it up because I love this paper—complex systems are intrinsically hazardous systems.
Point number two is complex systems are heavily and successfully defended against failure. Point three, catastrophe requires multiple failures – single point failures are not enough. Point number five is complex systems run in degraded mode. You're almost never working with a system as whatever the salesperson sold it to you as; it's rarely in that environment. It's duct tape and spitballs stringing the whole thing along. And point number six, catastrophe is always just around the corner. [laughter] This, I think, is important and this is why I like The American Alpine Club accident reports, because I think they're almost identical. People can die when software goes wrong and people can die when non-software goes wrong, so there's no artificial distinction between those systems. But point number seven says post-accident attribution to a root cause is fundamentally wrong. There's almost never a single root cause of a failure; it's a complex, interrelated thing. The person that usually dealt with this was home sick because of whatever, and so now the new person that just started was following the instructions, and the instructions didn't say “except for this exceptional situation, do this other thing.” So don't blame the new guy.

REIN: Even things that seem like they have obvious root causes, like the protection failure – well, why did they put that protection there? Why were they willing to take that risk at that time, and so on?

JOSH: Yeah, and I think that this has a lot of value. Some of the other things that I care a lot about is the transfer of knowledge from experts to non-experts, because I'm not an expert in most domains. You spend enough time in something and then you can spend time with smarter and smarter people, and then you're like, “Oh, I'm surrounded by people that have spent decades in this field and I'll never be like that,” and that's okay. It’s not a moral judgment, it's just a factual judgment.
But there’s something that – I'm an expert in some domains. Like, I'm more of an expert in rock climbing than I am in software because I have a lot more time in rock climbing than software. Whenever I'm teaching or trying to impart knowledge, I try to help the person that I'm teaching understand that even experts operate with vast uncertainty in almost all – not all – situations; we're just making intuitive judgments as we go along, and rather than impart rules, I really like to impart systems of reasoning. For instance, saying: if something feels uncomfortable, you should pay really close attention to that and understand why. Maybe you keep doing the thing that you were planning on, but if you feel this discomfort, you should think about it and understand that in group dynamics, it's scary to tell the whole group that you think something bad might be coming. But that's actually a really noble and bold thing to say: I would like to plant a flag in this issue and let it be known, and I want us to talk about that. We might be moving down a dangerous path, and novices have a lot of value to bring there because they're more perceptive to the environmental hazards than maybe someone that's been operating in it all the time. Maybe then the expert can alleviate their concerns and say, “Oh, we're working in a nuclear power plant. Of course, it just feels scary everywhere. But here's the things that we can do to make it safe.” But in safety systems and safety science, I'm sure there are so many accidents – I haven't looked at it recently, but accidents happen when experts get lazy and then they override the concerns of the people underneath them, and then catastrophic outcomes ensue that certain people did see coming, but were squashed.

REIN: The interesting thing about that is that in a lot of other situations, that decision might've been fine.
What you often find if you investigate an incident is that the things that happened, that were related to the incident, had happened hundreds of times without causing an incident. Even if you find out that a piece of code wasn't tested enough, you can find hundreds of pieces of code that aren't tested enough that have never caused an incident. So clearly, that's not a sufficient explanation for why this one.

JOSH: Oh yeah, and that's why I feel like it's a huge value add when the people that have the political power in an organization use it to try to remove pressure from individual accidents and put more pressure on systemic solutions. If you do a deploy – I've done this. I did a deploy with a migration where, because of the ordering of two lines, we ended up trying to reindex a column to force uniqueness, and because it was a very busy table with a lot of activity, the migration failed for unimportant reasons. Then it took us a while to figure out what was going on and then eventually, it was fine. It was background workers so all the data was still there. It wasn't actually that big of a thing, but it was stressful for a couple of hours while we were trying to see: why is this migration that took 3 minutes to run the first time taking an hour to run the second time? It was because we were trying to do a very large scan on a non-indexed column, because we had dropped the index. So the solution was: rename the index, then do the thing, and if you have to run it twice, whatever, that's fine. It becomes a little more idempotent, and then once the new uniqueness constraint index is successfully in there, then you proceed to the step where you drop the old index. That's hard-earned knowledge – I’ll never make that mistake again. But rather than be like, “Josh, why did you do it wrong?” or me trying to preemptively lay myself on the altar, because I would rather talk about my own inadequacies than have someone else accuse me of inadequacy.
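The safer ordering Josh lands on (build the new unique index under a fresh name first, and only drop the old one once the new one exists) can be sketched like this. It's a minimal, self-contained illustration using SQLite; his case was a live production migration, where you'd also reach for something like Postgres's `CREATE INDEX CONCURRENTLY`. The table and index names here are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, external_ref TEXT)")
cur.execute("CREATE INDEX idx_jobs_ref ON jobs (external_ref)")  # old, non-unique index
cur.executemany("INSERT INTO jobs (external_ref) VALUES (?)", [("a",), ("b",), ("c",)])

# Step 1: build the new unique index under a NEW name first. IF NOT EXISTS
# makes a retry harmless, and the old index is still there to serve queries,
# so a rerun never degrades into a giant unindexed scan.
cur.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_jobs_ref_uniq ON jobs (external_ref)")

# Step 2: only after the unique index exists, drop the old one. A crash
# between the two steps leaves an extra index, not a hot table with none.
cur.execute("DROP INDEX IF EXISTS idx_jobs_ref")
conn.commit()

indexes = [row[1] for row in cur.execute("PRAGMA index_list('jobs')")]
print(indexes)
```

Running the whole script twice is safe, which is the "a little more idempotent" property Josh describes.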
Rather than that individual failure being the focal point, stepping back and being like, “Well, this is why we have data backups. Are we confident in our backups?” If not, then once a month, until we're confident, let's practice dropping a table and making sure that we can bring it back from our backups. Let's get a runbook. If we had lost data, I didn't know where to go to start bringing it back because it was arcane knowledge in someone's head, and if they were out for the day, we would have been out of luck. The individual failure doesn't matter, but the system that allows you to recover from a failure should be broadly resilient. Application performance monitoring – it's like, this code went out without testing and then it failed; okay, there's probably another lens of analysis we could be a little more rigorous with that will make it safer. Then that has benefits to the company, because you can start bringing on less and less experienced people and trust that you can give them a safer environment to do their work in, rather than micromanaging them and feeling like they're going to torch your code base, or your data, or your customers with a small misstep. Mando running hundreds of commands a day, I would be terrified. Just saying that caused me anxiety because I'd be afraid of doing something. But if you're like, “Okay, here's how we” – before we touch the data, there's always a backup. Watch, I can drop users and it'll come back automatically with only 30 seconds of data loss. That makes it a safe environment, even though in my job, someone running a bunch of commands on production data, people would've been like, “What are you doing? This does not seem like the right way to do it,” but if it's safe, it's safe.

MANDO: Yeah. Going back to what you were saying earlier, Josh, how the desire to go in and do a root cause analysis of an issue is often misguided due to this complex interweaving of sometimes disparate systems.
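Josh's "practice dropping a table and bringing it back" drill can be rehearsed even in miniature. Here's a toy sketch using SQLite's dump facility; the table name and data are hypothetical, and a real runbook would restore from your actual backup system rather than an in-memory dump.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")

# The "backup": a SQL dump of the table, taken before touching anything.
backup = [stmt for stmt in conn.iterdump() if "users" in stmt]

conn.execute("DROP TABLE users")  # the rehearsed failure

# The runbook step: recreate the table and its rows from the dump.
for stmt in backup:
    conn.execute(stmt)

rows = conn.execute("SELECT email FROM users").fetchall()
print(rows)
```

The point of the monthly drill isn't the three lines of restore code; it's that the steps live in a runbook everyone can execute, not as arcane knowledge in one person's head.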
One of the worst weeks at my previous job was around this series of outages that we had. They were temporary and they were more or less untraceable. All of a sudden, out of nowhere, alarms started going off. Everything from Datadog alerts from backend processing systems to frontend requests either taking too long or failing. Seemingly from everywhere, things started going haywire in just one area of our product, and one of the guys on my team tracked it down to one of our Cassandra Clusters having a problem. This outage would last for 3 to 4 minutes and then resolve itself. So we tracked it down to Cassandra. We saw the dashboards and our alerts with Cassandra firing, telling us that either load was high or memory usage was high. There was obviously something going on with Cassandra, but as we looked at all the other systems, none of them were increasing any load. It didn't seem as though any queries had changed. We were really digging into recent commits on various areas of the application to see what could have possibly happened, and it happened over the course of several days, very intermittently, maybe once or twice a day. As we were digging through, I happened to notice that—all the stuff was in AWS—the security group policies that allowed network connections to the Cassandra Cluster were open to our developer VPN IP range, which meant anyone on the engineering team could connect to the Cassandra Cluster. Which can be useful, especially for our team, so that operators could connect in without having to bounce off of a bastion host or jump host or whatever. You could connect directly from your laptop to the Cassandra Cluster.
So on a whim, following some intuition that I wasn't completely aware of at the time, I cut off that access, and it took about a day and a half for an engineer to come into our Slack room and be like, “Hey, I'm trying to connect to this Cassandra Cluster and I can't and I could yesterday.” As we dug into it, this developer had been running these huge queries on the production Cassandra Cluster, completely unaware that what he was doing was, in fact, bringing down production. We fixed the glitch by cutting off access, but that didn't address all of the other systemic issues in the organization that got us to the point where we had a senior engineer who didn't really understand what they were doing with a production environment, and didn't understand what kind of impact running these kinds of queries had. They didn't have visibility into the pain and suffering that my team was having, because the way that the organization had set up production responsibilities kept them squarely on my team’s shoulders and didn't really involve the rest of engineering. So if we had just stopped, I guess, at that root cause – the root cause is a developer connecting to a Cluster – and everyone labelled that as the root cause, then you wouldn't have seen all of these other issues, and it wouldn't have given us – ammunition isn't the right word, but the ammunition required to go to engineering managers, directors of engineering and say, “This is why other people need to be involved. This is why it's important to invest in training.” Even for people who got hired with experience. This guy's a fantastic engineer, in the industry building products for 15 years. He'd never worked with Cassandra a day in his life until he started working there, so of course, he’s not going to know.
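The fix Mando describes amounts to removing one CIDR range from the cluster's ingress allow-list. Here's a toy model of that rule change using Python's `ipaddress` module; the subnets and addresses are invented for illustration (the real change was an edit to an AWS security group).

```python
import ipaddress

# Hypothetical ingress allow-list guarding the Cassandra cluster.
allowed = [
    ipaddress.ip_network("10.0.5.0/24"),  # application servers
    ipaddress.ip_network("10.8.0.0/16"),  # developer VPN range
]

def can_connect(source_ip: str) -> bool:
    ip = ipaddress.ip_address(source_ip)
    return any(ip in net for net in allowed)

print(can_connect("10.8.3.17"))  # a developer laptop on the VPN: allowed

# The "on a whim" change: drop the VPN range so only the application
# subnet can reach the cluster directly.
allowed = [net for net in allowed if net != ipaddress.ip_network("10.8.0.0/16")]

print(can_connect("10.8.3.17"))  # the laptop is now cut off
print(can_connect("10.0.5.10"))  # the application path still works
```

The day-and-a-half of silence before anyone complained was itself a signal: cutting a rule and watching who shows up is a crude but effective way to discover who actually depends on a path.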
All of this is to say: having an environment where people are willing to not force a scapegoat, not reduce it to some line item in a Jira ticket somewhere that says, “This is why this outage happened,” and are willing to do the more nebulous work of tracing around through it. You'll never get it all, but it's really the only way to address the real issues that could cause larger outages or larger negative outcomes.

JOSH: Yeah. I feel like that exact story, or some very similar version of that story, probably happens every day in companies around the world, because it's not incompetence at anyone's level; it's that lack of shared experience. Expertise is usually just a collection of prior experiences that we pattern match on top of. Because you said, on a whim, you decided to revoke direct access to these Cassandra Clusters from a certain range of IP addresses.

MANDO: Yeah, I had no evidence that trying to go through the Cassandra logs to track every query that was coming in was really getting us anywhere. But I'm sure something similar to this has happened enough times in my career. I was like, “Well, let's just see.”

JOSH: Yeah. That reminds me why this How Complex Systems Fail paper is so helpful. Two other points are: human operators have dual roles as producers and as defenders against failure. This developer would not have induced any failure if he wasn't trying to build a feature or resolve something. In the course of doing his job, he was like, “Oh, it's useful for me to be able to connect directly to Cassandra Clusters just to see what's going on.” On our team, we had read-only access to production data that we would very carefully segment out, but we often used production data, production logs, all that stuff to understand things, especially because the app was built by folks who were no longer on the team.
I parachuted in with low context and had to just do a lot of step-by-step tracing of things to see what the heck was happening when someone clicks this button in the application. Then the next point in this paper is: all practitioner actions are gambles. You gambled on that. It was a pretty good gamble, but you were like, “We'll just turn it off – it's not going to hurt anything. Worst case scenario, if it does break something, we'll let the fact that someone brings it to us be indicative of something being broken.” My wife and I just moved into a house. I had an outlet in the kitchen that wasn't working so I got a $3 outlet tester and plugged it in. It also confirmed it wasn't working. And then there's the little GFCI outlets where you press the button and it trips – the outlets on both sides, which all had the GFCI safety stuff, weren't tripping when you pressed the test button or when you used the outlet tester feature that was supposed to trip them. So I was like, “Great. I have a bunch of dangerous outlets because they're telegraphing a level of safety that they're then not matching.” I would rather it just not say anything about GFCI than say GFCI and not work. Anyway, for a bunch of reasons, I took the outlets out and it's just this rat's nest of cabling underneath. There was hot power going to the outlet, but either something in the outlet was broken or the cable wasn't connected. I just took it all out, capped off all the connections and then turned the power back on to see what doesn't have power. I still have two cords going into this box that I don't know what they powered in the house. I haven't been able to find anything that doesn't have power. I was like, “Cool. I'll just take it off, turn the power back on and then go find out,” because the breaker box was all unlabeled, because it’s a house from the 50s. I was like, “Cool. This powers the fridge.
This powers the garage outlets, and I still have four that I have no idea what they power.” I was so sure that something was going to break, but it hasn't broken, which is more concerning than if something had broken. So if you're like, “Oh, we have all this IP traffic coming from our developers. I'll just turn it off and then tomorrow, one of them will ask me, ‘What's going on?’” If none of them reached out, that would be more concerning, because you're like, “Where is this traffic coming from? Why are people connecting?” Everything is gambles. And then your CTO or the CEO, because they have to write a public statement about why there are outages, might want to be like, “This is the person and we've sacrificed them to the gods of public opinion to make sure this will never happen again,” instead of just saying this happened, but we've put some remediation in place. Maybe it's an opportunity – I wish the dev ops team at the last company that I was at had built little obstacle courses or tutorials or anything that just showed some of their day-to-day, because I had no insight into it and since we all worked remotely, I was never looking over their shoulder. When that kind of knowledge gap is exposed, it's such a good opportunity to, one, be like, “Okay, there's a knowledge gap. How do we fill it from both directions?” Maybe the problem that that developer was trying to solve has a much more elegant solution; once you understand the problem he's trying to solve, you can be like, “Here, we'll give you a sandbox environment you can play with, go crazy.” You'll understand Cassandra better and you won't have production implications if you run this crazy query. I see failure as an opportunity as long as it doesn't cost a lot of money. Even if it does cost a lot of money, that's too expensive an opportunity to throw away.
If you induce an outage and then you just fire the person, chances are good they're never going to make that mistake again, so you just wasted that expensive learning. REIN: One of the things that Richard Cook, who wrote How Complex Systems Fail, loves to say is, “Incidents are a forced investment in learning.” MANDO: [laughs] Yeah, it's so true and it's one of the main takeaways that I had from my last job. I wouldn't say that we celebrated when someone on the team made a mistake and caused an outage, but our boss would do one of two things. If it happened before lunch, then at lunch, he would order us meatball subs and we would all sit as a team. This is pre-COVID. We'd all sit as a team at lunch and not talk about what happened, but just eat. We never got those meatball subs unless someone had broken something, and it wasn't a celebration of breaking it, but it was more like, this is something that just happens. It's okay that it happened. It means that it's probably never going to happen again, or at least that the team is aware that this is what causes things to happen and now we can, like Rein said, treat it as a forced learning investment. If it happened after lunch, then we'd go to the bar around the corner and have a beer before we went home, for the same reasons. Yeah, this is part of the job because this stuff happens through the course of just normal work. Like you were saying earlier, Josh, it's not that someone wakes up one day and comes into work like, “Today's the day I'm going to bring down production.” It's, “No, you're trying to get your job done. You're trying to be productive. You're trying to contribute to the team and to the company, and it's an accident.” REIN: You need to make the assumption that people are trying to do a good job if you want to be able to learn from what happened. MANDO: Absolutely. JOSH: Yeah. Mando, I like that tradition. You're right. 
It wasn't a celebration, but it's paying tribute or homage to the event and having a healing ritual, almost, around the thing. Imagine how healthy that feels to the newest member of the team, maybe just getting into the industry, seeing this happen to other people: “Oh, so-and-so, who I thought knew everything there is to know about this system, made a mistake and they're okay.” As a team, we're lifting them up and saying you're still welcome here, we're okay. It's really, for lack of a better word, healing. I got into software by way of the Turing School, which is in Denver. Strongly endorse it. I'm now a couple of years out; I graduated in 2017, so I have a couple of years of experience, but definitely not a lot in the grand scheme of things. I always stay close to people that are also pretty new to the industry, and the number of times I've heard very new people, sometimes just a couple of weeks or months into their first job, relaying stories where they're feeling this deep sense of shame over an error that they made. One of them lost a bunch of time to a syntax error in a CSS file because they didn't know any better. They read it a bunch of different times and surprise, surprise, the human eye is not well-tuned to find the lack of a parenthesis, or the lack of a quotation mark, or oh, he used a single quote there and a double quote there and there's 500 lines in between. It's ludicrous to assume that someone would ever even be able to see that. So when they told me what they'd learned from it, I was like, “Oh, did you use a linter?” Actually, what I really said was, “Before your manager started chewing you out for that, did they tell you that there are tools for this kind of thing?” I think it's W3Schools or something. There are a bunch of different options where you just paste in your CSS and it's like oh, you have unbalanced quotation marks, and then you can go find them. 
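To make the point concrete: the kind of check those validators run is purely mechanical, which is exactly why a machine beats the eye at it. A minimal sketch in Python, as a toy rather than any specific tool mentioned in the episode, that flags unbalanced quotes and brackets in a CSS file:

```python
# Toy illustration (not any specific validator) of why a machine catches
# what the eye misses: it just counts quote and bracket balance.
def check_css(text: str) -> list[str]:
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for quote in ('"', "'"):
            # In simple CSS, quotes on a single line should come in pairs.
            if line.count(quote) % 2:
                problems.append(f"line {lineno}: unbalanced {quote} quote")
    for open_ch, close_ch in [("(", ")"), ("{", "}")]:
        # Parentheses and braces should balance across the whole file.
        if text.count(open_ch) != text.count(close_ch):
            problems.append(f"file: unbalanced {open_ch}{close_ch}")
    return problems

css = 'a::before { content: "hi; }\nbody { color: rgb(0, 0, 0; }'
print(check_css(css))
```

A real linter such as stylelint does far more than this, but even the crude version above finds in milliseconds the missing quote a person can stare past for hours.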
He was like, “Oh, no, they just said that this is why we need to be careful,” and I was like, “No, this isn't why we need to be careful. This is why we need to leverage the expertise of experts to create a safer environment.” Now this person is going to forever be afraid of making a small mistake like that because they were shamed for it when it happened. MANDO: Yeah. I don't know how to say this in a better way, but one thing that I always took into our post-mortem-y type discussions over incidents, what I always tried to instill in the team, was that the answer is never “be better at your job.” The answer is never, like you said, “be more careful,” because we're not going to be. We're all human beings. Maybe you didn't get a good night's sleep last night, or any number of reasonable or unreasonable reasons. Maybe you are going through a particularly rough time personally. But if you're not ever given that room or that space for it to be okay to make mistakes, and if people aren't assuming you are trying your best, then you're going to miss out on those opportunities for learning and growth like Josh said. You can imagine someone in the background being like, “Oh yeah, I use this linter integrated with Visual Studio Code to help me solve these problems,” or being on a team where you feel emotionally safe enough to ask someone, “Can you help me with this?” rather than beating your head against it for 12 hours or whatever. I had someone that I used to work with who, invariably, would have a problem that she'd spent an hour looking at, and then she'd ask me to take a look at it. I would look at it and be like, “Oh, you missed a semicolon here,” or whatever, and she would ask, “How'd you do that?” I was like, “I've wasted 10,000 hours in my career on missing semicolons, so at this point, my eyes are trained to look for that stuff. But it's taken me 15 years to do that. I can't expect you, who's been doing this for a month, to do that. 
In 15 years, if you come to me and I find the semicolon before you do, I'll make a little joke about it. But for now, use this tool.” JOSH: Yeah, you've got 15 years of experience of—not making the same error, of course, but making, if you're like me, many similar errors from a shocking number of different directions, where you're like, “Dang it! I thought I had pattern matched successfully on that, but here's this weird little edge case that shows up when it's a full moon, I'm wearing a blue shirt, and I look cross-eyed at the computer,” whatever. But yeah, the hard-earned lessons. You took this knowledge and squeezed it down into this tiny little piece of time and passed it off, and you also gave a painless version of the same painful lesson: I've spent a lot of hours on this, or I've taken down production, or I've done this thing. You can use words to describe the painful situation in a way that lands with an emotional weight on the person that hears it, and then they can learn and stand on your shoulders, for lack of a better phrase, not having to make the same mistake but still getting to learn some of the same knowledge. I think that's where real learning and knowledge transfer happens. There's this guy named Cedric Chin—I'll send you all a link to this afterwards. He has a series on tacit knowledge and he argues that tacit knowledge is knowledge that cannot be captured through words alone. So much of what experts do is invisible until you actually sit down and watch an expert at a thing. He's a software developer and he relayed a story where he had this big class that he was trying to work on. He showed it to a coworker and the coworker was like, “That's going to be your problem. You should fix that,” and he was like, “I've spent a week on it. There's hundreds of lines of code. How did you just do that? How did you digest this whole thing?” The coworker was like, “I don't know, it just felt right. Want to go get lunch?” and Cedric was like, “No, no, no. 
I want to know what just happened here.” Now, if you can scaffold in understanding, hopefully the expert will be able to be like, “Ah, I'm making intuitive judgments and pattern matching and I'm doing all of these things,” and then pass that off to the person rather than just saying, “Oh, I just know.” You can skill up the knowledge of the less experienced people on the team so quickly, just by trying to find those opportunities where there are these gaps in knowledge that aren't easily explainable and trying to pour into that. I think about that whenever an expert does something. I just wish they would record their screen so I could watch them do it after the fact, or even ask them, “How did you do that? Why are you looking at that directory? Why did you look up where this method is being called in this library? Why did you do that?” Because years and years of painful experience get squeezed down into just this set of actions that this person takes. It's much easier for me, as an inexperienced person, to be like, “Okay, in general, when I'm doing this, I'm going to Control + F. I'm going to grep the entire repo before renaming this variable to make sure it's not used somewhere else.” That's a painful lesson to learn the hard way. Pretty easy to learn the easy way. MANDO: Yeah. JOSH: So I think about that. There's always this desire for intermediate or senior developers. I've been working for a couple of years now, out of the junior territory, so now I can be like, “Oh, it's easy to get a job.” But there are so many people that are trying to get their first job, struggling, struggling, struggling to get the job, and I wish I could help. Maybe if the listeners to this find themselves in that spot: rather than trying to find better talent, try to build a better system that allows you to take in just normal people and build them into the talent that you want them to be. 
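The grep-before-rename habit Josh describes can be sketched in a few lines. Here is a minimal Python version; the helper name, the `*.py` glob, and the example variable name are invented for illustration:

```python
import re
from pathlib import Path

def find_uses(repo: Path, name: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, line text) for every whole-word use of `name`."""
    pattern = re.compile(rf"\b{re.escape(name)}\b")
    hits = []
    # Narrowed to Python files here; widen the glob for other languages.
    for path in sorted(repo.rglob("*.py")):
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits

# e.g. find_uses(Path("."), "retry_count") before renaming retry_count
```

Running it before a rename lists every site the rename has to touch; an empty result is the green light, which is the whole lesson compressed into one habit.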
Spend an extra month educating them and take the person that's ready to walk in and join your team, rather than being like, “Oh, we wish they had more years of experience with Cassandra.” You could be like, “Oh, our Cassandra experts have built this bespoke in-house resource where people coming into our company learn both how we use Cassandra, the business logic of why we use it this way, and the weird little idiosyncrasies, because we haven't upgraded this tool in forever and it just sucks.” If you've got someone that's at the bleeding edge of technologies, they're not as useful dealing with legacy, older stuff that gets barnacles and scar tissue from just being used in the real world for a long time. But also, it's been used for so long because it's been delivering so much value for so long. So I feel like teams could solve their hiring problems by working on training, saying, “We're going to take the average empathetic, well-intentioned person who has a little bit of domain knowledge and we can make them really good in a reasonable timeframe.” Now you can hire anybody, and your retention goes up, and good ideas coming from within the team go up, and you can promote from within. It drags all of these other pleasant secondary effects along behind it. MANDO: I used to joke at my last company that I should just change the job req that I had online for adding operators to the team. I wanted to change it to: someone who is kind, someone who is empathetic, and someone who can read. If you have those three abilities, the team is set up in a way that it's welcoming enough and it's with [inaudible]. The team was willing to stop what they were doing and help show someone how to do something. The team had built up a nice set of documentation, a nice set of runbooks, a nice set of plays. 
There were guardrails set up so that it wasn't impossible to shoot yourself in the foot, but you had to try real hard, and so it was a really, really good environment for, like you were saying, Josh, people almost off the street. Oh, and the other requirement was: wants to do this work. You want to be an operator or you want to be a software developer. We had some really good success over the last year or two that I was there, bringing in people who were earlier in their careers, with less experience than I really thought would be useful. As a hiring manager, you always want someone who, like you were saying, can hit the ground running and start producing immediately, and the truth is, almost no one's going to be that. Even people with 10-plus years of experience as a senior developer or a senior engineer on their resume for the past three jobs, it's going to take them weeks and sometimes months before they can get to a point where they're positively contributing. So we had some really good success getting people who were earlier in their careers to the point where maybe they can only do these five or ten things, because that's all they've been trained on so far, but they can do those things without breaking anything, and those things are teaching them more about the system and they're learning more and they're growing more. JOSH: And it's taking those things off the plate of the more senior talent, which now gets to work on more strategic, long-term things instead of, “Oh, I have to add another domain to the six different things so we can get traffic from it,” or prevent traffic from it, or whatever. MANDO: Absolutely. It's interesting, the place that I'm at now is an extremely early-stage startup. There are four full-time people and one part-time intern. 
We don't have any of those guardrails, we don't have any of those systems in place, but what I would give for someone to have that in place so that we could bring in someone who is earlier in their career and be like, “Here's one of the 150 things on my to-do list, just do that one,” and I won't have to worry about it anymore. JOSH: That's a role I feel like I've settled into, and whenever we did add people to the teams that I've been on, in any company, I would ask them to do the same thing. I would say, “Okay, you are in this rare spot that is highly temporary. You don't know what you're doing and I don't know what knowledge I'm assuming, so every time you do a task, please write it down.” I've even handed off copy-and-paste templates: “Here's your template for this new thing: heading one is the overall goal, then reasons, then linked tickets or Jira cards or customer support issues,” or whatever, and then just start mashing the Enter key and writing notes as you go. Then at the end, that becomes the raw material that can be built into a bit of a runbook. Then I'll have that person hand it off to the next person that starts and be like, “Okay, do this again. Find the gaps, use them to refine this knowledge,” and by the time that's happened twice, you have a document you can hand to someone who's probably never interacted with the system before, and they're able to confidently execute with understanding and checking. You can be like, “Okay, now go over here, check this other environment and make sure that the thing you did here percolated that way, and here's how to verify it.” If so, come back, and if not, raise your hand in the Slack room or whatever. You can just hand people tutorials and they're doing business-relevant work, learning their own skills, and it's becoming a more and more refined piece of documentation. But only a person that's new to the team can do that. 
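The template Josh describes handing off could look something like this; the headings follow his description, and everything else is a placeholder:

```
# Task: <name>

## Overall goal
One or two sentences on what we're trying to accomplish.

## Reasons
Why this task exists. Link tickets, Jira cards, or customer support issues.

## Notes as you go
- Every command you ran, what you expected, what you saw.
- Gaps, surprises, and questions for the next person who does this.

## Verification
Where to check that the change percolated, and how. If it did, come back;
if not, raise your hand in the Slack room.
```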
Because if you sat down and tried to write that document, you're like, “Oh yeah, we connect to this thing and then we run this script and we do this.” There are so many skilled expert judgments where you're like, “Oh, it's taking too long for the terminal prompt to come back. I know that something has gone wrong.” You're never going to write that down, but maybe it took 8 seconds to run a query and you expected it to take a quarter of a second, so it's two orders of magnitude off. MANDO: I'm not going to write down, “Make sure you're checking the VPN,” or, “Use this particular VPN.” You're just going to assume that you know that, because I knew it; it didn't occur to me to write it down. JOSH: Yeah. I love this topic because I think there's business value here in every domain. A new manager on a team, for instance, wants to make a name for themselves. They can make waves in the organization by encouraging this kind of behavior, and then after a year they're like, “Here's our 11 runbooks that we hand off to new teams,” and they start hearing, “Oh, why is it that every time you hire a new employee, more junior than the rest of us, they become more experienced than anyone else expects?” MANDO: Yeah. JOSH: It can be not scary. It can be really effective, this thinking, applied to teams at the right level, and that's how you build high-performing teams. Then maybe other teams start poaching your employees, but that's cool. Now they're growing. 
How do you think it feels to be in that position of, “Oh, I've been working in this job for a year, I didn't think I was going to be very good, and now that team over there wants me because of the skills that I have”? The team that can do that reliably is almost gifting these employees to the rest of the company, and these employees' careers are growing quickly because they're getting exposure to lots of different topics, solving novel business problems, collaborating, and reaching across business units. MANDO: This is something that really hit me. When you were describing this hypothetical journey, one of the things that you said was, “How do you think they would feel?”, talking about the employee who is getting poached or moving on to bigger and better things. I really like approaching management from that perspective rather than solely focusing on what's best for the company, what's best for the business group, or whatever your sphere of locality is. Because I find in a lot of cases, if you try hard enough, you can find ways to give the employees the best outcome and have that also be in line with what is best for the business group or the organization. That's something that ground my gears a little bit when I first started entering management, because I still felt more like labor than the man. You know what I mean? But yet here I was in a position where I wasn't labor anymore, or at least not the same kind. I found that I should approach decisions and interactions with the team from a place of caring, kindness, and love. People would ask me what it was like being a manager and I was like, “Well, it's pretty easy for me because I just try to love my team as much as possible and the right decisions tend to fall out from there.” It's not always the case—I learned that the hard way—but I thought it was a really good place to start from. JOSH: I'm nodding along over here like I have nothing to add. 
Approach management from a position of caring for your team, and in America, in the 21st century, corporate value is based on love? That doesn't actually – [laughter] You're not going to have your notice to your shareholders be like, “We've taught our managers to be more loving and that's why our stock has outperformed our cohort of competitor companies.” But I think there's probably a paper being drafted right now by some sociologist or MBA person titled “The effects of...” and then a bunch of fancy words that all boil down to loving behaviors towards your team, and the effects of that, at the end of the day, on stock value. I don't even like stock value as a metric of success, because what is money and all of that stuff, but I'm still confident. I've talked to people and I'm like, “Oh yeah, by being kind to my team, we will absolutely crush any other metric of success, whatever metric you want to evaluate us on—error rate, or recovery rate, or lead time, or the number of times something happens—all of these things.” The team that is kind, in a very real, sometimes painful way, wins. That's why it's easy for teams to say, “We're psychologically safe,” but then not actually become psychologically safe: it's a lot easier to say the thing than it is to do the thing. But the teams that do the thing will always be vastly better. Or maybe, as a manager, your role is to enforce the company's will down below you. There are some times where you're like, “Well, this probably isn't going to be any of our long-term spot. So I'm going to treat you as well as I can, and then at some point, it's a small tech world, we might end up on the same team again, or you'll make referrals, or I'll help you, or whatever.” It all just works out, and if, heaven forbid, you leave the company because it's kind of toxic, great, so be it. 
That further underscores the truth of what you're saying, which is: treat everyone well, and if that means people get out of companies that don't treat their employees well, that's a good outcome. REIN: Fun story, there is a paper, actually pretty old now. It was published, I think, in ’92 and the title is “The Psychological Conditions of Meaningfulness, Safety and Availability and the Engagement of the Human Spirit at Work.” It's a quantitative study following up on an earlier paper that created this model of the engagement of the human spirit at work, which is basically: how much do I like going to work? What is my quality of work life like? JOSH: How is my soul? It's a very uncomfortably handwavy thing. REIN: Engagement of the human spirit is a pretty handwavy thing to be quantifying in an academic paper, but there are three factors that were originally identified. The first is the meaningfulness of roles and tasks, the second is psychological safety, and the third is access to psychological and emotional resources. What this paper did was say, “We're going to do a survey, one of those five-point strongly-disagree-to-strongly-agree surveys, on a bunch of questions that each factor into a weighting for these three factors, to see how much these three factors affect the engagement of the human spirit at work.” Among the questions listed at the end of the paper are things that have to do with: how much do you like your boss? How much do you trust your coworkers? What they found, according to the paper, is that you can't actually tell which of those things is the most important. It turns out that psychological safety is very important, and then availability, the access to resources, is: how exhausted am I at the end of my workday? Do I wake up in the morning ready to go to work? JOSH: Do I feel nauseous Sunday night thinking about the coming work week? MANDO: It’s scary, yeah. 
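Mechanically, the scoring scheme Rein describes is simple. Here is a toy Python sketch of three factors, each averaged from five-point items; the item wordings and responses are invented for illustration, not taken from the paper:

```python
# Five-point Likert responses: 1 = strongly disagree ... 5 = strongly agree.
responses = {
    "My work feels meaningful": 4,
    "My tasks matter to the organization": 5,
    "I can take a risk without being punished": 2,
    "I trust my coworkers": 3,
    "I wake up ready to go to work": 2,
    "I am not exhausted at the end of the day": 1,
}

# Each survey item feeds into one of the three factors from the model.
factors = {
    "meaningfulness": ["My work feels meaningful",
                       "My tasks matter to the organization"],
    "safety": ["I can take a risk without being punished",
               "I trust my coworkers"],
    "availability": ["I wake up ready to go to work",
                     "I am not exhausted at the end of the day"],
}

# Score each factor as the plain average of its items.
scores = {factor: sum(responses[item] for item in items) / len(items)
          for factor, items in factors.items()}
print(scores)
```

The real study fits weights statistically rather than averaging equally, but this shows the shape of the instrument: individual questions roll up into the three factor scores that get compared against engagement.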
REIN: So this has been studied extensively already, and when you see engagement surveys at work, they are generally some 20 degrees of telephone game away from this work. But it's interesting that some of the original work on How to Make Work Not Suck says, “Make the work meaningful, make people feel safe to take interpersonal risks, and make sure that they have access to emotional and psychological resources; make sure they're not exhausted, make sure they're not emotionally drained.” JOSH: I Googled it, found it, and then, using a tool that sounds like Sci-Hub, I was able to get the paper. I wanted to mention that all of this is handwavy and feels nice. The people that end up on these teams are lucky: teams where the manager has read this paper or is amenable to the ideas of the psychological conditions of meaningfulness, safety, and availability and the engagement of the human spirit. That sounds like a great thing. What I have been advocating to people, whether a dev team manager, or a team lead, or even a “line-level engineer”: I bet that if they could take their current salary and imagine making 30% or 40% more, then figuring out ways to roll this kind of thinking into their work, either at their current job or their next one, will result in substantial improvements. Because teams with this degree of health are consequently more effective and higher-performing; they work better. So if you want to improve your own work conditions, rather than necessarily going and learning another programming language or just hitting the job search really hard, I think that there's very real monetary value that comes from implementing this kind of thinking, or at least trying to implement it. Because there are companies that want these kinds of people and there are people that want to be at these kinds of companies, and if they can find each other, I think it would be good for everybody. 
REIN: And also do it because you have an obligation as a human being to reduce suffering when you have the ability. MANDO: I was going to say, yeah. REIN: But people [inaudible] mostly for money. MANDO: Yeah, it sounds like the capitalist pigs were in our hearts all along, right? JOSH: I gave a talk a while back about why you should be empathetic towards the other business units in trying to accomplish your own goals, and it's not just that you're going to be more successful; you actually get to solve greater problems by bringing kindness and generosity to work. The first-order value is just being kinder to your people. But eventually, that enables you to work on more interesting problems, because more interesting problems, to most people that have unhardened souls, means helping people. If we can help more people, then we feel more fulfilled and, at the end of the day, that leads to other good things, however we define them. Sometimes people aren't interested in more money; they want more autonomy. But I'd say usually – there's, what is it? Above $75,000 a year, additional income doesn't correlate with additional happiness. So once you're covering your minimum conditions of shelter and food and whatever, I think tremendous good comes, across any dimension that is meaningful to you, from just trying to care for the people that you have responsibility to and that have responsibility towards you. REIN: I think that's a good note to end on. Let's do reflections, if that's cool with everyone. I've got a reflection, and I wanted to mention a couple of references that I think folks will find helpful after this discussion. The reflection is from Richard Cook's How Complex Systems Fail. Like Josh pointed out, one of the points is that operators are always gambling, operators are always taking risks, and psychological safety is often defined as the ability to take interpersonal risks. 
So that's the connection between psychological safety and the ability to respond to failure. That's why it's so important. JOSH: Pretty good. REIN: So I've got some references. You mentioned tacit knowledge earlier, Josh. That originally comes from the work of Jean Lave in the 80s, probably the most comprehensive work there is, in Cognition in Practice, and then there's also a really good book by Schön called The Reflective Practitioner: How Professionals Think in Action. MANDO: Yeah, I'll go next. This gave me a lot of things to reflect on, but I think the one that is sticking with me the most is how to properly build systems and teams that are friendly for less experienced individuals to interact with, as a way to help bring up folks who are earlier in their careers, folks who are coming from code schools, folks who are coming from other industries, folks we should favor for any number of reasons, not the least of which being that there's a need for additional folks in the industry. There's this large, mostly or partially untapped pool of people. Specifically for me: how does one do that at, let's say, an early-stage startup, where the traditional thought is that you bring in a very small team of skilled, tenured, experienced folks and have them lay the groundwork, and then once you get past a certain stage, that's when you can start doing these kinds of things? Thinking about ways to introduce people earlier in their careers to the early-stage startup, while also making it a safe and productive environment for them and not setting them up to fail, is something that I want to spend more time thinking about. I think there's some interesting stuff there. JOSH: Yeah. I like both of those a lot. They both were – I was like, what do I want to –? I wanted to reference complex systems. I wanted to reference that. 
But Mando, my reflection is going to be your team's tradition of either meatball subs or pouring one out a little at the end of the workday in commemoration of something. I love traditions and rituals around things. It could be the smallest thing, and someone will say, “That's not a real tradition, that's not a real ritual,” and I'll be like, “Au contraire, I've made it one, because I do it when this thing happens.” I think something like that speaks so strongly to that psychological safety, which ties in well to the complex systems safety that you mentioned, Rein: psychological safety allows experimentation at the edges, at the sharp ends, wherever those are. So if I ever lead a team, I will be trying to do something similar: when this thing happens, we're just going to go order meatball subs from down the street, or—I live in Golden, there's a really good Himalayan food place—maybe we'd all get naan from there and eat it in commemoration of a production outage. But I think whoever set that up and started that tradition has gifted so many people such a kindness, because it relieves the non-stop pressure of failure. If I make a mistake, I know that there's a system for bringing us all back into unity and health, and when I do make a mistake, that thing happens. It's like wearing a seatbelt and having an airbag: if a small car accident could be instantly fatal, I might drive in a way that would induce more car accidents, but the existence of protective systems actually reduces the likelihood of their ever having to be used. So I love that tradition and I'm going to use it. MANDO: That's awesome, man. Have y'all ever seen the Disney cartoon, I think it was Disney, Meet the Robinsons? I want to say it's from the early 2000s, but there's a part of the movie where this kid ends up in, I don't even know, a family compound almost. 
There's this huge extended family: brothers, aunts, cousins, grandparents, and stuff, and the patriarch of the family is this adventurer, mad genius, almost a Steve Jobs kind of character. The kid does a thing, I can't remember if it was a science experiment or what, but he does this thing to try and show off to the whole family and then fails, and the kid's all embarrassed because the thing blows up or something and everyone's just sitting there staring at this cloud of smoke. You can see the kid's face drop and the music comes in and it's all sad, and then everyone stands up and cheers and they throw him this huge party, and he's like, “What is going on?” and they're like, “Well, you failed. That's fantastic! You can't learn anything when you succeed, you only learn when you fail. So now you know not to do that again. Now we can move on to the next thing. But if it had worked, you would have just known that it worked.” That's always stuck with me and it's a scene that always comes to mind whenever people are talking about team interactions around outages or mistakes or failures. It's a cartoon, but it's something that I always strive for in a group: I want to try to think, “Okay, well, no one's dead, and if the stakes aren't that high, which they almost never are, then we can learn from this and that's great.” REIN: We're basically done, but I just had a thought. One of the reasons that I've sort of stopped using psychological safety as an explanatory device is that it's not one; it's a lagging indicator. It's a measurement of the output of a system of social relationships and individual behaviors. That means that it can tell you whether you have it, but if you don't have it, it can't tell you how to get it. What I'm interested in is how to get it. MANDO: Right. REIN: This is like measuring fatalities on the highway: it won't tell you that oh, what you needed were seatbelts. 
It'll just tell you whether seatbelts work, once you've figured out to try seatbelts. So what I'm interested in are the conditions and the ways of relating and behaving that lead to psychological safety, and I think it has a lot to do with things like how we process blame, which is why I give a talk on that.

MANDO: Yeah. Something that I feel correlates strongly with that is being able to ask questions about things that you're working on, and something as simple as making it clear to the group that RTFM-style answers aren't acceptable, or answers where people say, "Oh my God, I can't believe you don't know that." Stuff like that, and making sure that it is communicated clearly that that's not what we do as a team. Because when you feel comfortable enough to ask questions that maybe you personally feel like you should already know the answers to, and you're a little embarrassed to be asking, when you're at that point, it's like what you were saying, Rein: you know that there is a level of emotional safety there, but that's not how you get it. You know what I mean?

REIN: It's good to know whether you're getting there, right?

JOSH: Yeah.

REIN: So it's a useful thing to measure, but it won't tell you what to do next to try to get there.

MANDO: Right. Sure would be nice if there was a checklist: [laughs] do these twelve things and ensure the emotional safety of your team.

JOSH: I feel like there is. Like How Complex Systems Fail and that kind of thing. It's not quite a checklist, but it's principles: you don't get to say that you're making a dent in psychological safety—because that's a lagging indicator, not a leading one—until you can point to ways that your team has processed shame and guilt in a way that indicates an understanding of how complex systems fail, blame being distributed, and stuff.
There could be gatekeeping functions where you don't get to move to step two until you've solved step one. But those are hard things that I think a lot of people can't solve. In a lot of organizations, it is larger than any single person to create psychological safety, or to create conditions where it can flourish. Even CEOs are extremely constrained; they don't have options.

REIN: Just be more okay with taking risks now, please. [laughter]

JOSH: Right, and we fired the last three people who acted like they were okay with taking risks, so good luck. I feel like the places that can do this will naturally bubble to the top of their respective industries. It's not that everyone is going to do this; it's going to be a natural selection process. The people that do it will win, and others will cycle back through until they end up in a place where that is the case. Because the place that encourages psychological safety can handle hiring someone who doesn't have experience with it, and they can teach them the things, and then that person can move on and go spread that knowledge on a future team. Now we've all benefited from the tradition of eating meatballs when there's a failure. That can't be undone. You can't undo knowledge. That person has pushed their vision out into the world in a way that can never be pushed back against.

MANDO: By the way, for those from Austin who are familiar with it: we would get those meatball subs from Home Slice, which is traditionally known for their New York-style pizza. But the meatball subs—man, there's nothing better after you've destroyed a Kubernetes cluster than a meatball sub from there.

JOSH: That makes me hungry for lunch. This has been my first time on a podcast, and I've been listening to y'all for a long time, so I feel like I'm in this rarefied air where I don't belong, getting to be with smart, competent people.
This has been just a delight of a time for me, and I'm very thankful that both of y'all have taken time out of your day. I just want to make sure you both know I'm very thankful for your time and eager, just rolling with the conversation. This was a blast. I don't know how it could have gone poorly, because it seemed so effortless and easy, which indicates systems in place that consistently guarantee certain results: when we have guests, we help them feel successful because we're the experts and they don't have to be. So that brings the whole arrow pointing back to the beginning.

REIN: I see what you did there. That was nice. Yeah, it's been a real [inaudible] to have you on.

MANDO: Yeah, Josh. Thanks so much, man. It's been really, really good.