[Are you truly involved in the developer communities you work in and sell to? Are you seeing the value in the events that you are a part of? DevRelate.io can help. Developer and Community Relations as a service. We speak developer. Learn more at DevRelate.io or email us at Info@DevRelate.io.] JESSICA: Good morning and welcome to Greater Than Code #121. I'm Jessica Kerr and I am here today with my co-host, Sam Livingston-Gray. SAM: Yay! And I'm here to introduce the marvelous John Sawers. JOHN: Thank you, Sam, and I'm here to introduce our guest, Thai Wood. Thai helps teams build more resilient systems and improve their ability to respond to incidents. He's a former EMT and applies his experience in managing emergency situations, along with technical skills, to solve problems. When not doing computer stuff, he's probably riding around Vegas on his motorcycle. He also writes a weekly newsletter at ResilienceRoundup.com. Welcome to the show, Thai. THAI: Thanks, folks. Thanks for having me here. JESSICA: Thanks for coming, Thai. For those of you all who don't know, I met Thai last year at the best conference, RE-deploy.io, where we were talking about resilience engineering in software. SAM: What's resilience engineering? THAI: What I got from what I also think is the best conference is that resilience engineering is this intersection of a lot of disciplines that have been going on for a while -- cognitive systems engineering, human factors -- taking the things from those domains and applying them to others. JESSICA: Including software? THAI: Especially at REdeploy, yes. JESSICA: But also the obvious ones like nuclear engineering and air traffic control, where we get to really study safety. Resilience engineering is a pretty new field. In software, I think our major interface to this new discipline is John Allspaw and he's been on the show before, but I was talking to him in New York the other day -- at devopsdays New York -- and he was telling me about how David Woods and Richard Cook, who are two professors known for their work in resilience engineering, kind of spun off of human factors research and brought in the cognitive systems work, right? THAI: Yeah, that sounds right. To me, from what I know from talking to Richard, I think their laboratory at Ohio State is the Cognitive Systems Engineering Laboratory, as I recall. JESSICA: I think this differs from old-style human factors. Old-style human factors gets mired in Taylorism and let's control the people, whereas cognitive systems is like let's make the systems work with the people the way people work. THAI: Right, as opposed to let's hit the people with wet noodles until they bend to the system. Let's make the system so that it supports the people and the work as it actually occurs, as opposed to the way we hope it occurs or the way we imagine it occurs. JESSICA: Yeah, and software totally plays into that because software is so much of the systems that we live and work in. How did you get into this? THAI: I started getting into this when I moved on from being an EMT. Early in my career, I'd taken a break from tech and spent some time in emergency medicine, and when I returned to tech, I was seeing a lot of overlap in the things that I'd learned in managing medical emergencies and seen in ERs and things like that -- they could be applied to software. Digging into it more, I was able to discover some of this stuff and then it really came together for me -- the conference and talking to John and Richard and reading David's work. 
JOHN: Yeah. Our QA manager used to be a firefighter and an EMT and has brought a lot of that to their work as well, which is really interesting. We actually had a whole team sort of training on closed-loop communication, so that when we're managing incidents or handing things off, we make sure we get that going. JESSICA: Can you define that? JOHN: What I gathered from it and perhaps, Thai, you can dive into it a little bit more, is when you say, "We just rebooted the database," someone acknowledges, "The database was rebooted. I have marked down in our logs that the database was rebooted," rather than just sort of shouting into the void and hoping everyone noticed that that's what happened. THAI: Yeah, I really like that part as well. It was a big change for me, coming back into it, where that does seem to be the norm in a lot of different mediums. You can post something in Slack and maybe everyone will hear you but no one will acknowledge you, and then you're kind of left sitting there and you're like, "Uh... Anyone?" JESSICA: So you're saying the norm in our software teams is to just say something and then not wait for acknowledgement. THAI: That's the result, I think, because the norm is not to focus on closing that loop and not to enhance the communication. JESSICA: Because communication is message passing, instead of a process. SAM: Because we don't think of communication as a skill that you can actually study and improve and work on. JOHN: There was a tweet recently from someone and she's like, "I'm married to a pilot and every time he hands me the baby, he says, 'You have control,' and if I don't say, 'I have control,' he doesn't let go." THAI: That's amazing. I love that. SAM: Yeah, my partner and I used to do something similar. We would say, "You have the conn," because we watch a lot of Star Trek, but we were quite formal about it. JESSICA: My youngest daughter, her nickname was Dieter, so we would say, "I am on Dieter-duty," and then chase her around. We don't do that in our teams. I made a pull request yesterday and I have a bunch of work. Look at all these arrow thingies on my piece of paper. I have a bunch of work waiting on this pull request to get approved and I come in this morning and of course, it's just sitting there, so I asked someone else to review it and then they do and they're like, "It looks good," and then the first person is like, "Oh, no, I'll review it later." I'm like, "I'm never going to get anywhere on this." Okay, thank you for letting me rant about that. We don't have these traditions of careful communication. SAM: It sounds like we're talking about how formal we choose to be. THAI: For me, that's a part of it. I think the formality can come in a few different places. We can be formal in the patterns we use for communication -- it can be very formal in saying, "I will always acknowledge when you say something" -- but the content of the message, I think, could be formal or informal. JESSICA: That's true because you could say, "Yo, I got that." With my kids, I've learned to just repeat back what they said, so that they know I heard them because otherwise, they will say it again and again and again. JOHN: Good training, actually. JESSICA: So what else? What else did you bring back from EMT days to software teams? 
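[Editor's note: a minimal sketch, in Python, of the closed-loop pattern discussed above. This is not anything the panel built or described; post_message and next_reply are hypothetical stand-ins for whatever chat or paging client a team actually uses. The idea is simply that a status update doesn't count as delivered until someone explicitly closes the loop, in the spirit of "You have control" / "I have control."

import time

ACK_TIMEOUT_SECONDS = 120
MAX_REPEATS = 3

def announce_with_ack(post_message, next_reply, text):
    """Post an update and repeat it until someone explicitly closes the loop."""
    for _attempt in range(MAX_REPEATS):
        post_message(f"{text} -- please acknowledge")
        deadline = time.time() + ACK_TIMEOUT_SECONDS
        while time.time() < deadline:
            # next_reply is assumed to return the next chat reply, or None on timeout.
            reply = next_reply(timeout=deadline - time.time())
            if reply and "ack" in reply.lower():
                return reply  # loop closed: someone confirmed they heard it
    raise TimeoutError(f"No acknowledgment after {MAX_REPEATS} attempts: {text!r}")
]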
THAI: With the emergency responder world, there's actually a lot of talk during the training about burnout -- what it looks like, what resources are available -- even in some places where there's kind of a culture of not really acknowledging that. It's something that was completely lacking when I returned to software. There are high-consequence incidents at companies, and people are carrying these pagers and subjected to a lot of the same stressors and end up having a lot of the same problems or results, but in software, it's not really talked about that much. One of the agencies I worked for when I was in medicine, we had a staff psychologist. I think she was great and she was a part of the company and she would show up on your initial training day and [inaudible] she was and different things like that, but I've never really heard of tech companies having one of those. You know, ping pong tables, sure. Eight kinds of coffee, yeah. Mental health support, not so much. JESSICA: Yeah, that's true. I've been at places that have a nurses' station, a little tiny doctor office, but a psychologist, that would be useful, and what you're talking about is that in emergency services, you acknowledge that the workers are human and support them. THAI: Absolutely, yes. SAM: But you see, ping pong tables and coffee are cheap, fixed expenses. If you want to bring in a psychologist, then you have to pay for somebody with a PhD and acknowledge that your people are human. It's terrible. THAI: I think that it's probably the latter that makes it more difficult to sit down and [inaudible] the people are human if the spending of a large tech company is... Yeah. I don't know if it's the dollar amount of the expense so much as that you'd have to acknowledge that some of these kind of traditions or norms that exist in software are maybe not healthy, and just overcoming that sense of "this is how we do things" and that kind of inertia is hard to fight, I think. JESSICA: We do have expectations that people will behave like computers and do what they're told consistently. THAI: Yeah, right. That's more automation, right? Just automate that away or just write a better playbook. JESSICA: Oh, yeah. It's in the ReadMe. You're a human, you can run a ReadMe. SAM: All right, so for better or worse, now I'm thinking about professionalization and how tech is still very permeable, it's very accessible to people who don't have formal education in computer science and you can sort of stumble into a career in tech and have that totally be a thing, whereas I'm guessing that to be an EMT, you have to go through a fair amount of formal training. THAI: I don't know that it's the formal training that helps them. I suppose it does give a certain amount of common ground, but interestingly, when you're officially certified at the federal level and you're an EMT of whatever standard you've certified to -- there are several now, or I think they merged them -- still oftentimes, when you join an agency, you spend a couple of weeks, at least, doing more training with a field training officer. It's like the school training, even though it has practicums and clinical hours, is mostly so that you can kind of do this base-level work, and then you learn a lot of other things in the field. JESSICA: Yeah, like no degree is going to prepare you for what software development really requires. They don't even teach version control. THAI: I think that is true. 
I think in a lot of cases, I don't imagine that there are a lot of careers where you could walk out of any sort of certification or with a degree and be prepared to do the thing in the real world. I don't imagine that lawyers graduate law school and are instantly prepared to do well for their clients. JESSICA: Yes, that is something that our culture doesn't acknowledge very well, because we think we need to retrain people. You come out of the coal mines and you need to be retrained into a different field, but training never prepares you to do the work. It never gives you the context and the understanding and that kind of in-place, contextual know-how. You have to give people a chance. You have to bring them in and you have to work alongside someone to understand the work, and so this is where you get internships and stuff like that, and if you can afford to do unpaid internships, you can get into whole fields that you wouldn't be able to break into otherwise. I love what you said about the formal training -- it gives you common ground. It gives you a shared vocabulary, it gives you stuff that we all know that we all know that we all know -- because we all took this training -- but that's it. SAM: Which is not to dismiss that that stuff is necessary. It's sort of below baseline, but it's part of the baseline. JESSICA: Right. It gives you a shared vocabulary so that you can work alongside someone and more quickly transfer the knowledge and know-how that's really needed to do that particular job. So how do we do that in our software teams? THAI: I think you touched on part of it: just acknowledging that that's going to be the case, instead of kind of having this idea that you did a number of years of computer science and that means you can function in this team or this organization. That may not be the case, and I think that touches back on what we were talking about before, about the work as we want it to be or the work as we imagine it to be versus the work as it is. JOHN: Those are terms of art in safety and in process and in organizations. I've seen them mentioned a couple of times and have a vague understanding, but it would be great if you could dive into that a little bit. THAI: I don't know of any formal definition. I do see it a lot in different places like high-reliability organizational research, especially when it comes to the gap between managers or different people at higher organizational levels -- the blunt end -- versus the kind of sharp end, where people that have boots on the ground are doing the work. When you are far removed from almost any work, you probably have some idea in your head of how it's being done, and I imagine that's true for anything you encounter. I have some idea of how my house got built or how it got wired, but it's not complete, and if I were wanting to dictate to those groups of people, whose profession I don't have, narrowing that gap is something that's highly recommended in those fields, but I don't know the formal definition of those terms. JESSICA: When we make software for people, we are totally making all these assumptions about how they do their work. JOHN: Speaking of that whole thing of bringing in an early-career developer and getting them trained up to work on your team, for me, I feel like that process is a very interactive, very close relationship because you're learning what their baseline is and then also trying to help them adapt to what the baseline of the company is. 
I feel like that has to be a very tight feedback loop because if you don't do that, then they're just going to flounder around without all that structure. JESSICA: Not just junior developers. I mean, I have 20 years of experience and when I come into a company, I'm happy if I'm productive after six months. SAM: Yup. JESSICA: People are like, "If you know JavaScript and React, then you're going to hit the ground running." Fuck that! I'm going to hit the ground running right into a wall. JOHN: Good point. That would be the same for me. THAI: I think there is a lot of that that also goes unacknowledged: how much of the organization and how much of the work does not necessarily involve the code level. It's not understanding this function. It's understanding what this function is here for, why did it not change, why did it get changed, what were the time pressures, what are the production pressures, what were the concerns for safety that may have preserved or created the state as you see it today when you first join. I think learning all those things takes time, and ignoring that that's part of it is what leads to that six months or whatever the time may be. I don't think we can ever get rid of it, but I think acknowledging that there is an element of organizational comfort and adaptability that needs to occur is something that would help. JOHN: Yeah. I've always felt like people discount the value of the institutional knowledge that you lose when a long-serving team member leaves the team, and we think we'll just hire someone else at the same skill level and slot them in, and I'm like, "No, no, no, no." There's so much work to do to get this person back to the level that that other person was operating at, and also, the team dynamic changes. There's a lot of downstream effect that happens when you do that. THAI: Nancy Leveson does a lot with NASA and her research at MIT, and she has some work about this and has gone over some of the accident reports for some of the spacecraft losses, for satellites and different things, and some of these reports cite the problem of people who are well-versed in the organization and the systems leaving, and having folks remaining who are strong at what they do but don't have the history of the organization, so there are certain things that they see and perhaps don't recognize. JESSICA: And at the same time, it's fun when you get a new person in and they do see things that everyone else recognizes so well that they don't even think about them. JOHN: Yeah. I've actually started including that in the onboarding of people. In the first months of their job, I'm saying, "You're still an outsider to our process. What looks weird here? How can you feed that back to us, because we stopped noticing it a year ago?" SAM: Right, because one of the most valuable things that somebody in that role can contribute is that question that makes me, as the experienced insider, go, "I'm listening to the words that are about to come out of my mouth and I realize how ridiculous they sound." JESSICA: Yeah, because when you work with someone long enough, you develop so much shared language and understanding that you don't have to say those ridiculous words out loud. SAM: Right. JESSICA: You're like, "You know that class? Oh, yeah. I'll go change that class again." Thai, earlier you said something about a high-reliability something-something organization. THAI: Yeah. High-reliability organizational research. JESSICA: Oh, that. Yes. 
Can you distinguish between reliability and resilience? THAI: I don't recall how they formally define it in the research, but to me, the difference between robustness and reliability and resilience is really that resilience encompasses the ability to continually adapt -- that there is some capacity remaining to be able to adapt to change -- whereas reliability might be consistently performing within the same state, given the same inputs, continually being able to do this. A simple example might be: if you throw a ball to me at this speed in this way, I'm always going to be able to catch it like that, but I may not have the capacity to adapt if you throw faster or slower or left or right. And then robustness, I typically think of as being able to survive certain inputs but not necessarily being able to adapt to them and respond differently. I think about that a lot in physical materials. A building might be robust and be able to survive an earthquake, but it certainly doesn't adapt. A bank vault can survive someone trying to break in but need repair. It still fulfilled part of its function, but there's some other thing that it maybe didn't do. JESSICA: And if you wanted to use this as an apartment, it would be really hard to bring in plumbing. THAI: Right, yeah. There's no easy way to adapt that. AVDI: Does that mean if a building is constructed in such a way that it has really large chambers that are then broken up with easily replaced... I don't know what the construction terms are, but walls that you can easily tear down and build new ones that aren't structural -- does that mean that the building does have higher resilience, so you can repurpose it from a business to apartments? JESSICA: The building plus the humans in that system has high resilience. SAM: Please welcome to the show, Avdi Grimm. AVDI: Oh, yeah, hi. I'm arriving late. I really appreciate that the show has the resilience that you add me to the panel after the fact. Sam, you were demonstrating that as a human in the system, you have the resilience to welcome me, and thank you. JESSICA: And this show has the robustness of many hosts, so chances are at least a couple of us show up every day, although it's not perfect. And we try to fix things by just adding more hosts, but it turns out that it actually requires coordination. SAM: Yeah. That sort of increases the bystander effect. AVDI: I'm curious, in the EMT context, what does resilience look like? What does being prepared for unexpected events look like? THAI: For me, I think that experience was a lot of acknowledgment that we don't necessarily know what the next thing is going to be. That's the kind of literal definition of the job when you work for a service where people essentially call you when they don't know what to do. When someone calls 911, they're essentially communicating that they have reached the end of their capacity to adapt and they need intervention, but you can't know in advance what sort of intervention it's going to be. So that looks like, I think, a combination of specialist training -- how do you put in IVs, how do you give medication -- but also generalist training in how do you know when to just get help, how do you know where to get help, what resources are available, things like that. AVDI: Is learning how to evaluate a situation part of that? THAI: I would definitely say so. I think that's true in a lot of expert fields. 
If you've ever had the experience of fighting something or having logs roll by and someone looks over your shoulder and is like, "Oh, it's that" -- that's the lens of their expertise showing them that. I think that over time, in any field, we do develop that ability to kind of size up the situation. In the textbooks, that is actually one of the steps that they talk about when you get certified. It's actually a step they call scene size-up, which is used to try to get a general impression of what is it that I'm walking into. Is it safe, so I should be going in? Is it dangerous, so I need to not walk into it and get other people? Things like that. AVDI: What does a scene size-up look like in software? THAI: It depends a lot on the current mode of operation. If I'm in a sprint planning meeting, scene size-up is very different than in an emergency. If it's an emergency, it's trying to figure out what is ultimately going on, how bad is it, what might the consequences be of a given failure mode? JESSICA: So in software, by emergency, we're thinking like production incidents. THAI: Yeah, absolutely. It could be something like that or again, if it's a different mode of operation, I might be sizing up how accessible our software is today versus where we want it to be. JESSICA: When you asked that question, Avdi, my initial response went to there's tools for code archaeology and there's tools to see which classes change the most and code quality metrics and blah-blah-blah-blah-blah-blah, and then I kind of mentally hit myself because most of the relevant part of the scene is not in the code. The code is a great reference and it's awesome to have the power to be able to go there and find out what the system literally does in that situation, but the hard part is seeing who cares. SAM: You mean like users and stuff? JESSICA: Most especially within the organization, like to find the ownership of the different systems -- and also users -- but who's going to be affected by a change in each part and who's going to get mad at you. SAM: Sorry. Apparently, I am having a snarky, pithy kind of day. What I meant by that was it seems like as people who work in software, what we're doing right now is we're all focusing on the aspects of our job that are immediate and relevant to us, like how we organize code, how we organize ourselves in response to incidents, and it feels like we're not really talking about the people that our systems are for. JESSICA: Because you can look at resilience as not doing the same thing. Like Thai said, reliability is about doing the same thing over and over, and resilience is more about continuing to be useful under varying circumstances. SAM: Yeah, and you can still have that second conversation about the organization itself and the people that it serves. JESSICA: Thai, since RE-deploy.io last August -- not long after that, you started your Resilience Roundup newsletter -- what did you learn? THAI: Primarily how much more I have to learn, how much history there is in these fields, how far back some of these concepts go and how much we can still learn in software and develop some of these ideas. JESSICA: Yeah, because this is a new field and bringing it to software is even newer. That was the first, the inaugural RE-deploy.io, last August in 2018. THAI: And I am hoping there are many more. JESSICA: Yeah, because it was really neat how half the talks were about people and half the talks were about code, and most of them integrated the two one way or another because, as it's said elsewhere, resilience is in the humans. 
THAI: Right. A server sitting in a rack on its own does not have the capacity to adapt. JOHN: In our continuing effort to be more resilient, we noticed that we failed to ask the important question at the beginning of the episode, so now we will do this. What is your superpower and how did you acquire it? THAI: I can say my superpower is just the ability to remain calm in emergencies, whether they are medical or technical or otherwise. I think I just acquired it through the experience of seeing a lot of things go wrong. JESSICA: I imagine in software emergencies, now you get to say, "Well, nobody's dying here." THAI: I think that's definitely a helpful view. SAM: Actually, I'm curious about something you said in the intro, that you started in tech and then you took a break from it and went to do EMT for a while. Did you have in mind coming back to tech at some point or were you just like, "Fuck this. I'm out." THAI: I think at the moment I did it, it was pretty early in my career, so I hadn't been in tech especially long, but I had some opportunity to work with some search and rescue volunteers -- and that required some EMT training to volunteer -- and I got to watch some of those folks work and it was just really intriguing to me. I was more drawn to that, as opposed to saying, "Well, I'm going to leave tech for good," because it was something I was doing on the side or on my own and following along with, still writing code or doing different things, and so I never really, I guess, decided until I left medicine. That was when I was more like, "I'm leaving medicine. I'm going to tech," as opposed to the other way. JESSICA: When you went back into tech, did you ever miss the connection of actually seeing the people that you're helping? THAI: A lot, and that was actually one of the things that drew me to it, that there is an element of -- I think anytime you're working in the physical versus digital world, there is that ability to look and say, "This is what I did," and see the effect, as opposed to, in the digital world, [inaudible] because we are interacting with the world through these representations, so we're always very far removed from our systems. It can be very difficult to see the benefits of our work often. Because of our technology, our users are far removed from us or far removed from our systems. It can be hard at the end of the day to say, "This is what I did." From the outside, if you put a video camera on it, it's like, "You pressed 'G' on a keyboard. Were you playing a game? Typing?" You can't tell, right? It's indistinguishable, whereas with a lot of work in the physical world, you can easily see the result -- did you produce a thing or did you not? Is it well-crafted or is it not? Is someone healthier or are they not? JESSICA: Some people like frontend development because you can actually see something that you make, or you can do LEDs and make cool objects. JOHN: Yeah. I've always felt like there's this weird scale of realness for different types of coding and technical work in general. I always felt like just writing little command line programs wasn't as real as a GUI program just because you can see more of it, and the same with hardware -- if you're messing with a Raspberry Pi or Arduino or something, it feels so much more real because it's right there in the physical world. That's just been like a personal scale that I've sort of felt. Not a value judgment but a sense of it. JESSICA: Thai, what did you do to close that feedback loop? THAI: I'm not really sure. 
I think it depends on the person and the organization. I think there's different ways to do it. As an individual, I think knowing and reminding yourself what the impact of your work is can help with that, if that's a personal need, to see the results of your work. If you feel very far removed from the good effects, then focusing on what your work is allowing, enabling, or who you're serving through it. JESSICA: To kind of ask yourself, what couldn't happen if I wasn't doing this? THAI: Right, or the reverse even -- what bad things might have occurred had we not done this? Maybe the software you work on helps people avoid bad outcomes. JOHN: Yeah. I feel like that's also an interesting way to think about, especially, work in operations or devops, where largely your work is about preventing problems, and it's hard to see problems that have been prevented, and so, how do you make that value visible to people who aren't aware that you did all this work and therefore we had six months without any incidents? JESSICA: Or all-nighter incidents. JOHN: Exactly. THAI: Yeah. I think that's a problem in almost all operations and emergency response. Even in the physical world, there is a whole area of different things [inaudible] that a lot of folks don't think about, and that's the great thing: they don't need to. Most of a given community, whatever locality, doesn't need to worry about this whole system of what happens if you get hurt. You can rely on it fairly well and it succeeds and you don't really have to worry about the intricacies of it, but I think those are more obvious because many of us have that experience -- we see the ambulances or we've been injured ourselves or we've had certain situations. But with the digital world of, like, devops transformations or incident avoidance, it's hard to communicate that to people who don't experience incidents, and maybe that's a value of having incidents. It reminds us that there are these outcomes that we have avoided in the past. JESSICA: I downloaded The Field Guide to Understanding 'Human Error' today on my iPad and I highly recommend it. Just get the sample and read the list of figures. You wouldn't guess it, but go back to the very beginning and read the list of figures because it's just a list of captions and they're all super insightful. For instance, Figure 15.2: Murphy's Law is wrong. What can go wrong usually goes right, and over time, organizations come to think that a safety threat does not exist or is not so bad. THAI: Yup, exactly. That's the notion of the safety boundary, where incidents sometimes remind us that that is a possible outcome, and the research supports that: a lot of these high-consequence areas are working very, very well, so over time, we become overconfident and assume it's sort of intrinsic, that it will continue to work very well. That is when we can actually be more at risk for accidents, but then those accidents help us recalibrate and maybe move back from the safety boundary. JESSICA: That's why the resilience engineering community puts a lot of focus on incident response and studying incident response, because it's one of those times when all of those layers of safety that are preventing incidents are finally exposed. THAI: I definitely agree. I had the privilege of hearing Beth Long's talk yesterday at New Relic and she's been doing some great work with Richard and John and even with the SNAFUcatchers, and that was something that she took a stab at as well: it's not that these folks are pessimists. 
They're not incident-obsessed in the sense that they want this to happen, but it is a way to look at these complex systems and dig into them, because the systems are so complex that we can't understand all of them and we can't keep them all in our heads and we can't predict all these things, but these incidents are a moment in time where we now have kind of a foothold to be able to ask questions and do the investigation that we might not normally have. JESSICA: And the interesting part of the investigation isn't what went wrong. It's what stopped it from getting worse, what stopped this from happening all the time. You can actually see the layers of humans and tools that are keeping the system up. JOHN: Yeah. There was an interesting Twitter thread that I just stumbled across last week about the Challenger disaster, and there was a terminology that they used in there called 'normalization of deviance,' where things are expected to run within, say, 40 degrees and 80 degrees, but you launch at 38 degrees and it's fine, and so you keep going and maybe 35 is okay and then, suddenly, 35 is the new normal. It's the new lower bound, so maybe 34 is okay, and that's fine and fine and fine, then you go for a decade without any problems until you finally hit that actual boundary point, which was 28 degrees for the Challenger disaster, and that's when suddenly you realize that the actual boundary was up at 40 degrees and none of the testing had ever -- JESSICA: And you'd just been lucky. JOHN: Yeah, exactly. He also brought up the Columbia disaster, where there had been 180 launches without foam shedding being a problem, and so that just becomes, "Oh, yeah. It's just foam shedding," but that 118th time, it cracked the wing and people died. I'm trying to use that terminology to think about how that happens in my operations. THAI: That's a really good point. I think that happens in a lot of places. I think a common one, outside of operations, in our daily lives -- in most cities, at least I know this is true back home in Vegas -- is driving on the freeway. If you do the speed limit, you are the slowest thing on the road. The speed limit on the sign is 65, but the normal speed you drive is really like 70, depending on where you are, and then you're the slow one if everyone is going faster, like 20 over, and you're going the "normal" 10 over, and so it really can change what we expect. SAM: Some of that is cultural, though, because here in Portland, sometimes I have to yell at people for not getting up to the speed limit, and let's not talk about following distance, people. JESSICA: And then there's how many cars are going to run the red light. JOHN: Don't let me get through it again. THAI: I think that's a good point, that it is cultural, because that's what happens at organizations. That normalization is a cultural moment. It's not that some engineer sat down and did the math and said, "This is a fact of our material science or something, that we could push this envelope." When it becomes that normalization of deviance, it is purely cultural. JESSICA: Sometimes that's okay, because what matters is that everyone's doing it the same -- like the cars at the green light know to wait until a couple cars have gone through before they start, in certain cities -- but then when you come in from the outside, like that developer who is like, "I just changed this comment," and people are like, "Oh, no. You don't know how important that comment was." THAI: I think that touches on a great point, which is that it works for a time. 
It works in certain situations. If everyone has that shared understanding, it can work, but there is still a boundary there, so a certain number of people can run the red light sometimes, and the reason it can continue to occur is because it works most of the time. JESSICA: Yeah, and when it doesn't work, the ambulance comes. SAM: I think there's a couple of layers to this. First off, there's realizing that there is a cultural difference at all between one place and another. Second off, there's signposting that. Like, a couple years ago, a Bloomberg recruiter tried to approach me and pointed me at their website, and they have this quiz on the website that's trying to figure out if you're going to be a good person at Bloomberg, and as you're looking at the questions, there is a little progress bar ticking across, showing that time is elapsing and your time to answer this question is going to run out. That was an extremely effective way of communicating what the culture at this place was going to be like, and I looked at that quiz and I was like, "No, this is not for me." And then there's another layer beyond that, which is negotiating whether that's something you can change. I feel like most organizations never even get to that second point, so I guess, well done, Bloomberg. JESSICA: Yeah, at least they let you know. I also looked at the list of tables in The Field Guide to Understanding 'Human Error' and there's only a few, but Table 14.1 is about stress coming from a mismatch between problem demands and coping resources, and one of these is organizational constraints and pressures. That can be a source of stress, but it says that the coping reaction to that is organizational awareness of such pressures and constraints. When the organization acknowledges, "Look, yo. People are going to run the red light. Watch out for it," then you can still proceed. SAM: Does that normalize that practice, though? JESSICA: It explicitly, I think, makes a place for change. It says, "We would like to change this, maybe." When you acknowledge it, then you're also acknowledging that this isn't perfect. THAI: That's a good point about the explicitness, because if it's not explicit, then the tradeoffs that you make in service of some of these things are made at the sharp end over and over, as Sidney Dekker reminds us often. If the organization doesn't make this explicit, then each person has to decide for themselves, in each case, over and over, what they are serving, but if it is explicit, then some of the decision making at least can be easier, and if the organization says that this is a priority, then between A and B, I know what to do. But if they don't, and I think that it's A and you think that it's B, maybe each time I make a decision toward one way or the other, and each time I'm having to face that and think about it in, potentially, a situation that doesn't really have the room for me to be thinking about that. JOHN: Yeah, there's that extra cognitive load to sort through that, and also, I think some of these things can become so internalized that it's actually very hard to discuss them. When you have it written down on a piece of paper -- "Oh, yeah. We always ignore this alert for the first 10 minutes because it's very flappy" -- if that's written down, someone could come along and say, "Wait, wait, wait. What the heck are we doing here?" whereas if it's just in the back of everyone's mind, you maybe never discuss it. 
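[Editor's note: John's "we always ignore this alert for the first 10 minutes because it's flappy" example, sketched as Python. This is an illustration, not anything the team described building; the class name and the page_oncall hook are hypothetical. The point is only that once the grace period lives in a reviewable artifact, a newcomer can ask "wait, why 10 minutes?" instead of never hearing about it.

from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class FlapTolerantAlert:
    """An alert whose 'ignore it for the first N minutes' rule is written down."""
    name: str
    grace_period: timedelta  # e.g. timedelta(minutes=10): this probe is known to flap
    failing_since: Optional[datetime] = None

    def observe(self, healthy: bool, now: datetime) -> bool:
        """Return True only once the check has been failing for the whole grace period."""
        if healthy:
            self.failing_since = None
            return False
        if self.failing_since is None:
            self.failing_since = now
        return now - self.failing_since >= self.grace_period

# Usage sketch: page only after 10 continuous minutes of failure.
# alert = FlapTolerantAlert("db-replica-lag", grace_period=timedelta(minutes=10))
# if alert.observe(healthy=False, now=datetime.now()):
#     page_oncall()  # hypothetical pager hook
]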
JESSICA: And that's what your users are doing with your software. "Yeah, we always tab to that box. We always put Jane Doe in this field." So how do you find out what's really going on at the sharp end? THAI: I think it's important to ask questions, but the questions have to be sort of the right questions in the right environment. One thing I learned in emergency medicine is that the right questions can be really helpful. You'll have people who call 911 and you'll show up and you'll basically ask, "So what's wrong?" and they will literally tell you, "I'm sick." They will literally choose the words, "I am sick," and of course, as a practitioner, you have to avoid saying, "Of course you are." There are questions that are kind of standard that we can keep in our toolbox. One of them is, "What made you call 911 tonight?" What's different? But there's also a lot of research into a technique called cognitive interviewing -- techniques that are often used as well to help with this. SAM: Tell us more about that. THAI: These are a series of techniques that were developed, I believe, in the mid-80s, originally by two psychologists who started with this idea, I think, of using the basis of how memory works to figure out how we can get better information from people who witnessed things, whether that was for police or otherwise. We know they saw it. We know they have it somewhere in their head, and they came up with what they call mnemonics, and it's kind of funny because oftentimes, when you look at them, they seem really obvious, and one of the key ones is to ask an open-ended question and don't interrupt. JOHN: I actually just saw some data on this about doctors when they were talking to patients. Often, they'll sit down and say, "So why are you in the office today?" and then they'll say something like, "I'm sick," or whatever, and they found the doctors would interrupt after like 18 seconds, like, "No, no, no. Just tell me about your leg," or whatever, and they found that if they would just let the patient talk for another 30 or 40 seconds, all the information that they needed would come out, and just having the patience to let that come out -- which would probably give you more information than you were originally asking for -- was really important. Also, it makes the patients feel a lot more satisfied with their care. THAI: I think you touched on the other part of that, which is getting the information that you really want, that you may not even know to ask about, by giving someone that room to tell you about it. One of the other techniques in cognitive interviewing is to ask the person, and just to encourage them, to tell you everything about the event, whether or not they think it's important, because often we can set up these sorts of interview contexts where, if you're asking a very pointed question, I'm trying to give you the answer to the thing you want, or the person doing the interviewing is really trying to dig at a very precise thing, but there's all this potentially relevant information that you may not know to ask about or I may not know to report. But if you ask people to report everything and give them that space, oftentimes you'll learn a lot of those other things. It's like, "Oh, I'm so glad that came up." What I find interesting about it is that a lot of these techniques are actually supported by research. There's a lot of debate about different population groups and whether it's effective, because some of this stuff has to be suitable for court. 
Is it okay for children to be interviewed that way? Is it okay for different populations? Is it effective for people who think in different ways, who might be [inaudible] or typical? There are all these bodies of research. You know, is it adversarial? Should it only be used in certain contexts? And most of the research seems to indicate that it is useful, and since it is not adversarial, it can fit in a lot of cases. It's interesting that a lot of the research comes from Britain. I don't know at what level, but it's essentially the endorsed method of interviewing for the British police, so a lot of the studies come from those sorts of contexts because they're always doing it. They have this context where they're always using this interview and they have all these people to put through training, so they get to re-examine this process quite a bit. JOHN: Note to self: if I'm going to get arrested, do it in Britain. THAI: If you look at some of the research, some of the complaints that officers have in using these techniques is that they feel pressured to do the whole cognitive interview, and that's actually one of the things that later research tries to back off from, because really, there's not a single cognitive interview. These are a toolbox of techniques and it really emphasizes the human aspect. They eventually made the enhanced cognitive interview, and "enhanced" makes it sound like, "We're going to add all these techniques," but basically, the techniques were like, "Adapt to the situation. Do whatever makes sense in the moment," so if there is a technique that doesn't make sense with that person, don't use it, or if you think one might be effective, then use that and adapt to the person you're interviewing. It was kind of funny that they had to come back in the early 90s and be like, "Pay attention to the situation. Pay attention to the person that you are interviewing and make a point to have some sort of rapport with them and pay attention to what they're saying," because they were finding that in a lot of cases, people were now using these tools as a checklist and getting frustrated if they couldn't get through all of them. "Oh, I didn't use Technique 3, so I didn't do the whole thing, and if I can't do the whole thing, there's never time to do it, so I won't use --" JESSICA: Oh, so like design patterns in code? JOHN: Throw them all in. THAI: Use them all, yes. JESSICA: Yes. There's Level 1: can we use the technique? Level 2: should you use the technique? THAI: Yeah, exactly. I know the NTSB teaches these techniques as well for accident investigation. SAM: It's like cognitive interviewing Pokémon. JESSICA: Yeah. SAM: You got to catch them all. THAI: That was actually something they had to continually review and emphasize in later research: that you don't have to use them all. JESSICA: Yeah, and you mentioned that people were using it as a checklist. Checklists can be useful for safety to a point, but John Allspaw was talking to me about this the other day. There's a point at which checklists make you less safe because you're not paying attention the same way. THAI: Right. 
If the checklist gives you the idea that you've done all the right things, or the checklist is a mask over a large amount of complexity... I think a lot of the more popular knowledge on this comes from Atul Gawande's work in The Checklist Manifesto, and he talks about that, where checklists are a good fit for those things that can be kind of encompassed within a relatively short list of discrete tasks -- the pre-surgical things, some of the flight things -- but things that mask a lot of complexity maybe aren't a good fit. You can't have a checklist item that says, "Did you make sure all the things worked?" People will just say yes and check it. JESSICA: Yeah, and it's the same with software, because you can write reliable software and you can add layers of Kubernetes and restarting and back pressure and blah-de-blah-de-blah and you can get robustness. You can get software that stays up and you can get software that does the same thing, but if you want resilience, you can't get that with just software. SAM: So a theme I'm hearing repeated again and again in the last couple of minutes is that you can teach a technique, and in the process of teaching a technique, it tends to get formalized so that you can teach other teachers how to teach it. So as you're learning the technique, it helps to go through that formal process of like, "There's this, this, this and this," and then at some point, you have to sit with it, integrate it, think about it and figure out how to adapt it and actually use it effectively. You can't just spew out the thing and you're done. JOHN: Yeah. I think that's one of the steps of the learning ladder -- conscious competence versus unconscious competence -- where it's like you learn the thing but you have to just explicitly follow these steps. You always copy and paste this line into the terminal to fix problem B, and then slowly work up to the point of actually understanding what's going on and being able to adapt that into actually fixing your problem or understanding why the problem is happening in the first place. THAI: Absolutely, and I think that's something that's missing in a lot of software incident response. We have some really well-trained folks who are really good at their craft of software, but oftentimes, we put them in a situation and we never really tell them what they should do at the moment the pager goes off. Oftentimes, there's a lot of focus with senior engineers on making sure they know this or that, but there's no more Go or C or other code to teach them, and often, giving them even that initial process of what to do from the moment the pager goes off gets overlooked. I think that's where some of the things like the ICS -- the incident command system that FEMA uses and that PagerDuty and some others have adapted -- become helpful, because they give you those initial steps of "this is what I do as a responder" and then allow you to build off that. But I think we often skip that step in software, so we don't give people that opportunity very easily to build those later steps. JESSICA: Sweet. JOHN: At the end of the show, we like to go into reflections, which is basically talking about the things that have come to us, the new ideas, the interesting things that we're going to chew on for a while after the show. I wanted to see what you all had for reflections. JESSICA: I have a short one. 
There was one phrase that stood out early that I wanted to come back to, which was that part of resilience is how do you know where to get help, when to ask for help, and who to ask, because it just gives you a lot of options. JOHN: I just think the cognitive interviewing idea -- I'm going to read some of the links that have been posted to that information because I think that will be useful just in general, as far as being able to talk to people and get more information out of them, not necessarily in an incident context but just in general. SAM: I think in addition to the observation I made a few moments ago about actually having to apply things that you learned and think about them, the other thing that really sticks out for me in this conversation is this idea of being explicit and deliberate about thinking about process and communication as a thing that you can get better at, that you can work on, that you can talk about. Appropriate or not, I'm going to go ahead and do it. I'm going to plug a RailsConf track that I am curating. If you happen to be planning to go to RailsConf already, it would be great if you would go to the 'Working With Other Humans' track. It's going to be an all-day thing with some amazing talks about how to think about emotion as a state machine and how not to be a jerk and how to talk to other people and maybe even how to be more effective at management. It's going to be a lot of fun. I'm looking forward to it. JOHN: I'm so there. SAM: RailsConf this year is April 29th to May 1st in Minneapolis, Minnesota. JESSICA: Thai, do you have a reflection or some? THAI: I don't think I'm going to forget the phrase now -- cognitive interviewing Pokémon. SAM: I win! THAI: Everywhere I can throw that phrase in, I'm going with it. JESSICA: Now we're going to have to give them names. THAI: I'm going to think a lot more about some of the things that Sam brought up about how we can work with other people in these different ways, and just even thinking already about emotions as a state machine, I think, is really interesting. It's interesting to see different lenses be applied. It feels like a lot of the same thing: applying emergency medicine to software or applying software lenses to these other things and making these cross-domain links, so that they can make sense to us or we can improve either side of it. JOHN: If you're interested in those kinds of interesting cross-area metaphors, I also have a talk that I've been giving for a couple years now called 'Hacking Your Emotional API.' I think I've done it about 13 times now. I just got back from RubyConf Australia where I did it, and you can see a video at EmotionalAPI.com. JESSICA: Sweet. Thai, thank you so much for joining us today. THAI: Thanks for having me, folks. JESSICA: If you like this conversation, you should come to Patreon.com and give a dollar or something, at least, to GreaterThanCode.com. Actually, you should give more than a dollar, but just a dollar will get you an invite to our private Slack channel, and then you can come to the Greater Than Code Slack and you can talk with us and our guests, and everyone is very nice to each other and we have channels like Random and Overheard and 'Things I want to tweet but won't,' so you should join us.