REIN: Welcome to episode 186 of Greater Than Code. I am your co-host, Rein Henrichs, and I am here with my friend, Jamey Hampton. JAMEY: Thank you, Rein. And I'm here with my friend, John Sawers. JOHN: Thanks, Jamey. And I'm here with our guest, Emily Gorcenski. Emily has over 10 years of experience in scientific computing and engineering research and development. She has a background in mathematical analysis, with a focus on probability theory and numerical analysis. She's currently working in Python, though she has a background that includes C#/.NET, Unity, SQL and MATLAB. In addition, she has experience in statistics and experimental design, and has served as Principal Investigator in clinical research projects. Emily was also named as one of 2018's most influential feminists by Bitch Magazine for her data activism shining a light on far-right violence with her First Vigil project. Welcome to the show, Emily. Welcome back to the show. EMILY: Yeah, thank you very much. Thanks for having me again. JAMEY: Ooh, if you've already been on this show, then we're about to ask you a question that you've already answered. EMILY: That's okay. Maybe my answer has changed. It's been a couple of years, so a lot has happened. JAMEY: Well, then I'm excited to ask you again, what is your superpower and how did you acquire it? EMILY: Oh, this is totally a different answer than last time, because my superpower now is hunting Nazis. And I acquired it, I don't quite know how; it evolved out of a need to protect myself, my community, and the world and society at large. REIN: Maybe we could say that there were some formative events. EMILY: There were a couple of those, yeah, along the way. Yep. JAMEY: Would you say that it's a practice-makes-perfect kind of skill? EMILY: Practice makes perfect, but it's a hard skill because you really don't want to make a mistake when you're tracking Nazis, because that mistake can be very, very bad. But thankfully, I have not made any of those, and I've had a lot of support from a wonderful legal team and antifascist community and Twitter followers and supporters and general freethinking people who love equality and peace throughout the world. REIN: It seems like there are two kinds of mistakes and they are both pretty bad, false positives and false negatives. EMILY: Yeah, that's sort of the essence of any sort of data work. You can be very good at identifying things, but of course, you can get things wrong, and you can get things wrong by missing somebody that you should have caught or by catching somebody who wasn't actually a Nazi at all. And both can be very devastating. Part of what we do is expose people who are wanting to do harm, and if you get something wrong, you can really create a bad day for somebody. And we want to avoid that. And so, the burden of proof is very high. The burden of quality in doing this is very high. And so, this is not a game. This is not something that myself and other activists do for fun. There's also sort of the flip side, which is, "Oh, why don't you just not do this? Why don't you not hunt Nazis? Why don't you not bother with what they're doing? Who cares about Nazis?" And unfortunately, they go out and they create lots of violence. There have been people who have gone on to commit terror attacks that I was not able to identify before they went and committed their terror attack. And things like that are really, really worrisome. 
The Robert Bowers incident with the Tree of Life shooting in Pittsburgh in 2018, he was somebody that was on my radar. He was on the radar of one of my very good friends, Molly, who goes by @socialistdogmom on Twitter, who you all might know. He was somebody that we knew, and he just sort of slipped under the radar. And the next thing you know, he shot up a synagogue. And so, those are very scary things, those are very difficult things to sort of have to have on your conscience. And sometimes people might say, "Well, it's easier if you don't try," but I can't bring myself to do that. REIN: Time to do my very natural transition from hunting Nazis to data science? Let me try another one. EMILY: [Laughs] REIN: I mean, we can keep talking about this, it's fascinating. But I was just wondering, how did your experience in data science and all of these things help you to do this work? EMILY: Well, I think that when you look at data, you have to have a keen eye and an open mind, because data tells you things about the universe. But the way that it tells you these things about the universe is colored by your ability to measure, your ability to sample, your ability to understand the domain of where you are. And the data point that you have is not necessarily a one-to-one reflection of the reality of the world underneath it; it's colored by your many biases. And these can be personal biases, but they can also be biases introduced in the process of capturing data, assessing data, measuring things and transforming things. And so that's part of this process. When you hunt Nazis, you realize that you're looking at a small sampling of who that person is. And it's the same if you're doing that or if you're working on big data for a software application, or if you're an epidemiologist running tests to detect virus cases. JOHN: So for those of us that maybe are lacking a little bit of the backstory about how you ended up with this profession/avocation, can you talk about what led you to it? Not only using the skills that you have to do this work, but also, something is clearly motivating you very greatly to do this work. EMILY: Well, I'm from Charlottesville, Virginia, which, as many people probably remember, was the site of the neo-Nazi rally called Unite the Right in 2017. This was the same rally about which President Trump said that there were fine people on both sides. There was a terror attack that happened at it. There was a torchlight rally that happened before. And so, it was a very dramatic scene. And it really shocked a lot of people in America to really be confronted with the racism and the violence that exists in our society. And as somebody who is trans and who's queer and who's mixed race and who's a woman and who's very outspoken on the Internet, I was targeted throughout this event. And this is my way of fighting back. My training as somebody who is good at looking at data and asking the questions behind the questions, that gave me a natural set of skills to sort of identify and track what else is going on within these movements that took so many people by surprise. REIN: It's interesting because you have a very unique set of skills. You have the data science background, you can also manage and build projects. You can put together websites that are highly functional and things like these. You did a whole bunch of stuff using a whole bunch of different skill sets. And there aren't maybe a lot of Emilys who could do that, I think, is what I'm getting at. 
EMILY: I hope that there are more out there than we realize. And I think that one of the things about being a data scientist is that there are no data science problems out there. There are problems that we use data science to solve. Nobody is sitting there like, "Ah, if only I had a neural network, my problems would be solved, because I just want to sell neural networks." That doesn't exist. So as a data scientist, you have to be good at understanding other domains. So if you're working in retail, you need to understand things about, maybe it's pricing or maybe it's logistics or maybe it's supply chain. And if you're doing data science for computer vision, you have to know math, you have to know things about physics and images and how cameras work and all sorts of things. So, the core skill of being a data scientist is being able to understand the problem domain for the problem that you're actually trying to solve. And so, that means that data scientists have to be able to be interdisciplinary in their approaches to doing things. If you look at the sort of history of projects in my career, I've worked on retail problems, I've managed clinical trials for medical devices, I've done aeronautical engineering, I've done rehabilitative medicine for stroke rehab, all sorts of stuff. And the common thread is this ability to look at problems with an interdisciplinary lens and to use tech skills and knowledge to solve the problems that the experts in that field don't have the skills to solve. REIN: I think not just that, but also to be able to relate the collection of data to the goals and needs of a community. It's not just sort of collecting data for the sake of having all these indicators we can put on a dashboard. It's to serve some purpose. EMILY: Absolutely right. So with anything with data, it's got to be driven by the use case. It's got to be driven by what it is that you're trying to build. And one of the things that, in industry, not in activism so much, but in industry, one of the challenges that I see is that a lot of companies and a lot of people cut the boundaries of a data science project far too small. And so, people are just trying to build a model and just trying to ship a model, or they're just trying to derive an insight. I hate the term 'insight'. Really, what you want to be asking is: what is it that your users need? How can you bring them value? How can you help them? How can you make their world better? And how can you use the data to do that? And so, that's kind of like the central thing that we do. JOHN: I haven't really done any data science or any AI/ML type of work, but I've always been sort of interested in it and curious to try out some of the tools to learn more about that. But I always find myself getting stuck at the start where I'm like, "Well, I'm not currently aware of any problems in my domain that are directly applicable to being solved by this." And probably that's because I don't understand the technology well enough to realize how it would apply in the domain. Like you were saying, you have to have both of those pieces before you can realize how it's actually going to become useful. EMILY: Yeah, absolutely. Sometimes I like to get really metaphysical when I talk about data. Like, really philosophical about it. Data is a measurement of the universe at a point in time. And so, I really try to think big. It's not like a measure of how many things you bought or how many letters are on the page or whatever. It's a measurement of the universe. 
And when you build a model, a model is a way of trying to understand the universe. You're trying to extrapolate and understand how the universe works in some very limited context, because only the universe is a perfect simulation of itself. And you may never know or be able to know why some customer went into a store and decided to buy the garlic pasta sauce versus the pesto sauce. Maybe it was just down to however they were feeling at that time. But the universe makes that happen. And the best that we can do is try to understand the boundaries of what we're trying to do and model, and use the measurements that we have of that process to make better understandings of it. So if you think more philosophically of, "What is it that I have?" And if I think of the things that I have as measurements of how the universe was at some point in time, then you can start to think about, "Okay, how did it get there and what did it do once it got there? And how do those things all interact?" And so sometimes, like I said earlier in the show, it's: what is the question behind the question? And that's the thing that a good data scientist should really be driving towards. What are you really trying to answer? JOHN: I really like that framing because I feel like by saying, "Universe at a point in time and we measured this," you're also very explicitly saying, "Well, there was the whole rest of the universe going on at the same time that we didn't measure." And so it very clearly calls out the fact that you were measuring a specific thing, which could be biased, or could be measured in not quite the right way, or might not be the right thing to be measured. And you're trying to get at something that's just adjacent to what was measured, but it highlights that shortfall. And so, you can be aware of it as the data goes through the process. EMILY: Yeah, absolutely. And there's noise that you have to consider. There's other factors, confounding factors, and you may not have the ability to measure those. And sometimes these things are really important. You want to use this data for various things, so you need to have accurate measurements. And so, we need to build models. Maybe you can't measure directly what it is that you need to have data on. So you have to measure something else. Then you have to build a model of how the thing that you're actually measuring maps to the thing that you want to measure, and so on and so forth. One example that I think is fairly useful is, let's say that you build a rocket, and you need to measure the temperature of the exhaust plume. Because the temperature of the exhaust plume, if it gets too hot, your rocket blows up. If it gets too low, your rocket falls out of the sky. How do you measure that temperature? You can't stick a thermometer in there, because it'll just melt away. So you have to measure something else. And that something else is not going to tell you exactly what that exhaust plume temperature is. But maybe if you do enough modeling, if you do enough research, you can figure out how the exhaust plume maps to the thing that you're actually measuring. And so what we do in data science is just solving that same problem in infinitely many different contexts. JAMEY: This is super interesting to me, and maybe this is just me being kind of naive in the way that I thought about this before. But when we talk about data science or computer science or whatever, I'm not usually thinking explicitly about science. And what you're describing is science. 
I think that's really interesting and really beautiful, because I think science is really beautiful. And this idea of dedicating yourself to understanding things around you better is what I find beautiful in science, and it's totally what you're describing too. So, I really love that. EMILY: Yeah. I think it's about hypotheses. It's about exploration. And you need to be able to try many things. You need to be willing to be wrong, to prove yourself wrong. All of those things, that's that whole process of introspection and interrogation that drives the scientific method. And I think sometimes we do forget about that when we just look at narrow scopes of problems of building a better API or building a better user experience or something like that. There's science that underpins everything. So there's infinite opportunities. It's just a matter of getting the right energy and people to go and dig into them. And whether it's worth it to do that, because it's not always worth it to do that. JAMEY: You said something about being willing to be wrong as part of the process. And I guess I'm interested in maybe diving a little bit farther into that, if you're willing to. What does it feel like to find out that your assumptions about something were wrong? And how does that spur your next action? EMILY: I think that being wrong is the best thing, because when you're wrong, you learn something. And it's a very measurable way to improve and it's a way to get you closer to the destination. If you make a hypothesis and you're wrong, you rule something out. It's not wasted effort, even though it might not feel like you're getting closer to a solution. You're definitely narrowing the universe of possibilities of those solutions. And so, what you really want to be able to do is be wrong at scale so that you can rule out many things as quickly as possible. So the best thing about data science is you get to be wrong all the time, and it's good. It can be a little bit hard at first because you're like, "Oh, I really wanted that to work." But you learn to sort of separate the notion that you must be driving towards a goal from the notion that movement towards that goal is equal to success. And then you start sort of inverting your way of thinking. And once you start inverting your way of thinking like that, it opens up this whole world of how to see problems and how to see your data, how to see the universe, how to see your systems, whatever it might be. And then that allows you to sort of dive closer to the truth. And it also means that you're much less likely to have your heart broken down the road, because if you make three steps in the right direction, you may not actually be going in the right direction because you might just have not yet hit the blocker. So, it eliminates some sort of sense of false hope at the same time. REIN: There's a subtle thing about the relationship between failure and learning that I think is really interesting. I don't think this disagrees with anything you said. I think it probably reinforces it. But that is that people sort of think that once you fail, that's an opportunity to learn something. And that's true. But what's also true is that being able to say that you failed means you already learned something. You necessarily have some new knowledge that allows you to say that already. EMILY: Yeah, totally. It's funny because it's actually really hard to get into this mindset. I live in Berlin now and I'm trying to learn German. 
One of the most frustrating things is when I'm trying to speak German or write in German and somebody corrects me on something small, where it's like, yes, you're helping me learn a thing. But also, it's just really hard to take that feedback when you're trying to have the freedom to experiment. So part of this whole being wrong is needing to have this freedom to experiment in a world free of judgment. And so, sometimes it can be very, very difficult to do that. And like, I've flipped out on Twitter about this: I'll write something in German and somebody will be like, "Oh, you used that wrong," or like, "You forgot this," or whatever. And I'm like, "Man, haven't you seen how I tweet in English? I'm full of typos and stuff like that. Give me a break, all right. This is my third language. Come on." REIN: One of the hardest things to do in science, but also in general, is to translate an unknown unknown into something else. Donald Rumsfeld was a huge piece of shit, but possibly the one good thing that he gave the world was this category of known knowns, unknown unknowns, and so on. So an unknown unknown isn't just something you don't know, it's something that you can't learn. There's no process. There's no method by which you can learn this thing except by accident, because you don't know where to look. One of the ways you can translate an unknown unknown into something known is to fail in the right way. It is one of the very few things that will actually do that. EMILY: Yeah. I think it's super important. And if we think in the technology context, the folks who do a lot of work on site reliability engineering have really latched onto this. One of the patterns of site reliability engineering is designing your systems to minimize your time to recovery, instead of designing your systems to avoid failure, because you're going to come up against those things. You can't predict everything, you can't know everything. And you're going to have a situation where something is going to break for the first time and there is no way of you being able to predict it. And so, you want to be able to optimize for recovering from a failed state versus avoiding the failed state. And there's a lot of parallels between the SRE world and that way of thinking and what we see going on in the world right now. For example, with the coronavirus crisis, where some people and countries and leaders are trying to pretend like a problem doesn't exist, and some are trying to prevent the problem from existing, and some are trying to optimize the time to recover from the problem, knowing that shit happens and bad things happen and the world is full of these things. And if you look at the places that are doing better than others, it's the ones that have known that this is going to suck and have built in safety systems to get their people through it. Not every place is perfect, but I think there are a lot of lessons that we can learn as a society from this. But there's also people who do this for a living. And yeah, maybe it's just because they make computers go computer [gober]. But I think that these are important philosophical lessons, as much as they are engineering lessons or scientific lessons. REIN: Totally agree with you. I would maybe go slightly further to say that it's not just optimizing for the time between failures, or the time to recover, rather; it's figuring out what it is that makes that happen. And that is not done by making MTTR a KPI. What actually does this work is figuring out what people do when they're recovering these systems, and making that better. 
EMILY: Absolutely. It's all about the process. It's all about minimizing touches, so to speak. One of the jobs that I had in my past was I was a receiving manager for a Barnes & Noble bookstore. And the thing that we were measured on was how many boxes we would open in a day. And it was a terrible thing to measure on, because some boxes had literally one book in them and some boxes had 40 books. And some boxes had 40 books, but they were all cookbooks and they were all easy to sort. And some boxes had 40 books, but they were all over the place and they took five times longer. And what you started to realize was your productivity was dependent on how many times you touched a single book between it coming out of the box and going on the shelf. And so, experienced receiving managers started to learn and develop processes for doing their job that minimized touches. And it drove corporate crazy, because they're tracking box counts at an hourly rate and you're like, "This doesn't make any sense. What you should care about is product getting onto the shelf. And I'm optimizing product getting onto the shelf. And you're here micromanaging the boxes as if they're all a homogenous thing." And so, it's the same thing at any other point. If you're building engineering systems, you want to minimize the amount of things that your engineers have to touch to recover a system. REIN: I would only add to that: when your engineers do have to touch stuff, make the touches they can do meaningful and useful; design things so that those touches are likely to help the system. EMILY: Yup. JAMEY: I think that's a really interesting point about not realizing what it is you're actually tracking. I worked in a grocery store for a long time and we had a metric that we were judged on: how many items per minute you scanned. When you were scanning your items, it would track it. But when you weren't scanning any items because you didn't have anyone in your line, you had to lock your computer. And if it was locked, then it wasn't counting your items per minute. And so, it essentially became -- and they would have these charts with who has the best IPMs. And it was essentially just a way to track who was the best at not forgetting to lock their computer in between customers. So I guess I wonder, what are the methods for determining that you're going to be tracking the wrong thing, maybe before you spend a lot of time tracking the wrong thing? Because I think items per minute does make sense to me. And I don't think it would have been easy to kind of notice that that's how it was playing out until it was already happening. And I wonder if there's some sort of model for thinking about things in that way. EMILY: When I see things like that, I often ask, "What is the thing that you're actually trying to answer?" And oftentimes, they don't actually care how many items you scan. They're trying to solve a different problem than what the number is. And that is: how few people can they pay? Or maybe I'm just letting my anticapitalism show. Maybe it's something like, do we have the right staffing at the right times of day, or something like that. And so, one of the things that I often see as an anti-pattern is proxy measurements: measuring something that's easy to measure, like IPM, scanning things, that's easy to measure, as a proxy for something that's more difficult to measure. 
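A minimal sketch of the kind of check this implies: before optimizing an easy-to-measure proxy, test how well it actually tracks the outcome you care about. The data and variable names below are synthetic and purely illustrative, not anything from the episode.

```python
# Hypothetical check: does an easy-to-measure proxy (items scanned per minute,
# boxes opened per hour) actually track the outcome the organization cares about?
# All data here is synthetic, generated just to illustrate the idea.
import random
import statistics

random.seed(1)

# Imagined historical observations of the outcome we actually care about
# (say, customers served per hour) ...
outcome = [random.gauss(50, 10) for _ in range(200)]
# ... and a proxy that only partially reflects it, plus noise.
proxy = [0.8 * x + random.gauss(0, 8) for x in outcome]

# Pearson correlation (statistics.correlation requires Python 3.10+).
r = statistics.correlation(proxy, outcome)
print(f"correlation between proxy and outcome: {r:.2f}")

# A weak correlation is a hint that optimizing the dashboard number
# may not move the thing you actually want to improve.
```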
And if you have that situation, which you do quite often, like the rocket example that I gave earlier in the show, you have to have a good understanding of how those things correlate, which means that you have to do experimentation. You have to have statistical thinking to prove that this number correlates well with the thing that you're actually trying to do. And so, sometimes it can actually be the case that maybe that metric does do both things. Maybe it does tell them what they need, and it is a measure of who is better at locking their computer. Maybe those two things can both be true. But I think that the ability to ask the question behind the question allows you to start formulating hypotheses and structuring experiments and ways of looking at the data to see if what you're actually collecting is the right thing. Unfortunately, what I see as a data science consultant is that most organizations are collecting vast amounts of completely the wrong data. So it's a very common thing to count the wrong things. REIN: There's a concept that comes from David Woods, who sort of invented Resilience Engineering. Well, honestly, it didn't come from Woods. A lot of these sort of philosophy-adjacent things are just restatements of something that some dead white person said 200 years ago or earlier. But his idea is that there's a sort of hierarchy of goals and that one goal is the means to achieve some higher level goal. So, for example, if you're tracking the number of items you scan, then that becomes a goal. Well, why is it a goal? It's a goal because someone thought that that was the means to achieve some higher level goal, which is how fast you're able to move product through the register, which is in turn tracking some higher level goal, which is sell more stuff, make more money. EMILY: Yeah, I think that certainly we see that. And this is one of the opportunities where, when you have that sort of stack of things, there's lots of different interesting paradoxes that can come up. One great example is Simpson's Paradox. This can arise when you have lots of imbalances in your data, where if you cut the data in certain ways, every way that you cut it, it looks like one thing is true. But then when you group it all together, it turns out the opposite thing is true. And so there are lots of examples of this. The world is nonlinear and this is a nonlinear behavior, ultimately. When you apply linear thinking to nonlinear universes and nonlinear processes, you're probably going to make a mistake unless you know where to cut the boundaries. You can always approximate nonlinear things linearly, but you have to know what that domain of applicability is and how far you can go before you start messing up. JAMEY: You said there are lots of examples of this. I was wondering if you could maybe give one simple example. I think it would help people understand. EMILY: I'm not going to recall it perfectly off the top of my head, so please forgive me. But one of the canonical examples of Simpson's Paradox is a study at some university that was looking at the gender ratio in admissions to some graduate programs. And when you looked at any department by itself, it looked like none of the departments were biased. So if you looked at the Art Department and then you looked at the Biology Department and you looked at the Economics Department, and you looked at all of these departments independently, you would conclude we have no gender bias. 
But then when you grouped it all together, what you saw is that overall, as a university, there was a gender bias in admissions. And the reason that this came up is because one of the departments was so out of skew and so much larger than the others that it dominated the effect when you started to aggregate these numbers. And so, I wish that I could remember offhand what the specifics of it were. But these are actual real world examples that happen. Some other things happen that are not quite Simpson's Paradox. Pay inequality, let me be clear, is quite real in the overall aggregation. But if you aggregate within certain domains, it will look like the pay gap is not real. So, to understand what's happening: the headline number comes from aggregating the statistics overall. It's not quite the same when you weight things. It's not quite true that women get paid 77 cents on the dollar to men once you control for those factors. But it does remain true that even when you weight things, women are still not paid as much as men, though the gap is not quite 77 cents on the dollar. But then if you look at certain subsets of professions, it turns out that women get paid more than men. And so, the sort of reverse paradox can also apply. And why is that the case? Well, it's like if you look at fields that are dominated by non-men, for example, medical nursing, the proportion of women to men in those fields is greater. As a result, the tenure of women in those fields is greater. And so, that throws off statistics in various ways. So you have to control for many factors to really understand the truth. And when you look at these things, you have to be aware of sort of these types of paradoxes and types of confounding and non-linear effects to really understand the essence of what's happening. REIN: This also reminds me of the sort of push in the last decade or maybe more in SRE that means are a lie, and that you need to find other statistical measures, percentiles and these other things. For example, if you have one millionaire and 99 people who have a dollar, then the mean net worth here is ten thousand dollars. That is not an accurate characterization of the system. EMILY: Yeah, exactly. So, when you look at things that are naturally probabilistic in nature, you need to start thinking in probability terms. And so, one of the ways that's useful is looking at percentiles versus means. But also, you can be really mathematical about it and you can look at your cost functions. And you can look at the distribution. And in a nonlinear system, that distribution is going to have nonlinear effects. An example of this is, let's say that you're building an API and you want your response time to be whatever, or let's say your uptime to be whatever. The difference between your mean, your 50th percentile or your 75th percentile or whatever, that difference is probably not going to make that much of a difference. I'm not going to care. I'm not going to be less happy if I'm in the 50th versus the 75th percentile. But if I'm in the 99.9th percentile of slow responses, that's going to make me very unhappy. So, that little gain in percentile leads to a very large gain in unhappiness, whereas the equivalent gain in the middle of the distribution does not. So when you take this probability distribution and you inject it into your cost function, now your cost function becomes a stochastic process. 
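A toy numeric version of the Simpson's Paradox example discussed above, with made-up admission counts (these are not the numbers from the study Emily is recalling): each department admits women at a higher rate, yet the aggregate rate for women is lower, because in this toy data women apply mostly to the harder department.

```python
# Toy Simpson's Paradox: per-department rates favor women, the aggregate does not.
# All counts are invented for illustration; values are (admitted, applied).
admissions = {
    "Dept A (easy)": {"men": (80, 100), "women": (17, 20)},
    "Dept B (hard)": {"men": (4, 20), "women": (25, 100)},
}

totals = {"men": [0, 0], "women": [0, 0]}
for dept, groups in admissions.items():
    for group, (admitted, applied) in groups.items():
        print(f"{dept:14s} {group:6s} {admitted / applied:.0%}")
        totals[group][0] += admitted
        totals[group][1] += applied

for group, (admitted, applied) in totals.items():
    print(f"{'Overall':14s} {group:6s} {admitted / applied:.0%}")

# Dept A: men 80%, women 85%.  Dept B: men 20%, women 25%.
# Overall: men 70%, women 35% -- the direction flips once you aggregate.
```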
And so now, you have to look at how that process transforms a random variable, not a deterministic variable. That's a very challenging thing to do in mathematics. And because it's a challenging thing to do in mathematics, it's very difficult to understand in intuitive lay terms. And so, that is one of those things: understanding that that's happening, not necessarily being able to fully describe it, but knowing that if we can create measures or sort of shortcuts or proxies in a way that encapsulates that behavior, then we are more often successful. And looking at p99s or p999s is much more useful because it kind of acknowledges that problem without having to go out and actually model the effect of that, which is a very burdensome and difficult process. JOHN: Are there ways of detecting non-linearities in your systems or models, such that rather than just knowing the system so well that you know that it's nonlinear, there's some sort of analysis that can tell you, "Oh, wait. We're dealing with something that's not intuitively understandable"? EMILY: There are. And I think that in the tech industry, we've not been very good at doing them. And this is a hill that I will die on: I'm really trying to push for more cybernetics thinking in the engineering and distributed systems world. Some people are moving in this direction, but there's a technique known as Kalman filtering, and constrained Kalman filtering, that would blow your mind if you knew what it could do. And these are things that could be applied to SRE and reliability and monitoring and observability techniques. But nobody's done it yet. And I really wish that I could take a team and just have six months to implement it, because you can do some really cool shit with it. But there are other techniques for being able to detect these sorts of nonlinear behaviors and whatnot. And I think that the degree of sophistication that we have, even in very sophisticated observability stacks, is still not really where it could be for the complexity of our universe and of our systems. In the aeronautical engineering industry and in the biomedical engineering industry, we do this stuff all the time. In real-time controls, we do this stuff all the time. And we can apply those lessons to the way that we create systems, because all it is, it's all just about feedback. It's all about understanding your system, understanding your measurements, your observations, understanding the changes that you make as inputs to a system. It's all about feedback. And cybernetics is all about creating feedback control systems. And so, we can use cybernetics thinking in how we build distributed web systems, but the barrier to entry is like: first, do a master's degree in mathematics. Second, be able to use quaternions fluently. That's a really high barrier to entry for a lot of us. And so we could simplify it, maybe. REIN: The cybernetics story, I think, is one of the sort of tragedies in modern scientific thinking. There was all this promise in the 50's, 60's, 70's, and then sort of the information processing systems model of cognition sort of went out the window for good reasons. And then we threw a lot of the baby away, too. And we're sort of so separated from it now that it gets a bad rap in cognitive systems, in resilience engineering and these fields. And I think that's unfortunate. EMILY: I mean, we could dive into the history of some of this stuff. There's all sorts of histories in this. And technology has developed in waves. 
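For the curious, here is a minimal one-dimensional sketch of the Kalman filtering Emily mentions above, applied to the kind of noisy signal a monitoring stack might see. This is a generic textbook scalar filter with assumed noise parameters, not anything from her work or from any particular observability tool.

```python
# Minimal 1-D Kalman filter: estimate a slowly drifting underlying value
# (say, a service's "true" response time) from noisy measurements.
# The process/measurement variances below are assumed, illustrative values.
import random

def kalman_1d(measurements, process_var=1e-3, measurement_var=0.25):
    estimate, variance = 0.0, 1.0  # initial guess and its uncertainty
    history = []
    for z in measurements:
        # Predict: the underlying value may drift, so uncertainty grows.
        variance += process_var
        # Update: blend prediction and measurement, weighted by uncertainty.
        gain = variance / (variance + measurement_var)
        estimate += gain * (z - estimate)
        variance *= (1 - gain)
        history.append(estimate)
    return history

if __name__ == "__main__":
    random.seed(0)
    true_latency = 2.0  # hypothetical "real" latency in seconds
    noisy = [true_latency + random.gauss(0, 0.5) for _ in range(50)]
    filtered = kalman_1d(noisy)
    print(f"last raw sample:     {noisy[-1]:.3f}")
    print(f"last filtered value: {filtered[-1]:.3f}")
```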
So, when the convolutional neural network revolution started in 2012, everyone was like, "Wait. Neural networks? Didn't those die in 1988?" No, they didn't. But they got kind of ruled out. And a lot of people are surprised that the first deep learning application with a neural network was created in 1978. It was an eight-layer polynomial neural network, a multilayer perceptron. And it was used for a control system application. But we've seen this sort of behavior, this pattern where there's a lot of hype and a lot of promise and then it doesn't quite deliver and then it crashes. And we forget about it for another 10 or 15 years and then it picks back up again. And we're like in the third wave of neural networks coming up. But we've seen this before. Logic programming is a great example. Look into the history of Prolog, look into the history of Japan's 5GL system, the fifth generation languages, and how that project was such a -- well, it wasn't that the technology wasn't there. It was like an organizational failure. But it failed so hard that it killed Prolog as a language. And the same thing happened with AI and Lisp. Do you remember? Lisps were supposed to be the breakthrough for AI. And the US military invested a ton of money into it and it failed so hard that computer science departments threw out their Lisp textbooks. And they're like, "Okay, we're pretending that this didn't happen anymore." It's only been in the last five or so years that Lisps have really become popular again with the advent of things like Clojure and some other Lisp-based languages. And so, like cybernetics, very much the same thing. There is this sort of hype around it, like, "Oh, this is going to solve all our problems. We can just model everything in systems." And then, math is kind of like, "No, combinatorics still exists." And then everything failed. But the fallacy of that thinking was that because it failed for one reason, it's going to fail for other reasons. And that's not true. And so, it would be useful to take the best parts of it and learn how to evolve those and grow those into something else. My prediction is that in the next 10 years, we're going to see a resurgence of cybernetics thinking in the engineering spaces, and we're going to see a revolution similar to what the convolutional neural network did to computer vision. REIN: The history of cybernetics is not only a history of scientific ideas and paradigm shifts, it's also a history of politics. So, for example, one of the reasons that cybernetics failed is because the CIA initiated a coup against a government that was trying it in Chile in the early 70's. EMILY: Yeah, that would do it. I think similar things have happened in Iran and other places where we've actually sabotaged [chuckles] their research. REIN: Ideas don't always fail because they're wrong. EMILY: Yep. REIN: Oh, by the way, we did interview Eden Medina, who wrote Cybernetic Revolutionaries about that situation. And that book is good. And the interview, I think, is very good. Just something our listeners might be interested in. EMILY: I've been kind of beating this drum. Obviously, the big news in the world, to all the listeners from the future (I hope that there is a future so that you can be listening to us), is that we're in the middle of the coronavirus crisis and it's hitting a lot of people very, very hard. And there's been a lot of discussion around, "Why don't we have better testing? Where's the ventilators? Where's the masks?" Blah, blah, blah. 
And I think that, as we were just talking about things being political, there's a lot of political angling to this. And of course, I'm very clearly no supporter of Donald Trump. But I think that there's too much enthusiasm to look at this as a single point of failure - the president is bad, therefore, we don't have these things - and less of an interrogation of what some of the factors of scale and reliability and pragmatism are that have to go around it. Because the pandemic is a very complex crisis and there's not going to be a root cause. There's not going to be a single thing that we can do to get us out of this. And in order for us to get through this, yes, we need tests. But I wish that half of the yelling about tests was yelling about contact tracing, and was yelling about social support systems. We can have all the tests in the world, but if people are going to starve and lose their homes because they can't stay home, because they have to go to work, all the testing in the world isn't going to solve that. And when we look at things like, let's say you get a test, let's go back to what we were talking about earlier. Data is a measurement of the universe at a point in time. What is a test but a measurement? And that measurement is not going to be perfect. So, let's say that you go into your local clinic and they give you a coronavirus test. Let's say that that test is now positive. Does that mean that you have coronavirus? No, it does not. Why does it not necessarily mean that you have coronavirus? It also doesn't mean that you don't have coronavirus. It just means that the probability that you have coronavirus has increased from some baseline. So why is this the case? We have to get into conditional probability to understand this. A test can be wrong either by giving a false positive or a false negative. And what we need to do is understand that. We need to think in conditional probability terms. So the question isn't: if the test gives me a positive, do I have coronavirus? The question is: given that the test is positive, what is the probability that I have coronavirus? And so in order to answer that question, we need to know things like, what is the prevalence of the disease in the general population? If you plucked me off the street at random and we knew that 3% of the population had coronavirus, the probability that I'm carrying the virus would be 3%. Now, when you apply the test and it comes back positive, we need to now start looking at what is the ratio of positive results in that test. So if that test is positive, it means that it could be either a false positive or a true positive. And so what is the ratio of those things? And so when you go through and you apply what's known as Bayes Rule to that test, what happens is the probability of me having coronavirus might go from 3% before testing to maybe 30% after testing. And that number ultimately depends on what we refer to as the sensitivity and specificity of that test. And even if we can get those tests to be 99% accurate, we're not going to get that probability much higher than 50%. Testing is only one part of this problem. In order to make testing work, we need to have all of these other things in place. We need to have an infrastructure to quarantine people effectively. We need to be able to produce tests at scale. Germany, where I am, has some of the most thorough testing per capita in the world. But still, at their current rates, it will take them three years to test their entire population once. 
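A sketch of the Bayes Rule arithmetic Emily walks through above. The 3% prevalence matches her example; the sensitivity and specificity are assumed values, chosen only so that the output lands near the "maybe 30% after testing" shape of her argument.

```python
# Bayes Rule for a diagnostic test: P(infected | positive result).
# Prevalence follows the 3% figure in the discussion; the sensitivity and
# specificity below are assumed, illustrative numbers.

def posterior_given_positive(prevalence, sensitivity, specificity):
    true_positive = sensitivity * prevalence
    false_positive = (1 - specificity) * (1 - prevalence)
    return true_positive / (true_positive + false_positive)

if __name__ == "__main__":
    p = posterior_given_positive(prevalence=0.03, sensitivity=0.95, specificity=0.93)
    print(f"posterior probability of infection: {p:.0%}")  # roughly 30%

    # The throughput arithmetic that follows in the conversation:
    # 83 million people at 500,000 tests per week is 166 weeks per full pass.
    print(83_000_000 / 500_000)  # 166.0
```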
I think the latest numbers I've seen, and they may have been updated since then, were that Germany was doing 500,000 coronavirus tests per week. The population of Germany is 83 million people. So, do the math. That's 166 weeks in order to test every single person once. But with a test-driven intervention, you can't just test people once and then have it work. You need to be able to test them repeatedly over and over and over. So we're very, very, very far away from pragmatically being able to use testing as the sole source of getting us out of this crisis. It's not that testing is not important. It's that testing alone, like data by itself, doesn't solve your problem. And that's the central challenge with what we see in the tech industry. We're like, "Oh, we've gathered all of this data. We've put it into a data lake. We have terabytes of data." And I'm like, "Great. What are you going to do with it?" Same thing with coronavirus testing. "Oh, we tested five million people last week." Great. What are you going to do with it? And the same thing is true with medical devices. Everyone's like, "Oh, we need ventilators. We need ventilators. Where are the ventilators?" It's like, "I don't know, dude. But the time to ask that question was like 18 months ago," because that's optimistically what the certification timeline is to create a new ventilator. To ramp up production is probably 90 to 180 days as well, because you have to train people how to make it. If you're building a medical device, it's regulated. So there's a lot of paperwork that has to be done. And the reason that there's a lot of paperwork that has to be done is because medical devices, if they fail, will kill people. And so now, on top of the production and the logistics problems, you have a liability question of, "Okay, what happens if my ventilator got raced through regulation? Did somebody die who wouldn't have died because of it? Now, who's liable for that? Is it the hospital? Is it the manufacturer? Is it the president?" We don't know the answers to these questions. When you think of the whole system, we need to be having people diving into every part of these questions. And so, like everything else, there is no single root cause. And because there is no single root cause, there is no single solution. And when we look at the politics of it, it's really easy to blame a person for not doing a single thing well enough. And this is not an excuse, because we do have piss-poor leadership going on right now. But our system is really, really incapable of handling these things. And we really need to think about systematic restructuring in order to solve this crisis. REIN: Yes, completely agree. An interesting thing happens to people who have this sort of academic belief that there's no root cause, that human error is what we blame when we want to stop asking questions. And that runs up against Orange Man Bad in their brain. And then the cognitive dissonance happens. EMILY: Yeah, absolutely. In SRE, we know that. Root cause is not human error. Root cause is very often organizational error. And so people systems can be part of the root cause. But most of the time, most of the failures that I see in my day to day work as a technologist involve confluences of factors, where you have to be able to point to five different things. And if you had done any one of them differently, you wouldn't have the failure, or you might have avoided the current failure but you would have eventually come to some other failure down the road. 
And so, it's a matter of understanding systems, not sources. But you have to be willing to investigate that. It's the same thing, like I don't know anything about distributed systems. I'm a data scientist. I do math. I'm good at data engineering, but I don't like Kubernetes. I don't do this type of stuff. But SRE thinking is, to me, very easy because it's everything that I do as a data scientist all the time. It's asking the exact same questions, just applied to a very specific domain. I find SRE very easy, even if I don't understand the technology behind it. JAMEY: One thing that's been striking to me the entire course of our conversation, this whole episode, has been, like, when you're talking about pinpointing problems and such: it's almost as if the world is really nuanced and not simple and one dimensional, usually. And I was thinking about it very intensely when you were talking about the pandemic situation. But I think that nuance is something that individuals can hopefully appreciate, at least sometimes. But I think nuance is something that collectively, we have a really hard time appreciating. EMILY: Yes, we do. Twitter is a perfect example of that. Oh, my God. Twitter is like where nuance goes to die. Because it's where all of us are just emitting our purest, most basic feelings. And so, there's not a lot of room for nuance and we do it for the collective whatever. And yeah, I think it is hard to do that. I think that the solution to this is to be less monolithic and be more decentralized, because large groups of people without any sort of common training, backing, understanding are probably going to struggle with nuance. But groups of people with a similar background and training who are experts in a thing will be able to effectively debate nuance. And that's why we want epidemiologists working on coronavirus plans, not economists. Unless our goal is to reopen the economy and make rich people richer. And then you would say, "Oh, shut the epidemiologists into a closet and let's get the economists out here." And so I think that this is an argument for decentralization. This is an argument for putting the power into the hands of people who are best and most likely to use that power to steer the world for better. And I think that a lot of our structures are not organized around that. One great example of this is, to keep with the example, why was the US so slow to roll out coronavirus testing? And part of it is because our FDA did not have an approved test. Why does our FDA not have an approved test? Well, because no tests had existed that had gone through our regulatory procedures. And it takes a long time to go through our regulatory procedures. Now, there is a provision that allows the FDA to sort of sideline those procedures in the case of an emergency. But that provision usually requires the declaration of an emergency from the president. And so now, you've taken this, like you've done almost everything right. You established an agency of experts who can understand nuance and understand all of the finer points about the ethics and the safety and the efficacy and all of these things. And you've given them the authority to act as gatekeepers, except when it matters. And then you've put that authority in the office of a single person who doesn't know anything about this, and is completely incompetent. You're putting all of that authority in the hands of a single person and depending on their goodwill in order to use it. That's a systematic failure. 
And so, a decentralized system would look maybe something like: the FDA would be self-empowered to waive those requirements, and there would be some sort of independent auditing to make sure that they don't abuse that authority. And so, it is possible to do this. You can create groups of people who are able to act and understand nuance and understand these complex things. And you can empower them. We just haven't. And that, to me, is a central frustration. It's totally political, but it's also because our system has empowered this political sort of maneuvering. REIN: I totally agree with you. And I want to mention a couple of things, both to back you up and so that our listeners can know where they can go to learn more about these things. One is that there's a body of study of what are called high reliability organizations. These are things like the Navy, Navy submarines, like firefighters, like National Park search and rescue teams, things like that. And one of the hallmarks of a high reliability organization is the ability to defer to expertise. And not just expertise, but the expertise of the people that are closest to the problem. So that's exactly what you said. And there's a bunch of research that backs this up. The other thing about decentralizing: Sidney Dekker, in his book The Safety Anarchist, calls this devolution, which is pushing decision making authority and power down and out towards the people who need to make the decisions. EMILY: Yep. I love that, The Safety Anarchist. I need to read that because it's like everything that I shout about all the time. REIN: The reception for that book has been mixed. EMILY: Well, maybe. I don't know. I haven't read it. So maybe some of his takes aren't good. But I like the concept and I really want to see more thinking around those things. We need a more nuanced, speaking of nuance, we need a more nuanced discussion of power and authority in our world. REIN: It's not mixed because it's bad. It's mixed because you have a bunch of old guard, sort of Safety-I, rules-policies-and-procedures compliance people reading this book and getting pissed off. And I think that's good. EMILY: I have stories that I can't share from my professional work in safety-critical systems for automotive engineering, about some of the people who understand it and some of the people who don't when it comes to writing software. And there's value in process. There's definitely value in process. But sometimes being prescriptive about process devalues that process. And so you have to look at the intent versus the letter of the standard, and you have to be willing to think in different ways. But there's also challenges around that, because we live in a society. And so case law exists, theories of liability exist. And if you do something different than the way somebody else does something and something goes wrong, you're screwed. REIN: Yeah. Here's an interesting question. People doing the work are constantly adapting and trying to get their jobs done, resolving all sorts of conflicting goals and tradeoffs in a context that's constantly changing. And policies and procedures can't keep up with that. And so the question is, how do you tell the difference between an innovation, a workaround, a shortcut or a noncompliance? EMILY: I think an innovation would solve the problem or address the problem. A workaround bypasses the problem. I would have to think about the difference between a shortcut and a workaround. REIN: That's why it's such a difficult question. EMILY: It is. 
It is. I would have to spend a couple more brain cycles on that. And noncompliance is just, like, acknowledging the problem and not caring about it. Just like, "Fuck it. I don't care." REIN: I have to say, I may have tricked you just a tiny bit because I don't think that difference is interesting. I think it's entirely something that's done in hindsight, post hoc, looking backwards at an event. EMILY: I'd agree with that, with one caveat. And that is that people are acting according to some ethics, because I have definitely seen some incidents where the noncompliance or the shortcut has been deliberately and intentionally chosen, and knowingly so. And so, I think that there are definitely times when those are things that do happen ahead of time and there is a difference. But in hindsight, yes. If you're looking at things, particularly in the light of a disaster or incident, all of those things maybe look the same. But I definitely think if you're looking forward, you can make a choice of one or the other. It's just that you'll never know which it was until something bad happens. REIN: A lot of these systems are sort of predicated on the 'everyone is trying their best' hypothesis. Like, if you want to understand failure, you have to understand why it made sense for that person to do what they did. These systems are vulnerable to people who want to hurt rather than help. They're vulnerable to sociopaths. They're vulnerable to bad actors. EMILY: Yeah. We often recite, like, the prime directive in retrospectives: that we truly believe that everyone was doing their best in those circumstances, given what they knew at the time. Yeah, there's like a lot of times when that's not true. We really want to build our teams and our workplaces and our societies around that. But I don't know. Maybe we're going back full circle to the head of the conversation. But when I deal with Nazi hunting, they are not operating in good faith. They are definitely trying to lie. They're definitely trying to screw you. They're definitely trying to act in malicious interests. REIN: I actually think that this prior belief that people are good actors makes it take longer to detect bad actors. And I think that's problematic. But I also think that it also often is good because it helps us do better in the more common scenario where people aren't evil. EMILY: Yeah, absolutely. I mean, it's a cost. It's a cost that we accept. We need to create that world. If we create a world of paranoia and distrust, we are going to harm ourselves more than we're going to help ourselves. And I think that we just have to know that, yes, people can be bad actors, and harden our ability to detect those things, and understand that we need to create separate processes for addressing them. REIN: Yes. So to bring this all the way back full circle, the thing that you have to do is you have to detect and remove bad actors from the system. So, the thing that you are doing is good. EMILY: I like to hope so. It's exhausting, is what it is. JAMEY: I agree that it's good. And I noticed at the beginning when you were talking about it, you mentioned a time when someone slipped through, and I noticed that you didn't talk about times when you were very successful and helped people. I want to give you credit for it and I hope that you give yourself credit for it. EMILY: I think so. I think that there's a lot of good that we've done. 
When I say we, I mean me and the community at large who empowers this work: the antifascists worldwide, the activists, the people who doggedly listen to all of these garbage podcasts that they produce ad infinitum, the people who do the work with the media and with government, with whatever. I do think that this is important work that is helping people. I mean, in addition to the ones that we've missed, there are the incidents that we've caught. There was an incident in Virginia in January. A group of three men were arrested. They had a car packed, ready to drive from Maryland to Richmond, Virginia, where there was going to be a big demonstration at the Virginia Capitol. This demonstration was actually a pro-gun demonstration where a bunch of people were going to demonstrate and protest Virginia's new gun control laws that were going through. There was a lot of talk about violence at this event because a lot of people were going to go and they were going to be carrying guns. These three guys were part of a neo-Nazi group and they actually wanted to go down and shoot the right wingers who were protesting against gun control legislation, because they wanted to create a false flag operation that would incite a race war. And these guys had packed their car, ready to drive the few hours down from Maryland to Richmond. They had a trunk full of guns, ammunition, food, beverages, body armor, all sorts of accessories to weaponry, all sorts of stuff. And the feds arrested them right before they were about to leave. And these guys would not have been found if it weren't for the work of the people who identified this group, the journalists and the activists who infiltrated this group. The work that was done across national borders, because one of the guys was a Canadian Armed Forces member. He slipped across the border. He was outed in Canada by a journalist. He left a few days later, snuck across the US border unlawfully, met up with his peers and was living for months in the United States training for an attack. And if it weren't for the work of the people who infiltrated and uncovered these guys, we might have been talking about one of the worst incidents in American history. And so, that's a win. And that win was not possible without the work of dozens, if not more, people. That's what makes it worth it. At the same time, we can't stop them all, but goddammit, we're going to try. I still think about Taylor Wilson, who tried to derail an Amtrak train. I think about Robert Bowers, who shot up a synagogue, all of these guys in Halle in Germany and Hanau and in El Paso and in Gilroy, and all of these incidents that we've seen that are terrible tragedies. These are guys that leave trails online. These are guys that leave clues to who they are all around. And we can find them and we can stop them. And we can do this without the immense and oppressive apparatus of the United States government. But it takes work, and it means that in order to do good, you have to be willing to fail. You have to be mentally strong enough to know that you're not going to catch all of them. And you're going to have to be mentally strong enough to know that when you've got one, you have to go forward. And that, I think, is hard. REIN: I'm probably preaching to the choir here, but I hope all of our listeners will remember everything that Emily just said when you see someone talk about [inaudible] for terrorists and things like that. 
EMILY: But also, more importantly, for the folks out there who might be parents or aunts and uncles or grandparents or even just friends with people in school or whatever: also, just be aware, because a lot of this stuff is a lot of young guys and they will try. There are lots of dog whistles that they'll leave. And you have to be willing to confront this. You have to be willing to confront racism and hate and misogyny and transphobia and all forms of bigotry, and to call it out. And I think that an essential skill that we should have as people is to know how to address somebody who's going down this path before they go too far. And to be willing to make it costly for that person to continue. And too often, I see things like a parent who's testifying at a sentencing hearing about how they knew that their kid was looking at these websites, but it was just websites. It's not just websites. It's not just a political opinion. Confront it. Speak to it. It's cliche, but be the force that you want to see in the world. And that means that you have to start with the people that you love the most. And if you can't deal with the people that you love the most, then you definitely can't do that at scale with strangers. REIN: I would just maybe want to add to that that when you see the sort of just-asking-questions people or these line-stepping people, this is often part of an intentional and systematic program, the end result of which is always violence. EMILY: And this exists in the tech industry, too. We see it. We know it exists. We see this in the resistance to diversity initiatives. We see this in things like the James Damore memo. We see this all the time. And so as technologists, we have the privilege of being fairly high-paid, high-access members of society due to our professions. And people are going to try to enter that and subvert that for their own needs. And so, you have to be willing to stand up to coworkers. You have to be willing to set the environment in your workplaces and speak up. And sometimes that might mean that you need to be willing to lose your position, if you have the ability to do so. Because the truth is, tech is a very employable place, and it sucks to lose a job, but you can probably find another one in tech. And you've got to speak up. And I have nothing but respect for all of my peers at Google and Amazon and all of those people who have lost their jobs for trying to organize and stick up for what they believe is right. REIN: We usually do reflections at the end of this, but that is the thing that I will be thinking about. JAMEY: This has been really powerful. EMILY: I'm always happy to talk more. And you all know how to get in touch with me. Happy to have these conversations. I think they're super important. And I like having them everywhere I go. JOHN: Well, this was really great, Emily. Thank you for doing this with us again. JAMEY: Yes. Thank you so much. EMILY: Thanks for having me. I'm always happy to come on the show, and I'm so excited at everything that you all have done in the time since I was last here. So, thank you.