alyssaa-simpson-rochwerger_mixdown.mp3

Alyssa: [00:00:00] The best data is real data that is generated by humans. Sometimes that's emails, or whatever that use case is that you're solving. So I'll take a frequent use case, which is prioritization of support tickets, a classic model that teams want to build inside a lot of different types of organizations. You have zillions of support cases coming in, and you want to categorize them, or you want to understand which ones are most severe and need to be answered first.

Harpreet: [00:00:40] What's up, everybody, welcome to the Artists of Data Science podcast, the only self-development podcast for data scientists. You're going to learn from and be inspired by the people, ideas, and conversations that'll encourage creativity and innovation in yourself so that you can do the same for others. I also host open office hours; you can register to attend by going to bit.ly/adsoh. I look forward to seeing you all there. Let's ride this beat out into another awesome episode, and don't forget to subscribe to the show and leave a five-star review.

Our guest today is a customer-driven product leader who is dedicated to building products that solve hard problems for real people. Having made positive impacts for customers at startups and enterprises alike, she's built a proven track record of bringing products to market and scaling them from concept to large-scale ROI. She's held a wide range of high-profile roles at many machine learning organizations, including VP of Product at Figure Eight, which was acquired by Appen, VP of AI and Data at Appen, and Director of Product at IBM Watson.
Currently, she's pursuing her dream of using technology to improve health care and serves as Director of Product at Blue Shield of California, where she's surrounded by tons of data, hard problems, and countless opportunities to make a positive impact in the lives of millions. So please help me in welcoming our guest today, a woman who is thrilled to pursue the mission of providing access to high-quality, affordable health care to all: Alyssa Simpson Rochwerger. Alyssa, thank you so much for taking time out of your schedule to be on the show today. I really appreciate having you here.

Alyssa: [00:02:43] Great to be here, thanks for having me.

Harpreet: [00:02:45] It's an absolute pleasure. Today we are going to discuss your book, Real World AI. I really, really enjoyed this book, as is evidenced by all these flags and tabs I've got in there. There's notes and everything all up in here, so I'm excited to get into the book. But before we do that, let's talk a little bit about you and your background. Early on in the book, you mention being an unlikely AI leader, so talk to us about that.

Alyssa: [00:03:10] Yeah, sure. So I have a fairly non-traditional background for data science or machine learning, but one that I actually think gives me a lot of strengths and opportunities. I have a liberal arts degree in American studies and photography. I went through a progressive education. I don't have an MBA. I don't have a computer science background. I have done quite a few courses and spent well over a decade getting hands-on experience, working with development teams, data science teams, machine learning teams, and deep R&D research. And so that's what gives me a bit of a different background.
You know, I bring a lens of customer experience and an interdisciplinary background to the space, and it's been an incredible privilege to work with partners who help me in the technical areas where I'm not as strong, and who help me understand how we can apply the technology toward really hard problems that people care about, and what that means when the technology gets out of the lab and is actually impacting someone's life.

Harpreet: [00:04:14] We are definitely going to get into the importance of having cross-functional, interdisciplinary teams when it comes to building AI products very shortly here. But something you mentioned was what happens when these products get out into the real world. So what could possibly go wrong if all we did was focus on creating accurate machine learning systems and just focus on that accuracy metric?

Alyssa: [00:04:39] Yeah. So that was the classic first mistake I made when I got into machine learning. When I first joined the Watson team in the computer vision space, I was the first product manager the team had had. I was paired with some incredibly talented PhDs who had been in this space for a really, really long time, and they showed me that data product, and the feedback from customers was: it's not accurate, it needs to be more accurate, it needs to be better. And I played around with the API, feeding it images and getting tags back, and I totally agreed. I was like, yep, it needs to be more accurate. But I didn't really know what that meant. And so I was asking the team, OK, you guys are working on making it more accurate; how are we measuring that? What do you mean by accuracy?
And so then I learned what an F1 score was. I'm sure this audience is probably well steeped in different academic measures of accuracy, understanding the difference between precision and recall and all that. But it actually came down to: what data set were you measuring this on? And we had a dataset with millions of images; at the time, we were using ImageNet as one of those data sets. And we made the system a lot more accurate. Our F1 score got better, precision got better, recall got better. And then we went to launch this newfangled, more accurate system that was actually beating a lot of records, and a few days or weeks before it launched (I can't remember which), it was halt-the-presses: oh my god, no, we can't launch this. And I was like, what are you talking about? There's all this people and money and time bought into this; we're launching. And the researcher was like, no, we're not. And he sent me the image of what he had put into the API: it was an image of someone in a wheelchair, and the tag that came back was "loser." And I was like, oh, you're right, we're not launching this. This is a horrible, horrible, in my opinion very inaccurate answer. However, it was a more accurate system by those traditional measures, and what we had not done was really dive into what data we were measuring accuracy on. And what we ended up finding was quite a few other things in the data set that were bad training data and were teaching things that were incorrect.
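For listeners less steeped in those measures: the F1 score Alyssa mentions is just the harmonic mean of precision and recall, computable directly from confusion-matrix counts. A minimal sketch (the counts here are made up purely for illustration):

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from raw counts.

    tp: true positives, fp: false positives, fn: false negatives.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# A model can look great on these aggregate numbers and still produce
# badly wrong tags on examples the benchmark data set never covers.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.9 0.75 0.818
```

The point of the story stands regardless of the arithmetic: all three numbers are computed against whatever benchmark data set you chose, so they inherit every flaw in that data.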
Alyssa: [00:07:03] And so we had over-focused on the F1 scores and not on the actual training data itself, or the gold-standard benchmarking that was driving those numbers, and on what that would mean to our customers, right? If they were using this and put in images and got back tags like that, just because it was "accurate" according to the data set, that was going to be a huge black eye for IBM and a bad New York Times story, and I wanted no piece of that. And so, you know, we changed horses.

Harpreet: [00:07:39] I mean, that speaks to the subtitle of the book here, A Practical Guide for Responsible Machine Learning. Excellent book, by the way; you guys should pick this up, I highly recommend it. There are a few different points we've got to dig into there: the importance of a baseline, the importance of making sure we have the right data. And then you lay out such a nice, clear framework in the book on how to do this responsibly. But the first thing I guess we need to figure out is, OK, how do we identify a problem that AI should solve, right? So can you share some strategies with us for identifying the types of problems that AI should solve?

Alyssa: [00:08:14] Well, you know, one of the things I quickly learned was that I was not alone, and many other practitioners in the space had very similar stories to the one that I had. Not everyone likes to talk about or share those stories; we share quite a few of them in the book, but it's hard to get people to talk about failure and to open up around the problem. You know, often what happens is the mistake I made, which is that I didn't define the problem well.
I took for granted the project as it was handed to me, and I sort of assumed that the team was on the right track when I stepped in and was getting up to speed, and I didn't take a step back to ask: what business problem are we actually solving with improved accuracy? So in my case, our customers were people who were trying to use visual recognition for marketing, or customer service, or all sorts of different things. And when I peeled back what they meant by "make it more accurate," what they really meant was: I have a very, very narrow and specific thing that I'm looking to tease out of imagery; can you help me put something into that category, yes or no? So one example could be: if you're Ben & Jerry's and you are launching a new brand of ice cream, you want to know where on social media your Ben & Jerry's logo or that new brand is or isn't. If the tag comes back as "ice cream" or "birthday party," yes, that is an accurate description, but it's too vague or broad for the use case that they need. And so we totally shifted course and solved a different problem, which was building a system that could be trained or customized for really narrow use cases. But that took shifting gears, getting out of the technical focus altogether, and spending time with customers: why do you even want this in the first place? And sometimes those customers are internal business stakeholders; this could be a customer support team, or a marketing team, or many different use cases.
And sometimes those are external customers. Go talk to them, get out of the building, you know, set up virtual Zoom meetings, and ask them why it is that they're using your product and what really matters. And sometimes what you'll find is that they don't actually need machine learning at all; it's not a good toolkit for the problem they're trying to solve, and they have a vague understanding of the sort of magic AI that's going to be applied. And so once you understand the problem, then you can decide whether or not machine learning actually is appropriate. And as data scientists, right, you know that it's important to have a robust training dataset that matches the problem that is important to the customer. So in order for Ben & Jerry's to pick out their ice cream in birthday party pictures on social media, they need a lot of images trained around what their logo looks like, and to be able to articulate that. And if that data doesn't exist, that's your first gotcha. Another one is around the business value of solving the problem, right? How much is Ben & Jerry's willing to pay to find their logo? What is it worth to them? Because machine learning is hard and expensive and requires specialized talent. So if you don't have a big enough business value in a problem that is very, very difficult to solve with traditional, non-machine-learning methods, that's also a gotcha, because it means you don't have any money to actually solve this problem, or you don't have the urgency to do it. So that sort of relative business value and priority and funding and data are kind of a...
Harpreet: [00:11:54] So you talk about this in the book as well, this notion of the Goldilocks problem. Would that be kind of the definition of the Goldilocks problem in a nutshell? Like, how do we tell if we have, you know, what was it, the Papa Bear, right in the middle?

Alyssa: [00:12:10] The Goldilocks problem is defined a little bit differently. It's when a company decides: hey, I want to do machine learning, right? I have a lot of different things that I think it could be applied to; which one should I start with? And that's an opportunity to take a lot of different problems and categorize and measure them: OK, which ones are really important to your business? Which ones do you have all the data for? Which ones are small enough, from a technical model-building perspective, that you can effectively build a model fairly quickly and get it into production accurately enough? And where is there not a ton of risk to your organization of getting it wrong, right? You have to evaluate all these different problems against those criteria so that you can be successful in a pilot, when you have an organization that is interested in adopting AI, which basically everyone is, but is stumbling to actually make those projects particularly fruitful.

Harpreet: [00:13:11] And I guess the hard question is, you know, as data scientists and machine learning practitioners, we've got a toolkit, right? If we've got a hammer, everything looks like a nail. So can you share some tips with us to understand, or at least tell, if a problem is going to be well suited to using machine learning?

Alyssa: [00:13:31] Yes. The difference I find, and this actually came out of one of the interviews I was doing for the book, and someone else framed this really well:
For me, the difference between a more junior data scientist and a more senior data scientist is that the junior data scientist will take something vague from a business stakeholder, or they've been assigned a project, and they will just start solving it using the toolkit they learned in school, trying to build models with whatever data is available to them on day one. The more senior data scientist will not start solving the problem at all at first, and will have a lot of conversations with the business: OK, what problem are you really solving? What is the value of it? What data do you have? Let me look at the data, spend time thinking about whether that's actually really the right data, and evaluate it to see how well it matches the expectations of the business folks. Those starting-point differences to me are huge, and they really represent the difference between someone less experienced and someone more experienced, because the more experienced person knows that the actual model-building piece is not, I don't want to say trivial or easy, but it is not the hard part. I mean, I can't build models, so I have a lot of humility about it. But that part is executional in nature, and the far more difficult piece is really understanding: is your data set going to be able to build a model that solves the business problem?

Harpreet: [00:15:05] Thank you so much for that, I appreciate that. So we've got a good sense here of some strategies for identifying problems, getting into the Goldilocks problem, and trying to tell if our problem at hand is suitable for machine learning. So now I guess the next step would be: let's assemble a team and get people working on this project.
So how can an organization determine who should go to work on this project?

Alyssa: [00:15:33] I believe cross-functional teams are what matter and get things done, machine learning problem or not. And that cross-functional team should represent the business problem. So sometimes that means people from customer success. Sometimes that means people from operations. Sometimes that means finance people, or HR people, or privacy or legal. In a machine learning context, that often means machine learning engineers, frontend software engineers, backend software engineers, data operations, DevOps, and, right away, user experience design. Those are really critical stakeholders, and yes, a machine learning engineer or data scientist too. But I feel bad, because so many machine learning engineers or data scientists are put on an island by themselves and expected to do everyone else's job, right, and magically come up with the answers or solution. And that's just not possible; they're not well equipped to do it. You need a team to solve problems. And so not only should the team be diverse from a skill-set perspective, but I highly recommend the team be diverse from a background perspective, or at least that its background matches the constituencies of the problem you are solving for the business. So I'll take an example: there was an algorithm called COMPAS that got a ton of play in the press, which was for recidivism in the U.S. justice system. It was a model that essentially predicted, you know, who is going to commit another crime, who's going to jail, who should go back to jail, essentially what parole should be.
And, you know, unsurprisingly, because the justice system in this country is very biased and has a lot of data from African-American people, or people of color, who spend more time in jail because we have all these systemic issues, if you go and train a model on that, guess what? It's going to predict that someone who is Black is more likely to commit a crime, because the data that you're using is flawed. Right. And so it's important that the team building this reflect the constituency of the people that you're serving, because often they will bring experiences that are really, really relevant, things other people have blind spots for, right. I am fortunate enough not to be in a wheelchair; I had a blind spot there. I didn't think to test that visual recognition system I had built with anyone in a wheelchair. The pictures I tested with were of me and my sister at a wedding, or whatever I had lying around that was easy for me. So those were serious biases that I brought to the table, and I was really fortunate to be part of a diverse team that was representative of people from all over the world. I think we spoke 10 languages on the team. Really diverse, and, you know, that's such a huge asset. Over and over and over again in my career building machine learning teams, I have noticed that I'm the only woman in the room, right? Or everyone in the room is white, or everyone in the room is white and Indian.
You know, there are narrow groups of people building this, and it's not necessarily reflective of the people we're serving, whether those people are educated differently, or speak a different language, or come from a different country. You really have to think that through.

Harpreet: [00:18:51] Yeah, what they say about COMPAS, it's quite unfortunate. I mean, to quote Kanye West, it seems like Jerome gets more time than Brandon. And just including a name as a feature, to have something discriminate like that, it's possible. And, you know, we need to be responsible to not let that happen.

Alyssa: [00:19:10] It's, like, so not OK. And it's also, I don't want to say easy to fix, but it's very, very, very possible. You know, there are so many different strategies so that you don't do that, and you can create responsible applications of machine learning that really do serve, and even correct for, some of the systemic biases that we have.

Harpreet: [00:19:29] Yeah, that's where focusing only on accuracy can go wrong, right? I mean, that's an edge case there, but thank you for sharing that. And as somebody who was the first data scientist in an organization, building the data science team up from scratch, I can definitely feel the pain of having the expectation that I can do it all. But I found it interesting in the book how you talked about, if you were going to build a team from scratch, the order in which you would hire those people. So talk to us about that. If you're building a data science, machine learning, AI project team from scratch, what would be the order in which you hire people?
Alyssa: [00:20:10] Well, I have product management skills, so I wouldn't hire another person like me. My first hire would probably be a designer and user researcher, to really understand the problem space and what it is that the product we're building should look and smell and feel like, and whether there's a way we could solve that problem without AI, because guess what? It's expensive and hard. So I would see what kind of prototypes we could build that didn't require machine learning, or used off-the-shelf machine learning, to prove out that there was business value there. Assuming that it is a problem well suited for machine learning and we really need to do it, I would start with data engineering. Andrej Karpathy puts up a great slide, I'm sure it's circulated a lot in the community, right? In academia, you spend 80 or 90 percent of your time on model building and maybe 10 or 20 percent on data. In the real world, it's the opposite: you spend 80 percent of your time on data wrangling and about 10 to 20 percent on model building. And so I would hire someone who's really good at data wrangling and understanding the data. After that, you know, I'm making some assumptions, but typically companies already have software engineering.
If you're a startup company, you have access to that. So, assuming that a software engineering team is already in place, I think a data scientist might actually be one of my last hires, because I think you need a high-performing software engineering team in order to make that data science team successful and to put something into production and get it there.

Harpreet: [00:21:47] Yeah, I really enjoyed that part of the book, when you're talking about, it might have been you or Wilson who wrote that chapter, the order in which you would hire teammates, and the first hire would have been that product manager or project manager type role, and then moving into the more technical roles. I definitely like that approach. So, OK, we've figured out how to identify a problem, and we've figured out how we should structure our teams. Is there anything else we should make sure we are doing in the early stages of a project to ensure that it's successful?

Alyssa: [00:22:24] I always like to touch on incentives. People do what they're measured on. And so in a big organization, or sometimes even a small organization, if you're giving someone a job, right, you're paying them money, and typically they have job responsibilities and goals and expectations and bonuses, and aligning people's bonuses and incentives makes for a really high-functioning team.
So I'll give you an example of how it doesn't work that is actually pretty typical: you will have a product management team bonused on revenue or member satisfaction, something like that. You will have an engineering team bonused on uptime, or on quality measured by lack of bugs and things like that. You will have a machine learning team bonused on accuracy or precision. Because everyone is incented differently, you never ship anything, right? Because the engineering team is like, no, no, no, we can't launch, because we might accidentally introduce bugs or break something. The product team is super frustrated, because they're like, no, we have to get stuff to market, we have to launch something, and they don't care if maybe it's buggy or whatever. And the machine learning folks are like, no, no, no, the model's not ready yet; I need more time, I need more time, I need more time. And everyone is sort of operating apart, not collaborating to get a high-functioning MVP out to market. So I always like to make sure, when I'm building out a hiring plan, that a portion of people's bonuses is based on their skill set and the technical acumen they can bring, but also that a portion is shared, right: shared revenue goals, or shared member satisfaction goals, or whatever it is.

Harpreet: [00:24:06] Thank you very much for that, Alyssa. So how do we tell if we have the right data? What's something that we should consider? We know what our problem is, we've got the right team; next, I'd say we probably have to move into getting some data to solve the problem. How do we make sure that it's the right data that we're using?

Alyssa: [00:24:24] Yeah.
So the data should be from the real thing that you're solving, right? It needs to not be fake data. Synthetic data sometimes can be a way to jumpstart things; you can create data when you have a vacuum, or no data, for a problem. But the best data is real data that is generated by humans. Sometimes that's emails, or whatever that use case is that you're solving. So I'll take a frequent use case, which is prioritization of support tickets, a classic model that teams want to build inside a lot of different types of organizations. You have zillions of support cases coming in, and you want to categorize them, or you want to understand which ones are most severe and need to be answered first, or which ones have really, really pissed-off customers that you want to get back to. And so you can't turn around and look at another company's data for that, or use something off the shelf, because you need your own data, because your customers are going to be writing in in language specific to your products and services, or their interactions with you, or perhaps the members that you serve. You know, in California, 40 percent of Californians speak Spanish as their primary language, right? So maybe your members are writing in Spanglish or whatever it is, and you need to make sure that the data you are using to train is reflective of the actual problem. Typically, that means data annotated by humans.
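To make the ticket-prioritization use case concrete, here is a minimal sketch of the kind of text classifier such a team might start with: a tiny multinomial Naive Bayes in pure Python, trained on hypothetical, made-up tickets. In practice, as Alyssa says, the training set would be your own company's annotated tickets, not toy examples like these:

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled tickets, purely for illustration.
TICKETS = [
    ("site is down and nobody can log in", "urgent"),
    ("payment processing failed for all users", "urgent"),
    ("whole system outage since this morning", "urgent"),
    ("how do i change my profile picture", "routine"),
    ("feature request please add dark mode", "routine"),
    ("question about the format of my invoice", "routine"),
]

def train(tickets):
    """Count word frequencies per label (multinomial Naive Bayes)."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in tickets:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Pick the label with the highest log-posterior, with add-one smoothing."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total)  # prior
        n_words = sum(word_counts[label].values())
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

word_counts, label_counts = train(TICKETS)
print(classify("system down cannot log in", word_counts, label_counts))  # urgent
```

The model can only reflect the vocabulary it was trained on, which is exactly Alyssa's point: train it on another company's tickets, or on English-only data when your members write in Spanglish, and it will misclassify the cases you actually care about.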
Alyssa: [00:26:01] Sometimes you get lucky and you can get data that is already annotated with the outcomes. If you're in a customer support setting, frequently there's a large customer service team that has already organized and categorized tickets and placed labels on them. But you need to be really critical of whatever you're inheriting as the training data set, and understand who put those labels on, in what context they were putting those labels on, and whether that context is the exact same context you're trying to replicate using a model. Because whatever model you build will replicate the context of the annotations or categories that the data already exists in. And if that is only adjacent to the business problem, it's going to be the wrong context and it will not serve you. So really, really ask hard questions about the provenance of the data that you're getting. Does it contain any gotchas from a security perspective or a reusability perspective? Sometimes you get a data set and it's inappropriate to train on for a variety of reasons, because it was collected in one context and the legal terms under which you can use it are limited. Or it might be because of a business relationship: there's a partnership that's ending, and as part of that partnership ending, you have to delete all the data and any inferences from it. And so you need to ask some business questions around what the gotchas are going to be. Can I really use the data? Is it complete? Is it missing anything? Has it been transformed from the original way in which it was annotated, and is that transformation going to be meaningful?
Right. So a classic example is nulls, right? Or just no answer on a particular category of things. Does that mean the agent didn't know? It could mean different things that impact the outcome of your model.

Harpreet: [00:27:58] Yeah. One of the many things I loved about the book, Real World AI (again, you guys should get this book, it's really good), was just the sheer number of use cases that you shared from industry, some of them from your experience and Wilson's experience as well. I mean, just a lot of good stuff in this book. A couple of points to touch on there. One thing that didn't really register for me as severely as it should have prior to reading the book was just the importance of annotating data. Talk to us a little bit about that. You touched on it a little bit just recently, but if we have data that needs annotation, how do we check the quality of those annotations? How do we know where to go to get data annotated? Do you have any tips around that?

Alyssa: [00:28:42] Yeah, absolutely. So I thought this was easy. I've inherited data sets before and been like, oh great, it's already annotated, cool. And I've also inherited datasets that were not annotated and been like, oh, how hard can this be? Let's just hire Mechanical Turk, or a bunch of interns, to put labels on data. And that works at a thousand-ish scale; you can get enough interns for maybe ten thousand, you know, depending on how many hours. But if you are doing serious scale, right, and deep learning, that sort of neural-network scale, typically requires orders of magnitude more data than that.
And so that requires infrastructure. It requires hundreds or thousands of annotators, and it requires them all doing it in exactly the same way, which you really have to control and follow up on and have quality checks on. So, you know, there are quite a few companies that do data annotation. Obviously, I used to work for one, Appen. Appen has a huge workforce, and it acquired Figure Eight, where I was at, which was the sort of technology side, but also a platform that people can use. And sometimes those annotators are going to be more anonymous to [00:30:00] you. Other times it's really important that those annotators are experts in your business, so when leveraging a platform, those annotators may need to be nurses or radiologists. Sometimes we've had use cases where the data is owned by a government or really sensitive, so you need special types of clearances to access it. But you need to be able to organize it in exactly the same way and account for people's different perspectives, but also structure it. The technology platforms can help you scale; there are a bunch of them in market. I happen to be biased, but I think Figure Eight, now part of Appen, is a really strong one. You use things like test questions and gold standard data sets, and things like making sure that if you ask three different annotators the same question and two of them agree and a third disagrees, that majority can be a tiebreaker. But sometimes they all disagree, and then you may need to dynamically expand the number of annotators so that you can drive to agreement. Or you need to have multiple steps in an annotation.
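The agree, tiebreak, and dynamically-expand flow Alyssa describes can be sketched in a few lines of Python. This is a hypothetical illustration; the `get_more` callback stands in for however your platform recruits an additional annotator, and is not a real platform API:

```python
from collections import Counter

def resolve_label(judgments, get_more, max_judgments=7):
    """Majority-vote a label; recruit extra annotators until one wins.

    judgments:  initial list of labels from different annotators
    get_more:   callable returning one additional judgment (a stand-in
                for pulling another annotator off a queue)
    """
    votes = list(judgments)
    while len(votes) < max_judgments:
        counts = Counter(votes).most_common(2)
        # Clear majority: top label strictly ahead of the runner-up.
        if len(counts) == 1 or counts[0][1] > counts[1][1]:
            return counts[0][0]
        votes.append(get_more())  # tied: dynamically expand the pool
    return Counter(votes).most_common(1)[0][0]

# Two of three agree, so the third annotator is overruled.
print(resolve_label(["bug", "bug", "billing"], get_more=lambda: "bug"))
# All three disagree, so a fourth judgment breaks the tie.
print(resolve_label(["bug", "billing", "refund"], get_more=lambda: "bug"))
```

Real platforms layer annotator trust scores and test questions on top of this, but the core loop of voting and expanding until agreement is the same idea.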
And breaking down those steps into really, really narrow and discrete phases for scalability is key. There are a ton of gotchas in annotating data, and what I recommend doing is: roll up your sleeves yourself before you ask anyone else to annotate your data. Do ten, do a hundred annotations of the data that you need, and it will become clear really, really quickly whether it's easy to scale and really structured, or whether it uncovers more problems than you thought. Typically the latter.
Harpreet: [00:31:43] Thanks so much for that, Alyssa. So let's talk about the deployment phase now. What are some things that we should be considering when it comes time to deploy our model into production? And you shared a lesson in the book [00:32:00] that you learned about deploying models into production, so if you could talk to us about that lesson learned.
Alyssa: [00:32:04] Yeah. Which one are you referencing?
Harpreet: [00:32:06] Sorry, this was the one that was with IBM, and it involved something breaking. I'll dig it up in a second here.
Alyssa: [00:32:15] I remember. So, launch a demo and accidentally break our data center. Yeah, not one of my finer moments. So, deploying something to production: there are a lot of different scales of doing that, right? When I was at IBM, that's a four-hundred-thousand-person company, a hundred-billion-dollar company; there are huge checks and balances that go into production. At a startup, that's very different.
But if someone is paying money for your product, or it's going to be used in a public environment, even if that's just sort of on a website, it could have the opportunity to get big, right, and gain traction. So I think what you're referencing was the time that we launched our first computer vision demo. I had a really, really small team, I think it was like five people, and essentially an intern built a demo front-end website on top of this vision API. Really nice intern at IBM; we eventually hired the guy, he's awesome. And so we launched it. I QA'd it myself because I didn't have any resources and was minus a person, but we were trying to be scrappy, and we launched something. We weren't operating with a ton of revenue; this was a beta sort of product. But it was a cool little interactive demo: you get to put images in and train models. What happened was, I think it was a Wednesday. I was doing my job, I was traveling on business that week and had flown to Boston. And on Saturday my phone is blowing up with urgent messages from all these VPs and GMs way above me: who built this demo, and why is it broken? And I didn't even know what was happening. Turns out that someone [00:34:00] put the demo on Reddit, and people liked the demo and started interacting with it, and it was getting a ton, a ton, a ton of traffic to my poor little web app. And we hadn't built it in a way that scaled; it wasn't built to scale dynamically with that kind of traffic. And all of a sudden it was down.
Alyssa: [00:34:20] And, you know, the Reddit comments are like, oh, Watson's down. And that's actually not what was happening. It's my little demo website that was down; the API behind the scenes was totally fine, it does scale. But that perception was totally wrong. And so we fixed the demo, and it exposed some underlying bugs in some infrastructure underneath; we'd accidentally brought down some other bigger systems for about 20 minutes, but we got them all back up and running. So that, for me, was a lesson learned: even if I don't think something is super exciting, you know, I thought, oh, this is an MVP, but it got a ton of traction and people really were excited about it. Machine learning can really unlock excitement and engagement for things that weren't possible before. So be prepared for scale.
Harpreet: [00:35:10] Yeah. Like I said, one of the many things I love about this book is all these real-world war stories. You guys, check it out.
Alyssa: [00:35:16] So many stories, horror stories, war stories. It's all good learning. And one of the things I love about the data science and machine learning community is that everyone is, I find, frequently really, really giving with knowledge and wants to share and sort of give it away. And this is a small way I can pass on some lessons learned and help others not make the same mistakes that I did.
Harpreet: [00:35:41] Yeah, it's a privilege. It's definitely a great book for, you know, early career, mid-career, late career.
Even just anybody who is in the field. I think this is a must-read. And one thing that you touched on that I know for a fact not a lot of early career or aspiring data scientists get exposed to, maybe even mid-career [00:36:00] data scientists, is the importance of having a data strategy for AI maturity. You've got some significant real estate in the book dedicated just to having a good data strategy for AI maturity, so talk to us about the importance of that.
Alyssa: [00:36:16] Yeah. So this goes back to the fact that most companies just don't focus on data as a core differentiated asset or a growth engine for the organization. And so one of the things I speak to is: where are you acquiring data? What is the strategy for acquiring data? And then what is the strategy for extracting value from that data towards meeting your organization's goals? In health care, for example, where I'm at now, we have different strategies. We're an insurance company, so guess what, people submit claims to us, lots and lots and lots of them. So that is a data acquisition strategy. And we can turn around and leverage that claims data into really, really useful insights at an aggregate level. There's regulation on what we can and can't do with that data and how we share it, and we need to adhere to that regulation not just because it's legal, but also because it's appropriate and ethical, and there's good reason to be responsible stewards of that data. But it could also really delight those same members that we're serving, right? Health insurance claims data doesn't get anyone up in the morning, right?
Oh God, the pain to submit claims, and oh, why won't you pay more, and whatever. But think if I could turn around and use that same data and say, hey, you know, for other people like you, we recommend you go to this different doctor, because they have really good outcomes with this knee surgery that you have coming up, and we think actually it's going to cost you less and have a better outcome. There are all these really interesting insights that we can derive from that data. So understanding: what do you have? How is it an [00:38:00] asset? And then how are you using that asset to deliver value for your customers?
Harpreet: [00:38:05] And I know one of my good friends, George Firican, is huge on data governance. He's got this channel, Lights On Data, all about data governance. And I didn't even know anything about data governance until I had to start implementing a data strategy at my current company. You know, being the first data scientist, delivering some value with a machine learning model to put into production, everybody wants a little bit of this machine learning action, but we have no data strategy in place. We've got no data governance, no data management, none of that stuff.
Alyssa: [00:38:30] Oh, it's important, because you need to have legal on board, you need privacy, and it's operationally really critical. Often it requires business development, you know, different relationships there. It also requires some data engineers and some databases, and strategy around where you're storing the data, how you're moving it, how frequently you're updating it, and all that.
Alyssa: [00:38:50] But like I said, I had the privilege of working with awesome engineering and data teams, and they've done a really good job at making that easy.
Harpreet: [00:38:58] Have you had any challenges trying to, I guess, sell the data governance thing? If anybody hears the word governance, they're like, no policemen, don't govern me, it's a free country. So how do you deal with those challenges in an organization, if you've faced them?
Alyssa: [00:39:15] Yeah, I mean, there's always carrot and stick, right? The stick is: are we going to get audited, or do we need to be compliant with a legal mandate, and if we don't do this, here are the bad things that could happen, here's the risk to the organization. So that's one side of it, if you're trying to impart urgency about something. It's like, hey, let me do some back-of-the-envelope math here: we could get fined up to $500 per thing, and we have exposure right now of five hundred thousand, so that's an X-dollar-value problem for us. On the other side, what opportunities can you unlock for the business using this data? Data governance, or data strategy, is sort of a means to an [00:40:00] end, right? It's one of the steps that you need to take to unlock this revenue opportunity. So typically, follow the money or follow the growth.
Harpreet: [00:40:09] Thank you so much for that, Alyssa. So we're going to start winding down the episode here. We've got just two more questions before we get into what I like to call the random round. We talked about this a little bit earlier, you brought it up, just being a woman in tech.
I was wondering if you might be able to share some advice or words of encouragement for the women in the audience who are looking to break into AI, machine learning, or tech in general. Just share whatever advice or wisdom you have for them.
Alyssa: [00:40:38] Yeah. I mean, the short version is: please do keep going. This industry needs you. We are very lopsided right now. And anyone who has experience that is different from the dominant culture, the weight of that experience is even bigger than any one voice, right? So know your value: just by being a woman, or being a person of color, or being someone from a background that's nontraditional, or you didn't go to Stanford or Harvard or MIT or whatever, that experience is really, really needed. This industry depends on having more diverse voices in the room, because otherwise you end up in bad situations. There's a very active conversation going on in government and politics right now around Big Tech, right, and the influence that they wield, and the small number of people who are wielding really big influence. Those small numbers of people are not diverse groups that are serving the macro populations. And so the industry's success depends on diversifying so that we can better serve the constituents. So I think it's important for democracy, for our country's national security, and the [00:42:00] macro stuff.
But on a more micro, individual level: don't doubt yourself, and don't doubt the value that you can bring. And it's okay not to know everything. When I got hired into Watson, I had product management experience, I had business experience, but I didn't have a clue about machine learning. I literally enrolled myself in Coursera, Andrew Ng's Machine Learning 101, but I was able to get up to speed fast because I had some great partners who helped me, and I was willing to learn. I spent late nights, and my husband's an engineer and he coached me. So surround yourself with a network and a team of people who can be your champions and support system.
Harpreet: [00:42:37] Yeah, absolutely love that. All the points made, but the last point resonated with me a lot, because I studied math and statistics in school and grad school, and I was an actuary for a while. I'm sure you deal with plenty of those at Blue Shield. I didn't have any tech experience whatsoever; I learned that stuff all on my own. Even though I could do algorithms by hand if I really had to, I didn't know anything about the technology aspect of it, and a lot of that was just taking time outside of work to upskill. So thank you so much for sharing your journey and perspective on that. I mean, data science, the whole field, is really diversifying in terms of the types of roles that you're starting to see pop up. So for people who are coming from a product management or project management type role, what are some adjacent roles on data teams that they could kind of shimmy into, if that question makes sense?
Alyssa: [00:43:30] Yeah. I mean, for building a diverse, cross-functional team, if you have excellent project management skills, like you can get humans to work together successfully and talk about what their blockers are in language that the other humans can understand, that's straight project management. So you don't necessarily need a different set of skills; it's applying them in a different domain, right? But, you know, if I'm hiring a project manager into a team that is heavily dependent on data science or machine [00:44:00] learning products, ideally I want them to have experience doing this before. And if they don't, I want to make sure that they can get up to speed, that they're not afraid of those technologies, and that they're eager to learn. So what I actually look for most is attitude and aptitude to learn, less so than the hard skills necessarily. And so, you know, I'm hiring someone onto my team right now, and I have a lot of open roles actually, so come work with me at Blue Shield. Not a lot of the people applying necessarily have the exact skills I'm looking for, but we're looking for aptitude, and we're looking for willingness and excitement around solving these problems.
Harpreet: [00:44:40] Absolutely love that. Thank you so much for saying that; it resonates so much with me because I share the same type of viewpoint. And you guys heard it: hit Alyssa up, get a job. Tell her Harpreet Sahota sent you. Alyssa, your inbox is about to blow up.
Alyssa: [00:44:55] So I mean, product managers, let's do it. If you understand data, I'm looking for you.
Harpreet: [00:44:59] So let's do the last question before we jump into the quick random round. It is one hundred years in the future. What do you want to be remembered for?
Alyssa: [00:45:08] A hundred years in the future? I don't know that I'll be remembered, but I'm a big believer that small teams can solve big problems. And so if I can help move the ball forward and create teams and be a part of that, I don't think anyone will remember me specifically as much as the outcomes that a team I was a part of helped achieve.
Harpreet: [00:45:34] I love that. Very beautiful. Thank you so much, Alyssa. Let's jump into the random round. First question for you here: when do you think the first video to hit one trillion views on YouTube will happen, and what will that video be about?
Alyssa: [00:45:49] What is it now? What is the top number?
Harpreet: [00:45:52] It's like nine-ish billion, and it's Baby Shark.
Alyssa: [00:45:57] Sure, my kid listens to a lot of Baby Shark [00:46:00] right there. So I'm sure it depends on the preschool scene. I have no idea. I think what the pandemic has shown is how fast traffic and attention can shift. So, you know, what keeps me up at night is global warming and that kind of stuff. I worry about that, but I'm also really excited about the promise of technology and the promise of collaboration to tackle some of these big challenges.
Harpreet: [00:46:32] Everybody listening and watching, do your part. Let's make this the first video to hit a trillion views. You can do it.
Harpreet: [00:46:37] So in your opinion, what do most people think within the first few seconds of meeting you for the first time?
Alyssa: [00:46:44] Tall. People think I'm tall. I've got bright red hair, and I kind of stand out in most crowds, particularly in Asia. So, you know, I'm easy to spot in the first few seconds.
Harpreet: [00:46:56] What are you currently reading?
Alyssa: [00:46:58] What am I currently reading? So I tend to read books intermittently, actually only on vacation. I'm going on vacation next week, so there's a pile of different books next to my bed. I recently read a book, I'm blanking on the name of the author, by a woman who's a mother. What is the name of the book? I'm so sorry, I can't remember the name of the book. It's about how to be a good parent and raise strong kids.
Harpreet: [00:47:28] I've got a one-year-old, so please send me that book, because I need to figure out how to do that. So what song do you have on repeat?
Alyssa: [00:47:36] The song I have on repeat is "Showman."
Harpreet: [00:47:42] Huh? Who's that by? I'll have to look that one up.
Alyssa: [00:47:45] Sorry, I'm like the worst. All I do is work and parent; I don't listen to much besides, like, Baby Shark.
Harpreet: [00:47:53] No worries. Hey, let's go ahead and jump to a real quick random question generator, one of my favorite things to do on the show. So this is a [00:48:00] completely randomized question generator; there should be some fun stuff in here. First question is: what's your worst habit?
Alyssa: [00:48:07] My worst habit is talking before thinking and not taking a step back.
Harpreet: [00:48:14] Pizza or tacos?
Alyssa: [00:48:16] Tacos.
Harpreet: [00:48:17] Yes! I'm a Californian as well, and it's interesting, because as a Californian from Sacramento, we owned a pizza restaurant for twenty-five years. I just combine the two; I like taco pizza, it's the best thing. What's your favorite candy?
Alyssa: [00:48:31] I'm Australian. I like Rush.
Harpreet: [00:48:34] Oh, interesting. I'll have to try to find some of that around here.
Alyssa: [00:48:38] You can get it at Trader Joe's.
Harpreet: [00:48:39] Mountains or ocean? This is a good one for a Californian.
Alyssa: [00:48:43] Definitely ocean, every day.
Harpreet: [00:48:45] Awesome. And this is the last one here: what's the last book you gave up on or stopped reading?
Alyssa: [00:48:53] Thinking, Fast and Slow. I could not get through that book.
Harpreet: [00:48:58] Man, it's quite [00:49:00] a book.
Alyssa: [00:49:01] It's about 60 percent read. I'm also about halfway through Obama's latest book and four or five others.
Harpreet: [00:49:08] Nice. So listen, how can people connect with you? Where can they find you online?
Alyssa: [00:49:13] Yeah, reach out to me on LinkedIn: Alyssa Simpson Rochwerger.
Harpreet: [00:49:18] Well, listen, thank you so much for taking time out of your schedule to be on the show today. Everybody tuning in, you've got to pick up a copy of this book. It is really good. It's going to really help solidify all those concepts and things that you're learning about in boot camps and stuff like that, and give you some context. Highly recommend checking it out, guys. And until then, remember: you've got one life on this planet, why not try to do something big? Cheers, everyone.
Alyssa: [00:49:40] All right. Thank you for having me.