[00:00:00] So the hottest topic in the hottest field in America right now is fairness, bias, algorithmic fairness, algorithmic bias: how to make algorithms more fair to people, or how to mitigate bias and make them less unfair. So I'm going to assume when you say equality, you mean that. I see you shaking your head yes. OK, I have whole talks about this. I talk about this a lot.

What's up, everybody? Welcome to the Artists of Data Science podcast, the only self development podcast for data scientists. You're going to learn from and be inspired by the people, ideas and conversations that'll encourage creativity and innovation in yourself so that you can do the same for others. I also host open office hours. You can register to attend by going to bitly.com/adsoh. I look forward to seeing you all there. Let's ride this beat out into another awesome episode. And don't forget to subscribe to the show and leave a five star review.

Our guest today is a data scientist specializing in deep learning, natural language processing, predictive analytics and social good initiatives. [00:01:45] She's a lead data scientist and analytics manager at Booz Allen Hamilton, where she helps her stakeholders cut through the clutter to make better decisions and leads a team that transforms complex problems into simple solutions. She's also a member of GLOBE, Booz Allen's LGBTQ forum, and the African American Forum. She also serves as a teaching assistant at General Assembly and a data adviser for the National Urban League. As part of the Women in Data Science organization, she serves as the chair of the community service branch, connecting data science capabilities to social impact initiatives. For her contributions to data science and social good, she's been awarded the 2020 Women of Color in STEM All-Star Award, the 2019 DC FemTech Award and the 2017 Prince George's County, Maryland 40 Under 40 honor. So please help me in welcoming our guest today, a woman who is committed to creating equity with algorithms: Siân Lewis. Siân, thank you so much for taking time out of your schedule to be here today. I really, really appreciate you coming on to the show.

[00:03:01] Oh, thank you so much for having me.

[00:03:04] Well, you make me sound really fancy. You definitely are. So, Siân, let's learn a little bit more about you before we dig in and talk about the data science stuff, the boring stuff. [00:03:14] Let's learn about where you grew up and what it was like there.

Well, I grew up in many places. I like to think I had a true diaspora experience. I was born originally in Nottingham, England. [00:03:30] I got there because my parents were there. They were both immigrants from their respective countries. My dad's from Sierra Leone, and my mom's from Trinidad in the Caribbean. [00:03:39] They both left their countries, wound up in jolly old England, and I was born there, lived there for a few years. And then I went to live in Trinidad for a few years, and the Caribbean was wonderful and great, and then I came to this fine nation and have been here ever since, [00:03:58] in the D.C. area. Growing up, for me, it was like one floating party. You know, when you're an immigrant, and I am, I am an immigrant from England, I don't have the typical immigrant story.
[00:04:17] When you're an immigrant, you find and you form these social enclaves wherever you are, with people who are similar to you, who are from similar countries as you. You form tight knit communities. [00:04:27] You really have the ability to form friendships with people of all kinds of backgrounds. So I really had a very cosmopolitan upbringing, exposed to a lot of different things. And I'm an only child, which is unusual in both of my parents' cultures. So being an only child, I got to be around the adults. I was really very much a grown-up when I was younger, and I got exposed to great lectures and great museums and great travel adventures. So I had a really cool upbringing.

[00:05:00] So freaking cool. So, Nottingham, if memory serves me correct, isn't there like a forest there? There is, yeah. Isn't that the same forest where Robin Hood is from? You are correct, sir. Did you ever go looking for him in the forest?

[00:05:18] I did not go looking for him in Sherwood Forest. I was not that interested in the forest. When you're from Nottingham, that's all you hear about, the forest.

No, that's crazy. [00:05:30] It's always interesting hearing the stories from various immigrants that come to the States. Right. So obviously I'm an immigrant too; my parents were not born in the States. I was born in the States, so I guess first generation. My mom's from Fiji, my dad is from India, but both ethnically Punjabi. And it's always cool to kind of see the similarities between the immigrant stories. [00:05:54] So with your parents, did they give you the option of being either a lawyer, doctor or a failure?

[00:06:03] That is correct, yes. Correct. Well, nothing law related in my family, actually. My mom is a nurse, my dad is a research scientist. So for us, it was just doctor or failure.

Uh, did you have any other choice? [00:06:18] No, no second option. You know, I think for Indian people, for sure, there is becoming a third option. The third option is data scientist. You could be a lawyer, doctor, data scientist or failure.

Yeah, we're evolving. We're developing our notions of success, that is for sure. So what kind of kid were you in high school?

[00:06:41] It depends if you ask me or if you ask my friends. My perception of myself was of a very quiet, reserved dweeb who had a very terrible sense of fashion and a very great interest in reading. The way my friends tell it, I was extremely funny, very outgoing, in all the clubs, did all the things. I just felt awkward all the time, and I spent a lot of my time studying. I loved school, which was kind of unusual. I loved going to school, I liked doing homework, I liked the whole nine yards, and I took all the AP classes. I just enjoyed it. But if I look back and look through pictures, I appear to be a pretty fun person.

[00:07:24] That's pretty interesting. I mean, I like your sense of fashion now. People can't see on the podcast, but she's got this really cool polka dot blouse. It's really, really cool. I like that. So anyways, what did you think your future was going to look like in high school?

[00:07:41] Well, as I told you, I only had the one option, the other being failure. So I was very much going to be a doctor. I did the whole AP biology, AP chemistry thing; I was born and bred and raised and directed to be a doctor. It's all I thought about. I wanted to be a cardiothoracic surgeon, a very specific kind of doctor.
I knew how long it would take me and the schools I'd go to. So that's all I thought about, the only thing I saw for myself.

[00:08:13] So what was the journey like, then, that took you from that really specific vision into data science?

[00:08:21] Well, I suppose the second option. So I had the option of being a doctor, and then I explored failure for many years. [00:08:29] You know, I went to grad school. I was terrible at it. I didn't want to be there. And I actually learned that I had no interest in anything health care related that required working on people. So I quit, after great anguish, great terror. And I was like, oh my God, what am I going to do with my life? And I spent the next, I don't know, however long wondering what I was going to do. At the time I was in Baltimore, and, you know, just doing terribly at it. And there were a bunch of nurses on strike at the hospital where I was going to school, and I saw a friend from college leading a picket line, and I was like, hey, what are you up to? Oh, yeah, nurses are underpaid, and there's, you know, mandatory scheduling and such. And I recalled my mom having to work those ridiculous shifts at the drop of a dime, and thought that sucked. And so I began talking with her. I went to one of these meetings. She worked for the Service Employees International Union, and she was like, hey, you're passionate, and you get to go into this building without anybody harassing you, why don't you join us? And I was like, all right. And so I began my journey into labor organizing. I did that for a while.

[00:09:58] And from there, because of my heavy love of mathematics, I converted that into political organizing work, because I had the ability to make predictions from people's previous voting history in the voter file and such. No one around me was doing it. And from that, I parlayed that into being a political director of a labor union, being a lobbyist, and then beginning my own political data firm, where I worked on a variety of political campaigns, really, really big ones and really, really small ones. I became the vote whisperer when it came to predictive analytics in politics. And then from there, [00:10:50] I really leaned into data science as it became a thing. It wasn't just me making predictions anymore; it became a formalized line of work, and I was like, OK, I can do this. I could be a data scientist, because I had been one all these years, I just didn't know what it was called. And so I went to a boot camp to learn more about deep learning specifically, and after that it was off to the races in data science.

How does that work? [00:11:24] How do you use predictive analytics for votes and vote counting? [00:11:31] Explain that a little bit to me.

Just for your audience: we're doing this podcast one week and one day after the general election in November 2020, the historic election between Joe Biden and Donald Trump, and they are still counting votes. So, essentially, you can use predictive analytics and build a bunch of models to do just about anything in politics. When I began doing it, I just did turnout models, trying to predict how many people are going to show up to vote, what districts they're going to come from, and what we can do to change the number of people that will turn out and actually vote by election day. That's pretty basic: it's based on people's voting history in the past, along with any demographic change in a locale, along with individual voter demographics, and along with the results of any surveys that they took along the way, where they get scored according to their enthusiasm for a candidate and their enthusiasm for voting. So you have all those elements and you make a model. It's pretty simple.

[00:12:50] Once I began doing it, I began making predictions for other things, like how much money are we actually going to need to raise and spend in order to make that happen? Where is the best place to send pieces of mail, and what is the effect of that mail going to be, according to the stats, on how many people turn out? [00:13:17] If it's a national race, where are the best places to buy ads that are going to have the largest effect? Where are the best places to find new donors? Where are the best places to persuade people who are on the fence, or even just find them? There's a million, there's so many different ways that you can use predictive analytics in politics.
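To make that concrete, here is a minimal sketch of the kind of turnout model described above. This is not Siân's actual pipeline; the voter file, its column names, and the values are all invented for illustration, but the features mirror the ones she lists: past voting history, local demographic change, individual demographics, and survey enthusiasm scores.

```python
# A hedged sketch of a basic turnout model; every column and value is hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression

voters = pd.DataFrame({
    "elections_voted_last_5":  [5, 0, 3, 1, 4, 2, 0, 5],        # past voting history
    "age":                     [67, 22, 45, 31, 58, 40, 19, 72],
    "district_pop_change_pct": [0.1, 2.3, 1.1, 2.0, 0.4, 1.5, 2.2, 0.3],
    "survey_enthusiasm":       [9, 2, 7, 4, 8, 6, 1, 10],       # canvass survey score
    "voted_last_election":     [1, 0, 1, 0, 1, 1, 0, 1],        # target variable
})

features = ["elections_voted_last_5", "age",
            "district_pop_change_pct", "survey_enthusiasm"]
model = LogisticRegression().fit(voters[features], voters["voted_last_election"])

# Per-voter turnout probabilities, which a campaign could sum by district
# to get the expected-turnout numbers described above.
voters["turnout_probability"] = model.predict_proba(voters[features])[:, 1]
print(voters[["age", "turnout_probability"]])
```

The same shape of model, with different features and targets, would cover the fundraising and mail-effect predictions she goes on to describe.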
[00:13:41] It's interesting, because if you abstract away the specifics of your context, you can take that same story and maybe move it to a retail space, where now, instead of predicting who is going to come and vote, you're predicting when the next customer is going to come, how much they're going to spend, things like that.

[00:13:59] Exactly, it's very, very close.

[00:14:03] So what do you love most about being a data scientist nowadays?

[00:14:08] I love that people think that we are magical wizards that control the world, and then I get to burst people's bubble and say that we are not magical wizards that control the world. I love data science because it means everything and means nothing. [00:14:31] The term gets thrown around so often it doesn't really have any meaning. But I love that people are more and more data oriented, more and more metric oriented, where they're looking for hard things to judge whether something is good or bad. That's good for people like me. That's good in general. Having a more black and white situation for most things is nice. I also love data science because I blink and something new has come out that has fundamentally changed the way I do things. Literally every single day there's something new: there's a new package, there's a new technique, there's a new finding, there's a new paper that comes out. And I get to rethink what I learned in school, I get to rethink what I've done practically over the years. And I love that about it.

[00:15:21] Do you consider data science and machine learning to be an art or purely a hard science, and why?

[00:15:27] Oh, let me give you the most data science answer ever, which is: it depends. "It depends" is like the answer to everything, one hundred percent. So for things like marketing analytics, if I were working in a Silicon Valley firm, I would say to you, absolutely, it's a science. Literally every decision that they make is metric based, is data driven. They use an algorithm to figure out if this thing works or not, they tweak a current algorithm. Everything, everything is science. [00:16:01] But if you ask somebody in politics, let us say, it's more art than science. You can't really predict how many people are going to come out.
You just can't, because you don't have an actual population to derive a sample from. This is something that is wishful and happening in the future, right? So it inherently cannot be a hard science, because the population you'd have to sample from, say the 2024 electorate, is imaginary. So for me, the answer is: it depends.

[00:16:30] I currently work at Booz, and I work in public service. I work on improving our active duty military health care system for beneficiaries. And I work on very, very concrete things, and I also work on very imaginary things as well: what could happen if this happens, what might happen. So I spend a lot of time in both worlds, in art and in science.

[00:16:57] I follow Seth Godin, I'm not sure if you follow him as well, but his blog posts are amazing, and I get them every day in my email. He posted one about the difference between science and art, and it really, really resonated with me. Let me state it carefully: if you can't replicate the work and get the same outcome, then it's not science; and if you can replicate the work and get the same outcome, then it's not art. It kind of made me think a little bit, because when it's a science and I lay out my procedure for you and say these are my steps, then you should be able to do all the same steps and get the same result. That's the science part of it. But the art part is: I'm going to solve this problem with my specific knowledge, with what I know, and you're going to solve this problem with your knowledge, what you know. The artistic part is the actual way we solve the problem.

[00:17:58] Yeah, I agree. What role do you think being creative and curious plays in being successful as a data scientist?

[00:18:08] Oh, that's a good question. It depends. So, you know, being curious is incredibly important in general in any kind of quasi science based field, and that's what data science is right now. It's not a hard science, but we could debate that from now until the cows come home. Being curious is really important, but so is detachment, because often as a data scientist you're hired to solve a very specific problem. And the problems are usually in three categories: [00:18:42] how are you going to increase usage of something, how are you going to increase revenue, or how are you going to increase engagement on something. That's really it. [00:18:52] And if your curiosity takes you outside of any of those realms, then it's a detriment, because then you're not actually working on optimizing the problem you were hired for. You're off in another place that may or may not be suboptimal. So curiosity is a great, great thing, and then too much of it is not. It's the same thing with creativity. I am a highly creative person. I call myself a fashionista; I love my fashion, I love my art. I'm very creative personally. And when it comes to finding really interesting data to enhance the data that I currently have, I'm very creative. I'm also very creative in presenting data. I guess because of my political background, I'm very good at saying, hey, this is what we did, this is what we found, and convincing clients or whomever that the way to go is this, this is the strategy we should be going forward with, it's data based. And that's fine.
[00:20:02] But then, just like curiosity, creativity can often, I would say, lead to a result that can't be replicated, because you can document as many things as you want when you write out how you built a model, but the creativity involved is never documented. People cannot replicate what you do when your brain is up at two a.m. and you say to yourself, oh my gosh, I bet if I go to the EEOC and find this dataset, combine it with that one, find just this one column, multiply them together, turn that into a feature and add it to my current dataset. Yes, perfect, let me write that down. That's hard to replicate. So again, creativity is great until it isn't.

[00:20:57] I thought I was the only one who did that. So what is a model anyways, and why is it that we even build them in the first place?

[00:21:02] You know, I am going to take a moment to disambiguate, which is a fancy way of saying to clarify something. I have a lot of different side jobs that are all data science based, and one of my side jobs is being a teaching assistant at General Assembly, which is a bootcamp for folks to become data scientists in a few weeks. It's a great thing. [00:21:25] And I get to hear students say, oh, my model isn't working well, or my model isn't working right, and they keep using the words model and algorithm interchangeably, which I hear in the real world all the time. So I'm just going to say what a model is and what an algorithm is to answer your question.

[00:21:43] A model basically represents what you have learned by running an algorithm; the model is the thing that's saved. After you've run an algorithm on training data, you have a representation of the outputs, and that is a model. An algorithm is a specified set of rules and procedures that you follow, mathematical rules to follow to solve a problem. So algorithms are things like random forest and logistic regression; those are not models. When you run the algorithm over a set of data to create your predictions, or whatever you want to do with it, that is a model. And so basically it's a snapshot of a moment in time. It's a snapshot of a particular reality at that moment. That's all.
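That distinction is easy to see in code. Here is a tiny sketch using scikit-learn on synthetic data: the estimator object embodies the algorithm (the rules, with nothing learned yet), and calling fit on training data produces the model, the saved snapshot she describes.

```python
# Algorithm vs. model, sketched with scikit-learn on synthetic data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
import joblib

X_train, y_train = make_classification(n_samples=200, n_features=5, random_state=0)

algorithm = LogisticRegression()          # the algorithm: a procedure, nothing learned
model = algorithm.fit(X_train, y_train)   # the model: what was learned from this data

print(model.coef_)                        # the learned representation of that moment
joblib.dump(model, "model.joblib")        # the model is "the thing that's saved"
```

Retrain on next month's data and you get a different model from the same algorithm, which is exactly the snapshot-in-time point.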
[00:22:39] So how can we use algorithms to build models with equality? And equality, will you disambiguate that one for us too? What does it mean to you?

[00:22:53] Well, so the hottest topic in the hottest field in America right now is fairness, bias, algorithmic fairness, algorithmic bias: how to make algorithms more fair to people, or how to mitigate bias and make them less unfair. So I'm going to assume when you say equality, you mean that. I see you shaking your head yes. OK, I have whole talks about this. I talk about this a lot. This is one of my personal passions, so I'm glad you brought it up. I'll start with this. [00:23:23] Number one, as any data science person knows: garbage in, garbage out. If you have crappy data that you're putting into your model, then you're going to get crappy results. The thing is, when you are a data scientist and you have the pressures of working on growing usage, revenue or engagement, often folks just don't even look at the data. It's just, this is the data, let me find ways to make this dataset better so I can make a better model. [00:23:54] But in order to build algorithms, I'm sorry, to build models, that are more equitable, that promote equality, let's call it justice, fairness, because people are going to ask me about that afterwards, you actually have to look at your data. Stop and look at your data. And you have to look at things like: are there any attributes in here that are quote unquote protected, the protected classes? Is there anything like age or race or gender or even income in here? If there are, we have to treat them a certain way. And if there aren't, then we have to look at things that correlate with the protected classes. In the part of the world that I live in, Washington, D.C., things like race very much correlate with the part of the city that you live in, the zip code, the ward. In D.C., if you live in Ward 7 or 8, you're highly likely to be African-American; not so if you live in Ward 6. So you have to look and see what also correlates with these protected classes. People say, listen, we took race out of the dataset, we took out gender, OK, and now we're finished. No. You have to look at what correlates with those things as well.

[00:25:24] And you don't necessarily have to take them out of your dataset. What you can do is treat these attributes a certain way. So yes, you want to keep your data untouched, you want to run the algorithm first, and then you want to see whether you can actually find a measurable difference between the outcomes of a group within a protected class that's privileged and one that's unprivileged. So let's take a less inflammatory one. Let's take age. Let's say you leave age in your data, you run your algorithm, you get your model, and then you find that people over a certain age have outcomes that are, just picking random numbers, between one and three, and people under a certain age have outcomes that are between seven and ten. A stark difference between them. So you look at the difference between the outcomes of these classes, the privileged and unprivileged groups within a protected attribute, to see the mean difference. It doesn't even have to be the mean difference; it could be any difference. You have to see if there is a difference in your predictions. And there usually is. There usually is, [00:26:56] if you actually look for it. So now that you can actually quantify that there are different outcomes for people over a certain age and people under a certain age, you have actually seen bias. You have literally proven that there is bias in this dataset. And once you do that, you can set about the fun task of mitigating it, right? And there are a bunch of ways. You can mitigate bias by just reweighting your different classes, adding more weight to one class over the other so that there's no difference in the outcome. And there are a million different fairness algorithms that you can run across your data as well that speak to how to output fair outcomes for folks. There's a lot of things. The place you want to wind up is this: when you look at your protected class, you look at age, and you look at people over a certain age and people under it, the difference between their outcomes should be small, right? That is all you're trying to do. You can get there in different ways, but what really matters is that you actually take a moment to look and quantify the bias, and then mitigate it. There's 50 million different tactics.
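Here is a minimal sketch of that quantification step. The data are invented to echo her one-to-three versus seven-to-ten example, and the age threshold of 60 is arbitrary; the point is simply that the group difference becomes a number you can report.

```python
# Quantifying bias: the mean difference in model outcomes across a protected
# attribute. All values and the age threshold are invented for illustration.
import pandas as pd

results = pd.DataFrame({
    "age":               [72, 65, 70, 25, 30, 22, 68, 28],
    "predicted_outcome": [2.1, 1.5, 2.8, 8.3, 9.1, 7.4, 1.9, 8.8],
})

over_60  = results.loc[results["age"] >= 60, "predicted_outcome"]
under_60 = results.loc[results["age"] <  60, "predicted_outcome"]

mean_difference = under_60.mean() - over_60.mean()
print(f"Mean difference in outcomes: {mean_difference:.2f}")
# A value far from zero is the measurable evidence of bias described above;
# mitigation should drive it toward zero.
```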
[00:28:25] If there was one fairness algorithm that our audience should go brush up on, which one would you recommend?

[00:28:36] It depends on your data. I actually would say you won't know until you try them. Also, there's a really cool website you should go to. [00:28:45] They have really cool educational tutorials on all the different algorithms you can use and all of the preprocessing techniques you can use, like reweighing or even rebalancing classes. Go to AI Fairness 360, it's IBM's site, where they have tutorials on what it actually looks like mathematically and all the different metrics that you can solve for. And they also give you notebooks on how to write your own algorithms, or use their packaged algorithms, to mitigate bias.

I'll definitely include that in the show notes.
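For readers who want a head start, here is a rough sketch of that AI Fairness 360 workflow, measuring the mean difference and then mitigating it with reweighing, based on the library's documented preprocessing API. The toy data and the already-binarized age attribute are invented; treat IBM's own tutorials, not this snippet, as the authoritative reference.

```python
# Sketch of the aif360 measure-then-mitigate flow on invented toy data.
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

# 'age' is already binarized here: 1 = over the threshold (privileged group).
df = pd.DataFrame({
    "age":    [1, 1, 1, 0, 0, 0, 1, 0],
    "income": [40, 55, 60, 30, 35, 28, 70, 33],
    "label":  [1, 1, 1, 0, 1, 0, 1, 0],
})
dataset = BinaryLabelDataset(favorable_label=1, unfavorable_label=0,
                             df=df, label_names=["label"],
                             protected_attribute_names=["age"])

privileged, unprivileged = [{"age": 1}], [{"age": 0}]
metric = BinaryLabelDatasetMetric(dataset, privileged_groups=privileged,
                                  unprivileged_groups=unprivileged)
print("Mean difference before mitigation:", metric.mean_difference())

# Preprocessing mitigation: reweigh instances so the groups' outcomes balance.
rw = Reweighing(unprivileged_groups=unprivileged, privileged_groups=privileged)
reweighted = rw.fit_transform(dataset)
print("Instance weights:", reweighted.instance_weights)
```

The reweighted dataset's instance weights then feed into any downstream algorithm that accepts sample weights.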
[00:29:20] Thank you so much. I've never actually heard of that website, so that's very, very useful. Thank you. So let's talk about building models. When we're building models, sometimes people will just take a raw dataset and try to fit a model to it. They don't really think about feature engineering or building out that complexity in the data. Do you have any tips that you could share with our audience so that we can be more thoughtful with our feature engineering? And maybe we can talk about it within the context of the type of data that you work with, where it's actually affecting humans. Social data, is that the right word for it?

[00:30:00] Sort of. Well, one thing, on what I just talked about: you want to do an error analysis between the different groups in your dataset to make sure that, at the very least, your error is consistent between them. For feature engineering, I have a checklist, and I'll tell you what the checklist is, OK? So the first thing I do is I look at any indicator variables, and I try to figure out if there are any thresholds, if I can take the variables that I have and group them or bin them. Actually, the first thing I do before I even do anything, and I've been doing this for years: when I get a dataset, I plot every single thing. I want to see how my data are distributed, how each individual variable is distributed. I see what's normal versus skewed, how it is spread, and that really determines how I'm going to treat it. It also determines how I'm going to engineer features. I want to see if my dataset is sparse or not; if there's a whole bunch of zeros and a whole bunch of missing values, I'm going to deal with it in a different way. And I also want to plot it because I want to see how the outliers are behaving too, if there are some, if there are a lot of them, how they're pulling my distribution in a certain direction. So if there are a bunch of zeros and such, I have to decide how I'm going to impute missing data. Am I going to do it at all? If I am, what imputation technique do I use? It's worth taking a moment on this. You know, in politics, there's a practice I don't agree with: if there's a missing value, you just fill it in with the mean. It hurts my heart. It hurts me so much. They fill it in with the mean, or with a three, because it's a Likert scale between one and five, so three is neutral. It hurts me. It hurts.

[00:32:06] I prefer to mathematically impute missing data if I have the computational power to do so. A lot of people use MICE, multiple imputation by chained equations, to figure out what the estimated value is. So decide if you want that, or if you just want to drop the missing values. If you decide to drop them, you just have to make sure, and this is the reason why you plot, that after you drop them your distribution hasn't changed significantly, that your variance hasn't changed. Then you have to decide what you want to do with the outliers: [00:32:38] keep them in or not. A lot of the time we scale or standardize because of the outliers; we want to bring them in so they don't throw off the predictions from your algorithm. So, again, you decide what to do with the outliers. Then, often, especially if the data is really big, I just want to put stuff into buckets, bin it out, so that my algorithm has an easier time with the bins instead of the individual numbers in between. Sometimes I'll do some transforms, a log transform a lot of the time. And often, this is my life, I don't know about your life, it's a bunch of random categorical stuff in my dataset, a bunch of random words, and algorithms can only take numbers. [00:33:19] So I have to convert the words into numbers, and I have to decide how I want to do that. Do I just map it, so every time I see the word man that's a one and every time I see the word woman that's a two? Or do I want to create some dummy variables, do some one-hot encoding, where it would be a one or a zero, or a zero and a one, for man versus woman? [00:33:56] I've learned this little tidbit the hard way: if you're going to create dummy variables or do one-hot encoding, please do that after you have grouped your sparse values. It's always good to group your data as much as you can first. And after you've done that, engineer a feature; say, for all the people who are twenty-one and younger, a new column that just says they're twenty-one and younger. And then, if you're doing time series and such, extract the date parts and make sure that the data are in the right format for you. So those are the things that I do.
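Here is a condensed sketch of a few items from that checklist: plot first, impute with a chained-equations method rather than the mean, group sparse categories before one-hot encoding, bin, and extract date parts. Everything about this DataFrame is invented, and scikit-learn's IterativeImputer stands in for a MICE-style imputer.

```python
# A hedged sketch of the feature-engineering checklist on invented data.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer  # a MICE-style chained-equations imputer

df = pd.DataFrame({
    "age":    [23, 45, np.nan, 31, 67, 19, np.nan, 52],
    "income": [30000, 82000, 45000, np.nan, 61000, 25000, 38000, 99000],
    "city":   ["DC", "DC", "Baltimore", "DC", "Yonkers", "DC", "Baltimore", "Lima"],
    "signup": pd.to_datetime(["2020-01-03", "2020-02-14", "2020-03-01", "2020-04-22",
                              "2020-05-19", "2020-06-30", "2020-07-04", "2020-08-15"]),
})

# 1. Look at your data first: plot every distribution before touching anything.
df.hist(figsize=(8, 4))  # requires matplotlib

# 2. Impute mathematically instead of filling with the mean.
num_cols = ["age", "income"]
df[num_cols] = IterativeImputer(random_state=0).fit_transform(df[num_cols])

# 3. Group sparse categories BEFORE one-hot encoding (the hard-won tidbit).
counts = df["city"].value_counts()
df["city"] = df["city"].where(df["city"].map(counts) > 1, other="Other")
df = pd.get_dummies(df, columns=["city"])

# 4. Bin, and engineer the twenty-one-and-under indicator she mentions.
df["age_bin"] = pd.cut(df["age"], bins=[0, 21, 40, 65, 120])
df["is_21_and_under"] = (df["age"] <= 21).astype(int)

# 5. Extract date parts so the algorithm gets numbers, not timestamps.
df["signup_month"] = df["signup"].dt.month
```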
What's up, artists? [00:34:46] I would love to hear from you. Feel free to send me an email at theartistsofdatascience@gmail.com. Let me know what you love about the show, let me know what you don't love about the show, and let me know what you would like to see in the future. I absolutely would love to hear from you. I've also got open office hours that I will be hosting, and you can register by going to bitly.com/adsoh. I look forward to hearing from you all, and I look forward to seeing you in the office hours. Let's get back to the episode.

[00:35:31] Thanks for sharing that. I never got much of a chance to work with survey data. When I was a biostatistician once upon a time, a couple of times here and there, I did work with Likert scale data, but never had much of an opportunity to work with surveys. Is that something that you find yourself having to work with quite frequently?

[00:35:48] Not so much anymore. In politics, all the time. Not so much anymore, yeah.

[00:35:53] If somebody wanted to do a project that involved survey data, do you have any resources for them to look into to get survey data publicly?

[00:36:05] data.world has a bunch of really cool surveys that you can find, and, I can't remember right now, but there's a newsletter that has survey data. It'll come to me, OK?

[00:36:16] Yeah, definitely. I'll add data.world into the show notes.

[00:36:19] I think something that doesn't get nearly enough coverage in any book or tutorial that I've come across is what to do once the model has been shipped into production. Right? So once we build a model and ship it, does our work stop there as data scientists?

[00:36:39] Oh, absolutely not. When you're becoming a data scientist, you know, you build a model and you hit that 90 percent accuracy and you're done, and you don't get to learn this very hard lesson: models degrade over time. They don't stay as accurate as the moment that you had them on your little laptop. And certainly models behave differently depending on how the data comes in. So, no, your work is not done. Your work is just beginning, actually. You will be spending a great deal of time tweaking the algorithm, trying to find the things that are throwing your algorithm off, redoing your algorithm to deal with degradation. Sometimes you're running survival analysis on the side, survival analysis on either your model, or on the people that are in your model, to see when they're going to, quote unquote, die off, so that you know when you have to create a new one. Sometimes you have to forget all the data from the past and start from scratch, because you have something that just doesn't predict very well anymore. No, your work is just, just beginning.

[00:37:51] So what are some things that you think we should monitor and track once it's deployed, both from our perspective as data scientists and from the business perspective?

From the business perspective, well, first of all, very few people care about your model, OK? Only you really care about it. What everybody else cares about is the outcome. So obviously, track the outcome, because that's what everyone's eyes are on. That's really important. The second thing to track is your predictions. Sometimes, as they're getting worse over time, they're going in a particular direction, [00:38:28] and you have to find out why. Is it due to the dataset itself changing? Is it due to the old historical data not matching up with the most recent data, so you have to decide where the cutoff is? Those are all the things that you should look at. You should also look at how often the model is run. In the containerized environment that we live in, you have to build models that are sturdy, and those are usually not the most accurate models, but they are the most consistent, because they're going to be run so many more times than they would have been in the past. Anybody can spin up a Docker or containerized environment, [00:39:21] not to promote one thing or another, and just run, run, run, run your model way more than you ever thought it would be run. So the way that you actually build out the model also matters very much. I've begun, in the last year, looking at parallelization, basically how to run one model in parallel across all these different locations. And then, in some other cases with very large datasets, how to split the model, literally split it up over many different servers and locations, to make predictions. I never thought in a million years I would ever learn anything like that. I'm just a lowly data scientist; what do I care about how a model is actually going to make a prediction and how it's actually built out? But I've had to learn it, and I think that is definitely the future: MLOps, as they call it. That is what we're going to be spending our time doing in the very near future.
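To give a flavor of what that monitoring can look like in practice, here is a small sketch: a two-sample Kolmogorov-Smirnov test to flag input drift away from the training data, plus a simple threshold on the outcome metric everyone actually watches. The data, thresholds, and alert messages are invented stand-ins for a real monitoring job.

```python
# Sketch of post-deployment monitoring: data drift plus outcome degradation.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
train_feature = rng.normal(50, 10, 1000)   # distribution the model trained on
live_feature = rng.normal(58, 12, 1000)    # what production is seeing now

# 1. Data drift: has the input distribution shifted since training?
stat, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:
    print("Feature distribution has shifted; investigate or retrain.")

# 2. Outcome drift: is the metric the business cares about degrading?
y_true = rng.integers(0, 2, 200)           # realized outcomes
y_pred = rng.integers(0, 2, 200)           # stand-in for the model's predictions
if accuracy_score(y_true, y_pred) < 0.80:  # the 0.80 threshold is arbitrary
    print("Model accuracy below threshold; schedule retraining.")
```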
[00:40:17] So I've spent a lot of my time at work recently doing MLOps types of things, the last several months, getting that stuff right. It's a really interesting aspect of the work, and one that you don't really get any exposure to unless you actually deploy a model into production. And you don't really get to deploy models into production as an up-and-coming data scientist. You might be able to spin up some Flask API out of a Jupyter notebook or whatever, but you don't have to worry about the business impact of it, you don't have to worry about data coming in or data changing or anything like that. So it's definitely something I think people should read up on. And to your point there: nobody cares about your model, right? So, most up-and-coming data scientists tend to focus primarily on the hard technical skills, and they think that that is what is going to separate them from the rest of the crowd. What would you say are some soft skills that candidates are missing that are really going to separate them from the competition?

[00:41:22] Yeah, that's a very good question. And I think as data scientists we've all gone through that phase where you're like, I'm going to be the greatest data scientist in the world, I'm going to read all the books, all the publications and all the studies. That is all well and good if we lived in a world where it was just you in a vacuum building models; that would be awesome. But you do not live in that world. You have to work with people. Right? In order to achieve your goal of being the world's greatest data scientist, or of building a product that people really love, you have to work with people, period, end of story. So there are a variety of soft skills that actually lend themselves to working with people. The ability to effectively communicate with somebody is a great skill to have, not only at work but also in life.

[00:42:25] I manage a very tiny team of data scientists, and I watch people try to communicate with our team. I just watch them. I watch people communicate with each other, I watch people communicate within my team, and I spend a lot of my day, or a good chunk of it, watching people miscommunicate. The ability to effectively convey what you're saying to somebody else [00:42:50] saves time, reduces friction, and is a wonderful way to draw people in and enroll them in your ideas. The second soft skill, and I don't know exactly how to explain it, is having as much interest in the other person as you do in being the world's greatest data scientist. Imagine you're going on an interview: you want to tell them all the cool stuff you've done, and you have no interest at all in the other person, the person sitting across from you. No interest.
I've got to tell you how many times I've sat and interviewed people who didn't have any questions for me at the end, who weren't even interested in the actual job; they were more interested in showing their skills. We've all been there, we've all done that. But when you take a genuine interest in your coworkers, in your colleagues, in the people in the field, an actual interest, you're able to build out your skills and your network and create opportunities for yourself as well. And I'll talk about this, coming from politics: I'm actually an introvert, which is a hard thing to be in politics, very difficult. And not even a regular introvert, an extreme one. I'm going to need a nap after this interview. For me, just looking at your face on the screen, and it's a beautiful face, but interacting with people is draining. Other people get energized by it; I don't, and I'm not ashamed of that. So I've had to learn over the years how to be a very, very extroverted girl, and in that process there are techniques that I picked up along the way that I think are very vital for people. Number one, they always say this and it's true: networking is the key.

[00:44:39] Networking is the key for any problem you have. A job that you want, anything that you need, you've got to network your way to it, period, end of story. And when it comes to networking, especially if you're an introvert, there are a couple of things that you can do to make it easier. It's really easy to listen to people when you actually have an interest in them. So reaching out to people that you think are pretty cool and just saying, hey, listen, I just want to hear about how you got to where you got to, that's a very effective networking technique. I can walk into a room, stand in one corner, and just ask people how they got to where they are, and they'll line up; people will just talk for twenty minutes each, like I did today, right? But you have to network in order to make things happen.

To your second point, I would call that empathy. [00:45:24] I was reading The Laws of Human Nature by Robert Greene, and he was actually talking about that, taking a genuine interest in the other person. That's essentially just a form of empathy, and it goes such a long way. And I mean, I'm actually super, super introverted myself. It's the same way for me, I get drained. You know, this is the 11th day of the month, and I've done pretty much one interview every day this week, plus the office hours I do as part of my mentorship platform, Data Science Dream Job, where I have anywhere from five to ten data scientists on a call like this, just asking questions and questions, and it gets so, so draining.

[00:46:10] So I definitely feel you on that. There's got to be a term for it, introvert burnout or something like that, I don't know. [00:46:18] We should coin a term for it, because people need to understand. [00:46:22] Yeah, yeah, 100 percent. So what tips can you share for a data scientist who might find themselves having to present to a non-technical audience, or maybe even a room full of executives?

[00:46:36] Well, you know, just like you, I literally talk to people every day of the week about this. The tips are these: number one, practice; number two, practice; number three, practice.
I am just going to emphasize: practice your talk before you actually give it. I cannot tell you how many presentations I have sat through where I'm like, oh my gosh, this person wrote the best PowerPoint ever, and now they're just reading it to me. That is an indication that they didn't practice. Or if they get a question from the audience, or something goes off script a little bit, they're totally thrown off. Because here's what usually happens: data scientists love to talk about their models and their findings and their scores and their data, and how they got the data and how they managed it, all that stuff. And nobody cares. Nobody cares. People care about the outcome. So tip number four, because again, the first three are practice: number four is to always speak from the outcome. Always be answering the question of how you met or didn't meet the goal.

[00:47:48] Every slide that you make, every visualization that you present, has to be: is this showing me meeting the goal with my model, or is this showing that I'm not meeting the goal, and why? That is it. That is all. Stick to simply talking about how you're meeting or not meeting the goal and the steps that you took to get there. That's it. You can't go wrong with that, [00:48:14] because the outcome is exactly the crux of what matters to them.

[00:48:18] So it's kind of like the empathy again: here's the outcome and here's how it affects you, right? That's an excellent framework. Thank you for sharing that. Could you share some advice or insight for people who are breaking into the field? They see these job postings, and some of them look like they just want the abilities of an entire team wrapped up into one person, [00:48:39] and then they get scared to even apply, as if somebody is going to knock on the door, like the application police, and say, you can't apply for this job, you're under arrest. Can you share some tips?

Let me at least talk about this with respect to women data scientists and aspiring women in the sciences. A lot of studies show that women in general will not apply for a job unless they have 80 percent of the job requirements, and for men it's in the 20s and 40s. [00:49:12] So women applicants have the attitude of, I have to know it before I apply, and male applicants have the attitude of, I can pick it up along the way, I'm confident in my ability to learn this. Huge difference, a huge difference. So that's one thing. Another thing is this. If you ever see a job posting that has everything and the kitchen sink, where they want a person who does data engineering and data analysis with the presentations, and data science and machine learning with the predictions, and, I don't know, NLP and whatever else, everything in one, you know a couple of things about this place. You can see it as an opportunity, or as a place where you don't want to work. It's an opportunity because the folks that are writing this job description obviously don't know that you don't find all those skills in one person, and that can be an opportunity, because they may not know what a good fit looks like. You can go to the interview and say, hey, I am what you need.

[00:50:15] Or you can see it as, oh, this place doesn't know what they want, and they might spend a lot of time wasting your time. There are two ways of looking at it.
You also see a lot of jobs that require, and I love to see this, something like fifteen years of experience and sometimes certifications, and, you know, data science hasn't even been around for fifteen years. So my advice to everyone is: if you feel that you have the skill set to do the core of the job, and you feel that you can learn the rest, apply. Just apply anyway. I can't tell you how many jobs I have gotten offers for that had a page of requirements I didn't fully meet. What it comes down to is: do you have the base skills, and also, do you fit into the company, into the vision that they have? Do you fit the vision of the person that fits the role?

[00:51:15] People think you have to have a master's, and you have to have this, and you have to have that. But often you just have to have a strong desire, a great presentation of your skills, and the will to learn more. So just apply anyway. No one's going to laugh at you or anything like that. You know, I work at Booz Allen, and there are people there that have multiple PhDs, there are some people that don't have a degree at all, and a lot of people come from other industries as career changers. A lot of people have been doing this since it was called something different. They just applied anyway. If it strikes your fancy as a place where you want to work, apply anyway, no matter what.

[00:52:02] It used to be called just plain statistician, and then it became sexy. [00:52:07] I used to be a statistician. That was it.

[00:52:10] That was really good advice. Thanks so much for sharing that. I was wondering if you could speak to your experience being a woman in data, a woman in STEM, and whether you have any advice or words of encouragement for women in our audience who are trying to break into the data world, or who might be facing some challenges in their career.

[00:52:29] OK, well, it's not easy. It's just not easy to be a woman in any kind of STEM field, and it's definitely not easy in data. I think when I looked at a survey a couple of years ago, data science had the lowest ratio of women. I don't know where it is now; I think it's gotten really, really improved in the last two years. But in a field that's dominated by one gender over another, [00:52:56] the minority gender is always going to have difficulties. My advice to women is always, number one: you have options. Right now the field is super hot. We're not going to be hot for much longer, but right now it's super hot, and you can work anywhere. You can go and work in any industry, and in actually any part of the world that has the Internet consistently. Seriously, you have options. You do not have to stay in a bad situation. That is one of the best things: you can pick up tomorrow and go. Number two: leadership has to be committed to culture change, and it's actually not a hard sell to get that change. Everybody knows, every executive knows, that the more diverse your team, the more gender balanced your organization is, the more money you make. Right? Literally, the more money you make. A finding just came out a couple of days ago that companies that are run by women make significantly more money. Everybody knows it is actually a winning proposition to have more women in the field.
And if you have a gender imbalance, creating an environment that's conducive to collaboration is not a hard sell. [00:54:12] I would say nowadays executives and people who are running these companies are all ears, all ears. I can't tell you how many initiatives there are to have more women become data scientists. So that's one thing. And because people know this to be true, they really are trying to get more women into the field and keep women in the field. The other thing, and I always say this when I talk about fairness, is that people can make a difference right where they are. When it comes to creating more fair algorithms, you can start exactly where you are, right? You don't have to set the world on fire. If you are a man, you can advocate for your women colleagues. If you are in a hiring position, you can advocate to hire more women; legally, you can do that. You can advocate to not use applicant tracking systems, which I've found weed women out of hiring at a higher rate, and to do it manually instead. You can make a difference right where you are. Everybody wins. Everybody.

[00:55:23] Thank you so much. I absolutely love that perspective, and I know that the women in our audience really feel empowered by that. So thank you so much for sharing that. I was wondering what the STEM community, what the data community, can do to foster the inclusion of people of color, Black Americans in particular, into our field.

[00:55:39] Oh, I'm glad you asked, because the most stark examples of unfairness, the models that we build, the starkest representations of how unfairness unfolds and really negatively impacts people's lives, come down to gender and race, and most recently race especially. [00:56:04] So here you have an industry that perpetuates unfairness, especially among people of color, and at the same time makes a really concerted effort to bring in people of color. There's a mismatch.

[00:56:23] So to actually bring in more people of color, you just have to hire people. You just have to hire them. That is all. This is not rocket science or anything like that. Just hire people. When people apply to you, just hire them. I can't tell you how many people I have worked with over the years, and it seems silly now, but over the years, folks who were not hired for what they already knew; they were hired for their potential. [00:56:50] And I do hope that people of color get the same benefit of the doubt, because often what you find is that people of color feel they have to present as being extraordinary, and extraordinary people are few and far between. You know, just hire people. I talk to recruiting folks about hiring more people of color, and they often say this to me. A big CEO of a Silicon Valley company said this, and then another CEO at a bank said this recently: the pool of people of color is small, I can't find them, I just can't find them. Well, here's the thing. I'm a Black woman, and growing up, I had no idea about so many things. I went to a decent school, and I had no idea about so many things. And the reason I didn't know is because, as a person of color, I was excluded from so many things. [00:57:48] I had no idea about internships. Didn't know. Just didn't know.
Nobody ever told me. So you don't even know about places where you could apply to work; you don't even know about schools that you could apply to for college. You just don't know. You're completely cut off from access to information. So if you want to hire more people of color, understand that they may not even know that they are eligible for these jobs. You have to go to them, which is not hard. There are a ton of HBCUs, right? There are predominantly white institutions that have very large populations of people of color. Go to these schools, just go to them, and your problem is solved. Right? There are so many organizations, there are many meetup groups that are race or gender based. Go to them so that you can diversify the people that apply for your jobs, and then just hire them.

[00:58:45] That's it. It takes slightly more effort than having the same type of person come to you, the person who already knows the process and the inside track, which a lot of people of color don't know.

[00:58:59] Thank you so much. I really appreciate that. And so, for our last formal question before we jump into the random round: imagine it is one hundred years in the future, the year 2120. What do you want to be remembered for?

[00:59:15] Oh, my slick dance moves.

[00:59:21] Where can we watch these slick dance moves? You can't. You have to pay good money to see them. All right, slick dance moves. [00:59:31] I also want to be known for creating algorithms that inspire fairness, right? And inspire healthy behaviors in people. I want to be known for that.

[00:59:46] Let's jump into the quick random round here, starting with the first question: when do you think the first video will hit one trillion views on YouTube, and what will it be about?

[01:00:00] Within the next two years. And I think it's going to either involve Donald Trump, or it's going to involve some kind of baby dancing. It might involve a cat and my son. He is six months old right now. If you are listening to this in two years, it had better be that baby dancing.

[01:00:17] Yes.

[01:00:19] What would you do if you were the last person on Earth?

[01:00:22] Oh, man, that's a good question. The last person on Earth? I think I'd figure out how to build a boat and explore the world on my own.

[01:00:30] I like that. [01:00:32] If you were to write a fiction novel, what would it be about and what would you call it?

[01:00:37] Oh, I would write a fiction novel about the hijinks of presidential political campaigns, oh, the hijinks. And I wouldn't call it Political Hijinks; I'd call it A Campaign of Kindness.

[01:00:56] What are you currently reading?

I'm currently reading a book called Forget a Mentor, [01:01:02] Find a Sponsor. [01:01:03] Oh, yeah. Yeah, Sylvia Ann Hewlett. [01:01:04] Somebody sent it to me, and I'm like, wow, this is a really cool book.

[01:01:12] Yeah, it is a good book. I really enjoyed that. What song do you have on repeat?

[01:01:18] A wonderful song by an artist named Juvenile. [01:01:23] It's called Back That Thang Up.

[01:01:28] I just don't know why I behave this way. I was raised better than this, and I apologize to my parents, too. But I really love that song.

[01:01:36] I was in 11th grade when that song came out, maybe even younger than that. I was young. Wow. But I mean, half the audience listening probably wasn't even born when that song came out. Yep, that is true.
It's a great song, you guys. It is quite a philosophical piece. Let's open up the random question generator. [01:02:02] First question from the random question generator: what is one of your favorite smells?

[01:02:07] Lavender.

Pet peeves? [01:02:10] People who are attention sucking; you know, narcissists and almost-narcissists.

[01:02:18] What story does your family always tell about you?

[01:02:22] All the time: that I fell down a manhole. What? Yes, I fell down a manhole. I was six years old, and it was flooding; it floods there on a yearly basis, so you can't see the actual ground. I'm just minding my business, walking to school, and I fell down the manhole. I didn't know how to swim, and I almost drowned.

[01:02:43] And people think that's funny, you know, to be covered in sewage water; they think that's hilarious. But, you know, I didn't drown, because an angel, a white person, came from the sky and pulled me out. [01:02:58] And when I say a white person came from the sky: there was a telephone worker up on a pole, and he saw me fall down. He's albino, and he came and pulled me out, saved my life. Wow. That is crazy. It's crazy.

[01:03:11] Who are some of your heroes?

[01:03:13] Oh, my current hero is the person of my adoration, now a billionaire, who makes my favorite makeup that matches my skin tone. For a dark skinned Black woman, it's huge, and she makes really good products. Another one of my heroes, obviously, is my mom, my mom who immigrated not once but twice, from her very tiny island, from her very poor village. She's a gangster, she's a rock star, amazing. And my dad, who came from one of the poorest countries in the world, and now he's this renowned professor person. That's pretty cool.

[01:03:52] That is awesome. For the last random question here: when was the last time you changed your opinion about something major?

[01:04:01] OK, so this happened to me last week, with the election. You know, all the predictions were for a landslide, and I thought to myself, well, I don't think it's going to be a landslide, but I did think it would be decided on Tuesday.

[01:04:16] So how can people connect with you, and where can we find you online?

[01:04:20] You can find me on LinkedIn under Siân Lewis; that's where I am. You can find me on Twitter, on Instagram, and on Facebook as well.

[01:04:37] I'll definitely include all those links in the show notes. Thank you so much for taking time out of your schedule to come onto the show today. I really, really appreciate having you here. Thank you for having me. This was totally fun.