feb26-happy-hour.mp3

[00:00:09] What's up, everybody? How's everybody doing? Welcome to the @TheArtistsOfDataScience Happy Hour. Hope you guys are having an amazing week. Super excited to see all of you here, man. The room is packed, the waiting room is packed. Looks like we're going to have a good afternoon today. You guys got a beer or wine or hard liquor? Cheers, man. Happy to have you guys here. Hopefully you get an opportunity to check out the episode that was released today with Tim Annells.

[00:00:36] He runs the Analytics Explained YouTube channel. I had an excellent time talking to him. I've kind of been on a product management trend and wanted to really understand it a little bit more, so that's why the last couple of episodes have been so heavily focused on that. If you've got hot tea, you can cheers with a hot tea as well. We've got a lot of friends in the house: we've got Tom, we've got Greg Coquille, we've got Angelo in the house, Kristen in the building, Russell Willis, man, so happy to see you guys here. We've got Wego coming in as well. So, guys, for the first part of office hours today, about the first 15 to 20 minutes or so, we have a special treat. We've got some students here who will essentially be practicing a presentation that they're going to give to a much larger audience tomorrow. We get an opportunity to see a real data science project in the works, and not only that, we get to help them prepare by asking a bunch of questions. I'm sure they're going to be appreciative of that. Ben Taylor's in the house as well; happy to see you guys here. If you have questions in the meantime, please do go ahead and enter them right there into the chat, and if you have a question, I'll add you to the queue right now. I've got the students queued up for at least 15 minutes, and we'll go to a half hour if there aren't that many questions, but I wanted to really give these guys an opportunity to practice what they're working on. So without any further ado, let's take it away. Actually, do you want to go ahead and introduce your students?

[00:02:20] Absolutely. Good evening, everybody. My name's Oxer, as you know me. I went to the University of Maryland, which explains the swag that I'm wearing today. We have this annual data challenge where our students get to participate in different data science questions and present their analysis toward the end of the week. Fortunately, this year I'm mentoring a group of students who are presenting their analysis to us today. The final day of the presentation is tomorrow, so we are hoping to give our best, but this will be a good dry run for us to present what we worked on this week and also get an opportunity for some questions. So I'd like to introduce my team and they'll go one by one, but the presenters today are going to go first. Go ahead.

[00:03:04] So, yeah, I'm Teddy. I did a lot of the data scraping and analysis for this project. So did Gabriel, but he is driving, so I'm going to be presenting that part.

[00:03:16] And then our next member, Brendon.

[00:03:20] Yeah, I'm Brendon. I helped with the data scraping, although honestly Gabriel did the most. I've been helping with logistics, writing the abstract, getting everything organized, and making sure we meet all the requirements we need to have. And then our last member.
[00:03:37] Hi, everyone, I'm Menar. I'm also part of the team. Like Brendon said, Gabriel and Teddy took over most of the data scraping in Python. I'm a little bit more comfortable with R, and I was doing a little bit of R work with the data set, but a lot of it ended up being in Python, and I was also helping with logistics and writing the abstract.

[00:03:59] OK, so do I have permission to screen share here? Yeah, absolutely, go ahead; the button's at the bottom. I just want to make sure that, yeah, go for it. Is this working for everyone? Looks like it. Yep. OK, so this is a bit of a work-in-progress analysis.

[00:04:21] Our project is finding trends in COVID-19: a study of the correlation between data-based indicators in the Schengen Area countries.

[00:04:32] These are the data sources we were using for our analysis. We were initially given, from the data challenge, the COVID-19 Global Systems Tracker, which was provided to us by the UMD Social Data Center. There was some good data in there, but we really wanted to produce something that could be used for some predictive analysis, so we ended up joining in some data from the COVID-19 World Survey Data API. These are the two main sources we ended up using for our analysis.

[00:05:05] So, this data set is from the Maryland COVID-19 World Survey API.

[00:05:09] They just updated it, but when we started there were 21 indicators. Each of these is returned as a smoothed aggregate of the weekly values; that's what we were told. Each one has a specific sample size, and each indicator corresponds to the percentage of the population experiencing that thing. So the covid indicator, for example, returns the proportion of people in that country, or the estimated proportion based off the survey, experiencing those types of symptoms. There are twenty-one of these, and a lot of them didn't have complete data, so what we focused on most were the CLI indicator, the mask indicator, the contact indicator, and the anosmia indicator; anosmia refers to respondents reporting loss of smell. To collect this data, Gabriel and I both coded different data collection methods, and for some reason his worked a lot faster than mine. Mine was just building the links and calling the API, and it took a pretty long time, because it's twenty-one API calls per row and there were around seven hundred and fifty thousand rows of data. Then we picked the Schengen countries, because those are the countries in Europe that allow free travel between their borders, and they didn't shut them down during the pandemic. So we figured it would be interesting to study the trends and see if there were any correlations between a spike in one country and its neighboring countries.
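For anyone who wants to see roughly what "building the links and calling the API" looks like in practice, here is a minimal Python sketch of a per-indicator, per-country pull of smoothed survey values. The endpoint URL, parameter names, and response structure are assumptions for illustration, not the team's actual code.

```python
# Hypothetical sketch of the per-indicator API pulls Teddy describes.
# The base URL, parameters, and "data" key are assumptions, not the team's code.
import requests
import pandas as pd

BASE_URL = "https://covidmap.umd.edu/api/resources"   # assumed endpoint
INDICATORS = ["covid", "mask", "contact", "anosmia"]   # subset of the 21 indicators
SCHENGEN = ["Sweden", "France", "Italy"]               # trimmed country list

def fetch_indicator(indicator, country, start, end):
    """Pull the smoothed weekly aggregate for one indicator and country."""
    params = {
        "indicator": indicator,
        "type": "smoothed",        # the API's daily vs. smoothed option
        "country": country,
        "daterange": f"{start}-{end}",
    }
    resp = requests.get(BASE_URL, params=params, timeout=30)
    resp.raise_for_status()
    return pd.DataFrame(resp.json().get("data", []))

frames = [
    fetch_indicator(ind, c, "20200401", "20210226")
    for c in SCHENGEN
    for ind in INDICATORS
]
df = pd.concat(frames, ignore_index=True)
print(df.head())
```

Batching by indicator and country like this, rather than making twenty-one calls per row, is usually the easiest way to cut down the pull time Teddy mentions.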
[00:06:52] And then this is just our abstract; we had to turn it in by Wednesday. And just to be clear about where the data comes from: it comes from Facebook surveys that are prompted to millions of people logging onto Facebook, all across the world. So this is our abstract. Basically, the goal was to identify and analyze relationships between the COVID-19 indicators provided to us through this World Survey Data API, and we're trying to do some predictive analysis to see if we can take the past indicators and use them to predict future indicator values. Then here we just talk a little bit about what Teddy just said: the CSV contains twenty-three columns, all those indicators. So yeah, right now we're still working on it. Like we said, it's a work in progress. So, do you have the graphs, Teddy?

[00:07:43] Yeah. So I'll show what we have so far. We didn't focus on the predictive analysis so much, because we felt that while it's something cool to show, it's not really proving anything, especially as vaccines come out; vaccines have only been available for the last three months, so there's not a lot of training data. It might not make it into our presentation tomorrow, also because the algorithm is a little bit funky and it's not working. But I wrote this analysis notebook. It runs really quickly and imports all of these things, and long story short, I think we're more focused on analyzing the trends when you run this whole thing.

[00:08:24] Sorry, can you maybe zoom in a little bit on the chart so we can look at the actual values?

[00:08:31] Yeah, yeah. I'm going to generate a chart from this. So one country that we thought was interesting to study was Sweden, because unlike most other countries in the world, they didn't provide much of a response to the global pandemic. They didn't shut down and didn't have any mandates.

[00:08:46] So let's look at Sweden next.

[00:08:50] From the list of indicators, CLI refers to the percentage of people experiencing COVID-like illness, and mask refers to the proportion of people wearing a mask in public.

[00:09:03] Contact is the same kind of thing, the proportion of people who came into contact with someone with COVID, and anosmia is what I mentioned earlier, loss of smell. So when I click next, it generates... that's interesting, it just worked a minute ago. I can pick another country; I have some screenshots saved. So, for example, let's do France, and I'll zoom in.

[00:09:25] We still have to reformat the data, because this was a work in progress. France doesn't have as extreme an example, but you can see around the end of May and beginning of June the percentage of people wearing a mask in public seemed to decrease a little bit, and around the same time, the summer months, the percentage of people reporting contact with someone with COVID increased, then dropped sharply around November, with another big increase around the holidays. So that is one example of one country, and we're still working on the formulas to be able to cross-compare countries, record everything, and make it simple. Another good example, I believe, would be Italy, because we know Italy was hit pretty hard by COVID. The COVID-like-illness and anosmia indicators are a pretty small proportion of the people, but again, a similar trend: you can see a decrease in mask wearing around the summer months, and in the rate of contact. What's also interesting is that while mask wearing stays around the same, the contact rate dips and then increases around the holidays. So this says to me that around the holiday season, a lot of people who claimed they were wearing masks were probably giving false information to the survey, because we can see a pretty clear correlation otherwise around here, and the same with the summer months: you can see that contact greatly increased as the weather got nicer.

[00:10:50] And for now, that's all we have.

[00:10:54] We have more data analysis coming, but it wasn't clean enough for us to feel comfortable presenting.
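To make the kind of per-country chart Teddy walks through concrete, here is a small matplotlib sketch, assuming a tidy DataFrame with country, date, indicator, and value columns (the column names and the DataFrame itself are illustrative). It puts the lower-rate CLI and anosmia indicators on a secondary axis so they are not drowned out by the mask and contact lines, which is a suggestion that comes up later in the discussion.

```python
# Illustrative plotting helper; assumes a tidy DataFrame `df` with columns:
# country, date, indicator, value. Not the team's actual notebook code.
import matplotlib.pyplot as plt
import pandas as pd

def plot_country(df, country):
    """Plot mask/contact on the left axis and the lower-rate CLI/anosmia
    indicators on a secondary right axis so they are not drowned out."""
    sub = (
        df[df["country"] == country]
        .pivot(index="date", columns="indicator", values="value")
    )
    fig, ax_left = plt.subplots(figsize=(10, 5))
    ax_right = ax_left.twinx()

    sub[["mask", "contact"]].plot(ax=ax_left)
    sub[["cli", "anosmia"]].plot(ax=ax_right, linestyle="--", legend=False)

    ax_left.set_ylabel("Proportion (mask, contact)")
    ax_right.set_ylabel("Proportion (CLI, anosmia)")
    ax_left.set_title(f"Smoothed survey indicators: {country}")
    fig.autofmt_xdate()
    plt.show()

# Example usage:
# plot_country(df, "Italy")
```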
[00:11:01] Oh, man. Thank you, guys. It's pretty cool to see you guys put that into action. I'm going to open up the floor right now to questions about your presentation. So if anybody has a question about this presentation, go ahead and unmute yourself and go for it. David Knickerbocker is unmuted. Dave?

[00:11:19] Hey. So I saw that you had a couple of features in there: trust the CDC, and I think the other was trust the politicians. I was wondering how thorough that feature is. How trustworthy was it? Was it pretty sparse?

[00:11:31] Yeah, so we didn't focus on those columns so much because there's pretty sparse data in them. But if I screen share again, you can pick a country from the list on here. Can you guys see this?

[00:11:43] Yeah. Yeah. So I don't know why this is not working, because it worked before.

[00:11:49] It's running for me.

[00:11:50] I can share my screen and try it. Let's try Sweden again, and then let's just do CLI, the COVID-like illness one, and trust politicians. There we go.

[00:12:14] So you can see on this graph that in Sweden, trust in politicians spiked around January, and it seems to be a decreasing trend. We haven't done enough in-depth analysis to really see if there are any other correlations, but that's why we created this Jupyter notebook, to be able to find that.

[00:12:30] And hey, where does that field come from?

[00:12:33] The trust-politicians one? Is that from the CDC, or... All of the data in this presentation is from the Maryland COVID-19 World Survey Data API.

[00:12:42] OK, so it's what you were provided with. I've been doing my own COVID study based on how state politics affect COVID numbers, and when you split the numbers up by how states voted overall in previous elections, you get neat separation too. So that's pretty interesting. Those two features really sparked my attention because they're similar to something I'd be looking at.

[00:13:08] Yeah, we picked these countries, the Schengen countries, and I'm probably butchering the pronunciation. In this dataset we don't have the other countries available, but if we were to pick a country that has what we would consider a less secure government, I'm sure the trust-in-politicians numbers would be significantly different.

[00:13:30] Cool. Very cool. Thank you. Hey, Teddy, it's Tom here.

[00:13:35] So my comments aren't going to be very data science. Who are you presenting this to?

[00:13:42] There's a whole board of people over this whole event, which doesn't seem to be very well organized. We're getting information kind of on a minute-by-minute basis about who we're presenting to and where we have to submit our materials. It's very loosely organized.

[00:14:02] I could be wrong, and I'd really love to hear anyone disagree with me on this, but you presented this to other data scientists. Excuse me, what I mean is you sound like you were presenting to other data scientists. Yes. And I would encourage you not to do that. If the event is not very well organized and they haven't told you what your target audience should be, you'll actually get better experience by presenting to non data scientists. A couple of high points that I hope will help, because you guys, I respect the hell out of you for wanting to be grilled. I didn't know what a Schengen country was until a few slides after you mentioned it. Please don't show me code. I'm a data scientist, I love code, and I don't want to see it in the presentation.
I'd love to hear if others agree or disagree with that. Show the outputs. And I know you're still coming up with this, so I don't mean that ultra harshly. Don't show me a page of text with the abstract; maybe show some high points and how they connect together, like a mind map or something like that. And I still don't think I know what the purpose was in doing this. Was there a reason you felt it was important to present this? I mean, obviously COVID is important, but what are y'all really trying to do, what value-add are you trying to give through this? Those are just some of my thoughts. And I can see Greg is about to shoot me if I don't shut up, so I'm going to do that, because I like what he has to say.

[00:15:36] No, no, no. I think you're onto something for them, and I fully agree. But, Teddy, correct me if I'm wrong, and this is a business guy here: you're telling me that you weren't concentrating on the prediction piece of it, but you're leveraging data science skills to aggregate all these indicators under one roof and extract insights at scale. Yeah, right. So you're able to analyze countries by pulling indicators through your tool and extracting insights. So to build on what Tom says, I would go with this high-level structure, which is some sort of pitch of a tool you created that will get everybody excited, whether they're a technical person or a non-technical one. Typically you want to start with a nagging issue, and harp on the fact that there is a void in tackling that nagging issue. Then you go: you created this solution, you tested it, and here are the high-level results. By using my tool, this is what I saw for Sweden, this is what I saw for Italy, by pulling these indicators, et cetera, et cetera. And then after that you can go a little deeper into the logic you used to come up with the solution, X, Y, Z, but still staying somewhere in between, not going too technical, to explain how you were able to access the data, clean up the data, and pick the indicators that mean a lot to you in terms of performing this analysis. When you do the prediction piece, you should do the same thing, too. So that's what I would say.

[00:17:16] And Teddy, real quick, I see that's the infamous Ben Taylor up above. So, to speak to why you avoid the "how": could I have said it more concisely? Russell really gave the best explanation. I think of Denzel Washington in the movie Philadelphia, where he played a lawyer defending Tom Hanks: "Explain it to me like I'm a fourth grader." But I like Russell's version even better, as if explaining to your grandma rather than your peers. I think that's spot on. By the way, take out your code. That's OK. But do a flowchart if there's some reason to show the importance of the code. And everything Greg said too. I hope this is helping. I know it's hard to take this stuff, but you guys actually have the foundation to turn it into something really powerful.

[00:18:05] Yeah. Like I said, we're still in the process; I have a whole folder of other graphs and things that we're analyzing. We had a million different ideas originally, and I think that over the course of today we're going to be going through all this data and looking for more than just trends, because, of course, we can kind of say that, of course, over the summer people were in contact with more people who had COVID.
I mean, this data proves that, but that's kind of an intrinsic thought. A lot of what we have right now seems to be proving intrinsic thoughts, like spikes around holiday seasons, things like that. But what we want to do, and what I wanted to do, is this, and it's where the predictive analysis comes in and would be helpful for other things too: in this Schengen region where the borders are open, does a spike in one country correlate to a spike in a neighboring country in the coming weeks? That's, I think, one of the main things I wanted to try and answer, and we're still working on the algorithm for that and on cleaning the data, since I don't have too much experience with that. So it's a bit all over the place.

[00:19:12] But does that sound like a better focus to you? I have a question for Teddy, or anyone on your team, on the visualization. I put it in the chat: it looked like two of the variables you have are at such a higher rate that they're really drowning out the variation in the two other variables with the lower rate. Can you make that into a double-axis chart, with a different scale on the right for the other two, so we can see, or have you present, the variance a little better?

[00:19:42] Yeah, we can do that. We just put them on one graph, I guess, as an example, to show how it works, but yeah, we were working on that. There is more code; I've moved it over to Jupyter for presentation purposes, but Visual Studio Code is where I'm doing most of the messing around with it. So if you want me to screen share right now, I can show you. I guess the two variables you're talking about were the COVID-like illness and the anosmia ones, so if you want to see those right now, I can show you how those two specifically correlate. Is my screen showing? You're currently not screen sharing.

[00:20:18] Sure. And if we run this, do you have any country you would like to see? Let's do Italy.

[00:20:32] So here you can see, in Italy, and I guess the graph still needs to be formatted, our mentor actually sent us some links for that, that the rate of COVID-like illness is significantly less than the rate of people reporting loss of smell in the survey. Those are two different factors. So, for example, one insight that I believe we can pull from this is that the rate of people infected with COVID is actually significantly higher than the rate of those experiencing the symptoms, because loss of smell is in its own category at around twice the rate of people experiencing all the other symptoms. You can at least infer that there are significantly more people infected than experiencing symptoms or getting tested for it.

[00:21:19] For example, I'd be curious to see if there's a rate increase in anosmia as mask wearing went up, because what if people just thought they couldn't smell, but really they just had a mask on? I don't know, that might be interesting to see. So, Kristen had a really cool comment in the chat, and I'd love to hear it, and also from Russell as well. Let's hear from Kristen and Russell on this, and then we'll go to the rest of the questions in the queue. OK.

[00:21:51] Hey. Hi, Teddy. My quick suggestion would be to go to a site like Slidesgo or Slides Carnival. They've got tons of pre-formatted slide decks for every type of audience, every type of presentation.
And it really adds a big value impact to the information you're delivering, and it's really quick and easy to plug your information into a beautiful setup. So that could be something quick to do between now and your presentation tomorrow that I think your audience would really respond well to.

[00:22:26] OK. Russell, you had some points here that I think are worth mentioning for the audience listening on the podcast.

[00:22:33] Sure. So I had a question about the data sets, Teddy. I know you mentioned the specific one you used, but did you experiment with others? The reason I ask is that I've been doing analysis of multiple globally available data sets, and I've never found one yet that is reliable at a global level. There are some that are a good fit for US-centric data and some that are better for the UK or Europe, but there's nothing really that can give you a good understanding of how the COVID pandemic has been hitting the entire planet at a global level. And I'm really only talking about the main characteristics, because I've not been looking into the nuanced data that you've been looking at, just the three characteristics of active cases, COVID cases, or deaths. That's what I've been doing most of my analysis on, and there have been some real issues with the data that I've managed to find on GitHub. So it would be interesting to know if you've looked at other datasets.

[00:23:40] So we had to use the data sets that we were provided with through Maryland, and this is one of them. There is another that Menar mentioned, and we've incorporated some of that, but it has significantly less data, and more sparse data, and it focuses on different aspects that we were having trouble correlating, whereas this dataset contained a lot of the same stuff to a similar degree. For example, the other one focused on people worried about finances, and I forget what else; it was only two or three categories of data, which really wouldn't have provided us much insight or allowed us to correlate anything together.

[00:24:16] So while we know that this dataset is definitely not perfectly accurate, we were going with what we were provided, especially since I think a lot of the judges are from the University of Maryland system. We figured it would be OK to use this data, not to say it's accurate, but to use it as our baseline.

[00:24:34] So Eric had a great comment here: he recommends putting your biggest takeaway at the very beginning of the presentation. That's the hook, BLUF, bottom line up front. All right, guys, that was an excellent presentation. Thank you so much for being brave enough to share it here with us. I'll grab a copy of the chat and send it straight over to your mentor so he can disseminate it to you, because there's a lot of great advice in there. I see there's one last comment here from Greg, and after Greg's comment here, we'll move into the rest of

[00:25:07] the questions that I've got queued up. Greg, go for it. It was just a quick question I have for you guys: the columns you selected, did you do any particular special cleanup on them to make them fit? I'm imagining, even if they were the most populated columns, did you still have to explore them and try to change a few things, or were they of the highest quality?

[00:25:35] So this data is exactly as it was grabbed from the API, because the API had two options, a daily value or a smoothed value, and this is the smoothed value.
So, for example, someone reporting a COVID-like illness is going to be sick for a few days in a row, reporting it each day, so the smoothed value accounts for that. And that's the only data cleaning we have done, because beyond that we couldn't really think of anything that needed to be added, removed, or really modified.

[00:26:04] Thank you. Thank you for sharing that with us, man. Really appreciate that. Before we move on to the questions, shout out to Eric, who has some good news to share with us. I'm going to let him share the good news. Eric, go for it.

[00:26:19] Yeah, thanks so much. I signed a job offer today, so if it wasn't DocuSign, the ink would still be wet. It was like an hour ago. So, yeah.

[00:26:29] Congratulations, Eric. That's awesome, man. Can you share the job title, role, company, location, any of that stuff?

[00:26:37] Yeah, I don't think any of it's classified. It's a senior analyst role at LendingTree, and I'll be working in the business loans and investments area. So I'm really excited about it.

[00:26:47] Great, man. Congratulations. I know you did a lot of hard work and did a lot to get there. You're my inspiration.

[00:26:55] Glad to be here.

[00:26:56] We're happy for you, man. That's awesome. Great news. Shout out to everybody else. Dave's here, David Knickerbocker is here, I see you guys. Joe's in the building, Curtis, Jennifer, Timothy, Ben, of course. Really happy to see all you guys here, man, it makes my day. So, Wego, go for it.

[00:27:15] Yeah. So last time I was here I had the last question, right, about AWS, scraping, Lambda, Docker, all that good stuff. I essentially spent this week working on it. Before I started, I knew what the letters meant; now I know slightly more. I ran a Lambda test and it worked out well, but I decided to drop it because I realized that the vast majority of my Selenium scripts run over fifteen minutes, and that's a hard cutoff. So I dropped that and went off to figure out EC2. I created an EC2 instance, connected it up to an S3 bucket, loaded up Selenium, loaded up everything else I needed, and I got it running. I was very happy; it was a very big deal for me, a big milestone. But I only got a small snippet of it running, and whenever I tried to scale it up to hit the rest of the URLs that I had, it got to maybe the third one, slowed, and then crashed. I did this basically right before we got on here. So my thinking is, I'm running on the free tier of the EC2 stuff, and I know that Selenium is a bit of a resource hog. So is all I have to do just upgrade my instance, pay a whole four and a half cents an hour or something like that, suck it up, and then I'll be good to go? Or do you think there might be something else that I might be running into?

[00:29:01] Yeah. Joe, go for it. Sorry, go for it.

[00:29:04] Oh yeah, you're running a micro instance, I imagine, since it's the free-tier one. That's kind of like running your code on a Raspberry Pi; I actually think it's about the specs of one. Yeah, you might want to bump it up a bit.

[00:29:19] OK, that's actually really nice to know.

[00:29:24] Yeah. I mean, if it works, it works; if it doesn't, it doesn't.
The thing is, you don't know what it is, right?

[00:29:29] So, right on, it would be memory. That's exactly what it sounds like: you're basically running out of space. If it was Java, that would be a problem, but you're not using Java, so yeah. That's exactly what it sounds like: you've got this Millennium Falcon-sounding thing, Selenium, starting up against your scrapers, so I'm guessing you're holding something in memory. If you can figure out a way to dump it or save it someplace else, that might actually clear things up even on a micro. But Joe's probably right, you're going to need to go up to something more beefy.

[00:30:04] OK, that sounds like an easy enough solution. I'm cool with that. Thank you.

[00:30:09] What was it you were trying to do? You said Selenium, so you're doing some scraping?

[00:30:12] Yeah, yeah, basically just trying to migrate my little army of scrapers from my local machine up to the cloud so it can run there without me dealing with it.

[00:30:26] Essentially. Wego, when I've played with Selenium, it can be quite finicky. I just want to urge you to do a lot of extra Google searching. That's why I gave that comment, or that suggestion, last week: when you go with Selenium, you're signing up for a lot of little tiny battles, a lot of Google searches. I just want to confirm that you're not alone when you hit these speed bumps, for sure.

[00:30:56] Yeah, I always try to hit it with Scrapy first, and if that's not working, if I can't grab, I don't know, AJAX calls from it or something like that, then I'll jump into Selenium, and everything tends to work out with those sites once everything is actually rendered.

[00:31:14] So what I do for my personal research is I've got a cheap free-tier instance that I use for scraping social media and doing some very tiny classification work, and then I've got a very beefy server, which I often forget to turn off, for more heavy-duty scraping, and that one costs me about ten bucks a day. So it always sucks when I find out two days later that I forgot to turn it off.
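Picking up on the advice above about dumping results somewhere else instead of holding them in memory, here is a rough sketch of a headless Selenium scraper on EC2 that writes each page's result straight to S3. The bucket name, URL list, and parsing step are placeholders, not Wego's actual setup.

```python
# Illustrative only: headless Selenium + per-page writes to S3.
# Bucket, URLs, and the "parsing" are placeholders, not a real pipeline.
import json
import boto3
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

s3 = boto3.client("s3")
BUCKET = "my-scrape-bucket"  # placeholder bucket name

opts = Options()
opts.add_argument("--headless")               # no display on the EC2 box
opts.add_argument("--no-sandbox")
opts.add_argument("--disable-dev-shm-usage")  # avoid the tiny /dev/shm on small instances

driver = webdriver.Chrome(options=opts)

urls = ["https://example.com/page1", "https://example.com/page2"]  # placeholders

try:
    for i, url in enumerate(urls):
        driver.get(url)
        record = {"url": url, "title": driver.title}  # stand-in for real parsing
        # Write each result straight to S3 instead of accumulating a big list in RAM.
        s3.put_object(
            Bucket=BUCKET,
            Key=f"scrapes/page-{i}.json",
            Body=json.dumps(record).encode("utf-8"),
        )
finally:
    driver.quit()
```

Writing one small object per page keeps the Python process from building up a large list in memory, which is one common way a small free-tier instance falls over partway through a crawl.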
[00:31:40] So the follow-up question to that is: is turning it off stopping the instance, or is it terminating it? Stopping it.

[00:31:48] OK, so let's not confuse the two. Definitely stop it; be careful about clicking to terminate one.

[00:31:58] So whenever you stop an instance, does it keep the files and the data on there, so you don't have to reload Python, you don't have to reload libraries and things like that, kind of like the computer you're using right now?

[00:32:14] Yeah, when you turn it off, it doesn't erase everything. I hope; it would suck if it did, but I'm operating on the assumption it's fine. So yeah, it's the same kind of thing: you're just turning off the instance. Terminating the instance would be literally Terminator-ing the instance, so don't do that. But if you turn it off, you're fine.

[00:32:30] The two instances I have have identical code on them; I just keep one of them off and only use it for certain occasions. So that's one other option you can do.

[00:32:42] And whenever you stop it, you stop getting charged too, right?

[00:32:45] And that's where I was about to jump in on drives. If you have mounted drives, watch out: you'll still be charged for those, because you can stop or terminate an instance, but those drives can still live out there, and while they don't charge a lot for them, you can rack up some charges that way just to keep your drive space.

[00:33:04] Oh, the other thing to keep in mind is that if you have an IP address attached, like a static IP, you want to make sure that if you disassociate it from your instance, you also release that IP address; they charge for that, too. So there are some gotchas. You'll know when you see your bill. So yeah, this is the tuition you've got to pay to be part of the cool kids.

[00:33:26] OK. Then, when you were talking about drives, I remember reading something about an EBS thing. Is that the same, or is that different?

[00:33:38] Amazon has every drive type that you could ever want to mount, and their acronyms make no sense at all until you've been doing it for about a year. So yeah, they've got different drive types, different ways of mounting and then accessing them. You can partition them, you can do a ton of different things, and if you're using Windows, I mean, you can do all of this through Windows, and you can do it through one of their Linux instances, too. You can automate drives to spin up and down; it's as deep as you want to script it. So if you want it to be nice and easy, you can do a Windows instance. They're a little more expensive, but then you can play with the drives in a whole lot more visual way, and it's sometimes a good way to start out, because you get a better feel for it using Windows.

[00:34:23] Then you switch over to Linux and, I don't know, in my head it was conceptually easier seeing it in Windows first and then doing it in a Linux instance; it just made more sense. So there are a lot of different ways you can play around with drives, partitioning them, using them. Especially for things like scraping, the way that you partition might end up making your work a little bit more efficient, or at least keeping things separated. And like I said, once you've mounted a drive, that drive, if it's external, if it's not one of the things that was included as part of the instance, survives even if you kill the instance. So if you end up scraping something to that drive, you can kill off the instance and the data will still live out there. But it sounds like you said you're already attached to an S3 bucket and that's where you're scraping to. Yeah. So for this instance that's not so important, but think about it going forward: having a drive that you can mount to different instances as you spin them up, and having the same thing on that drive over and over again, kind of having it as a known entity for your configuration, even.

[00:35:29] It's kind of cool to have, so it's worth playing around with.

[00:35:31] Keep in mind, these servers, these EC2 instances, are meant to be ephemeral in some sense, right?

[00:35:39] Never treat one like a special snowflake where you expect the server to be around from now until kingdom come. I can't even count how many times we've had issues with servers dying for no apparent reason; it just happens. So one idea, too, is to just have a boot script, one of these bootstrap scripts, and have that load when you fire up a server, because that way, I think you could probably get away with using spot instances now that you're getting really dirty on this thing.

[00:36:08] Spot instances are way cheaper than your regular ones, and you can just throw them away. If you're scraping, that's kind of cool, because you just need that instance to run for a bit; you don't really care whether it's going to be around.

[00:36:18] I read a little bit about spot instances, but I got the idea that you weren't able to schedule them reliably, or only kind of reliably sometimes.

[00:36:30] So if I just need some to run, I think of it like a really flaky friend: they might show up and hang out or they might not, and that's just how it goes. If they show up, that's fine and it saves money, but if they don't, it's not a big deal; you just get a regular one. But I think that's Ben's point, right? You can keep your disk separated from the server, and that's really handy, because you should count on it one hundred percent: the server is going to go down at some point and it won't be there. So don't get too attached to it. I think that's what we're both trying to say. Cool.

[00:37:02] Yeah, that's definitely good to know, because I did not think about that yet, and we probably wouldn't have until it disappeared.

[00:37:09] Like, what the hell just happened?

[00:37:11] Yeah. OK, cool. Thanks.
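Here is a hedged sketch of the boot-script-plus-spot-instance idea, using boto3. The AMI ID, key pair name, and the commands inside the user-data script are placeholders; the point is just that the instance bootstraps itself on launch, so it is fine for it to be thrown away.

```python
# Illustrative boto3 sketch: launch a one-time spot instance that bootstraps
# itself via a user-data script. AMI, key pair, and commands are placeholders.
import boto3

USER_DATA = """#!/bin/bash
# Runs at first boot: install dependencies and kick off the scraper.
yum install -y python3
pip3 install selenium boto3
aws s3 cp s3://my-scrape-bucket/scraper.py /home/ec2-user/scraper.py
python3 /home/ec2-user/scraper.py
"""

ec2 = boto3.client("ec2")
resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="t3.medium",          # something beefier than the free-tier micro
    MinCount=1,
    MaxCount=1,
    KeyName="my-key",                  # placeholder key pair
    UserData=USER_DATA,
    InstanceMarketOptions={            # request a cheaper spot instance
        "MarketType": "spot",
        "SpotOptions": {"SpotInstanceType": "one-time"},
    },
)
print(resp["Instances"][0]["InstanceId"])
```

Because the boot script rebuilds the environment every time, losing the spot instance only costs you the in-flight work, not the setup, which is what makes the "flaky friend" trade-off acceptable.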
[00:37:13] You've got a question coming in hot for you, Wego.

[00:37:16] So you're scraping, say, prices of products, something like that, and storing it, and then you're taking that to perform analysis along with internal data from the company you work for. Is that what your purpose is? OK, just wanted to make sure I understand. Yeah. OK, thank you.

[00:37:35] That was a valuable lesson in how the technology works. Thank you guys very much for going deep on that; appreciate it. So, one other shout out: Tom sounds extra sexy today. I think he has a new microphone. So what happened, Tom?

[00:37:49] So I was blessed to be working in Laramie, Wyoming, until COVID hit, and then my manager and I agreed: why do I need to be traveling back and forth between Eagle, Idaho and Laramie? I had wanted to move the family out there, but mostly to be closer to Ben Taylor, of course; who wouldn't want to be closer to him? But I actually ran my own podcast for like four years, and this mic was my baby. The only thing I can't show you is my new camera that I finally got back from Laramie, too. So yeah, my setup is sweet again. I've got only four monitors for my main workstation and then my Linux box over here, so I know I'm a lightweight compared to most people, but it's just back to normal, thank God, on a MacBook Air.

[00:38:41] I think that might be the heavyweight setup of them all right now. It sounds good, though, Tom. Next up, we've got Greg with his question. So, Greg, go for it.

[00:38:50] Yeah, I have a more generic question, a question for anyone here who wants to take it. Can a business person manage a fully developed data science team? And what are the skills required for that?

[00:39:05] I'd love to hear from David Langer on this one. Oh, why are you picking on me first? Because I think this is right up your alley.

[00:39:14] Greg, you know I love you, but I'm going to say this based on all my experience: the answer is no. Technical people, engineering people, they want to work with somebody who knows the stuff. So if that business person knows all the things, OK, great, but in my experience that usually doesn't happen.
I know that in my experience the answer's no, and I'll just leave it at that.

[00:39:39] I'd love to hear from... yeah, let's hear from David Knickerbocker on this one, and then we'll go from there. Oh, David is unmuted. Is that an accident? No, it's not an accident.

[00:39:52] So, I've been in engineering for my whole career, and like David said, one of my pet peeves is working for managers who are not technical at all and give you unrealistic expectations because they think you do magic. I've seen that throughout my entire career, and data science still has that magic feel to it; it even sometimes feels like alchemy to me. And I've seen so much misunderstanding, even in very technical companies. I work for McAfee, and we still have people who think data science is magic. So I agree with David that data science is pretty far-out stuff, and it's useful to have people who at least have some hands-on experience working on data, at least at the statistics level, or who are pretty powerful with Excel at least, and not just some MBA who just finished. We're engineers; we need specifics. We get laser focused on an idea. Even this week, my JIRA is very big and it drives me nuts, and that's my own fault; I just have a lot of small administrative stories to take care of. But I prefer it when I know exactly what it is I need to build, and I don't think that soft skills alone are enough for that.

[00:41:12] Fair enough. If I could just add one more thing, and this might be a bit heretical: in my very long career, and as Jaris will tell you, I've been coding since the eighteen hundreds, it's been my experience universally that it's easier for technical people to pick up business than the other way around. So, what do you think?

[00:41:30] So, two types of leadership. Some teams need technical leadership: the group needs somebody who's very, very technical as a leader. That's one type of leadership, and it's the type most engineers are used to getting. Other teams need leadership at an extremely high level, and I'm not just talking about a manager or director. Some teams need a leader who can be a mentor outside of the technical realm. Some teams need to skill up: as teams expand, there are people within them you're going to need to promote, you're going to need to train leaders, and you're going to need to start positioning people to move into other parts of the organization. And I've seen transitional leaders come in who were not technical at all and do that, and become leaders and mentors and teach. Sometimes it's as simple as teaching business; in other cases, you're teaching your replacement. So I've seen that happen, where a team has a leader like that for a year. I haven't really seen it last much longer than that.

[00:42:40] For about a year you'll have a leader come into that team, coach the team, raise them to be more a part of the business, and get individuals on the team ready to be promoted and start moving into other organizations or, in some cases, starting their own teams. I've seen that be successful. There is also a very rare third instance, where a team is so senior it does not need any technical leadership. Very rare. That would be an extraordinarily senior team; I've seen it once, and I've heard of it in reading books. And in that case, all you need is a leader.
And truly, that's somebody who stands and takes the horrific brunt of dealing with the rest of the organization and is also able to keep the team functioning, cohesive, collaborating, all the sort of fuzzy and soft types of roles you could have as a leader. But that team is so senior, it doesn't need a technical leader.

[00:43:38] Thanks for that. A follow-up question: what would you say is the difference between a strategic lead and a technical lead? Does a data science team need a strategic leader, or do they solely require a technical lead?

[00:43:54] I think a team needs both, to be honest. I think there typically needs to be somebody above the technical lead who at least has it in their deliverables, in their goals, to be part of the team and provide some sort of strategic leadership, or maybe even some of the roles I mentioned: to train people up, to provide a deeper connection to the business, and also to shield that team, because especially in larger businesses you can get dragged into meetings you wouldn't want to be in. So I think it's good to have someone there who knows the business well enough to keep the team functioning and also train up and upskill people. Most teams also need a very good technical lead, because they're not just engineers: data scientists, machine learning engineers, that whole ops side that's emerging now, researchers too, and leadership is different for those types of teams. If you haven't done the job, you don't really understand the differences in leading a team like that, because there's a level of independence and oversight that has to live together, and there are types of overlap you can have in a technical team that wouldn't work in any other team. And if you try to get too granular with a technical team, you're going to have individuals who are smarter than you are, and they're going to run you over like a stop sign. So there's a totally different dynamic to being a technical leader on, especially, data science teams, because there are going to be at least five people on that team who are smarter than you. However, there is some level of capability you have that is above and beyond, and that team is going to need guidance from you, and a deeper understanding of how they do their job, from that technical leader. So they do need that in almost every case.

[00:45:47] That's a really, really good question, and I like this discussion. Ben just hopped back into the room. Ben, let me know if you're around, because I think you'd have a really interesting perspective on this as well, and then after Ben we'll hear from Russell on this. Ben, are you here?

[00:46:01] I did just hop back in. What was the question, or what's the topic?

[00:46:06] Greg, ask your question, and then we'll hear from Tom as well.

[00:46:10] Yeah. Ben, do you think a business person can manage a fully developed data science team or department effectively? And what are the minimum skills they need to perform that job well? We can't hear you, Ben; your mic's not active.

[00:46:27] Yeah, he's thinking. I think I was having an issue with the audio there. Sorry, guys.

[00:46:36] Yeah. So it sounds like, I think, the answer's no. I agree with Dave.

[00:46:41] Awesome. So Russell had some interesting points here as well; I'd love to hear from Russell. Then, Tom, I know you wanted to jump in too, so we'll get to you after that, Russell.
[00:46:51] So I'd say that creativity and critical thinking are very underappreciated skills that can benefit people who don't have a grounding in one specific skill set. Now, that doesn't mean that a business professional can then lead a highly technical team, but I think they can get across some of the boundaries in this particular field. I think that if the business professional has had a grounding in some of those technical aspects before, you know, five or ten years ago, then as long as they understand some of the basics, I think they can operate quite well. If they don't understand the basics at all, if they've come from an accountancy background or any other background that doesn't really gel well with the technical aspects, that's a different story.

[00:47:41] That's an interesting point. I'd be interested to hear: Greg, when you say technical, what do you mean? Technical specifically when it comes to coding and that type of thing, or technical when it comes to understanding machine learning concepts and things like that?

[00:47:58] All of that, right? So I'm thinking even of the underlying infrastructure that's needed to empower a data science team.

[00:48:05] Right. So from databases, to the concepts of data science, to the systems themselves when it comes to, you know, data pipelines, managing the data science project lifecycle, et cetera, et cetera.

[00:48:22] So the way I see it, at that point it's probably best that the person leaves most of the non-technical behind and really deep dives into the technical requirements behind all of this. You have to be in the know to be able to help that team, right? So that was the question.

[00:48:45] I was just curious to see what you guys thought. So, yeah.

[00:48:49] Tom, you wanted to chime in there; I'd love to hear from you. Then I want to hear what Joe has to say about Conway's Law in this situation.

[00:48:57] I'm sorry, Greg, was that a question or a statement? I apologize, I was trying to track you.

[00:49:04] I thought it was a question.

[00:49:06] Yeah. Did we have an answer for that? Because I was curious.

[00:49:10] What are you talking about? I'll ask my question, too.

[00:49:13] If someone has an answer to Greg's question, I want to hear it, but I'll go quick, then, because I was really wanting to ask you this. I'm fortunate to be buddies with John Thompson and, I'll butcher his last name, Gilbert, the guy who wrote the book People Skills for Analytical Thinkers. Eijkelenboom, yeah, thank you. Gosh, I can't say it. Gilbert and I have been talking through some concepts for the new book he wants to write. I really love the way those guys think, but I just want your take on this. So, Greg's a good friend of mine, and he's a phenomenally good data science student, and I think he plays it down, that he's not a data scientist. To me, he's just a data scientist in training.

[00:50:05] But I look at Greg, and he gets it. He gets the data science space, he gets its benefit to the business, he's an expert at reconnaissance. I could imagine, if Greg were running a team, he'd know better than anyone I've met so far how to look out for that team and make sure it was giving value to the organization. To me, that's the main job of a leader of a data science group, or an analytics group, let's just make it more general. Let me give just one example.
I love the story that John Thompson tells; we basically did this for the students who came on earlier. He'll have a new data scientist give their first presentation to the data science team first, because they know what's going to happen: they're going to try to show off what they know. And then the team points out, who is that presentation for? Oh, yeah, that's not going to go over very well, is it? It's like they have to get it out of their system; the team already thinks you're smart, because you're in the group. So, do you agree with that? Like, to me, Greg and I have become buddies, and I see how he thinks. To me, he's the epitome of the perfect data science group leader. It's more about reconnaissance, and he's like the connector between the nonlinear geek thinkers and the linear business thinkers. Do you like what I'm saying? Does that gel with what you were trying to say?

[00:51:24] I like what you're saying. I think there's a different level here: Greg, just based on that definition right there, the description that you gave, sounds like a VP level. And this is the way I like to structure it. You have a manager who's not really a manager; they're running a team, they're the technical lead. And we've gotten away from this technical lead role too much. I say the word manager and people run screaming because they don't want to lose their technical skills; I say the words technical lead and nobody wants the job because, wait, I have to manage too? So there's no win here. But the team wants a technical leader, and someone who takes that role probably wants to get into leadership and go more toward leadership or strategy, one of those two, if not both. Above that, you have to have a director who's gaining some of the business acumen, who's making that transition from "I am a technical leader" to being a leader in the business, someone who now understands a little bit more than just what the team does.
[00:53:54] And it is and that's a that's a maturity step, trying to get a company to understand how that should look in the need for that role and then getting the team to understand the need for that role and why. Without Greg, life has not been great. But with Greg, life is going to be a whole lot easier. Your projects are going to be way more interesting. And from the other side, the business is going to be a lot more money out of the team, you know, in trying to teach engineering groups that technical leadership is not the same as sort of that director level of leadership and the VP level of organizational leadership. It's difficult. I mean, I had a really hard time when I was first promoted, not doing my job and being a leader a real hard time with that. I couldn't it took me forever. But I had somebody like Greg who taught me how to do that. That's why I think the the value is huge, but the company has to understand it or Greg gets run over. [00:54:49] What I just heard you say is they took my phone away until I realized my new job is making sure my old buddies had more fun. [00:55:00] I was actually my first leadership gig. I was horrible. But yes, I would like to have things. I'd like to have thought of myself that. That's a very nice way of saying it, but realistically, it wasn't that great. [00:55:12] So who is right when you're expected to make the supposed promotion from technical individual contributor to a lead into a leader? [00:55:23] That's not a natural fit normally, right? I mean, engineers, this story is still filled with graveyards of horrible managers. And, you know, and I would say that most of the time the management promotion ends up being sort of a white elephant gift or it just kind of shocks, actually. Like, you don't want to do that. And but I think people need to be honest themselves to understand, like, what is this they actually want to do into the company should ask, is this actually the right person or do we just bring in somebody who is actually good at managing people? Because I think the success rate of engineers turning into great managers is not great. And a lot of actually because management isn't taught and the engineering disciplines either. Show me where that happens. I'm sure it happens some places. But in the vast majority of companies, individual contributions in general aren't really taught how to lead or how to manage and of figure it out. So many of those things are quite vanity. All of us are probably like really shitty managers in the beginning. And I would say if you weren't you weren't trying hard enough or something. I don't know. [00:56:30] You know, based on the I really love what we're talking about. And I can't wait for you all to see the material that Gilbert's developing for his next book. It's pretty amazing stuff. [00:56:40] Yeah. Thank you for that. And I don't want to take too much time. I know a lot of people have some work in next week. [00:56:46] Maybe if you allow me or we can bring this up again to me if I'm wrong and this is something I'm trying to understand, I'm seeing a little bit more products that are powered by machine learning. And to me, are we going to see a rise of folks who know how to manage these products where we can release features that are machine learning powered and you can control how to optimize the cost of releasing a machine learning powered feature in manage the life cycle of that machine learning powered product. So I'll leave you guys with that thought. 
Is it necessary to have someone heavily technical to manage the lifecycle of that product, or somebody who is just good at business: managing the cost, understanding how to scale, and understanding how to target customer needs and fit them into that product to come up with solutions? So that's what I was thinking about. Thank you guys for your responses.

[00:57:44] I definitely see a short answer, "yes," from Joe, and I saw him nodding his head yes as well. Greg, that was an excellent question. I really love that discussion; I think it's very relevant to the audience of The Artists of Data Science. I feel like this audience, this group, we are the future leaders of data science, and having conversations like this is important. So thank you so much for asking that question. I really, really liked it.

[00:58:07] Tom, go for it. Just one last quick thing: all these things we're discussing, we need to think this way regardless of where we're at. When we're leaders, we need to remember what it was like to be an individual contributor, and when we're individual contributors, we need to remember what our leaders are going through and help them where we can. That's all I wanted to add.

[00:58:31] Yeah, 100 percent, Greg. I'm definitely down to talk about this next week as well, because it's a good topic. Thank you for asking. So, real quick, so you guys know where you stand in line: we've got our next questioner, then Vikram, then Naresh, then Christian, then Curtis. A lot of questions, so we'll try to get to them as quickly as possible. Go for it.

[00:58:55] Thank you. So recently I've been working on an NLP project, doing sentiment analysis research. One of my questions is about sample size for sentiment analysis. Whenever I read a research paper, they say they used a dataset that has two thousand examples and they got something like ninety-something percent accuracy from the model. So what is a representative measure that says two thousand sentences is enough? I think that's my question; I don't know if that makes sense, because I'm still working on this.

[00:59:39] I've got a quick response. Yeah, go for it. So I love sentiment questions, because sentiment is really complicated: what sentiment are you talking about?

[00:59:55] Are you talking about Twitter sentiment, where there's that dataset of five million tweets labeled smiley face or frowny face? Are you talking about Amazon review sentiment, or are you talking about price-action sentiment? They all mean very different things, and they will all have very different accuracies. It's been a while since I've played with sentiment; I remember there was an open-source library that incorporated six or seven different training sets. So I guess the short answer is that it's hard to know what a good accuracy is, because it depends on the type of sentiment. If you're doing price-action sentiment for stock trading, then I might argue you can have a very low accuracy and it's still a very valuable model.

[01:00:27] So I'm curious what type of sentiment you're going after. Is it social media, like Twitter or Facebook? Also, in a single sentence you can have multiple emotions: the first half can be happy and the second half can express sadness. So human language is a bit difficult. I'm reading through a book right now on sentiment analysis that I bought off Amazon.
You can find it easily because the cover is just a bunch of emoticons on a blue background. It's a really cool book, and he goes very deep into the usefulness of splitting on sentences rather than looking for sentiment across a whole paragraph. When you're looking at an Amazon review, there's typically one sentiment tied to that review if it's about a pair of sunglasses. But if it's a tweet, people are different on social media: you could express multiple things in even two or three sentences. And if you're scraping an internet forum where people can write a wall of text, that can be seriously complicated stuff. So sentiment analysis is something you can go pretty deep into. And I play with Twitter data all the time, even for work; we're doing some security ideas with Twitter data right now for an innovation group. One nice thing is you don't need a billion rows of data for sentiment analysis, depending on what you're doing. If you're trying to build a violence classifier, you can do that with just two thousand rows of data. If you're looking for hate speech, that doesn't take very much data at all, and toxic speech doesn't take very much data at all. If you're trying to do text translation or something like that, those are the big, big Google-scale problems. So it's different. [01:02:04] So sentiment problems are a lot of fun. My project right now is mostly positive versus negative, because I'm trying to train a model on one of the Arabic dialects, since there isn't anything for Arabic dialects out there, and that means collecting a lot of data. So I don't know how much data I should get from Twitter to have a representative model. [01:02:28] I feel like we might have gotten away from your question, unless you were really asking about accuracy and metrics for the sentiment analysis. I don't mean this as a criticism toward the group, but I don't think we've touched on that yet, have we? [01:02:47] Yeah, I think the answer was mostly that sentiment analysis is very wide, and it depends on what accuracy you're looking for. [01:02:55] It depends on your training set, too, though. Like, is your data labeled? If it's not labeled, you'll have to label it; I have to create my own datasets very often for whatever it is I'm exploring in NLP. So if I'm doing a violence classifier, I scrape a whole bunch of tweets and start labeling them ones or zeros. Your accuracy depends on the quality of your labeling, and if there is no labeling, you're going to have to do that labeling. I find that in the first few iterations I've mislabeled things, because after going through two thousand tweets while drinking three beers, you start to just kind of glaze over. But your machine learning helps you find your errors, and then you fix them in the next iteration. So accuracy is an almost meaningless metric to me when I'm doing NLP work, but it does give me a hint that things are at least getting better. For me, as accuracy goes up, I'm running into fewer headaches. That's about it. Language is more complicated than the other stuff. [01:03:58] I strongly agree with what David's saying. I'm wondering, are you using some type of K-fold? How are you splitting your data? [01:04:06] The other thing is, I'm planning to label the data first, and it seems like the model will help with that job; I've never touched that before.
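Editor's aside: a minimal sketch, in Python, of the workflow described above (a couple of thousand hand-labeled tweets, a simple classifier, and the stratified K-fold split just mentioned). The file name, column names, and model choice are assumptions for illustration, not the speakers' actual setup.

    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.pipeline import make_pipeline

    # ~2,000 hand-labeled rows: tweet text plus a 0/1 label (hypothetical file).
    df = pd.read_csv("labeled_tweets.csv")      # columns: text, label
    X, y = df["text"], df["label"]

    # Bag-of-words features feeding a simple linear classifier.
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2),
                          LogisticRegression(max_iter=1000))

    # Stratified K-fold keeps the class balance in every fold, which matters
    # when one label dominates the data.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
    print("F1 per fold:", scores.round(3), "mean:", round(scores.mean(), 3))

Per-fold scores that swing widely are usually a hint that the labels themselves need another pass, which is the relabel-and-iterate loop described in the conversation.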
So I think it's an interesting idea to work on that. [01:04:20] If you need help, feel free to reach out to me on LinkedIn. I love doing this kind of stuff. [01:04:24] OK, thank you. If anybody has any more insight on that question, definitely go ahead and type it right there into the chat. For now, we'll keep it moving. Let's go to Vikram next. [01:04:35] So I'm having an interview soon, and there's a possibility of being asked a broader product question. I'm very new to this kind of question, so I would appreciate it if anyone could suggest a framework to answer that kind of question. For example: users subscribing to a newsletter are going down by 10 to 15 percent every month. What would you suggest? [01:05:06] I'll let Greg jump in on this question, because he's our product management expert, but I would say start off by just playing with the product they have, right? If you can download the product yourself and just click around with it, play with it, see what it's all about, that's great. Second, what industry is this in? Is it user-generated content? Is it a subscription-based model? Those are the types of questions I would want to look into. I'd also recommend Lean Analytics, a great book that I think might help you get into the right mindset. But with that, I'll turn it over to Greg. [01:05:45] I want to make sure I captured your question, Vikram. Are you saying: what are the product metrics you should focus on if you're trying to understand the performance of a product? Is that it? OK. For me, what does help is kind of going back to the method of design thinking for a certain product. Design thinking is what gives you a pulse on the performance of the product or solution that you come up with for your target market, because with that you're able to perform some tests and measure whether that solution is effective or not. Design thinking allows you to spot the pain point, come up with a solution or product, test, and readjust until that test is deemed the most effective for the product, right? So to me, that's how you build the list of metrics that truly matter for continuous monitoring of that product. I'm staying a little bit high level here, but hopefully I touched on what you're looking for. [01:07:19] One small thing: if you're new to digital analytics in general, I would say sign up for the Google Analytics demo account, so you can just get in and see what metrics a product or an e-commerce store might be looking at. Like, bounce rate: what the heck is bounce rate? Then Google that, and just start getting to know sessions, page views, users, new users, and thinking in those terms. Then you'll be able to think on the fly a little bit about whatever specific product you're interviewing for, in whatever case. [01:07:56] Yeah, for me too. I leverage what's kind of like a wiki for business metrics; those are super useful, because I can look at the metrics for each of the products I'm interested in and come up with a list of metrics that I'd like to improve. And once I hone in on that list of metrics, then I can work backwards to figure out what kind of solution I need to improve those metrics.
So that's what I'm banking on. [01:08:36] And someone put in the chat here: it depends on the product; customer churn, click-through rate, etc. Dave Langer, do you have any advice to drop here? [01:08:48] Yeah. So I've spent a lot of time, especially in recent years, formulating KPIs that are disproportionately valuable for the success of the product or the business, because it's really easy to say, "I'm going to create twenty-five KPIs." I'm not a big fan of Google Analytics, to be absolutely honest with you; I kind of detest it. So I try not to use that as an example, because it'll flood you with all these different kinds of metrics, and usually what you want is three to five key KPIs that represent the actual levers you have for making money. So if you're interviewing for a potential product management role, what you want to do is sit down, think about the nature of the product, and ask: what are the disproportionately valuable levers that allow them to make money? Depending on the nature of the product, that might all be top of the funnel, or it might be primarily bottom of the funnel, based on the cost characteristics of the product. Software, for example, is usually top of the funnel, because the margin is so high that you don't really care as much about what's going on at the bottom of the funnel. So think about those disproportionate metrics. For example, engagement rate might be the actual thing that really matters. It's not bounce rate, it's not the number of people who land on your page; it's how many people actually engage with your page and how they actually interact with the product. That might be a disproportionately valuable lever for actually making money. So generally speaking, those are the kinds of things I would look for: the three to five KPIs that seem disproportionately valuable in terms of the nature of the product and how it makes money. Emphasize those. [01:10:16] I completely agree with that. They'll be much more excited to hire you during the interview if you touch on a big problem; if they think you can make money for them, that's exciting. [01:10:25] And there's always that trap of falling for vanity metrics. You mentioned one: traffic to your site might be a vanity metric. How do we identify vanity metrics based on the product type? I don't know if that question makes sense at all. Yeah, it does. [01:10:46] Here's my experience: if people push back on your metric, it's probably a good one. Vanity metrics are the ones where everyone goes, "Yeah, that's a great metric, I'm totally down for that one." That's usually a dead giveaway that it's not actually going to help anything. I've actually had people argue with me about metrics because they essentially said, "That's too hard for us to move." That's exactly the metric you want to use. [01:11:09] And when it comes to the business model, what do we think about for metrics when, say, we have a subscription-based model versus a model where maybe it's not a subscription, just a one-time purchase? I think that one was called Bitly or something like that. Different models like that. [01:11:30] So in those kinds of models, CAC, customer acquisition cost, is very, very important.
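Editor's aside: a small numeric sketch of the unit economics being discussed here, CAC and (as explained next) customer lifetime value, using one common simplification: CLV is roughly ARPU times gross margin divided by monthly churn. All figures below are hypothetical.

    # One common simplification: CLV ~= ARPU * gross_margin / monthly_churn.
    monthly_arpu = 50.0     # average revenue per user per month, in dollars
    gross_margin = 0.80     # high margin, typical of software
    monthly_churn = 0.05    # 5% of subscribers cancel each month
    cac = 400.0             # customer acquisition cost, in dollars

    clv = monthly_arpu * gross_margin / monthly_churn   # 800.0
    profit_per_customer = clv - cac                     # 400.0
    print("CLV:", clv, "CLV - CAC:", profit_per_customer, "CLV/CAC:", clv / cac)

With these made-up numbers, each customer returns about twice their acquisition cost, which is the lever-focused arithmetic being described.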
And then CLV, customer lifetime value, because those two things work together: if I subtract CAC from CLV, that's how much money I make. So those are two key ones, especially in any sort of SaaS subscription-based model. [01:11:47] In my experience, customer acquisition cost and lifetime value are certainly things they're looking at as well. So hopefully that answers the question. We're going to go ahead and move on to Naresh. [01:11:58] Yes, thank you. Hello, everybody. Can you hear me? Is it proper now? [01:12:07] Not quite. Hold it, hold it right close to your face. [01:12:09] OK, how about now? That's a lot better, OK. Sorry about that. [01:12:13] My question is, well, are you hearing me? Did you read all of that? [01:12:26] Yeah, sure. Sure, sure. [01:12:28] So what foundation does it take to break into engineering from data science? Besides technical skills, what other skills are needed to survive in engineering? [01:12:39] So by engineering, you mean something like machine learning engineering type roles? That's what I'm getting. I'd love to hear from Pravin on this one, [01:12:48] and then if anybody else wants to jump in right after. I think I could help with this. Absolutely, please. Naresh, are you asking about engineers that are using AI for cutting-edge stuff in robotics and control system design, stuff like that? Exactly, exactly. OK, so you're really asking what it takes to get into that? Yes. OK. Now, I need to qualify this. I'm kind of shy to say it, but I consider myself an expert in multi-physical system modeling and control system design, and I'm pretty good at AI; I just love to learn it. But I think it's like we would answer anything: you're just going to have to start doing passion projects. This is something I could hear Ben Taylor telling you: you've just got to start demonstrating passion in that area. [01:13:47] You've got to start creating portfolio stuff that shows how you're tackling practical projects to implement AI in control systems, in systems controls. It's OK if they're toy projects where you're just demonstrating that you're growing in that field, in that knowledge, but you've just got to get started and start walking that trail. [01:14:14] I hope that's helping. Naresh typed the question out here as well, so it's reiterated for everyone listening to the podcast: what transition does it take to break into AI engineering from data science? Besides technical skills, what other skills are needed to survive in engineering? So Tom Donnelly touched on a few. I mean, I'm definitely not an engineer by any means, but I would say maybe just curiosity, a passion to learn, a willingness to pick up new things and stay on top of trends, and stuff like that. [01:14:51] Absolutely, Harp. Yeah. And I guess what we're both saying to you is that you're going to have to be good at showing that you're growing in this field. Don't be afraid to show a baby project, but make sure the next one you put in your online portfolio shows a progression and an advancement, and just keep doing that. [01:15:16] I'd love to hear from Ben on this as well. Russell makes an excellent point in the chat: soft skills as well. And I think along with soft skills, a little bit of ethics might be helpful too, depending on the domain you're working in.
Ben, what do you think? [01:15:31] So we're talking about transitioning from... sorry, I was answering a question in the chat. [01:15:39] Yeah, that's it: what does it take to break into engineering from data science? Besides technical skills, what other skills are needed to survive? [01:15:49] I don't know if this is a technical one or not, but I think I talked about this last week too: the whole concept of what a cloud architect does, what a software architect does, what an enterprise architect does, looking at the massive picture. That's really what you need to be able to understand, because it's just this enormous problem that you're going to face. You're not coming into something that's brand new; you're coming into something that's potentially forty-five years old, all the way up to a brand-new component. [01:16:22] So you're going to be working with a lot of teams. You're going to be working with a lot of people who are in love with the technology they've already adopted, or who don't want to change. And so a lot of what you're hearing has practical technical applications, but it's also the application of these softer skills: talking to people in other groups who are also technical and trying to get them to understand where the business as a whole is going. So that would be the only thing I'd add. There's this, I don't know what to call it, fuzzy skill that I see infrastructure architects, enterprise architects, and cloud architects have, where they can get a whole bunch of people who are all technical, who all speak different technical languages and all have different backgrounds, to agree and come to a consensus on moving forward with something that's painful for all of them. [01:17:10] It's interesting. Maybe that skill is called architecture, right? Yeah, that could be; that kind of encapsulates it. That's an interesting observation. Knickerbocker, I'd love to hear from you on this, since you're definitely into research and engineering as well. The question he's asking is: what skills does it take to survive in engineering? [01:17:29] OK, sorry, I was actually troubleshooting an email flow issue at that moment, doing engineering right there on the spot. Yeah, I hate it when I have a bug right in front of me. [01:17:41] I really don't like being called an engineer or anything. I still just haven't gotten used to it; even "data scientist" feels weird to me. I'm always just a software person, forever. I just really got hooked on data science and ML, and I've been doing data my whole life. But on my current team I work for the AI research group at McAfee, so I am an engineer, I guess, now. The skills that I use most, the ones that really help me out a lot, come from my operations background. I was a data operations engineer at McAfee before that, so I'm used to dissecting servers. I know how to get to the root of problems very quickly. I know how to map out data flows from the point where data comes into a company all the way to the product itself. So being able to dig into data and go from the beginning to the end, understand operations, know your way around Linux, and understand the ops. There's so much software involved, there's so much operations involved; it's just a marriage between so many different fields, and I think that's one of the reasons why I love it so much.
But it's also one of the reasons why it's so exhausting. There are just so many things to balance and, holy cow, there's just so much to learn. I've got piles of books all over the floor, like you guys probably do as well. [01:19:11] I think you've got to have a technical background, but you also have to have a hunger and an obsession for this kind of stuff. Otherwise, I don't think you'll make it; you'll sink to the bottom real quick, because it's overwhelming. [01:19:25] Absolutely. Love that, man. That's really, really true: an insatiable hunger to learn, and a willingness to feel uncomfortable all the time because you feel like you don't know enough. [01:19:35] Yeah, people often talk about their comfort zone, and I don't even think I have a comfort zone anymore. I don't think I have a place where I'm not in uncharted territory; I get thrown at the problems that nobody's figured out yet, and that's exhausting every single day. And I think other people in this field are probably like me. [01:19:54] And this touches on what Russell is talking about in the chat here; he's saying tenacity and resilience. I think that's very much in line with what you're talking about. It's one hundred percent what you just described: that's what tenacity and resilience look like in this field. So thanks for sharing that. Also, Kimberly says negotiation is a good skill as well, which I agree with. So let's go ahead and move on to Kristen. Your question is up next; if you want to ask it, go for it. I don't know if you're still here. Oh, sure. [01:20:26] My question is, I was in an interview this morning with a high-level data science team, and he was talking all about Alteryx; they're moving their whole pipeline into Alteryx. [01:20:44] And I know Google has a plug-and-play pipeline as well. So I was just curious what you all have seen in the trend of moving away from a traditional, code-heavy pipeline and into these more low-code, no-code, [01:21:04] you know, products like this. It would be great... who's that guy that runs the breaking-into-data-science hashtag? That guy's always... Greenwall? Yes, that's his name. Yeah. But yeah, I'd love to hear anyone's take on this. I'm not too familiar with the low-code, no-code kind of environment. Maybe I can take this. [01:21:26] Yeah, I've used Alteryx quite a bit. I like that you can automate everything; it takes away the programming part and just lets you focus on the business logic. But can I ask, what's the end goal? Are you trying to create a workflow that's transforming the data and then feeding into a dashboard, or what's the end goal of the use of Alteryx? [01:21:46] Yeah, the end goal was to create a continuously updating dashboard for the stakeholders. [01:21:53] Yeah, I've done something similar. And if your question is whether it's better to do that or not: I think if you're streamlining a process and making it easier to train anybody new who's onboarding and taking ownership of it, it's easy, because Alteryx is like a flow chart. You have different tools that can read multiple Excel files, create new columns, or transpose your data, and then you can also have connectors that feed a dashboard in Tableau or Power BI, so you have an end-to-end connection.
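Editor's aside: for contrast with the visual Alteryx flow being described, here is a rough scripted equivalent of that pattern in pandas: read several Excel files, derive a column, and write out an extract that a Tableau or Power BI dashboard can refresh from. The paths and column names are hypothetical; this is only a sketch of the pattern, not the team's actual pipeline.

    from pathlib import Path
    import pandas as pd

    # Read every monthly workbook in a folder (hypothetical paths and columns).
    frames = [pd.read_excel(path) for path in Path("data/monthly").glob("*.xlsx")]
    df = pd.concat(frames, ignore_index=True)

    # Derive a column, analogous to a formula tool in a visual workflow.
    df["revenue"] = df["units_sold"] * df["unit_price"]

    # Land the output where the dashboard's data source points.
    df.to_csv("outputs/dashboard_extract.csv", index=False)

The scheduling step (Alteryx's run button, or cron for a script like this) is left out of the sketch.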
So you don't have to manually write scripts for anything; you just run the workflow and it picks up the data from the folders, passes it through the pipeline, and drops it into a dashboard. So if that's what you're trying to do, and you want to move away from the programming side of things toward a more automated solution, I think Alteryx is great. It will have some learning curve, because you need to understand which tool does the right business step for you, but there's enough documentation online that you can leverage for that. I think Alteryx is great for that. And they've also partnered with Snowflake now, I think, so it's even easier to have a production-level connection with the workflow. Hopefully that answers your question, Kristen. [01:23:06] If anybody from Alteryx is listening, you can send the check for the advertising to me whenever you get a chance. Curtis, this one's for you. [01:23:13] Mine is actually... it's a bit of a shame that Joe's left the call, because something popped up on my LinkedIn feed that he posted a couple of hours ago. But my question is more for the more senior scientists here who do a lot of hiring: how would you like to be approached by cold contact? Like, by email, or via LinkedIn, or whatnot? How would you like someone to come to you, and what would you like to see from someone who is approaching you cold, looking for work? [01:23:47] Yeah, I mean, this is a good one. I love to hear it. I know a lot of us here get these messages quite often. I probably get 15 to 20 messages a day from people, and all they're telling me is that they need a job, and I really can't help with just that. [01:24:04] How about trying to be my friend first, getting to know me and building a relationship? I'm not going to call this person out, but literally this person was like, "Hello, sir. Hello, sir," for many days in a row. And then I was like, "Hello, madam." And she was like, "How can I contact you?" And I was like, "You can't." She's like, "I want to be a data scientist. Please tell me how." And I'm like, I wish I could tell you in one sentence, but it's not that easy. That's definitely the wrong way to do it. But if somebody is reaching out to me cold... there's somebody here who joined earlier, I think she bounced off briefly, but her name was, um, Dee. Dee, are you still here? I think she is. Dee, if you're here, please introduce yourself. She did it the right way. The way she reached out to me was really, really cool, on two accounts, because first of all she was like, "I've got a bunch of questions ready. [01:24:55] I know time is valuable and time is precious." This is literally what Dee said: she said, "I'm seeking mentorship for a transition to data science, and I'd be so grateful if we can connect. Time is valuable, so I have prepared some questions in advance, and I hope I can make it worth your while." Cool, that's professional as well. Not only that, I gave her my standard response, the same response I give to everybody who reaches out to me for mentorship: thanks for reaching out, happy to connect, and I'll exchange my time if you clean my podcast transcripts for me. Usually at that point it's just crickets; nobody responds. And she was like, "Yeah, definitely, I'm happy to help you." So kudos to you for doing that, Dee. You did that right.
So reaching out to people and just being like, "Look, I know your time is valuable, and I've got questions prepared." I think that's key. I'd love to hear from everybody else: Tom, then anybody else as well. [01:25:53] Hi, yeah, I just want to jump in because I'm still here. Yeah, good. I didn't end up helping out yet because we had some technical issues, but I wanted to help out anyway, because I understand, like everyone, we all have our own things going on, so it's just a kind gesture on your part. And I'll say this office hour is way too advanced for me, but I just have it playing here in the background and I'm picking up on everyone's expertise and the vocabulary that's being used, so it's been really helpful. So yeah, I'll still take you up on the offer about cleaning up the podcast transcripts. Just let me know, and I'll definitely check out the podcast as well. [01:26:37] Yeah, definitely, we'll trade: my time for your time. That's great. So what do you think is the right way to reach out to people when it's just a flat-out cold reach-out? [01:26:49] Yeah, I do get this a lot: "Oh, I really want to work with you and I want an internship." And I just very politely say I have no power to create positions or to hire anyone, but if you apply at my company, I'll look at your portfolio online, and if it looks good, I would have no problem encouraging people to take a strong look at you. That way, at least I get a chance to encourage them to build their own portfolio to help them stand out. [01:27:18] Sorry, I just went blurry there; you know, a quality webcam is a good thing to invest in. So, cold outreach is always going to be a hard thing. The best thing I can tell you about cold outreach is to do it six months before you need to do it, and that's almost impossible to do, because the only time you think about building your network is when you need your network. So spend some time, once a week or at least once a month, reaching out to people, but do it in a personalized way; have a good reason to reach out to somebody. I compare social media to being at a restaurant. You know, if I'm sitting at a table having dinner, would you come over and say, "Give me a job"? That's the wrong way to reach out to somebody. If it wouldn't work at a restaurant, don't do it on social media. But if somebody came up to me and said, "Hey, I've been following you forever, I mean, I've been following you for two years. This last post," and I just got this on LinkedIn, "this last post that you made, it meant something to me. I want to connect. Can I connect with you? Can I follow you? What do I do to connect with you?" That's your number one conversation starter. [01:28:31] Yeah, it would be strange for that to happen at a restaurant, but you get what I'm saying: it would be far less abrupt or awkward than somebody just coming up and saying, "Hey, give me a job." And I know that sucks, because it feels like you want something now, you need something right now, and it's hard to take the longer road in, but you'll get more responses. At the same time, if you're urgently looking for a job, it's okay to just reach out to people and tell them the real story: "I just got laid off, I need help," or, "I just graduated from college, the internship I had lined up evaporated, I need help."
"I need, you know... this is kind of urgent. Sorry I reached out cold." But when you're authentic and honest like that, even with a cold outreach, you'll get somebody who looks at it and goes, "Oh, hold on, let me see what I can do," because all of us want to help somebody who's found themselves in a rough time. So if you need something urgently and there's a legitimate reason for it, cold outreach with zero prep is OK. [01:29:33] That's great advice. Another thing you can do as well: there are companies in your local area, your municipality, that you find interesting, that you think you want to go work for. Like Vince said, network and connect before you have to. So take a look at their website, read their blogs, connect with people inside the company: "I read this blog on your company website that had to do with innovation, AI, machine learning, data science, whatever. I thought it was really fascinating that you guys are doing X, Y, Z. I'd love to learn more about it. Would you like to connect?" And then take it from there, and come up with two or three max, really insightful questions about what you read, just to show interest. I think that goes a long way. So, Curtis, hopefully that was helpful. All right, that's going to be the last question for today. Thank you so much for hanging out. A couple of huge announcements. Next week I will be on the Super Data Science podcast; I'm recording with Jon Krohn next week. I don't know when that episode will be released, but that's pretty huge. It's the first data science podcast anybody has invited me on, so we'll see if people care about what I have to say. That'll be cool. [01:30:44] Not only that, and I don't know if it's been announced yet, but we were also reached out to to be on a panel discussion for the Data Science GO virtual conference in April, so I will be moderating a panel discussion for that, which should be awesome. We're looking forward to that. I know Tom's going to like this: also, in March, I will be interviewing Andrew Hunt, who is the coauthor of The Pragmatic Programmer, so I'm really looking forward to that. Other big interviews are happening as well, so there are a lot of cool things happening from now until the end of April. I've got like 20 interviews scheduled, and then I'm taking a hard break from interviewing for the rest of the year, because by that point I'll have interviews queued up until literally the end of the year. So looking forward to that. Please take care; see you next week. If you guys have not already done so, go register for the open office hours, which happen on Sunday mornings at 11:30 a.m. Hope to see you guys there. And that's pretty much all we've got, guys. Take care, have a good rest of the weekend. Remember, you've got one life on this planet; why not try to do something big? Cheers, everyone.