open-officehours-11-06-2020.mp3 [00:00:00] What's up, everybody? Welcome to the @TheArtistsOfDataScience open office hours. Super excited to have you guys here. What a week it has been. Do we have a new president of the United States yet? I'm still not sure. Man, I have no clue whether we have a president or not. Well, we still have Donald Trump as president, but I don't know if we've got a new president yet. Hopefully you all got a chance to check out the interview that I did with Annie Duke that was released on Monday. She talks at great length about why you shouldn't trust the polls, so check that episode out. Also, as of this week, we're moving to a one-per-week interview release schedule. [00:00:35] So that'll give you guys plenty of time to catch up on all the episodes you have not yet gotten a chance to listen to. Hopefully you guys don't miss me too much, but go through the back catalog. Lots of great conversations there to listen to. [00:00:51] Heads up that on December 21st, we're going to kick off the holiday hiatus. I'm going to have a Christmas mixtape made, free, so keep an eye out for that. But let's go ahead and get started. We've got Austin in the house and we've got Nicole in the house. Hey, how are you guys doing? Hey, doing good over here. How are you? I'm good, man. Thank you. Yeah. Man, I'm super, super happy to actually get a chance to meet you, and I'm glad you were able to drop in. I know we've been exchanging emails, because you're the only person to send me some feedback on the podcast, and I appreciate that. So if anybody else has feedback, you guys have my email address. Feel free to shoot me a message. I'm super excited to connect with you guys. But if you've got questions, man, let me know. [00:01:42] If not, we'll just hang out and chat. Right. Nice to put a face to the email and the voice. So. Yeah, right. [00:01:52] Yeah. So, Austin, if you've got a question on anything, man, go for it. [00:01:57] I'm happy to do so. I guess I have a question around.
I'm currently kind of in the middle of the process of going from Excel jockey to shifting more into programming for the analysis that I currently do. And I guess I'm having a hard time trying to figure out, when you have deadlines and things need to come in, where to start. Do I start with something that I just have to do weekly, like a regular report, automate that piece, and then expand from there? Or would you recommend just trying to do it by brute force, like, all right, trying to do everything as much as I can to build that muscle, I guess. [00:02:49] Let me back up a little bit. So describe to me your situation. You're using Excel primarily at work and now you're wondering, how can I incorporate more Python into my day-to-day activities so that I can develop those skills and just get better at this thing over time, right? Yeah, yes, yes. I think a reasonable first place to start is by trying to replicate in Python exactly what you typically do in Excel, for a couple of reasons. First off, you already know Excel really well, right? So whatever operations you're going to be doing, you're going to be confident in your output. That gives you a baseline for comparison: OK, here's what I did in Excel, here's what the results look like. Now let me go and try to recreate that in Python and see if I get the same results. So start there. Right. So now the next thing is, OK, if that's my end goal, what type of activities should I start making this happen for? And I think you already have your answer there. You mentioned that you have stuff that happens pretty regularly or at frequent intervals, so start by doing those. OK, so whatever comes up more frequently, just start by turning that into Python. And then from there it's like, OK, probably step one is, let me try to replicate it in a Jupyter notebook. Step two is probably: very cool.
Let me try to replicate it on the command line. And then finally, let me try to have it, maybe, I mean, I don't know what your architecture or infrastructure is like at work, maybe set it up as a cron job on your local machine, so that it just kicks off periodically whenever you need it to. [00:04:35] Yeah. So a lot of what I do is working with our customer surveys, so the feedback at different points of our sales process. We have the data in a platform, but then we also take that and export it into a data warehouse, so tables and stuff. So there are more ways to get at it. And usually a lot of what I end up doing is, someone comes up with a question and it's just like, oh, what's going on here? How is this coming through? So I guess I'm trying to figure out, for those ad hoc ones, how to get more exposure, so that I can feel more comfortable doing those versus the more regular-frequency ones. Yeah, yeah. [00:05:27] I think that'll be a reasonable approach. Do we have a surprise guest here on the office hours? This is the one and only legendary Srivatsan Srinivasan. If you guys are familiar with LinkedIn or YouTube, then you will know Srivatsan. He hosts the channel on YouTube called AIEngineering and he is a huge contributor to the community on LinkedIn as well. His posts are always very informative and his work is amazing. Srivatsan, man, I'm so happy that you were able to make it today. Thanks for coming by. Thanks for giving this introduction. [00:06:05] You are making me log off. [00:06:06] Not now. Was it not good enough, man? No, it is. It's too much. [00:06:13] I know you guys are. The man has been such a wonderful contributor to the community. I mean, it was an honor to interview you for the podcast and to have you take me up on the offer to come into office hours. Man, that's awesome. I'm so happy you're here.
So we're just getting stuff kicked off. Austin was describing a situation where he's moving from the Excel world and trying to slowly pick up Python. So he was wondering what type of tasks he should focus on doing in Python to make his life easier. Do you have any input or advice on that? [00:06:49] Yeah, sure. So if you're doing a lot of data analysis in Excel, right, Excel is pretty easy, first of all. But the only thing is, as your data volume grows, it might really crash. Right. And that's where you look at programming languages like Python or R. R is geared more towards statistical modeling. Python still has the libraries, though maybe not as comfortable, but Python has the market share. One good thing is, once you get into pandas, it will feel like tabular data, similar to Excel. The only thing is you need to know the respective functions, rather than the Excel-specific functions like lookup and everything; you'll have pivot and other functions in pandas. I would not say it's going to be a hard transition. It's only a matter of getting used to the Python functions and the way of handling Python data types. Right. In Excel you would not have a lot of data type constraints and everything. So knowing the underlying data structures and data types really helps. And at the same time, a lot of people get lost when they try to master the A to Z of Python to learn data science. You need not do that. Instead, you can take a problem and then start working: analyzing the data, creating data structures for it, playing around, and then running the model. Right. There's no need to boil the entire ocean to learn it. Just stick to what is required and that will really make it easy. Start with the data structures, then pandas, then matplotlib for visualization. I know Excel is pretty easy. We just click on the chart icon.
Everything comes for you, things are done. [00:08:38] Yeah, my favorite resource for learning Python is called Python for Data Analysis. That's the book I learned from maybe two or three years ago, I can't remember. But that book is going to teach you everything you need to know about Python as applied to data. So it's an amazing resource. And another great resource, I think it's actually free on Udemy, but if it's not free, the guy has the actual video series loaded up on YouTube, and it's Automate the Boring Stuff with Python. I'm not familiar with that book. [00:09:12] Yeah, yeah. At the beginning of every month, he puts it up for free. It's like the first week. So every month you can go and get a code from, like, the Python subreddit to get that for free. [00:09:23] Yeah, that's cool. Yeah. Right. Yeah. I picked up the book a while ago and I was happy to see that it's got an accompanying course to go with it as well. So yeah. So hopefully that's enough to get you started on your transition from Excel to Python. You know who else has amazing content all about Excel and SQL and using Python with Excel? Dave Langer on LinkedIn. So give him a follow. He's got some amazing content on there as well. We've got a couple of new people that just joined on to office hours. We've got Karan; thank you so much for supporting the show and for all your listenership. And then we've also got Shibani. [00:10:05] So if you've got a question, let me know. You're also free to just hop in; anybody at any point is free to hop in with any commentary whatsoever. So definitely don't feel like you have to wait your turn or anything. But yeah, Nicole, if you don't have a question then I'll turn it over to either Shibani or Karan. And that's Srivatsan Srinivasan in the house, so if you have questions for him, go for it as well. [00:10:31] Hi, my name is Shibani. How are you? Good.
How are you doing? [00:10:34] I'm doing good. So I have a question. I am pursuing my master's, and I'm in the US, so now it's high time to sort out, you know, career options. But whenever I say that I want to be a data scientist, the first question from every person I ask for a review is: where do you want to go? How do you want to proceed? What is your area of interest? So my question is, could you shed some light on that so that I can research more? I don't know how data applies to a sales team, for instance. Of course, I've done little projects, but I was hoping to look more into it. [00:11:20] OK, so let me just deconstruct the question and make sure I understand it. So everybody's telling you that you should pick a niche for data and you are not sure which niche to pick for yourself. Are you specifically asking how you can use data science for sales teams? No, I wanted to look into. [00:11:41] OK, so how can I explore that? [00:11:45] Yeah. I mean, the first thing is just start thinking about what type of industries you're into, right. Like, for example, I would never want to work in the insurance industry and I would never want to work in pharmaceuticals. Why? Because I've worked in both industries and I left them. So I know right off the bat that I would not pursue any opportunities in those two industries. Something that I am interested in, for example: e-commerce to me is extremely interesting, and so is manufacturing, and so is LinkedIn Industries here. But let's just take those two, for example. Those two are very, very interesting to me. Right. Things like, you know, modeling customer churn, modeling customer lifetime value, things like that are really interesting to me. [00:12:26] So I would gravitate in that direction. Right. It sounds like you're in a situation where you don't even know how data science is being applied to any industry, right?
[00:12:36] Yeah, actually, the only experience I have is, you know, exploring Kaggle for datasets. And, you know, that's about the extent of my experience, like participating in competitions. [00:12:48] And so case studies are definitely going to be the way to go. Austin, if you had some input, go for it, but definitely read up on as many case studies as you possibly can. We'll dig into this a little bit, but I'll let Karan chime in here. And Srivatsan, if you had any advice as well, go for it. [00:13:09] First of all, it's nice to be in the office hours of a podcast I listen to, so it's really great. Finally nice to meet you. Well, not in person, but. Yes, nice. [00:13:20] Nice to have you here, man. [00:13:21] Yes. I wanted to pitch in on what you mentioned. I was in a similar situation. I think I graduated around this time. What I would suggest, if you are trying to, you know, switch to a particular area or things like that: I always suggest people first get a foot in the door somewhere they have some continuity. The primary reason I suggest that is because I found it very difficult to convince people that I can work on something that they might be looking at, because in the US they have so many people applying to a position. It becomes very difficult to support your case. And you have to be very thick-skinned about that. I mean, when I started out I didn't have any experience; my experience was only that one year of classes, and it was pretty difficult. I graduated in May, I got something in September, and that was like one month away from my final month. I had to get that job. After that I would have had to pack my bags. So just in that way, yeah, that's my suggestion.
[00:14:55] That's some excellent advice: first try to get your foot in the door with any industry, especially if you have no experience, learn some fundamental principles, some guiding principles, and then once you've gained a little bit of experience, start to explore industries that are inherently interesting to you. Srivatsan, I'd love to hear what you have here for Shibani. Yeah. [00:15:17] See, one thing I've seen, and don't mistake me, with all the master's education and college education, is that it's really algorithm focused, right. They teach you the A to Z of all the algorithms, how to write them from scratch, how to use them and everything. [00:15:33] Right. When you go to the real industry, to get to that algorithm, you need to build a pretty scalable pipeline at the beginning. That can be your data collection and data analysis, data cleaning, or even feature engineering. Right. That's part of data engineering as well as modeling. I would say broaden your skills; don't spend too much time only on modeling. It's not that I'm against algorithms, but learn another part as well. It can be some part of data engineering. Master machine learning if that is what you are mastering, yes, but data science is more than machine learning. Sometimes we make it sound like it's all machine learning. It can be your data visualization or any theme that you're comfortable with. Master one of the themes and also build additional capabilities. Having all of those capabilities would make you more marketable. I know domain is something good to have, but when you are just out of college and working with the datasets, it's very difficult to build that capability. But it's not impossible: there are a lot of good datasets available that you can use to understand the domain and read about it. If you are looking at finance, there is, like, the American Banker website.
And there are a lot of other resources available out there. So, like, broaden your skills. I know people say you need not do everything, but the industry requires that, because a data scientist does not sit quiet waiting for data to come to them. You need to be working with the business, from the source systems, to get the data and get it to the model. So broaden your skills. [00:17:20] Yeah. And definitely broaden them to be more than just the technical skills as well, because that will be super, super important. [00:17:27] I have a question. Like, Srivatsan said to broaden your skills, explore yourself. [00:17:33] So what does he mean by that? Only technical skills, or, you know, what else? Like exploring more and more datasets? Is it just practicing a lot? [00:17:44] So it's a combination of technology and domain. Domain is what Harpreet and Karan were both talking about. Right. Try to know the domain. Right. See, if you're looking at banking or. [00:17:57] I know, like he said, he does not like insurance, so I won't pick that one. So just go and understand some of the business processes. Right. Now, marketing is kind of common across all industries. But if you go to banking, you can look at fraud, you can look at investment banking. [00:18:14] And, like, if you go to the Google Scholar website, there's a search available, and you can search for a particular theme like fraud detection in banking. You will find a lot of problems there that are solved using data science, a lot of research papers. So that is one part of it. The second part, from the technology side: it all depends on what you want to get into. If you want to get into more of the business analyst or data analyst side, I would focus on domain equally as well as technology. Right.
But if you are looking more at the modeling side or the visualization side, then focus more on technology. Like, if you are learning data science and you're going to be in machine learning, learn all the algorithms and, at the same time, learn data engineering as well, which is key to getting the data into your models, as was being mentioned. [00:19:11] OK, OK, I got it. So one mistake I was making is that I never researched. I just used to go to Google, pick any random data, and think about what I could get out of it. So I see the point. Thanks. [00:19:24] Yeah. So that is good too. I'm sorry, but yeah, it's good for improving your technical skills. But sometimes the datasets you play with on Kaggle are already prepared for you. Right. They don't give the background about the dataset. Right. But if you go to a lot of external websites, you will find a lot of data with a lot of background information, and you can research that particular topic. [00:19:47] You can find even more details about it, and you can build out a really easy kind of data engineering project for yourself using your own data. Right. I don't know if you have any wearable tech, right, or even if you have any type of music streaming service or what have you. [00:20:05] Bank account data. With bank account data, you can set up a streaming pipeline, well, not a streaming pipeline, but you could set up a batch processing pipeline where every night at midnight you have a process that kicks off that just goes and pulls your previous day's activity, loads it in, does some manipulations on it, and then dumps it off into either a cloud database or a local database. Right. And that's data engineering experience right there. At a high level, that's a very valuable skill. So just look for interesting opportunities for you to do stuff like that. I mean, data really is available everywhere.
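The nightly batch job described here could be sketched roughly as below. This is a minimal illustration and not anything shown in the session: `fetch_activity` is a hypothetical stand-in that returns made-up rows where a real bank export or API call would go, the table name `activity` is invented, and SQLite from the Python standard library plays the role of the "local database".

```python
import sqlite3
from datetime import date, timedelta


def fetch_activity(day):
    """Hypothetical stand-in for a bank export or API call; returns made-up rows."""
    return [
        {"day": day.isoformat(), "description": " Coffee shop ", "amount": "-4.50"},
        {"day": day.isoformat(), "description": "PAYCHECK", "amount": "1500.00"},
    ]


def transform(rows):
    """Tidy the text fields and convert amounts from strings to floats."""
    return [
        (row["day"], row["description"].strip().lower(), float(row["amount"]))
        for row in rows
    ]


def load(conn, records):
    """Append the cleaned records to a local SQLite table (created on first run)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS activity (day TEXT, description TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO activity VALUES (?, ?, ?)", records)
    conn.commit()


def run_nightly(conn, today=None):
    """Pull the previous day's activity, transform it, load it; return the row count."""
    today = today or date.today()
    yesterday = today - timedelta(days=1)
    records = transform(fetch_activity(yesterday))
    load(conn, records)
    return len(records)
```

To make it an actual nightly job, a small script that opens a connection with `sqlite3.connect(...)` and calls `run_nightly(conn)` could be scheduled with cron or any other scheduler; swapping the SQLite connection for a cloud database client would cover the "cloud database" variant.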
You can even, if you wanted to, pull weather service data for, like, your favorite countries and have that refresh every hour and get dumped into some database. So just little things like that. [00:20:55] That's perfect. I was actually going to ask what you guys recommend for some data engineering projects. And you answered before I even asked. [00:21:00] That's nice. Welcome, Nicholas. Nicholas is a friend of the podcast. He's been here a few times. He's one of my mentees from Data Science Dream Job. You guys are really lucky today. We've got Srivatsan Srinivasan in the house. [00:21:20] A legend in the game. This is the last one for you all. No more. No more, no more, man. [00:21:30] So super, super happy to have you guys here. Shibani, any other questions? You're more than welcome to hang out for as long as you like, but if you don't have a question, we'll go around and see if anybody else has one. [00:21:41] And for now, it's a great addition. [00:21:46] And I just wanted to add something that wasn't mentioned about this. [00:21:56] One of the other things: over the last maybe year or so, we have also interviewed some people for my team, and one of the things that we noticed was about the basics. For example, somebody has worked on a dataset, let's say one of the famous ones like the Airbnb data, predicting something on the price, things like that. Right. Now, people have that on their resume. But when we get into a discussion on, let's say, OK, how do you do this in, for example, SQL, or what are the assumptions of a linear regression? Now, what happens there? And this is true for me too; I have been in interviews where I have been asked these questions.
Now, what happens is that we have a lot of questions in our minds about the advanced stuff. I mean, I call it advanced because I don't work a lot in machine learning right now, but it definitely is important to get the basics right. That's my point. So read a lot about statistics, read about SQL; at least 80 to 90 percent of what is being asked should be explored. And then maybe, you know, you get to a point where, I think AI is something that even I started out on, but I have put it on hold, actually, to do some more reading on the statistics side, because I knew there were some questions that I saw online, or, you know, from people sharing their experiences, that I wouldn't have been able to answer. And that is something that I had to take home. So, yeah, I'm just suggesting this to people who sometimes have doubts or something like that. [00:23:45] One thing that I would want to add is: how do you balance the two, creating a portfolio of projects and learning? Because if you stick to learning, you can spend years learning all the latest and the tried and true. [00:23:59] And here's the thing: you don't need to know all the latest. All you need is the basics and the fundamental bedrock. And equipped with this fundamental knowledge, you can build on it in an on-demand manner, right? Yes. [00:24:14] So that's a great question, actually. Sorry to interrupt, but that's OK. That's actually a great question. So to that question. [00:24:23] So my answer is, nobody is going to hold it against you much in your interview if you mess up an advanced question, but if you mess up a SQL question or a statistics question, you are going to get dinged more for this, because that's 101. But yeah. [00:24:44] So SQL and statistics and some basics, you know. [00:24:48] Yes. All the bedrock. [00:24:50] And I'd say, first of all, your portfolio: keep it lean and clean.
I mean, you don't need more than two or three projects on there, and make sure that they're just airtight, that the code works, and make sure that these are projects you are able to answer questions about upside down, inside out, left, right, center. Any question that comes from any direction, you should be able to answer, because that's what they're going to expect in an interview. My favorite thing in an interview is, I mean, obviously there's a set of questions that I have to ask that's consistent across the people I'm interviewing, just for baseline comparison. And after we get through those questions, my entire focus is on what is on the resume. So if there's anything on there, and I've had people do this. I've interviewed hundreds of people, and I'll ask something about the resume and they're like, oh, well, you know, I don't really remember how I did that. That was so long ago. Or, I was just working as part of a team. I did one little piece of it. And it's like, well, why is it on your resume if you can't talk to me about it? So have that threshold; make that a mental threshold for projects. [00:26:02] So two to three projects is enough, though? [00:26:05] Two to three projects is definitely enough as long as they're really well done, really well executed. I mean, you have a solid repository structure, your code is well documented and commented, you're using helper files when necessary. Make sure your top-level README is just an airtight narrative. I'm curious, Srivatsan, what are some things that you look for in a portfolio project when you're reviewing candidates for a position? And you can ask these. [00:26:39] Oh yeah. Regarding the resume, what you said about the projects: I have four projects listed on my resume, which I have done. OK. Now I can tell exactly what I have done. [00:26:54] I can give my answer for why I chose this model or why I chose this language.
But sometimes it happens that, you know, you know everything, but one interviewer asks something and it's all out of context, like there's a hidden question in it. So how do you tackle that situation? [00:27:12] Like, I know it, but I'm just not able to tell you properly; it's a domain-side question that I never thought of. [00:27:23] Yeah. So that's how you manage the question, right? There's always the question behind the question. Right. And you can always first pause, repeat the question back to them, and dig a little bit further. Right. If you require more information, or if the question is not clear, then the worst thing you can do is just sit there, paused like a deer in headlights, and not say anything. So try to drill deeper. Right. And offer up information and pause and ask, is this what you are asking me? You know, making sure, staying consistent. What do you think, Srivatsan? [00:27:58] Yeah. Yeah. So let me just go to the portfolio question. Right. And let me speak to that, because the portfolio continues into some of what you are asking. So when you're building a portfolio, make sure the projects are somewhat unique. Right. The reason is, in ninety percent of the resumes I see, the portfolio has stock prediction and so on. Right. It's almost common. And I'm sure if you're seeing resumes, you might have noticed it. So try to get hold of some unique dataset. Right. And I'm not sure if you've seen my channel, the YouTube channel; I have a video on building a portfolio and I also have a playlist containing projects for a data science portfolio. There are a lot of good datasets available. And the second thing is, in the real world, the entire data will not come to you [00:28:52] in a single place. You have to find related events. Right? If you are taking the LendingClub dataset, right, which is nothing but customers who can default.
So you can always correlate that to macroeconomic scenarios. [00:29:08] And these datasets are available online. You just need to pull in those multiple datasets to create some correlation and then model it. Now, coming to the modeling part: when you're modeling, make sure you don't just create an algorithm and then say, look, this works. Try to create multiple algorithms and try to present a kind of conclusion on what you did, why you selected those, and how you can explain the outcome. Right. Because I can take a model that is overfitted and I may get the best accuracy, but if you are not able to explain the outcome of your model, then that algorithm is of no use in some industries. So the more you know your algorithm, the more you are able to explain it. That's the second point. And the third: see how you can productionize this, right? The README file, which Harpreet was talking about: try to make it kind of a thesis paper, though not that lengthy, or at least include an architecture for how you can take this model and deploy it. So I think that was what Nicholas was asking. Right. So that is one. Now coming to your question, Shibani. Right. The real-world questions will always be out of syllabus. Right. Whatever you learn and everything. Now, when they see what you have done, sometimes you will not be able to answer. It takes some time; I know there's no one straight answer to it. [00:30:41] And maybe if you're not getting the question, or the interviewer is not able to explain clearly, take some time and always ask: is this what you are looking for? Or, can you give me more details so I can answer? [00:30:54] Right. So that's the only way and the best way. Again, rather than passing completely, I guess, give it a try.
[00:31:02] OK, so let's suppose I have done a project on machine learning for false alarm detection. Now, as you said, I should take a pause. And as you said, if they ask me why I chose this model, it might have a number of answers. But if I'm not able to answer this question to their expectations, how will I know that at that point? I am giving my best shot, but the interviewer might have a solution in mind that they would like to hear. [00:31:40] No, you cannot. But you are giving your best shot. If you are able to answer it like: OK, I tried these three algorithms, these are the metrics I got, but I feel this algorithm will be better because I am able to explain the outcome and tie it back to the data source. Right. Then you have given it your best shot. You know, everybody has their own point of view. It's a science; there is not just one way of doing things. But if you are able to put forward your solution, if you are able to confidently defend what you did from your perspective, to your satisfaction, that should be a good win for you. [00:32:18] Oh, yes. I was going to say pretty much the exact same thing as Srivatsan, because at the end of the day, you are the person that knows this problem inside out. Right. [00:32:28] So the reason they're asking that question is to see how comfortable and confident you are, because there's a question behind the question. The question is, how comfortable and confident are you defending your position on a chosen methodology? And were you aware enough, in your thought process, to have at least tried a couple of different things: established a baseline, and then, against that baseline, come up with some alternatives that perform better? And among those alternatives that performed better, what else did you consider?
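The "establish a baseline, then compare alternatives against it" habit can be illustrated with a very small sketch. Everything in it is an assumption made for illustration: the data is a made-up toy set loosely themed on the false-alarm project mentioned above, the single "sensor reading" feature is invented, and the two threshold rules are hypothetical stand-ins for whatever real models a candidate would try.

```python
def accuracy(predict, rows, labels):
    """Fraction of rows where the prediction matches the label."""
    hits = sum(predict(row) == label for row, label in zip(rows, labels))
    return hits / len(labels)


def majority_baseline(train_labels):
    """Baseline: always predict the most common label seen in training."""
    most_common = max(set(train_labels), key=train_labels.count)
    return lambda row: most_common


def threshold_rule(cutoff):
    """Hypothetical alternative: call an alarm real (1) above a reading cutoff."""
    return lambda row: 1 if row[0] > cutoff else 0


def compare(candidates, rows, labels):
    """Score every candidate on the same held-out data."""
    return {name: accuracy(model, rows, labels) for name, model in candidates.items()}


# Toy, made-up data: one reading per row; label 1 = real alarm, 0 = false alarm.
train_labels = [0, 0, 0, 1, 1]
test_rows = [[0.2], [0.9], [0.4], [0.8], [0.1], [0.7]]
test_labels = [0, 1, 0, 1, 0, 0]

scores = compare(
    {
        "baseline": majority_baseline(train_labels),
        "cutoff_0.5": threshold_rule(0.5),
        "cutoff_0.75": threshold_rule(0.75),
    },
    test_rows,
    test_labels,
)
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

The point is less the numbers than the shape of the answer it supports in an interview: here is the trivial baseline, here are the alternatives scored on the same data, and here is why one was chosen over the others.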
[00:33:01] The question behind the question is really, how comfortable are you defending your position, and did you think things through? Right. What they really want to see is your thought process. [00:33:12] All right. I'm a little bit more clear than before. Thank you so much. [00:33:18] You definitely feel free to hang out. And if anybody else has questions, go for it, because this is the part that I will edit out when it's on the podcast. Everybody on YouTube has to sit through this stuff because I don't edit video. If anybody's got a question, definitely go for it. It's not too often that you get to interact with Srivatsan himself. If you guys haven't got a chance to check out his YouTube channel, I'll link it in the show notes. Definitely one of the most handy resources, exploring every aspect of machine learning. And I'm curious now, when you're setting up content for your YouTube, how do you decide what to focus on? Is it from what students have reached out to you saying they're interested in? Or is it pushed by what you're interested in at the moment? [00:34:13] So it's a combination of three factors, right? One is what my YouTube subscribers reach out to me saying they want. Second is, when I created the channel, I didn't want to go with the regular content of teaching the basics and everything. I wanted to keep it more towards industry work. And that's my first playlist. The first playlist I created was on ML engineering, because there was very little content on ML engineering at the time it started. Right. So that was the first part. Then it was more focused towards projects, be it NLP, or now I'm running a time series course. Right. So it's focused on projects where you can relate to how you can take a dataset, how you can apply your algorithms, and also get insights.
So it was more that. And third, my experience also comes into play, because I'm not knowledgeable in everything, be it reinforcement learning or whatever it is. Right. [00:35:24] Coming back to that, one thing students always ask me during office hours at Data Science Dream Job is, "Oh, do I need to know this thing? Do I need to know that thing? All the jobs I see say they need me to know deep learning." And it's like, right, well, you've looked at three of them, three out of a hundred. But when it comes to trying to figure out what to learn next, what is your advice to students? Is it to go back and study the basics? Is it to study this particular algorithm? Is it to study this particular tech stack? What if somebody is at a point where they feel good about the basics? What would you tell them? [00:36:08] So the basics are very important. Now, if you look at industry as well, most of the projects that are deployed in production are still the regular machine learning stuff. Right. It's not like everybody uses deep learning or transformers or whatever and deploys it. I'm not saying they don't use it, but 80 percent of projects still run on traditional machine learning algorithms. We still deal with a lot of text data. Right. If you're dealing with insurance, if you're doing policy underwriting or risk claims, be it in banking or any industry, you see we deal with a lot of structured information, and traditional machine learning typically works with such information. So get your basics right, be it statistics... Sorry, I thought somebody was asking a question. So, be it statistics, be it machine learning algorithms and everything. Right. Now, the second part is about being more aligned to the industry or market expectations: really pick up one vertical. 
It can be natural language processing, it can be computer vision, be it time series. Right. So pick up one vertical and try to master it. The next aspect is to see if you can also learn some cloud technology, because now a lot of projects are going to the cloud. It can be AWS SageMaker or Cloudera or Kubeflow. Right. We don't have time to learn everything. But that is the phase we are in in this data science world, right, where new things keep evolving and adoption is also moving towards cloud and other technologies. [00:37:55] Thank you very much. I appreciate that. Welcome. I'm going to butcher your name: Abbasid Belkacem? If you've got a question, feel free to just hop in at any point. Same goes for everybody. But go for it. [00:38:10] I had a question. So this is something that I wanted to ask about in terms of languages. Anybody who has any insights on this would be helpful. I do a lot of work where I'm not doing a lot of Python actively. Predictive analytics is something that I want to do, but a lot of what I do is more on the descriptive side of things, where I'm using Tableau, Alteryx on the side, and a bit of SQL here and there. But the kind of jobs I'm looking at require a good amount of hands-on SQL, and even Python. Is there something you guys would suggest? Of course there are personal projects that we can do, but I think with SQL, the lack of actual hands-on experience sometimes becomes something of a deal breaker. This is all based on my experience. 
So it's a two-part question. First, is that something you would agree with? And second, is there something that needs to be included in the resume that would help me convince them? I'm going to out myself here: I suck at SQL. I'm not that great at SQL. I know enough to get the job done, but I will slice and dice, summarize, aggregate, [00:39:58] organize my data in Python and then dump it back into a SQL database. [00:40:02] That's what I'm a lot more comfortable working with. [00:40:07] Yeah, at the end of the day, I think it's important to focus on how you can deliver the most value. And to me, what I'd try to do in SQL, I can do in Python. So I just do it in Python. That's not to say that it's not an important skill; I'm not discounting the importance of the skill at all. Sri, what do you think? [00:40:31] So SQL, if you really ask me, is very, very key. Right. I would start with that. The reason is, if you take Python and you want to summarize the data, you do a group by: you take the DataFrame, then dot groupby, and then whatever you want to roll up, dot, dot, roll up. Right. You can do everything in a single SQL statement if you understand SQL better. The second thing is the support for it. [00:41:01] That's the main aspect, because when you're dealing with CSV data, data in a file, you typically use the pandas functions, because CSVs do not support SQL queries. But most of the time, when you go to the real world, you're dealing with databases. You may be dealing with cloud, BigQuery or Redshift or any of these databases. And SQL is the only language that they talk. So your way of accessing the system changes. Now, you don't have to really be a master in SQL to start with, right? Because it's not difficult to learn. 
And with Google by our side, we always have a lot of solutions that can help us. So start learning at least the basics of SQL; it can be as simple as your window functions or group by and everything. Those are really not difficult to learn. And when you go to the real world, yeah, maybe you need to write a lot of subqueries, joins and everything. At that time you can pick up those skills, and there are a lot of resources to learn SQL, right. You can even sign up for Google Cloud. You get a three hundred dollar credit on sign-up, and you can go to BigQuery. There are plenty of huge datasets and also documented working solutions that you can just play around with and quickly learn SQL in no time. Right. So you can do that as well. But think about it: you can play with pandas, but if you go to industry, you'll be dealing with a lot of SQL data sources, and you don't have much option there. [00:42:37] Yeah, sure, and thank you for that. I'll just ask one more question after that, Harpreet, and then I'll hand it back. So I agree with you. I have been trying to do things on platforms like HackerRank, and even in my work I try to talk to people about how they are working with databases. My concern over the last six months or so is more around this: when I mention that I don't have hands-on experience explicitly, although I work with a tool called Alteryx, which is essentially picking things up from a SQL database, my hunch is that they wouldn't consider somebody who is not doing hands-on SQL every day. And unfortunately that has been my experience overall. But yeah, that's something that I tried. [00:43:48] Experience doesn't have to come just from work. You can get experience without having a job, by taking on personal projects. 
So that's the important thing, really. Anybody can set up a SQL Server on their local machine, and anybody can pull data and have it dumped into this local SQL Server. [00:44:06] And I have a suggestion as well. You can start and learn it within just two or three days. [00:44:16] There's a website called W3Schools. You just see the code, and they give you a practice area there. It's very easy, it's very fast. You can really learn in just three days. This is from my point of view, because SQL is my strongest language: whenever you sit in front of an interviewer, they will mostly ask you questions on joins. [00:44:47] So from my point of view, once you know how to make functions, once you know how to make clauses, and once you know how to make procedures, the main thing is what they'll ask you to use, and that will only come once you choose a dataset and start using joins: what is a left join, what is a right join, and making your own definitions in your mind. And whenever a person asks you, don't follow the definition that you can Google; follow the definition that you have stored in your brain. How I do it, I just write down small formulas, like: a right join will bring all the right data and the matching data from the left. So in very short words, make your notes and keep those notes handy. It will come to just one page or two. And if you have that single page with you, you have nothing to worry about. [00:45:46] And that's for the very technical questions. Yeah, that's really good advice. Thank you so much for sharing that. Go for it. [00:45:57] I've been asked questions about the order of operations in SQL that I wasn't prepared for. 
This was a while ago, and it was about the different clauses and the order in which each of the clauses is triggered. I've also been asked to do practice problems, and some of them were more advanced than others. And there are a lot of other questions, I guess, depending on the position you're applying to. If they really care about SQL, sometimes it's a little more than that. [00:46:24] So this is very basic once you start. OK, so first will come SELECT, then FROM, then WHERE, then GROUP BY, and then at the last, ORDER BY. [00:46:35] So in that question there was also HAVING and other functions. [00:46:43] So the answer is this: SELECT, FROM and WHERE will always be the same, and the rest will depend upon how you write it. OK, so whenever the interviewer asks you to tell them the order, just give this answer: the definite order for the basic ones, and then it depends on how you want to use the rest. [00:47:05] Thank you very much for that. [00:47:07] Does that seem to answer your question? [00:47:09] You may also be given a problem and asked to solve it in SQL, right? [00:47:16] And many times it's more the approach: the interviewer looks at how you are solving that particular problem. It can be a simple problem; I don't think they are going to give you data and a SQL editor and expect you to get it exactly right. But they'll definitely be looking at how you approach it. One of the typical questions I ask is this: an employee has had three previous designations. There are two tables, employee and designation, and the designations are stored as rows. I want to convert the three rows into columns by joining the two tables, so the output should have employee name, designation one, designation two and designation three as columns. All right. So what we are looking at is not the exact answer. Right. It's how you approach it. Converting from rows to columns is not an easy task, but it's not really hard either. So we are just trying to find out, OK, is he able to approach the problem? 
If he's able to approach the problem, then we are sure that when we hire him he will be able to approach any problem in SQL. So there will be some questions which are not straightforward. We are definitely not going to ask what a GROUP BY is. Right. Typically we go a little beyond the regular SQL syntax. [00:48:35] Some awesome advice for SQL. Thank you guys very much for sharing your experience. Appreciate that. [00:48:39] Thank you. Thank you so much. And yeah, maybe this was just my experience, because even in my discussions about what I've been working with, like I work a lot with organizational hierarchy data, so we get to do a lot of my favorite things like that, there was maybe a small number of people where I was able to explain how I do it right now with some of the tools that I use. But they were very, very consistent about, you know, just that "you don't have any hands-on SQL experience." And I know I'm repeating that again and again. Right. But yes, that was my experience. [00:49:31] The next time that question comes, when they say, "Do you have hands-on experience?", you can just tell them, "Yeah, I do. Not at work, but I do it for my personal projects." And, you know, that experience, to me, is hands-on. [00:49:44] Yeah, it's kind of about my convincing skills there. [00:49:57] Yeah. Well, when you're in that interview setting, you obviously don't want to say anything that would be detrimental to your chances of getting the job, and you also simultaneously don't want to lie. Right. But by flat out saying, "Yeah, I don't have any hands-on experience," it's all downhill from there. Don't do that to yourself. [00:50:14] Yeah. Yeah. Yeah, I don't think so. 
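The rows-to-columns interview question from this SQL discussion (an employee with up to three previous designations, stored one per row) could be approached with a join plus conditional aggregation. What follows is a minimal sketch using Python's built-in `sqlite3` module. The table names, the `seq` ordering column and the sample rows are assumptions for illustration; this is one possible approach, not the interviewer's expected answer.

```python
import sqlite3

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per employee, one row per designation held.
# `seq` (1, 2, 3) is an assumed column that orders each employee's designations.
conn.executescript("""
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE designation (emp_id INTEGER, seq INTEGER, title TEXT);
INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ben');
INSERT INTO designation VALUES
    (1, 1, 'Analyst'), (1, 2, 'Senior Analyst'), (1, 3, 'Manager'),
    (2, 1, 'Engineer'), (2, 2, 'Lead Engineer');
""")

# Join the two tables, then collapse each employee's rows into one row.
# Each CASE keeps the title only for its own seq; MAX ignores the NULLs
# produced for the other rows, so it picks out that single title.
PIVOT_SQL = """
SELECT e.name,
       MAX(CASE WHEN d.seq = 1 THEN d.title END) AS designation_1,
       MAX(CASE WHEN d.seq = 2 THEN d.title END) AS designation_2,
       MAX(CASE WHEN d.seq = 3 THEN d.title END) AS designation_3
FROM employee AS e
LEFT JOIN designation AS d ON d.emp_id = e.emp_id
GROUP BY e.emp_id, e.name
ORDER BY e.emp_id;
"""

rows = conn.execute(PIVOT_SQL).fetchall()
for row in rows:
    print(row)
```

The LEFT JOIN keeps employees who have fewer than three designations; their missing columns come back as NULL (`None` in Python). The same query also exercises the clauses talked through above: SELECT, FROM, a join, GROUP BY and ORDER BY.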
[00:50:21] Can you suggest some video links or somewhere I can see how people effectively present their project in front of a recruiter? Or not just a recruiter, in front of anybody. If I want to explain my project, I will explain what I know, but explaining it effectively is also really important. So I wanted to see some examples of it. Are there some videos on that? [00:50:51] I don't know of any videos of people giving an effective data science presentation, but I mean, it's just storytelling, right? Like, first of all... [00:51:01] Yeah. So think about this: you aren't presenting for yourself, you're presenting for your audience. Right. So first, put yourself in their position and think about what it is that they would want to hear. Make an assumption about whoever it is you're presenting to. Obviously, if it's a teammate or whatever, you can assume that they have more technical knowledge. But if you're doing it in front of a recruiter or for a job interview, just make the assumption that nobody knows anything about data science and present at that high level, while periodically pausing and saying, "I can go into more detail if you'd like, just let me know. I can go into as much detail as you'd like." Right. [00:51:39] But it starts with first thinking about, OK, who's on the other side of the table and how are they going to perceive what I'm saying? Srivatsan, what do you think? I guess let's flip the question like this. Definitely go ahead and answer her question, [00:51:56] but I'd also like to get your perspective on what some of the non-obvious skills are that a data scientist should have. So, one other thing: when you're going to an interview, a lot of the people that we interview miss researching the people interviewing them and the position they are coming from. Right. And it's easy to research in this age, where everybody has a digital footprint. Right. 
Say you are going to get interviewed by X, Y, Z. You get their contacts in the email, right? So once you get this, go and research what they work on. They could be working in the cybersecurity space, or they could be with the Amazon Alexa team. Know exactly who the people you are talking to are. Based on that, you will also know what questions you can get. At least 30 or 40 percent of the questions you can infer from people's positions or from the posts they make. Right. They may be on LinkedIn; they may be talking about something or asking some question. So the first thing is, research their background before going to the interview. That would solve a lot of your problems. Right. It would also improve the way you present. Now, if somebody is calling you from the cybersecurity team and one of your portfolio projects is fraud detection, well, in cybersecurity you are also detecting outliers, like someone hacking into your network. In both cases you are detecting outliers, so you can say you have done something like this. You can try to map your experience to what they may be looking at. Right. And that's a good way of presenting it. I don't think there is any video that will really tell you how to present, because it's very context specific. I'm not sure; I haven't seen any storytelling video as well. But it's not difficult if you understand who is interviewing you. [00:53:41] Yeah, actually, I don't know if you have LinkedIn Premium. If you don't, you should, because of LinkedIn Learning. They've got some amazing stuff. [00:53:49] I'll put a link to this, but there's a LinkedIn course on storytelling for data science, Tell Stories with Data. It's by Doug Rose. Doug Rose is awesome. I've got one of his books on my desk right here. He's got some amazing content, so definitely check that course out. 
And there's another person; I can't pronounce her last name. [00:54:11] Cole Nussbaumer Knaflic. There you go, that person. She's got some amazing stuff on storytelling as well, so definitely check that out. So those are some more direct resources for you, Shivani. She may even have a course on LinkedIn, Cole. I'm not even going to pretend to know how to spell her name. OK, I've got her book, Storytelling with Data; it's pretty good too. Yeah, I clearly can't spell that. But check those resources out. [00:54:49] A small suggestion that I would also like to add, and I tried this during my master's: I would sometimes present to people like friends who are not aware of what I'm working on, who at that point are not doing any data courses. Spend some time with them, just doing maybe a two or three minute presentation, so as not to wear them out with a long ten minute presentation. I used to do that a lot and they got annoyed, but that was fine. I got some good feedback. Or even now, with COVID, you can record something where you are presenting, you know, a small two to two and a half minute presentation, and ask people for feedback. I mean, there are two ways of doing it. One is consuming, like watching how others do it while you are learning, and the other is doing it actively. Most of us get better at it while working, right? Before I started working, mine was pretty bad. Everything that we did in undergrad, yeah, I don't like looking back at it. It took some time, and you have to go through it. There's no hack. [00:56:14] That's what I was going to suggest to you, Shivani: potentially, speaking with your friends, you could set up a data science lightning talks event. 
[00:56:24] And that's a great way to network with other data scientists and kind of learn what's going on in their job search, or whether they already have inroads into a firm that you might be interested in. Just by having a platform to give practice presentations to each other that are five to seven minutes in length, you'll all get better with time. [00:56:46] Thank you very much for that. And at any point, if you want to come here and pretend like we are stakeholders and give us a presentation, you're more than welcome to swing back to these open office hours to do that. I'm happy to help you with that. Thank you very much for the advice, Nicole and Karen, that's some awesome, awesome advice. I also did an interview with TI Scott and Daniel; I'll link that here. He talks about how to storytell with data as well, so definitely take a look at that episode. But if you don't have LinkedIn Premium, I don't know. [00:57:17] I have it through my college. [00:57:21] Yeah. So I linked that Tell Stories with Data course for you right there. That should be pretty useful for you. So, opening it up for the last couple of questions. We're just coming up on the hour here, so if anybody has a question, go for it. Lightning round question. [00:57:37] Do you feel like master's programs in data analytics, data science or computer science are valuable? [00:57:46] Um, yeah. I mean, I've got a master's degree in mathematics and statistics. They're definitely valuable, otherwise people wouldn't get them. But valuable for getting a job? I don't think they're necessarily valuable for getting a job. I think what matters more for getting a job is practical experience in the form of projects. I've learned far more outside of school than I ever did in school. So they're valuable if you want one. Are they valuable for getting a job? Not necessarily. What do you think, Srivatsan? [00:58:21] You answered it: only if you want one. [00:58:24] You said that. Yeah. Are you pursuing one? 
Right. Sorry, go ahead again. I was going to say, I'm thinking about pursuing one, but yeah, I'm happy to hear anybody else's thoughts on this topic as well. [00:58:38] So, about that, I also think it depends a lot on which stage of your career you're at when you do your master's. In my master's, I saw different people getting different kinds of value from it. Somebody who had been working in the same domain was, you know, focusing a lot more on getting connected to professors, alumni and things like that. [00:59:05] And while I was trying to just keep up with the pressure of some of the courses I had taken, I could see different people doing different things. So the way I think about it is, now sometimes people just do it online. Right. And that's a good way to do it: if you need something in your work, go ahead and do something about it, even if it's online, maybe a six month course. That works too. Yes, especially if it meets what you want out of it. [00:59:40] If you're looking to get into research, then I think it's probably more useful there. Like, for example, I was a biostatistician for longer than I wanted to be. But you can't become a biostatistician without a master's degree, because it's intensively research focused. You're designing experiments, and you have to have this deep knowledge of statistics. So if you're going into research, then that graduate level training is going to be valuable. If you're going into industry and you're planning on staying in industry, it probably won't be as valuable as practical, hands-on experience. Wow, thanks. Cool. Any last minute questions? [01:00:21] If not, then... Yeah, I've got one. So, one of the things as I've been going through and doing projects or kind of following along: 
I've been trying to find, or I guess I'm looking for suggestions for finding, data sets that are messier than what you would get from something like Kaggle, to work on the cleansing and the preprocessing, because I know that's probably the most difficult part, and I'm just trying to find more of that outside of work, I guess. [01:00:54] So where do you live? What city do you live in? I'm actually just outside of the Twin Cities. All right. So we're going to put in "Minneapolis" and then put "open data portal." Right. Most major cities in the world have an open data portal. So it looks like the City of Minneapolis does have an open data portal. By the way, Minneapolis is freaking awesome. I love it there. My favorite brewing company is there, Dangerous Man. What is this? I don't know what this is. I am a human, actually. Yes, that's true. Oh, my God. [01:01:32] I have no... [01:01:35] Yeah, but an open data portal. You know what's really easily accessible is New York, the New York open data portal. They have some amazing stuff there as well. So open data portals; that's like municipal, city government data. My friend Mark Nadelberg actually did a really interesting project. He's here in Winnipeg, and he found a data set that had all the trees that were ever planted in the city of Winnipeg, categorized by neighborhood, type, all this interesting stuff. And he did a project based on that. And it was messy data, real world data. I was attempting to do a project, but then ran into time issues, where I took parking ticket data and coupled it with socioeconomic data for each neighborhood, because they had that granularity. I was trying to test and see if police officers were targeting poor neighborhoods and giving tickets in poor neighborhoods. Right. So these are all examples of messy open data that you get to work with. So, open data portals. [01:02:46] Yeah, yeah. 
And the one that you mentioned about the trees, did that turn into a dashboard? Because I remember looking at something like that, and it looked awesome. [01:03:00] I don't know if he's revamped it or anything. He did it all in Python. It wasn't a dashboard. He was using, like, GeoPandas. [01:03:09] He did it mostly in GeoPandas. Oh, yeah. It was really cool. It's a cool project. Definitely check that out. There you go, right on. Nicole just plugged the links right there to Mark's work. Thank you for that, Nicole. Yeah. [01:03:22] Open data portals, man. That's a good episode. He also talked about spaced repetition. [01:03:28] Yeah, that's right. That's his thing. Mark, I gave him an open invitation to come on the show. He said his son was just born a couple of months ago, so he's probably tired. Well, if there are no other questions, I want to thank you guys for coming on to the show. A special thank you to Srivatsan for taking time out of his schedule to come and give such valuable advice. I really, really appreciate you being here. On Monday, the episode coming up is with Sadie St. Lawrence. That will be awesome. The week after that, I've got an episode releasing with Maya Grossman. If you're on LinkedIn, you might know of Maya Grossman. She wrote the book Invaluable. That's going to be an awesome episode as well, and there's a bunch of other cool stuff happening in the near future. Thank you so much for being here. [01:04:13] I appreciate that. Everybody else, I really, really appreciate you guys being here and asking such insightful questions. This episode will be released on the podcast on Sunday. So will the video. So if you missed anything, you can always come back and listen. Take care and have a good rest of the evening. [01:04:32] Have a great weekend. And hopefully we find out who the president is sometime soon. [01:04:38] Thank you both. And by the way, this is awesome. Thank you. Thank you. 
Thank you. Thank you. [01:04:44] And take care. Everybody back.