comet-ml-march13.mp3 [00:00:00] Welcome, everybody, to the Comet ML office hours, powered by The Artists of Data Science. Super excited to have all you guys here today. I know Daylight Saving Time kicked in this morning, so you're probably up a little bit earlier than usual to be here, technically, but super excited to have you guys here. I don't know, do they still practice Daylight Saving Time in the EU? I'm not sure, and I know a lot of our audience is from Europe, so they might be confused and accidentally show up an hour later, which will be interesting. But we do have Christoph here. How are you doing, man? [00:00:36] I'm doing fine, yourself? I'm good. Good. So where are you located? I'm in Germany, but just like you said, I realized like an hour ago that it's one hour earlier today for us. Usually for me it's six p.m., and today it's five p.m., and it's only for two weeks, because in two weeks we have our own time change, or whatever it's called. [00:01:02] That's interesting. It does kind of confuse things and mix things up, because I know that a lot of the people who come on Sundays are in Europe. So hopefully people don't show up an hour later. That probably should have been something we announced; I forgot it was actually Daylight Saving. I woke up this morning, and, you know, I wake up pretty early, and the clock was an hour ahead, and I was like, wait, hold on, am I waking up late? So I see you're having a beer there. I hope to have one a little bit later this afternoon. [00:01:35] What type of beer is that? It's a German one, a normal lager, I believe. OK, what's it called? Bitburger. I'm still drinking coffee. It's late here, so I found one. There you go, man. How's your day been? It's been fine. Yeah. [00:02:00] How about your week? Have you been up to anything new or interesting?
[00:02:05] I, like, every day I wake up at five a.m., because that way I can work for myself until about eight. I have a little daughter. She'll be one next week. Oh wow. Nice. So next week I'm not going to be here, because we are celebrating. [00:02:30] That's cool, man. Yeah. I've got a son. He just turned ten months old this month. Yeah. Recently. So I know exactly the struggles of being a new dad. I was waking up early, you know, 4:00 a.m., for the longest time. But right around February I started waking up more like five a.m., just because I guess I was burnt out, tired. But I've been slowly waking up earlier and earlier. [00:02:56] So hopefully I'll get back to four a.m. sometime soon. Probably now that we've moved ahead an hour, I'll be able to wake up at four a.m. again. But the early morning hours are very helpful to make the day go a little bit smoother for me. That's when I do my meditation, I'll do my journaling, and then I'll start doing work. Typically, the work I'm doing right now is focused on my podcast. So a bunch of preparation for interviews, then editing those interviews, transcribing them, annotating them, and getting them all uploaded. So I'm going through a sprint for that right now. As of right now, if I were to stop recording today, I'd have enough episodes to last me until the beginning of October. And after I'm done with this sprint, which runs from now until the end of the first week of May, once I do all the interviews I have lined up, I'll have enough episodes to last me until February next year. OK, I see. Plenty. Yeah, yeah. I'll take a break from recording in May, then I'll spend some time editing, transcribing, annotating, uploading everything, getting everything scheduled out, and then use the rest of the year to focus on things other than the podcast.
So I'm looking forward to that. [00:04:14] So how much time does it take to make one single episode? [00:04:20] Yeah, so for each episode itself, right, I typically put in anywhere between four and twelve hours of research, and that depends on who the guest is. If the guest is an author, then I'll listen to the audiobook and then I'll read the physical book as well. Well, actually, let me describe my entire process. Before I start to research and read my guest's books and stuff, I'll listen to a few podcasts my guest has already been on, if they've been on podcasts, just to see what it is they're really good at talking about. Then I'll start listening to the audiobook, then I'll start rereading the book, highlighting things and doing all that. Then I'll write the introduction, then I'll come up with my questions. That process can take, you know, 10 to 12 hours spread across four days or so. Then recording the interview itself takes about an hour and a half. Editing is another hour, and then transcribing, annotating, and uploading take about two hours. So realistically, it can be about 16 to 20 hours to produce one podcast episode. Do you have a full-time job? I do, yeah. Wow. So this is why I use the mornings. OK, but still, it's like at least a week. Yeah. It usually takes about a week to prepare for the interview. And then once the interview is recorded, it'll probably take me one entire morning to get that interview ready for publishing.
[00:06:04] So it'll be like a week of research and prep, then record, and then four to five hours to get it ready for publishing. I see. A lot of work. But I enjoy it, man. It forces me... well, "forces" isn't really the right word, because I'm doing it voluntarily, but it gives me an opportunity to explore a whole range of subjects. So in the coming weeks, like later this week, I'm interviewing Andy Hunt, who wrote The Pragmatic Programmer. He also wrote Pragmatic Thinking and Learning. I'm interviewing him on Wednesday. I'm interviewing John Vervaeke. He's a cognitive scientist and philosopher, a pretty interesting guy. And I'm interviewing a few other people. You mentioned Barbara Oakley. Did you interview her? Yeah. I interviewed her at the beginning of January. I've scheduled that episode, but I haven't done the work to prepare it yet, so that one will be coming out a little bit later, probably early or midsummer. So that's a good one. I'm also doing prep work right now for Emily Balcetis. She wrote the book Clearer, Closer, Better. And she was recently on Good Morning America, so that's pretty big. OK, nice. Yeah. So I'm looking forward to that. Looks like Torx was here, but he just dropped off. But yeah, man, that's my process for getting the podcast up. It's a lot of work, but I enjoy it. [00:07:39] I see. So when you wake up at 5:00 or before 5:00, is it a struggle for you? Is it like, shit, again I have to work? Or is it like, oh, finally I can do a little bit of what I like? [00:07:59] I mean, it's always a little bit difficult, right? The natural tendency is just to sleep. I just want to stay in this nice warm bed. It's so comfortable, so cozy.
Like, let me just stay here. There's always that struggle. But I have to force myself to wake up, get out, and start doing stuff, because, you know, if you can do things that you don't want to do, then on the other side is greatness. Right? That's kind of what I've embodied. That's my philosophy. It's the same reason I take cold showers. I fucking hate taking cold showers. I hate, hate, hate that cold water; it feels like needles poking my entire body. But I do it anyway, just because I don't enjoy it. [00:08:37] Oh, I know what you mean. [00:08:39] I follow Lex Fridman, and he talked about it, and I tried it three, maybe four times. I'm not ready for it. I can't do it. Yeah. [00:08:54] It was a gradual kind of thing for me. At first, I'd step into the shower and it'd be a hot shower, then I'd gradually turn the knob until it got cold. Now I just jump in and it's just cold water. Oh yeah. It's not comfortable. It's not enjoyable. [00:09:12] I know what you mean, but it helps. I've heard plenty about how much good it does for the body, but also, like you say, you learn how to do things that you hate. Yeah. Yeah. And it gets you outside of your comfort zone immediately. [00:09:32] Yeah. I think that's probably why people ask me, how do you do all this stuff? And I'm just like, I just do it. I focus on the things that are important, the few things, and I try not to get distracted by the rest. And I do it even though I don't want to. I just push through it. Here's an example of all the stuff I had to do this week. I've got my weekly calendar, keeping track of what I do, and this has been helpful as well. And this is good, because when you have so much to do... Like for me, I've got these office hours that I need to do work for, I've got my open office hours that need work, but then also for Data Science Dream Job, I've got a course on SQL that I'm teaching every week and need to prepare for. On top of that, I'm providing Slack support to the students in Data Science Dream Job. And on top of that, the podcast itself, LinkedIn content, and then work somewhere in between all that; I have to find time to actually make money to fund my activities. Yeah, it's fun. It's a challenge. I enjoy it. I think about what I used to do before, how I genuinely felt. I look back at how I used to use my time and I'm just like, what was I doing, man? What the hell was I doing? I would just sit around and watch TV, watch movies. I wasn't really doing much of anything. And now I look back... that stuff has its place, and it's good to do every now and then. But I look back at how much free time I wasted, time that I could have used to improve myself, improve my character, get better at something, and I squandered it. But I'm kind of making up for it now, I suppose. [00:11:11] So how long have you been doing what you're describing? Like waking up early, and all of this. [00:11:15] Yeah. All this really started in, say, mid-2018. I was at a very low point. The beginning of 2018 is when I decided I couldn't do this biostats job anymore. I hated my company, I hated working there, because I was putting in so much work, I was doing great work, but I wasn't getting recognized for it. Maybe I'm biased, maybe I really wasn't doing great work, but I felt like I'd been putting in work and I just wasn't getting anywhere at that company.
And a lot of it had to do with the company culture and what it was like. I just couldn't do it. So I started really thinking about what my next move would be, what my next pivot would be, where I was going to go after this. And that's when I started looking at data science. From February until about, probably, July 2018, I was all over the place. I didn't know what I needed to do. I was just bouncing around from resource to resource. [00:12:14] I'd take an online course, or three courses, and they'd mention something in that course, and then I'd get scatterbrained and go, now I need to go down there. I felt like I had to keep going down these rabbit holes. I had no focus, and yeah, that was really tough. But then, you know, Data Science Dream Job changed my life. The courses that Kyle McKiou had put together, specifically the module entirely on mindset, changed my life completely. The entire concept of growth mindset versus fixed mindset. I started reading books like Carol Dweck's Mindset, Grit, The Power of Habit, Smarter Faster Better, all these amazing books, Drive by Daniel Pink. The ideas in these books took hold of my mind, combined in interesting ways, and replaced whatever old belief system I had with a new, better one that was serving me much better than the previous one, if that makes sense. [00:13:12] And then from there, I was just like, man, what have I been doing with my life? And how much time do you spend on each of the tasks you have on your paper, the list of things you do for the week? Like Monday through the weekend, is each one forty-five minutes, an hour? How many do you put in a day, I guess? [00:13:34] Yeah.
So this is not related to my full-time job. This is just all the extra, external stuff that I do. Yeah. [00:13:42] Yeah. So I'll put no more than four things to do on a day. I do four because that's the number of lines this thing has. It's got four lines. So it's the four big things I've got to get done. For example, my Monday task: I had interviewed somebody, John, so I had to create his entire what I call guest package. The guest package is the profile; I had to edit and mix the episode, transcribe it, annotate it, and then create a header image. So I did that on Monday, plus a couple of other things, and that entire guest package probably takes about two to three hours to do. Then I had a couple of other tasks. I had to make sure I posted something for Analytics IQ, because they sponsored an episode, and then I had to update my calendar and Kanban board. Typically, I'll take this sheet and fill out Monday, Tuesday, Wednesday, just in case, because I don't know what the rest of the week will look like. Thursday, Friday, the weekend, that's a little bit too far out. Then come Wednesday, I'll reassess what I did Monday and Tuesday, and also what's left over, and redistribute it. So that really helps. [00:15:03] So Monday is kind of like the guide for the rest of the day, the rest of the week, I guess. [00:15:09] Yeah. Yeah. So I'll fill this out on Sunday, like later today after we're done with this call. My wife and baby are going to the art museum; they're doing some baby things. So we'll go do that and come home, the baby will go to sleep, and then I'll print out another one of these pages and plan out what I want the next few days to look like. And then on Wednesday, I'll reassess and see, OK, what else do I need to do? Yeah, that's pretty much it. Sundays are the days I use to plan out the rest of my week. Gotcha. Yeah. It's just a few days in advance, plus a few main things that I need to focus on. [00:15:47] So how do you actually manage, control, track, and document everything you're doing? [00:15:53] Just this thing right here. This is it. I don't have metrics for myself, because for me most of these tasks are binary. I either get to them or I don't, I either finish them or I don't. And everything I do is really just, I guess, time-bound. For example, from now until the end of the first week in May, I have fourteen or fifteen interviews to do, 16 maybe. So for that I need to think about, OK, which people am I interviewing, who are the authors and who are the non-authors? Because the non-authors are typically data scientists that I can just riff with; I don't need to prepare that much for them. So I have a list here of my authors and here of my non-authors. [00:16:36] The reason I'm asking is that, in many ways, I do similar things. [00:16:43] But, you know, I'd also like to track what I've done, what I had to change, and how I changed it. So I have a simple spreadsheet with my tasks and a date for each. And then, of course, if I delay a task, I keep the original date. It's kind of a geeky approach, but over time you start to be able to track which tasks took more or less time. [00:17:05] Yeah, yeah. Like I said, I'm very analog. I track myself mostly on paper. I have this other notebook here, which is a much larger notebook. And so this notebook...
[00:17:17] There you go. [00:17:19] It's more me thinking things through. Like you can see here, this is the beginning of February, thinking through what I need to do for the next few weeks. [00:17:30] Yeah, I see that one. [00:17:34] This is mine. It's a tiny little one, but it goes everywhere I go. It's always with me, and whenever I'm sitting at a cafe or anywhere else, all the ideas go in there. And once a week I go through all the ideas, organize them, structure them. Yeah. I mean, my brain is just working constantly. [00:17:59] Yeah. I've got the same concept, a handheld notebook that's always by my door next to my keys and my sunglasses, so when I go out of the house I'm always grabbing it. But this idea of always having your mind working, man... it's good to take a break and let your mind wander. Right. [00:18:20] For me, that's music. I put my headphones on and just listen to music. That's the only way. Well, there's a second way, which is driving: driving with loud music and just cruising, because my mind has to focus on other things. [00:18:36] Yeah. So that entire network of your brain, the default mode network, is what has these ideas bouncing around and colliding with each other. How are you doing? [00:18:47] Yeah, I want to add something here. I was just watching Stephen Colbert interviewing Jane Fonda, and they were talking about how laughter also does this: you let your guard down, and that lets the ideas come in. Yeah. And so. [00:19:06] No, no, that's great. That's interesting. I don't know if you actually listen to my podcast, The Artists of Data Science, but you should; my interviews are awesome. I interviewed somebody, I released the episode maybe two weeks ago, and it was with Dr. Sutton Bokhara. She is a gelotologist, and what a gelotologist does is study laughter and humor. So we had a huge conversation about how laughter is beneficial and how humor is needed in our lives. So check it out. [00:19:43] I'm looking it up right now. Yeah, yeah. I think you might enjoy that one. She's awesome. She's a really smart individual and does some awesome work, some awesome research as well. What is her name again? Here, I'll give you a link to the podcast interview I did with her, so you can listen to it. OK. [00:20:07] I would also like to know her name. I'm kind of into cognitive science, so I'd actually like to check it out. [00:20:15] It's hard to say. Sadhana. Yeah, yeah. This is perfect. Yeah. Awesome. Cognitive science is awesome. I wanted to study it at one point, but just never got to. So anyway, guys, this is great. [00:20:34] I really enjoy talking about this type of stuff, but if anybody has questions, go for it. If not, we can keep shooting the shit; that's completely OK with me. [00:20:43] I have a quick question. I have this database at work. It's in Excel, and it's a complete database. Basically, its purpose is that when a complaint comes in by email, they copy that entire complaint and paste it into Excel, so they might have the date and the complaint all in one cell, and then maybe the name of the person who took the complaint down, the customer service person. So it's not structured or anything like that. It's just a place to keep information about complaints.
So right now, I'm trying to analyze those complaints, and it looks like I need to work on the Excel first, meaning I have to break that information down into the different columns they need: what the complaint is about, categorized by type, whether it's a shipping problem or customer unhappiness and things like that, and then the complaint text itself. So I need to break out those columns. Is that the way to go first? Yeah. [00:21:49] Definitely. What I would do is focus first on getting it into a tabular structure. [00:21:56] That's what I thought. Yeah. Then, before I start analyzing it, I'd decide whether it's better to analyze it in Excel or maybe in a Jupyter notebook. That's what I was thinking. I just want to know if I'm doing it the right way, with the right method, because right now the Excel is just a placeholder for complaints; nothing is broken down properly. [00:22:14] Is there a structure to the Excel sheet? So you're saying you have an Excel sheet, and each row has one cell containing the person's name, the complaint, and the date? Is there a consistent order, or no? [00:22:30] In the earlier years there's no consistency, but as they moved from 2018 toward 2021, I see some shape taking form. [00:22:42] But again, the person's name, the date, and the complaint are all in one cell, and then there's another column, maybe "complaint resolved," and then another one, like "resolved at this meeting." So it's basically just a placeholder at the moment. Right now I'm trying to break it all down, taking that one cell with the person's name and date and splitting it into three different cells.
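The splitting step described above can be sketched in Python. This is a minimal sketch, assuming each combined cell reads roughly "<name> <date> <complaint text>" with the date written as MM/DD/YYYY; the real sheet's format wasn't shown, so the pattern, the `split_cell` helper, and the sample rows are hypothetical. Cells without a recognizable date are kept whole rather than dropped, so nothing is lost while the structure is worked out row by row.

```python
import re

# Assumed date format (MM/DD/YYYY); adjust the pattern to match the real sheet.
DATE_RE = re.compile(r"\b(\d{1,2}/\d{1,2}/\d{4})\b")


def split_cell(cell: str) -> dict:
    """Split one combined cell into name, date, and complaint fields.

    Everything before the first date is treated as the name and everything
    after it as the complaint (a hypothetical layout for illustration).
    """
    match = DATE_RE.search(cell)
    if match is None:
        # No recognizable date: keep the raw text so nothing is lost.
        return {"name": "", "date": "", "complaint": cell.strip()}
    return {
        "name": cell[: match.start()].strip(" ,-:"),
        "date": match.group(1),
        "complaint": cell[match.end():].strip(" ,-:"),
    }


# Hypothetical sample cells.
rows = [
    "Jane Doe 03/14/2019 Package arrived two weeks late",
    "J. Smith - 11/2/2020: customer asked for a product sheet",
    "free-form note with no date",
]
for row in rows:
    print(split_cell(row))
```

Rows that fall into the no-date branch are a useful to-do list: they show which entries still need a closer look or a new rule.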
[00:23:18] So I just want to make sure I'm going the right way. That's how I would proceed as well. I see a hand up, so go for it. [00:23:25] Yeah, go ahead. This is very typical. I just wondered how many entries, or how many rows, we're talking about in the data right now. [00:23:34] Right now I think they have about four or five columns. Oh, you mean the number of rows. Yes, it's not much, I'd say. It's about 167 to 180 at the moment. Yeah. It's small. It's small. Yeah. [00:23:50] OK, what you started doing is how I would go about it, because this is a very common problem in a lot of organizations. And you also mentioned that over time some structure has started to take form, which is what we actually need to find, because technically it's all just one column. [00:24:09] But what I would recommend now, given that it's not that many lines: there are two things you want to achieve. One is, of course, to clean up the data. But what do you mean by that? [00:24:20] By cleaning, I mean basically what you're describing: you split up a cell and break it into different things. The approach I normally take is to start with the first line and break it up. As you go along, you will start to see trends. As you break things up, you will get new columns, and you will start to realize which columns you're missing, etc. [00:24:44] Now, when you're finished, the work you've done has value. So you want to make sure you create some sort of new input form so that this can be implemented in the organization, because you don't want to start cleaning a second round, a third round, a fourth, a fifth. To avoid that, you want to standardize that form.
OK. Normally, if you're working on a network, I'd do it differently, but since it's not that many rows, I suggest you just stay with what you're doing. [00:25:20] You have input coming in on a daily basis from your colleagues. [00:25:25] So when the users, the people using the form, enter data, they will use the same form, adding information in the same columns. But because Excel is kind of hard for multiple people to work with at the same time, what you do is take each of your column headings and create a new tab in Excel. Instead of keeping the headings across the top, you list them down the side. Now you have created an input form, which you can print and leave at the desk of the people handling complaints so they can fill it in manually if they have to. Or you can use it as an input sheet. The beauty of that is that whoever is working on it just keeps making a new input sheet and sends it to somebody who copies and pastes it into the main sheet, using the transpose option so the form's vertical list becomes one row. That also simplifies your approach: if you add a new column, you add it at the end. That means that with old forms, you will very quickly be able to see which new fields are missing when you receive them. [00:26:29] OK, so this is how I would approach it. But the cleaning process, and it is a cleaning process, is really how you start to get a very good picture of the data you have. Once you've done that, I would go to your key stakeholders. Make something simple; I normally use a pivot table or something like it just to bring it out. Then sit down with the key stakeholders and ask them, what more do you need? [00:26:54] OK, OK. Because now you are actually starting to interact with the receivers, the ones who use the data.
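The transposed input form and the "add new columns at the end" rule can be illustrated with a small sketch. The column names, the `form_to_row` helper, and the sample form below are hypothetical; in Excel the equivalent step is pasting with the transpose option. Because new fields only ever go at the end of the master column list, an older form simply comes back with the newer fields reported as missing.

```python
# Hypothetical master column order. New fields are only ever appended at the
# end, so forms filled in before a field existed just lack those fields.
MASTER_COLUMNS = ["date", "customer", "taken_by", "category", "complaint", "resolved"]


def form_to_row(form_pairs):
    """Turn a vertical input form into one row in master column order.

    `form_pairs` is a list of (field, value) pairs, one per line, as on the
    printed form. Returns the row plus the master columns the form lacks.
    """
    values = dict(form_pairs)
    unknown = [field for field in values if field not in MASTER_COLUMNS]
    if unknown:
        raise ValueError(f"fields not in the master sheet: {unknown}")
    missing = [col for col in MASTER_COLUMNS if col not in values]
    row = [values.get(col, "") for col in MASTER_COLUMNS]
    return row, missing


# An older form, filled in before the 'resolved' column was added.
old_form = [
    ("date", "2019-03-14"),
    ("customer", "Acme Corp"),
    ("taken_by", "Sam"),
    ("category", "Shipping"),
    ("complaint", "Package arrived two weeks late"),
]
row, missing = form_to_row(old_form)
print(row)      # ['2019-03-14', 'Acme Corp', 'Sam', 'Shipping', 'Package arrived two weeks late', '']
print(missing)  # ['resolved']
```

Rejecting unknown field names, rather than silently dropping them, is a design choice in this sketch: it surfaces forms that drifted from the standard, which is the kind of follow-up the speaker describes.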
They may realize that you've covered a lot of what they need, based on the information you already have and the analysis you've done. You've probably covered 80, 90 percent of people's expectations. But when you visualize it and present it this way, the receivers will start to say: I need this, I would like to see this, etc. Then you can discuss what to incorporate. Don't just incorporate everything; challenge it, because I'm a true believer that you don't want to collect too much. You want to minimize the data that comes in so that it's strictly what's required. Once that's in the picture, you go back, update your original, make a final version, and then run three months without any changes to get people used to the data, the input, etc., because you're going to have to get the organization used to it. And that is a really hard part, which means you have to follow up on every submission and say, why did you put this here? This is not supposed to be here; this is supposed to go there. [00:28:00] That's the training interaction, to make sure the organization is aligned on the input and what you're looking at. I see. I hope that can be helpful. [00:28:10] And if you want any help with that, I have some old forms and samples lying around that I could share. [00:28:19] Yeah, if you have a little sample for me, I'd like to take a look at it. That'd be great. I think what you're saying is great; this is super helpful. So, while I'm cleaning, this is actually what I'm doing: I took the Excel file, converted it into a CSV file, and I'm making the changes in the CSV file. I'm thinking CSV and Excel are the same, right? It's just that CSV is great for analyzing data in Python and all that stuff. So should I not be making changes in the CSV file? [00:28:48] I would just use your Excel. Python and all the other tools, on 168 lines...
Personally, you can practice with those, but do it after you've cleaned the data, when you're actually going to get proper results from your analysis. You can use Excel now to keep it simple, because what you're looking for right now is a proper overview of what you have and what you would like. And then, of course, you can go to your key stakeholders and discuss what else they would like and how they want it. Then you can start to get really fancy and make all the graphs and blah, blah, blah, because classifications are key. [00:29:33] OK, awesome. So the first step is to clean up the Excel sheet I currently have, right? Break it all down into meaningful data: date, name, complaint, and all that. And then the next step would be to create an input form based on the data I cleaned up and broke out, and then, if possible, have a manual sheet that mirrors this form, correct? Yeah. So that... [00:30:00] Yeah. That's basically what people would use when they're dealing with a complaint. They fill it in by hand, and afterwards they can put it into the Excel. It's like using paper. Paper is great. Yeah. And handwriting is quick. Yeah. You have to type it in afterwards, but from the sound of it, 168 complaints over two years, those are not big numbers. [00:30:24] It could be 168 complaints over one year. So every year that's like 150, 160 complaints. It's still small. [00:30:34] Yeah, they're not big numbers, but if you want to think about the bigger picture, here's how I would be thinking: one thing is the complaints section now. [00:30:47] But is there any relevance to other things you track? Because now you're getting into pipeline thinking, etc.
Can you link it up to customer data? Can you link it up to sales data? It can be linked up to other things, but that is the next stage. The first stage now is just to clean, get organized, and get the data into shape so that you have a proper database for following up. [00:31:10] And then I would just convert it. [00:31:13] So just convert it to a normal database, and then you can do everything you want from that database. [00:31:19] Yeah. As I'm cleaning this file, I can see the trends, like you said. I do see a few things in there that keep repeating. I see a lot of shipping problems, like a lot. And then I see another category where customers are asking for product sheets, and I'm seeing this, and, you know, this is super great. Once I have it all cleaned up, maybe I'll just do it in Excel for now, analyze it, and show them some stuff. Say, hey, guys, you've got a problem over here; shipping needs training or something, because there's a whole lot of issues coming from there. So, yeah. They don't know I'm doing this project. I want to showcase it to them. Like I said, I wanted to get a data project, right? And since everyone says to start with the low-hanging fruit, I thought I'd start there. And I know this customer complaint data is really good, because at the end of the day, the customers are the ones paying you the money, right? That's your bottom-line revenue. So I thought, let me just attack the customer complaint database. They don't know I'm doing it; I'm doing it on my own, on my weekends. It would be nice to clean it up and showcase it to them eventually. I'm giving myself about thirty days to do this. [00:32:37] And you'll be able to do this in an evening. [00:32:43] OK. Yeah.
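Converting the cleaned rows into "a normal database" and producing a pivot-table-style summary could look like the following. This is a sketch under assumptions: it uses Python's built-in `sqlite3` module with an in-memory database, and the table layout and sample rows are invented for illustration, not taken from the actual complaint sheet.

```python
import sqlite3

# Hypothetical cleaned rows: (ISO date, category, complaint text).
cleaned = [
    ("2019-03-14", "Shipping", "Package arrived two weeks late"),
    ("2019-05-02", "Shipping", "Wrong item shipped"),
    ("2020-01-20", "Product sheet request", "Asked for a spec sheet"),
    ("2020-06-11", "Other", "Website login issue"),
]

# In-memory database for the sketch; pass a file path to keep the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE complaints (date TEXT, category TEXT, complaint TEXT)")
conn.executemany("INSERT INTO complaints VALUES (?, ?, ?)", cleaned)

# Pivot-table-style summary: complaints per category, most frequent first
# (ties broken alphabetically so the output order is stable).
counts = conn.execute(
    "SELECT category, COUNT(*) AS n FROM complaints "
    "GROUP BY category ORDER BY n DESC, category"
).fetchall()

# Complaints per year, using the first four characters of the ISO date.
per_year = conn.execute(
    "SELECT substr(date, 1, 4) AS year, COUNT(*) FROM complaints "
    "GROUP BY year ORDER BY year"
).fetchall()

for category, n in counts:
    print(f"{category}: {n}")
print(per_year)
```

Storing dates as ISO `YYYY-MM-DD` strings keeps them sortable and easy to group by year. Once the data lives in a table like this, linking it to customer or sales data later becomes a matter of joining on a shared key.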
[00:32:44] Yeah, but I know there's more to it. What I would also say is that you've already started to see some predictions in your mind, things coming and going. Don't spend too much time thinking about those; just make notes. Once you have cleaned up the data, summarized it, and made some simple pivot tables, you will most likely get those confirmations. What you should think about is that any data set will have peaks, large ones. If there's a lot in shipping, the first question you're going to get from your key stakeholders is whether there's a breakdown of the shipping problem. OK. If you have a huge problem in this or some other category, they will want to break it down. If you have nothing to give them, you're not managing people's expectations, and that's the key when it comes to delivering to management, et cetera. Once you have that breakdown, you will then have to start breaking those down too, and that is just classification, right? When I deal with classifications in an organization, to track and improve, it's normally the same principle. I start out with five or six key items, whatever the combination is, it doesn't matter. But then I always have the quote-unquote "Other" category. If something can't be categorized, they can click Other. But if they do that, they have to type something into a text field in the database, because you need to start tracking the Others. Yeah, exactly. It's the same principle after the first round: when you start getting a lot of Others, that is a strong indication that you need to break things down even further. [00:34:39] OK, gotcha. [00:34:40] OK, so these are just some of the ways I work.
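The "five or six key items plus Other, and Other requires a typed description" rule can be written as a small validation step. A sketch under stated assumptions: the category names, the function names, and the 20% threshold for "a lot of Others" are hypothetical choices for illustration, not part of any existing system.

```python
from collections import Counter

# Five or six key categories plus "Other". Names are hypothetical examples.
CATEGORIES = {"Shipping", "Product quality", "Billing", "Product info", "Other"}


def make_entry(category, other_text=""):
    """Build one complaint record, enforcing the "Other needs a description" rule."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    if category == "Other" and not other_text.strip():
        raise ValueError('choosing "Other" requires a free-text description')
    return {"category": category, "other_text": other_text.strip()}


def other_share(entries):
    """Fraction of entries filed under "Other" (0.0 for an empty list)."""
    if not entries:
        return 0.0
    return Counter(e["category"] for e in entries)["Other"] / len(entries)


def needs_finer_categories(entries, threshold=0.2):
    """Flag when "Other" grows large; the 0.2 threshold is an arbitrary example."""
    return other_share(entries) > threshold
```

When the flag trips, the collected `other_text` values are what you read to decide which new categories to split out.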
Anytime, I'm on LinkedIn. Do you have any other questions? [00:34:48] Feel free to ask. I want to thank you very much. [00:34:52] Let's hear from Christoph here, too. Looks like he has some input. [00:34:55] I believe so, because I'm highly interested in NLP right now. [00:35:02] What you're describing sounds to me like a clear NLP problem. I don't want to go too deep into it, but there are two types of data: structured and unstructured. Structured data is, you know, like a spreadsheet. But your spreadsheet is effectively unstructured, because everything is in one cell and it's free text, if I understand correctly. So this is basically an NLP task called information extraction. If you know a little bit of Python, there are a few libraries that could be really helpful. We talked two or three weeks ago about spaCy, which I love, and I do things like this with it because it's perfect for this: you need names, you need numbers, you need some text, you need to find "shipping" in the text, and spaCy would definitely be helpful. There is another library called NLTK, the Natural Language Toolkit. I haven't used it at all, but I'm pretty sure it would also work. So if you want, you can connect with me on this. [00:36:30] You can see my name, and I can help. [00:36:35] I believe I could help with Python and stuff like that. [00:36:39] So spaCy can help with this unstructured data, to get it into a structured format? [00:36:45] spaCy basically helps with NLP; almost everything you want to do with text, you can do with spaCy. I also love spaCy because it forced me to understand a lot about natural language processing. In spaCy you can split every text, every sentence, into single tokens, and for each token you get information: what part of speech it is, how the sentence is structured.
That's called dependency parsing. This is all stuff I learned in the last few weeks, since I started using spaCy. It gives you so much information about every sentence and every word that you've probably got ten different ways to solve your problem with spaCy alone. You don't even have to go too deep into the theory. But since you said you want to showcase this, knowing a bit of theory about NLP and information extraction would be great, I think. [00:37:59] OK, yeah, I'll definitely take you up on the offer. I don't know how to pronounce your name. [00:38:07] Christoph. [00:38:10] Got it. A lot of people just call me Chris, like for Christopher. [00:38:19] Yeah, OK, thank you very much. I appreciate that. I hadn't thought about NLP, but now I'm telling myself this is great, because it can really be two different projects. One is the Excel method, which is very easy for my company, because they don't have any data or AI work going on. This might be a great way to push them in the direction of data and show them that you can get a lot of good information, a lot of value, out of your customer complaints. They would be very receptive to that, because they use Google Sheets for everything. Then again, there's the NLP part. If I'm the only person doing the NLP work, what happens if I leave? Then it's wasted, I guess. But it's good for me, because I get to learn NLP and how to structure everything, like Christoph's spaCy example, and that will look good on my resume too, so I can go in different directions if I want to. So yeah, that's what was missing. Or, did you want to say something? [00:39:24] Yeah. This is so cool.
This is where my limitation is; I've mainly worked in Excel. [00:39:31] That's the reason I joined, because I've been living and breathing Excel for many, many years. But what Christoph was saying about the NLP approach is, to me, a perfect example. Like you said, you want to test, so why not use this? Do it the manual way, then use the other way, match them up, and see how they compare. There are so many things you can learn from this. One is how much time it takes to do it the manual way compared to the NLP way, and that gives you a very good picture. And when you talk to the key stakeholders afterwards, you can say: "This is what I did the manual way, and this is how much time I spent. With data science in the game, I saved this much time. So honestly, if you pay for my courses, I can make this a lot more efficient, not just here, but with all the other databases." That's your pitch. [00:40:33] Yeah, exactly, that's so true. I had a performance review on Friday at my company, so I kind of alluded to it. I told them my goal is to become a data scientist, that I think there's a lot at the company we can improve, and I gave them some of my ideas. They were like, "Oh, really? A data scientist?" So I had to give them a brief overview. But coming back to what I was saying, I think the customer complaint database is such a great example. Talking to all of you is giving me so many more ideas: Excel, where you can do it this way, and now that Christoph has brought up NLP, I'm thinking, whoa, this is a great story to tell. So yeah, this is amazing.
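The suggestion to do it the manual way and then an automated way, and compare the two, can start very small. The sketch below is not spaCy: it is a dependency-free keyword matcher used as a stand-in, with hypothetical keyword lists, plus an agreement score against hand-assigned labels. spaCy's tokens, part-of-speech tags, and dependency parses, which Christoph describes, would support far more robust extraction than plain keyword search; the names `auto_label` and `agreement` are illustrative only.

```python
import re

# Hypothetical keyword lists per category; a crude stand-in for real NLP.
KEYWORDS = {
    "Shipping": ["shipping", "shipment", "delivery", "delivered", "late", "courier"],
    "Product info": ["product sheet", "data sheet", "spec", "specification"],
    "Billing": ["invoice", "billing", "charged", "refund"],
}


def auto_label(text):
    """Return the first category whose keyword appears as a whole word, else "Other"."""
    lowered = text.lower()
    for category, words in KEYWORDS.items():  # dicts keep insertion order
        for word in words:
            if re.search(r"\b" + re.escape(word) + r"\b", lowered):
                return category
    return "Other"


def agreement(texts, manual_labels):
    """Share of complaints where the automatic label matches the hand-assigned one."""
    if len(texts) != len(manual_labels):
        raise ValueError("texts and manual_labels must have the same length")
    if not texts:
        return 0.0
    matches = sum(auto_label(t) == m for t, m in zip(texts, manual_labels))
    return matches / len(texts)
```

Timing the manual labeling against running `auto_label` over the same rows gives exactly the "how much time did each way take" comparison suggested above, and the disagreements show where the keyword approach, or the manual categories, need work.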
[00:41:25] So I would say it's a good case for them. First you can say: you're collecting data in a way that probably isn't the best for analysis. Look at what I've done once I structured this data using these techniques. If we start collecting data this way, here's what we can do going forward, which is using NLP to better analyze what customers are saying. So with this mini project, you're essentially showing a lifecycle and saying, "Here's how we can make these things easier. If you think this is cool, this is what we can do." It will take some change management at that point, because it may be difficult to get them to start collecting data differently. [00:42:07] Yeah, they don't know about it yet. I'm just doing it on the side. I downloaded the database onto my own system, and I know there's nothing secret or sensitive in it, so I thought that was a safe way to play the game. It's just the customer complaints: no numbers, nothing. [00:42:25] I'm curious, though: where are you downloading this from? [00:42:28] We have it all in Google Sheets. They have a customer service email, and everything is set up in Google Sheets. I just downloaded it as an Excel file onto my computer, made a copy of it, nothing big, and thought, let me play around and see what's there. I downloaded it as an Excel file, then converted it to CSV. Then I was thinking it over, and I thought I shouldn't be doing it in CSV; maybe I should do it in Excel first. Yeah. I'm learning as I go, and that's good, because this is from scratch. Everything's unstructured, and I'm going from zero to ten, zero meaning that right now I just have an Excel file.
I know I shouldn't be doing that; I should be doing it in Excel. So this is a good lesson for me. Right now I need to put this into Excel, because that's what they look at, right? A CSV is just a text file, right? [00:43:28] It's just a text representation of tabular data; that's essentially all it is. Excel just lets you look at it in a nice visual way. Absolutely. Yeah. So converting to CSV was extra work for yourself, because you can take the raw data, do whatever cleaning you need, then export it as a CSV, load it into a pandas data frame, and go nuts doing exactly what you want. [00:43:55] Yeah. So it's good: I know I made a mistake and have to go back, but at least now I know the thought process, so I'll just work through it. I'll share with you guys as I go along. [00:44:07] Just a quick comment. This is what I do, regardless of the size of the data. For any type of data, whether it's a million transactions, two million, ten million, whatever, I pick up the first thousand or the first five hundred rows and analyze them in Excel, because I'm very efficient in Excel. [00:44:33] I'm not good with databases and SQL or anything like that. So I start with 500 lines in Excel and work on that. Once I've built it, understood and structured the data, and placed all my quote-unquote formulas, it's just a matter of dumping in all the data, whether it's a CSV file or whatever, because I can copy and paste the formulas quickly. [00:44:58] So that's what I do. If you have other databases, instead of downloading the whole thing for Excel analysis, just pick up the first two to four hundred lines, and that will normally give you a very good picture of what's in the data. [00:45:12] OK, a question for you, then.
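The point that a CSV is just text, and the "work out the logic on the first few hundred rows, then run it on everything" approach, both translate directly to pandas. A minimal sketch, assuming pandas is installed; the CSV contents and column names are hypothetical, and `io.StringIO` stands in for a real file path so the example is self-contained.

```python
import io

import pandas as pd

# A CSV file is plain text: one line per row, values separated by commas.
# Column names and rows are hypothetical.
csv_text = """date,customer,complaint
2019-03-02,ACME,Shipment arrived damaged
2019-04-11,Globex,Please send the product sheet
2020-01-20,Initech,Delivery was two weeks late
"""

# Read only the first rows (nrows caps the number of data rows read),
# the code version of "take the first few hundred lines first".
sample = pd.read_csv(io.StringIO(csv_text), nrows=2)

# Once the cleaning logic works on the sample, load everything.
full = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Example filter: complaints whose text mentions "late", ignoring case.
late = full[full["complaint"].str.contains("late", case=False)]
```

With a real file you would pass its path instead of the `StringIO` object, and raise `nrows` to 500 or so for the exploratory pass.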
This Excel file has one tab per year, going back to 2018. Is it better for me to take all three years and put them in one big sheet? [00:45:26] OK, let me give you one tip on that. I normally receive data for the audits I perform, and they normally give it to me by year. [00:45:36] The first thing I do is consolidate. I take the 2018 tab. [00:45:41] I add a new column, call it Year, and enter 2018. Then I copy and paste that value all the way down. Then I copy and paste the whole tab into a consolidation sheet. So the Year column is one I've created, and it holds the year; then I just copy and paste, copy and paste. That's how I do it. Then I have everything in one sheet, I can run pivot tables, and from the pivot tables I can generate new data files on specific items, etc. But like I said, this is how I work. [00:46:15] Right now, for example, I'm working on a file of close to 900,000 transaction lines, but it comes from about four to six different parts, and I consolidated them. The limit for a single Excel sheet is about one million rows. [00:46:33] OK, so I should probably consolidate them. [00:46:36] Yes, consolidate. We're talking about smaller data here, but there's that Excel limit. If I can't consolidate because of the roughly one-million-row limit on one sheet, then I keep the yearly tabs, because you can still run pivot tables, and as long as the tabs are structured the same, I can make another tabulation or summarize across the tabs. This is another way of cheating. [00:47:03] For example, I got data that covered twenty years. I created a new tab called Start, then a tab called End, and put all the data tabs in between. Then I created a summary tab. I just copy the first tab's layout, as long as they're all the same.
The sheets need to be the same, both in length and width. As long as they're the same, you can create a formula in Excel that summarizes across the tabs, from the Start tab to the End tab. [00:47:33] Oh, I see. That sounds complicated for me. No, it's not complicated, because you can move the End sheet and the Start sheet around, and whatever is between those two sheets gets summarized. [00:47:44] Oh, OK. [00:47:45] It's just a way of getting around the limit. So, for example, in a summary sheet the formula just says: sum cell A1 from sheet Start through sheet End, which in Excel is written =SUM(Start:End!A1), and whatever you have in between is added into the total. [00:48:08] I'll make a small example and send it to you to show you the principle. Like I said, that's Excel; of course, in Python or another tool you're probably much better off. [00:48:24] So I'd say just use Excel to get tactile, to get a feel for what's in the data. But for everything Tor is talking about, if you're trying to become a data scientist, start thinking about how you could write it out programmatically, because everything he's describing can be done with code. This method might work well by hand for one hundred sixty-eight rows or so, but you need to scale up, and you do that through code. So Excel is great. I use it as well to get tactile, because there are certain ways you can manipulate an Excel sheet that you can't manipulate a pandas data frame. It's nice to use Excel to get your hands on the data; honestly, it's the only way I feel I can really get my hands on it. But as you're doing that, add another column, and use that column just for yourself: for this thing I'm doing with this particular row of data, what would it look like in pseudocode?
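As an example of writing a manual Excel step as a Python function, here is the tab workflow described above: add a Year column to each yearly tab and stack them into one consolidation sheet, or keep the tabs separate and total across them, similar in spirit to an Excel 3D reference such as `=SUM(Start:End!A1)`. A minimal sketch, assuming each tab has already been read into a list of row dicts with identical columns; the data and the names `consolidate` and `summarize_across` are hypothetical.

```python
from collections import Counter

# Hypothetical yearly tabs: year -> rows with identical columns.
tabs = {
    2018: [{"category": "Shipping"}, {"category": "Billing"}],
    2019: [{"category": "Shipping"}],
    2020: [{"category": "Shipping"}, {"category": "Other"}],
}


def consolidate(tabs):
    """Stack all tabs into one list, adding a "year" field to every row.

    This mirrors the manual steps: add a Year column, fill it down,
    then copy and paste each tab into one consolidation sheet.
    """
    combined = []
    for year, rows in tabs.items():
        for row in rows:
            combined.append({"year": year, **row})
    return combined


def summarize_across(tabs, key):
    """Total counts per value of `key` over every tab, without stacking them first."""
    totals = Counter()
    for rows in tabs.values():
        totals.update(row[key] for row in rows)
    return totals


combined = consolidate(tabs)
```

The stacked version is what you would run pivot tables on; the across-tabs total is the fallback when the data is too big to put in one place, which in Python is far less of a constraint than Excel's per-sheet row limit.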
Just think about it in pseudocode and ask: if I had to write this process I just did by hand as a Python function, what would it need? Gotcha. So you're training yourself to think both ways. Yeah, exactly. OK. [00:49:43] Thanks for the suggestion. I really appreciate it. [00:49:46] Yeah, definitely. If anybody has questions, we can take another one. Otherwise, we can wrap up with Jill Dhiab's question; Jill just came on. [00:49:55] Yeah, I put this in the chat. I've been applying for data analyst positions, and I'm seeing that a lot of companies, especially smaller ones, want to use Excel; they want Excel experience. [00:50:15] Maybe it's less scary for a lot of people because it's more familiar. And I realized I haven't really used Excel much beyond calculating means and standard deviations, really basic stuff. So I want to work on that. I want to learn more about how to use Excel, how to do data science with Excel, and I wonder if anyone has suggestions on where to look, or key functions to know, or anything like that. [00:50:48] A couple of places I can refer you to. The first is David Langer. He's got an awesome course all about Excel for data science, so take a look at David Langer's course. He goes by Dave on Data; I think that's the website, Dave on Data dot com. Or you can find Dave on Data on LinkedIn. Also, just go to Udemy. I just pulled up Microsoft Excel on Udemy; let me show you what comes up. There are eight thousand free courses on Microsoft Excel on Udemy, and a lot of them have great resources. These are free; I filtered by free courses. So definitely check those out.
The last point I want to make is that you could still do all your work in Python or R and just export the results to Excel, if that's what your stakeholders are comfortable using, right? [00:51:42] Yeah, I don't know. [00:51:45] I had a first interview recently, and when I asked about the interview process, she said a later stage would be a timed Excel test. [00:51:56] OK, better learn it, I guess. [00:51:58] Yeah, so definitely check out some of these free resources on Udemy. [00:52:02] Yeah. [00:52:04] But I totally agree with wanting to persuade any employer that they should really expand beyond Excel. [00:52:16] I mean, you don't really know what the situation is when you're just interviewing, right? Excel can still get you pretty far. It's used a lot; I think the overwhelming majority of analytics is done in Excel. And you can still do a lot of statistics, regression and things like that in Excel, so it definitely has its place in the business world. But if your ultimate goal is to build machine learning models that get integrated into much larger decision systems, then eventually you'll want to learn Python or what have you. Some companies aren't necessarily going there, though. It all depends on the analytic maturity of the company and where they're trying to move, because not everybody needs to be a stage-five, data-first company. There's Tom Davenport's analytic maturity scale, which goes from level one to level five: level five being Facebook, Amazon, Netflix and the like, data-first organizations, and level one being companies that maybe have data in Excel sheets and don't do anything with it. So I don't think every company necessarily needs to aspire to level five.
Some might top out at level 2.5 or 3.5, somewhere in there. It all depends, I think. [00:53:34] Yeah, a quick question you made me think of: does Excel have a forecasting formula? [00:53:43] I think it does, yeah. There's a wide variety of statistical functions in Excel; it has some pretty sophisticated statistical methods built in, and I think there might be a forecasting one. Awesome. Any other questions? Cool. Thanks for joining us. I know Daylight Saving Time may have messed up the time zones for some people, so apologies to anybody who missed this because of that. Going forward, it'll be an hour earlier for people in Europe; for everybody in the States and North America, it's still the same time, I believe. So, yeah. Cool, guys. Thanks for hanging out, and thanks for all the wonderful insights today. I'm looking forward to seeing what comes of it. Well, somebody just joined: Sasha. Real quick, today is Pi Day, 3.14. Yes, it is. Wow, important day. Right on. Sasha, welcome to office hours. Just a heads-up: we're about to wrap up, because with Daylight Saving Time in North America we've already been at this for an hour. But if you've got a question, I'm happy to help. I think Sasha is on mute. [00:54:51] So I'll assume she doesn't have a question; doesn't look like it. All right, guys, take care and have a good rest of the week. Looking forward to seeing you again next week. And on Friday, don't forget to check out the podcast if you haven't already. I've got an interview releasing this Friday with Evan Polet, where we talk about the science of a successful interview.
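For reference, recent versions of Excel do include forecasting functions, for example `FORECAST.LINEAR` (a least-squares straight-line trend) and `FORECAST.ETS` (exponential smoothing). Following the "think about it as a Python function" advice, a linear forecast is short to write by hand. A sketch, assuming ordinary least squares as in `FORECAST.LINEAR(x, known_y's, known_x's)`, whose argument order the function below copies; `forecast_linear` is a hypothetical name, not part of any library.

```python
def forecast_linear(x, known_y, known_x):
    """Predict y at x from the least-squares line through (known_x, known_y).

    The argument order mirrors Excel's FORECAST.LINEAR(x, known_y's, known_x's).
    """
    n = len(known_x)
    if n != len(known_y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(known_x) / n
    mean_y = sum(known_y) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in known_x)
    if sxx == 0:
        raise ValueError("known_x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(known_x, known_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * x
```

For instance, with hypothetical monthly complaint counts `[10, 12, 15, 17]` for months 1 to 4, `forecast_linear(5, [10, 12, 15, 17], [1, 2, 3, 4])` extends the fitted trend (slope 2.4, intercept 7.5) to month 5, giving 19.5.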
He wrote a book called Cracking the Interview Code, which is all about the psychology and behavioral psychology behind the interview process. I think it's going to be really interesting. I will warn you, though, that my audio is messed up in that episode: I was speaking into the microphone, but I hadn't set my input settings to use the microphone, so my voice isn't clear. Evan's voice is amazing and clear, though, so check that out this coming Friday. Sasha has joined again, so I'll give her a chance to ask a question if she has one. If not, I'll go ahead and wrap up. Sasha, if you have a question, go for it. If not, we're going to end today's session. [00:55:54] Sorry, I didn't catch anything. I think my network is a bit shaky today. Oh yeah. [00:55:58] So Daylight Saving Time happened, and we actually started an hour ago, because it's already 12:00 p.m. Central Time. We're about to wrap up, but if you have a question, we're happy to help. [00:56:13] Sorry. I had it as starting an hour ago, that's what I had, and then it broke off. I think my network is a bit shaky. OK, well, do you have any questions? No. OK, no worries. I'll join an hour earlier next time. Yeah. [00:56:27] Yeah, starting next week, because of Daylight Saving Time. Take care, everybody, and have a good rest of your weekend. We'll see you around. Remember, you've got one life on this planet, so why not try to do something big? Cheers, everybody. Thank you. Bye, everyone.