The Artists of Data Science - TWO YEAR ANNIVERSARY SPECIAL!.mp3 Harpreet: [00:00:08] What's up, everybody? Welcome, welcome to the Artists of Data Science happy hour. An especially special happy hour, indeed: it is the two year anniversary of the podcast. Two years ago, on April 8th, 2020, we released the first batch of episodes — I think it was like 12 episodes all at once, and then weekly after that first cohort of episodes. Feels like so long ago, and here we are two years later. What are the stats? What have we done in two years? Well, about 158,000 downloads. That's not too bad. This month was actually the biggest download month — I don't know what was happening, but we had 25,000 downloads in one month, just this month. Last month was like 12,000, the month before that was 7,000. So this thing is going on, man. Thank you all for tuning in. 237 published episodes, 268 hours of published content and 76 happy hour sessions. Damn. That's crazy. Thank you all so much for being here. Two years — and there were definitely months where I'd lose motivation. Harpreet: [00:01:15] But all y'all have been here — Ken, Tom, Serge, Russell and Patrice and everybody listening at home — and if it wasn't for y'all, I probably would not have gone on for this long. So thank you all so much for being part of the Artists of Data Science. Two years. Two years. It's awesome. Hopefully you got a chance to tune in to the episode released today. It was all about creativity, and it was with Natalie Nixon — she wrote a book called The Creativity Leap. It was actually livestreamed a few months ago, so if you're looking at the YouTube video, it's a previously livestreamed video; go and check that out, or listen on the podcast. So hopefully you'll get a chance to tune into [00:02:00] that. But yeah, thank you all for being here so much. Two years. Now, two years ago — I'm curious, what was life like for you two years ago? Let's start with Ken: two years ago, what were you up to? And if you reflect back on those last two years, Ken, have you done everything that you wanted to get done? Speaker2: [00:02:21] I don't know if I did everything that I wanted to get done, but I did a whole lot of stuff that I didn't even realize that I wanted to do, and it sort of worked out. I'm really happy with how the last two years have gone. I think around two years ago, or just before two years ago, I'd quit a job that I wasn't really liking to pursue a significantly higher risk job and career. And there were a lot of question marks, you know — there was not as much financial security, I had a lot of student loan debt, which I guess I still kind of do. But, you know, there are a lot of incredible things that happened. One of them, namely, being a part of communities like this: meeting so many people, which inevitably led to friendships, which also inevitably led to opportunities both professionally and personally. And I couldn't be more grateful. And I think that's one of the major things that we share — our love and value for community creation in general, and just building something that's bigger than ourselves. Harpreet: [00:03:25] I couldn't agree more. Thank you so much. Shout out to Patrice Johnson and Stephanie Ireland in the building. I haven't seen you guys here before, but I'm so happy you're here. Would you guys mind sharing what you were up to two years ago compared to now? Let's go to Auntie, and then maybe Patrice or Stephanie if you want to jump in. But yeah.
Tell me, what was life like two years ago compared to today? Speaker3: [00:03:52] Yeah. Everything's changed. I was barely starting the whole journey [00:04:00] into data two years back. So, I don't know. Harpreet: [00:04:10] And you're still on that journey of lifelong learning? Yeah — you never quite reach a destination there. Tom, let's hear from you, then let's go to Ben, then Russell. By the way, all y'all joining in, if you have questions, drop them in the comment section or here in the chat and I will add you to the queue. More than happy to take your questions. Speaker4: [00:04:32] Sure. So I'm not sure you've heard of it, but this little pandemic named COVID hit the world, and it forced me to go online more, which I was already trying to do. And I got shocked by how many people wanted to be mentored by me. And after a while, I just said, okay, I'm going to accept this. And that got me closer to Danny, and this super good-looking Indian guy that was raised in the States named Harpreet Sahota, who gives to our community more than anyone I've met. Honestly, Harpreet, I'm not just saying this because it's your second anniversary show. Everybody sees me praising you a lot with the Chris Evans torch image. But seriously, the amount of research you do for each Artists of Data Science episode, how hard you work — I think I work hard; now just remember, Harpreet works harder. But then just getting more and more involved on LinkedIn and getting to know amazing people like Serge and Ben and Ken, and I can keep going. Making great friends like Russell and Dare and Aunty. It's been quite a ride, in other words. I don't have office friends anymore except my dogs, which have been my best office mates to date. [00:06:00] But now I have friends all over the world, and people that call me Dad from all over the world. It's been amazing, and I hate that it was COVID that made that happen. But in a way, thank you, COVID. Harpreet: [00:06:16] Thank you so much, man. I appreciate that. Appreciate the kind words, very much so. The happy hour sessions wouldn't have taken off the way they did if it wasn't for that COVID wave. For a while we had like 50 people showing up every single week — it was getting crazy. I think a lot of that was the pandemic. I absolutely agree with you in terms of making friends from all over, many of whom I'll be running into in a couple of weeks at ODSC in Boston. So if you guys are going to be there, definitely look for me at the Pachyderm booth. I'll be giving out stickers and socks and all that stuff, and I'm looking forward to Ken's keynote address and of course Serge's talk as well. Also, who else is it? Keith is going to be there as well, so it'll be great. There's going to be a lot of great speakers. If you'll be there, look for me. So, what was going on two years ago? By the way, if you guys have questions, please do let me know and I'll add you to the queue. Speaker3: [00:07:21] It was COVID. And look, my daughter was right back there — you know, all of a sudden her desk showed up and she started working right behind me. And I think she hated every minute of it. Maybe not every minute, but I'd say like 90% of the minutes. That's what I remember, is just having her back there, and her and I just kind of shared an office for three months. And it was, you know, it was kind of awesome for me. And like I said, that's two years ago.
That's what I remember the most — you know, because I've been working in this office by myself for ten years now, and having an actual officemate for that long [00:08:00] was the first time I'd had anybody in the office with me for that long continuously in that whole ten years. I'd go on site for a week at a time, but yeah, it was really strange times. But how you've kept this going and built up a bigger and bigger community, and the people that you bring in here, some of the knowledge that you bring — it's kind of amazing. I mean, two years — it feels like it's been longer than that, because you keep, like I said, you keep getting these guests. Like when you got Annie Duke — what, how did you do that? My idol. You know, you've been doing some amazing work, I've got to say. I've got to hand it to you. It's really impressive. Harpreet: [00:08:45] Thank you, I appreciate that. Yeah, I've had crazy guests on the show — Annie Duke, Scott, Robert Greene, James... I'm sure there are many, many more I'm missing in there, a lot of names. But yeah — two years ago, and the direction my career has taken, it's been interesting. Two years ago, I thought I wanted to be director of data science or chief data scientist or something along that path. That's pretty much where my headspace was at. Now I'm like, dude, I don't want to manage people, I don't want that type of role whatsoever. And now I've found myself in a type of role that's actually like professional content creator and community person. Like, this is my full time job: I write blogs, I talk to people, I create tutorials, I write code examples on social media. This is an actual job that people pay me a lot of money for. That's insane to me. Two years — who knows how much will change in another two years? Patrice, let's hear from you, if you don't mind. Speaker5: [00:09:58] I want to say thank [00:10:00] you — I was trying to find it to stick it in the chat here, and I can't do that quickly. But you wrote a post earlier this week listing out all the people you've learned from on LinkedIn. And I love it when people do things like that, because I'm probably back before where you were when you wrote that post about how you joined LinkedIn and then found all these people who knew more about things related to data science than you did, and found them to follow. And that's exactly my story, except I would say I'm starting a few pre-beginner steps before you. But I do love that I've found people who are happy to share what they know and connect with others. And I also want to say thank you to Tom, because you were one of the first people I saw do that. You were interviewed by Sarah — please help me with her last name; all I remember without looking it up is that it starts with H and it's long. There is somebody.
That conversation was so real and interesting and I appreciate it was one of the first places where I saw well, there's a lot of people who know a lot who are willing to share what they know and enjoy [00:12:00] talking with other people who either do what they do or want to learn how to do what they do. Speaker4: [00:12:05] So I want to share a secret with you, Harpreet and then and Sturtz, they're all going to back me up on this. The secret to knowing more in this field is giving away what you know in this field, because then people want to give back to you what they know in this field. In fact, Dare George and I were having a long talk last night, and I had to really work hard, Patrice, to convince him. Now, dare. In the future, I'm going to need your help. And he was having trouble imagining that he would be able to help me and then back me up here. It's going to be a matter of at most two or three years where he's going to know something I don't and he'll be able to help me with something I haven't covered because it's not only growing fast vertically, but it's also broadening all that we do. So unless we focus on concepts, it's super hard to keep up with everything but just keep giving it away. Patrice And it makes you understand it better when you give it away too. Speaker5: [00:13:10] I love that. But I have one more question maybe for everybody if other people, as have thoughts on this throughout this session. I am a transplant into anything at all stem related from English and I think I've been wondering things like is this a is this a STEM field thing? Is this a timing thing? Because the first time I went to school was many, many years ago. I am experiencing for the first time an environment where people are like, don't share what you know, keep it like don't talk with others, don't like don't do study groups, don't. And I'm like, I don't understand this world. Where did it come from? Because it's not the way I think of education at all. But I don't think that it's [00:14:00] that experience is rare. I think there are a lot of institutions where that's become a thing or maybe it's always been a thing. I wouldn't know because I don't come from math or science. But it's a it's a thing I'm curious about if other people have comments or thoughts. Harpreet: [00:14:18] They want to be able to speak to the part about coming into the field with a non STEM ish background. And the other half of that question, if anybody else wants to chime in, please go ahead. And just like these raised hand icon, I think you and if anybody else has questions that we're watching on LinkedIn or on YouTube or here in the room, let me know and add to the queue. Speaker5: [00:14:41] So I only caught part of it because I was a bad student like always. But in terms of people sharing information in STEM and especially data science and machine learning, what I find is that I feel it's more a whole marker of domains or disciplines where people have a kind of like a huge stake to some degree in hiding behind like an ivory tower. Like what I mean by that. Harpreet: [00:15:12] Is. Speaker5: [00:15:13] And once again, like, I think part of the part of the question. Right. So for example. Like I found that there are pockets of engineering, like web engineering, web dev, mobile, where I can talk to people and they are really, really happy to share information. And to some degree, I've also know that those are areas where you don't have like a high degree of pedigreed people. 
So people who went through your master's program and what I found is that a lot of the people who come from like. We kind of had a self teach. They tend to be a lot more willing to open up and share because they understand like how hard that struggle was. Whereas like I think day to day science, machine learning, I think it's, it's only I think now we're seeing a lot of people [00:16:00] from nontraditional backgrounds start to really kind of penetrate that area. And so you're going to see that open up more. But I think part of it, too, is also the fact that if you're a person who feels like your value is based on scarcity, then you you don't have an incentive to, like, share your knowledge and to help out the people around you. Whereas if you're someone who believes you grow more by sharing and exchanging, then you're going to be really happy to help people. Because I work with like staff and like principal engineers who felt very unapproachable and I've worked with other staff in principle engineers or data scientists who are like, Yeah, yeah, like, let me go debug this really stupid bug that you included in your code because you weren't paying attention. Speaker5: [00:16:49] And they're just happy to do that and they're happy to point out links. So I think there are definitely like pockets of people who are in that scarcity mindset of like, I need to protect my turf and I need to do that by gatekeeping. But I think the cool thing about people here is that no one here is like that. So I think you can you can find your tribe. You can find your like group of people sometimes, you know, you go to a company and that culture is very much about learning and developing. And and they will show you that by the amount of educational opportunities they offer you, maybe, for example, they have internal classes that you can do. Maybe they're very happy to do hackathons, right? But sometimes the company culture won't be there and you can kind of take that as an opportunity to kind of like build that tribe around you. And that's something that, like, I have personally, that's my that's been my approach. I find really cool people. I shortcut it by like just hanging out with people Harpreet knows because whoever Harpreet knows is pretty cool. So I shortcut it that way. Or you can kind of [00:18:00] sometimes do the unfortunately kind of hard work of like building that culture where you are locally, even in small ways. Like if you see a hackathon, you can just be like, Hey bud, I know you like your Fridays, but how do you feel like spending the weekend with me on a hackathon for Great Expectations or something like that, or even starting a book club or. Harpreet: [00:18:21] Stuff like that, right? Speaker5: [00:18:22] So, you know, definitely there's hope for sure. I actually just found a class where like somebody asked like, is this going to be a place where like. We start talking about plagiarism and we're really careful not to share anything with each other. And the instructor went, No, no, no. Everything is going to get turned on in a lab. Because the point is, everybody is here to learn from each other, including me, from you. And I'm like, Oh, there are classes in this discipline that work like this. So that was really exciting to see. But yeah, I so yeah, thank you for sharing your perspective on that. And also, can I toss one more question into the pool? I, I would love to hear from anyone who would like to share what their what a best python learning opportunity they've had is. 
Harpreet: [00:19:16] Let's go to the coast of Maine. I think they're touching on the earlier parts of your question. And then if anybody wants to go ahead and turn the Python learning opportunity thing, let's go ahead and just answer that question in the chat. So keep an eye out for that. And then if nobody else has a question, will circle back to that one, because let's hear from you. Yeah. I mean, I think one of the interesting things I found, I studied engineering and computer science at the same time. Right? So one of the strange things that I found was they kind of operate on very different wavelengths, at least at Sydney University right now. The you have the same systems, right, like a [00:20:00] class, a class forum of sorts, right. But on the engineering side no one would touch it. Mechanical engineering subjects, electronics like robotics, no one would touch it. It was all about you're in the lab, you're working on something. You look over your shoulder and talk to the guy sitting next to you, the girl sitting next to you, and ask them, Hey, how'd you get that to work? Right. And it was very collaborative and very, you know, very verbal. On the other hand, you you look at the computer science side, and if I tried to do that in a computer science lesson, they would just not answer me and just tell me, hey, go check the forums. Harpreet: [00:20:37] Right. So they were very foreign based in terms of how they shared knowledge and how they asked for information across each other. So I did find the different classes and different subjects have different personalities among them, and I'm not sure what the root cause was for that, but basically all the engineering subjects ignored or ignored anything to do with forums and all the computer science subjects ignored anything to do with interpersonal collaboration. We're just all on the forums, and I think that'll vary from school to school, country to country, you know, communication style. So I think you're going to see a fair chunk of variation there. So yeah, that's kind of my $0.02 on it. It's just a weird experience for me transitioning because I naturally fit in more with the engineering side because I'll just invoice up and ask a question to anyone who's around me working on the same problems, right? So yeah, I think we're all going to have different comfort zones. Thank you so much, Ken. Let's hear from you then. I can go to search. If anybody has a question, go ahead. Let me know in the chat, in the comment section. But if nobody has questions, circle back to Stephanie's search. Stephanie Patricia's second question. Speaker2: [00:21:50] I would argue that most disciplines are moving more towards a collaborative or open nature. I mean, I came before studying computer science, before starting any of these things from a [00:22:00] business background, and my interest in business was in entrepreneurship. But most universities or places don't teach entrepreneurship classes. And one of the biggest things that I found from the business classes is they're like, Oh, you have to create this competitive advantage and you have to protect it, right? And when I started doing my own version of entrepreneurship, when I started going out and actually like exploring what the world was like when you're starting your own business, it's actually kind of the opposite. You have to like build something, get feedback, see if it works. You have to share it to vet the idea and see if it's good or not. And I look at this whole domain. 
I look at all of learning as the way that I can validate if what I'm doing is correct is I have to put it out into the world and I have to to share it. And to be honest, if if I do that, like the odds of me getting funding for that idea go up, the odds of any of these other things go up that are in line with me producing that or making it a reality. Speaker2: [00:22:59] The more I hide my ideas are, the more I keep them to myself, the less likely they are to happen because I probably won't have access to the resources. I'm not that smart. I'm pretty sure if I have a good idea someone else has had it and the battle is really getting to execution. So who does it first rather than who had the idea first? Because I don't think that there are very many super original ideas out there. And that's a long way to say is I personally seek out communities. I personally probably overshare about a lot of the things that I'm working on because honestly, if someone takes what I did and they do it better than I did, then something that I wanted to be in the world is already in the world and I didn't have to do any work on it. Right? So there's there's this kind of unlimited well of good ideas and things that are out there and just reinforcing that and sharing it, I would hope, makes the world a better place. As long as the things you're working on are the ideas you're sharing are mutually beneficial for a lot of people. Harpreet: [00:24:00] Seriously? [00:24:00] Yeah, I agree 1,000% with what Kenya said. Well, I found, like in academics, it's like really skewed. Yeah, it gets really competitive. It operates more like under what Michiko said, more like scarcity. There's more. There's only so many people that can get A's and only so many that can get B's and so on. So of course everybody's competing. And because of that, like people are, have a tendency to collaborate less, but out in the field is very much different I think, you know, yeah, there is some people with still with that academic mindset and I found such folks in my company, people that work in R&D that come from a very strong science background. Some of them still have that, you know, like they're trying to control the knowledge that they have. But at least like in the data science area, I found that not to be true. We have my own team, we have like a fun and learn session that we do every couple of weeks and we share like what we've been doing, things that, you know, practices that we'd like to share. And like every month, like the greater team, hundreds of data scientists spread across many teams in the company. We have the community of practice meeting where we call it copy, where we share like also things that we're developing across several teams. So it's been a wonderful way to cross-pollinate and realize what techniques we could use and how they tie in and to like the agronomy that we're the domain knowledge that we we're trying to connect with those methods to begin with. So yeah, I think it's it you'll, you'll find that if you [00:26:00] have you work in a company and they don't have that kind of structure that they don't that they're all trying to be gatekeepers. That's not a good company to be in. And you should probably just run identify those. Speaker5: [00:26:13] I mean, I, I 1,000% agree with that, but I, I do you have any thoughts on how you tell, like, unless you know someone already who can give you an inside perspective or have a way to, to find someone who has good, good knowledge of what it's like to actually work there. 
Are there other indicators, I guess — other things we could look for that are signs of whether a company is one that welcomes questions and sharing, or is the opposite? Harpreet: [00:26:54] Well, you pretty much have to find a buddy, or several, who work in departments other than yours to get a perspective. You know, find those people, ask them questions about how it is working in that company. And then you have to be very particular in finding out if the work is very siloed — you ask if they know other people from other departments, if they know what they're doing, what they're up to, and if there are opportunities to share that information. And by the way they answer that, you might find out if they have a culture of sharing knowledge. Like the company I was in before this one — they were very, very siloed. And it wasn't that they were purposely siloing themselves; it was just the structure that was there, the way the departments were set up, the way they were meant to be working on only one thing. And also there was a competitive streak [00:28:00] among all the people in there, and I guess that's just the kind of professionals they chose to hire. But I mean, that cultural stuff is really hard to get at early on unless you ask the right questions to the right people. So just make sure you spread things out, because you might get a different perspective if you ask people on completely the other end — maybe it doesn't even have to be data scientists. You ask people that are engineers or marketers and you'll get a different perspective. Speaker5: [00:28:38] Yeah, thank you. I think that's definitely a key factor: finding the people who believe that being competitive and being the best has everything to do with, the way you said it, a culture of sharing knowledge. Because the more you share, the more you know — back to Tom's earlier comment. So thank you. Harpreet: [00:28:58] Thank you so much. You're speaking at ODSC — what's your topic? Adversarial robustness. Adversarial robustness — I'll definitely be at that talk. Looking forward to meeting you in person as well. Yeah, same here, same here. We'll grab a couple of beers there, actually. And by the way, if you are listening and you're going to be at ODSC on April 19th, which is that Tuesday, 4:30 p.m. Eastern time: we have a bunch of people coming together from the MLOps communities, Serge's community, and just people on LinkedIn — they're all invited to get some beers. Do come and join us. It's going to be a good time. Ben, let's hear from you on this topic. That's ODSC East in Boston, by the way, not ODSC Europe. Speaker3: [00:29:44] And the topic was the one from — is it Patrice? — on finding cultures that work with... Maybe rephrase the topic so I can make sure I hit it. Harpreet: [00:29:54] Yeah. Patrice. Speaker5: [00:29:56] What are some things [00:30:00] to pay attention to or consider to help figure out if a company is a place that values — I think the best way is not the way I said it, but actually the way that Serge just said it — that values a culture of sharing and knowledge. And I would add to that: sees that as compatible with being highly competitive and doing great work, not as opposite to that. Speaker3: [00:30:24] Yeah, I think that's a great question. I agree with a lot of the things that have been said already.
I think something you could ask them would be how do you process bottom up innovation versus top down? Because something that happens when you get people in an industry, they've been very successful for a reason. They get into these silos and they can have an attitude of not listening. And so if they can explain to you how they accept bottom up innovation, if someone has a good idea, then that might be a red flag if they don't have a process to incorporate that. Because I think what I've learned in industry is. Everyone can have a brilliant idea. The brand new junior person joining the team could have an absolutely brilliant idea that changes, changes a feature, changes innovation. And if you don't have a process to accept that, that is very welcoming, then that would be a big red flag. Speaker5: [00:31:13] Thank you so much for saying that. You totally just made something quick for me. Speaker3: [00:31:18] Oh, good. Speaker5: [00:31:18] So, like, several years ago, before I had any thoughts of doing data science, I won an essay contest from the Japanese Business Society of Detroit to spend 12 days with red carpet treatment in Japan. Because Japan had this initiative to help teachers outside of Japan understand some very particular things about Japanese culture that would help their children of employees who go to work overseas, adjust better and get more out of that experience. And one of the things we they they did is in the in the city of [00:32:00] Toyota, like Toyota, the company, there's also a city, Toyota. And we met Mrs. Toyota and they they gave us a tour of their factory floor, which has been written about in many books. I know Eric Ries, the lean startup. That was the first place I heard it, but I think that's now a quite a famous story. But what surprised me was they they did this long kind of stop and story to explain that they have a way to stop production when workers on the floor have anything, anything that goes wrong. And then a whole bunch of people come to problem solve it and there's not like more things piling up. That's the lean part. But then they also did this like deep dove into a story about one worker who who gave them the idea like that came out of a worker whose responsibility was to put things inside of the cars they used. Because when you tour their factory floor, the first question that one of the first questions that comes to mind is why are there no doors on the cars? Because they seem all assembled, except there are no doors on them. That's because the people who have to put things inside of cars used to have to open the doors, then very carefully walk around them slowly to make sure that they weren't scraping like nothing they were carrying or wearing would scrape the paint on the exterior of the doors as they were going to the interior to install or assemble or fix things. Speaker5: [00:33:42] So it was taking a lot of time and slowing them down. It was and there really was no reason the doors had to be on the cars when they didn't have all of their pieces yet. So they have. And throughout Japan, like everywhere we went, every [00:34:00] experience you have your surveyed about it, and I think that's exactly what you were talking about with like that would be one method to gather bottom up ideas is to ask because often people will tell you if you ask and if you don't ask, it makes it seem like you don't care and you might not actually care or it might just not occur to you to ask. 
But the I'm like at the time I was like, why is this a big deal that somebody suggested the doors were in the way, but now in a different context? And when I'm asking different questions, I'm thinking, oh, like that's actually a very good example of the values of that company and how they live their values day to day, and they're how there are structures built to support those values, which is a very different story than I mean, any company can have a great tagline and a great list of values, but the extent to which they really internalize them and make it part of make it a positive and instructional and competitive and learning part of what they do is, is I think what what at least I and I think a lot of people really want to know about a place. It's not just. Speaker3: [00:35:17] Companies. I love that people you gave you give because in the traditional in the negative work culture settings, the people doing the installs that require to open the doors would not have been invited to the meeting because they're not senior enough, are not important enough, and there are opinions aren't useful enough. And so I think that that is the negative side that we see. Speaker5: [00:35:38] Absolutely. Speaker3: [00:35:40] So yeah. Thanks for sharing that example. That's that's a beautiful example of why why we need this. Harpreet: [00:35:47] Thank you very much, Patrice. Let's hear from you. Yes, a very interesting experience I had. So I when I was at City University, I was part of the student formula student motorsport [00:36:00] team. Right. So we all build cars, different universities. We'd get together and we'd do time trials and endurance runs and things like that. Or it was a massive engineering exercise. Loads of fun, right? One of the best learning experiences I ever had because I got to work with people across multiple years at university as well as with different professors who would advise us. But we had, I think it was one of the Germans. Whoa. Sorry about that. I think we had one of the German teams come across. This is probably back in 2015 or 14. And when when we had something go wrong with the car, we pretty much had it separate it down to life department. So if there was something wrong with the electronics or the wiring loom or something like that, it would be down to the electronics team who'd come across and have a look at it. Now, in the in their team, what we found was until the car was ready, in perfect, if there was something wrong with the car, even if it was just something wrong with the engine tuning, the whole team would essentially be present around the car and taking ownership of the entire problem. Harpreet: [00:37:03] Even if you didn't have anything to offer in that moment, just experiencing watching the guys who were experts on the engine work on the engine could give them insights into other parts of how the car operated. So it was very interesting to see this whole thing where you literally had 20, 20 people sitting, standing around a car watching two guys try to fix this thing, and occasionally one of them would pipe off with a different idea, you know, like in terms of how you can potentially program the issue a little bit differently and they'd be offering up ideas. So it was it was just interesting to see that more hive mind kind of thinking. And it's to me like a corporate level or like an a workplace. It's about how efficient that is and how often you use that. Knowing when and how to use that to maximum effect is quite powerful. 
But I think, yeah, there are certain situations where that kind of hive thinking or mob programming [00:38:00] situation can be very useful. Just yesterday, me and a couple of my colleagues were training plenty of models over and over again for different use cases, and basically we were running into similar issues. But just having an open chat line — like a huddle on Slack where we could just talk verbally — Harpreet: [00:38:23] we managed to catch each other out on errors before we ran into them, because someone else had already run into it, you know? So that kind of collaborative thinking seeps across if you're able to execute it intentionally enough. Thanks so much. All right, let's see if there's any questions in the queue. Last question: after two years, what's my favorite moment or episode of the last two years? Definitely my favorite two episodes, I'd say — Robert Greene for sure, and... those two people are just huge in my thinking, and they take up a lot of real estate on my library bookshelf as well. So it was wild to be able to talk to them. And then also just the happy hours — those are definitely my absolute favorite times as well. Let's go to another question about learning Python, to circle back to that. By the way, if anybody else has questions on YouTube or on LinkedIn, go ahead and add that question to the queue. Oh, Mark — question or comment? It was about Python, so I'll wait for that question. Okay. All right. Cool. So after Patrice goes, then we'll go to Mark. Speaker5: [00:39:53] The question was just if you have any favorite Python learning opportunities. Harpreet: [00:40:01] Yeah. [00:40:00] So my favorite self-paced one — I will forever plug Dataquest.io. Honestly, that's how I learned Python myself: I learned on the train going to my job in operations and picked up Python that way. Michiko said Real Python — I reference them all the time, especially their classes tutorial. But the main thing that really took me from "I know how to use Python syntax" to "I actually know how to code in Python" was working in a code base and getting code review — and more importantly, in an end-to-end code base. So you typically have this thing called CI/CD, continuous integration and continuous deployment, and it's probably going to be hard to get that for your own personal project, though maybe some open source projects have it. But essentially, our engineering code base has a whole bunch of unit tests and linting and all these different checks that force you to write good code. And so a really great example: I built one of my first ETL pipelines in production recently, and an engineer put a new test in, and my code kept on failing on this new test. Harpreet: [00:41:26] It forced me to understand their code, and it forced me to write even better code to account for some errors. And that's something that I just couldn't get on my own, because it took engineers, or people who are more advanced, to write these tests and force me to consider them. And then after that — say, for instance, maybe you can't work in a production code base just yet, you're still learning — going on GitHub, putting your code up as if it was a real branch, and then asking someone to do code review. I've learned so much from code review. I think that's where [00:42:00] I really learned how to do software development.
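(A minimal sketch of the kind of unit test a CI pipeline would run against a small ETL step, along the lines described above. The function, column names, and expectations are illustrative assumptions, not from the episode; a CI job would simply run pytest against a file like this on every pull request.)

import pandas as pd

def clean_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with a missing order id and normalise column names."""
    df = df.dropna(subset=["order_id"])
    df.columns = [c.strip().lower() for c in df.columns]
    return df

def test_clean_orders_drops_missing_ids():
    raw = pd.DataFrame({"order_id": [1, None, 3], "Amount ": [10.0, 5.0, 7.5]})
    cleaned = clean_orders(raw)
    assert cleaned["order_id"].notna().all()            # no missing ids survive
    assert list(cleaned.columns) == ["order_id", "amount"]  # names stripped and lowercased
    assert len(cleaned) == 2

A failing check like this on someone else's new test is exactly the forcing function described above: the pipeline won't merge your branch until the code handles the case.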
I wouldn't call myself a software developer quite yet, but getting those best practices from someone who actually knew what they were doing really brought me to the next level. And so, identifying people — what I've done for friends is, like, one of my friends is learning how to be a front end engineer, and I just reached out to my network and said, hey, my friend has a branch up, I'll give you some data advising if you... Speaker3: [00:42:27] ...go and do a code review for my... Harpreet: [00:42:28] ...friend. And people are happy to do that. So I would greatly, greatly encourage getting code review to really bring your Python to the next level. Speaker5: [00:42:38] Thank you. I will message you as soon as I'm ready for my first GitHub code review. Harpreet: [00:42:45] I will do it. In terms of learning the language just from the actual ground up — from zero, from scratch — my favorite resource is pythonprinciples.com, Python Principles. Probably one of the best ways to learn Python: you don't have to install anything, it all comes down to the web browser, and you learn syntax from literally "what is a string, what is a variable" type of thing. I put my cousin on it — he's in 10th grade — and he went through the entire course and said he actually loves it. I also put another one of my cousins on there — he's in electrical engineering, like at the end of his bachelor's — and he's really enjoying that as well. So definitely check that out. Ken, let's hear from you, and if anybody else wants to jump on this, go ahead. And if anybody else has a question, I will add you to the queue. Speaker2: [00:43:43] Yeah, I'm going to go a different angle with Python. So rather than the general learning at a high level, I think there are some cool opportunities right now to learn specific tools. So my friends over at Streamlit are doing a 30 Days of Streamlit [00:44:00] challenge, where every day they have some really, really in-depth tutorial-style things that are consistent with how to build a dashboard using that platform. I think looking for what companies are doing, or what open source libraries are doing with their own documentation, is really interesting. Like, most people are like, oh, I'm not going to go to python.org or whatever it is — I think it's python.org — to go and learn the language. But even on that platform, there's a really good tutorial for all of the basics: loops, variables, functions, classes, whatever it might be. I think it's really overlooked — the people that actually produce these languages or libraries have some pretty good stuff out there. So that's something that, whenever I need to brush up, is the first place I look to, and it surprises me a lot. Also, shamelessly, if you're looking for some good data or libraries to work through, I did release all of my YouTube data a couple of months ago, and there are some really good examples that people have put together analyzing that, with some pretty cool visualizations and some other different types of analysis. So, shameless self plug. Harpreet: [00:45:19] Shout out to Ken's podcast and YouTube channel, by the way — Ken also has some great resources on the business sense of things. Check that out. And in terms of learning how to write code that passes tests, I guess — there's a course from Ted Petrou. He has a platform called Dunder Data, and he has essentially a course that costs like 15 bucks.
And it teaches you how to pretty much build pandas from scratch, and it teaches you how to test every little bit of code. So you not only learn how to build an entire data analysis library, but you also learn [00:46:00] pytest as well. It's really cool — well worth the 15 bucks, more than that even. Shout out to Petrou; he should respond to my message about coming on the podcast. Speaker5: [00:46:17] Yeah, one thing I would say is that the types of resources that are useful are useful at different points in one's development. And this is something where I sort of made the mistake when I was first starting out: I tried to learn really advanced stuff before I had the basics down, you know? So I feel like, in general, it's never going to hurt to build your own projects. And sometimes building a project could mean taking an existing project and doing a code-along, where, for example, you see someone's project — and sometimes the websites for Streamlit or some of these other really popular tools will have specifically a gallery section of projects. What can be really nice is going to that gallery section, going to the GitHub project that's listed, and trying to take that one project, deconstruct it, and then essentially do a different version of it. So they might be doing forecasting off of, oh, I don't know, let's just say rain patterns, but they could be doing forecasting of, like, stock or something like that, right? But then maybe you read through it, you understand how they're structuring a class object, how they're doing the training and prediction pipeline, and then maybe you instead do something with a similar setup but a different sort of forecasting problem or [00:48:00] something like that. But I think it really depends on where you are in your learning cycle. Like, when you're first learning how to code Python, I feel like if you then start throwing in DevOps, and you then start throwing in data and sharing stuff, it gets really complicated super fast. So I would always check in with yourself on where you are in your process — and that's a little bit hard, so it helps to talk to other people, more senior engineers and all that, to check in and see what the bottleneck is that you need to work on to progress to the next level. Because I had this huge issue where I was just boiling the ocean on resources — I was chasing every single blog post, LinkedIn article, YouTube video or whatever — and it wasn't helping me be disciplined in the skills I needed to learn. Because essentially, once you have good basics of Python, learning pandas or learning NumPy is relatively easier than learning Python. Once you learn NumPy and some of the other libraries, then it gets a little easier to understand how TensorFlow is built. Speaker5: [00:49:16] Things like that. Or even, once you know how to write a script, then it becomes the next step of being able to write classes, and then the next step of being able to write a package. So I would always think: what's the minimum viable piece of learning that you need to do to unblock yourself, and then try to develop on top of that.
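(A minimal sketch of the kind of small Streamlit dashboard that the 30 Days of Streamlit challenge and the gallery projects mentioned above walk you through. The CSV path and columns are placeholder assumptions; save it as app.py and run it with "streamlit run app.py".)

import pandas as pd
import streamlit as st

st.title("Readings explorer")

# Placeholder dataset; swap in any CSV with at least one numeric column.
df = pd.read_csv("readings.csv")

numeric_cols = df.select_dtypes("number").columns
metric = st.selectbox("Column to plot", numeric_cols)

st.line_chart(df[metric])               # quick view of the chosen column
st.write(df[numeric_cols].describe())   # summary stats under the chart

Deconstructing a gallery project usually means finding exactly these pieces — where the data loads, where the widgets are declared, where the model or chart is built — and then swapping in your own problem.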
So I feel like when I was first starting out, I burned a lot of cycles learning stuff that was not supremely useful in the short term — and I'd argue it wasn't useful in the medium term either. It became useful long term; no knowledge is wasted in the long term. But if you're trying to get a job or get a raise, [00:50:00] some knowledge is not as useful at some point. So, yes — thank you. Thank you for thinking also of efficiency there, because there is a lot if you just Google "how to learn Python." So I really appreciate all these ideas that are coming from people who have tried a lot of things, used them, and shared what they found most valuable. So thank you so much. Yeah, I'm regularly getting my butt kicked by all the senior staff and principal engineers around me — it's amazing how little I know and how much they're able to find in code reviews. Harpreet: [00:50:36] Thank you very much. Of course, shout out to everybody else in the room — always good to see you. What's going on, David, Serge, Denise, Perillo, Marion and Russell? What's going on? Happy to have you guys here, and Christina has a question, so go ahead and go for it. It's a technical question — haven't had one of those in a while. Go for it. Yeah. Thank you. Thank you, Harpreet. So I have this dataset, and the purpose of me solving this problem was not to actually have a deployable model; we just wanted to extract maximum information from this dataset. So that was the purpose. So I went in and I used CatBoost, because it offers very good model explainability. The SHAP values that CatBoost offers are really cool, and you can explain how each feature plays in and impacts the response variable, either positively or negatively. But I was just curious whether there [00:52:00] are more models out there that I can use that offer the same degree of model explainability, so that I can at least verify the results that I obtained from the CatBoost model. Because using just one model and testing it kind of feels weird — if I get the same results from another model, it kind of proves that there is really something there, you know? So that is one part of my question. The second is, my work in my office is mostly on tabular data, which is numerical and categorical. But in the market I see there are a lot of NLP requirements, so I would request anyone here to suggest a good project or a good idea that I can get started with. Thank you. So — I just wish there was somebody in the room here who wrote a book on explainable AI, and also someone here who was writing a book on NLP models; that would be so helpful right now. Right? That's good. Speaker6: [00:53:17] It's good. I read the book — I'm currently in the middle of it. Harpreet: [00:53:22] That's not mine. No, it's not. He's just not here, is he? Speaker6: [00:53:27] No, he's not. Harpreet: [00:53:27] No, no, no, no. So let's do this: Serge actually also wrote a book. And then after that, if you have anything to add from your learnings of that book, I'd love to hear from you as well, and then we'll go back to the second question after that. Serge, go for it. Yeah, definitely the way to go is to try different models, not just CatBoost. I realize why you chose CatBoost, but it can be any other ensemble or tree model as well. I'd even try — you know, it might sound ridiculous — but you can also try logistic regression or linear regression.
You can gain some insights from those as well. You can throw an SVM at it, LDA — it really depends on what kind of problem it is, what kind of models you would throw at it. You can use SHAP on any model. You can try the method that CatBoost already has integrated — it uses what's called the TreeExplainer. It's very quick, but it cuts a few corners. If you want the highest fidelity, you can try the KernelExplainer, one of the other explainers, which is the truly model-agnostic one. It's a lot slower, but it will give you greater fidelity, if that's what you're looking for. And it does help to compare the different models. Another thing that does help — but it might not be as computationally plausible in your case if you have hundreds of features — is to try interaction SHAP, which is a good way to see how the different features are interacting with each other. Harpreet: [00:55:25] So generally it's a good idea to do some feature selection, unless you think there's some value in a lot of the features there. If it's a question of exploring the data, what's really the point of exploring all the data if 90% of the features are worthless, as far as the predictive capabilities they bring? So I'd look into that. But it is a good way of getting an understanding, trying CatBoost as you did. But then I'd probably try feature selection, I'd try different models, and I'd definitely try a different explainer. You don't necessarily have to use SHAP — you could use some of the other methods as well and get some understanding from them. Just to recap, you spoke about the KernelExplainer and the TreeExplainer and interaction SHAP — so are these functions all available online? I can just look them up on the internet? Yeah, yeah, yeah. A lot of folks don't know that, generally, by default they will just give you the SHAP values, but from the tree models — I mean, from any tree model — you can do interaction values as well, which is very much... Sometimes, yeah — sorry, I interrupted. No, no, no, no. I was actually wanting to know what interaction SHAP is. Thank you. Oh — the SHAP interaction values will show you a degree of interaction between the features, much like a correlation plot would do. Harpreet: [00:57:16] If you're doing like Spearman or Pearson's correlation, you'll get an understanding of how they're interacting. So it's purely associational, but it does have a higher degree of fidelity because it's derived from the model, not necessarily from the data, and therefore it's not necessarily a linear or monotonic value. It's something that will tell you to what degree they're tying into the target variable. So that's very useful, because sometimes you think a feature is important, but it's only important because it's interacting with another one. So it gets you [00:58:00] understanding, okay, this is how they're interacting. And generally you use something like a partial dependence plot to understand that, and SHAP has partial dependence plots as well, which are very useful.
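(A minimal sketch of the SHAP workflow described above — the fast tree-specific explainer, the slower model-agnostic kernel explainer, and the tree-only interaction values. The model, the X_train/y_train data, the sample sizes, and the feature name are illustrative assumptions.)

import shap
from catboost import CatBoostRegressor

model = CatBoostRegressor(verbose=0).fit(X_train, y_train)  # assumes X_train, y_train already exist

# Fast, tree-specific explainer — the kind of integration CatBoost ships with.
tree_explainer = shap.TreeExplainer(model)
shap_values = tree_explainer.shap_values(X_train)
shap.summary_plot(shap_values, X_train)

# Truly model-agnostic but much slower: explain a small sample against a background set.
kernel_explainer = shap.KernelExplainer(model.predict, shap.sample(X_train, 100))
kernel_values = kernel_explainer.shap_values(X_train.iloc[:50])

# Interaction values: tree models only, and expensive when there are many features.
interaction_values = tree_explainer.shap_interaction_values(X_train)

# SHAP's dependence plot is its partial-dependence-style view for a single feature.
shap.dependence_plot("some_feature", shap_values, X_train)

Fitting a second model (even plain linear regression) and comparing its SHAP summary against CatBoost's is one way to get the cross-check the question above was asking for.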
And so the question for, like, feature selection methods: with so many different methods out there, how do you approach deciding which method to use first? Do you have a go-to method? It depends. Sometimes, if you have a ton of useless features, some of the classics — you can use something like lasso and you'll just weed them all out, right? But that might be a very simplistic way of looking at it. There is a progression that I do: I try all those simpler ones first and see what I find, but I also try to validate them with the more complicated ones, and then I use set operations within Python to see how they overlap. So I'll try lasso, or — Harpreet: [00:59:19] also, within scikit-learn you have special variations of lasso that use the BIC values of the models as well, so you can check if there's some agreement between different kinds of models on the features it's selecting. I forget the name of it — it's in my book, the feature selection method that I'm discussing that uses lasso — but more importantly, what it's simply doing is iterating through all the different features and using lasso, very quickly, to discriminate. So, sorry — [01:00:00] did you mention a special variation of lasso? Yeah, what is that? Yeah, you kind of put me on the spot here. SelectFromModel — within scikit-learn, which is a very, very complete library — yeah, it is called SelectFromModel. You go to scikit-learn's feature selection module, sklearn.feature_selection, and you find a ton of different methods. It has SelectKBest, SelectFromModel. What I'm discussing is SelectFromModel, and the model you use is lasso — but you can also use a variant of lasso, which is LassoLarsIC. Yeah, that's the one. So LassoLars, or LassoLarsIC. So it's a lasso model with LARS — least angle regression — which is another variant, using BIC or AIC, which are very good for that particular case: they're pretty much checking the goodness of fit using those criteria. It's a long-winded explanation, but BIC and AIC are very, very famous criteria. Yes. Thank you. Thank you, sir. It was very insightful talking to you. Harpreet: [01:01:34] Thank you so much. Yeah — and well, I haven't finished, though. That's the simpler one. I go all the way from that. Perhaps, if you want to be very thorough, there's a bunch of other methods that iterate across every single variant of features, kind of trying to discriminate on each go. [01:02:00] Okay, maybe you have 400 features, and in every iteration they try to take away the 5% that is on the bottom in terms of importance in your model. So it uses the feature importance of the model, and every time it's just 5% less, 5% less, 5% less, until it's within a threshold of performing well — but not necessarily quite as well as it would with a lot of other features. But, you know, sometimes it's not worth it if those features are only increasing performance a tiny bit. And then there are newer methods — one of my favorites is genetic feature selection. I suggest you look into that one too. It's pretty time consuming and very resource intensive, but it performs very well. I think the library is called genetic algorithm something — I forget exactly. And yeah, it comes with a lot. I don't want to oversimplify it, though, and I feel I've been speaking for very long, so I'll leave it there. Appreciate it.
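(A minimal sketch of the scikit-learn pieces named above: SelectFromModel wrapped around LassoLarsIC for the information-criterion lasso selection, plus RFECV as one standard implementation of the "drop the bottom 5% of features by importance each round" idea. X is assumed to be a pandas DataFrame with a numeric target y; the estimators and step size are illustrative choices.)

from sklearn.feature_selection import SelectFromModel, RFECV
from sklearn.linear_model import LassoLarsIC
from sklearn.ensemble import RandomForestRegressor

# Lasso fit with least-angle regression, picking its own penalty via BIC (or AIC).
lasso_selector = SelectFromModel(LassoLarsIC(criterion="bic")).fit(X, y)
lasso_features = set(X.columns[lasso_selector.get_support()])

# Recursive elimination: drop the bottom 5% of features by importance each iteration,
# keeping the smallest set that still cross-validates acceptably.
rfe_selector = RFECV(RandomForestRegressor(n_estimators=200), step=0.05, cv=3).fit(X, y)
rfe_features = set(X.columns[rfe_selector.get_support()])

# The "set operations" idea from above: keep the features both methods agree on.
agreed = lasso_features & rfe_features

# Genetic-algorithm selectors live in separate packages (e.g. sklearn-genetic) rather than
# in scikit-learn itself, and are much slower but can search feature subsets more broadly.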
Omar, you said you've been into explainable AI recently — any input here? Hello, everyone. Speaker6: [01:03:34] So, no, actually, I'm learning about it — I'm still in my early days with it. I like it because it's the maturity of this domain: this is what will make our domain mature, and this will increase the confidence in our domain. For example, I was working with a company here, and [01:04:00] they are a client of my company. They make machines for X-ray — it's like a world-leading X-ray company. So they have these machines for, like, pregnant women to see the baby — it's called echo or something. So they had a problem, and we solved the problem using an autoencoder and so on, but before accepting the outcome from our side, they wanted a full explainable report: why does it perform how it is performing? And it's independent from the human side, because it is all machine related — it's not affecting the humans at all, in a sense. The goal is to make sure that the probe the doctor holds in his hand is actually working well, without needing the help of a technician. So we applied something and it worked, but they wanted a full report on explainability. They asked us clearly that they want a non-professional to understand how this is working. And it's a really hard task — I've been working on it for the past month, after solving the problem. So it's a big thing for me. So I'm reading the book and taking some inspiration — I'm not reading every word, I'm not applying the code, I'm just taking inspiration on how to do that. I actually have a question on another project where we are currently blocked. Harpreet: [01:05:36] If it's okay, yeah, go for it. Speaker6: [01:05:39] Okay. So imagine you have motors — any type of small motors that are being manufactured. And when you want to test them, there is this expert who listens to the performance of the motor: just turn it on, listen to it, and if it's okay, it passes; if it's not, it goes into further inspection. [01:06:00] So they had a problem — they wanted to automate that. So we took recordings of the motors: many, many good ones, and a few anomalous ones, let's say. We have like 40,000 recordings of 5 seconds of good working motors, and like 15 or so that have anomalies. So we tried to do the following: build an autoencoder that takes the input, which is the mel spectrogram — the visual representation of the audio file — tears it down into a latent vector in the middle of the autoencoder, and then builds it back up, so it will learn only the good aspects of a working motor. So whenever a new anomalous recording comes in, it should not be able to rebuild it, because it only knows how to rebuild good ones. So the thing is, we do a similarity between the input and the output, and if the similarity is high — if they look very much the same — it would be a good one. If it's not, then this would be an anomalous one. So we are blocked — it's simply not working. We've tried many, many approaches; it's not working. It's been like two weeks that we've been trying on this, and it just doesn't work. You know — oh, okay, one thing that's important:
Harpreet: [01:07:53] It sounds like a very, very tough problem that I've never worked on, so I have no clue where to even start with that. Any [01:08:00] deep learning experts here who specialize in this type of problem?

Speaker6: [01:08:04] It's not necessarily deep learning, it's just an approach. Maybe other things might work that would be even better. Because in our situation we converted the audio file into an image using the mel spectrogram; that was the main approach. But if there are other ideas using the audio file directly, that would be great.

Harpreet: [01:08:29] Mark, go for it. Yeah, so I have no expertise in this; it's more that I have a clarifying question, because what you're describing sounds very complex. What's the simplest thing you've tried so far, and how does that compare to your current approach?

Speaker6: [01:08:49] So the simplest thing was done by the R&D team at our company. They tried RBF; they extracted features, because, you know, RBF is deployable on their edge design. They have something called neuromorphic parts, something I don't understand very well, but they have hardware that is specific for RBF, that works very well for RBF. So they tried to extract features from the audio file itself, I'm not sure how they did it, and they ran RBF over that. It worked to some extent, but it was not a very confident solution.

Harpreet: [01:09:36] Go for it.

Speaker5: [01:09:38] Um, I just want to point out that Nvidia actually has some pretty good technical blogs and talks specifically on conversational or speech AI pipelines, so it's worth checking out. Some of it does tend to be a little [01:10:00] more focused on their platform, but if you kind of skate around that, some of the resources there are pretty good. I posted a link to Nvidia's stuff. And then the other question I had was, I guess I'm not getting a clear understanding of what the bottlenecks are, the problems you're trying to solve for. Is it performance? Is it the system design?

Speaker6: [01:10:24] You know, they actually manufacture the motors. They have this pipeline for manufacturing motors, and for the higher-end models they do human inspection on every produced motor. They are trying to automate this human role, where a person is usually sitting next to the motors and listening to them. If they sound fine, they're okay for shipment; if they're not, they go back for further inspection.

Speaker5: [01:10:53] Yeah, but what I'm saying is, in the current state of the project, where you're stopped right now, is the problem that you've tried architectures out there in papers and they haven't worked? Or is it a computational challenge, where the system architecture could work but you just don't have enough compute power? Or is your pipeline outputting results that are just terrible?

Speaker6: [01:11:23] So for this.

Speaker5: [01:11:26] There's a different answer for each of those, right?
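The conversation doesn't specify which features the R&D team extracted or exactly what "RBF" means on their neuromorphic hardware (most likely radial-basis-function networks). As a loose classical analogue you could run on a laptop, here is a sketch that summarizes each clip with MFCC statistics and fits an RBF-kernel one-class SVM on good motors only; the feature choice and hyperparameters are assumptions for illustration, not their actual baseline.

```python
import numpy as np
import librosa
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

def mfcc_summary(path, n_mfcc=20, duration=5.0):
    """Summarize one clip as the per-coefficient mean and std of its MFCCs."""
    y, sr = librosa.load(path, sr=None, duration=duration)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def fit_good_motor_detector(good_paths, nu=0.01):
    """Fit an RBF-kernel one-class SVM using known-good recordings only."""
    X = np.stack([mfcc_summary(p) for p in good_paths])
    scaler = StandardScaler().fit(X)
    model = OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit(scaler.transform(X))
    return scaler, model

def score_clips(paths, scaler, model):
    """Return +1 for clips that look like good motors, -1 for suspected anomalies."""
    X = np.stack([mfcc_summary(p) for p in paths])
    return model.predict(scaler.transform(X))
```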
Speaker6: [01:11:30] Yeah. So actually we have both of those problems. When we enlarge the model, it works; it is able to learn the good motors. But then we're talking about a very large model, and when we try to make it a bit smaller, when we tune some hyperparameters, it just doesn't work within the restrictions we have.

Speaker5: [01:11:57] Gotcha. I mean, yeah, Kozlov [01:12:00] has his hand up, and since he does all the stuff with, like, robotics (don't give me that look, you do work in robotics), I would say: if it's a computational thing, a lot of times the answer is just to throw more money at it and get more compute or GPUs or what have you. But if it's not that, then yeah. Want to take a stab at it? Please.

Harpreet: [01:12:31] Well, two things, roughly. I'm not an audio expert by any wild measure, but just a question: is there a reason you're specifically using mel frequencies?

Speaker6: [01:12:44] There's no reason at all, but in the state of the art this seems to be the best representation of audio: an image of the audio file the way humans hear it.

Harpreet: [01:12:59] Yeah. So that might be where part of the bias is, right? Because when you're talking about mel frequencies, you're talking about the nonlinearity of human perception being scaled into the frequency range, right? But does a computer see it the same way?

Speaker6: [01:13:18] Like we see? So, for example, there's a.

Harpreet: [01:13:20] There's a fundamental difference between how humans perceive objects and how deep learning networks perceive objects, right? We use shapes; deep learning models typically use textures, right? Yeah. So there might be a fundamental perception difference there. So I don't know whether staying away from mel frequencies might help.

Speaker6: [01:13:41] I actually ruled that option out, because somehow it is visual. The difference is, when I put two examples in front of me, one working motor and one that's not working, I can tell which one is the good one and which one is the bad one, [01:14:00] you know. So there are definitely.

Harpreet: [01:14:01] Visual markers in the mel spectrum.

Speaker6: [01:14:03] Yes. But maybe I'm biased, because I designed the algorithm, I did the whole thing. So I asked one of the company's employees to come and look at them, just give me a first idea of which one is the bad one. And he figured out which one was the bad one.

Harpreet: [01:14:24] Yeah. It's really hard to say. I'm definitely not an audio expert by any measure, so that was just my open question, I guess. The other side of it is, when you're testing motors for quality, are you able to take a different approach and use something like a miniature dynamo or something to run the motor against?

Speaker6: [01:14:49] So their restriction is that we don't touch the production pipeline; we don't add anything to it. We already had some discussions before they even allowed us to put recording microphones next to the motors that are being tested. So they don't want us to touch anything. We cannot put an accelerometer on the wheel that drags the motor along the pipeline. You know, they don't accept anything; they want it to be an external solution, within the memory and computational power restrictions.
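To probe the point raised above about the mel scale baking human perception into the representation, one cheap experiment is to compute both a plain linear-frequency STFT spectrogram and a mel spectrogram for the same clip and see which one the model (or your eye) separates better. A minimal sketch, assuming a hypothetical clip path and arbitrary FFT parameters:

```python
import numpy as np
import librosa

def linear_and_mel_views(path, n_fft=2048, hop_length=512, n_mels=128):
    """Return two views of the same clip: a linear-frequency dB spectrogram and a mel dB spectrogram."""
    y, sr = librosa.load(path, sr=None, duration=5.0)
    stft_mag = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length))
    linear_db = librosa.amplitude_to_db(stft_mag, ref=np.max)         # no perceptual warping
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop_length, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)                     # mel-warped, "human ear" view
    return linear_db, mel_db
```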
Harpreet: [01:15:28] For certain. But before we get there, I'm curious, and if you have any inputs on this: does it have to be encoded as either audio or an image? I don't know what scale audio comes in, megahertz or whatever, but just imagine having a row vector that samples the audio at every given, let's just say, 0.005 seconds, and you just pull some statistic of that audio value. [01:16:00] So we have a row vector where each element is sampled at 0.005 seconds and takes the maximum value or whatever, and use that. I think the difference there might be that what they're perceiving when they're listening to it is in the spectral domain and not in the time domain. Yeah. So trying to pattern-match in the time domain doesn't quite line up the same way. Oh, I see. It's been a while since I worked with audio data, but something that worked the last time I did: there's this library called librosa, which you've probably used before. I would preprocess the audio before the mel spectrogram; there might be something you can do to make the sounds pop more. Okay, you did? The other thing.

Speaker6: [01:17:04] There's this thing, the fast Fourier transform: it gives you a number of channels, which is somehow correlated to the number of features; they're called bins. Are you familiar with that? So you choose how many bins you want, and those would be your features. I tried to tune that hyperparameter, I even automated it, and when you choose a large number of bins it captures more detail for the model, but you get a larger image, so it needs more computational power.

Harpreet: [01:17:40] Okay. And speaking of computational power, you said you needed a very big model and then it would work, [01:18:00] which suggests, not that that's the only way to solve it, but have you thought of making a big model and then quantizing it?

Speaker6: [01:18:03] I did. I tried int8 quantization, both quantization-aware training and post-training quantization, and neither was very helpful, because we are talking about their own hardware design.

Harpreet: [01:18:20] Yeah. And then there's another thing I thought of: perhaps the mel spectrogram does not capture it, because it's designed for human ears, and perhaps there are a lot of sounds that are not going to be represented there. So maybe you find another way of actually visualizing it. It could still be an image, but I don't know what that would be, because I haven't.

Speaker6: [01:18:48] I would.

Harpreet: [01:18:48] I would look for papers that describe that kind of sound and maybe how they dealt with it.

Speaker6: [01:18:54] I thought of going backward to double-check what you're saying: I did an inverse mel spectrogram, so I reversed the mel spectrogram back into audio and then compared both. It's quite good, to a very high degree, so it is representing well what I'm hearing. But in the end, we could just agree that this is a computational power restriction, and that's that, you know.
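Since post-training quantization came up, here is roughly what that step looks like with TensorFlow Lite, assuming the autoencoder is a Keras model and that a small set of representative spectrograms is available for calibration. This is a generic sketch, not the toolchain used for their custom board.

```python
import numpy as np
import tensorflow as tf

def quantize_to_int8(model, calibration_images):
    """Convert a trained Keras model to a fully int8 TFLite model via post-training quantization."""
    def representative_data_gen():
        # Yield a few calibration batches so the converter can estimate activation ranges.
        for img in calibration_images[:100]:
            yield [np.expand_dims(img.astype("float32"), axis=0)]

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_data_gen
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8      # quantize the input/output tensors as well
    converter.inference_output_type = tf.int8
    return converter.convert()                    # serialized .tflite flatbuffer

# Hypothetical usage:
# tflite_bytes = quantize_to_int8(autoencoder, X_good_spectrograms)
# open("motor_ae_int8.tflite", "wb").write(tflite_bytes)
```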
Harpreet: [01:19:22] Yeah, well, Greg probably knows the answer to that, so I'll let him go.

Speaker3: [01:19:30] I just had a quick question, or maybe you said it before: when you said you're comparing the big model with the small model and you could see which one is failing, how are you able to see which one is failing? Are you comparing how each one processes the input data, transforms it, and makes an inference? Or how are you able to see the difference between the two?

Speaker6: [01:19:54] Right, no, actually I wasn't very clear. I [01:20:00] compared the mel representations, one for a working motor recording and one for an anomalous motor recording. So I compared both images as data inputs, and I was able to see the differences, where it's failing for the anomalous one.

Harpreet: [01:20:25] Okay. Can I ask, when you look at the.

Speaker6: [01:20:33] The, like, the mel.

Harpreet: [01:20:34] Spectrograms side by side, what do you visually see as the marker for, hey, this is a good motor, this is a bad motor? Is it a dominant frequency, or are you seeing harmonic patterns, or what are you seeing?

Speaker6: [01:20:49] Actually, the harmonic patterns are the scary ones, because when a motor is stuck, it keeps generating the same frequency, so in the mel representation I'm going to see the same color over a long stretch of time. But when a motor keeps running while it's nonfunctional, you might not see a harmonic; you would see the, I don't know the English term for it, the sinusoidal pattern with high peaks. Yeah, the oscillations.

Harpreet: [01:21:29] Okay.

Speaker6: [01:21:30] And those oscillations are sometimes the normal thing, when the motor is turning on and off.

Harpreet: [01:21:42] Okay. Yeah, this one's definitely a tough one. It sounds like a weird crossover: you're kind of reaching the limits of what you can computationally pull out of the spectrogram, right? Yep. Yeah. Which is why [01:22:00] this is where I start suggesting: okay, the signal that's driving the motor, do you have access to that signal?

Speaker6: [01:22:08] No, not at all.

Harpreet: [01:22:12] Because there could be information there.

Speaker6: [01:22:14] Like some feedback. Yeah, exactly.

Harpreet: [01:22:16] Like EMF feedback, anything like that, that might be able to give you something. This sounds like.

Speaker6: [01:22:23] Yeah, I actually asked them if it's possible to fix an accelerometer below the wheels that drive the motor to the testing place, and they said: this is a big no, no way we would add anything to our existing pipeline.

Harpreet: [01:22:46] That, to me, is always an interesting requirement, right? When you're given a challenge to do something with nothing. Even something as simple as a couple of lasers and a slip-angle sensor kind of thing is probably enough to tell you the rate of movement of the motor itself without needing any kind of contact. So it's, yeah.

Speaker6: [01:23:15] It's always a challenge. Just so you know, the first discussion was about affixing a camera over the motor. That was their proposition, you know: using the camera, we want to know if it's working.

Harpreet: [01:23:28] Or not, you know.
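A small sketch of the side-by-side comparison described above, assuming two hypothetical clip paths (one known-good motor, one anomalous) and reusing the earlier mel parameters. librosa's specshow just renders the same arrays the model sees, which makes it easier to point at the harmonic bands or oscillation patterns when showing them to a colleague.

```python
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

def show_good_vs_bad(good_path, bad_path, n_mels=128, hop_length=512):
    """Plot mel spectrograms of a good and an anomalous motor clip side by side."""
    fig, axes = plt.subplots(1, 2, figsize=(12, 4), sharey=True)
    for ax, path, title in zip(axes, [good_path, bad_path], ["good motor", "anomalous motor"]):
        y, sr = librosa.load(path, sr=None, duration=5.0)
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels, hop_length=hop_length)
        mel_db = librosa.power_to_db(mel, ref=np.max)
        img = librosa.display.specshow(mel_db, sr=sr, hop_length=hop_length,
                                       x_axis="time", y_axis="mel", ax=ax)
        ax.set_title(title)
    fig.colorbar(img, ax=axes, format="%+2.0f dB")
    plt.show()
```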
Speaker6: [01:23:30] So we steered them toward audio recording, and even that was a tough one. So adding anything else would be a big problem for them.

Harpreet: [01:23:41] Could you look at, like, a magnetometer? Sorry, I'm geeking out here, but could you look at a magnetometer? Because, naturally, I'm assuming these are electric motors, right? Yeah. You would see some kind of magnetic field change based on the speed, right? Yup. [01:24:00]

Speaker6: [01:24:02] Makes sense.

Harpreet: [01:24:04] Let's hear from Ben on this, and then after Ben.

Speaker6: [01:24:13] Guys, I'm really sorry, I've taken a lot of time on this.

Speaker3: [01:24:17] You're okay. In another life, a long time ago, I used to do a lot of audio. At RV we built out automatic speech recognition systems in-house; these were Deep Speech 2. We had to train those models for over a month on tens of thousands of hours of audio. So one of the questions I have for you is, how big is your training set? How many unique spectrograms do you have? Is it thousands? Forty thousand, okay. And are you doing 8-bit depth on your spectrogram images, or the full 16-bit, when you send them into your model?

Speaker6: [01:24:54] I tried both.

Speaker3: [01:24:56] Okay. Is there any other data, in addition to the audio, that you could add to it?

Speaker6: [01:25:02] Nothing.

Speaker3: [01:25:03] Nothing. What time horizon are you using?

Speaker6: [01:25:08] Like 5 seconds.

Speaker3: [01:25:09] And did you experiment with that, doing longer?

Speaker6: [01:25:12] So usually the person working there needs 2 seconds to recognize it.

Speaker3: [01:25:19] Okay. So the human needs 2 seconds to hear it and recognize it. What's the human accuracy?

Speaker6: [01:25:29] Okay, so we tried this. It's very rare that a defective motor comes out, so we faked it: we added some motors that they know don't work very well. It was 100% human-level performance, across five of their employees who do this job, you know.

Speaker3: [01:25:50] Okay. But you said earlier that you felt like you could tell on the spectrogram.

Harpreet: [01:25:55] Yeah.

Speaker6: [01:25:56] I'm not sure, because I'm probably biased, but [01:26:00] that's probably the case. Yeah.

Speaker3: [01:26:03] And when you train your model on the 40,000 images, what model are you using? How long are you training for?

Speaker6: [01:26:11] I've tried many architectures. The last thing I did, I'm using autoencoders: a couple of convolutions and then a latent vector. So I take big convolutions, then smaller and smaller down to a latent vector, and then I rebuild the image again to obtain a new image. Then I compare both images with some similarity function. That's not the problem here; I know the similarity might be something confusing to compute, but that's not the problem. I compare the similarity between both, and if the similarity is high, this is okay, this is a normal motor; if it's not high, then this is an anomalous one.
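Here is a minimal sketch of the kind of convolutional autoencoder being described: encode the mel-spectrogram image down to a latent vector, decode it back, and score new clips by reconstruction error. The input size, filter counts, latent dimension, and the percentile-based threshold are all illustrative assumptions, not the project's actual settings.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def build_autoencoder(input_shape=(128, 216, 1), latent_dim=64):
    """Conv encoder -> small latent vector -> conv decoder back to the input size."""
    inp = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(inp)
    x = layers.Conv2D(16, 3, strides=2, padding="same", activation="relu")(x)
    pre_flatten = tuple(int(d) for d in x.shape[1:])
    z = layers.Dense(latent_dim, activation="relu")(layers.Flatten()(x))
    x = layers.Dense(int(np.prod(pre_flatten)), activation="relu")(z)
    x = layers.Reshape(pre_flatten)(x)
    x = layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
    out = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(x)
    ae = models.Model(inp, out)
    ae.compile(optimizer="adam", loss="mse")
    return ae

# Hypothetical usage: X_good is an array of normalized mel images, shape (n, 128, 216, 1).
# ae = build_autoencoder()
# ae.fit(X_good, X_good, epochs=50, batch_size=64, validation_split=0.1)
#
# Score new clips: high reconstruction error = suspected anomaly.
# errors = np.mean((X_new - ae.predict(X_new)) ** 2, axis=(1, 2, 3))
# baseline = np.mean((X_good - ae.predict(X_good)) ** 2, axis=(1, 2, 3))
# flagged = errors > np.percentile(baseline, 99)
```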
Speaker3: [01:26:52] Okay. We could take this discussion offline, but one of the things we could do, if you were open to it, is this: we do audio processing pipelines at DataRobot where we would take the spectrograms and run them through. What you get is, we do the unsupervised clustering for you and then the Grad-CAM visualizations for you. You can obviously do that yourself, but that might be a way for you to troubleshoot. Maybe 40,000 images is a lot for the spectrograms, maybe there are some quality issues, maybe there's something else in the data that could come out. So we could talk offline; I'm happy to run the data through our platform and see if it gets different results, and then we can dive into the modeling approach to understand what we did. Hopefully we can get to the bottom of it, because if you can tell in the spectrogram, that should be a huge giveaway that this should work very well; there's no reason this shouldn't be working. So yeah, if you don't mind, send me an email, ben at datarobot dot com, and we can.

Speaker6: [01:27:54] I'll just connect with you on LinkedIn.

Speaker3: [01:27:57] Awesome.

Harpreet: [01:27:58] Okay, thank you very much. [01:28:00] I appreciate that. Greg, go for it.

Speaker3: [01:28:03] Maybe a silly question: have you tried different types of models, like time series? Because when I hear motor failures, time series comes to mind, especially for motor vibration. There are a lot of signals there; a motor vibrates depending on the oscillation over time, and you can build time series models that can give you more powerful signals for future failures versus analyzing images. I don't know if you've tried different approaches like that.

Speaker6: [01:28:37] So, in the recordings I have, sometimes the human needs to turn the motor on and off multiple times to double-check that it's working. So this flow of turning the motor on and off shows up in different places in the spectrogram; that's the first thing. The second thing is that we don't have anything other than the recordings, that's it. So we're talking about how we would turn our recordings into numbers, into whatever type of data. When we're talking time series, what type of time series would it be? Just time-dependent frequency? That doesn't represent what we hear, you know.

Speaker3: [01:29:23] So, yeah.

Speaker6: [01:29:25] So the discussion is really about how to represent this audio file in an interpretable way in order to digest it. The R&D team at the company actually tried to extract features, as I said before, and it worked plus or minus fine, but that's not it.

Speaker3: [01:29:47] Yeah, because I was wondering, you know, maybe as a data collection or data annotation piece, you could use the human to identify that noise, um, and [01:30:00] then use that data to train the model, right? So put it on a timeline: every time they listen to it, instead of that two-second window, they pinpoint that noise, and you collect enough of that; if you put it over time, you'll see some patterns. Maybe if you perform that check for multiple motors over time, you'll have enough data for some sort of time series, maybe prediction of failure.

Speaker6: [01:30:31] So here you're talking about classifying anomalies, for example. Okay? Yeah.

Speaker3: [01:30:41] Using humans over time.

Speaker6: [01:30:43] So the problem is that we are trying to be problem-agnostic. We just want to identify the good ones, because there are many, many reasons why the motor might not be running. For example, if one of the screws isn't tightened well, it will start vibrating and making this weird noise, and there are many other reasons; there are like 2,100 parts in the whole motor. So there are various reasons why it might not work.
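Ben's offer involves DataRobot's own pipeline, but a rough, do-it-yourself version of the unsupervised clustering sanity check is easy to sketch: compress the spectrogram images with PCA and see whether the handful of known-anomalous clips land away from the bulk of the good ones. The PCA dimensionality and two-cluster KMeans here are assumptions purely for troubleshooting, not a reproduction of that platform.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def cluster_check(X_good, X_bad, n_components=50, random_state=0):
    """Project spectrogram images to a low-dimensional space and cluster them.
    X_good, X_bad: arrays of shape (n, height, width) or (n, height, width, 1)."""
    X = np.concatenate([X_good, X_bad]).reshape(len(X_good) + len(X_bad), -1)
    labels_true = np.array([0] * len(X_good) + [1] * len(X_bad))   # 1 = known anomaly

    Z = PCA(n_components=n_components, random_state=random_state).fit_transform(X)
    clusters = KMeans(n_clusters=2, n_init=10, random_state=random_state).fit_predict(Z)

    # If the known-bad clips mostly fall into their own cluster, the representation carries
    # the signal; if they spread evenly, the problem may be upstream of the model itself.
    for c in np.unique(clusters):
        bad_frac = labels_true[clusters == c].mean()
        print(f"cluster {c}: {np.sum(clusters == c)} clips, {bad_frac:.1%} known anomalies")
```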
Speaker3: [01:31:14] But yet a human is capable of recognizing that particular noise that says it's failing, right? Or do you have enough data that showcases, you know, what the human is capable of hearing?

Speaker6: [01:31:26] For any kind of defect, we only have the 15 defective examples.

Speaker3: [01:31:31] You know, because that's the other approach I would take, right: let's try a different type of model. But over time you need to collect that data from humans and then use it for training. So that's what I have.

Harpreet: [01:31:45] Thanks. Excellent question, excellent discussion. I want to thank you so much. Hopefully there's something in there that was able to help you; at the very least there's Ben's offer, so hopefully Ben can point us in the right [01:32:00] direction. Let's go ahead and wrap this episode up. Thank you all so much for being here; I appreciate you hanging out for this second anniversary party for the podcast. Two years, two years of podcasting. So yeah, check out the episode released today with Natalie Nixon. I was also on a podcast just recently, The Data Scientist Show, so check that out. There's other cool stuff coming in the next few weeks, so stay tuned. I just finished my first Pachyderm club; cool, like a firehose. Awesome. Hopefully you guys will see me doing stuff from there more often. Thank you so much. You know what, I'm starting to just embrace it; I've been hiding behind this for far too long. Just embrace it. It looks good.

Speaker3: [01:32:53] Yeah, looks good, man. I do have a quick question for you. Yeah.

Harpreet: [01:32:56] Yeah.

Speaker3: [01:32:57] What is that avocado for, in your LinkedIn profile name?

Harpreet: [01:33:00] Yeah. So I'm in DevRel now, right? I'm still in data science, still in machine learning, but my role isn't data scientist or machine learning engineer; it's to enable machine learning engineers and data scientists. And the avocado is the universal symbol for developer advocates, right? Developer advocates, we're avocados, because we're the good kind of fat in an organization. That's kind of the philosophy behind it. You'll see a lot of people on Twitter who are developer advocates or in developer relations with that same thing in their names. So I'm embracing it, since that's the direction I've taken my career, because it is one hell of a career path to go down. One day I will be just like.

Speaker3: [01:33:49] Thank you for satisfying my curiosity.

Harpreet: [01:33:51] You got me, man. I'll probably put out more content about what DevRel is, what DevRel is all about. So if that sounds like [01:34:00] an interesting career path to you, I kind of described it at the beginning of this hour, and you'll be hearing me talk more and more about developer relations. If you are an open-source platform, if you're a startup that develops tooling for data scientists and engineers, you've got to hire somebody in DevRel. That's how DevRel functions: you spread the word and the message about the product, right? Traditional marketing doesn't really seem to work for these types of products and this type of persona, and that's where developer relations comes in.
I'll be putting more and more content out about that in the coming weeks. Guys, thank you so much for being here; I appreciate you joining me on the second birthday of the podcast. Awesome, awesome stuff coming up in the next few weeks. Obviously I'll be at ODSC in Boston, April 19th; hopefully you're there, the 19th through the 21st. I'll be at the Pachyderm table, so come through, we've got stickers and socks for you. And in terms of people taking over the podcast, St Lawrence is supposed to take over the podcast in a couple of weeks, so definitely check that out. I think the other takeover is at the end of the month or the beginning of next month at some point. Yeah, it's coming up on the 22nd. The 22nd, that's right. So I'm excited for that. You guys, thank you so much, I appreciate you being here. Remember, you've got one life on this planet, so why not try and do something big?