HH51-24-09-21_mixdown.mp3 (from OneDrive)

Harpreet: [00:00:08] What's up, everybody? Welcome, welcome to the Artists of Data Science happy hour. It is Friday, September twenty-fourth. Oh my gosh, the end of September. This is data science happy hour number fifty-one! That means next week is data science happy hour number fifty-two, which means I've been doing this thing for a year. One year doing these office hours. Obviously, I've been doing the podcast for a little bit longer; the podcast had its first-year anniversary back in April. I didn't start doing the happy hour sessions until about five months after that. So yeah, man, next week is the first anniversary of these. Hopefully you guys got a chance to tune into the episode I released with Dennis Will, who is a data engineer based out of Berlin. Had a great chat. I actually met Dennis not through LinkedIn, but Instagram. He is Azure Will on Instagram. He hasn't posted in a while, but his content is amazing. So hopefully you guys get a chance to connect with him. I don't know if anybody caught the episode with Lex Fridman this week. Not with me, but Lex Fridman had an interview with Travis Oliphant, who is the creator of NumPy and SciPy, and that was just a really cool episode, man. Really great way to get some history on the stuff that we use on a daily basis. It was a really cool interview; hopefully you guys get a chance to check it out. I'm about halfway through it and it's been amazing so far. Shout out to everybody in the room. Tom, what's going on? We got Eric. Eric is back. Eric, congratulations on the move. How was the move?

Vu Nguyen: [00:01:52] Good to be here, good to be back. You know, life's good, can't complain.

Harpreet: [00:01:55] Awesome, man.
Also, shout out to Kausthub, joining us from Down Under; Russell out there in the UK; Matt; and A.A., I forget where he's at, but also somewhere in Europe. A lot of countries in Europe, I can't remember them all. Well, I'm happy to have all you guys here. Let's go ahead and get started. So if anybody has questions, whether you're watching on LinkedIn or on YouTube, or even right here in the room, feel free to drop your question in the chat or the comment box, wherever it is that you are, and I'll be happy to get to your question. But until then, man, let's go ahead and kick this off. Oh man, we've got an old friend of mine in the building. I used to work with Vu way back in the day, at the first job I had during and after grad school, at this company called The Warranty Group. That's what it was called. Good to see you here, my friend. But yeah, man, let's go ahead and get started. Anybody with questions, let me know. I came super unprepared; I don't even have a question to kick us off and get started with. So if anybody wants to take responsibility for that, I'm more than happy to pass that responsibility to you. Otherwise, I'm just going to keep on talking and filling up dead air until somebody's got a question, which I'm sure nobody wants.

Vu Nguyen: [00:03:17] It's my first office hours. Do people, like, drink beers or anything during it?

Harpreet: [00:03:23] Yeah, yeah. Typically just me. I think Russell might be having a beer. I'm not sure who else is, but we just hang out and talk about data science related stuff.

Vu Nguyen: [00:03:32] Sounds good. I'm going to grab one, man. Yeah.
Harpreet: [00:03:39] What time is it over there?

Kausthub: [00:03:42] Seven thirty. Maybe a bit soon for a beer, I think.

Harpreet: [00:03:46] Yeah. I mean, depends. It depends what your lifestyle is like, man. Matt's asking if I'm a Niners fan. Yes, I am, and I absolutely love the 49ers.

Vu Nguyen: [00:04:02] You a Niners fan too, Matt? That's great. That's impressive, all the way out in Canada. And I'm on I-80 right now, so I'm happy to see that.

Harpreet: [00:04:12] Yeah, man. I don't know if people know this or not, but I'm actually from Sacramento, California. Born and raised in Sacramento; that's my hometown.

Vu Nguyen: [00:04:21] Oh, no kidding. I know the I-80. I'm on my way there right now.

Harpreet: [00:04:25] Oh, to Sacramento?

Matt: [00:04:27] Yeah, I'm driving there right now.

Harpreet: [00:04:29] Yeah, right on. Well, just tell everybody I said what's up. I think Mark's actually from Sacramento as well. Yeah, man, let's go ahead and get this started. All right. So the question I want to kick off with, I guess: I've been heads down thinking and talking and writing about experimentation management, you know, just part of my job. And I'm wondering, before you guys really started doing hardcore machine learning and really building out models, how were you managing your experiments? Let's start with Tom, because I'm sure Tom has some great insight; that's typically what happens when you've got white hair, is you get good insight.

Thom: [00:05:22] Yes.
Each of these hairs that are gray, which seems like it's all of them now, each turned gray because of something I learned the hard way or the stupid way. I come originally from doing mostly physics-based modeling. When you meet an engineer that does a lot of predictive modeling, they do something very similar to what we do, which they would call empirical modeling, or design of experiments, or factor analysis with ANOVA. Very, very close to data science type modeling, but with very different lingo. And so when you're talking to them, you have to just be patient, like: OK, there are going to be semantic differences, stuff like that. But what was key is there was always a methodology. And when I started migrating to more and more data science work, I started looking for the methodologies and realizing: oh, this is so new, everyone's not necessarily communicating the wisest methodology in machine learning pipeline development. So I just started collecting them. And over time I saw how cyclic they could be when you're developing a machine learning solution, too. And then it was quite freeing. It was almost upsetting at first to realize: oh, you mean we actually use models that aren't in the ninety-five-plus percent accuracy range? And then it dawned on me: wait a minute, I'm spoiled by the engineering realm. If you get a model in place and you had no model before, sixty-five percent accuracy is a godsend; then you can improve things from there as you collect more data.
And so it was quite a bit of a change, mentality-wise. But really striving to capture the concepts that a data scientist has to operate in, the ones that are different from an engineer that models things: that was the first big thing. And then, once the concepts were there, really mastering a methodology. That was key, I found.

Harpreet: [00:07:44] About methodology real quick, I've got a question on that. There are many, many methodologies out there. Is it important that we all have the same methodology, or is it just important that when somebody approaches a problem and solves a problem, their methodology makes sense and is coherent? So is methodology something that is written in stone? Is it flexible? Is it problem dependent?

Thom: [00:08:10] Well, forgive me for leveraging wisdom from another field, but I'm going to lean over to Chuck Norris. If we were learning martial arts, he'd say: OK, go study one of the ancient, tried-and-true disciplines first, like maybe taekwondo, for example, and then, once you're good at those basics, then you can try some new fancy things. So here's what I'm getting at. If one of us was getting into car mechanics, there may be someone in our group who'd say: oh, let me tell you what tools you need initially and what things you need to focus on for the basics, so you won't get lost when you're working on cars. Yeah, you're going to work on something unique every once in a while. But if you know those basics and you know how to look things up, then the world's your mechanic... or oyster. It's the spirit of: if you build your shop, and you get the right tools in there, and you get the right manuals and the right learning resources, you can go at anything.
I think that's where new data scientists need to get: understand the concepts, get the tool sets that you like, get your basics. And yeah, every once in a while you're going to have to go learn something new. But if you know those basics, they're going to really help you. And I just want to give a shout out to my top data science student. He approached me months ago, and now we're like brothers: Greg Coquillo. He's an outstanding data science student, but obviously he's a leading integrator of data science into the business realm, and that's where I lean on him myself.

Harpreet: [00:09:58] Thank you very much, Tom. Great, great insight there. I would love to hear from Kausthub about this. So the question I'm asking pertains to methodology: is there one methodology that we should all use, or does methodology just need to be sound and reasonable from one practitioner to the next? Is it problem dependent? Is it industry dependent? Let's pick it up from there. After this, I'd love to hear from either Greg or Joe. Also, shout out to some new people that joined in, Joe being one of them. What's up? Vivian is in the building; I'd love to hear about Vivian's new job and how that's going. Also, we've got Matt Housley joining in, so I'd love to ask him this question as well. But Kausthub, my friend, go for it. By the way, everybody listening, we are taking your questions, so drop them in the chat, drop them in the comment section. I'll keep an eye out for them.

Kausthub: [00:10:56] So I guess, considering we've got so many Star Wars themed backgrounds,
I'll simply go by the axiom that only a Sith deals in absolutes, right? I mean, everything is a shade of gray. So when you're talking about methodologies and approaches, it's really what works for that particular business. What I've found, working across a few different companies (I'm only pretty early in my career, but I've worked across three or four different companies), is that they're all different sizes, with different mentalities, and their mentality specifically works for them. That's what makes them companies that work out well. And the same thing you can apply to your approach as a data scientist or your approach as an engineer. What works for one set of problems, like what worked in robotics, may not work in a non-robotics-related data science area, right? Sorry, guys.

Harpreet: [00:11:53] Right, appreciate that, Kausthub. Thank you so much. Let's go to Joe for this one. And by the way, if you guys are watching on LinkedIn, do me a favor: go ahead and hit share. Share this with your network, let people know that this is going down. Joe, go for it.

Joe: [00:12:10] Oh, hello. Yeah, so I think it's an interesting question. I mean, I'm not sure what spawned this; I sort of showed up a bit late to the party. Can I get some context on what prompted the discussion on methodology to begin with?

Harpreet: [00:12:24] You know how it goes. Someone says something and then it sticks out to you.

Joe: [00:12:28] Yeah, yeah. So, you know, methodology is an interesting thing, because as you know, my background has been a lot of different things. And here's what I've realized: there's not a universal methodology for anything.
I would say the best methodology is to adopt a lot of different methodologies and use them on a situational basis. It's sort of like, you know, Tom mentioned Chuck Norris, and I'll mention Bruce Lee and the Tao of Jeet Kune Do. He was an early adopter of what I guess is now MMA, and I kind of approach things like that, where one style is great, except it doesn't really work all the time. So I think being dogmatic and having one methodology actually works counter to you, especially in a world that changes this quickly. And it revolves around so many different mental models; the more mental models you can adopt, the more methodologies, the better, actually. Not to belabor this, but it also provides a competitive advantage: if you're in a room full of people who only know one thing, and you know how to approach things from, like, twenty different ways, who do you think's going to have a better outcome?

Harpreet: [00:13:45] So, great flexibility.

Kausthub: [00:13:49] Doesn't it come down to your ability to see that a situation doesn't respond well to a particular methodology, right?

Joe: [00:13:59] Exactly. You know, I got this from Charlie Munger, Warren Buffett's partner. He's sort of attributed as the kingpin of mental models. I think he says you need about ninety mental models to be effective in this world.

Kausthub: [00:14:17] So my question for you on that, Joe.
Is it more important to know which mental model to use, or more important to know when the mental model you're using isn't working?

Joe: [00:14:28] Both.

Kausthub: [00:14:30] If you could only have one rather than the other?

Joe: [00:14:33] Both. I mean, they're really two sides of the same coin. It's like: I need to know which one to apply to this situation, and I also need to know the limits of my mental model, to know when I don't need to use it.

Harpreet: [00:14:43] Yeah. And just for everybody listening out there, a quick definition of mental model: personal internal representations of external reality that people use to interact with the world around them, constructed by individuals based on their unique life experiences, perceptions, and understanding of the world. That's great. Let's switch the conversation to mental models now, since we're on that. So, talking about mental models: what are some mental models that we should probably keep in mind as data scientists as we're doing our work? Is there something, maybe a mantra or a mental model, that you apply regardless of what the problem is; a universal mental model, I guess? We'll start with Joe on that, then I'd love to hear from Matt. And there's a bunch of people popping in; we'll get to all of you guys. If you have questions, let me know. I'm keeping track in the comments and in the chat.

Joe: [00:15:37] One thing I also took from Charlie Munger (if you can't tell, he's like my favorite person on the planet), the mental model I took away from him, was simply: invert everything. So if you hear a question or a statement, what if you flip it inside out?
What does that look like? I think inversion is the most underrated and most powerful tool you can find out there, and that's how I would approach any problem, at least to start out with. Don't take anything at face value; flip it on its head and see what it looks like.

Kausthub: [00:16:08] I mean, that's basically one of the tenets of predicate logic, right? Like proof by negation.

Joe: [00:16:15] I mean, if you want to talk about proofs for a bit... Matt's the math professor.

Harpreet: [00:16:25] Real quick, though: can we get a concrete example of inversion, or maybe a concrete example of how you use inversion when you're faced with a data problem? How should we think about that when we're working?

Joe: [00:16:38] I mean, it's a simple thing, isn't it? Somebody makes an assertion, right? Like: this is what I propose. Well... give me a data science question that comes up a lot, for example, and maybe I can try and invert it for you.

Harpreet: [00:16:57] Yeah, how about this: which algorithms should be used to solve this binary classification problem?

Joe: [00:17:10] So I would probably flip it on its head and ask: what approaches would not work for binary classification?

Harpreet: [00:17:16] OK, nice. All right. Yeah, awesome. Thank you. Greg, go for it.

Vu Nguyen: [00:17:24] I totally agree with Joe on inversion. This is definitely a mental model that I apply when I build roadmaps.
Because you enter this framework where you're listing all of these great ideas that you feel will be transformative, based on feedback that you receive from the world, and you're excited about these ideas. But you don't realize that you're putting yourself in a corner, in this bubble where that idea is a very best-case scenario. So to keep yourself from staying in the bubble for too long, you have to invert each of these things by coming up with things that would go against the ideas you're coming up with. So you come up with idea X, Y, Z, and you create a list of things that might convince you that something else can work better, or something much simpler can work better, so that you don't spend too much time on these big ideas that you think will change people's lives when something much simpler could have done the job in the first place. So when you're building that roadmap, you have to constantly question yourself and evaluate the things that would go against it, to make sure that you're hitting on the right points. That's one mental model that I always keep when it comes to defining what the future needs to be for a technology or a tool, et cetera.

Harpreet: [00:19:08] Tom, go for it.

Thom: [00:19:12] Just briefly, this whole talk actually relates very closely to my favorite talk that I give, called Integrating Brilliance. I gave it a few days ago with your help on the Q&A, Harpreet, and I'm giving it again soon at Future Data Driven. I got inspired by tracking the growth of math and science thinking over the centuries, and I wanted to see: was there a pattern that caused the big jumps? A big key when we're integrating brilliance is to say, you know, wow, control system design, that's really cool.
But if I abstract it and really understand those concepts very well, I can apply the general principles to other areas of my life, to other areas of math and science. So I like a lot of what Joe was saying about how we only need X number of models to really make it in this field. The models are patterns to help you think; they're patterns we've seen over and over again. But I like what Russell's saying too: don't be ironclad. There are two problems: too much trust in the model, and misapplication of the model. I think the biggest problem with logic is people thinking they're being logical and not applying logic correctly, and I'm saying that of myself too. We need to always be suspect: well, yeah, I've got this great model, but am I really applying it well? We know the pain of troubleshooting our own code, even when we're importing modules from scikit-learn. So it happens; we misapply things all the time. We have to hold ourselves suspect constantly. But we sure make a lot better progress when we take this wisdom from the ages, abstract it, and try to leverage it, rather than just shooting from the hip all the time.

Harpreet: [00:21:24] Thank you, Tom, thank you. Kausthub, go for it.

Kausthub: [00:21:27] So as someone who's newer to all this and doesn't have all the gray hairs of experience, right: is there somewhere I can start looking to learn more of these mental models that specifically apply to data science? Obviously I'm going to pick them up as I go through my career, through experience, right? But how do we fast-track that? You guys spent years learning all these mental models, right?
How do we then pass that on? I mean, anytime we develop some kind of knowledge, we pass it on through teaching and through learning, right? So how do we do that? How do we do the same thing for mental models? It's not a technical skill, right?

Harpreet: [00:22:10] Go for it, Greg. Something was beeping like crazy.

Vu Nguyen: [00:22:15] Oh, I'm not sure what that was. So, I'm assuming your question is valid whether you're a data scientist or not, right? To me, it's about pulling these mental models from your experience as you go through things, whether in your professional life or your personal life. If you go through things that teach you certain lessons, you create mental models one way or the other. Another way is to talk to people who have been there before. One of the ways you can do that is to come onto a platform like this, talking to folks who have been there before, who have seen successes and failures and created their own mental models; you can learn from those to inform your own. When you're in a working environment, you're talking to your peers, your manager, your mentors, and they will give you things that will help you create your mental models. And those things should constantly be evaluated, right? You may forgo some that don't help you progress throughout your career, and keep some that do, and you'll have to pull some out of your hat depending on the situation you're facing. To me, it's about learning on the go, talking to people, and standing on the shoulders of giants; then you make your own and you move forward.

Harpreet: [00:23:52] Greg, thank you very, very much.
Joe: [00:23:54] I would also add: the way Munger describes developing mental models is to just read a ton, right? And read in areas that are outside of your normal discipline; that's where he got a lot of his mental models. It wasn't like he had a Mental Models for Dummies book or something. I mean, he's also, I think, one of the smartest people on the planet. But that notwithstanding, you just have to have a natural curiosity and read outside of the normal stuff you typically read. So if you're doing data science, I would say read stuff out of left field too. I don't know what; it could be anything, really. But it's about developing the habit, developing a natural curiosity. There's a notion of the compound interest of knowledge, right? Actually, there's a really good book here, called The Joys of Compounding. I can't really show it with my stupid background with the Star Wars guys over here. But in all seriousness, the whole notion is just: develop compound knowledge over time. That is your biggest investment. There's no shortcut; you're not going to get a hundred mental models in a day, or even a year. This takes an insane amount of time to build. And there's no one direction to choose, either. But Munger would bring up things like: know the basics of chemistry, know the basics of physics, know the basics of psychology; the basics of all these big ideas in the world, right? That's what shapes human knowledge.
And to him, that ultimately creates what he calls a Lollapalooza effect, where all of a sudden, because you have all these different ideas, you're able to synthesize new ideas that nobody's ever thought of before. But this is completely individual; there's no one way to do it. I read a ton. I probably read one or two books a week, in addition to a ton of articles, because that's just how I've been wired since day one. Not everyone's going to do that, but the most important thing is that you make the investment to learn every day. Even if it's twenty minutes, you're still better off than you were the day before, depending on what you're learning. Just don't read, like, QAnon or some crazy shit like that.

Harpreet: [00:26:09] To start with, you read what you love until you love to read, right? That's a good way to develop a habit of reading. Eric says he likes listening to Alan Watts to think about life. Alan Watts is awesome. You should listen to the Akira the Don and Alan Watts albums that are out there; they are friggin' phenomenal. I'll send you a good one in a second. The Joys of Compounding, by Gautam Baid: I'm going to add that to the cart, you know, get it delivered this weekend at some point. Eric has a question; let's go to Eric. Eric has an actual data science question, but you know I'm about these philosophical discussions. I love this shit. But Eric, let's get back on course.

Vu Nguyen: [00:26:50] Yeah. So let's see here, a little bit of background.
So my question is about Bayes' theorem, and the explain-it-like-I'm-five version of it, because I have a little bit of exposure to it, but I always get posteriors and priors turned around in my mind. And the reason I'm curious about it is because I read a really interesting article, I think from a few years ago, about how LinkedIn uses, or used (I'm sure they've updated it by now), an algorithm for detecting spammy accounts just by using your name. I guess they supplemented it with email, but they got really good results just by using your first and last name, so they don't need a lot of other information about you. And the paper they shared had them using a naive Bayes classifier, and they broke down the words, the names, into trigrams, including a start character. So if your name is Eric, it would be like: a dollar sign for the start character, then "Er", then "ri", and so on, and then a slash or something for an end character; a beginning and an end character. And I don't understand how a naive Bayes classifier works, and I was hoping to get the explain-it-like-I'm-five version of that from somebody here.

Harpreet: [00:28:15] Let's hand it to Matt Housley, if he's still in the building.

Joe: [00:28:19] Is he in the building? I'm in the building. I don't have a super good explanation, actually. Maybe I can prepare something for next week. But the original Bayes' theorem is actually just set theory.
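(Editor's note for readers following along: the start/end-marker trigram scheme Eric describes can be sketched in a few lines of Python. The "$" and "/" markers and the helper name are illustrative choices for this sketch, not LinkedIn's actual implementation.)

```python
def name_trigrams(name: str) -> list[str]:
    """Break a name into character trigrams, padding with a
    start marker ('$') and an end marker ('/') as described above."""
    padded = "$" + name.lower() + "/"
    # Slide a window of width 3 across the padded name
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

print(name_trigrams("Eric"))  # ['$er', 'eri', 'ric', 'ic/']
```

Each name thus becomes a small bag of overlapping three-character features, which is what the classifier actually sees instead of the raw string.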
I won't try to go through that now, but you can basically figure out the original Bayes' theorem just by using diagrams. But yeah, does someone have a better explanation of naive Bayes classifiers and how those work?

Harpreet: [00:28:43] Yeah. Tom, or Andrew, or anyone? Just an ELI5 of naive Bayes.

Thom: [00:28:53] Well, the prior probability represents what is originally believed before new evidence is introduced, and the posterior probability takes this new information into account. Guilty of Google searching that.

Harpreet: [00:29:10] So I guess, let's drill down a little bit further, Eric. Is there a specific part of the naive Bayes algorithm that's giving you a headache?

Vu Nguyen: [00:29:21] I guess I just don't understand how a naive Bayes classifier is classifying. I can't figure out what pieces of those trigrams, or anything, it's taking in to update its model every single time, to train on, you know what I mean? Like, I get the idea with a linear regression, where you've got your errors and you're minimizing your errors, but I just don't understand what's happening with naive Bayes.

Joe: [00:30:04] Do you understand what's happening in Bayes' theorem itself? That's good, because I would use that as the basis.

Vu Nguyen: [00:30:12] I wish I did, right?

Joe: [00:30:13] So I'd start there. Basically, you're just using Bayes to classify something at the end of the day. I mean, it literally is that, right? But in order to understand naive Bayes, you need to understand Bayes' theorem.
Joe: [00:30:25] I think that's sort of what Matt was talking about, and what Thom was talking about. So how much probability do you understand?

Vu Nguyen: [00:30:35] I'll tell you when I get stuck. OK.

Joe: [00:30:39] How would you describe this, man? I mean, set theory, I think, is a good basis for this. Maybe we don't need to get too complicated, but if you want to explain it like he's five, anyway, right?

Thom: [00:30:48] Just real quick, Eric. Let's say you're looking at a brand new field of data and you've got all these possibilities. But then one of those possibilities happens, and that possibility just kind of narrowed the field of what's possible after that. So when you get down to it, Bayes is like saying: what's the probability of something happening, given something that's already happened? Now, with these n-grams, it's just a chain of that. And this prior, oh boy, I can hear myself, anyway, the prior is kind of saying: based on what evidence I have, I'm going to make a guess. But the posterior is like saying: no, look back, now I'm going to use hindsight. Now I'm going to borrow some great wisdom, and I'm sure he got it from someone else, but: Bruce Lee. When he started out, a kick was just a kick. And then as he wanted to improve his knowledge, a kick was, yeah, exactly, thank you, Harp, a kick was so much more than a kick. And you know that thing about: I don't fear the man who knows ten thousand moves, I fear the man who has practiced one move ten thousand times. It's that spirit. So Eric, what we're sharing with you today is going to help.
Thom: [00:32:27] But when you really get in there and sweat and bleed through coding it from scratch, with the math and everything, without modules and stuff, and then you get to the end of it, you're going to look back at this conversation and go, oh, it's like what Joe and Matt and Tom were saying. It's just the probability of something happening given that these other things already happened. But now you'll be seeing it with the fog cleared. You've marched through the details of that canyon and gone through all those obstacles numerically, and now you're looking back and going, oh, OK.

Joe: [00:33:06] You start noticing Bayes around you all the time, though, right? So when you look outside your window, for example, you look at the weather. That's the classic beginner example when you talk about Bayes. OK, so it rained yesterday, so what's the chance it's going to rain today? What's the chance it's going to be sunny tomorrow if it rained today, and so forth? Because you're basing a prediction, an outcome, on past probabilities and what's happened. But are you familiar with what conditional probabilities are? A conditional probability is basically the probability that something may happen based upon something else. Coin flips are a good example, and so forth.

Vu Nguyen: [00:33:51] Sure. I was going to say, I always see it the same way too. For Bayes, my go-to use case is kind of like: what's the probability that I will bring my umbrella out today or tomorrow? And this is going to be based on the probability that it will rain. Right.
So if the chance of rain is high, then the probability of me bringing an umbrella would be high. That's how I understand it. And in the use case you're describing, it seems like they're looking at the probability of an account being fraud based on the structure of the account, or the past structure of accounts that were fraud, something like that. It's something I've never tested myself, but it's cool that you bring it up. It makes me reinforce my understanding of that notion. So thanks for that question. That's really cool.

Vu Nguyen: [00:34:53] So if I'm understanding, thinking back on the names thing: if we're taking little trigrams, then we're looking at these little chunks of three letters, could be four or five, but let's just say three. So we'd be taking these chunks of three and looking at 60 million names' worth, which is going to be well over a hundred million little inputs. And then is it essentially just making the model more and more confident with each one? Like saying: this is a good one, this is a good one, this is a good one; this person's name has "J.H.J." in it, it's probably a bad one. So it's training it, building up that probability so that it recognizes those strings of letters. Is that just kind of the basic idea? Like what you said about naive Bayes: it's looking back at what it's seen before and then updating itself with the new information it's received?

Joe: [00:35:50] By conditioning on it, basically, yeah.
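The mechanism Eric is describing can be sketched in a few lines. This is a toy illustration, not LinkedIn's actual system: the names, labels, and smoothing constant below are all made up, and a real model would train on millions of labeled names.

```python
import math
from collections import Counter

def trigrams(name):
    """Pad with a start marker ($) and an end marker (/), then slide a 3-wide window."""
    padded = f"${name.lower()}/"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

# tiny made-up training sets standing in for millions of labeled names
real_names = ["eric", "maria", "harpreet", "monica"]
spam_names = ["jhjq", "xkzv", "qqqp", "zxjk"]

counts = {"real": Counter(), "spam": Counter()}
for n in real_names:
    counts["real"].update(trigrams(n))
for n in spam_names:
    counts["spam"].update(trigrams(n))

vocab = len(set(counts["real"]) | set(counts["spam"]))

def log_score(name, label, alpha=1.0):
    """Log P(trigrams | class) with Laplace smoothing. The 'naive' part is
    treating each trigram as independent given the class."""
    c = counts[label]
    total = sum(c.values())
    return sum(math.log((c[g] + alpha) / (total + alpha * vocab))
               for g in trigrams(name))

def classify(name):
    # equal priors assumed, so the posterior is proportional to the likelihood
    return max(("real", "spam"), key=lambda label: log_score(name, label))

print(classify("erica"))  # shares trigrams with the real names
print(classify("xkzq"))   # shares trigrams with the gibberish names
```

This is exactly the "updating with each example" Eric asked about: training is just counting trigrams per class, and classifying a new name is multiplying (summing, in log space) the per-trigram probabilities.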
Joe: [00:35:52] I mean, what maximizes the probability of this incoming thing occurring, right, is what you're trying to get at. So if you don't understand Bayes, a really good example that I think hits home in today's world is to study how Bayes is used to determine false positive COVID tests, for example. Or false negatives, either one. False positives and false negatives are exactly where Bayes comes in, and I think it's a very good way to concretely learn the concept. And then naive Bayes, once you understand Bayes, is just a matter of applying that to multiple instances of probabilities and classifications. So that's how I would do it. But I don't really like a lot of the descriptions I've seen over the years, because they make it seem more complicated than it really is. That's just me; Matt may have a different opinion.

Harpreet: [00:36:50] Just think in Venn diagrams all day?

Joe: [00:36:51] I guess so. Well, I mean, stepping back from naive Bayes to the broader use of Bayes theorem in probability and research, one of the concerns is that sometimes Bayes theorem is used in research papers to sort of assert snake oil, by using a really ridiculous prior that's not justified, and you have to watch out for that and ask: well, what happens if I tweak this prior? But using the naive Bayes classifier algorithm is more robust, because you're doing iteration and other processes that are meant to solve this problem. You'll also find, if you dig into this, that there are religious camps of frequentists and Bayesians. They like to have holy wars with each other.

Vu Nguyen: [00:37:28] It's hilarious. I have read a fair bit about that.

Harpreet: [00:37:32] Yeah.
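Joe's COVID-test suggestion works out in one application of Bayes theorem. A quick worked version, with entirely invented rates:

```python
# All rates below are made up for illustration only.
prevalence = 0.01           # P(infected): the prior
sensitivity = 0.90          # P(test positive | infected)
false_positive_rate = 0.05  # P(test positive | not infected)

# total probability of testing positive (law of total probability)
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)

# Bayes theorem: P(infected | positive) = P(positive | infected) * P(infected) / P(positive)
posterior = sensitivity * prevalence / p_positive
print(round(posterior, 3))
```

With these numbers the posterior comes out around 0.15: most positives are false positives, because the prior (the one percent prevalence) is so low. That is the prior-versus-posterior distinction Thom described, in one line of arithmetic.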
Harpreet: [00:37:34] But it looks like there is such a thing as a Bayesian frequentist. Here's an interesting paper you can read about it.

Joe: [00:37:44] That sounds like an interesting... schizophrenic or something. I want to read this. That's a great mental disorder. I don't believe you, I'm just kidding. But yeah, to what Tom was just saying about the mathematical explanation: the Venn diagram is really just, assuming two things are true, you're in both circles, or you're in one circle or the other, and it's just counting at that point. The theorem itself just comes from counting in a Venn diagram. That's it. I think he was, like, studying peas or something when he came up with this. Thomas Bayes, he was something like a priest, and I thought he was doing something as simple as counting something in a garden. Or am I getting confused with Mendel? Mendel, OK, yeah, I can't keep my people straight.

Vu Nguyen: [00:38:31] I think Bayes was a priest, but I don't think he was the peas guy.

Harpreet: [00:38:35] Yeah, that was Mendel.

Vu Nguyen: [00:38:37] Yeah.

Harpreet: [00:38:42] Eric's got us stumped, and it's further proof that you don't actually have to know every single thing about data science to be a data scientist. We still look shit up sometimes and then figure it out. But it helps.

Thom: [00:38:57] No one can do it all.

Harpreet: [00:38:58] It helps, yes, right? Does anybody else have a question? I mean, I've got a question, but, oh, actually, Matt, go for it. I see your hand's up.
Vu Nguyen: [00:39:10] Yeah, so I've been getting confused with PCA, principal component analysis, lately. I'm not very clear on how it works and how we use it for dimensionality reduction. Kind of like Eric, does anyone have a short explanation, something that can boil it down?

Harpreet: [00:39:29] Tom's hand is up, so let's go to Tom.

Thom: [00:39:35] All right, this is going to be more like a fairy tale, I hope you don't mind. So: do all the work you possibly can in original space. But when you're struggling with original space, there is a magical space called eigenspace. And in that space, you would be hard pressed not to find that all your features have become magically decoupled. But please don't make the mistake of thinking that, because all the features in eigenspace are decoupled, and you can figure out which of the PCA features can be removed because their eigenvalues are really small, that if you translated those PCA features back to original space, you've reduced your dimensionality. You've only reduced it in eigenspace, which, again, is this magical fairyland space. PCA is super powerful because it will decouple; it will remove all collinearity. That's what I'm saying. But it doesn't mean you've eliminated original features. And if you ever read a blog post that says you don't need to worry about the eigenvectors, they don't really tell us anything: please don't believe that. You need to use those eigenvectors, because your stakeholders will say: oh, that's cool, you found this magic space, but what does that tell us about the original space?
Use the eigenvectors to say: well, this PCA feature is composed of these original-space features, and that PCA feature is composed of those. Oh wow, that gets complicated. Yeah, but it sure has a lot of good modeling benefits, so it just adds to the burden of what you've got to explain when you go to PCA. But it can sure save you in a clutch situation. I hope that helps, and let me stay on to answer any questions you have after that, but that's kind of a fairy-tale intro.

Vu Nguyen: [00:41:46] OK, so focus on the eigenvectors and the eigenspace, instead of just running the PCA randomly.

Thom: [00:41:55] Just understand that when you go into the eigenspace, to which you've applied PCA and transformed your data, it's not the original space. It's a completely new perspective. And it takes explanatory energy to make that clear to any stakeholders. It increases the burden of our data storytelling about the machine learning pipeline development.

Joe: [00:42:30] It might be helpful to step back a little bit as well and just talk more broadly about the linear algebra problem. What tends to happen is: say I have 80 features and I just toss them into my machine learning model. It turns out that a lot of them might be correlated with each other in various ways. If you take a bunch of measurements of various types of objects, you might have correlations between those different measurements, because really the objects all have the same shape, and so if you increase the length, it also increases the width, for example.
So you're looking at cubes and taking measurements of the different sides and then throwing all those in as parameters in your model. And so, fundamentally, what this whole class of techniques is designed to do is to compress and remove that excess data that's correlated. In terms of linear algebra, a nice way to think about this is: suppose you have three-dimensional space and you have a plane inside that three-dimensional space. If you look at the coordinates, there are three coordinates for each point in that plane. So it looks like, oh, I clearly have three coordinates, or three features, that I care about. But really, it's just two-dimensional behind the scenes. And in some sense, what PCA is designed to do is find that plane for you, find the subspace that really counts. It's more statistical than that, because you're not looking for an absolute plane, you're looking at correlations and things, but fundamentally the idea is to reduce out correlated features where possible. There are a couple of other important techniques like this for dimension reduction. If you've heard of manifold learning, manifold learning is a similar idea, except that instead of assuming your plane passes through the origin, you can have any kind of surface. It could be a sphere or something else, but you've still got a surface inside a higher-dimensional space, and you're trying to find the surface itself and kind of project down. Another one that's very popular in test design: when they design the ACT or the SAT, they use something called factor analysis.
Once again, it's another dimension-reduction technique, and I don't totally remember what the differences are. It seems like factor analysis is way less popular in data science, but I don't remember the reasons for that exactly; I'd have to go back and look them up.

Joe: [00:44:37] Because it's not deep learning.

Harpreet: [00:44:40] Well, true.

Joe: [00:44:43] And the other way to think about it: in Matt's example, if you had 80 features and you needed to narrow them down to three, you're trying to find the three that have the most variance, and disregarding the other seventy-six, seventy-seven that are mainly correlated.

Harpreet: [00:45:03] So let's go to Kausthub, then Tom.

Kausthub: [00:45:10] So I guess, Matt and Tom and everyone else have been commenting specifically on the stakeholder side of this. When you're explaining it to someone who doesn't really enjoy getting into the linear algebra of it, I mean, the word "eigen" maybe scares them a little bit, right? Is it reasonable to start talking about your eigenspace features as effectively a proxy for their real-world features? Would it make sense to explain to them: I'm taking all of your 80 to 90 different concerns, all of these different data points that are coming through, and I'm generating these simplified proxies for some of them, and then we do our actual clustering or classification based on these proxies, which inform us about your 80 to 90 features at a resolution that makes more sense for the model?
Harpreet: [00:46:01] This is how, if I had a stakeholder asking me how it would work, I would explain it. Look, man, I'm trying to build your machine learning model to answer your questions, solve the problem. But this is a ton of features. They're all kind of useful, but they're all useful in a lot of different ways. So what I did was, instead of having 80 features to deal with, like 80 columns, that's too much for me to think about, I just kind of compressed it down to this space where I've got three or four. It's much more manageable. Not only that, if we use this smaller space of features, it captures a lot of the information that we would get if we had all of them. So I just boiled it down to this thing and made something from that, which is much easier to visualize and for us to get answers from quickly.

Kausthub: [00:46:50] But they'll naturally want to know about a specific feature and say: OK, so how does that affect this particular lever that I pull, you know, in sales or whatever?

Harpreet: [00:47:00] Man, I'd be hard pressed to find a salesman who would actually care about that.

Joe: [00:47:06] A salesperson asking that question is pretty badass, so...

Harpreet: [00:47:10] I'd be like, damn, bro.

Joe: [00:47:12] You might want to think about whether you're on the wrong team.

Vu Nguyen: [00:47:14] I was going to mention: stakeholders that actually ask specifically about PCA? That's pretty impressive if that happens. I haven't had that experience.

Joe: [00:47:25] We'll have to get Aaron Hunsicker on here. He works for us now, and he has really good stories about working with stakeholders, so we'll have to have him on next time. Yeah, he's cool.
Harpreet: [00:47:37] Yeah, absolutely, bring him on.

Joe: [00:47:40] He also opened for the Wu-Tang Clan back in the day, so he's even cooler than you think.

Thom: [00:47:48] I loved where Matt was going, because he's giving the explanation we would want to hear as fellow data scientists. The explanation I was giving was trying to help a stakeholder understand why we had to resort to PCA, and I probably wouldn't even use the words PCA, necessarily. But I think it's important for all of us to remember: when you first apply PCA, you are not at all reducing the number of dimensions. But once you get into magic space, I'm going to keep calling it that for a bit, you now have strength values called eigenvalues. You can realize: oh, if I order the eigenvalues by magnitude and do a cumulative sum, and I see that these last ten are adding less than one percent to the value of the model, I can just dump those. But what you can't count on is that, just because you dumped those, you've fixed the original space; if you went back to your original space, you'd find you have just as much collinearity, or more. It's like Matt was saying: that unique perspective that going into eigenspace gives you is just looking at the whole hyperspace in a way that gets rid of all the collinearity, which is magical, it's wonderful. But again, if you really want insight into which features are most important, you can't just say: oh, this PCA feature is more important. That won't really mean anything to the stakeholders. But if you say: the most important feature we have for the model is a combination of these, now that's valuable.

Harpreet: [00:49:36] Thank you very much. Excellent discussion, covered from a bunch of different angles.
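Thom's two moves, ordering the eigenvalues by magnitude to see which components can be dumped, then reading the eigenvectors to translate a PCA feature back into original-space terms, can be sketched with NumPy, doing PCA by hand on the covariance matrix. The feature names and data here are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
# three made-up "original space" features; width is strongly tied to length
length = rng.normal(10.0, 2.0, size=500)
width = 0.8 * length + rng.normal(0.0, 0.5, size=500)
weight = rng.normal(50.0, 1.0, size=500)
X = np.column_stack([length, width, weight])
names = ["length", "width", "weight"]

# PCA by hand: center, then eigendecompose the covariance matrix
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]          # order eigenvalues by magnitude
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# cumulative explained variance: how much the first k components keep
cumulative = np.cumsum(eigvals) / eigvals.sum()
print(cumulative.round(3))

# the eigenvector is what you show the stakeholder: the top PCA feature
# is mostly a blend of length and width, with almost no weight in it
for name, loading in zip(names, eigvecs[:, 0]):
    print(f"{name}: {loading:+.2f}")
```

The loadings loop is the stakeholder translation Thom described: "this PCA feature is composed of these original-space features," stated with actual numbers instead of the word "eigen."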
Matt, Matt Blaze, thank you for kicking that off. Any other questions or follow-ups on that PCA discussion? Yeah, I mean, when it comes to stuff like that, for me it's good enough to kind of intuitively understand or know the explanation. But fuck, man, if I was in those office hours myself and somebody asked me that question, I would not know how to answer. So thank you guys for being here and helping with that. Let's go to Mark. Mark has something for us.

Vu Nguyen: [00:50:16] Yeah, I don't really have a question, per se; just a huge thank you to this whole group. I had a big win at work, and it's based on all the feedback you all provided me, essentially hearing from you all, just like: hey, don't ask, just do, for certain things. That's essentially what I did, and also just reframing my hackathon into something positive. So essentially, for my hackathon: I've been asking engineering for the past year, like, hey, give me access to create datasets within our system. To give context, our data is really complex. We have our regular database system and then a homegrown kind of layer on top for additional security, and so it's not really straightforward. So for my hackathon I basically just taught myself how to go in and create my own datasets for my project. And when I presented that, everyone was like: oh my God, data science can do this? That's amazing. So the head of engineering approved my pull request to get it merged, but they also finally gave me read and write access to create datasets, which I've been asking for for like a year. So I can finally create our own datasets, which is really cool.
But more importantly, after my pitch, where I focused on the positives from my hackathon, my manager was like: yo, isn't this a top-ticket item? Spend the rest of the month focusing only on this hackathon project and see it through to the end. So that was just really cool. Shout out to all y'all's feedback; it really redirected me to the right path to have success.

Harpreet: [00:51:53] That's awesome, that's so dope, man. I'm super, super excited for you. And yeah, big shout out to everybody that comes every week, shares their wisdom, shares their knowledge, and just literally helps shape people's careers, people's journeys and trajectories and all that stuff, every one of you guys that shows up. It's been a while; I think everybody started coming around mid-October of last year. So for a year straight, people have been coming onto this thing every Friday and just dropping amazing knowledge. And it's so awesome to hear the positive effects of that. Congratulations, Mark. Looking forward to some more good news from you and everyone else. Any other questions?

Joe: [00:52:46] I have a question for people. I'm curious: what are people using these days to learn data science and machine learning?

Harpreet: [00:52:59] What do you mean, using in terms of software?

Joe: [00:53:03] Tutorials, learning materials, courses, books. What are people finding effective? What are people finding not effective?

Harpreet: [00:53:11] That's a good question. Me personally,
I like to read about it. The way I learn about stuff is: I'll read about it at a high level, find some hands-on example, start working through it, see where I get stuck, where something is happening and I'm like, what the fuck, why is that happening, how is that happening? And then I'll go back to the learning materials, try to dig deeper, try to get the underlying concepts, stuff like that. It's mostly books; I like books. My process is probably like this: I'll watch a few YouTube videos just to prime the pump, I don't know if that's the right phrase, but to get me in the mood for a particular topic. Watch a few videos to see: OK, great, this is what, let's say, a recurrent neural network is. Great, I see it, I got it. Then I'll go to the books and start reading about it, go through the examples in the books, and drill deeper wherever I'm just unconvinced of what's happening. I'd love to hear from other people.

Kausthub: [00:54:21] I guess, for me, I'm a conversational slash audio kind of learner. I learn faster when I'm in a conversation like this one, when I'm talking to someone one-on-one about something they're working on, when I'm listening to something. So for me to generate that level of interest and investment in a book, to go in and learn about this chunk of stuff, I kind of start from a YouTube video or from a Udemy course, because that's cheap and easy to access, right? So you access the Udemy course like that.
And then I watch it, and I'm like: OK, I get the high level. Now let's dig deeper and deeper. By then, I'm convinced that I need to go read papers about that particular topic, or drill into a book on that topic. And it's just going to vary depending on what kind of learner you are. I find that in those conversations I'm having at work, I learn a lot. Once every two weeks we've got a paper-reading kind of session where we share: OK, we've read this paper, here's what we think about it, and we discuss it. I learn so much more in that session than I do in the three to five hours of that week that I spend reading other papers that are out of my area. But I'm actually curious, Monica, as someone who's really invested in the education piece of it: what are your thoughts? How do you identify how you best learn, and how do you map that to the resources available?

Vu Nguyen: [00:55:43] Oh, for sure. This is really funny, because I literally posted something on different types of learning styles yesterday. So depending on whether you're a visual, auditory, or hands-on learner, you can go to different resources, and also depending on whether you have a specific question versus whether you're just generally curious about something. If you have a specific question, I usually Google it; that's the number one resource to go to. Stack Overflow is super helpful as well, or any smaller mini-courses that are more scoped out and can answer that question. Versus, if you're just generally curious and don't know anything about a specific topic, there are courses, like a data science specialty course, which covers nine different topics and all of that.
I also like tutorial-style sandbox sites. W3Schools; Mode is another one where you can just go in there and start doing something. Breaking things, I think, is a really good way to learn as well, to figure out how things are working on the back end.

Harpreet: [00:57:06] Mark, go for it.

Vu Nguyen: [00:57:09] So for me, how I quickly try to learn things, especially for machine learning or just data in general: I try to start really high-level and drill in really fast. For me, it's a matter of repetition. So I get a new concept, and I try to find the intro five-minute, quick, explain-like-I'm-five explanation of the concept. Then I'll go find something more in-depth, maybe a lecture or something like that. And then the next step, I go find articles, like Towards Data Science, something really high-level, or Analytics Vidhya, I'm butchering that name, but essentially that's the next step. And then from there, I'll go find textbooks. So now I have all the key terms, the words, and I go through the textbooks. It sounds like a lot, but I'm watching things at double speed, I'm skimming. The point is not to really comprehend deeply; it's just to get repetitions over and over again. That way, I have a set of resources and know where to look, and then where the real learning happens is when I try to implement a project. Implementing a project is where I really learn the most. But doing all the steps ahead of time gives me a plethora of resources to go back to, and also gives me a good foundation to move forward with.
So I'm not just randomly applying code from Stack Overflow; I can actually think through the problem. And then, to solidify it, and I'll do this every time even though it's very intensive: I like creating tutorials. I have a few tutorials on GitHub. Teaching others and mentoring others is where I really learn a lot, because to share, to explain to someone else, I actually have to learn and understand it and be prepared for questions.

Harpreet: [00:58:48] I think that's where that conversational aspect that Kausthub was mentioning really kicks in, just having to talk about it and, you know, bounce ideas around. Andrew's got some great responses here too. If anybody else wants to answer this question, let me know. Let's hear from Andrew, Andrew Troth; there are multiple Andrews, that's why I refer to him that way.

Andrew: [00:59:09] Yeah, just quickly, to some of the points, and to what Mark just said: whenever I have been asked to do a brown bag on a particular topic, I mean, I'm learning a lot out of interest and casting a fairly wide net, but when I have to put that into a training resource for a brown bag, or explain it to our CEO or some of our engineers, that is when I really go down the rabbit hole, and I come ready. I was doing one on NLP recently, and I had history going back to, like, the fifties. I went down a very deep rabbit hole, but it was enjoyable. And so, along with exploring various GitHub repositories, I've been very pleased with some of the training materials from ODSC.
The Open [01:00:00] Data Science Conference, particularly some of the analytics material; they have some really good professors, actually. I'm based in North Carolina, and at North Carolina State University we have the Institute for Advanced Analytics. A guy by the name of Mark Barr actually taught some of these courses, really interesting programs for fraud analysis; I think it's course 5861, I forget which one. And so those materials have been great. I really like, as some other folks have mentioned, starting on YouTube, getting excited about things to build some momentum; then you can get into reading some of the materials, exploring GitHub repos, and then sharing that with colleagues, getting conversations going, getting other people excited about it, and you can feed off of that. So that's been great on our end. And after this, if I may, I'd like to ask if anyone has implemented any proper graph databases, because I've been tasked with that recently, and I'm curious if anybody uses those in their data analytics. Harpreet: [01:01:13] Andrew, thank you very much. Graph databases... what was his name, David Knickerbocker, who I haven't seen in a while. David, if you happen to be tuned in, come hang out. Let's see, there are a lot of new comments coming in. And Mark says ODSC is how he got his first data science job; that conference holds a special place in his heart.
Harpreet: [01:01:39] Speaking of conferences, don't forget to sign up for the DATAcated Conference happening on October 5th. Be sure to be there; I'll be presenting at DATAcated. I'll also be presenting at the ML conference on October 15th, so hopefully you guys get to tune into that one. We'll get to your graph databases question, Andrew, but first, let's go back to Mark. Mark said [01:02:00] he had something interesting to share with us, and I'd love to see it while we're talking about the technical aspects of something. Go for it, Mark. Vu Nguyen: [01:02:09] I definitely wasn't planning on this question, because I was literally coding this in the middle of the call; I wasn't expecting my output to actually come out so quickly, because it's like a million rows. But essentially, if you all remember, I'm working on an imbalanced classification problem for predicting neonatal death within 28 days, a project I'm doing together with my mentees. And we finally have output. Last week we started with, you know, here's just a plain random forest model; we haven't done anything special to it, it's going to be a bad model. I've now implemented SMOTE, which is an oversampling method to account for the imbalance, and again applied the random forest model, no tuning yet. And so I got my output, and I thought it might be interesting, so you can actually see the graphs and the output, and just to get some feedback, because I have some ideas for next steps. It's still not at the level I want after doing SMOTE, which is expected because I haven't done any tuning. But before I go into tuning and whatnot, I'm just curious if there are any simpler steps that I should consider.
Like, you know, I can go down the rabbit hole and make it more technical, but it's always: how can I simplify the problem, or maybe simplify the data, in a way that might make things better? I've already done some feature importance, and I've already talked to stakeholders as well, who as subject matter experts gave me a good idea of some important values. But I was just curious, one, to show the output, and then also to have a cool conversation. Does that sound good to you all? Harpreet: [01:03:46] That sounds good, yes, yes. Vu Nguyen: [01:03:53] So let me share my screen. Can you all see my screen? Yes, absolutely. Awesome. So, essentially, [01:04:00] going back up: here's the model without doing SMOTE. This is the training data, so of course it's going to look nice. And then this is my validation after I split; I have train, validate, and test sets. So this is my validate set, with area under curve, confusion matrix, and then the various precision, recall, and F1 scores. I'm going to skip past this, because again, it didn't account for imbalance. Then I implemented SMOTE, and as you can see, before SMOTE this was the distribution of events, very imbalanced; after SMOTE, both classes are balanced for the outcome variable. So I retrained, and here is my curve again. For the training set I'm not really expecting much. So here is the random forest model: the area under curve increased. But interestingly enough, my F1 score actually decreased compared to the other model, which is really interesting. And the reason why I'm choosing the F1 score is because, in health care, there is a price for false positives and false negatives. False negatives would be the worst, but false positives would be bad too, because if we say this newborn is at risk of dying,
They'd undergo unnecessary procedures that put them at higher risk, right? So that's why I want the F1 score, to balance that. And so for me, I'm like, wow, I'm really surprised that the F1 score went down after doing SMOTE, and again, I haven't done any tuning. But based on what you're seeing (I can show specific other graphs if you like), what would be your next steps thinking through this problem? I literally just got this output, so I haven't had a chance to think about it, but I thought it would be great to talk through it and figure out next steps. [01:06:00] Harpreet: [01:06:01] Yeah. Tom, let's go to you; I need to step away for a quick second. But Tom, go for it. I'll stop sharing. Thom: [01:06:08] Yeah, just briefly. Great explanation, Marc, and I'm working on something related just for fun. I don't know if you saw my comment yet, but try XGBoost, which isn't always the best but is frequently very good; it has a class-balancing option in it. But regarding the metrics, I think we can over-focus on F1, especially in medical stuff. To me, recall is the bomb. Precision just means how tightly clustered things are, how well clustered they are; but recall, the individual recall for each class, that's your predictive accuracy when you get down to it. Vu Nguyen: [01:06:59] Can you briefly describe recall again? I always mix those up, precision and recall; I have to review them every single time. Thom: [01:07:05] In fact, I want to encourage everyone to just look through my recent post feed. I've been dealing with these things because I got frustrated myself, like, wait, let's put it in just conceptual terms.
So when you think of recall, it's what we've predicted correctly over everything we would have predicted correctly if it was a perfect model: the model we have, for predictive accuracy, over the perfect model's predictions, basically. And the F1 score is really just the harmonic mean, and I'll make that simpler. Vu Nguyen: [01:07:48] I'm actually reading your post on that now. It was really good, thank you. Thom: [01:07:53] Well, this is a little different. When you're talking F1 score, it's just the harmonic mean of [01:08:00] precision and recall. But sometimes we make too big a deal out of precision. So it's kind of good to break them apart: you can use what's called the F-beta score, which helps you weight those against one another. But quite frankly, I think what you really care about is the recall on each of those. And the precision, by the way, is just the number we got right over the total number we predicted as positive. Vu Nguyen: [01:08:37] Well, that's all... Thom: [01:08:39] Yeah, go ahead, Greg. Vu Nguyen: [01:08:41] Sorry about that. I think I empathize with what Mark is saying, because I've been in situations too where both recall and precision are important and you have to resort to F1. For example, in the health care piece, a false positive can hurt you as much as a false negative; of course, one is going to carry stronger risk than the other. But it's kind of hard. One explanation, a theory I have for why F1 went down, is that you gave the model more data in which to commit errors, right? So you're committing more false positives and more false negatives.
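Tom's definitions can be written out in a few lines of arithmetic. The confusion-matrix counts below are made up purely for illustration, not Mark's actual output.

```python
# Tom's definitions on a made-up confusion matrix (illustrative counts only).
TP, FP, FN, TN = 30, 10, 20, 940

precision = TP / (TP + FP)  # of everything we called positive, how much was right
recall    = TP / (TP + FN)  # of what a perfect model would catch, how much we caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

# F-beta generalizes F1: beta > 1 leans toward recall, beta < 1 toward precision.
def fbeta(p, r, beta):
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)

print(precision, recall, f1)          # 0.75 0.6 ~0.667
print(fbeta(precision, recall, 2.0))  # 0.625, pulled toward recall
print(fbeta(precision, recall, 0.5))  # ~0.714, pulled toward precision
```

scikit-learn ships the same thing as `sklearn.metrics.fbeta_score` if you would rather not hand-roll it.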
I'm assuming that's why it went down, but it's kind of a difficult thing to say, right? Once you know F1 is your best metric, how do you ensure that's what you need to go by? Or how do you ensure that, no, it's not recall, it should be precision? Thom: [01:09:49] So, that's exactly what you're struggling with. Yeah, for each problem, you really need to take the time to understand what's most important. [01:10:00] And once you do, once you look through the myriad of metrics that are available for confusion matrices, do take the time to really think about what each one really means. For example, when you look at the accuracy equation, it's kind of overwhelming at first, until you realize, oh, it's just the number of correct predictions divided by all cases. But accuracy alone... and accuracy, to me, is more important than the F1 score. I'm not saying the F1 score is not good; you've just got to put it in the context of what the real need of this particular classification problem is. Kausthub: [01:10:42] But I mean, that would vary based on the specific problem, right? Like, at the moment I'm doing a semantic segmentation kind of task where I've got significant class imbalance, and accuracy gets thrown off, because if you've got a multi-class problem with a real imbalance, where only 10 or 20 of your pixels are in one particular class and everything else is another, you might get 99 percent accuracy but completely incorrectly classify those 10 or 20 outlier pixels. Thom: [01:11:12] Yeah.
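Kausthub's accuracy trap can be shown in miniature with a hypothetical "model" that only ever predicts the majority class; the labels below are invented for illustration.

```python
# Accuracy paradox in miniature: always predicting the majority class
# looks accurate but never finds the minority. Labels are made up.
y_true = [0] * 990 + [1] * 10  # 1% positive class
y_pred = [0] * 1000            # "model" that always says negative

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
recall = tp / sum(y_true)      # recall on the minority class

print(accuracy)  # 0.99
print(recall)    # 0.0
```

Which is exactly why the discussion keeps coming back to recall, precision, and F1 instead of raw accuracy for imbalanced problems.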
What you said, Kausthub, it really depends. For a certain case, getting a false negative could be life threatening, whereas with a false positive, OK, you're going to get the treatment even though you didn't need it. But to not get the treatment or the procedure when you really needed it... so you'd tune that model to reduce those false negatives as much as possible, and that might actually lower your accuracy. But if you really lowered your false negatives? Awesome. Kausthub: [01:11:48] Right, and that's coming from the perspective of a screener being an important tool in that way. Vu Nguyen: [01:11:56] Yeah. A quick question I have, too: the reason you don't want to use [01:12:00] accuracy is because, with imbalance, you could just say everything's false, right, and you'd have high accuracy because it's super imbalanced. Well, after SMOTE, things are balanced now. So would accuracy then be an OK metric for just the training data set? My hunch is no, but I'm just trying to think through it. Thom: [01:12:20] It depends on your problem. It depends on your goals. I was about to say business goals, but yours aren't business goals; yours are medical goals, or something related. You've really got to define those goals with classification. Vu Nguyen: [01:12:37] And then one more thing, just thinking through this, and this kind of goes back to the business use case; I'll share my screen real quick. So again, we have our true negatives, which are good, our true positives, and then our false negatives and false positives, which are really bad. The argument I'm making is, thinking about the use case, the clinical workflow, we could argue that we could actually just throw out false positives completely, because we say, like, if we think that...
Actually, I'll take a step back. I had an idea, but after saying it out loud, it doesn't make sense; never mind. Yeah, because by the time you figure out it's a false positive, you've already administered treatment, you know? Harpreet: [01:13:26] Yeah, someone here might have some insight for us. Go for it, man. Yeah, I was going to say, have you tried upscaling and down... sorry, upsampling and downsampling, rather than... Vu Nguyen: [01:13:40] Can you define that real quick for me? Harpreet: Well, SMOTE is like a more advanced version of that. To balance the data when you don't have a 50-50 split, upsampling basically upsamples the minority class to balance things, and then with downsampling, you downsample the majority class to [01:14:00] the minority level, if that makes sense. Vu Nguyen: [01:14:04] Definitely. I actually wrote a note in here, I think; I've heard that referred to as oversampling. The note said the difference from oversampling is that SMOTE attempts to recreate the variance seen within the data set, whereas plain oversampling can result in overfitting. I remember pulling that from an article, and I'm reusing code from a year ago, so I don't know how true that is. But that was the reasoning: Mark from a year ago said that oversampling led to overfitting. But Mark from a year ago also knew less, so I'm curious if that's something other people have come across. Harpreet: [01:14:42] Yeah, I'm saying that because I was working on this one customer churn problem and I had the same issue, because it's an imbalanced data set. So I did try all three: I started with oversampling, then undersampling, then SMOTE. Surprisingly, for my case, I got better results, in terms of metrics, from undersampling.
So it depends; you have to try them, and then you can figure out what's working for you. Vu Nguyen: [01:15:11] I think that's a really great point, because I was asking what's simpler than going straight to the tuning, and I think a simpler thing for me would be to actually go back to the sampling method and just re-copy the positives to balance things. So I really like that. Harpreet: [01:15:29] If I conceptually think of SMOTE: we have all these features, and there's a feature space, an n-dimensional space, and we draw a hyperplane through this n-dimensional space. Here is one of our samples that we care about, a minority sample, and what we're trying to do with SMOTE is create a synthetic point that is very close and similar to that particular minority sample. [01:16:00] That's conceptually how I like to think of SMOTE; I don't know if that's right or not. But I'm wondering, there must be something that implements this, it must have some name: would you be able to synthetically generate rows of data by looking only at the minority class, and then for each feature of the minority class, each column, say, OK, here is the closest distribution for this particular column? It could be normal, it could be whatever. And then when you populate a new row, just pick a random value from the distribution that you fit on that column, and do that for every single column.
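Harpreet's idea, sketched naively: fit a distribution to each column of the minority class and sample each column independently. Everything below (data, names, the normality assumption) is invented for illustration; note that sampling columns independently throws away correlations between features, which is part of what SMOTE's interpolation between real neighbors tries to preserve.

```python
# Naive per-column synthesis: fit a distribution to each minority-class
# column and sample columns independently to make new rows. Illustrative
# only; it ignores cross-column correlations, unlike SMOTE.
import numpy as np

rng = np.random.default_rng(0)
# Stand-in minority class: 50 rows, 2 features.
minority = rng.normal(loc=[0.0, 5.0], scale=[1.0, 2.0], size=(50, 2))

def synthesize(rows, n_new, rng):
    # Assume each column is roughly normal: fit mean/std per column,
    # then draw each column of a new row independently.
    mu = rows.mean(axis=0)
    sd = rows.std(axis=0)
    return rng.normal(loc=mu, scale=sd, size=(n_new, rows.shape[1]))

new_rows = synthesize(minority, n_new=100, rng=rng)
print(new_rows.shape)  # (100, 2)
```

A worthy experiment, as Tom says below, though the independence assumption is exactly where it can go wrong on correlated features.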
Thom: [01:16:48] That is a worthy experiment you're describing, Harpreet; that's the curiosity, the experimentation, that we have to do sometimes. Don't get completely locked into the current tools. If you own a great table saw, you're going to have to build a jig every once in a while to make a certain cut, and that's what you're talking about: building a new jig for balancing classes. I don't know if anyone... oh yeah, I've already said it, sorry. But again, Mark, I think you're getting it. If you take the time to really figure out what's important to this particular classification problem, then it'll become apparent very quickly, in very simple terms: oh, I need to really reduce false negatives, and if I increase my false positives, that's not as big a deal as decreasing my false negatives. I'm not saying that's what it is for you; I'm just saying in many cases that could be the goal. And I think we can get so lost in the confusion matrix metrics, and they get confusing, and we lose sight of what the heck [01:18:00] we're really trying to do in that classification problem. Vu Nguyen: [01:18:03] Yeah. Wow, this conversation has been so helpful. In literally two hours I'm meeting with my mentees to go over this, so, speaking of teaching to learn, this is super helpful, y'all. You've equipped me with a lot of thoughts I can bring to my mentee sessions to have a really good conversation with them about this. Harpreet: [01:18:26] Great question, Mark. Let's consider Andrew's question the final question, if anybody has any insight into this; thematically it's around graph databases, and I've seen a few people talk about graph databases. Eric, looks like you're still here; I think you might be able to speak to that, I saw you post something about it. Andrew, go for it.
Thom: [01:18:53] Yeah, we've had some interest from some of the folks that we work with on several things, at different stages. The first is basically your enterprise knowledge graph, which is not really a data analytics question at first glance, but is about capturing some of the institutional relationships that a lot of the managers don't have eyes on. But then the second part of that is to feed in some of the institutional data and try to derive and infer certain relationships around some of the projects, essentially to cut back on some of the research time that certain folks in the company have been spending on looking at what we've done in the past, and then rewriting things that have already been written. So we've done some experimentation around adding these documents as entities to the graph database as well, which right now mostly covers our talent and human capital management component, but we're also looking at [01:20:00] some supply chain management and logistics, and we'd also want to be running analytics over that. So I'm just curious if anyone has experience with that: what has the experience been, what kind of product have you used, how successful has it been, what have been the pain points in deployment, maybe? Harpreet: [01:20:22] Opening this up to anyone that's got any insight or wisdom to share with Andrew here.
Vu Nguyen: [01:20:31] So I've never used a graph database, and definitely never built a graph database. But I've attended a couple of things with TigerGraph, and I don't know if you've ever used TigerGraph, or if you've watched or seen any of the resources from Neo4j; I'm sure you've heard of Neo4j if you do anything with graph databases. Both of them have a lot of really helpful and interesting resources that I've found useful. Otherwise, my stuff has mostly just been doing my own little analyses rather than building an actual database. Thom: [01:21:12] Thanks for that, yeah. We've been trying to migrate from Neo4j into the Amazon Neptune system, and it's a completely different animal, so that's also introducing pain points. Harpreet: [01:21:24] So Andrew, do you follow David Knickerbocker, or are you familiar with him? Thom: [01:21:29] The name's come up a number of times. If you asked me what his title is or where he's at, I couldn't tell you, but the name's familiar. Harpreet: [01:21:36] Yeah, he's a member of the community. Let me send you a link to his LinkedIn. Speaking of David, we haven't seen David in forever; David, lately, where you been at, man? I'll go ahead and drop that link here; connect with David Knickerbocker, he's cool. Tell him that the Artists of Data Science sent you during their happy hour, that it was the collective voice that said [01:22:00] he is the go-to guy, and I'm sure he'll be able to help you.
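Nothing Neo4j- or Neptune-specific here, but the kind of "own little analyses" mentioned above, inferring who and what is connected before committing to a database product, can be sketched in memory with networkx. The people, projects, and edges below are entirely made up.

```python
# In-memory sketch (networkx) of the relationship analysis discussed
# above: a tiny made-up graph of people, projects, and documents.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("alice", "project_x"), ("bob", "project_x"),
    ("bob", "project_y"), ("carol", "project_y"),
    ("carol", "report_1"), ("project_x", "report_1"),
])

# Which node sits on the most connections (a rough "institutional hub")?
dc = nx.degree_centrality(G)
print(max(dc, key=dc.get))  # 'project_x'

# How is alice connected to a report she never wrote?
print(nx.shortest_path(G, "alice", "report_1"))  # ['alice', 'project_x', 'report_1']
```

Prototyping the questions this way, before picking Neo4j, TigerGraph, or Neptune, is one low-cost route into Andrew's use case.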
Eric also wants to share something, or let people know about something. Go for it, Eric. Vu Nguyen: [01:22:11] Yeah. So I dropped something in the chat here, but I also shared it on LinkedIn as well: we're hiring on my team. I'm pretty excited about that, because I didn't know we were going to be hiring until just earlier today. It's basically the same as my job, looking for someone in strategic business analytics. Like I said, I'll drop the job description, but as a real overview, rather than the job description, of what I do on a regular basis: I work really closely with stakeholders, I use a crap ton of SQL, also a lot of Tableau. There's definitely room for R and Python; usually we have the data science team doing a lot of the model stuff, and I'm doing the very business-embedded, business-oriented work. The way it works is you'll be involved with one or two different verticals. I work with small business loans and also investment products, so this person would work with, I don't know, credit cards or deposits or mortgage refinance or something like that, right?
And so it's a great way to get involved and learn about a specific area of the business really well, and be involved with the sales team, or marketing, or product, or some of all of it, with A/B testing and talking about how to grow and improve the business and focus on impact. That's actually one of the big things that really drew me toward LendingTree: everybody is really focused on impact. Anyway, I've worked there for a few months now, I really like it, it's a good place to be. So let me know if you have any questions; feel free to shoot me a message on LinkedIn or whatever. Harpreet: [01:23:47] OK, thank you so much; I appreciate you spreading the word about that. Is there a preferred method? You said LinkedIn; should they message you with a resume, should they have portfolio projects ready? Vu Nguyen: [01:23:58] Yeah. [01:24:00] So I am mostly just the messenger right now. The biggest thing would be, if you have questions or something, like I said, feel free to shoot me a message. You can send your resume if you want to, but it will be more effective if you message me and say hey, and then also apply for the job, because I can't apply for you. Harpreet: [01:24:22] Perfect. If you guys are listening on LinkedIn, or catching this on Sunday when the episode releases as a podcast, shout out to Eric; you probably already know how to get in touch with him. Mark, you were also doing a thing here, what's going on? Vu Nguyen: [01:24:38] Yeah, we just got more headcount, which is exciting.
So we just posted our position a couple of days ago at my company, Humu; essentially, think of it as organizational psychology to drive behavior change through our product and make work better, improving people's habits at work. This is a really cool role: a senior analyst role for marketing and storytelling. Essentially, we want you to dive through our product data, our survey data, and potentially some customer data, bring that all together, find some really interesting key highlights, and tell a really compelling story. So you'll be working with the content team and the marketing team, but you'll be under the data science team. It's a really cool position. I can also share the link to the role; if you are interested in it, please reach out to me on LinkedIn as well. It is U.S.-based, so unfortunately we can't have people outside the country apply. Harpreet: [01:25:45] Mark, thank you so much. Speaking of data storytelling, Brent Dykes sent me a copy of his book; I've yet to take a picture and post it on LinkedIn, but it's a great book on data storytelling, it's dope. We're actually going to be live on LinkedIn [01:26:00] on October 2nd, interviewing him on LinkedIn Live. You'll see a lot of me in the month of October: not only am I doing the office hours every Friday, but you will see me live on LinkedIn doing interviews for the podcast with Brent Dykes, Joe Reis, Brittany, Liana, Andrew Jones; we've got the Data Professor himself, Mr. Chanin, I cannot say your last name; Natalie Nixon; and Danny Ma as well. We've got the one and only Danny Ma finally coming on the podcast; I think I'm officially cool enough for him to come on my show. It's going to be awesome.
So, and then you'll see me presenting on data, so October is going to see a lot of me on LinkedIn; hopefully you guys will all be there, hanging out. Be sure to check out the podcast episode we released with Dennis, tales from a data engineer, dropping a lot of data engineering knowledge. Don't forget that Sunday we've got the happy hour, or rather office hour, session with Comet. Those office hour sessions are going to be moving to during the week; obviously, I work at Comet now, so they don't want to sponsor the podcast anymore. Makes sense, because you're paying me now to work for you guys. So I'll be doing it in my normal work hours instead of, you know, time when I should be kicking it with my family. So sometime during the week we'll be having the Comet ML office hours, still the same link, still broadcast live on LinkedIn through my profile, but it will likely be Wednesday around 10:00 a.m. Once we settle on that, I'll let you know for sure. What other news do I have to share? Well, my course is coming along just fine. You know, I always feel kind of sleazy shouting out my course, but I feel like I'm really [01:28:00] building something nice, and I feel it's going to be beneficial and help a lot of people. Shout out to the people in this chat who put in so much time and effort to review it and provide me with valuable feedback, specifically Mr. Blazer, Mark Freeman, and Tom; thank you so much for reviewing and giving me valuable feedback that I'm actioning. So I'll probably be launching that course ready for the holiday season.
It is going to be awesome in my eyes. Thom: [01:28:31] And not sleazy; other people need to know about it. Harpreet: [01:28:34] Thank you so much, I appreciate that. Yeah, you know, I figure: why create a course? Obviously to teach you guys awesome stuff, and hopefully to spread my philosophy and ethos and the way I work out there. But I've spoken to literally thousands of aspiring data scientists through Data Science Dream Job; I've been doing this mentorship professionally for years, and I see a lot of issues and a lot of problems with these aspiring data scientists who are looking to break into the field. But not only that: my mentees get jobs and start progressing, and they start moving up and leveling up, and they face challenges. So I have taken what I've learned from helping them and bundled it up into a course. Hopefully you guys enjoy it. Any last-minute closing questions or comments? I feel like I've killed enough time; if there were something, it would have come through by now. Does not look like it. Thank you so much, my friends, for hanging out and making this an awesome Friday evening. Be sure to also join the Slack channel if you haven't already; you can see awesome projects like the one Arpit just completed, code and everything out there just for you to look at and see how he did stuff, so it can inspire your own projects. My friends, remember: you've got one life on this planet, so why not try to do something big? Cheers, everyone!