HH67-04-02-2022_mixdown.mp3

Harpreet: [00:00:09] What's up, everybody, welcome to the Artists of Data Science happy hour. It is Friday, February 4th, 2022. Man, I can't believe it's already February — it's going by quick. Let me tell you, it's been an eventful first month of 2022. Hope you guys are doing well and have been enjoying your year so far. This week I released an episode on the Artists of Data Science podcast with Carlos Mercado, his second time on the show, talking about decentralization for data scientists: what's DeFi all about, what's Web3 all about? So if you're interested in all that and how it relates to you as a data scientist, check out that episode. It just released today, so it's there on all streaming platforms. By the way, if you're listening to this — whether you're on LinkedIn, here in the room, or on YouTube — go and rate the podcast, man. You all should go and give this thing five stars. You can rate the podcast right there in the Spotify app; it now allows you to do that. Or wherever you watch or listen to this thing, just go rate it, or at the very least smash the like on this video if you're watching on LinkedIn. You know I can't let John Cronin Daly beat me, man. Yeah, got to be the premier data science podcast, The Artists of Data Science. So be sure to go ahead and smash that like, give it a five-star review — only five-star reviews, please. Also, the Comet office hours earlier this week were fantastic, man. These things are getting better and better; every week we're getting more and more people on to talk about their experiences as data scientists. This week we brought on our good friend — I think everybody here knows her, actually — she came on the show, as well as Ronnie Wang from Google AI; he's a research scientist there. And then somebody you should know if you don't already: Susan Shou Chang. She's awesome. We talked a lot about what a day in their life at work is like, what experimentation is like for them, and the ins and outs of hyperparameter tuning and best practices for doing that. Yeah, it was a great conversation, so definitely check that out. It's on my YouTube channel, and it's also on Comet's YouTube channel — they've got the bits clipped out, so if you just want one piece of it, go ahead and check out those little clips. We've also got a bunch of blog posts that accompany that series, so definitely check me out on Medium if you haven't already. All my stuff is free for everybody to view and read, always open. So go ahead, check it out, follow me on Medium, give it a few claps. Follow me on Twitter as well — love to connect with more of y'all. Speaking of connecting, man, I got a chance to hang out with some people virtually a lot this week. That was great.
Harpreet: [00:02:52] Also hung out with some community members — Christophe Ograbek and Renata — so it was great seeing y'all. It's been some time, but now we are here and it's Friday; it's time for these happy hours. I'm excited, man — been a long, long week and another long week to come. By the way, if you're in the Bay Area or in Sacramento: I return to Sacramento on February 13th. It's been quite some time — two and a half years, to be exact — since I was home, so if you're there, hit me up; I want to link with you guys. So let's kick the hour off with a question. I'm curious: what is your favorite API? Whether it's a web API or the API for some library or package, what have you — what is your favorite API, what do you like about it, and maybe how is it different from not-so-good APIs? Let's kick it off, and then we can go to Russell and either one of the Erics — we've got Eric Gitonga and Eric Sims in the building. So yeah, let's hear it.

Speaker3: [00:03:58] Oh, I don't have a good answer for this one — I don't have a favorite API. You kind of caught me flat-footed, so I'm going to have to defer; somebody else go. Yeah, I don't really have a favorite one.

Harpreet: [00:04:11] Russell, how about you?

Speaker4: [00:04:14] Uh, yes. My answer is not a great deal different; I'm not sure I've got a great one either, although I would say the API that works is definitely my favorite. You know, some have a habit of being a bit temperamental here and there, and those that are easily configurable are good — some web APIs can be good. I can't think of a brilliant one off the top of my head, but let me think about it a bit more and I'll try to come up with a specific one. Definitely the one that works is always my favorite.

Harpreet: [00:04:47] Yeah, the one that follows the documentation. That is also a quality of a good API: one where you can follow the docs and it does what it's supposed to do. Personally, I like the Twitter API a lot, because it's very intuitive to me — all of the endpoints have the same names as how I interact with the app, so it feels like a natural extension of using the application itself. It's pretty good. In terms of a library API, I think scikit-learn is awesome because it's super consistent: it doesn't matter what algorithm you use, it's fit, predict, fit_transform, whatever have you — it's all the same. I think that's a very good quality of an API. I've also been learning the PyTorch API this week; that's been a lot of fun, I'm enjoying it. How about you? By the way, if you're joining on LinkedIn or on YouTube and you have questions, let me know — we're taking questions. Go for it, Eric Sims.

Speaker5: [00:05:54] So, you know, definitely scikit-learn is what I've probably used the most.
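For anyone who hasn't worked with it, here is a minimal sketch of the kind of uniform interface being described; the specific estimators and toy dataset are just illustrative, not anything the speakers prescribed:

```python
# Minimal sketch of scikit-learn's consistent estimator API:
# every estimator exposes fit/predict (and transformers expose fit_transform),
# so swapping algorithms barely changes the surrounding code.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Transformers follow the same pattern: fit on train, transform everywhere else.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Two very different algorithms, identical calling convention.
for model in (LogisticRegression(max_iter=1000), RandomForestClassifier()):
    model.fit(X_train_scaled, y_train)
    print(type(model).__name__, model.score(X_test_scaled, y_test))
```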
And I remember vividly, during my master's program, when I realized — or made the connection — oh, there are these uniform things: I can fit, I can predict, whatever, and it just made everything seem way more approachable, way less scary. I didn't have to memorize so many different things or always be googling them (I'm always googling other stuff instead). But something I haven't used a lot, and think is interesting and would like to use more, is the Reddit API, because it's got so many different pieces to it. It's great for network stuff, because you've got the topics, but then you've got the Redditors, and then you've got the Redditors who replied to other Redditors — there are just so many different layers to it, and a lot that can come out of it. I think it's called PRAW — I don't really like the name, but it's the Python Reddit API Wrapper. But yeah, I just think Reddit's cool in general, so I'd like to use it a little bit more.

Harpreet: [00:07:12] Yeah, that's a good one — I've actually never looked at the documentation or played with that API. But in terms of the bits of information you can get, the payload from an API, I find the Spotify API super fascinating. There's so much going on there; you can collect a lot of interesting data from Spotify. Great API there. Shout out to Vivian in the house — I haven't seen Vivian in so long; good to see you again. So, a question coming in from Kosta Deb right here on LinkedIn, expanding on the question — I love that. What do you guys look for in good APIs? What are some red flags in APIs? That's a good question. The book I was recently reading — Kosta, my recommendation is Software Engineering at Google — has an entire section just talking about documentation and the importance of documentation. And I think a good quality of a good API is one that just has docs where whatever I'm looking for is easy to find. It tells me: what is the purpose of this thing, how do I use it — you know, it has a too-long-didn't-read. It tells me what I need to know very quickly and makes it easy for me to get to work. If anybody else has input on that, I'd love to hear it — Eric Sims, Eric Gitonga, anyone. Otherwise, if you guys have questions coming in, let me know and go for it.

Speaker3: [00:08:55] I think the things for me are documentation and consistency — the kinds of things we've been talking about — but also: if you've got a limit, or if you throttle, or if you've got a cap of some sort, please tell us. Don't be coy about your limitations, because you could spend weeks trying to figure out what's wrong on your end and then — oh, well, you have caps. OK, that would have been good to have in the documentation. Oh, you throttle. OK, yeah.
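As an aside on that point about undocumented caps, here is a small, generic sketch of the kind of defensive client you end up writing when an API throttles you; the endpoint URL, the reliance on a Retry-After header, and the retry limits are purely hypothetical, not any particular vendor's API:

```python
# Hypothetical example: calling a rate-capped REST API defensively.
# The URL and the "Retry-After" behavior are assumptions for illustration;
# check the real API's docs (if they exist!) for its actual limits.
import time
import requests

def get_with_backoff(url, params=None, max_retries=5):
    delay = 1.0
    for attempt in range(max_retries):
        resp = requests.get(url, params=params, timeout=10)
        if resp.status_code == 429:  # Too Many Requests: we hit a cap
            # Prefer the server's hint if it sends one, else back off exponentially.
            wait = float(resp.headers.get("Retry-After", delay))
            print(f"Rate limited, sleeping {wait:.1f}s (attempt {attempt + 1})")
            time.sleep(wait)
            delay *= 2
            continue
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError(f"Still rate limited after {max_retries} retries: {url}")

# Usage with a made-up endpoint:
# data = get_with_backoff("https://api.example.com/v1/items", params={"page": 1})
```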
So, I mean, it's the little things — the things you don't think about documenting — that could cause somebody else absolute pain. And if you could include implementation and coding examples, like a GitHub repo, that would be awesome as well. Just give me a baseline implementation, because there's probably something in there I didn't think of, and maybe you didn't document very clearly or I didn't read very well. So those two right there. I mean, AWS, for example, has sample implementations for everything, and having something like that makes my job so much easier — and I will not be blowing up your support line, which is probably going to be a good thing for your company too.

Harpreet: [00:10:06] Great points. Vivian, any input? We're talking about APIs — I know you joined in a little bit later.

Speaker3: [00:10:13] Yeah, I was just thinking of how many times I've used frustrating APIs that have bad documentation and then blamed myself. So if you're struggling with an API, it's not you — you're not stupid. An API is only as good as how easy they make it to use.

Harpreet: [00:10:32] Yeah. And in that book, Software Engineering at Google, they talk about knowing your audience. Whoever writes the documentation for an API — you may be the person most familiar with the API, but you can't write from that perspective. You have to write from the perspective of the person reading the documentation, kind of optimizing for the reader's experience. You have to consider the person's level of experience, their domain expertise or knowledge, and why they might be coming to this API. You've also got to think: is this the type of person who might just stumble upon the API hoping to find something, or is this somebody who's going to be laser-focused — "I know what I want, just make your documentation clear, let me get there and figure it out." All good considerations for documentation. So if anybody has more thoughts on APIs, do let me know, but we've got more questions coming in from Kosta here. Russell, go for it.

Speaker4: [00:11:32] I'll just say very quickly, building on your comment about Spotify: especially over the last couple of years, given the lockdown restrictions and so on, the Netflix API would probably be really interesting to get your teeth into, because I'm sure people's viewing habits on Netflix have increased by more than their listening habits on Spotify.

Harpreet: [00:11:55] Yeah, that's an interesting one. I never even thought to check whether Netflix had a developer API or anything like that, but it's definitely worth checking out. Good one — we'll take a look at that. I've got a question about models in production. Right, cool, let's do this, man. Any tips on dos and don'ts for model monitoring — common hidden challenges, red herrings, things like that? When it comes to advice and tips, it's probably more succinct to give the don'ts than the dos, right? I don't know.
Maybe that's just a personal opinion. But let's go to Vin here, and then if anybody else has tips, let me know — I might pull a couple out of my own brain as well. Any tips on dos and don'ts for model monitoring?

Speaker3: [00:12:53] Monitoring — like short-term monitoring versus long-term monitoring, or metric sets. That's the one thing that caught me off guard: what you're measuring week to week versus what you're measuring two or three months down the road. Those are two different objectives, so that often means two different sets of metrics you're tracking, because the biggest failures only start becoming obvious over longer timescales. I didn't get that for, I don't know, a year, maybe longer — that there's another set of metrics you should monitor, and that over the long term those trends are important to watch. That's my biggest one. The rest are just don'ts. Don't forget that you have users, and monitoring can't in any way impact them. Not saying I've ever done something like overdo logging that may have slowed things down. And don't keep your logs in the same system that production runs in — don't do that either. I'm not saying I've done either of those things; I've just seen someone do them.

Harpreet: [00:14:01] So when it comes to — you mentioned short-term and long-term monitoring — what are some examples of short-term monitoring? Correct me if I'm wrong: I would think maybe for short-term monitoring you look at basic variance and how scattered your predictions are for a particular task, and maybe longer term you're looking at data drift and things like that. Or am I misunderstanding?

Speaker3: [00:14:32] I mean, short term you're looking for basic accuracy and failure scenarios, like catastrophic failure, because short term, if something goes wrong it's usually really obvious. Longer term, what you're really looking for is some indication that you have a massive flaw in your model that's only going to bite you, say, once every six months or once a year. And if you're not watching for it — and it's always different; it's not like there's one metric I'm watching. More often than not I'm really watching users and their behaviors, because they'll react by abandoning the product or abandoning the workflow at really strange times, and suddenly. That becomes my hint that there's something going on that I need to start tracking around whatever they were doing in that moment, and that there's something longer term I need to monitor. That's what I'm really talking about: you've got the short-term stuff, which is obvious, but for the long term you're looking for anomalies in user behavior. That's been my go-to for long-term monitoring: anything that all of a sudden shows up and then stops — it looks like people are having a hard time, and then they're not — and it's not associated with some sort of incident ticket, or training, or one of the normal "somebody called us, told us it wasn't working for this reason, and we fixed it" situations.
But when it's not, it's usually indicative of some longer-term anomaly that's only going to show itself once in a while — kind of a low-data class of problem I'm running into, where my model is terrible and I didn't know about it.

Harpreet: [00:16:06] On a somewhat related tangent: don't believe your aggregate accuracy metric, right? Sometimes you need to drill down a little more, look at various segments, and see how your model is performing on different slices of data, because some of that can get buried in the aggregate metric. And in terms of things to watch for drift: feature distributions of training data versus live data, correlations between features — I think that's important for detecting drift — target distribution, and things like that. Any other tips — Eric, Russell, Vivian, either Eric? Any tips on model monitoring?

Speaker4: [00:16:58] I was just typing something in there, but I'll speak up now. I was going to talk about drift — you actually mentioned it already — but also bias. If you're monitoring to identify drift and bias, that's going to be far more easily detectable in your long-term monitoring than in the short term. Not that it's impossible — if you've got something gratuitous or just extreme, you'll pick it up in the short term — but I'd imagine that under most circumstances you'll want to concentrate that type of monitoring on the long term, because it's only the extreme cases you pick up in the short term.

Harpreet: [00:17:37] Awesome, thank you very much. Kosta, hopefully you found that helpful. And if you want to join in the chat, man, you're always welcome. But I want to see what's up with Vivian — it's been so long. How have you been, what have you been up to, how's the new job?

Speaker3: [00:17:51] Hey, it's been really busy and intense, and I have to say that when you work at Meta, you have to actually, like, do stuff every day. So, I don't know — it feels like the weeks kept getting away from me. I kept thinking I wanted to come back, and then the weeks just kept getting away. But another cool thing that happened is I moved to the Bay. So I'm here in the Bay, and the weather is awesome.

Harpreet: [00:18:18] So nice. Well, I'll be in San Francisco later this month. I'm planning to do a little get-together — Kiko, Mark, and co., and maybe Bali, I think she's out there — so I'll definitely get in touch with you as well. I think it'll be fun.

Speaker3: [00:18:36] Have a good time, yeah. Mark was really helpful when we were planning to come — I'd been messaging him, like, "Hey, where should we look?" — and he had the scoop on all the good places to go look.

Harpreet: [00:18:50] So yeah, where in the Bay are you currently living?

Speaker3: [00:18:55] Um, right now we're just in an Airbnb in Redwood City, but we got a place that we move into in about a month. It's in Alameda, and yeah, really excited about it.

Harpreet: [00:19:09] That's awesome — great spots. I've got some crazy stories about shit that's happened to me in the Tenderloin, though. Don't go there past a certain time of day.
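Picking back up on the monitoring thread for a moment: here is a rough sketch of the kind of slice-level accuracy check and train-versus-live feature comparison described above. The column names, slice key, and the PSI threshold mentioned in the comment are made up for illustration:

```python
# Rough sketch: per-slice accuracy plus a simple train-vs-live drift check.
# Column names, slice key, and thresholds are hypothetical.
import numpy as np
import pandas as pd

def slice_accuracy(df, slice_col="segment", y_col="label", pred_col="prediction"):
    # Aggregate accuracy can hide a slice that is doing badly.
    return df.groupby(slice_col).apply(
        lambda g: (g[y_col] == g[pred_col]).mean()
    )

def psi(train_values, live_values, bins=10):
    # Population Stability Index between a training feature and its live version.
    edges = np.histogram_bin_edges(train_values, bins=bins)
    train_pct = np.histogram(train_values, bins=edges)[0] / len(train_values)
    live_pct = np.histogram(live_values, bins=edges)[0] / len(live_values)
    train_pct = np.clip(train_pct, 1e-6, None)
    live_pct = np.clip(live_pct, 1e-6, None)
    return float(np.sum((live_pct - train_pct) * np.log(live_pct / train_pct)))

# Usage idea (hypothetical data): flag any feature with PSI above ~0.2 for review.
# drift = {c: psi(train_df[c], live_df[c]) for c in ["age", "tenure", "spend"]}
```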
Honestly, just don't go there, any time of day.

Harpreet: [00:19:20] All right, guys, let's see if there are questions coming in. Please do let me know — whether you're watching on LinkedIn, on YouTube, or wherever it is you're joining from — if you've got questions, let me know.

Speaker5: [00:19:37] I have one. I think this is kind of a generic question, but I was thinking about it this morning as I was debugging, and that was: I don't know the best way to approach debugging — how to even start, other than what I already do myself, which definitely depends on the situation. One of the obvious ones is, you know, print here where you think something is happening, so I'm just going to grab that one before anybody else takes it. But when something goes wrong, where do you start? How do you get going, dig down, and debug it? And I'm kind of curious because I want to know if there are specific tools or functionality — because there are things in, like, VS Code that I'm sure would make my life easier if I knew they were there and knew how to use them. So anything along those lines would be super helpful.

Harpreet: [00:20:32] That's not to say print debugging isn't one of my go-tos as well. But apart from the VS Code stuff, there's this website called Python Tutor that I've found really, really helpful. You can enter your code and it shows how everything's put onto the call stack, what all the intermediate elements look like, and what's happening in the loop. It's really helpful — I can share a link to that Python visual tutor. In terms of VS Code, a good launch configuration, I think, is helpful. But I'd be happy to hear from others — Vivian, what do you use to debug? Russell, Vin, Eric G.?

Speaker5: [00:21:23] Like, there are the little red dots, or there are things like "step into" — I guess that's part of debugging? Is that what that is? Maybe that's where I was going. Sorry, go ahead.

Speaker3: [00:21:38] Well, I was just going to say that I actually don't use those tools that much, because I end up using the internal Facebook tools so often, and they have their own built-in debugging stuff. So I'm sorry that I can't be more helpful.

Harpreet: [00:21:58] Yeah. I know there's also the Python debugger — you can look that up as well. Vin, what are your tips — OG tips — for debugging?

Speaker3: [00:22:07] First, download Borland. It's the only compiler and development environment you'll ever want to use. Aside from just dating myself — I'm way past 40. Yeah, figure out the basics of breakpoints, if you haven't already. I'm getting the feeling you already know what a breakpoint is and how to step through code.

Speaker5: [00:22:27] No, I don't. I've seen these things, but I've never actually done it — that's what I'm trying to ask.

Speaker3: [00:22:31] There are like two kinds — I got you. OK, so there are two types of debugging.
One is when stuff blows up and you're actually getting an error telling you something is going wrong. More times than not, that error will tell you what line it is. It's lying to you — it's really someplace further up that's messing up, but you're seeing the actual failure further downstream. So yeah, put breakpoints in and follow your key variables to figure out what it is, based on the message — if it's meaningful, because I know some Python error messages are amazing and some look like they intentionally don't want you to figure out what's broken. It's a lot like Java that way, too, where it's just: what are you talking about? What does that mean? So try to parse the error message. If it doesn't make any sense, just breakpoint your way backwards and figure out what it is — because that's really the hardest part, figuring out what really blew up and what caused it. A lot of the time it's just working backwards. So yeah, put breakpoints in and then step through the code. It's harder when you're trying to debug, like, a UI, but now there are a whole lot more debugging tools for UIs — especially web UIs. For web pages and that sort of thing, the tools are much better than they used to be, and you can even step through that now. Then there's the other kind of failure, where it doesn't work but it doesn't blow up. That's the really terrible debugging, where it doesn't error — it simply does something really strange. And they don't give you great steps to reproduce, because when you're doing machine learning there aren't any. They'll give you this vague "the front end isn't doing this thing," and then you've just got to — this is the ugliest thing — go backwards and figure out: is it the front end, is it the back end, or is my model really messed up? Because at least some people I've worked with enjoy blaming the model when they don't understand how their own code works, and so I have to sort of prove their code's busted. That's literally a critical piece of debugging: not always believing it's your code, and looking at everything else it could be, to make sure the bug's actually yours before you do the horrible thing of figuring out how your model could possibly be serving this terrible, terrible inference that's causing all the stuff downstream. Because more times than not, you have to put in a ton of logging just to catch what you might be serving that could lead to whatever obscure behavior they've got downstream. So debugging, when it comes to actual model behavior — you're talking about a ton of logging and then tracing back from those logs, and it gets ugly, because you have to figure out why, and the bigger the model, the less likely it is you'll ever figure it out. That's really the problem: it might do it once, and then three months later it does it again, and that's all you have to go on. Literally, there are just two tickets: "did this thing," "did this thing again." And you almost have to apologize, like, "I can't —
I'm sorry, I don't know. We should just put something in place to catch this and make sure it doesn't ever get any further than the back end" — and apologize profusely, or something. So yeah, debugging is actually really complicated when it comes to that second piece, where it doesn't fail but it doesn't behave as expected.

Harpreet: [00:26:05] Thank you very much, Vin. Eric?

Speaker5: [00:26:11] I was just saying — did that make sense? Yeah, that was great.

Harpreet: [00:26:15] There are some great tips here from Russell. Russell, go for it. And then Antonio, George, and Mark — if you guys have got any tips for debugging, now's the time to compile them and share. Russell, go for it.

Speaker4: [00:26:36] Uh, yeah. I just put out a quick comment that was supplemental to what Vin was saying: separating out kind of proactive, preemptive debugging — looking for potential issues when there isn't a complete failure — versus the panic-mode debugging, where something does blow up and you really need to drill in and find whatever the issue is. And further to that, I would absolutely second Vin's point about breakpoints: you can step in and compartmentalize the —

Harpreet: [00:27:13] Looks like Russell was —

Speaker4: — so you can look at it sequentially rather than having to look at the entire thing, especially if —

Harpreet: [00:27:22] Oh, am I coming back? Yeah, you're back, you're back. All right — looks like there were some connection issues there.

Speaker4: [00:27:34] Russell's is locking up almost — I don't think I've got an issue.

Harpreet: [00:27:37] Yeah, no worries. Kosta says writing a black-box test helps when debugging — some vocabulary words to look up there as well. So let's go to Mark or Antonio — any tips?

Speaker3: [00:27:52] Um, yeah — I just joined, so I don't know — Eric, yes, I do have a 16-minute air supply in my closet with the door closed. I don't know what's already been discussed, but the first thing that pops into my head is, one, writing clean code in the first place. That doesn't mean write production code from the get-go, but writing clean enough code — there are going to be bugs, so taking the extra time to actually follow a style guide, PEP 8 or whatever for the language you're using, goes a really long way, along with having clear variable names. I think that's the baseline. But something that's been drilled into me by the engineering team now that I'm writing code with them is logging — having logging is so critical, just to see where something breaks: is it being logged the way you expect? That's been extremely helpful, along with being very thoughtful about where you put your logs. There have been times where, say, I do a for loop and all of a sudden I see my log popping up over and over and over again, and I'm like, wait, the for loop's not supposed to work like that — why is that happening? So there are things along those lines. Going a little further: raising exceptions at the right places within your code is also very helpful. And then unit tests too, because if you change code, the unit tests should catch it if something breaks.
But more importantly, what I've found when I write a unit test — because you're testing "this is the expected logic that I wrote" — there have been so many times where I run the unit test and it fails, and I'm like, oh, that is not doing what I expected. That catches a lot of bugs from the get-go. More on the engineering side, I'm very keen on what engineers have with continuous integration and deployment: before I can merge code or even get a review, it has to pass all the tests first, before I even send it to an engineer. That's something many data scientists don't know how to do — I don't fully know how to do it; I'm learning. I think Kiko posted a really great link recently on GitHub Actions as a way to implement that. And then, going into other things: there's debugging code, but there's also debugging data. Something I do a lot is start with a dummy dataset where I know what the outputs should be and run it through — am I getting the output I expect from this dummy dataset? — as opposed to working directly with production data that's millions or billions of rows. There's no way to easily get that data quality down unless you have a data quality team already in place, and most orgs don't.

Harpreet: [00:30:38] Thanks so much, Mark. Eric Gitonga in the chat says he remembers pulling his hair out in his freshman intro to programming, where a missed semicolon got him the wrong results — it took a day to figure out; the code was producing output, just the wrong output. He also says logging has definitely been helpful while playing around with compiling the Linux kernel just to figure out how it worked — logging definitely helped in figuring out what broke. Awesome. Antonio Priestley here — he posted on LinkedIn — says, regarding debugging: how much do you rely on code structure to help identify issues? FP principles, i.e., atomic functions and strong types — and the last one isn't really in Python. Some questions to consider.

Speaker3: [00:31:39] I can try to take a stab at that. Something I think about with my code structure — I've become a big fan of either functional programming or object-oriented programming in Python. When I started, in academia, I had zero functions: you just ran the script from beginning to end, all 300 lines, and I got the statistics I needed to publish, right? That is horrible, and trying to debug it when something breaks is really rough. If you move towards, say, object-oriented programming, you have classes, you have functions with clear names, and everything is calling into specific things. So when something breaks, I can clearly see in the code: this line called this function, and in this function, this line right here broke. That's way easier to understand than "go to line 136 out of 300 and figure out what went wrong."
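To make that concrete, here is a small hypothetical sketch pulling together a few of the tips above — small named functions instead of one long script, logging, raising exceptions early, and a unit test that encodes the expected logic. The function names and test case are invented for illustration:

```python
# Hypothetical sketch: structure + logging + early exceptions + a unit test,
# instead of one 300-line top-to-bottom script.
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def clean_amounts(raw_amounts):
    """Drop missing values and reject negatives instead of silently passing them on."""
    cleaned = []
    for value in raw_amounts:
        if value is None:
            logger.warning("Dropping missing amount")
            continue
        if value < 0:
            # Fail loudly here rather than producing strange output downstream.
            raise ValueError(f"Negative amount not allowed: {value}")
        cleaned.append(float(value))
    logger.info("Kept %d of %d amounts", len(cleaned), len(raw_amounts))
    return cleaned

def average_amount(raw_amounts):
    cleaned = clean_amounts(raw_amounts)
    return sum(cleaned) / len(cleaned)

# A unit test that states the expected logic (run with pytest).
def test_average_amount_ignores_missing():
    assert average_amount([10, None, 20]) == 15.0

# For the "doesn't error, just acts strange" kind of bug, breakpoint() drops
# you into pdb at the spot you want to inspect:
# breakpoint()
```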
Harpreet: [00:32:38] Yeah, those are great tips. I remember — I spent quite a few years writing code in SAS as a biostatistician, and making the transition to Python was so difficult for me to wrap my head around, because SAS is really more of a procedural, step-by-step thing: you just run it from top to bottom and get the results. Thank you for sharing that. Vin, any other tips? Ah, Vin says try/except — "try/catch is the new print statement." Awesome. Hey, y'all, questions — keep them coming. Questions coming in on LinkedIn, on YouTube, or here in the chat, please do go for it.

Speaker5: [00:33:30] Right — this isn't a question, but I've just got to throw one thing out. I am super stoked, because I've been in my current role for just over eight months, and shortly after starting I got this project. It was potentially going to be a little challenging but not a huge deal, and it ended up being a thing that took forever. I've been working across all these different teams to get things developed, to help make sure the data is doing what it's supposed to be doing, and validating everything. And just today it finally finished validation, and I figured out what the revenue impact will be over the next year — pretty sure — and there is a revenue impact. So that's good, and I'm excited about it. But I also know: yay, the analysis is done, but now it's on the product team's radar, and it's about sticking with it and helping make sure it gets seen through to completion. So I feel like I'm halfway across the bridge, even though I want to be all the way across — my piece, the biggest piece, is done, but I think we're only about halfway. But I'm stoked, I'm super happy about it, because it's been like a monkey on my back for so long. Anyway, I just wanted to throw it out there because I'm excited about it.

Harpreet: [00:34:48] Congrats, man, that's awesome — super happy for you. Love when stuff like that happens, so keep it up. Question from Mark — go for it.

Speaker3: [00:35:01] So I have a question around data business models. I'm in the DAO, Web3 business stuff, and one of the projects we're working on is building an analytics tool for NFT communities — a really fun project and a good way to practice bringing something to market. Something I've been exploring right now is: all right, cool, we have this idea, some cool technology we're building, and the market — but what's the business model behind it, and does it make sense? A big part of that is our cost of goods sold, and when you're a data company, your goods are data. So how do you acquire data, how do you store it, with various APIs? I'm curious if anyone's done this, whether for a whole company or just for a simple project: how do you go about determining — scoping out — what the expected cost of processing and acquiring your data would be? I know AWS has a pricing calculator you can look at, but I'm trying to figure out how to forecast: all right, when we release an MVP, it's going to be this price.
But when we reach 10 customers, or 100 customers, what does that price look like, and do the numbers still make sense? Vin, go for it.

Speaker3: [00:36:20] Yeah, you're asking holy-grail questions right there. So, figuring out the cost of data: it should be stable, but it isn't, you know what I mean? It should be a stable cost, because you should only have known sources, but you're going to start generating data through processes, and you have to assign costs to the data-gathering side of that versus all the other things you're doing — all the reasons you're running that process. It's typically around a product, where you'll have a feature and that feature generates data. So part of the cost of implementation for that feature is data gathering, and part of it is booked against the actual feature itself — the engineering, the platform. You have depreciation costs. You can also — I can't remember the fancy finance word for it — write down some of the cost of developing a new product in a cool, tax-ish way, and that goes onto another line item. That's why I'm saying you're asking voodoo questions, holy-grail-type questions. It's not amortization — it's something else, where you get a special tax break for it, so it's written on a different line, like a cost of R&D or something. Like I said, I'm just —
Is that cool like Data can be an asset, but with the blockchain, everyone has access to the asset, so it's a key differentiator. Or is it the fact that we able to process and have it within that? That's the key differentiator. So it's the traditional model for Data. You're right, that invalidates it because [00:40:00] it is a static resource that some companies end up putting on their balance sheet. So traditional business model that isn't like data centric or model centric. You're right, that invalidates that portion of it, but really, it's the fact that you have access to it and control of it. And if you are doing it right, you are not putting all the data on the blockchain. I understand transparency and all that, but you should be kind of holding some of the source back. Harpreet: [00:40:29] Yeah, I'm not Speaker3: [00:40:29] Writing, reading, reading from the Data, So. Gotcha. That makes sense. Yeah, you should be because you're gathering data and then you're doing something with it. And if you're doing something valuable with it, what comes out of that transformation is the unique piece. That's what's valuable. And so the fact that you have access to it, that's one thing. If everyone has access to it, you know, obviously that's not a value you add, but you should be holding something Harpreet: [00:40:56] Which is unique and that's Speaker3: [00:40:57] What you can monetize. So be like a feature store would be like the key thing that's like kind of be maybe it's a feature sort wrong word. But like all the Data transformations that we made and all the features we made, they're like, Hey, this is actually really powerful that other people want to have. No, you're going to do something with that data. It's not like an ETL type. Do something with it. You're going to you're going to transform it. I mean, transform, like in a bigger capital T not small T. I'm going to do some sort of major transformation to that which creates a unique dataset that somebody couldn't easily grabbing the same data replicate. And typically, it's experimentation that leads you to that novel Data set. And that's why I say there's a process that will create data for you, and that's going to be sort of an expense when it, you know, down down the road. That is partially just to create the data set because experimentation, Harpreet: [00:41:55] Especially Speaker3: [00:41:55] Novel Harpreet: [00:41:55] Experiments, are what Speaker3: [00:41:57] Create the best unique data sets in the [00:42:00] highest value unique datasets and what you can keep proprietary in that case is the nature of the experiment that leads to the high quality data that'll end up being a competitive advantage until somebody else figures it out. So it's not long term, but I mean, you know, it's one of those things that you can actually monetize the data on top of. And so that's you know, when you talk about a modern business model built on Data, trying to value the Data that you have really is a matter of. That's the more important matter is figuring out how to Harpreet: [00:42:30] Value it and creating something Speaker3: [00:42:31] That's unique that somebody else can't do. Super helpful. Thanks. Yeah, we have to go back to the drawing board iterative process, as always, with trying to bring something to market. Yeah, it's hard Harpreet: [00:42:44] To have any like case studies off the top of your head. Maybe that, you know, white paper, something we can look into to get some more context. That sounds really fascinating. 
Speaker3: [00:42:53] Everyone has figured this out. Harpreet: [00:42:54] We'll talk about it because Speaker3: [00:42:55] It's part of their voodoo like Facebook knows exactly how to do this. They don't publish it. Sorry, I didn't mean to matter about. There's yeah, there's a lot of companies that have figured this out, but it's not something you advertise because just figuring it out is an advantage. You don't want everyone with a Data set to suddenly get into business and now you're buying Data. You know, it's almost one of those things where you don't want people to be smart enough to start a business around it because, you know, as a large company is going to start getting painful. Harpreet: [00:43:28] So like, like just taking Facebook as an example, they. Harpreet: [00:43:35] Have their their Data right, they're the only ones with the access to their users data, so I could see how in that situation that Data is like definitely an asset has some monetary value. Harpreet: [00:43:48] Right. Harpreet: [00:43:48] But what about it? You know, if you do like like Mark's talking about Web3, if you're just kind of quote like crudely using the word, but just scraping data from the web that anybody can can [00:44:00] get. So when you do that, that's Harpreet: [00:44:02] When you're saying like doing Harpreet: [00:44:04] Your unique transformations, your unique experiments to then get some meaning out Harpreet: [00:44:09] Of that data that is Harpreet: [00:44:11] Like the intellectual property is what's valuable, not necessarily the data itself, because everybody has access to the data, but not necessarily the IP to transform it the way you did, right? Speaker3: [00:44:22] Yeah, I think it was Harpreet: [00:44:23] Ben Zimmer that Speaker3: [00:44:24] Sort of wrote up the basics of what makes Data an asset. And it is that it has to be unique. You control the process that creates. And so if you are the only one who has access to it, you know, you think of Twitter. They control the process, which creates some extraordinary data sets. Same thing, Google. Same thing, Amazon. Same thing they all control. And you even look at a small company Harpreet: [00:44:49] Like Uber, Speaker3: [00:44:50] They control the creation process. Some really interesting data sets. And that's the like I said, I think it's Ben Zimmer that came out with that definition. Harpreet: [00:45:00] So it's any Speaker3: [00:45:00] Data generated by a process you control that is not easily duplicated. And so that's how they started valuing startups when it came to, you know, do you actually have data that's worth investing in? Harpreet: [00:45:11] Do you actually have models? Speaker3: [00:45:12] And it's the same thing if you build a model that they want to know that it's a model that is not easily replicated. There is some process that you control that created this model. Harpreet: [00:45:22] Typically, it's a unique Speaker3: [00:45:23] Dataset or some sort of process. That's the intellectual Harpreet: [00:45:26] Property that led to a model Speaker3: [00:45:29] Which you will be able to improve faster than anyone that would figure out the same things you did and then try to catch up with you. You know, that was there was a little more convoluted, but there the definition Harpreet: [00:45:39] Of it with Data was a whole lot more Speaker3: [00:45:41] Straightforward and easy to understand. It's just anything, anything that's not something somebody else could do next week or with, you know, six months worth of R&D. 
Harpreet: [00:45:52] Yeah, that's kind of where these — I guess they're called alternative data companies — are popping up; that's how they're getting their value. For example, the one I can think of is Bright Data: it's publicly available data, but how they get it and what they do with it to make it valuable for you — that's what leads to those high price tags. OK. It's interesting how you spell Bessemer, by the way — is it B-E-S-S?

Speaker3: [00:46:19] I can never — B-E-S-S-E-M-R, or Bessemer? I can't spell, I'm sorry. My daughter asked me to help her with a spelling test the other day, and one of the words she asked me about, I got wrong. So, sorry, it's one of my weaknesses.

Harpreet: [00:46:35] I've got it here — for anybody listening, the link will be in the show notes, so subscribe to the podcast and YouTube channel and you can get it. It's the Bessemer Venture Partners roadmap on data infrastructure; I think that might be of interest. I'm curious — anybody else? Vivian, can you talk about the type of data you're working with at Meta, or is that off limits?

Speaker3: [00:47:01] I don't work with user data. I work with an internal ticketing system for repairing stuff in the data centers, so it's not really super cool, interesting data that people would want to pump me for.

Harpreet: [00:47:22] I think it's super cool and interesting — I do. I guess, if anybody else has questions or comments on this topic, please let me know. Mark says, "That's what Zuck trained her to say" — possibly. Speaking of Facebook, man, I was listening to the Lex Fridman podcast with Yann LeCun. Oh my god, it's so good, man — definitely check that out. I've been watching it for well over a week and I'm still not done; it's almost four hours. But the philosophy of deep learning they talk about is so fascinating, so definitely check that out. A couple of other episodes that were really interesting — I liked three of the podcasts — were with, ah, Travis (or Trevor?) Oliphant, the creator of NumPy and Anaconda, and then his business partner Peter Wang, who's still at Anaconda. That's a super cool insight into the history of how these packages we use and take for granted came into existence; it's fascinating to hear those guys talk about that stuff, so I highly recommend checking those out. Shout out to Lex Fridman — respond to my email, come on my show. All right, anybody got questions or comments? Let me know. Russell works at a company that builds data centers for Facebook and Twitter — oh, nice. All right, well, if there are no questions, no comments, I guess we can call this one a wrap — oh, one last thing. Speaking of Web3 and stuff like that — obviously I released the episode with Carlos today, so check that out, everyone: decentralization for data scientists. But Mark, you might have some insight into this: what was this biggest heist, for lack of a better word, from the Ethereum blockchain — all this value? What was that all about?
Harpreet: [00:49:27] I haven't had a chance to read the news or dig into it, but I'm hoping you might be able to illuminate it.

Speaker3: [00:49:34] That stuff goes beyond me, but I can try to give it a shot — I'm just going to be repeating what Carlos put into the Discord, so Carlos would be the person to talk to about it. But I guess you can think of all these various chains — Ethereum and Bitcoin each have a chain, and there's Solana — all these different chains. And there are these things called bridges, which are how you connect different chains to each other: basically, how do these different computers keep the same state? And there was a bug with a certain one — they call it Wormhole — where basically millions of dollars' worth of ETH was corrupted and is lost in the ether, like it's deleted. Essentially, the best way I can put it is: say your computer files got corrupted and deleted and you can no longer access them — but those computer files were worth millions of dollars each. That's the best way I can describe what happened in simple terms, and it probably doesn't come close to the whole scale of it, but I'm happy to find some good links and share them with you, and maybe with the people here.

Harpreet: [00:50:44] Great — I'd love to read into that a little bit more. So, last call for questions; if nobody has questions, we can go ahead and wrap this up. Let me just shout out a few things happening. Next week, coming up on Wednesday, we've got the Comet office hours. We're doing a talk on reproducibility — reproducibility in machine learning — speaking with Tiffany Fabiani, who is a data science (or AI and ML) lead at AstraZeneca, as well as one of our own internal experts, the head of research at Comet, Dr. Douglas Blank. He's awesome, super knowledgeable — he's got such a huge history in machine learning and robotics. We'll be talking about reproducibility, and there'll be a lot of fun stuff, so check that out. And coming up on the podcast this week — oh, go for it.

Speaker3: [00:51:43] Quick question: when you say reproducibility, what do you mean by that? Reproducibility as in experiments — because I know Comet does experiment management — or reproducibility in the sense of, like, strong code? I feel like there are so many different angles you can take in the ML space, so I'm very curious what you mean by it.

Harpreet: [00:52:01] Yeah, definitely. Keep an eye out for the blog post I'll be releasing on Monday, where I talk all about reproducibility and what it means — so definitely check out the blog post and tune in to the podcast. But at a high level, it's just this: if I give you my code and my data and you run the thing, you should get the same results as I do, right? But that doesn't always happen, and we think about how we can ensure that it does. So it's a lot of best practices I've collated from a lot of different experts, kind of put all in one place — talking about reproducibility in machine learning, much like reproducibility in drug discovery or reproducibility in science. But yeah, hopefully that entices you to go check out the blog post when it's released. Cool.
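As one small, concrete slice of that "same code plus same data should give the same results" idea, here is a hedged sketch of the kind of seed-pinning and parameter logging people usually start with; the seed value and params dict are arbitrary, and full reproducibility also depends on library versions, hardware, and data snapshots, which this does not cover:

```python
# Minimal sketch: pin random seeds and record the run's parameters,
# so someone re-running the same code and data can match your results.
import json
import random
import numpy as np

def set_seeds(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    # If you use a DL framework, seed it too, e.g. torch.manual_seed(seed).

params = {"seed": 42, "learning_rate": 0.01, "n_estimators": 200}  # example values

set_seeds(params["seed"])
# ... train the model here ...

# Log exactly what was run, alongside the results.
with open("run_params.json", "w") as f:
    json.dump(params, f, indent=2)
```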
There's, like, nobody watching on LinkedIn anyway, so we'll go ahead and call it an evening. Thank you all so much for coming and hanging out; I appreciate you spending part of your Friday with me. Just a shout-out that I'll be back home in Sacramento on the 13th of February until I leave on March 1st or 2nd, something like that, so I definitely want to reach out to everybody who's there — Mark and Kiko and Vivian and everyone. We can be hanging out, so I'll be in touch. I'll also definitely try to find some way to get to Reno — hopefully it's not, you know, covered in snow.

Speaker3: [00:53:36] And you can also come out here — I mean, San Francisco's not far; it's like a forty-five-minute flight. We're not far away.

Harpreet: [00:53:44] Yeah, we'll definitely be in touch, man. So yeah, we'll be hanging out with everyone — Sadie St. Lawrence as well; I know she lives out in Sacramento — so we'll be having a good time, everybody. I'm looking forward to that. Can't wait to get the hell out of this snow — too goddamn cold, I'm done with it. All right, take care, have a good rest of the weekend, and tune into the podcast. Remember, it's out: rate it, give it five stars, y'all. Take care, have a good rest of the evening. Remember, you've got one life on this planet — try to do something big this year, everyone.