Happy Hour #93.mp3

Harpreet: [00:00:09] What's up, everybody? Welcome to The Artists of Data Science Happy Hour. This is happy hour number 93, getting close to number 100, man, and I'm excited to get there. I'm excited for all of y'all to be here. Thank you so much for taking time out of your schedule to join. It's been fun, man. We're coming up on almost two years of doing this thing, and it's been awesome. And look at all these wonderful friends that I've met: Serge, Eric, Vin, Jay, Jennifer, Russell, Kosta, Jacob. What's going on? It's crazy. It's a trip. I've met about 70% of this room in person, which is crazy. Got a chance to hang out with Jay in Denver. Went to a baseball game with Vin. Eric, still have to kick it with you, man; I'll make it out to North Carolina at some point. Serge, we met up in Boston. So maybe my math is wrong; maybe it's not 70%, more like 50%. But I'm looking forward to meeting Neil in person soon. Got a couple of trips planned for this year, possibly going to Atlanta for the Southern Data Science Conference. That is September 7th. At the end of September, I'll be in San Jose for the Intel Innovation conference. And then if anybody is in Tel Aviv at the end of October, let me know. I'll be there for about a week or so. So getting around, man. You know, Eric is moving to Utah. That's like where everyone's going. Dude, are you really moving to Utah, Eric?

Speaker2: [00:01:37] Yeah. No, I really am.

Harpreet: [00:01:39] Dude, that's awesome. I heard Kenji is going to be moving there as well, so that's going to be where I've got to move to next, man. Got to go to Utah. But yeah, shout out to everybody here. Good to have all y'all here. Let's kick it off. Let's just talk about what we've been up to this week. I'm excited to hear what you guys have been working on, what you guys have been studying. I'll go ahead and go first.
So I [00:02:00] split this week. Half of it was learning computer vision, just because that's one of the main things that Deci wants me to get good at. And it's so cool to have a job where they're like, yes, go study and learn how to do this thing that you're interested in, and we'll pay you for it. It's just mind-boggling to me that I get to have this as a job. Absolutely love it. The other half of the week was spent doing a lot of research and capturing notes, organizing notes, distilling notes down, all about community: how to build a community, how to build a community strategy. That's part of DevRel. You know, we're on the marketing team, and we've got to help with the go-to-market strategy. And I figured the strategy is to build a community. So I've spun up a lot of communities on meetup.com, deep learning user groups.

Harpreet: [00:02:57] And yeah, I'll be doing a lot of virtual events. So if this is something that you're interested in, for sure send me a message, send me an email. You know my email address: theartistsofdatascience@gmail.com. If you're interested, I'll invite you to the groups and keep you guys in the loop for all the events that are happening. It's going to be fun. One of the first presentations I'll be doing is on September 6th, I believe, or something like that. I'll be doing a webinar, an ask-me-anything on semantic segmentation with some of the in-house experts at Deci. So if you're into computer vision, if you're into semantic segmentation and want to learn what that's all about, or you have questions about it, then definitely come through and check out that webinar. I did repost the link to that on my LinkedIn, so check that out, y'all. But yeah, let's get into it. What have y'all been up to this week? Let's go to Eric, then go to Serge, and then Nick Singh is in the house.
So let's hear from Nick Singh after that. I'm just curious what you guys have been up to this week. If you're watching on LinkedIn, if you're watching on YouTube, do let me know if you have questions or comments, or if you want to get into this live [00:04:00] session. I'm happy to share the link with you as well. Go for it, Eric.

Speaker2: [00:04:04] All right. So on the work side of things, I've been working on some Tableau stuff, because I definitely think Tableau is my weakest skill of the various ones. SQL, Python, love those. Tableau, not so much. So I'm working on that, trying to get better and learn something. So that's good. But the thing I'm excited about is my Fortune Cookie Movies project. This week I created a new branch on my GitHub repo, because version one of the web app was doing all of the text transformations at runtime and it was a mess. So I was like, all right, we're going to reorganize the whole repo so that we can do all of that ahead of time and just feed the app exactly what it needs. I wasn't quite sure if rearranging everything was going to break a whole bunch of stuff, but it didn't, so I'm very happy about that. And I got one of the two pages fixed that I need to update to have the functionality I want. One is where you can just click and it'll tell you the movie plot, but as a fortune cookie statement. And then the other will give you a little quiz. It'll tell you something like, "You and an Algonquin chief's daughter will share a romance when English settlers invade 17th-century Virginia." Is that Pocahontas, Toy Story, whatever? And so you just choose the movie that it corresponds to. It's pretty fun, actually. So yeah, I've got to get that fixed and then get it hosted in the next week.
And then I'm going to check that project off as done enough for now. [00:06:00] So that's what I'm excited about.

Harpreet: [00:06:01] How'd you come up with the idea for that project? What was the ideation behind it like? What was the life cycle for that idea?

Speaker2: [00:06:11] Yeah. So I wanted to do something with NLP, but I felt kind of intimidated by it and didn't quite know where to start. So I just had that rolling around in my noggin for a while. And then I came across a project someone had done where I think they fed fortune cookies into a model to write more fortunes. And I was like, oh, that's fun. I wonder what I could do to riff off of that. And then I read a movie synopsis or something, and I thought, that would be cool: tell somebody their fortune, but it's a fortune that you recognize rather than just some abstract fortune like "you'll get rich tomorrow." How do you make it into something where they'll be like, dude, that's totally me, except it's Toy Story? Oh, that's great. I want that. And so I just decided to try and figure out how that works. And I also realized there's like nothing. Well, there's like one paper that I could find about changing third-person text to second-person text in English. Fortunately, in English that's not that hard, but it's still fairly challenging. So yeah.

Harpreet: [00:07:25] That is pretty interesting, man. I can't wait to check this project out. And so you said you're going to host this somewhere for anyone to play around with, or is this kind of.

Speaker2: [00:07:34] Yeah. So Jay contributed to the project, which is awesome. And then Theresa Bear on LinkedIn, she's like, oh yeah, I think I could make up a Dash app over the weekend, and she whipped up this Dash app. That's great.
I don't really know Dash very well. So I'm trying to figure out, okay, I think I know how to fix some of these things. Now I've got to figure out how to host it. So I'll probably just try Heroku or something [00:08:00] like that. And then I'll just share it and anybody can check it out.

Harpreet: [00:08:05] That is pretty cool, man. That's the power of community right there. Right on. Eric, thanks for sharing that. Absolutely love that. Speaking of the power of community, shout out to Khadijah Bryant, who just got her job at Hulu. And I think that happened, you know, just from connecting with Mark on LinkedIn. So that's pretty dope. Congrats to you. Serge, let's hear what you've been up to this week. Then after that, let's see who else wants to go. Man, I'm curious to hear what you've been up to. After that we'll go to Nick Singh. Shout out to Matthew Blossom. Matthew, good to see you again, man. And if you guys have questions coming in on LinkedIn or on YouTube, let me know. I'm happy to get to your question. Go for it, Serge.

Speaker3: [00:08:46] Well, this week I've been working on a proof of value for an AI vendor to see if we'll work with them. That's kind of challenging, because what we need from the AI vendor is not necessarily AutoML. It's far deeper, in that we need help with cleaning and pipelines and things like that. And so I'm trying to see if our round peg will fit in their square hole, you know what I mean? Because the kind of data we have is not as standard as the examples that vendors usually try to offer you: hey, we have this churn thing, or we have this default problem, right? All this cookie-cutter stuff doesn't work with our workflow for different reasons, like different distributions, irregular time series, things like that. And so, yeah, we're trying to see if it can work or not, right?
So I've been working on that. On [00:10:00] a personal level, this week we kicked off the start of my next book. I'm such a masochist. You know, Nick knows what I'm talking about, right?

Speaker2: [00:10:15] I love that intro. "I'm starting on my next book," and immediately it's like, it already sucks, but you're going to do it and it's going to be great.

Speaker3: [00:10:21] Yeah, yeah, it's going to be great. The book is going to be great, but my life is going to suck.

Speaker2: [00:10:26] Exactly. That's what I meant. The book's going to be great. Life's going to suck. Yeah, yeah.

Speaker3: [00:10:30] Yeah, yeah. So I'm excited about that. So you heard it here first.

Speaker2: [00:10:37] What's it called? What's the new book called, or what are you workshopping?

Speaker3: [00:10:40] It's called DIY AI. It's exciting because it's for Pearson. I know they'll put a lot of dedication into it, and so they'll push me to bring my A game, you know.

Speaker2: [00:10:57] I don't know the book's content, because you're still writing it, but from the title alone, that's so catchy. I love that. DIY AI. It just rolls off the tongue.

Speaker3: [00:11:06] Yeah. It's to teach folks, maybe other programmers, hackers, do-it-yourself sort of people who want to get into AI. My thesis is this: we have a lot of people in the field who, for better or worse, are very technical. They're very good at programming and very good at data engineering and that sort of thing. But we need other folks who can think outside the box, people who are already working in their industries, who are curious about AI and think it's something unreachable. Right.
So I'm thinking, well, the book is about projects to do at home with AI, you [00:12:00] know, with what you have at home. That's what it is. And I have a bit of everything in there, from facial recognition to sound monitors, like if you have a baby, you can hear it cry, and things like that. It's also for people who want to take control of those things. You don't want Amazon hearing your baby, having control of your baby monitor. You want to have control over that, right? And you can do it. You can set up a Raspberry Pi, throw a model on there, and make it happen.

Harpreet: [00:12:37] Oh, that sounds like an awesome book, man. I'm excited to check that out. Your last book was amazing, Interpretable Machine Learning. That was unfortunately one of the hundred-some-odd books that were destroyed in the flood that I had, so I'm super sad about that. Send me a PDF so I can re-review it. I've still got to get you on the podcast to talk about it, and at some point in the near future I'll be recording again. I really like the idea for this new book, man. DIY AI, so cool, man. Just take these abstract ideas and show you that it's actually not that hard to implement. There's no magic or anything like that. It's just a little bit of code and that's it. You know, I was talking about learning computer vision this week. I used my webcam to deploy a Faster R-CNN with the ResNet-50 backbone to do some object detection from a webcam feed, which is pretty cool to do. So yeah, I'm excited to see these other tricks, especially the ones with baby monitors, because I've got to figure that out for my kids. It'll be cute, man. Thank you so much, Serge. Nick, let's hear from you, man.
What have you been up to this week? By the way, those of you who are watching on LinkedIn or on YouTube, if you've got questions, let me know. If you're in the chat right now and you've got a question, let me know. I'm happy to take your question. [00:14:00] Go for it, Nick.

Speaker2: [00:14:02] You know, I had a busy week, but I've got some exciting news that you guys can hear first. It's launching next week. Next week is the one-year anniversary of the book being published, Ace the Data Science Interview. And I'm launching this thing called DataLemur, which is a free SQL interview platform. LeetCode exists and it's pretty good for coding interviews, but you can tell that it's not really meant for SQL interviews. So I've just taken all the content from my book and all the SQL questions and made them free and open, and anyone can solve them. And talking about the power of community, I was actually talking to Matt earlier today, Matt Bosa, he's in this Zoom call, and he and his wife both practice together on DataLemur. And they found a bug. And then someone else, Christina, she's not in this chat, Christina Stathopoulos from Google, she posts a lot on LinkedIn, she told me, yo, you need to make a plushie. So, short story, next Tuesday I'm getting 30 little lemur plushies. Anyone who's had me on their podcast, anyone who has the book, someone like Harpreet, I'm going to send you a plushie next week of a little lemur that looks insanely cute. So I didn't have that idea and I didn't have that bug fix until I talked with people I met on LinkedIn, and that's just today. I'm dropping the link if anyone wants to beta test or give feedback or whatever, and it's coming out next Wednesday.

Harpreet: [00:15:24] Right on, man. Congrats. And that's DataLemur, for you guys listening. That's L-E-M-U-R, so D-A-T-A-L-E-M-U-R dot com. Free SQL interview prep, all the questions from the Ace the Data Science Interview book. Nick, thank you so much.
I'll drop the link right there on LinkedIn. And yeah, you've got another cool thing that's kind of like a sidecar course to your book, which I thought was really cool, and for such a reasonable price. Man, talk to us about that real quick.

Speaker2: [00:15:56] Yeah, I appreciate that plug. Am I paying for this? [00:16:00] Should I be paying for this? This is amazing. No, no. I put out a little video course. It's only 25 bucks, but just DM me and I'll give you a coupon code so it's only 15. It's basically everything about portfolio projects and shit. But to be honest, I was just having fun making videos. I saw everyone going on YouTube and everything, and I was like, hey, I want to try this video thing. So I did some resume roasts, made some portfolio projects, and showed people how to do some cold emails. So yeah, people can check that out if they need it. But I think DataLemur is the future. Even with the book, I've been dealing with a lot of issues. Everyone keeps asking, where's the PDF? Or, I can't get it in India, or this and that. So I'm just like, yo, check out the site, it'll solve all your problems.

Harpreet: [00:16:39] Right on, Nick. Thank you so much, man. I appreciate that. Let's see what Vin's been up to this week. Vin, what have you been up to this week? After Vin we'll go to Jennifer. And I'm scouring the chat here for questions, and I'm scouring LinkedIn for questions too. So keep your questions coming in, and let's get to them. Go for it, Vin. After Vin we'll go to Jennifer.

Speaker4: [00:17:02] This is really fascinating stuff. I did a strategy review for one company's digital transformation strategy and their very early data strategy, plus a couple of back-of-the-napkin-type Xerox copies scanned in for what their AI strategy would end up looking like. And yeah, it's kind of crazy. It's a huge company.
Everyone thinks of big companies as having these massive, mature data science teams, but their processes are kind of loose. They don't realize that the strategy, you know. So that's what I've been working through this week: putting together this massive book translating the roughly 12 pages they have into a lot of different recommendations, where they can go and where their opportunities are. I also worked on a small R&D project that I'm going to [00:18:00] be wrapping up this month. It's probably my last engineering project ever, so I'm just enjoying the last couple of weeks on that one. And I'm learning a little bit about influencer marketing again, because it looks like I'm going to be doing some more of that in the future. That might be a larger growth area for my business than I expected. So I'm dusting off all of my lessons learned from eight years

Harpreet: [00:18:27] Old.

Speaker4: [00:18:27] as an influencer. So it's been an interesting contrast this week between three things that don't have anything to do with each other.

Speaker2: [00:18:38] Drop some hints. You're leaving engineering. What's coming next?

Speaker4: [00:18:42] Strategy. Doing strategy: organizational development, data strategy, data product strategy. It's helping businesses, guiding them through that maturity phase from very early data maturity to where data makes them money and saves them money, and they're starting to get into platform business models and operating models and that sort of thing. So building out three-year and five-year roadmaps, timelines, strategy planning, implementation. It's all the documents you could ever imagine and more meetings than you've ever been in in your entire life, with rooms full of people where you just go, I have no idea how much this costs, but it's

Speaker2: [00:19:23] Probably a lot.
Well, hopefully you're getting some of that coin, so that's great.

Speaker4: [00:19:31] I sound thrilled. No, I actually am pretty excited. But when I describe it to people who are in data science right now, it sounds like I'm choosing to shoot myself. That's literally the way data scientists look at what I'm choosing to do for a living.

Harpreet: [00:19:46] I'm interested to hear more about this influencer marketing thing you're talking about. What does that look like? What's the angle, I guess? I don't know if angle is the right way to say it, but what's that all about?

Speaker4: [00:19:59] So [00:20:00] I've been doing this forever, and I compiled a rough list of things that I've learned, stuff that I know and have done, some of the analytics and methods that I've figured out. And all of a sudden there's a lot of interest in it. It's off of three different posts that I didn't think would ever have anything to do with influencer marketing or content marketing or social media marketing. One of them I responded to and two of them I posted myself. And all of a sudden I've gotten so many DMs and messages asking me to start writing all this stuff up, because I've figured out how to go from having an audience to curating a community to actually activating the community and getting them to spend money. I guess that's a cycle a lot of people are suddenly interested in, because they're backing off marketing spend and putting it into things like DevRel, which is really, really big right now because the ROI on it is trackable. Right now, with most ads and marketing, you throw the ad out there and you hope for the best. If you're not getting first-party data, you don't really understand what the ROI is. And influencer marketing is one of those things where you can control the entire data stream.
Speaker4: [00:21:19] So I'm getting questions about, well, how do you do that? What works? How do you pick who to go to for influencer marketing? What does an engagement look like? How do you put them on a contract? I mean, all this stuff sounds kind of basic, but it's really not. Most people don't understand that there is an influencer contract that you need to get. What do you put into an influencer contract? How do you structure non-competes? How do you work with the influencer to figure out what content to put out? Because the content you wrote was for you, not for that person, so it has to be customized, and it's kind of a back-and-forth creative writing process. [00:22:00] And what does an engagement actually equal? What are the vanity metrics versus the metrics that actually matter when it comes to conversions to a sale? How do you set. You can hear it. It's like I've been bombarded with 88,000 questions about it. So now I'm getting to the point where I think I'll write up some posts about it, write up some content about it, and I know that's going to mean a lot of people are going to ask me to do it. And why not? Hey, I'll take

Harpreet: [00:22:30] Money. I will too. So if you need some real consulting stuff, let me know. That's also something I've been doing internally, the exact same stuff you're describing: trying to get an ambassador program lifted off the ground, thinking about influencer marketing, and just trying to create what's called a functioning bottom-up go-to-market movement. Because you've got a product, the product's awesome, but you don't have enough people using it, and you need people using the product to figure out what you need to do to fix it. But nobody's going to use the product unless, you know, there's a community around it, right?
Like, the reason scikit-learn and NumPy are so big: there are probably other numeric computation platforms out there, but if they don't have a community around them, there's not going to be help or support, so people aren't going to be inclined to use them. You want to go where the party is. Thank you very much for sharing that, Vin. Let's go to Jennifer. Jennifer, let's hear from you. And again, I'm waiting on y'all's questions, so let me know. Jennifer, what have you been up to this week?

Speaker2: [00:23:35] We're going through a massive reorg. My organization is about 10,000 people, and the financial and organizational hierarchies have to completely change. So I'm in the middle of updating that and validating it at this point. It's a whole bunch of data management, scrubbing through data. But most of the week I was on vacation, so this takes me to a different question. I'm [00:24:00] reading this book, Multipliers. I don't know if anybody else has read this. If you've ever worked for someone like this, you know it. Highly recommend it. Great book. I'd love to hear if other people are reading books that they think others should read. This crowd is a great resource for books, and that one is making a profound impact on a lot of people.

Harpreet: [00:24:27] Absolutely love that book. That book is amazing. Eric Weber and I, in our podcast interview, talked about that book for quite a bit, so it's a great book. Definitely check that out. In terms of books, here's one that I just picked up yesterday: 101 Essays That Will Change the Way You Think. It's kind of a book on psychology and philosophy. I've only read like two of the essays so far, since I just got the book yesterday, but it's pretty good. I'm really enjoying it.
Yeah, I'm interested to see what y'all have been reading. But before we get to that, we'll go to Matt Blaze, see what he's been up to and what he's reading. But first, Kosta is asking about Faster R-CNN and working with image data. Question for me: what's been the first big surprise learning for you so far? Don't try to train your model without a GPU, because it'll take forever, that's for sure. Yeah. I mean, everything's been really surprising for me because it's still so new to me. Just the fact that here's a pre-trained model, a model that I did not train, and I'm just using it right out of the box, instantiating it on my local machine. Using my webcam, I can hold up, well, not a Rubik's Cube, but other stuff, and it can classify that, or detect it, rather. I can have multiple things in a frame and it can detect [00:26:00] all of that. Like, if you look behind me, there are books and posters, and it's detecting all that stuff. It's so cool, man. It is so cool to see. So I guess that's the surprising thing: just how accurate the thing is, especially without even having to fine-tune it or anything. Cool. So let's go to Matt. By the way, again, if you guys have questions, let me know. Matt, what have you been up to this week, and what are you reading?

Speaker2: [00:26:29] Can you hear me?

Harpreet: [00:26:30] Yeah. Loud and clear. Perfect.

Speaker2: [00:26:32] Perfect. Okay. I'm just going through Data Quality Fundamentals. It's a book I saw on O'Reilly recently. Barr Moses is one of the authors; she's from Monte Carlo. I've been reading through it, and it really shows how much of a long way I have to go with the data quality stuff. Really interesting stuff, like using anomaly detection as a means for data quality.
Because I always thought you could use anomaly detection to find outliers for, let's say, sales or that sort of thing. But apparently you can also use it to check the data quality of your tables and the data that's getting ingested, which was a

Speaker3: [00:27:08] mind-blowing experience to me. Yeah. Other than that.

Harpreet: [00:27:14] I'm going to plug for Nick here.

Speaker2: [00:27:16] But yeah, I'm

Speaker3: [00:27:17] looking at Nick's

Speaker2: [00:27:19] book again, looking over all the problems in there a few times. I really need to review some of the stuff that I forgot over the last few years.

Harpreet: [00:27:30] Yeah. I've got Nick's book right there on the bookshelf next to some other interview books. I was thumbing through it the other day, man, and I was like, damn.

Speaker2: [00:27:39] Yeah. My wife took it somewhere around here. I don't know where it is. It's probably with her right now.

Harpreet: [00:27:43] Yeah. Nick's book is awesome, man. Thank you so much for putting that together. Yeah, dude. So I've got like zero questions coming in from folks, no questions coming in on LinkedIn or on YouTube. So where [00:28:00] should we take this discussion? Kosta, go for it. You've got your hand raised. Go for it, man.

Speaker3: [00:28:05] To hell with it, since you're into vision stuff. Okay. So what kind of datasets are you working with, or have you seen out there, that are catching your eye? Anything you've found that's easy to work with, easily accessible, easily organized? What have you been playing with so far?

Harpreet: [00:28:28] Yeah. So no real datasets. I'm kind of just learning the fundamentals, the basics. I mean, I've got that down. I understand how deep neural networks work.
I understand how convolutional networks work. Now, at least this week, I've just been playing around with some of the pre-trained models. I haven't yet trained anything on my own on an independent dataset, but I'm looking forward to trying that. I haven't touched any real, wild datasets yet. Is there one out there that I should check out? I mean, I know there's ImageNet, Pascal VOC, COCO, but those are the industry benchmark datasets. Is there anything out there that might be fun for us to try?

Speaker3: [00:29:12] Yeah. I mean, it depends on what kind of problems you're trying to solve, right? Obviously you've got your massive image datasets and stuff like that, which can be pretty hard to work with early on purely because they're massive. But there are a few fun datasets. I think Berkeley had one for self-driving cars with segmentation and instance segmentation data. It's not too huge, and it's pretty easy to work with. I found a random dataset of raccoons at one point just to play around with object detection, and that's my go-to dataset when I'm testing out a new labeling platform or something like that. I don't know why raccoons, but it's raccoons. [00:30:00] That's what it is. And pretty much every company I've worked at so far has a dataset of raccoons sitting around somewhere. But yeah, check out the self-driving car dataset; I think it was Berkeley that had it. That's always a fun one to play with.
I'd be curious to see, over the next couple of months, what discoveries you make about how you like to organize the data you're working with, because that seems to be the much more challenging part in the computer vision world: how people hold different datasets, video versus image versus sequences of frames, all sorts of stuff. People have been pretty fast and loose about that so far, so there are 100 ways of doing it.

Harpreet: [00:30:51] So I'm curious, is that a challenge because once you deploy this thing onto an actual device, you've got to worry about things like throughput? Is that a reason why it's a concern? Enlighten us a little bit more about the challenges that come with not organizing data the right way.

Speaker3: [00:31:18] Well, I think Serge might have a few things to say about this, by the sounds of it. My raised hand was before that, you know? Yeah. I mean, there are just so many different ways of organizing datasets. My MLOps people have a fit sometimes with the way I organize the data. They all want these standards, you know, like DVC for data versioning and all this stuff. But to me, it really depends on the size of the data and how frequently it's being changed. And [00:32:00] so all these standards are in place, but I don't follow them to a T. As far as structuring the data goes with images, I usually go with putting the train images in one folder, validation in another, and so forth, and if there are categories, subfolders per category in there. That seems to be a very standard way of doing it. But there are just so many different ways. Some people use a specific format, I forget what it's called, the one that's used for TensorFlow. Or, as I said, DVC is another one.
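The folder-per-split layout described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not anyone's actual pipeline: the function name `split_into_folders`, the `train`/`val` folder names, and the 80/20 default split are all hypothetical choices made for the example.

```python
import random
import shutil
from pathlib import Path


def split_into_folders(source_dir, target_dir, val_fraction=0.2, seed=0):
    """Copy images from source_dir/<category>/ into
    target_dir/train/<category>/ and target_dir/val/<category>/.

    Names and the default split ratio are illustrative assumptions.
    Returns how many files were copied into each split.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    counts = {"train": 0, "val": 0}

    # Each sub-directory of source_dir is treated as one category.
    for category_dir in sorted(p for p in source_dir.iterdir() if p.is_dir()):
        images = sorted(p for p in category_dir.iterdir() if p.is_file())
        rng.shuffle(images)
        n_val = int(len(images) * val_fraction)
        for i, image in enumerate(images):
            # The first n_val shuffled files go to validation, the rest to train.
            split = "val" if i < n_val else "train"
            dest = target_dir / split / category_dir.name
            dest.mkdir(parents=True, exist_ok=True)
            shutil.copy2(image, dest / image.name)
            counts[split] += 1
    return counts
```

With a hypothetical `raw_images/forks/` and `raw_images/spoons/`, calling `split_into_folders("raw_images", "dataset")` would produce `dataset/train/forks/`, `dataset/val/forks/`, and so on. A one-folder-per-class layout like this is what loaders such as torchvision's `ImageFolder` read directly, which is one reason it is so common.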
I just raised my hand to say, you know, I'm always looking for funky datasets, the ones nobody talks about. For the second edition of my book, in my CNN chapter, I use utensil image recognition. So just a bunch of images of utensils. It seems boring, but there's really an application for it as well. Speaker3: [00:33:15] You know, in the previous iteration it was about fruits. Right. So why would you want to recognize fruits? There are many reasons to recognize fruits as well. It's just that a lot of people focus on images of faces, but in industrial applications there are just so many other things. At my company, they have a lot of computer vision guys just working on recognizing seeds, flaws with seeds. Right. Because it's integral to the company to be delivering high quality seeds. So it's good to fine-tune your skill set in a way that [00:34:00] will benefit your industry or other industries, alternative industries that you wish to work in, and not just the common thing, you know, because not everybody works with faces, as exciting as that is. That's just my $0.02. Harpreet: [00:34:19] I think it would be interesting to do a superhero versus villain classifier. I think that might be an interesting hands-on project, because there's so much readily available data for that. Right. Speaker3: [00:34:31] There might be a lot of bias in that, though, I think. I feel like villains, especially the cartoon ones, tend to be people with, you know, long noses and, like, spies and... Exactly. I had this discussion a couple of years ago with someone from the University of Chicago. Forgive me if this person's watching.
I forgot their name, but they were getting a PhD and the subject matter was bias in AI, in particular with children's imagery, you know. So animations, videos, books. So figuring out what the commonalities are there, exactly as you said, villains versus heroes. Harpreet: [00:35:25] Yeah. It might be an interesting thing to check out. That's a good question coming here from... Oh, sorry, Kosta, go for it. Speaker3: [00:35:32] And this is exactly the interesting part of image datasets, right? With tabular and numerical datasets, it's often easier to identify sources of bias, whereas with image datasets it's a bit harder to get down to root cause: hey, these are the factors, texturally or color-wise, that are influencing my dataset. So there's a lot of meta-analysis to be done on the images themselves. And most [00:36:00] times, in most projects, you don't get a ghost of a chance of doing it. So that's the stuff that can get really fun, once you start picking up those intuitions. But on a side note, on the superhero thing, just quickly: a friend of mine, who works for defense right now, back in undergrad did a Michael Keaton detector. Right. He just wanted to identify pictures of Michael Keaton in different frames from different movies. Turns out the Michael Keaton era Batman mask is not very good at hiding identities, because it still picked him up as Michael Keaton under the mask. So, you know, Batman's got to work on the mask a little bit. Harpreet: [00:36:45] That's pretty interesting, actually. I've got two computer vision related questions. Kosta, I told you I'm going to send you a bunch of questions; I'm still working on that.
But one thing that's been making me scratch my head about CNNs is this: when you have a layer that learns filters in a CNN, you could learn like 50 filters in one particular convolutional layer. What are those filters learning? And when an image goes through, I'm trying to conceptualize what that looks like. Is it like a tensor? What is that? Speaker3: [00:37:31] So I mean, I guess it depends on how deep into the network you're going, right? Essentially, it's trying to learn patterns that it can recognize, that it can fit to, that it can activate on. So, for example, let's take a neuron that activates when it sees purple. Right. Super simple. Think of it as a perceptron that activates when it sees purple. So it's going to learn to activate only when it sees purple. In the same way, you [00:38:00] get clusters of neurons, essentially areas of the network, that will activate when they see a vertical line, and another cluster of neurons that will activate when they see a horizontal line or a diagonal line. Textural information, right. Color information. I can't remember the name for it, but there's a particular set of common filters that almost all vision networks tend to learn in the first few layers, and they're textures, directional lines, edges, curves, and things like that. Right? So that's what it's learning. It's just learning to activate on a very specific stimulus, whereas other areas are learning to activate on different sets of stimuli. It's like if you were to say, hey, this group of neurons in my network is learning to activate on house prices being high, for whatever reason.
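To make the "activates on a vertical line" idea concrete, here is a minimal pure-Python sketch. It is an illustration under assumptions, not how any particular network was built: the two 3x3 filters are written by hand (a trained network would learn its own values), the helper name `conv2d_valid` is made up for this example, and the operation is the unflipped sliding dot product that deep learning frameworks call convolution. For scale, in a framework such as PyTorch a layer with 50 filters of size 3x3 over a 3-channel image stores its weights as a single tensor of shape (50, 3, 3, 3).

```python
def conv2d_valid(image, kernel):
    """Slide kernel over image (no padding, stride 1) and return the response map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hand-written filters for illustration; a trained layer learns its own weights.
VERTICAL_EDGE = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
HORIZONTAL_EDGE = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]

# 5x5 grayscale image: dark on the left, bright on the right (one vertical edge).
image = [[0, 0, 1, 1, 1] for _ in range(5)]

vertical_response = conv2d_valid(image, VERTICAL_EDGE)
horizontal_response = conv2d_valid(image, HORIZONTAL_EDGE)

print(vertical_response)    # [[3, 3, 0], [3, 3, 0], [3, 3, 0]]
print(horizontal_response)  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

The vertical-edge filter responds strongly where the window straddles the dark-to-bright boundary and not at all on the flat bright region, while the horizontal-edge filter stays silent everywhere. Each learned filter in a layer produces one such response map.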
Harpreet: [00:38:56] I guess one thing that was surprising to learn was that, as an image works its way through the layers of a network, you're reducing spatial dimensions. You'll have some pooling layers, or increase the stride length, and you'll go from a larger image to a smaller and smaller image. But then, to compensate for that downsampling, you increase the number of filters that you learn. I thought that was kind of interesting. Did I get that right? Speaker3: [00:39:24] Yeah, roughly. I mean, as you downsample, essentially what you're saying is: I care less about the information in each pixel and I care more about the information in the image overall. Right? So at some point you've got to condense that. The early neurons are really looking at localized clusters. Each can only look at a small section of the image. Right. And you're kind of zooming out as you go, in a conceptual manner. Sorry, guys, I've got to be off. Harpreet: [00:39:54] Yeah, no, thank you very much, Kosta, I appreciate that. So Jay had a question, he's asking about DALL-E. So I [00:40:00] got access to DALL-E 2 earlier this week, and I put a post out there and said that I'll narrow it down to the top ten prompts that I like, pass those to DALL-E, and then create a carousel out of that, posted on LinkedIn. I haven't had a chance to sit down and play with DALL-E at all yet or think about that. I hope to do that tomorrow morning before the kid wakes up and the wife wakes up. But if you guys are curious, here's what DALL-E looks like. I was hoping it would be code, but it's just an interface. And these are some of the images that people have come up with. I guess we could do one right off the top. Yeah. Let me see if I can find a prompt that somebody had wanted to do. I'm not supposed to violate LinkedIn terms of service here.
I'm talking about LinkedIn on LinkedIn. And here, let's just pick a random one. "A bridge by a fountain where rocking horse people eat marshmallow pies." I love that, Lucy in the Sky with Diamonds. So you type that in and let's see what happens. I had this crazy idea where I would take a tweetstorm, one of my favorite tweetstorms, and try to create visuals out of it. It was the "How to Get Rich (without getting lucky)" Harpreet: [00:41:18] tweetstorm by Naval Ravikant. And that one didn't work. So, a bridge by a fountain where rocking horse people eat marshmallow pies. This is an interesting one. I guess that's a rocking horse person. And nobody is eating a marshmallow pie yet, though. Yeah. So that's DALL-E. That's super quick. It's mind-boggling to me how fast this thing can generate images. Because, see, here I was trying to generate an image using one of Naval Ravikant's tweets, and every time I ran it, with some [00:42:00] different ones, it just came up as actual text. So, yeah, I'm not sure, dude. But there is something that I found, and I'll share that with you guys as well: the DALL-E 2 Prompt Book, which helps you understand how to create cool pictures with DALL-E 2. So if anybody's interested in this, let me know. I might even just post this. It was free for me to download. It was free for anyone to download. But I'll share this. Yeah, that's how DALL-E's been going. My thought process was that first I wanted to break down and distill DALL-E, what it is, into a LinkedIn post, and then share a carousel with all the images and the prompts, with, you know, everyone who had suggested them. It's just a matter of finding time to sit down and do that. But yeah, a lot of cool stuff. Speaker3: [00:42:58] I'm sorry.
I wonder if the folks from DALL-E collect all this data and analyze it, you know, all the prompts being created, to get an idea of what kind of things are most popular or how they're structured, if there's anything interesting there. Um. Yeah, it's kind of very meta. I remember at a job I had a couple of years ago, one of the things I had to do was a query of queries. There was this giant table with all the queries everybody had done over the last few months, so I could get an understanding of what kind of things people were searching for. And it was very interesting to find the commonalities, what things led to similar results, and what was inefficient and what was efficient and so on. Harpreet: [00:43:55] That's interesting, a query of queries. But I'm wondering how they would, like, [00:44:00] validate which images were, quote unquote, right, or did a good job. I mean, they probably have usage metrics type of stuff, because if you right-click you can open it in a new tab, edit, generate variations, download, report. So you can probably get a signal that the prompt matches the image if people are downloading that particular image. That's probably one way they do that. Super cool stuff. Going to LinkedIn, I don't see any questions coming through on LinkedIn or on YouTube. People are just out there enjoying their summers. Robert Robinson says, "Two years. Awesome." Yes, almost coming up on two years. So two years in October we've been doing this. So, yeah. Happy hours, man. Coming back with the podcast, I'll be recording a lot of live episodes throughout the rest of the year, but then come January, that's when all those new episodes will be released on the podcast. I'm just trying to build up a back catalog, because January is when kid number two is being born. So I'm going to need to give myself more and more runway.
So, yeah, I've got a couple of cool interviews coming up, rescheduling with the care of the Don. Harpreet: [00:45:18] We're making that happen again, and I've got a bunch of people who wrote some cool books that I'm getting on the show. One of them is Grant Fleming, who wrote a book called Responsible Data Science. Another author wrote a book called Restoring Reason. Another couple of authors wrote a book called Person to Person, about peer-to-peer economies and communities and stuff like that. So it is quite interesting. That being said, have I prompted "artists of data science" yet on DALL-E? I haven't. Let's see what happens. That would be pretty interesting to see. So with DALL-E you get 50 credits, but then you can also buy credits, I think for only 15 bucks you get like a hundred-something credits. So I'm [00:46:00] definitely going to be doing that. Dude, this is pretty cool. This almost looks like a podcast logo. I could take this one. The Artists of Data Science. That's cool, man. Shout out to Carolina in the building. Carolina, what's up? Good to have you here. Questions or comments, y'all let me know. Otherwise, we'll start to wind it down, man. Speaker4: [00:46:27] Any names yet for child number two? I'm just saying, maybe you can outsource that to DALL-E 2. Harpreet: [00:46:36] Kid number two, dude. Yeah. I got some names picked. Yeah. Been having some in the pocket. Speaker4: [00:46:47] So yeah. My daughter and I are both January. So, you know, if you want to name them after either one of us, we've got some ideas. Harpreet: [00:46:54] Yeah. It's a trip. My mom's in January, my grandma's in January as well. So if the baby comes in January, it'll be lined up properly. For those that don't know, my birthday's in May, May 17th, and my wife's birthday is May 21st.
Our kid's birthday is May 8th, so we're all in May. Kid number two might feel a bit left out with a birthday in January, but they've got a January birthday with their grandma. All right, y'all, it does not look like there are any other questions. No questions coming through on LinkedIn or on YouTube. So thank you all for being here. Appreciate you tuning in, and we'll go ahead and wrap things up for this week. Back next week, same time, same place. Do bring your questions, man. If you guys are enjoying the Artists of Data Science podcast, if you're listening on Spotify, just tap five stars, man, hook it up with a five-star review. If you listen on Apple, again, hit me up with a review. Give me them stars. Help get this podcast discovered, this small little podcast that's been happening for two years. I've got to do more to promote this thing. I was doing that a lot [00:48:00] when the podcast first came out. I haven't been doing it so much recently, but that is something that I do plan on getting to. But thank you all for being here. I appreciate you guys hanging out. Have a good rest of the weekend. Have a good rest of the afternoon, whatever time of day it is that you're listening. Remember, you've got one life on this planet, why not try to do something big? Cheers.