Happy Hour #96.mp3 Harpreet: [00:00:09] Welcome. Welcome to the Arts Data Science. Happy hour. Happy hour number 96. That means that we are just a few short weeks away from happy hour number 100. If my math is correct, happy hour number 100 will be on 1 to 3 October 7th, October 7th, which I think is the actual two year anniversary. Of the thing. Well, that doesn't make sense because two years will be 104 weeks. But either way, I did start this thing October of 2020 and I guess, you know, we've had so many great people that that have been just regulars and stuff. I think I'd like to to spend some time during that session to just learn how to help people even came to learn about the art of data science or, you know, what they, what they've enjoyed most about it. But yeah, I'm looking forward to having all y'all there. I'm super excited to attempt to to two years I've been doing this is pretty cool and this week has been crazy. So I spent this entire week essentially stalking people on LinkedIn. So I'm trying to build a community for DC. I'm trying to build a community of deep learning practitioners. These are people that are actually working as deep learning engineers or deep learning practitioners, deploying models into production and whatnot. Harpreet: [00:01:28] And I've probably scoured, I think about 400, 400 different like profiles and stuff, just looking for people that, that kind of fit the bill and scoping them out and everything and. There's so many cool people. That's that's why I say there's like a lot of cool people doing a lot of really cool things out there out of like the 400 plus profiles that I have like stocked, I think I've shortlisted about 50 to to be part of the Early Adopter Initiative and got like 21 or 22 people who've agreed to be part of the Early Adopter [00:02:00] Initiative. So I'm excited for that. Some really cool people are going to be part of that. Of it's going to be one cost operation that is going to be part of the Early Adopter Initiative. Richmond, Lalaki is going to be a part of that as well. A couple other friends as well. But I can't wait to launch this early adopter initiative and kind of see how this goes. But another thing I noticed is that like a lot of people don't post or shared content on LinkedIn at all. Speaker2: [00:02:25] So like. Harpreet: [00:02:27] I didn't realize I was truly in like the 1% of people who actually post content on LinkedIn because a lot of people just don't which, which is cool. Like if that's not your thing, that's such thing. But I just, it just made me realize that how rare it is to actually find content creators. So shout out to all the content creators out there. We've got a couple of content creators in the building right now. We've got Tom, I've got Eric Sims. What's going on? We've got Kristian Steinert, what's going on? Russell Bunker and Jennifer, good to have all you guys here. So Tom's, you know, when we're backstage, when we're in the green room, so to speak, before we went live, Tom was talking about this. He wanted to kick off the discussion to talk about some secede. But I said, Tom, in order for you to do that, first he had talked to us about what the heck CIC even is and why should we care as data scientists. So, Tom, what the hell is CIC? Why should we care? Speaker2: [00:03:23] I think we should care because. It makes us better at being constructively lazy, which is what we're all about now. And by the way, I'm being completely serious. 
It sounds kind of comical, but seriously, we want to be able to write code and it just show up in our test bed. And then when it shows that it's better than what's in production, we just copy that code over to our production pipeline and it shows up because we've already tested it in our sandbox pipeline. So what am I getting at? So we're writing code [00:04:00] and we want the people that we've written those models for, those machine learning pipelines, whatever you want to call them, stuff that does and helpful analytics on data to answer questions for the companies we work in now that we're for the customers we're serving and. I always had this vision of doing it a certain way because I've been spoiled working with data engineers and system admins that had things set up just so, and they didn't want me touching that part. They just wanted to serve me with what they had set up. They told me how to release my code and they did all that for me. I showed up at a great new company and I said, Oh, hey, how do we release these one off models for people in our company? You said silence. Well, I later found out they were a little embarrassed. They would just oftentimes create what they called an r shiny app, which is a fast way to release an API with our code. Speaker2: [00:05:12] But hey, there's a lot of us that are Python Easters now. So six stands for continuous integration, continuous deployment. So. There's there's different ways to do that if you're in Azure or somewhere else. But I started thinking based on things I had been exposed to. And by the way, as I'm sharing this, it's what I've tentatively but pretty strongly tentatively decided is the best approach. I'm very happy for anyone to disagree with me or poke holes in it. But hang on one second. I have some background music I really should turn off. Okay. [00:06:00] So. The current tech stack that I think's the best way to do continuous integration and deployment is the following. You have a web user interface with an API backend, and in this case I'm proposing fast API. Which is built on top of stream. Many of you that are Python lovers and like to create online dashboards, we'll know about Stream List. So the brilliant creator of Fast API, which a lot of people in the community, when I said I was getting in this really encouraged me to check out Fast API over Flask. Not to say flask is bad, just fast API got some really slick abilities to it and you can completely let it play with stream length. But as I got deeper into it, created my first web UI, mostly with html CSS package called semantic and then. Almost no JavaScript, a little bit of JavaScript, but not anything I wrote. And then Fast API. Speaker2: [00:07:18] I was really trying to use Fast API as a web framework and I realize over time I serialize. I don't think I want to do it that way. And so that's when I, I decided, okay, I need to learn one of these JavaScript libraries. I knew vanilla JavaScript, but it's one of those things where if you don't use it enough, it grows code. So I, I decided on view and even those that love react, they don't say I've made a wrong decision. They just it's kind of like an R versus Python when both sides respect both tools. So I've gone Vue.js and wrapping up a course [00:08:00] on that. Now that's got my head out of the clouds. Thank God I'm starting to see through the fog and I'm really excited about now. 
Docker rising all this and being able to launch those Docker containers on any server and especially if they're running on internal servers where we have a local DNS to, we can add specific domain, domain names for our tools, which will be great because they won't have a dot anything after them, they'll just have the name. And so the DNS will intercept those according to a rule and say, Oh, you want us to go to this internally served web page that's going to allow people to interact with your API? Great. Here it is. Dockers gotten so good that once you've launched it, if it senses that the GitHub that it's been created from or excuse me, any git repo that it's been created from has been updated. Speaker2: [00:09:03] It's instant. Secede. It's integrated and deployed and again being constructively lazy. All of this was very appealing to me and I thought once I mastered this, it's kind of like going out to the shop and pulling down one of your ten jigs, laying it on your table sore and just making a new part. In other words, all this stuff, once you learn, it's pretty reusable for the web UI, for the API stuff. Now I can go back to focusing on I just want to deliver models and harpreet. You and I are excited about getting really good at PyTorch. The additional reason I'm excited about PyTorch is because I can do physics based modeling with it also. And so all of this started when you saw me start that pie twitch series of posts on LinkedIn. I thought, Wait [00:10:00] a minute, before I go to the next step, I'd really like to be able to serve these up. From some cloud service on the web and from a Docker containerized application and OC. I've been putting this off. It's time to learn this. But then it was fortuitous because my company really needed the same approach. But that's it in a nutshell. Let me know where I haven't made something clear. Let me know if you think there's a better way to do RSD or if it's just Coke versus Pepsi, you know. Oh yeah, you're doing Pepsi, Tom. But I like this Coke Coke method over here. No drugs intended here. That's it in a nutshell. Harpreet: [00:10:46] I'd love to hear from anyone who has any thoughts on this. I don't. I don't know much about CSC, and I can can disgracefully maybe say that I have not done much of that in my career. Anyone here got any insights? Maybe been. I'm sure. Vin. Vin. Been around the block. Let's hear from Vin. If anybody else has any thoughts on this, please do let me know. And let's do Vin, then, Eric. Speaker2: [00:11:11] I don't want anyone to feel ashamed if they've never done the secede themselves, actually. You guys correct me if I'm wrong. I love Echo, but I think it's an indication that they're still maturing. It's not like there's not a group that could do this for us. It's that they're too swamped. And my vice president said, Go for it. So I was lucky. But yeah, we needed one off delivery. Harpreet: [00:11:43] Let's go to a then then Eric. And then the. Speaker3: [00:11:46] Smartest thing I can say about this is I did secede about seven, eight years ago. And the way it was then versus the way it is now is so different. That [00:12:00] I would sound ignorant. And I think that's the best thing that I can say is there's a lot of people who have done it at some point and then stopped and they assume it hasn't changed and it's completely different. So like I said, the best advice I can give you is what I realized myself a few years back. My knowledge is outdated and it is so different now. It's evolved so. 
Speaker2: [00:12:25] Much. Speaker3: [00:12:26] That it's important to. Before you get into it. Even if you've done it before, it's worth refreshing. It's worth going back over again. And if possible, get an expert. Somebody like Tom. Not somebody who used to be an expert. Harpreet: [00:12:42] Let's go to Eric. Speaker4: [00:12:44] Okay. So. So way back to the beginning of what you were saying, Tom, and saying I can be constructively lazy because I can look at it in my test pipeline and then say, It's good, we're just going to copy it over to the production pipeline. That was where the beginning. That's about where you lost me. And so we'll say, okay, if you've got your six. So I guess what I was thinking about it is to say there could be tools that you use for it, or maybe it's a process that crosses lots of different tools. I'm sure there are different ways and structures of doing it, depending on your budget and complexity and interest. Like overall, it's a general idea. Like just so I have my like so I have, I don't know, let's pick a web app. I have the Fortune Cookie Movies app that I've launched recently. Right? And I want to make a change to it. And so rather than like a code, code, code, code, code and push to live on the web and see it break stuff is sick. The idea of just saying like, I have this version over here where I'm going to update, I'm going to code, code, code, update, create, I guess update the Docker image, I guess, and then [00:14:00] check it myself. And then if it's good, somehow I'm up to like pushing that, transferring that to say I'm pushing it to my GitHub or something like that where then it's being automatically pulled into the. Web facing or the customer facing item is that the idea of continuous integration and deployment roughly? Speaker2: [00:14:25] Yeah, I think you're asking great questions. And to Ben's defense, I did the same thing recently where I was thinking, Hey, I'm going to start posting some posts about what I'm learning on LinkedIn and see if people poke holes in it. Will They were quite the opposite. They were constructively helpful to say, Oh Tom yeah, flask is OC for creating these if you're in Python, but we really encourage you to check out fast API. So I did some cursory research and realized, oh yeah, if I if I learn fast API that gets me already more acquainted with stream, which I wanted to do anyway. Now back to your question you're asking actually multiple ones, but in a good way. The reason for Docker is just so that you don't run into the hell of, Oh, it's not running on this machine because, you know, doctor just creates its own operational environment on any operating system it's running on and it just works. So that that was a no brainer. But. Having. Even when I first started to learn this stuff six years ago, I was amazed at how I could make a change to my code. And outside of the doctor container and where doctor was grabbing it from, it could automatically update. So that in itself is a form of suicide right there. So there was this there's two components, doctor, [00:16:00] seeing the changes, making them automatically being hardened against whatever environment you throw it in as long as you're running, doctor. Speaker2: [00:16:08] The next thing is being able to make the changes and certain add ons you have. The doctor can see them. Really? The last component was to say, well, I'm delivering these tools to people that may not know how to run notebooks or run a Python script. 
And if I write a python gooey, like with two counters getting pretty nice now, by the way, and it's really easy to use. But what I'm finding is once you learn the web UI tools. There are no harder than the Kiwis. And yeah, I could put all that in a docker container still. But why mess with that when I can just say, oh, it's already updated here as a SAS and internal software as a service tool. So it's all that kind of thinking. And then to your point, I was the same way. I felt like the groundhog coming out from his digging and looking around. Should I be digging somewhere else right now? That's how all this happened. But, you know, then it didn't change a whole lot from six years ago because the only real big difference was moving from Flask to Fast API and I guess moving from react to view. But I think that was like Coke and Pepsi, frankly, from talking to people. Eric, did I answer your questions? Well. I have a. Speaker4: [00:17:37] Follow up question from it. Speaker2: [00:17:39] There was one other and then hold on to that question. So. What I meant was I'm always thinking that we have kind of our and it can be anything but we we always have our own sandbox where we're testing and comparing models. That's not production. And what [00:18:00] I put in production I think of as being much more streamlined. It doesn't have all my junk and and loop testing and parameter. It's just like, no, this is the model I want people to use or this is the analytic architecture I want people to use, and that's what I'm serving up now. They're going to use that until I find something better to push, is what I'm thinking. What's your follow up question? Speaker4: [00:18:26] So like from a from a pretty practical standpoint then is that really, you know, I figure it's probably a good idea to have the person who's testing it, not be the same person who wrote it. Because then you don't run into the stuff that you can't see because you put it there yourself. So is that kind of then in a really simple format, if I have I have a stream of app that's live right now. And if I let's say I want to make a big old change to it, I guess I would just do that on a branch, do that on a branch in the repository, and then push that branch. It's still not the main branch. And then whoever is going to whoever my partner is and say, I'm working on this app with Russell, so Russell's going to then pull the branch, run it on, I guess his machine or a Docker container or whatever, whatever environment we agreed is the environment we're going to use. And then if he says it's good, then some, some process to create a merge, a merge request or pull request to have it pulled into the main branch. Right. I mean, I'm just trying to think a little bit practically of like what's the process of going from writing some updated thing, how to get it from one person to another, and then to get it back on to the main branch and how that's CIC versus something else. Speaker2: [00:19:45] What I'd like to do is get you to think outside of get it. It doesn't have to be a gift mentality, although I think that would be the best way to do it. I just mean you decide you're going [00:20:00] to take some version of what you've been doing in development, you're going to clean it up and you're putting it in production. Now, I agree the best way to manage that is with branches and merges and stuff, but it's still this. Speaker4: [00:20:14] Floppy disk, if you prefer. Speaker2: [00:20:17] Yeah, exactly. 
But so it's still that spirit of just, okay, we've released something. But now we're looking for what's weak on that. And we're going to do the next. We're looking for the next best release when we see that, oh, this is working better than what's in production, we're going to replace production with this. And I agree using get branching and merging is the preferred way to do that. Christian asked this, and I think it's something a lot of people struggle with. There's a guy I love. You can start with him for free on YouTube. Code with mosh. I think you'll be able to find if you do docker code with mosh youtube search you'll find him. But. And I'll send you guys through Harpreet another really good teacher on Docker. But I do like Masha's approach to it. It's this spirit. Christian If you really wanted to make sure that something was going to operate on anyone else's machine, you might use this approach. You could say, okay, you have to install VirtualBox and then you have to load this virtual machine I'm sending you. And then you put some auto lock script that when that boots up, it's going to run your program for someone. Speaker2: [00:21:48] Well, as long as they're running virtual box that hypervisor and your virtual machine copy it's going to work. That's the brilliance of however with Docker. Wait, [00:22:00] let me back up. But wait there's more with Docker. What you get is a much leaner version. Then a virtual machine and having to run a hypervisor, you run the Docker engine, but it's much lighter. It's still using the host operating system. If you're on one of those non Linux operating systems. Well the non Linux operating system, it still creates some virtual machine there locally for Docker. But once the doctor engine running and you launch those images as containers and and with Docker compose, you can have these interacting containers. So it's now it's a doctor environment like an orchestra of Docker containers. They're each running separately like little virtual machines that interact with each other and they always run the same no matter what. So it's like that virtual machine isolation, but a lot leaner. I hope that helped. Speaker5: [00:23:09] Yeah. Definitely is is a hypervisor. Is that going to put a lot more. Speaker2: [00:23:15] Strain on the local. It's just a it it's a bigger budget because a hypervisor like VMware or VirtualBox or one of Prox Macs, they're there. Many of them operate like their own separate operating system that can launch separate virtual machines on the same hardware. That's pretty invasive. I don't mean it in a negative way. It's just it's like the Borg, you know, they're taking over and Dockers more like, no, I can just sit in my little corner over here and run your app for you and do it reliably on any hardware. You just please load the doctor [00:24:00] engine and launch me in the doctor engine and I'll do the same job no matter where you put me. It's just. It's. It's kind of like. Oh. Then I hope I don't offend a bunch of people in Reno. You know, these guys that buy these giant trucks that have just caused all our insurance, auto insurance rates to soar and they really don't need that truck and to spend that much gas and all that. I got insulted by one of those guys the other day for driving my Prius around and I'm thinking, you know what? This is all I need to reliably, very reliably and very cheaply go from point A to point B. How stupid do you look? 
Now, I'm not trying to insult anyone that owns a big truck, just those that would say, tease me for having a Prius. I'm like, Damn it, you know, we don't need this big virtual machine just to insure this is going to run everywhere we send it. We can do it with a set of a Docker environment, which is a set of containers that work together. Harpreet: [00:25:04] Yeah. Like before, back in the days, if you wanted to deploy an application, one application, one server, you'd have to have one machine doing one application thing. So Docker just makes it. So now you can do many applications on one server and all the, you know, Docker containers just like a you start with a Docker file, right? And you take that Docker file, which is just a set of instructions and then it just the container is like the living, breathing version of that. Great questions. If anybody has questions on LinkedIn, do do let me know. There's actually a question that we'll get to in a little bit on LinkedIn. I think this will require us to like marinate on it a little bit. But there's a question coming in from Saad Sajid, who's asking trends for data science and machine learning for 2023 and beyond. What do you guys think that's going to be something just to noodle on while we. Well, we move on to next question. By the way, resources for Docker. Another great resource is that Nigel Poulton. Look [00:26:00] him up. He's got a number of great resources, Nigel Holt and Ultron. So Eric, you've got a great question here and take it away. Speaker4: [00:26:15] All right. Yeah. So cast your mind back to the early days of the artist of data science. Probably low, lower than 50 episodes, maybe lower than 25. I asked somebody like, What the heck is a unit test? And I think at the time I was probably asking it in terms of like probably that was like not really related to what I was actually working on. So the answer was, you probably don't need to worry about unit tests. However, today I was having a conversation with machine learning engineer and she's like, Yeah, unit tests are way important. And I was like, Cool. I'll ask somebody else about what that really means because I don't even know where to start right now. Here I am. So I was hoping someone could share an example or two of like, how would you use a unit test in a machine learning? Or whether it's machine learning or an application or I don't even know quite where you would apply that. I know Tom briefly mentioned it when he was originally explaining some of his stuff, but anybody else who wants to share it, that would be great. Harpreet: [00:27:18] Yeah. Sorry. You had your hand up a little while ago. Do you want to tackle this question? Speaker5: [00:27:23] Oh yeah, I can, I can try to tackle. I guess unit testing in machine learning is probably a massive topic that I'm super unqualified to answer. But one thing that you could do, I guess it depends on the sort of pipelines you're running. So if you have like a training pipeline that's creating some kind of like. Docker sized model that you deploy. You can have unit tests in the training pipeline itself around the data, right? Assuming that ideally the data engineering team has their own unit tests regarding [00:28:00] the quality of the data and things like that, just sort of a separate beast altogether. But you can do some kind of unit tests on maybe the distributions of like a feature. 
And if you know that certain values don't make sense, certain values are nonsensical, then you can sort of catch that in the training phase. Hmm. That. That's what I would think from the training side. I'm very curious to hear what people would say from the actual CD stuff, because that's more you already have a trained model and maybe you want to actually deploy a container to some container environment. I'm very curious about that, that aspect, because I think I have a lot less visibility in that side. Harpreet: [00:28:43] Let's hear from Kosta on this. Go for it. Speaker2: [00:28:47] Can you guys hear me alright? Okay. Awesome, awesome. I'm just trying something different with headphones. Got rid of them because they ran out of battery halfway. So okay. So I like to take a bit of a classical software engineering approach when it comes to unit tests because honestly, what we create and machine learning is not all that different, right? So I kind of like to tear it as what does a unit test, what's an integration test? What's a performance test? There are three essential layers of testing, a obviously end to end testing and then user acceptance like manual testing as well. Those are different. Like let's write, let's ignore user acceptance or manual testing for now. There's starting at unit testing. Essentially, think of it like classical software. You're still writing code in order to build your models, in order to build your preprocessing. So if I want to know that you're filtering out a certain thing or that my, you know, my augmentation function that I've implemented works in a particular way, I'd be unit testing those things. I'd be unit testing things like, Hey, I've got the output from any model that comes out to a particular format [00:30:00] unit test the output interface of that format. Unit test the input interface for anything that's ingesting that format. That's how unit tests are meant to work at a really, really finite small level, right? It's about software. You're not testing necessarily the models or the model output or the model performance. When it comes to unit tests, that is essentially a much larger test, right? That's more of a performance test. Speaker2: [00:30:24] So the two other things that I would essentially establish is, say, in my training pipeline, at some point, I have a quality a quality check of the model performance. Right. And it has a pass fail metric for that. That's less of a unit test because I'm not testing that unit of code, I'm testing the model performance. Right. So I kind of structurally think of that as a separate thing and I classify that in a different place. So my unit tests live next to the code that that houses all of my model training that houses all of that, like nothing that's even been ized. This is before I push it to a Docker container, right? My integration testing is typically run in the ICD and this is not always the case. But essentially what I want to look at is saying, okay, I have Docker ized this model. My integration testing is does this model in this container still give me the output that I would expect when I put it into deployment or what? I put it into a training pipeline where I want to retrain that model, etc.. Right. So that's what I look at is integration and testing. So you can either test it at a model API endpoint kind of level through the Docker container. Or if you're doing like queue flow pipelines, you can test that at a component by component level. 
So does this component still have the expected input and output that the rest of the pipeline expects? Right. Speaker2: [00:31:48] And then you start looking at your performance end to end testing where you're going to start looking at, okay, I'm having to train this model a few, you know, a few hundred times. I look at the whole pipeline [00:32:00] and I essentially say, here are 25, 30, 40 examples of what I might see in the real world. Right. Representative samples of what you would see. And this new model that I've created, how does it perform against these representative samples compared to a baseline minimum performance that I want to see? That's your performance testing, right? So there's three separate layers of testing and I try to reserve the term unit testing for hey, I'm testing a small function like hey, the for matter of my output, I'm not trying to test the model performance. I'm not trying to test the end to end integration of where a Docker container like a dock raised model fits within the rest of my workflow. Right? Because we need to if we apply that same structure, we can then start focusing on those things that really matter, right? So apply the end to end. So when I come to a project, 90% of the time someone's got a model that just kind of works. They've got some pipelines that build and train the model. They might have some bits putting out metrics, but typically no unit tests in the code, no real integration tests between the containers and very rarely any kind of end to end test on representative samples. Speaker2: [00:33:17] Right. So the question becomes to me is when a data scientist of of hands this proof of concept code base to me and I as an ML engineer coming at it going okay how do I production ize this thing in a robust manner where I can make continual improvements while it's in production? Right. The first thing is a those end to end quality tests and say, okay, this is our baseline quality that we were hitting. Now it doesn't matter what I do, unit test or otherwise it might be manual, it might be really shit and really, really difficult to run this, but at least I can always check that end to end test to say whatever I've done. It still hits that model baseline performance, right? So that gives me confidence [00:34:00] to go and change things in here. Now, as I'm writing more code and changing the code, I might see that the post-processing code is absolutely horrendous because it's just difficult to read, it's difficult to write, it's difficult to change and could be difficult to deploy. It could be inefficient, it could be slow, and we might need performance improvements once something like if we want to really scale something to production, right? So that's when I'd start going, okay, I've got this bit of code that I want to change. Let's wrap that with unit test. Now, it's not like essentially we're reverse engineering test driven development. Unfortunately, half the time, if we start with that mentality and essentially say, Hey, I want to write a unit test for this tiny bit of code that does my post-processing. Speaker2: [00:34:40] Then when it comes to deployment time, it's very easy to change and modify and upgrade that because you already have those baseline unit tests that you can use. Right. But it's really difficult to see the forest for the trees when we call everything unit tests, including model performance testing, including integration testing. We don't need to reinvent the wheel here. 
Software engineering has been doing this for decades. Right. There's like textbooks and textbooks about this. I don't see why we can't apply the same format or approach to what we're doing and understanding what's the difference between the model, the functional requirements and the nonfunctional requirements of model functional requirements is it infers and it gives me output in this format. Nonfunctional requirement, which is more of a performance requirement, is what quality does it hit? That might be precision, it might be recall, it might be the time it takes to infer, it might be the memory requirements, right? So those are your functional nonfunctional requirements for a model, essentially from a systems engineering standpoint. Test to that in the same way that we test regular code where we might say, Hey, I want this API endpoint to have a functional requirement, which is this output, but a nonfunctional requirement I performance requirement of it has to respond within 3 seconds. Right. We've just got to hold ourselves to the same design pattern. Harpreet: [00:36:00] Because [00:36:00] thank you very much. Also shout out to a to Khadijah Bryant in the house. Khadijah, what's up, Jasmine? Henry just joined in. What's up, Jasmine? Good to see here. Then any thoughts on this topic of of unit testing? Eric, if you got follow up questions, let me know all the guys watching on LinkedIn. First of all, smash that. Like let me know you're enjoying this. And also, if you got a question, please do let me know right there in the chat and I'm happy to keep it up. Speaker3: [00:36:27] Yeah. I think everybody's already nailed it. I mean, there's in a lot of cases, unit tests are awesome for all of the things around data science, like all of the edges, all of the integrations, all of the it's all you tend to think of what it is that you're doing. But in a lot of cases, the best place to put unit tests are what you depend on to work a certain way and what depends on you to work a certain way, and you will save yourself. I mean, you could even throw those in production. That's amazing. If you can do some testing to detect some sort of an event changes the data. I mean, just tons of stuff before it destroys your model in production. So all of the points that have come up, which has been great. Harpreet: [00:37:12] Eric, helpful answers for you. Are you got clarity there? Speaker4: [00:37:18] Notes taken hopefully learned. Harpreet: [00:37:22] Will keep in mind this is this is this is recorded as well as you can always go back and run that back. Speaker2: [00:37:29] Tom Gilbert Yeah, real quick at in great points. Costa Eric, I know you saw my chat to you all. Just basically if you do a Google search on Udemy Python testing, you'll get some really good courses, customs philosophy approach, you're spot on, but now you've got to learn the details. And I'm assuming you're working in Python, but if not, just look up your language testing learning platform, you'll get a lot of good course [00:38:00] suggestions. Harpreet: [00:38:04] All right, let's let's keep moving. So you guys got questions here in the chat? Do you? Let me know if you got comments, let me know. I'm happy to take and hear to any of those. But there's a question coming in on LinkedIn, which I think we should chat about and I'll toss this one over to. I just I'd love to hear your your thoughts on this. The question is, what do you think the trends in data science and machine learning will be for 2023 and beyond? 
Speaker6: [00:38:33] Oh, man, that's a hefty one, to be honest with you. I think the biggest trends in data science, the biggest trends in data science that at least are going to be successful and bring value to companies are going to be ones that actually take a step back from data science and begin looking more into data strategy and data engineering and building data processes. That work is one of the biggest disconnects between data science that is effective and that can be brought to production with models. Actually do a good job is that process gap. So yeah, it seems to be focusing on that space for the next few years and then beginning to continue to build data science models on after that. Harpreet: [00:39:18] Me Thank you. You know, when I, when I hear data strategy, like the thing that always just pops in my mind for some reason it's like, like tabular data. And I guess just that's just because, you know, what I'm most familiar with and I'm wondering, like data strategy, what does that look like for companies whose main product is like, you know, you know, doing like computer vision or NLP or reinforcement learning or something like that. What does what does data strategy mean for like unstructured data? Then let's let's go to you for this. And then also like we're still going to be on that thread of, of trends for 2023 and beyond. So if you want to drop a prediction, please do let me know. Just raise [00:40:00] your hand. I'll call on you. Speaker3: [00:40:03] The thing about data strategy is it's never different. It doesn't matter what you have as far as the downstream tactical implementation, that doesn't matter when you're talking about strategy, you're always talking about that higher level reason why. Why do we use data? What does data end up doing for us from a monetization standpoint? Why do we use data for anything? Why don't we use something else? What does data do for us that nothing else can do? And so when you talk about data strategy, you're not really talking about that bottom level consumer task. You really talking about what is the value proposition of having data. And the purpose of strategy is to inform decision making. And that's what you really need to be able to do with the strategy is everyone needs to look at the strategy and say, okay, I'm now making decisions about data using the same framework. And so we will have as much similarity as possible across the organization. And Data Strategy supports your analytics strategy. Analytics strategy supports your AI strategy because everybody's setting up for the next one, even if you are an AI mature company. What you're doing with data is still creating opportunities that you can take advantage of using your AI strategy. And so all three of them really have to live together and they have to reinforce each other opportunities that you want to pursue with AI, with advanced machine learning. You're going to have to go backward and say, in order to pursue this, I need to change my data strategy. Speaker3: [00:41:39] And so it's never this static thing and it really doesn't look at what's the end product. If it's a if it's advanced models, if you're doing robotics, if you're doing that piece of it really doesn't matter. It's what's the opportunity? What are you monetizing? How are you going to use it to create value? And those are the more important [00:42:00] questions. And that really never changes. 
And so when people talk about strategy, especially in our field, we almost immediately start saying what we're going to do. And that's that's the trap. And we always fall into that as a field. And what we need to do is back out because hiring a data scientist is not an AI strategy. Going with us is not a cloud strategy. Bringing in a BI tool is not an analytics strategy. Each one of those really gets sort of substituted a lot of times for a strategy, and we miss the point of explaining why. And so that's the if you want to think about a strategy, that's really the way to think about it, is be as greedy as humanly possible and think about how is this going to make me cash, how is this going to save me? And think about it lazy? How is this going to save me effort? I don't want to work as hard. How can I use data to work easier? Harpreet: [00:42:56] Ben, thank you so much. Jasmine, thank you so much. Let's go to. By the way, the topic right now we're talking about is just trends in data science, machine learning for 2023 and beyond. If you've got questions, whether you hear the group or whether you're on LinkedIn or even on YouTube watching, please let me know. I'm happy to take your questions. Go for it because. Speaker2: [00:43:14] Can I piggyback a bit and ask kind of a follow up question to then I guess write to whoever else? So partially, right. I'm trying to validate my thinking here. Right. So the things that you're saying like smaller like well, I say small decisions, but decision points like which cloud vendor are we going with? Those are implementation details that companies tend to jump to because they hear that those are best practice. Right. Without a foundational understanding of what data do we have and what value does that pose for our company right now? Kind of touching back to her original question, which was. Reformulate the strategy for something that's more computer vision related, just as an example, right? Like now with with business sales data and [00:44:00] things like that, you've got plenty of databases with some kind of structured understanding of what content you have in your data. And at a high level, it's not I wouldn't say easy, but it's more, more approachable as a problem in terms of saying this is the data that we have right with vision. Let's say you're a company that's got a whole bunch of cameras as part of your manufacturing process or whatever, but it's not really structured or stored or where do you start in terms of saying, hey, fact finding, what data do we have? How do you go about that efficiently? I guess that's the multimillion dollar question, but how do you go about identifying that in a reasonably sane manner as opposed to, hey, we're going to have to hire ten computer vision engineers to dig through our entire company's history of image sources. Right. Speaker3: [00:44:56] I mean, step one is hire me. Step two, obviously after hiring me is pay the invoice. Step three, no. Well, yeah, actually, but the way that you want to approach anything is get out of the technology, because that's where you're stuck. And if you're a CEO, you are now I mean, and you're doing this with the best of intentions, but you're dragging a strategic person, a strategist, someone who manages value. You're dragging them into tactics. You are dragging them into technical complexity. They don't do that. They don't manage workflows. Their connection to what you do is the value stream. It's how something creates value for the business. That's where they stop. 
And as soon as you try to bring a CEO or a strategy planning process onto the other side of the value stream, it's broken. It's wrecked. You can just never do it. You can't come back from that. You have to start from zero again. And so who cares what data you have? Who cares? Who cares what the cameras do? Who cares? [00:46:00] It doesn't matter. It doesn't matter what they do right now, because strategy is an evaluation of trade offs. It's the evaluation of what should we be doing? Why should we be doing that? Speaker2: [00:46:15] So it's still like essentially boils down to a business question, right? I mean, operationally, are there efficiencies or are there bottlenecks in my operation as a business or barriers to me entering a particular portion of revenue or a particular market share area? Right. Like that was the first question. Speaker3: [00:46:34] The first question you have to ask is, is this the best way to make money? So here's the business we've built. What is this business best built to make money with? What value is this business? Best built? And you can hear me. I'm like slamming trade offs in there. You know, I am just injecting trade offs into every question. And I'm saying, okay, not what are we built to? What are we best? And if you're a data scientist, you're going, Yeah, that's a question for me. That's a question I answer. Best optimization. I do optimization. I do that all day. What do you want to know? And that's that's the great thing about becoming a partner to senior leadership is you hear the questions that get asked at a strategy level. And more times than not, everybody around the table is like, I'm afraid to answer this question because. Speaker2: [00:47:28] I. Speaker3: [00:47:29] Have like three or four guesses, but I don't know. And if I'm wrong, I get fired. This is no joke, you know? And so you have a lot of strategists who will go out and they'll do case studies and they will bring a whole bunch of people who have done this thing before in different situations with different companies at different times. And they'll ask and then they'll make a best guess. That's the old school strategy planning. And so when you're thinking about strategy, what you want to do is just back [00:48:00] away from everything you're doing right now and start asking Why? Why do we do this thing? Why were you doing it this way? What else could we be doing? Like if I just put my hands up and said, We're not going to do anything else today. Everybody get in this room. No, no more work. No more work until you can prove to me what you're doing is the best thing you could be doing. Now you're starting to hear trade offs. And what should we be doing? How is the business built? And that's where this is where that question usually get asked, gets asked for the first time. How is the business built? Do we know how we build value? And data scientists. Speaker3: [00:48:37] Again, they can say, well, so here's what we gather data on. So here's what I can tell you about this. We don't gather data on and so straight up. Ronald knows that process and no one else does. Carol. If Carroll leaves, we have no idea how, you know, and you start getting these scary. We don't know what's going on here. And those are the kinds of questions that strategy drives, you know, and you're starting to ask the right questions, but you have to start it from the other end where you start looking at what you have and this is where you start, is what it is. Doesn't matter how you got there. 
If the doors are still open, it's a business. Who cares? You know, it can be it can be run by Mickey Mouse. It doesn't matter if it's working. It's working if it's making cash, it's making cash. But you're going to have to be radically honest with yourself in order to start this process from strategy, not from here's what we have. What could we do with this technology? You really have to look at it at a higher level. It's very similar questions. You're asking the right questions you need to pull back. Harpreet: [00:49:44] Thank you very much. Let's go to Tom. Speaker2: [00:49:48] Just briefly, you asked me and I had a really fun wax philosophical session this morning. And what was neat was that her dad and [00:50:00] I had the same reaction to. Oh. That's what you call machine learning. Basically I'm dating myself. I started studying data science before we called it data science, and when I started hearing machine learning, I was like, What? And now, just as a comeback, I use the term math machines just to make people jerk the other way. But what am I getting? There's been a repeating theme among the straight discussion today, which is beware of the new terminology. Cost of you were the one to kind of first get at this to say, hey, we're really just practicing this philosophy. Yeah, you've got to learn the details for specific tool. But so much is sold in this world by taking something that was old and great, abstracting a little and moving it over to a new domain. And I kind of want to say, stop that. Give credit to the original work and let's generalize the terminology and use this generalized thinking everywhere. But it just keeps happening. And I put a link to Greg's outstanding post today. Oh, forgive me, Greg. Lord, Gary Gregoire cacao. His post today had crafting proposals using the Haislmaier catechism. And I was just so glad you posted this, because it kind of is going to this high level thinking we're talking about that Ben was trying to take us back to. And I think all of us could help each other by reminding each other, stop, think real high level, like S.R. was saying, in the way Sankara was saying in the in the chat before you just dive into analysis, stop and really [00:52:00] think. And but a lot of this is just old wisdom repeated. Kind of like what Solomon would say in Ecclesiastes. There's nothing new under the sun. Anyway. I'm done. Harpreet: [00:52:15] Shout out to great on the building was going on. Let's go to a coast to coast. A follow up comment or question here. Speaker2: [00:52:22] Yeah, I just want to digress just slightly, riffing off what Tom was saying. Right. This reminded me of something that I was discussing this with a friend of mine after reading a bit of Mo Godard's book, Solving for Happiness. I would recommend it, by the way, very interesting book to listen to or to read, but basically he and I can't remember the exact terminology of it. So if anyone's read it and remembers, please let me know. There's essentially he talks about this cycle of conventional wisdom and things going from challenging to accepted to essentially being so rusted in that we were biased. Now to think that this is the only way something works. So my friend and I were talking about what is the value of terminology or naming something, right? Being able to put a name to something allows us to abstract our learning. I don't need to understand every detail of unit testing, if I can give it the phrase unit testing to make some amount of sense. Right. But then at some point. Right. 
That phrase takes on its own, essentially its biases, that eventually we've got to come back and challenge and say, actually, is this unit testing? Right. So every terminology goes through the cycle of we name it without having 100% control over it so that we can actually get better control over it until we get to a point where that definition no longer suits what we need. And then we go through this challenge phase in this re acceptance phase of a new phrase or new terminology. Speaker2: [00:53:58] And we kind of update our vocabulary [00:54:00] that way. Right? But it's it's like this interesting balancing act, like some of what you were saying about try to understand what the original intent of that terminology was before deciding whether we actually need new terminology. What is the difference like subtly speaking between one terminology and the other, right? Like people ask me, what's the difference between machine learning and artificial intelligence? And I'm like. Okay. That's yeah, I start them off with I mean, the way I look at it is like artificial intelligence includes robotics, but machine learning does not necessarily include robotics. Right. Like the and those are the nomenclature that the we're still trying to figure out what's a data scientist, what's a machine learning engineer. So all of these terms were giving them meanings and then we're challenging them. Right now, that challenge cycle is super fast because implemented at at scale, there's a first time. There's a first time we're seeing hundreds and hundreds of people do this, whereas previously it was maybe a few dozen people in the world doing it right. So yeah, there's this interesting cycle. I can't remember exactly where it is, but basically, yeah, MongoDB talks about being able to name something and then challenging the bias of that naming convention further down the track. But yeah, names are powerful, but it's like giving three variable names to the exact same variable, right? It's going to get confusing as hell. Harpreet: [00:55:22] So here's an answer to that. Machine learning is something you do learn in artificial intelligence. Something you do in PyTorch. No, that's horrible. No, that's not true. Greg, how are you doing, man? Speaker2: [00:55:37] Good, man. Speaker5: [00:55:38] How's everyone. Speaker2: [00:55:39] Doing? Happy to see Jasmine. Khadijah, I've seen Creation before, so I don't remember if I've seen you before, but so great to see familiar names and to put the. Speaker5: [00:55:53] Face behind the names. Right. On that topic, too, I think. Speaker2: [00:55:56] Somebody told me in a conversation before that lshtm [00:56:00] actually transformers are kind of like a revamp of lshtm. Can somebody kind of confirm that or. Speaker5: [00:56:07] Kind of. Speaker2: [00:56:08] Explain to me? I know they're different, but I strongly agree. Oh, man. Yeah, that's that's what I was like. Very I was so glad the burials teams and. Ah, yeah, yeah, you're right. Even though people have challenged me, Tom, why aren't you going to talk about. Because it would be a waste of time. Attention is all you need. It was already stated in a paper. Go read it. Sorry, that one really gets me spun up because I'm like why the hell do you want to go back to those things that caused us so many problems? Sorry. Speaker5: [00:56:44] If you have if you have a quick thing for me. Speaker2: [00:56:47] I'm ready to be schooled. I heard it. I didn't say anything about it, but. Harpreet: [00:56:52] I. Speaker5: [00:56:52] Never really. 
Speaker2: [00:56:52] Took the chance. I was waiting for Oprah to come on and say, But, Tom, how does that make you feel? And I guess you all know. Harpreet: [00:57:00] Or Jasmyne, if you got any insight on this as well. Happy to hear from you as well. Otherwise, we can hear here. Tom, go. Speaker6: [00:57:11] I don't know how to say it. Explain it in a quick blurb. But essentially long, short term memory has a layer of trying to overcome degradation by way of a gradient processing by using gates. Whereas. And so it ends up. In certain modeling functions, having problems with with certain types of output. If you have a model that particularly has a bunch of noise versus, say, a transformer, as Tom so adequately mentioned, is a attention based. Right? So you have an attention layer and then post that attention layer, you have a few layers afterwards. So you have less degradation between the two other than that. Yeah. [00:58:00] I'm not sure why the why why the comparison between the two. And no one must know about the animosity between the two. But essentially they're not the same architecture one they function differently with how they are able to identify data, replicate data, and then from there make decisions or classifications using said data. Harpreet: [00:58:23] Thank you, Jasmine Coast. Have any insight into this? That's one thing that is on my learning roadmap is to spend some time learning more about Transformers. But I do kind of notice that like preceding every Transformers discussion, they talk about Ram. So in my mind they kind of get the same kind of issue going on as Greg. There cost of any insight up to this? Speaker2: [00:58:45] To be honest, there's far smarter people about this than me, right? I mean, I've spent more time with CNN's than with any kind of transformer, to be frank. I think I think a lot of the stuff that I've just read, very brief stuff, right? Like the attention is all you need paper and a few other foundational works. Right? But yeah, a few of the, a lot of the stuff that you see like medium articles and stuff, you're right. Or they do start with the A, it starts with like R and ends or it starts with LCMS and it I think it's kind of like a crutch, right? Like it's hey, start off with something that basically at a high level might have similarities and parallels that you can draw on, but not functionally or foundationally the same thing. Right. And that's that's kind of important to understand is where are they drawing a foundational parallel versus a functional parallel. Right. And I think that this is where like education, communication is really important and not everyone is good at it. Right. Most researchers are very bad at communicating their findings just because they're good researchers. They're not great educators. It's just functionally different. Right. But yeah, I mean, the to my mind, I haven't seen too much on [01:00:00] on any of this on Transformers. Speaker2: [01:00:02] But I think just I try to look at it from a functional standpoint. What does it do differently for me that CNN's Don't Do Right, like from an object detection standpoint or something like that? I think just having that essentially that local attention, right, being able to look at correlation between between features that are not spatially close by is quite powerful in a way that CNN's don't seem to match up too. So is that allowing us to look at larger images with a larger field of view? 
Does that eventually start to unlock more broad contextual understanding as opposed to car in lane? Is this able to tell me about, Hey, I'm looking at a wide flow of traffic. Can I detect chokepoint, bottlenecks, things like that, right? Can I detect an additional layer of context into the data? That's the kind of thing that I look at from a very top down perspective, because I haven't had the time to really play with the foundations of it, right? So whenever I'm looking at new architectures, new technology is coming through, I kind of assess it through a lens of what does it do for me? The previous technology doesn't do and is it significant and game changing otherwise I'm not bothered to spend my time on it. Speaker2: [01:01:13] Right? Like, just to be very frank, we don't have that much time, right? As you seem to remind us, every recovery. Right. But I look at things like there's a lot of research going into slam for robotics and looking at, you know, Kalman filtering with with deep neural networks and things like that. And I'm like, okay, you're getting a 2% improvement on my on my filter performance. Why do I really care? You're not fundamentally changing this. You're adding a stack of compute power to what I need. And it doesn't actually solve my problem. It doesn't change the fact that we're still only getting like 70% like confidence in terms of our sense of fusion. Right? Like you're not reducing [01:02:00] the you're not increasing the accuracy to which I know where I am as a robot. That's fundamentally what I need to know as a robot, right? Like, so that's where I'm like I start looking at things at a functional level and then if I see, okay, these areas are making significant leaps and it's applicable to the problem I'm trying to solve now. That's when I start diving into it, right? Like We love this discovery cycle, but at some point I'm more interested in the implementation and actually converting that discovery into real world change. Harpreet: [01:02:31] That's a useful filter to to to view something. Very much so. Let's go. Shankar that after Shankar we will go to Tom because I feel like Tom has calmed down and he's ready to talk to us about Transformers as I was going to sound. Speaker5: [01:02:44] Tom Yeah, I'll make my part quick. So cost about 100% agree what you said. I think one useful way that I was reading some discussion about machine learning research and you know, so much research nowadays is just driven by incremental improvements on a benchmark, hard to reproduce, hard to really say whether this will apply in your scenario or not. And obviously it comes with all sorts of caveats that probably also apply like asterisk to the to the the result. Right. So the fundamental question is why do we even have machine learning research? And I think. It all boils back down to. You know. Like not getting someone put it really, really well not getting stuck in the local optima of let's do gradient boosting for everything. Let's do deep neural networks for everything. That's why, because I see a lot of data science is like that now. Applied data science. Whereas I feel research should be the opposite based on that very pithy quote that I saw, you know, try a lot of different things. We want to understand why there is differences between these two architectures. We want to see why. One, [01:04:00] inductive bias. Works for some problems, but not for other problems. Can we categorize problems like, you know, the ideal end goal would be. I have a problem. 
What problem does it fall into? What model is best? You know? That's just my five cents. I'll let Tom take it away. Speaker2: [01:04:21] Can I just butt in there for a second? Isn't that the irony of having papers named "Attention Is All You Need"? It's just things like that where we dive into, hey, you know, the whole man-with-a-hammer syndrome, right? We love it in this space. And it partially comes down to how research is structured in universities, and how research is valued and papers are monetized. But that's a trigger topic for me; I might need to mute myself. Harpreet: [01:04:51] You've got to have those good headlines, man. Clickbait headlines. Tom, go for it. Speaker2: [01:04:56] I will confess, as I was learning transformers, I really struggled in the beginning. And some of you might remember I started it: before our Guild of Data Scientists, there was a Guild of Transformer Learners. So before the first edition of Dennis's Transformers book came out, he joined our group, and he was struggling to stay hush-hush while he helped us. But I got very frustrated trying to make heads or tails of it by studying journal articles. And finally I came across some exceptionally good YouTube videos that animated the math. And also it was a specific comment, I wish more people knew this guy, he's brilliant: Marcin Beard, he's Polish, but I think he's working in Germany, I can't remember, brilliant guy. He was explaining to me in that chat group, oh, [01:06:00] some attention mechanisms are just graph neural networks. And then I knew what to go study. And finally all the pieces fell into place when Dennis says transformers are like Legos, and you understand that those attention networks are really just graph neural networks that aren't so hard to train, and then you see what they're doing between those steps. Well, for me at least, all of a sudden the fog cleared very quickly. Speaker2: [01:06:34] The last thing that really blew my mind is the encoding. They do the word encoding, excuse me, the tokenization of the words, in a very unique, clever way. When you see what the machine is doing to tokenize the base word forms, you walk away saying, oh, it thinks English is a bastard language too, thank you for proving that, because you see the way it breaks words up into base word forms. It's just doing the best it can. But then it does that positional encoding, and I kept thinking that stuff was hardcoded. But no, even that is getting trained in, and that we've gotten to that as humans, that sophistication of being able to backprop so much training, just blows my mind. Of course, Jasmine is working with a team doing that plus reinforcement learning, which is challenging and quite a bit different. But once you get over the hump, and by the way, if any of you are struggling learning transformers, it's easier now, just start with Dennis's second edition of his Transformers book. It's well worth the price. Dennis Rothman, yeah. Harpreet: [01:07:58] He's been on my podcast as well. Definitely [01:08:00] check out the episode we did together. Very interesting episode; we talked about a lot of different stuff, so it was really good. I think I will definitely check out his book as well. Tom, thank you. Let's go to Jasmine.
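(A toy sketch of the two things Tom is pointing at: tokenizers breaking words into base word forms, and positional encodings that are learned parameters rather than hardcoded values. The vocabulary, tokenizer, and random matrices below are made up purely for illustration; real models learn the subword vocabulary and both embedding tables from data.)

```python
import numpy as np

# Toy subword vocabulary -- real tokenizers (BPE / WordPiece) learn this from a corpus.
vocab = ["token", "ization", "position", "al", "encod", "ing", "trans", "form", "er", "s"]

def greedy_subword_split(word, vocab):
    """Greedily split a word into the longest known subword pieces."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):      # try the longest match first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])             # unknown character falls back to itself
            i += 1
    return pieces

sentence = "transformers tokenization positional encoding"
tokens = [p for w in sentence.split() for p in greedy_subword_split(w, vocab)]
print(tokens)   # ['trans', 'form', 'er', 's', 'token', 'ization', 'position', 'al', 'encod', 'ing']

# Both tables below would be trained by backprop in a real model --
# here random matrices just stand in for the learned parameters.
d_model = 16
rng = np.random.default_rng(0)
token_ids = {tok: i for i, tok in enumerate(dict.fromkeys(tokens))}
token_emb = rng.normal(size=(len(token_ids), d_model))   # learned token embeddings
pos_emb = rng.normal(size=(len(tokens), d_model))        # learned positional embeddings

# Input to the first attention layer: token embedding + position embedding.
x = np.stack([token_emb[token_ids[t]] for t in tokens]) + pos_emb
print(x.shape)   # (number of subword tokens, d_model)
```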
Speaker6: [01:08:11] Yeah. I also want to say another thing that I think might make some people believe they're the same: there's a lot of research on transformers where people use recurrent neural net layers as the attention layer in a transformer. Same thing with a convolutional neural network layer; they just swap that out. And, you know, they'll use it to pick up on some things, especially when you talk about natural language processing, where LSTMs are just favored for some reason over CNNs. That's something I'm a little bit ignorant about, why they favor one over the other, but probably because of the sequential nature. But essentially they're like, oh, okay, I have some transformer, it's not performing the way I want it to, let me just swap in some gates and add in this LSTM layer. So that, too, can be pretty confusing for folks, because it's just like, oh, they're the same thing. And it's like, you can put them together, they're like Lego blocks, but they're not exactly the same thing, and they don't perform the same operationally. Harpreet: [01:09:27] Thank you very much. Kosta, go for it. Speaker2: [01:09:30] Can I ask a bit of a noob question here? Is that fundamentally why they're quite powerful for what is essentially multi-domain data, where you can have a CNN layer or an LSTM as an attention layer for different parts of your data, but the output is still essentially attention? So you can, in [01:10:00] a sense, have a proxy or a translation point. Or am I misunderstanding how that structurally works? Speaker6: [01:10:08] Yeah, that makes sense to me. Yeah, I think so. I mean, I know that with some multi... multilingual models... Speaker2: [01:10:19] Modal. Sorry, that was the word. Speaker6: [01:10:22] Yeah, yeah, yeah. Absolutely. Yeah, absolutely.
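(A toy sketch of the hybrid pattern Kosta and Jasmine are describing: domain-specific front ends, a crude conv layer for image patches and an embedding lookup for text, produce features that meet in one shared attention layer, so either modality can attend to the other. Every weight and name below is a random stand-in invented for the example, not a real trained model.)

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16   # shared feature width both branches project into

# Image branch: a crude "conv" front end over four 3x3 toy patches.
patches = rng.normal(size=(4, 3, 3))
conv_kernel = rng.normal(size=(3, 3))
img_feats = np.array([[(p * conv_kernel).sum()] for p in patches])  # (4, 1) filter responses
img_tokens = img_feats @ rng.normal(size=(1, d))                    # project to width d

# Text branch: embedding lookup for three toy word ids.
word_ids = np.array([2, 7, 5])
embedding_table = rng.normal(size=(10, d))
txt_tokens = embedding_table[word_ids]                              # (3, d)

# Shared attention layer over the concatenated sequence of both modalities.
seq = np.concatenate([img_tokens, txt_tokens])                      # (7, d)
Wq, Wk, Wv = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))
Q, K, V = seq @ Wq, seq @ Wk, seq @ Wv
scores = Q @ K.T / np.sqrt(d)
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
out = weights @ V          # text positions can attend to image positions, and vice versa

print(out.shape)   # (7, d)
```

The "translation point" in this sketch is simply the shared width d: once both front ends project into it, the attention layer treats everything as one sequence.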
Harpreet: [01:10:29] Awesome. Great discussions being kicked off, y'all. I don't see any other questions coming in from LinkedIn. Actually, there's a comment here from Amanda Cook agreeing that terminology can get confusing: a lot of innovation happens with that domain crossover, though the connection is sometimes drawn by someone who has a more shallow understanding of the two fields. Totally agree with that. And most people in the new field find the ideas completely new rather than extensions from another field. Who might be responsible for helping rein in the terminology expansion? Hmm. I want to try to tackle that. I think this touches on ontologies. When we talk about terminologies, is "ontologies" the word to use for what we're talking about? Speaker3: [01:11:19] Yeah, I mean, there's a lot you can put into an ontology. Any sort of system of knowledge or knowledge management system could be overcomplicated into an ontology; there's usually an easier way to do it. It's once you get into a large level of complexity that the ontology comes into play and allows you to really understand how concepts are connected to each other. And that in and of itself can actually help train neural networks; you can begin to feed that into the neural network. You can actually turn it into a causal graph; parts of it can become [01:12:00] a causal model. So ontologies are actually really cool structures, I don't know, maybe "structures" is the wrong word, but hopefully you get what I'm saying, to end up playing around with and to understand. But as much as I promote them for everything, they're kind of overkill for some stuff. So, you know, be sure you actually want to put the cash into building one, because the people who do it are really expensive, really smart, and not all of them are really nice. Harpreet: [01:12:30] Ben, thank you very much. It does not look like we've got any other questions. Okay, go for it. Speaker2: [01:12:37] And I guess this is where that balance comes in, right? If I don't know anything about the area, if I don't have an ontological map, I can call something whatever I like, which gives me the great freedom of sewing things together in a way no one's ever thought of. But then you also have the other side of it: if I don't know anything about how I'm doing things, I'm probably missing tricks that people have solved over and over again. Even to the point of team structuring in a company, right? Knowing what to call a team. What's the difference between calling it an ops team, a DevOps team, a platform infrastructure team, or a machine learning engineering team? Naming the team the right way, and the role the right way, gives you the right perspective on what the purpose of the team is. So there's this balance: we name things the right way because we have best practices we can apply, because we can identify a certain area of a problem as something we've solved before. It's like when you look at algorithms and complexity: if we can divide and conquer a problem into smaller bits and then substitute in proxies of things we've solved before, we can use that ontological map to map it to things we've solved before. Speaker2: [01:13:57] Right? And then take the things we haven't solved and really [01:14:00] break them down, first-principles style, into what each part of this complex system is doing. To do that well, you need this interesting balance between people with experience in the old, established terminologies and people who are kind of greenfield and new to it, so you get that balance between innovation and, essentially, best practice from years of getting bitten by stuff and learning things the hard way. This almost touches on how we talk about startups missing a lot of tricks because they don't hire enough people with enough gray hairs, right? It's just fundamentally very helpful in a lot of cases, having that right balance of experience. I'm jealous, man, I need more gray hairs and more years under my career belt. Harpreet: [01:14:57] I can... Speaker2: [01:14:58] I think Ben's gray is underrepresented somehow. He should be grayer. Harpreet: [01:15:06] I can attest that gray hair does not imply wisdom. I've got plenty of it. And yeah, thank you all so much. It's been a great, great episode. I really enjoyed today's conversation. Super excited to get this one published. A few more weeks: October 7th will be episode number 100. I can't believe that. Episode number 100. Wow, that's crazy. If anybody's going to be at the Intel Innovation Conference in San Jose on the 27th and 28th of the month, let me know. I'll be there.
Looking forward to hanging out, so please do come by. That's it for today, y'all. Remember: you've got one life on this planet, so why not try to do something big? Cheers, everyone.