comet-office-hour-feb7-mixed.mp3

[00:00:16] What's up, everybody? Shout out to everybody that made it here this morning. Thank you for spending part of your Sunday here.

[00:00:24] Welcome to the Comet ML office hours, powered by The Artists of Data Science. I couldn't be more excited to do this. It's such an honor to be here with my co-host, Ayodele. We're going to get to know Ayodele real quick and then we'll jump into your questions. So Ayodele, why don't you tell us a little bit about yourself? First, talk to us about where you grew up. What was it like there?

[00:00:50] Yeah, so I actually was a military brat. I grew up on the East Coast; the Virginia Beach area was where I spent the vast majority of my early childhood, but we also lived in New Jersey and Arkansas. All my family kind of landed in Texas, and from there I've kind of been a nomad. I went to a ton of different places for school. I went to Pittsburgh for undergrad and then moved to Denver for my master's degree in data science.

[00:01:17] And I'm back in Denver now after taking a couple of detours for work. But I'm definitely one of those people who has to think about it if you ask me what my hometown is.

[00:01:27] Yeah, that's awesome that you were exposed to so many different places in the States growing up. One thing that's interesting is I remember talking to you before about your path into data science being, I want to say, almost reluctant. You weren't always a data science type of person, but you grew an affinity for it. Talk to us about that.

[00:01:51] Yeah, it's really funny. When I was about to graduate high school and go off to college, I was undecided. I was that typical "I have no clue what I want to do with my life." I thought I wanted to go into medicine, so I was premed for a while. Then I thought, OK, now that I'm freaked out by patients and blood and all of that, maybe I want to go into journalism. I went to school for journalism for a while, kind of bounced around, and ended up with an associate's degree in film. And when I was starting to look for work, I was like, oh, my friends with computer science degrees make a lot more money than what I'm making. And I had always been into tech and computers too; my parents were early adopters.

[00:02:38] They had the big satellite TV in the early nineties. So I was like, I kind of like this whole tech stuff.

[00:02:45] But I was a CS major for two years and I really didn't enjoy the program. I found it difficult because we were starting off learning C++ and doing coding tests on paper. And I'm like, I don't know if I'm cut out for this. So I actually ended up with my bachelor's degree in communications, but it kind of worked out in that I ended up working at marketing agencies.

[00:03:11] I was doing social media content creation and managing things like their campaigns. That actually led to a really cool job at an app company where they wanted me to do that work, but more on their in-app data. So looking at their couple hundred thousand users and A/B testing things like push notifications and in-app messages. I did that for about six months until that little startup ran out of runway. And that's when data science was starting to get really hot. I was starting to see big articles and thinking, you know, that requires a lot of analytics work and A/B testing. This is stuff I have played with. I think I could do this. So I actually went back to school after that for my master's in data science. And then, yeah, I've been working in the field.

[00:04:01] And you've actually got a course that you recently launched about supervised machine learning, if I recall correctly, right? Tell us a bit about that course that you've put out.

[00:04:11] Yeah, I'm really excited. It's a fairly mid-level course. If you are someone who's played around with Python or has tried to understand using NumPy and pandas, it goes into some of the details of the technical concepts: what actually is a neural network, and what is a decision tree.

[00:04:34] So I'm really excited about that because it is up on LinkedIn now. It was really difficult, I think, to try and do the course, especially right as the pandemic hit.

[00:04:47] But I think that it's helpful for folks who need to see the code and write the code along while learning the theory at the same time.

[00:04:57] I love that approach to teaching. I think everybody here should definitely check out that class. It's up on LinkedIn Learning, if I'm not mistaken, right? So if you guys have LinkedIn Learning, which I think comes free with your premium subscription, look for the course on there and check it out. So here's a question that I know everybody here has: what does a data evangelist do? What does that even mean? Talk to us about that.

[00:05:21] Well, I think it's similar to data science in general in that it's kind of undefined at a lot of places. But for me, at Comet, it really just means being able to create technical content and to relate to the day-to-day job of a data scientist, while working more on trying to create things that are useful.

[00:05:42] So reports, blog posts that are actually helpful, or things like tutorials, as well as hosting stuff like this. So it's a little bit technical and a little bit marketing as well. But it's nice, because I definitely felt for a long time that a lot of my marketing knowledge and the things that I gained along the way weren't really meaningful in other roles.

[00:06:04] So I can use that a little bit more now. Because it's kind of a way to combine skills, right? You're combining technical knowledge with the communication aspect of it. So you're really combining skills, and this little intersection of those skills is data evangelism. So talk to us about Comet ML. I know everybody here is wondering, what is Comet ML? But first of all, before we talk about what Comet ML is, why is Comet ML even hosting office hours?

[00:06:35] Yeah, I think the thing is, right now it's really hard to feel like we have a community. Pre-pandemic, one of the things that I loved about data science is that I always felt really excited by the community. Going to different conferences and being able to ask what I thought were really stupid questions, I always felt fairly comfortable, because so many people in data science come from drastically different backgrounds. So really we're just trying to have a virtual format that's like that, without being worried that you are asking a novice question or that it's going to seem too basic. So, yeah, I'm hoping that this forum allows for that a little bit.

[00:07:20] Yeah, definitely. I was super excited when Gideon reached out to me, talking about how he wanted to contribute, give back to the community, and help me help other people on a more massive scale. I thought that was really awesome, and we're really well aligned on those points: just trying to give back to the community and help out. So now talk to us about what Comet ML is, and where does their product fit in the machine learning lifecycle?

[00:07:47] Yeah, so I will tell you a little bit about how I got introduced to Comet. Full transparency, I've been at Comet about a month now. Gideon reached out to me when I was looking for work and he mentioned the product, and my first thought was, this is something that I should have been using. I think that's why I'm so excited about our product, because I've been in situations many times as a data scientist where, let's say, I'm experimenting in a Jupyter notebook and my kernel restarts or I accidentally overwrite a single cell. Your work is just kind of gone. It's a small loss, and I think, especially when you are moving from academia into industry, you're like, oh, it's not a big deal. But when you are at these really large companies or you are doing really intense experimentation, like building dozens to hundreds of models, it's a really, really costly problem. On top of that, what Comet essentially does for machine learning is what GitHub did for coding in general. I think we've all probably run into the limitation that our data, our hyperparameters, and our model metrics aren't saved and logged to GitHub just because our code is.

[00:09:14] So that's essentially what Comet's product does. You can pip install Comet, you can use it with R or Python, and if you are working in deep learning, it works with literally every deep learning framework. What you do is basically say, here's an experiment. Say you are making an ML model and you want to save all of the data from that. You put in a little line of code, and that's going to save everything in that file, from your code to the environment you're running in, like what version of TensorFlow you're using. I don't know if this has been a problem that you guys have noticed, but for me, especially when I'm trying to collaborate with people on my team, I have to manually send the dependencies and they have to go in and independently download them. On one of my larger data science teams, there were about 40 of us, and most analysts and data scientists got default Windows computers while all of our managers had MacBooks. So every time I sat down next to my manager, she was like, why isn't your code working on my computer? For some reason I'm running into a single bug here. And we'd spend hours trying to debug that.

[00:10:31] Comet basically solves that by making it really easy: you essentially just send a link to someone, and as long as they have Python installed, they don't have to go and get every single dependency that you used. But what's really handy for people trying to analyze how well their models run against each other is that you can plot things like your accuracy over 15, 20, or 30 experiments and really see how they compete, instead of, as I have done, manually keeping Word docs or a spreadsheet and writing down my model metrics. That's really, really tedious and it's incredibly prone to human error. And I think if we're being honest, it's hard to do that on a large scale. It's hard to do that when you have large teams. So, yeah, my first thought coming to Comet was, I was on a team of a lot of people and we used absolutely nothing to manage this. Everyone was kind of left to their own devices. Sometimes it would be, here's a screenshot of my environment in an email, with a screenshot of the metrics that performed the best. So we're trying to make that whole process really, really simple and less prone to that human clerical error.
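
To make the logging idea above concrete, here is a minimal sketch in plain Python. It is not Comet's API: `ExperimentLog` and `best_run` are hypothetical names, and the accuracy values are made up. It only illustrates the pattern of recording parameters, metrics, and environment per run and then comparing runs programmatically instead of in a spreadsheet.

```python
import json
import platform
import sys
from dataclasses import dataclass, field


@dataclass
class ExperimentLog:
    """Hypothetical stand-in for an experiment tracker (not Comet's API)."""
    name: str
    params: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)
    environment: dict = field(default_factory=dict)

    def capture_environment(self):
        # Record the interpreter version and OS so a teammate can see what the run used.
        self.environment = {
            "python": sys.version.split()[0],
            "platform": platform.system(),
        }

    def to_json(self):
        # Serialize the whole run so it can be saved or shared as one record.
        return json.dumps(
            {
                "name": self.name,
                "params": self.params,
                "metrics": self.metrics,
                "environment": self.environment,
            },
            sort_keys=True,
        )


def best_run(runs, metric="accuracy"):
    """Return the run with the highest value for the given metric."""
    return max(runs, key=lambda run: run.metrics[metric])


runs = []
for depth, accuracy in [(2, 0.81), (4, 0.88), (8, 0.85)]:  # made-up results
    run = ExperimentLog(name=f"depth-{depth}", params={"max_depth": depth})
    run.capture_environment()
    run.metrics["accuracy"] = accuracy
    runs.append(run)

print(best_run(runs).name)
```

A hosted tracker adds storage, sharing, and plotting on top of this, but the per-run record has the same basic shape.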

[00:11:55] Yeah, I'm a huge fan of the product. I've gotten to play around with it a little bit. It's freely available if anybody wants to check it out, just go to comet.ml. I particularly like the experimentation aspect of it that lets you drill down into how individual hyperparameters are affecting how your model fits. I think that is absolutely wonderful. Who knows, maybe at some point in one of these sessions we can do a code-along where we show how Comet ML works with a simple random forest type of problem or something like that. So keep an eye out for that in the future, guys. But thank you so much for sharing your story and talking to us about Comet, and about how they're being so wonderfully generous as to help host these office hours. I'm glad that we're able to talk about them and the product that they offer. Now, without further ado, let's get to your questions.

[00:12:46] The first question I've got is from Norrish. After Norrish, we will go to Giovana. Then after Giovana, we will go to Mark. So Norrish, you are up.

[00:12:58] Yeah. Hello, everyone. I hope you all are doing well. Eventually, my goal is to make it into machine learning. So my question is: for newbies who want to get into machine learning, what are the dos and don'ts?

[00:13:09] I think the big do is: first, clearly define whatever problem it is that you are about to solve, and make sure that ML solutions are applicable to that problem. I think that is a big deal. And the big don't: don't just randomly start throwing algorithms at problems and expecting some magic to happen. That's at a very, very basic level, but I'd love to hear from Ayodele, who I think would have some great insights on this, and I'm pretty sure Mark does as well. So let's hear from Ayodele and then Mark.

[00:13:43] Yeah. I would say the first big do is to have an understanding of core concepts from a top-down level. So yes, it's good to have the linear algebra and calculus background, but really understand why and what you're doing, especially when it comes to building things like deep learning models. I would say don't focus on the algorithm at hand; focus on how well you're able to solve the problem. A lot of my insight from industry is that what will probably end up in production, if you are at a startup to medium-size company, is more likely going to be a regression model than it is going to be a CNN. So being really comfortable with making linear models and tuning linear models will serve you well, over wanting to work with the most state-of-the-art all the time.

[00:14:46] And Mark, what do you think? What are some dos and don'ts for people who are trying to break into data science?

[00:14:56] Yeah, I think the main thing for me is that I'm trying to become an ML engineer eventually. So I feel like I had the same question, just a little further along with it. I actually intentionally chose data science roles that aren't really ML-heavy; I work at startups. So that was a really great point: we're not doing neural networks, that would be a horrible thing, we're a startup. And I think the key thing that I was really interested in is where data drives business impact, and also how you can determine when a model is a really great choice to move forward with, or whether we just do some statistics.

[00:15:34] And so the key do, especially from what I work on, is: solve business problems and learn how to speak to business stakeholders. They will give you all the key assumptions that'll make creating a model a lot easier. And then also become very product focused. Learn how to talk to customers and listen to customer needs. Those are really great clues on what models to potentially choose or not choose, and on approaches that'll make the process a lot easier. Something I've learned being at a startup and trying to implement maybe just a simple model, and by that I mean a regression or some type of heuristic, is that talking to business stakeholders saves me a lot of time up front. And so when I do get to do more of the advanced things, I have a very good sense of what to solve, and I can just focus on teaching myself from the books I've picked up along the way.

[00:16:31] I 100 percent agree with that. And it's interesting, because when I was first breaking into quantitative careers, my first job in, I want to say data science, but let's just say it was in the data realm, was as an actuarial analyst. I thought my job would just be sitting in the corner all day, building sweet models that predict things. But that's not really how it works, right? At the end of the day, people need to understand the work that you do. So the biggest do, I would say, is: do realize that your role as a data scientist, as a data practitioner, is to solve business problems first and foremost. And you solve business problems typically to do one of two things: either reduce costs or make more money. Your job isn't just to sit in a notebook and try random things, hoping something works. So a big do, I would say, is do focus on creating value for your organization. So let's hear from my friend Jay here. And then after Jay, let's hear from Giovana on this topic, and then we'll jump right into your question, Giovana. Jay, share your thoughts with us.

[00:17:37] Yeah, so in my experience, I've taken a lot of classes and attended a lot of webinars, and I would boil it down to three things.

[00:17:47] These are the things you need to be really strong in to be a great data scientist. I've done deep learning projects and all that kind of stuff, but I can tell you there are three things that you really need to strengthen in order to be a successful data scientist. Number one, you need to know programming. Number two, math and statistics. And number three, SQL. As far as tools are concerned, you need to know those three, and they are in no particular order. I speak from experience. I took a lot of stuff, and at the end of the day, these three things were the most powerful. When you look at a job description, you look at the top three or four bullet points, and you always, always find Python or SQL or math and statistics. And when I say math, I'm talking about probability, statistics, linear algebra, and calculus. Even within those math topics, there is no particular order. Just pick one, focus on that and be strong in that, and go to the next one. So these three things are the top for me.

[00:18:50] And I think, if you're doing data science, as far as algorithms and machine learning are concerned, that will be like a fourth thing. And I would say, like what Harp mentioned,

[00:19:01] I'd say knowing the problem, what the business problem is. Think with the end in mind first and then go backwards and solve: OK, do I need to have a model, or do I need something else? Just think of the end and go backwards. So those are the four things that I wanted to share. Yeah, that's my experience.

[00:19:23] Yes, thank you very much. Giovana, I'd love to hear your thoughts on this, and then we can jump right into your question afterwards.

[00:19:33] One suggestion is to start thinking about the field that you want to dive into, because when we start, we try to do a lot of projects. Maybe we have a very nice portfolio full of projects, but if someone goes into our portfolio, they don't understand which field we are strongest in. So you want a base on which you can start building your profile. Try to understand which field you would like to work in and start building projects about that. Another thing is soft skills. As Harp said at the beginning, communication skills are fundamental, not only for data science but for everything. And asking good questions is going to help you have great results from your model. So that is my advice.

[00:20:53] Yeah. Thank you very much. I 100 percent agree on being able to ask good questions. That is definitely a very underrated skill: being able to ask questions in such a way that you get more information out of your stakeholder, because with more information you can solve their problem that much better. Thank you, Giovana. So Giovana, let's go ahead and jump right into your question. Then after you, we'll go to Mark, Depeche, and John Dietz.

[00:21:23] Thank you so much. I'm starting a project and I would like to build my own dataset of images from scratch, and I would like to know if someone can guide me on how to do it. I have all the images, but I'm running into some problems. There is a lot of information on the Internet about this, and I would like to have the dataset labeled. So I don't know if you have any tutorials or ideas on how to start a dataset of images from scratch.

[00:22:05] Yeah, that's a tough question there. So, to understand the question completely: you've got a bunch of images and you essentially just need to get those images labeled somehow. Do you have any subset of those images already labeled?

[00:22:21] OK. These images are for a classification model about whether a woman is wearing pants or skirts or not, so it is a classification inside a classification. I have the images, but I need to put them in a dataset and I need to label them. Any suggestions?

[00:22:51] Yeah. So I'm going to take the cop-out answer here and say you should look into AWS Mechanical Turk and get people to manually label them for you. I think it's pretty cheap to do that. Sorry if I stole that answer from anyone, but let's hear from Ayodele. How would you handle this?

[00:23:08] I was going to mention the same thing. There's Mechanical Turk and a couple of other similar websites where they use user-generated labels. Those are services you can pay for, and you can then go and verify what those annotations are, what the actual labels are. Other than that, the only real way to do it is to do it manually. Even for small computer vision problems where I've created my own datasets, sometimes I've put the images in specific folders, so pants, skirts, tops, with all of the similar images in each. But it is unfortunately time consuming otherwise.
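
The folder-per-class approach mentioned above can be sketched in a few lines of standard-library Python. This is only an illustration under assumptions: `label_from_folders` is a hypothetical helper, the folder names come from the example in the conversation, and the files in the demo are empty placeholders rather than real images.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def label_from_folders(root):
    """Return sorted (relative_path, label) pairs, using each subfolder name as the label."""
    root = Path(root)
    pairs = []
    for image_path in root.glob("*/*"):
        if image_path.is_file() and image_path.suffix.lower() in IMAGE_SUFFIXES:
            relative = image_path.relative_to(root).as_posix()
            pairs.append((relative, image_path.parent.name))
    return sorted(pairs)


# Demo with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for label, filename in [("pants", "a.jpg"), ("skirts", "b.png"), ("tops", "c.jpg")]:
        folder = Path(tmp) / label
        folder.mkdir()
        (folder / filename).touch()
    dataset = label_from_folders(tmp)

print(dataset)
```

The resulting list of path and label pairs can be written to a CSV file or handed to whatever training code is used later.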

[00:23:53] Yeah. So if anybody else has some insight on this question, I'd love to hear it. Otherwise, I'll link you to resources here. Maybe if you have some subset of these images already labeled, you can look into using some type of semi-supervised methodology. I've got a link here that I'm putting in the chat. It's a link to Papers with Code dot com, slash task, slash semi-supervised image classification. Just type that into Google, so those listening at home will be able to check it out as well. Those are really the only two things I can think of, mostly because I just haven't done anything like this before. But if you have the resources to use Mechanical Turk to get people to manually label the images, then do that, or if you have some subset already classified, then maybe try some semi-supervised type of methodology. Anybody else have any experience with this problem statement?

[00:24:59] I can't speak to the classification itself; I'm not a computer vision person at all. But I've watched a bunch of YouTube videos that are really interesting, and something I thought was really cool is to try to expand your data. That is, in addition to labeling, they flipped the images upside down or changed the colors, so that way you can expand your data a lot more in a different way.

[00:25:22] Yeah, adding to Mark, I think he's talking about data augmentation: recoloring, flipping horizontally or vertically, so that you get extra images from the current images you have. Yeah, I think that's called data augmentation, I believe.
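
As a rough illustration of the flipping idea just described, here is a small sketch that treats an image as a nested list of pixel values. The function names are illustrative only; real projects would typically use an image library, but the operations are the same.

```python
def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Reverse the order of the rows, turning the image upside down."""
    return image[::-1]


def augment(image):
    """Return the original image plus its two flipped variants."""
    return [image, flip_horizontal(image), flip_vertical(image)]


# A tiny 2x3 "image" of made-up pixel values.
tiny_image = [
    [1, 2, 3],
    [4, 5, 6],
]

augmented = augment(tiny_image)
print(len(augmented))  # one labeled image becomes three training examples
```

Each variant keeps the label of the original image, which is how augmentation stretches a small labeled set.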

[00:25:39] So hopefully those are some good keywords, Giovana, that can help you on your way. If not, then definitely swing by Friday's office hours, where there are far smarter people than I am who might be able to help. So, Mark, let's go ahead and jump into your question.

[00:25:55] Yeah, definitely. One of the things I'm really interested in lately: my goal is to go into ML engineering eventually, but this term MLOps has been popping up over and over again. It's still very new-ish and it seems like there's a lot of overlap, but the MLOps area looks really interesting. So I'm curious, what's the difference between MLOps and ML engineers, and where do you see that role evolving, with your crystal ball? I know no one can tell the future.

[00:26:29] I think that's a great question to hear from Ayodele on. Ayodele, would you like to take this one on?

[00:26:33] Yeah. So I think MLOps, the way I see it, is very much a kind of hodgepodge of job titles that essentially get the operational side of ML done. At least in my experience, it's partially data engineers working with ML engineers, working with software developers, and in most cases using a variety of tools to get things done. Where we are in the MLOps space, we're on the experiment management side. But there are a lot of other really big companies that are managing things like your data lake or your cloud database systems. All of those tools work in collaboration together with however you are pushing models to production, whether that's something like SageMaker, or whether that's containerized with Docker or using Kubernetes. So all of these things are in that whole bucket of MLOps. But I think ML engineers are typically more so working with software engineers and data scientists to finalize models and then really working to put them into production, so still working with a lot of the MLOps tools. I think it is definitely in the realm of their responsibilities, especially as we start seeing companies get more sophisticated in how they push into production and how they work on model monitoring. So I'd say MLOps is your cloud data, how you manage your experiments for the models you create, all of the tools required to actually get models into your product, and then model monitoring and feedback. So it's a huge space. But I definitely think there's a lot of work for ML engineers in that.

[00:28:40] Yeah. If I can piggyback on what Ayodele is saying, when I think about MLOps, the first word that pops into my mind is reproducibility. MLOps is really just the operationalization of machine learning systems. That is everything from versioning your data pipelines, to versioning the datasets that you're using for training, to versioning hyperparameters for whatever model it is that you're using at that time, plus the back end of it: OK, what do we do once the model is deployed? How do we track everything in terms of model health and maintenance and things like that? So if we were to reason by analogy, and this is probably sketchy reasoning by analogy, I'd like to think of it as what a software engineer is to a DevOps person. That's how to think about a machine learning engineer and, let's say, an MLOps engineer. I don't know if there are any MLOps engineer job titles yet. As far as I know, MLOps is just a philosophy for operationalizing machine learning systems with an emphasis on ensuring reproducibility. Hopefully that answers your question. One thousand percent.
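
One lightweight way to picture the versioning idea above is to fingerprint the training data and the hyperparameters with a hash, so that two runs can be checked for identical inputs. The sketch below is an assumption-laden illustration, not a description of any particular MLOps tool: `fingerprint` and `run_record` are hypothetical names, and the rows and parameter values are made up.

```python
import hashlib
import json


def fingerprint(obj):
    """Return a short, stable SHA-256 hash of a JSON-serializable object."""
    canonical = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:12]


def run_record(rows, hyperparameters):
    """Bundle version identifiers for the training data and settings of one run."""
    return {
        "data_version": fingerprint(rows),
        "params_version": fingerprint(hyperparameters),
    }


rows = [[5.1, 3.5, 0], [6.2, 2.9, 1]]             # made-up training rows
params_a = {"n_estimators": 100, "max_depth": 4}
params_b = {"max_depth": 4, "n_estimators": 100}  # same settings, different key order

record_a = run_record(rows, params_a)
record_b = run_record(rows, params_b)

# Identical data and settings yield identical version identifiers.
print(record_a == record_b)
```

If a single row or setting changes, the corresponding identifier changes too, which makes silent drift between runs easy to spot.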

[00:29:52] Thank you. Thank you both.

[00:29:53] Definitely. If you want to go ahead and jump in, just ask your question. So the next question is going to be from Depeche. Yes, we can hear you.

[00:30:04] Hi, everyone. This is Depeche. I'm currently working in a pharmaceutical consulting firm. Thanks a lot for this initiative; this is my first call attending with fellow data science aspirants. I wanted to understand: when we try to move a business problem to a solution that needs a lot of machine learning, do we have any sort of framework or set of criteria, maybe a checklist of A, B, C, to decide whether the problem even needs an artificial intelligence or machine learning algorithm to solve it? Right now the focus area for us has been how we can provide an accurate pharmaceutical forecast, which needs a decent amount of business context. But if we propose to senior executives that, OK, this is a problem that needs ML to solve, then how do we assess that? And does anyone have any experience convincing senior leaders that this is a problem that needs ML?

[00:31:14] So how do we go about evaluating that possibility? If you're just talking with respect to the framework aspect of it, you might want to look into CRISP-DM. That's the Cross-Industry Standard Process for Data Mining, or something like that. So that's a good kind of framework to follow. But to get to the real heart of your question: how do we know when we need to use machine learning to solve a problem? Is that what you're actually asking? OK, so I'd say, are you trying to predict something? Because if you're trying to make a prediction into the future about something, then maybe that is a place where you use machine learning, right? Traditional statistical inference isn't going to get you to where you need to be, and you need to start thinking about doing predictions and stuff like that. So I'd love to hear from Ayodele and Mark on this topic as well.

[00:32:14] I think you can try to decipher this with a series of questions. Start off with what Harp mentioned, but also go into what kind of data you have available and what kind of data you would need. I've noticed that it's typically not always the same dataset, and it's worth knowing whether you can actually get access to the data you need.

[00:32:44] And I would really double down and say, especially within the pharmaceutical realm or within health care at large: do you have access to data about protected classes like race and gender? Knowing that up front will make it easier for you, if you do choose an ML solution, to then test for fairness or for disparate outcomes. But back on the track of how you decipher whether you should use ML at all: it's not only what kind of data you have, but what would be the ideal deployment situation? Is this something that is replacing human decision making, as far as making certain kinds of recommendations, where we want to save time by using ML or AI and avoid having to spend a lot of human time on the task? So ask questions about the kind of data available and the ideal kind of deployment, as well as understanding what the worst-case scenarios are. Having these conversations about worst-case scenarios and thinking about what those simulations would look like is going to help you and your team really decipher: is this something that is so beneficial that it's worth the potential cost?

[00:34:13] Thank you very much, Ayodele. Mark, what do you think?

[00:34:15] I heard like three parts to the question. The first was the value side: is AI the right choice? Then, once you've determined that, how do you convince business stakeholders that it's the right choice as well?

[00:34:26] And then finally, there's this health care piece. My background is in health care; that's how I got into data science. So, two valuable resources for building MVPs. The first would be the entrepreneur dataset from CMS, which is essentially this giant dataset of claims data and pharmaceutical data. It's synthetic data that the US government put out, and you can build really great MVPs off of that. The second thing is going to the FDA. They've been having a lot of talks about what an AI solution in health care looks like and how to bring that to market. So look at the FDA. I'm blanking on it, but I'll put it in the chat if I find the actual press release they've had recently regarding that.

[00:35:13] I'm going to skip over the model part, because I think a lot of people have discussed it, and assume you've found the amazing kind of thing to drive innovation at your company.

[00:35:24] With business stakeholders, the key thing is you need to figure out how to reduce risk for them in making that decision. Josh put out a really great statement in the comments: really easy, simple solutions get you to 80 percent. There's a lot of effort in going from 80 to 85, and it gets harder with every single iteration. So how can you prove that there's enough signal to make it worth going up incrementally? I work at a startup, and my main focus is getting features out. So I'm not going to build the most advanced ML model or even the most accurate one, and I may take a lot of shortcuts. But it's very intentional, because I need to get a feature out there to determine if that's what the market really wants. If it's a yes, then we invest more time to actually make this a correct model. If it's a no, then we saved hours and hours of time. Some frameworks I really like for innovation are design thinking and really building MVPs. My simple MVPs are PowerPoints or Excel sheets before I build a model. Another thing: at Stanford, there's this class called Lean LaunchPad. They have a website. It really takes you through how to innovate on a startup idea. And a key thing is just user interviews. You have a business hypothesis, and you need to do a whole bunch of user interviews to ask, hey, is this your real business problem? And then show them your MVP, like, hey, what would you do with this? Get that information from those user stories. So when you go to the business stakeholders, you can say, hey, here's a solution that's validated. We went and talked to users. This seems like a really big problem, and here's the market size; from talking to our users, you can make this much money. And then, we built this first iteration, it's being picked up, and it's working well, so I think it's worthwhile to take the next step. That way you're speaking the business stakeholders' language. It's not about models. It's about reducing risk for them.

[00:37:25] Does that answer your question? Any follow-up questions to that?

[00:37:29] Yes, I think that pretty much answered my question.

[00:37:32] If you don't mind sharing, what's the scenario? What's the actual problem that you're trying to work on?

[00:37:42] So actually the context is that a couple of months back, I developed a statistical forecasting tool for my consulting firm that needed to be used across teams. We usually have monthly pharmaceutical data that we get from IQVIA and various other sources, and we generate the demand forecast every month. So we already do have a statistical model that's being used, if I talk about my line, across markets. Now, the tool that I developed was to be used only across one single center. The question was whether we'd need AI or similar algorithms to advance our time series prediction. I haven't tapped into that territory yet, but based on the basic research that I've done and the qualitative reviews that I have found, AI and ML algorithms are not able to advance the accuracy of the statistical forecasting to a great extent. It's more or less similar. So the straight question from the business leaders is: is it worth investing time to advance this further if we are already, like, 90 percent there?

[00:38:59] That's a great question. So I would say, you've built a forecasting model using some amount of data, right? That forecasting model has spit out predictions, or forecasts, whatever. And now you've seen the ground truth, and you can assess the difference between what your forecast had projected and what the ground truth was. Now, if you were to use some more advanced type of technique, how does that compare over that same period of time against the statistical model, the simpler model that you have? And if you can demonstrate that, OK, you know what, by using an LSTM model for time series, we were able to increase our accuracy over this baseline forecasting model by 30 percent, then great. Translate that 30 percent increase in accuracy to dollars and cents, and then compare that dollars and cents to the amount of resources it would take, in terms of time and effort, to operationalize this. Right. And if that amount is less than the savings, that means you have a net positive and you should go for it, right? That's kind of how I would reason through that. Does that make sense? OK. So next question up, we've got Josh next on the docket. Josh, go for it. And if anybody else has a question, feel free to just type "I have a question" into the chat and I'll add you to the line.
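
Editor's note: the reasoning above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every number is made up, MAPE is just one possible error metric, and `dollars_per_error_point` and `cost_to_operationalize` are hypothetical inputs the business would have to supply.

```python
def mape(actual, forecast):
    """Mean absolute percentage error as a fraction (0.10 means 10%)."""
    return sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def net_benefit(baseline_err, candidate_err, dollars_per_error_point, cost_to_operationalize):
    """Savings from the error reduction minus the cost of shipping the new model.

    dollars_per_error_point: assumed value of removing one percentage point
    of forecast error (hypothetical; it has to come from the business).
    """
    points_gained = (baseline_err - candidate_err) * 100
    return points_gained * dollars_per_error_point - cost_to_operationalize


# Made-up backtest over the same period for both models.
actual = [100, 120, 110, 130]
baseline_forecast = [90, 132, 99, 143]          # existing statistical model
candidate_forecast = [95, 126, 104.5, 136.5]    # more advanced model

baseline_err = mape(actual, baseline_forecast)
candidate_err = mape(actual, candidate_forecast)
benefit = net_benefit(baseline_err, candidate_err,
                      dollars_per_error_point=2000,
                      cost_to_operationalize=6000)

# Go ahead only if the savings outweigh the effort.
print(f"baseline={baseline_err:.3f} candidate={candidate_err:.3f} net=${benefit:,.0f}")
```

If the net figure comes out negative, the simpler model stays, which is the same conclusion the business leaders are asking about.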

[00:40:33] Josh, go for it. I'm over here in San Jose, California. I'm watching my one-year-old right now, which is why I am on mute most of the time.

[00:40:43] Well, that sounds awesome. I'm from Sacramento, born and raised in Sacramento. So, you know, we're kind of neighbor-ish.

[00:40:50] All right. Yeah. So I have a question. I'm the only data scientist at my entire company. And since I've been transferring from, like, a different field, I have a really bad case of imposter syndrome. And I just feel like that's probably a problem in and of itself. But I always hear people talking about these different tools. I think you mentioned ones in the ML space, like Apache and the IBM Business Informatics AutoML. I have no exposure to any of these tools, and I don't even know the first place to go to learn about them, or if they're important to my business, because there's nobody in my company I could talk to.

[00:41:39] Yeah, well, first of all, just a suggestion on that imposter syndrome thing, and then I'd love to hear Giovana's take on the imposter syndrome after we address this issue with the tooling. But dude, I suffer from that stuff all the time. And then I realize that actually I have done difficult things for companies and made them money. So I'm not really an imposter. I use my skills to help make money for my organization. But I don't think the tools necessarily make the data scientist, right? So right now you're shouting out these tools.

[00:42:07] But I'm like, OK, these are just names of companies that make things. Distill it down to what it is that you're actually trying to do, right? And then from there, based on what it is you're trying to do, look and see whether your current organization already has partnerships. For example, my company is a Microsoft shop, right? Microsoft has first right of refusal on anything. So if I do anything related to cloud stuff, then, well, I've got to use Azure for that. Right. So it wouldn't make sense for me to look at AWS or GCP when in-house this is already the tool of choice or the brand of choice. So first, just focus on your actual company. What relationships do you guys have with vendors? And then from those vendors, which of the tools that they have will help you solve the problem?

[00:43:00] So I think reasoning from the perspective of what tools I need, rather than from the other end, where it's, OK, what do I need to solve this problem in terms of fundamentals and understanding the problem, might be the wrong way to approach it. I don't know if that's making sense, but let's hear from Ayodele about this. And then Giovana can talk about the imposter syndrome part, because I think that's super important.

[00:43:26] So first of all, I'd say my heart goes out to you, Josh. I have been the only data scientist at work, and not having a team is incredibly difficult. It is really, really hard to do this in a vacuum. And when you don't have, you know, decades of experience in stats, it can seem really, really daunting. So I'm basically going to tell you all the things I would have told past Ayodele as the only data scientist. First is to get a mentor, someone who is at, like, a team lead or data science manager level, because you're going to be fighting through a lot of things that you don't know that you don't know. And those things kind of get revealed to you the larger the teams you work on. So when I was data scientist number one at a six-person startup, there was no way I was even thinking or caring about analysis. We didn't have a dev team. We didn't have all of the things in place that really large-scale and enterprise businesses do. And I'd also warn that a lot of the technologies you hear about are going to be for larger teams. So mature data science organizations that have five to ten plus people are starting to think about experiment management, or they're starting to think about cloud databases and their data lake, where you are probably not dealing with a lot of those issues yet. So I think knowing that it's OK to maybe read about it on Medium, but that it's probably not going to be as meaningful for you right now, should provide you with some assurance, at least. And then I would also say, first of all, being in groups like this is really helpful. Especially pre-pandemic, the way I kind of found out about these problems was going to meetups, and there'd be someone talking about the issues their team was dealing with, and I'd be like, oh, I have that same problem, and they'd talk through X, Y and Z, what they did to fix it.

[00:45:43] So, um, the biggest tip I have is to try to not do this in a vacuum, because it is incredibly difficult to not have exposure to other data folks. And since you can't just bring on 10 people to the team, I see it almost as, like, making a chosen family, and making that chosen family include the people that are maybe in larger data science organizations, who can give you insight into the kinds of problems that exist.

[00:46:15] Absolutely love that. Actually, I think Mark is the first data scientist at an organization as well. Right. So before we get to Giovana on the imposter syndrome, let's hear from Mark on this topic. And trust me, Josh, I've been there twice. I've been the first data scientist at an organization twice.

[00:46:29] It is not easy. And, you know, I can definitely identify with the struggles you're going through, but yeah.

[00:46:38] I'm actually, like, the first labeled data scientist. The other software engineers, or my manager, who just didn't have the title, could easily have done that. They've done data science projects, but the company finally wanted to go towards that path and actually label people towards that. But our team is very small. On imposter syndrome, I actually had a realization this week. I've been working on an NLP pipeline, one of the most ambitious data science projects I've worked on so far in my career. And thankfully, I've been working with a lot of software engineers and getting mentored by them. And through that mentorship, I realized that they don't know the specific things I ask them. They're like, I don't know. But what they're really, really exceptional at is problem solving. So they can be like, yeah, I don't know that language, but I know how to solve problems, so I'll figure it out. They just know how to, essentially, with all the unknowns, make the knowns come to light and connect the dots. So you don't have to know everything, I learned. My colleagues, like, previous Google engineers or Facebook engineers, are really at the top, and they always tell me, I don't know. And that's very comforting in a way. Then there's the component of staying on top of what you should know, because I think David Langer is the one that always says that data science is a lifestyle. You know, you have to keep on learning all the time.

[00:48:09] Newsletters. Newsletters have been so helpful for me. I downloaded this app called Stoop, where all the newsletters go into the app, and I basically stay up to date on things. I like the O'Reilly newsletter. These are very small things you can do to stay on top of the industry. They collect all the information and news articles for you, and you've got a really nice feed of what you should stay on top of, for yourself, at any time outside the job.

[00:48:35] I love that, and I love the way you put that. Just solving problems. Right. So don't jump straight to the tools. Jump to how you solve this problem, and how you solve it from first principles. Right. Really distill it down and then think about the tools later. Thank you very much, Mark. So let's hear from Giovana on imposter syndrome, because I know you've talked about this before and I think you have great insights on this.

[00:49:01] Go for it.

[00:49:02] Thank you. Josh, the first thing that I want to share with you is: just stop comparing yourself with others, because everyone has a different situation in their position, as a data scientist or in any position. When we have this imposter syndrome, we are always trying to compare ourselves to someone, and we compare in the sense that we believe that we need to be perfect.

[00:49:39] And this is not true, because data science is a field that changes every day. Every day we have new information, so no one can know about everything. So this is the first thing that you need to think about. Another thing is to see what you have achieved until now. So far you have done a lot of things. That's why you are a data scientist in the place where you are working. We perceive that we are not the professionals that people think we are, and this is something strange, because others think that we are doing a great job, but we don't believe that. So one piece of advice for you is to talk with the people that work with you, just, like, a friendly chat, and ask them why the contribution that you are making with your work is important to the company. Do it at the beginning with people that are very close to you. This is like a medicine for this kind of syndrome, because you are going to realize that you are not focusing on your achievements.

[00:51:16] You are focusing on the things that you are not able to do now, because everyone wants to do everything perfectly, and we are only growing every day. We don't know everything, but we go in, like, baby steps. So you have to understand, everyone has to understand, that we are in that learning process every day, and focus on your achievements. And then if you fail in something, you have to think that it is an opportunity to do it in another way and learn from the failure. It is an opportunity to learn. So from the things that you have shared, I can imagine that you add value to your company. So focus on that. And it's OK to try to get to know more tools, but don't try to do it, like, everything in one day, because it's not possible. Focus on your achievements and learn from your failures, because the failures are the things that help us to grow. So this is something that I would like to share with you.

[00:52:35] I think I absolutely needed to hear that. If I could just share something personal with you guys: I actually just went through six grueling rounds for an interview, including the HackerRank and a take-home challenge. And I got a rejection earlier this week, and that was an immense blow to the confidence, right? And after that rejection, I was like, fuck, man, am I cut out for this data science stuff? What am I doing? Right. That exact same day I had my review, with my boss talking about how amazing I've been doing all year.

[00:53:04] Right. So, yeah, just kind of sharing that story, and what you had to say just really resonated with me, um, you know, that part about not comparing yourself to other people. Just focus on consistently creating value wherever it is that you are. And it doesn't matter if a group of PhD data scientists didn't let you into their club. It's all good. So, uh, we've got three more questions and then we're going to call it a day. We've got Nishant, J.R., and then Mark again. So go for it, Nishant.

[00:53:37] Hey, guys. So I just graduated with my master's, and currently I'm in my job hunting process. So if I'm interested in a company and I want to know more about their projects and some of the business problems to help in my application process, what sources could I use to find these? I'm just hesitant to ask directly to my connections, because I don't even know if it's the right thing to do or to ask them.

[00:54:19] Stay out of it.

[00:54:21] So, to answer your question, like, every single company has a section, uh, probably under the resources tab, that says blog. Every company has a blog page on their website, and on that blog page, if you do a search for quote unquote analytics or machine learning or innovation, you're going to find blog posts related to that topic, right, that that company is actively working on. So that's the first step. The first step is go to that company's website, see if they have a blog, and read through the blog. And then as you read through the blog, maybe there's one or two posts that really resonated with you. Come up with a couple of insightful questions and then reach out to connections: I was reading this blog on your company website. I thought that this topic was super fascinating. I've got just a couple of questions. Would you mind answering them for me? This and this. Great blog post, looks like you all are doing awesome work. Keep it up. Right. So that's one thing you could do. Another thing is, I think most companies have blog posts on, like, Medium. They might have their own Medium page. So that's another option for you as well. Ayodele, what do you think?

[00:55:32] I was just going to second everything you said. Other than that, I'm not sure if you have, you know, maybe LinkedIn connections with someone at the org, and you can do, like, an informational interview.

[00:55:44] So really, that might be a step two to what Harpreet mentioned. So check out a cool blog page, or, if you're more interested, what kinds of similar data you can find that's publicly available to play with. I've always found that it's worked out. You could say it's an informational interview, but really you're trying to understand them better, to understand some details of the project, if you can connect with the person who actually posted on Medium or whoever is representing the company in a specific blog or YouTube video. But yeah, it depends on the size of the company. If it's a really, really large org, like Netflix, they might have Netflix Research, so they have their own YouTube page with all of their specific ML-related content. But for smaller companies, I'd definitely check out their blog and some other online news sources.

[00:56:41] Yeah, it's fascinating how a lot of these companies openly share the work that they do in terms of white papers. You'll see Google Research, you'll see Spotify Research, Netflix Research, Airbnb even, and a lot of smaller companies as well, just coming out with white papers that describe that entire process. And I think that is absolutely amazing and a great opportunity for you to learn how real-world data scientists actually go about their practice. Anybody else have anything to add to this question? Nishant, was that helpful? Did that answer your question directly?

[00:57:16] Yes, it was very helpful. So I'll just check the blogs and white papers.

[00:57:24] Awesome. So next up we'll go Jacob and then Mark. And there's a question here in the chat, which I'm going to answer right now: is it sensible to pay for an internship? No, do not pay for an internship. And I'm not going to discuss that question further. After that, Jacob, go for it.

[00:57:39] Yeah, I have a question. I'm actually struggling with linear algebra. I've taken classes on Khan Academy and I've watched the 3Blue1Brown videos. They are kind of hard to understand. So I'm wondering if anybody has a suggestion of something much more elementary, a middle-school type of basic, easy way of learning linear algebra, because the ones on Khan Academy were, I think, almost too difficult for me. I'm looking for something really, really basic.

[00:58:10] Yeah. So you don't even need to know everything about linear algebra. You can be an effective practitioner in data science and machine learning without that. In terms of a resource:

[00:58:20] The resource is from Jason Brownlee of Machine Learning Mastery. He has a seven-day crash course on linear algebra. And the thing about this, which I find really, really helpful, is that it's not just linear algebra by hand, because who the hell does linear algebra by hand at work? You're doing it in Python, typically, by using arrays. So he teaches you just the things that you need to know from linear algebra, but in Python. So you're getting right at the intersection of those two. I mean, from linear algebra, fundamentally, what is it that you actually need to know? You need to know: OK, what's a row vector? What's a column vector? What is a matrix? How do you find the rank of a matrix? What is the inverse of a matrix? What's the transpose? There's not much that you need to know. Right. How do you multiply matrices, maybe. What's an eigenvalue. How do I do a singular value decomposition. I mean, apart from that, I don't really know what else you need to know from linear algebra. Like column spaces: I don't remember what a column space is off the top of my head. There's so much I don't remember from there, because it just doesn't bubble up in my day-to-day work.
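
Editor's note: as a rough illustration of that checklist, here is what each item looks like with NumPy arrays. This is a sketch and is not taken from the course mentioned above; the matrix `A` and the vectors are made-up examples.

```python
import numpy as np

# Row vector and column vector (2-D arrays so the shapes are explicit).
row = np.array([[1.0, 2.0]])      # shape (1, 2)
col = row.T                       # transpose gives shape (2, 1)

# A small, invertible example matrix.
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])

rank = np.linalg.matrix_rank(A)        # number of linearly independent columns
A_inv = np.linalg.inv(A)               # inverse, so that A @ A_inv is the identity
product = A @ col                      # matrix multiplication
eigenvalues = np.linalg.eigvals(A)     # eigenvalues of A
U, s, Vt = np.linalg.svd(A)            # singular value decomposition: A = U @ diag(s) @ Vt

print("rank:", rank)
print("A @ col:", product.ravel())
print("eigenvalues:", np.sort(eigenvalues))
print("singular values:", s)
```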

[00:59:36] OK, so you're saying linear algebra is not used as much in data science because it's based on packages that you get, and they do the computing for you? Is that what it is?

[00:59:44] Yeah. I mean, you still need to understand it, right? Let's say you're fitting a linear regression and the error estimates for your linear regression are just blowing up towards infinity. Right? Well, maybe your design matrix has some collinearity in it that is causing it to be non-invertible, which is why the error estimates for your coefficients are exploding off to infinity. That's kind of hard to troubleshoot if you're not able to recognize it. So it definitely is useful for troubleshooting purposes when what you're trying isn't working, but you don't need to know everything from it. Right.
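
Editor's note: a minimal sketch of that troubleshooting check, using a made-up design matrix in which one feature is exactly twice another. The rank falls below the number of columns, and the condition number of X'X (the matrix that ordinary least squares has to invert) becomes enormous.

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = 2.0 * x1                           # perfectly collinear with x1 (made-up data)
x3 = np.array([1.0, 0.0, 2.0, 1.0])     # not a linear combination of the others

intercept = np.ones_like(x1)
X_bad = np.column_stack([intercept, x1, x2])
X_ok = np.column_stack([intercept, x1, x3])

# Rank below the number of columns means X'X cannot be inverted.
rank_bad = np.linalg.matrix_rank(X_bad)
rank_ok = np.linalg.matrix_rank(X_ok)

# A huge condition number is the numerical symptom of (near) collinearity.
cond_bad = np.linalg.cond(X_bad.T @ X_bad)
cond_ok = np.linalg.cond(X_ok.T @ X_ok)

print(f"collinear:   rank={rank_bad} of {X_bad.shape[1]}, cond={cond_bad:.3g}")
print(f"independent: rank={rank_ok} of {X_ok.shape[1]}, cond={cond_ok:.3g}")
```

In practice collinearity is rarely this exact, so a very large condition number is usually the more useful warning sign than the rank alone.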

[01:00:25] OK. All right. I've got another question, but I think I want to save that for another time, considering we're running out of time.

[01:00:32] Yeah, no problem. Anybody else have anything to add to that?

[01:00:35] Yeah, I'd like to chime in a little bit. I like to go a little bit deeper into these topics, and I found a great book. It's called Linear Algebra the Easy Way. Sorry, my books are over here, but it is very, like, middle-school kind of level: here is how we actually do things in linear algebra, from the very, very beginning. So helpful for me.

[01:01:06] OK, so it's called Linear Algebra the Easy Way? Is that what it's called?

[01:01:10] I'm going to go look at that. OK, that's great. Thank you.

[01:01:15] And another thing that is interesting, I'm going to share this with you. I think you will find this pretty interesting. So I'm all for making learning fun. And there's a series of books that are essentially manga-illustrated comics on math-related topics. So there is The Manga Guide to Linear Algebra. And they've also got guides to calculus, statistics, and regression analysis. So definitely worth checking out if you are just, like, tired of looking at essentially Greek symbols. Right. So this might be something interesting to check out. Thanks, Harpreet.

[01:01:58] I will say I've used those manga books. They've got good ones on linear algebra, stats, and databases. It's a nice, fun way to feel like it's less daunting.

[01:02:10] Yes, exactly. Yeah, I'll get that one. Yeah. The algebra books that I'm seeing are so boring. It's just like, I want to shut that book after five minutes, and then I'm just like, not for me.

[01:02:23] Yeah, yeah. Trust me, I feel your pain. I've taken a few semesters of linear algebra in grad school, and those are, you know, traumatizing and scrubbed from my memory. Um, so yeah. Make it fun. I think this will be a great way to do it. I've got to look for that illustrated guide to databases. I think that'll be pretty interesting. So if you could share a link to that, I'd love to see that as well. So, uh, the next question up is going to be from Mark, and then we'll call it a day, because I don't see any other questions in chat.

[01:02:56] Yeah. So as I said earlier, one of my recent projects was building an NLP pipeline, and I got to work with sklearn a lot more, which was really fun, specifically the pipeline functions and the transformers. And I absolutely love it. I've learned about it before, but actually putting it into production was really fun. And so my question is, you know, with that same workflow, are there any other tools outside sklearn for building up these pipelines of transformations and getting the output you want? It was really useful both for making it simple for others to understand what's happening and for debugging as well.

[01:03:36] Kedro. Kedro is amazing. It's a Python package, a Python library, that essentially makes it easy to create these really extensible pipelines. It's developed by QuantumBlack, which is a division of McKinsey. Dude, that thing has changed the way I work tremendously. I work in a much more clear, consistent manner since introducing that to my life. So definitely check out Kedro.

[01:04:09] Would you have any other suggestions that I can say I use to pay a little bit in the past? They're like little just python package. But, um, yeah, I would not like I don't have a ton of extensive experience.

[01:04:25] So anybody else have any experience working with that? Yeah. So I'm telling you, Kedro is amazing, and it's nice because they've got a visualization package with it. So it's like Kedro-Viz: you just run that and you're able to see the entire pipeline, how data is moving through, what nodes it's going to, and what the output is. Definitely do some research into Kedro. Super easy to get up and running. Um, highly, highly recommend it. Cool. I don't see any other questions. Well, let's talk a little bit about this question about paying for internships. Let's close on this. So I don't think you need to pay for an internship. Right. Um, because here's the thing, right. You don't need to work anywhere to get data science experience. You don't need to be a data scientist to get data science experience, right? If you distill it down, what a data scientist does is solve problems using data. Right now, the world has no shortage of problems that need solving. And you don't need someone to come to you and tell you that they have a problem that needs solving or a question that needs answering. You can come up with one yourself, right. So you can think about something that is interesting to you. Right. And from there, come up with an interesting question that you think can be solved using data.

[01:05:54] Obviously, the challenge here is finding data to help you answer that question. But if you think creatively enough, you can make it work, right? For example, you don't need data from a million different samples, a million different people from all over the world. You generate data every single day. Right? If you wear a wearable device, whether it's a watch or a ring, you generate data. Right. You can access your data and combine your data with other data. Right. So, for example, let's say I was interested in understanding the effect that weather and music have on my activity levels. Right. Well, I listen to Spotify every day. So Spotify has an API where I can pull my listening history, weather data is very, very accessible through the Weather Channel API, and I've got access to my movement data. Right. I can easily combine those different data pieces together and do an analysis to determine the effect that the beats per minute that I listen to over the range of a week has on my activity levels. Things like that. I'm just riffing off the top of my dome right now. But the fundamental thing is you don't need a data science job to get data science experience. I'd love to hear from Ayodele, Mark, and Giovana on this topic as well.
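
Editor's note: a toy version of that personal-data project, using only the Python standard library. All values are made up and stand in for data you would export from your own music and wearable accounts; no real API is called here, and weather data could be joined on the date in the same way.

```python
from math import sqrt

# Hypothetical daily exports, keyed by date (made-up values).
listening_bpm = {"2021-02-01": 100.0, "2021-02-02": 110.0,
                 "2021-02-03": 120.0, "2021-02-04": 130.0}
daily_steps = {"2021-02-01": 4000, "2021-02-02": 5200,
               "2021-02-03": 5900, "2021-02-05": 7000}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Combine the two sources: keep only the days present in both.
shared_days = sorted(set(listening_bpm) & set(daily_steps))
bpm = [listening_bpm[day] for day in shared_days]
steps = [daily_steps[day] for day in shared_days]

r = pearson(bpm, steps)
print(f"{len(shared_days)} shared days, correlation between BPM and steps: {r:.2f}")
```

A correlation over a handful of days says nothing about cause and effect; the point is only that joining personal data sources on a shared key is enough to start asking the question.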

[01:07:18] Yeah, I think I would also agree. Right before Harpreet mentioned it, I was going to type in all caps: NO, don't ever pay to be an intern. I say that especially because there are resources out there. One really popular one is to bring talent. So if you are looking for internship experience, looking for the opportunity to play with really large data sets or to create more models that are related to business needs, you can be an intern through them. And there's a couple of options, but none of them require you to pay them. Some are unpaid internships, some are hourly paid internships, but none of them really require you to give money to the company, since you are also using your intellect and your time. So I would suggest looking for other programs like that. The other one I know of is called Acadium. I think most of those internships are unpaid, but it is easier to get a paid internship on there as someone who has technical data experience. It's geared a little bit more towards those with some more soft skills, like in marketing. But yeah, I wouldn't pay someone to get entry.

[01:08:37] Mark.

[01:08:38] Giovana, what do you guys think? OK, my suggestion is to work on your portfolio, as Harpreet said. The important thing is to show what you are able to do: how you use the tools, how you use the different models. So start building your portfolio. And it would be great if you choose a specific field, because this is something that pays back. If you are clear about the field that you want to work in and then you share your portfolio, it is easier for people to connect with it, because knowing about the business is key. You can be a data scientist, but if you don't know the business, it's like you have to learn from scratch. So if they see that you are clear about the business and that you are doing things applying all this knowledge about data science, I think this is a great way that you can show that you are a data scientist. This is my advice.

[01:10:00] Awesome. Thank you very much. Mark, anything you want to add to this?

[01:10:03] I feel like I really don't have anything constructive to add to this. I'll stay quiet.

[01:10:08] Yeah, no worries. But so yeah, in general, man, there's no need to pay to be an intern for somebody. You can definitely create a portfolio project in such a way that you replicate the thought process and workflow habits of a professional data scientist. It's not hard to determine how to do that. It's just a matter of putting in time and research. Alternatively, you can come to Data CHEDID conference this Thursday and hear me talk about why and how you can create a portfolio project that will get you hired. So definitely tune in for that. Cool. Doesn't look like we have any more questions. Big shout out to Ayodele, Dominic and Gideon and everybody at Comet ML for helping us set up these office hours. Super excited that we've entered into this partnership. Couldn't think of a better way to spend my Sunday mornings than this. Thanks, everybody, for coming. Uh, remember, you guys got one life on this planet, so why not try to do something big? Take care.