HH43-30-07-2021 [00:00:06] Oh, yeah, what's up, everybody? Welcome, welcome to the Data Science Happy Hour. It's Friday, July 30th. Man, I'm always surprised at how fast the year progresses, but today especially so. Hopefully you got a chance to tune into the episode that was released earlier today with the legendary Lillian Pierson. Lillian is awesome. She's become a mentor of mine over the last few months; we've been in contact regularly, and it has been great to sit down and chat with her. Hopefully you guys enjoy that episode. Don't forget to catch up on the backlog if you haven't already. I know there are a lot of episodes out there that you might have missed and would enjoy, so go scroll through the podcast and see if anything strikes your fancy. Shout out to everybody else in the room. What's up to Alexandra, Antonio, Eric, Mikiko, Russell and Venkataraman? All right. Yo, yo, man, how's it going? Super excited to have you guys here. [00:01:02] Like, the whole deal! [00:01:03] Yeah, thank you. Yeah. Huge, huge news to share with all of you guys today. I put in my two weeks notice at Price Industries earlier this week; I'll be leaving Price. They've been great to me over the last two years. I've really enjoyed the work that I've done, and they're really happy with some of the progress I've hopefully helped them make over the last couple of years. But next week — next Friday — will be my last day there, and I'll be moving on to Comet ML. I'll be working full time for my friends over at Comet ML. It'll be a really interesting position, working out of their growth team. I'll be reporting to my friend Austin, and together we're going to come up with some really exciting and engaging community activities. I'll be researching a lot of cool deep learning models, using Comet to do experiments with them, writing about those experiments, hosting more events like this, pushing more content out [00:02:00] there for you guys, and, just in general, being more of me. I feel like this role is such a good alignment with, um, with who I've become over the last year or so, so I'm really excited about it. Shout out to Gideon and the team — thank you so much for bringing me on board. I'm really excited to do some awesome stuff with you guys. Yeah, man, that's a huge change for me. Looking forward to that. Yeah. So let's kick this off with a question. I think last week I might have touched on it — it was one of the, um, candidate opening questions I was about to ask, but I think I'll ask it today. And that is: what type of data scientist are you not? Data science is a huge field; there are a number of different types of data science roles. What type is not the right fit for you? Let's start with you. [00:02:50] Well, like I said last week, one of the things I'm not is a deep learning person — I don't do anything with deep learning. So that's one kind I'm not. Not currently; not that I never will be, but that's not what I am right now. I've been thinking about what kind of data scientist I am or am not in my current role, versus where that might evolve to. And that was actually a discussion we were having on my team today: how some of us have skills that we're not necessarily using in our current roles. So in my daily role, I'm not an AI person — I don't do a lot of prediction in my current role, and what I do is pretty light. As Dave Langer would say, it's very much a
"good enough" type of prediction — is it good enough? — as opposed to really, really trying to be right 100 percent of the time. But, you know, I think we're moving in a direction where we'll have more of the predictive stuff. So I am not a trendy-algorithm-user data scientist; I can say that confidently. There's a lot of things I'm not. I'm also not as cool as Harp. [00:04:00] Yeah, that's just fact. [00:04:02] That's true — you're cooler. Nobody's as cool — [00:04:06] as cool as Harp. [00:04:09] Yeah. So shout out to everybody that just joined in. Alexandra, what's up? Thomas. Mikiko, I love the background you got — that is awesome. Owl, what's going on? So everybody listening in on YouTube, on LinkedIn, on Twitch: I'm taking questions, so go ahead and write your question out in the chat. Alternatively, you can join us right here in the room, because there are always good side conversations happening in the room chat — so by all means, join in on that as well. So the question that we're opening up with is: what type of data scientist are you not? Data science is a huge field, job descriptions are wide and varied, but what type of data scientist are you not? So, Russell, you have some awesome writing here in the chat. Go ahead and tell us what you wrote out there, and then after that we'll go to Alexandra. Love to hear from you. [00:05:02] Thank you, and good evening, everybody. So, basically, I'm not a textbook data scientist. That's definitely not the role that I occupy, and everything I do in my extracurricular time is just not textbook stuff. I kind of use data science in everything I do, but I would probably expand that to say I am not a data scientist even though I use data science in my role — I'm more a business analyst, more a product specialist, more a body controls specialist, and I use data science in everything I do. I've been using data science for a long time, and I use a bit of code from a lot of the common languages. But if I was in a competition, put up against a stone-cold data scientist, I would not fare too well. I put much more emphasis on being the person that translates the stuff that the data scientists do for business use. [00:06:00] So I translate between the deep learning, the machine learning, the AI, the data science, the Kubernetes — the whole list — and the executives of the business, the requirements of product delivery, et cetera. So I kind of sit in the middle of everything. So if I had to say, am I a data scientist or not, I would have to say no — but I certainly use data science in everything I do. [00:06:23] Russell, thank you so much. Absolutely love that response. One of the reasons I wanted to lead with this question was just to show people that there are such diverse career paths within data science itself — there are so many different ways you could take your career. Shout out to everybody else that just joined in. Apparently Mikiko has a question — for Joe, maybe, if Joe joins in. So I do want to touch on this opening question one more time. I would love to hear Alexandra's response and then maybe Antonio's response, and Mikiko and Joe, if you guys want to join in on this, please let me know. The opening question is: what kind of data scientist are you not? As you know, there are so many different types of data scientists. We all got our own little niche; we all got our own lane. And if anybody else wants to speak on this topic, please let me know.
Everybody that's watching out there, lurking in the chats on LinkedIn and the other streaming platforms: come in and join us, or post a question right there. Alexandra, go for it. [00:07:19] Yeah, it's a tough question for me, given my background, just because I find the term "data scientist" really intimidating in general — it still sounds really scary on the surface to me. I come from a marketing background. I graduated from undergrad last year and I'm in a grad program now, getting my Master of Science in Analytics. I feel like I'm pretty new to the data science realm. So am I a data scientist at all yet? Not quite, I'm not sure. I'm hoping to get there, definitely. But just the label "data scientist," for whatever reason, sounds really scary. When you say business analyst, when you say data analyst, for whatever reason that feels a little bit more in my wheelhouse [00:08:00] — maybe that's just coming from my marketing and business background. But the second you put the word "science" in there, I imagine being a third grader in a little chem lab or something, messing up experiments. [00:08:10] That's funny you say that about the word science, because I've been going through some theological crises this week, and I'm wondering: am I doing data science, or data induction? I'm really trying to figure that out. Anyway, that's not the topic right now; let's not pursue that further. But Antonio, let's hear from you, and after that we'll go to Mikiko's question — or, you know, Mikiko, if you want to lead with your response to this question, that'd be great. [00:08:36] Sure. Yeah, I mean, the first thing that came to my mind is that I'm definitely not, like, a mathematician data scientist. When I started doing Andrew Ng's data science thing — the stuff about convexity and all that — and taking some of those classes, to me it was like: all right, no, not for me. And then I was mentoring some students, and they were like, well, I'm really good at, like, the Python and stuff, but I wasn't great at linear algebra, therefore I'm dropping data science. And I was like, I haven't used linear algebra once in my work. I know it has its place, and some people definitely use it, but especially to get started, don't let that discourage you. That's why you hopefully have different teammates. Like, I have a colleague who I think is very good at statistics, so when I do need that, it's: hey, I did something here, can you come and look at it? Does this make sense? Because, you know, I don't understand all of it all of the time — or even if I do understand it, as Eric put in the comments, I'm not so confident in the stats side. So I think — Russell, you're definitely a data scientist. I mean, the roles are so varied out there, [00:10:00] but everybody just needs to apply for the job if you see it. I've seen people apply for a data science role where the whole work is in Excel — and nothing wrong with Excel, but it's not typically what some would call being a data scientist, you know. Just go for it. [00:10:18] Thank you very much, Antonio, appreciate that. Mikiko, let's go to you for your question. And if you want to answer this question, definitely go for it as well, or jump right into your question, however you'd like. [00:10:27] Yes. Can you hear me? Yeah? OK, cool. Let's try out these new headphones. They're wireless and they're, like, cheap.
So, anyway — all right. I actually put it in the chat — it's a three-part question, I'm sorry — and that's why, at Bitly.com/adsoh, I figured I could get some free consulting out of it. So those are the three questions; let me just provide a little bit more context. I'm very, very new to engineering; I came from the business side, so I'm used to doing a lot of annual and quarterly planning, and then you do monthly updates or something, right? So, the three-part question: I joined engineering, and it's like a hardcore engineering team, and I'm not a hardcore engineer — I'm learning, trying to be. On the ML Eng team we're tasked with building tools, especially around helping productionize and develop models. We have some internal tooling, and it does feel like there are big opportunities to smooth out the process — not just to make it nicer for everyone, but to make it more effective. As far as I know, we're not going to be going all in on Vertex yet, [00:11:43] even though we're a GCP house; we're still trying to figure out how we can sort of do that. So the three-part question that I put in, because I think this is important, is: first off, what do we think should be the relationship between data science teams and MLOps teams, in the sense of — what [00:12:00] is the value that MLOps teams provide? And the second part is: how do we plan initiatives in ML engineering? Because there's a lot of talk about how delivering for machine learning teams and business teams is different from traditional engineering products — you build features, you ship them, you do some kind of agile and scrum. But it feels like sometimes when we do that process, it's very easy to just fix little things, do mini-optimizations, and not tackle the higher-value opportunities. So that's the big question that I put out there. [00:12:44] So, I think I've done all these roles, which is very unique, right? What I would say is that it depends. The first question I would ask is: how does your company structure its technology teams to begin with? Does it focus on domains — do you have teams that focus on different domains — or is it one engineering team that focuses on all matters of engineering and ops and so forth? Because what this really determines is whether you're going to embed a data scientist within a domain-focused functional team working on a specific problem, or whether data science sits separate from regular software engineering. So that's the first thing to clarify. [00:13:35] Yeah, I mean, right now the data science teams are just their own team, under product for the most part, and they really have a close relationship with product engineering. [00:13:44] Do they also report to product? [00:13:46] No, engineering has its own vertical. And it's a private company — definitely more than a thousand employees, at least half of them engineers. So each engineering team [00:14:00] does have a functional sort of focus. The team I'm on, as far as I know, is the only ML Eng team, and we work directly with the data science team. We have other teams that, for example, do data — we have a data engineering team that we work with — and then there's also the infrastructure.
The part of it that's a little bit complicated is that MailChimp was colo'd for a very long time, and they're just now making the move off of that. So there's always the question of: how do we mix in open source — what's the right tech mix, right? But I feel like I don't want to come in and impose a planning cycle. I do kind of wonder, though: if we build up these tools and then sort of expect the data scientists to piece them together into some workflow, that does not necessarily lead to consistent and efficient results. [00:14:51] Yeah, you're going to have some disjointed stuff coming down the pipe, probably in several months; that's what's going to happen. Next, you need your engineering — so, the application side — to really work alongside data, in tandem I think, however that happens. Whether that's product and engineering figuring out how they're going to have some sort of symbiosis — because otherwise what's going to happen, it sounds like, is engineering is basically going to do what they want, and data is going to be on the receiving end of stuff, i.e., everything flows downhill, if you know what I'm saying. So I would figure out how you're going to make that connection between the different departments first. Because otherwise, what we've seen is that typically data scientists end up at the mercy of engineering, and really they all should be working together on the key stuff, so there's a workflow between application and data science, and a good feedback loop, right? That's the theory. What I've seen work best is: you can either centralize this and have it kind of top-down, or each team works on a specific service, [00:16:00] for example, with embedded data scientists along with data engineers and so forth. So at each step of the way, each service basically has its own alignment, from app to data and then back again. That tends to be what I've seen be successful. But, you know, the anti-pattern would really be engineering off doing its own thing, and then data science kind of trying to figure out how to make all of that work without the feedback loop — that's going to be really difficult. [00:16:29] Yeah. And I would be curious to hear from the data scientists on the call — the ones who work with a team that either is an ML Eng team or that productionizes their stuff. I think one of the central philosophical questions that we're dealing with is: as a data scientist, what is the experience that you feel you need in order to launch models into production and be successful? And does that expectation of the experience you have as an analytics or ML professional include, for example, having to pick out your own solutions? Should you be expected to know what tools you need, or is that a relationship and a conversation with your MLOps team? So I'd be kind of curious to hear from the data scientists on this — and Harp, with your reasons for moving to Comet ML, you probably have something on that — as to what that relationship would ideally look like, and what both parties give and take. [00:17:23] Yeah, I mean, when I was at Price — that was really the first role where I had deployed models into production.
And my internal mental standard for whether I knew enough was: can I write this code in such a way that all I have to do is open up a terminal, type in python and whatever — like, you know, make dataset or whatever — and everything will just run, unbroken? And can I get this to work locally on my machine — can I go to the localhost, enter information, and have it give me a prediction? That was my internal mental standard for: OK, I know what I'm doing. [00:18:00] I don't know if that answers your question or not, but I think, for me, if I could write code like that, then I kind of feel like this is production-ready — I literally deployed it on my machine. Now the experts can take over and teach me what I need to know about deploying it into an actual, much larger system. Um, and I'll pick the parts of that I feel are relevant, and the other stuff I'll kind of put in the back of my mind or take notes on. Does that answer your question? I'm just one anecdote here, but I'd love to hear from Antonio or Eric on this. I don't know if Joe disappeared or what happened. [00:18:33] It's OK. I want to weigh in just to give some life to Russell's answer in the chat — really good perspective — and then, of course, Antonio's explanation of a typical setup. I think they're both good, but I really like the idea of a data scientist who's a good teacher going around and helping enable the subject matter experts to do their own basic machine learning. Then they can work with the MLOps team to release their own models, and the data scientists can review that stuff. That way, the data scientist is amplifying the amount of work they can do by giving the tools and the basic setup to more people. [00:19:22] Thank you, Tom. So what I've seen be successful is kind of an integrated team where, for example, the MLOps team is capable of creating some endpoints to serve to the software engineers on the dev team. Because you have to think about where you are delivering your system: is it going to be a final product that a team of software engineers is managing? And how do you create that communication [00:20:00] loop to serve your ML endpoints to them, so they can embed them into their system orchestration? I've also seen software engineers from the app side, from the product side, know enough about MLOps to do it themselves. So it really depends on the company, the culture, how they want to build teams, and having some cross-contamination there. And with that, they would be in touch with the data science team as well, who are delivering the models and things like that. So I don't think there's a right way or a wrong way of doing it, but there may be some cross-contamination. At the end of the day, it's figuring out where you want to serve that model, and, you know, having good connections between systems is key. Thank you.
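To make the "open a terminal, hit localhost, get a prediction" standard Harpreet describes a bit more concrete, here is a minimal sketch of a locally served model endpoint. It is an illustration only: Flask and scikit-learn stand in for whatever stack a team actually uses, and the toy model, route name, and payload shape are all invented for the example.

```python
# app.py - a toy local model endpoint, in the spirit of "type python,
# hit localhost, get a prediction." Model, route, and payload are hypothetical.
from flask import Flask, jsonify, request
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

app = Flask(__name__)

# Train a small stand-in model at startup so the whole file runs end to end.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"features": [5.1, 3.5, 1.4, 0.2]}
    features = request.get_json()["features"]
    prediction = model.predict([features])[0]
    return jsonify({"prediction": int(prediction)})

if __name__ == "__main__":
    # Run `python app.py`, then POST to http://127.0.0.1:5000/predict
    app.run(host="127.0.0.1", port=5000)
```

The same shape — a model behind an HTTP endpoint — is presumably what Greg has in mind when he talks about an MLOps team serving endpoints to software engineers, just hardened and hosted properly rather than running on localhost.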
[00:20:51] Greg, thank you. Mikiko, did these answers provide insight for you? Do you want to turn it back to Joe, hear from Antonio, or how would you like to go? [00:20:58] Yeah, definitely. I guess I had a nugget of a question based off of what Greg was saying — like, OK, that's definitely something I'd like to hear more about. Yeah, I think I was just trying to figure out in my head — and I don't know if it's because I was on the data science side of the wall between, like, ML engineering and data science — but I guess my perspective is that, at the end of the day, we should be building tools and infrastructure for data scientists to use. And if we make it a really good experience, it means we are less prone to bugs and issues down the line, and we can standardize the outputs and all that stuff. So maybe that's just the thing I'm trying to figure out. But I think this has been super helpful, and you can go ahead and move on to the next question. [00:21:52] There are a lot of questions out there, but I do want to hear from Antonio on this. And then after Antonio we'll get to Javier's question, and after Javier it will be Eric [00:22:00] and Saurabh, then Shishir, who is asking a question on YouTube — I think it will be a great question for Greg, because it's about product people who work closely with data scientists. So, Antonio, over to you. [00:22:14] Yeah, so, I mean, I'm not an engineer, right? I kind of work with the data scientists and engineers to make sure the models go into action. We have the luxury of a big company — I know everybody does it differently — but our data scientists have a lot of work to do, so they don't want to be stuck with building the pipelines and putting things into production. Because, I mean, it's never like you just press the button and the model just works, right? There's always something that's going to come up. So we have the data scientists build the models, and when they're ready, they get handed off to the engineers, who put them into the different systems — because we have very different teams using them, right? Maybe one model is going to be used by four different systems. So the engineers handle that onboarding; they do the pipeline work like you're saying, keep making sure everything is smooth. And something else we've been involved in is having dashboards monitoring the models, looking for degrading performance and things like that. So that's how it's done in my organization. But again, I guess it depends on how many resources you have. [00:23:28] Yeah, I think we can go on to the next question, but actually, just a final thought for the comments: it feels like to some degree the industry is really specializing, but it also feels like there are a lot of people who still sort of expect the full-stack data scientist. So in the comments, I would love it if you could just let me know: first off, do you think this full-stack data scientist actually exists — full stack meaning doing the data science, but also deploying, DevOps, infra? And secondly, do you feel like [00:24:00] that's the paradigm that should continue existing, especially as companies get bigger and scale? I kind of feel like when you're a big company, you do want specialists to some degree, and you sort of want to organize accordingly. But I would love to hear how engineering-heavy all y'all feel data scientists should be. Really interested to know that. [00:24:20] I would say, well, truly, I'm probably on the heavier end — I am an engineering-leaning data scientist. But Mark, what do you think? I'd love to hear from you on this. [00:24:30] I would say, like, at a startup, that's probably a yes. But I've found myself never on one project where I'm doing the whole thing.
In the end, it's a matter of where the resources are needed and where I'm needed. So on one project I might be heavy engineering, on another project I might be heavy analytics, and on another project it might be heavy, like, thinking about data warehouse infrastructure and whatnot. So maybe it's a yes and no: at a startup, I have to have all the skills to jump in when needed to drive value. But if I'm doing everything end to end, I'll be exhausted and probably only working on one project, and that's not really feasible at a startup, because I need to work on many projects — wherever the fire and the priority is. Over to Joe. [00:25:14] Yeah, I would say that might be the myth of the full-stack data scientist. I mean, I've been called that before, and I didn't feel comfortable with it, because it reminded me a lot of the full-stack kind of software engineer from back in the day, where you'd be good at both front end and back end. And I think you might be decently OK at putting up an MVP — which, if that's what you're trying to do, is great — but ultimately, you know, specialization is key. But I understand there's a balance with budget, for one thing, right? Data people aren't cheap. But then also, as Tom points out, finding the full-stack person — yeah, that's a unicorn of a person. Like, I would maybe consider myself full stack, [00:26:00] but that's only because I had to be; it's not like it was a thing I sought out — like, oh yeah, I'm going to be really mediocre at everything, that's great. And especially since then, you know, the engineering side is where I focus, and I'm probably less inclined to focus on algorithms these days — not because I'm not interested, but just because that's where the value is for a person like myself with an engineering bent. So, yeah, it depends on the data maturity of a company. I would definitely say full stack is something you're seldom going to find. But couple that with tooling: I think there's a lot of great tooling out there now that can help automate and augment a lot of the functions that were previously done by hand, like putting models into production. I mean, Lord knows there are like a million Flask tutorials on how to do this, and I think that's just obsolete these days — you can use Vertex, just deploy it that way, and, you know, set it and forget it. So I think a lot of the undifferentiated heavy lifting is being automated, thankfully, with great tooling. But still, it's super early days in the field. I mean, I look at the quote-unquote "modern ML stack" right now, and you get the sense it's being baked as we speak; it's nowhere near where it's going to be in a few years. So — long-winded answer, I'm not even sure I answered your question, but — [00:27:18] Now I want to ask you about that — the full-snack data scientist: salty, sweet, nuts, chocolate, fruit, berries, all that. Full snack. Let's go to Javier's question first; after Javier it's Eric, then Saurabh, then Shishir from YouTube, then Lavina from LinkedIn, and then there are a couple of other comments coming in from LinkedIn as well. Those of you on LinkedIn, you can just join us in the room. Javier, go for it. [00:27:46] Thank you. So, yeah, I'm just curious to know what percentage of your time is spent on analysis or data science modeling around structured data versus unstructured data. [00:28:00] Like, in grad school,
I read a lot about unstructured data, and obviously learned a lot about, you know, NLP, at least in concept. But I'd say, like, ninety-nine point nine percent of my day job is tabular data, structured data. And I work for a larger organization, so I feel like the data is actually pretty complete. Maybe there are, you know, integrity issues from data entry, but the data is typically very complete — I don't deal with a lot of messy or nasty data. But I'm just curious, because you see so many people online — on LinkedIn or blogs — focus on all the unstructured data analysis they're doing, and I just don't do much of that right now. I think it's neat; I just don't do it. So I'm curious what other active data scientists do at work. [00:28:53] Mark has a clarifying question here: can you define how you are thinking about unstructured data? [00:29:01] I'm thinking, like, either someone submitting a help ticket — mostly text, just a freehand text submission — or text capture, like, you know, capturing tweets or something. I do none of that, but that would be an example. — I'd like to jump on that one. — Yeah, please, go for it. — Um, I think Dave Langer says, as one of his things, that, like, 80 percent of data science is going to be, you know, regression analysis — Harpreet has actually touched on that — and if you get really good at that, like logistic regression and things like that, you'll be pretty ready to go. But I had an opportunity in my previous role to work with a good amount of unstructured data, and that was because I was doing risk management and compliance, and there were a lot of customer service calls — people calling in to complain about, like, a rate or something they thought was wrong. [00:30:00] So that kind of gave me an opportunity to mess around with that: trying to identify fraudsters based on patterns, or trying to predict, based on what kind of language somebody is using when they're chatting, whether they're actually the person they say they are. [00:30:20] So it was a lot more kind of NLP rather than production work, because with speech data, especially when somebody is calling over the phone, they have a low voice or they have an accent, and it doesn't get picked up correctly. I mean, the amount of funny stuff that I found in the transcriptions — one that I always share is, I was looking for something called "advanced search," and it came up as "a dancer search" over and over again. And I'm like, this doesn't make sense — why is this guy calling, like, Verizon, asking for dancers and stuff? So that's one of the reasons why I think unstructured data is fun and important to learn if you have an interest — I know Harpreet has a really big interest; I'm waiting for his big projects to come through. But I just think, in general, there are going to be a lot more opportunities for regular regression analysis, or more structured data that you're going to be able to work with. If you like unstructured data, don't give up — there are use cases; depending on where you are, it's just the nature of the work. [00:31:26] Yeah. In my current role at Price — which, if you did not hear, I gave my two weeks notice earlier this week; I'm leaving Price on August 6th and heading over to Comet ML — at Price, everything I worked with was tabular data. Even at Bold Commerce, a lot of what I did was, um, tabular data.
We had one project where we were scraping reviews of our product from the web, and that was just a matter of putting structure to the unstructured data and capturing it in a way that was useful. In this new world that I'm going into, it's going to be mostly [00:32:00] researching NLP models and stuff like that, so I assume it's going to be a lot of unstructured data. But to answer your question: for me, pretty much all of my work at work, up until now, has been structured data; it's just the for-fun projects and things like that where it's not. Mark, let's go to you, and then Ben. Just to let you know, the question that's been asked here is: what percentage of your day, or of the work that you do, has been with just tabulated, structured data? So maybe after Mark, anybody who wants to go, let me know. [00:32:34] I think it's really heavily dependent on your domain and problem use case. For example, my previous role was in health care, and we had access to, I think, like 80 percent of all electronic health records in the US for ophthalmology, which is an insane amount. And if you've ever seen an electronic health record, it's probably one of the messiest things I've ever dealt with, data-wise. I think my job there was 70 percent dealing with unstructured data, even for a simple thing like: hey, tell me all the times a patient received this surgery. There's not a specific code for it, because for new surgeries you look in the notes, and you'll see "discussed surgery," "declined surgery," "surgery was started," right? And you'll have all of these kinds of false positives popping up all the time. Many times my work was getting the unstructured data into a structured format to actually work with. So even though you are working with unstructured data, I feel like many times the goal is to put it into a structured format, so a model — or even a counting analysis — can understand it. I think a great example is NLP: if you do CountVectorizer, where you're counting all the words in sentences, you're putting it into a tabular format. And in my current role, [00:33:56] we also have survey data. But again, think about the problem use case [00:34:00] and data maturity. In my current role, I probably spend maybe five percent of my total time working with unstructured data. And another thing I want to note — I'd also ask the clarifying question of what you mean by unstructured data, like whether it's already clearly in a tabular form. Because even though I work with a lot of tabular data, our data is in a NoSQL database; it's very nested, and it's designed for web apps. So many times I have to play around with the data to get it into a tabular format, because it's nested like four layers deep. So I think the main takeaway, guys, is that even though I was working with unstructured data — different types of unstructured data — a lot of my work was putting it back into a structured format. If you potentially want to get more experience with that in your current role, maybe go talk and see where the data is being captured upstream. Get exposed to that data and understand the logic that's happening upstream, because many times that will be in a more raw, unstructured format, and you can use your expertise as a data professional on the other end of it — especially at a large corporate.
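Mark's CountVectorizer point — free text in, table out — looks roughly like this in scikit-learn. This is a minimal sketch; the note snippets below are invented, not his actual data.

```python
# A toy version of the CountVectorizer step Mark describes: unstructured
# text becomes a tabular document-term matrix. The notes are made up.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

notes = [
    "discussed surgery with patient",
    "patient declined surgery",
    "surgery was started today",
]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(notes)  # sparse doc-term matrix

# One row per note, one column per token: unstructured -> structured.
table = pd.DataFrame(counts.toarray(),
                     columns=vectorizer.get_feature_names_out())
print(table)
```

The same move — flattening something irregular into rows and columns — applies to the four-layers-deep NoSQL records Mark mentions; in pandas that is typically a pd.json_normalize call rather than a vectorizer.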
Really gnarly Jason Blob's I guess is that technically count as unstructured time. I'm not sure. [00:35:13] I don't know. [00:35:14] Yeah but that's what that's putting into structured [00:35:17] Format we've been. Do you just throw in a Data reward like you can do with pictures and just let it do everything for you. Or you can really. Yeah. So. So Mark, what you were saying makes a lot of sense. I like to say that unstructured data is Data that's going to be structured. So no matter what we're talking about, you can find a way to structure it. So are we talking about audio or are we talking about video? We're talking about images. There will be some way to structure that into some encoding. And then once you get that encoding, it's very straightforward to start pulling it all together. But this stuff is getting way easier than it used to be. Like three years ago was pushed up the glasses and time to work for months trying to do these heroics. [00:35:59] What about you [00:36:00] and then Alexandra or anybody else? Mikiko, Greg, if any of you guys when chime in, Eric, let me know. If not, then we can move on to Eric's question. But go for it right now, Tom. [00:36:12] Oh, I'm sorry. Are you asking me from that noise? So. [00:36:17] Yeah, no problem. Yeah. We're asking to, uh, to to answer the obvious question, I guess what proportion or how much of the day you work on is structured papillote Data as a potential recruit. [00:36:29] So this is interesting. I put this in the chat to about ninety percent of my work is converting PDF text and creating Libertopia Nisa's and other math machines to pull. The Data out of that and put it into a structure secret, she was the only other thing it was going to die for. Jason files, I think they're wonderfully structured. They're just in those SQL type format of the basically Python dictionaries. When you get down to. Yeah. It's it's been obfuscated by the fact that the types of PDF I'm looking through are not consistently stretch. But and because the terminology used in this technical Data sheets that are often in PDF just standard spot on finding units and numerical values next to that and doing some heterogeneous clustering, he's now trying to take the vast array or variance in the way manufacturers in the product names and describing those properties. It's all over the place with so trying to the tool that I've created to try to best [00:38:00] capture that part of it. I call it Dunfield because it makes structure from scratch. And so it's but it's all of these parts. It really should have been done by a big team. But I'm doing it all by myself and it's fun. But trust me, the puckers, the time it's taken. So a lot of unstructured data that was probably more than one. [00:38:25] Pick your time. Appreciate that. But just go ahead. And Briese to Eric's question and then after Google Saurabh, then Shishir from YouTube, then living on LinkedIn, then D'Wayne on LinkedIn. And if anybody else has questions, please do let me know that you go for it. [00:38:42] All right. Answer the previous question is all structured. So my question is, this is just kind of in general, how do you how do you prioritize? Is your is your work ticket based? Do you choose your products or projects to others? Choose your projects? And then also, how do you keep different stakeholders apprized of what's going on? And I can give just a little bit more context around that. So I support two different verticals. 
[00:38:25] Thank you, Tom, appreciate that. Let's go ahead and breeze to Eric's question, and then after Eric we'll go to Saurabh, then Shishir from YouTube, then Lavina on LinkedIn, then D'Wayne on LinkedIn. And if anybody else has questions, please do let me know. Eric, go for it. [00:38:42] All right. My answer to the previous question is: all structured. So my question is — this is just kind of in general — how do you prioritize? Is your work ticket-based? Do you choose your projects, or do others choose your projects? And then also, how do you keep different stakeholders apprised of what's going on? I can give a little bit more context around that. I support two different verticals. So I have, like, product and marketing over here, plus the GM, and the other vertical is a lot smaller, so it's basically just the GM. So I kind of have four-ish stakeholders, none of which are my actual manager, and I pretty much get to choose my projects. I have a little JIRA board, and I have tickets that I'll put up there to help me keep track of things, but I pretty much get to choose. So the thing I want to get ahead of, so that it doesn't bite me later, is keeping people informed — like Greg knowing what's going on when I'm working on Tom's project — like, how do I keep those stakeholders happy? Do they just look at the board — do I just train them to look at the board? Is it a weekly something, like an email? I don't want it to be [00:40:00] a meeting if I can avoid it. So I just kind of want to hear what other people do and what's been successful — or horribly, painfully unsuccessful, whatever. I want to learn from you. [00:40:09] Yeah, definitely. Let's go to Mikiko, then after Mikiko we'll go to Greg — I think we did talk about this on our podcast as well, Greg, so I'd love to hear your answer on this. And if anybody else wants to chime in here — Lee, you're welcome to jump in, or Mark, Alexandra — everybody's voice is welcome here at the Artists of Data Science Happy Hour. Go for it, Mikiko. [00:40:27] Yeah. So I think you can unpack a couple of things in that question. I'll give you the perspective of how the data science team structures their approach versus how the ML Eng team structures theirs. On the data science side — the search team, for instance — the way they do it is they work directly with the product managers to implement model features, right? So in that regard, product sets the priority, although data science can sometimes recommend projects that they think are super interesting or would provide business value. So they have a set of stages they go through, and they track the tickets within each stage in JIRA. Something I've seen some teams do is auto-generate tickets — they'll use templates to automate ticket creation, which saves a little bit of that ticket-creation overhead, which is really nice. So they have their JIRA dashboard, and then their manager will take that and communicate it — put a nice little packaging around it — because a lot of times business partners don't want to see what you're working on; they want to see how close you are to achieving their outcome, basically. They only really care about other people's work if it ties in with their work, or if it will impact your ability to work on their stuff, essentially. So take that kind of lens. And on the engineering team, we work on tickets — things like bugs or features, whatever, right? So those are two very different kinds of workflows: [00:41:56] one is more project-based; the other is more of the scrum or agile approach.
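For the ticket-templating idea Mikiko mentions, here is a sketch of what auto-generating a templated JIRA ticket can look like with the third-party `jira` Python package (pip install jira). The server URL, credentials, project key, and the template itself are all hypothetical stand-ins, not her team's actual setup.

```python
# A sketch of templated ticket creation, as Mikiko describes teams doing.
# Everything project-specific below (server, auth, keys, text) is made up.
from jira import JIRA

TEMPLATE = {
    "summary": "[Model refresh] {model_name}",
    "description": (
        "Retrain {model_name}:\n"
        "- pull latest training data\n"
        "- retrain and evaluate\n"
        "- post metrics to the dashboard"
    ),
}

def create_refresh_ticket(client: JIRA, model_name: str):
    """Fill the template and open one standard-shaped ticket."""
    return client.create_issue(
        project="DS",  # hypothetical project key
        summary=TEMPLATE["summary"].format(model_name=model_name),
        description=TEMPLATE["description"].format(model_name=model_name),
        issuetype={"name": "Task"},
    )

if __name__ == "__main__":
    client = JIRA(server="https://example.atlassian.net",
                  basic_auth=("me@example.com", "api-token"))
    create_refresh_ticket(client, "churn-model")
```

The point of templating like this is the one she raises: every ticket comes out the same shape, so the board (and the manager packaging it for business partners) stays legible with no extra effort.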
And are there any landmines that I need to be aware of before it blows up when I present my TBR sorry, your TVR analysis, but I'm going to present it for my business division and I don't want the director to call me out on that stuff or not have it. So to that end, what I found is one having like an email summary, like kind of every week really helps where you don't list the tasks you will see, like what was the impact or whatever. Right. Or even if you do tasks, you just wrap it up and we create a dashboard and we added some additional filters so you can drill down on these things and then give them access to your board. But don't realistically expect them to actually use it to know what's going on. [00:43:02] I don't even expect them to make tickets most of the time. If it's super detailed, I'll ask them to, but otherwise they'll just make it myself. I'm going to [00:43:11] Honestly like the email. The email newsletter is great and then eventually setting up like weekly or biweekly meetings so you can just give them an update, because that's something that I think we struggle with, is we'll send out messages or emails and then people read it. But it doesn't like thinking. They're like, oh, this is going to fundamentally change your workflow until they run up into an issue or they're like, wait, I thought I was going to get this thing today. Like, No, no, we talked about this two weeks ago. You are you're not getting this thing today. Oh, no. Why did you tell me? But I did tell you in an email the documentation. I don't read emails, so that was like thirty three minute weekly really just it will go a long way. So that's my that's my two cents. [00:43:51] So let's go to it to Greg. What's up. Kenji's in the house. Can do to the question that Ken was that was asking was how [00:44:00] do you prioritize when you've got competing demands from various stakeholders and and the like. So I definitely would love to hear from you if you've got any insight on that. But for now, go, Greg. And if anybody else has any tips to share with Eric, please let me know. By the way, if you guys have questions, drop them into the chat or comment section wherever you are, and I will get to them. Go for Greg. [00:44:20] Yeah, thanks for that question. I like it. This is pretty much where I work. I do it all the time and I focus on who who's my customer and also separate what you want versus what you need and then really prioritize the what you need. And also let my customer determine what is the impact, because at the end of the day, if you're part of the business, you will tell me what value fixing this issue will bring to you, to your business. And with that, you have prioritization. Now, if you have multiple groups, then you want to find the person at the top. Top starts creating words here who will say prioritize this versus that because you have conflicting needs. Everybody wants to cry wolf. Everybody wants their things to be prioritized. But somebody at the top on the business side will have to say, so what goes first, second, third, etc.. So from a prioritization perspective, this is how I do it. And also you have the other aspect of it, which is how do you keep in touch, stay in touch with the evolution of projects? I think it comes down to one thing, which is communication. And what are we all aligning with in terms of communication? Is it going to be monthly email updates or is it going to be ongoing tapping into some sort of system or dashboard, et cetera? 
I think Mikiko hit it right on the head when she mentioned what business folks want. I have two different types of monthly, bi weekly communications. Some are focused on text [00:46:00] to audience, some focused on business on the very same project. And the language is quite different. Somebody on the tech side will understand where everything stands, whether there are system integrations, stack set up, you name it, anything from a technical side or the task or clearly laid out in where we are, where we're complete. [00:46:27] We're in a bad shape. We're about to complete it on time, et cetera. On the other side of the business, exactly what Mikiko said. They want to make sure there isn't any obstacles because they will take that in. Their goal is not to look ugly, is to make sure that they have the high level over. View of how the project progresses and also stand in line and there isn't any drift in with that, any communication, you have some monthly update that you provide to them. They want to see the red lines first to make sure whether you need help from them or you already have a plan to address these issues. So you want to clearly state what you're doing about those. And typically they are happy to read the green stuff so they can continue to give you a pat on the back of themselves. But at the end of the day, the format for both is you want to give an overview of the project or where you are and talk about the highlights, the lowlights, talk about whether you need help or not, etc. And then regardless for both teams, both audience, you want to make sure that you ask for help right away at the beginning because people don't have time to go through everything. You may find that different communications formats might be so lengthy to consume. So you want to put the most important stuff, information or information at the top to get everybody's attention. And [00:48:00] then at the end of the day, communication is the key. [00:48:03] I got to put a link to our presentation together in this chat. I was starting to toot our horn, so I don't normally talk like that. But I was really proud of what Greg and I put together presentation wise, in case you all haven't seen a version of it, I, I think this this was probably one of the earlier versions, but still, we really like doing this together. We have a later version of the slides and we've done it individually too. But I think that's worth a listen. And it's a work in progress. So if any of you would be great, would be willing to give us some constructive criticism, you know, not on our hairstyles or anything like that, but only do that to Ben Taylor, but otherwise on the content would appreciate [00:48:56] Something so much. I just posted a link there on LinkedIn. So anybody wants to LinkedIn it is right there for you. Love to hear what anybody else has said on this topic, Lee or Ben or Alexandra or anyone also just looks like Lee's building with cool rockets or something back there. [00:49:15] Well, thanks for coming from the other side of the fence and project management, to be honest with you, it's like the like when you order a steak, you really don't care where they want to pull it all together. You just want it on your plate. So a lot of people don't understand what it takes to get that better. 
And the more information that you can provide on the process and bring people in like like you're doing with this happy hour, if you can bring people together like that, those go, miles, as far as celebrating success and building on successful stuff. But the communication, I think you're exactly right on target. I mean, if you're going to somebody or just ignoring them or just busy, you know, that [00:50:00] that that goes miles and miles for for getting things done really well. Oh, and I also like to point out to somebody mentioned something about being expensive, trying not hiring a Data science or specialist that can do that at the top level that you need. And I guarantee it's going to cost you a lot more. So, yeah, that's just the price to pay to play the game. So don't ever discount the fact the value that you bring to a party that expertize and level of practice is is amazing. And yeah, it's expensive when you don't hire the right people actually. [00:50:36] And definitely stick around for that question. Coming up soon about tips for people who are working closely with data scientists. We'll get that after my sister has his hand up right now. [00:50:47] I have to call it foul Harp. Oh, definitely. Friday afternoon. And you mentioned steak. What the hell the. [00:50:56] I'm a vegetarian. So no matter [00:50:59] What we do, we can get some lobster [00:51:03] There. Let's go to a good absorbed and then can sort of go for it, then go to again. [00:51:09] I just wanted to chime in on the stakeholder management question. How do we balance different competing priorities? I'm also from project management background and this is an extraordinary challenge really, when many are like insanely busy. I think some some very simple questions can help prioritize, for example. And people are asking you for some information. You just just ask them what is a buy? When do they need this? And secondly. Just tell them I'm just in the middle of X, Y, Z, so they understand, we can't say I'm checking my calendar, I'm too busy just in the middle of making a report for so many of [00:52:00] needs. And so and the second question is when they ask a question. It is we really need to dig what information they're looking for. For example, they might ask you, can you do me that document? Can I ask probing questions? What exactly you are looking for so that I can give you the correct information. So just just big further and and just ask for specifics and just tell them that when I ask them when do they need that info. And I think that's awesome. [00:52:33] Thank you very much. We'll come right back to you for your question after we hear from you on this one again. [00:52:39] So I hope this isn't redundant. I tuned in a little late. I apologize, but but something I found really powerful, especially if I have a good rapport with the business stakeholder and they have some analytic capability, is keeping them very involved in the process that I'm going through. So if I'm working on something and I have to prioritize, if you can show that you're making progress, if if the people know that you're working on something that can be that's a lot of the time just what they want. They don't necessarily want it done at a certain time. They just want to know that progress is being made. And that's very powerful when it comes to versioning tools like GitHub. You know, if you're routinely, you know, at least making Korvettes doing whatever, that's something that can carry a lot of weight. Often. 
If I'm working with a specific client as I'm going and building visuals, I'll shoot over a couple of slides and say, hey, look, you know, this is what we're working on. Obviously not a finished product, but feedback along that process can actually save time in the end product as well. So I think that making your work a little bit more public, as weird as that sounds, can pay dividends. Obviously, there are times when you shouldn't do that or when it'll put a wrench in things. But outside of the box, I think that that's something that that really can help with stakeholder management and also endearing you to a lot of the [00:54:00] business stakeholders if they feel like they're part of the process. That's also a very powerful thing. [00:54:05] Thank you so much, Mikiko. Go for it. [00:54:07] Yeah, I mean, something I to be honest, I wish I had really understood, like my first couple years of kind of working in the space was also I think this is general to lots of different functions, but it's understanding kind of your lines of influence. That's something that because the thing I used to do was like I would just say yes to everything and I would just try to manage the timelines and just try stack it and. Well, yeah, not only is that terrible in your health, but also it actually doesn't really make people happy. And part of it is just like that 20 like 80, 20 thing. Right. Like there's really 20 percent of the work that the key stakeholders will drive, 80 percent of their incentive, whether it's to get promoted or to get like an initiative passed or something like that. So that's something I personally kind of wish I had understood a little bit better in that, you know, when you get like 10 or 15 stakeholders, at one point I was supporting a team of 30 internationally over at Autodesk on the construction on the side. Right. And because we're like product customers, because in lots and lots and lots of different hands on different figures. And it's also a very atomized company. So that probably didn't help but something that someone had told me there was no one like ruthlessly prioritize, like emphasis on ruthlessly. But the second part is like understand kind of I don't say like the battlefield strategy because that just sounds really aggressive. But it's like really understand, like what are the levers and what are the lines of influence. So, for example, if you have ten people, they're coming up to us and stuff. Well, I don't want to say the hippo wins. [00:55:41] It's really good to align with what is actually the strategic value or what what do you like, for example, their managers care about, like what is their director care about? And that's not to say like you then go pick the projects or the pick the stuff you work on. But it's like if you're in a crunch and you really have to pick like three or five, being able to align [00:56:00] with their line of reporting and understand what are their priorities will really, really help. Because you're not making the decision on what to focus on. You're asking them this thing like, hey, I got like forty hours in a week. The kind of scope of work right now is that like sixty five or seventy. So of those 70 hours, where do you want to put those forty hours. So I think it's really good to be proactive about that and also really good to like I had this habit of like not bringing my manager what I need to do sometimes and usually you kind of want to do that. It's kind of like escalating a little bit. 
But at the same time, too, sometimes your manager can provide that insight as to what the business priorities are, and they will also advocate on your behalf with the other managers. So I feel like part of the thing that people don't talk about — except for some low-key, business-first self-development guys — is understanding the levers around you that you can use to accomplish what you need to do, and hopefully help your business partners accomplish their goals as well. So that's something I wish I had developed earlier on. [00:57:09] Mark, go for it. [00:57:11] I think Mikiko actually just touched on it at the end. I was going to say: a really great resource is your manager, if you have that rapport — and, on the escalation component, when I talk to my manager I don't frame it as escalation. Early on, I'd say something like: hey, I'm really trying to get better at prioritization in our work; this is the key thing I'm working on. This may not work at a more senior level, but I'm taking advantage of still being new. So I'll go to my manager and really hone in on this prioritization: these are the things I'm working on, this is my current set of skills, this is how I'm thinking about prioritizing — what are your thoughts on it? So it's less of an escalation and more seeking mentorship from your manager, if you have that rapport with them. And it's been extremely effective. More importantly, by asking those questions, I get key insight [00:58:00] into how my manager prioritizes things, so I can deliver to that kind of audience over and over again. Because my manager is kind of my main person: I have my stakeholders, but my manager is the one who decides — do I get promoted or not, are they happy with my work, right? So knowing how they prioritize things is really helpful for me as well. [00:58:20] Thank you, Mark. Let's go ahead to Lee. [00:58:25] Well, I just want to mention, Mark, that was really brilliant — that's a smart way to do it. I would only add one thing onto that: if you have your plan — hey, this is the way I'm going to attack it — have a plan B too: if I get this, I can do this. Open up the picture there, so they've got the option to say, no, just keep going in the direction you're going, or, wait a second, I'll get you the resources you need to do something bigger. But that's brilliant; that's exactly the right direction to go. [00:58:56] Eric, you were briefly muted — did you want to [00:58:59] add something? — I just want to say that was super helpful, and to toss in one small thing: today, someone on one of the teams asked me to do something that I was pretty sure was not really my responsibility. And it was nice, because I was able to talk to somebody and say, hey, I don't think this is my thing — I could do it, but I don't think this is my thing, and the person who can do it will be back on Monday. And then I just sent him back a response email that said: this isn't really my responsibility; I can help you in a pinch, but this person will be back on Monday. And I meant it. And, you know, as far as I know, based on his email response, he doesn't hate me, and the sun will come up tomorrow. So it was all good.
But yeah, this is really helpful for thinking about the longer term and just maintaining good momentum. So thank you so much. [00:59:45] Definitely. Excellent question, thank you so much, Eric. So let's go to your question. You sent me two questions, but you have to pick one of them, so pick wisely. Then we'll go to the questions from Shishir on YouTube and LaVena on LinkedIn, then Dwane Whitfield on LinkedIn as well. [01:00:04] OK, here's my question. For an entry-level data analyst, there is a lot of material, a lot of technical material, available. But how does one prepare for a particular domain? When I say entry level, I want to apply across domains; I want to apply anywhere that gets me seated in an analyst position. How do I show value in terms of domain knowledge? [01:00:34] Can I ask a clarifying question on that? [01:00:39] Yes, please. I was about to ask one myself. [01:00:40] Well, go for it. OK, so when you say domain, a couple of thoughts come to mind. Do you mean data work in marketing versus some other part of the business, or do you mean biotechnology versus, I don't know, manufactured products versus digital products? What do you mean by domains? [01:01:00] Um, what I mean is domains like health care, automobiles, government, weather, those kinds of things. [01:01:10] And so the question is, how do you get up to speed and learn what's important for those industries? [01:01:16] Yeah. I mean, I cannot be in five domains at once. [01:01:20] Got it. So we're talking universal skills, then. For me, definitely SQL. You should be good, comfortable, and very professional with SQL; that is an absolute must. You're going to have to be comfortable with creating reports and doing analysis, exploratory data analysis in general, and maybe some basic skills around how to come up with metrics and what drives them, with little bits of statistics in there as well.
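To make that bread-and-butter work concrete, here is a minimal sketch of a report-style SQL aggregation plus a quick exploratory pass. Everything in it, the in-memory database, the orders table, the column names, is a hypothetical stand-in, not anything from the conversation:

```python
# Minimal sketch: a reporting query plus quick EDA, using an in-memory
# SQLite database as a stand-in for a real warehouse. All table and
# column names are invented for illustration.
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("2021-06-01", 40.0), ("2021-06-15", 60.0), ("2021-07-02", 55.0)],
)

# Report-style SQL: a simple metric (revenue per order) grouped by month.
query = """
SELECT strftime('%Y-%m', order_date) AS month,
       COUNT(*)                      AS orders,
       SUM(amount) * 1.0 / COUNT(*)  AS revenue_per_order
FROM orders
GROUP BY month
ORDER BY month;
"""
monthly = pd.read_sql_query(query, conn)

# Quick exploratory pass: summary stats and missing values.
print(monthly.describe())
print(monthly.isna().sum())
```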
Ken, go for it. [01:01:52] I think I probably mention this every time I come on here, but to me one of the most effective ways to show that is through the project work [01:02:00] in your portfolio. Let's say you wanted to get involved in health care: there's almost always a Kaggle competition that leverages some form of health data. Frankly, a Kaggle competition can be a lot of work, but at the same time you could create a notebook and showcase some skills within that domain. I'd actually recommend against a broad-strokes application to every data science position; I think it makes a lot of sense to target the specific industries you're most interested in. Say you identify that you want to work in energy, health care, or sports: doing three projects in your portfolio, one in each of those, shows pretty good range, but it also shows that you have specific subject knowledge in each of those domains, more than someone who hadn't looked into that data before. I generally wouldn't recommend more than four projects in a portfolio. But if you're thinking about it based on the domains you're targeting, you could have one project or multiple associated with a specific domain. I know that in sports specifically, which is the domain I work in, most people are landing jobs not through applications but from people actually seeing their projects and recruiting them that way. It's a very difficult domain to break into, and they're being chosen almost specifically for their domain knowledge, so there must be some truth to that. [01:03:29] Thank you very much for that advice. Let's go to Greg or Mark first. I think Greg, because he's Greg. [01:03:38] Oh, I just wanted to add a little onto what Ken said. From what I'm hearing, Saurabh, you want to focus on the domain, but more on the non-technical skills, right? So I think you want to focus on your communication skills. How do you explain the work that you've done? How do you [01:04:00] apply your critical thinking skills? You want to showcase that you're able to analyze, not just mechanically produce an output, and lead the audience to interpret what you've found. You want to display your level of critical thinking: exploring different solutions, what their outputs would be, what the risks would be, et cetera. If you do this for an area of interest, I think anyone looking to hire will see that critical thinking in you, and that transfers to whatever industry you go to. [01:04:40] That gives evidence that I completely misunderstood the question, but... Saurabh, go for it. [01:04:45] Great, great input from Ken and Greg. Very useful, thank you. [01:04:52] Yeah. Let's go to Mark on this one, and then after Mark we'll head to some of the other questions. [01:04:58] I was about to say, a quick way for you to get industry knowledge and domain experience. I put in the comments that it took me years and a master's to get domain knowledge in health care; but health care is a very specialized thing, and that's not necessarily fully required. That's just my personal path. A piece that really helps me is doing informational interviews with domain experts and talking to them all the time. To give you an example: in the last venture I tried, we were building pharmaceutical software. I had the domain knowledge of health care data and structures, but I knew nothing about pharmacists and pharmacies. So, to learn how to build our first MVP, I went on LinkedIn and started messaging a whole bunch of pharmacists, who are not technical, saying: can I learn about your pain points, about what you're facing? From there, I actually pulled a list of every single pharmacy in the country with their phone numbers and started cold-calling pharmacies to do user interviews. Many times they'd be like, why [01:06:00] would I want to talk to you? and hang up on me. It is what it is; that's part of the founder process. [01:06:05] But for the ones who did want to talk, I'd be like: hey, I'm trying to learn more about pharmacy, can I get five minutes of your time? Here are my detailed questions. And they're going to give you keywords you'll have no idea the meaning of.
Write those keywords down, find the white papers, find everything, and then go back in future conversations: hey, I talked to another pharmacist who mentioned X, Y, Z, can you tell me more about that? And you keep building and building. I think Steve Blank, the Lean LaunchPad guy, said you need like a hundred user interviews before you understand a product well enough to start building an MVP, maybe more; but he's gone through that whole process. For me, talking to the domain experts gave me the language to talk to other pharmacists on the same playing level, so they knew I knew enough to help them when I was building the product. [01:07:00] That point about keywords is super, super important, because without the right vocabulary it's very difficult to find what it is you're looking for. I really like that you mentioned that. Great tips there. Anybody else want to chime in? If not, we've got three more questions left. Doesn't look like it, so we'll go to the questions. A question here from Shishir on YouTube: any tips for product people who work closely with data scientists? Right off the bat, I think two great people to answer that would be Lee and Greg. Lee, if you go first, go for it. [01:07:34] Can you repeat the question? [01:07:35] Yeah. [01:07:36] General kind of question: any tips for product people who work closely with data scientists? And we'll keep looking out for more particular tips. [01:07:45] Yeah. Probably the biggest tip that I have this week is humility, before anything. Don't be afraid to tell people what you don't know and what you're interested in, asking the way Mark was describing, exploring and stuff like that. Just have [01:08:00] some empathy, because the person sitting on the other side of the table puts their pants on one leg at a time like everyone else, or maybe they jump straight into them like a fireman. What I'm saying, my thing this week, is: people first. This is complicated enough; let's not bring a lot of emotions into it. And also, just network; that's really part of it too. You can't just walk up to somebody and ask them a technical question, because either, one, they don't know you, or two, they may not have that skill set yet. So people are people. People first. Over to you, Harp. [01:08:43] Do you mind if I follow up on that real quick? Yes, please. I just wanted to emphasize, I came over here, I'm cleaning dishes at the moment, but I wanted to jump in and say I couldn't agree more that networking is super key to finding or getting into the role that you desire. Whether that's a local programming meetup or literally volunteering, doing something where you're actually having face-to-face time with other individuals is massive in terms of the opportunities it's going to bring you. [01:09:19] Like this happy hour, right? [01:09:21] Yeah, absolutely. [01:09:24] Thank you very much, Javier. Let's go to Greg on this one. If anybody else wants to provide tips for product people who work with data scientists, let me know. But [01:09:36] yeah, I think it's a nice relationship.
And I really like that product folks have an interest in working with data science folks, simply because the data science folks are there to answer the hard questions, the questions that cannot be answered with a simple query. And [01:10:00] if you are a product manager, you want to increase the usage of your product, increase retention, enhance user experience, add a new feature, test whether a feature will move conversion rates for your customers. This is the right team to partner with to answer those questions and to design experiments you can run on your product. And this is the team that can help you answer the questions that are hard to answer when you see your metrics going in the wrong direction. And to Lee's point, it comes down to networking and making sure both groups understand why this product exists in the first place: what pain point it addresses, and making sure that together they work toward optimizing it and filling the roadmap with features that will continuously enhance it and grow its user base. [01:11:20] Thank you very much, Greg. Excellent tips there. Shishir, I hope you got some value out of that if you decided to stick around; great question, thank you. Let's continue and move on. There's a question coming in from LaVena on LinkedIn that I thought was interesting; I've never had one of these, but: what should you expect in data science system design interviews? Um, I don't know if I've ever been in a system design interview. Has anyone? Mikiko, go for it. [01:11:47] So, yeah, I did six in one week. Oh man, it was brutal. OK, so in general, when we say system interviews, if [01:12:00] you're thinking engineering system design interviews, you're about right; that's where the format came from. And there are a bunch of resources that are really nice; for example, I believe Educative has an ML engineer track, or something specifically for preparing for ML system design interviews. It can come in two forms. One could be a case study, where they give you some parameters. For example, when I interviewed at Levi's, they were like: here are the specs, we need these recommendations, computer vision or whatever, and we need the inference to be real time, and then some other stuff. So they give you all these parameters, the speed of the data, the availability of the predictions, how you would monitor it, all that, and then essentially they say: create a system architecture based off of it. So that's what I did; I created a system architecture based off their parameters and walked through it: these are my decisions. So that's the take-home slash in-person form they might use. The typical engineering one is similar: they'll say something like, design the Twitter feed ranker, or design a clothing recommender, or design a fraud detector. [01:13:26] And they'll depend on you to, first off, ask really great questions back: OK, does the prediction need to be real time? Do we have access to that data? Do I need to design the data warehouse too? And then you essentially build up the architecture and the components as you go, sort of livestreaming your reasoning and asking them questions.
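As a rough illustration of how those stated parameters can drive the architecture, here is a hypothetical sketch. The spec fields and the decision rules are invented for this example; they are not Levi's rubric or anyone's actual interview answer:

```python
# Hypothetical sketch: turning stated interview parameters into
# architecture choices. Field names and thresholds are invented.
from dataclasses import dataclass

@dataclass
class DesignSpec:
    use_case: str               # e.g. "clothing recommender"
    realtime_inference: bool    # must predictions be served online?
    max_latency_ms: int         # latency budget per request
    feature_freshness_min: int  # how stale may features be, in minutes?
    needs_monitoring: bool      # is drift/quality monitoring required?

def sketch_architecture(spec: DesignSpec) -> list:
    components = []
    # The latency budget drives online vs. batch serving.
    if spec.realtime_inference and spec.max_latency_ms < 100:
        components.append("online model service behind an API, precomputed features")
    else:
        components.append("nightly batch scoring job writing to a results table")
    # Feature freshness drives the feature pipeline.
    if spec.feature_freshness_min < 60:
        components.append("streaming feature pipeline")
    else:
        components.append("scheduled batch feature jobs")
    if spec.needs_monitoring:
        components.append("prediction logging plus drift and quality dashboards")
    return components

spec = DesignSpec("clothing recommender", True, 50, 15, True)
print(sketch_architecture(spec))
```

The point, as Mikiko says, is less the specific boxes than showing that each requirement maps to a defensible decision.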
To be honest, I love system interviews; they are so fun, and they are also terribly hard in a lot of ways. With data structures and algorithms questions, a lot of the time you can generalize the problem down: oh, this is a binary search problem; oh, this is a tree-based approach; whatever. You can usually get it down to something with an optimal solution. With system interviews it tends to be a lot harder, because a lot of the choices are genuinely subjective. So when you see people saying "this is the ideal data science stack," first off, they're bullshitting you. Sorry, pardon my French, but they are, because even within the community we don't have agreement on that. Do we do event-based messaging or whatever? Right. [01:14:35] So I would get a lot of clarity on what they're expecting, and you can ask those questions, by the way: what is it going to look like? Will it be a whiteboard? What would the output be? What's the best way to prepare? You can ask all that of your recruiter. And if they're asking a data scientist to do full system design as a baseline, I'd be a little concerned, because once again that's more a feature of the engineering function; as an engineer you'd be the one building out infrastructure, architecture, pipelines. So, just a couple of thoughts. Educative has a really great ML system design resource, and they also have a regular engineering system design resource, and there are a bunch of links out there; I think the head of ML at Twitter wrote one, for example. If you get a case study, the best way to prepare, honestly, is to understand what people are actually doing at the big names, Uber, Twitter, all those guys. Read case studies and try to understand the decisions people have made, and even try to design what would be your perfect ML solution. And look at Full Stack Deep Learning; it's a free course with videos online where they specifically go through what a nice ML stack looks like. So, yeah, sorry, that was a lot. I love system design interviews. Six in a week, though. [01:15:52] Thank you very much, Mikiko, that's a lot of great information. Ken, maybe you want to hop onto this next question coming up; it has to do with design thinking and data science. If anybody else has thoughts on that intersection, let me know and I'll bring you up to the chat. But talk about Zed Run right now, Ken; that looks interesting. [01:16:14] Sorry, I went down the most absurd rabbit hole on this over the last couple of weeks, and I was wondering if other data people were involved yet. For those of you who don't know, Zed Run is like owning online racehorses, horses that are NFTs, and you can enter them in races. They've essentially encoded most of the attributes that actual racehorses have: bloodline, individual characteristics, all these things. And there's a marketplace where you're buying and selling them. So there's a huge data application in pricing them.
There's also a valuation question around the types of races they excel at, and they're slowly adding additional parameters that impact the success of a horse. You can also breed the horses and go down all these different rabbit holes, and there are optimal strategies for making money or winning races or whatever it might be. I'm not an expert, I only own a couple of horses, but holy cow, there's so much there. And for data people, I'm always looking for the next big stupid data problem to solve, and this might be it. [01:17:19] I've got to talk to you about this; let's have that chat offline. One hundred percent interested. NFTs and blockchain technology too, you really hit all the buzzwords. [01:17:34] So the big thing for me, and this is my last point, is that I think it's been really difficult to understand conceptually how NFTs can be leveraged effectively. The technology is there, but what are the ideal use cases, the practical use cases? There are some, don't get me wrong, but the overarching theme is: this is a great technology, how do we use it? To me, this is a really interesting, [01:18:00] practical use case, a game out of which people create or find value. [01:18:04] Thank you very much, Ken. Let's go to the last question for today, then we'll wrap it up. A question from Dwane Whitfield on LinkedIn: what have you seen with design thinking and data science? I think that's an interesting question. Ben, what about you, what have you seen with this? And Ken obviously has some stuff here as well, so we'll go to him after. [01:18:26] Can someone define the term design thinking? I'm sure I use it; I'm just not familiar with the term. [01:18:32] Yeah. [01:18:34] As funny as this might sound, I actually taught a university class on this with a design thinker. The term was coined at IDEO, and it's the idea of human-centered design. You're looking at the end users, you're very focused on their interactions, studying them, and then baking that into the process of whatever you're building. It could be a model, it could be a technology product, it could be a shopping-market type of thing. But that term was coined specifically at IDEO. [01:19:05] I think my only reaction is that sometimes I don't trust the user. Getting user feedback can sometimes mislead. So I get excited about figuring out behavior: what is the behavior you can measure, rather than opinions? "Do you like my demo?" "Yeah, we like the demo." Does that actually mean anything? The other criticism is that I'm sensitive to recipe playback. What I mean by that is: if I walk you through a demo on a product and show you an example problem, I actually have no faith that you can solve a new problem. This is one of the things you run into with some of the cloud certification programs: do I really think you can solve a new problem? That's why it's nice to challenge people with a new problem. So those are two things I think about, mistakes from the past. [01:19:50] Ken, go for it; then I want to hear from Mark. Mark says it has changed the way he thinks. [01:19:56] Ben, I think you highlighted probably one of the biggest shortcomings of [01:20:00] traditional design, and, technically, one of the shortcomings of data science as well.
The problem with traditional design is that we're not looking at the data effectively enough, and we're not surfacing unarticulated needs. That's exactly what you were saying: the customers don't know better; they don't know what they want. They show us what they want through their actions, and a lot of the time I don't think that's integrated as well as it could be into the design process. And from the data science side, a lot of the time we forget there's an end user. We get so wrapped up in the algorithm, in what we're doing, that it really pays to have both of those areas covered. I honestly do see a future where a lot of data science teams have a designer associated with them for part of the process, for what the end users are seeing, UX, whatever it might be. I think it's a really exciting convergence, and if you have skill sets across both design and data, that's going to be a pretty powerful place to be in the near future. [01:21:06] Yeah, on a previous team of mine we had a designer, a UX researcher, and one of the books he put me onto was an O'Reilly book called Designing with Data. So definitely check that out; I'm sure you can find it online if you're enterprising enough. [01:21:24] Mark, go for it. I want to get into why design thinking has influenced my approach to data science. At the same time as I was learning data science I was at Stanford, and Stanford is obsessed with design thinking; they have a whole school for it, and there are so many classes like Designing X, Y, Z, so everyone does two or three design-thinking workshops. It's baked into my education, and that's how I approach things. I put the link in the chat, a quick article on the five steps of design thinking: empathize, [01:22:00] define, ideate, prototype, and test. Being at a startup, I work with really broad, abstract ideas, so how do you make a first version of something? Take that empathize step: you'll notice in these conversations I'm always saying go talk to people, go talk to users, go talk to people outside the data team, because I'm constantly trying to empathize with and understand our customer, our end user. My first week, my schedule was set up for me, and I was like, oh, that's nice, I'll do that; but I also wanted to go talk to all the salespeople and all the researchers, because I wanted to understand who the people talking to our customers were, what the customers were saying, and where their pain points were. [01:22:49] And so I think that's the key aspect of it. The other piece, not every single step, but the prototyping aspect: I create prototypes for my data projects and share them with stakeholders to get a good idea of what they really want. A prototype for me might be a fake data table, or I'll go into Microsoft Paint and draw what a diagram would look like, then go back to my stakeholder and say: hey, this is what I think your request meant, based on our conversations and what the customer wants; does this align? And they'll say one of two things: wow, that's great, I love that; or, actually, now that I see this little picture, that's completely wrong, here's what you need to do. And there's your interface. There's more to it, but to keep it brief, those are the two key aspects I always dive back into, and I think that really stems from learning data science while steeped in design thinking at Stanford.
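A minimal sketch of that fake-data-table prototype idea, with every column name and number invented purely for illustration:

```python
# Hypothetical sketch of a "fake data table" prototype: mock up what the
# deliverable WOULD look like and show it to the stakeholder before
# building anything. All names and values here are made up.
import pandas as pd

mock_report = pd.DataFrame({
    "region":           ["West", "East", "Central"],
    "active_users":     [1200, 950, 430],      # invented numbers
    "churn_risk_pct":   [12.5, 18.2, 9.1],     # what a model WOULD output
    "suggested_action": ["monitor", "outreach", "monitor"],
})

# Share this and ask: "is THIS the table you were asking for?"
print(mock_report.to_string(index=False))
```

A few minutes of mocking can replace weeks of building the wrong thing, which is exactly the feedback loop Mark describes.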
[01:23:43] You have to check out that link Mark sent, the five steps of design thinking; it's right there on LinkedIn for you guys that want it. Mikiko, go for it. [01:23:51] Yeah. So there's actually a community out in the Bay Area, MLUX. And the funny thing is that design thinking was way big at Autodesk, [01:24:00] to the point that they bought the rights to host workshops and give people certificates and all that inside the company. I still have nightmares. But the funny part is, and I could be totally wrong, I feel like the discussion about fairness and ethics and bias in AI, when I first started hearing it, was coming from all my ML-UX peeps; it wasn't coming from anyone else. Very early on, some of them, especially some of the ones I know who went on to do ethics and bias evaluation at places like Google, recognized it. They tended to be more data-driven people in general, but what they were doing was looking at the data and asking: how are people actually using the products we're creating? And going: oh my God, they're abusing the shit out of it. So I feel like that's where a lot of that initial conversation came from. The group is still around; I think they're probably going gangbusters with all the online stuff. And it's interesting, because there's still this discussion, which is somewhat related: [01:25:06] how do you make machine learning and data science valuable and useful, right? And to some extent, for it to be valuable and useful, people have to be using it. So do we create all these products before people even know they want them? Or do we build iteratively? Or are we sometimes just literally doing NFT-powered horses, as Eric says in the chat, which is great, you know? So if people are interested, they should definitely check out MLUX; Google those four letters together and they'll probably find a lot there. But I do think that with the whole fairness-in-AI conversation coming up, we're kind of in for a reckoning to some degree; we're already in the throes of one. You can't have people creating ML products that, say, nerf people or harm them in real life without consequences. You put all this power in the hands of people, and you'd reasonably hope they'll use it well, [01:26:00] and they don't. So yeah, it's only the beginning. It's only the beginning, especially with federated learning and so much else. [01:26:08] Mikiko also has a podcast episode, number forty-six; I put the link right there on LinkedIn as well, so definitely check that out. Let's go to Javier for this next comment, but then I also want to come back to this concept of the reckoning and hear from Ben on it. Javier, what are your thoughts? [01:26:27] I was just going to add to a lot of the comments others have made about the intersection of design thinking with data science.
I think one of the biggest areas where data scientists might actually benefit from incorporating a more formal design-thinking process is at the onset of an engagement or an analysis, where you really want to be objective about the process and the flow and not jump to a solution at the beginning. It's similar to product design and the voice of the customer, which Ben commented on earlier; you want to try to design something that takes your own bias out of it, I guess. [01:27:14] Thank you very much, Javier. Ben, let's hear from you on this idea, the reckoning. I think it's a fascinating idea; thank you for bringing it up, Mikiko. [01:27:23] Sure. Before we jump to that: this conversation is triggering me a little, for a funny reason. When you go to a startup, it ruins you. It ruins you for life. The reason it ruins you, and hopefully I'm saying this with much love, because it's just the reality of the market, is that payroll's not free. You need a multiplier on how much you're getting paid, and if that's not there, then you need to be fired as soon as possible, unless it's long-term R&D. So when we talk about products and features, what you're going to build and how you're going to build it, the brutal thing about building a product in the market is that the market wants to eat [01:28:00] you. The market just wants to eat you and spit you out, and it doesn't care what's going on in your personal life. Whereas as an employee you sometimes have these nice shelters: hey, I'm sick, I'm sad, I need to take time off; we've all worked for good bosses who allow that. At a startup, that's actually not possible. So, on the reckoning: Mikiko, can you expand on that a little more, and then we can get into the discussion? Is it specifically around ethics? Where is it going? [01:28:29] OK, yeah. Well, it's: where is ethics going? And I think it goes along with, look, businesses want to make revenue, organizations want to create value or profit, and I'm fully behind capitalism, US capitalism. So I totally get that, and I get that sometimes you're just going to have some misses. If you take high risk for high reward, if you want to get something really unique out there, you've got to take a risk. But I do think there's this thing around the democratization of ML and data, right? We build these tools that the layperson can use, and we're kind of expecting them not to do bad shit with them, you know? And I kind of wonder: is that really a reasonable expectation? The ML-UX communities were the first ones to bring this up, in some ways, because they look at how people are actually using things versus how we're designing them. So it's really about ethics and fairness: as we get more democratization of our tools, would we not expect to see more bad shit happen?
[01:29:37] Yeah, that's a great question. So most of the time when we talk about ethics, we're trying to prevent the oops. We don't want to end up in the Wall Street Journal; we don't want bad press; we don't want racist, sexist, ageist models. But you're actually bringing up something else: malicious intent, which definitely happens on a regular [01:30:00] basis with hackers and people misusing tools. What was that, there was a GitHub repo that was actually deleted because a guy was using deepfake technology to put faces onto porn scenes. Do you guys remember that? That was an example of AI technology that was awful. How upset would you be if you found out someone was doing that to a member of your family, or to anyone in the world? That's terrible. So I think you're going to have more and more examples like that, but hopefully you'll also have technologies to counteract them. So that's the malicious side. The other thought is that ethics is incredibly complicated. Here's an example, just to start a discussion: let's say I have your face on camera. [01:30:44] If you try to steal my car or break in at night, should I be allowed to go to the local police station and have you looked up? In the state of California, we'd say absolutely not. But what if my kid were kidnapped? I've got young kids. If I find out my kid has been kidnapped, and I have your face on my camera, should I be allowed to go to the local police station and have you looked up? And this is complicated. For the people on this call that are parents, you might say: no, this is easy, not complicated. It's actually still complicated, because you have to prepare for the future. You can't plan ethics around the political system we have today; you have to plan for the political system we could have in the future, and AI is the great amplifier. So I love the point you're bringing up: a lot of the time with ethics we talk about where things could go wrong, but there are people out there who will use these tools to do deliberate harm. So I'm curious what people think about the surveillance state. If your kid is kidnapped, do you want to go to your local police station and have AI look that person up immediately? [01:31:37] Absolutely, man. [01:31:39] And ethics is not a global standard. You have different regions, different states, different political systems that want to live a certain way. Me personally, I would love to have networked neighborhood security. If you drive your car into my neighborhood, I want your license plate scanned immediately and compared against the people who live there. And [01:32:00] I want you on the top-five suspect list if something happened that night and we want to go find you. There are neighborhoods where people would say, I don't want to live like that, and there are neighborhoods where people say, I do want to live like that. [01:32:10] Let's go to Ken and then Mark. [01:32:12] Yeah, one thing that does concern me is that the pace regulation moves at, on any of this, is so much slower than the pace the technology moves at, and always will be. It's not even a cat-and-mouse game, because the technology is so much further ahead than our ability to regulate it. So that's one challenge. You also highlighted the cultural difference. I'm sure many of you have probably read the book AI Superpowers.
And to me, that's very terrifying: the way other countries view machine learning, the way they use AI. They've essentially set up their systems so that they can leverage it more effectively than the US can in the future, because culturally they're OK with their identities, or whatever it might be, being part of the broader political system. I don't think any country is going to take over the US using AI, but it does make for a very weird, uneven playing field across the entire world, based on how different countries regulate their AI. Some countries are very pro-data and not pro-individualism, and in a weird way that really does breed innovation, because there's access to so much more data, so many more opportunities to do bad stuff, but also to just innovate. And in my mind, that's where the reckoning comes: we're all moving at different speeds. What happens when one country or one group is significantly further ahead than the others, and what are the negative repercussions of that? Because without a doubt, really advanced technology, depending on what it is, can breed huge amounts of inequality, and I don't think AI in the medium [01:34:00] term is any different; maybe super long term it starts to bridge that gap again. So, my two cents, maybe a little off topic. [01:34:09] Yeah. I mean, this is the exact reason I got into data science, because I saw the writing on the wall: wow, data is scaling up all these amazing innovations. But then the question I always ask came to mind: for whom? Who is this actually going to impact? Who is it going to help, and who is it potentially going to impact negatively? Who does it help, but at the cost of others? So, not to call you out specifically, but this idea of an AI-powered neighborhood: like you said, some people really like that. For me, it sounds terrifying, and the reason is that I would actually love it if I felt confident the data sets weren't biased. I'm like, all right, if I go to a neighborhood like that, am I going to get mixed up with somebody because there weren't enough light-skinned Black people in the data set, and now anyone who vaguely looks like me gets flagged? That's something I'm concerned about. But there are a lot of opportunities, and that shouldn't stop us from pursuing this. I just think there needs to be so much more work on where the externalities of what we're doing are, and how we can account for them. Do we need to collect more data? Do we need to add some modifiers for the bias, like we talked about a few weeks ago? For me, especially in health care: a lot of the problem with clinical trials is that people of certain races, genders, or whatever do not enter those trials, and therefore the results don't inform the general public about how those medicines affect them. [01:35:44] I think a great example: there are many medicines out there that were only studied on men, and when they reach women, they completely mess them up. It's terrifying. And I can see the same thing happening with AI; that's why I got into data [01:36:00] science. I was like, I don't know, I can't figure it out; no one's really figured this out.
But I want to be at the table to at least try. Maybe to jump in on that point, Mark, real quick. One of the things that I'm proud of, and I wasn't involved with this work myself: at DataRobot, during the COVID vaccine trials, we ran into an issue where there weren't enough minorities being pulled in. We were predicting where some of the vaccine trials should be, and we raised that issue and helped address it. Addressing some of these issues... I've made people upset because I've said something like, it's easier to fix bias than it is to build a rocket. [01:36:42] And people get really angry. But I want to make a clear point. When it comes to a specific use case like résumé prediction, if there's bias, let's say the training set is sexist, we have unconscious bias; that is a guarantee. You will always have biased data. But I can actually build two models: a model to predict performance, and a model to predict gender. If you build a model to predict the thing you're most worried about leaking, you can find the feature overlap and figure out which features need to go away. So name, sorority or fraternity, college, hometown, all those features get nuked, and you don't need a human to go find them. So I would say we've actually made a lot of progress on proactively blocking bias transfer. The problem with that is there's always another bias. We have the top biases we think about, but there are so many more. One of the most upsetting ones is attractiveness; we don't really talk about it because it's really disgusting that it would even be an issue, but it is an issue, for women and for men, in selection processes. Or your height, things like that. Humans are silly. Hopefully we can tackle the big ones, racism, sexism, ageism, but there's always another bias that we're not looking at.
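As a minimal sketch of the two-model, feature-overlap idea Ben describes, here is one way it might look, assuming a hypothetical pandas DataFrame with a performance label and a gender column. The column names, model choice, and top-k cutoff are illustrative assumptions, not HireVue's or DataRobot's actual method:

```python
# Sketch of the two-model proxy check: features that are important both
# for predicting the outcome AND for predicting the protected attribute
# are likely bias conduits (name, fraternity/sorority, hometown, ...).
# DataFrame and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def proxy_feature_overlap(df, features, target="performance",
                          protected="gender", top_k=10):
    # Model 1: predicts the outcome we actually care about.
    perf = RandomForestClassifier(random_state=0).fit(df[features], df[target])
    # Model 2: predicts the protected attribute we're worried about leaking.
    prot = RandomForestClassifier(random_state=0).fit(df[features], df[protected])

    def top_features(model):
        imp = pd.Series(model.feature_importances_, index=features)
        return set(imp.nlargest(top_k).index)

    # Candidates to drop: features important to BOTH models.
    return top_features(perf) & top_features(prot)
```

As Ben notes, this blocks one transfer channel at a time; it does not catch the bias you haven't thought to model.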
[01:37:56] I'll go first, Mark, real quick. It takes courage to speak out that honestly, [01:38:00] Ben, and I totally agree with you. But even if you thought of all possible biases, there's just a big mathematical problem: we don't have the time and resources to collect the entire population, so sampling creates bias automatically. It's just a mathematical reality. [01:38:18] Sorry, Mark, go ahead. Oh, no, I agree with that. And I actually love your point that fixing bias is easier than building a rocket. I agree with it, because the solutions are there; there are a lot of amazing researchers, especially in sociology, on what the problem is and what you can do to stop it. The solution is easy; what's hard is implementing it, actually getting people to want to implement the solutions. Like height or appearance: we know that's really impactful, but getting people to want to address that is a different hill to climb. I also have kind of an axe to grind here, because I've gotten into some pretty visible fights online. I won't mention the college, but there's a professor who, like five years ago, was tweeting at HireVue saying we were building racist models, tweeting us white papers saying, you guys need to read these white papers. We read the white papers and they were embarrassing; you're like, are you serious? Given the amount of research we were doing to address this bias issue? And this individual would go and give these presentations [01:39:22] and kind of say, I don't know if bias will ever be fixed. He makes it sound like an impossible problem. Hey, when you look at racism in the US, there are top minority groups we can address right away. For some of the long tail, maybe Pacific Islanders or something, you might run into just not having enough data. But for some classes, we can fix this tomorrow, we can fix this right away; we just have to get people bought in. So this mindset of "I don't know if we'll ever fix it" just bugs me. Let's make progress; it's all about progress. Let's celebrate the progress we're making while admitting we have room to go, and not have these philosophical [01:40:00] pontifications spinning in place. That's what I felt this professor was doing. I need to get over it. [01:40:08] I've even had friends who are panelists on this show get visibly upset because I pointed out that problems with data cleanliness would hold us back in the data age. And all I had to do was, very patiently and politely, walk through how the data gets collected, and then: oh, I get your point. I think what's going to hold us back in this data age is not the engineering sciences or the hard sciences so much as the other realms, where data literacy is so lacking and there are motivations to falsify data, or not collect it completely, or whatever. As we get into those realms, that's when we'll see real improvements in many areas. But it's really data literacy holding us back more than anything. And data governance. George, you might be listening. [01:41:06] So I think... go ahead. [01:41:09] I was going to say, it's funny: with my family, that data literacy thing really sticks out, because when I was working in growth marketing, growth analytics, my parents were like, look at all the shit that Facebook does, look at all the data they buy and all the personalization. I'm like, Dad, you know how you get those credit card offers in the mail? He's like, oh, I thought someone picked me. I'm like, no, no, they don't pick you; they get some information and then they know you're likely to go for the credit card. And he had a hard time understanding that. And especially with the Senate hearings with Facebook, some of those questions were just straight-up embarrassing. They really were, and it wasn't about machine learning or data science; it was, do you know what [01:42:00] end-to-end encryption means? Do you understand? Or even, for example, a lot of people are concerned about jobs: if we open up immigration, suddenly the American labor force is going to get flooded.
And I got paid to do that, not necessarily to create discriminative models, but that's something that I kind of it's something that I just like in some degree about this idea about how all our data scientists are engineers. Have we super technical. It's like, well, but some of these issues that they run into sometimes are just you literally didn't ask, for example, if we're doing a model on mortgage predictions, are there characteristics besides race that could predict race? Could you not have talked to your sociology colleague? Could you not have done the literature on the research about what predictors, for example, will predict race, neighborhood and lead to red line predictions? So I know that's my that's my heart takes, but can I interrupt you? [01:43:12] No, no, you're fine. I think that that, in a sense, also adds a little bit to to where my head was going. Not a super complete thought, but I also think to piss off, then inevitably everything is going to be a little bit biased. Right. But we can we can minimize the bias by asking the proper question at hand. Right. A lot of the time we're focused on the Data. We're not as we're not as susceptible to changing our evaluation criteria or what our or what our dependent variable is. And that is so much like something is only biased if what it's predicting has some relationship to that bias. And so I think asking better questions and framing the problems better and thinking [01:44:00] through that that pipeline is something that helps us to even used biased data, what some would consider biased to to effectively leverage insight. And, you know, the most famous thing is like all models are wrong, some are useful. And I think that that definitely applies here is that essentially all data is biased, but it doesn't mean some of the models can still be useful. If we have clearly identified the use case that won't be harmful to or minimizes harm to other [01:44:27] People isn't at the end of the day, doesn't it summarize to exactly what I've been saying in the chat room here, which is setting up mechanisms to audit your own data and also leveraging technology to monitor and take action? Right at the end of the day, you have to have that will, the will to take action knowing when things are wrong. And this is what I'm thinking. Maybe the the effort outweighs the benefits and people don't want to do it or they pay the price when it's too late. That's what's happening out there in the world because we have the technology right to monitor. So what's stopping us from monitoring for Buy-Out? So bad behavior, red line inferences Mikiko mentioned. So I'm wondering, is that what's happening? That's a [01:45:09] Great point. Then just real quick, we're going to wrap it down, wrap it up here. That's a great question if anybody has any insight on that. But I will say that I was listening to a podcast earlier this week, the Increments podcast and the title of this episode was The Hubris of Computer Scientists. Were they heavily discussed this particular white paper good isn't good enough. And it's all about essentially this professor is throwing shade at machine learning practitioners because we don't have a universally agreed upon notion of good. More particularly, it doesn't match his unified description of what it is. Definitely. Check this out. It's freely available. You can find it mine. I'm so short read. Don't take that. Also the NFTE from today, Mark, if you want this, you might be sharing it. Let me know. 
[01:46:00] Maybe to lean into that even more: I'd say the data science default is not good, it's actually evil, and by that I mean we're alpha chasers. An example: at HireVue, the data sets were so big that we had people who would interview across multiple companies. Being an alpha chaser, if I know you screwed up your interview at this bank, shouldn't I use that to predict your future interviews at future banks? Obviously we didn't do that. But that's the point: if you're an alpha chaser, what an evil thing to do. Do you want to live in a reality where you're nervous going into your interview, you got divorced, something happened, you didn't sleep last night, you're junior talent, you screw it up, and you've screwed up your interviews for life? So we did not do that. But the alpha-chaser mindset would always grasp at it, so the default is evil. And Ken was making a point about that, right? You mentioned AI Superpowers again, and at the end of the day it comes down to the culture of the country, because there may be a cultural landscape that accepts something like that. When you look at somewhere like Japan, or China, they accept that their data is governed by the government; they trust it, or at least they accept it. So there may be situations where people accept, as a whole, being governed as-is or being rated as-is. And when you bring that system to the United States, people cry foul, because they have a different system, a different culture. So I think when it comes to ethics and data, it really will depend on the government and the culture. What do you value more inside your culture? A dog more than saving a person's life, if you're governing a fleet of automated vehicles? Things like that. Quite interesting. [01:47:50] Awesome discussion; this was all great. I'd love to do a full-on panel discussion about this and hear more of your thoughts. But let's go ahead and wrap it up. It's the first time I'm taking care of the baby all by myself this weekend, so it's a huge, huge moment; I've got to pick him up from his grandparents' house. My wife is out there, probably watching, enjoying herself with some friends. She deserves it; she's been working extremely hard for the last fifteen months straight. Meanwhile, I'm going to take my baby and go for a beer at a brewery and hang out. It'll be fun. Guys, thank you so much for tuning in. Hopefully you get a chance to watch the replay of the virtual conference from last week: a great panel discussion with Ken, myself, Danny... my God, who else was on the panel? I forget the names, but it was an awesome discussion, so check that out. And in general the entire event was great; Ben had a great presentation, and so did Tom with the opening keynote. So check it out, guys. Take care for the rest of the weekend; see you next week. And also, next week will be my last week at Price, which is crazy, before I chill out for a while and start my new role at Comet. I'm excited to embark on that adventure. You guys take care of the rest of the afternoon. Remember: you've got one life on this planet, so why not try to do something big? Cheers, everyone.