Kyle Polich: [00:00:01] So if you find yourself spinning your wheels or doing things that are just insurmountable, unless that insurmountable thing is the one thing that you alone in all the world are good at, so people have to come to you to do it, you know, find something that's easier. And usually that means ingenuity through intelligence. So find a smarter way to do your process: automate the hard parts, or, I guess, automate the easy parts to figure out what the core of the problem is. But yeah, solve it not with more heavy lifting, but with smarter techniques, better algorithms, and that kind of stuff.

Harpreet Sahota: [00:00:54] What's up, everyone? Welcome to another episode of the Artists of Data Science. Be sure to follow the show on Instagram @theartistsofdatascience and on Twitter @ArtistsOfData. I'll be sharing awesome tips and wisdom on data science, as well as clips from the show. Join the Free Open Mastermind selection by going to bitly.com/artistsofdatascience. I'll keep you updated on bi-weekly open office hours that I'll be hosting for the community. I'm your host, Harpreet Sahota. Let's ride this beat out into another awesome episode. And don't forget to subscribe, rate, and review the show.

Harpreet Sahota: [00:01:41] Our guest today is a computer scientist turned data skeptic who has a truly wide scope of interests, ranging from AI, machine learning, and statistics to data provenance, data governance, econometrics, and meteorology.

Harpreet Sahota: [00:01:52] His background in artificial intelligence, machine learning, and statistics, as well as his love for SQL and software design, has made him a sought-after consultant in the data science field. He has spent over a decade helping organizations develop products and strategies that are evidence-based and data-driven, amassing professional experience across industries such as ad tech, market research, e-commerce, video games, image recognition, and satellite communication. He's been an advisor for a number of early-stage startups and consults for growth companies to deliver end-to-end data solutions. Since 2014, he's hosted a wildly popular podcast featuring interviews and discussions on topics related to data science, statistics, machine learning, artificial intelligence, and the like, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and the efficacy of approaches. So please help me in welcoming our guest today, host of Data Skeptic, the number one data-related podcast on iTunes.

Harpreet Sahota: [00:02:51] Kyle Polich! Kyle, thank you so much for taking time out of your schedule to be here today. I really, really appreciate you.

Kyle Polich: [00:02:58] Hey, totally my pleasure. Glad to be here. Thanks for the invite.

Harpreet Sahota: [00:03:01] So talk to me a little bit about your path into data science. What sparked your interest? Where did you start? And kind of how did you get to where you are today?

Kyle Polich: [00:03:11] Sure. Yeah. I guess I've had a lifelong fascination with computers. I could've told you at four years old that I was going to be a computer scientist; that was just an obvious path. Naturally, along that journey, I became interested in artificial intelligence, and that really became my focus. As I studied it, I originally thought I might go a more academic path, trying for a professorship, something like that. But in grad school I started working a part-time job, basically just to afford being in grad school. That was at a very unique time and in a somewhat unique place. Nothing particularly special: we were a search engine marketing company, so we basically helped small businesses use Google AdWords. But as you might expect, there was a whole lot of what would eventually be called data science going on there, and my skills transferred very well. What might not be abundantly obvious to everyone is that AI is very largely statistics and a lot of software design. Those two things work wonderfully in industry, especially at the time I'm talking about, which was before a lot of the things we have today. There wasn't a cloud, there wasn't CI/CD, there wasn't all this kind of stuff. There was just a lot of elbow grease going into rudimentary versions of those. So I learned a lot of lessons about working, decided I liked that better, or simply was more successful at it than in academia, and focused on that path.

Kyle Polich: [00:04:40] That job led to an opportunity to move, and that led me out to California. And I guess the rest is history. I worked in a few different capacities doing different data science things. At some point, after I was at a startup that imploded, I decided I should strike out on my own and became an independent consultant, and after about a year I started building a team. Now we're, I guess, a media company, as you mentioned, since we do the podcast, but most of our revenue really comes from our work as a boutique consulting group. We help small and medium enterprises figure out how to do machine learning in the cloud, in particular with real-time and streaming kinds of things.

Harpreet Sahota: [00:05:20] What does it mean to you to be a data skeptic? And at what point in your journey did you start to become a data skeptic?

Kyle Polich: [00:05:28] Well, when I say skeptic, maybe I should preface it as scientific skeptic. I mean it the way people like James Randi and Penn and Teller mean it. There are people who say, well, I'm a skeptic of climate change, or I'm a skeptic of vaccines. That's not skepticism; those are denial activities. If you think vaccines cause autism, the science doesn't agree with that. Skepticism is really the Bayesian process: taking in as much information as you can, weighing it based on how likely that evidence is, and arriving at a posterior belief that is most in line with the truest version of the world you can know. At the core, I feel like I don't even have to defend that. I want to believe as many true things as I can and as few false things as possible. But when you put "data" in front of that, "data skeptic" becomes something kind of interesting to me. That's why early on in the show I started with the tagline "We're skeptical of and with data," because data is both a tool and something that can be misused. So it's really about the methodologies and the process of analysis: how to be skeptical, how to be clever, and how to answer questions with the right data set. So I guess, more practically,

Kyle Polich: [00:06:40] I became a data skeptic, if I have to put a date on it, at that first job I mentioned, when they kind of pulled me out of academia. This was, as I said, at an early point, before a lot of the modern business infrastructure existed. There were concepts of data warehousing floating around, but they were pretty new, and this was a little startup that grew really fast. I had not yet encountered the phrase "single source of truth," but this would be the venue in which I would discover it. It's important because this company had three or four different versions of the data, and not everything agreed. I was like, wow, the database is wrong and inconsistent, and what does truth even mean? Ultimately, I hope I helped that company come up with the answer and figure out the general engineering scope needed to answer questions like that and have a single source of truth. But it was a struggle, and for a lot of people it still is. So there were important lessons for me there: numbers and math are always part of the process, but, like I said, they're something you have to be both skeptical of and skeptical with. So that would be my origin story of data skepticism, I suppose.
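The Bayesian process Kyle describes at the start of this answer (take in evidence, weigh it by how likely it is under each hypothesis, and update to a posterior belief) can be sketched as repeated applications of Bayes' rule. This is a minimal illustration, not anything from the show; the function name and the prior and likelihood values are hypothetical, chosen only to show the mechanics.

```python
def bayes_update(prior: float, p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """Return P(hypothesis | evidence) for a binary hypothesis via Bayes' rule."""
    numerator = prior * p_evidence_if_true
    # Total probability of seeing this evidence at all.
    evidence = numerator + (1.0 - prior) * p_evidence_if_false
    return numerator / evidence


# Hypothetical example: start undecided, then observe two pieces of evidence
# that are each four times likelier if the hypothesis is true.
belief = 0.5
for _ in range(2):
    belief = bayes_update(belief, p_evidence_if_true=0.8, p_evidence_if_false=0.2)
    print(round(belief, 4))
```

Each such observation moves the belief from 0.5 to 0.8 and then to about 0.94. Evidence that is equally likely either way (say 0.5 and 0.5) leaves the prior unchanged, which is the formal sense in which evidence gets "weighed."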

Harpreet Sahota: [00:07:47] I really, really like that definition of skeptic, man. Totally resonates with me; very beautifully put. You've had such an awesome journey in data science, since before data science was, quote unquote, a thing. What do you think the next big thing in data science is going to be in the next, say, two to five years?

Kyle Polich: [00:08:13] Well, I'd look at it in two ways: the engineering side and the academic side. On the engineering side,

Kyle Polich: [00:08:20] I think there's going to be a continued progression of improved tooling: easier, faster, and better ways to do stuff. More automation, more transfer learning, more serverless, more AutoML.

Kyle Polich: [00:08:33] Just as an example, Data Skeptic, the operation I run today, is 12 people, depending on how you want to count it. If I had built this organization 15 years ago, I would've needed 100 people. I could never have gotten it off the ground; I would have needed people to run servers that I now just spin up in the cloud. I think those sorts of general trends and efficiencies are going to continue, and what 10 data scientists can do today will be done by one in two to five years, maybe closer to the five-year end.

Kyle Polich: [00:09:05] On the academic side, where do I think breakthroughs and interesting stuff are going to come from? I think there's a lot of neat stuff going on in the theory of database design: tying together ideas like ACID compliance, the CAP theorem, Paxos, all of these complex systems, and finding unique ways to serve up tools that are customized and hyper-efficient, so that more of that stuff becomes a utility. Then data people can think less about low-level details and do more high-level work, because the plumbing is handled for you. I've personally been getting very interested in, well, not just interested in but applying, something called probabilistic data structures. I think there's great potential for more research there, or just for applying the work that exists today; I think they're underused. I'm also very excited for theoretical results to perhaps come out of areas like algebraic circuit complexity, the class NC in complexity theory, and information theory, and give us useful, novel tools for studying deep learning. I don't know if that means tuning learning rates or proving upper and lower bounds, but maybe complexity theory can help move deep learning to some sort of next level, whatever that means. Will it happen in five years? I don't know. But it wouldn't be surprising for some cool breakthrough to come out.

Harpreet Sahota: [00:10:39] Are you an aspiring data scientist struggling to break into the field? Then check out dsdj.co/artists to reserve your spot for a free informational webinar on how you can break into the field. It's going to be filled with amazing tips that are specifically designed to help you land your first job. Check it out: dsdj.co/artists.

Harpreet Sahota: [00:11:04] That's pretty interesting. So, in the next two to five years, in this vision you have of the future, what do you think is going to separate the good data scientists from the great data scientists?

Kyle Polich: [00:11:18] Good and great. I don't know. I guess good and great often will differ, really just by luck.

Kyle Polich: [00:11:24] You know, who is the most successful data scientist? Well, it's a smart one who got a job at a good company, and that company also had a bit of luck, or a bit of success. You could take two people who start out identically, and a little bit of chance plays a role. So maybe that's the distinction between good and great.

Kyle Polich: [00:11:42] But I guess greatness is achieved by a commitment to your craft and pursuing it.

Kyle Polich: [00:11:47] Perhaps that's what you mean?

Harpreet Sahota: [00:11:49] Probabilistic data structures: can you touch on that just real briefly, at a high level? That's not something I've been exposed to, and it sounds like such an interesting concept. I think our audience would love a little primer on that, if you wouldn't mind.

Kyle Polich: [00:12:04] Yeah, sure. I'll give you the high level and feel free to ask as many questions as you'd like. I'm really into this. But to understand the issue, you've got to be able to understand hashing.

Kyle Polich: [00:12:13] So I'll take a little bit of that for granted: everyone knows you can apply a hash function to some value and get a uniformly distributed, random or seemingly random output from it. So if you hash something, you get a key that should have very few collisions, so it's practically unique, but the outputs are spread out evenly. Now, what if you hash it twice, with two different hash functions? If you take a single object and hash it twice, and then you take another object that you're not sure you've seen before and hash that twice, the odds of having two collisions in a row like that are astronomically small. You can take advantage of that mathematical principle. Let me make it a little bit more practical. Think of something like web traffic: when people are visiting your website, there's a ton of metadata floating back and forth. One thing that gets passed is called the user agent string, which usually just describes something about the software, like your browser. It'll say this is Chrome, this version, and it might have a few hints about what plugins you have installed, although privacy efforts are making that more and more anonymous. But we still know something about the browser. Crawlers send one too: it'll say I'm Googlebot or I'm Bingbot. So there's a variety of these, and new ones come out all the time; any developer can invent their own. If you want to count those, you have some interesting challenges, because you don't know in advance the set that will show up. New ones can be introduced at any time, and there are all kinds of unpredictable things like that.
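The "hash it twice" idea can be sketched in a few lines. This is an illustration under stated assumptions: the two "different hash functions" are simulated by salting SHA-256 from Python's standard `hashlib` with a seed, the helper names are hypothetical, and the collision estimate assumes the two hashes behave like independent uniform draws over m buckets, so a double collision has probability (1/m)².

```python
import hashlib


def seeded_hash(value: str, seed: int, num_buckets: int) -> int:
    """Map `value` to a bucket in [0, num_buckets); each seed acts as a separate hash function."""
    # Salting SHA-256 with the seed is an illustrative stand-in for
    # "two different hashing functions".
    digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def double_collision_probability(num_buckets: int) -> float:
    """Chance two distinct values collide under BOTH hashes, assuming independent uniform hashes."""
    single = 1.0 / num_buckets  # chance of one collision
    return single * single      # two independent collisions in a row


agent = "Mozilla/5.0 (compatible; Googlebot/2.1)"
print(seeded_hash(agent, seed=0, num_buckets=1024), seeded_hash(agent, seed=1, num_buckets=1024))
print(double_collision_probability(2**32))  # roughly 5.4e-20
```

With 2^32 buckets per hash, a single collision has odds of about 1 in 4.3 billion, and a double collision about 1 in 1.8 × 10^19.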

Kyle Polich: [00:13:48] Also, if you just tried to store every single one, you might run out of memory, because there are so many. But say you want to ask a simple question, like: have I seen this particular user agent before? That's a really hard question to answer exactly. You have to store literally every agent you've ever seen, hash them all, and look them up. A probabilistic data structure is a compromise. You say, well, look, I don't need to know 100 percent for sure; I'm good with 99 percent, or some tunable accuracy like that. That's an important tradeoff you can study. But if you're good with 99 percent accuracy, you can get that result with way less memory. So you don't need some expensive server or a lot of engineering, and when you don't need those things, you can deploy this at scale in a lot of cases. So you can count many things, count mentions of keywords, count all types of stuff, and use those as features for machine learning. The one I'm describing is a Bloom filter, which is your best place to get started. Google "Bloom filter"; you'll find lots of good demos and talks on it. But there's a whole collection of other things like this. HyperLogLog is another one that looks at cardinality: for a certain set, how many distinct values have I seen? They're all different ways to look at mostly streaming data, where there are constraints on memory and compute but unbounded things could come through the stream.
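The Bloom filter described above can be sketched as a bit array plus k hash positions per item. This is a minimal sketch, not Kyle's team's implementation (he mentions they rolled their own). It uses the standard sizing formulas for m bits and k hashes given n expected items and a target false-positive rate p. Deriving the k positions from two slices of one SHA-256 digest (the "double hashing" trick) and the class and method names are assumptions for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: 'have I seen this before?' with no false negatives
    and a tunable false-positive rate."""

    def __init__(self, expected_items: int, false_positive_rate: float) -> None:
        n, p = expected_items, false_positive_rate
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits and k = (m / n) * ln 2 hashes.
        self.num_bits = max(1, math.ceil(-n * math.log(p) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)  # packed bit array

    def _positions(self, item: str):
        # Double hashing: derive k positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force h2 to be odd, hence nonzero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False: definitely never added. True: probably added (may be a false positive).
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


seen = BloomFilter(expected_items=1000, false_positive_rate=0.01)
seen.add("Mozilla/5.0 (compatible; Googlebot/2.1)")
print(seen.might_contain("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # True
print(seen.num_bits, seen.num_hashes)  # 9586 bits (about 1.2 KB) and 7 hash functions
```

At a 1 percent error budget, 1,000 user agents fit in about 1.2 KB of bits, regardless of how long each string is. Tightening p grows m only logarithmically, which is the tunable tradeoff Kyle mentions.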

Kyle Polich: [00:15:11] So they're neat because they take advantage of these statistical tools. They're simple, and in a way they're a free, cheap kind of thing.
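HyperLogLog, the cardinality estimator mentioned above, is similarly compact. The sketch below follows the standard HyperLogLog estimator: hash each item, use the first p bits to choose a register, and keep, per register, the largest "leading zeros + 1" seen in the remaining bits. The 64-bit slice of SHA-256, the default precision of 12, and the class name are assumptions for illustration. Production implementations add further bias corrections.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog: estimates the number of distinct items in a stream."""

    def __init__(self, precision: int = 12) -> None:
        self.p = precision
        self.m = 1 << precision  # number of registers (4096 when p = 12)
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias-correction constant for m >= 128

    def add(self, item: str) -> None:
        x = int.from_bytes(hashlib.sha256(item.encode("utf-8")).digest()[:8], "big")
        index = x >> (64 - self.p)  # top p bits choose a register
        rest = x & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        # 1-based position of the leftmost 1-bit in `rest` (leading zeros + 1).
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def count(self) -> float:
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: linear counting on empty registers.
            return self.m * math.log(self.m / zeros)
        return raw


hll = HyperLogLog()
for i in range(20000):
    hll.add(f"user-agent-{i % 5000}")  # 5,000 distinct values, each seen 4 times
print(round(hll.count()))  # an estimate close to 5000
```

Repeated items never change the answer, because re-adding one only recomputes the same register maximum. The memory is fixed at 4,096 small registers however large the stream gets, with a typical relative error around 1.04/√4096, or about 1.6 percent.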

Harpreet Sahota: [00:15:19] That sounds really interesting. How do you see probabilistic data structures impacting society in the next two to five years?

Kyle Polich: [00:15:28] Well, I think in a sort of ubiquitous way: different technologies will be developed around them. A good example is Redis, if you're familiar with Redis as a type of database. It got famous for being a distributed key-value store, which it is, but they've added a lot of features over time, and I think one of them is a Bloom filter, maybe along with some other probabilistic data structures. We rolled our own, so I'm not using theirs, but I'm aware they have something.

Kyle Polich: [00:15:54] Either they will continue investing in that or there'll be a lot of competition, but there'll be more tools like that: open-source stuff, cloud services. People will find a way to make this available, and developers will consume it. That will open up new opportunities for people to apply those ideas in different areas. Maybe in some medical domain where you want to count patients, or count instances of DNA fragments, or who knows what. Or it could be in e-commerce, where you want to count clickstream data or count something about the items people are looking at for use in a recommender system. I think it will be impactful in that it's ubiquitous, like that famous line from The Usual Suspects: "The greatest trick the devil ever pulled was convincing the world he didn't exist." To me, that's what good data science does.
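The e-commerce examples here (counting clickstream events or the items people look at) call for approximate counts rather than the yes/no answer a Bloom filter gives. The sketch below therefore swaps in a structure that was not named in the conversation: a Count-Min sketch, one member of the wider family of probabilistic data structures. The width, depth, row-salted SHA-256 hashing, and names are all assumptions for illustration.

```python
import hashlib


class CountMinSketch:
    """Minimal Count-Min sketch: approximate per-item counts in fixed memory."""

    def __init__(self, width: int = 2048, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]  # depth rows of width counters

    def _index(self, item: str, row: int) -> int:
        # Each row uses SHA-256 salted with the row number as its own hash function.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Collisions only ever inflate a counter, so the minimum across rows is the
        # tightest estimate; it can overcount but never undercounts.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


clicks = CountMinSketch()
for product in ["shoes", "shoes", "hat", "shoes", "scarf"]:
    clicks.add(product)
print(clicks.estimate("shoes"))  # at least 3; exactly 3 unless collisions occur
```

The counts could then feed a recommender as features, as Kyle suggests, while the table stays at a fixed 4 × 2,048 counters no matter how many distinct products stream through.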

Harpreet Sahota: [00:16:48] It's really cool how you've got this wide range of interests. It always seems like you're learning new things and using your podcast as a way to explore the interesting things you're into. Could you talk to us a bit about the Data Skeptic podcast's mission statement? I've heard you say that you plan on doing this podcast for the rest of your life, which I think is really awesome. What is it that you want people to take with them after listening to your show?

Kyle Polich: [00:17:19] I'd like them to take away a vernacular understanding of data and how to look at it. If you find my show interesting, I need you to be a steward of data science and the community. And I want to be one resource for you, so that when you encounter inefficiencies in society or in your job, places where data should be applied or more skepticism should be applied, I've helped you learn the tools and techniques for doing that, exposed you to the ideas that apply in data science, and helped you along in your career, wherever that takes you. So it's really an educational mission. I want it to remain something that's fun, edutainment. I don't necessarily want to become a college course, although, I don't know, hopefully there are still decades ahead; maybe there'll be some course we put out at some point. But really, I just want to be a casual place where people get exposed to deep ideas: not hype, not CEOs. I want to tell the story of how data is changing the world and talk to the people who are doing it. Part of that is that I also want to be the long-form story of AI, because while we may not see artificial general intelligence in my lifetime, we will see some profound steps towards it. And I want to be the non-surface-level outlet for you to learn about how those things work.

Harpreet Sahota: [00:18:39] Absolutely beautiful, man. You've interviewed so many awesome people on your show, and I'm curious: after having this exposure to a ton of data scientists, how do you view data science itself? Do you think it's more art or more science?

Kyle Polich: [00:18:55] Well, I guess I've got to answer this a little bit carefully, given the title of the program. But it really depends on what you mean by art.

Kyle Polich: [00:19:02] So in my anecdotal experience, just me in industry, when I've encountered people who say, oh, this is an art and a science, it's always been someone who didn't bother to investigate the science thoroughly. Sometimes that phrase in particular is an indicator of someone who's unwilling to investigate or, due to some power struggle in a company or whatever, doesn't want to look at data-driven or evidence-based approaches: "This needs my purview, and I'm the only one who can look at this problem." I have a philosophical issue with that. But I think there is another side to art. The dangerous side, I guess, is that art in a lot of contexts embraces interpretation and even encourages it, and that part I'm not good with. Data science is about getting to the truth. The truth is not open to interpretation, but also, the truth has nothing to fear from scrutiny. So if by art we mean the application of the methods, maybe the art is in how one accelerates the process of getting insights, or tells the story, or convinces an organization to act on the data, or gets the right data points to the key decision makers.

Kyle Polich: [00:20:14] There's art in that, sure. But it's all based in science and in methodology, perhaps. There's also an art in the beauty of the logic. Analysis is, to me, not something banal. I think a lot of people look at numbers and spreadsheets as this kind of boring thing, but I've always found that analysis is about discovery. And even though data needs to be processed in a methodical, exacting way, when you do it in a reproducible, auditable, maybe open-source sort of way in which other people can look at it, there's a beauty and an art in the consensus by which all the experts look at something and say, yes, this is the process, this is the science, we all agree, this is the road to truth. So for me, it's purely a science. But in the spirit of the title of the show, I'd say there's certainly an artistry in how one executes those methods in the most effective way.

Harpreet Sahota: [00:21:07] Yeah, definitely. I agree with that as well.

Harpreet Sahota: [00:21:09] I think the principles that guide us, and the principles that guide the work we do, are definitely grounded in science.

Harpreet Sahota: [00:21:18] But to me, the art comes from the fact that two different data scientists can work on a problem using completely different methodologies, as you're saying, and still come up with an equally good result, as long as the principles they're following are grounded in science. So, yeah, I like your definition.

Kyle Polich: [00:21:38] That's another strength, too, and I really liked that you pointed it out: multiple lines of evidence should all converge on the same answer. So even if five of us go off in five different directions, when we reach consensus, that's powerful.

Harpreet Sahota: [00:21:52] Yes, exactly, man. Thank you for that. So I was wondering, since we're kind of on this tip here, could you talk about the creative process in data science? How do you think that manifests itself?

Kyle Polich: [00:22:07] Well, I'm always reminded of the no free lunch theorem, which is another good one to Google. At a high level, it means there's no best algorithm, no one optimization technique to rule them all. Every algorithm has a suite of use cases where it's best suited. So there's some artistry in how you pick an approach. But I think most of the creative process is really about design. For me, it's about system design: how can I build something that is sustainable and maintainable and is more of a process? Historically, or at least the way I learned data science, it has always been this monolithic batch process: get a big training dataset, split off a test set, train a model, deploy the model, maybe repeat. There are some cases where you do want that lockstep; a bank shouldn't just constantly release models, and there's probably some review process. But more and more, industry and organizations need to do things in a more real-time way. So the design of systems that enable that and allow collaboration across different frames of the data is really where I've found the creative process most manifest.
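The batch workflow described here (split off a test set, train, deploy, repeat) and the real-time alternative can be contrasted with a deliberately tiny "model" that just predicts a mean. This is a hypothetical sketch using only the Python standard library. The function and class names are invented for illustration, and a real pipeline would use an actual model and proper tooling.

```python
import random
from typing import List, Sequence, Tuple


def train_test_split(rows: Sequence[float], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[float], List[float]]:
    """Shuffle reproducibly, then hold out the last `test_fraction` of rows for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]


def fit_mean(train: Sequence[float]) -> float:
    """Batch 'training': one pass over a fixed dataset, producing a frozen model."""
    return sum(train) / len(train)


class RunningMean:
    """Streaming counterpart: the model updates after every new observation."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0

    def update(self, x: float) -> float:
        self.n += 1
        self.mean += (x - self.mean) / self.n  # incremental mean update
        return self.mean


data = [float(v) for v in range(100)]
train, test = train_test_split(data)
model = fit_mean(train)  # batch: changing it means retraining and redeploying
mae = sum(abs(y - model) for y in test) / len(test)  # evaluate on held-out rows

online = RunningMean()
for y in data:  # streaming: the estimate moves with each event
    online.update(y)
```

The batch version gives a frozen artifact that can go through review before release, which suits the bank example. The streaming version trades that lockstep for freshness, which is where the system design work Kyle describes comes in.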

Harpreet Sahota: [00:23:18] I was wondering if you could share some advice for the data scientists out there.

Harpreet Sahota: [00:23:22] I know a lot of data scientists are working on projects and feel some type of hesitation or fear because they're trying to get their project to be perfect before they release it to the world. They're trying to design their system, let's say, perfectly.

Harpreet Sahota: [00:23:36] What tips do you have for anyone that's kind of trapped in that perfectionist mindset?

Kyle Polich: [00:23:41] Well, it's a tough situation. I'd almost want to take it case by case, because it really depends on the person. If you're truly Jimi Hendrix, just trying to put the last-second tweak on a great album or whatever, just release it. People will love it, and maybe you can remaster it later. So perfection can hurt you, and if that's what you need to hear, maybe you should think about things like how to fail fast, because iteration is important. But perfection is also good, so maybe some listeners should be cultivating that attitude and not releasing things quite so quickly. I don't know. It depends on what you're trying to get out of releasing things, too. I wouldn't worry about any kind of embarrassment, hesitation, or fear. If you put out a project that's just lame, it will just disappear. I don't think Nelson, the bully from The Simpsons, is going to show up and do the "ha ha" thing too many times. Most stuff is just going to disappear, and being able to fall down but get up fast is important. I've learned a lot through quick iteration and also through mentorship. So maybe that gets back to the point that it goes case by case.

Kyle Polich: [00:24:53] That's the real secret. Find a mentor who can help you judge.

Kyle Polich: [00:24:57] Are you being a perfectionist, and should you just release and get whatever you're doing out there? Or is this truly a good instinct you have? That's the hard question you've got to answer, and maybe a mentor can help you with it.

Harpreet Sahota: [00:25:08] As you know, data science has been, quote unquote, the sexiest job of the 21st century as of late, and there are a lot of people trying to break into the field. There are definitely a lot of barriers to entry because of the amount of knowledge you have to accumulate in order to become successful as a data scientist.

Harpreet Sahota: [00:25:28] Do you have any tips for people who are coming from a non-technical background and are coming face to face with these technical concepts for the first time?

Kyle Polich: [00:25:43] Well, the field is a technical field, so you are going to have to face those things. I guess my advice is to do it head on.

Kyle Polich: [00:25:51] Be honest about what you know and what you don't know, and come up with a good battle plan for learning. You're not going to learn everything; the scope of everything under the broad umbrella you might call data science is too big for anybody to master. You're going to have to figure out which parts you want to master. Non-technical means a lot of different things to different people. I've heard people say, well, I have a master's in stats, but I didn't finish a PhD, so I consider myself non-technical. And then there's, you know, I was an English major, but I'm really good at Excel and I have a passion for X, and I'm going to get into that.

Kyle Polich: [00:26:25] Everybody needs to achieve the technical level that's necessary for their role. So my advice is to figure out where you are and where you want to be, and draw the straightest line between those two.

Harpreet Sahota: [00:26:42] That's good, practical advice; I like that. You mentioned there's a lot to learn in data science, and I think if you're choosing to be in data science, you've signed yourself up for a career of lifelong learning. Could you talk to us a bit about your view of the importance of being a lifelong learner, and maybe share some advice for aspiring data scientists out there who feel like they haven't learned enough yet to even consider breaking into the field?

Kyle Polich: [00:27:15] Well, that's an area where I've got to split it again into two groups, because it's going to vary by person. Take that feeling of, oh, I haven't learned enough. Well, that's true of everybody; I haven't learned enough myself. I think universally that's a great sentiment to have, so if you feel that, moderate it. But the question is: is it a confidence issue, or is it true? Every person is going to have to answer that for themselves. If it's true, you're actually in a better situation, because then it's simple: I haven't learned enough, so what do I have to do? I have to go learn more. If you're not qualified, step one is to go get qualified. I don't want to belittle that or make it sound easy; obviously it's a lot of studying and hard work, but I have good advice for you there and could elaborate on that. For the other group, the problem is not that they haven't learned enough; it's confidence, and that's a serious issue you need to work on, because the process of learning is about feedback and getting rejected. At some point you've got to go out there and get something from the world to point you in the right direction. You need practice, and you're always going to be learning, so find a place that's suited for where you're at but can also help you grow in the direction you want to move. So it comes down to that big question: is it confidence, or is it true? The true part's easy: just learn more. The confidence thing, I'm not necessarily the best person to help you with, except to say: identify it if it's an issue, and look into it.

Harpreet Sahota: [00:28:46] I liked that a lot.

Harpreet Sahota: [00:28:47] So, on the flip side, what is your advice for data scientists who feel like they've learned enough and don't need to learn anything else to be successful?

Kyle Polich: [00:28:55] Well, that's sort of an alarming perspective, I guess, because I find it to be just very willfully untrue. If someone thinks that, maybe they're the smartest person around, but I have found that the pace the world moves at requires constant learning. Nothing about what I do on a day-to-day basis was around when I was in college.

Kyle Polich: [00:29:17] You know, the tools I was trained on still exist, but... I don't want to say there isn't C++ anymore. There surely is, but it's used in embedded systems and for optimizing very specific things. Are you going to write some CUDA code? Most of us are going to work at much higher levels than that. I work on libraries that didn't exist five years ago, and in five years I'm probably going to be working on libraries that don't exist right now. So, yeah, the notion that you wouldn't be a lifelong learner is a little crazy to me.

Harpreet Sahota: [00:29:45] That's the one thing I truly love about the field of data science and being a data scientist: you've signed yourself up to be a student forever. I absolutely love that, and it seems like you share the same sentiment, so that's very reassuring. If you're out there trying to break into data science, just know that you've signed up for a field where you're continually going to have to learn. It doesn't really matter if you feel like you haven't learned enough now, because you're always going to feel like you haven't learned enough. And that's the beautiful part about being in this field.

Harpreet Sahota: [00:30:23] So we've talked a bit about, you know, technical skills.

Harpreet Sahota: [00:30:27] I think that's what a lot of up-and-coming data scientists tend to focus on primarily, and rightfully so. It's such a key component of the work that we do. But what would you say are some soft skills that candidates are missing that are really going to separate them from their competition?

Kyle Polich: [00:30:43] Well, I'm going to speak to people who are really looking for their first job.

Kyle Polich: [00:30:47] Because to me, there's a huge distinction between a first job and a second job. It's night and day. After you've had some job, you understand so much more about the industry and how to approach things. That group already has exponentially more wisdom than those just starting out.

Kyle Polich: [00:31:03] So I think the most good can be done there. Let me talk to that group and say, first and foremost, it's about getting out there. No one is going to come find you and say, oh, we've been looking for exactly you, and the search over the whole world has brought us to you, entry-level person. You need to get your foot in the door, and there are a hundred other feet trying to do the same thing. So you will probably need to carve a unique path for yourself in some way. Yes, identify all the companies you want to work for and go apply. But the odds that you're going to go to some website, upload your resume, and get a job are astronomically small. The purpose of the resume, the first step, is to get you onto a phone call. Your resume gets you through to the phone call; it doesn't get you the job. Also, some ingenuity can get you that phone call. But that's the first barrier to get past. Once you've got the phone call, everything from there is about how you get to the next step. So there are also some lessons from sales that I didn't get until I was in industry.

Kyle Polich: [00:32:03] So I'll give you a quick lesson on what's called the funnel model. Visitors come in at the top of the funnel to your website. If you've got a thousand visitors, you need to have a good landing page and a good pitch so that 10 percent of the people will click to the next page, 10 percent of those will buy the product, and 10 percent of those will come back and buy more later. Then you want to turn that 10 percent into 20 percent, and so on and so forth. So learn your numbers and learn about those processes. Going through the interview process, you'll get further and further with different companies, so figure out how you get in and how you move those conversations forward.
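To make the funnel arithmetic concrete, here is a minimal Python sketch. The function name `funnel` and the choice to round each stage down to whole people are illustrative assumptions, not anything from the conversation; the 1,000 visitors and the 10 percent rates are the figures Kyle uses, and the second call shows his "turn 10 percent into 20 percent" step at the top of the funnel.

```python
def funnel(visitors, rates_pct):
    """Return how many people remain after each stage of a conversion funnel.

    visitors:  people entering at the top of the funnel.
    rates_pct: whole-number conversion percentages, one per stage.
    Counts are rounded down to whole people at every stage (an assumption).
    """
    counts = [visitors]
    for pct in rates_pct:
        counts.append(counts[-1] * pct // 100)
    return counts

# 1,000 visitors; 10% click through, 10% of those buy, 10% of buyers return.
baseline = funnel(1000, [10, 10, 10])

# Improving only the first stage from 10% to 20% doubles everything below it.
improved = funnel(1000, [20, 10, 10])

print(baseline)
print(improved)
```

Because every stage multiplies the one above it, improving an early stage compounds through the rest of the funnel, which is the point of "learning your numbers" at each step of an interview process.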

Kyle Polich: [00:32:34] It starts with your resume, but it's about how you sell yourself uniquely. The biggest mistake I see is in people in that age group, and by age I don't mean biological age; I mean career age. If you are zero to one years old in career age, the most common problem I see in a lot of applicants is when they say, oh, I'm here to learn anything, I took a thousand random courses, I'm a puppy looking up, eager to work, put me to work. Sometimes that's helpful, and sometimes big consulting companies will hire a big stable of graduates like that and figure out later what to do with them. But mostly what people need is not someone who says, oh, I'll do whatever you want. It's for you to come to the table and say: I'm really into this, I've been pursuing it in these ways, I've distinguished myself from my peers by getting really good at it in this regard, and you need me for those reasons. So whether that's microservice architecture design, machine learning design, coding, analysis, SQL skills, whatever it is, pick your one thing, run with it, and present yourself with that skill, best foot forward. Find someone who needs that skill. And as you're getting rejected, learn from it. Why do people say no? A lot of the answers are going to be that you're not experienced enough, which is fine, because you're not. But that's just a roadmap for what you need to do next.

Harpreet Sahota: [00:33:56] I like that approach. It seems like, pretty much, you're saying to specialize in one thing, gain experience in it, make that one thing your, quote unquote, superpower, for lack of a better word, and capitalize on that.

Kyle Polich: [00:34:13] So if nothing else, it shows me you can do it, right?

Harpreet Sahota: [00:34:17] Yeah. So I'm curious about your view on certificates versus projects. I know a lot of people out there are just chasing certificates, doing them one after the other. To me, it's almost a passive way of learning: just sitting in front of a computer, watching some guy teach, and getting your certificate. What's your view on certificates versus self-directed learning and self-directed projects?

Kyle Polich: [00:34:47] Well, there's learning and there's the resume. As far as learning goes, you need to do whatever works for you. I can tell you what works for me and suits my personality, but I don't know what works for you personally. So if it's a certificate, if you need to be part of a group with an eight-week deadline and all that, and you need that pressure, that social something or whatever, absolutely, go do that.

Kyle Polich: [00:35:08] I personally am a little bit better with self-directed, self-guided learning, but I also take on too many projects at once and I finish things very slowly. Among my strengths and weaknesses, I consider that a little bit more of a weakness than a strength. But I'm okay with it, and I'm coming to terms with that. You've got to figure out what you want to balance. What are your goals? What are you trying to achieve? Pick the thing that helps you learn the most. As far as certificates on a resume and all that: people who got a certificate themselves generally care a lot more that other people get it, I guess to prove that they could go through it, too. As an employer, me personally, I don't look too hard at that kind of stuff, because I've found that the bar to get them has gotten lower and lower. I want to be convinced some other way that you can do the skill, not because you have some cert. But I know there are a lot of employers where that's a prerequisite, or a large company that gets hundreds of applicants, can't handle them all, and has to have simple filters, like: do you have an AWS cert?

Kyle Polich: [00:36:15] And if you want to get past that filter, you've got to have that cert. 

Harpreet Sahota: [00:36:18] For that group of people who are going down the projects and self-directed learning route, do you have any tips on how they can come up with an idea for a project? I get a lot of questions from mentees, and they always want to know which dataset they should use for a project, or which algorithm they should use for their project. Do you think that's necessarily the right approach when it comes to building a project?

Kyle Polich: [00:36:44] Well, I think it's maybe a misguided way to ask the question.

Kyle Polich: [00:36:50] For me, what I would ask of a mentee is: what are you trying to achieve? I don't know if there's a consensus among them, or if you already know what your mentees are trying to achieve. But are you looking to develop a skill, to get a job, to make a social impact? That's really what's going to direct the advice I would give you. Let's take social impact. You want to help your city, so you get some dataset off the city portal, and you're passionate about equestrian rights, or pedestrians, or homelessness, or whatever it is.

Kyle Polich: [00:37:24] Don't worry about the algorithm or whatever, because the hardest part is going to be turning the data into something actionable.

Kyle Polich: [00:37:31] There's no fancy technique that's going to matter, unless you just want to prove that you can use a technique. If you're really after something impactful, it's got to be something very simple and very close to the raw data, and it will probably require close collaboration with the people who produce that data, the city or whoever. If, on the other hand, you're like, oh, I'm trying to get into grad school, I want to put up a good GitHub profile, then yes, it would be more about demonstrating the techniques. Or if you're trying to get a job, it's about looking knowledgeable and presentable from the outside. So figure out what you're trying to achieve, and I think that will guide most of the next steps.

Harpreet Sahota: [00:38:13] Yeah, I like that advice a lot. I tend to tell my mentees to just start with a question that's inherently interesting to you, because first of all, that's going to keep you motivated throughout the process. And if the goal is to land a job, then really think about what industry you're trying to be a part of. Then read a bunch of case studies about how data science and machine learning are being applied in that particular industry, so that you can build some mental models for yourself, develop a bit of vocabulary for that industry, and work from there.

Kyle Polich: [00:38:51] And whatever the scope of your project is, cut it in half. Think small. That's the only way to get stuff done.

Harpreet Sahota: [00:38:57] I feel like some people come up with these really huge projects and, you know, you've got to give them big ups for the good thought and wanting to put in the effort. But that can really kind of leave you feeling stuck if you don't make progress quickly on it.

Kyle Polich: [00:39:10] Yeah, it's hard to move a mountain.

Harpreet Sahota: [00:39:13] So you had advice on soft skills for that first group who are just breaking into the field. You also mentioned people who are maybe one to two years into their career and have a little more experience. For that group, people one to two years into their career or perhaps in their second job, do you have any tips if they find themselves in a room full of executives and need to communicate their ideas?

Kyle Polich: [00:39:40] Yeah, I guess in that scenario, first of all, you should understand the dynamics of the room and what you're trying to accomplish.

Kyle Polich: [00:39:47] I have personally worked mostly in small and medium enterprises, with a little bit of big business. But there is a world in which you walk into an executive boardroom and there are a lot of suits and ties, even a military-style hierarchy, where you should be careful about how much you speak and that kind of stuff. I've never been in that world, but I know it exists. If that's you, ignore my advice. Or maybe I've read the room wrong, but I've always felt I was in rooms where it was very flat and very cool to chime in and say whatever.

Kyle Polich: [00:40:26] You've got to read the room and make sure you're letting the leader of the meeting lead it. But I've never felt like, hey, my comments would be out of place here, so I would encourage people to speak up. But also be careful: know what the purpose of the meeting is, and don't be a distraction. If you've got to communicate your ideas, your information, your results, or whatever, you have to know your audience. Who's going to be in there? Ideally, think about how you could be two steps ahead of them. I think everybody's seen the movie Groundhog Day, where Bill Murray keeps repeating the same day. If you could do that, you'd eventually be perfect at this meeting. Unfortunately, most of us don't have Bill Murray's power. So can you, in your mind, simulate it? Who is this person? What do I know about them? I've been in meetings with the CFO a bunch of times, and he's a stickler for details, so maybe I should show up without any errors whatsoever in my data. Or the marketing lead is always concerned about Facebook data, so make sure you merge that in. Come prepared to answer the questions you think you're going to get. Those people are trying to do their job, which may not be the same as your job. If you can guess what they need to know and what decisions they're trying to make, and you can help inform those decisions, then you're ready to go.

Harpreet Sahota: [00:41:43] What's up, artists? Be sure to join the free, open Mastermind Slack community by going to bitly.com/artistsofdatascience. It's a great environment for us to talk all things data science, to learn together, and to grow together. I'll also keep you updated on the open biweekly office hours I'll be hosting for our community. Check out the show on Instagram at @TheArtistsOfDataScience, and follow us on Twitter at @ArtistsOfData. Looking forward to seeing you all there.

Harpreet Sahota: [00:42:14] Awesome, man. Thanks for that great advice.

Harpreet Sahota: [00:42:16] That's good advice in general: know your audience, no matter where you are. And I think you might be aging us a little bit with that Groundhog Day reference. I for one definitely remember that movie. I'm a huge Bill Murray fan.

Kyle Polich: [00:42:31] That should be assigned reading, just culturally. It's just a good movie.

Harpreet Sahota: [00:42:34] Anything with Bill Murray. I wish he was my best friend. Have you seen A Very Murray Christmas on Netflix, by any chance?

Kyle Polich: [00:42:46] Is this the documentary about all the people who've had strange encounters with him?

Harpreet Sahota: [00:42:50] No, this is actually Bill Murray's Christmas special. It's just an hour long, and it's a Christmas musical special. Right around Christmas time, I had that thing on non-stop for a month straight. It is the most amazing Christmas special.

Kyle Polich: [00:43:03] I don't know it. I'll have to check it out.

Harpreet Sahota: [00:43:06] It's pretty awesome. So while we're on the topic, I was wondering what advice or insight you could share with data scientists breaking into the field. They come across these job postings, and some of them seemingly want the abilities of an entire team wrapped up into one person, so they end up feeling dejected or discouraged from applying.

Harpreet Sahota: [00:43:29] I was wondering if you might be able to share some tips or some insight into that for them.

Kyle Polich: [00:43:34] Sure. So I'll spin this one on its head a little bit. I know you feel how you feel; I've been in a similar position earlier in my career, and it's easier said than done. But you should turn that feeling of discouragement or dejection into one of gratitude, because that company just gave you the list of checkboxes you need to check. OK, it has 10 things and you only know two of them. Well, you love this field, right? Go learn the other eight. Now, I know it's not as simple as that, and that might be a mountain to climb.

Kyle Polich: [00:44:04] Also realize, though, that there's very little science to how a lot of those job descriptions are put together. In fact, having now been under the hood, worked with recruiters, built teams, and so on, I know what a comedy of errors that kind of stuff tends to be.

Kyle Polich: [00:44:22] What usually happens is a company will have either an internal or external recruiter who does all the work. They're not an engineer. They're probably just going to find some random job description and copy and paste it; they don't know any better. You end up with this weird Frankenstein monster of a job description that may or may not loosely correlate with what the company needs. So just apply. What's the worst that's going to happen? You get rejected. No one cares, and no one remembers your resume. The worst that happens is you get into the pile and no one sees it. So just apply. Now, you should also examine that feeling of, well, I'm super unqualified for this. Maybe it's because it's that kind of job description, like I described, and you'll be fine. Let the company make that decision. But also get some external advice. If you're a person with two years of experience, CTO is probably not the next logical step for you. Maybe, if you're an all-star, but more likely senior developer or senior data scientist is where you want to go next. So just make sure you're looking in the right place. But if you are, that job description is not a barrier. It's a list of nice-to-haves, and you can apply and they'll make a decision.

Harpreet Sahota: [00:45:36] That's really good advice. I mean, if you didn't apply, they weren't going to call you back anyway, right?

Harpreet Sahota: [00:45:42] So, like, there is no...

Kyle Polich: [00:45:44] It would be weird if they did.

Harpreet Sahota: [00:45:45] Yeah, right. But I like that idea of using the job description as a way to guide your studies. I think a good thing anybody breaking into the field could do is take an inventory of five or six job postings that really resonate with you, print them out on actual paper, and highlight the commonalities between them. All of a sudden, you've created a self-directed learning syllabus of skills that are going to get you into the job you want. So I definitely like

Kyle Polich: [00:46:20] Great idea,

Harpreet Sahota: [00:46:21] that approach, as well.

Harpreet Sahota: [00:46:22] So last question here before we jump into the lightning round. What's the one thing you want people to learn from your story?

Kyle Polich: [00:46:29] Well, I don't know that there's anything unique to my personal story. Maybe my story is the story of Data Skeptic in some way. And the story there is really that, I guess, the most useful, most powerful, most effective tools

Kyle Polich: [00:46:43] are all data driven. I want to be a source for that, and the story of Data Skeptic is sharing the things I've learned with people and helping everyone understand that it's not always easy to manage data, store it, analyze it, and leverage it. But it's well worth it, because the tools and methodologies you can learn are pretty much the most effective way to build things, to learn things, and to optimize processes.

Kyle Polich: [00:47:09] So I guess the story of Data Skeptic would be about being skeptical of data and learning the techniques to do so. And maybe that's what I hope people learn from it.

Harpreet Sahota: [00:47:19] Let's go ahead and jump into the lightning round. What's the number one book, fiction, nonfiction, or both, that you'd recommend our audience read, and what was your most impactful takeaway from it?

Kyle Polich: [00:47:28] Well, I'm really more partial to nonfiction these days. And in the spirit of some of our questions and who I presume the audience might be, I'm going to go with a book called OpenIntro Statistics. You can get it as a free PDF or as a ludicrously cheap real book; it's like 10 bucks on Amazon. I don't know if this is a myth or not, but somebody at a conference once told me that they keep the printed version at cost, which means they have to adjust it as the price of paper fluctuates. So it's always like $10.47 or some weird number. In any event, OpenIntro, because it's a great, easy-to-follow,

Kyle Polich: [00:48:06] to-the-point way to get on board with stats if you're not already beyond that. There are a million different directions you could go; I guess closest to data science and the ML side would be The Elements of Statistical Learning. So those are my two book recommendations.

Harpreet Sahota: [00:48:21] Awesome. So what are you reading nowadays? Is there like a book that you're currently in the middle of?

Kyle Polich: [00:48:26] Well, I've really been trying to get back into a book called An Introduction to Kolmogorov Complexity and Its Applications by Ming Li and Paul Vitányi.

Kyle Polich: [00:48:36] It's an area of theory of computation that I've always been passionate about.

Kyle Polich: [00:48:40] And I've used this COVID experience to focus on a product we're developing, and then spend the rest of my time learning some quantum computing and a little bit more about Kolmogorov complexity.

Harpreet Sahota: [00:48:52] So what would you say is your favorite subtopic within data science?

Kyle Polich: [00:48:57] Well, at the moment, I guess I should say A.I.

Kyle Polich: [00:48:59] But I'm going to go with probabilistic data structures. They've been so interesting and so useful to me in a particular application we're developing that they've got me very excited about doing more with them.

Harpreet Sahota: [00:49:10] If we could somehow get a magical telephone that allowed you to contact 18-year-old Kyle, what would you tell him?

Kyle Polich: [00:49:17] Buy Google stock.

Kyle Polich: [00:49:19] And don't talk to girls named Suzy. Honestly, I don't know. Yeah, I feel like every good and bad choice I made brought me exactly where I am. And it'd be a sort of existential suicide to say anything else. I also don't know that I could get through to myself at that age. So I guess just keep on keeping on.

Harpreet Sahota: [00:49:40] So what would you say is the best advice you've ever received?

Kyle Polich: [00:49:43] Oh, I think that's... I'm not really a fortune cookie guy, but I've got a good one: work smarter, not harder.

Harpreet Sahota: [00:49:49] That's good, man. Timeless, timeless advice. What does working smarter, not harder, mean to you? How does that play itself out in your day-to-day?

Kyle Polich: [00:49:58] Well, I'll tell you, the times when I worked the hardest for other people, when I wasn't running my own company, places where I was doing 70- or 80-hour weeks because we were under big deadlines or big pressures or whatever, those were the times when, looking back, the company was in a disaster state, and there was no reason for anybody to be doing things like that. We should have taken a better assessment and found a way to be smarter about what we were doing. So if you find yourself spinning your wheels or doing things that are just insurmountable, unless that insurmountable thing is the one thing that you alone in all the world are good at, so people have to come to you to do it, find something that's easier. Usually that means ingenuity through intelligence. Find a smarter way to do your process: automate the hard parts, or, I guess, automate the easy parts to figure out what the core of the problem is. But yeah, solve it not by lifting more, but with smarter techniques, better algorithms, and that kind of stuff.

Harpreet Sahota: [00:51:02] What motivates you?

Kyle Polich: [00:51:04] I guess it's really a burning desire to understand the mechanism of everything I encounter.

Harpreet Sahota: [00:51:10] Very profound. I like it a lot. I like that. So what's the song that you currently have on repeat?

Kyle Polich: [00:51:18] Oh, it can only be a song? I was going to go with the album, man. I have been heads down for the last month coding on a project and listening to Save Us All by Be Like Max. If I've got to pick one track, I guess it's Home Away From Home, because I've probably played that... I don't know why I'm not the king on Spotify yet. I've really run that album front to back many times in the last couple of weeks.

Harpreet Sahota: [00:51:41] Have you ever gotten that message on the Spotify app, when you open it up, that says you're in the top one percent of fans for a particular artist? Has that happened to you with Be Like Max yet?

Kyle Polich: [00:51:52] No, that's why I said I was surprised. With a couple of similar bands, since I'm kind of big into ska, I'm in the one percent on Big D and the Kids Table, and I think on Skankin' Pickle, and I got that right away. I don't know why I'm not there on a couple more bands.

Kyle Polich: [00:52:04] But that's good, then, right?

Harpreet Sahota: [00:52:07] Yeah, I like ska too. I'm originally from California, born and raised there, so...

Kyle Polich: [00:52:15] Nice.

Harpreet Sahota: [00:52:15] The Aquabats were pretty awesome. I don't know if Sublime could be considered ska, but whatever they were,

Harpreet Sahota: [00:52:25] I absolutely loved that.

Kyle Polich: [00:52:25] For sure. The Orange County scene, then. Yeah.

Harpreet Sahota: [00:52:27] Yeah, man, the OC is awesome. So hey, how can people connect with you? Where can they find you?

Kyle Polich: [00:52:34] All right. Well, in terms of one-way communication, you can catch me weekly on Data Skeptic, like you mentioned. I'm doing a livestream on May 30th for the six-year anniversary of the podcast. We're also going to be unveiling some tools: we're going to give out new data science tools in this cloud thing I was mentioning that we've been building out, so I'm excited about that. And we're going to do Q&A and stuff, so that's a place to meet. Beyond that, you can get on the Data Skeptic Slack channel; I'm always in there. Or you could try email, but you have to write really good emails, because I get too much of it and I miss a lot. It's just kyle@dataskeptic.com. Oh yeah, and then Twitter, @DataSkeptic. That's a good one, too.

Harpreet Sahota: [00:53:12] Kyle, thank you so, so much for taking time out of your schedule to chat today. I really, really appreciate it. I can't express my gratitude enough. Thank you.

Kyle Polich: [00:53:20] It's been a pleasure. Thanks for having me.