Sean Tibor: Hello and welcome to Teaching Python. This is episode 88. Are we still preparing our students to look at data science? As part of our series, we're continuing our look at new careers that are empowered and enabled by Python. My name is Sean Tibor. I'm a coder who teaches.
Kelly Schuster-Paredes: And my name is Kelly Schuster-Paredes, and I'm a teacher who codes.
Sean Tibor: And today we're joined by a very special guest. We have Michael Galarnyk joining us. Welcome, Michael.
Michael Galarnyk: Hey, thanks for having me.
Sean Tibor: Yeah, we're happy to have you. Michael currently teaches Python for Data Visualization for LinkedIn Learning, Data Analytics Using Python for UCSD Extension, Machine Learning Fundamentals for UCSD Extension, and Machine Learning with Python for Stanford Continuing Studies. So that's a pretty good course load, Michael.
Michael Galarnyk: Yeah, it's been fun. Sometimes I teach more than others, but I'm a little bit more on the high end of teaching right now.
Kelly Schuster-Paredes: And you have tons of videos out there, and I've watched a couple of them.
Michael Galarnyk: Yeah, quite a few YouTube videos. I'm more of a blogger these days. I'll get into why in a bit. But I find that blogging is very easy to do whenever you want, whereas for videos you have to be in a nice place. You have to have a quiet recording area and stuff like that. Blogging you can do on an airplane.
Kelly Schuster-Paredes: I agree. It's very hard to get a recording out, and it's much easier just to type something on a lesson plan and get it going. I remember during COVID recording in my garage because everybody else was sleeping, and I'm trying to whisper and the sound system is horrible. But I'm like, at least it's recorded and done.
Sean Tibor: Well, we'll get into our main topic to discuss all these things that you're teaching and the roles and jobs that students can take, starting now, using Python and some of these other programming and computer science skills. But before we do that, we're going to start in the same place we always do, which is with the wins of the week. And as we like to do, we're going to make Michael go first because he's our guest. So, Michael, is there a win you'd like to share from inside or outside the classroom this week?
Michael Galarnyk: Well, for this week, my win of the week is inside the classroom. One thing I've realized with having a lot of students is that you not only teach students and give your knowledge to them, but they also give knowledge to you. My students started projects this week for my machine learning course at Stanford Continuing Studies. They get to choose projects on any data that they want and in whatever direction they'd like to take. So this week I learned about image denoising using autoencoders, because one of my students is interested in cleaning up some old images for image-to-text kinds of applications. I learned a lot about autoencoders: how they work, where they can fail spectacularly, and so on and so forth. If I didn't have my students, I wouldn't have the opportunity to learn so much from them. That's my win of the week. I spent a lot of time on images.
Kelly Schuster-Paredes: I love that. And your win is like my win this week. I just love that ability to learn from students. No offense to Sean, my mentor, but I learn so much more from the students and from helping them sift through all the stuff that they're finding. And it's just a great win when you have that.
Sean Tibor: There's no offense taken, Kelly, because I've learned so much from our students also. It's amazing what they bring to you. And the teacher benefits as much as the learner, 100%.
Kelly Schuster-Paredes: I'll follow up with my win, since mine was totally the same thing-ish, and it has to do with the whole idea. Michael, we provide a lot of agency, especially with our 8th graders. We build an agency of learning in 6th and 7th, but by the time they get into 8th grade, the projects are so significantly different. I have only 20-ish kids in each class, but it's full-on with these 13- and 14-year-olds when they have problems. Right now they're investigating libraries that they have chosen. Some of them scikit-learn, some TensorFlow. And these are kids that have basic concepts of Python. The idea is they're not coding at all. It's a demonstration of learning. It's their ability to figure out what they're reading: the documentation, the big words, the stuff that they have no clue about. And I had a student who was working through the googletrans library. She really wanted to use that and play around with it because it was very interesting how Translate works. And she kept reading about it. She's like, it's supposed to say, this is Spanish and it means hello, and this is from Arabic and it means whatever. And it wasn't working. And everyone's asking me a question, and somebody else was having some Keras problem, which I had no clue about, and another person had something else going on. And I was just like, okay, I don't have enough time for this. Mr. Tibor's not in the classroom with me. I'm going to my other best resource. So I just got onto Twitter. I took a screenshot, and I was like, I need help. And I had Vasco Mora from Lisbon and Danger Mouse (I don't know where he is, but he's a data nerd and amateur sports prognosticator). And anyways, they both say to me, well, if you scroll down on this GitHub, there was a pull request. And I go to the GitHub link and it's like 40 pages. I'm like, I don't have time to freaking scroll down 40 pages in this classroom. Scroll down.
And it was simply a version that needed to be imported versus the newest version. So we did a double equal sign and the version, whatever. And then she's like, wait, it's only translating it as Spanish. I'm like, I don't know. That's a different problem. Go away. But it was such a cool thing. We had that power of the Python community, and it was such a win, showing her and the rest of the students that in 20 minutes, somebody replied to us and solved our problem. It was just huge. And it was like, yay.
Sean Tibor: I'm seeing a recurring trend. I think this was your win last week too, with the ability to go to the Twitterverse and ask for help and get really good quality feedback from the community. It's awesome.
Kelly Schuster-Paredes: These guys, these girls, these people that are out there just helping me, because we just don't have the time to answer everybody's problems. It takes time to dig into these error messages.
Michael Galarnyk: You're not an expert in everything. That's the thing that you realize: you don't know everything and you'll never know everything.
Kelly Schuster-Paredes: That was the funny one. When Reuven Lerner wrote back, he was like, I don't know, I've never heard of it. And they go, what? I thought you said he was an expert. And I'm like, oh, my God, he is. And she's like, well, he doesn't know it. I'm like, nobody knows everything in Python. So it was great. It was so good. Still makes me smile.
Sean Tibor: Well, for me, this week the win was working with my student mentee that I've been tutoring, helping her with her website project. It was another one of those great sessions. She's going to set up her website, and we wanted to talk about how files move around and what an actual web server is. Like, why do you need a web server when you already have a computer?
And so we were talking about all these different elements of what goes into making the Internet work, and why they call it the World Wide Web, and all sorts of different topics that gave her the foundational knowledge to go learn and do the things that she needed to do. It was just really great because we were talking about everything from how DNS works to how traffic routes across the Internet and hops from point to point over switches on the network. And we got into this really fascinating discussion about cybersecurity and treating the Internet as an untrusted network, where as soon as your data leaves your home and your local network, you should assume that anybody can be trying to look at it. That gave us the ability to talk about encryption and making sure that the data you're sending from your computer gets to the other side in an encrypted way that other people can't look at. And what I thought was the real win was just how engaged and interested the student was in the whole concept, the whole topic. That natural curiosity, and the questions that she was asking: "But where would they actually look at that data? Isn't it just going straight to the other computer?" And we got to talk about things like the myth (I don't know if it's actually true or not) that's always gone around about AT&T data centers and switching centers, where there was the locked room that only some guys in dark suits had access to back in the 70s and 80s. And all of these things about how you shouldn't assume that everyone out there is a bad person, but there are definitely people who are going to be trying to look at your data, and you need to assume that that's happening all the time. So it got us into this whole conversation about, when you put that server out there, how do you make sure that it's well protected?
How do you make sure that it is able to send and receive the data securely? And how do you get your files to it in a way that lets you change them but nobody else? It just turned into this really big conversation about how the Internet really works. And I was just really excited to see how engaged she was in that conversation and how fascinating she found it, because it's something that I find fascinating too.
Kelly Schuster-Paredes: Your energy pawned off on her, I'm sure.
Sean Tibor: I mean, I hope so. But I was also the guy who was in my networks class in college going, tell me more about the first time you installed thick Ethernet cable on campus. That sounds amazing.
Kelly Schuster-Paredes: Not me. Just tell me what color to put on my background.
Sean Tibor: I mean, different things for different people. My thing is networks and the Internet.
Kelly Schuster-Paredes: Cool. Do you have any fails you want to share, Michael?
Sean Tibor: Oh, man, there's lots. We weren't going to make him go first.
Kelly Schuster-Paredes: I did it. I did it. He doesn't have to. He can pass. We're very much about social-emotional learning and understanding here. You can say pass.
Michael Galarnyk: I think for me, I started dancing again. Before the pandemic, I used to do a lot of Lindy Hop, a lot of swing dancing, and I was pretty good. But then I realized that over two years have passed without doing anything. I realized that I'm not able to bend my knees the same way anymore. I'm not able to do a lot of the same things. And so I embarrassed myself because I can't bend the same way I could, because there are jumps, there's all sorts of stuff, and it did not work out the way I planned. I know my limitations now. I'm going to slowly get back into things and be patient. I'm not going to go into details of what happened.
Kelly Schuster-Paredes: I used to do a lot of swing dancing, and I can only imagine.
I did West Coast and East Coast swing. With West Coast, it's more of a line, and East Coast is kind of like Lindy. And one time (I blame it on my partner) he let go during a fast spin, and because he'd had my hands, I face-planted onto the floor. So I can only imagine what happens when you don't dance for two years. I stopped dancing. I'm never dancing again after that face-plant. So I totally get it.
Sean Tibor: This is like when I talk about the cloud, right? About things that I know a lot about, and you're like, "Oh."
Kelly Schuster-Paredes: You never knew I danced, did you?
Sean Tibor: No. And I still have no idea what you're talking about.
Kelly Schuster-Paredes: That's okay. Michael knows.
Michael Galarnyk: Yeah. I think patience is probably a good lesson for a lot of things that you're starting again.
Sean Tibor: And I think that's a good point, too. We're all starting back on a lot of things these days. In the US, a lot of mandates are starting to get lifted, and I think we should be patient with ourselves for a while here as we start to go back to whatever the new normal is.
Kelly Schuster-Paredes: I still remember Richard Culatta (I think it was Richard) telling about the first time he traveled on a plane again. He couldn't remember the whole routine. And I keep going, oh, my God, I'll be traveling on a plane soon. What do I have to do? TSA. You start thinking of these things that you haven't done in so long. Anyway, I digress. Go. Any fail for you yet?
Sean Tibor: Just the usual tech fails. I'm working on a project right now that seemed pretty straightforward at the beginning, and it's just a series of not working, not working, not working. But again, I'm feeling that forward progress: at least there are new things that are not working and not the same stuff. So I know that I will get to the end of it.
But it's definitely making me think about the difference between being a novice and being an expert. As an expert, I know that I'm going to get to the end of this path and it's going to work. It's just going to take time to get there, and there are going to be a lot of stumbling blocks along the way. A novice doesn't know that there's an end to it and always feels like nothing they do works. I know that it's going to work eventually.
Kelly Schuster-Paredes: It's very true. Well, my fail is very simple. I put some code up and forgot that I had intentionally written that code as a teaching lesson, to tell them to move around some lines and get rid of some repetition. And I was like, here's the code. Copy it. And they're like, it's not working. I'm like, did you copy it right? It worked for me. And of course, I didn't try it. Then I look at it and I'm like, oh, well, that code is not even in a while loop. And I was like, oh, wait a minute. Why is that running twice? Oh, yeah, that was a different lesson. Whatever. It's one of those times. I'm getting comfortable with the Circuit Playground now. So, Michael, I hated the Circuit Playground, but I've been using it so much that it's actually becoming something that I'm getting a little bit better at. So I was able to solve it, get the lights working, get the accelerometer working. And the kids are like, cool. So that was my fail. But anyways, it was a short one. So let's get started.
Sean Tibor: Yeah, let's do it. I'm excited to talk about this because if there's anything that I'm nearly as excited about as networks and LAN cables and thick Ethernet and all that stuff, it's data science and data visualization. And I know Kelly is right there with me. She and I have both geeked out over line charts and scatter plots and looking at best fits and everything.
We've definitely shared that enthusiasm in the past. We're happy to have you here, Michael, to geek out a little bit about this whole area and talk about what the world looks like now for data science, analytics, and visualization, and how we can best prepare students to start using those skills and knowledge to empower themselves. So maybe we can start with a little bit about how you got into this area. What's your background, and what are you doing today in your teaching?
Michael Galarnyk: Well, I have a bit of a divergent background, a very nonlinear sort of career path. And this is probably a good lesson for students: you don't have to start out in the career that you eventually end up in. My undergraduate was in nanoengineering. If you're wondering what that is, that's probably a good question, but it's engineering things at a nanoscale or microscale. And the career possibilities for my undergrad were not very good in terms of prospects, money, and those sorts of things. Think of nanoengineering in some cases as using chemistry and biology to create small-scale structures. Think of it as fabricating computer chips in some cases, and things like that. Then I realized I don't really like working in a lab, necessarily. I don't necessarily like working in a clean room. During my undergrad, I was fortunate to work in a lab. And one thing I realized then, and realize these days, is that data is everywhere. I was working on a project about tracking particles underneath the microscope, and there were hundreds of thousands of particles. In order to do it efficiently, I had to learn some data skills. I had to become a little bit more data literate. So I started using MATLAB and Python to track particles and clean my data, organize my data, and be able to produce conclusions. Then I started getting into making content because I was sharing what I was learning. I was writing research papers.
If you look up my name online, you'll see a lot of research papers about microfluidics, tracking particles, Fitbit studies, those sorts of things. So I started realizing I really like making content and talking about data science and machine learning. And I like sharing with people. One way you do that is through visualizations. And also, just for teaching basic Python stuff, it really helps to have visuals of what's going on underneath the hood with Python and machine learning and stuff like that. So I write blogs. I started getting asked to write courses, so I ended up with UCSD Extension. I started creating LinkedIn Learning classes because people liked the content from my blog, and it's kind of escalated from there. What I do these days is work for Intel and cnvrg.io, making content about machine learning, distributed computing in some ways, and just how hardware interacts with software for data science. So I do a lot of different things, and I can kind of break that down into smaller chunks if you like.
Kelly Schuster-Paredes: I'm going to divert you, because we definitely need to break it down a little bit. I did read some of the previous research papers about the little nano, I don't know, nanobots or something. And I was like, oh, that sounds cool. I have no idea what he's talking about, but that sounds cool. Cleaning up things at the molecular level. And being an ex-biology major, I was like, oh, I wish I had known that stuff, or Python, back then. I might have stayed in biology, but I "quickly" got out of biology after 15, 20 years. So I like the concept of getting the data and manipulating it. And I think that's what Sean was saying. For me (I shouldn't say nothing did) Python became easier when I met the Matplotlib library. I was just like, I get it. Talk more about how you use it, and how you use it with the data to clean it up.
Michael Galarnyk: To clean up the data.
There are a lot of different libraries in Python to clean up data, and a lot of different libraries you can use to visualize it. First of all, the biggest thing with data is getting it into a format that you can actually visualize. There are a lot of reasons people use pandas, but a big one is that pandas makes it easy to clean your data, manipulate your data, and put it into an organized tabular format. That tabular format can then be used to visualize various columns of your data and to look at statistics of your columns. Oftentimes you want to look at relationships between multiple columns, and that's what pandas lets you do once you have your data in a tabular format. So that's typically how things start. And I should say that Matplotlib is a great library for data visualization in Python. It's a bit older of a library. You can do whatever you want visually in Matplotlib, except for interactive animations for the most part. You can specify whatever colors you want. You can specify how you want to graph things and relationships. You can automate a lot of your visuals, because oftentimes people have data coming in and they want to create snapshots of their data, or they want to look at a machine learning model as it's training. So there are a lot of different use cases for why you'd want to visualize something specifically in Python versus using a BI tool like Tableau or Power BI, or a different library, or even just visualizing something in Excel. The reason you do something in Python is that oftentimes you have a lot of data and you want to be able to crunch it and visualize it while you're still working in Python. That's Matplotlib. A separate subject is that there are a lot of wrappers around Matplotlib that make it easier, so you don't have to write a lot of boilerplate code in Matplotlib. Matplotlib is very powerful. It allows you to specify anything you want.
But the problem with a lot of libraries is that their defaults aren't necessarily the most obvious or the easiest to work with. There's a library called Seaborn, which essentially is a wrapper around Matplotlib that allows you to write a lot less code for a lot prettier results, typically. So people often use Seaborn, and then they combine it with Matplotlib if they want to adjust some final x-tick size or some label or whatever. And people also often use the pandas wrapper around Matplotlib to create graphs directly from pandas as well. So there are a lot of different ways you can create graphs in Python, and I'm just naming a couple of them.
Sean Tibor: That is cool. I think one of the things I wanted to touch upon also is that visualization piece as a way to communicate, right? I think that's something really important that non-data scientists, or non-data communicators, I guess is another way to put it, maybe oversimplify or don't really understand: the best way that we've found so far for humans to communicate information and to understand it is visually. Especially when we're talking about large scale. And it's helpful to define what we mean by large scale when it comes to data science. We're not talking about hundreds or thousands of rows or columns of data. We're talking about millions or beyond, billions of tuples of information, potentially. It's almost impossible for any human to look at millions of rows of data and derive meaning from that, whereas with visualization techniques that you can do in Matplotlib, Seaborn, or whatever, you can communicate and convey information and meaning. Can you talk a little bit about the role that visualization plays in both the work that you do and the teaching that you do, and how to communicate that with students?
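A minimal sketch of the workflow Michael describes above: get the data into a tabular pandas format, look at column statistics and relationships, plot through the pandas wrapper, then adjust labels and tick size in Matplotlib. The data set, column names, and output filename here are invented for illustration and do not come from the episode.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical, hand-typed data: hours studied vs. quiz score for six students.
df = pd.DataFrame(
    {
        "hours_studied": [1, 2, 3, 4, 5, 6],
        "quiz_score": [52, 58, 65, 71, 80, 86],
    }
)

# Statistics for each column of the table.
summary = df.describe()

# Relationship between two columns (Pearson correlation).
correlation = df["hours_studied"].corr(df["quiz_score"])

# The pandas plotting wrapper builds a Matplotlib Axes directly from the table.
ax = df.plot.scatter(x="hours_studied", y="quiz_score")

# Drop down to Matplotlib for final adjustments such as labels and tick size.
ax.set_xlabel("Hours studied")
ax.set_ylabel("Quiz score")
ax.tick_params(axis="x", labelsize=12)
ax.figure.savefig("hours_vs_score.png")
plt.close(ax.figure)
```

Seaborn follows the same pattern: its plotting functions accept a DataFrame and return Matplotlib objects, so the same label and tick adjustments apply afterwards.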
Michael Galarnyk: So for the general work that I do, the biggest thing for visualization, particularly in a programming language, is that you're working a lot with massive data sets. Sometimes you can only sample and graph part of your rows or part of your columns and look at trends, look at outliers, look at box plots, look at violin plots, those sorts of things, to get a better understanding of your columns and your data. Because sometimes there's no way you can look at all the points on your graphs. It's just not possible. If you do, oftentimes your entire graph is filled in with just color because you have way too many points. I should also mention that part of the reason people use tools like Python, Matplotlib, and Seaborn is that a lot of times tools like Excel can't handle the volume of data they're being asked to graph or visualize, because there's just too much data. So that's another reason why people use Python and things like that. As far as the role in teaching, visuals really help people understand basic relationships that they normally wouldn't understand or see on their own. One example I commonly use is mortgages. If you look at interest and principal over time on a graph, you can very easily see the consequence of taking out a 30-year mortgage versus a 15-year mortgage in terms of how much interest you pay over the course of the loan. That's a really simple graph, but it's a very powerful thing to show people visually what they're signing up for. And for machine learning, you can often see how a model is training over time, which is extremely powerful because machine learning models take a lot of computational time. This often translates to money if you're training on the cloud. If you look at a graph that shows that your machine learning model isn't getting better after a certain point, you can stop training. And that visual really gives you insights into your model as it's going.
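The mortgage comparison Michael mentions can be sketched with the standard fixed-rate amortization formula. The loan amount and interest rate below are assumed example numbers, not figures from the episode, and using the same rate for both terms is a simplification (15-year loans are often priced lower).

```python
def monthly_payment(principal, annual_rate, years):
    """Fixed monthly payment from the standard amortization formula."""
    r = annual_rate / 12  # monthly interest rate
    n = years * 12        # number of monthly payments
    return principal * r / (1 - (1 + r) ** -n)


def cumulative_interest(principal, annual_rate, years):
    """Return a list with the total interest paid after each month."""
    r = annual_rate / 12
    payment = monthly_payment(principal, annual_rate, years)
    balance = principal
    paid_so_far = 0.0
    series = []
    for _ in range(years * 12):
        interest = balance * r         # interest charged this month
        balance -= payment - interest  # the rest of the payment reduces principal
        paid_so_far += interest
        series.append(paid_so_far)
    return series


# Assumed example: a 300,000 loan at 6% annual interest for both terms.
loan, rate = 300_000, 0.06
interest_30 = cumulative_interest(loan, rate, 30)
interest_15 = cumulative_interest(loan, rate, 15)

print(f"30-year total interest: {interest_30[-1]:,.0f}")
print(f"15-year total interest: {interest_15[-1]:,.0f}")
```

Each list has one value per month, so plotting the two lists as lines on the same axes gives the kind of graph described: the 30-year curve keeps climbing long after the 15-year curve has ended.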
Kelly Schuster-Paredes: Here's a crazy thought. I don't know if this sounds stupid, but as I'm thinking it: when you have those machine learning modules, TensorFlow and everything, and I know that libraries are built on top of each other, is it running some sort of Matplotlib or Seaborn kind of graphing when it comes out as well, or is that built in? I don't know. I was just thinking about it because I saw that when we were doing the DeepRacers with AWS. Is that part of it?
Michael Galarnyk: I don't know if this is how they're doing it beneath the hood, but for a lot of TensorFlow as well as PyTorch, people tend to use something called TensorBoard, which provides visualization and tooling for the machine learning process. You can visualize the layers of your neural networks. You can look at how your model weights change over time. For model weights, if you've ever looked at linear regression, where you have y = mx + b, you can think of your m value as your weights. You can look at how those sorts of things change over time. Neural networks are more complicated than just having a slope value for a line, of course, but those are the sorts of things that TensorBoard gives you access to see. And of course, there are a lot of other things that you can see as it goes along. But the tool for that is typically TensorBoard.
Kelly Schuster-Paredes: Got you. Sean and I have been saying since we started teaching that schools need to provide some sort of data science class, a math kind of thing where we're looking at data. Case in point: I have a student who is investigating the Seaborn library, and I had no idea what a violin plot was. And he's like, what does this show me? I was like, I don't know, I'm a biology major. We only use these things. But I did find this on LinkedIn. Thank God for LinkedIn: the Data to Viz website. I'll put it in the show notes. And it's a really tiny infographic.
When you look at it, you have to increase the view. But it tells you when you would use each plot. People who do data know these, but the rest of us don't see these kinds of plots. So why are there so many plots? Can you explain that? Why are there so many plots?
Michael Galarnyk: Well, like anything when you're starting out, a lot of data science is about practice and getting familiar with the space. It's the same thing as learning any sort of skill. You have to practice it to get an idea of which plot to use for which occasion. And for a lot of data science work, people often try a lot of different things and see what works best for their particular data set, their particular use case, and what they're trying to accomplish. Sometimes you look at a distribution and you see, oh yeah, I have a lot of outliers here. I should look further into that. Or you look at a distribution and nothing is there. Or you look at the box plot and you have no outliers. So you may want to change visualization techniques. You look at a bar graph and maybe you see nothing interesting. So you try a lot of different plots and just go from there.
Sean Tibor: Yeah. I think this gets into a bigger topic, which is data literacy and the idea of what you are trying to communicate and what you are trying to see. In addition to the science realm, where we look at measured data and measured values, I think this is particularly important when it comes to data journalism, which is a great place where this is used and misused in a variety of fascinating ways. At least what I've seen is that there are a lot of different ways that people use visualizations to convey specific meanings or to imply relationships that may or may not actually be there. Right.
And so one of the things that Kelly and I have talked about, in terms of what should be taught, is how to think critically about visualizations: to be able to analyze them, to look at them and ask, what is this really telling me? What else could this be showing? What other visualizations might be helpful here? Michael, how do you incorporate that into the teaching process, that iterative approach to creating visualizations and looking at what may be next or what may be necessary?
Michael Galarnyk: One thing I always teach students is to start out simple and then go more complex. Start with a line graph, a bar graph, a pie chart if necessary, whatever, because oftentimes the more complicated you get with the visuals, the easier it is to misinterpret them or to be unable to explain them to someone else. A visual is oftentimes a lot less valuable if you can't really explain what's going on. I think the other thing with a lot of graphs is to show bad examples. I know in teaching, oftentimes you don't want to say, hey, don't do all this stuff over here. But I think for visuals, it's important to spend part of a class session just talking about things you shouldn't do, like making your y-axis really skewed or doing a 3D pie chart, because depending on where the data is in the chart, you can make things look bigger or smaller than they actually are. With visualizations, it's really easy to lie to people. So it's important to get those practices out of the way before you make visuals, share them with other people, have them misinterpreted, and essentially lie to people.
Sean Tibor: Yeah. One thing I'll put in the show notes: there's a professor, I think he's either a professor or adjunct faculty at the University of Miami.
His name is Alberto Cairo, and he has written several books on this, specifically around data journalism, but also around the way that people use statistics, graphs, and visualizations to misdirect and misinform people. He's got a lot of really great examples of that, everything from those really skewed axes. I even saw one where they inverted the axis so it looked like things were good when they were actually bad. It was amazing to see what people have gotten away with publishing as fact.
Unknown: Right.
Sean Tibor: Like, the data doesn't lie, but often the person generating the visualization does. So I'll put a link to that in the show notes. There's a really good book about that that I can share. But I think the other thing this brings up is that we're talking about this a little bit in the context of a job that exists today. And for Kelly and I, our students are in middle school, so by the time they are entering the workforce, that could be ten years from now.
Michael Galarnyk: Right.
Sean Tibor: But what can we teach our students today about data literacy, data fluency, and data visualization that they can bring to other subjects? You mentioned doing that denoising work on images to be able to use them for optical recognition and things like that. Are there other areas where you've seen your students take what they're learning in your classroom and apply it to other parts of their academic or personal lives?
Michael Galarnyk: I think the biggest thing for students, especially if they're young, is getting them interested in a project or some sort of use case, so they actually end up learning these skills, using them, and applying them later on. Because the hardest thing in any sort of teaching, in my experience, especially with adults, is getting people interested enough to take these skills and apply them.
So what I recommend for people starting out is: find a data set, see if it's interesting, analyze it a little bit. So that means that they need to learn skills in basic Python or R or whatever. I've seen students do this with finances very commonly these days, especially with student loans, just analyzing basic data sets before they sign up for college. Look at interest, look at principal, look at how much you need to make to pay off these sorts of loans. I've seen that very commonly. I guess finance is getting really common these days among my students. I also see a lot of NLP use cases. So a lot of people I know, they're essentially looking at Facebook all day for disparaging comments, racist comments, those sorts of things. And learning basic NLP skills has been very useful for them. Kelly Schuster-Paredes: Natural language processing. Michael Galarnyk: Yeah. To see the general sentiment, sentiment analysis of some text. I've seen people do this sometimes to learn about generating stories, like writing the next Harry Potter book based on text. Passion projects. And sometimes these projects go well. Sometimes these projects don't go well, but the learning process is what makes students better at these sorts of things, so they can apply it in future jobs and future projects. Kelly Schuster-Paredes: That's so funny now that you mention all those things. Because of course, the first thing we think about, at least I think as a former science teacher, is labeling in science: graphs are used in science and graphs are used in math. But I think the beautiful thing is bringing it out to students in the areas where they especially don't think Python exists or don't think data exists.
When you said NLP, my first thought was the NLTK project from Levon's book, where they take the words out and strip down Martin Luther King's speech, "I Have a Dream," and it shrinks it down to show the sentiment. Imagine if it's graphed out with the words, even just using a word cloud. So I think now that teachers can bring that into other classes, and not just science and not just math, it would bring that data forward for the rest of the students. Sorry. I think a lot when people are talking about activities that could happen. Michael Galarnyk: Even in the case of textbooks or history. Sean Tibor: Right. Michael Galarnyk: You can look at sentiment across the history textbooks, to see if anyone's a little bit biased in how they're talking about the history. Kelly Schuster-Paredes: That would be cool. Sean Tibor: One of the other things I wanted to pull out from what you said there was a number of examples of students taking something that was personal to them. Right. It's not just an unrelated data set, but it's something where they are either in the data themselves or they're closely related to it. So it could be their Facebook posts, right, everything that's in their feed, or it could be their finances, something that has personal relevance to them. I think that's kind of an interesting point on the data literacy side of this also: when students are personally related to the data, they also see how it affects them. Right. They see that there's a relationship and it has meaning to them in that way. And I think, to your point also, it gives them that ability to see that it changes their perspective. Right. By visualizing the data, by seeing the data in a new and different way, it changes their relationship to that concept or that idea. Right.
Like my student loans, I'm going to look at them differently now because I visualized the interest rate and repayment terms on them, versus just seeing it buried in terms and conditions that are 20 pages long. Kelly Schuster-Paredes: Right. And I think that's a good point to make, but it also gives them that why. And they kind of know what they're looking for. So they have the why, they have the what, and now they only need to focus on how to do it. And that's pretty cool. Sean Tibor: And there's a little bit of that element of surprise in there, too. Right. Something where their expectations were missed, but in an interesting way. Kelly Schuster-Paredes: Right. Sean Tibor: So they might expect that because I'm paying less for a 30-year mortgage month to month, over time I'm paying less. But when they actually see the cumulative effect of that interest rate on the principal, they can see, oh, wait, I'm actually paying a lot more. And that is disrupting their expectations in an interesting way. Kelly Schuster-Paredes: Yeah, very cool. And they also know why their science teacher in middle school made them label all their axes and everything. Sean Tibor: Michael, you're getting us excited about the data. Kelly Schuster-Paredes: This is great. Michael Galarnyk: And there are other cases, too, not just financial. There's physical data. People look at their Fitbit data, their Apple Watches. They can bring in the data to see how their heart rate is doing over time, or whether they're sleeping badly. So it can also inform your health. Like, oh, I see, I got bad sleep the last couple of weeks or couple of months. Like, what am I doing differently? Am I allergic to my pillow? Am I not sleeping well when I travel? These sorts of things can have a big impact on people's lives if they look at the data and see how their exercise affects their heart rate, their activity, et cetera. Kelly Schuster-Paredes: Et cetera. How are they getting their data?
Do they find the app and export it as a CSV, or how does it come out? Michael Galarnyk: So for Apple Watch, I don't have one. I have a Fitbit. And you can use the Fitbit API to get your own data. You can look at long-term trends in your sleep, your heart rate, et cetera. And so in some people's cases, they've seen things before they actually occur. There are studies about people looking at their data, or universities looking at people's data, and saying, like, oh, people had early symptoms of a cold and they didn't know it. So there's a lot of power in having personal data and being able to even extrapolate from it, depending on how good your model is. But yeah, Apple Watches. Sean Tibor: I have to check, but I have an answer for that one. So Simon Willison, who is the creator of the Datasette project, which is a really interesting way of doing container-based data set analysis. So you can use, like, a Google container or AWS container. It's basically just a Docker container that you can put a SQLite file into and do analysis on. He came up with something called the Dogsheep project. And I know he's been working on this for a few years, but it's basically the idea of taking all of your personal data that's in various places in the cloud and on your devices, extracting that, and putting it into your own personal data lake that you can work with. And so one of the things that he was specifically looking at was the Apple Watch. And it turns out that most of the data that your Apple Watch is collecting is actually being stored as SQLite files on the watch, and then it gets backed up to your phone and your computer. So you can actually get access to all of that. And it's all in SQLite. So once you copy it, you can query it, you can visualize it. It's a really fascinating project, and I'll put the link to that in the show notes. Kelly Schuster-Paredes: How do they get to the SQLite file? Come on, explain the basics. Sean Tibor: The SQLite file is a file on the watch, in memory. Right.
And I believe what he's doing is taking the backup file that comes off the Apple Watch, opening that up, and extracting the SQLite files out of the backup. There are several steps to it. Kelly Schuster-Paredes: But it seems like it's doable. It's sort of like a request. Is that what you would do with the API of your Fitbit? Right. You go into a request, but it would be a locked-down file. You have to do credentials. Sean Tibor: I think this is all local on the file system. I think it's all just files on the computer. Kelly Schuster-Paredes: Michael, just FYI, I've only been doing Python, or using Python, for about four years. So this is all learning for me. Always. Michael Galarnyk: We are all always learning. Kelly Schuster-Paredes: Yes. Michael Galarnyk: Especially with Python. There's always a library you don't know, new updates to some library that you didn't realize happened, or some new algorithm, or something 40 pages down in a GitHub pull request. Sean Tibor: Exactly. Kelly Schuster-Paredes: Excellent. So, a real quick summary. Anything you want to tell middle school, high school, even elementary school teachers? What could we do? Push our students towards data science, or guide them gently to learning data? Michael Galarnyk: Now, I never want to tell people that are in the field how to do their job. But my only advice for a lot of teachers is to find some interesting data set, which is not necessarily always easy. Do a simple project where they either extract data, make some visuals, or look at long-term trends. It could be looking at Minecraft data, if you can find something like that. Look at something that's relevant to them that they might actually be interested in, and also take your students' input. Because oftentimes as an instructor, when I first started, I had some lesson plan I wanted to do, and I had some data set and some tasks.
But if students are more interested in Y when you suggest X, maybe look down that sort of data path and do a project they may be more interested in, if you have time. And your favorite type of plot? I love box plots, but that's probably not very interesting for a middle schooler or grade schooler. Kelly Schuster-Paredes: The violin plot looks really interesting. Here's a question that might throw you for a loop, real quick, because I don't have that much time. But this was asked of me about Seaborn. I don't know if you've played with any of the data that they give for tutorials. They have a car crash data set. If you don't know the answer to this, I'm going to wait and see if you can find this out for me. They have a car crash data set. And a kid asked me about a relationship they tell you to plot during the tutorial: here's alcohol and speed. And it's like zero five and zero eight. And he's like, what is this saying? And I'm like, I don't know. There are no labels on here. Something about alcohol, speed, and dying. That's what it means. Michael Galarnyk: Well, the biggest thing here is, are you able to find information about the data set? So as you're talking about this right now, I went on Seaborn's GitHub. There's a Seaborn data library on GitHub. And I scrolled down, and I saw this is actually a data set based on a FiveThirtyEight data set, and it's on Kaggle. So I'm looking right now to see if there's any information about this data set. And looking at the data set, a lot of it is based on National Highway Traffic Safety Administration data from 2010 to 2012. It talks about what each of the columns is. And there's also a story that this data set is based on. Kelly Schuster-Paredes: Is that on Kaggle? Michael Galarnyk: This is on Kaggle. So it's the FiveThirtyEight bad drivers data set. Sean Tibor: Cool. Michael Galarnyk: That's the one that's contained in Seaborn. Sean Tibor: One of the things that I really do love about data scientists is they're obsessive about attributing and sourcing their data. Right.
Because it can't just come from the ether. It has to be data from somewhere. So whether it takes you one step back or three steps back, you will get back to the original primary source of that information. Michael Galarnyk: Yeah. And sometimes, honestly, it's hard to find this sort of information. So a lot of data science is not just about magically, instantly having good data and just running machine learning algorithms on it. Sometimes you have to find your own data. Sometimes you have to scrape your data, legally, of course. Sometimes you have to use APIs, get Twitter data, whatever it is, or bring data in from a database. So a lot of data science work is not always the fun machine learning parts. It's also finding the data, research, et cetera, et cetera. Sean Tibor: I think someone said the work of data preparation, that is the work of data science. Michael Galarnyk: Right. Sean Tibor: Because if you don't have good data, it doesn't matter how good your visualizations or your machine learning models are, right? Michael Galarnyk: Yeah, that's an unfortunate truth. And every job has the wonderful highlights as well as the lowlights. And sometimes getting data is a lowlight, because it's not even just about finding the data. It's also getting permission to use the data. And there are a lot of different things that go into this. Sean Tibor: One of the things I was going to add to Michael's excellent advice for teachers about getting started with this and how to help students really see the power of it. The only thing I would add is something that I've seen when we're working with computational thinking in general, which is showing how you can scale up from small scale to larger scale. And I think one of the traps that teachers fall into is, like, okay, I showed them how to do it with ten data points. My work here is done, right? And so we really need to show them that leap from ten data points that prove the concept of a particular visualization or a data set or cleaning.
And take that to, now here it is for 10,000 or 10 million, for two reasons. One, it starts to get them thinking in large-scale systems of information. But secondly, it also demonstrates for them that the value of the skills that they're learning is that they can be applied in nearly the same amount of time to a small data set of ten data points as to 10 million, where they can leverage the power of the computing device to get the outcome at a variety of scales. And it's something that they can then do quickly and easily, versus something that could take them weeks or months of effort to calculate by hand for a large data set. Kelly Schuster-Paredes: Absolutely. And I'm looking at the time. We could talk forever about all things nerdy about data. That's the one thing that Sean and I, like you said, have in common, besides teaching and podcasts and everything else. But data is our fun part. But I just had a great time talking to you, Michael. Hopefully we can talk again and be a little bit more nerdy about some data. Michael Galarnyk: Well, thanks for having me. I'm available anytime you want to talk. And for anyone listening to this, feel free to reach out about your data questions and your frustrations, because data is a lot of fun, but there are good and bad parts, so feel free to reach out. Kelly Schuster-Paredes: He's quicker at researching than I am. Sean Tibor: Well, we will definitely make sure we link the various places where people can find you online and get access to the courses that you provide and get a chance to learn from you as well. Michael, I want to say thank you for joining us today. It's been an absolute pleasure to have you on the podcast with us. As Kelly said, we could talk for hours about this, so we'll just have to bring you back again and talk about other subjects in data science and data visualization. Michael Galarnyk: Thanks for having me. Sean Tibor: Thank you for joining us.
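Returning to Sean's point about scaling from ten data points to millions, here is a minimal sketch of what that leap can look like in a classroom. Everything in it is an assumption made for illustration and does not come from the episode: the `summarize` function name, the synthetic readings, and the use of one million points (rather than 10 million) so the example runs quickly on a modest computer.

```python
import random
import statistics


def summarize(values):
    """Return a few summary statistics for any non-empty list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "min": min(values),
        "max": max(values),
    }


# Hypothetical data: synthetic readings drawn from a normal distribution,
# standing in for whatever real data set a class might collect.
rng = random.Random(42)  # fixed seed so repeated runs give the same numbers
small = [rng.gauss(70, 10) for _ in range(10)]
large = [rng.gauss(70, 10) for _ in range(1_000_000)]

# The exact same function call works at both scales.
small_summary = summarize(small)
large_summary = summarize(large)

print(small_summary)
print(large_summary)
```

The part worth showing students is that `summarize` does not change at all between the two calls. Only the size of the input does, and the computer absorbs the extra work.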
So if you want to learn more about Michael's work, as I said, we'll have this in the show notes. You can always connect with us on Twitter at @teachingpython. We're also online at teachingpython.fm. That's our website. You can contact us through the website or through Twitter. Either one works for us. We want to say again, one more time, a big shout-out to our Patreon supporters, who keep the show going and moving and, through your support, give me a break from time to time on the post-production edits. We are super excited to be speaking at PyCon this year. We will be speaking on Saturday afternoon, so if you are there. Kelly Schuster-Paredes: You can see us. 3:45, I think. Yeah. Sean Tibor: 3:45 local time, so we will find out exactly when that is around the world. But our session should be streamed as well, so if you are attending remotely, you should be able to catch us live there as well. Our session topic is "Learn Python Like a Twelve-Year-Old," so if you are looking to brush up on your learning skills, let's look at how twelve-year-olds learn and see if there are some things we can learn from that. I think that's everything for now. We might be working on an education summit proposal as well this week, so more to come on that. A lot of good things in the works for the remainder of the spring and into the summer. So that's a lot to share, but I'm going to stop there and say thank you again, Michael. And for Teaching Python, this is Sean. Kelly Schuster-Paredes: And this is Kelly, signing off.