Sean Tibor: Hello, and welcome to Teaching Python. This is episode 94, Anaconda in Education. My name is Sean Tibor, I'm a coder who teaches.
Kelly Schuster-Paredes: And my name is Kelly Schuster-Paredes, and I'm a teacher who codes.
Sean Tibor: It's been a few weeks since we last recorded. It's been a pretty nice summer break, I think, for both of us, but we're jumping back in with both feet into a topic that you and I have both spoken about many times before and that is near and dear to our hearts. And we've got two amazingly qualified experts here to join us and talk about this. So I'd like to welcome Sophia Yang and Albert DeFusco from Anaconda. They're going to be talking with us today about data in education, data fluency, data literacy, data science. We're excited to have both of you here. So welcome, Sophia. Welcome, Albert. It's great to have you.
Sophia Yang: Thank you for having us.
Sean Tibor: So we're going to jump right into introductions and bios and everything here in a moment, but before we do that, let's start in the same place we always do, which is the wins of the week. So something good that's happened inside of the classroom, inside of the office, inside a Jupyter notebook, wherever you happen to be living this week. Something good that's happened that we can celebrate. And we like to make our guests go first. It puts them on the spot a little bit and makes it a little bit more fun for us as hosts.
Kelly Schuster-Paredes: Next time, I'm going to take a snapshot and tweet that right when you say it, because they're always like, "Go ahead."
Sean Tibor: Who would like to go first?
Albert Defusco: Yeah, I'll go. The win of the week, for the week ending now: all last week we were in Austin for the ten-year anniversary of Anaconda. Way too much partying and barbecue, but still plenty of important work done, and we got a chance to see everybody. It's been at least two years since we've seen a lot of people in person.
Kelly Schuster-Paredes: No wonder the airport was so busy.
Sean and I were both in Austin as well. I was checking out wedding venues, not for my wedding, and eating a lot of barbecue and Voodoo donuts.
Albert Defusco: Yeah.
Sean Tibor: And I was there in Austin with friends from college, and we also got barbecue. I think Terry Black's should be enshrined as a historic place within the city of Austin. It's pretty amazing.
Kelly Schuster-Paredes: It would have been cool to record there. What were we thinking? Next time, Albert. Next time.
Sean Tibor: How about you, Sophia? Same win or something slightly different?
Sophia Yang: Oh, yeah, I was about to say the same thing. Yeah, it was amazing. It was the ten-year anniversary for Anaconda and all of us, and I had a great chance to meet a lot of my coworkers that I had never met before. So it was a lot of fun. I built a lot of connections, and it was just really nice to see people in person.
Kelly Schuster-Paredes: That's pretty cool. Is it just everyone, or you come and talk, or what happened there for you?
Sophia Yang: Yeah, it's all company employees, with more than 200 people coming from all over the world, including Europe, India, Dubai, wherever, all over the world. And we had a two-day event. The first day was an internal, company-wide event to celebrate what happens in the company and celebrate everyone. And then we had a really nice barbecue dinner party with live music. On the second day, we invited industry leaders for panel discussions. We had four panels discussing the PyData ecosystem and Anaconda.
Kelly Schuster-Paredes: Very cool. Anyone buy any boots or cowboy hats while you were there in Austin? Albert did? No? Darn, not yet. Next time you go. I had to go lots of boot shopping with my little sister.
Sophia Yang: We both live in Austin.
Kelly Schuster-Paredes: Oh, so it's no fun for you. Me too. Not the same, then.
Sean Tibor: How about you, Kelly? Go ahead, Kelly.
Kelly Schuster-Paredes: I had a fun week.
I think my biggest win was actually, for the first time, speaking solo without you. Not that it was a win speaking without you, Sean. But it was a win. It could be. But I spoke at EuroPython virtually. I was the only one virtual on the panel. So if you can imagine ever going to a Python conference, everyone sitting on a panel, and then my head is on the big screen, and I was so paranoid about being the big head on the screen. But it was great. I spoke with a couple of people: Chris Reina from the Maker Led project; Amy Fagan, who was our panel host; she's a person that works with the Micro:bit Educational Foundation; Sarah Jane Carrie; and then another person who stepped in, and I think her name was Royce, and I'll have to get that later. But it was such a great conversation. We talked about the pitfalls of teaching Python in education. We talked about the wins. We talked about things that were successful, things that we wanted to change, and we could have talked all day long. It was so much fun. And it was just a highlight to be able to speak to all the educators from Dublin and listen to their accents and hear about all the fun that they were having. So it was huge.
Sean Tibor: That's pretty awesome. Well, my win's on a little bit smaller scale. I think I mentioned I have an intern at my job this summer who's working with me on some pretty cool projects, and she's pretty impressive. I'm really amazed at what she's able to pick up and do in such a short amount of time. But we were chatting about different working styles and productivity and everything, and she was mentioning that she is a pretty constant Vim user for her text editing, so she doesn't use VS Code or PyCharm. She is always in Vim. And I thought to myself, if she's learning all this stuff this summer, I can learn something new, too. So I started learning Vim, and I edited and saved my first file in Vim this week, and it was a big win because so much of what we do is muscle memory, right?
And Vim is a completely different set of muscles. And doing it differently and thinking about it differently took some rewiring of my brain. But it was really satisfying to do it, because it's always been something that I've bounced off of over the years. Like, oh, I accidentally opened Vim, and I don't even know how to quit out of it. And now I was able to write, edit, and save a file, so I felt pretty good about that one.
Kelly Schuster-Paredes: So what is that for, the text?
Sean Tibor: It's a text editor in the terminal, or on the command line.
Kelly Schuster-Paredes: I'm staying away from that.
Unknown: Sorry.
Kelly Schuster-Paredes: I'm still in terminal.
Sean Tibor: Well, it was nice because I did find a really good cheat sheet that has a bunch of the commands on one page, so I've got it in front of me, and I'll know that I'm doing pretty well when I can get rid of the cheat sheet and do things from memory. So far, so good.
Kelly Schuster-Paredes: Very cool. Do you have any fails this week?
Sean Tibor: Oh, too many to mention. I've been working on a really cool distributed alerting and monitoring system to monitor some of the systems we're about to launch at my company. And it's amazing. It's scratching all those really cool data itches. It's all around, like, time series data and monitoring different conditions. And now I'm getting excited about all these things that we could do in terms of applying some machine learning and data science techniques to understand what our baselines are and what our expected behavior is versus maybe early indicators of failures. But none of it was working. So that was my fail, just banging my head against the keyboard. I think late Friday afternoon, I finally got part of it working. And so today I'm optimistic that the rest of it will come along.
Kelly Schuster-Paredes: Very cool. Well, I've still been working with the NAO robot, and I have it living an autonomous life, which is very scary because it actually answers questions.
So the amount of programming that was put into this robot is incredible. Without any coding from me, prepackaged, connected to WiFi, whatever, this robot can engage in conversation. I can tell it that I like chocolate and ask it, "What do I like?" And NAO will say, "You like chocolate," and then it sings the ABC song. So I'm so excited. I'm looking forward to bringing it to lower school in autonomous mode, but I still haven't been able to use those little blocks in Choregraphe. I can get it to walk and go back, but that's about it. So there's a lot to learn. So I think the biggest failure is overestimating the amount of time that I thought I had during the summer to learn Choregraphe. But it's a slow process, though.
Sean Tibor: One step at a time.
Kelly Schuster-Paredes: That's one step at a time, literally.
Sean Tibor: And Albert, any fails? Anything that hasn't been working?
Albert Defusco: Well, yeah, just finding time to work on open source projects. And then we spent all week with our colleagues: two days of the sort of conference that Sophia talked about, and three more days of sitting down and chatting and strategizing. It's like, man, I've got too much to do.
Sean Tibor: That's a good problem to have, but it's still a problem, right?
Albert Defusco: Yeah.
Sean Tibor: Sophia, how about you? Any particular fails this week that you're working through?
Sophia Yang: Yeah, I agree with Albert. Again, time management is hard. Besides that, I think I was surprised by how many people from the company I didn't know, and how many people I still don't know. I wish I knew more people. We are a fairly small company, so I think I just really need to reach out to them a little more.
Sean Tibor: I definitely know the feeling. I joined a very large company again, and I'm very much heads down within my team, so it's been challenging to get to know people and to build real relationships with them.
And I don't mind doing that over Zoom or over Teams or anything, but you have to make the time. You have to put the conscious effort into it, and sometimes it's hard to do when you're juggling all the other things that are on your plate.
Sophia Yang: Yeah, exactly.
Sean Tibor: Well, why don't we move on from our week and go into the world ahead of us in terms of data science and data literacy? But let's start with introductions. It was really great. I was reading through each of your bios, and I was seeing all these different things that we could connect with and talk about. So I'm really excited to get to know each of you. I think what was most exciting was that a lot of the open source projects that each of you work on are tools that I've actually used. They're things that I've put into my toolbox, stuff that I can use to solve problems. So maybe we can start there, with a little bit of an introduction about who you are, what your role is at Anaconda, a little bit of your education background, and maybe some of the projects that you work on that people might recognize when they go into the Python Package Index or the Anaconda package library.
Sophia Yang: Sure. My name is Sophia. I am a data scientist at Anaconda. I use data science to facilitate decision making for various departments across the company. As for my education background, I have a Master of Science in Statistics and a PhD in Educational Psychology, both from the University of Texas at Austin. The majority of my work is not open source. I deal with our company's internal data. But in terms of open source projects, I have several Python libraries, such as condastats, which is a tool you may use if you want to get access to package download counts for Python libraries or R libraries through Anaconda. For example, if you want to know how popular Pandas is, you can use the tool to see the monthly downloads of Pandas.
Sean Tibor: I have a feeling it's very popular. My answer is it's a lot.
Right?
Sophia Yang: Right. Yeah. Pandas is one of the most popular data science libraries out there.
Unknown: Cool.
Sean Tibor: Very cool. And Albert, how about you?
Albert Defusco: Yeah, so I'm Albert. I have a PhD in theoretical chemistry, which I tend to classify as quantum mechanics written in Fortran to run on supercomputers, and I spent a lot of my time doing that. I've been with Anaconda since 2015, so it's coming up on seven years. And from about 2015 to, say, late 2019, I was a Python and data science instructor. I tell people that for that time, I basically built my own master's degree in data science, because I had to teach it to other people. And these days, I've made sort of mini, little one-line contributions to a number of projects. But my passion project is reproducibility, and the tool that I work on is called Anaconda Project. I'm kind of the maintainer of it, and I'm trying to get the concept across and evangelize it as best I can.
Sean Tibor: Very cool. And I believe you're also a Pittsburgh Oakland neighborhood resident, like I was. I went to CMU for my undergrad and graduate school, so we can probably both lament the demise of The O for five.
Albert Defusco: That's too bad. I know that was a cool place.
Sean Tibor: Yeah, that was a great neighborhood. But onward and upward, I guess. All right, so let's dive in and talk. It's great to meet both of you, and it's great to see the education passion that you've had throughout your careers already, the way that you've embraced it. One of the things that Kelly and I have spoken about many times in the past, especially with other teachers, is the role of data in education and learning. And it's not just the data about learning, the data about education. It's really about weaving data science, data fluency, and data literacy into these other subjects that we teach. And what we wanted to do was learn about some of the projects and efforts that Anaconda has underway in this space.
We want to talk about some of the resources that might be available to teachers who are looking to incorporate more data science and data literacy into their own subjects, and not specifically computer science. And then also, where do we go from here? What are the next steps for making education more data focused and more literate in that direction?
Kelly Schuster-Paredes: But before we do, we've got to explain to everybody what Anaconda is.
Sean Tibor: Yes, that would be helpful for those of you who haven't seen it before. Can we get the two-cent elevator pitch about it?
Sophia Yang: Yeah. Okay. Anaconda is the world's most popular data science platform, with over 30 million users.
Sean Tibor: Yeah. And so it's kind of its own distribution. It's a lot of different things combined together. It includes its own distribution of Python. It has a package library. We always talk about Python being batteries included, but this is really like, here's everything you need to get started in data science, with Anaconda as a package. And the thing that I love about it the most is that it just kind of works, right? So I don't have to think about all of these different pieces that come together. I can go download and install Anaconda and I get Python, I get Pandas, I get NumPy, I get all of these other libraries that I would otherwise have to install myself. It's all there and working, and it's tested to work together from the beginning, right?
Kelly Schuster-Paredes: Do you still have to go in and import? You don't import Anaconda, you just import all your regular libraries normally, or after? Just explain it to everybody. I'm getting it through my head. I've worked with Pandas, I've worked with NumPy. I'm trying to do a little bit of data science, but understanding the whole package is very interesting.
Sophia Yang: So when users or students first learn and use Python, the first thing they do is usually to download the Anaconda distribution, which, like you said, includes many preinstalled packages and the conda environment management system, and basically provides everything you'll need to get started with your data science project.
Kelly Schuster-Paredes: Cool. Super.
Sean Tibor: And the cool thing about this is that because it is so popular and so pervasive, it enables sharing, right? So as a data scientist, I can share my notebooks, I can share my data, I can share my code with other data scientists and have a reasonable expectation that they're going to be able to open it and start using it, and they're not going to run into dependency nightmares with, like, PyPI if we're using very specialized packages and libraries to get the work done.
Albert Defusco: Right, it's such a good baseline. If you tell someone, "I did this with the Anaconda distribution," they're at a good starting point to then sort of micro-optimize: what exact version was it?
Kelly Schuster-Paredes: Cool. So it's like my requirements file in a package. I like to use metaphors and think of things in simpler terms, because you have to teach middle school kids. So I like to talk in simpler terms. Good. Awesome.
Sean Tibor: Let's talk a little bit about Anaconda as a company also, right? So it's an open source distribution of Python and a bunch of other tools, and I believe there's some R woven in there as well, if you wanted to use it to manage some of your dependencies. Anaconda as a company: how did that begin, and why is Anaconda around?
Albert Defusco: I can get started. So Anaconda began in 2012, and one of the main goals is exactly the thing we just said. Without an easy way to manage your dependencies, it is rather complicated for someone to go download Python from, say, python.org and try to add SciPy, try to add NumPy.
Those are two of the most challenging packages to think about building from source. And as soon as I say building from source, that requires a huge amount of background knowledge and skill that data practitioners, and people wanting to be data practitioners, aren't going to have right away, and it's going to turn them off. And so, in the beginning, one of Anaconda's goals was just to provide a distribution, a single installer for the most common packages that you could want to get started on your project.
Sean Tibor: And since then, I also noticed that there's been a lot more emphasis on enhancing the data science community, and also on providing enterprise-grade support for companies that want to leverage the Anaconda package but may not have the in-house expertise to figure it out themselves. Right?
Albert Defusco: Yeah. So I think a theme that we've adopted more recently is thinking about supply chain security, but it's something that we've been shepherding and talking about since the early days: people want to do open source within their organization, within their company. And it can be pretty scary for their IT team to not have any control over where the source code is coming from, and to have no visibility into how it's being used within the company. And our tools can help bridge that gap by providing an on-premises, air-gapped repository of open source code that can be trusted, so that nobody is putting malicious software in the things that you have access to, and it does what it says it does.
Sean Tibor: Yeah. And I think, like you mentioned, that's become a huge facet of conversation in the last few years, after things like the SolarWinds incident and other cybersecurity problems, where the attack wasn't in the code that the organization was running. It was in a package that someone wrote two years ago and checked into the Python Package Index, and they didn't realize it had a vulnerability, and someone else exploited that.
So there's the ability to say, yes, these are tested packages, they're trusted, and we've got reliability and reproducibility on the whole chain of building them. And for an IT organization, that makes a lot of sense, because instead of having to try to review and approve packages as they come in or as they're updated, it can be bundled together: here's the entire distribution, it's been tested, and we have reasonable assurance about the security and the build process. So from a cybersecurity perspective, it's a really cool way of making sure that we can have safe code and keep the agility that we need as data scientists and as computer scientists to be able to get our work done and use modern tooling to do it. All right, so, next question. We start talking about education and how to get there. So for those of us on the call, we've all gone through school, we've all gone through university, and we maybe learned some of these things either in those forums or afterwards. But as we think about the next generation of data scientists and citizens and computer scientists and just people, one of the big things that I know Anaconda is working on, and one of the things that we care about, is trying to bring that knowledge of data and the use of data earlier into the education process, so that the first time some kid learns about statistics and data science isn't when they get to college. It's before that. So can you tell us a little bit about some of the efforts and initiatives that Anaconda is working on to support that?
Albert Defusco: Would you like to start?
Sophia Yang: Yeah, I can start and then you can add on. So we have an educational resource program at Anaconda with two full-time employees. So we have a whole department dedicated to the mission of helping people with their data literacy from early on.
In terms of initiatives, we're working on a data science Olympics, a competition or event for high school students where we provide resources and other tools, maybe even equipment, for students to learn about data science, produce valuable content, and compete in the space. So, yeah, that's one of the projects Anaconda is working on.
Kelly Schuster-Paredes: Can you elaborate on some of the skills, just in case people don't understand? Because when I think data science, obviously in the middle school, we're looking at large chunks of data. How do we parse it, how do we make it readable for other people? How do we understand what's going on? Is that kind of what the competition is looking for? Or can you elaborate more on the skills of a data scientist?
Sophia Yang: Yeah, definitely. So we have different themes. For example, we have a robotics theme. People could use data science or Python to build robots, using a Raspberry Pi to build an application that's interactive. We also have a theme that we call predictive modeling. So you can predict weather data, you can predict stock data, whatever. And another one is called storytelling and visualization. That's probably my favorite one, because if you don't know how to tell a good story from your data, it doesn't mean anything if you just let it sit there. And even if you plot your data into a good visualization dashboard, you still have to communicate what your data is telling you and make a good, convincing story out of it. So, yeah, that's the Olympics.
Kelly Schuster-Paredes: I love that. I think we definitely would love to bring it down into the middle school, because those are all great skills that apply across the board: just being able to look at any type of information and tell the story of what's going on. I think Sean and I did a project, and we talk about this often, with the race cars, the dragsters, of collecting data and just showing the students the visual.
I mean, we all know: more mass, it's going to go... or less mass, less mass is going to go faster. That was a long time ago, four years ago. And you could clearly see the line of best fit. And the students were like, "Oh, there's my car." So it was pretty cool.
Sean Tibor: Yeah. I think one of the things that we've always liked about education and data is that it provides a different way for students to think about what they're learning and really assess for themselves whether it's true or not. Because what sometimes happens is that we make everything in learning feel so intuitive, right? Like, oh, it's logical. It fits from here to here to here to here. And sometimes the data doesn't fit that story. Right? So what happens is the data shows something different and it breaks the expectation. So those sorts of experiences, like in an Olympics setting where they have some sort of hypothesis, they have something that they expect to happen, and then the data doesn't line up, are a great opportunity to learn something new. So you have the Olympics. I understand there's also a program going on to help teach teachers more about data science.
Unknown: Right.
Sean Tibor: Because a lot of us may not have had this experience or this training as we were going through school, and so those of us who are past it have to find a new way to learn this. Right? So, Albert, can you tell us a little bit about that? Like, did any of this get influenced by that master's program that you basically built?
Albert Defusco: Yeah, so I've been sending over all of the training material that I used to teach from to our education specialists, trying to see what is in there that may not be overly replicated out in the field, among all the boot camps and the other things that are out there, with a specific focus, as you say, on our education outreach. And so we're thinking about where in the curriculum this fits, and which departments fit.
When I was in school, there was no such thing as data science. The term hadn't really been invented. And now there are data science degrees, or developing degrees, and other departments are interested in it. There are a number of certifications out there, so it's a good time to get in and try to figure out where this is going and what standardizations are beginning to develop in this field. Our outreach program is right now just getting out there, trying to figure out what people are talking about. What are our stats? All Ivy League schools use Anaconda in at least one curriculum.
Unknown: Nice.
Albert Defusco: It may not always be the department you imagine it to be. It could be something quite interesting. But we've got a good presence, and we're trying to reach out to them and figure out what they're up to and see where we can go with it.
Kelly Schuster-Paredes: I listened to a talk at EuroPython 2022 from a gentleman who was trying to break the barriers, break the stigma that computer science classes are where coding happens. And he was showing a lot of examples where he was using math and Pandas and data and plots to do physics and math. But one of the questions was, can you bring it into humanities, or can you bring it into English? Have you ever seen any great examples with parsing of languages, or social studies and civics kinds of ideas, in your classes?
Albert Defusco: I've got a little bit of insight on some social things. I worked a little bit, part time, with a group in Pittsburgh that's connected to Black Lives Matter, and we were trying to study some data that we could find. They were curious about police union contracts, so we tried to grab a few and do a little bit of NLP on them to find some common language. I kind of just helped them kick it off, and I think they ran with it afterwards. It was really great insight for someone to think that they could do it.
And that's really the key aspect here.
Kelly Schuster-Paredes: The same thing kind of happens when teaching, right? So if you have a student that thinks he or she can do something with Python or code, you get them interested. And I can imagine a lot of students in high school and middle school, if they find some sort of data that they want to keep, or imagine even taking some data from your high school years, soccer or baseball stats, and extrapolating it somehow against, I don't know, your growth or your wins and losses. So, pretty cool stuff.
Sean Tibor: Well, I think there are kind of two hurdles to get over for most students, and teachers, for that matter. I think the way we've taught statistics and data literacy has really been at a very small scale, right? So when we talk about, here's a set of data, it might be a single column of scalar values, right, that maybe has ten or 15 things, and they talk about, what's the mean of this?
Kelly Schuster-Paredes: Yeah, the bag of M&M's laid out on a piece of paper, right?
Sean Tibor: And to be fair, that's a really valid starting place, because it gives you the opportunity to understand these concepts at a small scale, right, and understand how they apply. So even my son, who is seven, has been learning about means and medians and modes and things like that in some of his math classes. And it's phenomenal. It's really great to see that. But I think the part that people miss is that next step where it scales up, going from the tens and hundreds of data points to the thousands, tens of thousands, millions, and beyond, and being able to apply those same ideas. I think that's one area, because that can be intimidating for a lot of people: "Oh, I could do it for ten, but how could I possibly do it for a million?" Well, if we teach you how to do it with code or with a tool, doing it for ten is basically the same as doing it for a million. It just may take a little bit longer. Right?
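To make Sean's point concrete, here is a minimal sketch in Python using only the standard library. It is an illustration added to the transcript, not something shown on the episode: the `summarize` helper is a hypothetical name, and both data sets are made up (ten hand-picked counts and one million randomly generated ones). The same function call handles both sizes.

```python
import random
import statistics


def summarize(values):
    """Return the mean, median, and mode of a list of numbers."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }


# Ten made-up counts, e.g. candies of one color in ten small bags.
small = [3, 5, 5, 6, 7, 8, 8, 8, 9, 10]

# One million made-up counts between 0 and 10, generated with a fixed seed
# so the run is repeatable.
rng = random.Random(0)
large = [rng.randint(0, 10) for _ in range(1_000_000)]

# Exactly the same code runs on ten values and on a million values.
small_summary = summarize(small)
large_summary = summarize(large)

print("10 values:       ", small_summary)
print("1,000,000 values:", large_summary)
```

The only difference between the two calls is how long they take to run, which is the idea a class could take away before moving on to tools such as Pandas.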
The other hurdle that I see is finding the data.
Unknown: Right.
Sean Tibor: So it's always this thing: well, I have to create it myself, or I have to use some data set that the teacher provides. But we live in a time when data sets are amazingly available. There's data about everything if you look for it. So how could we help with both of these problems, both that scale-up problem and helping students and teachers understand that there's so much data to use? Are there projects, ideas, or training that we can do to help them understand these as educational resources more than anything else?
Albert Defusco: Sophia, you want to start?
Sophia Yang: Yeah, sure. I can start. So the first question: as the data gets bigger and bigger, you can't really calculate by hand. That's why in the old days we had calculators, and now we have Pandas, right? We just import Pandas and Pandas handles everything. And when you have even more data, you can use other tools like Dask, other cloud computing resources, and more computing power to help you do the calculations. Yeah, there are a lot of tools out there to help you handle big data calculations and modeling and stuff. The second question, what was the second...
Sean Tibor: The question, again, was around getting access to data sets. Like teaching teachers and students where to find data.
Sophia Yang: Yeah, data nowadays is everywhere. Like if students are into social media, right? There's the Twitter API you can use to get Twitter data for free. It's limited data, not all the Twitter data. And then you can get data by using web scraping to scrape pretty much every website online. There is also amazing public data you can use. Census data is public, for example. And if you want to create your own surveys, say you want to study the impact of social media on your classmates, you can design a survey and ask your classmates about their wellbeing in terms of using social media, and then collect that data and analyze it for your own research.
Anything to add?
Kelly Schuster-Paredes: You just gave me a great first unit for my 8th graders. We're going to create our own little form questionnaire. I'm thinking, this is why I love doing what we do. Thank you. Go ahead, Sean. Sorry, Albert.
Sean Tibor: I was just going to hand it over to you.
Albert Defusco: So that is, at some level, exposing the next gap, in data engineering. That is the emerging hot field, because you're typically taught data science from clean data sets that someone gave you. And there's a whole new field out there, as Sophia mentioned, and it's becoming easier. We both work on an ecosystem called Intake, which is trying to help. It's not standardizing data, but it's democratizing access. So if somebody knows how to write code to access the Twitter API, they can help someone else by giving them an Intake package or driver and saying, "Look, I made it easy for you to do it with Pandas."
Kelly Schuster-Paredes: I was just going to say that. That's, like, my biggest issue, I think. When I was bringing in information for students, and we're talking 13-, 14-year-olds, I was trying to find data where I'm not going to go, "Oopsie, sorry, I didn't know that was in there." Or where we don't have to go through all these convoluted steps or put in a key or something that a 14-year-old can't really get a hold of. Some people are telling me to go to... is it Kaggle?
Albert Defusco: Yeah. Kaggle would be a good place that gives you quite clean data in a reasonably quick-access format. But I think there's still a gap in this space. The things that we work on are agnostic. We're not asking people to, like, "Okay, you've got to go transform it and put it in this other place." We think there's some ability to... Sophia mentioned web scraping. There's some ability to automate that. And this is also a feature of the Python ecosystem: within Python, a developer tends to write for the next skill level down, right? I write tools to do some data access patterns.
I need to get data from this format and transform it a little bit, or it's really complicated to get it from this place. So I write a tool to make it a little bit easier. Kelly Schuster-Paredes: YouTube, Insta just plug it in, and I can imagine a kid going in there and saying, oh, I want all these cat videos, or something. That would not be me, but all these videos from YouTube, and here I can go and use these couple lines of code. And now I have my data to manipulate for data literacy. Albert Defusco: Data project that's the gap to fill right now is to continue to push on these sorts of automation so that someone who cares about, as you say, the literacy and not have to struggle through a week of coding. Sean Tibor: Yeah, I mean, I think that really hits the point, because the advantage of all this is that we can use the computer as a tool to help us get to the insights faster right. To ask those data oriented questions that we can answer with the data. But if you spend a week just trying to get the data manipulated into a format where you can start to do the analysis. Everyone gets discouraged by that. It really takes a lot of persistence to get to that point. And so, I mean, even as kind of silly as the example is of cat videos on the internet, but what would happen if I got a twelve year old, had easy access to data about YouTube cat videos? Like what the color of the cats were, what they were doing, what the attitude? Does it have background music? All that, and came up with some amazing insight about what people are actually watching and why and how to understand the psychology of watching cat videos on the internet. It seems like a silly thing, but when you add it all up, people are probably spending a shockingly large amount of time watching cat videos on the internet, right? Actually it could be a really interesting question to ask and get answered, but. 
Kelly Schuster-Paredes: It'S just having that posing that question is starting, I guess, with your data literacy. Having the question of what is out there, what patterns are out there, is like the hardest step. But do you have any suggestions on asking the right questions? Where did the questions come from when you're looking at data? Albert Defusco: The thing that I like to talk about is data driven decision making. You mentioned this sort of just right alongside it, Sean, a little bit earlier, where it's not my intuition that matters, it's the actual data itself. And that's something that I think I didn't get in my education until much later. I studied chemistry. A lot of it is intuition and very little bit of it at the time was data. People would have thought five entries in a table was plenty big enough. That's where you can start to bring that in earlier. Maybe if you talk about physics, instead of just talking about the sort of the ballistic equation, get in a couple of samples of if I throw the ball, if I can figure out I could throw it just high enough and maybe if I have some device to help you measure forces in some way, right? You can start to gather, well, what is the distribution? Instead of just talking about first you talk about the equation and then you can start to bring in little bits of information about distribution of results. Sean Tibor: I think it gets students and learners thinking about it in a different way, right. When you start to bring that real world data in, because how many times do we talk about in physics? So we're going to do this in a vacuum, right? So there's no air resistance. We're going to take that out so that we can have this perfect equation. But unless you're doing orbital physics, right? And even then there's some amount of resistance that you get from various things. It's never a perfect equation, it's never clean. 
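Editor's sketch of the physics example above: a minimal illustration, assuming a ball thrown straight up and five made-up release speeds. The ideal equation ignores air resistance, which is exactly the simplification Sean mentions; the point is to put a small distribution of results next to the single textbook formula. The speeds are hypothetical, not measurements.

```python
from statistics import mean, stdev

G = 9.81  # gravitational acceleration in m/s^2


def ideal_peak_height(release_speed):
    """Peak height in metres of a ball thrown straight up, ignoring air resistance: v^2 / (2g)."""
    return release_speed ** 2 / (2 * G)


# Hypothetical release speeds in m/s from five throws by one student.
speeds = [9.0, 10.0, 10.5, 9.5, 11.0]
heights = [ideal_peak_height(v) for v in speeds]

print(f"mean peak height: {mean(heights):.2f} m")
print(f"spread (sample standard deviation): {stdev(heights):.2f} m")
```

With real measurements from an accelerometer, the same two summary numbers give a class something concrete to compare against the value the equation predicts.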
And so thinking about it that way and talking about it in a different way, asking those questions around the data. Say maybe you do have a ball with an accelerometer in it that can measure the speed at which you're releasing it and everything. Who in the class can throw the highest, right? Or who has the most force when they launch it? Can we get it to hit the same place every time? All of these different things get learners thinking about their data differently, measuring it and making predictions. And that's the thing I love about the Olympics that you propose: these are all important facets of leveraging data in a literate way, the ability to analyze it, to predict it, to be able to visualize it, to really communicate that to others. So maybe there are ways that we can do that in the classroom. Maybe there are ways that we can do that with other teachers and be the experts for them when it comes to the data. I have a question for you. Do you have an example for me of what's a good question to ask about data? What's a question you can answer with data and what's a question that you can't answer with data? Or maybe it could be rephrased better, right? So here's one that people would ask, but maybe it's better if we change it to be this. Sophia Yang: That's a hard question. There are no bad questions. Every question has its value. Sean Tibor: But sometimes what we get is we start with a question that is hard to answer with data, but if we reframe it, we can make it easier to answer with the data. Do you have examples of that? Albert Defusco: Kind of a good one, and this is mostly, I think, inspired by a podcast I listen to. So the tagline is: there's no such thing as an average human, but there is a median human, right? There is no such thing as a person of average age or average education. Those are discrete quantities at some level.
But there is a median person who has the median value of every attribute. That is a real person, but the average of those things can never be a real person. Sean Tibor: Wow, that kind of blew my mind a little bit, because that's always the tough thing when you're talking about social sciences and you're looking at social data and biographics, demographics, any of those things. You're always thinking about the average. But it's sort of the same reason why people have a hard time grasping the mean family size of 2.3 children, right? Versus the median, which would be more appropriate. Right. Kelly Schuster-Paredes: But you both hit the nail right on the head. I'm thinking of all this stuff in middle school. We teach all these concepts, and it's just concepts and concepts and concepts, and there's no real understanding. Like mean, mode, median: a middle schooler is going to regurgitate the definition, but not truly visualize it. And so when are you starting to do challenges in middle school? I'm ready to trial them out. But here's the honest conversation, because this is one of the things we were talking about at EuroPython: the fact that one of these professors has students coming into college who have never had any coding, who really do not have any understanding of data literacy. Here they are on a track to be a computer scientist or a data scientist, and they're starting at the same level as what I'm teaching in middle school. So any future horizons on where you're going after you do your high school plans? Sophia Yang: Hopefully soon to middle school. It's definitely a plan. Albert Defusco: Let's see how that survey turns out. Sean Tibor: And this is all going to be pretty open and accessible too, right? So teachers who want to experiment, maybe it's not high school, but maybe it's right before high school, maybe they could bring that down if their students were ready for it and try it out. Right?
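Editor's sketch of the mean-versus-median point about family size raised earlier in this exchange: a minimal example, assuming ten made-up households. The mean comes out as 2.3 children, a value no household can have, while the low median is always a value that actually appears in the data. The list of counts is hypothetical.

```python
from statistics import mean, median_low

# Hypothetical number of children in ten households.
children = [0, 1, 2, 2, 2, 3, 3, 4, 2, 4]

average = mean(children)        # arithmetic mean; need not be a real household
middle = median_low(children)   # middle value; always one of the data points

print(f"mean: {average} children")
print(f"median: {middle} children")
print(f"is the mean an actual household size? {average in children}")
print(f"is the median an actual household size? {middle in children}")
```

`median_low` is used here instead of `median` because, for an even number of values, it returns one of the two middle observations rather than their average.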
Sophia Yang: Yeah, we'll have resources on our web page pretty soon and everyone will be able to access them. Kelly Schuster-Paredes: On your website, is there like a teacher education page? Albert Defusco: We're growing one, and we've got some things that may help. So the challenge is, as you start getting down towards middle school, what is the right level of what you might call computer science to include? And there are challenges, maybe with access to computers, or just thinking about installing Anaconda is already probably too much. So there are going to be important limits. It's like, how are we thinking about bringing in programming, training and education, and learning little bits of coding? And we're looking at some ways by which we might be able to push the envelope on that a little bit more. Maybe you've heard of PyScript recently, from PyCon this year. That opens up a lot of opportunity to bring programming capability directly into browsers and other sorts of devices that were kind of shut out. Kelly Schuster-Paredes: I was just going to mention some sort of add-on feature with Google Colab: plug in Anaconda, here you go. Sean Tibor: Especially with high school. I was sitting in the audience during that talk by Peter Wang and I just kind of had that moment of, like... Albert Defusco: Oh. Sean Tibor: This changes a lot, because we've talked for a long time about bringing Python into the browser more. But because we have all these new technologies, particularly with WebAssembly and being able to have things run in the browser in an efficient way and integrate with the browser in different ways, it opens up a lot of opportunities for students to start writing code in something that they already have installed on their computer. And we've approached that in the past, and had great results with things like Colab and Kaggle and Jupyter notebooks. But this takes it even a step further.
Albert Defusco: Yes, it breaks it out of data science too. That's something I'm curious about and want to explore more: you could just write games with PyScript, right? You're not stuck with the Jupyter data science experience. You can think of the coding portion of all of this, and that's something that also gets wrapped up: does data literacy require a certain amount of coding? And with Python, we can start to think about bringing that kind of ease of coding to a lot of other domains. Sean Tibor: Well, I think the other thing to point out here is that a lot of us have this assumption that there's a qualitative assessment of the code that we write. Like, this is good code and this is bad code. If it gets you hacked or something like that, or you leave a vulnerability, that's bad code. But other than that, you can do really great data science with pretty mediocre code. Right. The code doesn't have to be incredible or the most Pythonic or whatever, but you can do some really great work with pretty bad code. And who's going to judge you on it, right? Kelly Schuster-Paredes: Well, if you had a couple of extra numbers in it, they might judge you. You're only showing the even numbers. Albert Defusco: Yeah, people do. There's a lot of concern that something like Jupyter Notebook is a bad coding experience. There is a spectrum of ways to get over that, but it seems to be not something really worth worrying about. Sean Tibor: Yeah. In education, if we're learning something, if we're gaining new insights, if we're getting better as we go, then we're in the right place. Whether that's a Jupyter notebook, PyScript, or the command line, just getting students interested and excited is the most important thing. So I want to poke into PyScript a little bit more, because I think there's a huge opportunity here. I understand there's some sort of hack day project thing coming up that students can use to get excited about. Teachers can work with their students on it.
Can you tell us a little bit more about that? Albert Defusco: I'm not so familiar with that one, I'd have to research it. Yeah, Sean, we've been thinking about what you can do for educators. And certainly with PyScript, in some of the demos that we saw at PyCon afterwards, it opens up a lot of opportunities. Well, can I do interactive programming and learning with PyScript? Can I type something and practice with it and ask, is it correct? Right. And with traditional Python-based execution models, that was just a pain. Right. You're thinking about how many layers of cloud tools do I need to spin up and maintain, just to answer the question of did someone write the correct code that gives the right result? With PyScript, it runs it all in the browser. And I can imagine now, well, I've got my little wrapper over there, and I tell someone their challenge is to write a function that does this. And rather than just writing it and having them sort of self-verify, everything runs all in one place. And so I can imagine a little bit of a PyScript website or app that can do all that at once. And then it's just an HTML file, right? Then it just loads on a website. Or you can pass it around. I think that has the opportunity to expand this interactive learning capability with Python. Sean Tibor: Yeah, I think what I really like about that is you mentioned that portability, the ability for you to be able to pass it around because it's just an HTML file. One of the things that we constantly struggle with as teachers is how do we get the assignments to the students in a way that makes sense and doesn't require them to have to figure out a bunch of things. And if it's just a download... Unknown: Right. Sean Tibor: Like, here, download the zip file. Or download this single HTML file and edit it in place, like I've started it for you. That opens you up to a lot of flexibility to use it in your LMS, to use it in GitHub, to use it in a variety of places.
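Editor's sketch of the "did the student write the correct function" check Albert imagines: a minimal example in plain Python, not PyScript's own API. The helper name `check_submission`, the `double` exercise, and the test cases are all hypothetical. The same logic could in principle run inside a browser-based Python runtime, but that wiring is not shown here.

```python
def check_submission(source, function_name, cases):
    """Run a student's source code and return a list of human-readable failures.

    An empty list means every case passed.
    """
    namespace = {}
    # exec runs arbitrary code; acceptable for a classroom sketch, not for untrusted input.
    exec(source, namespace)
    func = namespace.get(function_name)
    if not callable(func):
        return [f"No function named {function_name!r} was defined."]

    failures = []
    for args, expected in cases:
        actual = func(*args)
        if actual != expected:
            failures.append(
                f"{function_name}{args} returned {actual!r}, expected {expected!r}"
            )
    return failures


# Hypothetical exercise: write a function that doubles a number.
student_code = """
def double(n):
    return n * 2
"""
cases = [((2,), 4), ((0,), 0), ((-3,), -6)]

feedback = check_submission(student_code, "double", cases)
print(feedback or "All checks passed")
```

Returning a list of messages rather than a single pass/fail flag lets the page show a learner exactly which input went wrong.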
And then you're not dependent on so many different pieces of infrastructure to make that work. Oh, you're using a Windows computer, you have to do it this way. Or you're using a Mac, you have to do it this way. It just works. Kelly Schuster-Paredes: And that would be a huge game changer for everyone in education. Right. Another conversation we are having is about that ability, because not everyone is as lucky as I am working at the place I'm working at, where we all have computers and we have somewhat decent WiFi. They don't like me saying that. Sean Tibor: Most of the time. Kelly Schuster-Paredes: Most of the time it's pretty amazing. But I'm just thinking about how it was when I was in Peru. When I first started, it was such a pain to download anything. But being able to access the web, that's a huge game changer. And being able to manipulate data in that web atmosphere is going to be a thing. Albert Defusco: I think what people are working on is to bring Pandas and as much of the data science stack as seems to be appropriate into PyScript. Kelly Schuster-Paredes: The only thing I worry about, and this is kind of where coding has been coming into play, is that there are not many people out there to teach computer science in the schools. We're having a teacher shortage as it is, but to find a coder that can go and actually teach computer science is quite difficult. The only thing I worry about sometimes with aggregating data or aggregating a system is that it becomes so easy for education that it's not really built for how it is in the real world. Right, so then you graduate out of high school and you've had this little package, and you open it, put a URL in, and, oh, everything works. And then you go out to the real world and you have to scrape your own data and you have to package everything.
So there's this quite delicate balance of being able to get the teacher to do it without giving a drag-and-drop program to students. So, interesting food for thought, but definitely excited. Albert Defusco: We hear that from the field. That's part of the skills gap. Not a skills gap, more of a vocational gap: people know the code, they know the stats, or they know enough stats, but there are some gaps still to fill to get them into their first job. Kelly Schuster-Paredes: That's my gap. Albert Defusco: Federating out the programming education from computer science can be a little more helpful. If you can get enough skills in your data science or your physics or other disciplines, you might be able to help that along. Kelly Schuster-Paredes: Absolutely. Sean Tibor: I think the key is really the professional development. I think if we can get more teachers access to the right resources, and not just computer science teachers, but your kind of standard physics teacher or chemistry teacher or someone who's teaching social sciences and wants to start bringing more data in, if we can give them the right tools and resources and the room to learn it, to create the space for them to be able to incorporate this into their curriculum, I think we're going to see huge strides. And I'm encouraged by a lot of the work that the College Board is doing with AP testing in high school, so they're including more data literacy questions on AP exams. But I think it's kind of sneaking up on a lot of teachers who may have been teaching the same way for a while, or were just trying to survive COVID and get through it and haven't really thought about pushing ahead a little bit yet. We're in this space now where these things are starting to change, and we really need to support the teachers that are going to be sharing this with their students. And so I'm really encouraged by the work that you're doing at Anaconda to make this happen.
It's the right steps, it's the right place to apply the resources because it has such a huge impact and spreads out to so many students and learners around the world. Kelly Schuster-Paredes: Very cool. I have one last silly question. So thinking about your path and where you were prior to Anaconda, what do you think is one of the things I want to say things, but one of the reasons why you like working in code or data or what are some of the driving forces or one driving force that got you where you are? Go ahead, Sofia, while you're thinking. I'll drag it out a little bit. So, like, I was a biology major and being into getting into code for me was the idea that things started, I kind of got an AHA. Oh, wow. That's how I problem solved. And I was just like, oh, my gosh. And that new way of thinking, okay, I stole. Sophia Yang: Yeah. So I was a psychology major. I did my PhD in educational psychology. Unlike everyone else, is thinking psychology research, at least for me, it's not about talking to people. It's actually about working with data, working with getting the insights, statistics from the data and build models and doing all kinds of statistical models. And it's all data science, and it's all very interesting because that's how you publish papers. You get results and insights and understand human behavior through data. So data literacy is absolutely important for, I think, every discipline in research, whether it's biology, chemistry, psychology, it doesn't matter. Every field needs data science. And that's why I joined Anaconda, because I want to continue working on data science and be able to. And also, I love our vision of empower people with data literacy. It just really speak to me, and that's how I got here. Kelly Schuster-Paredes: Lovely. I like that. Albert Defusco: Yeah, I thought pretty similar story. 
I was in the scientific field, and I enjoyed the science, but I was a little more interested in the tools, writing the software, and talking about the software that can achieve that goal. Kelly Schuster-Paredes: I wish I had known code before taking organic chemistry. I might have passed it the first couple of times. Being able to aggregate all those chemical compounds and extrapolate would have been great. There's another plugin you could have done: help people pass organic chem. Sean Tibor: Maybe that's some of the ways that we get to people, helping them understand that there's more than one way to learn. Rather than just kind of pushing through and gutting it out, maybe it is applying some code. Maybe it is looking at some of these data tools and statistics. Using a variety of different approaches to enhance your learning is a completely valid way of learning something new. And maybe we need to make that more normalized for people, so that someone who's struggling with organic chemistry can say, well, I know a little bit of Anaconda, and I know that there's a library out there that may help me with it. Maybe I can try using that and see if I can solve these problems that way. Kelly Schuster-Paredes: Can you imagine how cool that would be? And you have all the models popping up, and this is what could be made out of these carbons. That would have been cool. Sean Tibor: My first job out of college, not having taken chemistry since I was a junior in high school, was managing a chemical inventory database for R&D centers around the world. And we had literally a million and a half different inventory items that we were tracking and several hundred thousand different chemical compounds that were in there. And one of the coolest parts of the system was that there's a special string that you could use to model out the molecule itself.
And like, if it was a single molecule, you could have this string of the molecule there and it would actually take that field and render it into a 3D spinning representation of the molecule on screen, or like a flat version of it. So remember your basics like water and hydrogen peroxide and everything. Oh, it's a couple of bubbles here. That's cool. But modeling out some really cool, like, long chain organic molecule or compound was really fun to do. And you could see your computer just sit there, grind on it for a while to try to try to model. Kelly Schuster-Paredes: That funny. Well, do you have any questions for us? Or coming close to the time limit? Albert Defusco: So, what has been the new interesting challenge that you heard about teaching data literacy, or Python specifically? Kelly Schuster-Paredes: Not many people are doing it. So we're at the cusp yes, in the lower levels, and we have it sporadically throughout the curriculum. It's there. Right. But I don't believe many people have sat and mapped out the skills, PK to twelve, PreKinder to 12th grade of what the skills are. And what are we going to achieve at that level in order for us to say yes, a graduating child from this institution is going to be data literate, at least at that point of time. Having been a former science teacher, we've taught graphing, we teach plotting, but we taught it. I did, as a single component that we use to write up a lab report. You have to have your graph in there and you have to make a conclusion on what the graph is. But it wasn't really explicitly shown as this is really what happens in a bigger picture. Now, imagine if we extrapolate this science experiment to every child in the US doing it, and then we aggregated all those results. Well, this is what we could do in code. This is what people do now coming from that point. 
So we had a conversation a couple of months ago about that whole idea of data literacy and just trying to make that big connection from what's being taught in the classroom to what's really happening in the real world. So I think that's where the future data literacy strand is going in education, where we have to actually think about it and map it out. What does it mean to become a data scientist after college? And then work back. Sophia Yang: So I was thinking that teaching data literacy should not only happen in school, it should also happen outside of school. Because, as you said, we have a shortage of computer science teachers and data science teachers in school. But the PyData community is huge, right? And we should come to students. We should use TikTok, we should use whatever students are using. I don't know what other apps they're using. I know the Scientific Python project recently launched their TikTok to educate students, high school and middle school students. I guess they are the target audience on TikTok, to educate them about Python data science. I think maybe more people should do that. Kelly Schuster-Paredes: 100%. There's a math educator who every single day produces a math minute or a mental math and breaks down how you can estimate, or how you do these crazy math tricks. That's the ticket, right? Because as we grow more and more into this educational field, students really don't need a school education. They need a learning education. And that doesn't necessarily happen within the school system. I definitely would love to see a data scientist going out there and doing, here's your data science minute of the day, or here's your TikTok minute of the day. All right, Albert. That's you. Sean Tibor: I think the other problem that we have, too, that we have to find a way to solve, and this is becoming, I think, more and more urgent all the time, is there's only so much capacity within the education system for topics. Unknown: Right.
Sean Tibor: And so adding data science is a challenging thing to do. If we think of it as being incremental to everything else that's being taught, it's very challenging to say, okay, now we're going to add on data fluency, data literacy, all of these things into what we're doing. It's a little bit of the idea of, like, I have a bucket full of water, and I'm trying to add more water to it to overflow it, but if I have a bucket full of sand or a bucket full of gravel and I pour water into it, even if the bucket is full, I can fill in the spaces with the water. Right. I'd love to see us thinking about data literacy as that kind of water that helps connect these concepts together. Unknown: Right. Sean Tibor: Because it can be used in so many different areas. It can be used for all these things. And I think we're going to encounter resistance as we bring that in, because the assumption is that I have a full bucket of water already. Why am I adding more water to it? But if we think about it more like filling in the gaps and making those connections between those pieces of education, there's room for it. There's space for this to happen. And I think we also have to look at that through the lens of prioritizing some of the things we're already teaching. Does every student need to go through calculus A to graduate? Do they need to learn calculus? Could statistics and data science be a valid alternate path within the math department? And who's going to fight for that? Who's going to say, yes, we need to make this happen? And how do we make that so it doesn't become just another silo within an academic institution that actually is connected to all of these things? It's going to take brave educators who go out there and fight for these things. To say, this is important, it's necessary, and it needs to be done in a way that doesn't take away from everything else, but actually enhances everything else that we're doing. Albert Defusco: Yes, I agree. 
And I'd go further and say there's nothing wrong with a spreadsheet. You may be limited to a million rows, but that's plenty. Most people think that's big data to begin with, so we can look at that. Kelly Schuster-Paredes: Definitely huge for eleven-year-olds. Albert Defusco: Yeah. If that's a minimum benchmark, it's a pretty good place to be in. Sean Tibor: Yeah, and we've talked about that in the context of coding. Like if you are running into limitations of, say, the Mu editor. Right. Well, now let's talk about other editors that you can use. Maybe it's the same thing: if you outgrow Excel or a Google Sheets spreadsheet, maybe that's a good sign it's time to start talking about Anaconda and doing more. So it's like a good signal to a teacher or to a student that it's time to try something different, something bigger, maybe. Well, why don't we wrap up here, because I think we're just about out of time. It's been an absolute pleasure to speak with both of you about this. As you can tell, it's something that Kelly and I really get excited about and care about, and we can sense that same enthusiasm from you and enjoyment of the topic. If people want to learn more about Anaconda's education efforts, where could they go? Where could they learn more? How do they get in touch with Anaconda? Albert Defusco: One of the best things they can do is email us, education at Anaconda, and get connected with us directly. We're there to respond to that email address. We'll reach out and see what you need. Sean Tibor: Excellent. Well, I think that's a great first place to start. We'll definitely be staying in touch on this, and if any of our listeners reach out and can't remember that email address, although we'll put it in the show notes, we'll definitely direct them your way. For Teaching Python, if you want to get a hold of us, you can reach us on... oh, sorry, Sophia, did you have something to add? Sophia Yang: Oh, yeah. I just wanted to add anaconda.cloud.
It's a good learning resource, and it's where we're going to add a lot more learning resources for students and teachers and everybody. Sean Tibor: Excellent. I am bookmarking that right now. Sophia Yang: Thank you. Sean Tibor: Okay, so for Teaching Python, you can always contact us through Twitter at @teachingpython. Kelly is @KellyPared on Twitter. I'm @smtibor on Twitter. You can also reach us through our website, which is teachingpython.fm. As always, a big thank you to our Patreon supporters who are supporting the show financially. If you'd like to do that and join them, there is a link to that in the show notes. We are working on our first meetup for our Patreon supporters, to happen over Zoom. It didn't quite come together as quickly as I would have liked, so I'm working on that for early August. Please listen in and see if you can join in on that. Kelly Schuster-Paredes: You cannot blame the summer slide this year on that. Sean Tibor: No, sorry. No, that's all me. Full accountability, that's all me. So we'll be working on that towards the fall. Our goal is to get the first of those going before the school year starts, so that teachers have an opportunity to discuss some of their plans and get things going. So I think those are the major places to get in touch with us. Kelly, any announcements that we want to share with our listeners? Kelly Schuster-Paredes: I don't. We're just going to keep an eye out for EuroPython. Hopefully they're going to do some YouTube videos for that. There were a lot of great talks on education. They had a whole day of educational talks. And no, I think that was it. That's all for me. Sean Tibor: All right, sounds good. Well, then, for Teaching Python, this is Sean. Kelly Schuster-Paredes: And this is Kelly, signing off.