Travis Oliphant: Python, as many people don't understand, is a story of many players, many individual mavericks basically, or people who've put a lot of effort into growing different parts of the ecosystem. I did some stuff to get it started, but the only reason anything's successful is because lots of other people jump in. And a key part of growing the community is actually enabling that contribution, enabling that participation. Eric Anderson: This is Contributor, a podcast telling the stories behind the best open-source projects and the communities that make them. I'm Eric Anderson. Eric Anderson: Well, today, we have Travis Oliphant here, who is a prolific open-source founder, and we'll get into all that story, but he's famous for both NumPy and SciPy. Travis, thanks for coming on the show. Travis Oliphant: Thanks, Eric. It's great to be here. Love talking about NumPy and SciPy. They've been a part of my life for over 20 years and still part of my life, basically. Although, I'm not so much developing anymore. Now, I try and inspire people and maybe fund it. That's been most of my life. Eric Anderson: Very good. And in some ways, you were doing open source before open source was cool. Travis Oliphant: Yeah. Open source was cool a long time ago, right? But it was still in the shadows. It wasn't as popular. There were a lot of people ... This is back in the day when Microsoft would talk about the evil of open source, and now they've understood that it's actually a benefit to them and they understood how to embrace it. Travis Oliphant: But yeah, this is back in the day when open source was the young kid trying to break through in the world, and it was still a bit rebellious to work on open source. Yeah, that's where I hail from. Eric Anderson: You'll take us there in a moment, but first, before we get too far, why don't you explain what NumPy and SciPy are, just to level set with everybody? Travis Oliphant: Yeah, we'll start with NumPy. NumPy is an object.
It's an extension to Python. Python is an awesome language because it's easy for domain experts to think about and use. You don't have to be a programmer to be able to program. So Python's been very popular. It's very easy to extend as well. Travis Oliphant: So NumPy is an extension of Python that adds a fundamental new object to Python that's called an array object, or a tensor as it's been called these days. So it's an n-dimensional array, not just two dimensions, but three, four, five, seven. Travis Oliphant: And then that object can be used to do lots of calculations very quickly on lots of data. So NumPy is a foundation used in lots of scientific applications. They need array objects. SciPy is a library of functions, methods, concepts, that need array objects, but add features. Travis Oliphant: So SciPy has an optimization library. It has an integration library. It has ordinary differential equation integration. It has special functions. Just a lot of things that a scientist was going to need, SciPy adds those to Python, requiring an array object like NumPy. Eric Anderson: Great. And then take us back then. It was NumPy first and then SciPy, right? Travis Oliphant: It actually wasn't, it was SciPy first, then NumPy. Many people ... I'm more known for NumPy because that was a significant transformative event in the history of Python. But SciPy was the story. SciPy is actually my baby. It's what I got passionate about. Travis Oliphant: I was a young graduate student at the Mayo Clinic studying biomedical engineering, biomedical imaging, MRI, and ultrasound. And I thought I'd be working at GE, maybe working at Philips or Siemens. And I loved medicine. I was an electrical engineer by training. Loved applied math and loved using that to benefit society. And so medicine is where I was studying as an engineer. Travis Oliphant: Along the way, I was doing a lot of computation. Landed in a ... You have a lot of data with ultrasound.
You have a lot of data with images, and so how do you process those efficiently? You pull out C, and I was enough of a programmer that I learned C, I'd learned MATLAB, I'd learned some Perl. Travis Oliphant: I knew how to get a lot of stuff done with programming, computing, but I was not really content. MATLAB, I didn't like the license, didn't like the fact that if I published code, others couldn't use it and run it. They had to go buy a license to run the code. Travis Oliphant: I'm not opposed to software and paid software, but when you have to pay to run code, it feels like more of a ... I want more people to be able to share, and learn, and grow from that. Feels too much like you have to pay for your books. There aren't low-cost available things. So I really didn't like that, so I was kind of uncomfortable generally, even though I liked the high-level language of MATLAB and what it let me do. Travis Oliphant: So as I was studying, I encountered a problem where I didn't have enough memory. So I had big five-dimensional datasets. I was trying to differentiate five-dimensional datasets coming out of an MRI experiment called elastography that I was doing. Travis Oliphant: So we'd take images over time of a waveform propagating through three-dimensional regions. So it was five dimensions, well, four and a half perhaps, or a dimension of time, dimension of space, dimension of time. And then three vectors, three directions of data, so it's a big dataset. Travis Oliphant: It didn't fit in memory in MATLAB, though I could get it to fit if I used single-precision floats instead of doubles. So 32 bits instead of 64 bits. But MATLAB didn't have a good floating point type, so I started to look around, "Well, what else could I use here on the internet?" So back in the day ... Travis Oliphant: And Google was starting to become useful back then. I found Linux as a graduate student. I was pretty much a typical ... I used VAX/VMS as a master's degree student.
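The memory argument above (32-bit floats instead of 64-bit doubles for a five-dimensional dataset) can be sketched with today's NumPy. This is only an illustration: the array shape below is hypothetical, since the interview does not give the real dimensions of the elastography data.

```python
import numpy as np

# Hypothetical layout: (time, x, y, z, vector component).
# The actual dataset dimensions are not stated in the interview.
shape = (8, 32, 32, 16, 3)

as_double = np.zeros(shape, dtype=np.float64)  # 64 bits per value
as_single = np.zeros(shape, dtype=np.float32)  # 32 bits per value

print(as_single.ndim)                      # number of dimensions
print(as_double.nbytes, as_single.nbytes)  # memory used by each version

# Differentiate along the time axis (axis 0) with finite differences.
d_dt = np.gradient(as_single, axis=0)
print(d_dt.shape)
```

Storing the same values in single precision halves the memory footprint, which is the trade-off described for making the data fit.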
Many people don't know what that is, but there were other things besides Windows and I'd learned Unix, and if I wasn't at home, I still used Windows. Travis Oliphant: As a grad student, you have more time than sense, usually, and so I installed Linux and started to get to know the open-source ecosystem around that. And I found that I really liked it. It was a hobbyist place. It empowered me to do stuff. Often, I would spend a lot of time. I remember trying to figure out a hard drive. I got a new hard drive for my box, it was a Linux box, but the kernel didn't support it, so I had to go figure out a module and debug a kernel module. Travis Oliphant: So crazy stuff like that, but it was fun as well, and I was enough of a programmer to feel that I could do that. And then I started to look on the internet, see what else is out there. Maybe there's some other open-source things out there that can help me with this problem I have at work or at school. I was in my PhD program. Travis Oliphant: I found lots of things, but I started to find Python with its nascent array module called Numeric. Numeric was written in 1994, started in 1994. 1995 is when it actually came out, written by Jim Hugunin. And as I looked through the internet archives, I found the mailing list. And we all communicated over mailing lists back then. We didn't have the other ways to communicate. We just had a mailing list. Travis Oliphant: But it was great because I could basically see intelligent people talking about hard problems dispassionately, just talking about the problems and just saying, "Hey, this is how it works." And "Oh, it's cool. I could learn a lot." So I learned a ton about how people were trying to approach the problem of array computing in Python. Travis Oliphant: So I found Python. It was similar enough. It was high level like MATLAB. It was similar enough. The syntax wasn't too crazy. So I thought, "Oh, this is actually pretty good to use."
I previously tried Perl, and I liked Perl for some of the same flexibility reasons, but I struggled because, when I came back to what I'd written a month earlier, I couldn't understand what I'd written. It was too cryptic. It didn't leverage enough of my language center. I'd sort of have to learn a new language instead of leveraging what I already understood. Travis Oliphant: So I found Perl hard to maintain. So when I came and saw Python, I was like, "Hey, this is pretty nice," and I started to use it, but it lacked a lot of capabilities. So Numeric existed and Numeric had a floating-point type. And so I could basically do some experiments like I was doing in MATLAB with the data I had, but using Numeric and floating point, and it worked. I went, "Oh, this works pretty well." So I was really grateful to the community that existed, and I got to know it. Travis Oliphant: This is 1998, 1997, '98 when I started to come into the community, but it existed already. Jim Hugunin had written Numeric in '95, Paul Dubois, David Ascher, Konrad Hinsen. There were many people who were basically active, talking about this array object, what they'd done, what they'd built. I went, "That's cool. Well, all that's missing is a bunch of libraries to support this." In fact, there was even some dialogue going on, "Man, wouldn't it be great if we had an optimization library or if we had some way to do ... " Travis Oliphant: So as a graduate student in a PhD program, I didn't tell my wife, but I said, "I want to go do some open-source coding." And this story has played out and I've heard it 10 or 12 other times, and probably 100 other times, it's played out. The future's your own. Your PhD is, you have classes and you have to find some project to work on, and so there's time to explore. And I used that time to explore open source and open-source contribution, and I found that I really liked it. Travis Oliphant: So '98, I did that. In 1999, I basically said, "Well, you know what? Dude, I'm a grad student.
I finished my classes. I just got my PhD dissertation to write. Why don't I spend a few months writing some libraries and linking them to Python, and making them available to others, and see where we go?" So I started with ... scratched my own itch. Do things that I needed. I was studying MRI, like I said, and so I needed ordinary differential equations. So the Bloch equations are the classical simulation or model of how MRI works. Travis Oliphant: And so I wanted to model that and use ... It's solving an ordinary differential equation problem. So, "Okay, well, let's see. Oh look there. There's a whole bunch of FORTRAN codes that actually do ordinary differential equation solving." And since Python's extensible, what I became an expert on was extending Python. Travis Oliphant: I learned the Python/C API, I learned how to take C, C++, and FORTRAN code and link it to Python. If anything, that was the big skill that I had, was the ability to ... So I was an applied mathematician who understood the math, but then I could have the skill of linking old FORTRAN codes to Python. And that would really set the stage. Eric Anderson: And we think today of Python as the data science language, but I don't imagine that was the case then, right? Travis Oliphant: No, no, not at all. Back then, Python was somewhat obscure. Java was starting to grow. It wasn't too popular back then. A lot of C. A lot of FORTRAN in my area. A lot of FORTRAN, and then kind of a hodgepodge. MATLAB was very popular, Java was growing in the business world. A lot of C++. Travis Oliphant: In fact, I didn't know much about the business world, what they were using. I don't know what they were using. R was growing in popularity too. At the time, S was popular, but there were a lot of different platforms. The big thing you have to recall is, back in that day, you had Windows, then you had HP-UX, and AIX, and SunOS, and like 7 or 10 different Unix variants.
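As a rough illustration of the kind of problem described above, here is a minimal sketch that solves a simplified form of the Bloch equations (free precession with relaxation, in a rotating frame) using `scipy.integrate.solve_ivp`. The relaxation times, off-resonance frequency, and function names are assumptions chosen for the example, and the default solver used here is not necessarily one of the FORTRAN routines mentioned in the interview.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative values only (seconds and rad/s); not taken from the interview.
T1, T2 = 1.0, 0.1            # longitudinal and transverse relaxation times
M0 = 1.0                     # equilibrium magnetization
d_omega = 2 * np.pi * 10.0   # off-resonance precession frequency

def bloch_rhs(t, m):
    """Right-hand side of the simplified Bloch equations."""
    mx, my, mz = m
    return [
        d_omega * my - mx / T2,   # transverse x: precession plus T2 decay
        -d_omega * mx - my / T2,  # transverse y: precession plus T2 decay
        (M0 - mz) / T1,           # longitudinal recovery toward M0
    ]

# Start with the magnetization tipped fully into the transverse plane.
t_end = 0.5
sol = solve_ivp(bloch_rhs, (0.0, t_end), [1.0, 0.0, 0.0],
                rtol=1e-8, atol=1e-10)

mx, my, mz = sol.y[:, -1]
transverse = np.hypot(mx, my)

# Closed-form values for this special case, for comparison.
print(transverse, np.exp(-t_end / T2))
print(mz, M0 * (1.0 - np.exp(-t_end / T1)))
```

For this special case the transverse magnitude decays as exp(-t/T2) and the longitudinal component recovers as M0(1 - exp(-t/T1)), so the numerical result can be checked against those expressions.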
Travis Oliphant: Linux was entering, disrupting those variants. So I would run software on all those different kinds of variants. There's a lab at the Mayo Clinic. They were a biomedical imaging lab that put out software for bioimaging, and they would test that software in this lab on like seven different Unix variants. Travis Oliphant: And so, because I was a student there, I had an account on all seven of those boxes. So I'd play with those versions of Unix and try out, "Does this Python compile on those platforms? Can I link a C extension? How do I actually compile for FORTRAN?" Travis Oliphant: So a lot of my days in '99 were spent ... And I've gone back to the mailing list and actually looked at my announcements. You can see, basically, over a course of like every three months, this yahoo out of Mayo Clinic would show up saying, "Hey, I got a new package. Here's a website. Go download it." Travis Oliphant: Of course, what it meant was, here's a tarball. What the process back then meant was I would create a tarball or tar up the code, put it on a website that I created that was really ugly, but someone would go and download it, and then compile it. There was a lot of work to actually take the work that I had done. It wasn't that simple. Travis Oliphant: It was a lot of work to get it and use it, but people did. During that year, I got really excited because people did. I got emails from Estonia, from South Africa, from somewhere in the United States, and people were like, "Oh, thanks. I used it, installed it. Hey, I have this suggestion." Travis Oliphant: I went, "Wow, that's pretty cool. I'm collaborating with people all over the world around this thing that is helping them in their work." That was very addicting. At the end of the day, that's what I wanted to do, was just help people and do stuff that mattered to them, and this was what was happening. Travis Oliphant: It didn't really help my PhD program. I mean, it didn't.
I used it, but it wasn't directly responsible for what I had to do to graduate and convince my committee to give me a degree. But I really enjoyed this work, and so I released a bunch of modules. Travis Oliphant: I put it out under the name Multipack. In 1999, I released the module, then another one, then another one. The first one actually was in 1998, in December. I'll tell this story because it's really valuable, I think, to understand how does knowledge get shared? Travis Oliphant: I was a young kid, didn't know a whole lot. There was a guy at the University of Illinois, Mike Miller was his name, right? He wrote a package called Table I/O. Table I/O was just how to read data from a table, from a CSV file, into a data structure in Python. But the key thing is, he released his code and I could see, "Now, how did he write this extension module? How did this work? How did he actually link the C routines to something that Python could read?" Travis Oliphant: And I basically took his module and said, "Oh, I can cut and paste that." Take his starting point, start inserting my own, revamping it, and that's how I released my very first extension module, was basically reading his stuff, looking at his stuff. And then Guido wrote a detailed blog post on how reference counting works in Python. Kids today don't actually worry about that much because they use things like Cython and F2PY, and other tools to write extension modules. Travis Oliphant: But back in the day, you had to count references. That's the hardest part about extension-module writing in Python. That's how it manages objects. Instead of garbage collection, it uses reference counting. So you had to know that in C, or else you'd create memory leaks. Travis Oliphant: If you didn't handle the reference count properly, then objects would stay hanging around and they wouldn't get cleaned up. And so you'd just blow up your memory really quickly, especially when your objects are 100 megabytes of data.
And those get created every time you add two 100-megabyte arrays, you get another 100 megabytes of data. Travis Oliphant: An intermediate calculation is supposed to be created, then disappear. But if your reference counting is not right, it will stay around. And so you have all these objects floating in memory and you just run out of memory really quickly. So I learned about that. Travis Oliphant: But they shared their stuff, so I could release my first module called NumPy I/O, and that was just a simple way to read data, particularly medical imaging data from DICOM format, or a particular format called Analyze, from the package that we worked on there at the Mayo Clinic. Travis Oliphant: That got me started down the process, and then I discovered on the internet, you say, "Oh, there's all these FORTRAN modules that were written in the '70s, and finalized in the '80s, that still work if you can get a FORTRAN compiler and then link them to Python." Travis Oliphant: And so pretty soon we had optimization routines, we had ordinary differential equation solutions, we had special functions, and they were just available as extensions to Python that you installed separately. You'd have to compile them. Travis Oliphant: And then a guy named Robert Kern, I didn't know at the time, but he was, I think, 17, 18 years old. He was really young. I didn't know this because the internet back then, you were known by your typing. There's no picture or photograph. You had a handle and a typing. Travis Oliphant: But he basically took my tarballs and then released Windows installers for those tarballs. So he did that work and released the Windows installers. And what do you know? As soon as those installers were available, lots of people started downloading them. Lots of people started using them.
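The reference-counting problem described above can be imitated from pure Python. In this sketch the `leaked` list is a hypothetical stand-in for a C extension that forgot to release a reference, and the behavior shown (an object being freed the moment its last reference goes away) is specific to the CPython interpreter.

```python
import sys
import weakref
import numpy as np

# Stand-in for a C extension that forgot to release a reference
# (the equivalent of a missing Py_DECREF). Purely illustrative.
leaked = []

def add_arrays(a, b, forget_to_release=False):
    """Add two arrays and return a weak reference to the temporary result."""
    result = a + b                 # a new array is allocated here
    if forget_to_release:
        leaked.append(result)      # extra reference that is never dropped
    return weakref.ref(result)     # a weak reference does not keep it alive

a = np.ones(100_000)
b = np.ones(100_000)

# Correct counting: the temporary is freed when the function returns.
freed = add_arrays(a, b)() is None

# Miscounted references: the temporary survives and its memory stays in use.
still_alive = add_arrays(a, b, forget_to_release=True)() is not None

# sys.getrefcount shows the count going up by one per extra reference.
before = sys.getrefcount(a)
alias = a
after = sys.getrefcount(a)

print(freed, still_alive, after - before)
```

With arrays of 100 megabytes instead of the small ones used here, every such lingering temporary would hold on to another 100 megabytes.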
Travis Oliphant: So that was the first lesson I had that distribution matters: making it easy to install is the quickest way to get people to use your stuff, because my first approaches were just to put tarballs out there. Eric Anderson: Yeah, you're a mailing list package manager, basically. Travis Oliphant: Yeah, correct. Exactly. People today may know me for Anaconda, which we won't get into, but how in the world did a scientist from the Mayo Clinic become a founder of a packaging company? What? How does that work? Well, that's kind of why, is because along the way and releasing ... Travis Oliphant: That was '99. I also met Pearu Peterson that year. He started working on ... He said, "What are you doing? You're manually wrapping all these FORTRAN libraries. That's insane." That's how you can tell the difference between someone like me, who's more of a domain expert who uses computers, and a real computer scientist. Travis Oliphant: A real computer scientist finds a problem and tries to automate it, right? And sometimes, he'll automate things that don't need to be automated because they're not used that often. But they're always looking for that. I'm kind of like, "Let's get the job done. Let's just solve a problem. I have some other things I want to do." Travis Oliphant: So he looked at that and goes, "What are you doing? You're kind of boiling the ocean, trying to wrap every FORTRAN module in existence." So he wrote something called F2PY. It's a beautiful piece of software actually. F2PY would parse your FORTRAN code and automatically generate the extension module. So effectively, it was an auto generator for SciPy. Travis Oliphant: Now there were a lot of details there, and we had a lot of fun that year. I helped him make F2PY more ... Some modules, there's something called reentrancy, like if you call optimize on a Python function, but what if you're going to do an integration? So you're going to integrate a function.
What if that function itself has to call integrate? Travis Oliphant: So all of a sudden, your integrate might call ... and the key piece is, you're calling a FORTRAN routine, and that FORTRAN routine is going to call back into Python, because it's passed the Python function to integrate. So you've got to handle that somehow. How are you handling that transition back and forth? Travis Oliphant: And then what if the Python function then calls back into your integrate routine, right? Because you're doing a double integral, so you have reentrancy. So I helped him handle it, set up the stack and the variables so that F2PY could create reentrant extensions. Details like that, because I'd done it by hand, then he could make it work. Travis Oliphant: So there's a lot of details like that, that we worked out. But F2PY has been a beautiful tool. It's still useful, still used today. And I get to work with Pearu now too. Actually, 20 years later, we started working together, finally, formally, even though we've been informal colleagues for decades. Travis Oliphant: So that's fun. It's kind of full circle, but that's how SciPy started. SciPy started with me doing that, people going, "Oh, that's cool. That's super helpful." And then pretty soon, we had like eight, nine modules, and shipped them all as Multipack extensions. Travis Oliphant: And then a guy named Eric Jones and Travis Vaught, they started a company called Enthought in 2001. They called me. I was finishing my degree at that point. I started that in '99. I finished my degree in 2001. So '99 you can look at it, and I basically delayed my graduation by a year, by focusing on open source rather than doing my dissertation. Travis Oliphant: But I did my dissertation work too. But by the time I finished, I ended up ... I thought I'd go to work in industry, but I got a job in academia. My alma mater called and said, "Hey, why don't you come back and apply for a tenure track position?" Travis Oliphant: So I went, "Okay, yeah."
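The reentrancy scenario described above (an integration routine whose integrand itself calls the integration routine) can be sketched with present-day `scipy.integrate.quad`. The integrand below is an arbitrary example chosen so the answer is easy to verify by hand; it says nothing about how the wrappers are implemented internally.

```python
from scipy.integrate import quad

def inner_integral(x):
    """Integrate x * y over y in [0, 1]; analytically this equals x / 2."""
    # This call happens while the outer quad() call is still running,
    # so the integration routine is entered a second time.
    value, _error = quad(lambda y: x * y, 0.0, 1.0)
    return value

# Outer integral over x in [0, 2] of the inner integral.
# Analytically: the integral of x / 2 from 0 to 2 is 1.
double_integral, _error = quad(inner_integral, 0.0, 2.0)
print(double_integral)
```

If the wrapper kept its callback and working state in a single global slot, the inner call would overwrite what the outer call needs, which is why the stack handling mentioned in the interview matters.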
I mean, I decided to primarily because that's where my kids' grandparents were. They could see the family and I had three kids by that time, and my wife wanted to move back to be near family. So we did that. Travis Oliphant: At the same time, I got a call from Eric and Travis, who were starting their company. And they said, "Why don't you come work with us? We're trying to do this company." And they came up with the name SciPy. So I came up with ... I've gotten better at naming, but I'm not an awesome namer. My name was Multipack, right? Travis Oliphant: They came up with the name SciPy and they wanted to build, essentially, an overall environment for scientific computing for Python. So they said, "Let's work on this together." And so Eric had a couple of modules he brought and then Pearu brought some modules too. Travis Oliphant: And then I brought all the Multipack, and we had the idea to create SciPy. That was in 2000. So we worked on SciPy and released the SciPy project. 2001 is when it first came out, based on this work that had been done before. And SciPy, it was trying to be everything. It was really trying to be an environment, almost like an equivalent of MATLAB. Travis Oliphant: What I realized later is that the biggest part of SciPy was distribution. We spent a lot of time just trying to make sure people could install it on Windows, in particular. So a lot of time was spent on just packaging. In fact, I often say SciPy was the first distribution for Python masquerading as a single library. It was a collection of massive extensions, but really just helping people install it. Travis Oliphant: About that same time, 2001, maybe a couple of years later, people started realizing, "Well, is SciPy going to have everything?" One of the challenges of open source is that open source is about communities coming together, rather than a single governance structure. But how much can that manage and what's the topology of governance over the contributions?
Travis Oliphant: The scope of SciPy was, "Hey, every scientific extension, let's put in one module." Well, how can you really manage that? People who are good at statistics versus good at differential equations, maybe they want to be differently governed. And so basically, we were very loose knit. There wasn't much structure. Travis Oliphant: Enthought as a company was trying to start, but as a small company, doing consulting. It wasn't like they had the resources to invest in a large team, trying to manage the SciPy governance or SciPy story. They did start a SciPy conference. It's still going strong today, and that helped unify the community and grow the community. Travis Oliphant: So from 2001 to 2004 or '05, SciPy existed and was growing. I had a graduate student at ... BYU was where I taught. I went back and taught there. It was my undergraduate alma mater and I went back and taught there. I had a graduate student that actually worked on SciPy modules for iterative algorithms for linear algebra. Travis Oliphant: So Krylov subspace algorithms, iterative algorithms for optimization. We'd add things to SciPy to make it better and better over those years. In 2004 ... well, it actually started in 2000, I got a call from Perry Greenfield from the Space Telescope Science Institute. Travis Oliphant: Before I left the Mayo Clinic to come to BYU, he was saying, "Well, we want to use Python for processing Hubble Space Telescope images." He was seeing the same thing I saw, which is an easy language to learn. If we add a few things to it, it actually can replace ... For them, it was something called IDL. For me, it was MATLAB. For them, it was IDL. Travis Oliphant: They said, "We can replace this with an open language that has more community feel, and [inaudible 00:21:45]." They were pushing that direction, but they said, "We need the array object to be stronger." Numeric, which we were using for SciPy, they wanted to be better, stronger.
Travis Oliphant: So we had early conversations in 2000 about, "Well, what can we do?" What changes could we make, what fixes to NumPy could happen? We started talking about that. They eventually went to work. I think it was in 2001, about the same time. They actually went to work on a project called NumArray, and NumArray was an array object, a new array object. Travis Oliphant: So over 2001 to 2004, you basically had this world where Numeric was the established array object, and we had SciPy built on it. And then NumArray was starting to emerge as a new array object, right? It's very similar to today, where we have PyTorch and TensorFlow, and they're different systems, but they kind of do the same thing. Travis Oliphant: And that's fine to have competing implementations, except we were very young. I mean, there was a handful of people in this community, maybe a hundred developers, right? And then maybe 10,000, 100,000 users at that point. Maybe, it was creeping up to a million by then. Travis Oliphant: But effectively, I started to see the split in the community. So 2004, 2005, we'd see ... Remember, SciPy was my baby. That's what I'd put out to the world, what I spent so much time on creating, and growing, and helping to create, and then NumArray came out there, and then effectively some modules being written that just worked on NumArray. Travis Oliphant: So in particular, there was one module called ndimage. I studied at the Mayo Clinic, biomedical imaging, and one of my classes was morphology and image processing. And I always wanted to have a morphology library for Python. So morphology, for those that don't know, it's an image processing technique where you apply set theory locally to ... there's dilation and erosion, and opening and closing images. Travis Oliphant: You basically deal with gray scale primarily in masking, like how do you apply a mask to find edges and to break up ... Imagine an image of a brain.
And you're trying to figure out connectivity and you have noise in it. And you're trying to figure out where the real connections are, and how you thin out the other ones. Travis Oliphant: So morphology is a tool people would often use for that purpose. And I thought, "Oh, that ... " I knew about morphology. I knew how it worked. I said, "It'll be great to have a morphology library in Python." Took time to build that though. So in about 2004, I think, a project called ndimage showed up. A smart developer, a smart creator, created this thing called ndimage on NumArray, right? Travis Oliphant: Well, NumArray is the future. So he wrote on NumArray. That's great, but then, oh, SciPy is on Numeric. And the problem is they use different memory. So it's not like I can take ... if I have my Numeric-based array, I'd have to copy it over to a NumArray array. There really weren't good ways for them to share memory and share data. Travis Oliphant: They could be built, were starting to be built, but effectively, there was this mental split happening where people were, "Oh, I have to use NumArray. I can't use both." There's sort of this big angst in the community. John Hunter, at the time, had written something called Matplotlib. Travis Oliphant: And Python, as many people don't understand, is a story of many players, many individual mavericks basically, or people who've put a lot of effort into growing different parts of the ecosystem. There's Wes McKinney on pandas. And the key part of that is, it's not alone. Travis Oliphant: I did some stuff to get it started, but the only reason anything's successful is because lots of other people jumped in. And a key part of growing the community is actually enabling that contribution, enabling that participation, so that you can get lots of people, both users and contributors, supporting your idea, especially early on, so it continues to grow.
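To make the morphology idea concrete, here is a minimal sketch using today's `scipy.ndimage`, the descendant of the ndimage package mentioned above. The tiny synthetic image is invented for illustration: two solid regions plus two isolated noise pixels, cleaned with a binary opening (an erosion followed by a dilation).

```python
import numpy as np
from scipy import ndimage

# Synthetic binary image: two solid blobs and two isolated noise pixels.
image = np.zeros((12, 12), dtype=bool)
image[2:6, 2:6] = True      # first blob (4 x 4)
image[7:11, 6:11] = True    # second blob (4 x 5)
image[0, 11] = True         # noise pixel
image[11, 0] = True         # noise pixel

# Count connected regions before cleaning (noise counts as regions too).
_, regions_before = ndimage.label(image)

# Opening removes anything smaller than the 3 x 3 structuring element
# while restoring the larger blobs to their original extent.
structure = np.ones((3, 3), dtype=bool)
cleaned = ndimage.binary_opening(image, structure=structure)

_, regions_after = ndimage.label(cleaned)
print(regions_before, regions_after)
```

The same erosion and dilation building blocks, applied with different structuring elements, are what the thinning and connectivity analysis described in the interview are built from.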
Travis Oliphant: And John Hunter did that for visualization. We had some nascent visualization in SciPy. He came around in 2001 and said, "Yeah, that's not good enough." And he built something himself called Matplotlib that was a much better visualization tool. And he had that problem too. He said, "Oh, NumArray is starting to show up. People want support for NumArray. They want support for Numeric." Travis Oliphant: And so he built just a little module called Numerix, with an X, that was just simply an API module. It was like a layer between them, so he could write to that API and then depending on what you had installed, it would use the array object you want. So he put in that shim layer. Travis Oliphant: There's an old saying that there's no problem in computer science that can't be solved with another layer of abstraction, which is kind of true, right? Now, there are consequences to that abstraction, and ultimately you can end up with latency and trouble. But at any rate he did that. That was okay. It was actually a reasonable solution. Travis Oliphant: With ndimage being released and kind of splitting the community in my ... I was uncomfortable. I didn't like it. I was like, "Oh, can't we just have NumArray and Numeric work together, and have some [inaudible 00:26:27] worked?" So I was thinking about that in 2004. Travis Oliphant: Spring of 2005, summer of 2005, I was teaching a class on building MRIs. So we were going to basically design and maybe build, most likely just design and simulate, an actual small MRI machine. And it got canceled. Didn't have enough people sign up for it, so I ended up without a class to teach. Travis Oliphant: And so I'm sitting there, "Oh, I got some free time. Why don't I ... " Even though my tenure committee had told me I was spending too much time in open source and I should be writing more papers, I was still really bothered by this thing in the world that had the split happening.
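The shim-layer idea behind Numerix can be sketched in a few lines. This is not Matplotlib's actual Numerix code: the class and function names are hypothetical, and because Numeric and NumArray are long retired, the sketch chooses between NumPy and a plain-list fallback instead.

```python
import importlib

class ListBackend:
    """Fallback backend built on plain Python lists (illustrative only)."""
    name = "list"

    def zeros(self, n):
        return [0.0] * n

    def add(self, a, b):
        return [x + y for x, y in zip(a, b)]

class NumpyBackend:
    """Backend that forwards the same two calls to NumPy."""
    name = "numpy"

    def __init__(self, module):
        self._np = module

    def zeros(self, n):
        return self._np.zeros(n)

    def add(self, a, b):
        return self._np.add(a, b)

def select_backend(prefer_numpy=True):
    """Return a backend object based on what is installed."""
    if prefer_numpy:
        try:
            return NumpyBackend(importlib.import_module("numpy"))
        except ImportError:
            pass  # fall through to the list backend
    return ListBackend()

# Calling code is written once against the shim, not against a library.
backend = select_backend()
total = backend.add(backend.zeros(3), [1.0, 2.0, 3.0])
print(backend.name, list(total))
```

The cost hinted at in the interview shows up here too: every operation goes through one extra call, and the shim can only expose what all backends have in common.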
Travis Oliphant: And I knew a ton about Numeric. I'd been in conversation with the NumArray people, probably one of four or five people in the world that had enough information and knowledge to do it. And then I had some time. I said, "Well, I think I need to do this." So for about four months, I just wrote NumPy. Travis Oliphant: I basically said, "Well, I know Numeric. I know the features NumArray is adding. And let's see if I can't use the code base of Numeric and the infrastructure." NumArray had taken a different approach. They were just Python only, then kind of augmented with C extensions. That was their approach. Travis Oliphant: And it had consequences for smaller arrays. Small arrays were a lot slower in NumArray because of that. So I went back to Numeric, said, "Well, let's start with a C extension and then add the features of NumArray." That was the idea in the program, and so I just went about and did it. Travis Oliphant: I said, "Hey, I'm going to do this." People went, "Man, okay, good luck." They were incredulous, actually, just because it's hard to do something like that. This was back in the day, with a small enough community. And it is hard. It is. But I was young and ambitious. Travis Oliphant: The hardest part was actually ... I did a lot of things badly. In fact, I look back and I wish somebody senior was there to help me, someone who knew more about type theory. I wish I was there to help my younger me with type theory. I was trying to unify a type system. Travis Oliphant: And if, actually, I had a better understanding of type theory, we could've come out with something much better, because that is still a challenge that we have in the Python ecosystem is, how to handle types. And for the different purposes, and different use cases, different reasons there around. Travis Oliphant: There's other things that I would've done differently for sure too. But we did introduce a more extensive concept of type and we actually called it dtype.
In fact, today, it's kind of ... I look around and everybody's using dtype, and I went, "I just pulled that name out of my head. That was not a thing ... " Travis Oliphant: I was like, "Oh, should I use this? I don't know, what do we call this thing? Well, we'll call it dtype." And now it's everywhere. And I'm like, "Okay, well, you can even be wrong and show up everywhere." That's probably the one contribution of NumPy that's significant, the type system. Travis Oliphant: Even though I criticize it and say it could be better, and wish it were better, that was something we did: try to improve the type system in NumPy. So I started that effort. Probably about six months later, people said, "Oh, maybe it's going to work." Then I started to get some contributors, some other people jumping in and helping. And they were significant, really helpful. Travis Oliphant: Robert Kern took the Mersenne Twister algorithm, and we used Pyrex at the time. It's now Cython. We started to use Cython extensions. Charles Harris came in early on and helped with a whole lot of lifting for some of the libraries that were used, because NumPy was more than just, "Here's an array object." It also had a bunch of libraries. Travis Oliphant: And we had the challenge in NumPy to actually unify Numeric and NumArray. We had to let Numeric users adopt NumPy without it being too difficult, so they could easily migrate. And that meant they could recompile their code. There's a C-API that had to be handled and a Python API that had to be handled. Travis Oliphant: And then NumArray as well, even though it was a little more nascent, we had to support it as well. And my goal, of course, was to get ndimage onto NumPy, and then SciPy onto NumPy. So effectively, our test suite was these extension modules, these other libraries. Travis Oliphant: And that's why, for a long time, NumPy didn't have a great test suite, because the test suite was actually those libraries that we had to support.
It's better now, but early on that was one of the criticisms of NumPy. They'd say, "Where's the test suite?" "Well, we were using these libraries as a test suite." Travis Oliphant: So that's the story, kind of long-winded, but trying to give the color, and the motivation, and what was going on in my head. Another person I should mention is Francesc Alted. Francesc Alted was a Spanish person, who I also had the pleasure to work with later. Travis Oliphant: He did a lot of testing of the new record data types. So we had the ability to have a nested data type in NumPy, which was significant because that actually, I think, led to pandas, basically because the record data type gave you the ability to do a data frame, or gave you the appearance of being able to do a data frame. Travis Oliphant: So people started to use it for that purpose until they went, "Well, it's a little bit ... " To be clear, NumPy gives you the ability to do arrays of records. You have an array where every element is this long, nested record. A data frame is kind of the opposite. They do a record of arrays. They have the arrays as separate items. So instead of having an array of records, it's a record of arrays. Travis Oliphant: And then there are just the memory allocation details. There are some details of what they do differently that were sensible. And we had that discussion early on, but we adopted ... We had to support NumPy's main use case and just extend it to these record dtypes. Travis Oliphant: But Francesc Alted really proved those out. He really put a lot of effort into, "Do they work? Do they not work? Where are the assumptions?" And he really helped test that out and make sure it worked. That whole process took about ... I thought it would take four months. Travis Oliphant: It took about six months to write NumPy, but it took about another year to work out all the kinks and the bugs, so that I could tell someone, "Yeah, use it. It's ready to use." So that's November, fall of 2006. So I started in January 2005.
About fall of 2006, it was usable. Eric Anderson: I mean, this is the common two-options-split problem. And you solved it by adding a third option, which we always joke about. But in fact, it worked this time. You got everybody to move over to NumPy. Travis Oliphant: It worked this time, but it was a lot of work, and we were intentional about that. That's where we had to make it backwards compatible. We had to think about each of those audiences who were trying to adopt this new platform independently and serve their needs, right? Travis Oliphant: That's the key thing. You can't just go out and put out a, "Hey, it solves both the problems." Well, does it? Have you talked to people who were actually trying to use Numeric about what they can do when they come over to NumPy? Now, not everybody did. Like 99.9% of people did come over to NumPy. Travis Oliphant: There were still a few holdouts, because I'd run into them occasionally. "Oh, I'm still using Numeric." "Okay. Well, that's a good thing about open source, is you can. There's nothing forcing you to switch." And in fact, in some cases, for legacy hardware, legacy systems, it's okay. You can keep it on an old version. You don't really have to move it forward. Travis Oliphant: The challenge is if you want people to maintain it or help you with it, you're not going to get that unless you move forward, because the community, that's where people are going to learn. So the key thing for me was, I knew I'd succeeded with NumPy when John Hunter made NumPy his dependency. When he ripped out Numerix and depended on NumPy, that was actually the key. Travis Oliphant: I told him afterwards, but that was when, "Okay, this is going to work." At that point, the snowball effect happens, because before then, people are like, "Well, is this really going to work," or, "Should I rely on it?" But we had enough carrots. We had enough features. Things were faster. It had better features than Numeric.
Travis Oliphant: It was faster than NumArray, stuff like that, that could get people excited. But you do have to think about your audience on the other end, and how you're going to get them to use it. And that's real work. So we have the same problem today in the world with TensorFlow and PyTorch, which has been talked about over the past couple of years, like, "Well, this is ... " Travis Oliphant: In fact, what is there today makes what I was solving really seem silly. I mean, it's like it was an exercise, a science project, right? Now, we have massive code bases on these two platforms. I don't think a single Tensor object to unify them all is the right answer at this point. Travis Oliphant: I've gone back, actually, to John Hunter's original strategy of a layer, so that downstream users don't have to think about the details. It's much more about API commonality and letting downstream users, people who are consuming the Tensor, not have to know what some user has actually implemented, particularly for library authors. Travis Oliphant: If you're writing a library to do something more, how can you make that library able to support both TensorFlow and PyTorch, and NumPy as well? MXNet ... We've been talking to other people about, "How do we do this?" The lessons learned from the past are still real. Travis Oliphant: There are a lot of great stories from the past too. I've actually contemplated writing a book to tell some of these stories about where some of these things actually came from, because it was a very tight-knit group, and then it bloomed. And I don't know, since 2015, I certainly haven't been able to track everything, but up until about 2012, '13, you could kind of know everybody that was significant doing interesting things in this space. Travis Oliphant: And I had the good pleasure of knowing a lot of them and calling them my friends.
So that's probably the summary, a long summary, but I don't know if you have any particular follow-ups on some of that story. Eric Anderson: Well, I know we don't want to spend a lot of time on packaging, but you brought up distribution. And I'm just curious, was PyPI, or pip as it's called, not there at the time? Travis Oliphant: It was not there. No, no, not at all. In fact, that whole story of distributing packages in Python goes back a long way. One big problem for NumPy was that distribution, right? At the time we had ... There are different parts of the distribution problem, which is often why it's hard for people to wrap their heads around, because there are different user stories and job stories that are all packaged into the same conversation sometimes. Travis Oliphant: And therefore, you end up with people arguing past each other because people are focused on their own use case, not recognizing this other use case that they're ignoring, or assuming doesn't exist, or assuming people don't have a problem with. That's constant in this community. Travis Oliphant: Particularly in Python, while it's general purpose, the primary creators, the primary contributors to the Python movement have primarily been web and system administrator kinds of people, like computer scientists. There haven't been that many core Python developers who are as, let's say, science familiar, and familiar with why a NumPy exists and why a SciPy exists. Travis Oliphant: They sort of see them as, "Okay, great. It's another library," but they don't understand some of the, "Why did you have to write a new module," or, "Why are you calling out to C for all these capabilities?" So they don't appreciate some of that. And so sometimes their choices don't reflect some of those realities. Travis Oliphant: It's just not present. So that was definitely true with the creation of something called distutils. distutils is effectively a build system.
At the heart, in Python, it used to be, "How do I get something installed?" Well, my first way to get something installed in Python was a makefile. Travis Oliphant: You type make, and then the Python part is kind of separate. But there was this goal when people were thinking, "Oh, Python can be extended. Let's have Python drive the creation of that extension module." And that was distutils. And distutils and setuptools have their own history. It was all evolving at the same time. Travis Oliphant: I'll get the facts wrong if I try to go into it, so I won't. You could bring other people on to describe some of that, if you'd like. distutils suffered in that it could not create a FORTRAN extension. But SciPy was all FORTRAN extensions. So in fact, it was not serving us. It couldn't be used. Travis Oliphant: So in NumPy, we added the ability to compile a FORTRAN extension, called numpy.distutils, but that was a pain in the neck. It was so bad, that in fact ... It was Stephen Cook, I think, who was the guy who came in and actually made it work. And I was so grateful, because having to deal with that code was such a mess. Travis Oliphant: It helped me understand the code wasn't really extensible like it wanted to be. It wasn't serving its purpose of being an extensible system. We ended up rewriting it, basically, and calling it numpy.distutils. It used the name only. Later, setuptools showed up. So you could just say python setup.py, given the name. That was how you installed things. The setup.py. Travis Oliphant: And setup.py as a concept is reasonable, but the implementation of distutils is not reasonable. And then pip came later. pip didn't show up until, gosh, 2012, 2011. I could get my dates wrong there, but it was very late in the day, long after lots of stuff was out there, installable and installed. Travis Oliphant: And then we ended up having to write our own package manager called Conda later.
And that was after talking to the distutils folks and talking to Guido, and them saying, "Well, we're not really trying to solve your problem. So you'll probably have to do it yourself." So we said, "Okay. Well, we will." Travis Oliphant: Now the PyPA has evolved since. I would say, right now, just with the release of the pip dependency resolver, they're like 90% of the way. They were maybe 20% of the way when we started writing Conda. Now, it's maybe 90% of the way. Travis Oliphant: But there's still a fundamental difference, a user story difference between a user and a developer. And pip evolved from the Python developer, the person who's developing the package. And that's a very different user story than if I'm just a user of a package. And those two job stories have different constraints and requirements. Conda isn't a great developer package manager, for example. Never was trying to be, right? Travis Oliphant: But pip is not a great user package manager, and really has never claimed to try to be. But people sometimes conflate those. And sometimes, as a user of Python, you're kind of wearing both hats. So I think some of that is part of the continued confusion and challenge of the packaging story in Python, per se. But that's another story. Travis Oliphant: But as you've heard me describe with SciPy particularly, SciPy was all about distribution, how people get it installed. Most of the pain was actually build pain, of wrestling with compilers and the difference between a Windows compiler and how it handles references. Travis Oliphant: You don't have a pointer ... Some of your audience might be C programmers, but if you're a C programmer and you have a pointer to a structure, how does the compiler handle that? Well, it's actually not defined. And so the C compiler on Windows will compile that differently than a C compiler on another platform, and might handle it differently than the FORTRAN compiler on Windows.
Travis Oliphant: So now you're calling a FORTRAN routine from a C routine with a structure. It means you can't do certain things. If you pass pointers to complex numbers, you have to be careful about how that's linked, because the linker won't do the right thing for you, and you'll end up with a segfault. Travis Oliphant: The right number isn't passed to the routine that's expecting it, because it's a convention the compilers haven't agreed on, right? And if you're talking about a blended system like Python, where you're later adding modules that are compiled, then the convention the compilers use matters. Travis Oliphant: But if it isn't published or it isn't determined, then you're lost ... Those are the things I was learning on the fly, like, pulling my hair out over why this thing won't work. But that's the kind of stuff we'd wrestle with, and we got very good at it. Travis Oliphant: So there are a lot of factors involved in open-source software: getting contributors, getting distribution. NumPy's exceeded my expectations in terms of what it did to light off ... and really in conjunction with other activities. What NumPy did, I think, is it enabled Matplotlib, right? To not worry about this story, and then to grow. Travis Oliphant: It enabled pandas. It encouraged Wes to write pandas because he was starting ... He tried to use records, or rec arrays, in NumPy, and realized the limitations, and wanted something more. There were other hedge funds doing something similar. But he worked at a hedge fund where he got to release the software as open source, so that drove pandas. Travis Oliphant: And then the SciPy ecosystem, as it grew and grew and grew, it was like, "Oh, we gotta have a scikit ecosystem." The scikit ecosystem was basically like, "We've got to factor this. The science is big.
We've got to have lots of modules and we have to have projects develop independently, and not be dependent on the release schedule of one project and governance of one project." So that led to the scikits. Travis Oliphant: And scikit-learn was one of the most popular of those scikits, for machine learning. And I look at the adoption of Python and see NumPy and SciPy helped scikit-learn, pandas. And then Jupyter finally grew out of the IDE environment. Travis Oliphant: We had a little IDE in SciPy, originally. It was a little IDLE IDE and it was meant to be an environment, but that ended up being an entirely other problem space that would require its own efforts to develop. And Fernando Perez and Brian Granger pushed IPython initially, and then it became Jupyter. Travis Oliphant: And Jupyter, pandas, and scikit-learn were huge for influencing Python's adoption and dominance around science, with NumPy kind of undergirding it, and helping people unify around a common core. So I definitely got to see a lot of growth, and I've been amazed at it, actually. Did not expect this. Travis Oliphant: And then all of a sudden, 2015, when deep learning took off, all the major players ... I'd spent lots of time at Microsoft trying to help them understand the value of the domain expert, lots of time trying to help them understand. And the culture there is very developer-centric, and there's kind of this divide I've talked about before of the developers in Python versus the domain experts in Python. Travis Oliphant: And they don't always see eye to eye. True at Microsoft too. Now, of course, since 2015 and beyond, it's like all the major companies care about array computing. They had to change the name to tensors, but okay. They care about tensor computing and data frames. It's like, "Okay, now there's a bunch of people here." Travis Oliphant: They've got new good ideas, so it's been amazing to watch the explosion these days and, "Well, how do we ...
" I've just been looking for ways to continue to contribute, but it's been fun to see the growth and fun to see what happens when you facilitate the ability for people to cooperate together. Eric Anderson: Yeah, Travis, you've done a fantastic job of giving us the lay of the land and kind of the full story. We had somebody on for TensorFlow recently. And I think we should continue down this path with future stuff. Before we wrap up with you, can you tell us a little bit about, once you build SciPy and NumPy, I imagine you're at the heart of all of this activity, and maybe you wonder what you can do next. Do you bring governance to the projects? Do you start a company? Travis Oliphant: That's a great question. I think it depends dramatically. SciPy and NumPy had a lot of participants, right? I feel like I definitely did a lot to sacrifice some time and spend time that wasn't necessarily allocated, to just go create things and to create an energy. But the reason it was able to succeed was because of the willingness of others to jump in. Travis Oliphant: So in terms of creating a company, one of the challenges in the SciPy and NumPy ecosystem has been creating a company as well. There are a lot of people involved. There is some concern about, does one company show up and then try to take ownership of this space? Travis Oliphant: So what is that company? And there have been several companies around this space, but how does it interact with the community? So I've thought a lot about company interaction with communities, and how do you support that? In fact, that's the whole mission I'm on now with OpenTeams and Quansight: connecting companies and communities and doing it effectively for the benefit of both. Travis Oliphant: So that's one question to ask, and it really depends on the thing you're building, and what is there, and who participated in it. I think, secondly, success happens in a group.
It definitely requires individual energy, but that individual energy has to be channeled with cooperation from others, and enabling others to be empowered, to have ownership. Travis Oliphant: NumPy's an interesting story. I was involved heavily for a long time, but looking for people to come in. I didn't have an intention to have other people take over NumPy, but I didn't have money to work on it. Nobody was paying me to work on it. So I needed other people to work on it, and I could spend time when I had it. Travis Oliphant: And along the way, you get people and they spend their time, they mix their energy with it, and then they feel an ownership of it too. And that's good, but then it also means you have multiple stakeholders. Multiple people have to ... I think we could have done a better job of governance. I think that's an important thing these days to pay attention to: how are decisions made? Travis Oliphant: It doesn't work to have ... Once you get beyond five to seven people, just having everyone agree isn't going to work, just because it takes a lot of time. Not because they couldn't, but because it takes a lot of actually getting people together, and coordination, and facilitation to even know they've agreed, right? Even if they were going to, and that's assuming they would, and they sometimes don't. Travis Oliphant: So a lot of projects end up stalling because there's no clarity about how a decision's made, even though it'd be better, even for the project, to make a decision and move forward. Then other people could create other projects if they don't like those decisions, but at least there's clarity. Travis Oliphant: So I think that's what I would've done differently: organize governance, at least spell it out, and then iterate on the system. They've gotten better. Our systems for governance have gotten better. We had, in Python land, this BDFL notion, benevolent dictator for life. Travis Oliphant: It's kind of like having a king, almost politically.
Having a king can be useful in some cases, but eventually, at scale, it kind of wanes. It doesn't work very well, right? Unless the king is king in name only, basically, and is just a figurehead, where the actual governance is done more bureaucratically, I guess, or more democratically. Travis Oliphant: So I think that's one thing that's evolving in open source: governance. I make a point to differentiate between what I call community-driven open source, or CDOS, and company-backed open source, or CBOS. Both have value, and their difference is in governance. Travis Oliphant: Company-backed open source is when the governance is a company or a person. I also keep ... Single-person governed projects are also company-backed open-source projects, because it's the company ... It's DBA that person, right? Whereas community-driven is multiple stakeholders who actually participate in governance. Travis Oliphant: And I think all projects should think about themselves going through this migration. Maybe they start company-backed. It's most common that they start company-backed. And then how do they become community-driven, in order to take advantage of all the features people want in open source? Travis Oliphant: But that's what I'd look for now if I'm thinking about open source: governance. Governance is a really critical point. I wish we'd done a little better. It took a long time for SciPy and NumPy to evolve their governance models. I had a bit of an all-for-one, one-for-all mentality, and kind of, it'll work itself out. Travis Oliphant: Whoever has the time comes in and works. And when you don't have any money and you're just all volunteers, that's true. It's hard to establish governance if there's nobody funding it. But once you do have any money at all, I think you've got to think about governance pretty hard. Eric Anderson: Very good. Travis, super great to have you today. We've covered so much ground. I've learned a ton.
Thank you so much. Travis Oliphant: I love what you're doing for sure. Appreciate your advocating and promoting the open-source communities. Thanks so much. Eric Anderson: You can find today's show notes and past episodes at contributor.fyi. Until next time, I'm Eric Anderson and this has been Contributor.