Rajat Monga: One thing I used to say: "It's so hard to have so many users." But of course not having users... you don't really want that. Eric Anderson: This is Contributor, a podcast telling the stories behind the best open source projects and the communities that make them. I'm Eric Anderson. I'm joined today by Rajat Monga, who was one of the early co-creators of TensorFlow. Rajat, welcome to the show. Rajat Monga: Thanks, Eric. Pleasure to be here. Eric Anderson: Yeah. I think we're all excited about this one. TensorFlow almost needs no introduction. It's a massive project, maybe even a family of projects to some degree. To start us off, maybe you could tell us just briefly how you describe TensorFlow to people, and then we can get into the story of how it came to be. Rajat Monga: Yeah. The goal for the project was, "How do we democratize machine learning?" And where it is today, it's definitely well on its way towards that goal. In terms of what it does or what it is, it's a platform for machine learning that offers all sorts of tools and capabilities for different kinds of things, from training new models all the way to deploying them. You can train models on large-scale supercomputers and deploy them on really tiny microcontrollers. So there's a huge span of things that you can do with this. Eric Anderson: How did this happen? I was at Google when TensorFlow first emerged; of course you were as well. And if I had to guess, the TensorFlow story is closely tied to the Google Brain team and project. Is that right? Rajat Monga: That's correct. It came out of Google Brain, and in fact was a part of that team until very recently. Where it started from was, Google Brain itself started back in 2011, and we wanted to scale machine learning, specifically deep learning, up. To explore that, to make that happen, we built some infrastructure which was called DistBelief. And from 2011, over the next few years, we saw amazing success with that. We were able to scale it up. Lots of products across Google were using it. We were doing all kinds of new research that was published, and that's been very, very useful in this area. So around 2014 was when we were thinking, "Okay, this has been great, but we need to fix a lot of the problems and limitations that we have with DistBelief." And so Jeff Dean, around that time, said, "Okay, what if you step back and think about a new thing? What would that mean? How would we think about it?" And so we started talking about the next redo, so to speak, and that's how TensorFlow was born. Eric Anderson: Give us an idea of the scale of the operation. How big was the Google Brain team when it first started? You were there, Jeff was there. Who else? And then at this point of transition, kind of how big was the team? Rajat Monga: Yeah. So when Brain started, Jeff was part-time. That was a 20% project for him. He had a bunch of other things going on. Andrew Ng was there. He was also 20%, one day a week, and he was still a full-time professor at Stanford. And there were three of us, and then three engineers, and we were joined by a couple of interns from the Stanford group that Andrew had, and all of us started ... This was when DistBelief started. Then as we moved along, by the time we got to doing TensorFlow, on the infrastructure side, the software folks who were building software, including me and a few others, we were maybe fewer than 10. I would say probably somewhere between five and 10.
Overall, the research group was still maybe 30 people or so. Eric Anderson: That's amazing. Going back really quickly to the first days of Google Brain, did you know that you were kind of creating history? That you were among the first group that would then ... Andrew's gone separate ways. Jeff's now working on other things. You're working on other things. This was kind of the beginning of a new era. Did you have that sense? Rajat Monga: I think after a couple of years, honestly. At the beginning, we were all excited. We could see the potential, and it seemed like if this worked, there's a lot of impact it could have. We obviously didn't know if it would work or how it would work, and we were figuring all of that out. But a couple of years in, we had a lot more confidence that, yes, this was important. It was making a difference, and we expected it to make a lot more difference. Eric Anderson: Got it. And DistBelief was an internal Google project, right? And TensorFlow, at some point, the decision was made to make this a public, open source, community-led thing. Rajat Monga: That's correct. DistBelief started as an internal project. It was very tied to our internal infrastructure. So even if we had wanted to, there was just no way we could have open sourced that. Eric Anderson: Right. Rajat Monga: When we were talking about TensorFlow, one thought was, "Okay, should we just build on DistBelief, et cetera?" But in the end we said, "Okay, doing a new project is better. There are lots of advantages to that." Once we started down that route, a couple of months in, it was, again, Jeff's idea originally: "Why don't we open source it?" His thinking came from the fact that, at Google, we had built a lot of projects and published papers, including MapReduce, Bigtable, a whole bunch of others. For most of them, no open source projects came out of those from us; they were never [inaudible 00:05:17], even though the rest of the world started using the ideas. Rajat Monga: In this case, we were like, "If we put this out, not just published a paper, but actually published code with it, that would really move things faster for everyone, and would allow people even outside to improve the pace of research, really, and speed up this whole area." Looking back, I think that has definitely worked out well. It has really, I think, sped up the whole space. Deep learning itself has shifted from being something only some people could do. It's not like there weren't any tools before, but it was very much limited to researchers. I think now way more people do it because the tools are available. Eric Anderson: I remember being impressed when TensorFlow launched. Open source project launches are often kind of ugly, but TensorFlow seemed to have some real, maybe some marketing effort. It had some real sheen. Maybe you could tell us about that. Was this kind of a complicated launch? Rajat Monga: Any launch, when you're launching from Google, I guess it can be complicated, because it just can reach a lot more people, so you have to think through all the aspects of that. I think where this differed from a lot of open source projects that you see out there is that we waited a tiny bit, so that when we launched, TensorFlow was already usable. We were using it internally for a number of things. So somebody could actually download TensorFlow that day and start using it. Number two, I would say, is something that's different from a lot of open source projects,
I wouldn't say all: this was created as part of a company, and we're seeing more of those now. There was a lot of effort in terms of the core quality, the documentation, and all of that, which you wouldn't see in a random open source project. And so just that quality difference, especially in this area, versus a lot of the others, did make a big difference. And then yes, we did talk about it. We wanted more people to know about it and see where we'd go. Eric Anderson: And almost from day one, TensorFlow has been on a rocket ship of interest, adoption, growth. Maybe you can tell us, what was it like having to kind of absorb and respond to all that? Rajat Monga: Yeah. It was exciting, and of course it comes with challenges as well. It's interesting that you talk about the crazy growth, or the excitement around it, I guess, even on day one when we launched. I still remember it was early morning, because we wanted to launch at 9:00 AM Eastern time, so it was 6:00 AM here when we actually ... The news articles got published, and people started talking about it, and so on. Early in the morning, we saw a fair bit of interest. There were a lot of spikes on the websites and so on, and we took bets. We were all in a conference room, maybe 10-15 of us, and we were taking bets on, "Okay, how many people, how many downloads will we get the first day," or GitHub stars, or whatever metric you pick. And I was amazed at those numbers. They definitely were way higher than what I expected. I think they were higher than almost anybody's prediction, but yeah. I definitely wasn't expecting that kind of crazy clout on day one. Rajat Monga: Of course, that continued over time, and that was exciting. I think it gave us both opportunities and challenges. The challenges were more in terms of, "Okay, now we have different kinds of users." Just having more users is good, because we got lots and lots of feedback, but of course then every user starts asking, "Could I have this feature? Could I have that feature?" Et cetera. And we were just starting to set up the process for, "How do we even take inputs from the users? How do we even take in the code they want to contribute?" Even that took us a while, because until that point, our code was actually internal, and then we made it available on Getit, which was a different open source tool. We eventually moved over to GitHub when we realized we wanted to really engage the community and make that easier. So there was that change, and a whole bunch of processes around just managing the open source project, managing the community itself. Rajat Monga: The other was the kinds of users, I guess, which definitely stretched us, and in some ways continues to stretch TensorFlow. It's good that we went beyond just the research users. This allowed hobbyists to start with it, and over time, folks trying to run this in production, doing different kinds of things. What that meant, though, was that each type of user had different requirements, and they were asking for different things, and that really made it hard at different times to make those choices. I mean, our team was X people, and how do you do all of these things? You have to pick and choose, and you have to prioritize. Rajat Monga: There were times during that journey where we felt, "Oh, everybody's just complaining," even though there were so many good things. But yeah. Of course, over time, that did lead to a lot more improvements. Like last year, we launched 2.0.
That was because of everything we learned from these different things. So these are exciting things. You learn from so many different users, but it does become harder at times. Eric Anderson: That's a good point. I can imagine with such a successful launch in terms of awareness, that you would have maybe some signal-to-noise issues with the feedback. You might be more interested in the researchers or sophisticated users, but you're getting a lot of comments from hobbyists and those in university courses, and you're wondering, "How do we make everybody happy and not disappoint everyone all at once?" Rajat Monga: That's right. And over time, that diversity of users is what helped create the broad ecosystem around TensorFlow, though. Over time as a team, we also realized we can't do everything. We have to work with the community on various aspects of these. And there are a lot more projects around TensorFlow that aren't just run by Google. They're run by all kinds of different teams who've built on top of TensorFlow, who've done different kinds of things, and they add amazing value. And of course, within TensorFlow itself as well, we, in some cases, ended up taking specific segments of users and asking, "Okay, can we do something better for them?" Rajat Monga: For example, a couple of years in, we saw that, yes, there was interest on the mobile side. In fact, within Google, and externally, there were a lot of people who wanted to run apps and models on mobile, and while all of TensorFlow could run there, and you could deploy it, and we had examples of that, it wasn't really optimized for that. We were trying to optimize a whole bunch of other things. So that's how we ended up creating TensorFlow Lite. That was really focused on that. So we have been learning through the years and iterating on making different things better for the community. Eric Anderson: I feel like for a lot of people, at launch, TensorFlow was neural nets. Because of the mass awareness, it was kind of the way to do it. And then with time, there were other innovations in other communities, Keras and PyTorch. I imagine you felt some need to rethink how you did things at TensorFlow. Any thoughts on how you would kind of learn and adjust as other communities and innovations emerged? Rajat Monga: Oh, absolutely. The ones that you talk about are great examples. So Keras had actually been around; it maybe started just before TensorFlow was created. Eric Anderson: Right. Rajat Monga: It was built on Theano, which was an older library. And when TensorFlow came out, seeing its popularity, the creator of Keras, Francois Chollet, actually decided to map Keras to TensorFlow as well, so he made it work with that. We knew there was a need in this area. We wanted a higher-level API that made sense, and over the next couple of years, we made a couple of different attempts to create those as well. I didn't think we did great with the APIs that we were building, and at some point, given that Francois was at Google and he started collaborating with us to see how best we could do it, it just made sense to combine those efforts and really have one API that just made sense, and so Keras became the default API for TensorFlow as part of 2.0. Rajat Monga: That was one part. Then there were the learnings from PyTorch, and that was interesting, because this whole idea of imperative execution was something we had explored even before PyTorch came out, but there was so much already in the system that people weren't really interested in trying it out.
Once PyTorch came out, there was more interest in saying, "Okay. You know what? Yeah. We see some advantages to this compared to the sort of graph style that TensorFlow used to have." And so we ended up spending more time exploring how we could keep the advantages of what we had while getting some of the advantages of the imperative style in there as well, because these are two very different styles, but they both come with different advantages. And I think a lot of the learnings through that are what led to that eventual convergence of both of those, in a different way, with TensorFlow 2.0. Rajat Monga: So there are always things to learn from different APIs, different projects as well. PyTorch, for example, continued to focus on one set of users. Research, really. There are people who use it for other things, but I would say research is where its sweet spot is. For TensorFlow, we made a deliberate effort to cater to the much broader community. Yes, TensorFlow was still interesting and exciting for research, because you can scale, you can do interesting things, and 2.0 made that easier, but it was important for us to be able to go from there, from an early model, all the way to all kinds of deployments, right? Different pieces might solve for those, but this was a need we saw broadly in the community, and of course within Google as well. Eric Anderson: I'm curious also, I can only imagine what it would have been like. We've talked about how there was a bunch of user adoption, and you had to kind of respond to that. But at the same time, the number of stakeholders for TensorFlow also feels like ... I remember at Google, there was the Google Cloud team, who wanted to do things around AI and ML. There were internal projects within Google that wanted to rely on the Brain team and TensorFlow. And then you had the whole external community with its varied interests. I'm curious if you can remember any moments in which those things were in tension, or how you had to kind of juggle internal teams, various ones, as well as external. Rajat Monga: Oh, yeah. Many of these were often in tension, and not because something that we built for A is bad for B or C. That's usually not the case, but often they're in tension more because we have limited resources and we can only do so much. We can't do every single thing that each of these communities is asking for, and so how do you manage that? And I think there were times, I would say, when we had more of them happy, and other times when we had fewer of them happy, because there are trade-offs you make. That's the thing that you get with users, I guess. One thing I used to say: it's so hard to have so many users, because you always have some users who are complaining. But of course not having users, you don't really want that. Rajat Monga: Among these, [inaudible 00:16:17] the big types of users, I would say, one tension was between the research community and the more production-heavy users. Each of them had different needs. And so there were trade-offs that we were making. We saw those trade-offs sort of diverging through version one over time, as we went through 1.x, different incremental versions, where we added features for each of them, but they were almost separating out, because the researchers wanted a more imperative style, whereas production folks wanted what became TensorFlow Extended, and a whole bunch of other things around running things in production, and so on.
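For readers who want to see the convergence Rajat describes in concrete terms, here is a minimal sketch of the TensorFlow 2.x style: Keras as the default high-level API, eager (imperative) execution by default, and tf.function to recover graph-style execution. The tiny model, layer sizes, and random data are illustrative assumptions only, not anything from the conversation.

```python
import tensorflow as tf

# Keras is the default high-level API in TensorFlow 2.x.
# The layer sizes and shapes below are arbitrary, for illustration only.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Eager (imperative) execution is the default: ops run immediately,
# so you can inspect results without building a graph or a session.
x = tf.random.normal([4, 8])
print(model(x))  # a concrete tensor, printable right away

# tf.function traces the Python function into a graph, recovering the
# graph-style optimization and deployability that TF 1.x relied on.
@tf.function
def train_step(features, labels):
    with tf.GradientTape() as tape:
        preds = model(features, training=True)
        loss = tf.reduce_mean(tf.square(preds - labels))
    grads = tape.gradient(loss, model.trainable_variables)
    model.optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

print(float(train_step(x, tf.random.normal([4, 1]))))
```

The same train_step runs eagerly if the decorator is removed, which is the research-friendly, debuggable mode; with the decorator, TensorFlow traces it into a graph it can optimize and deploy, which is roughly how 2.0 tried to serve both the research and production users described above.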
Rajat Monga: I think with 2.0, a lot of those converged. That was a massive effort; it launched last year, in 2019, but it was a massive effort the year before. In terms of cloud, versus within Google, versus other open source users, et cetera, I would say they were less out of sync, because a lot of that was about, "Okay, how do we make it better for production?" Yes, there are always some features that, within Google, are tied to Google-specific software that is used internally, whereas externally people would use a different one, maybe Kubernetes outside of Google versus what Google uses internally, called Borg. But very broadly, there weren't as many differences as you might think. So there's a good mix, I guess. That's the fun and challenge of working on a big project and driving that. Eric Anderson: Yeah. At the beginning of the show, you talked to us about a few people at the beginning of the Google Brain team, and I imagine the number of people working on TensorFlow swelled pretty quickly, meaning employees of Google. I'm wondering if there's anyone who showed up and made an outsized impact that really kind of shaped the future of TensorFlow, who would be worth mentioning. Any interesting stories there? Rajat Monga: I would say with TensorFlow, it was more of a team project than one individual driving and building everything. And yes, there are individuals, and you can probably look at GitHub to see which ones have lots and lots of contributions and which have fewer, to get some sense of the people who contributed more, perhaps. The folks who were early on obviously had a bigger impact on the early designs, and a lot of those design decisions still live in TensorFlow. They're not gone. Those are the exciting ones. For the first set of designs, if you look at the paper as well, we have about 20 people or so on the paper. I would say maybe about 10 or 12 of those were on the core team, and then some who had been helping us out because they were early users; they were helping us from their perspective on what made sense, et cetera. But I do think it was a great team effort and continued to be. Of course, yes, there are a number of people who have done lots of good things. But if you look at specific areas more intensively, you might see more specific people, I guess. Eric Anderson: Sure. I'm trying to imagine how I might respond if I was sitting on this very exciting open source project, as you were. There's a pattern in the industry of folks developing open source projects at companies and then leaving to start their own company around them, like the Kafka team over at LinkedIn, or the Kubernetes team at Google. And I can't help but wonder if you ever asked yourself in those early days, Rajat, "Should I be leaving Google and starting a TensorFlow company?" Any thoughts around that? Rajat Monga: Yeah. A lot of people did ask me that. Eric Anderson: I bet. Rajat Monga: And honestly, I think it was a lot of fun doing this at Google. I did eventually leave. I was there for over 10 years, but there are differences. Yes, there are things that you can do outside Google that you would not necessarily do the same way within Google, but there are also advantages to doing something like this inside Google. I think the kinds of things that we were able to achieve, the breadth of that ecosystem, some of that definitely goes back to the fact that Google itself is stretching that in so many different ways.
Rajat Monga: When I was at Google, there was, for example, this project on getting TensorFlow Lite to run on microcontrollers. And the reason it was relevant was, yeah, there is some part of Google that cares about those. Same for mobile. Same for scaling TensorFlow to really, really large-scale computers. Because yes, Google has TPU pods, which are effectively large-scale supercomputers for this. So there are advantages to being in a place like that, and sure, yes, you can go outside and build a commercial entity, and there's value in that too. There are lots of users out there who could use the help in making it better, or supporting them in the right kind of ways. But for me personally, I enjoyed doing it while at Google. And even though I left a few months ago, I ended up looking at different areas rather than just working on TensorFlow. Eric Anderson: Yeah. Certainly. You're right. I forgot about TPUs, but there were a lot of special things happening, a lot of resources at work that probably would be hard to find elsewhere. And then Rajat, I wanted to explore a couple of other parts of the project. You mentioned how Google had previously just published research papers, and this was kind of one of the early forays into Google doing an open source project. How did you kind of make sense of governance, and other open source aspects of running a community, that were kind of new to Google? Rajat Monga: Traditionally, I would say, I wouldn't say they were completely new, but yes, a lot of projects before this were much more ... For example, Android. Even that's open source. Eric Anderson: True. Yeah. Rajat Monga: And a number of others. But they were very strongly managed by Google, as in a huge percentage of the development, 90-something percent, was done by Google. Yes, other companies were involved, and it was open and all of that, in lots of ways. But not so much community-driven; Google was primarily driving the direction. With TensorFlow, we were iterating and figuring this out. Actually, a project that was launched slightly before this was Kubernetes, which went one way in terms of governance, where they handed it off to a different governing body. And that worked fine for Kubernetes, because it allowed it to scale in different ways, et cetera. Rajat Monga: For TensorFlow, we saw that this is still a very fast-changing field. In fact, it still continues to be, I would say. And even though we had conversations with folks about what it would mean to have its own separate governing body and so on, we felt that if we moved too soon towards that, it would basically make it hard to evolve. And as we saw in that iteration from 1.x to 2.0, that was important, and it would have been really hard. So that's sort of what kept us there. But that said, it was important for this project for us to involve the community and be very, very open about the decision-making and stuff. Rajat Monga: And so over the last couple of years, we've spent a lot of time building out the support for that. So this was, A, publishing roadmaps of where we are going, talking about all the different things that are happening. Building groups, special interest groups, around different areas, involving them, and having those groups drive different parts of it. And we did that on a number of sub-projects within TensorFlow that are actually completely driven by those special interest groups. Some still have a lot of Google members as well. Others are 90% folks outside Google. Whatever makes sense for each of those projects, right?
Rajat Monga: So over time, the push has been towards: it needs to be open, it needs to be transparent about where it's going, because that's what builds the trust with the community, and allows the community to really give back and build on that, but still allowing for the right kind of pace and change where it's needed. And so it's been a bunch of trade-offs along those lines; we've been walking that fine balance and maintaining it in some sense. Eric Anderson: Yeah. Makes a lot of sense. And I appreciate hearing that from you, because I don't think there's a lot of acknowledgement of that. There are a lot of proponents of ... maybe that's not the right word, but of kind of open governance, and I think there's also a lot of value in having a core team with aligned interests that can move quickly and make coherent decisions towards a coherent end. So that's good to hear. Eric Anderson: Tell us where the project's at today, and kind of where it's headed going forward, as we near the end of our conversation. I think there was a time when there were a lot of expectations on TensorFlow. "Are they going to support this and adjust that way, and fulfill this need?" And largely I get the sense that you've kind of arrived and satisfied a lot of those big ideas. You talked about the special interest groups. Any thoughts on how the project evolves from here? Rajat Monga: There are always things happening and things changing over time. But if I look at this area, there are different kinds of things happening. One, the research continues to move forward. So the research is not standing still. And so there are new ideas around deep learning, around reinforcement learning, et cetera, that are being tried out by researchers, some using TensorFlow itself. And then that research needs to also move and be available to folks who want to use it in different kinds of applications, different kinds of production settings, et cetera. And then of course complete newbies, more and more people who want to learn machine learning because it's exciting. It's going to be part of everything that you see in the future. Rajat Monga: So each of these areas is continuing to move ahead, and there are changes happening in TensorFlow today to improve each of those. The special interest groups, I think, are interesting, because there's a lot there if you think about the different areas where people want to use machine learning. Take different kinds of algorithms: it's not just about deep learning anymore. After the first couple of years, in fact, we added support for all kinds of things. And so if you want more probabilistic methods, there's support for that, and there's a group that supports that and really pushes it forward. If you want to do something around more traditional methods, say random forests, decision trees, et cetera, there's, I believe, a group for that as well, and other things like that. Then there are more vertical ones, in some sense you might say: people who are interested in biomedicine or some area like that, maybe genomics, and want to apply it there. There might be groups around that. Rajat Monga: So you've been seeing different groups evolving over time, building on those, and specializing in those areas. But also coming back to the core: 2.0, I would say, was a lot about changing how the user interacts with TensorFlow, and really about bringing together all the work that had been happening over the last couple of years and converging it.
I think the next step for TensorFlow, broadly at the core, is revamping the underlying infrastructure, because that can be improved significantly, and I know there are a number of efforts going on around that as well. Rajat Monga: So we see pushes and improvements across all of those. Will there be a significant change, like a 3.0, coming up? Probably not right away, but eventually I think there should be, and there's still a long way to go with machine learning, deep learning, and the different things you want to think of. And I sure hope that the folks at TensorFlow and the project itself continue to evolve as we see new things there. Eric Anderson: Great. As kind of a final thought, some of our listeners may want to get involved. I imagine becoming a user of TensorFlow is super easy; there's a lot of documentation around that. Being that it's such a big and mature project, becoming a contributor to TensorFlow is probably a bit more nuanced. What's the best way, do you think, to do that? Rajat Monga: Oh, there are tons of opportunities. It's possible to start with ... One thing might be to start with one of the special interest groups, or a specific smaller project that maybe you know about or have been working with or leveraging, and start there. But even for the core and all kinds of projects, there are often issues tagged as "contributions welcome." You can start by just providing or improving documentation. You can start with very small changes, and there's a lot of documentation on how to get involved. People love to get those changes in, and it's important for the community to really give back and build on top. There's no way this project scales with just the folks at Google working on it. It has to be the entire community that pushes it forward. Eric Anderson: Thank you so much, Rajat, for joining us today. I've had a good time, and I look forward to where you go next and where TensorFlow goes. Rajat Monga: Likewise. It was a pleasure. Thanks for having me here. Eric Anderson: Take care. Eric Anderson: You can find today's show notes and past episodes at contributor.fyi. Until next time, I'm Eric Anderson, and this has been Contributor.