Manish Jain:
There is no book out there about, hey, how do you do open source marketing? Right? I feel like that's a title that should exist out there because it's very different from typical marketing.

Eric Anderson:
This is Contributor, a podcast telling the stories behind the best open-source projects and the communities that make them. I'm Eric Anderson.

Eric Anderson:
I'm joined today by Manish, who is the creator of Dgraph. Dgraph is something I've been interested in for a while. Manish, maybe before we get into the history, you could tell us what Dgraph is for the uninitiated.

Manish Jain:
Yeah. So Dgraph is meant to be a general-purpose database with a graph backend. Very early on, we picked up GraphQL as the core language. And so with GraphQL becoming so popular, Dgraph has almost created a new category: the GraphQL database.

Eric Anderson:
That's succinctly put. Let's then jump into how this came to be. If I remember right, you were at Google for some time before Dgraph. What made you want to tackle this project?

Manish Jain:
Yeah, so I could probably give you a bit of my own story, because my story intertwines with Dgraph's. I grew up in India, went to college in Singapore, and joined Google right out of college, initially as an intern in Mountain View. Right after graduating, I joined full-time.

Manish Jain:
And for the initial couple of years, I was in Google Zurich, even though I was supposed to join Google US, just because of visa issues, as usual. But I started working on the web search infrastructure right from Zurich. And then two years later, with the visa issues resolved, I was able to transfer to Google US. So the story is, I joined Google in 2007 and moved to the US in 2009. For the initial three years at Google, 2007 to 2010, I was working on web search infrastructure as part of the web search indexing team, and we were dealing with one of the biggest database installations that Google had. This system contained all the web pages Google had ever seen, and all the [inaudible 00:02:16] signals and so on and so forth. So I got a whole bunch of experience building real-time distributed systems from that.

Manish Jain:
And then Google acquired this company called Metaweb around 2010. They brought the Knowledge Graph to Google, and I started working with those folks to see how we could use the Knowledge Graph in web search. As part of that, I led a couple of projects. One was to build a knowledge engine, something which could understand English [inaudible 00:02:41], parse it, and try to find relationships between the words in the query and understand what they meant. For example, it could handle queries like movies starring Tom Hanks directed by Steven Spielberg, or children of Barack and Michelle Obama, stuff like that.
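The knowledge-engine idea above, turning a question into relationship constraints and intersecting the entities that match each one, can be pictured with a toy sketch. It assumes a tiny hand-picked set of (subject, predicate, object) triples and a hypothetical `answer` helper; it is only an illustration, not how Google's system actually worked.

```python
from collections import defaultdict

# Toy knowledge graph as (subject, predicate, object) triples.
# Hand-picked illustrative facts, not a real dataset or schema.
TRIPLES = [
    ("Saving Private Ryan", "starring", "Tom Hanks"),
    ("Saving Private Ryan", "directed_by", "Steven Spielberg"),
    ("Catch Me If You Can", "starring", "Tom Hanks"),
    ("Catch Me If You Can", "directed_by", "Steven Spielberg"),
    ("Cast Away", "starring", "Tom Hanks"),
    ("Cast Away", "directed_by", "Robert Zemeckis"),
    ("Jaws", "directed_by", "Steven Spielberg"),
]

# Reverse index: (predicate, object) -> set of subjects, much like a posting list.
INDEX = defaultdict(set)
for subj, pred, obj in TRIPLES:
    INDEX[(pred, obj)].add(subj)


def answer(constraints):
    """Return the entities that satisfy every (predicate, object) constraint."""
    sets = [INDEX[c] for c in constraints]
    # Intersecting the candidate sets keeps only entities matching all constraints.
    return set.intersection(*sets) if sets else set()


# "movies starring Tom Hanks directed by Steven Spielberg"
result = answer([("starring", "Tom Hanks"), ("directed_by", "Steven Spielberg")])
print(sorted(result))  # ['Catch Me If You Can', 'Saving Private Ryan']
```

Each added constraint narrows the candidate set, which is why a structured query can return a precise list of answers rather than a ranked list of pages.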

Manish Jain:
The second project I did was to build an equivalent of the web search indexing and serving system, but for knowledge. Google had a bunch of these OneBoxes, which had structured data behind them. For example, if you search for weather in New York or movies in Mountain View, you get the weather data or the movies data. All of these datasets were being acquired by Google, and they were running in different backends. And so the idea was, hey, we have this Knowledge Graph, which has a whole bunch of data, and we have all these proprietary data feeds that Google was purchasing, and we can put them into a single graph indexing and serving system for web search.

Manish Jain:
So I was one of the tech leads for that project and learned a bunch about how you would build a distributed graph serving system that would perform really well and scale really well: a low-latency, high-throughput system, because a good chunk of web search queries were going to hit it. Overall, I was at Google for six and a half years, including the internship, and learned a bunch about building a graph system. I ended up leaving Google around 2013 and moving to Australia, just for personal reasons.

Manish Jain:
And around 2015, I was looking at the graph space to see ... I was working on a consulting project where I thought a graph system could be really useful. I looked around to see what's out there, and I realized that even though there are some solutions, they are not as widely accepted, or they are just not as great as they could be. And I felt like I had done this before at Google, and surely we could build a better system, something which could scale really well and perform really well. More than that, I also felt that graph systems were being used just as side databases, not as primary databases. Typically, people would use SQL or MongoDB or something for the primary database and use a graph on the side. And I felt like that's just not a great use of graphs. Surely you can use them for analysis, but they can also be powering your applications. They can be your source of truth. And that was the inspiration to build Dgraph.

Eric Anderson:
There's a lot there to discuss. So, I mean, it sounds like you've been doing this, in a broad sense, for a decade or more: distributed graph serving and knowledge-graph-related work.

Manish Jain:
That's right. I think it started in 2010 and it's still going on. And I guess I've learned a few things.

Eric Anderson:
Yeah. Well, and this has been something the world, or the internet, has wanted for a while: the Metaweb, semantic search, the knowledge graph. Google has probably done the best job of it, with Google search. And you've not only been doing it for 10 years, you've been doing it at the cutting edge, to a degree.

Manish Jain:
That's right. And sometimes I found myself sort of ahead of everybody else, as well. Ahead and alone. This actually happened at Google, too. I remember I built a proof of concept of the knowledge engine and showed it to some of the search leads and said, "Hey, look, if you search for books by French authors, we can literally show you all the books. We know exactly what the query is. We know so much about the results, and we can slice and dice them in interesting ways." And the search lead was like, "Hey, why don't you just do keyword search at Google?"

Manish Jain:
And it shows you some web links, right? Because Google as a company has built an amazing fortune serving web pages, and in the early days of knowledge, even the leadership was not convinced that they needed to do anything different, that they actually needed to give answers to people and take traffic away from the web pages. Now Google is all about it, right? Google tries to respond to as many queries as it can, and that's the whole knowledge effort. Knowledge is table stakes at this point, right? But back in those days, that wasn't the case. And so we had to do a bunch of convincing to get the point across that it's a better user experience to respond to queries than to just send people to web links.

Eric Anderson:
Definitely. It's also fun to hear how you've gotten around, from Zurich to Australia. You spent some time in India and now the States.

Manish Jain:
Yeah. I've lived in five countries, and for what it's worth, I really enjoy San Francisco. So I plan not to move anymore. But yeah, I've lived in five countries. As I mentioned, I grew up in India, went to college in Singapore, then Switzerland, the US, Australia, and back to the US. So I've seen a bunch of the world and a bunch of different cultures, and it's also interesting to see how different countries see themselves. I learned a bunch from that.

Eric Anderson:
And take me back to when you're in Australia and deciding to work on this. Were you sure you wanted to build a startup? Were you sure you wanted to build an open source project? Or were you just interested in building a database?

Manish Jain:
Yeah. I remember back in 2006, when I was doing my internship at Google, Bigtable was the new thing there. And I was thinking it would be great to work on Bigtable. Bigtable was Google's distributed key-value database, which preceded MongoDB and Cassandra and so on.

Manish Jain:
So I was always interested in systems in general, but the way it happened in Australia was that initially I was just looking for a job there, and I just could not find anything as interesting to me as the kind of work I was doing back at Google. So at that point I was like, you know what? Let's maybe do my own thing. I did another startup before this, and we got into a local incubator there, but unfortunately it did not really go anywhere. Ultimately, I was doing some consulting gigs to pay the bills. The inspiration actually came from the CockroachDB guys. They had raised a round in 2015 from Benchmark, I think, and they were building a Spanner equivalent in Go, and I quite liked Go. That's when I was like, hey, you know what? I'm probably one of a handful of people who have experience building a graph system at scale. Perhaps I could use my skills to build that.

Eric Anderson:
I love this idea that in some ways there were constraints that put you down this path. If there had been a great job for you, you'd probably be building distributed stuff at Twitter out of India or something. But instead we have the benefit of you having built Dgraph for us. So I'm glad the cards fell where they did.

Manish Jain:
That is right. I think if a company could really have given me something [inaudible 00:09:37] what I was doing back at Google, I would most likely have joined, particularly after the first failed startup. I just wanted to work on really interesting, challenging problems and work in open source. Anything like that would have made it easier for me not to pursue Dgraph and to just go join them.

Eric Anderson:
Right. Let's start talking about Dgraph. It's got a lot of interesting subcomponents. I know at some point along the way you developed Badger, which I think is interesting in its own right. Where's the right place to start talking about how Dgraph got started?

Manish Jain:
Yeah, maybe we can start with Dgraph's story. As I mentioned before, I felt like, yeah, we could actually build a better database, but does that make a better company? That was actually not clear, because in Australia we didn't have a lot of precedent for open source companies doing really well, particularly highly technical companies in the database or systems infrastructure space. It was an unusual decision. I remember telling some friends I'd made there, "Hey, I want to build this open source database, Dgraph." And they were just like, "How will you ever make money?" And I was like, "I'm not sure. I'll just have to figure it out later." [inaudible 00:10:52] open source has a very good way of making money, and that's pretty established in the Valley, but that knowledge was not there in Australia.

Manish Jain:
So it was a bit of a risk. But the other thing happening at the same time was that Facebook had come up with GraphQL, back in, I think, June of 2015. I looked at it and I quite liked it. Compared to other graph systems, I didn't enjoy their query languages, like Cypher or Gremlin, as much. I felt like GraphQL is sort of the modern language for today's developers. They don't need the data in subgraphs; in JSON, it's just so much easier for somebody to consume. So we sort of took up the baton on GraphQL, and GraphQL has done so well over the last four or five years. We obviously had to fork it slightly to make it work for our graph database. But even for the fork, we have gotten a lot of appreciation for how easy it is to understand and how simple it is to work with, while still being able to run really complex queries on it.

Eric Anderson:
And how about your first users of Dgraph? I think it's always interesting how people are able to get people to start using projects as they're building them. Do you just kind of put it out there and people come find you? Or did you have some ideas of folks you could take this to and see if they'd want to kick the tires?

Manish Jain:
That was an interesting one, right? Because there is no book out there about, hey, how do you do open source marketing? I feel like that's a title that should exist, because it's very different from typical marketing. So we actually had to learn on our own. Dgraph is an extremely technical project. With databases, it's generally understood that database startups have a long incubation period. I would say it really took us about four years to build something which is stable and works really well across the edge cases. We were actually the first ones to have done Jepsen testing for a graph database. But in the early days, I remember we created a community on Discourse, and at the same time a community on Slack, and we would just watch both of them, and anybody who came up, we would talk to them.

Manish Jain:
Any bugs they would talk about, any features they might need, we would just go build them. And one of the most common pieces of feedback we used to get was, "Wow, you guys are really fast. I talked about it yesterday and today you already have it." So that's how we went about building the community. That was one big aspect. The second big aspect was to write interesting blog posts which would get picked up. We were on the cutting edge of the Go language at the same time, so we were learning things that other Go developers might not know, and we aimed at writing interesting blog posts about them. The idea with all of open source marketing is not to sell to developers, but to give them interesting information and knowledge that might be helpful to them, and hope that they find your product exciting and start using it.

Eric Anderson:
Yes. Generally the take with community marketing, open source or otherwise, you give a little value to the community and it builds goodwill and drives attention and people discover you for the points that are valuable to them.

Manish Jain:
Yeah. And you kind of become a bit of a thought leader as well. People appreciated what we wrote, and they felt like, yeah, Dgraph is ... You build engineering respect with developers, and that translates into them believing that yes, you should be able to build a good product, because in general you're doing pretty solid engineering.

Eric Anderson:
Maybe something that isn't widely understood or appreciated is that, particularly for databases but for any big infrastructure dependency or tool, it's often hard to do a proper assessment. And so decisions to adopt something are largely built on trust: I trust that this team is doing it the right way. You won't know whether this is the most efficient or the most resilient piece of infrastructure until you've reached some volume of usage. So I think people largely make decisions based on trust in the team, and on some external signals, that what they're going to get in the end is valuable.

Manish Jain:
Absolutely. I think you'll see it in so many places, right? For example, even GraphQL, right? A lot of GraphQL adoption is driven by the fact that, "Hey, Facebook is using it in production and it's working for them, so it should work for us." Or, "Hey, Google is using Kubernetes, and these are some of the smartest engineers out there, so it should be good software." So some of the uncertainty in new technologies gets mitigated by the trust you can have in the company or the people behind the software.

Eric Anderson:
Great. So you're building these ... you have kind of a nascent community at Dgraph. And help me understand, up until this point, you've largely built this on your own. Who are the people that chip in and help along the way?

Manish Jain:
One thing that I think is a common misconception is that people feel you do open source to get external contributions. Of course you always get external contributions in open source. But if you look at most open source software, I would say 90% comes from a select few, the core group, and the remaining 10% comes from the wider community. That's true for many open source projects, and it's true for Dgraph as well.

Manish Jain:
So in our case, because we were also funded pretty early on by VCs, a lot of the core contributors of Dgraph were employed by Dgraph Labs. We do get tons of contributions from the community as well, but they are more in the tail. The major contributions came from myself, from the people at Dgraph, and from some folks who are no longer at Dgraph. Particularly on Badger, we got some really amazing contributions early on from people who are no longer at Dgraph, but they made some solid contributions there.

Eric Anderson:
So your story is kind of unique in terms of navigating the GraphQL community. As you were building this product, you had to extend GraphQL to meet your needs initially. And then, with time, you found ways to support the protocol more natively. I'd be curious to understand how that works. Is there a GraphQL board that you go to, to discuss your ideas? Or do you just shout them out on Twitter and see what happens? How do you navigate collaborating with another open community, especially around a protocol or an API like GraphQL?

Manish Jain:
We kind of dropped some balls there. We did not ... When we realized that GraphQL did not seem to fit the bill for a graph database, particularly the kind of things that we wanted to do, with fit functions, style coding, and variables and so on and so forth.

Manish Jain:
I think we probably tried to initiate some conversations with the group, but we were in Australia, it was the early days of the startup, and we were just going full steam, making sure that whatever users needed, we were tackling. At the same time, we were also looking at other graph systems, at what kind of functionality they provided, and at what we should be building.

Manish Jain:
We decided to fork it and did not really engage in much discussion with the founding committee for GraphQL. And I think we probably should have done that. At the same time, the needs of our GraphQL fork and the needs of the GraphQL spec are also quite different.

Manish Jain:
Stuff like the variables we support, aggregations, and all that. Not everybody who wants to adopt GraphQL needs that, so these advanced features might never make it into the official spec. But what we have done now is support the official spec as well. So it works very well with the GraphQL ecosystem, while we still maintain our fork and keep improving it.

Eric Anderson:
Great. Well maybe we can shift gears and talk about more near term things. You've had an exciting year at Dgraph. I'd love to hear about any milestones you wanted to speak to and where the project is headed from here.

Manish Jain:
Yeah. This year at Dgraph has been extremely exciting. We launched the official GraphQL spec integration in quarter one. Then in quarter two, we launched the private beta of Slash GraphQL, which is a GraphQL backend as a service. And in quarter three, which is right now, we are doing a GraphQL conference next week. In fact, we're going to have Scott Kelly, the NASA astronaut who has lived in space the longest, come and have a chat with us in the keynote. So I'm really excited about that.

Eric Anderson:
Scott Kelly, that's exciting. This sounds like quite a thing you're planning. I imagine it's a different muscle for the team. Building distributed systems is a little different from organizing events and conferences.

Manish Jain:
Yeah. We are doing a lot of new things this year, because we have such a complex distributed system. I personally think it's more complicated than Spanner, right. But I'm more like ... But anyway, we had been very focused on the backend, and this year we started to focus on the frontend community as well, with GraphQL, and built some tutorials with React and all that stuff. And now we are building our marketing muscle and putting together a really great conference for GraphQL. We called it GraphQL In Space. The reasoning went like, "Hey, if we are attached to a city, we have to hold the conference in that city for the rest of the conference's life. But if we have the conference in space, we can hold it anywhere."

Manish Jain:
And we were like, "Hmm." So that's how GraphQL In Space came about. And my marketing team did an amazing job. I don't know how, but they got Scott Kelly to confirm that he is going to come to the conference and talk a bunch about space.

Eric Anderson:
Sounds out of this world.

Manish Jain:
Absolutely. At the same time, we're going to be doing a launch for Slash GraphQL. It's kind of interesting for us, because we have been keeping track of GraphQL for a while, and we are the only database out there which natively supports GraphQL. And yet we have maintained a small distance because of our fork. Now we are really excited to go back into the GraphQL community and engage with them again.

Eric Anderson:
One of the things I'm learning in this discussion, which maybe we haven't teased out before, is that there are communities of people. We sometimes associate communities with projects, but they're also just ... you mentioned Go developers earlier, and now GraphQL enthusiasts. And I imagine that as you try to describe your project, it can help people pattern match and make sense of you if you help them see your project in light of their community. You can create content for Go developers about interesting things you're doing in Go, and then later you can describe how you're supporting the GraphQL spec and bring in that community of users.

Manish Jain:
Absolutely. Sometimes I have a hard time explaining this to outsiders, right? Because we are like, "Hey, we really like the Go community." And we actually get a lot of leads from the Go community as well, because people using Go know about Badger, or they know about Ristretto, our cache system, or just generally because we are popular in the Go world. Then they come to know about Dgraph, and they're like, "You know what? I could use this in my project."

Manish Jain:
But at the same time, they don't have to be excited about Go, because at the end of the day it's a database and they can talk to it in any language they want. They don't have to be Go developers to use it. But still, there's a sense of community around the fact that whoever is using Go is part of this community. It's a bit of an intangible feature that plays very well for project adoption.

Eric Anderson:
I have to ask because sometimes we get some great stories, kind of going back a bit as you ... people showed up to use the project, and as you found new use cases and users, are there any surprises? Every now and then you find somebody you don't expect doing something you wouldn't expect with your project. Or maybe just kind of a favorite use case along the way?

Manish Jain:
Absolutely. Early on, I remember, this was 2017, we were talking to a relatively big company. They wanted to use Dgraph instead of Elasticsearch. And I was like, "Wow, that's an interesting use case." At one level, as an engineer, I had designed Dgraph to be quite like a search engine, because that was my experience. And in fairness, Dgraph is almost, I would say, a search engine acting like a database.

Manish Jain:
But we did not realize how obvious that was to users as well, because some of the functionality we have is what people expect from Elasticsearch. And so they were trying to use Dgraph instead of Elasticsearch. And I was like, "Wow, that's a very interesting use case."

Manish Jain:
And over the years, many people have asked us, "Hey, can I use Dgraph as a time series database?" And we're like, "Yeah, we have had pretty good datetime support from very early on, and we do some interesting datetime indexing." Those are the two use cases that I felt were not how we had planned or designed things, but that happened to work because of the design we put in place.

Eric Anderson:
So you're saying you didn't design for them, but you designed for other things and it just kind of still worked.

Manish Jain:
Yeah, it just fit.

Eric Anderson:
Awesome. Manish, I loved your comment at the beginning that, whether by design or otherwise, you've ended up living in the future a bit, maybe ahead of the world. And you've been living in database land and GraphQL for a while. Give us your perspective on where this ends up. What will GraphQL and databases look like in the coming years?

Manish Jain:
Take GraphQL and a graph database, for example. We sit at this unique intersection that is perhaps not so obvious to people outside of Dgraph, right? Because a lot of people use GraphQL with almost anything: microservices, SQL, MongoDB, perhaps just APIs, because GraphQL can be a gateway to almost anything. At the same time, we are seeing more and more storage systems built on GraphQL. What people are not realizing right now is that the problems they're seeing with GraphQL are the same graph database or graph system problems that people in the graph area have known about for a while. For example, the N+1 problem, where you make too many queries as the number of results increases. Or some of the caching problems, or some of the latency problems.

Manish Jain:
And now we're hearing advice like, "Hey, if you're using GraphQL, don't go more than three levels deep in your queries." These are exactly the same problems that graph systems have had and have been trying to solve for a while. Dgraph already solved the N+1 problem, the query depth problem, and so on and so forth. But these problems are only now becoming obvious to the GraphQL community, and some people are actually blaming GraphQL for them, when GraphQL is really just a spec, not a system. It's really about how you build that system so that it tackles those problems. And so I feel like the intersection of graph databases and GraphQL will become a lot more obvious in the future than it is today.
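To make the N+1 problem concrete, here is a minimal sketch with a hypothetical in-memory `FakeDB` that counts round trips. Resolving the author of each of N books one lookup at a time costs 1 + N queries; batching the author lookups into one request, the pattern popularized by Facebook's DataLoader library, costs 2. The class, data, and function names are illustrative assumptions, and this is not a description of how Dgraph solves the problem internally.

```python
class FakeDB:
    """Hypothetical stand-in backend that counts round trips."""

    AUTHORS = {1: "Ada", 2: "Grace"}
    BOOKS = [
        {"title": "Notes", "author_id": 1},
        {"title": "Compilers", "author_id": 2},
        {"title": "Engines", "author_id": 1},
    ]

    def __init__(self):
        self.round_trips = 0

    def fetch_books(self):
        self.round_trips += 1
        return list(self.BOOKS)

    def fetch_author(self, author_id):
        self.round_trips += 1
        return self.AUTHORS[author_id]

    def fetch_authors(self, author_ids):
        # One request that returns every requested author at once.
        self.round_trips += 1
        return {i: self.AUTHORS[i] for i in author_ids}


def resolve_naive(db):
    # One query for the list, then one query per book: N + 1 round trips.
    books = db.fetch_books()
    return [(b["title"], db.fetch_author(b["author_id"])) for b in books]


def resolve_batched(db):
    # One query for the list, then one batched query for all authors: 2 round trips.
    books = db.fetch_books()
    authors = db.fetch_authors({b["author_id"] for b in books})
    return [(b["title"], authors[b["author_id"]]) for b in books]


naive_db, batched_db = FakeDB(), FakeDB()
naive_result = resolve_naive(naive_db)
batched_result = resolve_batched(batched_db)
print(naive_db.round_trips, batched_db.round_trips)  # 4 2
```

Both resolvers return the same data; only the number of round trips differs, and the naive version's cost grows with every extra book and every extra level of nesting in the query.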

Eric Anderson:
What I think is interesting about your work is that there's been a craze for the last five years around multi-model databases, where you have a single data store with lots of different APIs to access it. What you've helped me see is that we can use GraphQL to do graph queries. Other people have put GraphQL in front of relational databases, and we can put GraphQL in front of time series, as you mentioned. Increasingly, I'm not sure we need one data representation with lots of APIs. We may just need GraphQL, and we can solve all our problems with that API.

Manish Jain:
That could be true, right? At least GraphQL makes it easier for anybody to adopt a system without having to worry about how the data is being stored in the back. It's almost like the Firebase style, where you get a very nice API and Google takes care of the storage for you. So GraphQL can be used that way. But some of the problems with GraphQL arise because it's a double-edged sword: GraphQL is simple, and that's why it's great. But GraphQL is simple, and that's why it's not so great. There are a lot of things you can't express in GraphQL because of its simplicity, so you cannot make the harder queries you might want to make against, let's say, a graph database, a time series database, or any of those specialized systems.

Manish Jain:
But at the same time, because it's easy, you can get started quickly and go quite a distance. So what I see happening is that perhaps people will write extensions to GraphQL to do things like aggregations, or [inaudible 00:27:55] assignment, calculation, and so on and so forth. Or perhaps there will just be better ways to do custom logic on the server side. So in GraphQL, you can just create GraphQL functions which do the complex [inaudible 00:28:10] that you want to do, and the client still just calls that function.

Eric Anderson:
I'm excited about the conference and the upcoming launch. Maybe just to wrap things up, you can tell us what you look forward to going forward and ways anyone in the community, our listeners, could get involved if they're interested or what they should keep an eye out for.

Manish Jain:
I really look forward to engaging with the GraphQL community. One thing I've noticed that's different between, I would say, the backend community and the GraphQL community is that the GraphQL community is a lot more visible. They write more tutorials. They create more videos. They're just a lot more engaged and create more content. So I really look forward to seeing more people use something like Slash GraphQL and write about it. I would also say Slash GraphQL is almost addictive, because once you start using it, you don't want to go back to how you were doing things before. We're also running a hackathon soon, so I'm eager to see the kind of feedback we get from it and how we can improve Slash GraphQL.

Eric Anderson:
Fantastic. Manish, any parting words? Thank you so much for coming on the show. It's been great to have you.

Manish Jain:
One last thing I would say is that the conference is happening Thursday. So please do join, and you'll get to hear some out-of-this-world talks. Thanks for having me.

Eric Anderson:
You can find today's show notes and past episodes at contributor.fyi. Until next time, I'm Eric Anderson and this has been Contributor.