Glauber Costa: The main thing is that we wanted to have the freedom to take the project in directions that we wanted. We want to be a part of this community, and we want to have a voice, and discuss whether or not this particular thing makes sense for the project, have our needs met.
Eric Anderson: This is Contributor, a podcast telling the stories behind the best open source projects and the communities that make them. I'm Eric Anderson. I'm joined today by Glauber Costa, who is one of the creators of libSQL. libSQL is an open source, open contribution fork of SQLite, and we're going to get into what that open contribution element means. But because you're making a fork, Glauber, I understand you're also going to evolve libSQL to do more things than SQLite did for us?
Glauber Costa: First of all, thank you so much for having me. We do already, and I love the fork story, because usually in the open source community there is this mantra that if you don't like something, if you don't like the way the project is headed, just fork it. But then when you fork it, people get angry and say, "Why did you fork it, and why didn't you try to join the community?" We got a lot of heat on Hacker News when we published the fork, which is expected from Hacker News. But one thing I always try to make very clear is that we have a lot of respect for SQLite: for what they're building, for what they have been building, and for the way they conduct the project. So the fork is not us saying that they're bad or horrible or anything like that. On the contrary, I think it's a very healthy sign in open source: you're doing something one way, we believe it could be done differently, there is no way to combine those differences in the same project, so we'll do it in a different project. There aren't a lot of technical reasons, to be honest.
There are some, but doing things differently from SQLite on a technical level was not the main goal of the fork. The main thing is that we wanted to have the freedom to take the project in directions that we wanted. Not a my-way-or-the-highway kind of thing, but we want to be a part of this community, we want to have a voice and discuss whether or not a particular thing makes sense for the project, and have our needs met. And unbeknownst to many people, SQLite is technically not open source at all. SQLite is public domain software. That's even more flexible than open source, but it means SQLite doesn't have a license. In part to keep it in the public domain, they cannot take third-party contributions. You can't just say, "Hey, I copied this code from somewhere else and here's part of SQLite," or, "I'm using this library," to evolve it, because then the end result would not be public domain. You essentially have to own the copyright and donate it to the public domain. And I guess partly for this reason and partly for other reasons (I'm not in their heads), to keep the project going in the direction that they want, they very famously state on their website that they don't take external contributions. I think SQLite has three contributors. Some people say two; I haven't checked, but two or three people contribute the majority of the code, which again is quite impressive given what they do. But they don't take contributions. I saw the author of SQLite claiming the other day, "Yes, of course we do. There was this one contribution that we took one time, and you have to fill out this paperwork, and there is this kind of stuff that we do. So it's possible that we may take a contribution." But I do like the way they phrase it on the website: it's not open contribution.
Even if they would accept a one-off contribution from someone in a particular situation, we have a commercial product that we wanted to build on top of it. So we wanted to feel that we were, if not solely at the helm, at least holding the helm together with other entities: having a say in the direction of the project, a fair discussion of whether or not a feature makes sense, and being able to add features that, frankly, matter to our commercial product, but also features of general interest. We've already added some things to our fork that we don't use in our commercial product, just because we thought they would be interesting for SQLite to have.
Eric Anderson: Is SQLite alone in not being open contribution? Is there any precedent among other major projects?
Glauber Costa: Not that I'm aware of. Just to be clear, I come from a very different background. I was having a conversation with a friend of mine this morning about how spoiled that background made me, because my first 10 years of programming were Linux. I was contributing to the Linux kernel, and I developed a bunch of bad habits out of that as a junior programmer at the time. I don't carry them with me today. For example: I don't have to test my stuff that well, because if it passes the scrutiny of the reviewers and gets merged, that's already a high bar. And then if it has a bug, within two hours somebody will find it and report it, and we'll fix it. Obviously, most open source projects don't work that way, because you don't have 40 people reviewing your contribution and tens of thousands of other people testing it, even accidentally, because they're running Linux on their machines all the time. But Linux is the very opposite of SQLite: it's a project that is extremely open contribution. Everything is driven by consensus, and you have extremely different interests driving the project.
You have people trying to make it work well on supercomputers, and you have people trying to make the same code work well on embedded devices, and you have to find a way to coexist. So obviously there might be some, but I don't know of any other successful project with the reach that SQLite has, which is comparable to Linux in my view, that doesn't accept or welcome external contributions.
Eric Anderson: Yeah. Well, the project started in 2000, so it kind of predates the modern open source movement. There are behaviors, standards, a way of working that have been invented over the last...
Glauber Costa: That's where I come from. What is weird to me is the new open source, which, by the way, is one of the things we did not want to do with libSQL: the modern open source in which a single company drives the project, and the open source project is essentially the experiment, the playground for what is going to become the commercial product. Coming from the 2000s and Linux, that's the weird thing to me; open source used to be this thing where people gather around and work toward a common goal. So I think there's nothing wrong with what SQLite does; we are not here to say, "Hey, SQLite is wrong." They have the right to do this. That's the beautiful thing about open source, or public domain; SQLite is obviously open source in spirit. There's nothing wrong with it, I just don't know of any other project that does it. And look, for us, it was not the right way to do our stuff.
Eric Anderson: Great. So let's get into your story further, Glauber. We heard about the Linux period, but what eventually led you to pursue libSQL?
Glauber Costa: Well, there's another decade in the middle. I stayed around doing lots of things with virtualization, containers, cgroups, and memory management in the Linux kernel.
Including my time as a volunteer, that's around 10 years. I eventually joined Red Hat to work on the Red Hat Enterprise Virtualization product, and that's essentially where I met the person who is my co-founder today, Pekka. Pekka was the maintainer of one of the Linux memory allocators at the time, and we've been friends since then. In 2012 I joined a company that had an interesting idea at the time called a unikernel. A unikernel is essentially a kernel that runs a single process. It's not as flexible as Linux and doesn't support a lot of the things Linux supports, but it supports as much as possible. Containers were not yet a thing at the time; they existed, and in fact I had worked on the infrastructure for containers in the Linux kernel, but they weren't the dominant thing that they are today. And the idea was, "As the cloud becomes dominant," and I feel old just saying that, "it's very inefficient to be running Linux on top of Linux, so let's find a guest operating system that runs a single process, with the isolation level that people are looking for anyway." This was called OSv, a project started by the creator of the KVM hypervisor, Avi Kivity, and Dor Laor, who started this company. I joined them as employee number three, and Pekka joined right after. The company worked, but the project didn't quite work, and we pivoted to ScyllaDB. ScyllaDB is a re-implementation of Apache Cassandra in C++, so again in the data layer, focusing on high-performance, low-latency, petabyte-scale workloads. I worked on that for around eight years. And now I'm with Turso. Turso is the company that is essentially...
Just going back to that: maybe it's because of my Linux upbringing, but I really don't like open source projects completely dominated by a company. We very explicitly tried not to do this with libSQL. Some people say it's harder to do branding because now we have two brands to control. But that's the thing: we don't want to control the libSQL brand. We want it to be a space for you to come and build. So, Turso is our company now, and it's a commercial product. In very simple words, what it does is put SQLite on the edge. We use the fact that SQLite is an embedded database that is extremely cheap to run and has a very small footprint to make your database available in 34 regions across the world. That means you now have low latency from wherever you are, so when you're doing edge compute, you don't have the problem of going back to your central database. It's not that other databases cannot do replication; of course every database can do replication. But using an embedded database makes replication affordable, to the point where you can actually have a database that is replicated everywhere and doesn't break the bank. So that was our main motivation to create the libSQL project. We looked at SQLite, and SQLite was absolutely perfect for what we wanted to do, with the one exception that it's not a replicated database: SQLite is embedded and only runs on your device. We saw lots of other projects, like LiteFS and Litestream, and I have a huge amount of respect for Ben Johnson, the person behind those projects. There was also another project called rqlite; I also know its author, Philip. I saw all of those projects trying to do more or less the same thing: is there potential for SQLite to be a distributed database while keeping the good things about SQLite?
But they were all dancing around the fact that you can't change SQLite, because SQLite is this unchangeable piece of software: you may ask them for a feature and they may or may not do it, but you don't control the agenda, you can't control the roadmap, and you can't influence it in any way. So let's just assume that it cannot be changed and build layers around it. LiteFS, for example, built a virtual file system layer around it, and that file system layer is what handles the distribution. rqlite went in a different direction. So they all experiment with making data distributed at the SQLite layer, but without changing SQLite. We took a different approach, and as I said, I think it was very much informed by our time in Linux. SQLite is a very crucial piece of software. People have this idea, "Oh, SQLite has to be this way because it runs on the Airbus and it's so incredibly stable and so incredibly perfect." But hey, look, Linux runs everywhere as well, and it has a diverse community. For us, those things didn't add up: we could modify SQLite and still make sure that it was stable. At first we thought of rewriting it: "Hey, let's just rewrite this thing in Rust," just because we like Rust. But as we started thinking about it, we said, "Hey, you know what? The best thing to do is to just fork it, diverge over time whenever we need to, but keep merging from SQLite as often as possible."
Eric Anderson: One of the benefits of a fork might be that you could attract other SQLite fans who also want to take the project in a different direction.
Glauber Costa: Absolutely.
Eric Anderson: The project's done quite well. I only have a couple of metrics, like the GitHub stars, in front of me. You had a big launch a year and a half or so ago and then growth from there. How do you go about launching a fork?
You mentioned the Hacker News comments. Do you just kind of fork and announce, or what?
Glauber Costa: Yeah. So the project is about a year and a half old. The company is a year and a half old too, but we had a different project before. That project was also using SQLite, and that's where we first saw this frustration. We also did a soft pivot in the company, in which we decided to focus on that layer a lot more. Turso is the result of that; it's a new commercial product based on it. But we actually created libSQL before we had the idea of going all in on Turso as a commercial product, because, as I said, we were using SQLite in our previous project and we wanted it to be distributed as well. We had our own attempt at doing the distribution layer without changing SQLite, and we figured out that, on a technical level, it's just not the right way to do it. Not being able to change it is a very artificial boundary, especially for an open source project. But it's funny: when we launched, we didn't want to be the SQLite fork that adds this or the SQLite fork that adds that, because we've seen this in the past, and those projects all fail. They fail because you have the massive pull of SQLite, and then you add your one small, tiny feature; you're never going to have enough traction, or enough of anything, to justify "Hey, this big project with this one feature." So we made the very controversial decision to launch without writing a single line of code. And lots of people at the time read this as, "Those guys are clowns. They don't know what they're doing. Talk is cheap. Who's this idiot forking SQLite, the most perfect project to have ever existed?" I was fascinated, because I did not know I had this kind of karma: lots of people came to my defense. "Glauber is not a nobody, so come on."
But this was by design. What we did was put a page out: a statement of what we wanted to do, with some of the technical areas, just so it wasn't too vague, where we believe SQLite could be different. I'm not even going to say better, because this is taking the project in a different direction. So, a modern database could have the following things. This is still on our website, by the way; if you go to libsql.org, we haven't touched it. We call it a manifesto. It should work with distributed data, so it should be able to work with data that is synchronized from another instance also running SQLite. It should have an asynchronous interface, which we haven't done yet, by the way. It should embrace the WebAssembly ecosystem better. SQLite does have some inroads into WebAssembly, but one of the things that we added, which by the way is not that interesting for our commercial product, at least right now, is the ability to write user-defined functions in WebAssembly. So instead of writing a C extension, you can write a WebAssembly function that extends the database in whichever way you want. There were a couple of other things as well, like how we believe that a database like SQLite should rely more on modern NVMe devices and less on the page cache, and use modern Linux infrastructure like io_uring whenever available. We haven't done lots of those things yet, but we put this roadmap out saying, "Hey, those are examples of things we would like to do with SQLite if we had the opportunity." And we just put it out there: we announced the fork, we put the name out there, we created the GitHub repository. The first commit was just adding a license. Obviously, you're legally allowed to relicense anything that's public domain; you can do whatever you want with it, even make it proprietary.
So we added an MIT license, we put the README out with the list of things we wanted to do, and that was the launch. That alone got us more than a thousand stars. For us, that was essentially a sign that there is agreement in the community, obviously not unanimous, that "Hey, look, there is space for this. This is something good." It took us three weeks to push the first feature out. In the meantime we had to live with lots of folks coming to GitHub, Discord, and Hacker News and trashing us, saying, "You guys don't even know how to write code. SQLite is this very polished product and you just put a manifesto out; you haven't even written any code. Show me the code." And look, my way of dealing with that was just, "All right, the code will come, don't worry." And it's there now.
Eric Anderson: That's amazing. That's exciting. Open contribution is one thing, having a repo is another, but presumably now you can have a community of SQLite contributors. So how are you thinking about governance going forward? If I have an idea for libSQL, how do I bring it to the group?
Glauber Costa: It has happened already. If you look at our GitHub, right now all the maintainers are from our inner group, but we don't want this to be the case. We want to empower more people. One very interesting piece of work that's happening, just to give this guy a shout-out: Matt runs a project called vlcn, that's vlcn.io. I only learned recently that it's pronounced "Vulcan"; I used to just pronounce it V-L-C-N. His work is an attempt to bring CRDTs to SQLite. I happen to agree with him that the biggest challenge with CRDTs is the UX, and he's trying to expose CRDTs as native SQLite types, as part of special SQLite tables.
Right now all this works on SQLite as an extension, but the syntax is horrible because you can't change the grammar. A lot of the code works fine as an extension, but declaring those things is quite tedious. So he's now doing work on libSQL to change the grammar, so he can essentially just write CREATE TABLE with whatever CRDT types he wants, obviously merging the code from the extension as a native part of libSQL as well. This is the kind of contribution I mean: it's not a drive-by contribution, it's not a trivial fix, it's big work. And we don't know yet where it's going to go. Matt started merging some of this into libSQL, and he's an example of exactly the kind of person I would love to have: he's not affiliated with us and doesn't work for us (although, Matt, if you're listening, I would love to change that), but he's the perfect kind of individual to become a maintainer of a project like this. You come, and over time you trust us. As I said, I could be biased, but I see the model that worked for Linux working here: over time you start trusting new people, those people become your new lieutenants, and they have the power to trust other new people as well.
Eric Anderson: One question whenever there's a fork is how you handle upstream changes. Is SQLite changing much?
Glauber Costa: Impressively, it is, yes. SQLite changes a lot.
Eric Anderson: And do you want to try to incorporate some of those upstream changes, or what's the plan there?
Glauber Costa: So far we are. And just to be clear, I want to set expectations for the folks listening. The delta between SQLite and libSQL at the moment is not that big, which allows us to keep back-merging. Part of the reason is that we are also a startup company; we don't have the luxury of saying, "Hey, let's throw $20 million at this and hire a lot of engineers just to work on it."
And we haven't had the time yet to build a large community, which we believe we will. So at the moment we actually want to keep back-merging for as long as possible. But I don't view this as a requirement. We also had the experience, by the way, of working on Scylla, which in that case was a re-implementation, and that experience is actually why we chose a fork this time. With Scylla, they wanted to change the language, so there was no way to just fork it. But we started Scylla by looking at the Java code and translating it almost mechanically to C++, without changing much, and then changing the places where we needed to. So it was a similar experience. And look, the first version was very similar; over time it became a different project. There are other embedded databases as well. SQLite is not the only one; you have databases today like DuckDB that focus on analytics. So over time, and I don't think this is going to happen this year or next year, I do think we'll end up deviating from SQLite. What we don't want to deviate from are things like the file format and API compatibility. We hopefully want to extend the API, but we don't want to remove things from it; we want to add. So whatever works with SQLite keeps working with libSQL, right? That gives users an on-ramp into the project, and we add more things whenever it makes sense. There are a couple of things about SQLite that I don't particularly love, and we're happy to change those. I think this will be driven by the reality on the ground, so to speak. For as long as it's easy to keep back-merging, we'll keep back-merging. If there comes a time when it's becoming very tedious and using a lot of our time, that's the point at which we say, "Hey, we're not going to be doing this anymore."
But this is also a natural back-pressure mechanism: if it's still doable to back-merge, your delta is not too big; when it's not doable anymore, that means your project has legs.
Eric Anderson: Totally. So part of the changes you wanted to make, Glauber, were to support using SQLite not only as an embedded database in browsers and other places, but as a distributed database. Tell us about some of those initial changes and their impact toward that vision.
Glauber Costa: Yeah. So, SQLite is actually quite extensible as it is. It allows you to add your own virtual file system layer, which is the layer that mediates access to the file. But in addition to the main database file, SQLite has a write-ahead log file. The write-ahead log is essentially the commit log: the file that contains the stream of changes to the database. And SQLite does not allow you to tap into those changes. You can virtualize access to the storage, but you cannot virtualize the stream of changes. That means you have two options if you want to turn SQLite into a distributed database: either always ship whole files around, which is very inefficient, especially as those files grow, or find a way to tap into the stream of changes by reverse engineering what's happening. From my understanding, that's what LiteFS, the distributed file system layer I mentioned, does: it looks at this file from the file system perspective, tries to figure out what the changes are, and deconstructs the file.
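For readers who want to see what reading the stream of changes "from the outside" involves, here is a minimal Python sketch using only the standard library. It relies on SQLite's documented WAL file format: a 32-byte header (magic number, format version, page size, checkpoint sequence, two salts, two checksums) followed by frames, each with a 24-byte header whose second field is the database size in pages after the commit for commit frames and zero for all other frames. The function names are illustrative, the sketch skips checksum verification, and it is not how LiteFS or libSQL actually work.

```python
import os
import sqlite3
import struct
import tempfile

WAL_MAGIC = {0x377F0682, 0x377F0683}  # the two checksum byte-order variants
WAL_HEADER_SIZE = 32
FRAME_HEADER_SIZE = 24


def read_wal_frames(wal_path):
    """Return (page_size, frames), where frames is a list of (page_no, commit_size)."""
    with open(wal_path, "rb") as f:
        data = f.read()
    magic, _version, page_size, _ckpt_seq, salt1, salt2, _c1, _c2 = struct.unpack(
        ">8I", data[:WAL_HEADER_SIZE]
    )
    if magic not in WAL_MAGIC:
        raise ValueError("not a SQLite WAL file")
    frames = []
    offset = WAL_HEADER_SIZE
    while offset + FRAME_HEADER_SIZE + page_size <= len(data):
        page_no, commit_size, f_salt1, f_salt2 = struct.unpack(
            ">4I", data[offset : offset + 16]
        )
        # Frames whose salts differ from the header are stale leftovers from an
        # earlier WAL generation. A real reader would also verify the cumulative
        # checksums and ignore frames after the last valid commit frame.
        if (f_salt1, f_salt2) != (salt1, salt2):
            break
        frames.append((page_no, commit_size))
        offset += FRAME_HEADER_SIZE + page_size
    return page_size, frames


def commit_points(frames):
    """Indices of commit frames: a nonzero db size marks the end of a transaction."""
    return [i for i, (_page, size) in enumerate(frames) if size != 0]


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        # isolation_level=None means autocommit: each statement is its own transaction.
        conn = sqlite3.connect(db_path, isolation_level=None)
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("PRAGMA wal_autocheckpoint=0")  # keep frames in the WAL
        conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
        conn.execute("INSERT INTO kv VALUES ('a', '1')")
        conn.execute("INSERT INTO kv VALUES ('b', '2')")
        # Read the WAL while the connection is open; closing it checkpoints
        # the WAL back into the main file and removes it.
        page_size, frames = read_wal_frames(db_path + "-wal")
        expected_page_size = conn.execute("PRAGMA page_size").fetchone()[0]
        conn.close()
    return page_size, expected_page_size, frames


if __name__ == "__main__":
    size, _, frames = demo()
    print(size, frames, commit_points(frames))
```

Even this toy shows the cost of the approach: the reader has to poll a file, parse a format it does not own, and infer transaction boundaries after the fact.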
What we added to SQLite, which was of immediate benefit to us, was visibility into the changes as soon as they happen: a set of hooks before a change is committed and after it is committed, which gives us real-time access to the stream of changes. That allows us to build a distributed database just by shipping the deltas around, while also guaranteeing that everything we pass around is transactional. We can see the stream of changes and say, "Hey, a transaction started here and ended there, so now that the transaction happened, let's pass it on to the other replicas." So you always have a transactional boundary. The service we want to provide with our commercial product, Turso, for the folks interested, is full SQL at the edge. To do that, I need to understand whether a transaction happened, when it ended, and whether it was committed or not, and be able to stream those changes. So that's what we did. We also built something that I call libSQL Server, which is a server mode for libSQL and what we use to power Turso. It has a lot of code and lives in a different repository from the libSQL core, but it's still part of the libSQL organization. It follows those changes, using the core as a consumer of the write-ahead log to do data distribution. So our distributed database is also open source and also part of libSQL. It just lives in a different repository because it has an HTTP server around it and a lot of other components that would just pollute the core. But it is also part of the libSQL project, as we call it, and part of the organization. So there's actually a fair amount of code there, but it relies on this one thing from the core: the ability to tap into the stream of changes in real time, so I can see all the changes that are happening and pass them on to the replicas.
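The hook-based approach can be sketched the same way. The Python below is a toy model under stated assumptions: `WalStreamer`, `on_frame`, `on_rollback`, and `Replica` are hypothetical names standing in for hooks that fire for each WAL frame and on rollback; they are not libSQL's real interface, which lives in its C code. What it illustrates is the transactional boundary: frames are buffered until the commit frame arrives, so replicas only ever apply whole transactions, and a rolled-back transaction never leaves the primary.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A toy replica: a page map that only ever applies whole transactions."""

    pages: dict = field(default_factory=dict)
    applied_txns: int = 0

    def apply(self, txn):
        for page_no, data in txn:
            self.pages[page_no] = data
        self.applied_txns += 1


class WalStreamer:
    """Hypothetical consumer of per-frame WAL hooks (not libSQL's actual API)."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.pending = []  # frames of the transaction currently in progress

    def on_frame(self, page_no, data, is_commit):
        self.pending.append((page_no, data))
        if is_commit:
            # The commit frame closes the transaction: ship it as one unit.
            txn, self.pending = self.pending, []
            for replica in self.replicas:
                replica.apply(txn)

    def on_rollback(self):
        # Nothing from an aborted transaction is ever sent to replicas.
        self.pending = []


if __name__ == "__main__":
    replicas = [Replica(), Replica()]
    streamer = WalStreamer(replicas)
    streamer.on_frame(1, b"schema", is_commit=False)
    streamer.on_frame(2, b"row a", is_commit=True)
    streamer.on_frame(2, b"row b", is_commit=False)
    streamer.on_rollback()
    print([(r.pages, r.applied_txns) for r in replicas])
```

A real implementation also has to deal with network failures, ordering, and replicas that fall behind; the sketch only shows why seeing commit boundaries directly is simpler than reconstructing them from the file.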
Eric Anderson: Glauber, take us up a notch now to the use cases for this distributed database. If I'm an app developer, why might I reach for libSQL or Turso?
Glauber Costa: I'll take the liberty of taking another step back. If you're an application developer, why would you even worry about data distribution and compute distribution in the first place? What we're seeing in the market today is a drive to move code to the edge. And why is there a drive to move code to the edge? Because you want to provide the best experience possible for your users, and increasingly your users are distributed all over the planet; they're no longer in a single geographical location. By the way, this has already been happening for more than a decade with static assets. I want to serve a page to users all over the globe, so I put my static assets on a CDN. The next logical step is that now I want to personalize those experiences; I want to run some code. Our industry works in pendulums, things come and go. It used to be the case that application developers would put code on the client, and the client is the ultimate edge. But now there's a movement to do server-side rendering again, which is how it was done when I started in the nineties. The problem with server-side rendering is that client-side rendering is so much faster because it happens on the user's side, while the server can potentially be far away. So how do you reconcile that? Let's just put a lot of servers everywhere. That's the so-called edge. So you're now doing server-side rendering, executing JavaScript code, or, and one of the nice things about server-side rendering is that it frees you from JavaScript, you can use WebAssembly with Rust or Go or anything else, if your provider allows it.
But you are executing code on a server that is close to you, so you have the benefits of server-side rendering while being as close as possible to the user. Not in the browser, of course, but you don't necessarily need to be that close. Now, once we accept that, the problem is that I'm executing code, and 80% of the useful code in the world accesses a database. Your code is running on a server close to you, but your database is far away. That means you bought yourself nothing, right? Because the latency of accessing the database will dominate your time. So what we want to do is solve that problem by putting the data close to you as well. As I said, it's not that replication in databases is a novel idea, but replicating standard databases is incredibly expensive, just because databases are traditionally expensive to run. Storage is not that expensive; running the compute in all those locations is the expensive part. So the idea is to plug that gap with SQLite, or libSQL, which is cheap to run, and make your data access fast from wherever you are in the world.
Eric Anderson: So anybody building an app, especially those using these edge servers or server-side rendering, would benefit from libSQL and Turso?
Glauber Costa: Yeah.
Eric Anderson: Are people also using libSQL in the browser, the way we're familiar with SQLite, in conjunction with Turso, for example?
Glauber Costa: Not that I know of, but I don't actually think SQLite is that widely used in the browser. There are a lot of people experimenting with it, but I think it's still a very immature thing that we do hope to steer in the right direction at some point. We don't have the bandwidth to do it ourselves today. We would love to see folks doing this as part of the libSQL project, but it goes beyond that layer. Usually those things are done through WebAssembly.
WebAssembly at the moment doesn't really have a very good system interface. There is the WASI project to enable that, but I think there are still a lot of pieces missing to have SQLite in the browser. It's definitely within our vision, but we're not seeing libSQL leaning in that direction yet.
Eric Anderson: Good. Well, Glauber, as we wrap up here, what can folks do to get involved? I imagine people excited about this span anybody from an app developer who wants to use libSQL to somebody who wants to make a contribution and get involved in the new SQLite community.
Glauber Costa: Awesome. First of all, a shameless plug: I'm on Twitter @GLCST, so follow me there. We have a Discord community, and I can share the invitation link. We also have the libSQL organization on GitHub, just github.com/libsql. In that organization, as I said, you're going to find a lot of repositories, but two dominant ones. libSQL is the core embedded database, which can be used in any way, shape, or form, the same way you use SQLite as part of your application. The main feature you're going to see there that is different is, right now, the ability to execute WebAssembly user-defined functions, and hopefully CRDTs are coming soon. Then there is the second repository, called sqld, the SQLite daemon, which lets you use it through an HTTP interface and is the base for Turso. Also, if you're interested in Turso, we have a separate Discord community. As I said, lots of modern companies today mingle those things into one. We really wanted to keep them separate and not be the overbearing people making all the decisions in the community. So there's a separate Discord community for Turso, and if you're interested in our offering, we would love to have you there as well.
Eric Anderson: Awesome. Glauber, this is exciting on many levels.
This is kind of a new topic for us on this podcast. Thanks for coming on.
Glauber Costa: My pleasure.
Eric Anderson: You can subscribe to the podcast and check out our community Slack and newsletter at contributor.fyi. If you like the show, please leave a rating and review on Apple Podcasts, Spotify, or wherever you get your podcasts. Until next time, I'm Eric Anderson and this has been Contributor.