Dor Laor: We know how open source can bring lots of traction into a business, and you can form a business around it. So, we did think about monetization even before writing a single line of code. And with an OS, many players will want to use the OS for free and many players will want to buy support just to get the stability and get the confidence that comes with it. Eric Anderson: This is Contributor, a podcast telling the stories behind the best open source projects and the communities that make them. I'm Eric Anderson. Eric Anderson: We are live today with Dor Laor, who is the CEO and founder of ScyllaDB and one of the creators of, and involved in, many open source projects. But today we're going to focus our conversation on ScyllaDB. Dor, thanks for joining us. Dor Laor: Thanks for inviting me. It's a pleasure to be around here. Thanks, Eric. Eric Anderson: Yeah, totally. Dor, as I mentioned in the beginning, there's a lot we could discuss. We'll focus our conversation around Scylla. Feel free to jump around as it relates, as it makes sense. But maybe you could first, I think it helps to have you ground our conversation in what ScyllaDB is, and then we can get into the history. Dor Laor: Sure. ScyllaDB is a database. We reimplemented Apache Cassandra from scratch in C++. So it is wire compatible with Apache Cassandra, and even beyond just the wire protocol, everything else is compatible. For example, the file format is compatible. A year ago, we added compatibility with DynamoDB. So you can think about Scylla as you think about DynamoDB and Cassandra. So it's a NoSQL database for big data, real-time, OLTP workloads. And it comes with an open source offering, an enterprise offering, and an as-a-service offering. Eric Anderson: Fantastic. I was unaware of the Dynamo compatibility, which is exciting. And so makes total sense. How did you stumble into this? Is this something you've aspired to do for some time?
Dor Laor: We definitely stumbled into it. Actually, back in 2012, my co-founder Avi and I were working at Red Hat. We made a list of potential projects to form a company around, and a database was one of the projects, but we figured, "Hey, everybody already implemented databases with all of the options, even with the newer SSD hardware." So we didn't pick that and we picked some other projects. It's a nice story in itself, but when we had to pivot, we pivoted into the database world because we wanted to demonstrate performance with Cassandra and the performance wasn't there. So this is how we stumbled upon Cassandra. If you like, I can elaborate on that project a little bit more. It's interesting. Eric Anderson: Please do. It would also be interesting to hear... Maybe actually before you get into that, Red Hat is the premier or first or the original open source company. And you worked at Red Hat and you were still excited about starting your own open source company, it sounds like. Dor Laor: That's right. I'm an open source fan, an open source believer. I think I do it as a hobby, but also for profit too. It's super nice that these paths can be combined. It's funny that I worked at Red Hat between 2008 and 2012. And at the time Red Hat was obviously a big leader in open source and they were still kind of pitching open source as if it's something new, although the entire world had already moved to open source back then. It definitely has now. So it's nice to be able to work and make profits from your hobby. Eric Anderson: Great. And now tell us about this project. Dor Laor: So even a little bit more background, there were a lot of parallel lines between ScyllaDB as a startup and the startup company that brought me to Red Hat in our journey. So in 2005, I met Avi, my co-founder. We were among the first employees at an Israeli startup that was called Qumranet. As its name implies, initially we did something around networking.
Before that, I built a terabit router at another startup. At Qumranet we had to pivot, and we somehow pivoted three times. The last pivot was our last chance, and it was pretty successful. Before it, we worked on the Xen hypervisor with a pivot idea that didn't result in a success, so we had to pivot away from it. And because we were familiar with the Xen architecture, Avi came along with a fabulous idea for a hypervisor, and that was the KVM hypervisor. And ever since we moved to the KVM hypervisor path, the company started to get traction and the project really boomed in the Linux community; it was part of the Linux kernel. Dor Laor: And we also worked with another existing open source project called QEMU, which was successful as well. Actually, we've made it more successful with KVM. And eventually the company was acquired by Red Hat, where we spent four years and I managed the hypervisor development, both KVM and Xen. And Avi and I always wanted to have our own startup, so this is where we formed this company. Now, originally we didn't know much about databases but we had a lot of knowledge about operating systems, kernels and virtualization. So back then we identified that originally virtualization helped people move physical workloads to virtual ones. But over time, the usage properties were different; it's not that you were going to run the same physical workload in a virtual world. Once you had virtual machines, you could just run a single workload inside your virtual machine. Dor Laor: So this was kind of a... Now it's obvious, but back then in 2012, it was like we were onto something. So we decided to create a new operating system from scratch, with our own kernel, that would focus on virtual workloads and just run a single application inside of it. And by doing it, we were about to offer a big performance boost and also manageability.
Because think about how many operating system configuration files you need to tune in order to run your single workload. So everything was in place and great. And we launched a new open source project, but the problem was that, in parallel to us, Docker just began to boom. And usually the answer that we got was, "Docker is the answer, what's the question?" So there was little attention span for a unikernel. This unikernel still lives today and people even use it. And it continues to be developed, not by us but by the community, but we had to pivot away from it in 2014. Dor Laor: And when we wanted to show that applications run better on our OS, called OSv (for virtualization), than on Linux, we compared the two, and one of the workloads was Cassandra. At the time we could offer a 70% performance improvement when we ran Redis on OSv versus Redis on Linux. But when we did the same with Cassandra, the needle didn't move much. And we discovered that Cassandra itself is really inefficient. So when we realized we needed to pivot, we went back to this discovery, did market research, did additional technology research, and we decided to turn the company around, to ditch the previous project and rewrite Cassandra from scratch in C++. And that was in 2014. And this is what we still do today. Eric Anderson: Wow. You were really ahead of your time on the unikernel OSv. That's fantastic. I think you still hear people talking about that as the future, how amazing that you saw it when you did. Dor Laor: Thanks. It could be that a unikernel is always a really nice wishlist item. Usually the competition is what's good enough, and Linux is so good. It's even more than good enough as it is. So here and there, just recently I saw another unikernel project, but altogether, especially with the rise of containers, Docker and now Kubernetes, it's really hard to show massive gains.
We wanted to try to partner with cloud vendors and hypervisor vendors. Back in the day it was difficult because OSv managed to boot a full OS in just under one second, but in 2013, the provisioning time of a new VM instance on AWS was a couple of minutes. So the saving wasn't really propagating to the end user. These days it's different, but still I don't see unikernels bursting out right now, especially with Docker and Kubernetes. Eric Anderson: Yeah, no, I agree. And I've seen lots of open source projects and companies that do great work and then Docker and Kubernetes reset the landscape. And now we all have to consider them the way we have to consider Linux. But it's exciting that it eventually got you to experimenting with Redis and Cassandra, databases in general, and developing Scylla. So you were reimplementing Cassandra from the bottom up using C++, correct? Dor Laor: Correct. Eric Anderson: And maybe just some context, how is Cassandra...? What's the background of Cassandra? Is it also C++? Dor Laor: No. Cassandra is written in Java, and that's one of the reasons that we selected Cassandra, because we're kind of competing against someone who fights with one hand tied behind his back. It's just the wrong choice of language for such a project. It's such a complicated, high-speed IO project. It's just wrong, it's hard. So it's great to compete against such a project. And that's the reason why we offer 5X to 10X performance improvements. Eric Anderson: Yeah. And I wonder if that language choice comes out of its Hadoop heritage. Cassandra grew up around the time of HBase and other Hadoop-related work that was all developed in Java. I don't know. Dor Laor: It's true. Back then, we also looked at MapR, which implemented portions of Hadoop in C++. So we took that example. MapR didn't end up well, but I don't think it's related to the choice of language. And also it's more complicated with Hadoop because developers do write MapReduce code in Java.
So it's fine if your application is in Java and it's running on the same platform; it's okay to have Java on both sides. But with a database, everything is over the network. So it doesn't necessarily... The language of choice in the client is independent of the server. And it just makes much more sense to write in a native language. Today there are native languages that are better than the JVM in terms of performance, like Go and, these days, Rust, but C++ is still great. Dor Laor: So it works well for our use case. And I must also say that it's not just the language. The language is only a tool, an enabler to accomplish what we'd like to do. And what we like to do, because we have this low-level experience with OS and virtualization, is to gain control. Gaining control means, for example, being the one who sends the IO, with DMA, asynchronously, and controlling all aspects of execution. Dor Laor: With Scylla, for example, we're trying to bypass even the Linux kernel with all of the experience and knowledge that we've got. We're trying to implement things in user space and make sure that the database is the one that controls everything. So when you install Scylla, for example, we automatically run a mini benchmark, a micro benchmark under the hood, that tests the disk performance. We save these numbers in our configuration and we make sure that we never exceed the maximum disk performance. Because if we exceed this number, then we just make the disk or the filesystem queue everything, and then we lose control. Some of the data needs to have low latency, for real-time queries, and some data may not be important in terms of latency, like when we stream data to new nodes. Scylla is built around this notion. So C++ is just an enabler to have perfect control over IO, CPU, scheduling, et cetera. Eric Anderson: And I imagine it gives you an opportunity to rethink all kinds of elements in the architecture as you go. Dor Laor: Exactly.
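The disk-budget idea described above (measure the disk once, never send it more than it can handle, and keep the remaining requests queued inside the database where they can still be prioritized) can be sketched roughly as follows. This is a minimal Python illustration, not Scylla's or Seastar's actual code: the `IoScheduler` class, the `max_in_flight` limit, the two class names `query` and `streaming`, and the strict-priority policy are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class IoScheduler:
    """Toy user-space I/O scheduler (hypothetical, for illustration only)."""

    # Assumed to come from a disk benchmark run at install time.
    max_in_flight: int
    in_flight: int = 0
    queues: dict = field(
        default_factory=lambda: {"query": deque(), "streaming": deque()}
    )

    def submit(self, io_class: str, request: str) -> None:
        # Requests wait inside the database, where they can still be reordered.
        self.queues[io_class].append(request)

    def dispatch(self) -> list:
        """Send requests to the disk without exceeding the measured limit."""
        sent = []
        # Latency-sensitive queries go first; streaming gets leftover capacity.
        for io_class in ("query", "streaming"):
            queue = self.queues[io_class]
            while queue and self.in_flight < self.max_in_flight:
                sent.append(queue.popleft())
                self.in_flight += 1
        return sent

    def complete(self, count: int = 1) -> None:
        # Called when the disk finishes requests, freeing capacity.
        self.in_flight = max(0, self.in_flight - count)


sched = IoScheduler(max_in_flight=2)
sched.submit("streaming", "stream-chunk-1")
sched.submit("streaming", "stream-chunk-2")
sched.submit("query", "read-user-42")
first = sched.dispatch()    # the query jumps ahead of the streaming chunks
sched.complete(1)           # one request finishes, freeing one slot
second = sched.dispatch()   # the remaining streaming chunk is sent
```

Strict priority is the simplest policy to show and would starve background work under sustained query load; a real scheduler would more likely divide the budget between classes proportionally.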
Another unique thing that we have is sharding. So both in Hadoop and in modern distributed databases, the data set is sharded because it cannot fit on a single machine. So in Cassandra and Scylla, the data is sharded across servers. Now we have another level of sharding, which we call shard-per-core. If you have a server machine with 20 cores, then we shard it, dividing the data by the number of cores, 20. And each chunk is independent; there is nothing shared between these cores. And there is no locking whatsoever, nothing. So every CPU is not dependent on the other CPUs. On x86, if you lock, then of course you need to wait for the other CPU to release that lock, and that's expensive. But even if the lock is not owned by another CPU, you'll have a penalty of 20%. Dor Laor: So Scylla doesn't use locks at all. And also it's not just the CPU; the IO paths are also independent. And also memory is pinned together with the CPU that it runs on. In an environment where you have a multi-socket machine, if one CPU needs to access memory that resides on the other socket, you pay a 100% penalty in accessing it. So Scylla is all designed around shared-nothing and sharding, on the server level and also on the CPU core level. Eric Anderson: Awesome. Taking a step back from the technology for a bit. Tell me more about how this grew from a project of just yours and Avi's to where you are today. I imagine other people got involved at some point. Had you already incorporated the company or was that something you did after? And maybe you can get into some thoughts around how you looked at licensing and governance of the project. Dor Laor: Sure, it definitely was a long project and a long journey for us. Relatively quickly, or in parallel, after we decided we were going to form a company, we got a seed investment from the founders of the previous companies and several other entrepreneurs in Israel. And we started to go with the first project, the OS.
And a couple of months later, we got a VC investment led by Bessemer. So this is how we formed the company and also did the seed and A rounds. And that company, we started the open-source way and we just utilized something that we're experienced in. And we know how open source can bring lots of traction into a business and you can form a business around it. So the first project was open source from the very beginning, with a very permissive license because the application had to be linked with it. Dor Laor: And we wanted to maximize our chances to partner with many, many players. Also, we did think about monetization even before writing a single line of code, and with an OS, coming from Red Hat, many players will want to use the OS for free, and many players will want to buy support just to get the stability and get the confidence that comes with it. So that's kind of a brief history. Another interesting thing, especially around COVID, is that both at the previous startup and also at Red Hat, we worked with a distributed team. Both within the company, within the startup and within Red Hat, and also when you work with open source, by nature you work with people all around the world. Dor Laor: And it's fascinating and it's fun. And I'm a big proponent of this collaboration, not just in terms of coding, but also in terms of how you manage other projects, not just coding projects. So this is how we formed the company from day one. And we started to hire people that we knew from the open source space. So we have KVM and kernel contributors on board, and we have other contributors who contribute to QEMU and, along the years, different other projects. So we began to add people around the world, and today we have people in almost 20 countries around the world, from Japan to Brazil.
Eric Anderson: It's interesting, from your perspective, having already run or operated other open source communities, do you find that some of those other communities come with you to the new one? And you mentioned hiring KVM and other experts to work with you on Scylla. Does it help to kind of bootstrap a community that way? Or do you end up having to kind of find all new people interested in that particular solution? Dor Laor: It's a good question. So it's helpful, but only to a small extent. It's definitely good to come with credibility. So when we launch a new database, people give us credit even before we've done something, and do try to watch. And it's also good to be connected to people in high places, in different companies. It's also by... I encourage everyone on board to contribute to open source because it just helps you and helps your resume, whether you'd like to be an entrepreneur or not, so that's great. But in terms of moving people between projects, it's not always simple. For example, those who work on the Linux kernel are really married to the Linux kernel project. It's almost impossible to move them out. We did move some of them to a parallel world with the OS for virtualization, and most of them moved with us to the database. So it's not that simple, but it's possible. And as for the user community, users usually have a different perspective or different interests. So they'll continue to do what they have been doing before. Eric Anderson: In addition to individuals in your community, what about your first critical production workloads or companies that rely on you? How do you find those types of users, or do they come from your kind of community contributors? Dor Laor: So usually contributors to a complicated project such as a database have different characteristics than the end user. They can all be developers, but it's different types of developers. So the contributor community is one type of audience and the user community is a different one.
So we had to go look for users whom we didn't know before. So it was more kind of looking for users ourselves, and that's kind of a standard activity. We've done it also with the open source movement. And also we barely had a sales force initially. Now we do, and the company sells like any other standard enterprise and as-a-service company. But in the early days, we did it by walking around to everyone we could pitch to, whether it's open source or not, and everyone who has been using Cassandra or has pains with the existing databases. So it's always really good to piggyback on an existing project. It worked for us with the KVM hypervisor, piggybacking on Linux and trying to convert Xen users. And it works for us with Cassandra and these days with DynamoDB too. Eric Anderson: It makes total sense. Good work on getting those initial production users. Maybe you already touched on this. How did you think through the licensing and governance of the open source as it grew? Dor Laor: We had to think about it before it started to grow. Basically, the moment you launch a project, ideally you wouldn't change the license. I'm really against changing licenses, unless it's absolutely a must. We have two projects at Scylla. One is the database itself, and we chose to license it with AGPL. We saw the AGPL example with MongoDB and it was successful for them. And I think that the AGPL is a license that really encourages contributions. So it forces you to contribute if you change the code, whether you distribute the code or not. Basically the GPL itself has a big hole in it that AGPL closes, because with GPL, and we saw that at Red Hat, if you use GPL code but you don't distribute binaries, so you only provide something as a service, whether you're a Facebook or an Amazon or something like this, then it's not a must to contribute your changes back to the community. Dor Laor: So AGPL is more right in that sense.
And for us, it was important to select this type of license, which will offer us some type of protection from all of the gorillas out there who may use code without contributing back. There's no problem using AGPL code for free, no problem at all. Whether you contribute or not, it's fine. But if you do need to change the server code, then you need to contribute the changes back. So I think it's A, fair; and also B, a good compromise between a permissive license and a non-permissive license. We have another project, which is the core engine of the database. We've made it an independent project. It's called Seastar. This independent project is not an end-user product; it's more of a library that allows you to write an asynchronous project like a database. Dor Laor: For example, there's a company called Vectorized and they're trying to rewrite Kafka. Kafka is written in Java too. They saw the example that we have started and they tried to duplicate it and bring it into the streaming world. And they use our core engine, Seastar. And because we knew that people will use the Seastar engine in a variety of ways, and it makes sense for them to... They wouldn't want to contribute all of the code back, so we gave Seastar an Apache license. So this way they can be independent in the directions that they take. And Seastar is also used by Red Hat with the Ceph project and by several startups who do NVMe-over-TCP. So it's a good independent project, and we didn't have direct monetization goals around Seastar. So that's why it's Apache. Eric Anderson: Very neat. That makes a lot of sense. And I wasn't aware of all the places that Seastar had gone. Congratulations. I want to kind of discuss a topic quickly with you that's a bit more, I guess, theoretical or abstract. I've explored this myself.
When I was at Google we developed an open source project, Apache Beam, that was just an interface that could be used on our proprietary service, Cloud Dataflow, as well as other open source services or engines, I should say, like Flink or Spark. Eric Anderson: And this idea of separating the interface from the engine in open source is becoming much more meaningful as we go into cloud, because in some ways with the cloud, customers only see the interface, and there's an opportunity to swap engines behind the scenes. And of course, you've pioneered this with Scylla. You realized that the Cassandra API is something that a lot of customers have already built against, and you give them the opportunity for a new execution engine. Seastar, I guess, follows the same pattern. Is this the way of the future? Will we see more kind of interface-specific projects and then kind of more execution engine, if you want to use those types of words, projects? And end up in a world where people kind of gravitate to an interface and then have a menu of execution engines behind it? Dor Laor: I think we'll see a whole bunch of options. Like I can say for ourselves, we're very creative in terms of implementation, but we're not necessarily the experts or the most innovative around APIs. I wish we were, but so far we're not. And this way it makes a lot of sense to just take an existing, successful, good API, and just replace the implementation under the hood. And we've done it with Cassandra, we're doing it with DynamoDB, even though Dynamo is a closed source project. We're kind of reversing the trend. Usually Amazon is doing this with open source code, and we're kind of reversing the trend. Dor Laor: So it makes tons of sense. And it's also a natural open source movement. It's a movement that we did with KVM versus VMware. And many times open source begins by trying to open up infrastructure that was only closed until now, sometimes without being API compatible.
And there are cases where API compatibility is important, and there are cases where it's less important. We used to get questions about why the world needs another database, especially several years back, when every week or so there was another database company or another database project. And the world doesn't need another new database. It needs a better implementation of existing good APIs. SQL, by the way, is a great database API. So a project just needs to try to be compatible with SQL. Eric Anderson: Yeah, totally. Take us now to kind of the present day. What are you working on now? What does the future hold for the Scylla project? Dor Laor: Next week, on January 12, we have the Scylla Summit. Probably our audience will hear it in B Street. And we're launching project Circe, which is a 12-month roadmap that transforms Scylla... The names are from Greek mythology. Scylla was a nymph and Circe was a witch who transformed Scylla into a monster with 12 tentacles. So we've made the logo cute, and also our mascot, but it was a monster, a dreadful monster. And we're trying to transform Scylla now into a monstrous database. The main idea is that we have fantastic performance. We have compatibility with multiple APIs. We also have Kubernetes capability now, but we're not satisfied enough with the elasticity aspects and the mobility aspects of Scylla. Dor Laor: I think that today Scylla is easier to manage than Cassandra, MongoDB and Redis, but we'd like to make it smoother. So we're replacing big components in Scylla. We're taking the Raft consensus protocol and putting it at the base of our system. It will allow users to have data that is always consistent: transactional data, as opposed to the eventual consistency guarantee that Scylla, Cassandra and DynamoDB have. And it will also allow us to provide better elasticity and maintainability, without diving into it too much.
And this project Circe will have a webpage, and we'll publish a new functionality of this project every month over the course of the next 12 months. Eric Anderson: That's wonderful. I was just counting the tentacles on the logo. I didn't see 12, but I think I got seven, which implies another five behind. So very good. Dor Laor: We have another logo for project Circe with more tentacles. Eric Anderson: Okay. [crosstalk 00:30:19] Yeah. Very good. Wonderful. And hopefully we've piqued the interest of some folks on the show about Scylla. If there's folks listening who want to get more involved, any suggestions on how they would go about that? Dor Laor: If you're a user, the easiest thing is to try to download it, to use Kubernetes, or to use our managed cloud. If you're a contributor, go ahead, check the code on GitHub. The code is fabulous. We started with C++11, proceeded to C++14, 17, and now C++20 with coroutines and really exciting programming techniques. So just as a good, interesting learning experience, it's really fascinating to check out the code. So whatever you like, there is something that can be interesting for you. And also we have a fabulous user base, with users like Discord, Strava, Medium, and Starbucks. So it's a good database of choice. Eric Anderson: Dor, thanks so much for your time today. During my time at Google, we used KVM. I've heard about your work for years, and it's a pleasure to meet you and hear the story behind Scylla. I'm excited for what you're doing. Good luck on the conference or the summit next week. Dor Laor: Thank you. It was a pleasure. Eric Anderson: You can find today's show notes and past episodes at contributor.fyi. Until next time, I'm Eric Anderson and this has been Contributor.