Maxim Fateev: Open-source is not a goal, right? Open-source is a means to achieve something. My view is that if you build any infrastructure-level project, and it's not open-source these days, it doesn't make sense. It just doesn't.

Eric Anderson: This is Contributor, a podcast telling the stories behind the best open-source projects and the communities that make them. I'm Eric Anderson.

Eric Anderson: I'm joined today by Maxim Fateev, one of the co-creators of both the Cadence project out of Uber and the Temporal project. Maxim, welcome to the show.

Maxim Fateev: Thank you for having me. I'm really excited to talk to you.

Eric Anderson: Maybe you could tell us first what Cadence and Temporal are, and then we can go back to how they came to be?

Maxim Fateev: So, Cadence and Temporal are both open-source projects, which were started and are still run by me and my co-founder, Samar. We started Cadence around four years ago, in October. And in October last year, we quit Uber and started our own company called Temporal Technologies. Temporal is a fork of Cadence, which our company is moving forward and productizing.

Eric Anderson: Got it. So, four years. Very good. Maybe you could take us back four years to when Cadence first came to be? What inspired the project?

Maxim Fateev: I would say four years wouldn't be enough. The reality is that we probably need to go back 18 years.

Eric Anderson: Yeah, this is a life's work.

Maxim Fateev: I joined Amazon in 2002, and I was part of the so-called SPLAT team, the software platform team, and that was well before AWS existed. Our team was responsible for a lot of things in the platform, but a couple of them were asynchronous messaging, workflows, and frameworks.

Maxim Fateev: So, I had a lot of exposure to asynchronous communication. And later I became team lead for the whole asynchronous platform Amazon ran on. And that was well before Kafka was even conceived.
This technology is still used as a backend for the Simple Queue Service.

Maxim Fateev: And I had a lot of exposure to a lot of asynchronous use cases. Back then, think about it, Amazon was practically the first to start the whole microservice-oriented architecture. And we were hitting a lot of problems that a lot of companies are only starting to face now.

Maxim Fateev: And so, practically, a lot of microservice orchestration and microservice communication was done asynchronously. Initially, we had a homegrown engine, which was based on Petri nets, and it ran on top of Oracle. Later, we realized that we needed orchestration, because just asynchronous messaging through queues wasn't enough, and it wasn't the best model for complex service orchestrations.

Maxim Fateev: So, we started a project which later became Simple Workflow Service, which is a publicly available AWS service. Back then, when we started doing Simple Workflow, our idea was to redefine how distributed, reliable applications are written. I think we made a lot of progress, but we certainly didn't nail it, because you've probably never heard of Simple Workflow before I just mentioned it. But we kept learning, and we did a few things.

Maxim Fateev: My co-founder, Samar, after he left Amazon, went to Microsoft and built their Durable Task Framework. Later, that framework was adopted by Azure Functions as Durable Functions. If you've ever heard of Azure Durable Functions, they use the same approach we started with in Simple Workflow.

Maxim Fateev: And so at Uber, when we faced a workflow problem, we said, "Okay, we know a lot about workflows. Why don't we just use the same ideas we used at Amazon while building Simple Workflow?" But we also understood that it had a lot of limitations and [inaudible 00:03:49]. We improved on that and iterated a lot. And now we've got Cadence Workflow, which was actually pretty successful.
Inside Uber, hundreds of teams were using it when we left, so it grew a lot organically. It was started as an open-source project right from the beginning. We got pretty decent adoption, and then we decided to form the company.

Maxim Fateev: So, to go back to how we started at Uber. It's interesting, because at Uber our first project wasn't actually Cadence. It was Cherami. Cherami was an open-source messaging system. We wanted to replace Kafka at Uber, because Kafka back then wasn't very stable; this was more than four years ago. We actually wrote it, and it's still out there as an open-source project. It was successful inside Uber; dozens of teams were using it in production. But later, management decided that funding something that has an open-source alternative like Kafka didn't make sense, so the project was shut down. But Cadence grew out of that.

Eric Anderson: Got it. Maybe you could tell us briefly about the use case for these asynchronous messages. I imagine these are services or microservices creating tasks for other services or microservices to accomplish. Is that part of it?

Maxim Fateev: That is certainly part of it. The thing is that there is no very specific use case, and this is something we kind of struggle with, because everybody says you should focus on something.

Eric Anderson: Right.

Maxim Fateev: The thing is that we are a platform for writing distributed applications.

Eric Anderson: Got it.

Maxim Fateev: When you need reliability. Obviously, if you don't care about losing data, or you don't care about race conditions or whatever, you don't need us. But every time you need reliability, when you need to make sure that your business transaction finishes, it's applicable.

Maxim Fateev: And there are a lot of different scenarios. It can be as simple as, I don't know, payments. Quickly transferring money from one account to another.
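The payments example is about making sure a multi-step business transaction finishes even when one of the services involved is temporarily down. The sketch below illustrates only the orchestration-plus-retry part of that idea in plain Python. `RetryPolicy`, `call_with_retry`, `transfer`, `FlakyService`, and `SimClock` are hypothetical names, not Cadence or Temporal APIs, and the backoff numbers are arbitrary. A simulated clock stands in for real waiting.

```python
from dataclasses import dataclass


class TransientError(Exception):
    """Hypothetical: a temporary failure (network blip, service restart) worth retrying."""


@dataclass
class RetryPolicy:
    # Hypothetical policy shape, loosely modeled on common workflow-engine retry options.
    initial_interval_s: float = 1.0
    backoff_coefficient: float = 2.0
    maximum_interval_s: float = 3600.0
    expiration_s: float = 5 * 7 * 24 * 3600.0  # give up after five weeks


class SimClock:
    """Simulated clock so the sketch can 'wait' for weeks without really sleeping."""

    def __init__(self):
        self.now = 0.0

    def sleep(self, seconds):
        self.now += seconds


def call_with_retry(fn, policy, clock):
    """Call fn until it succeeds, backing off exponentially, within the expiration window."""
    start = clock.now
    interval = policy.initial_interval_s
    while True:
        try:
            return fn()
        except TransientError:
            # Stop if the next wait would take us past the expiration window.
            if clock.now - start + interval > policy.expiration_s:
                raise
            clock.sleep(interval)
            interval = min(interval * policy.backoff_coefficient, policy.maximum_interval_s)


def transfer(withdraw, deposit, amount, policy, clock):
    """Hypothetical two-service money transfer; each step is retried until it succeeds.

    Assumes withdraw and deposit are idempotent, since a retry may repeat a call
    whose response was lost. This toy keeps progress only in memory: if the process
    dies between the two steps, that progress is lost.
    """
    call_with_retry(lambda: withdraw(amount), policy, clock)
    call_with_retry(lambda: deposit(amount), policy, clock)
    return "completed"


class FlakyService:
    """Hypothetical downstream service that fails `failures` times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0
        self.applied = 0

    def apply(self, amount):
        self.calls += 1
        if self.calls <= self.failures:
            raise TransientError("service unavailable")
        self.applied += amount


# Usage: the deposit side is down for its first three calls.
clock = SimClock()
debit, credit = FlakyService(failures=0), FlakyService(failures=3)
status = transfer(debit.apply, credit.apply, 100, RetryPolicy(), clock)
```

With the default policy, the deposit succeeds on its fourth attempt after simulated waits of 1, 2, and 4 seconds. Exponential backoff avoids hammering a service that is down, the cap keeps waits from growing without bound during a long outage, and the expiration window bounds how long the caller keeps trying. What the toy does not provide is durability across process crashes, which is the part a workflow platform is meant to supply.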
And again, it can be done by just making local calls, but these days it obviously mostly involves multiple services. So service orchestration is obviously one huge use case.

Maxim Fateev: But you can absolutely do everything. You can write desktop applications using this platform, where practically all the services are embedded in a single machine, because it's still very applicable. So you can do monoliths. You can do microservices. You can do everything.

Maxim Fateev: And from a use-case point of view, it's all over the place. It can be data pipelines. It can be ML pipelines. It can be business processes. It can be subscriptions. Deployments: for example, HashiCorp uses Cadence to orchestrate their cloud deployments; their new cloud platform is practically built on top of it. And at Uber, a lot of the deployment infrastructure was built on it. But there are also business processes. For example, tipping an Uber driver: when you press the tip button, it starts a workflow.

Eric Anderson: Okay. So let's nail the history real quick. You had developed the Simple Workflow Service at Amazon. You came to Uber. They had an existing project that you continued to iterate on before it was closed or shut down; either way, you started a new project, Cadence.

Eric Anderson: Maybe any comments on the first commit or the first days of Cadence? With your experience, you probably knew exactly what you wanted to build. You mentioned you had done this before and you knew of some improvements you already wanted to make. Was the plan to be open from the beginning? How well planned out was Cadence from the outset?

Maxim Fateev: So, the first thing was that we had actually spent almost half a year trying to open-source the previous project, because it wasn't written as open-source from the beginning.
We found out the hard way that it's very hard to open-source something later, because all sorts of dependencies creep in. So we purposely started Cadence as an open-source project from the beginning: almost all check-ins, besides maybe a few initial lines of code, were done in the open GitHub repo. You can go there and see the history of how it was developed.

Maxim Fateev: I think this was the major decision, to do it as an open-source project from the beginning. It also meant that, for all the internal Uber dependencies, we had to factor the code so that we would be able to use internal Uber systems without having them in the open-source repo. And we didn't want to fork the project. So it was actually written in a way that was extensible enough to support all those requirements.

Maxim Fateev: From the implementation point of view, I remember that Samar practically just said, "We need this, and we know how to build this workflow system." Initially it was just targeted at our internal project, Cherami. And we said, "Okay, we know how to do that. Let's do it."

Maxim Fateev: Initially I actually wanted to redesign it completely, because I had all sorts of ideas for how to make it better than our original Simple Workflow ideas. But we understood that we would never get it done, because a redesign would take a long time, and we didn't have funding. We never had management come to us and say, "Implement this." It was our own project.

Maxim Fateev: So we decided to stick to the same ideas. We didn't do a major redesign. Obviously the implementation is completely different. The API was similar but still different, so it wasn't a replica, but at least the high-level ideas were the same. We implemented that, and once it was practically ready, we started to iterate with various improvements. And the biggest difference, obviously, is that the first SDK we implemented was Go. And at Amazon we didn't have...
Go didn't exist back then, when we started that project 10 years ago.

Eric Anderson: And you've got Cadence in the works. You don't really have management buy-in, but you've got the early start of this project. How do you move from a skunkworks project to a real project?

Maxim Fateev: I think a couple of things. We went completely bottom-up. We found our first customers. We practically went around evangelizing the technology. It was like open-source, the same thing, but inside a single company. Initially, we went [inaudible 00:09:38] and evangelized, and we found use cases where it was very well applicable, and application developers liked it. They tried it. And one thing which we found out later is that we've never seen a single team stop using us. After teams start using our technology, they keep using it and keep adding use cases. It never happened that they tried it and just said, "No, we are not going to use it." So it's very, very sticky.

Eric Anderson: And are these teams coming from other Uber projects? Was there a bit of competition, or are most of them just coming from other random open-source projects, and they're excited to work with somebody internal?

Maxim Fateev: Internal. These were just internal teams, and we weren't competing with any existing projects, practically, because think about it: we are trying to solve the problem of writing distributed asynchronous applications, and the reality is that if you give this problem to almost any developer these days, they will write something custom, right? They will use Kafka or some other queue. They will use databases, maybe some timer services, but there is no solution that solves these problems for the general case.

Maxim Fateev: The reality is, that's why we don't have direct competition. We are competing with ad-hoc solutions. And Uber obviously had hundreds and hundreds of those ad-hoc solutions.
And we practically came to the developers and said, "Why are you reinventing the wheel? Why are you doing it the hard way? Just use our technology, and you can focus on your business logic, and we will take care of the hard distributed-systems stuff."

Maxim Fateev: So it was actually a relatively easy sell, but obviously it takes some time to understand the value of the technology. That's why we usually start small, with some basic things, like, I don't know, a distributed cron job, for example. And then after you get familiar with the technology, almost everything starts looking like a workflow to you.

Eric Anderson: Right. And I imagine you have to catch teams either at a refactoring stage or at the outset of a project; it's maybe hard to get mature projects to switch.

Maxim Fateev: Actually, that's not 100% true. Obviously it took some time for the most mature projects to switch, but the reality is that it's not all or nothing. You don't need to go and migrate your whole application to start using it. As I said, you can just switch a single cron job, like a distributed cron, or you can switch just one part of your system.

Maxim Fateev: And then you can go and redesign the whole thing. The pattern we see inside Uber and in open-source is that a lot of companies just try it small: they implement one small use case which is not mission critical. Then they understand the technology, and some companies actually redesign their architecture 100% around it.

Maxim Fateev: So practically, you can think about Cadence and Temporal as a service mesh for asynchronous applications. It's kind of similar to the service bus idea, but it's a completely different technology and a completely different implementation.

Maxim Fateev: And the thing is that there are companies which practically switched all their services.
Practically all of them became activities and workflows, and they don't need any existing service meshes, because those don't work very well with long-running operations and long retries.

Maxim Fateev: In our system it's very easy to say, "Okay, I want to retry this operation for five weeks" if something goes down, or "I want this to take five months," or "this workflow can run for a year, or live forever." And normal service architectures practically don't help you with these things at all.

Eric Anderson: Now take us to Cadence's open-source launch. How do you get from where you were then to taking Cadence public?

Maxim Fateev: So it's interesting. We started this open-source project, but obviously for the first half a year or a year, we had very few people looking at us. If somebody is starting an open-source project, don't get discouraged, because I think for almost the first two years we had very few users.

Maxim Fateev: What happened is that over time they started to appear, and it was interesting. We didn't get a lot of users, but we got extremely sophisticated users. We got people like HashiCorp and the Banzai Cloud guys. We got other companies which I unfortunately cannot name publicly yet, because we are working with them on releasing case studies. The thing is that we got these top-tier companies with very strong engineers, who looked at various options, found our technology, and decided it was better to use it than to build their own. And these companies were absolutely able to build their own. Everybody understands that HashiCorp can build whatever they want if they put enough effort into it. But they decided that they'd rather focus on their business logic, on their core strengths, and just use the technology we provide.

Maxim Fateev: And then it started to grow, grow, grow, grow.
And practically, in May 2019, we ended up among the top posts on Hacker News. Since then it's been a very wild ride, because we got a bunch of new people looking at us. We got a lot of new adoption, and then obviously venture capitalists noticed. They came to us and started to talk about starting our own company, which before that actually wasn't really our goal.

Maxim Fateev: So this [inaudible 00:14:42] was certainly a big thing for us. Before that we had some usage, but it wasn't growing exponentially. I think right now it's growing pretty fast.

Eric Anderson: I missed a part I wanted to ask you about before that: how did you meet Samar? You mentioned that he was doing work similar to yours when he was at Azure, while you were at AWS, and you both ended up independently at Uber. Is that right?

Maxim Fateev: It's actually an interesting story. Samar had worked at Microsoft for a long time, mostly on developer tools, SQL Server, and other places. Then at some point he decided to join AWS. He wanted to try cloud, and Azure wasn't a big thing back then, 10 years ago. He ended up joining the Simple Workflow team, and I was tech lead of the team.

Maxim Fateev: So we worked together on releasing Simple Workflow. Later we both ended up quitting Amazon, for sort of nontechnical reasons. He ended up back in Azure. He was tech lead for Azure Service Bus and then created the Durable Task Framework there. I actually ended up at Google for three years, and then, absolutely by coincidence, we joined Uber at practically the same time, within two weeks of each other, and started to work on the same project.

Eric Anderson: Interesting. Were you assigned together, or did you find each other?

Maxim Fateev: We just found each other at Uber. We didn't talk even once in between, since we left [inaudible 00:16:04].
It's kind of interesting. But yeah, since then we've built these two projects together, and obviously I'm super lucky to have him as a co-founder. We have very complementary skills.

Eric Anderson: Great. You rode the Hacker News burst of visibility to get a lot of adoption for the Cadence project, and then some VCs helped you consider starting Temporal. Maybe you can talk to us about Temporal: the decision to start a new company and project, the fork, and all that.

Maxim Fateev: Yeah, so our plan was that maybe one day we'd do something about it as a company, but it was very, very abstract. I had never spent any time researching startups. I barely knew about any VCs. I don't think I had heard of Andreessen Horowitz until I actually started my company.

Maxim Fateev: So I was absolutely uneducated about this. But one thing the VCs told us (no, they didn't tell us, we kind of understood it, but they emphasized it) was that it was practically impossible to really focus on the external community while staying inside Uber. And not because Uber is bad or something, but because Uber is a company with its own priorities. They needed Cadence, and there are a lot of mission-critical applications inside Uber running on top of Cadence. So obviously they couldn't give us unlimited resources, and our limited resources were 100% consumed by internal adoption.

Maxim Fateev: We believe this technology can change not only one company, but practically how software is developed across all companies. And that vision just couldn't be realized while staying inside any company. I don't think it's about Uber; I think it's any company.

Maxim Fateev: The ability to start our own company, which focuses only on this technology, where we can care only about the external users of the technology: I think that was something we believed was the right thing to do.
And that was the main reason we decided to quit our pretty well-paid jobs at Uber and start our own company. We wanted the external community to be successful, and we wanted this project to realize its full potential. And I think without a large team around it and enough funding, that's not really possible.

Eric Anderson: And I suppose the Temporal project is in many ways just an extension of the Cadence project in terms of vision and aspiration? So I imagine it feels like more of the same. You just kind of keep going.

Maxim Fateev: It is the same in the sense that it's just a continuation. There are all sorts of legal and practical reasons why we couldn't continue to develop Cadence itself. Once we made the decision to fork, we looked at what features our user community required and asked for. And the most important feature was gRPC, because Cadence uses TChannel, which is a custom request/reply protocol written by Uber. There were historical reasons why it was used, but it was practically deprecated at Uber already at the time, and it has all sorts of limitations. The most basic one is that it doesn't support security: you cannot even do SSL over it.

Maxim Fateev: So we switched to gRPC, which ended up being a much bigger undertaking than we thought, and brought a [inaudible 00:19:24]. Also, since we ran this in production at Uber for four years without making a single backwards-incompatible change, we accumulated quite a few features we wanted to do, and improvements which weren't possible, or would be very hard and expensive to do, in a backwards-compatible manner.

Maxim Fateev: So right now, Temporal is practically an improved version of Cadence, because it does two things.
The most obvious one is gRPC, security, and so on, but there are also hundreds and hundreds of small improvements we made, which would have been impossible if we had stayed with Cadence as it is.

Eric Anderson: Maybe just on that topic: I was at Google before coming to SCALE. And there's a history there. Google has commercialized and created a lot of open-source projects, and even going back to the days of Hadoop, there were a lot of papers, at least around distributed systems and their implementations. But it seems like Uber is in many ways filling the position Google once had, creating these open-source projects and new ideas around running distributed systems. Cadence is one, but there are several other open-source projects. I just find that interesting. I'm excited by the role that Uber and others are playing in incubating these new ideas.

Maxim Fateev: Yeah. I think it's amazing what happened at Uber. And I don't think it was forced on people. It's not that Uber had a policy of "Okay, now you have to write open-source." I think what happened is that Uber at the time hired a bunch of very smart engineers, because it was well before all the problems it faced publicly.

Maxim Fateev: So it was a desirable place to work. It hired a lot of engineers, and it had pretty liberal policies around open-source. Practically, all you needed to do was get legal approval for the project, and then you could develop your project in the open.

Maxim Fateev: And I think a lot of engineers just used that to implement cool ideas. I wouldn't even have joined Uber if I hadn't been told that my project could be open-sourced. And I think a lot of other people wouldn't have either.

Maxim Fateev: So Uber having this very open policy about building open-source solutions, plus hiring all those smart people and having very hard problems to solve, was enough for it to become kind of a cradle for a lot of new open-source projects.
That is my view of it.

Eric Anderson: So Maxim, some of our audience are would-be open-source founders, and you already had some words for them: adoption may be slow at the beginning, and you just persist. I wonder if you have any additional thoughts? Also, some of our listeners are folks who would like to contribute to and use Temporal, so any thoughts for them as well.

Maxim Fateev: Open-source is not a goal, right? Open-source is a means to achieve something.

Maxim Fateev: One of the reasons I think any company wants to do open-source is that you want longevity for your project. For example, I remember back at Amazon, in 2002 and later, we had a lot of really cool projects, because Amazon was hitting large scale probably 10 years before most companies did. And we didn't open-source any of that. What happened is that we built this technology, and it's amazing. It's highly applicable to the whole world. A team builds it and then moves on, right? Because nobody wants to keep doing the same thing. And then, five years later, some open-source technology appears, solving the same problem. You look at that open-source technology and laugh, because it's a joke. There's nothing serious there. It's not a real implementation.

Maxim Fateev: I'm pretty sure the Google guys looking at Hadoop had exactly the same impression compared to Google's MapReduce, which was like a hundred times more scalable and everything. But then what happens is that, over time, because there is this whole open-source community around it, with a lot of iterations and users, in five years the open-source technology becomes 10x the internal technology. And then you practically deprecate the internal technology in favor of the open-source one.

Maxim Fateev: I've seen it multiple times at Amazon. I think Google had a similar problem with the external Hadoop stack.
And my view is that if you build any infrastructure-level project, and it's not open-source these days, it doesn't make sense. It just doesn't. Because a company can invest a lot of resources, but at some point it will stop investing, or it will not have enough resources to invest, and an open-source project will, over time, actually beat it.

Maxim Fateev: So I think that is the main reason you want to do open-source. And that's why we keep the Temporal project absolutely open-source; we maintain the MIT license. And I think there is no way we would ever be successful if this open-source project is not successful and not widely used.

Maxim Fateev: So for anyone who starts a project: again, it's not about open-source. It's about creating something which is useful and which people want, and they will come. It can take a long time. But if it's really something people want, they will find it and they will use it. And it can take a long time, obviously.

Maxim Fateev: For us, again, it was like a 10- or 15-year journey to get here. But now I think we are getting to the point where more and more people learn about it and want to use it.

Maxim Fateev: In terms of contributions, we are absolutely open. The only problem is that it is a pretty complex project. You can think of it as something at the level of a database or a MapReduce stack, so there is a lot of complexity there. You cannot just come in and start writing code. You can do small things, but if you want to make changes to the core engine, even for the internal developers we hire, it takes a few months to get comfortable with the code base, and that's with experts like us, who built the whole system from scratch, nearby.

Maxim Fateev: So yes, we absolutely welcome anyone, but just be ready that it will take a long time to actually become productive. We've had a lot of contributions around integrations. For example, we have the service, and then we have SDKs.
We have awesome community members who actually wrote integrations for Python, C#, and Ruby. And these are amazing, because our SDKs are pretty heavyweight, so it's a lot of effort to create one, and still the external community contributed them, which I think is awesome. And we expect more and more contributions in the future as the community grows.

Eric Anderson: I imagine your most helpful and engaged contributors and community members are those with large-scale distributed needs who are taking you on as a significant dependency?

Maxim Fateev: Actually, no. I would say a lot of individual contributors are not from big companies. They just like the technology and want to use it. Again, you don't need to be big and have a large use case to use us. The cool thing about us is that you can start using us for a very small use case, with low traffic, and it still provides a lot of value, because it simplifies how you write your code and design your system big time.

Maxim Fateev: And the cool thing is that you don't need to redesign or rewrite anything if traffic grows. If your site gets, like, one request per minute now, you can use us. But if it starts getting 5,000 requests per second, you can still use us; you just deploy a bigger cluster. So I don't think we are applicable only at large scale. There are a lot of companies using us for relatively low-traffic use cases.

Maxim Fateev: Think about it: data pipelines are by definition low scale. How many data pipelines would you have? Okay, maybe a few thousand, maybe 10,000 if you're a very large company, but for us that is nothing, because we can do thousands of workflows per second.

Maxim Fateev: We wrote it to target business-level workflows that happen hundreds and thousands of times per second, so we built the platform for that.
But again, that doesn't mean you cannot use it for very low-traffic use cases, like distributed cron jobs, for example.

Eric Anderson: As we wrap up here, what's the future for Temporal in the coming days? Any big milestones we should be watching for? Anything you'd like from the community in terms of help?

Maxim Fateev: Our major milestone: as I said before, right now there are two projects, the Cadence project versus the Temporal.io project, and we haven't had a single production release of Temporal yet. We have an alpha release out, and we are code complete there, but we spend a lot of time and effort on very deep testing of the product, because people run core businesses on us. Practically, a lot of businesses rely on it for their core workflows, so we cannot let them down. We want to make sure that when we release something and say it's ready for production, it is ready for production.

Maxim Fateev: So right now we are focusing on stabilization and testing, and as soon as we think it's ready, we will announce it. I think that will happen relatively soon. And then, as soon as we have a production release, we expect most Cadence users to migrate to it, and obviously we welcome new users.

Maxim Fateev: And obviously, try our alpha builds as much as you can; we want as much free testing as possible. One thing I didn't actually mention is what our system actually does, because we say "workflow," and I think it's very confusing, because "workflow" means very different things, and there are a lot of legacy systems which do workflows.

Maxim Fateev: Think about it this way. Think about your code. Every time you write a new application that requires fault tolerance, you need to account for the possibility of the process failing, or just being killed at any time, or a machine crashing, or a data center going down. What it means is that you always need to make sure your state is snapshotted and persisted.
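In the conventional style, "snapshotting and persisting state" means every request handler explicitly reloads its state from storage, updates it, writes it back, and only then replies or publishes a message. The minimal sketch below illustrates that cycle; `Store`, `handle_step`, and the order events are hypothetical, a dict plus JSON stands in for a real database, and a list stands in for a real message queue.

```python
import json


class Store:
    """Stand-in for a database table: nothing is remembered unless it is saved here."""

    def __init__(self):
        self.rows = {}

    def load(self, key):
        raw = self.rows.get(key)
        return json.loads(raw) if raw is not None else None

    def save(self, key, state):
        self.rows[key] = json.dumps(state)


def handle_step(store, queue, order_id, event):
    """Hypothetical handler for one event in an order's lifecycle."""
    # 1. Load: state from earlier requests lives only in storage, not in memory.
    state = store.load(order_id) or {"step": "created"}
    # 2. Update: a hand-written state machine; unexpected events leave state as-is.
    if state["step"] == "created" and event == "payment_ok":
        state["step"] = "paid"
    elif state["step"] == "paid" and event == "shipped":
        state["step"] = "done"
    # 3. Save the new state back before telling anyone about it.
    store.save(order_id, state)
    # 4. Reply or push a message to another queue. A crash between steps 3 and 4
    #    silently drops this message unless you add extra machinery (for example
    #    a transactional outbox), one more problem this style leaves to you.
    queue.append({"order": order_id, "step": state["step"]})
    return state["step"]


# Usage: two events for the same order, each handled as a separate request.
store, queue = Store(), []
handle_step(store, queue, "order-1", "payment_ok")
final_step = handle_step(store, queue, "order-1", "shipped")
```

Every piece of state has to be threaded through `load` and `save` by hand, the workflow's progress is encoded as an explicit state machine, and the ordering between saving and publishing is the developer's problem. That bookkeeping is what the durable-memory approach is meant to remove.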
Practically, most applications these days get a request, load state from the database, update the state, save it back to the database, and then produce a reply or push a message to another queue.

Maxim Fateev: What we did is we actually changed that. We practically said, "Write your code, and we give you an abstraction of durable memory." So the full memory of your application is fault tolerant, which means that stacks, local variables, everything is always preserved.

Maxim Fateev: What it means is that, for example, if you make a request and the request takes five days, you are still blocked on the same line of code. Then when it returns five days later, because of retries or other reasons, you just unblock and go to the next line of code, and all the variables, everything, is preserved. And we ensure that this process survives all infrastructure-level failures, restarts, deployments, outages, and so on. So you practically eliminate a huge class of problems developers have to deal with because memory is not fault tolerant; we provide this fault-tolerant memory abstraction. It applies to a very wide class of problems. As you see, we call it workflows because it applies to the same domain a lot of workflow engines apply to, but it's much wider.

Maxim Fateev: You can have objects that live forever, for example, receiving events, processing them, keeping their state in local variables, and then taking actions. So that is the high-level idea of the project.

Eric Anderson: Maxim, thanks for joining us today. I've learned a ton, and I'm really excited about what you and Samar are doing. Congratulations to both of you for your success with Cadence and your new work on Temporal. We'll have to have you back to give us an update at some point.

Maxim Fateev: Absolutely. I really appreciate you having me.
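The "durable memory" idea described above can be made more concrete with a toy. Cadence and Temporal are generally described as recording each workflow's event history and, after a failure, re-running the deterministic workflow code against that history to rebuild its local state. The sketch below imitates that idea with a Python generator. `run`, `order_workflow`, `Crash`, and the three activity functions are hypothetical names, an in-memory list stands in for durable storage, and none of it is the real Cadence or Temporal SDK API.

```python
class Crash(Exception):
    """Simulated process failure."""


def run(workflow, activities, history, crash_after=None):
    """Drive a generator-based workflow against an append-only history.

    history holds (activity_name, result) pairs and stands in for durable storage.
    Steps already in history are replayed instead of re-executed, which rebuilds
    the workflow's local variables. crash_after simulates a crash after that many
    new activity executions.
    """
    gen = workflow()
    step, executed = 0, 0
    try:
        name, arg = next(gen)
        while True:
            if step < len(history):
                recorded_name, result = history[step]
                # Replay only works if the workflow issues the same commands in order.
                if recorded_name != name:
                    raise RuntimeError(f"history mismatch at step {step}")
            else:
                if crash_after is not None and executed == crash_after:
                    raise Crash(f"crashed before running {name}")
                result = activities[name](arg)
                history.append((name, result))  # persist before moving on
                executed += 1
            step += 1
            name, arg = gen.send(result)
    except StopIteration as done:
        return done.value


calls = {"lookup_user": 0, "price_order": 0, "charge": 0}


def lookup_user(user_id):
    calls["lookup_user"] += 1
    return f"user-{user_id}"


def price_order(user):
    calls["price_order"] += 1
    return 30


def charge(amount):
    calls["charge"] += 1
    return f"charged {amount}"


ACTIVITIES = {"lookup_user": lookup_user, "price_order": price_order, "charge": charge}


def order_workflow():
    # Ordinary local variables: after a crash they are rebuilt by replay, not reloaded.
    user = yield ("lookup_user", 42)
    total = yield ("price_order", user)
    receipt = yield ("charge", total)
    return f"{user}: {receipt}"


# Usage: crash after two activities, then resume from the same history.
history = []
try:
    run(order_workflow, ACTIVITIES, history, crash_after=2)
except Crash:
    pass
result = run(order_workflow, ACTIVITIES, history)
```

The key property shows up in the second `run` call: `lookup_user` and `price_order` are not executed again, yet `user` and `total` hold their old values when `charge` runs, so the workflow effectively continues from the line where it stopped. The mismatch check reflects why systems built this way generally require workflow code to be deterministic and route anything nondeterministic (I/O, clocks, randomness) through recorded steps.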
Eric Anderson: You can find today's show notes and past episodes at contributor.fyi. Until next time, I'm Eric Anderson, and this has been Contributor.