Noel Minchow: Hello and welcome to PodRocket, I'm Noel, and joining me today are Maxim Fateev and Ryland Goldstein. Maxim is CEO and founder of Temporal, and Ryland, I believe you just told me you were doing Full Stack Dev, is that right? Ryland Goldstein: I'm head of product at Temporal. Noel Minchow: Head of product, nice. That's even better. Cool... Even a better perspective. Ryland Goldstein: I agree, I got a quick promotion. Noel Minchow: Yeah, nice... No worries. Yeah, we were talking a lot before the show so just trying to keep all my ducks in a row here... But yeah, I'll just turn it over to you guys real quick. Can you each give us a little bit of your background, where you came from, and how you found yourselves at Temporal? Maxim Fateev: So I've been around for a while, over 20 years, but in 2002, I joined Amazon when it was a relatively small company with 800 developers, and then I spent a total of eight and a half years at Amazon, and I saw how Amazon Web Services was conceived and grew. And I was tech lead for the [inaudible] system of Amazon, practically, they have built the Amazon messaging platform. Later, Simple Queue Service took part of that platform as a backend. And then, as a tech lead for the [inaudible] ecosystem, my team and I clearly saw that building microservices and communicating between microservices using queues wasn't really a good approach. So we wanted some orchestration solution, and out of that the Simple Workflow Service was conceived and built, the AWS SWF service, which is still part of Amazon's offering. And I was tech lead for the public release of that service. And later when I ended up at Uber, we built a similar solution on the same ideas but using a completely different software stack.
The project was called Cadence and that project was open source from the beginning, and later we started the company Temporal, and forked the project, and now we have the Temporal project, which is a continuation of the ideas of Amazon Simple Workflow and Cadence. And I was an engineer all my life, I was an individual contributor, never managed a person... And then we started the company... Now I'm CEO. Noel Minchow: Nice, that's quite the jump, we'll unpack that a little bit if we have time here at the end... Before we do, how about you Ryland? What's your background? Ryland Goldstein: Yeah, so my background is very much on the engineering side of things as well, although obviously less storied than Maxim's. I really got into coding, into programming when I was quite young, in my teens. I wanted to basically write cheats for games, and so I learned low level programming as the way I got into the space. And that passion pretty much carried me through university; I went to the University of Washington, I eventually ended up joining a startup from one of the professors that was sponsoring me for research, and so that was actually my foray also into the distributed systems space. The company I first worked for was building a competitor to Apache Spark, and so it was a scale-up, MapReduce-style product, all done with on-prem hardware, there was no notion of the cloud, that was a dirty word at the time. And so then that was really how I got into distributed systems, and very quickly I ended up moving into leadership instead of being just an individual contributor, obviously still coding, but doing a lot more management around the architecture and design.
And then I actually started a company with two other guys, that was very much a cloud play, we were trying to build a competitor to AWS Lambda, and so we ended up getting funding, I actually moved to Israel to bootstrap and start that company, and we did a lot of work for three or four years on that product, and brought it to market, and they eventually actually got acquired by Twitter. And so when I left that company, it was right before the pandemic started, and I honestly was just mostly taking a break, but then the only investor for Temporal at the time reached out to me and said, "Hey look, I read some of your writing online, you should be doing product work for this company." And so that's how I got introduced to Max and Samar, I was the fifth hire at the company and I was brought in to do product stuff around developers, but for the first 12 to 18 months I was running the business of Temporal, so responsible for all the non-engineering functions. Noel Minchow: Oh, got you, that can be a technical endeavor as well, I've found recently, there's a lot of nuance there sometimes. So I'm just going to start throwing some questions out there, you guys take them, whoever you think is better to answer, feel free to jump in. For those who aren't familiar with Temporal at all, can we contextualize a little bit, what is Temporal? What problem is it trying to solve? Maxim Fateev: It's hard to describe actually in two words, so let me give you a general idea of the problem. We need reliability, these days we are building these kinds of distributed systems from unreliable components, and we need to make sure that our programs, even our applications don't fail when unreliable components fail. And take the most basic example, let's say we do money transfer, in the old days, you would just say, okay, here's a database transaction across these two... Update... From your records, we get a transaction, everything is updated or not, and we get reliability just from transactions. 
But these days you withdraw money from one service or one bank, and you deposit into another one, there are no transactions across them, and how do you ensure, for example, that this happens in the presence of various failures? And usually developers need to come up with a bunch of workarounds; they start using queues, they start to checkpoint state in the database, they need to break their code into a bunch of callbacks, because processes can crash at any time. And Temporal practically says: you can just write your code as what we call durable execution, and this code is guaranteed to execute in the presence of failure. So practically you will just write two lines of code; withdraw money and deposit money, and we say this code will continue executing no matter what. So it means that your process can crash, your data center can go down and then come back, and then we will go and resurrect your process in exactly the same state, and continue executing. And the other interesting part of that is that any API call can take an unpredictable amount of time, so if withdrawal takes three days, because you need to do retries behind the scenes, or it just takes longer because a human is involved in the approval process, you still make a single line API call, practically; withdraw, and it'll block there for three days, and it's not linked to a specific process, so it is safe to do, because whatever crashes, it will be resurrected, and then three days later, it returns and goes to the next line of code. So it is a very simple concept, but it is a very generic concept... And then you don't need a database for that, because all your state, all your variables are always persisted and durable, so you don't need to explicitly go and save state and load state from the database, because you just keep them in your local variables, and it also simplifies your programming experience big time.
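To make the "resurrect your process in exactly the same state" idea concrete, here is a minimal toy sketch in plain Python. It is not Temporal's SDK or its actual implementation; it only assumes one plausible mechanism, namely that the result of each completed step is recorded in a history and that a restarted run replays recorded results instead of executing those steps again. The names `ToyDurableRun`, `step`, `withdraw`, `deposit` and the simulated crash are all hypothetical.

```python
from typing import Any, Callable, List


class ToyDurableRun:
    """Toy stand-in for a durable execution (illustrative only).

    Each completed step appends its result to `history`. When the same
    code is run again with that history, already-recorded steps are
    replayed from the history instead of being executed a second time.
    """

    def __init__(self, history: List[Any]) -> None:
        self.history = history  # a real backend would persist this
        self._cursor = 0        # index of the next step in this run

    def step(self, fn: Callable[[], Any]) -> Any:
        if self._cursor < len(self.history):
            result = self.history[self._cursor]  # replay, do not call fn
        else:
            result = fn()                        # first time: really execute
            self.history.append(result)
        self._cursor += 1
        return result


balances = {"alice": 100, "bob": 0}
crash_once = {"armed": True}  # makes the first deposit attempt "crash"


def withdraw(account: str, amount: int) -> str:
    balances[account] -= amount
    return f"withdrew {amount} from {account}"


def deposit(account: str, amount: int) -> str:
    if crash_once["armed"]:
        crash_once["armed"] = False
        raise RuntimeError("simulated process crash")
    balances[account] += amount
    return f"deposited {amount} to {account}"


def transfer(run: ToyDurableRun) -> None:
    # The business logic is just the two lines from the conversation.
    run.step(lambda: withdraw("alice", 30))
    run.step(lambda: deposit("bob", 30))


history: List[Any] = []
try:
    transfer(ToyDurableRun(history))  # crashes after the withdrawal is recorded
except RuntimeError:
    pass

transfer(ToyDurableRun(history))      # re-run: withdrawal is replayed, not repeated
print(balances)
```

The point of the sketch is that the money is withdrawn only once even though `transfer` runs twice; the second run picks up where the first one stopped.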
Ryland Goldstein: What I would add on that, coming from a very different angle, and very much inspired by someone on my team, Dominic, who's very passionate about databases and the fundamental abstraction that Temporal represents, the way that he often describes things, is that when you write a very basic transaction in a database where you're taking money from one table that represents a user's account and you're putting money into a row in a different table that represents a different user's account, when you actually write that transaction in terms of the query in the database, you don't have to specify how those operations are done. You say, "I want this thing done and I want this other thing done, and they need to both be done, or none of them need to be done." But you don't go into any of the handling of what happens if it only gets partially done. And so regardless of how you feel about databases and writing queries, I think that from an ideological perspective, almost everyone agrees that not having to specify the details about handling the very specific failure cases and all of that, that's generally better. It doesn't really provide you value in what you're trying to do when you're using the database, to have to go to that level of depth and understand all the different edge cases and conditions that can happen. And so from a very ideological point of view, again, Temporal is really about giving developers the same experience that you get when you write a query in a database, where you get to really specify the intent of what you want to do, but you don't have to get into the details of how you roll back a transaction when it fails part way through, because that's just an abstraction that the database has provided you, and given you the ability to not think about it anymore. Noel Minchow: Enjoying the podcast? Consider hitting that follow button for even more great episodes.
Is it fair then, in this analogy to databases, we lean on this "the database can just handle it" mentality, because people have spent a bunch of time implementing the database to make it so transactions are safe and things can be rolled back, but is that always a clear analogy with services that are necessarily being glued together, because the dev is concerned often with the implementation of the service itself as well, right? So there still needs to be some piece of knowledge that is understood by the dev if they're implementing some downstream service that needs to talk back up to an orchestration layer. Is that a fair postulation, is that fair to say? Maxim Fateev: So I think the idea is that we have this durable execution, and in Temporal terminology, we call them workflows, for legacy reasons. But one thing that's very important, it's not like a new language, you just write that in a normal language and we support SDKs in every language. For example, if you're a TypeScript developer, or a Node.js developer, you'll just write your code as TypeScript. You actually will write TypeScript code like withdraw and deposit. So we have Java, PHP, the Python SDK is coming, and Go; I think the most widely used SDK right now is still Go. And if you are inside of that world, of workflows and durable executions, and all services participate in that, their life is much easier because they practically don't need to do a lot of the work which you would normally do with these services. But usually there is an external world which doesn't participate in that, and you can still integrate with it. We do it through activities. Activities are just pieces of code; task handlers, which can execute any code. And then you will just invoke these APIs which exist and you will integrate, but Temporal will automatically provide things like retries, making sure that things complete and workflows complete, so any API call will be retried automatically... So it plays nicely...
And the other important thing, you don't need to do one huge migration, it's not like you need to go and say, "Oh, I will throw away my current architecture and put Temporal in." And it will help you. It is more about, "Oh, I want that this specific part of my system will be durable and execute these few operations reliably." You can just introduce it in one specific place and the rest of your architecture stays exactly the same, your services become reliable, but you still can have the same services, the same APIs and so on. Noel Minchow: Got you. So if I'm a dev coming in, I've got a function that calls three other services, for example, and using your transaction example, say that I'm interacting with a third party; and one is withdraw money, and then the second call is insert money, and then the third one is update invoice. How does Temporal help me solve the problem of; I've made it through withdraw, and then I go to deposit money, make that second call, and something fails in that, and I need to figure out then what I'm supposed to do in regards to that first call, how does it help me solve that problem? What is different? Ryland Goldstein: Drawing it back to that first analogy I gave, when you think about a database transaction, the super important thing to remember is that it's not a silver bullet and Temporal isn't either, I think they're immensely valuable, but it doesn't solve all the problems. So if you take a database transaction, there are very valid reasons, reasonable reasons, that a database transaction doesn't work, and as the developer, you need to know that's the case and you're going to get something back on your end that says, "Hey, this thing didn't work." Now what you're guaranteed is that you're never going to get something back that says, "What's going on here? We don't know what happened." You're either going to get something back that says it worked or didn't work. 
And so the real thing about a database transaction is it doesn't just make everything work for you, it gives you a very explicit contract about how things are going to be, and it gives you a very clear way of understanding the world and how you should react to it, and that makes it much easier to develop applications that are reliable and more enjoyable to develop in the first place. I think Temporal is the exact same way, so we aren't some crazy solution for distributed transactions that nobody thought of before, because that's not possible, there's physics limitations there. What we do is give you the most explicit interface and contract for when you're doing things across multiple discrete services, so you always have a 100% understanding of what you're working with and how you should adapt to make sure that you have the outcome that you're looking for. And so directly in the context of the example that you gave, with calling independent services, obviously we can't guarantee that all of those services are going to get called, or none of them are going to get called. But what we can do is make sure that if one of the services gets called, that there's a very strong and concrete record of that happening, and it always will happen if that service gets called, and so that way from the point of view of the developer, when they're actually writing their code, they don't ever have to do the guess and check work of, "Well what if this got called and then the thing crashed before I was able to record it?" "How do I know if this is in a consistent state?" Because what Temporal presents to you is very explicitly, the state of the world, regardless of what has happened to actually reach that point in time. Sorry Max, I don't know if you have more to add there. Maxim Fateev: Just going back to that example, we need to distinguish two types of failures and faults. So there are infrastructure level things; processes crash and databases have outages, network events... 
Those, Temporal handles seamlessly. You don't write code to account for your process crashing, and that's why you can write just three lines of code; withdraw, deposit, and generate invoice. And in Temporal, you will just write those three lines of code explicitly and we will guarantee that they execute. But then there are business level failures. For example, if you try to deposit to an account which doesn't exist, there is nothing Temporal can do about it because it is a business level failure, not an infrastructure level failure, so it will bubble up back to the application and your code will need to decide what to do. In this case, probably run a compensation and put the money back into the initial account. And that is part of the business logic, and this is what people will usually do as sagas, with the saga pattern, and Temporal is an awesome way to implement a saga. But the saga will be very simple, it's just code which will go and run the compensation flows based on the code which executed. And so again, we are differentiating between business level failures, which are your business logic, and infrastructure level failures, which are something Temporal takes care of and your code doesn't even need to think about. One way I call it is; fault oblivious... We don't like that name because I believe it has a little bit of a negative connotation, but if you think about it, your code just doesn't even need to know that a fault happened. It just doesn't even notice that, it just keeps going as if nothing happened. Noel Minchow: Got it, that delineation helps a lot for me. So in this example, what if there's a network failure on the second call, is that something that is within Temporal's wheelhouse? And if so, where do you guys draw that line when you're deciding what should and should not be handled?
Maxim Fateev: So in this case, the reality is that you're calling an external service, and this external service does not participate in Temporal's microcosm, so you will make a call, this call will fail and time out because of the network, and we can do two things; we can retry, and Temporal will retry these things by default, or you can just disable retries and run some compensation action. For example, your API is not idempotent, and there is another API to check the result of the previous call, the state of the world. So you can say don't retry, and Temporal will not retry, and it'll go back to your business level logic, and you'll say, "No, run this different operation to check the state." And then if the state says the operation didn't succeed, retry. So you can code these things as explicit patterns in Temporal. Ryland Goldstein: Yeah, the one thing, just to make sure it is clarified, Temporal doesn't make a decision about what stuff is within the scope of things that we care about or not, it's up to the user and it's the user's responsibility to understand when they're interacting with things which are outside of Temporal's purview. And so we provide a generic construct for any of these things, which is essentially just a special type of function that we call activities, Max mentioned those before, and they're essentially a container for work that you're doing that is unpredictable, things that Temporal can't reliably do itself. And so if you were doing a network call, it would go in an activity, if you're writing to the file system, it would go in an activity, if you're doing random number generation for example, it would go in an activity. Now the one cool thing is that we have added support for one language, TypeScript, that actually makes it impossible for users, for the most part, to write code that shouldn't be in a workflow, and it will actually stop them from even compiling and executing that code before they can run it in production.
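The split between infrastructure failures (retried) and business failures (handed back to the application for compensation) can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not Temporal's API: `BusinessError`, `call_with_retries`, the account names and the simulated timeouts are all hypothetical, and in Temporal the retrying would be done by the platform around activities rather than by a hand-written loop.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class BusinessError(Exception):
    """A failure that retrying cannot fix, e.g. an unknown account."""


def call_with_retries(fn: Callable[[], T], attempts: int = 3,
                      delay_seconds: float = 0.0) -> T:
    """Retry transient errors; let business errors reach the caller."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except BusinessError:
            raise                      # business logic must decide what to do
        except ConnectionError:
            if attempt == attempts:
                raise                  # retries exhausted
            time.sleep(delay_seconds)
    raise RuntimeError("unreachable")


balances: Dict[str, int] = {"alice": 100}
flaky = {"failures_left": 2}           # first two withdraw calls time out


def withdraw(account: str, amount: int) -> None:
    # Fails before touching the balance, so retrying is safe in this toy.
    if flaky["failures_left"] > 0:
        flaky["failures_left"] -= 1
        raise ConnectionError("simulated network timeout")
    balances[account] -= amount


def deposit(account: str, amount: int) -> None:
    if account not in balances:
        raise BusinessError(f"no such account: {account}")
    balances[account] += amount


def transfer(src: str, dst: str, amount: int) -> str:
    call_with_retries(lambda: withdraw(src, amount))
    try:
        call_with_retries(lambda: deposit(dst, amount))
    except BusinessError:
        # Saga-style compensation: put the money back where it came from.
        call_with_retries(lambda: deposit(src, amount))
        return "compensated"
    return "completed"


outcome = transfer("alice", "ghost", 30)
print(outcome, balances)
```

Here the two network timeouts never reach `transfer`, while the missing account does, and the only failure handling left in the business code is the compensation step.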
Noel Minchow: Right, it's interesting... So is it fair to think of Temporal's utility then, as providing this activities abstraction layer on top of code that you are writing and running that helps you manage and recover from unexpected states, or just more complex states than may be intuitively handle-able? Maxim Fateev: I wouldn't say activities. The core idea is durable execution, which is what we call a workflow, and a workflow is a function which is guaranteed to execute and whose state is preserved. But then the workflow orchestrates activities and can also react to external events, so you can send an event to the workflow, you can also query it, and in the future we are implementing an update function. So what it means is that if you have an existing... Let's say, as you said, you want to implement money transfer naively, in the service; somebody makes a call, you call three functions, and then you return the call, and this works until your process crashes in the middle. And calls also can be retried, if they time out, but at some point you need to reach return. So in Temporal, what will happen is that these specific external API calls, you will put in activities, and then workflow code will call those activities; withdrawal activity, deposit activity, generate receipt activity. And then there is a guarantee that this code which calls those activities will execute and it will never die in the middle, because even if the process crashes, it will be resurrected in the same state. Noel Minchow: That makes sense to me. More generically then, when you're talking about durable execution, which is a term I've heard you mention a few times, are you talking about that abstraction in and of itself, or is there something else that durable execution is meant to encapsulate? Maxim Fateev: No, durable execution is just...
If you think at an abstract level, without any specific product like Temporal or whatever, the idea is that you have a function or a piece of code which is not linked to a specific process, and which is guaranteed to execute in the presence of failure, that is the whole idea. And there are a few things which come out of that. It will eventually complete: if the whole system is down for two hours, it cannot execute during that time, but then it'll continue to execute. And all variables should be durable because again, they're not linked to a specific memory, because they continue executing. And then API calls can take any amount of time. So practically you say; withdraw, sleep for 30 days, deposit, then you will wait 30 days and continue, because this process is guaranteed to execute, in the abstract. And Temporal is practically a specific implementation of that durable execution, and for legacy reasons, we call them workflows, but you practically will do exactly the same, you will run these lines of code, you will say sleep 30 days.... Say we need to do a subscription for the user, you will say for loop, in whatever language you're using, if it supports for loops, then you will say, for 12 months, i = 0... i < 12. Then you will say sleep 30 days, and then you'll say charge user, send email that you got charged, and then check cancellation status and exit that loop if it's canceled. This is like 10 lines of code, and imagine doing that any other way; you need to have a durable timer or cron job and you need to have a table to keep state, you need to make sure that you can call those services reliably, with retries. And with Temporal it is just your business logic, you would write it as if your memory would be durable forever, and this is exactly what we are providing.
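The subscription loop described here is short enough to write out. The sketch below is plain Python rather than Temporal SDK code, and every name in it (`subscription`, `sleep`, `charge`, `send_email`, `is_cancelled`) is hypothetical. The sleep function is passed in so the example runs instantly; in a durable execution that call would be a durable timer that survives process restarts, which this toy does not attempt to provide.

```python
from typing import Callable, List

THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60


def subscription(sleep: Callable[[int], None],
                 charge: Callable[[], None],
                 send_email: Callable[[str], None],
                 is_cancelled: Callable[[], bool],
                 months: int = 12) -> int:
    """Charge once per 30-day period for up to `months` periods.

    Returns the number of charges made before completion or cancellation.
    """
    charged = 0
    for _ in range(months):
        sleep(THIRTY_DAYS_SECONDS)         # would be a durable timer
        charge()
        send_email("You have been charged")
        charged += 1
        if is_cancelled():                 # exit the loop once cancelled
            break
    return charged


# Demo with stand-ins that only record what happened.
slept: List[int] = []
emails: List[str] = []
charges: List[int] = []

result = subscription(
    sleep=slept.append,
    charge=lambda: charges.append(1),
    send_email=emails.append,
    is_cancelled=lambda: len(charges) >= 3,  # user cancels after 3 charges
)
print(result, len(emails), sum(slept))
```

Everything the conversation lists as extra machinery (a timer service, a state table, retry handling) is absent from the loop itself, which is the simplification being claimed.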
Ryland Goldstein: That's really the inverse of what Max said, you said it right before I think, which is that really durable execution at the end of the day, it's about the developer writing only business logic, and that includes business failure handling, because that's not something that exists separately to business logic. And so really, that's the end state, the user experience, the change that you see, is that the person who's writing the code, they only have to be thinking of what's relevant for their business, and they don't have to worry about all these transient failures and things that actually don't have to do with what they're actually writing. Which is the same value you get from a database transaction, at some level as well, right? Noel Minchow: Yeah and that process you're describing; a for loop with a long sleep in it, I think there's just functional utility to that paradigm as well, beyond just better error state and failure handling and all that stuff, that is a useful thing to have on one's tool belt, to write code. I can stop worrying about the externalities of the environment when I'm writing my business logic, so I think that in and of itself is an easy thing to sell, or would be valuable for a lot of specific use cases for devs. Emily Ketner: It's Emily again, producer for PodRocket and I want to talk to you, yeah you, the person who's listening but won't stop talking about your new favorite front end framework to your friends, even though they don't want to hear about it anymore. Well I do want to hear about it because you are really important to us as a listener. So what do you think of PodRocket? What do you like best? What do you absolutely hate? What's the one thing in the entire world that you want to hear about? Edge computing? Weird little component libraries? How to become a productive developer when your WiFi's out? 
I don't know, and that's the point, if you get in contact with us, you can rant about how we haven't had your favorite dev advocate on, or tell us we're doing great, whatever, and if you do, we'll give you a $25 gift card, that's pretty sweet, right? So reach out to us, links are in the description. $25 gift card! Noel Minchow: Max, I think you mentioned earlier about how you can use Temporal on only a little subset of your code, or start massaging it in over time to a more complex code base. How does that process typically work if somebody's trying to make their services a little bit more robust, but they really haven't done much work at all? Maxim Fateev: So for example, I don't want to name companies, but one company had a payment system that was based on queues, a bunch of services pushing messages to other services through queues, and then they contacted us and said, "We have this downstream dependency which can be down for three days, because it's a bank, they have a bank holiday, and they also want to have weekends, so the SLA is three days." And what do we do? Because you cannot really retry from a queue for three days, there are no queues which support such a long retry. And then also visibility... There are a lot of things which go with that... And they're like, "Okay if this request fails, just go and start the workflow." Which will practically keep retrying, because in Temporal, there is no limitation on retries, you can retry for a year, you can just set the retry policy... Expand the retry policy to a year, and it'll keep retrying for as long as necessary. And that's what they did, they just introduced this silly workflow which was running single activities with a three day or five day retry policy. And that is what they did. So two years later I learned that they replaced... All of the payment system is just a bunch of workflows using Temporal. So that is how it goes, they try it in one place and then, "Oh why do these two services talk through queues?
Maybe we can just do orchestration directly." And they keep growing and growing, and over time they just ended up throwing out all queues and ended up with a purely orchestrated solution. That is one way and from another... We never talked about use cases, but if you think about it... I can give you just some use cases. From infrastructure automation, HashiCorp's cloud is built around this solution, because they need to provision new [inaudible] clusters, they need to provision new infrastructure, they need to talk to unreliable cloud APIs, and they even run Terraform as an activity. And also you can do lifecycle management, because you can have this loop waiting for external events and taking an action. So you can have this durable execution workflow always running for a resource, for example, for the cluster, and it can accept requests and make sure they run one by one, serializing them, because there is full consistency: there is guaranteed to be only one workflow running per business ID. So if there is a cluster ID, there is guaranteed to be a single point of control. So writing control planes for cloud services is one big use case... Infrastructure automation... Then deployment; CI/CD pipelines; the Netflix team is rewriting their internal version of Spinnaker on top of Temporal right now... And there are a bunch of other CI/CD systems doing that. Then business flows, certainly, things like Uber Eats, things like DoorDash, they are very good examples of business flows. Payment systems, like people doing peer payments, sagas, realtime payments, or asynchronous payments... A lot of FinTech use cases. And then go up the stack, like mortgage processing... If people are involved. So you start from very low level infrastructure automation, or IoT device management, and then go up to business flows which involve humans.
Ryland Goldstein: Really Max, there are two different dimensions of the question you're answering, and I didn't even realize what the other dimension was until the end. Which is, how do you piecemeal onboard a specific use case, incrementally bringing on more and more of a specific use case, and then how do you piecemeal onboard a company? Which is the second part of what Max is talking about. I think the one thing to add on that first one, honestly, is that the most common pattern is that people have an existing system, it's not built with Temporal, it's running in production, they build a parity system with Temporal, they incrementally move more and more traffic over in a mixed mode, until they're comfortable and satisfied with things, and then they just cut over once they realize Temporal is going to make their life 10X better, which is almost [inaudible]. Maxim Fateev: And also, you can start from a small subset of the system, and then just keep growing that, yes. Noel Minchow: Yeah, got you. I could probably go read the docs but I'm just curious now that we're talking about it; say you have some long running service, something is sleeping for 10 days or even a year, and there's some bug in the code, discovered in the interim, right? Something's sleeping and waiting, and how do you think about that as a developer using Temporal, what considerations do you have to give to those currently waiting processes that haven't resumed yet? Maxim Fateev: So that is a very good question, practically you're talking about versioning, how do I upgrade code for long running processes? Which is a very non-trivial problem. There is no super ideal solution in the world for that, but we have, I think, the next best thing. Practically, we allow you to upgrade your code, change your code while processes are running. So if you have a bug, obviously you cannot change the past, if you are already past the part with the bug, you already executed that...
Okay there is actually another feature we can do around... And we'll already probably want to talk about that, but in general, if there is a bug on the path which workflow didn't reach yet, you can just upgrade your code and we will make sure that this new code will be executed. And we have support for that. There is another part around... You want to cover that? Ryland Goldstein: Yeah, so we also support a capability... Because one of the ways you can conceptualize the way that Temporal tracks the stuff that's happened in your workflow, your application, is something that is similar to the history that git provides. So we have this log of all the different events, things that contributed to the state of your workflow being what it is at this point in time. And so what you can actually do, we have this feature called reset, where you can actually choose a specific event in that log, that history of events that's representing your workflow's execution, and you can choose to reset back to a specific point in time that your workflow was executing during. And essentially what will happen is that there will be a new execution of that workflow that's created from that point in time, but then it will continue executing as if all the stuff that had been recorded before had never happened in the past. And so essentially what you can do... Obviously it doesn't undo if you call- Maxim Fateev: But there is one comment, it reapplies events automatically, so it won't lose events which were sent to that workflow even after that point. Ryland Goldstein: Yeah. Noel Minchow: Cool. Maxim Fateev: So I can give you real use case. For example, you are in an airline, and you want to do the system to track airline points, these bonus points. So practically you can just write a workflow which will keep those points inside of its variables and listen for external events like trip completion events, and then increment those points. 
And then as soon as you reach some certain number of points, you can run an action to call some downstream service, promote that user to the next tier. This is actually a real example from one of our users, and what happened is they actually had a bug which miscalculated those points, and they knew exactly which deployment caused that. So they rolled back the code, but then they also used the feature to mark that previous build as bad. And it actually found all places, each workflow which was touched by the bad build, rolled them back to the point before the build, and reapplied all trip completion events again. So practically we did a backfill with a single command, and across millions of users in parallel. And try doing that on top of a database, if you corrupted that with your bad code. Noel Minchow: That's wild. Yeah, that ability to go and replay events from the past... It feels like a little bit of a Pandora's box, I would think, in implementation. But it does sound super useful for those kinds of cases where it's like, "Oh we've corrupted our data, but we do have a very detailed event log of what can and cannot be rerun." Is that feature pretty new? Has that been a recent development? Maxim Fateev: It's been around for the last four years, and we also have the log. Temporal, by definition, by its implementation has this log... When you write your code, you don't think about it, it's just part of the system. Noel Minchow: How about new stuff on the horizon? What's coming out soon? What are you guys excited about? Maxim Fateev: Oh, a lot of things. Just to give a high level picture, it's open source. Temporal.io, you can go to our website, it's open source, it's under MIT license... Some libraries... Java SDK is Apache license. So anyone can download that, use it, there are a lot of companies using that in production, we have a long list of companies we can reference. And we have awesome case studies on our website. 
And at the same time, the way we monetize that, we monetize it as a SaaS offering, because Temporal consists of two parts; the application code, which you run, the SDK libraries... And then there is a backend service which keeps track of the state, that performs queuing and all sorts of durable timers, all these other things. So you can run both components, you can run the application, you can run the backend cluster, but we as a company provide you a cloud solution where we run these things for you, and this makes your life easier... Also, it creates good incentives because we're not creating this... Usually you break this open core model when you have this kind of version of the product, which is for selling, and then version of the product which is not for selling. We provide the same API on the cloud and the open source, and we guarantee backwards compatibility; we guarantee if you can run against our cloud, you can run it against the open source. And in the future we will provide even live migration tools, we'll be able to switch traffic back and forth to the cloud and from the cloud, without downtime, we will have those features in the future. So for us, our cloud is something which was more like in private, we had over 50 paying customers, but we had a very long wait list to sign up for that. And this year, we are actually finally getting to the point when we will go through that wait list, and I believe that soon anyone who wants to join and get access to our cloud can do it immediately, with self sign up flows and so on. That is the major deliverable. And on the open source side, we have a lot of features, Ryland is a product manager, so he can talk about the exciting features which are coming. Ryland Goldstein: So I think the one that's the most pressing and top of mind, that we're most excited about is Python support. So that's probably been one of the most requested, maybe tied with JavaScript and TypeScript support, which we added... 
Actually GA, was earlier this year. But Python is constantly asked for from people in the community, all the time, so we've been working on that. We have an excellent engineer who's been driving that, it's now in the final stages of being in beta, and so actually it'll be [inaudible] very soon. And it provides a very simple way of getting into Temporal and understanding Temporal, compared to the other SDKs, just by the nature of how enjoyable and easy to use Python is. Unless you ask Max, he has a slightly different view there... But I think in terms of other functionality that we're super excited about; right now, the way that most people get started with Temporal is through Docker compose files, and we do feel like this is a significant barrier to entry, especially for people who aren't bought into the Docker ecosystem, or don't want to have to bring that level of dependency down. And so we're going to be offering an all in, single binary for the product, where you can just download that, and actually that was work done by Datadog initially, and so we're building on what they've gifted to us. And so our hope is that by the end of the year everyone will start using Temporal through this single binary that they install with Brew or APT, or whatever else they're using to install packages. So that's another big one. Another really cool feature is that right now it's possible to update a workflow that's already running, and send an event, for lack of a better way of putting it, and it's also possible, separately, to ask for data; read only data from a workflow, which we call queries. But right now there's no way to both send data to a workflow and get something back at the same time, there's no request and response primitive, so we are actually adding that, workflow update is something that we're super invested in, we're working on now, and that will make it so there's just a single primitive for sending things in and getting things out in most cases. 
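The three interaction styles Ryland describes (sending an event in, reading data out, and a combined request and response) can be illustrated with a small sketch. This is plain Python, not the Temporal SDK: the class `PointsWorkflow`, its method names, and the "redeem" operation are hypothetical names chosen for illustration, and the points counter borrows the airline example from earlier in the conversation.

```python
from typing import List, Tuple


class PointsWorkflow:
    """Toy stand-in for one running workflow. Not the Temporal SDK."""

    def __init__(self) -> None:
        self.points = 0
        # Every state change is appended here, loosely mirroring an event log.
        self.history: List[Tuple[str, int]] = []

    def signal_trip_completed(self, points: int) -> None:
        """Send an event in: changes state, returns nothing to the caller."""
        self.history.append(("trip_completed", points))
        self.points += points

    def query_points(self) -> int:
        """Read-only access: returns state and must not change it."""
        return self.points

    def update_redeem(self, points: int) -> int:
        """Request and response in one call: validate, change state, reply."""
        if points > self.points:
            raise ValueError("not enough points")
        self.history.append(("redeem", points))
        self.points -= points
        return self.points


wf = PointsWorkflow()
wf.signal_trip_completed(500)   # event in, no reply
wf.signal_trip_completed(300)
print(wf.query_points())        # 800
print(wf.update_redeem(200))    # 600
```

Without the combined call, a caller would have to send the event and then ask for the data separately, with no guarantee that the answer reflects exactly that one change.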
And then the last one, which is probably the most fundamental, probably the hardest one to explain in the time we have; but right now if you want to use Temporal for a bunch of use cases, and you want those use cases to interact with each other, and you want the applications that you've built to interact, Temporal does not provide necessarily the best solution for that today. And so we've really wanted to, for a long time, solve this problem, where the moment that you are using Temporal and you want to connect with other people who are using Temporal, that should be the greatest experience in the world, and that's really what we're driving towards. And so we have a new project, it's called Nexus, it's something that Max and I have been working on for two years now, and essentially it provides you a way to actually surface your Temporal application logic and the things that you've built, to other people who are using Temporal, and eventually even people who aren't using Temporal, and so we're even building an abstraction layer that represents the value proposition that Temporal provides, but without requiring you to actually buy into Temporal, and still getting those benefits. And so in that way it'll be possible to interact with Temporal applications outside of Temporal, or interact with applications that aren't built with Temporal, from Temporal but in an intuitive way. And so Nexus is for us, really a next step of the product and the technology, in terms of getting adoption and expanding the use cases that are possible. Maxim Fateev: One other way to look at that; imagine right now we talk about service meshes, and RPC services, but those don't work very well as soon as you have an operation which can take a long time. So if you make a request and the request takes two hours, you cannot use RPC services, and service mesh is not going to help you. 
And people work around that with web hooks, with queues, with polling, and there are various ways to do that, but there is no standard in the industry. If I go to 10 developers and say, "Can you define an API for a service in which operations can take three hours?" And what we are trying to do with this project is practically say: we will actually have a standard way to define long running operations, we actually call them arbitrary length operations; ALOs. And then there will be certain capabilities attached to that, for example, they're cancelable, this is the way you discover their current status, this is the way you can get the result, and you can bind them to different technologies, you can bind them to queues or to web hooks, but then you can make the way you work with them very seamless. And that is what we are trying to do there, is just define this ALO abstraction and then on top of that build the service APIs, and your API can be more like, "Oh yeah, this is my input and output, but it can take three hours." And it will be a very good experience, because from Temporal, when you call it, it's just a blocking call, it's just a normal synchronous API call. Noel Minchow: Yeah, so is the goal then, with Nexus, is it fair to think of it as a standard that you guys are trying to establish? So it's easier, in the use cases we described before, where our transaction fails on the second call of three, so it's easier, "Oh, if this fails I can get more information from the upstream service if they are using the Nexus paradigm as well." Ryland Goldstein: I can answer that, but I think I'm going to use an analogy Max is probably not too fond of. So there are two different layers there; there's Nexus, and then this thing that Max mentioned, which is called arbitrary length operation. 
ALO is the standard, and the way that, at least with the majority of people I've talked to, it really helps them conceptualize it, is that the value you get out of, at least in JavaScript, a promise, the concept of a promise, imagine if that extended to something that could be represented by a server. So the concept of a durable or server side promise, that represents this unit of work, which is eventually going to complete or fail, which is a type of completion. So that is the first thing that we're talking about here, is this standard, which has nothing to do with Temporal, which is really just describing what is the pattern in the back and forth communication required, and the API required, to support this concept of a long running operation, an indefinitely long running operation. So that's the first thing. The next thing is this Temporal specific portion we've talked about which is called Nexus, which really allows you to expose an API, any API from Temporal using your existing Temporal primitives like workflows and activities. The thing about it is that it allows you to expose those things as ALOs, and so the idea being that if anyone else then buys into ALOs, they don't need to care whether they're calling something on Temporal or calling something that was backed by Kafka and a database, all that they need to care about is, "Does it support the contract of an ALO?" Because that's really what fundamentally represents the value that you get out of Temporal in the first place, and so why does it matter whether it's Temporal at the end of the day, as long as you get that interaction and that contract, that's really what you care about. Noel Minchow: Got it, that helps a lot, thank you. That explanation is a particularly clean one. Thinking about real world events like server operations as awaitable things in whatever language you're writing in is a pretty... Ryland Goldstein: That's a very good way of synthesizing it back. Noel Minchow: Yeah. 
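To make the "durable or server side promise" idea concrete, here is a minimal sketch of the contract Maxim lists: start an operation, discover its status, cancel it, and fetch its result. It is an in-memory toy in plain Python under stated assumptions, not the ALO specification or the Nexus API; `OperationService`, `Status`, and every method name are hypothetical, and a real implementation would persist the records and bind them to queues or web hooks as described above.

```python
import enum
import uuid
from dataclasses import dataclass
from typing import Any, Dict, Optional


class Status(enum.Enum):
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELED = "canceled"


@dataclass
class Operation:
    """Server-side record for one long-running unit of work."""
    op_id: str
    payload: Any
    status: Status = Status.RUNNING
    result: Any = None
    error: Optional[str] = None


class OperationService:
    """Hypothetical in-memory service exposing the ALO-style capabilities."""

    def __init__(self) -> None:
        self._ops: Dict[str, Operation] = {}

    # Caller-facing side of the contract.
    def start(self, payload: Any) -> str:
        op = Operation(op_id=str(uuid.uuid4()), payload=payload)
        self._ops[op.op_id] = op
        return op.op_id  # the caller only needs to keep this handle

    def get_status(self, op_id: str) -> Status:
        return self._ops[op_id].status

    def cancel(self, op_id: str) -> bool:
        op = self._ops[op_id]
        if op.status is not Status.RUNNING:
            return False  # already finished, nothing to cancel
        op.status = Status.CANCELED
        return True

    def get_result(self, op_id: str) -> Any:
        op = self._ops[op_id]
        if op.status is Status.COMPLETED:
            return op.result
        if op.status is Status.FAILED:
            raise RuntimeError(op.error)
        raise LookupError(f"operation is {op.status.value}")

    # Backend side: whatever does the work reports the outcome, hours later.
    def complete(self, op_id: str, result: Any) -> None:
        op = self._ops[op_id]
        if op.status is Status.RUNNING:
            op.status, op.result = Status.COMPLETED, result

    def fail(self, op_id: str, error: str) -> None:
        op = self._ops[op_id]
        if op.status is Status.RUNNING:
            op.status, op.error = Status.FAILED, error


service = OperationService()
handle = service.start({"report": "quarterly"})
print(service.get_status(handle).value)   # running
service.complete(handle, "report ready")  # could happen three hours later
print(service.get_result(handle))         # report ready
```

The caller never learns what sits behind the handle, which is the point Ryland makes: only the contract matters, not whether the work ran on Temporal or something else.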
Maxim Fateev: And in practice, if you use Temporal to consume that, then you will actually get the real promise back, at least in TypeScript. Noel Minchow: Yeah, it probably feels pretty magical when you get it back and say, "Okay, I don't even have to think about it as the dev, it just is the data that I requested, it's just maybe been three days." Ryland Goldstein: Exactly. Noel Minchow: That's very cool. Well then you guys talked about the future quite a bit, is there anything else on the horizon, on the roadmap that you guys are excited about? Or even just stuff you've recently published? I see some notes on visibility and traceability. Ryland Goldstein: There's one thing I have to talk about; we had our first user conference in August... We started as a company during the pandemic, many of us have met each other not that many times, and so this whole idea of in person has been a bit up in the air, no one really knew where that was going, and we have a really big DevRel team at the company, especially now, and so they're trying to understand how do they align with this new world, and how do they provide value and all that stuff, and we even tried going to some other large conferences around developer ecosystems, and we saw a really not great turnout and we were discouraged there. And so we decided though, at some point there had been enough traction and enough users in the community that we really felt it was time for an in-person event. And so we decided to have our first conference actually, just August 26th. And so that happened in Seattle, we ended up getting, I think it was almost 300 people for the main day... We couldn't be more proud and just so happy and appreciative of that turnout. It was one of the most amazing moments, at least for me, in my time working at any company, just seeing how many people were genuinely interested in what we spend our time building, and what we invest all of our energy in. 
And so that was a really wonderful experience, it gave us a really amazing understanding of who is out there in the Temporal community. You have entire tables at the conference that are just dedicated to specific companies like Datadog or Stripe or other really large companies, and that's just a really magical thing to see. And so I think that was a very exciting thing for us, everyone who was there, they left with an energy that I don't think you can really replace with anything else. So it's definitely worth mentioning, that was a big highlight for us in the last couple of months. Maxim Fateev: And as it was very successful, we plan to have a much larger conference next year, it was the first one, so watch us and please join us next year. One other thing, you mentioned visibility, I think one important thing about Temporal, we always talked about how you write your code so far, but because we built this at Amazon and then at Uber, we actually cover the whole life cycle; how you make sure that your application never goes down, how you make sure that you can upgrade your service without downtime. How you make sure that it can roll back things. How you make sure that you can version your workflows. How you troubleshoot them in production. So our value is not only in improving the developer experience during development time, but it also gives you a huge advantage during operations. Your life is so much easier because we record almost every event, so troubleshooting things in production is so easy. You can even go... You have, for example, a null pointer in production, in your workflow, you can just download the history, open that in a debugger and replay it as many times as you want in your debugger, just to troubleshoot. So if you got an error once in production, you practically always can reproduce that. And then we do error handling, for example, you can have a service call another service, call another service... 
And we can give you an exception stack trace across four different services in different languages, for example. And I don't think there is any other system... And we integrate the standard stuff like metering, tracing... Open Trace and all these things already are integrated into the system, you get this out of the box. Noel Minchow: Yeah, nice. Very cool. It sounds like it'd be quite the dream to be all set up and able to debug and figure stuff out that way. Maxim Fateev: One thing, I don't know if we've got time, Ryland can mention some companies which publicly are using us. Just to give people some sense of the type of users we have. Ryland Goldstein: So very quickly, Snap, they're using us for a bunch of things, but they're actually really interesting because not only are they using the technology, they're actually using our cloud offering publicly, and so if you've used a Snap Story, or posted a Snap Story at all, at least in the last good while, that's running a workflow every single time you do that, and so that's a really exciting one for us, obviously the scale there, you can read between the lines, it's large. We also have companies like Datadog who are not only super invested in terms of the amount of stuff they're building with it, they're also contributing a ton back, and so that Single Binary experience for example, that was actually something that Datadog initially built out, we've actually learned a lot and been inspired a lot in terms of the product investments we're making, based on just things they're organically doing internally. And so I think at this point there are over 400 people that are writing Temporal code at Datadog, which is just absolutely insane. And then we have companies like Netflix, also, who are heavily using us for various parts of the internals in Netflix for CI/CD stuff, a lot of the stuff around Spinnaker... 
Companies like Indeed, basically their ATS for candidates who are coming in, and they need to send those to other backend systems. Qualtrics does their top level workflows product with Temporal. So yeah, there's a ton of really impressive companies who have taken a bet on us and are just wonderful partners, and we're really super appreciative of that. Maxim Fateev: Same. Ryland Goldstein: Oh, one last thing is that we will be at GopherCon in like a month, so if anyone is at GopherCon, please come say hi to us. Noel Minchow: Nice, it's awesome to hear that, we'll be sure to get a link to it in the show notes so listeners can go check it out, and play with the product a little bit. But yeah, thank you guys so much for coming on and chatting with me. It's been a pleasure. Emily Ketner: Hey, this is Emily, one of the producers for PodRocket. I'm so glad you're enjoying this episode. You probably hear this from lots of other podcasts, but we really do appreciate our listeners. Without you there would be no podcast, and because of that it would really help if you could follow us on Apple Podcasts, so we can continue to bring you conversations with great devs like Evan You and Rich Harris. In return, we'll send you some awesome PodRocket stickers. So check out the show notes on this episode and follow the link to claim your stickers as a small thanks for following us on Apple Podcasts.