Ruben Fiszel: As you grow, you learn to not reinvent the wheel, because those languages are already extremely advanced, extremely complex, and you're just going to do one more poor implementation of something that already exists.
Eric Anderson: This is Contributor, a podcast telling the stories behind the best open-source projects and the communities that make them. I'm Eric Anderson. I'm joined today by Ruben Fiszel, who is the creator or author of Windmill. Windmill's an open-source workflow engine. You could also call it a developer environment. Windmill's arrived on the scene, at least for me, out of nowhere and grown to quite a popular thing: 5,000 GitHub stars, a big Discord community. Ruben, congrats on all the progress and welcome to the show.
Ruben Fiszel: Thank you for inviting me.
Eric Anderson: So let's quickly describe what this thing is. I called it a workflow engine.
Ruben Fiszel: So I think you can see it as an onion of layers. It's a developer platform that embeds a workflow engine. And one of the questions we get is, why not reuse a workflow engine? Why reinvent it from scratch? The reason is that sometimes when you integrate things vertically, you can get better performance. The abstractions make better sense, they integrate better with what we have. There are already good existing workflow engines, I'm thinking about Prefect, Dagster. You can also see Airflow or Temporal as workflow engines, and we could have done just a layer on top, but then we would've missed all of the good things that actually make Windmill what it is, which is extremely low latency, extremely flexible, the ability to run code as it is, so arbitrary scripts, and we have an abstraction around the main function. The main function is all you need to basically run one of the steps in Windmill, which is very different from some of the abstractions used by the other workflow engines. For instance, Dagster, Airflow, Prefect, they're all based in Python. So it all makes sense if it's a unified code base, which is all Python, but what if you have a mix of Bash scripts, TypeScript, and Python that you'd like to mix together? This is where making different choices is essential. And so we had to reinvent it from scratch, and in the end it took a while, but the results speak for themselves. So I'm pretty glad we went this route.
Eric Anderson: Got it. And so I imagine a lot of people adopt you for a workflow problem, and then they enjoy the developer platform and even use you in non-workflow scenarios.
Ruben Fiszel: Yes, I think workflows in general are a domain that is still largely unexplored. Of course senior engineers know about it and already use great solutions for it, but I think a lot of people are unfamiliar with it and basically reinvent it for their own problem, or patch it together with some pub/sub queues and some microservices here and there. So everyone is reinventing workflows, and so are we, and so we're trying to build a standard, and maybe it's one more standard. But the benefit here is that a lot of existing workflow engines kind of force you to adopt their, I think, sometimes convoluted APIs. We're trying to fit into the existing way people write code and just build a workflow abstraction on top of it. So we really see it as an orchestration of different scripts or different functions, the same ones you would write if you were writing your own backend. And it's exactly the same code.
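To make the main-function abstraction described here concrete, below is a minimal sketch of what a Windmill-style step could look like in TypeScript. The parameter names and the Postgres resource shape are illustrative assumptions, not Windmill's exact schema; the convention relied on is simply that a script exports a main function and that its typed parameters can drive an auto-generated input form.

```typescript
// Illustrative resource shape (an assumption, not Windmill's exact schema).
type Postgres = { host: string; port: number; user: string; password: string };

// A Windmill-style step is just an exported main function. The platform can
// parse this signature, so typed parameters like these can back an
// auto-generated input form, and the return value becomes the step's result
// that a later step in a flow can take as its input.
export async function main(db: Postgres, customerId: string, amount: number) {
  // ...the same code you would write in your own backend: query the database
  // using `db`, call an API, transform data, and so on.
  const invoice = { customerId, amount, issuedAt: new Date().toISOString() };
  return invoice;
}
```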
Ruben Fiszel: There is nothing that is specific to us, and the code is code that you can port. If you were to move out of Windmill, you could just port it and include it in your microservice. So what do we provide on top of what you'd get building it from scratch? Well, just doing the piping between the steps is not so easy, because you basically have to say, I'm going to do a sequence of steps one after the other, and each step has to go to completion. And once they go to completion, they have to take the result as input. But maybe if you have a complex DAG, you're not just taking all the [inaudible 00:04:06] outputs as input. So now you need to separate the order of execution from the DAG that represents the input flow. And then you want to do more complex things. Say I want to have my workflow sleep or suspend for a certain amount of time before I reactivate it, and I don't want to do that with an active wait, and you want to do some branching. And yes, you could do all of that yourself, but as soon as the complexity of the workflow grows, you're kind of glad that someone built an abstraction that you can just reuse. The objective with what we do is to do it in a way that doesn't force you to change the way you would've written the code otherwise. So it feels very natural for a software engineer to use Windmill: they don't have to change the way they write code, they can just use our flow builder to orchestrate everything. It feels very natural for someone with a software engineering background.
Eric Anderson: Got it. And you're right, I think we've seen people in any company of meaningful size begin to develop their own workflow engine if they aren't given one, because of the need to describe these processes. So there's a long history of folks with homegrown workflow engines. Maybe you could speak a little bit to the developer platform.
Ruben Fiszel: So first, there is the idea that this is a platform so that you can expose workflows and scripts to the rest of the organization. We take care of building a front end for each of those flows and even for the individual scripts. We do that by parsing the main function and its parameters, extracting what kind of inputs it needs, and building the basic form that represents the inputs of that script. We do the same thing for flows, where we know the shape of each particular script, so we know how to chain it inside the flow. So we take care of having this platform that you can expose to the less technical people of your organization, not as a workflow engine, but as a developer portal or an internal portal. The other thing is that you can centralize your permissions. So groups, general permissioning, read and write: people are in different groups and they should have different access, rather than the uniform access you have in most workflow engines, where they just expect that as soon as they get a job, they have to process it. Tied to the question of permissions, there is the question of secrets: how do you handle sensitive values? And then, on top of all of that, we built a UI builder that is very similar to something like Retool, where you can basically drag and drop things. One difference is that it's not just another Retool.
We made it with a software engineer, or a backend software engineer, in mind, in the sense that you do want to quickly expose what you've done as workflows or as scripts, but without having to spend too much time building a front end. And you still want to be able to leverage the full power of your scripts. So the idea is that in most cases, in five minutes you can build a very complex dashboard, or something that shows a lot of data, without being a front-end engineer. I think that's a bit different from a tool like Retool, where the idea is that even if you're not a front-end engineer, you'd like to build a very complex dashboard where you can customize everything, so it looks very similar to what you would've written with code, but without the need for code. We're not trying to do that. We really emphasize productivity. The goal is that as a backend software engineer, you can build a very powerful dashboard really, really quickly, and make it also very easy to maintain.
Eric Anderson: We published, or open sourced, an enterprise chatbot recently here at Scale Venture Partners. Originally we were storing things in spreadsheets and in S3, and we had to have permissions; we had our own little scripts for managing permissions to both of those. We would get authentication tokens and stash them places. And then we tried out Windmill and realized that all that permission management to third-party sources is kind of managed for us. There's a bit of a lifecycle around those secrets. So I think you've addressed the problems that people typically run into, at least from our experience.
Ruben Fiszel: My experience is that people end up reinventing the wheel every time, which I think is almost necessary, because it's really hard to build a generic solution, especially if you build it in a very opinionated fashion. So I think what's cool about having an engineering background is that you can try to build non-opinionated solutions that are basically just tools that people can use the way they want. And in many ways the way to do that is to reduce things to their essence and not try to build too much. So we provide a minimal solution for everything, with a very open API, and then you can build a lot on top of it yourself. That's because everything is basically either a resource or a script, and a script is code, and we are not inventing programming languages. They already exist; we just run them. Those programming languages are extremely flexible, and once you are able, in those scripts, to fetch the resources that are permissioned, then you are free to reinvent your own way of thinking about permissions, secrets, and everything. So we don't want to impose one way to do things. Everyone has their preferred way to do things. We just want to give you the most basic tools so that you don't end up reinventing them again and again.
Eric Anderson: Talk to us about how you got to this solution, Ruben. Maybe a little bit of your background, what led you to this place, and then we'll get into more of the nuts and bolts of it.
Ruben Fiszel: So I did my studies at EPFL in Switzerland, where I was working in the Scala lab, and a lot of my background is compiler related. I then went to Stanford to do research on compilers. We were doing Scala to hardware synthesis, so how to get from a very slow language to hardware acceleration, which is as fast as you can get. And then I worked for Palantir as a dev tools lead.
And there I kind of learned how the industry actually works and how people actually use software, both as a tech company and as a producer of software for other companies. Then I worked for a startup, where I learned the woes and the troubles of being a startup. I learned a lot of things in each place. My compiler background really gave me a taste for designing APIs and languages, and also a taste for not reinventing languages. One of the first things you do when you learn about compilers is build your own DSL, build your own compiler, because it seems like, yeah, now I have all this power. And I think as you grow, you learn to not reinvent the wheel, because those languages are already extremely advanced, extremely complex, and you're just going to do one more language that is a poor implementation of something that already exists. So a lot of my intuition for wrapping existing languages rather than writing your own DSL comes from there. And on the dev tools team at Palantir, or even just at Palantir, I learned the strengths of building a platform. Palantir has a product called Foundry, and a lot of its strength is that it is really a platform in which people can spend their whole journey as an analyst. It's more geared toward really large enterprises and big data, and your journey as a data analyst: how to do ontology and how to treat data inside [inaudible 00:11:10]. But the cool thing about it is that it's very generic. It does a lot of things, and then every member of the company can use it in different ways and get value out of it. It's not just the engineering team that has a tool somewhere that no one understands, that no one can interact with, and you just assume that it works. It's the whole company that leverages it. And even non-technical or semi-technical people, when they're able to contribute, can bring a lot of value that we miss when we just try to isolate everything. So the idea of building a platform, and the strengths of building a platform, I really got from there. I also saw the need for a tool like Windmill at Palantir. So later on, when I asked myself, is this a need? Is this already done? I had the intuition that if it wasn't solved for Palantir, then there weren't that many tools out there doing this. And then at the startup it was kind of the same. I was trying to solve a lot of problems quickly, and being a software engineer, I wanted to write them with code, and I realized that writing them with code took way too much time. You have to deploy them on AWS Lambda or Cloudflare Workers. If you want to do the right thing as an engineer and do everything with code, it takes a lot of time. But if you want to do the quick and dirty thing, we don't have the tools available today, because it's going to be a lot of no-code, super rigid tools that you're going to outgrow and that are not scalable solutions. So I wanted to build this developer platform that you could quickly iterate on, but that you could use for production later and never outgrow. And so you needed a few things. You needed to have the right abstraction, so code, and you needed to be extremely performance oriented, because as a developer there is nothing you hate more than making a choice that you're going to outgrow because it's so slow.
And also, developer experience is really, really related to speed and performance. A tool that is fast feels great as a developer. And we try not to be too opinionated, to really make a developer platform that is not trying to reinvent or impose what we thought was the best way to do things, but just gives you a way to quickly deploy code, to run it as workflows, to permission it, and then to build quick UIs on top of it. Basically the ultimate tool, in a way, and then let software engineers do what they want with it.
Eric Anderson: Awesome. I wanted to talk about two other things. One is you live in Europe, and you've been to the US for Stanford and for YC, I assume. There are others from Europe that we've talked to. I'm curious how you see the value of being in the US versus being in Europe, and, independent of that, approaches to building an open-source community.
Ruben Fiszel: I feel like the engineering mindset of being European leans more toward building infra, or I'll say deep tech, for some definition of deep tech, where you really care about the engineering and not enough about the, I'll say, business aspects of things, or the marketability of things, or the way to scale them accordingly. From that, I really got a taste for the hard engineering, and I'm really glad I got that education. But when I went to the US, I realized how clueless I was about many things: how you actually scale a business, what it is to do sales. And I realized there is a reason America is ahead in a lot of places: you guys are absolutely the best at sales, business, capital allocation. You dream big and you see potential in things. A lot of Europeans think about what's going to fail; a lot of Americans think about what could work. So there was this idea of, well, it's cool to want to build very complex stuff, but the US gave me the ambition to actually make it a business and to actually think I could grow a large community out of it. And the rest is basically what you learn at YC: how to build a nice product, how to iterate quickly. There is a bit of tension between doing infra and open source and the mantra of YC, which is to build things fast. But I think you can combine both in a way, and today is probably the nicest period to build open source, because people are getting really used to open source. There are a lot of tools available, there's a lot of technology around. I mean, Windmill is very large, but it stands on the shoulders of giants. A lot of the ACID transactions are provided by Postgres; for the front end we use Svelte, which is a very cool framework that allows you to iterate quickly. We use Rust, we use Tokio. There are a lot of those things that maybe didn't exist three years ago and that allow you to build these innovative solutions really quickly. So it's really a cool time to build stuff, I think.
Eric Anderson: Certainly what you're doing is working, so I can imagine the combination of your past experiences, geography, and the work you've done have served you well. Earlier, as we were warming up before we started recording, Ruben, you mentioned there were some really interesting technical details, things that you had to figure out in order to solve this problem in the right way. Maybe you could go into some of those for us.
Ruben Fiszel: Yeah, for sure. So I think there are two main ones, the first of which is containers, or rather, not containers.
The CNCF, the Cloud Native Computing Foundation, and a lot of frameworks today rely on containers as the unit of abstraction for executing arbitrary logic. That has one drawback, which is that building a container takes time. So when you want to iterate quickly, you are losing a lot, and building a framework that is fast is impossible when you always have to build a container first. The second thing is that if you want to be able to deploy many scripts, many different workflows, then you need to be able to store those heavy containers somewhere. Lambda, for instance, packages all your dependencies in S3. Then, when it wants to run your micro VM or your endpoint, it pulls them from S3, which works fine if you have a lot of storage, but in a case where you want to have thousands of endpoints on a small setup, that would not scale. So that's not the choice we made. The second thing was around running long-running processes or not. So, what we do in Windmill is that we only store the source code. We don't store a zip of the dependencies or anything; we store a lock file that corresponds to that source code. At runtime, when we pull the job, we pull the source code and the lock file, and then we technically need to reinstall the dependencies and then run the source code. That would be very, very slow if that were all we were doing. So we did a lot of engineering so that we cache the dependencies and we create virtual environments on the fly. We do a lot of things so that it is instant, and in a way we are doing an alternative approach to containers, where we can mutualize a lot of the already-shared dependencies. We can do that because our domain is not as large as containers'. Containers can run literally anything. We are only running scripts in Python, Go, Bash, TypeScript. So our domain is more constrained, and the only thing we really have to care about is handling the dependencies in those languages very well. By constraining the problem, we can come up with an alternative solution to containers. The second thing I mentioned is running long-running processes or not, which is basically what Temporal or Lambda does: when they get a request for the first time, they need to spin up a micro VM and then an HTTP server, and then each request is sent to that HTTP server. Once the HTTP server is hot, it's going to be extremely performant, which is why Lambda and other frameworks making that choice scale very well. The problem is that they suffer from cold starts, which means there's a lot of ceremony to do all of this. The trade-off in most of those cases is that you optimize for scale and for throughput, not for latency. But a lot of problems in small startups or companies are not something you want to scale to a million requests a second. They're more of the nature of once every five minutes at most. So trying to cram everything into one solution, even though it's a very elegant one, is not necessarily the right choice depending on your problem. So we don't do it that way; we basically run the scripts bare. We don't spawn a container, we don't spawn an HTTP server.
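As a rough illustration of the two ideas just described, keying a cached dependency install on the lock file and then spawning the interpreter directly on the script, with no container build and no HTTP server in between, here is a minimal sketch. The cache paths, the Job shape, and the pip/PYTHONPATH mechanics are assumptions for illustration; Windmill's actual worker is written in Rust and does considerably more (virtual environments on the fly, support for several languages).

```typescript
import { createHash } from "node:crypto";
import { existsSync, mkdirSync, writeFileSync } from "node:fs";
import { execFileSync } from "node:child_process";
import { join } from "node:path";

// Hypothetical job pulled from the queue: only source code and a lock file are stored.
interface Job { source: string; lockfile: string; args: string[] }

const CACHE_ROOT = "/tmp/dep-cache"; // illustrative location

function ensureDeps(lockfile: string): string {
  // Key the cached install on a hash of the lock file, so identical
  // dependency sets are installed once and shared across jobs.
  const key = createHash("sha256").update(lockfile).digest("hex");
  const dir = join(CACHE_ROOT, key);
  if (!existsSync(dir)) {
    mkdirSync(dir, { recursive: true });
    // Simplification: treat the lock file as a pinned requirements list.
    writeFileSync(join(dir, "requirements.txt"), lockfile);
    // Cold path only: install into an isolated, reusable directory.
    execFileSync("pip", ["install", "--target", dir, "-r", join(dir, "requirements.txt")]);
  }
  return dir; // warm path: nothing to install, just reuse the cached directory
}

function runJob(job: Job): string {
  const deps = ensureDeps(job.lockfile);
  const script = join(CACHE_ROOT, "job.py");
  writeFileSync(script, job.source);
  // No container build, no HTTP server: spawn the interpreter directly on the
  // script, pointing it at the cached dependencies.
  return execFileSync("python3", [script, ...job.args], {
    env: { ...process.env, PYTHONPATH: deps },
    encoding: "utf-8",
  });
}
```

The point is the one made above: by constraining the problem to scripts in a handful of languages, the dependency question becomes tractable without containers.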
Ruben Fiszel: We just spawn a fork that executes your script bare, which allows us to have the lowest latency across all of the other engines. So if you were to run a workflow from scratch on cold infra, we're going to be faster than any other. So that's one aspect. The second aspect is that because we run the scripts bare, you don't necessarily have a lot of the constraints you have with containers, where you cannot access memory or GPU acceleration. The way we run scripts, they have access to your whole node, and then if you want to do resource isolation on top of it, you can use Kubernetes to do that, where you say each worker is going to be dedicated that amount of resources. So it's a lot easier to build a large-scale infra on top of that, because we ourselves are not built on top of the container abstraction. Our workers are containers, but they don't run containers, they run scripts bare.
Eric Anderson: Interesting. You're making more sense of it for me. Earlier in the show, you talked about how, by combining these into a single platform, you're able to make some design decisions that benefit this kind of narrow set of use cases, and what you're doing with containers, or non-containers, makes a lot of sense.
Ruben Fiszel: I think some back-of-the-napkin calculation can show that a single worker, so a single container on a very tiny EC2 instance, can run around 14 million scripts in a month. And 14 million scripts is a lot; the assumption is that every script takes a hundred milliseconds. There are a lot of companies whose whole infra could be served by a single worker, and they don't realize it, because they've built all of this abstraction for large scale and everything. So there's a lot of counterintuitive math, and people don't realize that maybe they don't need this Cassandra cluster, maybe they don't need this Kafka cluster. Their problem is actually smaller than they think. And this is very in line with a lot of the movement today, like DuckDB, for instance, made by the creators of BigQuery, where you realize that you don't need BigQuery in most cases. Like 99.9% of the workflows could fit in memory. It's kind of the same for us. In most cases you don't need sharding, you don't need Spark to do ETL, you just need one node that executes your whole workflow. So there are a lot of those things that might seem counterintuitive, but hardware today is extremely fast, one node can do a lot, and an infra of 10 to 20 nodes is sufficient to run really, really large-scale compute.
Eric Anderson: Another aspect of Windmill that I don't think most people from the outside appreciate is that when I dig into reviews on Hacker News or in Discord, when people talk about Windmill, they talk about Ruben's incredible customer service and support, that he seems to be everywhere. What's your approach been for supporting Windmill and new users, and how do you sleep at night, I guess?
Ruben Fiszel: So basically we were, and still are, mostly in the building phase, and have been for a long time. Building phase means we code, meaning that I'm behind my laptop most of the time writing code. So Discord is never really far away. And there is the aspect that we're going very fast, so sometimes there's a bug that must have slipped through.
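As a quick sanity check on the single-worker figure mentioned a moment ago, here is the arithmetic with the 100-millisecond assumption spelled out. The utilization factor used to land near 14 million is my own assumption, since the exact assumptions behind that number aren't stated in the conversation.

```typescript
// Back-of-the-napkin check of the single-worker figure (assumptions are illustrative).
const msPerScript = 100;                      // assumed average script runtime
const msPerMonth = 30 * 24 * 60 * 60 * 1000;  // a 30-day month = 2,592,000,000 ms
const serialMax = msPerMonth / msPerScript;   // ~25.9 million scripts back to back at 100% utilization
const quoted = serialMax * 0.54;              // ~14 million at roughly 54% utilization (my assumption)
console.log({ serialMax, quoted });
```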
Ruben Fiszel: The fact that some people from the community will file bug reports does us an enormous service, because finding bugs is a lot harder than solving them. So when we get good bug reports, we want to reward that kind of behavior by being extremely fast. And also, I mean, I would be very frustrated using software that is buggy, so there's this feeling of pride where you do want to solve it as fast as possible. It feels very natural. It's the one thing you want to do: someone made the effort of telling you about an issue or giving you feedback, so just treat it immediately. It's really easy to do at small scale. People can just send me a message on Discord, and then, depending on the nature of the feedback or the issue, I can drop whatever I'm doing, fix it, and then come back to what I was doing.
Eric Anderson: Makes sense. Tell us about the future of Windmill. So today, or actually maybe to capture the current state: Windmill's an open-source project, and there is kind of a cloud offering if I want to execute in a SaaS style. Where does Windmill go from here?
Ruben Fiszel: So I think Windmill is a lot more generous than any other comparable product. Not only is it open source, we also have SSO included. Most of the, I'll say, enterprise features are actually in the open-source edition. So the strategy is really to grow adoption, to be deployed everywhere. Most people will probably not need the enterprise edition, but we think that at large scale, the enterprises that are really going to derive a lot of value out of it are going to want to have this connection with us, they're going to want to have support, they're going to want to have this one plugin that makes them a bit faster at really large scale. So the future is really building the best tool out there so that people love it and have it deployed everywhere, and then eventually we're also going to make a bit of money, because if we only capture 1% of the value we create, but create billions of value, then at some point we're going to be able to make it a viable business. Right now we have clients, we have customers, we're a really lean team, and we are in no financial trouble. That really gives us the ability to be very ambitious and not just do short-term strategies where you need to monetize everything. What we're trying to optimize right now is customer satisfaction, and making sense as a whole, as a platform. I think building a platform is really, really hard, and there are a lot of essays about it, because it has to be extremely consistent, and being consistent is one of the hardest things, because you always want to do that one feature, but then that one feature goes against the rest of the platform. So there are a lot of those choices to make right now, and we were really, really lucky to have really good investors that gave us the time to think thoroughly about those problems without having to rush anything, so that we were able to build really solid foundations. Now that we have those really solid foundations, we can think about the next stage for us, which is to get it into the hands of not only startups but also larger companies, which I believe could benefit enormously from using Windmill.
Eric Anderson: One of my favorite parts of doing these interviews is I find that often I've captured people in what amounts to their life's work, and it feels like, Ruben, a lot of things have led to this, and you're pouring your full passion into it, and we're all the beneficiaries. So thank you.
Ruben Fiszel: No, thanks a lot. Thanks a lot for having me.
Eric Anderson: You can subscribe to the podcast and check out our community Slack and newsletter at contributor.fyi. If you like the show, please leave a rating and review on Apple Podcasts, Spotify, or wherever you get your podcasts. Until next time, I'm Eric Anderson and this has been Contributor.