Noel: Hello and welcome to PodRocket. I'm Noel, and joining me today is Zoltan Kochan. Zoltan is the creator of pnpm and is currently working on dependency management at Bit.dev. Today we'll be talking about his recent talk at this year's Vite Conference, titled What Makes pnpm Performant. Welcome to the show. How's it going?
Zoltan Kochan: It's good. Thank you for having me.
Noel: Of course, of course. To get rolling, can you just tell us a little bit about yourself and your background as a developer, and then how you found yourself at Bit?
Zoltan Kochan: Okay. I live in Ukraine. I'm 32 years old. I have a wife and a daughter. I started my career as a full stack .NET developer. I started at little outsourcing companies and then I joined JustAnswer, which is a Q&A website where verified professionals answer questions for subscribed users. I worked there for a long period, more than nine years, and during my work there I discovered pnpm, around 2016. The reason I really loved this project was that it was really fast. At that time, npm was super slow and it amazed me how much more performant pnpm was. Also, it wasn't [inaudible] yet, so it was a proof of concept at that time and it was not maintained. But I really liked the idea behind it, so I started to maintain it. I really improved my JavaScript skills, because I was more into C# before that, but I really started to love JavaScript, TypeScript, front end stuff. At my work I switched to the DevOps team to work with JavaScript tooling, but it was pretty hard to combine my day job and the maintenance of pnpm. It was five years of this. Pretty stressful, hard switching between two jobs. We had some communication with Bit for the last three years and they also sponsored our project, pnpm, through Open Collective because it was a really important part of their infrastructure. They allow you to choose between Yarn and pnpm.
They don't implement their own package management solution, but they use the programmatic API of either pnpm or Yarn. I helped them to integrate pnpm. We had some history together, so they asked me to join and to work on dependency management. So at Bit, I work on the dependency management aspect. They have a lot of nuance in that area. They use package managers on a really low level because they don't even have package.json files in their workspace. Everything is calculated behind the scenes by Bit, and on a low level they tell the package manager what to do. Bit is a very, very interesting company. I really suggest everyone check it out. It has a pretty big learning curve, but if you commit to it, it's really a great way to develop components.
Noel: Nice, nice. I've got some questions about Bit specifically and how devs might get it into their workflow. But before we get there, can we talk a little bit about package managers more abstractly first? What did the space look like, the package manager ecosystem, before you got into pnpm?
Zoltan Kochan: At that time, I think only npm existed. There was also Bower, which was I guess in maintenance mode. It wasn't so actively developed. npm had lots of issues. It was really, really slow and it had some issues on Windows in... I don't remember which version it was. Three, I think, probably. Until version three they used a nested node_modules structure, which was bad on Windows because on Windows you have a limit on the file path length. So if you have a lot of nesting, you'll have some weird errors. You won't be able to even remove those directories. So it was really a pain. It was also not deterministic because back then they didn't really have a lockfile. They had a shrinkwrap JSON file, but it wasn't on by default and it was not as good as a real lockfile.
Noel: What was the big motivating factor for you... When you discovered pnpm, what state was it in and what drove you to it?
Zoltan Kochan: First of all, it was very fast, and at that time we had a big monorepo at JustAnswer with about maybe 100 projects. The npm CLI took about 20 minutes on this monorepo. When I discovered pnpm, it didn't work at first on our monorepo, so it had issues, but installation was a lot faster. I don't remember the exact numbers, but I believe it might have been like 10 times faster. Because later on, npm also improved speed, in version five, I think it was. Like three to five times faster. But at that point in time, before Yarn and before pnpm, the difference was huge. That was one, and the second one was the central storage which pnpm used. I saw big potential in this. I realized that we could use hard links or copy-on-write files, like copies on copy-on-write file systems, to share the space for these modules so we could have the packages in a single place on the disk. This would drastically reduce the amount of disk space that node_modules consume. I had real problems with disk space at that time. I think more than 20 gigabytes of disk space was used by node_modules on my drive.
Noel: Yeah, that's a lot. I feel like reclaiming disk space is always a noble goal, so I think that in and of itself makes sense. You said there were issues, that it didn't work right away when you first tried to install with pnpm over npm, and it was kind of in an alpha state. What was wrong with it and how much work did you need to do to get it to a point where it was production ready?
Zoltan Kochan: Yeah, it missed a lot of parts. It was really a fast proof of concept, I would say. Because it's really easy to write a simple package manager. The problem is that there are so many edge cases and little hidden features. A given feature might be used by 0.5% of projects, but in any project you use 1,000 dependencies. So you will always have this 1% or 0.5% of edge cases, and it takes a lot of time to fix and implement everything.
But I think the biggest missing part was the peer dependencies, the proper peer dependency resolution. That wasn't implemented. Optional dependencies I think also didn't work. Even lifecycle scripts didn't work. It was an early stage. I think after 10 months of development, I bumped the version to 1. So, still faster.
Noel: Once you did that, once you bumped to V1, how did adoption occur? How did people find pnpm? What has growth been?
Zoltan Kochan: There wasn't a viral effect, it was always slow adoption. I was a bit jealous of Yarn because Yarn came out maybe five or six months after I started to develop pnpm. In two or three days they got a huge chunk of npm users immediately. Even today, I think we are far, far away from that level of adoption. But there are positives, of course, because you have less pressure, and we still had some huge customers, huge new users. Very early on, Microsoft started to use pnpm. They have this project called Rush, which is I guess a task runner, like a monorepo management tool. They support Yarn and pnpm, but they recommend using pnpm by default, and very, very early on I worked closely with people from Microsoft to fix issues that they had on their projects and to help them adopt it.
Noel: Nice. Do you think it's the case that a lot of monorepos were early adopters of pnpm?
Zoltan Kochan: Yeah. Yeah, definitely. pnpm was a really good solution, especially for monorepos, because on a small project where you don't have so many dependencies, the speed difference is less noticeable, and the issues with npm are less noticeable. Even compared to Yarn, pnpm is faster on big projects.
Noel: That makes sense. As npm and Yarn have matured a little bit, they've brought some features that I think pnpm was aspiring towards originally. Now, today, what are the big advantages of pnpm over just vanilla npm or Yarn?
Zoltan Kochan: Yarn has switched to plug-and-play by default in version two.
It's an interesting approach, but it's a bit less compatible with the ecosystem than pnpm currently is. Of course, they have options to disable it and they even added an option to support pnpm-style node_modules. But still some folks were disappointed by this direction of Yarn and they prefer pnpm now, but mostly I think we have almost all the same features. We have some features that Yarn doesn't have, they have some features that we don't have, but I would say we have a 90% overlap, I guess, and they are pretty interchangeable. Regarding npm, I think we had a big advantage and still have a big advantage, both we and Yarn, in the monorepo support area because they only started to support monorepos maybe a year ago, and we have already supported monorepos for maybe four years. We have a big lead there, and probably because of this we have better support.
Noel: Yeah. How about workspaces, like the Yarn workspace equivalent? I feel like that was what drove a lot of people to Yarn initially. Is there a similar feature with pnpm?
Zoltan Kochan: Yeah, that is what I mean when I talk about monorepos. This is the workspaces feature. We support the same in pnpm and we added this support around the same time. Unfortunately, we did it independently, and when we were naming this feature, we used "workspace" to refer to the whole set of packages, unlike Yarn, which calls every single package in the monorepo a workspace.
Noel: Yeah, that all makes sense, and you guys have a really good feature comparison chart on the pnpm website. If any listeners are curious, you can go and see what looks the best for your given use case. But now I'm curious about Bit as well. You said that the way that Bit interacts with package managers is very low level. Can you expand upon that a little bit?
Zoltan Kochan: If you open up a Bit workspace, you will only see directories and tests and source code files.
You won't see package.json files, you'll see fewer node_modules directories, and you'll see no configuration files for ESLint, for Prettier and stuff like that, because Bit hides all that from you. It handles everything. Even though there are no package.json files, there are separate components in this workspace. This is very powerful and it provides a very good developer experience. Because with Bit, when you start using some dependency in a component, you don't have to install it. You don't have to add it to the package.json. For instance, you have a button component... Or you have a card component and you start using the button dependency in it. So you just write the import statement and then bit status will show there is a missing dependency, if this dependency wasn't yet used in the workspace. And you just try a Bit install card. Next time when you publish your component, the generated package.json will have these new dependencies in it. But if you already have the button component in this workspace and you write this import statement, let's say in a list component, then there's nothing you need to do because it's already inside the workspace. Its code analyzing tool will find this import statement and add the dependency to the dynamically generated package.json. Of course, there are edge cases when you want to maybe use a different dependency in some component. In that case you can do that with some commands. But it's a very pleasant, very good developer experience. And refactoring is really simple. You can move code around and you don't have to care about updating the package.json files, and you will never forget to remove dependencies that you don't use anymore.
Noel: Yeah, that's a nice feature. I feel like we end up having to lean on tools a lot of the time to ensure that our packages are all being used, in the traditional ecosystem. So I feel like that is a pretty nice to have feature.
Zoltan Kochan: Yeah.
Noel: Yeah. Nice.
Why is pnpm a good fit for that kind of developer, that abstraction that you just explained?
Zoltan Kochan: I think the main reason they like pnpm and they chose pnpm is that it's fast and it provides a good programmatic API. We worked together, back before I even joined Bit, to refactor pnpm's code base to allow it to be integrated into the Bit tool chain. Yarn also provides a good API. They also like the strict nature of the pnpm node_modules, that only the direct dependencies are linked, but it's not a deal-breaker because they can automatically identify these import statements.
Noel: Yeah, that makes sense. When we're talking about the API of a package manager, how are these tools interacting with these APIs? Where are the calls generated and what do they usually consist of?
Zoltan Kochan: Basically, it's a function that you can import. We have the pnpm core library and it has a function called mutateModules, and you pass it an array of projects. So basically, an array of package locations and package manifests. The package manifest is the package.json file. And you also specify what action should happen, like should pnpm install all the dependencies in all these projects, or should it add some new dependency to a given project, or should it remove some dependency, or should it update all dependencies or a given list of dependencies? That's what I mean by programmatic API. Yarn has a similar programmatic API.
Noel: Gotcha. Gotcha. Is that anything that the consumers of these packages, typical developers that are just writing projects, would ever need to interact with for any reason? Or is this mainly for tooling?
Zoltan Kochan: Yeah, it's mainly for tooling. We have actually another company that uses pnpm similarly, Glitch, Glitch.com. Yeah, they also use pnpm similarly. Recently, StackBlitz also started to support pnpm. Inside their WebContainers, they have some really great ways to optimize file system operations.
They have created a custom hook into pnpm. So how they use pnpm is a bit different from Bit, because Bit imports some packages from the pnpm monorepo, but StackBlitz just hooks into pnpm. So you run pnpm, but they override some parts. They use a custom feature, [inaudible 00:18:30] feature, and the custom linker, a function that writes the files to node_modules. This is some proprietary stuff, I think, that they use in their WebContainer, but it's so fast that when you run pnpm install in a StackBlitz WebContainer, it takes less than four seconds to install in a huge project. Locally, you would run it for maybe 20 seconds, so it's a lot more efficient.
Noel: We've been talking a lot about install speed here in the context of these tools and platforms. The disk space advantage that pnpm provides, is that leveraged as well? Is that an important feature?
Zoltan Kochan: Yeah. For Glitch.com, back when they decided to use pnpm, this was the reason they chose pnpm, because they wanted to sell unlimited disk space to their customers. The reason they could do it was that they basically mounted a shared drive to each container, which basically hosted this central store of packages used by pnpm. This way, basically the node_modules didn't consume any additional disk space in these individual containers. For Bit, it's not a big deal. For StackBlitz, I guess as well. But for users it's a big deal, because when you are a developer you will have many projects on your computer, and if you have a two-gigabyte node_modules directory in each project, then you'll have a big problem there. But with pnpm it will consume a lot less disk space.
Noel: That makes a lot of sense. And again, I feel like saving disk space for disk space's sake is good enough for me. I'm curious though about this Glitch architecture where they have this volume shared by all containers, or a large subset of containers, with the packages on it.
Does that essentially end up with almost every npm package, or at least every commonly consumed npm package, on this container? Is it essentially just a pulled down version, a localized version of all of the modules that npm is hosting?
Zoltan Kochan: Probably, yes, it contains the most popular ones. I think they even had a script that prefetched the most popular packages to this store. I don't know everything about this. I only know some details, because I didn't see their code base. They probably have multiple disk volumes, so I don't think they use just one.
Noel: Yeah. I'm sure there's all kinds of logic and stuff figuring out how to optimize that. It's just a notion of, "Oh, these platforms where they have a lot of projects, thousands of projects potentially." I hadn't looked at these platforms as just a different iteration of a massive monorepo, but it kind of feels like that's what they're doing. It's like a big monorepo in the cloud, and they can then use a volume to share all of these node_modules across them.
Zoltan Kochan: Yeah, it must be huge.
Noel: Yeah, yeah. Nice, nice. Well, we've covered a lot. Is there anything in particular from your talk, or just in general, that you want to plug and point listeners to?
Zoltan Kochan: We can talk about some lesser-known features of pnpm and other package managers. When people think about what package manager to use, npm has always had a huge advantage over Yarn and pnpm because npm is shipped with Node.js, and developers look at it as the default one. The default always has an advantage. A year ago, Maël, who is the lead maintainer of Yarn, was actually able to contribute a package manager manager to Node.js, which is called Corepack. It's shipped from Node.js 16 something, 16.14 maybe, or 10. It's turned off by default. It's experimental. But you can easily turn it on by running corepack enable. And when you enable it, you immediately get both Yarn and pnpm available in your terminal.
I guess it gives us some legitimacy now because Node.js kind of ships it. If you run this command, if you run pnpm, then actually Corepack is executed and it will dynamically install a stable version of pnpm and execute it right away. There are commands to install a given version of pnpm. And also, in package.json a new field is supported, called packageManager, and you can specify the name and exact version of the package manager which should be used with that project. When Corepack is enabled, you can actually run, for instance, Yarn in this project. If this field has a specific version of Yarn declared, then Corepack will automatically install it and execute that given version of Yarn. And the same is true for pnpm.
Noel: Nice. Very cool.
Zoltan Kochan: Yeah.
Noel: I feel like that's a well thought out place to put that level of configuration, like build what nvm became for node versions, but put it into the package.json file. I think that's a cool feature.
Zoltan Kochan: You mentioned nvm, so I will talk about a feature of pnpm. Actually, before this was contributed to Node.js by Maël, I came up with an alternative solution. I thought, "Why don't we install Node.js by pnpm?" There is actually a nice package created by Vercel called pkg, and it can bundle your Node.js [inaudible 00:24:36] into an executable which runs without Node.js. So I was able to bundle pnpm this way, and basically, we now ship this executable version of pnpm. It also has the advantage of better speed. So actually, the pnpm which is bundled into this executable starts up faster than the JavaScript version.
Noel: How is that? Why is it faster?
Zoltan Kochan: They compile JavaScript into some bytecode or something like that, so the JavaScript engine can start faster with this code. But I don't know all the details. When we did that, this allowed us to basically use pnpm without Node.js preinstalled on the system.
It allowed us to actually use pnpm to install Node.js on the computer. We have these pnpm [inaudible 00:25:32] commands, which can manage Node.js versions. There is a setting, use-node-version, which you can put into .npmrc. And when you run pnpm run, it will actually use that version of Node.js to run the scripts. So basically, you can use pnpm instead of nvm or other Node.js version managers.
Noel: Gotcha. So if a dev wanted to use this, what needs to be installed on the target system to run this initially? Do you have to install pnpm first, then you can run these configs that have the specified version of nvm?
Zoltan Kochan: If you open up our website and go to the installation page, then this is actually the first way that we tell people to install pnpm, the standalone version of pnpm. This is the default one that we recommend installing. It's just a Bash script, which downloads this executable from GitHub, from our GitHub release page, and it will work without [inaudible] data system path to add a pnpm home directory to it. And the Node.js executable will be linked to that location as well. But of course, nothing prevents you from just using pnpm for managing your Node.js dependencies. Then you can install it with Corepack or with npm, like npm install globally.
Noel: Yeah. Again, that's a pretty good spot. Are there any other features you wanted to touch on before we wrap up here?
Zoltan Kochan: Okay, one last feature. This is a new feature which I added in August. It's a new resolution mode, and to be honest, I was inspired by Yarn for this because it's looking into implementing a new resolution strategy. There is this problem when someone publishes some malware in your dependencies. There were several such situations in the past year when someone published a miner or some code that takes your, I don't know, tokens or secret stuff from the RPC. This happens because npm and all of the Node.js package managers always install the latest version of the package.
So Yarn is experimenting with an alternative resolution mode, which installs not the latest version, but the lowest version. This way, a newer version will only be installed if the actual semver range in the package.json is bumped.
Noel: So just to clarify, instead of the default behavior, the package manager finding the highest semver match version, it'll find the lowest semver match version and install that instead? Gotcha. Gotcha.
Zoltan Kochan: Yes, yes. Yeah. So it's a bit safer because the dependent package will have to be republished in order for the new version to be installed, so the malicious person will have to break the whole chain of dependencies. But yeah, I think it's a problem of course, but this gave me another idea. Because currently, the problem with the fact that we always resolve to the highest possible version is that we cannot use any cache, because we don't know if there is a higher version or not. We always need to fetch the latest version. We can't easily change this because this is the expectation of the user, so I was thinking about maybe changing the expectation. I came up with this new resolution mode, which is called time-based. You can try it out by setting resolution-mode=time-based in your .npmrc, and pnpm will use it. When it's on, pnpm will select the lowest versions, but only for the direct dependencies. And then it will check the published times of each of these direct dependencies and pick the latest one. And then for the subdependencies, it will pick the latest versions from that timeframe. It will pick the latest version, but only the latest published before that time. So we can use the cache this way, because if the cache is newer than that time, then we don't need to request the registry.
Noel: Right, right. So essentially, I guess effectively for subdependencies, you're using the version that your direct dependency was using at its published time, most likely?
Zoltan Kochan: Yes. Yes.
Noel: Yeah, yeah. Cool.
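Editor's note: the three steps Zoltan describes (lowest version for direct dependencies, a cutoff taken from the latest publish time among those picks, then the highest version of each subdependency published by that cutoff) can be sketched roughly as follows. This is only an illustration of the idea under simplifying assumptions and is not pnpm's actual code: the `Registry` shape, the function names, the package names and the publish dates are all hypothetical, and range matching handles only simple caret ranges such as `^1.0.0`.

```typescript
// Sketch of the time-based resolution idea from the conversation.
// All types, names and registry data are hypothetical; this is NOT pnpm's implementation.

// package name -> version -> publish date (ISO 8601)
type Registry = Record<string, Record<string, string>>;
// package name -> caret range such as "^1.0.0"
type Ranges = Record<string, string>;

// Turn "1.2.3" into a sortable number (assumes every part is below 1000).
function versionKey(version: string): number {
  const parts = version.split(".").map(Number);
  return (parts[0] ?? 0) * 1e6 + (parts[1] ?? 0) * 1e3 + (parts[2] ?? 0);
}

// Simplified caret matching: "^1.2.0" means >=1.2.0 and <2.0.0.
// Real semver has more rules (0.x ranges, prereleases) that are ignored here.
function satisfies(version: string, range: string): boolean {
  const base = range.replace(/^\^/, "");
  const sameMajor =
    Math.floor(versionKey(version) / 1e6) === Math.floor(versionKey(base) / 1e6);
  return sameMajor && versionKey(version) >= versionKey(base);
}

// All [version, publishTime] pairs matching the range, sorted from lowest to highest version.
function candidates(registry: Registry, name: string, range: string): Array<[string, number]> {
  return Object.entries(registry[name] ?? {})
    .filter(([version]) => satisfies(version, range))
    .map(([version, published]): [string, number] => [version, Date.parse(published)])
    .sort((a, b) => versionKey(a[0]) - versionKey(b[0]));
}

function resolveTimeBased(registry: Registry, direct: Ranges, sub: Ranges): Record<string, string> {
  const resolved: Record<string, string> = {};
  let cutoff = 0;

  // Step 1: pick the lowest matching version of each direct dependency.
  // Step 2: the cutoff is the latest publish time among those picks.
  for (const [name, range] of Object.entries(direct)) {
    const lowest = candidates(registry, name, range)[0];
    if (lowest === undefined) throw new Error(`No version of ${name} matches ${range}`);
    resolved[name] = lowest[0];
    cutoff = Math.max(cutoff, lowest[1]);
  }

  // Step 3: for subdependencies, pick the highest matching version
  // that was published at or before the cutoff.
  for (const [name, range] of Object.entries(sub)) {
    const inWindow = candidates(registry, name, range).filter(([, published]) => published <= cutoff);
    const highest = inWindow[inWindow.length - 1];
    if (highest === undefined) throw new Error(`No version of ${name} was published by the cutoff`);
    resolved[name] = highest[0];
  }
  return resolved;
}

// Made-up registry data for illustration.
const registry: Registry = {
  foo: { "1.0.0": "2022-01-01", "1.1.0": "2022-06-01" },
  bar: { "2.0.0": "2022-03-01", "2.1.0": "2022-09-01" },
  baz: { "1.0.0": "2021-12-01", "1.1.0": "2022-02-15", "1.2.0": "2022-08-01" },
};

const result = resolveTimeBased(registry, { foo: "^1.0.0", bar: "^2.0.0" }, { baz: "^1.0.0" });
console.log(result);
```

With this made-up data the cutoff is the publish date of bar 2.0.0, so the subdependency baz resolves to 1.1.0, whereas a plain highest-version strategy would have picked 1.2.0. Because the cutoff is fixed by the direct dependencies, any cached metadata fetched after that date is sufficient, which is the caching benefit mentioned in the conversation.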
Zoltan Kochan: If people like it and if we are able to make it the default one, it opens up a lot of possibilities. We could even implement server-side resolution. Because currently, implementing server-side resolution doesn't make a lot of difference because the server would anyway need to request new metadata each time. So it would still be slow, but with an effective cache it could be really fast.
Noel: Nice, nice. I'm glad you brought this up. This has me thinking about package resolution in a way that I hadn't ever really considered before. Yeah, very cool. Well, thank you so much for coming on and chatting with me, Zoltan. It's been a pleasure.
Zoltan Kochan: Thank you. Thank you for having me.