Kate:
Hello, and welcome to PodRocket. I'm Kate, the Producer of PodRocket. And joining us today is Feross Aboukhadijeh, CEO and Co-Founder of socket.dev, a new security company that can protect your most critical apps from supply chain attacks. Let's get into it.

Feross Aboukhadijeh:
Going great, thanks for having me here.

Kate:
Feross is also the Creator of WebTorrent, Standard JS and Wormhole. We covered Wormhole in an episode of PodRocket back in June. If you have not looked into WebTorrent, Standard JS, Wormhole, or our podcast on Wormhole, you should definitely do so. We will include the links in the show notes. But in our episode today, we are going to get into socket.dev, a new security company that can protect your most critical apps from supply chain attacks. Feross, can you just lead us into what socket.dev is?

Feross Aboukhadijeh:
Yeah. So socket.dev is a tool that developers can use to protect their apps from supply chain attacks. Supply chain attacks are basically attacks against open-source code that can result in malware or other unintended code being added to an open-source package. We've unfortunately seen kind of a big trend in the last few months and years of an increase in these supply chain attacks. Oftentimes, you'll see these headline news stories about a package being compromised or hijacked, maybe the maintainer reused their password on another site, so some bad actor gets control of the package. And suddenly, the newest version of the package contains some additional code. That's called a supply chain attack, and that's what Socket was designed to protect against.

Paul:
So was Socket sort of like born out of its own volition? Or did you get into that from another project? Or did it come to you in the middle of the night in a burst of recognition? Yeah, how did you come up with the idea for Socket?

Feross Aboukhadijeh:
Yeah. It's funny because I've been an open-source maintainer for, I don't know, eight, nine years now. And so I've been kind of at the front lines of seeing how open-source is made. And there's always been kind of this element of discomfort I've had with kind of the way that we do things. Every time I run npm install on my computer, and I see the number of dependencies that are coming down and getting installed on my machine, it's always made me feel a little bit uncomfortable because that means that I'm trusting all these different packages, and all these open-source maintainers. And obviously, most maintainers are good, myself being a maintainer, obviously, I know that most people have good intentions.

Feross Aboukhadijeh:
But the issue is really just that the number of packages getting installed in an average app has gone up so much in the last few years. Even a Hello World app often will have a thousand dependencies. And so when you have that much code, it's obviously really hard to audit it. And the number of packages that are changing is very high, because there's always new versions being released, there's always new stuff coming out. As our ecosystems have gotten more complicated in this way, I have always felt a bit of discomfort around just kind of like, I'm running this install command on my computer, and it has all my data on it, it has my personal files, my tax returns, all my personal information in it. If one of these packages was compromised, it could do whatever it wanted on my machine. And so it's been kind of a fear in the back of my mind for as long as I can remember as a maintainer.

Feross Aboukhadijeh:
But then in 2018, it really kind of came to the fore with this attack called event-stream that really, I think, grabbed headlines and kind of woke everyone up to the threat of this problem. And so we can get a little bit into how that particular attack worked. I mean, it is from 2018. But it is actually one of the most interesting supply chain attacks against NPM. But anyway, that's kind of been always there. And then where the need for Socket really became clear was when we were building Wormhole. And if you go back to the episode that Kate mentioned, where we talk about it, you'll know that Wormhole is basically trying to be the most secure way to send files online using a web browser.

Feross Aboukhadijeh:
So it's a web app designed to enable people to send files, but without the web app seeing the files at all. So the web app has no idea what you're sending. It doesn't want to see your files, we can't see your files, it uses end-to-end encryption to enable that. There's a whole bunch of things we did to make that web app as secure as we could. I think some of it, we get into a little bit on the previous podcast that Kate mentioned. But as we were building it, and we were doing all these things to make it really secure, it became clear to us that if we aren't looking at our dependencies, if we don't know what's in our dependencies, and we're not carefully vetting them, then we're really rolling the dice with our users' security.

Feross Aboukhadijeh:
And it just felt like, really, the next thing we should do to try to improve Wormhole security and take it to the next level would be to come up with a plan for how to actually make sure our dependencies are secure. And so that was kind of a thing we started asking around. We said, what are other companies doing that care a lot about security? What are the best practices in this space? And what we kept hearing back from people was, well, you can install one of these vulnerability scanners that will tell you if you have like a known vulnerability in your code, if you have a CVE in your code, so something like a Dependabot or a Snyk. But what we realized was, those don't actually address the fear that we had, which is a supply chain attack, which is very, very different than this known vulnerability problem.

Feross Aboukhadijeh:
A known vulnerability is when a maintainer makes a mistake, accidentally introduces a bug. And what we were really concerned about, and what you're seeing in the news a lot more with these high profile attacks is a supply chain attack, which is very different. It's where malware is intentionally added to a package. And if you want to detect that, you need a completely different approach than just looking up in a database to see if there's a known vulnerability filed for this project. Because that's very reactive and what you need to stop a supply chain attack is actually very proactive.

Feross Aboukhadijeh:
You need to be able to know that this code that was published yesterday that no one's looked at yet, that you're about to install, you need to be able to kind of scan that and know what is it going to do. What are its capabilities? What are its behaviors? What servers is it going to talk to? What files is it going to read? What is it going to do to your computer? And to be able to proactively detect suspicious things in packages. And so that's what we ended up realizing we had to just go and build ourselves because it didn't exist. And that's what Socket is.

Paul:
So it's really less of a static analyzer of a given body of code. And it's more of like understanding how things are linked, built, where they're sourced, and how that can have certain implications on everything downstream that you use with that. It's kind of like an intelligent-

Feross Aboukhadijeh:
Yeah, exactly.

Paul:
Yeah. So one thing that I'm kind of wondering about is you mentioned, okay, we were building this Wormhole, we really wanted our users to be secure, so these issues came to us. So the thought process there is kind of like, okay, we could have an insecure package in our code that could have like some known vulnerability, that could do that. Another fear you talked about is, and I feel this too when you do npm install, and it's like this black hole gets created, that everything could sink into that, that you'll never like traverse. And you're just like, God, what's getting on my computer?

Paul:
These are two different types of attacks. There's one that's attacking you, the local guy, one's attacking the customers. I'm wondering like, do you have any lay of the land about what's going on out there? Because I know there's also Faker.js, right? Where that whole debacle came out, and then the package became unusable and stuff. So, yeah. Is it mostly like home attacks? Are you more concerned about what gets shipped to users? Are you concerned about maintainers removing their code? Or is it really yet to be seen?

Feross Aboukhadijeh:
So we have very concrete things we can look back on in the last year and look at as examples of the kinds of attacks that are happening right now. So you mentioned one in January, so we can talk about that one first. So in January, literally, just a couple months ago, this person who is a maintainer, who had I think 100 million downloads of his code, he had two projects in particular that were really popular. One was called Colors and the other was called Faker. And he just kind of woke up one day and decided that he wanted to sabotage his own code. So he just decided to add malicious code to his projects that would print out a bunch of gibberish Unicode characters and kind of ... It even printed out some conspiracy theories and other things like that into the terminal, and also had like while (true) loops that would just kind of infinite loop your program. Who knows what his motivations were?

Paul:
I think he...

Feross Aboukhadijeh:
Sorry?

Paul:
He had a vision or epiphany.

Feross Aboukhadijeh:
Yeah. It's really unfortunate. I don't know exactly what happened there. There's a lot of different opinions on it. Without getting into all the reasons for it, the kind of point is that this is actually an interesting case, because it's actually the maintainer themselves doing it, which is actually very interesting, because a lot of the proposals that people kind of always talk about when they're talking about security involve things like, what if we just did two-factor authentication? Or what if we just did code signing? And those are all good things. We should do those too. But in this case, because it was the maintainer themselves sabotaging the code, like none of those things would have actually done anything because they would have obviously typed in the 2FA code or signed it themselves.

Feross Aboukhadijeh:
But yeah, that's one example where this code that had hundreds of millions of downloads that people had relied on for years, just suddenly, one day, a new version came out with completely different behavior. It's suddenly just not doing what it says on the box anymore. It's doing all this extra stuff. And for several hours, anyone who was unlucky enough to install this would have this happen to their computer. Even a ton of big companies that were using these libraries either directly or indirectly. And so like, for example, the Amazon Cloud Development Kit tool, which is like a CLI tool, it depended on one of these, I think Colors, and so it just ... Anyone who installed it that day and tried to use it was just getting all this junk printed into their terminal, and everyone thought that Amazon itself was hacked.

Feross Aboukhadijeh:
When really it was just like, no, no, Amazon's CLI has a dependency on this project, and it was using, like, a loose version range. And so it just automatically pulled in kind of whatever the latest patch version was of this code. And so that's kind of what happened. So, yeah, that's like one type of attack. And then I'll just mention one other one that happened just a few months before that. Back in October, there was a project called UAParser.js, which is a ... I would say, this is kind of more of a typical attack that you see these days. What happened with that package was this ... So this is a thing that's been around for 10 years. UAParser.js is very popular. I think it has 30 million downloads or was it 7 million downloads a week, so a little over ... Like basically around 30 million downloads a month. And very widely used. It's used by React Native, it's used by basically a whole bunch of stuff that you have either heard of or use.

Feross Aboukhadijeh:
But what happened was, on October 5th, on a Russian hacking forum, somebody made a forum post talking about ... Basically, it said, I'm selling an NPM account. It has more than 7 million installs every week and there's more than a thousand packages depending on this. The account doesn't have 2FA turned on, and I have the login email and the password. And I'll sell it to you for $20,000. And this just got posted on October 5th. A couple weeks after that on October 22nd, UAParser.js had three malicious versions get published all at the same time. We don't know for sure that those two events are connected, but it's very suspicious because UAParser.js has exactly 7 million downloads a week. So it's almost certain that that was kind of the way that it got compromised.

Feross Aboukhadijeh:
And so yeah, so these three versions got published. And for about four hours, anyone who installed those versions would have a cryptocurrency miner get downloaded to their computer and run to mine the Monero cryptocurrency for the attacker, obviously, not for you, not for the victim. For the attacker. And then also on Windows, they had this extra attack that they did where they would steal all the passwords from about 100 different programs on the computer.

Paul:
Classic Windows stuff.

Feross Aboukhadijeh:
Yeah. Anyway, so it was like really bad, right? Like just not something you would ever want to run on your computer. And you can see how it's just kind of targeting people who ... Like, anyone who just typed in npm install ua-parser-js during that four-hour window would be automatically downloading the latest version, which was the compromised version. And anyone who wasn't using a package lock file would also be compromised. And so there were tens of thousands of people who installed it during this period. And then eventually, the community kind of caught wind of it because with an attack like this, it's actually not that sneaky. With a cryptocurrency miner, usually you'll see like 100% CPU utilization.

Feross Aboukhadijeh:
Your computer just grinds to a halt, and you're like, what's going on? Why is my battery running out? And then people quickly figured out, okay, it came from this package, and it got removed. But you see how it's very reactive. People kind of like are just like, I guess we just install it and see. And then if something bad happens, we try to get it taken down. There's just something wrong with this approach. It's very, very reactive.
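To make the loose version range and lock file point concrete, here is a minimal sketch of why a range picks up a freshly published version while a pinned version does not. It is an illustration under stated assumptions, not npm's actual resolver: it only understands plain "x.y.z" versions and caret ranges with a major version of 1 or higher, and the function names and version numbers are hypothetical.

```typescript
// Sketch only: handles plain "x.y.z" versions and "^x.y.z" ranges with a
// major version of 1 or higher. Real npm range matching is far richer.
type Version = [number, number, number];

function parseVersion(text: string): Version {
  const pieces = text.split(".");
  const version: Version = [Number(pieces[0]), Number(pieces[1]), Number(pieces[2])];
  const valid =
    pieces.length === 3 &&
    version.every((n) => isFinite(n) && Math.floor(n) === n && n >= 0);
  if (!valid) throw new Error("unsupported version: " + text);
  return version;
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: Version, b: Version): number {
  return a[0] - b[0] || a[1] - b[1] || a[2] - b[2];
}

// Returns the version that would be installed for a dependency spec,
// given the list of versions currently published (hypothetical helper).
function resolveSpec(spec: string, published: string[]): string | undefined {
  if (spec.charAt(0) !== "^") {
    // Pinned: only that exact version is acceptable.
    return published.indexOf(spec) !== -1 ? spec : undefined;
  }
  const base = parseVersion(spec.slice(1));
  let best: string | undefined;
  for (const candidate of published) {
    const v = parseVersion(candidate);
    const inRange = v[0] === base[0] && compareVersions(v, base) >= 0;
    if (inRange && (best === undefined || compareVersions(v, parseVersion(best)) > 0)) {
      best = candidate;
    }
  }
  return best;
}

// A compromised 1.4.1 is published after 1.4.0 (made-up numbers).
const publishedVersions = ["1.3.0", "1.4.0", "1.4.1", "2.0.0"];
console.log(resolveSpec("^1.4.0", publishedVersions)); // range follows the newest match
console.log(resolveSpec("1.4.0", publishedVersions)); // pinned stays put
```

A lock file has roughly the effect of the pinned case: the install reuses the exact version recorded earlier instead of re-resolving the range.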

Paul:
Trying to keep up with what's going on. So how does your product address this? What are some of the main ways that you decided to tackle these problems and provide a unique value proposition for developers having open-source?

Feross Aboukhadijeh:
Well, so if you look at these attacks, they all have kind of common themes, common techniques that the attackers use. So one thing that you'll see in about 60% of supply chain attacks against NPM is that once the attacker has control of the package, they will add an install script, which is this feature that's kind of specific to NPM and you don't really see it in too many other package managers. But it basically lets a package say, whenever somebody installs this package, please run this script at install time. And it has some legitimate uses. This feature is often used for compiling native code. So if you're using a JavaScript package that has like a C component that needs to be compiled, or maybe some files that need to be transpiled, or something like that, you can do that in an install script step.

Feross Aboukhadijeh:
You'll often also see this install script used to print out almost like a little bit of spam in the terminal that kind of says like, please donate to our project or stuff like that. I mean, those often used install scripts as well. I think those are actually disabled now, though, as far as like, the output isn't shown anymore by the latest versions of NPM. But, anyway. The point is, so these scripts have somewhat legitimate uses, especially for compiling code to make native modules work. But when an attacker takes over a package, and then they're trying to decide how do they want to kind of deploy their payload, it makes sense to put it in an install script, because it will just run automatically right away, and it just makes the attack like more effective, because it just means anyone who installs it is going to immediately be affected, even without having to require or import the module.
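As a rough illustration of what an install script looks like from the outside, the sketch below lists the npm lifecycle hooks in a package manifest that run at install time (preinstall, install, postinstall). The Manifest type, the function name, and the example package are hypothetical; this is not Socket's implementation.

```typescript
// Minimal shape of the parts of package.json this sketch cares about.
interface Manifest {
  name: string;
  version: string;
  scripts?: { [hook: string]: string };
}

// npm lifecycle hooks that run automatically when a package is installed.
const INSTALL_HOOKS = ["preinstall", "install", "postinstall"];

// Returns the install-time hooks a manifest declares, in a fixed order.
function findInstallScripts(manifest: Manifest): string[] {
  const scripts = manifest.scripts || {};
  return INSTALL_HOOKS.filter((hook) => typeof scripts[hook] === "string");
}

// Hypothetical manifest: "test" is harmless here, "postinstall" runs on install.
const exampleManifest: Manifest = {
  name: "example-native-addon",
  version: "2.0.0",
  scripts: { test: "node test.js", postinstall: "node build.js" },
};

console.log(findInstallScripts(exampleManifest)); // [ 'postinstall' ]
```

The hook by itself is not evidence of malice, since native modules use it legitimately. It only marks code that runs without anyone ever requiring or importing the module.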

Feross Aboukhadijeh:
And so one thing you might imagine is, well, hey, like what if we just looked at modules, and if they suddenly just add an install script, like out of nowhere. Like, for years, they've never used one, and then suddenly, today, they're using one. That seems like pretty suspicious or pretty noteworthy. And so maybe we just detect that. And whenever the capabilities of a package have changed in this way, where the kind of behavior of it or what it's trying to do has changed significantly, we could just warn the user. It's almost like, you can think of like a smartphone app. If you install a smartphone app, and it wants to access your contacts or your camera, suddenly, for the first time, it doesn't just get to do it because you installed the app a couple months ago. It's like, no, it wants to use your camera for the first time. You've never allowed it to before and now has to ask you. And if you say yes, it will get to and if you say no, it won't get to. So the user is in control.

Feross Aboukhadijeh:
So that's kind of what we're trying to do here ... We want to basically tell the user when they install a package, what does it do today? So what is its behavior? What servers does it talk to? Does it use files on the file system? Does it have an install script? The user can then decide to proceed with the installation or not. If they decide to proceed, then great, they can install it, everything's fine. But then if in a couple months in the future, the package's behavior changes, and now suddenly, it wants to do more things, well, we basically want to warn them about that change, and allow them to make a decision about whether they want to take that change or not. So that's kind of the idea. So we sort of just looked at what are all the things that the different supply chain attacks did and then we built them into a tool that could detect when those changes happen to a package.
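The smartphone-permission analogy can be sketched as a simple comparison of capability sets between two versions of a package. Everything here is an assumption for illustration: the capability names, the function names, and the idea that capabilities arrive as a flat list. How they are actually extracted from a package's code is not covered in the conversation.

```typescript
// Hypothetical capability labels, loosely following the questions above:
// what servers does it talk to, does it touch files, does it run at install.
type Capability = "network" | "filesystem" | "installScript" | "environment";

// Capabilities present in the new version that the old version never used.
function newCapabilities(previous: Capability[], next: Capability[]): Capability[] {
  return next.filter((capability) => previous.indexOf(capability) === -1);
}

// Builds a warning for the developer, or undefined when nothing new appeared.
function describeChange(
  packageName: string,
  previous: Capability[],
  next: Capability[]
): string | undefined {
  const added = newCapabilities(previous, next);
  if (added.length === 0) return undefined;
  return packageName + " now uses: " + added.join(", ");
}

// A package that only read files suddenly gains an install script and network use.
console.log(
  describeChange("example-lib", ["filesystem"], ["filesystem", "installScript", "network"])
);
// example-lib now uses: installScript, network
```

Only additions are reported. A version that drops a capability produces no warning, which keeps the comparison quiet for the common case of an uneventful update.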

Paul:
So it seems like there's obviously some manual intervention and, like, human processing needed to look at certain warnings and errors and sort of allow things through, not allow things through. So where do you kind of draw the line about, all right, you're a developer, I'm going to hit npm install. I don't want to spend two hours like doing a bunch of stuff to get my package installed. So how do you guys like think about that? Is it really quick at the beginning? Does it require a lot of overhead? Where do you draw that line?

Feross Aboukhadijeh:
So I think every team is going to have kind of different risk tolerances. So if you think about like the most paranoid teams, people building like end-to-end encrypted messaging apps or banks, or even companies like Google, they have already today, they have really, really thorough processes, where to introduce any open-source dependency into the codebase requires the security team to do a full audit of that open-source project and then to sign off on it. And that's what like all the kind of the most, the big companies and the really security sensitive companies are already doing today.

Feross Aboukhadijeh:
And then you have like, on the flip side, which is like where most companies are, is they just basically do nothing. It's just like, hope for the best, run npm install and kind of like ... It's mostly things are mostly fine most of the time and life is too short. We don't have time to audit all this stuff and it just hasn't been very easy to address. And so people mostly are just kind of burying their head in the sand and like ...

Paul:
It's a lot of resources.

Feross Aboukhadijeh:
Yeah. It's a lot of resources and you need skills too. I mean, to be able to have ... Think about it. Like, if you were tasked with auditing some package like React, you have to ... I mean, there's a lot of code to read, there's a lot of dependencies to look at, right? And it's just like, where do you even start? It just feels overwhelming to people. And so that's where most people are, that's where we were with Wormhole. Even though we cared a lot about security, we were a two-person team. I mean, all we could really do was try to pick good dependencies from the beginning, and just kind of try to pick maintainers whose reputations we trusted, and just try to do what we could. But really, we were still pretty vulnerable to that maintainer losing their password, and then their packages getting compromised, or the maintainer going rogue, or whatever. We were still vulnerable to all this stuff. And we were just hoping for the best.

Feross Aboukhadijeh:
So to answer your question, Socket tries to do something kind of in the middle of those two extremes. It says, look, developer time is a very precious resource. People only have a very limited amount of time to spend auditing their dependencies. And right now, they're spending basically no time doing this. So what we want to do is say, if we have a really high confidence that something about this package has changed, and it's just so suspicious and it's rare enough that it's worth a look, then what we want to do is basically surface that to the developer's attention and have them take a quick look at that.

Feross Aboukhadijeh:
So an example would be something like install scripts, like I mentioned before. If a package suddenly hasn't done this for years, it's never used an install script, it's never needed an install script to function correctly. And then suddenly, a new version comes out and it does, well, we want to make sure that the developer reviewing that pull request, it's made very clear. So what our GitHub app will do is it will come in and leave a comment on that pull request and say, this dependency has added an install script. It's on this line, click here to see what it does. And it'll be like a big kind of warning. And so this doesn't happen very often in legitimate packages.

Feross Aboukhadijeh:
So we feel pretty comfortable just saying, this is the thing, it's rare enough that we're just going to alert the developer to it, and we're going to say, in this case, we really think you should take a look at this. Another example is typos. Actually, the number one supply chain attack right now is typosquatting. So maybe I should mention this. Typosquatting is when a bad guy registers a package that is like one or two letters different from a popular package. And they hope that people will make a typo when they're installing the package. So if you have a package that's like one or two letters off from something that gets 10 million installs every week, you're just going to probably get like a few dozen or a few hundred people to just accidentally typo that and then you'll get kind of installations.

Paul:
I've definitely done it before.

Feross Aboukhadijeh:
Yeah. Sometimes what happens is, it's not even a typo. It's actually like, it's you forget, because sometimes packages are like, they'll have like a JS at the end or they won't have a JS at the end. So it's like, is it npm install foo? Or is it npm install foo.js? Or is it npm install node-foo? So there's all these different prefixes. So sometimes you'll just guess. You're like, I think it's this one, you just guessed, right? If you guessed wrong, and someone's registered that wrong one, and they have nasty code in there, then you're just like trusting ... You're just running their code on your computer.

Feross Aboukhadijeh:
The right way to think about NPM is, NPM is basically a wiki, and anyone can edit any page. So there's no vetting process to put a package on NPM. So if you're just guessing a package name, then you're just effectively downloading whatever happens to be there and running it and hoping for the best. So it's really not advisable. And so that's another thing that Socket can check for. We can say, look, this pull request adds a new dependency. And what we'll do is check that that dependency doesn't look like it's a typo. The way we do that is a very simple system. We basically say, is this package a couple letters off from another package? And is that other package a thousand times more popular than the one you tried to install? So that's how we do it.

Feross Aboukhadijeh:
So if you're trying to install something and it has 100 downloads a week, but there's a couple letter difference that you can make, and then it'll have ... That other package will have like a million installs a week, then we'll draw that to your attention and say, just want to make sure here like, is this really, really what you intended to install? And of course, maybe it is and so you can ignore it. But the developer has that information, that extra information that they hopefully find useful, and then they can decide what to do. Because maybe they didn't intend to install this thing. Maybe it was intentional. But almost certainly, they're at least happy that this was drawn to their attention so that ... Especially if there's a code reviewer reviewing it. The code reviewer might not notice the typo, and then they'll be really sad that they didn't catch it. So we want to basically draw that to the team's attention so they can act on it.
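The two-part typo check described above (a couple of letters off, and a thousand times less popular) can be sketched with a standard edit-distance function. The threshold of two edits, the PackageStats shape, and the package names are assumptions made for the example. Only the "couple letters" and "thousand times" rules come from the conversation.

```typescript
// Classic Levenshtein distance: the number of single-character insertions,
// deletions, or substitutions needed to turn one string into the other.
function editDistance(a: string, b: string): number {
  let previous: number[] = [];
  for (let j = 0; j <= b.length; j++) previous.push(j);
  for (let i = 1; i <= a.length; i++) {
    const current: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      current.push(Math.min(previous[j - 1] + cost, previous[j] + 1, current[j - 1] + 1));
    }
    previous = current;
  }
  return previous[b.length];
}

interface PackageStats {
  name: string;
  weeklyDownloads: number;
}

// Returns the popular package the candidate may be a typo of, if any.
// Assumed thresholds: at most 2 edits apart, at least 1000x the downloads.
function likelyTypoOf(candidate: PackageStats, popular: PackageStats[]): PackageStats | undefined {
  for (const other of popular) {
    if (other.name === candidate.name) continue;
    const close = editDistance(candidate.name, other.name) <= 2;
    const farMorePopular = other.weeklyDownloads >= 1000 * candidate.weeklyDownloads;
    if (close && farMorePopular) return other;
  }
  return undefined;
}

// Made-up names and download counts.
const wellKnown: PackageStats[] = [{ name: "foo-utils", weeklyDownloads: 1000000 }];
const suspect = likelyTypoOf({ name: "fooo-utils", weeklyDownloads: 100 }, wellKnown);
console.log(suspect ? "did you mean " + suspect.name + "?" : "looks fine");
```

Requiring both conditions keeps the alert rare: two similarly named packages with comparable popularity are left alone.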

Paul:
This almost feels like my English teacher would like go to, like you said, the Wikipedia page and be like, well, there's a new source from this crazy guy. So I don't know if you can trust this anymore with this crazy information because anybody can post there. Anybody. Yeah. So if there's a ton of packages, and you have to go to NPM and check out each one, see if it's different. Is this going to make my GitHub Actions just take forever? Or does it run in like a reasonable amount of time? Because one thing about the amount ... You're right. Like a Hello World could even have a thousand dependencies. Do you guys do any like special sauce around caching stuff or?

Feross Aboukhadijeh:
Yeah. We've designed Socket so that we have this kind of data processing pipeline that we've built like in our cloud. And we have a full copy of NPM and we also have this process that follows NPM. So it's sort of like, anytime a new package is published, we download it. And so we're just chugging through, like every ... Basically, we have a system that can analyze any NPM package, and then produce like this report, which contains a list of issues that we've found in the package. And once we've processed a package version, we'd never need to process it again because it doesn't change, it's an immutable package, because NPM doesn't let you modify a package after it's published.

Feross Aboukhadijeh:
So what we basically just do is like, we will process a package and we have this report, and then really, the GitHub action or the app will just need to basically hit our API and ask for the report back for the package. So it's actually a pretty fast process, especially if we've already processed it. Yeah. All it does is you basically just ... It sends your package.json to our server. So we don't even need your code, we don't want to access your code. It just works by sending your package.json, which contains the list of your dependencies, to the server. And then we use that to look up all the issues that may exist with those package versions.
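Because a published package version never changes, an analysis report for it can be computed once and reused forever. The sketch below shows that idea with an in-memory cache keyed by "name@version", plus a helper that pulls the dependency list out of a package.json object. The names, the Report shape, and the placeholder analyzer are hypothetical, and the step of resolving version ranges to exact versions is left out.

```typescript
interface PackageJson {
  dependencies?: { [name: string]: string };
  devDependencies?: { [name: string]: string };
}

// The only part of a project this flow needs: dependency names and specs.
function listDependencies(pkg: PackageJson): string[] {
  const merged: { [name: string]: string } = {};
  const groups = [pkg.dependencies || {}, pkg.devDependencies || {}];
  for (const group of groups) {
    for (const name of Object.keys(group)) merged[name] = group[name];
  }
  return Object.keys(merged).sort().map((name) => name + "@" + merged[name]);
}

interface Report {
  key: string;
  issues: string[];
}

// Caches one report per exact "name@version". Published versions are
// immutable, so an entry never needs to be invalidated.
class ReportCache {
  private reports: { [key: string]: Report } = {};
  public analysesRun = 0;

  constructor(private analyze: (key: string) => Report) {}

  get(key: string): Report {
    if (!Object.prototype.hasOwnProperty.call(this.reports, key)) {
      this.reports[key] = this.analyze(key);
      this.analysesRun++;
    }
    return this.reports[key];
  }
}

// Placeholder analyzer standing in for the real (expensive) pipeline.
const cache = new ReportCache((key) => ({ key: key, issues: [] }));
cache.get("example-lib@1.2.3");
cache.get("example-lib@1.2.3"); // served from the cache
console.log(cache.analysesRun); // 1
console.log(listDependencies({ dependencies: { "example-lib": "^1.2.0" } }));
```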

Paul:
So you guys kind of basically have your own indexed private version of NPM.

Feross Aboukhadijeh:
Yeah, pretty much.

Paul:
That's pretty cool.

Feross Aboukhadijeh:
Yeah. It's huge. So it's 15 terabytes of disk space to store every package and every version of every package that's ever been published. It's not like a thing you could just do on your laptop. But it's also not too crazy. 15 terabytes is not that big-

Paul:
Yeah. I can go buy that at Walmart and always have NPM when the world ends. It would work.

Feross Aboukhadijeh:
Yeah. Well, now you know. Yeah. If you want to have NPM, with you at all times, you just need 15 terabytes. And the other cool thing about following NPM, like in real time and downloading every package as it's published, is that if a package gets unpublished, which only happens in the case of malware being published, because, like we mentioned, NPM doesn't really want packages being changed or disappearing after they've been published. So really, the only situation in which you'll see a package be removed from NPM is if it's outright malware. And so what we do is actually, we can see when NPM has deleted a package, because it disappears from the feed.

Feross Aboukhadijeh:
They have this thing called a CouchDB replication feed, which basically tells you all the changes that are being made in real time. So when we see that they've deleted a package, we can go and look at that package because we have the code, we downloaded it before they deleted it. And so then we can see what's being deleted, which is actually really fascinating. Because it's all malware, basically. So you can just go through and see that was a crazy one. Or that one was like talking to this weird server, this weird IP address. Or this one was like stealing all of your environment variables, your tokens and sending it off to this server. You can just see all of it. It's just there. It's really cool. In fact,-

Paul:
Have you guys ... Sorry, go ahead.

Feross Aboukhadijeh:
I was going to say, we actually have a page on the website where you can actually browse around the malware that we've collected. If you go to the footer, there's a little link called removed packages. And if you click it, you get to go to basically a listing of all the NPM removed packages that we've collected. There's about 700 every month that they remove for malware. So you can click through to see like what malware was published to NPM today, and you can just see ... Or what was removed today, rather, not what was published today. It's what was removed. You can see what kind of attacks people are doing. It's really eye-opening. If you have the inclination to check it out, I'd recommend taking a look at it. It's pretty cool.
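The idea of spotting removals in a changes feed can be sketched as a filter over feed rows. CouchDB's documented changes format marks a deleted document with a deleted flag on the row. Whether NPM's feed has exactly this shape is an assumption here, and the function name, the archive lookup, and the package names are hypothetical.

```typescript
// One row of a CouchDB-style changes feed. "id" is the document id,
// assumed here to be the package name.
interface ChangeRow {
  seq: number | string;
  id: string;
  deleted?: boolean;
}

// Packages that were deleted upstream but that we archived beforehand,
// so their contents are still available for inspection.
function removedButArchived(rows: ChangeRow[], archived: { [name: string]: boolean }): string[] {
  return rows
    .filter((row) => row.deleted === true && archived[row.id] === true)
    .map((row) => row.id);
}

// Made-up feed rows and archive contents.
const feedRows: ChangeRow[] = [
  { seq: 101, id: "example-lib" },
  { seq: 102, id: "suspicious-pkg", deleted: true },
  { seq: 103, id: "never-downloaded", deleted: true },
];
const archive = { "example-lib": true, "suspicious-pkg": true };

console.log(removedButArchived(feedRows, archive)); // [ 'suspicious-pkg' ]
```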

Paul:
That's like exactly what I was going to ask. Do you in any way like make that data available or like publish your own TAXII feed or something to let the world know of the incoming attacks?

Feross Aboukhadijeh:
Yeah. This feed is already stuff that's been kind of removed by NPM. So it's more interesting for kind of just research purposes to like understand what are the attacks people are doing, and to make sure that like our analysis would have caught whatever those things were doing, whatever those packages were doing. But the actual, the kind of benefit of Socket is this proactive protection that we give, which currently involves a human looking at what our bot says. Our bot leaves a comment on a pull request, and then a human has to look at that and say, okay, we're not going to merge this.

Feross Aboukhadijeh:
So we'd love to get to, and this is what we're working on, is like to get to a place where we proactively have very high certainty that something is bad, and then we just like fail the CI builds. Like, we just fail the pull request and put a red X there and say like, this is bad. Do not update. And that, I think, is going to involve probably some more advances beyond like what we've currently done. What we can do is we can tell you an install script is added, but we'd still need a human to look at that to say, is this actually bad or not? So it's like-

Paul:
It's mapping what it's accessing and changing.

Feross Aboukhadijeh:
Exactly. But it still requires a little bit of that human judgment at the end. And so we're trying to basically minimize the amount of things that we bother people with. Like, we don't want to alert people if it's like just pointless noise. No one wants that. So we're trying to keep it really low, like really, really high signal. And hoping that therefore, if it's high enough signal, when one of these alerts comes in, a human won't mind taking a couple minutes to investigate it, because it really could save your bacon one day.
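
Editor's note: The "an install script is added" check mentioned above can be sketched roughly as follows. This is a minimal illustration, not Socket's implementation: it assumes the only input is the `package.json` text of two versions, and the function name `added_install_scripts` and the sample manifests are hypothetical. `preinstall`, `install`, and `postinstall` are the npm lifecycle scripts that run at install time.

```python
import json

# npm lifecycle hooks that run automatically during `npm install`.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def added_install_scripts(old_manifest: str, new_manifest: str) -> dict:
    """Return install-time scripts present in the new manifest but not the old one."""
    old_scripts = json.loads(old_manifest).get("scripts", {})
    new_scripts = json.loads(new_manifest).get("scripts", {})
    return {
        hook: new_scripts[hook]
        for hook in INSTALL_HOOKS
        if hook in new_scripts and hook not in old_scripts
    }


# Hypothetical manifests for two versions of the same package.
old = '{"name": "example-pkg", "version": "1.0.0"}'
new = (
    '{"name": "example-pkg", "version": "1.0.1", '
    '"scripts": {"postinstall": "node setup.js", "test": "node test.js"}}'
)

flagged = added_install_scripts(old, new)
print(flagged)
```

A non-empty result says only that something new will run at install time. Whether it is actually malicious is still the human judgment call described above.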

Paul:
I mean, you're inherently solving a human problem, right? So it's like the semantics about what does this action mean to me as a team, as a developer? That's a difficult fuzzy line to sort of, like, draw your AI at. We know that's bad, that's always going to be bad. Yeah, big generalization to make. I'm sure you can certainly make it with the standards and the protocols and the way like things move and what they're accessing. Somebody is trying to dump all your env and send it to a weird server across the world somewhere is probably bad.

Feross Aboukhadijeh:
Right. In fact, that's the thing too. Like, we can detect when environment variables are being accessed for the first time. We can detect when an install script is being used for the first time, and we can detect when the network is being used for the first time. So if you see those three at once, and they were never used before, we could probably just block that until a human can look at it and just be like, no one's going to install this for a few hours. We're just going to just block this and then we'll get a human to look at that. And then maybe we as Socket can decide that, okay, we just confirmed that this is malware, and then we can block it for everybody who has our GitHub app installed. Until we can get it ...

Feross Aboukhadijeh:
And that's the other thing too, once we find something ... We've already found malware a few times, by the way, whenever we've done that, we've just reported it directly to NPM. Because we want to not only protect people who are using Socket, we want to just get it removed from NPM so no one else will install it either. So we also just report that as soon as we find it.
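
Editor's note: The "those three at once" rule described in this answer could look something like the sketch below. It assumes some earlier analysis has already produced a set of capability labels for each version. The labels (`env`, `install_script`, `network`), the function name `review_action`, and the three outcomes are all hypothetical names chosen for illustration, not Socket's actual API.

```python
# Capabilities that, appearing together for the first time, trigger a block.
RISKY = frozenset({"env", "install_script", "network"})


def review_action(previous: set, current: set) -> str:
    """Decide what to do with an update, given capability sets for both versions."""
    newly_used = current - previous
    if RISKY <= newly_used:
        # All three appeared at once and none was used before: hold for review.
        return "block"
    if newly_used & RISKY:
        # Something risky is new, but not the full combination: ask a human.
        return "comment"
    return "allow"


print(review_action(set(), {"env", "install_script", "network"}))
print(review_action({"network"}, {"network", "env"}))
print(review_action({"env", "network"}, {"env", "network"}))
```

Only the full combination blocks automatically, which keeps the noisy cases in the "leave a comment" bucket and matches the goal of bothering people as little as possible.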

Paul:
Thank you, guys. That's great. I was just going to ask if you see this attack space broadening in the coming years. That was probably my last question.

Feross Aboukhadijeh:
I hope not. I mean, I'm sad that this is happening. And I hope I didn't scare people too much, but I hope not. I mean, NPM introduced two-factor authentication as a requirement now for all maintainers, I think, as of a couple days ago. So I hope that that eliminates some of these hijacked packages that have happened because maintainers were reusing passwords. But we still have the case where a maintainer's computer gets compromised, and their 2FA, maybe it's on the same computer. We still have the case where the maintainer goes rogue, we still have like more ...

Feross Aboukhadijeh:
There's also attacks that happen when someone asks to become a maintainer, because a lot of times, that's the way open-source works. Like, people will take over projects from someone who's burned out or someone who is no longer interested in working on it anymore. So there's all these handovers or these new maintainers being added. So that's still a risk too even with 2FA. I don't know. I think there's definitely good things happening. But I also think that this is such a juicy target. There's just so much impact you can have if you take over one of these packages that I think it's probably still going to be a problem in the future.

Feross Aboukhadijeh:
And there's also other ecosystems besides JavaScript like other languages where they have even fewer resources to monitor the registry for malware. I mean, NPM is owned by GitHub, which is owned by Microsoft, which is a very profitable company. And they have a lot of resources to put into securing NPM. And we still see these problems happening on NPM. I mean, I think other languages where their registries are actually nonprofit, those have even fewer funds and resources to make sure that they're scanning proactively for malware and stuff like that. So I just think this problem is probably not going anywhere anytime soon, unfortunately.

Kate:
Why do you think this is becoming a problem now?

Feross Aboukhadijeh:
The way that we write software has changed. So it used to be the case that projects didn't have thousands of dependencies. It was very normal, I think, to have maybe tens of dependencies, or maybe just a few, there's like five dependencies. Today, we have at least a thousand in most JavaScript apps. And so that's one big change. It just means that we're trusting more maintainers, we're trusting more people, and there's just more to audit, and there's more changes happening kind of constantly. Like maybe zooming out a level like why did that happen? Like, why are we writing software in this weird way now that involves tons of dependencies? That's a good question.

Feross Aboukhadijeh:
I think some people like to criticize JavaScript developers and say, well, JavaScript developers have just forgotten how to code and they need to install a dependency for everything, because they don't know how to write a 10-line function anymore, they have to install that from NPM. And I actually don't buy that. I don't think that that's fair to say about JavaScript developers. I think the real reason that we install 10-line packages is that NPM actually made it really easy to install and deal with dependencies in a way that no package manager had done before.

Feross Aboukhadijeh:
So if you look at something like the Python package manager, in Python, you can only have one version of a particular package. So if you install foo, version one, then every other dependency in your project must also depend on foo version one. If it wants to use foo, it needs to be version one. Basically, every dependency in the project has to agree, we're all going to use foo version one. Which means if foo version two comes out, and I want to update my usage to foo version two, but all the other packages out there are still using foo version one, then we've now created a situation where the user cannot install my package and the other packages, because they're using different versions of foo. That's called dependency hell. It's basically a really bad place to be.

Feross Aboukhadijeh:
It's like the package manager just throws up its hands and says, we do not know how to install these dependencies for you, because there's an incompatibility, and it just force ... It just breaks for the user. And so because of that, these Python maintainers were very hesitant to add new dependencies because they don't want to create this situation for their users. So if they wanted to depend on a 10-line module or 10-line package, they would rather just copy-paste that into their package than to introduce a dependency.
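
Editor's note: The "everyone has to agree on one version of foo" situation can be shown with a toy resolver. This is a deliberately simplified sketch: every requirement is an exact version pin, and the names `resolve_flat`, `DependencyConflict`, `my-package`, and `other-package` are made up for illustration. Real package managers resolve version ranges, but the failure mode has the same shape.

```python
class DependencyConflict(Exception):
    """Raised when two packages need different versions of the same dependency."""


def resolve_flat(requirements: dict) -> dict:
    """Pick exactly one version per dependency name, or fail if packages disagree."""
    chosen = {}       # dependency name -> version selected for the whole project
    required_by = {}  # dependency name -> package that first asked for it
    for package, deps in requirements.items():
        for name, version in deps.items():
            if name in chosen and chosen[name] != version:
                raise DependencyConflict(
                    f"{package} needs {name} {version}, but "
                    f"{required_by[name]} needs {name} {chosen[name]}"
                )
            chosen[name] = version
            required_by[name] = package
    return chosen


# Everyone agrees on foo version one: the install succeeds.
print(resolve_flat({"my-package": {"foo": "1"}, "other-package": {"foo": "1"}}))

# my-package moves to foo version two while other-package stays on one.
try:
    resolve_flat({"my-package": {"foo": "2"}, "other-package": {"foo": "1"}})
except DependencyConflict as error:
    print("cannot install:", error)
```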

Feross Aboukhadijeh:
But with NPM, NPM solves dependency hell for everybody by just saying, if two packages need different versions of foo, one wants version one, one wants version two, fine. Just install both. Give the first package version one and give the other package version two, and like just have them both exist together. And because of that, there was very little like downside to having more dependencies. There's just no cost to you to just add a dependency, at least it felt that way to people. So that's why people started thinking it was more acceptable to have a 50-line package. So I think that's kind of why it started, and then obviously, there's a lot of benefits to that.
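
Editor's note: The "just install both" approach amounts to giving each package its own private copy of its dependencies. The sketch below only prints an illustrative directory layout. The function name `nested_layout` and the package names are hypothetical, and the `@version` suffix is there for readability rather than being how directories are actually named on disk. Real npm also deduplicates compatible versions, which this sketch ignores.

```python
def nested_layout(requirements: dict) -> list:
    """List one private dependency copy per package, so versions never collide."""
    paths = []
    for package, deps in sorted(requirements.items()):
        for name, version in sorted(deps.items()):
            # Each package gets its own nested copy at the version it asked for.
            paths.append(f"node_modules/{package}/node_modules/{name}@{version}")
    return paths


layout = nested_layout({
    "my-package": {"foo": "2"},
    "other-package": {"foo": "1"},
})
for path in layout:
    print(path)
```

Because nothing has to be agreed on across packages, there is no conflict to report. The cost moves elsewhere: more copies on disk and more code to trust.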

Feross Aboukhadijeh:
I mean, people like to make jokes about left-pad. I don't know if you guys remember left-pad. Left-pad was this package that just padded a string with like spaces to the left. So like, if you wanted to make sure that the string was always 10 characters wide, but you give it like a three character string, it would just add seven spaces to the left, so that it would be 10 wide. So that's all it did. And people were mocking this package because it's like, why do you need this? Just write the function. It's just so simple, just write it. This all became an issue because the maintainer of that package decided to delete it one day, and the whole internet broke. This is before NPM actually prevented people from removing and unpublishing their code.

Feross Aboukhadijeh:
So this person just decided like to unpublish all their code. And so by unpublishing this, literally everything broke. No one could build anything because this package was just missing. And so like every single project pretty much broke, all the CI systems broke that day. No one got any work done because of this. And so everyone was like, well, why did we all depend on left-pad? Have we forgotten how to program? And there was this big soul-searching that happened in the community. And then people were saying that, well, no one should have depended on this. But I actually don't think that's fair. Because if you actually look at ...

Feross Aboukhadijeh:
A lot of people were posting comments saying, here, I implemented left-pad for you. Here you go. And they were writing like one line of code and showing how easy it is to implement. But almost every single person who wrote comments like that got it wrong. Like their implementation had some bug in some edge case that the real left-pad didn't have. So I would say, while there's some good points and some valid criticisms with this ... There's valid criticisms about how we do all these small modules.

Feross Aboukhadijeh:
I would say that a lot of these people also aren't appreciating how hard it is to get even 10 lines of code right. Some 10 lines of code are actually so complicated to do correctly that there's actually quite a lot of benefit to depending upon that as a dependency. Because you get bug fixes, you get the improvements, you get the performance improvements over time, you just get all these benefits. So yeah, I don't know. I just think it's not as clear cut as some people wanted to make it seem. Anyway, back to your question, Kate, I think that's probably the reason ... That's like the main reason why. It's basically, the way we code has changed. There's a lot more dependencies, and so there's a lot more risk.
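
Editor's note: For readers who have not seen it, the left-pad behavior described above can be sketched in a few lines. This is not the original package's code, and the edge cases handled here (non-string input, input already wider than the target, a multi-character fill) are illustrative assumptions rather than a list of the bugs people actually hit in their one-liners.

```python
def left_pad(value, width: int, fill: str = " ") -> str:
    """Pad `value` on the left with `fill` until it is at least `width` characters."""
    text = str(value)  # accept numbers and other non-string input
    if len(fill) != 1:
        raise ValueError("fill must be exactly one character")
    missing = width - len(text)
    if missing <= 0:
        return text  # already wide enough: never truncate
    return fill * missing + text


print(repr(left_pad("abc", 10)))
print(repr(left_pad("abcdef", 3)))
print(repr(left_pad(5, 3, "0")))
```

Even at this size there are several decisions to get right, which is the point being made about small modules.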

Kate:
Super interesting. Yeah. We can almost have a whole nother episode on just like the how we got here. Well, Feross, thank you. It's been great to have you on. I saw socket.dev is hiring. What are the plans for that and growing the team, maybe talk a little about that, and what you'd like to point our listeners to, and then we'll close out.

Feross Aboukhadijeh:
Yeah. We're hiring. We're looking for frontend developers, backend developers, security engineers, designers. We basically need to do a whole bunch of hiring. The team right now is five. So we've been able to build quite an awesome full-featured product with just our team of five, all engineers. The whole team is actually open-source maintainers right now. So we all really love this problem and understand it and really want to help the community. So we're super driven. We have funding. So that's another good thing. We haven't announced the details of it yet, but that will be coming in a couple weeks. So that's also really exciting.

Feross Aboukhadijeh:
Our future plans are basically to make Socket the best supply chain security tool and make it just super easy for developers to use. We just have a whole bunch of different integrations we need to build out, CLI tools. We only do GitHub right now. We need to support GitLab and Bitbucket and all the other ones. We want to improve our analyses to catch even more types of bugs. We want to do that real-time, kind of proactive malware blocking that I mentioned. So we need people who know about security, static analysis. There's a whole bunch of really cool problems to solve and we are planning to just build all that stuff out in the coming months and years.

Kate:
That's super exciting. Yeah. We'll be excited to have you back on to talk about Socket in the future. Thank you so much for joining us, Feross, and we'll see you around.

Feross Aboukhadijeh:
All right. Thanks, Kate. Thanks, Paul.

Kate:
Thanks for listening to PodRocket. You can find us @PodRocketpod on Twitter. And don't forget to subscribe, rate and review on Apple Podcasts. Thanks.