Avi Press: We're talking about some of the most technologically sophisticated people in the world, open source developers. They're like building the foundations of technology in every other sector that everyone else is working on. The state of the art to learn about your user base is to send a survey. It's fundamentally crazy to me that that's what our solution is.
Eric Anderson: This is Contributor, a podcast telling the stories behind the best open source projects and the communities that make them. I'm Eric Anderson. Today we talk about Scarf. We have with us one of the creators, and the founder and CEO of Scarf, Avi Press. Avi, thanks for joining.
Avi Press: Great to be here. Thanks for having me.
Eric Anderson: For the less acquainted, Scarf is a tool that open source developers and maintainers can use to understand the adoption of their projects. Is that correct?
Avi Press: Yeah. Yeah. I think of it as a platform for the distribution of open source, the analytics for open source and eventually the commercialization, but we're trying to build all the tools that bring the modern distribution channels for open source up to date for what is needed for all the business that people are trying to do around this software. And we're trying to lead the charge in that vein.
Eric Anderson: And some elements of Scarf are open source as well and others less so. Is that right?
Avi Press: Right. Earlier iterations of what Scarf's products could have been were open source. We've since open sourced some SDKs and pieces of what... We're trying to open source everything that we can.
Eric Anderson: Yeah. Yeah.
Avi Press: But one of the main pieces of Scarf is not yet open source, but will be soon, once a lot of the dust has settled around the many iterations that come from talking to customers as often as we can and evolving and shaping things over time. But ultimately Scarf needs to just be open source front to back. So as soon as we can. Yeah.
Eric Anderson: Well, and you're kind of at the nexus here of a bit of a business revolution where folks are relying on open source as a way to attract users to a project or product or an effort, and understanding that adoption's pretty critical these days.
Avi Press: And very difficult. The ways that we have been delivering open source to each other for the entire history of open source have just not really been very amenable to this. And there have been a lot of strengths to that approach. The fact that anyone can just spin up their own hosting registry for any kind of packages they want, and anyone can participate, and the community, is what makes it so great. But getting a handle on actually tracking this kind of stuff has been very difficult, both technically as well as socially: getting end users to be more comfortable with that kind of thing has also been a challenge in front of us. And so we're really both solving a technical problem as well as a social problem, I would say.
Eric Anderson: Yeah. And maybe that's what's so interesting about Scarf is that I think there's plenty of people who are like, oh, like we should offer analytics for open source. They understand kind of the business or technical need and maybe even have some technical solutions. But I think as you point out, you've gone the third step and you understand the culture around open source and what would get people to be excited about an analytics solution beyond that it works.
Avi Press: Right. I think doing analytics as a whole is easy. In every other domain of software, this is a solved problem, but it's not solved here. There's a few different reasons for that. I mean, one is just the fact that it's kind of built into the foundation of open source that you didn't need anyone's permission to go use a piece of software.
And so, part of that is you can do it totally privately and without anyone seeing what is going on, but that's one of the things that over time is more and more of a lie that we tell ourselves than the actual reality of the situation. Because ultimately you have to talk to a server on the internet to go get this stuff, and that server sees an IP address. And that's just the way the internet works. And while sure, there's things that you can do to mitigate those things, the reality is these platforms are collecting data like this.
Avi Press: And so my way of thinking about this is that the question is not, should this kind of data about open source usage be collected. The question is who should have it. And I would argue that maintainers are perhaps the most crucial group of people that need to have access to this as soon as possible. So that's really what we're trying to do, and to navigate the best ways to actually go about that in ways that disrupt people's expectations the least, and get people on board as effectively as possible. That has posed all sorts of different challenges that we are working through one at a time.
Eric Anderson: So Avi, tell us the Scarf story. How did you end up here? What led you to want to work on this and to some of your conclusions?
Avi Press: Yeah, so I guess the precursor to it, before I even had an inkling of Scarf: I was a software engineer at Pandora, building a lot of tools for artists to better understand how their music was performing with their audience. I was thinking a lot about this data problem, but what happened to me a few times is I hit a bug in some open source library and opened up a ticket and ultimately nothing would happen. And I'd spend a couple weeks on a workaround when, if I was just partnering with the maintainers a little bit more closely, it could have been solved in like an hour. This is not an uncommon occurrence, but that's the way that it was.
But ultimately I didn't think too much more about that problem beyond that.
Avi Press: But fast forward a couple years, and I am an engineer at a very, very early stage startup, like a five person team. And I'm building a lot of dev tools for myself and my team starts relying on them. I open source them and some of those tools take off to some degree, to the point where people are coming to me on GitHub saying, hey, there's this bug or that bug and it's affecting us, please fix it. And I look, and I see they're at Adobe or some other very big company, asking me to maintain tools for them for free. And so that really got me thinking maybe I have a commercial opportunity here in some way, if there's companies relying on my software. Pretty straightforward. But I was distributing my tools on package managers and container registries as well. And I just had no idea if that was actually a good decision and I had no way to know until I tried it.
Avi Press: And even when I tried it, it was really, really hard to like set up an online store. How do I sell a license or how do I broker a support contract? And I didn't even have the data to make an informed decision about whether I should do that. And once I did, I didn't have any of the tools that I needed to actually go from "I am a person maintaining commercially used developer tools" to "I am a software vendor that works with customers at big businesses." That gap is really big in practice. And yeah, that's what got me thinking about this problem, and thinking really deeply about how distribution plays a very key role in this whole universe, how distribution had been a not often talked about piece of the story, how these package managers could be helping so much downstream commerce around the software, and how difficult it was to do things that were technically not very challenging.
Avi Press: And so that was really what got me thinking about Scarf.
And then it was a thing that kind of turned into a side project and then it turned into, okay, I'm going to actually work part-time so I can work on this more. And then slowly more and more of my time shifted towards Scarf. And what ended up really taking off was an early JavaScript library that we had written for a few different React maintainers that we were talking to, to try to get at this data, this company data question, and we put it out there and the downloads started coming in and it was clear that we were onto something. And so that's what really got us going on Scarf in the first place.
Eric Anderson: So maybe just so that I can understand how this works a little bit. So open source code lives somewhere, maybe it's on GitHub most commonly, and anybody can go and download the open source and build it from source. But few people do. The maintainers will maintain binaries or more production ready packages of the code. That requires maintaining some kind of build pipeline. And they push those to these distribution outlets, the package managers or container repositories, and that's where most people consume open source. And that's where we could have some analytics with the help of Scarf as you build this out.
Avi Press: Yeah, we are focused on how the software gets from your laptop, as the maintainer, to the laptop of an engineer at a big company that starts to use it and put it in their project. And you're absolutely right. While in practice these things are just on GitHub and you can just go get the source, most modern day projects are wrangling thousands of dependencies, and you're not going to just go to GitHub and clone the repository and follow their build instructions. You're going to go get a binary, you're going to get a container. You're going to get a bundled artifact of some kind.
And that's sort of where this data's already being collected by these registries and not being made available beyond just, like, how many downloads have you ever had.
Eric Anderson: Yeah.
Avi Press: Or how many downloads did you have this week, which is great. And some languages even go further than that, which is great, but they're still missing a lot of very, very key things like how many unique users did we have this month?
Eric Anderson: Yeah. So as a person who consumes a lot of metrics on open source projects, I've looked at container downloads and yeah, I see a big number growing ever bigger, but you're right. I don't know. Is that somebody who's doing a bunch of CI/CD pulls, and they're just really aggressive with their CI/CD downloads, or are these new user adoptions? And as far as I can tell, there's not really a way to understand that from the aggregate number.
Avi Press: Nope. Not at all.
Eric Anderson: And this is what Scarf would give you.
Avi Press: That's absolutely right. That's one of the things that a lot of Scarf users have been most surprised by. They'll see these really big download numbers and they'll be very surprised by the number of distinct sources that that traffic comes from, which is often a disappointing realization, but other times is very useful to know, because you're no longer about to make an uninformed decision as a result of that. Yeah. We'll see a lot of CI pipelines, a lot of just really aggressive updating scripts or daemons or these sorts of things. And it really inflates those download numbers by a huge margin. And this is one of those things that's different depending on what ecosystem we're talking about. But by and large, a lot of those downloads are not as real, or not as meaningful, as you may have thought they were.
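The gap Avi describes between raw download counts and distinct sources can be illustrated with a minimal Python sketch. It is not Scarf's implementation: the `events` list, the source identifiers, and the `summarize` helper are all hypothetical, and the numbers are invented purely to show how one CI pipeline can dominate an aggregate count.

```python
from collections import Counter

# Hypothetical download events, one source identifier per download.
# The identifiers and counts are invented for illustration only.
events = (
    ["ci-runner-a"] * 950        # one CI pipeline re-pulling on every build
    + ["update-daemon-b"] * 40   # an aggressive auto-update script
    + ["dev-laptop-1", "dev-laptop-2", "dev-laptop-3"] * 3
    + ["dev-laptop-4"]
)


def summarize(source_ids):
    """Return total downloads, distinct sources, and the busiest source's share."""
    per_source = Counter(source_ids)
    total = sum(per_source.values())
    if total == 0:
        # No traffic at all: nothing to attribute.
        return {"total_downloads": 0, "distinct_sources": 0,
                "busiest_source": None, "busiest_share": 0.0}
    busiest_source, busiest_count = per_source.most_common(1)[0]
    return {
        "total_downloads": total,
        "distinct_sources": len(per_source),
        "busiest_source": busiest_source,
        "busiest_share": busiest_count / total,
    }


stats = summarize(events)
print(stats)
```

In this made-up data set the headline number is a thousand downloads, yet only six distinct sources produced them, and a single CI runner accounts for the overwhelming majority. A real system would also have to decide what counts as a "source", which this sketch deliberately leaves out.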
Eric Anderson: Another topic that I think you talk about a bit, Avi, that I think is interesting, is not only the analytics, but just the awareness, communication, even the help that can happen between a maintainer and consumers. And in preserving some privacy and some openness we've lost all this ability to coordinate. And I just, I don't know. It struck me that I hadn't really internalized that that was the case, that there are folks who would love some help from maintainers, and maintainers who would love to give some help and even exchange some value beyond high fives, that can't do it today.
Avi Press: Yeah. This is so pervasive. There's so... And I see it all the time, engineers online that'll be complaining like, ugh, this library is just not well maintained. Like, I need this fix and they can't even tell me when it's going to happen. A lot of them probably could if you were paying them, a lot of them would love to if you were paying them, some of them surely wouldn't, but it's just one of those things where there is a market here that is just really, really underdeveloped. And it's underdeveloped because of inertia, really. The movement in this direction has just been really slow. And it was kind of baked in from the beginning that open source would be very separated from the big enterprises of the world, but it didn't really pan out that way. It panned out almost the opposite, where now open source is used everywhere, including the big companies.
Avi Press: And that puts us in an increasingly precarious position. The open source supply chain is one of those things that people are talking about a lot right now, and kind of the security of the supply chain. And ultimately a lot of resources need to go into securing this, and the maintainers are kind of the key people that need to be involved.
And yeah, so I think overall as our society's reliance on this stuff increases, the amount of opportunity here for maintainers to actually build sustainable businesses or financially sustainable projects increases as well. We just have to actually be very intentional about how we want to go about that. And I think to start, having some visibility into the usage of the software is step zero to doing that. Like, we can't reasonably make business decisions around something that we can't measure or understand or analyze.
Eric Anderson: Yeah. Maybe a comment on your big business thing. You're right that there was a time when folks thought it was really risky to consume open source software, and now they think it's risky to adopt anything that's not open source. Like there's this obsolescence problem that they're worried about. So Avi, we've kind of danced around how Scarf works, that it relies a bit on these distribution channels, but maybe you could tell us a little bit more. How exactly do you get the analytics and...
Avi Press: Yeah. So Scarf has a few different pieces of the platform that all fit together and collect data in different ways. The main way, that I would say is the most novel and the most useful, is called Scarf Gateway, and Scarf Gateway is a registry proxy and redirect layer. So it sits in front of a package registry, a container registry, really anything, any server on the internet that hosts anything. So the idea is that today, if you are a maintainer and you push, say, Docker containers to Docker Hub, you put your Docker pull command in your README and people go straight to Docker Hub to grab it. They don't go straight to you. And so if down the road Docker says, we're actually going to start rate limiting free anonymous end users, you're kind of just left with that. There's nothing you can do because that's where your software is.
Avi Press: So you can either keep using it or you can move it somewhere else. And this is indeed what happened.
With Scarf Gateway, instead of pointing your users straight to Docker Hub, you'd continue to push your containers there, but you would actually have your users download the containers via Scarf. And then all Scarf does is redirect the download traffic over to Docker Hub or wherever you've configured it, but they're going to you first. You can connect your own domain with Scarf. And so your DNS is kind of the very first step, meaning that you have effectively taken control of the distribution channel. It is yours. It is not someone else's. And so if down the road you're like, eh, Scarf's not really working for me, you can point that URL somewhere else. But the idea here is that by being in the middle of the download path to a given piece of software, we just pass the traffic through, but then we can record all of the metadata about the download.
Avi Press: So, we can know what version of the software was being downloaded. What was the runtime or what was the client that was trying to do this? Where in the world was it? What company did they work at? What cloud environment was this happening in? Have we seen this person before? All those sorts of things, and we can start to actually get some really granular data about the download. This is combined with some of the other pieces of Scarf. For instance, we have documentation insights, which is essentially just pixel tracking for open source docs. One of the things that happens with open source is that the code itself might be hosted in a number of different places. Your README might be on GitHub, and Docker Hub, and NPM, and your website, and in emails, and people's editors, and you just can't really track that easily today.
Avi Press: But by embedding just a clear tracking pixel, suddenly if that gets loaded, it hits Scarf servers and then we can show the maintainers that information. And so we can actually start to have some visibility into the whole journey of a user with a piece of open source software.
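The redirect-layer idea described above can be sketched roughly as follows, in Python. This is an assumption-laden illustration rather than how Scarf Gateway is actually built: the upstream URL, the `DownloadEvent` record, the `handle_download` function, and the package path are all hypothetical, and a real gateway would speak each registry's own protocol. The sketch only shows the two steps from the conversation: record metadata about the request, then redirect the client to wherever the artifact is really hosted.

```python
from dataclasses import dataclass
from urllib.parse import urljoin

# Hypothetical upstream host where the maintainer actually publishes artifacts.
UPSTREAM = "https://registry.example.com/"


@dataclass
class DownloadEvent:
    """Metadata the gateway can observe before passing the request along."""
    path: str
    client_ip: str
    user_agent: str


def handle_download(path, client_ip, headers, log, upstream=UPSTREAM):
    """Record the request, then answer with a redirect to the upstream host."""
    log.append(DownloadEvent(
        path=path,
        client_ip=client_ip,
        user_agent=headers.get("User-Agent", "unknown"),
    ))
    # The artifact itself is never served here; the client is sent upstream.
    location = urljoin(upstream, path.lstrip("/"))
    return 302, {"Location": location}


log = []
status, response_headers = handle_download(
    "/packages/mytool/1.4.2.tar.gz",   # hypothetical package path
    "203.0.113.7",                     # documentation-range IP address
    {"User-Agent": "curl/8.0"},
    log,
)
print(status, response_headers["Location"], log[0])
```

Because the maintainer's own domain would point at the gateway, changing `upstream` is all it takes to move hosting elsewhere, which mirrors the point about the maintainer owning the distribution channel.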
They came in and they looked at the Docker Hub README, then they pulled down this version of the software, then they tried this version, and then they looked at this part of the docs or that part. And you can start to correlate all these things together and you start to get a pretty clear picture of what your users are doing with the software, which is something that you really just don't have today anywhere else. Kind of from discovery to education, to development, to deployment, like what does that look like?
Avi Press: The really, really crazy thing to me, that I still just have a hard time wrapping my head around, is that we're talking about some of the most technologically sophisticated people in the world, open source developers. We're like building the foundations of technology in every other sector that everyone else is working on. The state of the art to learn about your user base is to send a survey. It's fundamentally crazy to me that that's what our solution is. And these are the same people that are building all the other analytics in every other domain, but we just don't have it ourselves. And so we want to catch open source up to this century when it comes to just basic observability. It's one of those things that can help the maintainers do their jobs more effectively, and that ultimately leads to better software for everybody.
Eric Anderson: And then Avi, we talked a minute ago about the gap between maintainers and consumers of open source and how there's just so much opportunity and value to be kind of offered there. There are a few folks doing that. I think these efforts are kind of early as opposed to widespread, but Tidelift is one that comes to mind. I think Gitcoin, if that's the right name, is another. Perhaps you know of others. Clearly this is a place where people feel like there's some opportunity. Any thoughts on how Scarf brings more to the story given these alternatives?
Avi Press: Definitely. Definitely. Yeah.
I think those two are definitely some of the main ones right now. A few others to mention, there's also OpenTeams, Aviel, a few other companies that are working in this space, and there is a graveyard of companies that have tried and failed to do something in this space to help this problem. It has definitely been tried before. How we are different and unique in this space is with the focus on how the software is actually being distributed. A focus on analytics, which I think really comes first. I do not think that open source sustainability problems are going to be solved with, we just didn't have the right Rails application to connect the two sides. Like, that's not the problem here. The problem is one of incentives, one of social inertia and a lot of other things, but I think those are the main ones.
Avi Press: And so what we are trying to do is build an open source world where, when code is distributed from one place to the other, there's some observability to it. There's an understanding that software developers need to know how their software is being used in order to effectively maintain it. And that being something that we embrace rather than prevent at all costs. I actually find it fairly hypocritical sometimes when we talk about, oh, we need to empower maintainers to be sustainable and build better software, but only in these very specific ways that I have kind of condoned, without really letting them do things that are very acceptable in a lot of other ways. And let them find ways to do it safely.
Avi Press: When it comes to how we differ from some of the Tidelifts of the world, I think Tidelift is a really interesting model where they're trying to do this subscription approach, where you're kind of paying a lump amount and that is distributed to the maintainers that you rely on.
And we're actually kind of coming at it from the other angle, of empowering individual parties to connect by helping one maintainer identify his or her commercial users, and then go connect with them. That helps us be useful right out of the gate, rather than only when you have a critical mass on both sides, which I think is a fundamentally more powerful approach to this. Not in any way to discount what Tidelift is doing.
Avi Press: I think they're doing really great work that's very, very important, but yeah, we're just coming at this from a different angle that starts with the actual foundations of distribution and delivery. And as a result, we have some very unique data that is really hard to find, period, and we're making it really easy to get, really easy to get started, and frictionless to do so. And so over time, we will have more and more of the components that some of these other offerings have, like brokering support contracts and selling licenses and these kinds of things. You can do all of that once you have kind of established some tools with the distribution channels that let you do these kinds of things. So we're building those pieces one at a time that actually enable these kinds of business transactions.
Eric Anderson: Yeah. Yeah. And doing so would address kind of the marketplace problem: you can acquire one side of the marketplace, the maintainers, by giving them something that they don't have today, without having to kind of get the consumers involved.
Avi Press: Exactly.
Eric Anderson: And then once you've got a bunch of maintainers, you can be like, look, now we can help bring their consumers on board and broker a relationship here that benefits everyone.
Avi Press: Yeah. That approach does not come without drawbacks. Like, Scarf kind of has to find product market fit multiple times in order to really, truly be successful in the grand scheme of the vision. But it does mean that we can further our mission right away.
I think by even delivering the amount of insights we already have in the short history of the company, we've already provided a lot of really actionable leads that have driven business for our customers. And I think we're starting to really, really normalize this as a practice, that actually this is okay for maintainers to do. It can be done in a privacy conscious manner. It can be done responsibly and it's a win for both sides when it is done. And so over time, exactly as you said, as we do get a good mass of maintainers, we can start to really build the tools that connect them more effectively to the commercial end users on the other side and build out tooling for them as well.
Avi Press: But I'm really proud of where we've already gotten on pushing analytics forward and showing people the impact of their work. I just as much really love when I see an indie maintainer that says, oh my God, I had no idea that NASA was using my library, but they are. I never knew that this person was using it, or whatever that looks like. And I think sometimes it will really whet people's appetite for commercialization to show, look, your work actually has a lot of impact and you really could be supporting yourself with that value that you're delivering. And so I'm really excited to hopefully inspire and enable a whole new generation of open source entrepreneurship that may not have been possible otherwise.
Eric Anderson: Totally. Not only giving people who are currently offering open source a path towards supporting themselves, but I think you open the floodgates. There's a bunch of people standing on the sidelines that are like, man, I'd love to work on open source, but I just can't afford to.
Avi Press: Right.
Eric Anderson: And if you give them a path, you can just like flood the market with all kinds of new, awesome projects.
Avi Press: It's one of those things that we don't talk about very much, but the overall diversity of open source has needed improvement for its entire history. And what we just don't reckon with enough is that having the spare time to work for free is a very privileged position to be in. And we're all very lucky to be participating in and benefiting from open source. But if we made it more amenable to actually being paid for your work, we would definitely encourage people from much more diverse socioeconomic backgrounds to participate. That's something that would be very, very good for the space.
Eric Anderson: We've talked about maintainers kind of in generalities. You've mentioned indie devs a couple times, but I imagine there's kind of two archetypes. There's the indie devs, and then folks who from the outset were like, I want to build a company and I'm going to hire a bunch of people, maybe even raise some venture capital, but I'm going to first build an open source project, and are coming up from that direction. I feel like Scarf has found some headway in both camps. Are those the right kind of camps to describe? Is there more granularity than that? And do you see these as kind of two different markets for you?
Avi Press: Yeah. They do have different needs, I would say. And yeah, so we have either indie developers, or just kind of broader open source organizations more generally, that don't really necessarily have a commercial strategy, at least for now. And then we have the commercial open source startup, I would say, which is the other archetype that we deal with. Yeah. Those companies definitely have very different needs. I think that one of the really key things that we deliver to those companies is, when businesses show up in your Scarf data, who do you need to get in touch with there to go sell the enterprise offering or your cloud service or whatever it is. And we have that information as well.
And that's one of the things that we sell access to as a part of our paid plans, but it's really, really important that we be useful to all open source projects, not just those businesses.
Avi Press: And so for everyone else, all of our tools are completely free to use and will remain that way. And I think there's a lot more of a focus on just helping people understand the impact of their work, versus, like, who do you need to sell to at some business that just got started with your software? They are definitely different and they have different constraints in terms of what they can and cannot do, how governance works in the projects, how they can and cannot adopt new tools. And so we are very much navigating that right now, but we're making good traction in both. I would say especially so with the commercial open source startups, which I think is really turning into one of our primary verticals.
Eric Anderson: Interesting. So if I'm just using Scarf to understand my user base, but I'm not commercializing my project and I'm using your basic tools, everything's free, and I can imagine that being so in the future. And so where you're commercializing at Scarf, up until now, is with the folks who are commercializing their projects.
Avi Press: Yeah. Our goal is to help these people build businesses. And so the model that we're taking is really aligning with those businesses. And so we'll make money when they make money and not otherwise, which I think is a really important thing because we've seen a lot of misalignment of incentives in the open source space. A lot of times it is with the package registries and the services that actually stand up all this infrastructure, which is expensive to run. And so it's not surprising that these kinds of things have happened. Like NPM's acquisition, or the moves that Docker Hub has made. And these kinds of things really highlight that misalignment.
And so by really, really intentionally aligning our incentives with those of our customers, I think the result is a much better product that really has their values in mind.
Eric Anderson: Avi, as we wrap up here, is there anything we should be looking forward to from you in the future, or plans for the year that we can look forward to? And also share with us how people can kind of learn more about Scarf and connect.
Avi Press: Yeah. So I'll start with that. You can find Scarf on all the normal social channels. We're Scarf_oss on Twitter, we're scarf-sh on GitHub and our website is scarf.sh. So definitely follow along, because there's a lot of exciting stuff coming up. We are rolling out Scarf Gateway for various other languages. So we just launched Python support. JVM packages are coming very soon, and a few other tools for maintainers broadly, and for stakeholders of open source to see stats for the entire community, not just Scarf users, will be coming soon. Yeah. I mean, I think you can just expect that we're going to continue to push the envelope on what can and cannot be tracked within open source and what can and cannot be tracked safely and responsibly. And yeah, we'll be making a lot of headway on this with some really exciting tools.
Eric Anderson: That's awesome. Avi, thanks for all you're doing. I think it's exciting for the community at large. We've done 40, 50 shows and everybody I've talked to on here would benefit from the work you're up to.
Avi Press: Yeah. Thanks so much.
Eric Anderson: You can find today's show notes and past episodes at contributor.fyi. Until next time, I'm Eric Anderson and this has been Contributor.