Kate:
Welcome to PodRocket. I'm Kate, the producer of PodRocket. And with me is Derrick Stolee. Hi Derrick, how are you doing?

Derrick Stolee:
Hi. Doing great.

Kate:
Derrick is a Principal Software Engineer at GitHub. Thanks for joining us today.

Derrick Stolee:
Very excited to be here.

Kate:
And also with me is Brendan, our director of product engineering at LogRocket. Hi, Brendan, how's it going?

Brendan:
Good. It's great to be back on the pod.

Kate:
Thanks for joining us again. Derrick, just to get started, maybe just tell us a little about yourself and what you're working on and we can go from there.

Derrick Stolee:
Sure. As a software engineer at GitHub on the Git Fundamentals team, we really focus on the open source Git project and its related tools. And so we spend a lot of time contributing to the core Git project in the open source world. And that has a lot of effects on the clients that you have on your developer machines. And a lot of it has been focused on performance, specifically with big monorepos in mind. Our team has historically supported the Microsoft Windows and Microsoft Office monorepos.

Derrick Stolee:
And we've had a lot of success in improving their performance, and sometimes in surprising ways that actually affect us normal users of much smaller repositories. So it's been really interesting to see how those extremes of scale can inform how to make things better, even just for regular Git users.

Brendan:
Awesome. Well, we'll definitely dive into Git and monorepos and doing stuff at scale a little bit more. But maybe before we dive in, I noticed on your LinkedIn that you started in academia, you did a PhD in math, I think you even taught for a while. How did you end up transitioning into engineering and how did you end up working on Git?

Derrick Stolee:
Well, the way I think about it is that I actually went to college with the intention of going into software development, with the job that I have now being the end result and goal. But during undergrad, I fell in love with math and specifically graph theory. And so I decided I wanted to go to grad school and become an academic. And I started off in pure math, but then I decided I liked doing computer science too much, so I did a combined math and CS PhD.

Derrick Stolee:
And I really focused on computational methods for these pure math problems. And so the whole time I was programming along as well. And I really, really enjoyed being a grad student because I just got to be heads down on these problems, thinking about really interesting creative solutions to these really tough challenges. And then when I became a faculty member, that became a lot less of my focus because I was busy teaching, I was busy advising graduate students.

Derrick Stolee:
My math graduate students couldn't program the same way I could, so if I wanted to do that kind of computational research, I had to do it on my own. And for the first year or two, I could manage that on nights and weekends, because that's when no one else was needing my time. But then my wife and I had a child and suddenly I didn't have nights and weekends anymore and my job just became a lot less fun. And I said, "You know what? Maybe it's time for a change."

Derrick Stolee:
I told my wife and she said, "You know what? I'm interested in changing which university I'm at." And so she went and got a new job and that motivated me to say, "Okay, well, now you need to find a job at... I don't need to find a job at the same academic institution. I can find a new professional career." And that's when we discovered when we moved here to Raleigh, North Carolina, that Microsoft had an office here and I applied and got the job there.

Derrick Stolee:
And that was where what is now called Azure DevOps was located. And I initially started working on the Azure Repos backend, dealing with the custom Git implementation they have. But within a year and a half, I transitioned to do stuff on the Git client because I was really interested in contributing to open source. A lot of my academic background is about free information and sharing, so that was a very natural fit for me.

Derrick Stolee:
But also I just realized that we had gotten the Windows monorepo onto Azure Repos, so the backend scale was working, but there was a lot of client-side scale that needed to be done. So that was the most exciting thing for me to go jump into. And I've been doing Git client stuff ever since, even though essentially my team got reverse acquired by GitHub. After Microsoft acquired GitHub, they acquired us engineers who knew how to do Git things. And so I've been at GitHub for about a year and a half.

Brendan:
Gotcha. When you talk about the scale of like the Windows monorepo, what kind of scale are we talking? How big is that code?

Derrick Stolee:
It's big in lots of different ways. I think if you were to have a checkout of just the tip commit uncompressed, I think it's 100 gigabytes. Maybe some would say 300 gigabytes, something around that range. It's three million files. The history is adding over a million commits per year. So just every single aspect of growth is enormous. And so it's just really interesting to be able to keep up. Having a workable developer environment in that scenario is really tricky, and that needs a lot of stuff going on.

Brendan:
How does that information flow to you as the team responsible for making a working developer environment? How do you find out what's working or not working for the developers who are trying to wrangle that giant repo?

Derrick Stolee:
In the very early days when we were starting this transition, there was a lot of involvement between the Azure repos group, the Git team and the Windows engineering system to make sure that this all worked together and that everything was going to flow. But then after that initial jump, it was like, "Oh, the technology's in place, we don't need to have as much direct communication. We just need to improve the things that are necessary to improve." And one thing we did is we shipped a tool just within Microsoft infrastructure to collect telemetry data for the Git use by these developers in Microsoft.

Derrick Stolee:
And we were able to get really detailed performance information. At the macro scale, well, how fast is a Git fetch command, how fast is Git status. But even at a minor scale, we put in some tracing into Git that you can enable via an environment variable on your machine that shows like, well, the Git status actually was spending this amount of time in this part of the code and another amount of time in another part of the code. And so you could actually say, "Why is this command slow? Which part of Git is actually causing that?"
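
Editor's sketch: the tracing described here sounds like Git's Trace2 facility, though that is an assumption on our part. If so, one way to turn on the per-region performance log is the `GIT_TRACE2_PERF` environment variable, pointed at an absolute file path. The helper names below are hypothetical, and the snippet only builds the environment until you call `run_traced` yourself.

```python
import os
import subprocess
from typing import Dict, List, Optional


def trace_env(trace_file: str, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with Git's Trace2 perf log enabled."""
    env = dict(os.environ if base is None else base)
    # Assumption: trace_file is an absolute path that Git can append to.
    env["GIT_TRACE2_PERF"] = trace_file
    return env


def run_traced(git_args: List[str], trace_file: str) -> int:
    """Run a Git command with tracing on and return its exit code."""
    completed = subprocess.run(
        ["git", *git_args], env=trace_env(trace_file), check=False
    )
    return completed.returncode
```

Calling `run_traced(["status"], "/tmp/git-perf.log")` inside a repository should leave a log showing how long each region of the command took.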

Derrick Stolee:
And we're able to see that with thousands of engineers working on these big repos to say like, "What are the pain points? Let's go find out." One of the examples was they were saying Git push was really slow, and we thought, "Okay, well, that's clearly a network thing. Let's make sure that everyone's got up-to-date objects and that way they're sending as little data as possible to the server." We did that and it sort of helped, but not really. And so we looked deeper into it, into these different regions, and we found that actually it was not even network at all. It was searching on the client side, deciding which objects it needs to send in the first place.

Derrick Stolee:
And it was actually doing a full enumeration of every object, every file, and we had all those three million files, even if you just updated one of them. And so it's like, "Well, I'm going to send you this one object, but I examined three million beforehand." So we were able to do a custom algorithm, we called it sparse push, that essentially did a diff to say, "What are the actual new objects without having to explore the full space?" And that really sped it up and we got that upstream. So it's now the default: when anybody pushes, it uses this new algorithm.
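
Editor's sketch: a toy model of the difference between enumerating everything and diffing, not Git's actual implementation. The object ids and the `TREES` layout are invented for illustration. A tree maps entry names to object ids, and the diff-style walk only descends into a subtree when its id differs from the one at the same path in the old tree.

```python
from typing import Dict, Optional, Set

# Hypothetical object store: tree id -> {entry name: object id}.
# Any id that is not a key here is treated as a blob (file contents).
TREES: Dict[str, Dict[str, str]] = {
    "t_root_v1": {"src": "t_src_v1", "docs": "t_docs"},
    "t_src_v1": {"main.c": "b_main_v1", "util.c": "b_util"},
    "t_docs": {"README": "b_readme"},
    "t_root_v2": {"src": "t_src_v2", "docs": "t_docs"},
    "t_src_v2": {"main.c": "b_main_v2", "util.c": "b_util"},
}


def all_objects(trees: Dict[str, Dict[str, str]], root: str) -> Set[str]:
    """Full enumeration: visit every object reachable from root."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(trees.get(oid, {}).values())
    return seen


def new_objects(
    trees: Dict[str, Dict[str, str]], old_root: Optional[str], new_root: str
) -> Set[str]:
    """Diff-style walk: skip any entry whose id matches the old tree's entry.

    Simplification: an object that moved to a different path is reported
    as new even if the old tree already contained it somewhere else.
    """
    if old_root == new_root:
        return set()
    found = {new_root}
    old_entries = trees.get(old_root, {}) if old_root is not None else {}
    for name, oid in trees.get(new_root, {}).items():
        old_oid = old_entries.get(name)
        if oid == old_oid:
            continue  # unchanged subtree or file: never explored
        if oid in trees:
            found |= new_objects(trees, old_oid, oid)
        else:
            found.add(oid)
    return found
```

With one file changed, `new_objects(TREES, "t_root_v1", "t_root_v2")` touches only the changed path, while `all_objects(TREES, "t_root_v2")` visits every object in the snapshot.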

Brendan:
You're a principal engineer at GitHub, and I think especially since Will Larson published his book last year on staff engineering and senior roles on the IC track, there's been a lot of interest in the community around what those roles look like. So it'd be really interesting to hear some of the details of your role day-to-day. What are you doing? Who are you collaborating with? Are you more hands on keyboard? Are you working with other engineers to make them more effective? What does that principal engineer title mean in your experience of it?

Derrick Stolee:
That's a really good question. And yeah, I love that staff engineer book, partly because it really goes into a very clear delineation of the different roles at a very high level. And I would put myself in a really technical expert kind of role in the sense that I know Git really well inside and out and I've got a lot of experience of solving these at-scale problems. And I've got a reputation for being successful in that space. So I'm primarily focused within the Git area, so I'm not super helpful in terms of touching a bunch of teams. I touch the three Git teams at GitHub.

Derrick Stolee:
But I do have influence for instance, with customers a lot more, where I'm interacting with customers who are having Git pain points to be able to figure out, okay, what's the right technical explanation for what's going on and how would we use that to change our roadmap? And so I would say probably 50/50 in terms of 50% actually hands on keyboard, writing deep technical features and performance improvements, and then 50% interacting with customers, writing blog posts, giving presentations, showing up on podcasts. And simultaneously working with the rest of the team to make sure we're aligned in a good direction and mentoring junior engineers within the Git space.

Brendan:
So with that 50% of the time that you are hands on keyboard, solving problems, how do you decide which problems you want to tackle or which issues are the most valuable for you to put your time towards?

Derrick Stolee:
There's a lot of different ways that I've approached it in the last few years. When I first started, it was like, there are things on fire. We see that users are having trouble with these commands, we need to solve them. And so I was very targeted: we have the data that this is a problem, how do we fix it? Okay, let's put that in and fix it. As those fires became less clear, it transitioned to, okay, what's the systemic thing below that's causing a problem?

Derrick Stolee:
And so things like the commit-graph, which speeds up commit history, or the multi-pack-index, which allows you to do faster object lookups. Those are systemic things that affected everything a little bit. And so that was a transition for a while. And then we got again into some specifics, where there are specific customer asks that we then try to figure out and juggle in. We're at the point where we've got way too many things we want to do for the capacity we have to do them.

Derrick Stolee:
And even if we did, we don't want to overwhelm the open source project by just submitting 20 things and having them say, "Well, now you got to deal with us pushing all this code to you." So it's really important to keep both sides of capacity in mind. We recently did a thing in my team, I think it was in December, where we did like a mini summit and everybody created some design documents of things that they wanted to do. Some were, "Here's this performance tweak I want to do," or, "Here's this usability change," or, "I just want to go and study how people use this one feature to see if we should extend it."

Derrick Stolee:
And we got together and we picked our top four or five to set our roadmap. And I'm really excited about what's going to be coming out of that. I've got a project right now I can't talk about yet, but I'm literally like next week probably going to hit the button to send it to the mailing list, so it'd be public. But I tried to get approval from my product manager to say, "Can I make sure we commit to it by putting it on the podcast?"

Kate:
People have done that. Yeah, that's how you get it going.

Derrick Stolee:
Well, look forward to seeing that. If you're listening, go look at the mailing list and look for messages from me and maybe it's out there, but yeah. So it's something where I've been spending a lot of time prototyping and double checking to say, "Is this something that's actually worth doing?" And it's something where, "I think it's worth doing. I think the evidence is there. We just need to make sure that the rest of us are committed to delivering it." So that's going to be exciting.

Derrick Stolee:
Part of the reason it was really important to set our new roadmap is we had just finished a year-long effort to deliver this thing called the sparse index. And that was very clearly a direct ask from something in the Microsoft Office monorepo, where they're using sparse checkout to reduce the size of the working directory. So instead of the two million files they would normally have, the developers can have 100,000 or 200,000 that they actually care about. But we found out that the index stores a reference to every file at HEAD no matter what.

Derrick Stolee:
And then just sets a bit to say, "This isn't actually in the working directory." And so that index size was actually a bottleneck for a lot of Git commands. And so we found a way to change the index format to make it much smaller, so it's only roughly the number of files you actually have in your working directory. But it required changing it in a way that we had to go through and change a bunch of algorithms inside of Git to understand this new data structure. And so there's a lot of safety valves that we had to slowly peel away: "Okay, here's how Git status works with it. Here's how Git commit works with it. Here's," and so on and so forth.
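
Editor's sketch: a toy comparison of the two index shapes described here, not Git's real on-disk format. The file list and the `cone` set are invented, and this version only collapses top-level directories, which is simpler than what Git actually does. A full index keeps one entry per file at HEAD plus a flag for files missing from the working directory. The sparse version keeps file entries only inside the chosen directories and stands in a single directory entry for everything else.

```python
from typing import List, Set, Tuple


def in_cone(path: str, cone: Set[str]) -> bool:
    """Files at the repository root always count as inside the cone."""
    return "/" not in path or path.split("/", 1)[0] in cone


def full_index(files: List[str], cone: Set[str]) -> List[Tuple[str, bool]]:
    """One entry per file; the flag marks files absent from the working directory."""
    return [(path, not in_cone(path, cone)) for path in sorted(files)]


def sparse_index(files: List[str], cone: Set[str]) -> List[str]:
    """Keep in-cone files; collapse each outside directory to one 'dir/' entry."""
    entries: List[str] = []
    collapsed: Set[str] = set()
    for path in sorted(files):
        if in_cone(path, cone):
            entries.append(path)
            continue
        top = path.split("/", 1)[0]
        if top not in collapsed:
            collapsed.add(top)
            entries.append(top + "/")
    return entries


# Hypothetical repository contents and sparse-checkout selection.
FILES = ["README", "word/a.c", "word/b.c", "excel/a.c", "excel/x/b.c", "ppt/a.c"]
CONE = {"word"}
```

Here `full_index(FILES, CONE)` has six entries and `sparse_index(FILES, CONE)` has five. The gap grows with the number of files outside the cone, which is why every command that reads the index had to learn about the directory entries.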

Derrick Stolee:
The one we're currently working on upstream is Git stash, which is really tricky to get working with it, but it gets really good performance once you have that in there. And so as this work is wrapping up, it's like, "What are we going to do next?" That's been a really big thought process on our team.

Brendan:
Is there any sort of advice or sort of general counsel you would give to engineers who are earlier in their careers who are maybe interested in pursuing a more senior IC track, maybe being a principal engineer at some point?

Derrick Stolee:
The best career advice I ever got was from a principal engineer, and he said, "Make sure you're solving your boss's problems." That is, however you're doing things, you are not causing the problems, but also if there's a way you can volunteer to make sure that a problem that your manager is having goes away, then you become the person who solves problems. And that raises you up. At some point you're going to realize that you are noticing that there's work that could be done that could solve problems that the team is having.

Derrick Stolee:
And if you can start voicing those and getting them on the backlog and doing the work, then you can start driving your career in terms of what you are doing, as opposed to getting work handed to you that you then complete. It's important to be able to do that; a hallmark of a quality engineer is that you can do whatever is handed to you. But also to get to the staff and principal level, you need to be the one who's driving what work needs to be done.

Brendan:
Yeah, I really like how you framed that shift from being reactive to the team's work to being proactive and really being a voice for what the team is doing and what the vision of the product is, especially when you're on a smaller team or at a smaller company. Those senior engineers become a really big voice for what the actual direction of the work is going to be.

Derrick Stolee:
Right. I've heard people say that the myth of the 10X engineer is not that there's somebody who's 10 times more productive than everyone else, but there are engineers who help everyone else become productive enough that their impact is 10X. But the only way to do that is by multiplying others.

Brendan:
Yeah. Maybe a good opportunity for us to pivot a little bit and talk about Git. As I was doing some reading and looking through some of your blog posts and conference talks preparing for this conversation, I was struck by how little I actually know about Git and what it's doing under the hood. And I've used it every day of my career as a developer. I think that's probably true for a lot of us that we're only using a couple percent of what Git is capable of and does on a day-to-day basis.

Brendan:
So for those of us who aren't maybe as plugged in to Git and the problems you're solving as you are, what are some of the interesting unsolved problems or interesting things that are in progress within the Git community generally, that we should be excited about?

Derrick Stolee:
There's a lot of things going on, in that there's a lot of different dimensions of scale involved with Git, and there's lots of different approaches to solve every one of them. The biggest thing is that Git is a distributed version control system. We like to say that, well, once you clone, you have everything locally, so you can do whatever. You don't even need to talk to the remote ever again and you can do everything. Which works really, really well in the open source world, especially if you're talking about things where they contribute via mailing lists, where you're not even sending commits around, you're sending patches around, and you apply them.

Derrick Stolee:
But when you're talking about large repos and you're saying, "Well, do you really need absolutely every version of every file in the history?" That's a good question, and different things go wrong at that scale. So the first thing we start to do is to bend those rules about what Git needs to say, "It's got a complete repository." A tool called partial clone lets you say, "Well, give me everything in the history, but don't give me any of the blobs," which are the file contents, which is generally the majority of the data.

Derrick Stolee:
So you still have the commits, which is the history and all the messages there, but you also have the trees, which is the representation of the directories. And so you can always do something like a file history because the trees actually tell you whether or not the files have changed. And if you need to do something like a more involved diff, then it'll download the objects necessary, or when you actually do a checkout and you say, "Oh, I don't have all the files I need to place these on disk. Let me go download them dynamically from the server."
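
Editor's sketch: a toy model of the blobless clone described above, not how Git stores or transfers objects. In real Git the matching command is `git clone --filter=blob:none <url>`. The classes, ids and history below are invented. Commit and tree data are local, so file history needs no network, while file contents are fetched the first time something asks for them.

```python
from typing import Dict, List, Optional


class ToyRemote:
    """Stands in for the server; records which blobs were requested."""

    def __init__(self, blobs: Dict[str, bytes]) -> None:
        self.blobs = blobs
        self.requests: List[str] = []

    def fetch_blob(self, oid: str) -> bytes:
        self.requests.append(oid)
        return self.blobs[oid]


class ToyPartialRepo:
    """Has every commit's tree (path -> blob id) locally, but no blobs at first."""

    def __init__(self, history: List[Dict[str, str]], remote: ToyRemote) -> None:
        self.history = history  # oldest commit first
        self.remote = remote
        self.local_blobs: Dict[str, bytes] = {}

    def commits_touching(self, path: str) -> List[int]:
        """File history from tree data alone: compare blob ids, never contents."""
        changed: List[int] = []
        previous: Optional[str] = None
        for index, tree in enumerate(self.history):
            current = tree.get(path)
            if current != previous:
                changed.append(index)
            previous = current
        return changed

    def read_blob(self, oid: str) -> bytes:
        """Download a blob on first use, then serve it from the local cache."""
        if oid not in self.local_blobs:
            self.local_blobs[oid] = self.remote.fetch_blob(oid)
        return self.local_blobs[oid]

    def checkout(self, commit_index: int) -> Dict[str, bytes]:
        """Materialize one commit, fetching only the blobs it needs."""
        tree = self.history[commit_index]
        return {path: self.read_blob(oid) for path, oid in tree.items()}


REMOTE = ToyRemote({"b1": b"one", "b2": b"two", "b3": b"three", "b4": b"four"})
REPO = ToyPartialRepo(
    [
        {"a.txt": "b1", "b.txt": "b2"},
        {"a.txt": "b3", "b.txt": "b2"},
        {"a.txt": "b3", "b.txt": "b4"},
    ],
    REMOTE,
)
```

Asking for `REPO.commits_touching("a.txt")` makes no requests at all. `REPO.checkout(2)` fetches two blobs the first time and none the second, which mirrors the "slow the first time, then you've got them" trade-off.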

Derrick Stolee:
And that's a really great way to speed up that initial download, but then it gets you what you need. There's a little bit of a trade-off: if you want to run Git blame and you don't have all the versions of that file in your history, it's going to have to go download those. It's going to take a little while that first time. But then you've got them and you're ready to go. But then you can do things like, well, if I add in the sparse checkout feature I mentioned earlier, which reduces the working directory, so you actually don't care about every file at HEAD, only the ones you're really building.

Derrick Stolee:
And you pair that with partial clone, then suddenly your checkouts download even fewer files. So you can really, really make sure that you're focused there. The thing that I think is really interesting for like the next set of things we want to be doing in the next few years of Git is really helping people build into that by default, saying, "Hey, I have a big repository. How do I make sure that I do partial clone really simply? How do I initialize into a sparse checkout without needing to blow up my working directory?"

Derrick Stolee:
And then a lot of that is going to be coming into the idea that we need to look outside of source control for that. A lot of reasons why the Office monorepo works with sparse checkout is because they have a really rigid build system that has componentized the different pieces. And they have a really concrete idea of what it means to be a project and how they depend on each other. So a user who's building Word knows exactly which directories they need and they can tell Git, "I care about these."

Derrick Stolee:
So can we build more of those connections between build systems and Git, so that way it's easier for anyone out there to just say, "Well, since I'm using this tool, I want to enable sparse checkout to build this part of my build cone and make that really painless." We're not there yet, but I think that that's a big thing we want to do in the future.

Brendan:
Is that something that you see yourselves as people working on Git driving, or is that something that different projects or tools will have to opt into and start changing the way that they interact with Git? [crosstalk 00:20:05]

Derrick Stolee:
There's definitely a combination there because we need to do more to understand what the build systems need, and then possibly change Git to meet those needs. And we have a few ideas we're playing with, but we don't want to commit to anything without knowing what those build systems need. And then even if we build it, it doesn't make sense if none of the build systems ever want to use it. So once the feature is in Git, that opens up the possibilities for others. And so we're hoping that since sparse checkout's been built in, and it's really fast now, and it's really stable, that hopefully other people say, "There's benefit in connecting there. Let's try it."

Derrick Stolee:
And then when they hit pain points, we can have that conversation about, "Well, how can we make Git better for what you're trying to do?" Simultaneously, on my team, there's some people who are really focused on doing that investigation proactively and saying, "For our biggest customers, what build systems are they using?" And these are probably open source build systems. How could we help those build systems improve so they can use these advanced Git features? And yeah, that accrues back to our customers' benefit, but it's also going to accrue to anyone else using that build system, anyone else using Git, even a competitor version of a Git service.

Derrick Stolee:
So that's really the biggest thing we want to do is... And once we have maybe one or two build systems that have an example of how they integrate here, I think it would be much easier to make the case that other build systems should integrate in a similar way and we'll get this cascading effect. And so that's something I really hope happens.

Brendan:
You've touched on monorepos quite a few times, and obviously Microsoft has major monorepos. But it's definitely even for smaller teams, very in fashion right now as a code organization and version control pattern. We use a monorepo here at LogRocket and we certainly don't have three million files in our core repo. But I'm curious if you have an idea of why that pattern has become so much more prevalent in the last few years and what do you think is driving that shift?

Derrick Stolee:
Yeah. There's some interesting things that I would think about driving that shift. One is that a few years ago, Git didn't have the scale to be able to handle large repos. And so there was a lot of people saying, "I need to split into multiple repos because otherwise Git can't handle it." And so that was that kind of intention. And I think that we've moved past that for the most part, except in these very extreme examples, where it just needs a little bit of help, but still once the data's on your machine, Git can handle it.

Derrick Stolee:
The other thing is that people have learned that, well, it's really easy to spin up a repo, but then it's really hard to keep track of all of these different repos. You just have to then create a repo that tells you where to go to make different changes, and so it just becomes a lot of stress on the mind. The way I like to think about it is, Git is sort of a database. Don't think about it as SQL because that would cause a problem. But it's where your code lives, it's the base for your code.

Derrick Stolee:
And it only gets bigger, and so if it gets so big that you say, "Well, I can't make sense of this anymore." What would you do with the database? You'd shard it. So, okay, let me shard it into many things, but in order to actually make sense of that sharded database, you need another database that tells you which shard your stuff is in. And that then needs... It becomes complicated, but it's something where at least a computer is the one navigating that space.

Derrick Stolee:
With Git, it's every one of your engineers navigating that sharding, and they have to think about it every time, "Okay, let me go to this common place to find out how to go." And they get used to the certain common patterns. But as soon as they need to jump outside of their common patterns, it's like, "Okay, now where do I go?" It's really messy, there's no real standard way of doing that. There's an idea of using submodules for this, and you've got your superproject with all the different pointers, which is a very similar concept.

Derrick Stolee:
But people who have used submodules at scale can tell you how difficult it is to make cross-cutting changes with submodules. And so if the ergonomics of that worked out better, then maybe that would be a good approach. We found that it's much better to keep the monorepo together, especially when you have these big repos where you can use something like sparse checkout to say, "I only care about this code. I'm going to do all of my developer testing locally with this, and I'll trust the CI machine," which is a big beefy machine that's only spending time doing builds and tests, "to make sure that it integrates well with everything else. And that way I don't have to spend all the time doing that."

Derrick Stolee:
And I think that's becoming much more of a selling point for these monorepos that the CI is stitching things together in a much more simple way when you're all contributing to the same code base.

Brendan:
Yeah, I think that's been a big part of what having a monorepo has done for us at LogRocket, where being able to just use one set of build tooling, and know that it's not out of sync between different services within the same project, and not having projects try to clone each other and run tests, just really streamlines the developer workflow. I guess one other thing that I'm really interested in is the interaction with Git as an open source project in an open source community.

Brendan:
Obviously, it was originally developed as part of the Linux Kernel, I think and it's a technology that is both really widely depended on across the engineering world and something with really strong open source roots. How do you feel like that open source aspect of the work influences what you and your team are working on? And what is your relationship with the Git community at large like?

Derrick Stolee:
It's really been interesting to work in the open source world, especially at the level of Git, because it is... There's some ergonomics of it because it's using the mailing list. Git is not built on GitHub, it uses the mailing list. And we work really hard to make sure that we meet the open source community where they are and follow their standards. And there's really high standards for the Git community, partly because the email workflow allows people to give really, really fine-grained reviews on every commit that we try to send.

Derrick Stolee:
Your commit message needs to be impeccable. Every single change needs to be as small as possible so that someone can make sense of it in the email editor and then respond to it and say, "Yes, I understand this change makes sense." And all these changes you submitted together tell a good story. The thing that's also really different is that if we were just working with our team on a project, there's already the buy-in that this is something worth doing. And so you don't even need to make that case. You just say, "Well, I'm trying to complete this work. This is how I'm implementing it."

Derrick Stolee:
And you can get into the intricacies of why did you implement it this way or not? But there's never a question of why are you doing this? And with open source, you need to lead with that. Like, "I'm doing this because this is how it's going to affect people and improve their lives." And you need to be able to make that selling point, otherwise, it's not going to be something worth taking on that risk of the change. Every single opportunity that you have of writing code is also an opportunity for risk and causing bugs. So why take something that's working and break it, if it's not contributing value?

Brendan:
Is there an example that comes to mind of a time where it was really challenging to get that buy-in or to get aligned with other maintainers of Git on why something was worth doing or what it was adding to the platform?

Derrick Stolee:
We've had some things like that. I can imagine that right now the most important one is we have a feature called the FSMonitor, File System Monitor. A few years ago, some Microsoft people built in a hook that allows you to do things where it would talk to something like Facebook's Watchman to say, "What files have changed?" And that way Git could focus just on those files that have changed as opposed to inspecting the entire file system for all the files that might be updated. So it speeds up things like Git add and Git status quite a bit, but it still has that hook invocation and dependency on another third-party tool.

Derrick Stolee:
And while it was mostly working, it's a little bit complicated to get set up. And there's also some things where if you hit commit and you create a new commit file in your .git directory, Watchman would see that and report it as a possible change, because it doesn't know it's in a Git repository. So we built a version of that that is built into the Git code base that does that file system watching. And then it has its own inter-process communication layer for talking to the watcher. And it's a lot faster because it doesn't have that hook invocation.
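
Editor's sketch: a toy illustration of why a file system monitor helps a status check, not the FSMonitor code itself. The dictionaries stand in for the index and the working tree, and the list of reported paths stands in for whatever a watcher would send. Untracked files are ignored to keep it short. The monitored version inspects only reported paths and drops anything under the repository's own `.git` directory.

```python
from typing import Dict, Iterable, List


def is_git_internal(path: str) -> bool:
    """True for the repository's own metadata directory and anything inside it."""
    return path == ".git" or path.startswith(".git/")


def full_scan_status(index: Dict[str, str], worktree: Dict[str, str]) -> List[str]:
    """Without a monitor: compare every tracked path against the working tree."""
    return sorted(path for path in index if worktree.get(path) != index[path])


def monitored_status(
    index: Dict[str, str], worktree: Dict[str, str], reported: Iterable[str]
) -> List[str]:
    """With a monitor: only inspect tracked paths that the watcher reported."""
    candidates = {path for path in reported if not is_git_internal(path)}
    return sorted(
        path
        for path in candidates
        if path in index and worktree.get(path) != index[path]
    )


# Hypothetical state: three tracked files, one edited since the last status.
INDEX = {"a.c": "v1", "b.c": "v1", "c.c": "v1"}
WORKTREE = {"a.c": "v1", "b.c": "v2", "c.c": "v1"}
REPORTED = [".git/index.lock", ".git/COMMIT_EDITMSG", "b.c"]
```

Both functions give the same answer for this state, but `monitored_status` looks at one file instead of three. On a repository with millions of files that difference is the whole point.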

Derrick Stolee:
You can avoid all these changes to the .git directory and do a bunch of other fancy things that are Git-specific. But as we're presenting it, we say, "Well, file system watching stuff is really platform-specific." And a lot of our users are on Windows, so we focused on Windows first. There's also a macOS version. We didn't build the Linux version yet, and trying to tell a bunch of Linux nerds that this feature isn't going to work for them is not a very easy sell. And it's also something that's really complicated. That's a lot of code of this really custom stuff for file system watchers. Having a long-lived process is new for Git, as is having parallel requests like this.

Derrick Stolee:
So there's a lot of places where this feature could go wrong and a lot of places for people to say, "Well, this is how I would've done it." And so that's caused the feature to take a long time under review upstream. We've had it in Git for Windows for a year or so. So if you want to use it in Git for Windows, you can have it. And if you take the Git for Windows code and install it on Mac, you would also have it for Mac. And we've been sharing it with our Office monorepo users who are on both Windows and Mac.

Derrick Stolee:
So at least we've been able to satisfy the customer need there while we work through this upstream thing. But eventually we intend this to land in upstream Git, and then a Linux port will be possible. We just want to wait for some Linux stuff to actually stabilize, because the file system watching stuff has some issues with scale until we get this newer thing that... It's the difference between inotify and fanotify or something. It's something that's really new in the Linux system, which is why we didn't include it initially. But that does ruffle some feathers.

Kate:
I'm curious, you've mentioned, you have to be... There's a communication of like, "This is why I'm doing this." And I think with the community, just because you expect it to just work and work well. How are you communicating this stuff? How are you making these announcements?

Derrick Stolee:
Just in terms of sending a patch series to the mailing list, if you have multiple commits, you should also include a cover letter, which is usually referred to as patch zero. And that cover letter is a great way to say, "This is the story I'm trying to tell. This is why I've started this work." But also along the way in each of the commits, the commit messages should really include the meaning for each change, like, why are we doing this? Even if it's, here, I'm doing a really little refactor so I can do something soon in the next change. Okay, here's the real customer value of it.

Derrick Stolee:
And being able to tell that story so you can inspect it, not just now when you're reviewing it, but when you go back and do a history check later. It's like, "Why did somebody write this code this way?" You can say, "Oh, they were motivating this feature and here's the tests for it. And here's why they did it." Or "Hey, you know what? This wasn't even that well motivated, maybe this isn't the right way to do it. Let's change the motivation and do something different."
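
Editor's note:
As a rough illustration of how a cover letter sits at position zero in a patch series, here is a small Python sketch that builds the subject lines for a series. The function name and the titles are hypothetical, and it assumes a series of fewer than ten patches so no number padding is needed. It only mimics the "[PATCH n/m]" subject convention and does not call Git.

```python
def series_subjects(cover_title, commit_titles):
    """Build mailing-list style subject lines: cover letter first, then one per commit."""
    total = len(commit_titles)
    # The cover letter is "patch zero": it tells the story of the whole series.
    subjects = [f"[PATCH 0/{total}] {cover_title}"]
    for number, title in enumerate(commit_titles, start=1):
        subjects.append(f"[PATCH {number}/{total}] {title}")
    return subjects


# Hypothetical series: a small refactor followed by the change that delivers the value.
for subject in series_subjects(
    "example: speed up status in large repos",
    ["example: refactor path filtering", "example: skip unchanged directories"],
):
    print(subject)
```

Each numbered entry corresponds to one commit, whose own message should explain why that change exists.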

Kate:
Yeah, that's crazy. That's a lot to think about.

Derrick Stolee:
Yeah. The Git mailing list has a really high standard for commit message quality and it's really... I've seen people write essentially two pages of commit message for a one line change. And that's not very uncommon. I think it happens all the time for like, "This is why this is necessary." And a lot of times I benefit from that by saying, "Well, I'm looking at this code and I want to understand why it's written this way." I look at those really, really well written commit messages and it's great.

Derrick Stolee:
That works until you get so far back in the history where in the really early days of Git, they were just really rapidly writing a bunch of stuff. And I got back to a commit in 2006, that first put in this variable, I was like, "Why does this variable exist?" And there's like, "Rewrite the logic for this thing." That's the whole message, and so that's not helpful.

Kate:
I'm only human, send.

Brendan:
Yeah. I guess that alludes to something interesting, which is the level of responsibility you have to the however many thousands or tens of thousands or hundreds of thousands of developers who are using Git on a daily basis. How does the scale of the tool, and the sheer number of people who are depending on your work to be high quality and to not break any of their workflows, change the way you work or your team works?

Derrick Stolee:
It's something that we approach with respect and we take our time. We are never deadline-driven on my team because, one, it's just not a good thing to rush through anything. And also because the review on the mailing list is not completely under our control. That's just going to take its time and we can do what we can to present things and move quickly. But it's not under our control whether it gets merged or not. And so we can take our time to really double check things.

Derrick Stolee:
And a lot of times it's really simple, you write the careful tests and you know what's going to work. We also have a bunch of stuff like static analysis or memory leak tests and performance tests that we do as the Git community, especially around release time, to really double check that the things introduced in this release aren't going to cause any big regressions. But the other thing is just, we need to really be careful, especially since it's written in C, to just really make small changes at a time that you can really carefully look at and say, "Yes, this absolutely doesn't cause any things like a memory leak or segfault."

Derrick Stolee:
And they still slip in and we just try to do as much as we can to do that rigorous manual testing. Our team has also done things like bug bashes and internal dogfooding to try to get a little extra coverage of these things, to make sure that they're getting exercised sufficiently before they go out to users. And there's a lot of things about Git that are super customizable. There's so many different config settings and environment variables and things that people can use to customize how their Git system works. And everybody has a different setup.

Derrick Stolee:
And we just find those people who have this weird corner case of, "When I set these two things on and disable this other thing, suddenly I'm broken in the new version." And we work really hard to quickly solve the issues for those people and make this system more robust to those kinds of changes in the future.

Brendan:
Okay. You're also probably not the only ones out there doing this kind of work. Obviously, GitHub is a leader in the Git as a product space and has been around for a long time. But there's GitLab, there's a bunch of other tools that are providing a hosted version control solution based around Git. Do they have similar fundamentals teams? And what is your relationship like with other people at different companies working on Git?

Derrick Stolee:
Primarily, we work with people on the mailing list and we meet them there, and we have a few Git developer community events a year to get together and get some face time and really get to know people on a personal level, in addition to on the technical level. And those are also places where the Git community can set some technical roadmaps. I remember when I was thinking about proposing background maintenance as a thing. People would say, "Well, the mailing list won't like that," but I brought it up during the community summit and people were like, "Yeah, go ahead, try it. Not a problem."

Derrick Stolee:
I said, "Okay, well, since you don't think it's going to be a problem, I'll go ahead and try it out." And with some feedback, we got it in. We do have probably closer connections with the people we see more frequently on the mailing list. There's a lot of people who pop in and give a patch and then they pop out and they come back maybe a month or two later. But there's also the people there day in, day out because it's clearly their job to be helping on the Git project.

Derrick Stolee:
And so we're very familiar with the people who work for GitLab, but also this big group over at Google who do a bunch of things with Git. Elijah Newren is at Palantir and he's been doing a lot of sparse checkout stuff with me, which has been really, really helpful. And so we know to look out for those people and say, "Oh, if they're working in this space, it's probably interesting. Let's try to give it a little extra attention to make sure." And also to make sure it's not colliding with anything that we've got in the works, because, if we're working in similar areas on similar problems, there might be that thing going on.

Derrick Stolee:
But then we also want to make sure that we're keeping things open for new contributors. So for instance I've co-mentored a Google Summer of Code student. We're working on proposing a project idea for doing that again this summer. Similar things with Outreachy. My teammate, Johannes Schindelin, in addition to being the Git for Windows maintainer, also created a tool called GitGitGadget, which helps people create GitHub pull requests, and then you run a chat op to submit it to the mailing list as a patch series.

Derrick Stolee:
So it's really easy for people to get started submitting to the mailing list now. So I think that's been a really helpful thing to grow the community and have these independent developers starting to contribute.

Brendan:
I've got one more question for you, which is, now zooming out and making some really irresponsible predictions. How do you think Git is going to be different five years from now, 10 years from now from the tool we're using today?

Derrick Stolee:
Git takes backwards compatibility really seriously. So I don't imagine that anything you're doing today will be different in the future, in the sense that the things you do today will still be possible. What I see being different is that there will be even more new modes of doing things where you can say things like, "I want to do a Git clone, but I have a really big repo. So please turn on all the fancy bells and whistles. I don't need to know what they are, just turn them on and I will get started the way I need to get started."

Derrick Stolee:
And really make that really simple for non-experts to get started in big repos and make it such that engineering systems can customize what they need for each of their repos. As opposed to, "Well, Git says you need everything because this is how it works for small open source projects." But that's what we need for these really large internal things. So I think that's the skew I'm seeing. And there might be some interesting developments in terms of making things easier for people to use.

Derrick Stolee:
But again, because of backwards compatibility, the tricky part is, if we make a new set of commands that are simpler to use, the old ones still exist. And those are the ones you'll find on Stack Overflow, so people won't stop using them. So that's the tricky thing we have to balance. So that's my maybe conservative estimate of what's going to happen in the future.

Brendan:
Yeah. Awesome. Thank you, Derrick.

Kate:
Awesome. Is there anything that you would like to point our listeners to, where they can find you, anything like that?

Derrick Stolee:
Sure. You can find me on Twitter. I'm @stolee, S-T-O-L-E-E. And you can also keep an eye out for things that we announce in the GitHub blog. I frequently write there. My team is working on a few things currently talking about different Git features, different ways of using Git. And there's lots of interesting things that are outside of that, but keep an eye out for that.

Kate:
Awesome. Yeah, we'll include links in the show notes as well. Derrick, thank you so much for joining us and we will see you around.

Derrick Stolee:
All right. Thanks.

Brian:
Thanks for listening to PodRocket. Find us @PodRocketpod on Twitter or you could always email me even though that's not a popular option. It's brian@logrocket.