NOEL: Hello and welcome to Episode 59 of the Tech Done Right Podcast, Table XI's podcast about building better software, careers, companies and communities. I'm Noel Rappin. Our guest today is James Coglan. James has written an extraordinary programming book called Building Git. In it, he describes the inner workings of the Git source control tool by reimplementing a substantial part of it in Ruby, including commits, diffs, branching and networking, pretty much all the parts of Git that you would use every day. Along the way, he shows not just how Git works but also the details of some of the algorithms it uses. There's a lot about building complex systems in general and it has some great examples of test driven development. James and I also talk about implementing in a high level language like Ruby versus a lower level language like C. It's a unique book and I've really been looking forward to talking to James about it for some time, so I hope you enjoy the conversation.

Before we start the show, one brief message. Table XI offers training for developer and product teams. If you want me to come to your place of business and run an interactive hands-on workshop, I would very much like to do that. We can help your developer team learn topics like testing or Rails and JavaScript or managing legacy code, or we can help your entire product team improve their Agile process. Also, if you're in the Chicago area, be on the lookout for new public workshops, including our 'How to Buy Custom Software' workshop, which is currently scheduled for the first week of May, and hopefully more to come. For more information, email us at Workshops@TableXI.com or find us on the web at TableXI.com/Workshops. And now, here's the show.

James, would you like to introduce yourself?

JAMES: Hi, Noel. I'm James. I'm a developer based in London. Mostly, I do stuff with Ruby and JavaScript on the web, doing a lot of open source things, and recently I've been working on a project to rebuild Git.

NOEL: Right, and that's what we're going to talk about here. James has just released a book -- it's been out about two weeks as we record this. It is called Building Git, and in it, you reimplement Git pretty much in its entirety, right?

JAMES: I wouldn't say in its entirety, but most of the stuff that you tend to use every day, like all the sort of core building blocks.

NOEL: Yeah, there's a lot there. It covers commits. It covers diffs. It covers branching -- almost all your day-to-day things, all reimplemented in Ruby. I guess the first question is what made you want to do this and what do you hope people will get out of watching your step-by-step reimplementation of Git?

JAMES: My own reason for wanting to do it sort of changed as the project went on and the more that I learned about it. I initially had the idea because I've seen, in a couple of situations and workplaces that I've been in, people doing very basic versions of, like, let's just build a version control system quickly to understand how it works. I've seen a lot of people struggle with using Git, and I've developed presentations, like, "I'm going to teach people about Git internals for a bit so that they understand it better," and that gave me the initial idea to go, "Maybe I could put a workshop together where I'll just build the real-world basics and walk people through it," and I sort of got carried away, I suppose.
A lot of things fed into this in terms of the way that I think programming is taught -- a lot of that is my own personal bias -- I learn best by example, rather than by reading a lot of theory first. I'm fine with theory and abstraction once I've seen some concrete examples, but I like to see things contextualized. So I liked the idea of writing a book that... you know how with a lot of tech books it's like, "I want to learn a specific topic. I want to learn Rails. I'm going to get a book about Rails," and the technology books tend to be very specifically focused, which is not a bad thing per se, but I wanted to try something that is more like, "Why don't we just learn all sorts of different computer stuff by doing one big project?" because that contextualizes a lot of things and shows you all these disparate ideas working together and meshing into a whole in a way that I think studying individual topics one at a time tends to overlook, if that makes sense.

NOEL: One of the things that was really striking to me about the book -- I think it sidesteps this issue... As a technical writer, there's this issue of examples. For a lot of things, like teaching testing or teaching object orientation, any example that is complicated enough to show the value of them is too complicated to present in a textbook. One of the things about this example is that because most of the people reading it, if not all the people reading it, will have familiarity with Git, you don't have to spend a lot of time explaining the example in the same way, and that gives you a tremendous amount of freedom to dig into complicated internals with abandon.

JAMES: Yeah, and I think that's a really tricky thing with writing a blog post or giving a talk: figuring out where your audience is in terms of what they already know, what you can assume on their part and what you have to explain. So yes, starting from that, this is not a book to teach you how to use Git. I'm assuming you're already familiar with it to some extent. That gives you a big sort of head start on being able to dive into some quite complex topics.

NOEL: I often tell people that understanding Git's internals is really important to understanding how Git works, and even given that, I was not prepared for the level of understanding of Git internals that awaited me in this book. Was there a particular part of Git's implementation that you were particularly excited to explain because you think it's really clever?

JAMES: That's particularly clever? There were a lot of surprises --

NOEL: Well, that was going to be the next question.

JAMES: I went into this knowing some of the basic concepts, like what the data model is and what branches are and, to some extent, what merging means, but a lot of the implementation details were sort of alien to me. I didn't know a lot about how it does graph search or how it does compression or how the diff algorithms work or any of that stuff. I don't think those are details that you need to know in order to use Git effectively -- the amount of knowledge you need is more high level and conceptual than that -- but that's part of why the scope of the project ended up expanding: I just kept learning more and more interesting things and finding new areas of computation that it touched on that I thought would be interesting to fold into the project.

NOEL: This winds up covering a lot of ground.
Okay, full disclosure, I have not actually read the entire book yet, although I have read a pretty good chunk of it, around half so far, and already we've covered a lot -- binary storage methods, the diff algorithm... You can imagine somebody shying away from showing these details because they can be kind of hard to wade through, but what comes across in this book, I think, is the amount of work that goes into creating a system with the size and functionality of Git, and just a ton of techniques for building that kind of large system. Was that an intentional goal, to talk about building software in general, or did that emerge over the course of the book?

JAMES: There was some of that. Part of what I wanted to show here was an element of the process. I'm particularly interested in systems that are self-hosting, like programming languages that are written in themselves, that sort of thing, and version control systems are another common example of that. Usually, the first aim, if you're writing a version control system, is to write enough code that it can manage its own history, so you don't have to bootstrap it off of something else. I've been toying with the idea of process for a while, because I think there's a lot of material out there that will show you the end result of something and go, "Look at this design pattern," or, "Look at this technique for doing something," and it presents the end goal and why that has nice properties, but it doesn't show much of the process of getting there. I think people get bogged down quite a lot in their work with things like, how do we plan a project out so that we are incrementally delivering value, so we're not working for six months and then doing one huge deployment -- how can we ship the software incrementally so that it makes sense to the user, so that it makes sense to the team building it, so that it fits all of these operational constraints -- and also big refactoring projects, if you want to radically change your approach to doing something in your codebase. I've seen a lot of people get bogged down with, like, "This is a great big grand project," and it never gets finished, so you have to learn to prioritize: which things do you really need to refactor, which things are causing you pain right now. I kind of like the idea that as you go through the book, at each stage there is something -- you're gaining functionality all the time -- and it sort of mirrors the idea of the Agile process, where it's not, well, you have to read the entire book and then you'll have a finished working piece of software. It's actually, if you read one chapter of it, you'll have something useful at the end of that.

NOEL: I really appreciate the way that the design of the code emerges. You don't start with 20 bajillion classes to handle the simple case of being able to give the status of one file. You start with something that almost looks just like a shell script, and over the course of it, you refactor, you build up. What was the process of that in the writing of it? How many kinds of false starts did you make? Did you build the entire codebase first? I'm fascinated by that because I'm not entirely sure how to approach that, and I would be very curious as to how you did it.

JAMES: See, I had the same worries going into it, knowing that if I changed my mind about how something should have been done, it would be quite painful to go back and change it, partly because editing the history of the project would be tricky.
It's not until quite late on that I developed commands like cherry-pick and so on that would allow you to actually go and reshape the history meaningfully. For a long time, it wasn't able to do that. But also, if I decided something from 20 commits ago was not quite the right thing, just going back and changing where that was presented in the book, and changing things like commit IDs that might be mentioned in the book, making sure that the text reflects what the repository actually contains -- I really wanted to minimize how much I would need to do that. I think at the start, I remember that I wrote two or three small prototypes for essentially the first chapter, to figure out what's the smallest amount of code I can write to get a working commit command and what should that include. Should that include storing trees? Should that include the index and the add command? And in what order can I do those things? It was a little bit of upfront prototyping.

Then as I went on, I tried to stay fairly faithful to that -- what you're seeing is pretty much what my development process actually was -- though that does vary from chapter to chapter. There's some material on refactoring and on adding tests and changing the design. I didn't want readers to sit through a chapter where we build something and then, when you enter the next chapter, you go, "Actually, we need to completely redo what we just did because it doesn't quite fit the actual requirement," so as I went on, I did try to do a bit more forward planning and prototyping to see how the design would work out. It is a bit of a lie, in that if this was something I was really doing at my day job, there would be more rework and you would just be deploying it and changing stuff. But to make the story more manageable for the reader, I did try and do a little bit of forward planning so there wasn't quite so much churn as you get into the later material.

NOEL: I think the part of it that is watching the design improve is really compelling, and eventually it turns into some really good examples of using test driven development, which, like I said, is a hard thing to do in books -- to find an example that is complicated enough -- and when you start doing it, which I think is with the status command, it's really clear to me why testing is helpful in that case. I really appreciated that piece of it. Would you say that your approach to building a complicated system changed as a result of seeing or trying to build this particular system?

JAMES: I suppose I have always had a variety of approaches to how I build systems, and it rests very much on how well I think I understand what I'm doing. It's curious that you mention TDD. I've written one book previously which is about testing but specifically doesn't use TDD for its narrative, because I found it completely impossible to try and show a test to the reader before they've seen the code, especially because they are all these example problems, so they're not things the reader already understands. It's really hard to go, "Obviously, these tests should exist," when they haven't even seen the problem yet, or the implementation. I found it narratively much easier to go, "The code will be like this, and then let's see how we'll test it." This was a bit easier because, as you said earlier, if the reader is already familiar with Git, some of those tests will be, I guess, self-evident.
They will reflect things the user has already seen in the usage of Git, so it's not so hard to explain why they are there.

NOEL: You don't start with testing, but you explicitly bring up TDD at the point where you're starting to address edge cases. It actually isn't in the status part, but it's where you're starting to address edge cases where you think the implementation of the edge case might break the implementation of the normal path?

JAMES: Oh, yeah. It's at the point where I'm adding behavior that I wasn't going to be using constantly. The first few bits, you just work on the code to create commits -- store files and store trees and all the different stuff you can do with that, and the add command and the index -- and because those are commands that you're using all the time to actually commit to the repository, you'll notice very quickly if they're broken, so having a test suite at that point wasn't an immediate need. Whereas once you start getting into parts where there are more edge cases and things you're not going to run into all the time, that's the stuff where you're like, "I'm not going to be automatically testing this all the time as I use it, so I should write down some tests to make sure that stuff does work." It's not so much to do with being scared that it would break the implementation. It's more that they're tests for use cases that I wasn't going to be hitting in just day-to-day usage.

NOEL: I completely get the point -- it's very hard to explain TDD on an unfamiliar problem by starting with the tests. I've actually also really struggled with that in the context of writing a book about JavaScript that I also eventually wound up self-publishing a few years ago, where it was very hard to get the TDD process explained while you're also trying to explain something else -- at the time, trying to teach JavaScript classes and jQuery and TDD all at once. That wound up being very challenging, but in this case, it comes in really naturally. Was there stuff about Ruby that you learned in finding ways to implement Git in Ruby? How much of the implementation on the Ruby side did you wind up having to learn as you built it up? There were a couple of things in the early chapters that I did not know, or at least did not know well enough, like pack and unpack, which I'd heard of but never really used. How much of that kind of learning did you do as you were building this?

JAMES: A fair bit. I guess I had used pack and unpack before, sort of lightly. I wasn't really familiar with all of their options. One constraint that I had with this is that I didn't want to use any third-party gems, because I'm trying to target an audience that is not necessarily already familiar with Ruby, doesn't have it installed, so I just wanted to avoid as much installation pain as possible.

NOEL: I should note that if you're not a Ruby developer, the book includes a substantial discussion of how Ruby works as an appendix, to let you be able to follow the code.

JAMES: Right. So part of what fed into my choice of language is that I didn't want a huge amount of onboarding pain just to get the tooling installed. The fact that Ruby comes installed on macOS and is a pretty easy install on Linux distributions made it an appealing choice. I didn't want the reader to have to sit through trying to figure out how to use Bundler and install things, because I see people run into problems with that if they're not already familiar with it.
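As a quick aside, here is a small illustration of the Array#pack and String#unpack calls mentioned above. It is not taken from the book and the values are invented, but the layout -- a four-byte signature followed by big-endian 32-bit integers, plus hex object IDs packed down to raw bytes -- is the shape of the header of Git's index file, which the book works through.

```ruby
# Pack a header: "a4" writes a 4-byte ASCII string, "N2" writes two
# big-endian 32-bit unsigned integers (here: signature, version, entry count).
header = ["DIRC", 2, 15].pack("a4N2")
header.bytesize                 # => 12

# Unpack reverses it using the same format string.
signature, version, count = header.unpack("a4N2")
# => ["DIRC", 2, 15]

# A 40-character hex SHA-1 packs down to 20 raw bytes and back again.
oid    = "d8329fc1cc938780ffdd9f94e0d364e0ea74f579"
packed = [oid].pack("H40")      # 20 bytes of binary data
packed.unpack1("H40") == oid    # => true
```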
JAMES: I had this constraint of not using any third-party code, and that meant that I focused more on the standard library than I think I usually do. Historically, I've usually used third-party tools for parsing command line arguments, but I tried using OptionParser for this project and found out that I quite like it. It turns out it's not as complicated to use as I'd been led to believe. There are a few syntactic things that I picked up. Partly because a lot of my Ruby work is on open source projects that support a fairly wide range of Ruby versions, there are a few syntactic additions that I hadn't adopted, just because I was constrained in that way. But the stuff like... What's the thing called? It's like the '&.' operator, the safe navigation operator?

NOEL: It's called the safe navigation operator, or the lonely person operator, because Matz thinks it looks like a lonely person staring off into the distance.

JAMES: Yeah. I hadn't used that much before, and when I saw that it was added, I was somewhat skeptical about it, but I ended up finding a lot of places where it really works in this project. I suppose I was skeptical about it because I didn't like the idea that you should, like, paper over somewhere that you have a nil. If you're expecting a nil, you should handle it.

NOEL: Right. The '&.' operator in Ruby is the safe nil operator, where if the left-hand side of it is nil, then it does not raise an exception when trying to call the method on the right side of the dot. It just continues to pass nil through. It simplifies code somewhat, at the cost of potentially hiding bad behavior if nils are going to pass through the system.

JAMES: Yeah. My worry is that you don't want a nil to propagate a long way from where it originates, because it can end up causing a problem in some completely remote part of your system and you have no idea where that nil came from or what the root cause is. But I think in cases where you are expecting that something will produce nil and you are using the operator quite deliberately, it does end up reading quite nicely. I found quite a lot of use cases for it this time that I wouldn't previously have considered.

NOEL: One of the things that was interesting to me -- I've been programming in Ruby for a really long time, but the overwhelming majority of that has been in Rails, which means for the most part, I haven't dealt with things like string pack and string unpack or binary data, and for the most part, I haven't dealt with the file system, which is, of course, a critical part of this book. Was that pretty much your use case too? This book does some pretty clever things with lock files and whatever to prevent overwriting files. Was that stuff that you discovered in the course of the book or was that stuff that you'd worked with before?

JAMES: I definitely learned a lot about the POSIX file API by doing this. An observation that I had as well is that, at least in my professional work, I mostly do web stuff -- I write a bunch of other tooling that deals with files -- but it's possible to write webapps where you never touch the file system.
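For readers who haven't met it, here is a tiny, standalone illustration of the safe navigation behavior being discussed; the names are made up for the example, not taken from the book.

```ruby
Ref = Struct.new(:oid)

head = nil                 # e.g. an unborn branch with no commits yet

# head.oid                 # would raise NoMethodError: undefined method `oid' for nil
head&.oid                  # => nil -- the call is skipped and the nil propagates

head = Ref.new("a1b2c3")
head&.oid                  # => "a1b2c3" -- behaves like a normal method call
```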
JAMES: Anything that is running on more than one box will typically be using a database for its cache, and it will probably not be using the disk for its sessions, so webapps typically don't directly talk to the file system. I just realized that's a huge category of programming where, even though everything is made of files and you deal with them all the time, you actually don't deal with them very much at all in running web software, which meant that I hadn't really learned how to use the file API properly. I didn't know about these different options that you have for how a file gets opened and how you use that to prevent race conditions and stuff. I liked learning some of that low-level file system API stuff. That was new to me on this project.

NOEL: I really liked seeing the little lock file abstraction you build to use '.lock' files to know whether somebody else is using the file, to prevent conflicting writes and ensure consistency like that. That's a small enough abstraction that you could pull it out and use it in another project.

JAMES: That's something that I just copied wholesale from Git. I looked at what it was doing on the file system and I saw these lock files being created, and that's how I knew that you should do that.

NOEL: Right. If you've used Git, you've had the experience where a lock file hasn't been cleaned up and Git complains. I sort of had an inkling of what that meant, but it was interesting to see the thought process of the kinds of things that can go wrong in a Git-like system. Watching you walk through that exercise of "we need to make sure that somebody else hasn't written to that file since we started reading it," which again is somewhat outside my Rails experience -- reading you thinking through those kinds of problems, I found really interesting. Was there an error case that you wound up having to come back and catch because you didn't think of it?

JAMES: That happened so much that I can't remember any specific instances of it. A fair amount of the rework that I ended up doing, either in prototyping or in going back and revising what I'd put in the repository, was either because I hadn't understood some behavior completely enough, which meant that I'd mis-implemented it, or because there was a substantial design revision involved that I didn't think would add anything to the narrative, so I went back and redesigned how I'd done something the first time around.

NOEL: It's an interesting cautionary note how very, very small features can trigger big changes, especially in this kind of incrementally built system. There's the part of the book where you're dealing with statuses and the ability to denote whether a status is a delete or a modify, and it triggers a substantial design revision because you just hadn't been storing that information. That process in the book of seeing the very large changes that might potentially be implied by a small feature change, I thought that was really interesting. Is there another example that you were particularly surprised or delighted by?

JAMES: Probably more surprised by... So a common complaint that people have about Git is that the user interface is not...

NOEL: Good.

JAMES: ...helpful. Yeah, good, whatever. I guess that's the origin of people saying, "You need to understand the internals because the user interface is too hard." I somewhat agree with that.
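The lock-file pattern and file-opening flags mentioned a moment ago can be sketched roughly like this. It is illustrative only, not the book's Lockfile class: the open flags make creation fail if another process already holds the lock, and the final rename publishes the new contents in one step.

```ruby
# Rough sketch of a ".lock"-style guard. File::CREAT | File::EXCL makes the
# open fail with Errno::EEXIST if the lock file already exists, i.e. if
# another process is in the middle of updating the same file.
class Lockfile
  LockDenied = Class.new(StandardError)

  def initialize(path)
    @path      = path
    @lock_path = "#{path}.lock"
  end

  def hold_for_update
    flags = File::WRONLY | File::CREAT | File::EXCL
    lock  = File.open(@lock_path, flags)
    yield lock                              # caller writes the new contents
    lock.close
    File.rename(@lock_path, @path)          # atomically replace the target file
  rescue Errno::EEXIST
    raise LockDenied, "#{@lock_path} exists; another process holds the lock"
  end
end

# Usage sketch: writers never touch the real file directly.
# Lockfile.new(".git/HEAD").hold_for_update { |f| f.write("ref: refs/heads/master\n") }
```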
JAMES: I don't think it's possible to use version control effectively without understanding, definitely, how it does merging, because concurrent editing and merging is the fundamental thing that version control exists to provide. But it sort of frustrates me how much people say, "You need to know everything about how it works internally in order to use it." I think my favorite example of where that surfaced was in the checkout command. There are all sorts of ways in which checkout is really complicated, because it does a dozen different, slightly overlapping things that also overlap with the reset command, but the one that I found is that when you try to check out a new commit, Git will try to preserve any uncommitted changes that you have, as long as the files that you've changed don't differ between the commit you are on now and the commit that you're trying to check out. If you just want to switch to a different branch that has some changes on it and you've changed some files that don't intersect with those, then that's fine, but if you have changed files that the checkout wants to alter, then Git will detect that and say, "I'm not checking this out because it will overwrite work that you've done."

In order to report that, there are two things you have to do. You have to detect that those conditions exist and record them -- this comes up in reporting merge conflicts as well, so you have to detect that those conflicts exist and record that they've happened -- and you also have to report them to the user. You have to print them in the user interface. So I got the machinery for that working, and it produced all the right states, in terms of, "I've gone through all these examples with Git and looked at what the final state was on disk, and I've replicated all of that," and I thought that doing the UI would just be a nice little addition on top of that. It would just have to print out what it found. I ended up completely redesigning the machinery, because there are some cases where, depending on what order you do that detection in, it will affect what the UI reports -- it will affect exactly how the UI describes a certain conflict. And so trying to get the UI to match what Git said led me to redesign the machinery that it was built on top of, and that was a clear case of the complexity of the user interface... If the user interface is a simple additional layer on top of some machinery that you already have, then you know it's not adding any complexity, but if it makes you redesign the underlying stuff, you know that UI is introducing complexity that wasn't necessarily there before. I thought that was a really good example of that, where the UI is actually, in some cases, more complicated than the stuff that it sits on top of, and so it's no wonder that people find it hard to use, because it's introducing complexity that isn't essential to the problem.

NOEL: I have often found in other contexts that where user interfaces, logging and error reporting tend to increase complexity is that they often dramatically increase the amount of state you need to carry around.
I see this in a lot of payment processing kind of stuff, because in payment processing, I've found that it's very, very important to carry and preserve your entire state so that you can reproduce it, even if the tax laws have changed or something like that, but that introduces a fair amount of complexity, because you have side effects that you didn't have before, because you're holding on to all the state. I've seen places in the book where that happens too. You now have this working Git implementation, and you made it essentially completely interoperable with Git, in that it uses the same file formats and produces the same output. But now that you have it, are you interested in trying to change either the internals or the user interface, to explore what other patterns or other user interface structures might look like? What kinds of changes would you make if you had the opportunity?

JAMES: I would definitely change some parts of the UI. I mentioned the checkout and reset commands. I think those are just overloaded with... What they essentially do is quite simple. You have your files in your work directory, and you have the files in the index, and you have the files in the HEAD commit. Most of what checkout and reset do is making those be equal to each other. So you check out a commit: it makes the work space and the index equal to the contents of that commit. If you do a reset, it makes the index equal to the latest commit. If you do a hard reset, it does that plus making the work space equal to everything else. All they are doing is various combinations of making the files in different locations be equal to each other.

NOEL: I feel like I want to take the last 30 seconds, print it out and staple it to a wall. That's one of the clearest descriptions of what Git actually does.

JAMES: It's so hidden behind these two commands that do two dozen different things. They don't reflect what's going on behind the scenes, but I also don't think they mirror the way that someone would express their intent. They don't reflect the use cases for those commands; they just have these arbitrary names and combinations of options, and I think people just find them really hard to remember. I find them hard to remember and to keep my head straight about what each combination of things does.

NOEL: They kind of got built up over time by people who weren't paying a whole lot of attention to the overall picture.

JAMES: Yeah, so I think those need breaking up into several dedicated things, and that would be a lot clearer. And then you also have a ton of ambiguities, in that any command that can take both commit names and file names frequently doesn't distinguish between the two syntactically. So when you run git checkout, you can follow that with the name of a file, or the name of a commit -- either its ID, or a branch name followed by a parent operator, or that revision syntax that Git supports. There's no indication -- you're not telling Git whether you meant that to be a filename or the name of a branch or the name of something else, and Git has to sort of guess. So there are a lot of cases where, if you have multiple things that have the same name, it might not do what you expected, and that would be really easy to solve by just putting a name on the command line -- this is a file, this is a commit ID -- so it became unambiguous, and I think it would become easier to use.
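A schematic way to picture the relationship James describes here, with invented names, and with the uncommitted-change preservation and conflict checks from earlier glossed over: checkout and reset are mostly about making three locations equal to each other.

```ruby
# Purely illustrative model: three "trees" and the commands that sync them.
Repo = Struct.new(:workspace, :index, :head) do
  def checkout(commit)
    self.workspace = commit    # work space := contents of the commit
    self.index     = commit    # index      := contents of the commit
    self.head      = commit    # HEAD now points at that commit
  end

  def reset(commit, hard: false)
    self.index     = commit            # index := contents of the commit
    self.workspace = commit if hard    # a hard reset syncs the work space too
    self.head      = commit
  end
end

repo = Repo.new("tree A", "tree A", "commit A")
repo.checkout("commit B")    # work space, index and HEAD all now match commit B
repo.reset("commit A")       # index and HEAD move back; the work space is left alone
```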
NOEL: Having done your implementation, what are the biggest gaps remaining between it and standard Git, in terms of not just functionality but also performance or things like that? How big of a gap is there, do you think?

JAMES: In functionality -- although it did end up being quite a large project and quite a long book -- my aim was to keep it to a sort of essential core. I wanted to get to a place where the project could push itself to GitHub. That was my bar for it being finished, and I didn't want to do a load of chapters that are just like, "Let's just slog through all these options that are various uninteresting combinations of things you already understand." What I wanted from each topic was to actually introduce a new concept. There's a lot of functionality, loads of commands, that it doesn't have. There are some things -- I think I mentioned this in the wrap-up to the book -- you could fairly easily build because they're just combinations of other commands. You can implement the pull command as a combination of fetch and merge. You can simulate most of what rebase does with cherry-pick. The clone command -- you can do that by running init, and then adding a remote, and then fetching from the remote, and then setting up your master branch to mirror that. There are a few commands I haven't done just because you can get the same effect by running a bunch of other things. Then there are commands that aren't just combinations of other existing commands, but you could fairly easily build them with the machinery that's in the codebase. So the blame command in Git prints out a file and annotates each line with the name of the last commit that changed it, and you can fairly easily build that by using rev-list to find all the commits that changed the file and then using diff to figure out which lines are attributable to each commit. I've actually prototyped a version of that and it's a couple of screens full of code.

There's a huge amount of functionality that I haven't done, so there's plenty of scope for you to have a go, and now you've got all the building blocks. I think there's just a fair amount of stuff that I decided was out of scope. I haven't done submodules. I haven't done various things to do with patches and filter-branch.

NOEL: Submodules seem like a nightmare.

JAMES: They're the kind of thing that people complain about constantly and that I don't see a lot of people using. A lot of the things I decided not to do are just things I don't see people frequently using, or I think they wouldn't really introduce any fundamentally new concepts to the reader.

NOEL: Do you have any interest in putting this up as an ongoing project for people to contribute implementations of those things? I could see that there would be value in a Ruby-language reference implementation, just because it might be more comprehensible than the C reference implementation, but I can also see that it would be a huge pain in the neck. Is that something you're interested in doing?

JAMES: I'm not especially interested in managing the codebase as an ongoing concern. I don't intend for anybody to use it for production work. It's an educational example. What I have seen a lot more interest in from readers is people who do their own implementations in other languages, which is really satisfying to me, because I really don't want to restrict this to a Ruby audience. I'm seeing people follow it and do their own complete new implementations. I think it's really interesting.
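As a concrete illustration of that composition idea, here is a hedged sketch that drives the real git binary (it could just as well shell out to a reimplementation); the `run` helper and the exact clone steps are simplifications for the example, not the book's code.

```ruby
# Treat each Git command as a step and build the "missing" commands out of
# existing ones, roughly as described above.
run = ->(*args) { system("git", *args) or raise "git #{args.join(' ')} failed" }

# pull is approximately fetch followed by merge.
def pull(run, remote: "origin", branch: "master")
  run.call("fetch", remote)
  run.call("merge", "#{remote}/#{branch}")
end

# clone is approximately init + remote add + fetch + set up a local branch.
def clone(run, url, dir)
  run.call("init", dir)
  Dir.chdir(dir) do
    run.call("remote", "add", "origin", url)
    run.call("fetch", "origin")
    run.call("checkout", "-b", "master", "origin/master")
  end
end
```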
NOEL: What languages are people trying it in?

JAMES: I know people are working in NodeJS, Elixir, Rust, Clojure. Those are the main ones I've heard of so far, and I'm currently trying to learn Rust, so I may end up doing my own attempt at that.

NOEL: Yeah. That seems like it might be a good one to spring this on.

JAMES: Yeah, because if I try to learn a new language or a platform, I often reach for a problem I already understand that's just non-trivial enough to be challenging, because then I don't have the added workload of trying to understand a new concept. I'm just going, how would I write this program I already know, in something else? There's a huge amount of scope for this, and this is partly inspired by Gary Bernhardt's "From Scratch" videos. I really like the idea of demystifying the things that you use every day, especially with my background coming from web development. There's this really crummy culture, this hierarchy of whether you're a 'real programmer' or not -- that if you're a web developer, then you're not hardcore enough to understand stuff written in C or low-level things -- and there's this whole culture that tells you that you're just not smart enough to understand all these tools you use, that they are sort of magic and you shouldn't worry about it. I like the idea of sweeping that aside and going, "No, you're plenty smart enough to understand any of this stuff, given enough time. None of it is magic." It takes time to learn the stuff, but it's not fundamentally beyond your abilities.

NOEL: I think what happens with the C implementation of this stuff -- and I learned low-level languages as a student but have never used them professionally -- is that there's a ton of incidental complexity that comes from a C implementation of something like this, in memory management and in details that are sort of incidental to actually understanding what it's doing. I think that pulling these kinds of implementations into a tool that is higher level and a little bit easier to understand, taking whatever performance hit you might wind up getting, is a huge advantage. I think that's a great thing. I ran from low-level languages because I just didn't like the high-ceremony, lots-of-incidental-complexity stuff. I just couldn't get away from it fast enough.

JAMES: Yeah.

NOEL: I appreciate being able to see these things in languages that are more approachable, I think, to more people.

JAMES: I definitely had that experience trying to read the Git source, especially because part of the received wisdom about people who write C, and certain people in that community, is that they get positioned as these sort of genius wizard people who can do things you can't possibly understand and are utterly brilliant. I think there's a lot of code in Git that's hard to read because it's sort of the polar opposite of what I have been trained to do in object oriented design: you have these functions that are hundreds of lines long, doing quite a lot of -- well, it's C, so it's essentially just shunting numbers around -- and there's no clear labeling of conceptually what is going on in those various bits.
In the later bits of the book that deal with pack compression, I think something that is a few functions in the C codebase ended up being seven or ten classes in my version, because you realize this one little loop in the middle of this huge function actually models a specific concept that is worth naming and pulling out. So there's a fair amount of just reading all this data being shunted around and trying to tease distinct concepts out of it.

NOEL: I'm a huge fan of trying to deal with code at the conceptual level to make it easier to understand. I was laughing before because I have had the experience of sitting next to a fairly accomplished developer in the C and Ruby world who was trying to explain to me, and a somewhat bemused very, very senior developer, how he was the only good C programmer in the world.

JAMES: Yeah. There's so much bad mythology surrounding C that --

NOEL: That's a groan of recognition, yeah.

JAMES: Like, there's this huge aura about it that you have to be a genius to be able to do it, and, "You don't have to be," but also, the people that claim to be are just as capable of making really bad mistakes as anybody else.

NOEL: Right, and I think that one of the reasons there's a mystique around having to be a genius in C is because it's so unnecessarily complicated. Unnecessary is not quite the right word, but in a language like C, there's so much stuff that you need to keep in your head that other languages -- even another systems-level language like Rust -- make somewhat easier to manage in the code. Because of that, I used to have this conversation with people who are C fans, and they would talk about how they want to have sharp knives, and I was like, "I don't mind sharp knives but I prefer them to have handles," is kind of the way I feel about it.

JAMES: Yeah. So I first learned C maybe three years ago, just shortly before starting this project, and the thing that I wrote to try and cement my knowledge -- I maintain the WebSocket driver library for Ruby, and I tried porting that to C, because it's a thing I understand fairly well and it's binary format parsing, so it's right in C's wheelhouse. The experience that I had doing that was that if you're working alone and you have completely uninterrupted time to focus on what you're doing, it's perfectly possible to apply enough discipline to write C reasonably okay and not write any hugely obvious bugs. The thing I realized after letting it sit for a bit is that I would never want to maintain it. There's a certain amount of, you only manage to apply that discipline because you have the whole thing in your head. At the same time, it's very hard to compartmentalize stuff and it's very easy to introduce mistakes later when you come back to the codebase. As soon as you add more people, or interrupted time, or anything that splits the knowledge of the codebase up across people or time, at that point I stop feeling completely confident in my ability to use C safely.

NOEL: Yeah, I think that is very similar to my experience. Where can people buy this book and where can people find you online if they want to talk to you about it, or talk to you about Git or about C or anything else?

JAMES: You can get the book at http://shop.jcoglan.com. I also have my previous book, JavaScript Testing Recipes, available. If you want to catch me online, I am @mountain_ghosts on Twitter. That's probably the best place to get ahold of me.
NOEL: And if you do find that Twitter handle, the name attached to it is not going to be James Coglan, because you never use your own name.

JAMES: No. I haven't been using my actual name on Twitter for a while.

NOEL: Yeah. It was some time before I knew your name, I think, but I do now. Anyway, thanks for being on the show. The book is extraordinary, and I'm really glad that you had the wherewithal to do all of it, because I'm not sure I would have, and people should go out and take a look at it because it's really interesting. Thanks for being on the show.

JAMES: Thank you so much. This was lovely.

NOEL: Tech Done Right is on the web at TechDoneRight.io, on Twitter at @Tech_Done_Right, and available wherever you get podcasts. The show is a production of Table XI, which is on the web at TableXI.com and on Twitter at @TableXI. This show is hosted by me, Noel Rappin. I'm at @NoelRap on Twitter, and it is edited by Mandy Moore, who's at @TheRubyRep on Twitter. If you like the show, please tell a friend, a colleague, a pet, an enemy, your social network, your boss, my boss, me -- telling any of those people would be extremely helpful -- and a review on Apple Podcasts helps people find the show. Table XI is a UX design and software development company in Chicago with a 15-year history of building websites, mobile applications and custom digital experiences for everyone from startups to storied brands. Find us at TableXI.com, where you can learn more about working with us or working for us, and we'll be back in a couple of weeks with the next episode of Tech Done Right.