Frank McSherry: When you're in the academic space, the communication you try to put together is very much about how clever your work is and how you've seen a thing that other people can't see. As far as I can tell, no one in the real world actually wants that. They want to get stuff done, and you waxing philosophical about the nature of computation is absolutely a waste of their time. Eric Anderson: This is Contributor, a podcast telling the stories behind the best open source projects and the communities that make them. I'm Eric Anderson. Eric Anderson: Well, we're live today with Frank McSherry, who is a data processing expert and someone I've followed for several years. I'm excited to have some time together, Frank. Welcome to the show. Frank McSherry: Oh, thank you. That's very nice of you to say. Eric Anderson: Frank, I understand that you were either the creator, or at least critically involved in the creation of timely and differential dataflow, and then more recently Materialize. Anything else I should be adding to the resume and agenda for us to discuss today? Frank McSherry: I think in terms of open source software, or just generally available software, those are the main ones. The other thing that gets tacked on to me is differential privacy, but that's sort of tens of years back or so. But yeah, those three are great to talk about, yeah. Eric Anderson: Well maybe you could start us off. We'll get into you and yourself, I imagine, as part of the story behind how these projects were created. But maybe first you could just give us a primer on what differential dataflow and the others are, because I think they are related to a degree. Frank McSherry: Yeah, sure. Of course. So there are three projects that sort of form a little bit of a software stack, if you like. 
At the lowest level there's this thing called timely dataflow, and this derives very strongly from work that we did at Microsoft Research many years ago, and I'll try to hit on that, in this project called Naiad. And that's the mechanics of dataflow from our point of view. How do you move data around? How do you schedule things? Not super opinionated about what you should be doing when you move the data around or around tasks. But that's one of the layers of the stack. Frank McSherry: You go up a little bit and you get to differential dataflow, which starts to have a bit more opinions, that maybe what you should be doing is incrementally maintaining computations that people have expressed. So you sort of trick people into expressing computations as if on static data. This is a cute trick that the LINQ people with C#... That's the first place that I saw it. But this is like this fluent style of programming where you trick people who are familiar with, let's say object oriented programming, they would say like, mydatacollection.join.reduce.map, and you trick them essentially into expressing declarative programs. And once you've elicited that from them, a description of what they want to happen, but not necessarily how it has to happen, you can put together a system like differential dataflow that says, great. I'll make that happen for you, but also behind the scenes, I'll continually update the results of your computation as the input data change. So it's a bit more opinionated about, what am I going to do with my dataflow resources with that compute framework? Frank McSherry: And then Materialize sort of lives on top of all of this and says, that's all well and good, differential dataflow is still a Rust library, so you're welcome to go and use this if you like Rust. But the large majority of the people out there are very much more comfortable with something like SQL or even graphical tools that compile down to SQL queries. 
So sort of opening up the accessibility aperture of the project to give people access to cool sort of incrementally maintained computations using languages and tools that are much more familiar to them, things like SQL or other sorts of query builders. Frank McSherry: Those are the three, in terms of vocabulary and things that are exciting to talk about, those are the three sorts of things. And I guess they are layering in terms of complexity and value added, I suppose. Eric Anderson: You've nicely put them in kind of this low level to higher level abstraction stack for us. And now let's go back. And what led you ... I guess, did they come in this order? Was it timely and then differential and then of course, more recently, Materialize? Frank McSherry: So Materialize is definitely the most recent one. The timely and differential stuff is a little bit of an interesting story. It definitely started much longer ago. So this is back in, I would say maybe 2010-ish, maybe 2011. I forget the exact time, but there was a bit of history, I suppose. Frank McSherry: There was a really cool thing that was going on at Microsoft Research's Silicon Valley lab. This is a place where things like DryadLINQ got created, and this sort of is what Spark looked like four years before Spark got published. It was really cool. Just write some high-level code and it'll spin up a few hundred machines for you. And that project sort of got a bunch of energy there about people working with big data and in particular trying to elicit some structure about these programs from people, so you don't force them to wire up things manually. Frank McSherry: And this led to many things, but for me at least, it led to this project called Naiad that was going to try to take those works and add a few new fun bells and whistles. We were particularly looking at iteration loops and stuff like that as a cool thing that you might want to be able to do. 
And that sort of led to streaming pretty quickly because when you go around loops you don't want to redo everything from scratch. You want to sort of incrementally update stuff. Frank McSherry: And what we had at first was this pile of code that did everything that we thought it needed to and was pretty inscrutable. And most of the folks who looked at it thought it was pretty inscrutable. And we were lucky to have several really good people working on it. One of them, my take at least, is that one of them, Derek Murray, sort of sliced through this tangle pretty elegantly and separated what we had into two parts. One was sort of the mechanics underneath everything. This is timely dataflow. And another part was the ... I was thinking it was the application logic, but the bits that were doing joins and reduces and stuff like that up above that. Frank McSherry: And that separation worked really well from a separation of concerns, so that folks could work on the lower level system thing and just figure out how do we move bytes around quickly? How do we schedule things well? And other folks could ... You'd put on a different hat basically, and work on the level above that, which was, hmm. Given that I have these nice mechanics below me, how do I think about putting together some higher level programming on top of that? Frank McSherry: And that was sort of the moment where, from my take at least, where these things separated into two parts that made a lot of sense to keep separate because you can work really hard on one of them for a month and then take a bit of a break and work on another thing. They didn't have to be deeply coupled. It was a really nice separation. I'm glad that Derek had that perceptive separation of the two. Frank McSherry: And from that point on, the Naiad thing at Microsoft, we eventually open-sourced that. And then due to some exciting stories, which we can get into, the lab sort of went away and things got a bit of a reboot. 
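To make the "incrementally update stuff" idea from earlier in this answer concrete, here is a minimal sketch in plain Rust of a count-by-key that is maintained from changes rather than recomputed from scratch. It is an illustration only: `IncrementalCount`, `Update`, and `apply` are hypothetical names, it uses only the standard library, and it is not the timely or differential dataflow API. The assumption is that inputs arrive as records paired with a signed multiplicity (+1 for an insertion, -1 for a removal), and that outputs are reported the same way.

```rust
use std::collections::HashMap;

/// A change to a collection: a record plus a signed multiplicity.
/// +1 means the record was added, -1 means it was removed.
/// (Hypothetical type for this sketch.)
pub type Update<T> = (T, i64);

/// Maintains "count per key" incrementally.
/// Illustration only; not the differential dataflow API.
#[derive(Default)]
pub struct IncrementalCount {
    counts: HashMap<String, i64>,
}

impl IncrementalCount {
    pub fn new() -> Self {
        Self::default()
    }

    /// Applies a batch of input changes and returns the changes to the
    /// output collection of `(key, count)` pairs. Only keys touched by
    /// the batch are revisited; everything else is left alone.
    pub fn apply(&mut self, batch: &[Update<String>]) -> Vec<Update<(String, i64)>> {
        // Consolidate the batch so each key is processed once.
        let mut deltas: HashMap<&str, i64> = HashMap::new();
        for (key, diff) in batch {
            *deltas.entry(key.as_str()).or_insert(0) += diff;
        }

        let mut output = Vec::new();
        for (key, delta) in deltas {
            if delta == 0 {
                continue; // Insertions and removals cancelled out.
            }
            let old = self.counts.get(key).copied().unwrap_or(0);
            let new = old + delta;
            // Retract the old output record, then assert the new one.
            if old != 0 {
                output.push(((key.to_string(), old), -1));
            }
            if new != 0 {
                output.push(((key.to_string(), new), 1));
                self.counts.insert(key.to_string(), new);
            } else {
                self.counts.remove(key);
            }
        }
        // Sort for a deterministic order (HashMap iteration is unordered).
        output.sort();
        output
    }
}
```

Feeding this structure two insertions of "a" and one of "b" yields the output changes `(("a", 2), +1)` and `(("b", 1), +1)`; a later removal of one "a" yields only `(("a", 1), +1)` and `(("a", 2), -1)`, without touching "b". The work done is proportional to the size of the change, not the size of the data, which is the property the conversation is describing.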
Eric Anderson: And now Frank, you were employed with Microsoft at the time, working in the research lab, this was your job, is that right? Frank McSherry: Yeah. We were all professional industrial researchers, which was a super flush sort of set up to have. You're getting a solid, reliable salary to sort of work on the things you think are best. You don't have to fight with getting customers or anything like that. So it was definitely a great opportunity to really take some time and think. When people ask, to the extent that this worked out relatively well, why was it? I think the time at Microsoft and sort of having that amount of oxygen to sort of sit around and think and make sure you got things right, that was a real, real important part of it. Eric Anderson: Maybe just on that, we hear stories about Xerox Labs and Google's had all kinds of little innovation efforts and you wonder if these things work out in the long run. I don't know the story behind Microsoft's work, but was this kind of an apex of, there was a lot of Microsoft research going on and it sounds like some of that wound down? Frank McSherry: Microsoft Research has traditionally been an excellent place for people to go and do top tier research. And possibly, actually maybe the most interesting thing when the lab got wound down in 2014 was, up until that point, the Microsoft Research community was pretty much invulnerable. It was a great place to go for total job security. And it changed a little bit after the perception of like, oh geez. To the extent that you want a job there for the rest of your lives, you should make sure you're going to have to work for it. Which is good. I mean, I think different people have different takes on whether it's a good thing or a bad thing. Frank McSherry: I personally have been doing some, what I think is, really great stuff since being kicked out of the nest. 
At some level, actually having to go and interact with other people outside of your filter bubble, challenge yourself to do some other things, I think this is great. I feel a little bad proposing that other people should think it's great too, because it was a bit more disorienting for some other folks. But I've had a great time since then. Eric Anderson: Good. So the research work at Microsoft wound down and you kind of ventured off, but continued those same efforts. Frank McSherry: Yeah. So this is where the story gets a little less traditional, maybe. I don't know. The lab vanished and several people were trying to figure out what to do. A bunch of folks went to Google because Google is just right down the street and that was super easy. I hadn't taken a vacation in a while and I was like, oh, I should just go take vacation for a little bit. But we all had some time where we're still technically employed and we're sort of thinking about, oh, what are we going to do next? And hanging around in this case, San Francisco. Frank McSherry: And I was thinking how it'd be neat to learn some new skills. I was pretty fluent with C# at the time, but I wasn't sure what I was going to do with that. And proposed to some colleagues like, oh yeah, well, this is great. There are all these new programming languages out there. I'm going to pick up Go, and I'm going to rewrite timely data flow in Go. And my work colleagues, if you're familiar with the folks at MSR Silicon Valley, you know they're not uniformly supportive. So some of them were like, oh, that's a terrible idea. That's not going to work out very well at all. So I switched to Rust and that worked out really well. Not because of any deep insight on my part, just good luck mostly. Eric Anderson: I looked at the contributions to timely dataflow, and it's mostly you. Was this kind of a solo project? Frank McSherry: Yeah, that's a good point. 
So this is, in terms of what type of open-source project is it, the main thing that was going on here was, I thought it would be fun to take a second swing at things to make things better that I thought we could have done differently after the first time. But the goal wasn't necessarily to build a massive opensource product to get lots of contributors, but just sort of do this in the open. I'm sort of used to giving presentations, writing papers, and trying to communicate what's going on. And so it was very much about more of a sharing thing than a soliciting contributions. It would be totally fine to take other people's contributions, but it's a bit of a weird enough project that you don't just sort of walk in and say like, oh, here I'll fix this exotic piece of logic that's sort of very carefully arranged. Frank McSherry: It was very much more about sharing and showing people what's going on, keeping myself honest. That was another ... I really enjoyed that. Sort of you put yourself out there a little bit. And I had some confidence because I had done some of this before, but you put yourself out there and if you have some massive defect in some particular use case or something like that, people will call you on it. You're like, yeah, you're right. I'm doing a bad job. Let me go and fix that. And I had the time to do that. I didn't have anyone I was trying to impress. So for me at least, it felt really good. It was a really good sense of motivation. It works for me. I can imagine it doesn't work for everyone. Eric Anderson: No, totally. And that's how we get timely. Frank McSherry: Timely and differential sort of started to build up at this case. There was a little bit of a reboot of timely, which is the foundation for differential, so I started to work on that as well. Frank McSherry: But around the same time basically, I realized I'm in San Francisco, not with any particular employment, thinking about taking some vacation. 
So at this point basically, I gave up my apartment in San Francisco, hopped on an airplane, ended up spending the next five years or so outside the United States in various levels of civilization. Frank McSherry: First place I went was Morocco and got 20 hours of consecutive sleep there. That was super nice. And just did a bunch of work on Moroccan rooftops, typing stuff in and building data flow systems there. Yeah. I mean, you know, it sounds pretty ... I needed to go somewhere. I still had some professional obligations to academic conferences that were in Europe and I needed to be near Europe, but not in Europe because the Schengen visa wouldn't let me stick around for as long as I needed to be there. So, Morocco was close. Did some surfing, wrote some dataflows. I mean, it sounds like a pretty sweet life. Eric Anderson: Yeah, the weather's nice, so rooftops, right? Frank McSherry: Yeah. Absolutely. No, I recommend it to everyone. It's a pretty friendly place and it doesn't break the bank. It was absolutely a rent reduction coming out of San Francisco just to a different part of the world, seeing different people. Super pleasant. Frank McSherry: And then after that, a bit of time in Berlin, few other places around Europe, just sort of working on rebuilding some of these fundamentals and learning about Rust. And there's a lot of, I don't know, a lot of learning. It felt very creative and fun. I liked that a lot. Frank McSherry: So the timing on this was, let's say a few months in Morocco, a few months in Berlin, a few months in the UK. Just sort of few months here and there in part, because staying in any one place requires some paperwork if you stay there long enough and that seemed exhausting. Frank McSherry: What I did do, which was super nice, was drop-in twice in Switzerland at the university in Zurich there and sort of worked with them and worked with several of their students on timely data flow, differential data flow related projects. 
Frank McSherry: They had an application that they were trying to build around data center modeling. So taking exhaust essentially from data centers and trying to understand what's going on in the data center. So like a monitoring plane for data centers that they had tried to build with Spark and it didn't work out particularly well. And they tried to build it with Naiad, with our old C# code base, and that worked except running Naiad on Linux with C#, that didn't work out great. And I was advertising that I had infrastructure that could do the right thing and so I tried to actually test that out and see if that was legit or not. Frank McSherry: So I'm around in Switzerland a little bit, got some grad students there engaged working with the code base, adding on some features and generally again, testing it, seeing if it's actually fit for purpose, and learning again, quite a lot, absolutely. The whole process is very iterative. It's not, you know what you're going to do ahead of time. You know what you're going to do for the next month or so, and at the end of that someone points out that you're doing a really bad job at something, so let's figure out how to make that better. Again, this is what works great for me, is repeatedly seeing mistakes that you've made and having to think about like, oh yeah, I do know how to fix that. Let's get to work on that. Frank McSherry: But yeah, in and out of sort of Switzerland and bits of Europe for a little while. Just to be honest, being very self-indulgent and stuff like that. This was the vacations that I hadn't taken throughout 12 years at Microsoft. Eric Anderson: Fantastic. And that brings us to kind of the culmination of both timely and differential. They're more or less baked at that point. Frank McSherry: Yeah. They've been pretty stable. So Materialize, for example, has been going on for, I want to say like a year and a half plus now. And timely and differential had been pretty well baked up to that point. 
I mean, we've been fixing them here and there and I'm sure there will need to be more work in the future, but from about 2019 going forward, a lot of the focus shifted at that point. Frank McSherry: Arjun Narayan, the co-founder at Materialize, convinced me, sold me on the idea, which is totally reasonable, that it's a lot of fun to do things on your own, for sure, but if you actually want to see if this project has legs, if it's going to get anywhere, there's going to be lots of stuff to do that you're not going to want to do. They're going to need to write adapters for various bits of infrastructure. People are going to need to write documentation. There's going to need to be lots of this sort of thing. And the right mechanism for that is basically to create a company. Put together some entity that can pay people, provide stability for them and put together something that's more useful than just a bunch of tech demos, which in some sense is sort of what differential and timely were. They worked really well if you carefully guided them. But you want to actually make it be something that people can use without training, with the usability of something like a database. Frank McSherry: And that made sense. And sort of starting 2019, putting that together and getting a team together and moved to New York in 2019. Eric Anderson: And did you have motivating use cases along the way? Was there kind of a problem to be solved at Microsoft that continued to kind of inspire your design and what you wanted to accomplish with timely? And/or, were people picking up timely and differential and doing things with it that then kind of guided what you hoped to accomplish with it? Frank McSherry: Yeah. Good question. I think in the early days at Microsoft, for me at least, it was very inwardly focused. There was some things that we were pretty sure we could do that we didn't think other people were going to be particularly good at doing. And we wanted to do those things. 
No one was actually asking us for solutions to these particular problems, if I'm remembering correctly. But in the academic space, there's a lot of sort of back and forth where people ... The game is that you're there to identify some core insights that other people haven't seen, and if you realize like, oh, I bet you could actually do this, you try to think of a way to put that together in a way to present it outwards to other folks and show off, look, we can actually do a thing now. Here's the way to think about it. Here's the secrets. Frank McSherry: So for a while it was, for me at least, sort of that, here are some hard things that we can do. Other people seem to struggle. Let's try to put this all together in a nice package that makes it clear how computer scientists as a class should go ahead and do this, as opposed to specific users who have specific problems. Frank McSherry: For a while there was a big ... In 2014, as the lab shut down, there was a bit of a change in tone, which I enjoy, but maybe not everyone does, where we no longer had a PR department that we were worried about offending and started writing some content. I don't know if you're familiar with this stuff, but there's this COST paper about how essentially laptops can compete with big data processing technology. The laptop isn't going to win on sort of bulk ETL or something like that, but if you need to write a graph algorithm or something like this, then being smart is potentially orders of magnitude better than being wealthy, let's say, or whatever you call it when you just turn on a thousand AWS instances. Frank McSherry: So there's a little bit of something to prove, I think, from that point going forward, that by writing an expressive program, one that reveals its structure to the underlying runtime, you can do a lot better than just brute forcing things. 
And that was sort of a bee in my bonnet for a while, was can we figure out how to make this framework, these programming frameworks, capture what it is about your program in each of various cases that should actually make it easy to run across multiple machines? Can we remove blockers to concurrency that are currently causing people to run one thing then another after it, then another after it, and maybe do all those at the same time? Frank McSherry: So this is a bit of like philosophical positioning, a bit of religion on my part, that we should be able to do this. Personally at least, I'm not going to rest until I figure out a right way to either communicate that we can do it or figure out why it's hard, or ... Very self-indulgent stuff to be totally honest. This was about me learning and trying to make a record of it as I went. Frank McSherry: But definitely, it was not external people saying, it would really help if we could have SQL right now, or anything like that, though with Materialize this has, of course, changed dramatically. Big change for me personally. But very much about what are the things that are hard for normal people and how do you wrap things up with a bow so that they don't have to deal with all of your cleverness? Because although that's fun to talk about, no one really cares, right? They just want to get stuff done and you waxing philosophical about the nature of computation is absolutely a waste of their time. Eric Anderson: And to what extent has your work on timely and differential kind of fit those needs of folks that you run into now with Materialize? Has it kind of been like, look, my research happens to work in the real world? Or have you found need to kind of tweak and adjust things from there? Frank McSherry: We definitely need to tweak and adjust things. I mean, some things have been really great, like personal favorites list. And you're getting a massively biased view, by the way, of this. 
But personal favorites have been, so correctness was really built in early days to differential and timely. And that's meant that we very rarely so far at Materialize have had questions about like, well, is the underlying execution engine really working correctly? We're seeing glitchy results. Whose fault is it? And it's been really nice that the early commitment to correctness, at the possible expense of some complexity, like you need to know a little bit more, has made our lives a lot easier from a debugging point of view, from a performance point of view. The system does what we expect because we had that take early on. Frank McSherry: There are probably a few questionable things. The system is definitely meant to run and go as fast as it can without too many interruptions. And that's a bit awkward when someone shows up and says, hi, I'm code that might throw exceptions. Like you have in SQL, if you can divide by zero, what do you do? And timely differential will say, well, if you divide by zero that's your fault. You shouldn't have done it. And of course the reality is that's not an acceptable answer. You can't just take down the database thing because someone put in some glitchy data. Frank McSherry: So this sort of thing we've had to work around a little bit. That's fine. But it didn't come for free. At that point we had to, this is all [inaudible 00:19:34] Materialize, but think about how do we want to deal with the fact that people might write code that on a normal system would cause this execution to stop. We have to guard against that somehow. Did some hardening, basically, around that, which is healthy. Like you see, what are the actual bumbles that a computation undergoes when you're not running it on ... If it's not a race car that you're sort of pointing straight down the track, trying to get to as fast as it can go, but instead driving around the town, bumping into shopping carts, stuff like that. 
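The divide-by-zero problem described above has a common general remedy: treat a failure as a value that flows through the computation instead of something that stops it. The transcript does not say how Materialize actually does its hardening, so the following is only a sketch of that general technique in plain Rust. `EvalError`, `checked_div`, and `divide_rows` are hypothetical names, and splitting results into an "ok" collection and an "error" collection is an assumption made for illustration.

```rust
/// Errors a row-level expression might produce.
/// (Hypothetical type for this sketch.)
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum EvalError {
    DivisionByZero,
    Overflow,
}

/// Divides without panicking: a bad input becomes an `Err` value
/// instead of taking down the whole computation.
pub fn checked_div(numerator: i64, denominator: i64) -> Result<i64, EvalError> {
    if denominator == 0 {
        return Err(EvalError::DivisionByZero);
    }
    // `i64::checked_div` returns `None` on overflow (i64::MIN / -1);
    // the zero-divisor case was already handled above.
    numerator
        .checked_div(denominator)
        .ok_or(EvalError::Overflow)
}

/// Evaluates `numerator / denominator` for every row, routing successes
/// and failures into two separate collections so that one glitchy row
/// does not prevent the remaining rows from being processed.
pub fn divide_rows(rows: &[(i64, i64)]) -> (Vec<i64>, Vec<EvalError>) {
    let mut oks = Vec::new();
    let mut errs = Vec::new();
    for &(numerator, denominator) in rows {
        match checked_div(numerator, denominator) {
            Ok(quotient) => oks.push(quotient),
            Err(error) => errs.push(error),
        }
    }
    (oks, errs)
}
```

With rows `(10, 2)`, `(1, 0)`, and `(9, 3)`, this returns the quotients 5 and 3 alongside a single `DivisionByZero` error. A caller can then decide what to do with the error collection, for example reporting the error to whoever issued the query while the system itself keeps running.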
Eric Anderson: Now, Materialize, again, peeking at the contributor list, looks like there's a lot more faces contributing. And it sounds like it's a more practical project, which it feels a little different than you skipping around Europe, talking to people at universities. What's that transition been like? Do people share your vision, or is it kind of hard to kind of hand off parts of the project to others? Frank McSherry: That's a great question. Yeah. And I think I can answer that in parts, but there's sort of this broader question of ... How easy is it to hand off stuff is a little tricky. I would say that a lot of the folks ... So a lot of the contributors I think are from Materialize. It's not a [groundswell 00:20:42] of individual contributors. We're delighted for that. But again, it's one of these ... It's a sufficiently technical thing that's a bit tricky to just drop in and help out. Frank McSherry: But what's been really nice, I really enjoyed about the company is that there's some things I'm good at and I'm pretty happy to go and do. And there's other things I'm just terrible at and have no passion for. And it's been great to work with people who are really good at these things that I'm not good at and that I don't enjoy doing. Frank McSherry: So we get a lot of contributions. For example, SQL parsing and the intricacies of the 2000 page SQL spec ... I don't know, they wash over me. I'm not passionate about this. But we have people who are like, no, no, this is important. Here's exactly what Postgres says about how it's going to work. And so we're going to do exactly that. It's going to make our life slightly difficult, but don't worry, I'll handle it. And that's great. I absolutely love that. Frank McSherry: So reaching out, like I said, it's been a ... I don't know. 
I feel like you don't necessarily have to have a company started up to do this, but definitely the ability to interact with people who have different skill sets, are good at what they do, and really amplify the work that you've done, has been a delight. I mean, I'm super positive about that. And to the extent that you can bottle that up and sell that experience, I think that's great. Eric Anderson: Yeah. As you're interviewing people, when you find someone you share your passion with, you're like, no, sorry. And then when you find the boring people you hire because - Frank McSherry: Well, yeah. Eric Anderson: No, I'm just kidding. Frank McSherry: I failed to answer one part of the question, which I should have. Which I think is, are there people who share the passion? I think most of the folks who are interested in Materialize, the folks that we've brought on board, share the passion. There's a little bit of a selection bias going on here, but a lot of the folks are sort of in with the philosophy of like, hmm, it would make sense to build a data processor that can respond to changes in its inputs rather than rewriting things from scratch. So I think folks come on board thinking, this actually makes sense. I think this is a sane product. We should build this thing. And a good hunk of the underlying timely and differential dataflow philosophy is tied up in that. And so we're not in conflict about the right way to build a database or anything like that. And a lot of these people could totally [inaudible 00:22:42] with a bit of ... I don't know. Getting familiarity with timely and differential dataflow probably for them doesn't require a big philosophical change. They're sort of on board with how that works. There's some crazy code. Rust is wonderful, but you can definitely write some complicated looking stuff that it takes a little while to unpack. Eric Anderson: In some ways, people are seeing your ideas for the first time. What am I trying to say? 
These ideas were mostly around researchers and now you're trying to shed a light for the common man, for the rest of the world, to view data processing in a different way. How has that been? And do you find yourself having to tell the story a little bit differently or communicate the value? Frank McSherry: For sure. I mean, back in the academic space ... I do a lot less academic stuff now than I used to, but when you're in the academic space, the communication you try to put together is very much about how clever your work is and how you've seen a thing that other people can't see. And as far as I can tell, no one in the real world actually wants that. It'd be great if the thing that you did actually was invisible. If Materialize looked exactly like Postgres, it just went faster, that'd be great. And you didn't have to read 50 pages of blog posts about how clever anyone was. Frank McSherry: So there's bit of this trying to figure out what's the right way to take the stuff that is most useful to these people and make that just seamless, and maybe walk away from some of the more complicated stuff that you could have a long discussion with someone about how they could change their thinking to better fit with your setup, and instead either meet them halfway or meet them 80% of the way. And it's definitely a different way of thinking. It's pretty rewarding. I don't know. Frank McSherry: When I left Microsoft I did a whole bunch of blog write-y stuff. And one of the things I really liked about that was that the type of communication that you take, the type of approach you take to communicating, I suppose, was a lot less adversarial. It was less about writing defensive texts that assumed that the person you were talking with was going to challenge you, and made it just a bit more friendly and engaging and fun. Sort of bringing other people into interesting ideas. Frank McSherry: And you can make things a bit simpler in that case, but the tone also just changes. 
It's more like, wow, there's a cool thing that you can do neat stuff with now. That's great. I'm not going to tell you that you need to fire half your staff or something like that. You don't need to install our competition's stuff. No, it's just, there's some new cool stuff that you can do and I'd like to show you the new neat stuff, and we can all get excited about it. And then you can decide for yourself what the next step should be in terms of, do you want to use this stuff or not? But let's at least understand that there's some great new opportunities for us. And we're in a privileged position, of course, of being responsible for that. If we didn't have anything new and interesting that would be not such a good position to take in terms of presentation. Eric Anderson: Yeah. Let's talk just a minute about kind of the open source aspect. It sounds like at the beginning with timely and differential, open source was kind of just your way of sharing, it was academic open source. It was mostly your work. Share it with the world so that you can collaborate and kind of talk to others who were interested in the same topics. And now with Materialize, the open source is kind of more commercial in nature. This is your way of getting early adopters and there's licensing differences that go with that. Frank McSherry: Absolutely. So just to be clear, Materialize is source available rather than strictly open source. It's got a BSL license, which converts to open source after a few years. Our goal very much is to try to respect a bunch of the nice features of open source, though we absolutely want to stay away from calling it that because some people are very sensitive. Frank McSherry: But for example, a thing that I feel pretty strongly about is that our employees are our people first and employees second. And so the work that they do at the company should be accessible to them in the future. 
If at some point, let's say we do really well, and they're able to just wander off and get new jobs or something like that, or not work at all, they should absolutely be able to have access to all this stuff that they did. Show people that, walk them through their hard problems, their easy stuff, the stuff they're most proud of. I think that's, for me at least, that's a super important part of the human experience in all of this. Frank McSherry: There's other dimensions too. For sure, it's a different way to try to engage potential customers who are potentially worried that you're doing a weird, unique thing. What happens if you go away or something like that. So it's a bit of insurance. At least one person who is looking at timely and differential and Materialize had the very specific anxiety that there isn't a backup plan. If something breaks, like if I get hit by a bus or something like that, it's not like there is a different version of Materialize that's only half as fast. It's just such a qualitatively different experience than if you tried to port your Materialize analysis to Spark. It's not like instead of taking milliseconds, it takes tens of milliseconds. It takes minutes and more. So there's that sort of anxiety we're trying to relieve a little bit by putting it out there and saying like, worst case scenario, you just keep using this stuff right now and it literally becomes open source after a few years. Eric Anderson: And you've got a big life insurance policy or something along those lines. Just kidding. Frank McSherry: Yeah. Well, that's another ... I'm working really hard at the company right now actually to de-risk that aspect of it. Not that I'm worried about getting hit by buses. Maybe I should be. Don't tell the competition. No, but it's very helpful just from a professional sense to have lots of people who can do the things that you do so that you can go on vacation, for example, or do things like that. 
Eric Anderson: There's a big community around data processing, what was Hadoop and now Spark, that are being introduced. Practitioners who maybe aren't academics are being introduced to Materialize for the first time, and some of those people may be listening to this show. How can they get involved in the project? How can they learn more? What would you tell new people? Frank McSherry: Yeah, well, there's a repo up on GitHub. That's a great place to drop in. There were some issues tagged as sort of good first issues in there, though to be totally candid, if you'd like to participate, it's probably good to socialize that first, to sort of check in and say ... I think this is true of all open source projects to be clear. Rather than showing up with a 500 line commit or something like that, saying please merge, check in and say like, I'm thinking of taking a stab at this particular issue. Does this seem appropriate? Are you planning on pivoting away from that? Frank McSherry: But yeah, for example, there's definitely a bunch of stuff that we could use help with just related to different ways of getting data in and out of Materialize for example. So there's different formats. Some people are pretty passionate about like, let's say Apache Arrow, for example. It's pretty exciting. There's a lot of energy behind that. It's not a thing that we have native support for at the moment. So if someone wanted to show up and say like, oh, I've got it. Don't worry. Totally reasonable to talk through, how do we engage with people who want to do that sort of stuff? Frank McSherry: There's a Slack. There's a Materialize community Slack that you can drop in on. And I think that's a great place, personally. If I'm going to recommend a person to show up and try to get a sense for what's different, what are people using things for, what are the directions that some of these things might go? 
You can get there, if you go to materialize.io, there's a little clicky link up at the top that takes you to the community Slack. Frank McSherry: But to be honest, I think ... I mean, this sounds great if people want to drop in and try to contribute to core Materialize. A thing that I think seems really interesting to me, and that I would probably encourage even more, is building stuff with Materialize. So the experience of using Materialize, I think is pretty new. And I think that first of all, it would be great for us, but I think also really stimulating for a lot of different people with different backgrounds. Just thinking about what would you do with something like this? So even just putting together a new application that uses Materialize as the backend ... Okay, it's not core Materialize that you're working with, but it's sort of getting a sense for, is this appropriate for a certain class of application? You could try to do some neat model serving with machine learning, or you could do some new interactive session stuff that you can write in SQL instead of custom microservices. This strikes me as a cool thing if you're trying to think about, I want to go and build a thing or work on a thing, we're more than happy to sort of link up with these folks and give advice on how to use Materialize and show off a few cool patterns. Eric Anderson: Along those lines, are there classes of things ... Presumably there's common areas where people are putting Materialize to work that might strike a chord with listeners. What might I be working on that I should say, I should see if Materialize can help me with that? Frank McSherry: The rule of thumb that I use, which is a bit flip, but you might as well try out, is if you're doing any sort of data processing, in particular SQL looking things or sort of Spark looking things. If you ask yourself, why am I waiting for my data? What would I do if I got the results immediately? Would I build a different application? 
How could I engage better with the people using whatever I'm building? This is sort of the first question. If you think like, oh yeah, wait, hold on. If those results were always current and always fresh, if I wasn't rerunning a thing, I could give people this different experience here. And there's some cool examples of that. Some of them are ... It depends, I guess, what you get excited about. Frank McSherry: Like some people get excited about ad tech, for example. I don't personally get really excited about ad tech, but some people really do and like, wow, cool. Obviously there's, again, some benefits for folks there. Frank McSherry: There's some really neat things from my point of view. This is maybe a bit dorky, but traffic analysis, collecting data from where people are in cities and moving around. Sort of a bit of a fan of maps and stuff like that. And just thinking through, how can you collect and collate a bunch of information that previously ... I think at the moment you get New York turnstile information bucketed by every four hours or something like that. And if you had that at a much lower latency you could actually see where people were flowing around in the city. Not obviously tracing anybody, really just counting the number of people walking through turnstiles in New York City and sort of checking out heat maps of where do people seem to be going into the system and coming out of the system at any point. Just, I don't know. Frank McSherry: I'm not sure what you would do with that necessarily, but I think that's fine. I think a lot of projects that I've put together also, like, I'm not sure what this is for. Like Sudoku and differential data flow. No one's ever going to need to use this for professional Sudoku solving, but it's sort of fun and you're like, oh. Cool. I learned a new thing, developed a new skill, had to face some problems, but got around them. Eric Anderson: This is something I wrestled with. 
As you know, I worked on some similar products at Google, and telling people why they should care about real-time was always a little tricky because I think we're used to just living in a world of historical ... Like there's the historical system and then there's the kind of operational system. And I think what's compelling here is that you're right. You can rethink use cases and approaches because your operational system's just up to date and you can consume it without further analysis. Frank McSherry: One take on this, for what it's worth is, I mean, it was a hard conversation to have five or 10 years ago. I think where someone said, batch stuff works for me now, why should I lose sleep over stream processing, low-latency, everything like that? And I think a thing that has changed, or we're certainly working on changing, is reducing the amount of sleep you need to lose. So if you can just show up and type some SQL queries and get exactly what you expect out of the system, or in different settings, type LINQ queries or whatever. Frank McSherry: But if we're reducing the pain for, what do you need to do to actually get low latency interactive data analysis, and sort of pivoting it around to saying like, why are you waiting? Why not use low latency real-time stuff? I think of all the potential you have. Isn't it weird that you have to wait 20 minutes to see the action that the user just took? Shouldn't you be able to measure that out immediately? So hopefully with things becoming easier, I guess. It's not necessarily that real-time, maybe it has gotten more important, but there's less of a reason to not try it out and think of the new things that you can do. Eric Anderson: As we wrap up here today, Frank, I'm curious where you go from here. You've kind of been noodling on this idea in some form or another, sounds like for a decade or something. More of the same and kind of sink your teeth into Materialize for the future? 
Frank McSherry: For the foreseeable future there's Materialize. And Materialize is definitely interesting. To be totally clear, the day-to-day there is not me sitting at a computer and typing furiously or anything like that. It's a lot more of learning lots of new things. A lot of them are about being part of a company, being part of a business where one of the things I need to do is go and figure out how to communicate what's cool about Materialize and stuff. And that's, as you can tell, this is an iterative process, learning about that too. But, how to project manage various parts of what's going on inside Materialize that is close to timely differential data flow is totally new to me. No management training at all in the past, and so trying to both pick that up and get some input on that. Frank McSherry: So there's a lot of stuff. I don't know. I'm learning a bunch of stuff right now. And almost certainly since it's still early days for Materialize, for the foreseeable future, doing whatever needs to happen to get Materialize up and moving as best we can. Frank McSherry: It's a lot of fun. The different hats fit differently and you feel a little bad just doing the same thing over and over again. Sort of go into a different research area just to start over. But I'm hoping some Materialize stuff then maybe a little bit of holiday. Eric Anderson: Yeah. In some ways this is the academic stream. You've spent all this time developing new ideas. And for many people that means they publish and move on, and now you've got to kind of see it through in the final life cycle of make it a thing that everyone uses and benefits from. And that's, I think, super exciting. Frank McSherry: I think it's really cool. I've been certainly very, very lucky in a sense to have a thing that's worked out relatively well. 
It certainly was born out of collaboration with a lot of other people, so I don't get to take full credit for it, but it's really nice to be in a space where it turns out to be useful. And as a consequence, you can spend a fair bit more time working with it and leaning on it, as opposed to sort of scrambling around trying to find the next thing. So very lucky in that regard. Eric Anderson: Well, we'll all be watching very closely and we'll make sure to kick the tires. Thanks, Frank, for joining us. Frank McSherry: Thank you very much for having me. I appreciate the time. Eric Anderson: You can find today's show notes and past episodes at contributor.fyi. Until next time, I'm Eric Anderson and this has been Contributor.