Paul: Hi there, welcome to PodRocket. I'm your host, Paul, and with me today is Matt Robenolt. Matt works on the infrastructure team at PlanetScale, and you may have recently seen Matt's blog post titled Faster MySQL with HTTP/3. Matt shared the blog post on Twitter and wrote, "I simply asked the question: could we do better? And turns out, yes we can!" In our episode today we are asking Matt: can HTTP be faster than the MySQL protocol? Matt, welcome to the podcast! Matt Robenolt: Yeah. Thanks for having me. Paul: So, just to kick things off, what is PlanetScale, for folks listening who don't know? In 30 seconds, the shortest version. Matt Robenolt: Yeah. The shortest version. PlanetScale is a hosted, managed database. You are an engineer, you need to put your data somewhere. You don't want to run a database. So, you just go to us, sign up, and we give you a database to use. Paul: Tell us how the idea for this blog post came to be. Matt Robenolt: HTTP as an API to a database has been a little bit interesting to me as an infrastructure engineer. And this is definitely from the PlanetScale side of things, not so much as a user. Scaling MySQL connections has been a challenge for us. MySQL connections are very brutal. For example, if your application's opening up MySQL connections to us, you as a user typically have to manage connection pools and things like that, to say, I need 50 connections going. Or you need a PgBouncer, like you mentioned earlier. And that kind of problem just exists because these connections are stateful, they're long held, and they're expensive on the server. So, as an infrastructure provider, or as a database provider, the challenge with that is, we have to be able to support indefinitely long-held connections. We don't control clients. And in reality, clients never disconnect, which is just, it is what it is.
A lot of drivers just open up a connection, and it stays open until the process dies, which might be days later. As a provider who needs to scale, and go through deploying infrastructure, handling server outages and HA, and stuff like that, long-held connections are very difficult for us, just fundamentally. We do need to sever connections. So, that started the idea of, what if we could do something else? And HTTP kind of popped up as, what if this was actually a first-class citizen? A lot of things built on top of HTTP solve those problems, because HTTP isn't stateful, it's stateless. And a lot of technology is built on top of HTTP these days that is very fast and efficient. The common one would be gRPC. gRPC is just used everywhere, and that's built on top of HTTP/2. So we were like, well, why not? What if that just became our actual interface, or a potential interface, for talking with PlanetScale? So, it kind of came out of a need for us, and just an experiment of what if. But once we started teasing the idea a little bit more, other ideas came out of it, in that we need to be able to support other features, things that we wanted to build that were PlanetScale specific and did not fit the MySQL protocol anyway. One of those product features is PlanetScale Connect, which is our ELT, which is pretty similar to ETL, but I think people are a little bit more familiar with ETL. That works over gRPC, which is HTTP. And that's just an augmentation. It doesn't exist in the MySQL protocol. It's PlanetScale specific, and it works with gRPC. So fundamentally, we needed to be able to support gRPC to be able to add on new features. Which just snowballed into, well, what if we didn't just support gRPC? But the exciting part about it, and what led into my experimentation, was that a lot of people have a gut reaction to these kinds of changes.
You think talking to MySQL over its very refined protocol, which has been battle-tested for a long time, is as good as it's going to get. And in reality, that's not necessarily true. I think of HTTP as similar, and as popular, as JSON these days. If you get down to it, JSON, as a serialization format, is very bad. It's very slow on paper. There are a lot of different binary encodings that are a lot better than JSON. But because JSON and HTTP are so heavily utilized, they have gotten so good. Parsers for HTTP, parsers for JSON, are just really, really good. Paul: Reminds you of why the V8 engine is so good; it has been optimized to smithereens. Matt Robenolt: Yeah, exactly. Just because it's so heavily scrutinized. And that kind of got me thinking. Especially these days, a few extra bits or bytes don't actually matter. Slinging even an extra kilobyte back and forth is not something that's going to be tangibly different. So, I set out to see if we can be on par with MySQL by using an HTTP transport. And my goal wasn't necessarily to say that we can do better, but to say that if we challenge our assumption that MySQL is great, that MySQL has this extremely fine-tuned protocol, then if we wrap it in HTTP, we could just not be worse. And that was kind of the goal I set out with, to see if we could do that. And I was targeting the modern HTTP versions, because those come with a lot of benefits that HTTP/1 doesn't have. Specifically, the overhead of headers largely goes away in HTTP/2 and HTTP/3. A bulk of what people think of as the overhead of HTTP is that there's a big blob of headers that goes along with your requests and responses, which is a lot smaller in HTTP/2 and 3. But yeah, then it turns out, oh, this actually is pretty comparable. And in a lot of cases it has benefits that were kind of unexpected.
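The "could just not be worse" comparison Matt describes can be sketched as a tiny latency harness: time the same trivial query many times through two transports and compare the distributions. The query functions below are hypothetical stand-ins, not real clients; in an actual test they would be a MySQL-protocol client and an HTTP-based client issuing the same `select 1`.

```python
import statistics
import time

def bench(query_fn, n=1000):
    """Run query_fn n times and summarize per-call latency in milliseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        query_fn()
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    return {
        "p50": statistics.median(samples),
        "p99": samples[max(0, int(n * 0.99) - 1)],
    }

# Stand-ins: swap in real clients here to compare transports.
mysql_result = bench(lambda: None, n=100)
http_result = bench(lambda: None, n=100)
```

Comparing medians and tail percentiles, rather than single runs, is what makes a "comparable" claim meaningful for queries this fast.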
Paul: Do you find that between HTTP/1, 2, and 3, the connection speed between the client and the database can change which version of HTTP works better, like the latency between the two? Matt Robenolt: So, something I glossed over, but touched on in the blog post, is that a big part of the connection, what people would perceive as latency, is the TLS connection. And I think that's relevant to the actual MySQL protocol, too. Everything we do as a service provider is exposed over the internet, so we obviously require TLS for encryption of the traffic. And older TLS versions, say pre-TLS 1.3, require multiple back-and-forths to establish a connection, to agree and say, okay, here are my ciphers, and so on, and establish a secure connection. The modern stuff can do that in fewer round trips. So, I guess the summary here is, this is a similar problem to what CDNs face. For a lot of CDNs, the best perceived performance comes from terminating TLS closer to you, because the latency is the round trips of establishing a TLS connection. Paul: And that's like a Cloudflare proxy, right? Matt Robenolt: Cloudflare? Yeah. I mean, all the CDN providers. Cloudflare, Fastly, they all do the same thing, in that they can terminate TLS closer to you, which is most of the perceived latency of establishing a connection. So, part of the thing that I was exploring here is that MySQL clients, so if you're in Python using MySQL, or in Ruby using MySQL, TLS support in these is very poor. Because the primary target is, I have MySQL running on my network somewhere, and the actual network is unencrypted, because it's a private network for the most part. So TLS is not very robust in clients. In a lot of cases it's non-existent, because they just don't need to support it.
So, as a side effect of that, the much more modern TLS support is kind of lacking. You're not going to get TLS 1.3 in the vast majority of your clients. So, we can't just say, just use TLS 1.3; we have to support what people are using. But with HTTP comes a lot of modern stuff that you get for free. And usually when things are using HTTP/2 or HTTP/3, they come with a more modern stack across the board, and don't have a lot of that legacy cruft. HTTP/3, for example, requires TLS 1.3; that's just fundamentally how it works. So you inherently get faster stuff just by using a more modern stack, if that makes sense. Paul: Yeah, it does. And just to highlight the difference that you're talking about here. A lot of people might discredit HTTP because it used to send headers, it used to send extra plain-text information, but your argument is, let's look one level deeper. When computer A wants to talk to computer B, they need to do this handshake, they need to send the SYN, the ACK, and do this dance, and that's what's taking a long time. Matt Robenolt: That's a part of it. Yeah. And I think that's the biggest thing my blog post reveals, and I think I didn't do the best job explaining why that's important. Definitely, the thing that stands out most is the initial connection time. But for the most part, you can argue that you pay it one time. You pay it at initial connection, and then, once you're established, the difference between TLS 1.2 and 1.3 doesn't really matter so much. Which is why I asked, even outside of that, can we maintain performance with the MySQL protocol? So, even though the benchmarks weren't showing that we're necessarily faster on just doing a select 1, you're not going to optimize the select 1, right? You're not going to get double the performance of a select 1. There are only so many bytes going over the wire.
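The handshake round trips discussed above can be put into rough numbers. As an illustration, a full TLS 1.2 handshake takes two round trips after the TCP handshake, TLS 1.3 takes one, and QUIC folds the transport and TLS 1.3 handshakes into a single round trip; the 100 ms RTT is just an example figure, not a measurement from the blog post.

```python
# Rough connection-setup cost before the first query can be sent,
# modeled as round trips multiplied by network RTT.
def setup_cost_ms(round_trips: int, rtt_ms: float) -> float:
    return round_trips * rtt_ms

RTT_MS = 100  # illustrative long-haul link, roughly EU <-> US

# TCP handshake (1 RTT) + TLS 1.2 full handshake (2 RTTs)
tls12 = setup_cost_ms(1 + 2, RTT_MS)  # 300 ms
# TCP handshake (1 RTT) + TLS 1.3 handshake (1 RTT)
tls13 = setup_cost_ms(1 + 1, RTT_MS)  # 200 ms
# QUIC: transport and TLS 1.3 handshakes combined (1 RTT)
quic = setup_cost_ms(1, RTT_MS)       # 100 ms
```

This is why the savings show up mostly at connection time: once the session is established, the per-query path is largely the same regardless of TLS version.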
But I wanted to heavily focus on something that's kind of a best-case scenario. Those are super fast queries. There's no data, so it should maximize the relative bloat from protocol overhead. If there is measurable bloat, it would show in a select 1, versus something like a thousand rows, or 10,000 rows, where our bloat is much, much less. Paul: One thing that's interesting, that I'm immediately thinking about, is you're putting the onus on the developer, or the implementer, to manage the connection this way. That's a fundamental difference. You're saying, yeah, all the benefit of the TCP connection, the TLS, excuse me, once you make it, it's fine. You don't have to worry about it. Well, then you have to manage your connection correctly. You have to pass it around correctly. You have to open and close it, and be mindful. Matt Robenolt: Yeah. So, part of what I was hoping to accomplish was showing that using HTTP isn't worse, which to me was a big win. If we can say that this is comparable, it becomes something we can explore, whether we can actually write drivers based on HTTP, because there are other tangible benefits. It's arguable whether that benefit is good for you, whether you care about cold startup times or not. But if we can say that doing this is not worse, and has these tangible benefits, even if those tangible benefits aren't applicable to you, that's an easier sell. If it were worse across the board, you do your select 1s and it's worse, but we have a faster cold startup time, that's a harder sell. So, the goal was seeing if we can be comparable to MySQL, and have some benefits in the edge cases, which we do. Cold start time, and large payloads, which was the other extreme I wanted to test, are tangibly better.
In environments like serverless, we get other tangible benefits, because we're potentially traversing larger geographic distances. Traditionally you were next door to your database, whereas when you deploy serverless, you potentially go from Europe to the US, or wherever. And those are the problems that HTTP/3, and QUIC specifically, were attempting to solve: optimizing for those cases. Those kinds of things are appealing to us. Paul: And QUIC is just another communication protocol, for folks unfamiliar. Matt Robenolt: Yeah. So QUIC is... The best summary is that it's kind of like a modern TCP. It's built on top of UDP, which gives people some assumptions about what that means. But mostly, if you think of UDP as a primitive, QUIC is kind of a competitor to TCP. In theory, it should exist in the kernel, and just be something that is provided by the system. But because it's new, it's not, and it needs to be brought along as a library. Maybe in the future, though. Paul: Do you see the work that you and the team have been doing as potential grounds for how we, in general, whether in this context or not, communicate with databases? Matt Robenolt: Yeah, that was also part of the goal here. Once we were seeing this isn't bad, we wanted to see what we could do to push the envelope. Obviously, some stuff like drivers based on HTTP would be the first step, and that's what got into the serverless implementation. Another use case for us, which is very niche to us, is a feature where you can fire up a MySQL console within the product. That is just going from the browser, talking to your database over our serverless driver, using the HTTP API. Because we're Fetch compatible, it can just run within the browser, and you just make queries to us, without going through an intermediate hop or anything like that. You can talk directly to the database.
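A minimal sketch of what "just make queries over HTTP" looks like from a client, assuming a hypothetical JSON-over-POST endpoint. The URL path, payload shape, and auth scheme here are made up for illustration; they are not PlanetScale's actual API.

```python
import json
import urllib.request

def build_request(base_url: str, token: str, sql: str) -> urllib.request.Request:
    """Package one SQL statement as a single, self-contained HTTP request."""
    return urllib.request.Request(
        f"{base_url}/query",  # hypothetical endpoint path
        data=json.dumps({"query": sql}).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def http_query(base_url: str, token: str, sql: str) -> dict:
    """Execute the query; every call is an independent request/response."""
    with urllib.request.urlopen(build_request(base_url, token, sql)) as resp:
        return json.load(resp)
```

Because each query is a stateless request/response, there's no socket for the caller to pool or keep alive, which is what lets the same pattern run from a browser (via fetch) or a short-lived serverless function.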
But with HTTP/2 and HTTP/3, we're kind of fighting an uphill battle, because HTTP/3 is especially niche, which is what made this a little bit more interesting to test. If we can provide tangible evidence that HTTP/3 has benefits over HTTP/2, even in our use cases, we can help drive adoption and be able to say, hey, this is actually better. You can't use it today, because no one implements this, but if you did, here are tangible benefits that you can get from it. And that's what made this a little bit more appealing to us. With service providers like Vercel and whatnot, when you're relying on their infrastructure and their runtimes to do the HTTP requests, you're just using their APIs, so you're kind of at their mercy. You can't just tell Vercel, do HTTP/3. You can't bring in your own HTTP/3 library and use it. So, part of this is a compelling thing to be like, hey, can you work with us, or can we give you a compelling reason to implement that? Because in a lot of cases it's not really worthwhile for people; the marginal gains aren't that tangibly beneficial. But databases are expected to be fast. And cold startup times in serverless environments do have a really big impact. So, if you can cut down latency, especially from, say, the EU to the US, and remove a hundred milliseconds from that startup time, that's very tangible. But there's only so much we can do in this world, because we don't control all of that runtime, which is why this was a big proof of concept of what if. And that's where we're at now. We have some numbers, we have data, and we're like, oh, that's pretty cool. Paul: I know in Cloudflare Workers, I can't even use HTTP/3, to my knowledge. Matt Robenolt: You probably can on their load balancer.
So, you can probably communicate with it over HTTP/3, but nothing that it communicates with, on the way out, is going to do HTTP/3. Paul: Gotcha. Okay. Yeah. And I'm sure there are a lot of examples of these limitations. Matt Robenolt: If you were just to do something in Python, it's not trivial to get HTTP/3, because there's just not a lot of demand. There are not a lot of things that would benefit from using it. It's a chicken-and-egg thing. The things don't exist because there's not demand for them. So, if we can help drive demand, and be like, hey, this is relevant outside of a web browser, we can maybe get libraries and be able to use it. It's definitely quite a bit more forward-thinking than a lot of things. Paul: I mean, the biggest gain, it seems, has been found from HTTP/1 to 2, though. Matt Robenolt: Yeah, absolutely. Paul: Because that's where the protocol gets significantly leaner, and a lot of things support 2, well, more than 1 at least. So, you can find that a little bit more commonly. So, do you feel like people are going to move off of, if I'm just going back to the URL example, mysql://, that's the protocol we're talking about. Do you think that in the future people will just be using HTTP to set up normal frameworks? Is that the hope that you would have? Matt Robenolt: Personally, I would hope for that, because that's what I like. There are a lot of benefits for us as a service provider, so there are a lot of selfish reasons why I think it would be nice. Being a stateless protocol allows us to... Just like your normal connection to Google, or whatever you're doing, you are not interrupted by Google turning off a server and turning on a new one, or doing auto-scaling kinds of things. HTTP just isn't impacted, because we can distribute those requests around. So, for selfish reasons, it is very beneficial for us to be able to do it.
I think in a lot of niche use cases like serverless, we have to as a necessity, but we can improve it by doing HTTP/3. So I think, to me, my goal is that this stuff should be transparent to you as a user. You shouldn't have to know, am I using HTTP or MySQL? We should just give you something that works. And if we were to publish, for example, a PlanetScale driver, it would just use what we want to use. And you wouldn't necessarily know, unless you looked into it, that it was using HTTP. Paul: Do you see that being, in any way, a hybrid world in the future? Matt Robenolt: Yeah, MySQL's not going anywhere. At the end of the day, it is just so ingrained in the culture that it's not fundamentally going anywhere. We're never going to convince all of our customers to use our specific driver. There might be cases where we can say, hey, you're having poor performance; if you switch to this, it's better for you. But in reality, we have to support it. It's never going away. I don't want to say never, but probably not in my career. Paul: You mentioned in the blog post you were surprised by some results. Can you share more about what you were expecting? Matt Robenolt: Yeah, I mean, I was expecting... I knew the benefit of the TLS handshake; that's an obvious one. But I didn't really anticipate the performance parity, I guess. The fact that we were able to not have a tangible difference in a select 1 was something that really did stand out. I thought that example, specifically, would show our weakness. I thought that very specifically because, like I mentioned earlier, it's going to have the highest bloat-to-data ratio. It's the most overhead of headers relative to your select 1 that's returning back a one. So, that was the obvious one that I thought would be worse, and the fact that it wasn't was surprising.
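The bloat-to-data ratio Matt mentions is easy to make concrete. The byte counts below are illustrative round numbers, not measured values from the blog post:

```python
def overhead_ratio(protocol_bytes: int, data_bytes: int) -> float:
    """Fraction of total bytes on the wire spent on protocol overhead."""
    return protocol_bytes / (protocol_bytes + data_bytes)

# `select 1`: assume ~100 bytes of framing/headers around a few bytes of data.
tiny = overhead_ratio(100, 5)            # ~0.95: almost all overhead
# A 50 MB result set with the same ~100 bytes of headers.
huge = overhead_ratio(100, 50_000_000)   # ~0.000002: overhead vanishes
```

Which is why select 1 is the adversarial case: if HTTP framing were going to show up anywhere, it would show up there, while any large result set drowns it out.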
Paul: And just to clarify, for myself as well, the reason why we expect that select 1 to have the biggest bloat is because the potential benefit of it being stateless is minimal, and the overhead to actually do it is there. You still have to pay it. Matt Robenolt: If you were to put this in a ratio of data bytes versus protocol bytes: the rest of HTTP, the headers and all of that stuff, occupies X amount of bytes, and your data itself is Y amount of bytes. Right? If you think of the MySQL protocol, even if you're not familiar with it, it's pretty lean. There's not much protocol overhead; they're just well-defined packets. Whereas with a larger response, if we're returning back megabytes of data, the ratio of headers and stuff kind of goes away. Adding in 10 bytes or a hundred bytes of headers is not something that's going to make a difference when it's a 50 megabyte response. The ratio's so small. But on a select 1, when you're effectively returning back one row, one column, even a small number of header bytes is going to be a large ratio relative to the data. So, that was the extreme angle: wanting to make sure that wasn't too bad. And I think, if we were to run this test 20, 30 years ago, it would have been substantial, because networks weren't that fast and computers were a lot slower. But at this point, it's basically still going to fit in one network packet. It's still small enough, especially with HTTP/2 and 3, because you have header compression and all that, that there's really not that much bloat. The other side I was shocked about the results on, and these are hard to quantify, because the results being the same is what's shocking, not so much that they're better, is the connection itself. So, I did two testing scenarios. One was from my laptop, which is intended to be the worst-case scenario.
I'm like on wifi from my personal computer to something that is geographically in the next state. It is meant to be kind of a janky worst case where I knew HTTP would excel. Or, specifically, HTTP 3 in that case. But testing the basically local hosts of, I have my database in the same data center as my running application, I expected some of these things to start coming up because it's already so fast. And it still didn't have any tangible overhead, which it sounds not very exciting, but it is very exciting when the result is kind of intuition is to be worse, if that makes sense. And the fact that we weren't worse is what makes all of this viable, in that we're not worse for the best case scenarios, and were better in the edge cases. I anticipated the edge cases to be good. Didn't anticipate the fast cases to be, I expected them to be slightly worse, but I didn't expect them to be comparable. Paul: Did you feel shocked when that happened? Were you just like- Matt Robenolt: I was kind of just internally DMing some of our VP of engineering and the CEO of being like, huh, this is actually pretty good. And that's when we were kind of like, we should talk about this. This was very much me just experimenting, and being like, I'm curious what this will do. But it became pretty clear that this was neat, and a lot better than we thought. Paul: Matt, thank you for walking through your, what I mean, we can call them the escapades into the research of HTTP, and interacting with MySQL. If you specifically want to look at some of the things that Matt was talking about, because he's saying, oh, I ran this test, and a hundred milliseconds in those tests, these results that Matt, and his group, went over are available online. You can look at them in a blog post that we can link under the podcast. 
If you wanted to suggest that people look into this field, in general, for themselves, are there any other projects out there that you found interesting when you were getting into this, or other GitHub projects? Matt Robenolt: This is relatively unprecedented, which is what's a little bit unique about us talking about it. There are other ways people solve the serverless issue; we're not the first database that you can query over HTTP. Things like Firebase kind of have their own APIs, but Firebase is not SQL; they don't have that kind of stuff. Other people have implemented it using WebSockets and things like that, which don't give you all the benefits that we have. I'm not going to get into why, but it's fundamentally a different solution. Our take is pretty unique. So, I'd say no, there's not really. I think the relevant stuff is learning HTTP/3 and learning about QUIC. Those are both just interesting technologies. They're extremely widespread these days for web browsers, but not so much implemented outside of that. Server-side stuff just doesn't really exist, and it's all very immature, which is something I brought up in my blog post, too: in some of the edge cases, the performance of HTTP/3 was subpar relative to HTTP/2. And, without me having invested the time into researching it, I think a lot of that is performance tuning of libraries and implementations. TCP is extremely refined. It's been around forever. It's highly, highly optimized. QUIC and HTTP/3 are all very new. So, I'm assuming there's a good amount of room for improving their implementations. Paul: But at face value, they're already, arguably, improvements over their predecessors. There's just a lot of room to improve.
Matt Robenolt: Yeah, and I think that's part of what's appealing about this, too: the amount of effort that I've put into this is relatively low for these kinds of results. Nothing has been put into actual performance metrics, analyzing memory allocations, and things like that, to really fine-tune this. This was kind of just a quick proof of concept, and it got very good results without really doing anything. Paul: Those are the best results. I like results where I don't do much, and I get really good results. Matt Robenolt: Which is why it's appealing. 'Cause imagine if we invest a few years into this; you could probably get a lot bigger gains. Paul: Well, Matt, thank you for your time, and for talking about the research. Like you noted, this is research that you and your group are doing, and hopefully you're going to help change the database world as we move on in the next five or 10 years. Matt Robenolt: Hopefully. We'll see. We're trying. Paul: I'm rooting for you.