Anna (00:06): Welcome to Zero Knowledge, a podcast where we explore the latest in blockchain technology and the decentralized web. The show is hosted by me, Anna. Fredrik (00:18): And me, Fredrik. Fredrik (00:26): Today we sit down with Howard Wu to talk about zkSNARKs, libsnark, and how you distribute proof generation using DIZK. Anna (00:43): Hi, Fredrik. Fredrik (00:44): Hi. Anna (00:45): How's it going? Fredrik (00:46): Pretty good. Anna (00:49): So today we're going to be talking about zkSNARKs. This is the second episode in the zero knowledge series that we started, I guess, like two months ago. Fredrik (01:00): Yeah, I don't know, it's been a while. It always feels hard to measure time in the blockchain space. But yeah, I mean, we did the intro to zero knowledge proofs episode, kind of covering the basics: what a zero knowledge proof is, kind of how to think about it. But it wasn't very practical. It wasn't an episode that, like, talked about any particular implementation or anything. And so we want to dig in now with Howard Wu on some real practical stuff. Anna (01:32): So yeah, we're really lucky to be sitting here with Howard Wu. Howard was one of the kind of original guys at Blockchain at Berkeley. He's a libsnark co-author, and he's now acting as a managing director of Dekrypt Capital. Howard, thank you so much for joining us to talk about zkSNARKs. Howard (01:50): Yeah, it's my pleasure to be here. Thanks for having me. Anna (01:53): To kick it off, I think it would be really cool for us to hear a little bit more about you. Tell us about your story. Howard (01:58): Yeah, I think that'd be great. So my background is in Computer Science and Applied Math from Berkeley. I first got into the blockchain space in 2011 through Bitcoin mining at the time. And around 2013, 2014, when all of the different altcoins started popping up, I got really fascinated by the technology itself and understanding, you know, the trust models: what does decentralization mean? What kind of features does this enable? What are the use cases? And that's when I really started to go down the rabbit hole on research, and I worked under two professors. One is Professor Alessandro Chiesa, who is a co-founder of Zcash, and the other is Professor Dawn Song, who is a director of IC3 and now a founder of Oasis Labs. And my work has primarily been around things like consensus mechanisms and, more generally, cryptography. Howard (02:52): And one particular area of interest for me has been zero knowledge proofs. This is an area where you can do many fascinating things, and that's where I've helped with different initiatives, such as libsnark, which is a C++ library for zkSNARKs, as well as other libraries like libfqfft and libff, which are kind of the arithmetic and mathematical basis for libsnark. Recently we released a library called DIZK, which we can also cover very soon. But during my time at Berkeley, I also helped with getting Blockchain at Berkeley started, and that's been a fun endeavor for me. After I finished my undergrad, I went to work at Google, and it was fascinating to kind of learn how to think about distributed systems from their perspective, but it was very clear Howard (03:44): they didn't have that same notion of decentralization. And so I came back to Berkeley to do my masters in cryptography, and ever since it's been a very fruitful endeavor. And, you know, I'm now also starting Dekrypt Capital with my partners.
And for us, this is really meant to be one of those building blocks for getting this technology out to mainstream adoption. We really look at it from a pain point perspective, and so that's where things like privacy and scalability and interoperability are most poignant to us. And we really try to work with founders to, uh, educate them, but also to support them on their journeys as well. Anna (04:21): Well, I don't think we could have a better guest to go into zk-SNARKs with, and I'm so glad that you're here with us to explore this part of the space. Howard (04:31): I appreciate it. Anna (04:32): Maybe we could start with the basics: what is a zk-SNARK? Howard (04:37): So I think to really get into what a zk-SNARK is, it's probably good to start by saying, what is a zero knowledge proof? And just to kind of remind the audience members, you know, a zero knowledge proof is what I would call a privacy-preserving cryptographic proof of computational integrity. That's a lot of jargon to basically say: I can't tell you the secret, but I can prove to you that I know the secret. So for example, perhaps I'm looking at, like, a Sudoku puzzle, and it's one where, you know, the function itself is going to be the puzzle, but the claimed output would be that all rows and columns contain some ordering of one through nine. And you have two parties in this scenario: you have a prover and a verifier. Both of them know this particular function F, and they also know this expected output of one through nine. Howard (05:32): But anyone can basically come and say, hey, I know the actual solution, and that is your private input X. And so someone who claims to know this will be the prover and say, all right, let me generate some type of proof. And they fill in, you know, their answers into some type of a circuit in this case, and it will output a proof. The verifier then looks at the proof and runs it against that public function F and also the expected output, and will at some point very clearly believe that this person really knows the answer. One of the great features about this is that you can then convince someone else that you know how to solve a Sudoku puzzle without actually showing them the answer key. And that's kind of the beauty around it. Anna (06:16): I love that you've just explained that. We actually used that example in the earlier episode, but it took us so much longer to say what you just said very concisely. So thanks for that. Howard (06:27): No problem. So in this context, this is actually a notion that was first introduced in 1985 by Goldwasser, Micali, and Rackoff. This was one of the first papers that really started to introduce this notion of a zero knowledge proof. And the early constructions were actually quite costly. They were, one, not succinct, meaning the proofs themselves were just massive, especially for that day and age when computers were also quite primitive, so to speak. It just wasn't something that was very computationally feasible. But additionally, for a prover to convince a verifier, it was an interactive protocol, meaning they would talk back and forth, back and forth with each other, playing this kind of challenge-and-verification game to actually convince the verifier that the prover really knew the answer. Howard (07:23): And these are some of the tradeoffs in the early constructions that were very clearly a bottleneck for adoption.
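To make the Sudoku example concrete, here is a plain Python sketch of the public function F that a circuit would encode: it checks that a claimed solution respects the published clues and that every row and column contains one through nine. The filled-in grid is the prover's private input; a zkSNARK would let the prover convince a verifier that this function returns True without revealing the grid. This is only the statement being proven, not a proof system, and the function name and grid encoding are illustrative.

```python
def check_sudoku(puzzle, solution):
    """Public function F: does `solution` solve `puzzle`?

    `puzzle` is a 9x9 grid with 0 marking blank cells (public).
    `solution` is the prover's private witness: a fully filled 9x9 grid.
    Returns True iff the solution preserves the puzzle's clues and every
    row and column contains the digits 1..9 exactly once.
    """
    digits = set(range(1, 10))
    for r in range(9):
        for c in range(9):
            # Clues already filled in by the puzzle must be preserved.
            if puzzle[r][c] != 0 and puzzle[r][c] != solution[r][c]:
                return False
    for i in range(9):
        row = {solution[i][c] for c in range(9)}
        col = {solution[r][i] for r in range(9)}
        if row != digits or col != digits:
            return False
    return True

# The prover knows `solution`; the verifier only ever sees `puzzle`,
# the claim "check_sudoku(puzzle, solution) is True", and a short proof.
```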
And what was really nice was, you know, in the past decade we've seen some huge advances by kind of taking a different route and a different approach to looking at zero knowledge proofs. And that's where the zk-SNARKs really were born. These are proofs that are extremely succinct. They're very short in size; we're talking about, like, you know, one hundred or two hundred byte sized proofs now. And they are also non-interactive, meaning the prover could come up with a proof and then give it to the verifier, and really any verifier could come and check this for themselves at any point in time from then on. It's a very nice notion, and, you know, additionally, these are proofs that are publicly verifiable and have all these nice properties that can be associated with them, especially in the case of blockchains. And SNARKs, therefore, are, you know, it's jargon, but zk-SNARK stands for zero knowledge succinct non-interactive argument of knowledge. Howard (08:27): What that means is, basically, you know, I can convince you of some fact without having to tell you the underlying fact itself, and I can do this very, very concisely without having to interact with you continuously. I just kind of generate some type of a proof based on this problem and give it to you. And I think that this is a very beautiful notion and also a very powerful one. Fredrik (08:53): When you're talking about zk-SNARKs and zero knowledge proofs, I mean, they're not the same, right? Like, a zk-SNARK is a technique, or a zero knowledge proof system. It's not a particular proof; it's not a proof of any particular program or aspect. It's a way to generate a proof, right? So how does a zk-SNARK, like, set itself apart from these other solutions? Basically, if you take the Sudoku problem, could you express that as a zk-SNARK? Howard (09:26): Yeah, absolutely. So a zk-SNARK is one type of zero knowledge proof, and a zk-SNARK basically is a collection of these properties. So, you know, zero knowledge is certainly one of them, and you have these other properties as well that are intrinsic to the particular protocol that's being outlined here. This protocol itself is formalized as part of this kind of prover-verifier system that we discussed. And in general, zk-SNARKs certainly are just one branch of zero knowledge proof systems. There are also other forms; like, as you've probably seen recently, there's been a lot of interest around STARKs. STARKs themselves are another form of proof system where you can also have zero knowledge as a property. This is a very fascinating endeavor right now because of these concerns, especially in the blockchain space, around a trusted setup, which is intrinsic to zk-SNARKs, and those are areas that certainly I'm happy to also talk about. Fredrik (10:26): Maybe we can go into a little bit of this trusted setup, because I think it's something, when you quickly just kind of Google zk-SNARKs or Zcash or what have you, it's one of those things that's often brought up as a negative, or like as a problem with this space. What is the trusted setup and why is it necessary? Howard (10:49): Yeah, so in a zk-SNARK construction, you have three components that you're reasoning about. First is a setup, second is a prover, and third is a verifier. The setup basically is a process to establish the game that you're playing here.
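Most zk-SNARK libraries expose these three components in roughly the interface shape sketched below. The names and types here are hypothetical Python stand-ins, not any particular library's API: setup runs once per circuit and produces a proving key and a verification key, the prover consumes the proving key along with the public input and the private witness, and the verifier needs only the small verification key, the public input, and the short proof.

```python
from dataclasses import dataclass

@dataclass
class ProvingKey:       # large, used only by provers
    data: bytes

@dataclass
class VerificationKey:  # small, published for anyone to verify with
    data: bytes

@dataclass
class Proof:            # succinct: a few hundred bytes in schemes like Groth16
    data: bytes

def setup(circuit) -> tuple[ProvingKey, VerificationKey]:
    """Run once per circuit: establishes "the game being played".

    In a zk-SNARK this is the trusted setup; the randomness used inside
    it (the "toxic waste") must be destroyed.
    """
    raise NotImplementedError("stand-in for a real proving system")

def prove(pk: ProvingKey, public_input, private_witness) -> Proof:
    """Prover's side: uses the secret witness, outputs a short proof."""
    raise NotImplementedError("stand-in for a real proving system")

def verify(vk: VerificationKey, public_input, proof: Proof) -> bool:
    """Verifier's side: fast, and needs no interaction with the prover."""
    raise NotImplementedError("stand-in for a real proving system")
```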
So it's what generates a set of publicly viewable keys, for example a proving key and a verification key. And what that does is basically set the stage for all of the provers and verifiers for the remainder of time to come on this particular problem. That is a huge notion that in the academic world is not so much of a concern, because in the world of theory, you just say, oh, I will have, you know, someone run a secure setup, and all the provers and verifiers can continue to interact, no problem. You know, what we see now in practice, in the real world, is there's very clearly a question of, hey, who is going to run the setup? You know, someone has to run it. Howard (11:47): And that's where there have recently been some concerns around basically what is a setup, and who runs the setup, and how do we ensure that this is a secure setup? And I would say that there are multiple ways to address this problem. You know, the first is we've seen how Zcash has done this originally. There was a handful of people, I believe six, that had come together and said, you know, let's make this setup happen. And they were geographically distributed. They all went through their due diligence on buying computers in what I would call a random fashion, and finding a random place to kind of operate and spend a few days to generate these types of keys that they would then use to combine and actually generate the proving and verification keys. Howard (12:35): I think that's certainly one way to do it. And the fact that it was just six individuals involved at the time has raised some concerns from folks in the community. And that's where Zcash has said, you know, now we are transitioning to a new type of zk-SNARK, and in the process, we are also going to do a multi-party computation on our trusted setup. So what this means is, well, a multi-party computation is when you have, like, a group of individuals that are each running some piece of a computation. What Zcash has now done is say, okay, we have this Powers of Tau, it's what they're calling it, and it's a component that's inside of this setup. We're going to have that be a process where anybody can run it. And if you run it, so long as you're honest with yourself, you will believe that the rest of the process is also secure. The reason for that is, in a multi-party computation, so long as one person in this case is honest, then the system itself will generate an output that is also honest. Anna (13:38): So actually, out of the six, you said six, you'd only need one of the six to actually be honest in the original trusted setup of Zcash? I mean, there are some really great stories about the lengths that people went to to make sure that this computer was unique, that there was no way for them to be hacked or found or followed or anything in that original trusted setup. Did they all have to be honest, or was that also one of those examples where only one needed to be honest for it to 100% be considered legit? Howard (14:17): So in that context, you know, as long as one person assured themselves that they were not going to keep any remnants of this computational process, then the system itself would be sound. And I think that that is a very powerful notion. And certainly there was some footage, if I remember right, of people, like, burning, you know, machines, and that's crazy. I would say with those...
Fredrik (14:43): Pretty funny. Howard (14:44): ...With this current round, where anyone can participate, by the way, there have also been some really, really awesome photos I've seen. Like, there was one guy who had, you know, their computer lined with tinfoil and, like, was running this whole thing in, like, a basement. And it's just really, really like... Fredrik (15:01): I saw someone take their computer on a plane and buy a bunch of SIM cards to upload the data. Anna (15:07): Drive into Canada. Fredrik (15:08): I actually have a friend who participated in the Powers of Tau thing. It was pretty funny. I mean, so whatever negative things you can say about this, it is kind of cool to have this trusted setup thing. Like, if it's six people, I can understand the arguments of, like, yeah, you could bribe these six people and whatever. Now I think they have over a hundred participants in Powers of Tau. And, like, you don't know beforehand who these participants are, so you can't necessarily bribe them, et cetera, et cetera. Like, you can be pretty damn sure that one out of a hundred is going to do this thing correctly. And the nice thing about it, though, is it's like a community building effort: you kind of get together and you run this process and kind of build up this little community around it, isn't it? Anna (15:58): It does sound awesome. But I guess the downside is, just imagine the organization that this takes every time you want to do one of these things. And I wonder, is that sort of the reason that they're thinking of moving away from it? Like, if it is so cool and fun, and now very trusted, why not do it? Fredrik (16:17): I would say, as cool as it is, and as community building as it is, it is still also probabilistic, right? Like, from their point of view. Because, you know, these are like cryptographers who are looking at it, and, you know, the cryptographer's view of the world is very different from just a conventional notion of the world. It's like, everyone is guilty until proven innocent. And that's where, you know, for a cryptographer, having a hundred or 200 or even a thousand participants in one round means that you still have, you know, a probability that there is some collusion happening, and it's much better if you can just remove that problem altogether. So this is more on, like, the perspective, so to speak, than the actual kind of practice. Anna (17:06): It is kind of cool to hear a little bit more about that mindset and what's driving this push towards less of the ceremony and trusted setup, and more of these... like, what you were about to say, or maybe you did: what is replacing the trusted setup then? Like, what is actually coming in its place? Howard (17:24): Yeah. So there's been a lot of effort on the research front to develop things like STARKs. STARKs are basically succinct transparent arguments of knowledge, and zk-STARKs themselves enable the exact same game to be played as before, with a prover and a verifier. But this time, the setup itself is very simple. You basically only have a dependency on some random coins and the definition of some hash function that you choose.
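As a rough illustration of what "some random coins and a hash function" means in practice, the toy below derives public field elements purely from an agreed-upon seed string and SHA-256. Anyone can recompute the same values, so there is no secret for a setup ceremony to protect or leak. Real transparent systems such as STARKs are far more involved; the seed, modulus, and function name here are made up for the example.

```python
import hashlib

# A made-up prime modulus for the example; real systems fix their own field.
MODULUS = 2**61 - 1

def public_coin(seed: str, index: int) -> int:
    """Derive the `index`-th public field element from a public seed.

    Because the derivation is just hashing public data, any verifier can
    recompute it: there is nothing for a setup participant to hide or leak.
    """
    digest = hashlib.sha256(f"{seed}|{index}".encode()).digest()
    return int.from_bytes(digest, "big") % MODULUS

# Everyone who agrees on the seed gets the same "setup" parameters:
params = [public_coin("example-protocol-v1", i) for i in range(4)]
```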
And everybody understands this notion and can use it without having to actually rely on a generation process for these structured and very large keys that are going to be passed around. It's not only a computationally and cryptographically lighter assumption that you're making, but it's also a more trustless way to do this, especially in the case of blockchains. Fredrik (18:23): But the trade-off right now, right, is that STARKs are way harder to generate, isn't that right? Howard (18:29): The technology certainly is very young. It really sits in the realm of academia today. And I would say that this is a very similar case to any previous advancements that were made. If you look at how zk-SNARKs came about, in the early constructions they were also costly: you know, you have proofs that are large, and they can take longer than maybe what would be ideal. And over the years, people refined this notion and came up with newer protocols that have either different cryptographic assumptions or lighter cryptographic assumptions, and you basically massage the techniques down to something that is much more efficient and user friendly. And that's kind of the state of the art itself. And here we see a very similar progression happening, albeit, I would say, much faster as well. Anna (19:19): Let's go back to the zk-SNARKs and talk a little bit more about them and how they look. Are there, like, some key properties of zk-SNARKs specifically that are different from all of these other systems that we've been talking about? Howard (19:34): Zk-SNARKs themselves enable all sorts of interesting features, and the reason for that is because of these properties, like I had mentioned: you know, succinctness and non-interactivity and zero knowledge. And what I'm really trying to get at here is that a zk-SNARK is a very practical protocol. It's one where, because proofs are very short and verification time is extremely fast, you actually can come up with constructions that are usable in the real world. Remember that for most systems in the real world, if you even have, like, a hundred milliseconds or 200 milliseconds of latency, it can potentially be, you know, unacceptable for a service to operate with any higher thresholds. And that's where, in this case, zk-SNARKs themselves are extremely fast, extremely efficient, and therefore, you know, for the verification component, it's actually very much usable and able to be integrated into existing systems. Anna (20:34): But with that said, what are the really big challenges of zk-SNARKs? Howard (20:38): If you look at the current state of the art, libsnark is the de facto library for generating SNARKs, and libsnark itself is designed to be a monolithic process that runs on a single machine. We've seen in practice, from our benchmarks, that in essence you can generate proofs for problems that are on the size of 10 to 20 million gates. So what is a gate here? Basically, all problems in this world can be mapped as circuits. You can think of circuits as: you have, like, two input wires and some output wire. So, like, if you have an addition gate, you have A and B come in, you evaluate A plus B, and you get some value C out. Or a multiplication: A times B equals some value C. And it turns out that a construction in libsnark can really only support up to 10 to 20 million gates.
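To picture what these gates look like, the sketch below flattens a small computation, x times x times x plus x plus 5, into the kind of addition and multiplication gates Howard describes and evaluates them over a toy prime field. Real circuits do the same thing at the scale of millions of gates over much larger fields; the field size, wire names, and gate encoding here are purely illustrative.

```python
P = 97  # toy prime field; real SNARKs use primes on the order of 254 bits

# Each gate takes two input wires and produces one output wire.
# Here we flatten  out = x*x*x + x + 5  into multiplication and addition gates.
def run_circuit(x: int) -> int:
    wires = {"x": x % P, "five": 5 % P}
    gates = [
        ("mul", "x",  "x",    "t1"),   # t1 = x * x
        ("mul", "t1", "x",    "t2"),   # t2 = t1 * x = x^3
        ("add", "t2", "x",    "t3"),   # t3 = x^3 + x
        ("add", "t3", "five", "out"),  # out = x^3 + x + 5
    ]
    for op, a, b, c in gates:
        if op == "mul":
            wires[c] = (wires[a] * wires[b]) % P
        else:
            wires[c] = (wires[a] + wires[b]) % P
    return wires["out"]

# A prover claiming "I know x with x^3 + x + 5 = 35" would commit to all the
# wire values (the witness); the circuit's gates become the constraints.
assert run_circuit(3) == 35 % P
```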
Howard (21:35): If you look at the existing applications that use libsnark, things like the Zerocash protocol, which has manifested as Zcash, we're talking about things currently that are on the order of a million gates. Granted, the newer construction that's happening later this year will bring this down by, you know, an order of magnitude. But it's very clear that as we start to branch out into larger things, things that are more than just payments, we are going to start to hit that upper bound on what libsnark could possibly support. And that's intrinsically the challenge that we're facing here. Anna (22:08): I have a question here, because you just said a million gates and it's bringing it down. Maybe I misunderstood that. Don't you want more gates? Fredrik (22:16): No. To make it more efficient, they've written their algorithm to use fewer gates, so that it is more efficient. Anna (22:24): Yes, I see. Yeah. Gotcha. Howard (22:24): It's better if a user doesn't have to wait, like, a minute, you know, to generate a private transaction, but merely just seconds. And so, you know, the team there has worked very hard to come up with a newer construction that can in fact be much more efficient. And we've seen that it's something on the order of, like, 90% less memory usage and 80% faster. And that's very powerful, and it helps with driving usability and adoption and, you know, things like that. Fredrik (22:55): Yeah. I mean, so to put some perspective on the bounds of things we're talking about: using a million gates, like in Zcash, you need several gigs of RAM. It takes minutes to generate a transaction. So it's slow, it requires a big computer, it's, like, not something that you could do on your phone. So if they can rewrite their construction to be much more efficient, you could run it on a phone, or they could enable shielded transactions to be, like, the default choice, where that would be pretty hard otherwise. But is this also, like... you're talking about these constraints and why you want to be able to push up to more gates. Is that just because of, like, the hardware bounds, like, because a normal Fredrik (23:46): computer doesn't have more than 8 gigs of RAM, and therefore you kind of want to be able to distribute this? Is that why you were saying it maxes out at a couple of million gates, or is there, like, some software aspect that makes it max out at millions of gates? Howard (24:01): Great question. So if we look at the existing systems that are using libsnark, what happens is, as we start to grow and solve bigger problems, it's very clear that we run out of memory. Like, we will exceed memory bounds. And additionally, it's so time-intensive to generate the proof in that case, because, you know, you're swapping out to disk, and also just the problem itself is so large that it becomes a very inefficient process. It's not parallelized, and it also doesn't scale. So that's where, if you take the notion of, like, smart contracts today, and this is, like, generalizing beyond just payments: if you add in payment logic on how and when and why you would make a payment, it's very clear that smart contracts could take advantage of this type of system as well, especially in the case of scaling and also for privacy. But we just don't have the ability to do that today with a single machine implementation. Anna (24:59): It's not quite fast enough yet.
It itself is not running fast enough for it to be a scaling solution, or to be helpful in a scaling solution. Howard (25:07): If you look at Ethereum today, you know, you can run smart contracts, but with severe limitations. For example, it can run for 3.14 million microseconds, and you also have a memory capacity of about 39 megabytes. These are real limitations to enabling much more interesting smart contract logic. And ideally, you know, one way to resolve these problems is to say, hey, what if I made a circuit construction out of my smart contract? Like, I, as a developer, you know, map the smart contract code down to some circuit form, and I compute this off chain on my own computer. I get some result out and I generate a proof, and I submit these to Ethereum. You only have to pay the cost of verifying a proof rather than actually running that logic. And this can potentially be a way not only to scale the EVM, but also to improve the privacy bounds of the EVM. You're doing off-chain compute, so you don't leak any knowledge in this particular case. Fredrik (26:05): And that would be really cool. Anna (26:07): But only if the zero knowledge proofs themselves, or the zk-SNARKs themselves, are efficient. Fredrik (26:13): Efficient, and... because in this case you would need a trusted setup for every contract, and that's not really feasible. So there indeed you'd need something like STARKs. Maybe I wanted to dig into these circuits a little bit. So you mentioned they're essentially like gates, but what are they, like logic gates? Is there "AND" and "NAND" and "NOT" and that's it? Or, as you mentioned, there was, like, a plus gate. What's the most complex gate, and, like, what's the variety of gates that exist? Howard (26:47): Yeah. So there are a few different forms of representation. The most common ones are arithmetic circuits and boolean circuits. In an arithmetic circuit, for example, you have two operations: you have addition and multiplication. Addition and multiplication will operate inside of some finite field, ideally a prime finite field. And this allows you to basically map any type of computational logic down into a set of additions and multiplications. The gates themselves take as input some number of input wires, and then they'll spit out one output wire, and you can basically use this to compose and represent all sorts of typical computer language primitives, like your "for" loops and your "while" loops and your "if" statements and your "else" statements. There have actually been some papers in the past that map entire assembly language instruction sets down to a set of these types of circuits. Howard (27:46): And this is also past work done by Alessandro, and you can find these things online. And what's nice about it is there have been a lot of optimizations that people have come up with, for example, like, how do I optimize for, like, a "for" loop case, or how do I construct optimizations for a certain specific type of zero knowledge proof system? And what's nice about that, then, is that you can use these and amalgamate them into something that is very practical, so that you can try to scale these smart contracts, for example, into a much larger problem. Fredrik (28:18): It sounds very much to me like CPU design. I mean, it's logic gates. And I mean, in a CPU, we have millions of transistors.
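One standard instance of turning control flow into additions and multiplications is the arithmetization of an if/else: a selector bit b is forced to be 0 or 1 by the constraint b times (1 minus b) equals 0, and out equals b times x plus (1 minus b) times y then selects the branch. The sketch below just checks those two constraints directly in Python; in a real circuit they would become gates like the ones in the earlier sketch, and the field and helper names are illustrative.

```python
P = 97  # toy prime field, as before

def branch_constraints(b: int, x: int, y: int, out: int) -> bool:
    """Constraints for `out = x if b else y`, written with only + and *.

    b * (1 - b) == 0          forces b to be a bit (0 or 1)
    out == b*x + (1 - b)*y    selects the branch without any `if`
    """
    c1 = (b * (1 - b)) % P == 0
    c2 = out % P == (b * x + (1 - b) * y) % P
    return c1 and c2

assert branch_constraints(1, x=7, y=20, out=7)       # b=1 picks x
assert branch_constraints(0, x=7, y=20, out=20)      # b=0 picks y
assert not branch_constraints(2, x=7, y=20, out=7)   # b=2 is not a valid bit
```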
I mean, it sounds in essence like anything that you could map onto a set of transistors is something that you can map into a zero knowledge proof, which I guess would be true because it's generally generic. But, I mean, there exists a whole branch of computer science around FPGAs, where they have their own languages to develop and to design these FPGAs. Is this something that could be applicable to, like, speed up generation of a zero knowledge proof? Howard (28:56): Yeah, I would say the general field of speeding up zk-SNARKs is quite young. There haven't been many explorations, as far as I know, that have gone into things like FPGAs or some type of, like, custom ASICs or certain types of GPU implementations. That's where, like, for us, what we did was, you know, I spent the past two years looking at distributed zk-SNARKs and how we basically parallelize this computation, rather than looking at some of these other verticals. I think it was very clear that with the rise of, like, cloud computing and the commoditization of these clusters, it becomes cheaper and cheaper to just spin up more machines rather than go down the route of getting dedicated hardware. And so the question naturally arises, then, you know: how can I just add on, like, another machine and another machine and another machine to distribute and generate a zk-SNARK proof? Anna (29:47): So we've talked a little bit about libsnark, you've mentioned it a couple of times, but I was thinking it might be really helpful to define exactly what libsnark is. This is something you're the coauthor of, and so I'd love to hear from you how you define it. Howard (30:01): I would say that libsnark is a library, a C++ library, that implements zk-SNARK schemes. So these are cryptographic methods for proving and verifying, in zero knowledge, the integrity of computations. Inside of libsnark we have a host of reference implementations of zkSNARK protocols, and this is a very standard, and at least currently the most used, library by those in the academic field and also in industry development for implementing zkSNARK protocols for their application needs. Libsnark itself has been around for a few years; it came out of a body of academic research and academic work. To this day, we still say that this is code that is not intended for production use, but it has gone through many audits, and we've done a lot of work to update the library as bugs have popped up in different audits. Howard (30:57): Libsnark itself has been adopted by, you know, several different projects in the blockchain space. I think most prominently it would be Zcash. And we've seen that there have also been a lot of contributions and interest from the traditional tech companies, your Googles and your Facebooks and your Apples. And I would say that this is a library that is meant to provide this set of reference implementations so that people can test and benchmark and understand and study the trade-offs that they make with different types of constructions and schemes. Anna (31:32): Without something like this, what would a developer have to do? This is basically making it a lot easier to start incorporating it, I guess. Howard (31:41): Yeah.
I mean, the whole point about adding all of these implementations and making it open source is to basically democratize zk-SNARKs, to make it easy for any developer who gets a general notion of what this is but maybe doesn't have the time or the capacity to go and implement the schemes themselves. There are always, you know, well motivated reasons why someone would develop a protocol here and add it in, and we see this time and time again, more and more contributions to the library. Fredrik (32:09): So if I'm understanding you correctly, like, one part of it is I can take one of these reference implementations. Like, if there's something that I want to be able to generate proofs for, and it's already implemented, I can just take that off the shelf and use it. But what if I had a new problem that I wanted to generate proofs for, what would I have to do? Howard (32:31): There are a few ways you can do this. First, I have on my GitHub repo, on github.com/howard, a library called libsnark-tutorial. And what libsnark-tutorial gives you is just a very simple boilerplate setup where you have, like, the existing libsnark library imported. And you can, from there, immediately start to write code where you take advantage of things like the gadgets and the underlying primitives, and you can call into certain zk-SNARK protocols and just run with it from there. If you want something that is more application oriented, we've also seen things like ZoKrates from Jacob Eberhardt and friends, and they basically have mapped the libsnark library into their Rust construction. And that also generates boilerplate Solidity code for basically verifying proofs on Ethereum. Howard (33:31): Another route that has been taken has been to add different types of bindings and wrappers around libsnark. So, for example, there's jsnark from Ahmed Kosba, who basically used a Java implementation to map into libsnark, and it provides a more developer-friendly way to write circuits and generate proofs. Then there's also been Izaak Meckler's implementation, Snarky, which is an OCaml library that also maps into libsnark, and OCaml provides a more implicit constraint generation process that makes it easy and intuitive to write functions that get mapped down into the circuit world. And I think these are all ways to basically drive usability. And so for developers who come in with different backgrounds, they have different reference implementations and also different libraries to work with to fit their needs. Anna (34:15): Would you say that libsnark provided some sort of standardization, like, since it's a really highly used library? Is that something that's kind of a standard, or would you say standards don't exist in this space yet? Howard (34:28): So libsnark for the longest time was the primary and only library to use for zk-SNARK protocols. However, in the past two years, we've seen the rise of more and more proof systems coming out, and that in and of itself meant new libraries being introduced. I think one of the most exciting ones that's coming out today has been Bellman from Zcash. This is from Sean Bowe of Zcash. Bellman is a Rust implementation of a zkSNARK protocol, namely Groth16. So this is a protocol that was developed by Jens Groth from UCL in 2016, and this is a particular zk-SNARK that has been written in Rust and has been developed and optimized for the Rust environment.
And that's also what Zcash is using currently to generate their MPC round for the setup. Anna (35:23): So I guess the answer is there is no standardization anymore. There was for a while, sort of, because there was one and now there are many. I mean, maybe let's talk a bit about that. Like, what about standardization in this space? Is that happening? Is that something... Fredrik (35:41): Like, there should be a standard way to write these circuits, because then you could kind of transform them, or more like transport them, between different library implementations or something. Howard (35:51): Yeah, I completely agree. This was something that I had brought up last year, in 2017, at Devcon with some Zcash folks. Like, I think I discussed it with Andrew Miller, and also Jacob was there, and a few other folks that are just a part of that community. We were discussing the fact that there are these libraries popping up, and from my point of view, it seemed like standardization was in order at some point. You know, if you think about the different libraries that are out there today, they can't talk to each other. If I generate a circuit in libsnark, it doesn't necessarily run and operate, and it's not compatible, with Bellman. Howard (36:28): There are other implementations that are coming out today as well, and none of them can interface with each other. So if I generate a proof, you know, in one library, how do I verify it in another library? There just isn't a standard form. And so that's where, earlier this year, in May of 2018, there was a ZK standards workshop that was set up out in Cambridge, Massachusetts. And this was basically meant to kind of bring the different folks who work on zk-SNARKs together to not only look at it from a theory standpoint, but also from an implementation standpoint and an application standpoint. And these were three tracks that operated independently, and their goals were basically to work out what standardization looks like: How do you make sure, you know, terminology is uniform across the board? How do you make sure these libraries can interface with each other? And how do you make sure applications understand what to call and when to call it, for what reasons? And so there's all sorts of work that's happening on this front now. But as you can imagine, there are not a lot of people in this space, and so you can only move so quickly because of that. Anna (37:36): Also, I guess, because it's changing so much, you have to be careful not to make things too standard, because that could potentially stop other implementations from arising. So I guess right now it's this funny space where it would be really helpful to have some standardization, but it seems like you still want to keep it a little bit open. Howard (37:55): Absolutely. Fredrik (37:57): So jumping from libsnark, which is something you have worked on, let's jump into something that we've been teasing this whole episode, which is DIZK, your new work on distributing these zero knowledge proof systems. So like you said, there's a limit to what you can do on one machine; you want to be able to break it up onto multiple machines. What is DIZK and, like, how does that work? Howard (38:22): Yeah. So DIZK is a distributed zero knowledge proof system that's implemented in Java and uses a cluster compute framework called Apache Spark.
What it does is take an existing protocol, in this case Groth16, which is a monolithic process, break it apart, and enable algorithms that can work in a distributed setting. So what that allows you to say is, you know, instead of being bound to just one laptop here, I can spin up a cluster of EC2 instances, or just a cluster of machines somewhere. Maybe I have a data center, or maybe I have, like, a few computers at home. Howard (39:01): If I link them together, I can run something that is much more scalable and also much more parallelizable. These are two intrinsic properties of this architecture that I think are very fascinating. So what this means is, as I increase the amount of memory and compute resources I give the system, we will scale nearly linearly in the throughput and the output of our SNARKs. That means you can grow the problem size to something much larger just by, you know, doubling: if you want to double the circuit size, just double the amount of resources you give it. Additionally, it's much more parallelizable, so you have performance speedups and increases in speed for generating a proof. And that's because as you throw more machines at it and you have more cores, you can actually process the zk-SNARK itself much faster than you could with a single machine. So it turns out that, you know, if you fix a circuit size and you just double the amount of compute and memory resources, your time will also halve in a very linear fashion. Fredrik (40:04): That sounds like a pretty groundbreaking piece of work, actually, because, like, we have these constructs and we all kind of know that they're hard, they're limiting in what you can actually do. Like, Zcash is not that complicated of a proof to generate. Like, you're proving that you've sent, like, an integer from person A to person B. Conceptually, like in the real world, it's not that complicated. If we wanted to do something like prove a smart contract, it would be so much larger, you know, a hundred times larger, a thousand times larger; it just would not be feasible at all on a computer. But basically with this structure, you could do that. You could actually make something that is a hundred times larger. Anna (40:56): Would this actually mean that you would do the zero knowledge proof, like, on chain, the whole thing? Fredrik (41:00): So if you're talking about on chain, the problem is, like, how do you distribute things on chain? Like, you still need this compute cluster. This is an interesting question though, because, like, have you guys thought about, like, baking this into a decen... Like, Ethereum, one could argue, is a decentralized compute cluster. It just computes useless things. So have you thought about, like, putting a blockchain into this somehow to coordinate these machines? Howard (41:30): Yeah, actually, you know, my current work right now is around how do you use zk-SNARKs in a decentralized setting. And this is a very open question right now, because there are very few bodies of work that demonstrate how to do this properly. And I think that that's certainly something that merits time and energy, and that's an effort that I'm currently working on. Fredrik (41:53): Just on the note of, like, this kind of manyfold increase in performance, have you thought of, or seen, any applications already where this could be exploited, where, like, this thing was completely not possible before, but now you can actually run it?
Howard (42:11): Yeah. So we actually implemented two applications for our paper. And I guess to preface it, I would say that if you look at libsnark today, we can compute about 10 to 20 million gates in libsnark. With DIZK we can now compute billions of logical gates, and that's a hundred-X factor increase in the size of a circuit that we can reason about. And so, in terms of this computation, it naturally arises that we can just solve much larger problems all of a sudden. And so we went through kind of the different applications and use cases that would be interesting and also practical for the timeline of a research paper, and we basically came up with the two areas that we thought were cool to look at. One was basically implementing a previous paper called PhotoProof. Howard (43:03): And what this allows you to do is basically provably show the transformations on an image. Kind of like if I'm Photoshopping an image, that's really a series of matrix operations where I'm translating this and transforming and rotating and scaling. And those operations can easily map into a circuit world and into a SNARK in this case. And so that's one use case. But the other one is also just in the case of machine learning: how do you have verifiable machine learning computation with integrity? So if you think about, you know, how the industry works today, much of this is a very dynamic problem. If you want to be an ML researcher on healthcare data, you know, Stanford hospital may not be so inclined to share with you their patient records because of HIPAA and all these different compliance laws, right? Howard (43:51): But for a researcher in the medical field, they may want to know, like, what percentage of patients who have a heart attack die within two years. And that is a very serious problem and a very serious endeavor, and there are very few ways to do this effectively and efficiently today. Instead, if you could have some type of privacy preserving model for computing this information and training it in a machine learning model, the use case to actually do this very clearly arises. And, you know, one of the problems with this in the past was that there's so much data, you can't even compute it in libsnark, right? But now you're saying: I'm storing this data in these massive databases, and now I have a compute cluster framework that I can use to run a zk-SNARK on. Why don't I just run this problem on this database and compute the proof out from it? And so these are two areas that we looked at that we thought were quite interesting, and those are the two things that we actually evaluated and wrote about in the paper. Fredrik (44:50): So essentially, like, instead of the hospital giving all their data to the researcher, the researcher can, like, query... like, how many patients have... or, like, it would have to be a boolean query, I suppose. So it's like: do patients who have a heart attack die within two years? And they could pose that question to the hospital. They could compute the zero knowledge proof and give the answer back without ever sharing their data. Anna (45:18): And yet it would be known to be accurate. Howard (45:20): Yeah, yeah, exactly. Like... Fredrik (45:24): It was very cool.
Howard (45:24): ...You would input the data with, like, their ID and, like, you know, their age and when they had a heart attack and maybe when they passed or if they're still living, and you just run this type of a circuit construction and just say: here's the circuit, guys, can you run it on your data? And the hospital will be like, okay, let's run it. And at the end, all you get is an answer, yes or no. And you don't leak anything about your patients' data, but you can believe they really computed it. And so that's a very fascinating approach and a very valuable one. Fredrik (45:51): So going back a little bit to DIZK, you mentioned it's built on Apache Spark. I'm interested, like, on a conceptual level: is it basically that you're taking this huge circuit and breaking it up into smaller pieces of circuit, and each of those being computed on one machine? Or is it something much more complicated? Howard (46:11): That's a really great question. So naturally, I think what comes to mind is to say: let's take these operations and just, you know, move them onto a cluster compute framework and just, you know, run it and see what happens. We actually did that, and it was very, very slow. The reason for this is the way that we represent these circuits is in the form of matrices and polynomials. So your matrix has a bunch of vectors in it, and each vector represents a polynomial. Because we're going to billion-sized circuits, we're talking about billion-degree polynomials. These are massive things where even just doing, like, a multiplication of two of these polynomials will result in... first off, you're storing these polynomials in terabyte-sized arrays, but then you're trying to multiply these things. Howard (46:59): And there's no efficient way, as far as just naively translating things over, to do this. And so that's where, in our paper, we implemented some distributed arithmetic schemes. So for example FFTs, fast Fourier transforms, are a common operation in these; we implemented a distributed version of that. Also multi-scalar multiplication, that's basically where you multiply a scalar and a base, and you have a vector of these scalars and a vector of these bases, and you're just trying to element-wise multiply them; we also implemented a distributed scheme for that. And also for Lagrange evaluations, we implemented a distributed scheme for that. So we really started from the building blocks of what a SNARK needs for computing its proof, and then worked our way up the ladder to the point where we actually tie those pieces together into a protocol for computing the zk-SNARK itself. And that's where we then have these tricks and these optimizations that we use in the distributed scheme to write a distributed algorithm that's actually efficient. And the paper basically goes through how those things work, and the library implements that. Fredrik (48:09): Wow, sounds like a very large body of work. Howard (48:14): It ended up being a very educational task, but also one where I was, you know, banging my head on the computer many, many times. It turns out, in cluster compute frameworks, basically, if you don't spread your data evenly in the cluster, then you get what are called stragglers. And those are guys who just lag behind everybody else. And the way that a framework works is that everyone has to kind of be in synchronicity with each other.
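To give a feel for the partition-and-combine pattern behind those distributed primitives, here is a toy version of a multi-scalar multiplication: the scalar/base pairs are split into chunks, each worker sums its own chunk, and the partial results are combined at the end. The real DIZK work does this (and the FFTs) over elliptic-curve groups on Apache Spark; the sketch below only uses integers modulo a prime, Python's process pool as a stand-in for a cluster, and made-up sizes.

```python
from concurrent.futures import ProcessPoolExecutor

P = 2**61 - 1  # toy modulus; real MSMs work over elliptic-curve groups

def partial_msm(chunk):
    """One worker's share: sum of scalar*base over its chunk, mod P."""
    acc = 0
    for scalar, base in chunk:
        acc = (acc + scalar * base) % P
    return acc

def distributed_msm(scalars, bases, workers=4):
    """Partition the pairs, let each worker sum its slice, then combine."""
    pairs = list(zip(scalars, bases))
    size = (len(pairs) + workers - 1) // workers
    chunks = [pairs[i:i + size] for i in range(0, len(pairs), size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(partial_msm, chunks)
    return sum(partials) % P

if __name__ == "__main__":
    n = 100_000  # made-up size; DIZK targets circuits with billions of terms
    scalars = [(7 * i + 3) % P for i in range(n)]
    bases = [(5 * i + 1) % P for i in range(n)]
    print(distributed_msm(scalars, bases))
```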
So when everybody finishes this particular task, then you can move on to the next step. And because some guys are just taking so much longer than others, everyone else is waiting on them. And so you'll be waiting, like, hours or days, depending on how inefficient that protocol becomes. And that's where we had to go through and say, on the theory level, where can we massage this problem into something that's much more uniform and can be spread out across these machines in an even fashion. Fredrik (49:04): So trying to round out the episode here, we've hit time for sure. There's so much content in here, but maybe we could just talk a little bit about, bringing it back to sort of blockchain: what blockchains and protocols do you see using zero knowledge proofs? Anna (49:24): ZkSNARKs specifically. Howard (49:25): So there are a few different projects that are really looking at zero knowledge proofs, and specifically at things like zk-SNARKs and zk-STARKs. I would say that from an enterprise standpoint, and from a company standpoint, there have been a lot of academics that have gone on to help with the formation of many of these endeavors. For example, out in Israel there's QED-it, QED dash it, and this is a company that is specifically looking at zk-SNARKs and their applicability in the enterprise landscape, as far as I understand. And there's also been a rise of a host of blockchain protocols that take advantage of zk-SNARKs and what they do. For example, the Coda protocol from O(1) Labs is one; this is with, you know, Izaak Meckler and folks there, who basically have developed an approach to scale blockchain storage using zk-SNARKs. Howard (50:20): So they have this notion of a succinct blockchain where you have recursive proofs, and, you know, that's a whole other rabbit hole to go down. It's a very interesting one. Additionally, you have projects like Origo, O-R-I-G-O. And what they're doing is basically providing a form of private transactions on their own blockchain. They're integrating and actually developing things like a zk-SNARK circuit compiler, so you can take things like, like Rust or whatnot, that compile down to LLVM, and basically model that as a circuit representation and optimize on that. Very interesting techniques that are being investigated and developed currently. And also, you know, on the STARKs front, there's also Alessandro's company StarkWare. That's a new company that is looking at implementing, you know, a libSTARK library and also providing this as a potential service for the blockchain community. I think that right now there's just an amalgam of different initiatives that are happening, formal and informal, in the ecosystem. And it's very clear that that's why, you know, there are these conversations on standardization, on how do we scale these types of computations to these large sizes, and where do we go next? Like, do we look at hardware or things like that? And I think it's an open question for everybody, and that's why everyone's so excited about it. Anna (51:43): And maybe that leads us to the last question, which is about just the future. Like, I mean, you mentioned a couple of really interesting projects, but what's the future for you and this topic? Like, what are you looking at going forward?
Howard (51:55): My early work was very much being involved with libsnark and getting this into a very usable form for the community and the development community at large. You know, I've spent the past two years looking at how do we distribute a zk-SNARK, and how do we make that next iteration on it. I found that to be extremely fascinating and also quite insightful. This has really motivated me to look at it in a very decentralized landscape and say: where are zk-SNARKs useful in a decentralized form? And I think that this gives rise to all sorts of new notions, like that notion of, like, a private smart contract that we briefly touched on earlier, and also looking at the use cases of maybe generalizing Zcash into something that's more modular and more broad than just, you know, we make a payment. And these are areas that I think would be very fruitful for the blockchain space. And, you know, that's basically where my head is at currently. Anna (52:46): Where can someone find out a little bit about DIZK? No... hey, how do I say it? Fredrik (52:53): Yeah, yeah. Anna (52:55): Help me, help me. A, help me with the pronunciation, and, B, help me find it. Howard (53:00): So the way we came up with the name DIZK was actually a play on words in the zero knowledge proof world. There's this notion of a non-interactive zero knowledge proof, NIZK. It can be pronounced "NIZ-ik" or "NIZK"; different academics pronounce it different ways. And we basically modified it to call it a distributed zero knowledge proof, a DIZK or a "DISK". You know, from my conventions, because of working with Alessandro and academics in that area, we all pronounced it NIZK, and so then I called this DIZK. But it's equally valid either way. Anna (53:36): It sounds pretty cool. That's all I've got to say. Where would you find more info on this DIZK? Howard (53:44): Yeah, so you can learn more, and also play with this library and read about the paper, at a few different places. So first is DIZK.org. This will then direct you to the GitHub library that we have released; it's open source with an MIT license. Additionally, if you search for eprint DIZK you will find the paper there; the actual link is https://eprint.iacr.org/2018/691. And this will then link you to the paper that we published on DIZK. I think we'll also put this in the show notes. Fredrik (54:22): We'll put all the links we can find in the show notes. Anna (54:25): Perfect. Cool. Howard, thank you so much for taking us on this journey with you. This is amazing. Howard (54:32): My pleasure. And again, thank you for having me. This was fun. Anna (54:34): It's totally fun. And I would really love to have you come back in, I don't know, maybe three months, six months, sometime soon-ish, to continue as we do this zero knowledge series. Seriously, I mean, I've learned so much in this episode. I'm super, super excited to share it with everybody. Howard (54:53): Yeah, I think that would be really fun. Like, I'm happy to do this again, and we can then do a follow-on and see, like, in the interim, all the people that come on and, like, you know, what they bring up and, like, what are open questions. And, you know, this space moves... so, I mean, despite how slow it is, it actually moves quite quickly in many regards.
Like, just because, you know, once you get this out into industry and have them run with it, then all of these different ideas come out, and, like, I couldn't have come up with any of these, you know, like two, three years ago. So it's really nice. Yeah, I think a follow-on would be really fun. Fredrik (55:22): I think it's a super interesting topic. It's certainly one of my favorite things in the whole blockchain industry and, like, in the work and stuff I deal with normally. This is really cool stuff. Anna (55:34): So I would say thank you, and to our listeners, thanks for listening. Fredrik (55:39): Thanks for listening.