Anna Rose (00:00): In last week's episode, Kobi and I talked about zkpod.ai, the voice AI trained on transcripts from our show, using both my and Kobi's voices to share insights and answer questions about ZK topics. If you haven't yet checked it out, please do head over to zkpod.ai. Now, this is a bonus episode following up on that topic. We want to introduce a collaboration that we're doing with Daniel Kang from UIUC. He's been on the show before, and he's someone working at the intersection of ZK and AI. With our project, we aim to use zk-SNARKs to prove that some piece of audio is from us, that is, from the original podcast audio, and to distinguish it from the zkpod.ai-generated audio. This is similar to his previous work on image provenance, now applied to audio, and it's something we think could be relevant for content creators, as well as being a cool demonstration of how ZK can be used in combination with, or to differentiate us from, AI. Now here is my interview with Daniel Kang all about our attested audio experiment. So I'm really happy to have Daniel Kang, professor of computer science at UIUC, back on the show. Welcome back, Daniel.

Daniel Kang (01:08): Thank you. Thank you for having me. It's a pleasure to be here.

Anna Rose (01:10): This comes right after the interview I just did with Kobi all about zkpod.ai. We've been doing a project together around zkpod.ai, and I wanted to bring you on to talk a bit more about that.

Daniel Kang (01:21): Yeah, I'm super excited to be on to chat about the project we've been working on, which you can almost think of as the opposite of, or the antagonist to, zkpod.ai. So it'll be an interesting mix.

Anna Rose (01:30): Yeah. When you were last on this show, we talked a lot about the work you had done around image provenance. What we want to talk about today is actually the work we're doing together around zkpod.ai, which is about audio provenance. Share a little bit about what you've been working on on that front.

Daniel Kang (01:46): Yeah. So this dives directly into the wider range of use cases that I just mentioned, and I've been working closely with you, Anna, to produce a demonstration, which will hopefully be live when this podcast airs. In the last podcast we mentioned that if you have an attested source of images, in particular a camera, you can do things like prove private edits. Similarly for audio: if you have an attested source of audio, for example a microphone, you can do something similar. This way you can prove that you produced a specific clip of audio using a real microphone. And furthermore, there are ways to link that specific clip of audio to a specific microphone that you own, so others can't spoof your hardware.

Anna Rose (02:30): But those microphones don't exist yet. The attested microphones.

Daniel Kang (02:34): Yeah. Unfortunately, these microphones don't exist, and I think that's partially because of consumer demand.

Anna Rose (02:38): Yeah.

Daniel Kang (02:38): And so one of the things we're trying to do with our project is to show that there are really cool applications. What we've done is release an example simulating the attested microphone and combining this with a zero-knowledge proof, a zk-SNARK, that three sources of audio, you (Anna), me, and Kobi, were combined honestly.
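To make the emulation concrete, here is a minimal Python sketch of the idea described here: each speaker signs a hash of their raw audio track with an Ethereum key, so a verifier can check which recording the proof's inputs came from. This is an illustrative reconstruction, not the project's actual code; it assumes the eth-account library, and the function names and placeholder audio are made up.

```python
import hashlib
from eth_account import Account
from eth_account.messages import encode_defunct

def attest_audio(audio_bytes: bytes, private_key):
    """Sign the SHA-256 hash of raw audio, standing in for an attested mic."""
    digest = hashlib.sha256(audio_bytes).digest()
    message = encode_defunct(primitive=digest)
    signed = Account.sign_message(message, private_key=private_key)
    return digest, signed.signature

def verify_attestation(digest: bytes, signature, expected_address: str) -> bool:
    """Check that the audio hash was signed by the claimed speaker's key."""
    message = encode_defunct(primitive=digest)
    recovered = Account.recover_message(message, signature=signature)
    return recovered == expected_address

# Each participant signs their own raw track; a proof that the three tracks
# were combined honestly can then reference these signed hashes as inputs.
anna = Account.create()         # stands in for Anna's Ethereum key
raw_track = b"\x00\x01\x02..."  # placeholder for raw audio samples
digest, sig = attest_audio(raw_track, anna.key)
assert verify_attestation(digest, sig, anna.address)
```

A real attested microphone would do the hashing and signing in hardware at record time; signing with an Ethereum key afterwards only emulates that trust assumption, which is exactly the caveat discussed next.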
Anna Rose (03:00): And the way that we did that was, instead of having the attested microphone, we signed that initial audio ourselves. It's sort of emulating it. But yeah, this is definitely meant as a demonstration to show the flow through.

Daniel Kang (03:13): That's absolutely correct. And for the time being, we're using our Ethereum private keys to emulate the attested sensor.

Anna Rose (03:20): Yeah, totally. Is there some sort of alteration on these clips? Could there be alterations, edits or, you know, audio compression, something like that, done while still maintaining that ZKP through it?

Daniel Kang (03:34): Yeah, so there are many circumstances where you don't want to release the entire original audio. For example, let's say that I accidentally said some private information, say my phone number, and I want to cut it out of the clip. You can do these kinds of edits using these zk-SNARKs and still prove that the audio came from the original attested microphone. And so, for example, I'm not sure how much I should reveal about your podcast, Anna, but sometimes there are parts that are clipped out due to, you know...

Anna Rose (04:02): Flubs! Yes, definitely. Even with those edits, I guess we could still, using this system, potentially prove that the original audio files came from us.

Daniel Kang (04:13): That's correct. That's the amazing thing about these zero-knowledge proofs, and one of the reasons I'm so excited about this technology.

Anna Rose (04:18): Yeah. I think by the time we air this, we will have this work, and we'll be adding it to the show notes if people want to find out more. But maybe we can also talk a little bit about the challenges and the limitations of this. As we said, this is an experiment and a demonstration. But yeah, what's still challenging with a system like this?

Daniel Kang (04:34): Yeah. So the first and obvious challenge that we mentioned is that we currently don't have attested microphones, so this is not something that we can deploy in the wild today, although I hope their prevalence will increase over the next few years.

Anna Rose (04:49): Cool.

Daniel Kang (04:49): Another common concern that's brought up with attested sensors, broadly speaking, is the ability to basically play a recording, whether this be, say, taking a picture of a picture or, in the case of a microphone, playing an audio clip...

Anna Rose (05:04): And then recording it with an attested microphone.

Daniel Kang (05:07): Yeah, that's right. So these are very valid concerns, and there's a number of ways we can go about addressing them. It's actually easier to explain for images. One thing you can do for cameras is to also include a depth sensor, which many phones actually do today, and include the depth information with the pixels. This way you can tell that you didn't take an image of something flat.

Anna Rose (05:29): Hmm.

Daniel Kang (05:30): Now for audio, there's no such specific equivalent, but you could imagine combining the attested microphone with something like an attested video camera and then proving, using say zkML, that the lip movement matches up with the audio that was produced, and also combining this with a depth sensor to prevent re-recording.

Anna Rose (05:49): This works as long as there's no deepfake of the video as well, I guess.

Daniel Kang (05:55): Yeah, that's right. If you just have the camera, this only works if you don't have a deepfake of the video.
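For readers curious what "proving an edit" means here, the statement such a zk-SNARK would enforce can be written as a plain Python check rather than a circuit: the prover knows an original recording matching the signed hash, and the published clip equals that recording with one range cut out. The SNARK proves this relation without revealing the original or the removed content. This is a simplified sketch with hypothetical names; real edits operate on audio samples, and transformations like compression would need to be expressed in-circuit as well.

```python
import hashlib

def edit_relation(original: bytes, cut_start: int, cut_end: int,
                  published: bytes, signed_digest: bytes) -> bool:
    # Private witness: the original audio and the cut range.
    # Public inputs: the published clip and the signed hash of the original.
    if hashlib.sha256(original).digest() != signed_digest:
        return False  # original doesn't match the attested recording
    return published == original[:cut_start] + original[cut_end:]

# Toy example: cut a phone number out of the clip, keep the provenance claim.
original = b"...intro... my phone number is 555-0100 ...outro..."
signed_digest = hashlib.sha256(original).digest()  # attested at record time
published = original[:11] + original[40:]          # phone number removed
assert edit_relation(original, 11, 40, published, signed_digest)
```

The key point is that the verifier only ever sees `published` and `signed_digest`; the zero-knowledge property hides everything else, which is what lets the cut content stay private.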
Anna Rose (06:00): Okay.

Daniel Kang (06:01): But if you have the depth sensor as well...

Anna Rose (06:04): Yeah.

Daniel Kang (06:04): ...this prevents things like a deepfake of a video being played on a 2D screen.

Anna Rose (06:08): Are there any other limitations? Like, why can't we already use this today with, say, a full episode?

Daniel Kang (06:13): Yeah. The other big limitation is the computational complexity of producing these proofs. Unfortunately, they're still quite expensive. But due to the amazing work of, well, a lot of people in this space, ranging from academia to industry to independent researchers, the cost of proving is going down dramatically, and I expect that to change over the next 6-12 months as well. So hopefully a lot of these challenges will be overcome due to advances in hardware and also in the zk-SNARK proving systems.

Anna Rose (06:41): Totally. Yeah. Just to mention, for this particular experiment, we used a clip that I think is 30 seconds long. And how long did that take?

Daniel Kang (06:49): I don't have the exact number off the top of my head. I think it takes around 20 minutes to prove for 30 seconds of audio, so this is about a 40x blowup in time. But as I mentioned, there's lots of work, ranging all the way from the proving systems to the hardware, that'll bring this down. And so even if you just apply work that's being developed today, that cost will go down by maybe about 5-10x.

Anna Rose (07:14): Nice. Thanks, Daniel. Thanks for sharing this with us and for coming back on the show.

Daniel Kang (07:18): Yeah. Thank you for having me.

Anna Rose (07:21): Now that wraps up our bonus episode all about the attested audio experiment. If you want to keep tabs on the project, be sure to follow zkpod.ai on Twitter, and we should be posting any updates over there. So yeah, thanks for listening.