Paul: All right. Hello, and welcome to PodRocket. I'm your host today, Paul. And with me we have guest Alex Bourget. Welcome, Alex. Alex Bourget: Welcome. Thanks. Thanks for having me. Paul: Yeah, we're super excited to have you on because you and your team over at StreamingFast are working on some really cool technology in the blockchain space. And this is a great opportunity to not only introduce some current problems in the blockchain space to the viewers we have who are interested in the area, but to delve into the next generation of solutions that are coming and being pioneered by your team. So yeah, welcome. And let's start with a little bit about you, Alex. I know from the GRTiQ Podcast, which for anyone listening is a podcast put forth by The Graph. We had guest Nader Dabit on a few episodes ago to talk. Alex Bourget: Right. Paul: Yeah, so in one of those podcasts, you mentioned you're a classical musician, right? Alex Bourget: That's what I studied, right? But I never became an official classical pianist. That's hard. And I wanted to have a family and pay the bills. Paul: That's important. Yeah. It caught my eye because I'm into piano too. It's really fun. Alex Bourget: Right, right, right. Paul: [crosstalk 00:01:31] at the same time, so you can get away from the world. Alex Bourget: But yeah, official training, right? Never studied computer science. Paul: So how'd you get into it then? Can you give us like- Alex Bourget: Into computer science? Paul: Yeah. Alex Bourget: I've programmed since I was young. My father had these laptop computers, portable computers that big, and they were around, so I started programming when I was young, reading books that had BASIC ... You know those BASIC programs with GOTOs and line numbers, books with drawings of dragons, and you'd code a game? So I started quite early, I don't know, 10 years old maybe. And then I programmed in mIRC scripting, wrote bots and a lot of crazy things. My actual first job was using mIRC as a back end to scrape the web, and then it led on to PHP, Python, and I just had a software engineering career from that- Paul: From the still motion there, right? So how do you find yourself becoming the CTO of StreamingFast from writing PHP scripts? Alex Bourget: PHP 3. Well, I was heavily into the open source world. I had a little Linux installed on two floppy disks, and it crashed sometimes, but I built those computers, so I was really interested in open source. So my first job was as a consultant, an open source software consultant, for five years. I learned all about Linux and the whole ecosystem, being consultants who were thrown in as experts in all sorts of situations. I learned a lot there. Then I wanted to go into the startup world. I was attracted by the dynamics of developing products, as compared to being a consultant. Alex Bourget: I thought the best would be at a product company, for different reasons we can go into or not. So I wanted to join, and joined, a startup, and that eventually led to the creation of my own startup, a Bitcoin payment processor. That was in 2013. I had read the white paper at that point, the Bitcoin white paper, in 2013, and thought, "Wow, this is crazy. This could work and this is in my realm. This is programming. Money." So I was fascinated. And then I continued on my career. We closed that first business because there was not a lot of Bitcoin transaction volume back then for a payment processor, and we continued on.
Alex Bourget: I became a data scientist, in a way, maybe, with my current partner Marc-Antoine, who's there at StreamingFast. We led the data science team. His company was bought by Intel, the Intel, and I was with him leading the data science team over there. So large scale systems, a lot of fun stuff. And then we thought, "Let's go back into the blockchain world, because there's a lot of data problems to solve," and we wanted to build some data products, and that's what we're doing today. That's the genesis of what became StreamingFast, I guess. Paul: Heck, yeah. Those dots are very clear and they connect, so that's cool to hear that story. So for any of our audience who missed our podcast with Nader, The Graph is this technology that's been growing over the past year, and really started a few years ago, to tackle the problem of big data in blockchain and needing to index and address that data. So StreamingFast is working with The Graph. Not only are you guys tackling your own flavor of that problem, but you got a grant from The Graph Foundation- Alex Bourget: Right, right, right. Paul: So do you think you can start to introduce to the audience, again, what is the problem The Graph is solving, and how is your company stepping onto the playing field? Alex Bourget: Right. So let me give you a little glimpse of what is ... Maybe your audience doesn't know exactly what the database of a blockchain is, because it's in that category, right? Blockchains are databases, and they're particular and they have particular needs, and that's why The Graph exists, to index the data that's in a blockchain. But why does it need to be indexed in the first place? Maybe it's useful that I give an overview there. Paul: Yeah, please. That'd be great. Alex Bourget: A blockchain is a database, if we compare it to MySQL, but it's master-to-master replicated, right? You have one node, you have another node, they need to sync, but no one is the ultimate reference. There's a consensus algorithm that makes sure the data evolves and everyone agrees, but it's master-to-master replication, and there's no ... If you think of Postgres or MySQL, they have a replication mechanism, right? MySQL has a binlog, and it can send all the changes to the database to some replica systems, and then you can scale out your read queries. Alex Bourget: Well, that doesn't exist. Well, that didn't exist, and that's a little bit what we're trying to do here in the blockchain space, because everyone is so focused on scaling the blockchain, meaning they want to increase the write throughput, but the read throughput is left a little bit on its own. And usually blockchains want to be so fast at writing that they keep very minimal indexes, very minimal ways to read the data. You don't have secondary indexes and columns and whatnot in a blockchain. So you always query through, for most chains today, a JSON-RPC endpoint. And it's very cumbersome, it's very limited. Alex Bourget: And you query point by point, right? You query for a block. You query for a transaction. And each time it's a round trip. Or you query for a data point inside a contract, "Give me the balance of that particular user." There's no aggregation. You don't list the users. Actually, on Ethereum, you cannot list the users. The data does not represent a list of users in memory, because it's made for quickly applying new transactions on top. That's the only goal.
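To make that concrete, here is a minimal sketch in Go of the point-by-point querying Alex describes, assuming a placeholder JSON-RPC endpoint: every single data point costs a full network round trip, and there is no aggregate query to reach for.

```go
// A sketch only: the endpoint is a placeholder, and errors are ignored for brevity.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

const endpoint = "https://eth-node.example.invalid" // placeholder RPC endpoint

// call performs one full network round trip for a single data point.
func call(method string, params ...interface{}) (json.RawMessage, error) {
	body, _ := json.Marshal(map[string]interface{}{
		"jsonrpc": "2.0", "id": 1, "method": method, "params": params,
	})
	resp, err := http.Post(endpoint, "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var out struct {
		Result json.RawMessage `json:"result"`
	}
	err = json.NewDecoder(resp.Body).Decode(&out)
	return out.Result, err
}

func main() {
	// One round trip per block, per transaction, per balance. There is no
	// "give me all balances" call, and no way to list users at all.
	block, _ := call("eth_getBlockByNumber", "latest", false)
	balance, _ := call("eth_getBalance", "0x0000000000000000000000000000000000000000", "latest")
	fmt.Println(string(block), string(balance))
}
```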
Alex Bourget: And there's another particular thing with blockchains: they're an eventually consistent database. What does that mean? It means that you can have, let's say, a block, which is a summary of insert and update statements, okay? Alex Bourget: In MySQL, you have a series of updates and inserts and they're applied, but here they might be unapplied. They might not be true anymore in half a second, but it's still useful to know right now, because maybe there's a transaction for 5 million bucks and it conveys the intent of someone, your competitor there, that they're wanting to trade 5 million bucks or something like that. So you want to know sooner rather than later, because there's always latency involved. There's always that tradeoff with consistency, right? If you have something that's eventually consistent, you will be sure about it later. Alex Bourget: And now you can be aware of it sooner, but not be so sure. So it's useful to know when things are sure. You don't want to send your Lamborghini if the five million isn't surely committed, if that insert statement for the payment is not committed. We call that final or confirmed in the blockchain space. So we wanted to provide systems that give you that finesse, that let you stream, the fact that it's real time, but allow you to navigate backwards and undo things that are not true anymore. And I guess, is that useful? Is that- Paul: That's useful. If I could take a moment just to [crosstalk 00:09:09]. So in essence, we're saying that blockchain nodes have a database. That's what a blockchain is. And we can think of the replication between nodes, because of the whole security point of the blockchain: each node has an exact carbon copy of the data on there. That's what makes the data verifiable by block hash, because any person can go back and verify the integrity of the chain. The problem is this is scaled out for security, but it's really difficult to employ well-known scaling mechanisms to read that data. And otherwise, we're pinholed through the classical remote procedure call, RPC, interface, which as you noted is very limited. You can't send, "Give me the logs for this list of transactions." You can only get one. Paul: It's such a problem that some RPC providers are even selling a service to hydrate the data themselves as you get it, and it's like ... I guess that's a half-baked solution, but it really doesn't tackle the core problem, which is that there needs to be some framework by which we can scale these reads while maintaining data integrity between our nodes, right? Alex Bourget: Let me tell you another funny, horrible story about what people get, because it's an eventually consistent database. Sometimes you will send a query, a JSON-RPC query, and say, "Give me this transaction," or, "Give me this balance for that user," and it's going to respond, and that's fine. A moment after, you send maybe the request for another address, okay? You're looking for two addresses here, but now you're hitting a different node, right? Because there's no persistent connection there, you're hitting a different RPC node. The guy has probably load balanced his nodes, but they're not in sync at the same time. Alex Bourget: And truly, within the JSON-RPC protocol, there's never a way to make sure you're querying at a given exact place. That's very rare in protocols. So you're connecting to one, getting a response, and then you're connecting to another one. It might not even be at the same block number.
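Here's a hedged sketch of that failure mode from the client's side, with a placeholder load-balanced endpoint and hypothetical addresses: the burden of detecting an inconsistent snapshot falls entirely on the consumer.

```go
// A sketch only: placeholder endpoint, hypothetical addresses.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func call(method string, params ...interface{}) json.RawMessage {
	body, _ := json.Marshal(map[string]interface{}{
		"jsonrpc": "2.0", "id": 1, "method": method, "params": params,
	})
	resp, err := http.Post("https://eth-node.example.invalid", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil
	}
	defer resp.Body.Close()
	var out struct {
		Result json.RawMessage `json:"result"`
	}
	json.NewDecoder(resp.Body).Decode(&out)
	return out.Result
}

func main() {
	before := call("eth_blockNumber")
	a := call("eth_getBalance", "0xAliceAddress...", "latest") // may be answered by node 1
	b := call("eth_getBalance", "0xBobAddress...", "latest")   // may be answered by node 2
	after := call("eth_blockNumber")

	// If the height moved (or we bounced between out-of-sync nodes), the two
	// balances were not read at the same point in history -- and even equal
	// heights can hide two different block versions on two different nodes.
	if !bytes.Equal(before, after) {
		fmt.Println("inconsistent snapshot, retry:", string(a), string(b))
	}
}
```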
Alex Bourget: Maybe this value has changed 75 times between the two queries, but if you want to know what the values of those two accounts were at the same time, you're a little bit out of luck, which is absurd, right? When you want to query a database like MySQL, often you'll say, "I want all those balances, but give them to me consistently, transaction-wise." Alex Bourget: Here we're so far from transactionality that it might even happen that two blockchain nodes have the same block number, right? They're both at the same, we'll call it height. They have received the same number of blocks, but it happens that this node has Block Version A and this other node has Block Version B. They're not the same block. Eventually one of them will be purged, they'll converge, and we'll use either Block A or B, but you can't know that. So that's really tricky. You're querying real-time systems and you're never sure, so all of the burden of synchronizing and knowing where you truly are is put on the consumer, the guy sending the JSON-RPC. Alex Bourget: And at the end of the line, that would be the browser, but it's crazy, because the moment you increase the number of queries you're after, the greater the complexity of reconciling and making sure all of that is legit. So the other option is to just add latency. If you query things that are 10 to 30 seconds late, then you can have better assurance, but even then only from time to time, right? So it's crazy to query, but at the same time, it's the only thing that makes sense. These little blockchain databases have no time for updating indexes. They have no time to update things to make it simpler for you to query, because it's not their role. Their role is to process mutations, transactions, changes of value, exchanges of money or whatever, as fast as possible. Alex Bourget: And if they're managing read indexes, they're slowing down. No one wants that. No one wants to slow the blockchain down. It's already slow, right? Paul: It's a separate problem from the [crosstalk 00:13:25]. Alex Bourget: It's a separate problem. So that's what we've been tackling. Paul: So The Graph kind of takes the approach of saying, "Well, all right, we'll let the blockchains be as fast as possible, and we're going to create a network on top of that of people who want to take that extra time. They can go do this indexing, and they can do it application-driven, with what we call subgraphs. And then they're going to take that data and share it with consumers who want to query it." Alex Bourget: Yeah. And one important thing is that the blockchain itself has a lot of data structures that are freeform. Ethereum, for example, is a key-value store. There's a RocksDB behind it, a small key-value store, and each contract has its namespace. The keys are totally opaque, right? They're hashes of hashes of hashes. They're very difficult. They're not easily read, and the values are always uint256, right? 32 bytes worth of data. That's the structure within the blockchain. Alex Bourget: Why am I saying that? Because it doesn't have a lot of meaning in and of itself. Someone like the guy who's writing, whatever, an exchange on the blockchain like Uniswap, something like that, they know how to interpret that data for Uniswap on Ethereum, so they can provide, within a subgraph on The Graph, the intelligence that they have because they created the contract.
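As a concrete taste of that opacity: for a Solidity mapping declared at storage slot N, the value for key K lives at keccak256 of the padded key concatenated with the padded slot index, so the raw keys really are hashes of things you'd already have to know. A minimal sketch with illustrative values:

```go
// A sketch only: illustrative key and slot values.
package main

import (
	"encoding/hex"
	"fmt"

	"golang.org/x/crypto/sha3"
)

// mappingSlot returns the 32-byte storage key where mapping[key] lives,
// for a mapping declared at the given contract storage slot.
func mappingSlot(key, slot [32]byte) []byte {
	h := sha3.NewLegacyKeccak256() // Ethereum's keccak variant
	h.Write(key[:])
	h.Write(slot[:])
	return h.Sum(nil)
}

func main() {
	var key, slot [32]byte
	key[31] = 0x2a  // hypothetical map key 42, left-padded to 32 bytes
	slot[31] = 0x01 // the mapping sits at contract storage slot 1

	// The result is opaque: you can compute it only if you already know the
	// key and the layout, and you can never enumerate the keys from it.
	fmt.Println(hex.EncodeToString(mappingSlot(key, slot)))
}
```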
Alex Bourget: They know the schema of the data and the layout, and they can unpack it and serve it in a more intelligent and refined way to their users. So The Graph allows everyone writing to that common database, let's say Ethereum or any blockchain really, which I like to call the social database, where everyone can write. Alex Bourget: It's crazy, but you'd never let everyone write to your database. In the blockchain world, though, that's what it is, right? You write and go ahead, but it'll be useful if you explain the schema to people, give them views and slices of data, and unpack it. Paul: It's essentially little treasure maps: "Hey, for this smart contract, what am I going to look for? What little code calls should I track for you?" Alex Bourget: Exactly, exactly. Well said, treasure maps. So that's what we have in The Graph. Paul: So The Graph is really cool, but it still doesn't satisfy everything. And some of the problems that might come up, from the beginning, with it being Postgres-based, are: what do we do when we get to the land of aggregations, of saying, "I want statistical data"? What do we do when we start wanting to scale and make fault-tolerant indexing services? And this starts to get into higher throughput and tackling bigger datasets with StreamingFast. I know this is just the tip of the iceberg, but if you- Alex Bourget: Right, right, right. There's a lot sitting there. Well, The Graph did one thing very well: they made an end-to-end solution, right? There's a blockchain extraction layer, then you provide some WebAssembly, or AssemblyScript, code that takes that blockchain data, interprets it, transforms it into a beautiful set of columns and values, floating point and all that. And then it writes it to Postgres and allows you, on top of that, to query it using a GraphQL interface. That's a nice, simple end-to-end solution, and it got there. Now, obviously, it gives some ... Let's say it informs the way the schema is written. It's written in a very particular way, right? Because it allows you to do time-travel queries: what was this data at the previous block, and all that. Alex Bourget: So it imposes structure, and also it doesn't do all the things at query time, because the database, Postgres, is laid out in a particular way, and you don't have a lot of flexibility, because you're really saving rows and entities, and then you're taking the language that is exposed to you, which doesn't really include massive aggregations and things like that. So that's cool. And let's say, maybe I can tell the story of how we got close to The Graph. We looked at that model, and we had been building some streaming solutions for many years. Paul: Just to clarify, streaming raw block data from the node? Alex Bourget: Well, yes, yes. Our goal has always been to take that blockchain data and extract it into something more streaming than polling. I hate polling. Every one of us, we hate polling. So we wanted to transform those blockchain nodes into a streaming solution, and we built things. Then what do you do with data? You index it. What's the purpose of a database if not to read it, right? Well, we made sense out of it. So we created systems to search, index and all that. Similar to The Graph in a way, right? And at some point, we looked at The Graph, that end-to-end solution, and we applied our technology, our streaming technology, to one of those subgraphs, and we made it 800 times faster. Paul: Oh, my- Alex Bourget: Yeah, it was crazy, and I think that got us on the radar.
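The "treasure map" idea in miniature: real subgraph mappings are written in AssemblyScript, but this Go sketch with made-up types and addresses shows the shape of the work a mapping does: filter the logs you recognize, decode them, and emit a typed entity for the store.

```go
// A sketch only: made-up types, a hypothetical contract address and event
// signature, and a faked decode step.
package main

import "fmt"

// Log is a simplified stand-in for an Ethereum event log.
type Log struct {
	Address string   // emitting contract
	Topics  []string // Topics[0] is the event signature hash
	Data    []byte   // ABI-encoded payload
}

// Swap is the refined entity a mapping would store, and GraphQL would serve.
type Swap struct {
	Pair                string
	AmountIn, AmountOut uint64
}

const (
	watchedContract = "0xPairContract..." // hypothetical pair address
	swapSignature   = "0xSwapTopic..."    // hypothetical keccak of the Swap event
)

// handleLog is the treasure map: recognize, decode, emit.
func handleLog(l Log) (Swap, bool) {
	if l.Address != watchedContract || len(l.Topics) == 0 || l.Topics[0] != swapSignature {
		return Swap{}, false // not ours, skip
	}
	// A real mapping would ABI-decode l.Data here; we fake the values.
	return Swap{Pair: l.Address, AmountIn: 100, AmountOut: 99}, true
}

func main() {
	logs := []Log{{Address: watchedContract, Topics: []string{swapSignature}}}
	for _, l := range logs {
		if s, ok := handleLog(l); ok {
			fmt.Printf("store entity: %+v\n", s) // in The Graph, this row lands in Postgres
		}
	}
}
```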
Alex Bourget: We knew The Graph people a little bit, but I think that was a little ... Yeah, so that was really nice, and it really proved some facts about the power of the streaming solution, and then we joined The Graph, which is a funny story too, because ... I had never heard of that: a company, not acquired by, but joined to, or adjunct, I don't know how to say that. We're now a core dev team of The Graph, alongside the other core dev teams, and we've received a large grant that allows us to continue doing that full time for the next years. So we've been aligned, not acquired. I don't know. A Web3 acquisition, as someone would say, right? It was fascinating going through that. Paul: A decentralized version of an acquisition. Alex Bourget: Yeah, exactly. Paul: Funny. So what I'm curious about is, can you share with us what that subgraph was and what type of data it was serving? Was it something giant like the Sushi one, or- Alex Bourget: The funny thing is that, yeah, the reason we attacked that one is because we saw an opportunity. The subgraph that was slow was the PancakeSwap one, running on the Binance Smart Chain. The Binance Smart Chain happened. So the speed of a blockchain is, we'll call it block time, right? You have a thing that can settle transactions every 12 seconds, like Ethereum does, something like that. You have some chains like Solana at 0.4 seconds. So let's say you're in MySQL: you can commit things every 0.4 seconds. That's how fast the consensus works, before everyone has agreed. A 0.4-second block time for Solana, 12 seconds for Ethereum. Bitcoin is 10 minutes; it has different properties there. But BSC is three seconds. Alex Bourget: And the Binance Smart Chain, the one where PancakeSwap is, which was really trending up, a lot of volume, a lot of trading happening on that decentralized exchange, well ... And also the chain was pushed to put out more and more transactions each block. So the throughput was, I don't know, like 20 times more than what you see on Ethereum, and much faster also. And they started having hundreds of thousands of trading pairs, and that particular setup, on the normal base Graph Node, the technology by The Graph there, was grinding to a halt. It would take four seconds to process a three-second block. Alex Bourget: So it was just never going to sync, and when we looked at it, it had been syncing for two months with no end in sight. So that's the one we took. We transpiled it with our technology, infused it with streaming, and we always, always also strive to make things parallelizable. That's something really important I learned from data science. If you don't make your things parallelizable, if you don't try to leverage the horizontal scaling capabilities of your computing stuff, like cloud instances or whatever, well, it's always going to take longer and longer. So we made that thing parallelizable, and we were able to shrink it down to six hours, versus no end in sight. So maybe that's the ratio. When I say 800 times, really, when you divide by eternity, you get some big numbers, but ... Alex Bourget: So yeah, by making it parallelized. That's always been our design principle: put data from the blockchain into files. Maybe that's ... I'm introducing a little bit of the Firehose here, right? So extract data from the blockchains and put it into files, because files can scale horizontally. That's what's used in the industry, right?
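A toy version of that parallelization, assuming history already sits in flat files split into block-range segments: independent workers crunch segments concurrently, so sync time collapses from the sum of all segments to roughly the longest one.

```go
// A sketch only: "process" stands in for reading a segment's files and indexing them.
package main

import (
	"fmt"
	"sync"
)

type segment struct{ from, to uint64 } // a contiguous block range backed by files

// process stands in for "read this segment's files and index them".
func process(s segment) int {
	return int(s.to - s.from) // pretend result: blocks handled
}

func main() {
	// Split 1,000,000 blocks into 10 segments of 100,000.
	var segs []segment
	for b := uint64(0); b < 1_000_000; b += 100_000 {
		segs = append(segs, segment{from: b, to: b + 100_000})
	}

	results := make([]int, len(segs))
	var wg sync.WaitGroup
	for i, s := range segs {
		wg.Add(1)
		go func(i int, s segment) { // one worker per segment, all in parallel
			defer wg.Done()
			results[i] = process(s)
		}(i, s)
	}
	wg.Wait()

	total := 0
	for _, r := range results {
		total += r
	}
	fmt.Println("blocks processed:", total)
}
```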
Alex Bourget: You launch a bunch of machines, and Amazon S3 or Google Cloud Storage or things like that are known to scale. So you can run a lot of machines in parallel, and they all crunch their own segments and provide the results. That was our approach, right? And that made sense, provided that we extracted the data and turned it into files. So that was one of the big differences, I'd say. Paul: So to explain the benefit you guys brought to the operator here by speeding up the subgraph, for any of the viewers that are not so familiar with The Graph: the task of the indexer for one of these subgraphs is to look at any number of entities being emitted by that blockchain, write them down, index them, and get ready to serve them. So for something like PancakeSwap, the more pairs there are, the more things you need to pay attention to as an indexer. And then, as we've highlighted, the short block production time meant that more of those pairs just got emitted faster and faster than you'd get on Ethereum. So for a typical person indexing this data, sitting and running The Graph technology, waiting for BSC blocks to come in and writing down their indexes, how does that change? How does that little ceremony of events change when they use your technology? Alex Bourget: So the original Graph implementation was turning around each block and doing some JSON-RPC requests. What other option does it have? That's just what they do, right? So they would go and query like crazy, and just saturate one node, and then two nodes, and five nodes, to extract some data that is in a raw format, right? It's in that RocksDB, so you need to extract it, and by querying transaction by ID, or block per block, you need to do a lot of round trips, with all the funkiness that I explained earlier on, so you need to account for that and you need to send requests. That's annoying. Alex Bourget: Whereas the technology we brought in was the flipside. It's pushing data to you rather than you pulling with requests. It's pushing more data, all of the data that each block produces, but in the end, when you get it in memory, it's much, much faster to filter and do whatever you want, and it's ready to be processed. And you'll get pushed the next block after, so there's no network round trip, really. I like the image of tent poles. Tent poles, you stitch them together and they click, and maybe you have two inches of one entering the other, so you have the long tent pole. And you push on one side and, "Whoop," it pushes directly out the other side, right? Alex Bourget: Our streaming techniques, we like to see it as pushing from the blockchain directly into your app, even though there are a few legs of network hops, right? It's pushed from end to end. Paul: [inaudible 00:25:01] the file? The whole concept of files, that's the most basic element in a Unix system. If you boil it down to a file, the options become limitless. You can even throw a simple nginx thing in front of that. Alex Bourget: Yeah, caching is peanuts. Paul: Right, the possibilities just become endless for how you might want to scale it or load balance it out. Alex Bourget: Maybe it's not so clear, because I'm talking about streaming and we're talking about files. So that's the thing, the blending of these two things. Because we have a source there that is deterministic data. A blockchain, if you run it again from the beginning of history, will always produce the same data. In a sense, it's a logs-based architecture, right?
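The push-versus-poll contrast can be sketched like this, with an in-memory channel standing in for the network stream; nothing here is Firehose-specific, it just shows why the consumer's side gets simpler and faster.

```go
// A sketch only: a channel stands in for a server-pushed block stream.
package main

import (
	"fmt"
	"time"
)

type Block struct{ Number uint64 }

// pollLoop is the old way: a round trip per tick, wasted requests when
// nothing is new, and latency up to the poll interval when something is.
func pollLoop(latest func() Block, stop <-chan struct{}) {
	seen := uint64(0)
	for {
		select {
		case <-stop:
			return
		case <-time.After(500 * time.Millisecond): // one round trip per tick
			if b := latest(); b.Number > seen {
				seen = b.Number
				fmt.Println("poll: got block", b.Number)
			}
		}
	}
}

// pushLoop is the streaming way: the producer pushes, the consumer just ranges.
func pushLoop(blocks <-chan Block) {
	for b := range blocks {
		fmt.Println("push: got block", b.Number) // ready to filter/transform in memory
	}
}

func main() {
	blocks := make(chan Block)
	go func() {
		for n := uint64(1); n <= 3; n++ {
			blocks <- Block{Number: n}
		}
		close(blocks)
	}()
	pushLoop(blocks)
}
```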
Alex Bourget: It's similar to Kafka, but it has the property that you can always reprocess it from the beginning and it produces the exact same deterministic output. So if you can get your hands on that data, extracted once, you can put it into files, but you can also stream it in real time. Alex Bourget: And that's what the Firehose does. It extracts it in real time, puts it on disk, and the next time you need it, anywhere in history, you start from the files, you load them, and you can stream them at high speed, because you're really sourcing from flat files, easy. And then you can, "Shook," switch to live mode, because blocks have that general clock, block numbers and hashes. You can easily switch from files onto the live stream. And so from a consumer's perspective, it's just one thing: one powerful streaming engine, backed by files that you can cache and on which you can do parallelized operations. Paul: And if any of the listeners are interested in reading more about the Firehose, Alex and their team actually put out a really great RFC with pictures in it, right? And we all love pictures. So we can put that in the link of the podcast if you're interested in reading it for yourself and checking it out. So yeah, to summarize: the Firehose is providing a framework to stream block data through a push interface instead of polling, which also commits it to disk. And it allows you to go between the live mode of being streamed blocks and reading flattened data from that blockchain at a very high speed, because it's flattened. So on the topic of flat files, for anybody that's not into data, can you explain a little bit about what that means? What is a flat file in the context of this domain? Alex Bourget: A file is a file. You know ls on your system, you know what a file is. But what is particular about the data in those files, and I think this is very important, or particular to our systems, is that we extract from the blockchain nodes more data than is actually available through all of these JSON-RPC calls combined, right? So that's our approach. We saw that the JSON-RPC endpoints do not provide all the data that is needed. You can get a transaction, but in the transaction, you don't have a clear view of those nested calls, function calls that call one another and all that. And that's funky in a blockchain, because you're calling someone else's code and then it's calling back. Alex Bourget: You can have a call tree of different contracts written by different people. Think of stored procedures by different authors, and then yours calls theirs, and whatever, and you have a lot of useful context. But normally, the JSON-RPC gets us only events that are flattened and don't give you that context, and in many situations, it's really useful. So we thought, we're going to go to the source, we're going to modify the blockchain nodes. We're going to pry open the MySQL source code, right? How often do we do that in life, open the database's source code? So we did that for all the chains, so we can literally put printf statements in there, okay? Alex Bourget: Let's imagine printf, right? To output the data as it's being produced, as it's mutating its storage. Think if you were in the middle of that insert statement in MySQL: well, you could read the data that was there before. Or an update statement: you could output what was there before and what's the new value, and output all these deltas. And that's been really useful for us.
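A hypothetical sketch of that "printf in the database's write path" idea; this is not the actual Geth patch, just the shape of it: a hook at the storage-write routine that emits the old and new value, the deltas, as the mutation happens.

```go
// A sketch only: a toy in-memory store stands in for the node's real storage.
package main

import "fmt"

type store map[string]string

// setStorage stands in for the node's real storage-write routine, with the
// instrumentation added: emit (key, before, after) as the mutation happens.
func setStorage(s store, key, newVal string) {
	old := s[key]
	// Instrumentation: dump the delta to the extraction log, in whatever
	// encoding is cheapest to produce (hex stays hex), keeping the impact
	// on the write path minimal.
	fmt.Printf("STORAGE_CHANGE key=%s old=%q new=%q\n", key, old, newVal)
	s[key] = newVal
}

func main() {
	s := store{}
	setStorage(s, "0xdeadbeef", "0x01") // delta: "" -> 0x01
	setStorage(s, "0xdeadbeef", "0x02") // delta: 0x01 -> 0x02
}
```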
Alex Bourget: We always wanted to have the deltas out, so we can know what was before and what is after for any change that occurred to the database. You don't have that in the JSON-RPC things. You don't have that context. Alex Bourget: So we went in there and extracted so much richer, contextual data, and that's what we put in those files. It's so much richer, and then you can start expanding on that and interpreting the data, because often it's just a series of events. They all look like series of 32 bytes. If you look at that, you vomit, right? Coming from MySQL land, where everything is typed and all that. So with that, and some intelligence about the contracts, you can now, "Tsk, tsk, tsk," extract more meaning, and that's what we put in the files, and that's the same thing we put in the live stream. Alex Bourget: Just to be precise there: we have protobuf models of the blockchains' data, the most complete definition of the data in those blockchains. I haven't seen anything more exhaustive than the protobufs that we output for each of the protocols we support. Paul: And the way that you do this is by taking the node, as you said, taking a big crowbar, wrenching the guy open, and saying, "We're going to put little spies in here to grab information and just barf it out." Alex Bourget: Exactly. And as fast as possible. The design principle is least impact on its write cycle, right? Because you want to have the max performance: output it as fast as possible. If it's hex, let it be hex. If it's protobuf, let it be protobuf. Whatever is fastest to get it out, because then you get into that read land. You have tremendous power outside of the blockchain node to organize the data, rethink it, reshuffle it, and decode it from hex to whatever, or protobuf, or Avro if you want, and then push it out for read consumption. And then you can imagine limitless read systems. You send that out to a Postgres that has indexes, or to a key-value store, or to whatever other systems, and now you can shape it to make it ready for query, in all sorts of dimensions that you could never ask the blockchain node to handle. There are so many ways to pivot data. Paul: And just to show, by way of example, how powerful this concept is: this can make data accessible that would otherwise be ludicrous to track, where you'd say, "Oh, we're going to keep track of everything," such as on Solana. Alex Bourget: Yeah. Paul: Now we can think about that data being accessible to people everywhere. And just the difference in scale: you said BSC does a block every three seconds. Ethereum, which we consider decently fast by modern standards, is one every 12 seconds. Solana is one every 400 milliseconds. Alex Bourget: Right. Paul: So that's a whole new playing field, and there's a lot of internal transaction noise. If anybody here is into Solana, there's a lot of stuff going on inside. So the fact that the Firehose can take that data really goes to show how effective the solution is. Alex Bourget: Right. We built our technology on fast chains. We started on the EOSIO technology, right? And that was already a thing that could do 4,000 transactions per second with a block time of 500 milliseconds, so quite fast, with rich, rich introspectable data. That's what we built on and scaled. So we were ready even for Solana, which has a lot of throughput too; Ethereum was really a small thing next to fast chains like that. So it's ready. It was designed for large scale systems. When you design at large scale, think Intel.
Alex Bourget: We were developing systems that were shipping on 200 million machines per year. Paul: Two hundred million, is that correct? Alex Bourget: [inaudible 00:32:49] software, and we would do some analysis on behavior: when did people open the app, and whatnot? So a lot of event streams for behavioral analysis. In that case, your throughput can grow, can double. You add machines, and the throughput always grows. That's what I would say Web 2 systems are designed to do. In a blockchain, your throughput is limited by the blockchain, right? It's deterministic from that point on. It scales with the chain, so it's actually not so much data by today's standards in the big data world, but we had a system with particularities, like, for example, navigating reorganizations, the eventual consistency, the flushing and killing of some blocks that are not true anymore. Alex Bourget: And that's another crux, I think, of the features in the Firehose: it helps the end user navigate reorgs in a cursored way. We provide a cursor, so you can disconnect and come back. We'll always linearize the wonkiness of the eventual consistency and give that guarantee downstream, which is totally non-existent with other solutions. And that simplifies the work downstream, right? For anyone consuming blockchain data. Our customers would usually say, "Oh, I was able to cut down 90% of my code reading the blockchain because of that," and that's what I want to hear. Paul: Yeah. Firefighting you don't have to do downstream, because it's done by the Firehose. Oh, wow. Alex Bourget: Exactly. Paul: What do you know, it puts out the fires. Did the name come from a Kinesis Firehose kind of mindset? Alex Bourget: Well, it's a firehose visually: just the pipe, and then it's all unfiltered, right? The original one, close to the node, is the pipe with all the things, the blocks, all the data. I'm thinking of the Twitter Firehose. At some point, they had that product. It would just send the stream of all the tweets, and you could pay for that. I don't remember. So yeah, the image is fitting, I think. Paul: So we discussed the Firehose and StreamingFast in the context of The Graph, and how it's going to empower new types of data indexing, done on a larger scale, more fault-tolerant, horizontally scalable. What other applications or domains do you think this sort of paradigm shift in how you index and take data could foster? Have you put any mind to that, or have- Alex Bourget: Oh, man. So for the last year, we've been thinking of how to parallelize subgraphs, because we had made that proof of concept that we could turn one into something 800 times faster. So that's been in the back of our minds all the time. We were brought in to bring performance to The Graph. And the way we had done the first one, that PancakeSwap one, the project was called Sparkle: we took exactly the code written in AssemblyScript by that group over there, PancakeSwap. We transpiled it and we fitted in some layers, some stages of execution, to make it parallelizable, but it really wasn't designed for such parallelism. It's designed for linear processing. Alex Bourget: And a subgraph today, you need to start from the beginning of where you want to track transactions, and then it's going to go linearly, and you have to process 1 billion, 3 to 5 billion transactions a day before you get to real time and what's up right now in the chain. That's cumbersome. So we always have that in mind.
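Stepping back to the cursor idea for a second, here is what it might look like from the consumer's side, modeled loosely on what Alex describes: every message carries a step (apply or undo) and an opaque cursor to persist, so reorgs arrive already linearized and resuming after a disconnect is trivial.

```go
// A sketch only: message shapes and cursor strings are invented for illustration.
package main

import "fmt"

type Step int

const (
	StepNew  Step = iota // apply this block
	StepUndo             // this block is no longer on the chain: revert it
)

type Message struct {
	Step   Step
	Block  uint64
	Cursor string // opaque resume token, persisted by the consumer
}

func main() {
	var lastCursor string
	feed := []Message{
		{StepNew, 100, "c100"},
		{StepNew, 101, "c101a"}, // turns out to be on the losing fork...
		{StepUndo, 101, "c101u"},
		{StepNew, 101, "c101b"}, // ...then the winning version of 101 arrives
	}
	for _, m := range feed {
		switch m.Step {
		case StepNew:
			fmt.Println("apply block", m.Block)
		case StepUndo:
			fmt.Println("revert block", m.Block) // reorg handling users no longer hand-roll
		}
		lastCursor = m.Cursor // on restart, resume the stream from here
	}
	fmt.Println("resume from cursor:", lastCursor)
}
```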
Alex Bourget: So right now, this is pretty edgy, I haven't announced it ... These are not fully formed ideas and designs, but we have prototypes. I think it can have a huge impact in The Graph ecosystem, but also elsewhere. We're taking that streaming aspect and giving it the possibility to have small ... Alex Bourget: I don't know if you guys know Fluvio. They have SmartModules in their streaming engine, right? It's a streaming engine where you write some bit of code, bytes in, bytes out. You can transform, it can do stuff: filter, match your storage and your events. So the technology that we're doing, I'll call it Substreams as a code name, takes the Firehose and allows you to provide those small smart modules, similar to what we have there, Wasm modules. And then we can start composing in a streams fashion. So we're going further than the Postgres aspect of subgraphs, where Postgres is a given, right? It's part of the deal. Here, we're allowing people to go and compose streams and feed ... Alex Bourget: And also, because a blockchain has a clock, we can actually synchronize multiple streams together to fuse data and create incredibly composable streams of data that are really public, that leverage the intelligence from the Uniswap team to make sense out of the blockchain data, and give a stream of prices, and then another stream of trades, and another stream of I don't know what, right? But they're all small nuggets of intelligence from Uniswap. And then what? SushiSwap can do the same thing as Uniswap. They can all provide those small ... And someone can then go and compose, let's say, an average price across all these exchanges, right? Alex Bourget: Exchanges meaning the decentralized exchanges here, so the price from that guy and that guy, and you can start seeing something really powerful that can end up in a database, but you're not forced into putting it in Postgres, you can put it anywhere. And it still has all those properties of handling reorganizations and eventual consistency, because all of that is eventually consistent, as your source is eventually consistent. And it has all these same crazy properties of parallelism: you can have a small mapper that takes a block and outputs a series of pairs or trades. That's totally parallelizable, so we can leverage the fact that it's file-based. Alex Bourget: Like we do in the industry: take files, spit out files, right? Transform, spit out. And so, yeah, I'm really excited about this. It's on the edge of the design space, but we have prototypes and I think it's going to have a huge impact. Paul: Imagine being able to say, "Hey, I want to stream this complicated data," and there's somebody out there that has a substream that you can make some arrangement with to pull that data from. That would be wild, because- Alex Bourget: Exactly. And you know what? Let me pitch the story further. What I'm imagining there is that you have a pool of resources, and here we're getting close to BigQuery. For those who know, BigQuery is a large scale distributed system. You send a query, and it's going to do your thing in parallel on thousands of servers and, "Chk, chk, chk," cobble the results together and bring back your response, right? A large scale SQL system. Well, I'm imagining that we could do that with Substreams.
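The composition being described might look like this in spirit: small bytes-in, bytes-out modules, written here as plain Go functions with invented names standing in for Wasm modules, each producing one refined stream per block and fused on the block number, the shared clock.

```go
// A sketch only: module names, shapes, and values are invented.
package main

import "fmt"

type Block struct{ Number uint64 }

type Price struct {
	Block    uint64
	Exchange string
	Value    float64
}

// Two independent "modules", each contributed by the team that understands
// its own contracts.
func uniswapPrices(b Block) Price { return Price{b.Number, "uniswap", 100.0} }
func sushiPrices(b Block) Price   { return Price{b.Number, "sushi", 101.0} }

// A downstream module composes the two streams: because both are keyed by
// block number, fusing them is just a join on the clock.
func averagePrice(ps ...Price) Price {
	sum := 0.0
	for _, p := range ps {
		sum += p.Value
	}
	return Price{ps[0].Block, "average", sum / float64(len(ps))}
}

func main() {
	for n := uint64(1); n <= 3; n++ {
		b := Block{Number: n}
		avg := averagePrice(uniswapPrices(b), sushiPrices(b))
		fmt.Printf("block %d: avg=%.2f\n", avg.Block, avg.Value)
	}
}
```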
Alex Bourget: You'd send a specification of the stream you want, you'd send it up to that ecosystem of indexers in The Graph, and they'd process the chunks, leveraging the files they have, the things they've already processed, because there are common pieces used by a lot of people. Alex Bourget: And then it can distribute the work, do it in parallel when it makes sense, accelerate things, and reuse massive resources, even at query time. You'd write a little bit of code and you'd inject it. It would be query-time processing, or you'd be querying the pool of streams in real time. I don't know, I think it's pretty exciting. I think it can be a sort of BigQuery engine, a pool of all blockchain data that is queryable through streams, and you can just send in your Wasm-defined query if you want, and leverage the community, leverage the intelligence that everyone has put together. Alex Bourget: That's the crazy bit, because you don't want to do that alone, analyzing Sushi, Pancake and all these protocols that have crazy ways of storing their data in their contracts. No, let them do that, and just take the sweet, refined streams up there. Paul: It's really leveraging the decentralized economy that ... Alex Bourget: Exactly. Paul: ... you can build right now. So we're running up on time here. I know I could keep talking, but I want to ask one selfish question that's been brewing in the back of my head. So I know that, and correct me if I'm wrong here, the Firehose is based off of the [inaudible 00:41:24] and the OpenEthereum client. So do you see StreamingFast, because of the way you're approaching flat storage, eventually making technology like Erigon not as relevant anymore? Because the whole point of what you guys do is, well, we don't have to traverse this whole Patricia trie and RLP things to look things up, but Erigon says, "It's flat already. We're done." So how do you see you guys playing with Erigon, that tech? Alex Bourget: So, for us ... The truth is, the Firehose is a system agnostic of any blockchain in particular. For Ethereum, we dug in depth into the Go Ethereum codebase and we put print statements. And then for OpenEthereum, which is another implementation of the same protocol, Ethereum, we went in there, it's in Rust, and we put some print statements. Our goal is to do the same with Erigon. We haven't done it yet, but it's based on Geth, so the patch is very similar. Our goal is to extract the data from those protocols in an agnostic way. Alex Bourget: We don't care which of those servers you're using, because the protobuf that gets out of a block is deterministic. We're talking about a decentralized system where every node needs to agree. So the data that gets out is, by definition, deterministic. And once we get it out of a node, we don't care how the node stores it. We have it all here. We can lay it out in a million ways. So the way Erigon stores it helps Erigon accelerate its own internal use, because it's not going through 64 queries on the key-value store but rather just one, for reasons akin to decisions they made over there versus Geth originally, whatever, I don't really care. Because if Erigon is fast, I can get the data out fast, and then we're out into the blue ocean of possibilities, right? Managing, using that data. Alex Bourget: And if I want to put it flat somewhere, I'll put it flat. If I want to put it convoluted, I'll put it convoluted, if it's useful for my use case. Paul: Really. So, got you.
So it's really a complement, saying, "We're here to adjust to all the different clients, depending on ..." Alex Bourget: Well, we don't care. If it's deterministic, we'll spew it out. That pattern works on all the chains we've tried. It works in a systematic way. You have a thing producing data linearly that can be eventually consistent. That's the only thing the Firehose needs, and it's agnostic of the payload, whether it's EOS-specific or Solana-specific. It doesn't care. Paul: That's awesome. It's going to allow the scaling out of the domains to be- Alex Bourget: [crosstalk 00:43:51]. Paul: Yeah. Okay, Alex, I could keep talking, but I know we're up on time here. So thank you so much for coming on. This has been a great session. I am happy to help spread the word about the Firehose and The Graph at any given moment I can. Alex Bourget: Yeah. Paul: So thank you, PodRocket, for letting us delve into these topics. And, oh, is there any documentation, or anywhere you would like to point the audience, on your behalf or for their own knowledge seeking? Alex Bourget: If you want to reach out to us, go to our website, streamingfast.io. It has almost nothing, but you can join our Discord community. You can go to The Graph Discord if you want to speak to us in particular; our Discord is a place where you'll reach us, and we can chat about things. We're working on having some large scale docs for all the chains and how to deploy all these things, because it's just recently been packaged for The Graph. So it's not all there yet, although it's usable, people have deployed it, and you can consume those streams today. So go to our Discord. That's the best thing you can do if you want to keep in touch. On Twitter, StreamingFast. streamingfast.io. Hey, thanks a lot. Thanks a lot for having me. Paul: Thank you. Speaker 3: Thanks for listening to PodRocket. You can find us @PodRocketpod on Twitter, and don't forget to subscribe, rate and review on Apple Podcasts. Thanks.