Joe: 00:00:02 Welcome to Relay Chain, a podcast produced by Parity Technologies, where we discuss all things Substrate, Polkadot, and Web3. Nicole: 00:00:22 Hello, and welcome to an episode of Relay Chain. We're your hosts. I'm Nicole. Joe: 00:00:27 And I'm Joe. And we're joined by Emiel Sebastiaan from Polkascan. He is interested in philosophy of technology, computer science, economics, sociology, reading, writing, puzzles, music, scuba diving, flying, and the nature of things. Emiel, how do those things all come together for you and get you into blockchain? Emiel S.: 00:00:46 Hi, nice to be here. I'm Emiel, I'm working on the Polkascan project. Yeah, these are the things in my Twitter bio, and I guess that's just a collection of things I've been busy with throughout my life. Nicole: 00:01:02 Can you introduce yourself to our listeners in your own words? Emiel S.: 00:01:06 I'm Emiel Sebastiaan. Sebastiaan is actually my second name, but since I have a Dutch last name, Sebastiaan actually works when I'm traveling and meet people from abroad. I'm Dutch, I'm from Rotterdam, and I've been working for quite a while now on the Polkascan project, with which we aim to build a multi-chain block explorer for the Polkadot ecosystem. Joe: 00:01:28 Okay, and why block explorer specifically? Emiel S.: 00:01:32 That has a history. Back in 2016, I guess, or it was 2015, I was introduced to Ethereum and I got the whole blockchain thing. I went down the rabbit hole. I kind of skipped the Bitcoin generation of technology. And it always annoyed me that there were these single companies and websites that you need to go to, to actually find out what's happening to your own stuff on the blockchain. There were a handful of block explorers back in those days. It started, really, with a personal journey to find out what it takes to actually build such a system. Joe: 00:02:21 You said you got into blockchain in 2014-'15 with Ethereum. 
Did you immediately see the potential in that, or in blockchain? We were talking before the show a little bit, and you were going to go to grad school and then quit when blockchain happened. What was it that made you realize immediately that this was a powerful technology? Emiel S.: 00:02:46 I think I understood it. I have a background in computer science, a bachelor's degree, and then I did a master's degree in economics, and then I got into a PhD position in science and technology studies, which basically went into topics of how the social and the technical work together. I quit that PhD because of blockchain, unfortunately, or maybe not, because I was trying to address some of those topics there, and the whole blockchain thing nullified that research. In the end, I guess it's a good thing because it got me back to engineering and building stuff. But blockchain is this technology that is as much social as technological, so I think that sets it apart from many other technologies. Nicole: 00:03:38 That's great. Why block explorers? It's a very particular thing to be interested in. In your indexing of the history of the entire chain, the history of Web3, can you tell us a bit about what motivates you in particular with block explorers and what your personal philosophies are behind that? Emiel S.: 00:04:00 It has to do with being able to reproduce what others produce. It's about management of information, of what is happening on a blockchain. I've always been an entrepreneur in addition to the academic activities I had, and I built supply chain systems, which was really about information management across the boundaries of individual organizations. How people across organizations work together, what the information management of a product is as it goes through supply chains. This is just my new pet project: getting to know what all these assets are that are moving along such a blockchain. Nicole: 00:04:38 Do you have particular use cases in mind?
What use cases and which teams, which companies, would you think would be really exciting if and when they use Polkascan? Emiel S.: 00:04:50 I think a block explorer is a general purpose application that is built on top of blockchains. So, I don't think it necessarily has a use case except for making the stuff that happens on a blockchain accessible and understandable in the most general way. I think Polkascan is really an ecosystem project and it should be applicable and usable by any entity working with Substrate or Polkadot runtime environments. So, that could be Substrate implementers, the people building on Substrate. It could be the companies that use the blockchain for their own purposes. It could be the community at large. It could be an individual user. I think there are lots of potential use cases, but it's not up to us. It's about providing tooling that could be valuable. Joe: 00:05:49 Yeah, I think we can back up from that question a little bit, and just talk about who are the users for a block explorer in general. Is it for companies? Is it an infrastructure layer that the end user doesn't really ever see? Or is it really for individuals to go in, and not just engineers? How do you tailor the user experience and what is the ideal user experience for block explorers, especially one that has, like you just said, it could be an individual or a company using this as infrastructure. How do you create the APIs and the user experience that you want, and how do you even define that? Emiel S.: 00:06:26 User experience, perhaps that is a stretch at this point because we are really working on basic infrastructure stuff, just to get meaningful data out of Substrate. Substrate, in contrast to previous generations of blockchain technologies, is way more abstract and generalized than, for example, Ethereum or Bitcoin technology. Bitcoin and Ethereum actually give back some meaningful data if you talk to the RPC endpoints of the nodes. 
With Substrate, well, you get some obscure block header data and some binary data about extrinsics, and it doesn't really mean anything. It takes quite a bit of work to actually get to meaningful data. I think we've, by now, done most of the research to actually get to meaningful data in a very generic way, so that it works for any blockchain that is built with Substrate. And then you get into the areas you were talking about, like how do you work on user experience. I think we have a number of phases to go through. Emiel S.: 00:07:43 First of all, with our milestones (and I will go into that a bit later), we want to make deep data available through our block explorer APIs, and we want to build a general purpose block explorer user interface on top of that so we can at least showcase what is possible with the data we provide through our APIs, which is, of course, much richer than the data you would get from a normal blockchain node. Joe: 00:08:12 Yeah, I think we want to talk about decoding extrinsics and such a little bit later when we get into the architecture, but you talked about getting meaningful data. What is meaningful data, especially in a multi-chain environment? Because most block explorers, like if you use Etherscan, you can see balance changes and maybe some smart contract stuff, but like you said, Substrate is very generalized, and with Polkadot we're talking multiple chains. So, beyond just getting transactions or tracking UTXOs, how do you scale this idea of meaningful data when you have multiple chains, and what do you envision being the data that you can get out of this? Emiel S.: 00:08:54 With Substrate, Parity Technologies did a really cool thing, and that's creating updatable and upgradable runtimes. Substrate is basically a general purpose blockchain development framework with a clean slate for a runtime. Any Substrate implementer, any project building a Substrate-based blockchain, can implement their own runtime.
The stuff that is built in that runtime actually differentiates one chain implementation from another chain. So, you can imagine: there is the Polkadot relay chain, which has a particular runtime that does all the relay and interchain communication functions, and there is the Edgeware smart contract chain, which has a runtime built specifically to provide general purpose smart contract features. There is the Robonomics testnet at this point. These would go into the particulars of what it is they do. Joe: 00:10:04 Actuating robots, controlling robots, which could integrate with a supply chain system. Instead of just tracking the data, you can actually trigger— Emiel S.: 00:10:12 And you can have chains that go into identity, specialized bank coin runtimes. Getting back to that runtime: the runtime sets apart one Substrate implementation from the other. But there is lots of stuff that all these Substrate implementations have in common. What we try to do, when we are building a generalized block explorer for Substrate, is take the part of our stack that is common to all these Substrate implementations and get a working block explorer for those components. The cool thing about the runtime is that not only is there a runtime that differs per chain, but there is also a decoding context for these runtimes. There is a special RPC endpoint within Substrate that provides you a complete decoding context, which is called metadata, that actually describes what all the particular functions in the runtime are. For the Polkadot relay chain, it would provide you all the different types of transactions or events that are built into the runtime, which are different for any of the other runtimes and blockchains.
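The RPC endpoint Emiel refers to is the node's `state_getMetadata` method. A minimal sketch of requesting it over plain JSON-RPC, assuming a node listening on the default HTTP RPC port (the URL is a placeholder, and the hex-encoded SCALE result still needs decoding before it is useful):

```python
import json
from urllib.request import Request, urlopen

def metadata_request(request_id=1):
    """Build the JSON-RPC payload for Substrate's state_getMetadata call."""
    return {
        "jsonrpc": "2.0",
        "method": "state_getMetadata",
        "params": [],  # optionally a block hash, for historical metadata
        "id": request_id,
    }

def fetch_metadata(node_url):
    """POST the request to a node's HTTP RPC endpoint.

    Returns the hex-encoded SCALE metadata blob from the "result" field.
    """
    req = Request(
        node_url,
        data=json.dumps(metadata_request()).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.load(resp)["result"]

# Example against a local node (placeholder URL):
# metadata_hex = fetch_metadata("http://127.0.0.1:9933")
```

The returned metadata is what lets a generic explorer know, per chain, which modules, calls, and events exist and how to decode them.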
Joe: 00:11:29 This is what makes it possible to have a more generalized block explorer, because you're not just looking at a block explorer for a parachain, but you need to track parachain-to-parachain transactions or relay chain transactions or bridge chain transactions, and it's this metadata, if I'm hearing you correctly, that allows you to use the same tech stack on the Polkascan side in order to serve this data? Emiel S.: 00:11:50 Yeah, exactly. We use that, for example, to decode what the extrinsics mean for the different chains, and we use that to decode what the events that are triggered in the runtime mean for the different chains. One example is that the Polkadot relay chain has runtime modules called indices, accounts, and balances. These have very particular events and call functions that allow you to make a transfer of a balance from one account to another account. Events are triggered when these call functions are executed. That allows us, with that metadata, to actually query the Substrate node for a list of all transactions that happened in the Polkadot runtime. The thing is, that runtime module doesn't have to exist in another blockchain. Nicole: 00:13:01 Say a new parachain is deployed, and they implement additional modules and some additional functionality on top of the things you get out of the box. What's the process for working with Polkascan and integrating with Polkascan? Emiel S.: 00:13:19 I think our stack is generalized in such a way that there should be sufficient support out of the box for any new runtime module that is built by any team. There are some technical remarks we can make about that. There are some constraints. Basically, what the runtime allows for is to specify data types. For example, a balance data type or an accounts data type or a proposal data type, and the runtime actually offers a structure to decompose these data types into more primitive data types.
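That decomposition can be sketched as a small alias table that resolves abstract runtime type names down to SCALE primitive integers. The aliases listed here are illustrative, not an exhaustive registry of any real runtime:

```python
# Illustrative alias table: abstract runtime type names resolve
# (possibly through several steps) to SCALE primitive integers.
TYPE_ALIASES = {
    "Balance": "u128",
    "AccountIndex": "u32",
    "Moment": "u64",
}

# Fixed byte widths of the SCALE primitive integer types.
PRIMITIVE_SIZES = {"u32": 4, "u64": 8, "u128": 16}

def resolve(type_name: str) -> str:
    """Follow aliases until a primitive type is reached."""
    while type_name in TYPE_ALIASES:
        type_name = TYPE_ALIASES[type_name]
    return type_name

def decode_int(type_name: str, data: bytes) -> int:
    """Decode a fixed-width SCALE integer (little-endian bytes)."""
    size = PRIMITIVE_SIZES[resolve(type_name)]
    return int.from_bytes(data[:size], "little")

# A Balance resolves to u128, i.e. 16 little-endian bytes.
```

With a table like this, the explorer only ever implements the primitives; new abstract types become a matter of registering new aliases.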
For example, a balance is an abstract data type, but it actually maps to an unsigned 128-bit integer. If you know how that maps to a more primitive data type, our block explorer only needs to support those primitive data types and the decomposition of the abstract data types into them in order to support new data types. So, if a new runtime module is built by a new team and it is deployed, and it includes the decomposition of how these new abstract data types decompose into the primitive data types, then we should be able to offer support off the shelf. Nicole: 00:14:58 That's great. It sounds like Polkascan in the future will be able to very easily support a fast-growing multi-chain network. Emiel S.: 00:15:06 Exactly. We are standing on the shoulders of giants, of course. Let's not forget that. This is all possible because Parity Technologies built that into ... Yeah, it's true. Joe: 00:15:19 I think this is where the metadata comes in, because you could implement a runtime that uses a u32 as your balance, and so you need to tell whatever is asking for this data, like a block explorer: hey, my balances are 32 bits of information. That's also what lets you use inherent data that's not necessarily signed or fit into a transaction mold. You can say: hey, we have this section of the block that's reserved for data, and this is how you decode it. That's, I think, where the metadata part comes in. How much of that is automated, or how much of it do you have to implement yourself? Get the metadata and, say, implement your own decoder? Or do you basically have a script that can take this metadata from a runtime and automatically put it into a pretty-formatted, human-readable block explorer? Emiel S.: 00:16:15 That goes a bit into the architecture of our stack, of course. I think we should differentiate between the two products we are building. We are building Polkascan PRE and Polkascan MC. It's an abbreviation that stands for something, of course.
Polkascan PRE is our Polkascan block explorer for Polkadot runtime environments. Substrate is a development framework to build and deploy Polkadot runtime environments, and Polkascan PRE is a block explorer for a single individual blockchain built in Substrate, for example. Emiel S.: 00:16:52 Polkascan MC, on the other hand, is our multi-chain explorer, and that's basically a multi-chain gateway to many instances of Polkascan PRE. So, Polkascan PRE [inaudible 00:17:07] on the network for the Polkadot relay chain, for the Robonomics network, for the Joystream network. It's one of what will be hundreds, hopefully, in the upcoming years. Emiel S.: 00:17:17 These will all have a Polkascan PRE deployment instance, and these will provide endpoints to the blockchain data. Polkascan MC would be a single instance, actually connecting to all these individual instances of Polkascan PRE, offering a multi-chain interface. What we want to deploy on polkascan.io, our gateway website, will be an instance of Polkascan MC. I think that will provide us, and the community, a gateway to an entire Polkadot universe: not only to the relay chain, but all these parachains that compose that universe. Joe: 00:18:04 So, the user is connecting, basically, to a single Polkascan MC, which is then connected to several PRE instances? Emiel S.: 00:18:12 Yes, exactly. Some of the data can be forwarded directly to the APIs of the individual Polkascan PRE instances, and there will be some sort of a harvesting process or aggregation process in the Polkascan MC product to actually do aggregations over all the individual Polkascan PRE instances. A very simple example would be counting the number of transactions or extrinsics that are happening in the Polkadot universe as a whole, rather than on a single blockchain network. Going back to the question you asked a few minutes ago: how does that harvesting process for a single blockchain work? Because that's where it all starts, of course.
Our architecture stack has a number of components. Basically, we manage a Substrate node per network, so we usually build the Substrate node from source, which comes from the project's repositories. For the Alexander network, we built the Polkadot client, and likewise for all the other projects. Joe: 00:19:30 So, you're not connecting to another RPC. For the Robonomics testnet, you're running a Robonomics full node. Emiel S.: 00:19:38 Not only a full node, we are actually running an archive node, because that's what's particular to block explorers. And, I guess, that differentiates our block explorer from the Polkadot UI that Jaco has been building at Parity Technologies. The Polkadot UI really gets into the current state of affairs and listens to current events as they come by. Our block explorer, in contrast to that, actually goes into all the previous states. That allows, of course, for all sorts of additional valuable information, like how validators' balances changed over time, and being able to drill down into what triggered that. Was it a reward or a slashing activity? That's what you get when you analyze all the historical data, of course. I guess we kind of like to be the historians of what happens on a blockchain. Joe: 00:20:38 I'm wondering, because I looked into your architecture a little bit, and you go from an archive node into your own MySQL database and then you're serving this in a GUI, and there's obviously lots going on in between. But why do you need an archive node when you are just taking full node data in the current state and putting this into a SQL database? When you serve data to the end user, what are you taking from the SQL database and what are you taking from the archive node? Emiel S.: 00:21:05 Currently we are getting everything from the archive node, because it's a fairly simple way to configure it like that. There are some service management considerations, but we take everything.
The blocks, the extrinsics, the events, and some other stuff from the state and the storage from the archive node, and we put it all in the SQL database. There will likely be some optimizations later on. Just to keep within the constraints of the block time: you need to be able to process everything you need to process within the block time in order to keep up with the finalized chain head, of course. Joe: 00:21:45 Sure. And you're hosting and serving this all yourself? Emiel S.: 00:21:48 Yes. Yeah, exactly. Joe: 00:21:51 Can you talk a little bit about what your tools are? I saw in your GitHub that you're using some Python scripts to take the archive data into your SQL database, and then the same to serve it. Can you talk about the development of those and what the tasks are within those scripts? Emiel S.: 00:22:13 We have the node and we have a harvester application on top of that node. The harvester application actually looks at the finalized head, which gives you a block hash, and you can work your way down all the way to genesis. Basically, you talk to the RPC and ask the RPC: give me all the data for this block. Each block that is returned from the RPC has a parent hash, and you can use the parent hash to actually fetch the parent block. You can fetch the parent block, and you can fetch its parent block, and that's how you get the canonical chain all the way back to genesis. One of the tasks the harvester process is doing is frequently talking to an RPC endpoint, and that's why one of the components on our GitHub is called the Substrate Interface. Emiel S.: 00:23:11 We used to call it the RPC wrapper, but Substrate Interface seems a bit more generic or general, because it will also allow for WebSocket connections in the near future, so we are kind of happy with that term. It is basically a wrapper for the RPC, so all the repetitive tasks are executed in a nice Python library, which of course can be used by other applications. That's why we made it a separate repository.
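The walk Emiel describes (finalized head, then parent hash by parent hash back to genesis) can be sketched like this. Here `rpc` stands in for any callable that performs a JSON-RPC call against a Substrate node; `chain_getFinalizedHead` and `chain_getBlock` are the standard node RPC methods, and the genesis block is detected by its all-zero parent hash:

```python
def walk_to_genesis(rpc, max_blocks=None):
    """Follow parentHash links from the finalized head back toward genesis.

    `rpc(method, params)` is any callable that performs a JSON-RPC call
    against a Substrate node (hypothetical interface).
    Yields raw block payloads, newest first.
    """
    block_hash = rpc("chain_getFinalizedHead", [])
    count = 0
    while block_hash is not None:
        block = rpc("chain_getBlock", [block_hash])["block"]
        yield block
        parent = block["header"]["parentHash"]
        # The genesis block's parent hash is all zeroes; stop there.
        if int(parent, 16) == 0:
            break
        count += 1
        if max_blocks is not None and count >= max_blocks:
            break
        block_hash = parent
```

A harvester would feed each yielded block through its decode-and-store step; `max_blocks` just keeps a test run bounded.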
It will also be published on PyPI, the pip dependency manager. Joe: 00:23:54 Pip. I have an anti-Python bias, personally. It's not normally used for production mission-critical systems. Do you view Polkascan as being mission-critical, and are you using Python to develop and experiment, or do you plan to keep Python in your production stack? Emiel S.: 00:24:16 Well, we are quite happy with Python. Yesterday at the presentation someone asked the question: for example, why are you not using the Polkadot UI libraries, which are the JavaScript libraries? It was a choice we basically made back in the day. Back then there wasn't really a good alternative; we needed to make a choice, and it was about the developers and architects we had available who are helping us out with this. Now we are looking at performance benchmarks, and we are looking at strategies to actually be able to parallelize all the tasks we are doing, and we are very confident that this will work very well for us. Emiel S.: 00:25:23 I'm not the lead engineer for all the Python architecture; we got a new and fresh senior engineer for that, Arjan, who gave a presentation yesterday next to me. It's a fun story, but I built most of the stuff in MySQL stored procedures last year, doing all the research, and we got our new lead engineer in December to actually implement all the research I did last year and get an implementation that is maintainable and scalable in the next couple of years, with a larger team. Nicole: 00:26:14 Nice, so it sounds like you really built the first version of Polkascan, and since then you've hired in a team and scaled yourself? Emiel S.: 00:26:21 Yeah, exactly. So, I know what it's about but, you know, I also know my limitations. It's good, you know, to have a team and shared responsibilities, so the burden is not on the shoulders of one individual. It is a team effort. Nicole: 00:26:40 Great.
Going back to your previous point about performance benchmarks, out of curiosity, what are some important performance benchmarks for a block explorer? What do you guys usually keep in mind when you think about that? Emiel S.: 00:26:55 So, this is not something we just thought of. I actually built an EVM block explorer for the Ethereum mainnet. I told you about my annoyance with these block explorers being proprietary websites, with advertisements on the websites, and they log your IP. I don't know what they do- Joe: 00:27:20 And you guys are fully open source, right? Emiel S.: 00:27:23 Yeah. We'll get into that in a bit, but some of the basic stuff, not everything, but some of the basic stuff of Etherscan I rebuilt in our EVM block explorer, and I spent about a year on that. BlockScout of the POA Network, they were basically doing the same thing at that point, and they got a nice grant from the Ethereum Community Fund to make that happen. I guess we were a bit too late there, and perhaps that's a good thing, because that made us move and pivot to the Polkadot ecosystem. But we did actually manage to build a working block explorer for Ethereum mainnet, and the thing is, everyone who has been around knows about all the Twitter flames about how large an archive node is, how large a full node for Ethereum mainnet is, and Afri is one of our defenders in that whole discussion, of course. Emiel S.: 00:28:34 But all things aside, there is quite a bit of data in the 7 million blocks nowadays on the Ethereum mainnet, and our harvester for the Ethereum mainnet gets to a size of about 6 to 8 terabytes of indexed data, which would allow you to look up accounts, transaction history, all the events, ERC20 tokens and all the other tokens. So, you can imagine that if you have to fetch that sort of data from an Ethereum node, it will take quite a bit of time to crunch through all that data as well.
Emiel S.: 00:29:14 There were all sorts of events in Ethereum's history that complicated stuff. Back in 2016, during Devcon 2, there were the so-called Shanghai attacks, which were DDoS attacks on the Ethereum network, and basically that bloated the state of Ethereum mainnet. This type of event really hardened our harvester processes to be able to crunch through that data and forced us to optimize the harvester processes for it, and there were a number of other events. So, in the end, if you want to provide a near real-time block explorer user experience, meaning that the data you present in your block explorer keeps up with the moving chain tip, then there is this constraint you cannot get around: you need to be able, on average, to process and crunch through all your data within the constraint of the block time. So if the block time is 5 seconds, then let's do it within 5 seconds; if it's thirteen to fourteen seconds, as it is with Ethereum, then you need to be able to crunch through it within that. Emiel S.: 00:30:33 There are several ways to deal with that. You can look at what tasks you can do next to each other, so a multi-threading or multi-worker process. We have a number of facilities for that, so it's fairly easy to get 2 blocks simultaneously, or ten different blocks simultaneously, and crunch through each of the individual blocks in parallel. Emiel S.: 00:31:01 There are even some activities within 1 block that you can do in parallel, but it really gets into optimization, and I think we can come up with some general rules for that for the Substrate system. We have identified a number of milestones we are going through with the Web3 Foundation grant we got, and the last milestone is actually implementing a number of these optimizations to do parallel processing. Emiel S.: 00:31:38 I can go on about this for ages- Joe: 00:31:40 Please do.
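The multi-worker approach he outlines can be sketched as fetching several blocks concurrently so that the average per-block cost stays under the chain's block time. Here `fetch_block` and `process_block` are hypothetical stand-ins for the harvester's I/O-bound RPC fetch and its decode step:

```python
from concurrent.futures import ThreadPoolExecutor

def harvest_parallel(block_numbers, fetch_block, process_block, workers=10):
    """Crunch several blocks concurrently.

    `fetch_block(n)` and `process_block(block)` stand in for the harvester's
    real RPC fetch and decode steps (hypothetical interfaces). Fetching is
    I/O-bound, so threads overlap the network round-trips.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results stay in block order.
        raw_blocks = list(pool.map(fetch_block, block_numbers))
    # Per-block decoding is independent and could also be parallelized;
    # shown sequentially here for clarity.
    return [process_block(b) for b in raw_blocks]
```

With a 5-second block time, this structure only has to keep the average of fetch-plus-decode per block below 5 seconds, not every individual block.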
Emiel S.: 00:31:40 There are all sorts of optimizations that you can do on your database layer as well. You don't want to do an insert query and then do many update queries on your database; that's basic database optimization. You don't want to have too many indexes, because doing inserts and updates on indexed tables is actually a performance drain as well. But, on the other hand, if you want to build a user interface and be able to do proper select queries, then indexes are useful, so there is an optimization trade-off going on there as well. Emiel S.: 00:32:16 Then there is the whole thing of managing 10 terabyte databases, and the 10 terabyte database benchmark or norm, we only got that from Ethereum mainnet, so that's only 1 blockchain. So what happens if you have a relay chain with a hundred blockchains that get to a certain size? There are all sorts of challenges: there are software development challenges you need to put into your development efforts, and there are service management challenges we are facing, like managing large databases and managing, for example, archive nodes, being able to do proper service management processes for the updates and upgrades of these nodes, of your databases, of your software. These are challenges in addition to the technical challenges of actually building the block explorer. Joe: 00:33:09 This leads into a scalability and roadmap kind of question: if you're hosting everything on a local server, talking about 10 terabyte databases, you're getting up to the point where you're going to have a sharded MySQL database anyway. It may be sharded in the data center, but you're getting beyond a single-hard-drive type of database, which leads you towards IPFS more generally. If you're already talking about a sharded database, do you have plans to move into something more decentralized than sharded, or do you see this as a service that's OK to be served locally?
Emiel S.: 00:33:46 I get the whole decentralization narrative in the blockchain world, and I'm all for that. I don't necessarily think that we are going in that direction at this point in time. I'm used to doing old school IT service management, and I'm an entrepreneur in old school IT systems. Emiel S.: 00:34:10 It will be a while before we get to decentralized storage. I think it's almost a no-brainer just to find a way to do proper servicing of block explorers with traditional means; that is a challenge in itself. I don't want to make it too difficult on us. Emiel S.: 00:34:32 You often see startups in the blockchain world trying to solve 1 problem, 2 problems, 3 problems at the same time, and I want to solve 1 problem properly, rather than solving 6 problems at the same time and not doing a good job on any of them. Joe: 00:34:50 I think this comes up as a meme in the blockchain industry: just decentralize everything, without thinking about what actually needs to be decentralized and what the trade-offs of doing that are. There's obviously a performance trade-off there when you go to a completely decentralized database, and I think the real question is: does this need to be decentralized? Because if your code is open source and the archive nodes are publicly available, why does the database for Polkascan need to be decentralized if somebody can reproduce it? Emiel S.: 00:35:25 Exactly, so we should really look at the problem at hand: what does decentralization mean in this context? Like you said, we provide an open-source software stack. The node and the client software are open-source. The database is open-source.
We allow for easy Docker commands so the community can rebuild everything we did with open-source tools. They can start their harvester process, and it might take a while to actually crunch through all the data; it might take a month, it might take 2 months if you have a 2 terabyte database, but you should be able to reproduce all the data that we present on our interfaces. That is a form of decentralization as well: being able to reproduce everything yourself. Emiel S.: 00:36:19 So who will be using these block explorers? There is, of course, the convenient way of just going to polkascan.io and looking at the data, but some people, some organizations, might actually want to take the effort to get the data themselves, just for extra assurances, or for integration with their own systems, for example. Nicole: 00:36:53 On that note, is the plan for Polkascan to be open-source indefinitely, and does that wrap into your overall roadmap over the course of the next couple of months to a year? Emiel S.: 00:37:08 Polkascan PRE for sure, and as an open-source project, we would actually encourage other organizations to run the same Polkascan PRE instances. So, that allows for multiple instances of these block explorers to exist out there. Emiel S.: 00:37:29 I think with the Polkascan MC product we intend to run the Polkascan PRE service management activities ourselves, so that we have access to all the data we are producing. We want to keep up from the start, from the launch of all these networks, to be able to collect the data and to aggregate it into the multi-chain system.
Emiel S.: 00:37:58 So, I don't necessarily know if that Polkascan multi-chain system will be open-source; there are a number of considerations. I think the emphasis there is much more on service management activities than on being able to reproduce the data. You can reproduce all the low-level data, and there are some considerations with the front-end usability libraries you can use or you can buy, which are not necessarily components that are available within the open-source community. So it allows us to perhaps put more emphasis on paid services with that, from a usability angle, or a data integration angle, or event monitoring and notification services, that sort of stuff. Joe: 00:38:50 I have a few questions. I looked at your roadmap, and you presented this yesterday also: you have architecture, which we've talked about quite a bit here, core data entities, runtime data entities, search optimization, usability, system aggregation, runtime aggregation, and it would be nice to go through a little bit of each of these. Joe: 00:39:10 For core and runtime data entities, we've already talked about getting binary data from Substrate RPCs, but can you differentiate core versus runtime data, and the challenges of processing, harvesting and presenting this? Emiel S.: 00:39:28 So, in our first milestone we actually built our architecture. All these components that are in our GitHub have a milestone-one branch, and together they present a minimum viable block explorer that does the harvesting process, puts the data in the database, presents the data through the API and presents it on the UI. With that first milestone we proved that the architecture works by presenting the block entity. Emiel S.: 00:40:02 With the second milestone we get into all the data that we know for sure all the Substrate instances have in common. So these are the events, the extrinsics, which is a general term for transactions, and the runtime object.
The runtime object specifies what types of events and what types of extrinsics exist within the blockchain. It allows us to classify all the events, and it allows us to classify and decode all the extrinsics, so that's all part of the system entities milestone, the second milestone. Emiel S.: 00:40:44 Yesterday, Arjan actually presented that second milestone release, so we need to do some polishing and we need to do our release now in order to sign off on that milestone. Emiel S.: 00:40:57 The third milestone goes into key runtime entities: particular runtime entities that a lot of the Substrate instances and networks have in common. Almost any runtime for any network that currently exists has the accounts and indices runtime modules, the balances module, the consensus module; there are a number of modules they have in common, so we want to provide support for those. One easy example is to offer a view in the explorer on an account, so you can look up an individual account and see which transactions belong to that account, in which blocks extrinsic activity related to that account took place, and perhaps see the balance change over time for particular accounts. So I think there are a number of runtime modules that are particularly useful for many of the Substrate chains that exist. Joe: 00:42:10 When you accomplish this for one runtime, is it generic and extensible to others? What I'm trying to ask is: we have our libraries that we provide in Substrate, but you can also write your own, and one of the guys here wrote a library that is a Bitcoin price oracle, so it gets the price of Bitcoin and puts it in the chain as an inherent. Can you manage stuff like that, or is it only the timestamp inherent that just ships with Substrate? Can you handle anything that the user wants to build? 
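The classification Emiel describes, using the runtime object to turn opaque extrinsic bytes into named calls, can be sketched in miniature. The table below is a made-up stand-in for real runtime metadata; the indices, module names, and the two-byte layout are illustrative assumptions, not the actual Substrate encoding.

```python
# Hypothetical sketch of how runtime metadata lets an explorer classify
# raw extrinsics: the runtime object maps (module index, call index)
# pairs to named calls, so an opaque byte pair becomes "balances.transfer".
# All indices and names here are invented for illustration.

RUNTIME_CALLS = {
    (0, 0): ("timestamp", "set"),
    (4, 0): ("balances", "transfer"),
    (4, 1): ("balances", "set_balance"),
}

def classify_extrinsic(raw: bytes) -> str:
    """Read the first two bytes as (module_index, call_index) and look
    the pair up in the metadata table; unknown pairs stay unclassified."""
    module, call = RUNTIME_CALLS.get((raw[0], raw[1]), ("unknown", "unknown"))
    return f"{module}.{call}"

print(classify_extrinsic(bytes([4, 0])))  # balances.transfer
```

Because the metadata ships with the runtime itself, the same generic decoder can classify extrinsics for any Substrate chain without explorer-side hardcoding, which is the basis of the "generalized compatibility" thesis discussed below.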
Emiel S.: 00:42:45 Yeah, I think so, but I think it's really important that we engage with all the Substrate implementers, and that was basically our open call yesterday in the presentation. We want to onboard as many networks as possible to actually test our hypothesis that we have a generalized Substrate explorer. We are very early in the ecosystem, very early in the development phase, so I'm especially curious about the things that are off or the things that are different, the outliers, because that allows us to ensure that we are as generalized as possible. Nicole: 00:43:24 You mentioned before that as long as these data types are constructed on top of core primitives, you're able to boil that down and have it be integratable. Emiel S.: 00:43:34 Yes, exactly. Nicole: 00:43:35 So I think for anything at the Substrate SRML development level, it should be a rather fast integration with Polkascan? Emiel S.: 00:43:43 Yeah, exactly, but there are a number of examples I can think of right now that may be substantially different from what we've seen so far. One example is a UTXO model within Substrate; that works very differently from an account-nonce model, and I'm really interested in seeing how that works. So it is to be determined. Emiel S.: 00:44:14 Then the other example is Reto's project, Katal, where he's building all these thirty-something financial primitives, which you can utilize to build all sorts of financial products on top of. He explained at an EthCC conference that he may be using a different runtime environment or virtual machine for that. It is to be seen to what extent we can offer compatibility, but then again, I talked to him yesterday and he said, "well, we haven't really implemented that yet and we need to see," so we are very early on. We want to engage with everyone to test our thesis that we have generalized compatibility. Joe: 00:45:06 It's all binary data, right? 
So it's all something being expressed via types over binary data, so- Emiel S.: 00:45:13 Exactly, you put something into a database and you get something out, it's as simple as that. Joe: 00:45:17 Yeah. What about the second path of your roadmap: search optimization, usability, and system and runtime aggregation? You mentioned earlier about paid services, so can you talk about what these milestones mean, where some of the paid services can come in, and what you can provide? Emiel S.: 00:45:34 So, obviously we're working on our business plans; I think it's a bit too early for that. Our real priorities are in getting the grant finalized and getting Polkascan PRE out there. We've been thinking about the business model, and one of the things that came to my mind is that perhaps a DAO treasury might be our first paying customer. That's a very interesting notion if you start to think about it; how does that work? I think with this third-generation technology we are seeing a movement towards inflation within networks that feeds back into a treasury, which is governed by a sort of DAO council, and I think we fit into that category, as providers of ecosystem services. Emiel S.: 00:46:33 Unless you have a billionaire who's able to subsidize these types of services within your ecosystem, someone needs to pick up the bill for service management. If you want to offer a block explorer for the community: Etherscan has very high traffic, it was in the top 30 most-trafficked websites at some point, I believe, so who's going to pay for that? I know I don't have the funds to do that. At the same time, we want to stay away from any type of advertising revenue; I think it doesn't fit well with the ethos of this ecosystem to have that sort of model sit on top of such a database. 
So, I guess I can imagine that at some point there will be multiple treasuries, either for a single parachain or for the Polkadot ecosystem as a whole, and that at some point we would make a proposal, or multiple proposals, saying: well, you've seen what we can do, these are our criteria, we don't want advertising, which you agree with; then we offer option A, B, or C for running the ecosystem block explorers for, say, a period of one or two years, with a budget for that. So we would ask for a council vote, or a community vote, on getting us a revenue stream from a treasury for that. Nicole: 00:48:24 Strictly for treasury, you're talking about the treasury module that comes out of the box? Emiel S.: 00:48:32 Yes. Nicole: 00:48:32 Which will, what, upon dilution in the system, maintain a pot of balances to pay for these proposals that you yourself and the community can submit? Emiel S.: 00:48:42 Yes, exactly. We haven't heard anything in particular about that and how that's going to work; this is just my imagination of how we could fund a public service of a block explorer. That's besides the fact that we are running those services for ourselves, for our own purposes. But as a public service, this might be a way to finance it. Nicole: 00:49:05 Yeah, that makes a lot of sense. I think it very cleanly and very nicely aligns everyone's incentive structures, and ensures that public services and providers like yourselves are fairly compensated. Emiel S.: 00:49:17 It is a fair service; it is fair to everyone. If we don't deliver, it has a limited timespan, and it is an open-source explorer: someone else, if they believe they can do better, should be able to provide it. We believe that since we are building the services, we are in a very good position to provide that service to the community. 
Joe: 00:49:42 Yeah, so we can and we're going to do podcast episodes just on governance, but I think this is an interesting topic, because especially in IT, a lot of users don't see the layers that are in between. If you use Facebook, a lot of people don't even know about the Facebook data centers, but this is a service that needs to be provided. So in these blockchain or decentralized systems, being able to fund these intermediate layers that people don't see can be quite difficult when you just do a coin vote or something. Because who's going to vote for this? They might not understand, and really shouldn't have to understand, why this is necessary in order to use the system, and so there's a lot of thought, I think, going into how the chain itself can have its own decision-making system. Joe: 00:50:34 Yeah, I don't have a question on that, it's just a thought. Nicole: 00:50:36 It's a great thought. Emiel S.: 00:50:38 I agree. A DAO and a treasury, these are really interesting notions, and we haven't really seen that before. It's like a second era of DAOs is upon us. Nicole: 00:50:54 Yeah, we're seeing a lot of interesting experimentation with different DAOs and different governance models, and those are things that you can very easily structure, set up, and get running within two hours on Substrate. So I think you'll see a lot of interesting use cases and interesting, feasible ways that block explorers can be fairly compensated. Joe: 00:51:15 Yeah, you had an interesting slide yesterday in your talk: chain length as the abscissa, and this line of finality; everything behind finality is transactions and very concrete data, and everything to the right of finality is politics and economics. You can't have a gray area when it comes to this governance stuff, because people are doing signaling or voting on chain, and you have the infrastructure to serve data. Where do you see yourself on that line? 
Do you see any role for Polkascan on the right side of finality? Emiel S.: 00:51:51 Yeah. So I made that slide with genesis on one end, the left end, and the chain tip on the other end, and then somewhere near the chain tip you have a vertical line called finality, and basically that's what these proof-of-stake systems offer. They give you firm assurances about where stuff is finalized, rather than the probabilistic finalization you get with proof-of-work systems. Emiel S.: 00:52:18 I made that slide because it differentiates what Polkascan does in contrast to what Polkadot UI does. Polkadot UI has an explorer section as well, but it mainly looks at the tip of the chain and shows you the events and the transactions that happen there, perhaps a vote that needs to be made on a proposal; that's all stuff that is to come, or stuff that has just happened. I differentiate the section before finality from the section after finality, as the one is where historians work and the other is where politicians and economists work. Economists make choices, politicians go about decision processes and fight for what they believe in, and I think Polkascan is really focused on the historical data. Emiel S.: 00:53:19 So what is the role of historical data in politics, economics, and decision processes? It helps you understand how decisions were made in the past. One example: if you want to nominate your DOTs to a validator, it's actually good to look at the past performance of that validator. I believe the marketing, but let's look at the data: is this a trustworthy entity, has this validator been slashed before, should I nominate my DOTs to this validator? There's a role for historians. Nicole: 00:53:59 Yeah, we can only make smart political and socio-economic decisions when we know history, and that's a critical role that Polkascan will play in the future ecosystem. Nicole: 00:54:09 So, Emiel, in your words, what is a block explorer? 
Emiel S.: 00:54:13 Yes, so I think a block explorer is a general-purpose application for a blockchain. In fact, it is the most general-purpose application you can think of, because it allows you to make the data that exists on a blockchain accessible and understandable. Nicole: 00:54:34 Can you define for us the difference between an archival node and a full node? Emiel S.: 00:54:40 The main difference between an archive node and a full node is that an archive node contains all previous states of the blockchain, whereas a full node only retains the most recent state, or the states of a few of the most recent blocks. Emiel S.: 00:54:58 Say the default is 250 blocks with a full node: beyond 250 blocks from the chain tip, the data of the previous states can be pruned, and that allows your blockchain nodes to keep a fairly small size. It's a storage optimization: stuff you no longer need, you can get rid of. Nicole: 00:55:20 Whereas an archival node keeps everything? Emiel S.: 00:55:22 Yes, it doesn't do the pruning, the clean-up activity. That allows you to, for example, ask the node not only for the current balance of an account but for the balance of a particular account at the genesis block. That's the most extreme example. Joe: 00:55:44 But when you serve data for that, like on Polkascan if you look at your account history, you're not going through every state in the archive node; that's being served from MySQL? Emiel S.: 00:55:55 Yes, our harvester process harvests a number of previous states from the blockchain archive node and stores them in the database, and once we've dealt with that, we no longer need to go back to that archive node, of course. It's a performance optimization; it's easier and cheaper for the application to go to the database. Joe: 00:56:18 You mentioned that you were involved in Ethereum. How did you discover Polkadot, and what made you decide to make the switch? 
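The full-node versus archive-node distinction Emiel explains can be sketched in a few lines: both nodes see every state, but the full node prunes everything older than a window behind the tip (he cites a default of 250 blocks), while the archive node keeps all of it. The window size and function names here are illustrative, not Substrate's actual implementation.

```python
# Hypothetical sketch of state pruning: a full node discards states more
# than PRUNE_WINDOW blocks behind the tip, while an archive node keeps
# every state, so only the archive node can answer historical queries
# like "what was the balance at genesis?". Illustrative, not real node code.

PRUNE_WINDOW = 250

def apply_block(states, number, state, archive=False):
    """Record the state for block `number`; prune states older than the
    window unless this node is an archive node."""
    states[number] = state
    if not archive:
        for old in [n for n in states if n <= number - PRUNE_WINDOW]:
            del states[old]
    return states

full, archive = {}, {}
for n in range(1000):
    apply_block(full, n, f"state-{n}")
    apply_block(archive, n, f"state-{n}", archive=True)

# The archive node can still answer a query about genesis; the full node cannot.
assert 0 in archive and 0 not in full
```

This is also why the harvester Joe asks about reads from an archive node once and then serves everything from its own database: after the states are copied out, historical queries no longer need the node at all.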
Emiel S.: 00:56:28 Yeah, so I got into Ethereum, I guess, because of Vitalik's white paper. It was around the time of Devcon 1, two weeks before Devcon 1, so I was fairly late, that was my impression back then. I couldn't get a ticket for Devcon 1 in London, but by actually studying all the Ethereum stuff and the technical yellow paper that was written by Gavin, I soon acknowledged that you should pay attention to whatever Gavin is doing. The Polkadot white paper came out and I thought, well, this must be something, and it took a while before that dawned on me. It was a dual announcement with the Melonport project back then, and I didn't really understand what that relation was. Only later, when the first testnets came out, was I ready to make that pivot into researching what it means to build a block explorer for this new ecosystem. Emiel S.: 00:57:34 Then there is another thing. It was around Devcon 3, I believe, that the Energy Web Foundation did a presentation in which they announced the Tobalaba testnet, which is a proof-of-authority implementation of an energy-industry blockchain. That was the first time it really dawned on me that there will be multiple chains: there could be many implementations of Ethereum Virtual Machines, and there could be industry chains. Those were the really early ideas. At that point I had built most of the block explorer for the Ethereum mainnet, and it helped me mature the idea of what it takes to generalize that block explorer to make it compatible with any EVM instance, for example the Kovan testnet, the Rinkeby testnet, and the Tobalaba energy chain. So that led us to our first sketches and drafts of what a multi-chain explorer would look like, a single entry point to all of that. 
Emiel S.: 00:58:48 You know, at some point you start to learn what Polkadot is about, and it really addresses that problem. I think it addresses the problem of a multi-chain universe in a much better way than a sharded Ethereum universe, and I like to compare it to an economy. An economy has multiple companies, and it's not that all these companies within an economy do the same thing; there's specialization, and the work goes where it's done cheapest or in closest proximity to where the true value is. I think that's what a heterogeneous multi-chain system is really about: the sort of specialization that you can never have with a sharded universe of similar general-purpose EVM blockchains. Emiel S.: 00:59:40 So, I guess this is what really pushed me to abandon the previous stuff and go full force into Polkascan. Nicole: 00:59:52 Well, Emiel, thank you so much for coming on the show; I think Joe and I really enjoyed our conversation with you. Nicole: 00:59:57 You've shared so many exciting things and future developments about Polkascan; for our listeners, how can they get involved and how do they get in touch with you guys? Emiel S.: 01:00:07 I loved being here. I don't do podcasts that often, but this was a really enjoyable conversation. Emiel S.: 01:00:14 So how can people get involved? You can find us at medium.com/polkascan, you can find our code at github.com/polkascan, and you can follow us for updates on Twitter at twitter.com/polkascan. Our multi-chain explorer can be found at polkascan.io. There will be some updates very soon, and I guess our main audience at this point is really Substrate implementers, so we would really love to get in touch with everyone hacking and building a Substrate implementation. That will help us test our thesis that we have generalized compatibility for any Substrate instance, any Polkadot ecosystem blockchain. So we'd love to get in touch with you. 
Nicole: 01:01:10 And on that note, thank you so much for taking the time with us today, Emiel, and we look forward to hearing future exciting updates from you guys. Joe: 01:01:18 Thanks for listening to Relay Chain; we'd love to keep in touch. Follow us on Twitter @relaychain, or email podcast@parity.io. Joe: 01:01:26 Our team at Parity includes some of the leading peer-to-peer networking developers, consensus algorithm inventors, blockchain innovators, and Rust developers. If you want to learn more about our work, or want to work with us, visit our website at parity.io and sign up for our newsletter at parity.io/newsletter.