Patrick Short 0:02 Hi, everyone, and welcome to the Genetics Podcast. I'm really excited to be here today with three excellent guests, and this podcast is going to be all about the UK Biobank whole genome sequencing project. If you haven't come across it, it is a massive collaborative effort between the UK Biobank and a consortium of industry partners, Amgen, AstraZeneca, GlaxoSmithKline, and Johnson and Johnson, alongside Wellcome, which is a global charitable foundation based in London that funds a very wide array of biomedical research programmes, as well as UK Research and Innovation. And really the aim of the programme is to sequence the complete genetic code of all 500,000 UK Biobank participants, and the sequencing is being carried out by deCODE genetics as well as the Wellcome Sanger Institute. If you haven't heard about the UK Biobank, which I think is pretty unlikely, but if you want to hear the full background and backstory, we did have Professor Sir Rory Collins, the CEO of the UK Biobank, on episode 40. So I encourage you to go back and listen to that. I'm really excited to have three guests here: Mark Effingham, the Deputy CEO of the UK Biobank; Dr. Kári Stefánsson, the CEO of deCODE genetics; and Professor Marylyn Ritchie from the University of Pennsylvania, where she's a professor in the Department of Genetics, the director of the Centre for Translational Bioinformatics, and the Associate Director for bioinformatics in the Institute for Biomedical Informatics. Marylyn, you win for having the most titles and things going on. So just to start off, I'm going to ask all three of our guests to give a quick overview of their career and a little bit about themselves to set the scene and context. So Mark, it'd be really great to start with you, if you could give a quick overview of your career and what brought you to the UK Biobank, where you are today.
Dr Mark Effingham 1:33 Okay, thanks, Patrick. Really good to join today.
So, Mark Effingham, I'm Deputy CEO for UK Biobank. I've been with biobank for the past six years. I'm actually a physicist by training. I started off life as a nuclear physicist before moving into working in IT. I spent 20 years working in industry for IBM, predominantly in healthcare and life sciences. I joined biobank six years ago as Chief Information Officer, which was a new role at the time, recognising that biobank was moving from a period where it was very process oriented, about how you recruit half a million participants at scale and collect biological samples that go into storage for future assays, to one where it has become more of an information problem, about how you collect, curate, and make available these very large data sets at scale to the international research community. So I started off in that role, setting that direction, and then have progressed into more operational roles and into the Deputy CEO position. And we've been working very closely, as part of this project, both to undertake whole genome sequencing of half a million participants and to provide the platforms through which these data will be available to researchers worldwide.
Patrick Short 2:46 Great, thank you, Mark. I'm going to come back to that data sharing paradigm and the technology behind it, definitely, later on in the podcast. Marylyn, I'd love if you could go next and give us a quick overview of your background.
Professor Marylyn Ritchie 2:57 Sure, I'd be happy to. So I am a biologist by training. I started in biology and then, in graduate school, got into biomedical informatics and statistical and computational human genetics. I have spent most of my career in academia, kind of going through the ranks. I spent a very short period of time in a health system, and I've been at the University of Pennsylvania for a little over four years. Throughout my career, I have been involved with biobanks.
So I started as an assistant professor at Vanderbilt University, and I was there when they started their biobank. I worked closely with Geisinger on their biobank, and then at the University of Pennsylvania I'm one of the co-directors of the Penn Medicine BioBank. And one additional thing that changed, actually just in the last two months, is that I am no longer the Associate Director for bioinformatics in the institute. I've become the director of the Institute for Biomedical Informatics, so I just kind of took on that promotion very recently.
Patrick Short 3:53 Excellent, congratulations. And looking forward to revisiting what you all do, and a little bit more about your work in the UK Biobank, a little later on. Kári, you're the CEO, and I believe founder, of deCODE, and you've been a pioneer in population genetics for much of your career. I did my PhD looking at de novo mutations and rare disease and read pretty much everything that you and your team put out around the topic. So it'd be great if you could introduce yourself and talk a little bit about your background and how you got to where you are today.
Dr Kári Stefánsson 4:19 I am a medical doctor. I trained in neurology and neuropathology at the University of Chicago. I was on the faculty there for several years, and I was on the faculty of Harvard Medical School for several years. And then about 26 years ago we founded deCODE genetics in Iceland. And we in many ways were the first ones to start a population biobank, to start using this population approach to human genetics. And we have been focusing particularly on the common diseases of man, and have sort of been looking at it from the point of view of human diversity in general. Currently, we are very heavy users of the UK Biobank, which also has helped to complement what we have with data from Iceland and the rest of Scandinavia.
Patrick Short 5:07 Wonderful. Thank you, Kári.
And I'd love to actually just jump into it with you, referencing a talk that you gave not too long ago at the UK Biobank conference called "Why sequence the UK Biobank? What will we learn from whole genome sequencing this many people?" I'd love if we could start there. What have we learned to date with the 150,000 samples that have already been sequenced? Maybe the number is even higher at this point. And what can we hope to learn when the whole programme is completed?
Dr Kári Stefánsson 5:34 This is a fairly big question. Because basically, when you sequence this large a number of people, you end up having a reasonably broad view of the beginning of human diversity. Unfortunately, the UK Biobank is mostly, not solely but mostly, people of European descent. And we need, indeed, substantially more sequences from people of African descent and other ethnic origins. But we are already learning an awful lot from the UK Biobank. And it's not just that it is a collection of incredible amounts of data on human diversity; it has also been generously made available to the rest of the world in a manner that we have never seen before. So basically, the UK Biobank is probably the biggest gift ever given to the field of biomedical research. And basically, in the group that I interact with, sort of on a world scale, everyone is using the UK Biobank to test out hypotheses that are generated by other data that people have available to them. And there are all kinds of things that are coming out of this. For example, we have been amazed, when we begin to look at the intergenic sequences, at how large a part of the genome that is under selection is actually outside of the protein-coding sequences. And therefore it's absolutely clear that there are incredibly functionally important regions outside of the coding sequences, and outside of the classic regulatory sequences, that we have yet to begin to understand fully.
So what the UK Biobank sequencing, these 500,000 genomes that have now been sequenced, has already started to teach us is, I think, extraordinarily small compared to everything we will learn from the sequences once we, and the rest of the world, begin to use them in the context of everything else we know about man.
Patrick Short 7:40 Mark, Kári referenced this open model of data sharing. I wonder if you could talk about that a little bit. It's been part of the cultural DNA of the UK Biobank, really from the very beginning. How did that come about? And how does that manifest, both from that cultural aspect of being very open about data release, but also from the technology side of making that possible while still preserving privacy and other, you know, important promises that you make to the participants?
Dr Mark Effingham 8:06 So I think that's a really important point. For UK Biobank as a large-scale biomedical database, the whole premise is about how we make these data available to researchers worldwide, to undertake all kinds of research into diseases that affect us in middle and old age. I think biobank is quite unique for a number of reasons. I think, probably primarily, it's around accessibility. And this is really the core of the access policy, about how we make the resource available to bona fide researchers all around the world. We engage with everybody on exactly the same basis, whether they are UK or international, whether they are industry or academic. Again, the policy that's in place is really there to support that, and it was really part of how we actually engaged with participants to join the study in the first place, around the consent that's in place and how the data would be used. I think what we've seen, and this was after we made available genotyping and imputation data on all of the half million participants, is that it really fueled the visibility of the resource worldwide.
I think back in 2017, half of all researchers using the resource were in the UK, with most of those outside coming from the US, Canada or Australia. I think the visibility of the genotyping and imputation data really fueled use by other countries, such that we now have 30,000 researchers in 90 countries all around the world being able to work with these data. I think that visibility led to the exome sequencing project, which was a precursor to this, where we were approached by Regeneron and GSK to see how they could access biobank samples to undertake complex assays to produce data that they could use themselves, but that would then become available for all researchers. I think this is an important part of the access policy: we do allow a short period of exclusive access. So researchers who are making an investment in the resource, for example to receive samples and turn them into reusable data that all researchers can use, can request a short period of exclusive use of nine months. They get to work with those data for that period of time, and then those data become available to all researchers on exactly the same basis. So that really kind of started out around that point, and it has led to subsequent collaborations being formed. It is not limited to industry: we have academic groups who have undertaken sample assays, such as looking at telomere length, where again they've been able to request a short period of exclusive access. But it's a really kind of nice model that has allowed data to be generated that otherwise would have taken many years to come about, and the data really are there for the international research community to use.
Dr Kári Stefánsson 11:08 I think it is important for you to understand that people have tried before to make data of this type available.
And I think that it has turned out to be rather difficult. For example, I don't think there is a single resource in the United States that is available in the same manner, in spite of the fact that the NIH says that all data generated with grant money coming from them should be made available to the research community in general. But somehow my dear friends on the other side of the Atlantic have not succeeded in the same way as the UK Biobank has.
Patrick Short 11:47 Why do you think that is? What do you think are the differences driving that?
Dr Kári Stefánsson 11:51 The difference probably lies in the fact that the Brits have managed to put together these very large, publicly initiated projects like the UK Biobank, and the people running the biobank have done it in a rather unselfish manner. It hasn't been governed by the big personalities that have a tendency to want these things for themselves. So I think that through the UK Biobank, through all kinds of large public projects like this, the Brits have basically taken the lead in biomedical research after decades.
Patrick Short 12:29 Marylyn, maybe you can jump in on this. Is this something that you're seeing other biobanks around the world looking to adopt, this model? Or is the UK Biobank still one of the only ones taking this fairly different route?
Professor Marylyn Ritchie 12:42 No, I think the UK Biobank is certainly at the forefront. And I think they are leading the way and showing everyone else how to do this. I do think, you know, in the US, we're certainly trying to learn from what they've done. I can think of a couple of examples. One is the All of Us cohort programme, which is a research programme. It was originally called the Precision Medicine Initiative, and it's now kind of coined the All of Us Research Programme. They have a goal of recruiting a million Americans, and currently they have close to half a million.
And I think they're in the process of doing the sequencing for about the first 100,000. I think sometime later in 2022 those data will be available, but they've done something very similar to the UK Biobank, in that the data are available in a researcher workbench, which is very similar to how UK Biobank has been moving: instead of investigators downloading data, the data are available in a cloud-based resource. So I think that the All of Us programme has really been looking to how the UK Biobank has done this, and is trying to emulate that to make those data similarly available. And as a researcher, you know, I've applied for access to the UK Biobank, and I've applied for access to All of Us, and I think the processes were very similar. But to Kári's earlier point, it was very hard prior to that. You know, getting datasets was a long and onerous process, and you'd get piecemeal datasets that you'd have to try to put together locally, and that was very complicated. I think some of the other institutes at the NIH in the US are building cloud-based platforms. The NHGRI, which is the National Human Genome Research Institute, has a programme called AnVIL. NHLBI has one called BioData Catalyst. I think there's one with the National Cancer Institute as well. They're all moving toward that. And I think they're looking to see how UK Biobank did it and trying to learn from them.
Dr Kári Stefánsson 14:36 Yeah, it is interesting when you begin to think about it, that there is a tendency to move with data like this onto, sort of, cloud-based servers. So basically, the access to the data becomes limited to using them on specific platforms. And that, I think, is going to diminish our ability to make discoveries.
With these data, you have to be able to take them and work with them in the environment that you are familiar with, the environment you have put together, and have the possibility of using them unfiltered, in connection with all kinds of other data. So I think that the development towards having the access confined to specific work platforms is not a good development.
Patrick Short 15:29 Mark, I'd be really interested in your thoughts on that, because I know you've got to balance making sure bona fide researchers have access to the data. But Kári also makes an important point, which is that it's very expensive in the broadest sense, not in a money sense but in time and complexity, to migrate all of your tools into some new system. How did you all think about that trade-off? And what are your thoughts there?
Dr Mark Effingham 15:50 Yeah, it's a really good point. It is something Kári and I spoke about offline. I think it's absolutely right that biobank has been so successful because of the way it's allowed researchers to be able to download and use these data in their own environments, using their own tools, driving innovation to drive new insights. And we see that the value in the biobank resource is really there when it's being used to drive new scientific discoveries. I think it is balanced against the practical dimension that there are a lot of research groups out there who simply don't have the resources in place today to be able to work with these very large-scale datasets. And certainly one of the areas where we're looking at how we improve engagement is particularly researchers in low- and middle-income countries, who really aren't using the resource today.
And things like a platform do provide the facility to hold the data but also put in place the computational resources for researchers to use. But it is a really careful balance, because we need to create environments that people want to use and that don't stifle the innovation that they have within their own environments. I think we're taking steps towards that. Certainly, with the investments we've been making in our platform, it's very genomics focused right now. It does a number of things well, but it needs further development over the coming period, particularly to appeal to a broad church of researchers, from geneticists to imaging analysts through to just normal statistical kinds of analyses. So I agree with Kári, we need to find a way of retaining what really worked with biobank in terms of those principles around openness and access, but also recognising that technology paradigms are moving forward and there is increasing focus on information governance for how these datasets are made available. I think biobank has got it right so far, but again it's about keeping our eyes open to how we maintain that in the future.
Dr Kári Stefánsson 17:51 Mark, it breaks my heart to have to disagree with you, but I have to do it in this instance. Yes, it is true, there are a lot of scientists from low-income countries who cannot put together their own platforms to do work on all the data. And I think it's very important to provide them with a platform. But that should not necessarily mean that you should take away everybody else's opportunity to be creative in the way in which they put together tools and platforms, etc. And I think that you will diminish dramatically the impact that the biobank is going to have if you're going to prevent people from downloading data and using them in their own environment.
And I think it would be a grave mistake on behalf of the spectacular enterprise that the UK Biobank is.
Patrick Short 18:37 It sounds like you're advocating not for an either/or, but really for having the option of both routes, because I think you both make very good points there. Marylyn, please, do you have thoughts on this?
Professor Marylyn Ritchie 18:46 One of the other challenges that I think we will face, and I think conversations are ongoing now, is that, because some of the data sets are available only in a certain cloud workbench, you know, a lot of us want to be able to bring those datasets together and to do replications and, you know, look to see similarities and differences. And so, you know, I'm on the advisory panel for the All of Us initiative and on the international scientific advisory board for the UK Biobank, and that's an ongoing conversation. You know, these are two data sets that it would make a lot of sense to bring together to make discoveries, but they live in different clouds. And so how do we figure out a way that researchers can use both? There's some preliminary pilot work that's ongoing, but I think that's something we're going to have to figure out, because the way that we're really going to make discoveries is by bringing all of these large datasets together. Any one alone won't have enough world diversity to be able to answer all of the questions that we want to answer.
We're going to have to bring together the different datasets.
Patrick Short 19:44 Maybe you could talk a little bit about your research and what you do, because I understand you're bringing together not just large-scale genomic data sets but other kinds of omics and other data that Mark and Kári referenced previously. UK Biobank is quite unique in having not just a large number of people, not just a high depth of sequencing, but also a great diversity of data. Maybe you could talk a little bit about how you're using the data and what your group works on.
Professor Marylyn Ritchie 20:09 Sure. So there are a couple of different kinds of axes of research. One is in the space of really thinking about the phenotypic diversity of disease. So in a lot of our research, we tend to label people as with a disease and without, and we'll pick on maybe type 2 diabetes as an example. You know, you'll create a case-control study of people with diabetes and people who are healthy controls, and run your associations or your exome analysis or your genome analysis. What we're trying to do is use these large data sets of phenotypic data and clinical data to determine: are there kind of subtypes of disease? Can we use machine learning methods to identify that there are actually different subgroups who clinically have a different manifestation? That also means they probably will have a different disease trajectory, and different medications or treatment paradigms would work for them. And then our hypothesis is that genetically, perhaps, the disease is actually different. So the underlying genetic mechanisms that lead to these different subtypes might be different. And that could be part of the challenge that we have faced in identifying the genes for many of these common complex diseases: we've lumped kind of heterogeneous phenotypes together. So one strategy is kind of on the phenotype side, could we find subgroups? And to do that?
Well, we do need very large datasets, because if you imagine, you know, half a million people is a lot, but if you have five different subgroups of disease, now you're looking at, you know, much smaller sample sizes. And then the other side is the genetic and genomic side. How do we put the data together in more sophisticated ways, beyond looking at one variant or one gene at a time? You know, these diseases are complex. So it might be that, you know, maybe there are some single nucleotide polymorphisms or some pathogenic mutations that are important in some of the individuals. Perhaps there are methylation changes or gene expression changes that are important. Perhaps it's something that doesn't happen until it's at the protein level. And so trying to think about ways to bring together multi-omics data sets is something else that we're thinking a lot about. You know, really, we know that diseases are complex in their aetiology. And so we're really trying to think about how we can use kind of machine learning and other methods to try to address that.
Patrick Short 22:19 I'd actually really love to pick up on this topic, because I was speaking to Peter Donnelly, from Oxford, Genomics plc, a few weeks ago, and we talked about how there are some very simple drug targets. Simple, maybe simple's not the right word, but some very straightforward monogenic drug targets like PCSK9. We talk about them all the time. But in some ways, there are actually surprisingly few of these simple examples: PCSK9, maybe LRRK2, maybe APOE4. Kári, there's probably a few in the neurology space. But in some ways, it's been 20-odd years of sequencing, and we have relatively few simple stories that we can tell from a drug discovery perspective. And I guess the question is, do we actually need to be embracing the complexity of this a little bit more, and thinking about what stories do multi-omics datasets or polygenic scores
tell? And maybe Kári will disagree with me on that a little bit, that actually we are just still not at the scale yet, and there are many hidden single-gene drug targets to look at. But I'm really curious to get your thoughts on that, on whether that story that we often tell about PCSK9 is actually an exception that proves the rule, or whether there are many more of those to be found.
Dr Kári Stefánsson 23:34 I don't think that we should be expecting a lot of stories like the PCSK9 story. It is one spectacular example of a spectacular success in the search for drug targets. But one of the things that I believe is incredibly important is to recognise what it is that we are striving for. We are mainly studying human diversity. We are trying to figure out how we can put the diseases in the context of what we understand about human diversity in general. And except for cancers, most of the diseases are caused by perturbation in biochemical pathways, biochemical pathways that are either upregulated or downregulated. And even though the genetics of the disease is fairly complex, with many genes contributing to the upregulation or downregulation of a biochemical pathway, you may not need but one target, one protein in this pathway. And once you begin to look at human disease in the context of human diversity, it is absolutely clear that you cannot let the diversity in the sequence suffice, because the diversity is caused by the interaction of the sequence diversity with the environment. And the older the individual is, the more opportunity the environment has had to influence this diversity. And one of the things we have been doing of late is to look at both the polygenic risk score for diseases and a proteomic risk score for diseases.
And it's interesting, you know, that if you take, for example, cardiovascular disease, and you just take a population in Iceland, people who are on average 57 years old, the proteomic risk score, which consists of the levels of one to two hundred proteins, captures several orders of magnitude greater risk than the polygenic risk score. And what is more, the polygenic risk score and the proteomic risk score are uncorrelated, or the correlation is very, very little. And what does that mean? It means that after the pathogenesis begins, after atherosclerosis starts, there is a process that takes over that is fairly independent of the genetics. If you would, however, compare the polygenic risk score and the proteomic risk score in 35-year-olds, first of all, the polygenic risk score captures more of the risk, and what is more, the proteomic risk score and the polygenic risk score are completely correlated, because the proteomic risk score is just a surrogate for the polygenic score. But once the pathogenesis begins, then the proteomic risk score takes over, and it's probably not a true risk score; it is probably just a documentation of early steps in the pathogenesis of the disease. So I think that we are now in an era when, yes, it is incredibly important to have the sequence of all of these genomes, but we have to begin to look at them in the context of not just clinical phenotypes, but things like proteomics, transcriptomics, and metabolomics, to begin to bridge this gap between the genome and the clinical phenotype. And in bridging that gap, you can begin to capture the environmental influences, because, for example, the proteins are the business molecules in our body. The proteins make everything else, and it is inconceivable that the environment can have much impact on your biology unless it is reflected in the proteome.
So we have an enormous amount of space to explore, new territories to explore, in the context of these data on diversity in sequence and diversity in clinical phenotypes, etc.
Patrick Short 27:23 Marylyn, please, you're nodding your head vigorously. Whether you agree or disagree, I'd love to hear.
Professor Marylyn Ritchie 27:28 I agree completely. And I want to build on one of the things that Kári said about the biochemical pathways. So as I said at the beginning, I was a biologist first, so I think about a lot of these data analyses with my biology hat on. And if we think back to some of the early work in Mendelian disease, even, and think about inborn errors of metabolism, you know, these are pathways where, yes, a single mutation in a single gene causes disease. But within that pathway, in different individuals, it's different mutations and even different genes. But therapeutically, it's often the same therapeutic that would treat patients with a mutation in any one of those genes. So the way that I've been thinking a lot about this is, from a drug discovery perspective, we're not necessarily looking for the PCSK9 and the target for only that protein. Instead, what are the biological pathways that we should think through, and how do we put the data together in ways such that there are probably therapeutics to be created that would target any of the variants from that particular pathway, and would treat patients no matter what the variation is within that pathway? So I think it's really about starting to think about the biology that mechanistically might underlie some of these diseases.
And will we not have other PCSK9 examples? I don't think we can say that; there probably are some more. But my prediction would be that more often we're going to find sets of genes, where in different people it might be a single mutation in a single gene, but in a population of individuals it's going to be a set of genes that function together in biological pathways.
Dr Kári Stefánsson 29:07 And actually, don't forget the fact that the PCSK9 story is just an extension of the story of the LDL receptor. So it isn't a single gene and a single protein; it just falls into, sort of, Brown and Goldstein's old story. So I think we're going to see a lot of things like that. And what is more, we will begin to see a lot more interactions that have been, you know, escaping us so far, because it is inconceivable that the genome can function to put together people without having significant interactions between these variants in the genome. We have been clumsy in finding them, you know; we have not been finding them, but we will find them in the near future.
Patrick Short 29:56 Mark, the UK Biobank is very, very clearly set up for this future. You all have been investing for the last couple of years in proteomics, transcriptomics, and other kinds of flavours of omics. How do you think about the vast array of possibilities you all have? You could recruit more participants, you could sequence more deeply, you could run new assays. How do you think about that space with finite resources?
Dr Mark Effingham 30:19 So I think Kári is absolutely right, in that I think we're right at the start of this, and there are huge areas of exploration ahead. I think biobank is probably now the most genetically characterised resource, with genotype, exome, and whole genome sequencing data already available. There are already projects in flight to add metabolomic data.
So we're working with a group, Nightingale, who are undertaking a set of metabolic measures using an NMR-based assay, and that work is underway. There's an initial project doing the first 50,000 measures for proteomics using the Olink panel, and again, I think, there is significant interest in how we extend that across all half million participants in the months and years ahead. And I think it really is just the start. There's been a lot of interest in looking at how we go beyond panel assays and start looking at mass spectrometry for both metabolomics and proteomics. We have samples on half a million participants, and we've used about 8% of those samples for all the assays that have been done to date. So there really is considerable material left. And we have that not just from baseline: for a growing proportion of participants, we have samples from longitudinal time points, where participants have come back either for repeat assessment or as part of our imaging project. So I don't think we're as motivated by extending the range of participants by recruiting more. I think we see most value coming from getting additional time point samples and looking at change over time, and how that can be associated with drivers of disease.
Dr Kári Stefánsson 31:56 I have a suggestion, Mark. You see, I wholeheartedly agree with this, and I embrace it. I want to emphasise that I'm in awe, you know, when it comes to the generosity with which you guys have made this resource available to the rest of us. I'm even feeling willing to forgive the Brits for invading the Icelandic fishing limits, because of the UK Biobank. But I think that in addition to what you're proposing to do, it would make a lot of sense to look at the children of the participants in the biobank. Because if there is one flaw in the biobank, it is the lack of close relationships, which, for example, make the discovery of rare variants easier and make it easier to put the rare variants in context.
So I think that is one of the things that I would think seriously about. Dr Mark Effingham 32:49 I think Biobank itself doesn't do any research; we really see ourselves as: give us the challenge, and let us go figure out how to make it happen. I think there has been interest in whether we can go to children and grandchildren. And indeed, we did do that last year as part of a serology project, and we saw huge interest in participation from children and grandchildren for that. So it's certainly one of the ideas, I think, if we can find funders that may be interested in doing that. So I'd say these aren't mutually exclusive. I think the additional time points for the existing participants, to really extend the range of information that we have, and potentially also children and grandchildren, will be a fabulous addition if we can get funding to do that. Patrick Short 33:32 I couldn't agree more. I think I was going to ask a similar question. And maybe Kári and I are betraying our mutual interest in de novo mutations and other things that you can find from family members, but I would second that: having generation two, three and beyond would be a really powerful addition. Dr Kári Stefánsson 33:48 For us, think about it: 10% of children, one out of 10 children born, has a de novo loss-of-function mutation in a gene. Alright, and that means that one out of 20 children is born with a de novo loss-of-function mutation in a gene that is expressed in the brain. So de novo mutations are very, very significant contributors to public health in our society. Professor Marylyn Ritchie 34:14 The other benefit to getting the children and the grandchildren is that linkage between genetics and the environment.
So knowing about shared environments, and being able to link all of those environmental risk factors, as well as social determinants of health, within families, lets you then ask questions about the relationship between the genome and those environmental risks. So yeah, I guess I would third that; that would be a great addition to the biobank. Patrick Short 34:37 We'll help you write the grant, Mark; you just tell us. We don't know what it will cost; I'm sure it's a lot of money. Dr Kári Stefánsson 34:42 But once you begin to think about environment, shared and non-shared environment, that actually focuses you very, very quickly towards the brain. Because the brain is the control instrument. It is the thing that directs us into environments and away from environments. And also, the brain is the last frontier of biology: we haven't the faintest idea how the brain works. We don't know how the brain generates a thought; we don't know how the brain generates emotion. And it is pathetic, because we as a species are basically defined by our thoughts and emotions, and we as individuals of the species are defined by the same things. So one of the things you could do with the UK Biobank is to increase the cognitive function testing, you know, and stuff like that, which we are sorely missing. Patrick Short 35:28 I don't know if you want any more requests, Mark, or suggestions, but we're running out of time here. I'd love to just finish off with a really high-level question, and I'll give you all some time to think about it. From my perspective, 2000 to 2010 was really about technology and methodology improvements. How can we get the cost of testing down? And how can we figure out not to do candidate gene studies and to do GWAS a little bit more effectively? 2010 to 2020 was really the age of the genome-wide association study: increasing scale, finding hits in the hundreds and thousands.
And I really want to ask you all: what is 2020 to 2030? When we look back eight years from now, what will we say that the 2020 to 2030 decade in genomics was all about? Dr Kári Stefánsson 36:13 I think that, if I take the next five years, the most important contribution will come from bridging this gap between the genome and the clinical phenotype by gathering an enormous amount of data on proteomics, on transcriptomics, and multiple omics. That is the simplest way of generating a new avalanche of discoveries. But I hope we will also make some fundamental discoveries about things that we are missing from our understanding of human genetics, because in spite of all of this sequencing, the variants we have discovered are accounting for far too little of the diversity, or the variance in risk, etc. So those are my thoughts. I have an incredibly bad track record at predicting the future, so be careful not to listen to me. Patrick Short 37:08 Marylyn, Mark, what about you? What are your thoughts? Dr Mark Effingham 37:10 I'd second that. I think there are two areas. The first is the kind of multimodal analyses bringing together these different datasets, including imaging data: we've got imaging data on what will be 100,000 people in the next few years, to really give insight into brain, heart, etc. I think federation is the second area. I think Marylyn picked up earlier on how the research community can really take advantage of these large-scale resources around the world, UK Biobank, All of Us, and elsewhere, and really start to be able to undertake analyses across these in an easy and accessible way. I think there's a lot of hard work, but also opportunity, ahead in that regard. Professor Marylyn Ritchie 37:51 Yeah, I agree with what both of them have said. I mean, I think the next 10 years is really the era for data science and informatics to shine.
You know, we've created extremely large datasets, we've done lots of one-variant, one-gene-at-a-time statistics, and there's so much more to be done. We have barely hit the tip of the iceberg, I think, in terms of what is discoverable in these datasets. So I think we're going to see, as Kári said earlier, that it's inconceivable that there are not interactions between genes, between genes and environment, and among these different kinds of multi-omics data. And I think of all of these folks that have been working on these sophisticated machine learning and statistical methodologies for the last two decades, when the datasets were just too small and too simple to actually ask the questions. I think now that we have these datasets, we're finally going to start to tease apart some of this complexity and embrace it and model it. Patrick Short 38:53 Is it possible that there aren't enough humans on Earth to answer some of these questions? Because at some point, we're going to sequence all 6 billion people, and image all 6 billion, and do omics on all 6 billion. Is there an upper limit? And do we need to think about other strategies? Dr Kári Stefánsson 39:06 Listen to me: when we are mining data on human diversity, we are mining data on an experiment that has been going on for 250,000 years. So there's actually an enormous amount of data behind this. And I think that rather than blaming lack of material if we cannot do this, we should blame our own stupidity and nothing else. This is our task to do, and we can do it. Patrick Short 39:32 I love that. Professor Marylyn Ritchie 39:33 Yeah, I agree. I think we need to be smarter with our approach. Rather than work harder to make the sample size as big as possible, we just work smarter, not harder. Dr Kári Stefánsson 39:44 We need to work very, very hard. And let's put it this way.
You know, it is many years since I accepted the fact that I just have to live with the limits that I was born with, but I can always work a little bit more, and I'm going to do that. Patrick Short 40:00 Maybe we need to organise the debate, because I know that there is a group of people who believe that saturation genome editing and cellular models are the answer. I'm not a BRCA1 or BRCA2 expert, but there's such a complexity of possible missense variants that to see all of those by sequencing humans may take a long time. Dr Kári Stefánsson 40:25 I don't think we should be nasty to these people. We should show them the way and be nice to them. Patrick Short 40:32 That's right. We'll invite them in and show them the way. Well, thank you all. I really appreciate you taking the time; it was a great conversation. We hashed out some issues over Icelandic fisheries, and we've made some suggestions to Mark and the UK Biobank team. So thank you all very much for taking the time today. Thank you. Thank you. Thanks, everyone, for listening. As always, if you enjoyed the episode, the best thing you can do is share it with a friend and let them know you liked it. And you can also leave us a review on your favourite podcast player. Thanks again and we'll see you next time.