[00:00:00] Patrick: Welcome everyone to the genetics podcast. I'm here today with Daniel MacArthur, the director of the Centre for Population Genomics, based jointly at the Garvan Institute of Medical Research in Sydney and the Murdoch Children's Research Institute in Melbourne. That's in Australia. It's nine at night for Daniel and 10:00 AM for me here in the UK, so thank you, Daniel, for giving up your evening to make this happen. [00:00:24] Daniel: Absolutely, pleasure to be here. [00:00:25] Patrick: We're going to cover a lot of ground today, I hope. Daniel has led the development of some of the largest datasets in human genetic variation, and we're going to cover what he and his team have learned, what they've taught the field, and also what they haven't learned and how that's motivating some of Daniel's next big research focus areas. I'm also hoping we have time to cover some other big topics that aren't necessarily directly related to genetics, but are certainly one or two squares away on the chessboard, so to speak, including how to fix the broken model of academic publishing, something Daniel has talked and tweeted a lot about, as well as advice and mentorship for early career researchers thinking about the dichotomy between academia and industry, or the lack of dichotomy, potentially, as the industry has changed. So with that long intro, Daniel, welcome to the podcast and thanks so much. [00:01:10] Daniel: Thanks, Patrick. Absolute pleasure to be here. [00:01:13] Patrick: I'd love to jump right in to the large genetic datasets that I referenced, which are of course ExAC and its sister, or sequel, gnomAD. The ExAC preprint was actually one of the first, and I think possibly the very first, papers that I read during my PhD. My supervisor, Matt Hurles, handed me this paper, I think it was a preprint at the time, and said, read this top to bottom, see if you can reproduce some of the methods.
And one of the things that actually struck me at the time was just how open you all were. You shared everything: the data, the methods, all the code, and this is not something I was used to seeing in papers. Normally it was a "see here for the code and data" that turned out to be a broken link, and we'll get back to that. But I'd love if you could actually take us back to when the idea for ExAC first came about. What was the idea, for those who aren't familiar with the project? How did you get it off the ground, in particular coordinating hundreds of investigators around the world to share data? [00:02:04] Daniel: Great question, and it's a good nostalgic trip for me actually, to go back to the early days of this. This really dates back to the first year after I'd started my lab, which was back in 2012. So I had a brand new lab at the Massachusetts General Hospital and the Broad Institute in Boston. We had just started sequencing our first genomic data from patients affected with rare diseases. This was a collaboration with an Australian group I'd worked with for a very long time, a set of patients with different muscle disorders. And we had run exome sequencing, which is a technology for looking at the sequence of the protein-coding bits of the genome. As we started to analyze that data, it became increasingly clear that it was very difficult to make sense of the variation we were discovering with the existing databases of variation, for a few reasons. The databases that existed out there were pretty small, about 6,000 individuals at the time, and they had been generated using older technology, so there were a lot of inaccuracies and errors in them. And so I started to get this germ of an idea in my head that we could actually take advantage of the fact that the Broad was now sequencing more exomes than anywhere else in the world, and build our own reference database.
So take advantage of all of these tens of thousands of people who were being sequenced and put that all together. Initially that was just an idea, and actually it seemed pretty insane when we first started it because of the political complexities around it. But I got a lot of help early on. In particular, I benefited a lot from conversations with an amazing mentor, David Altshuler, who later went on to become chief scientific officer at Vertex. David is a master of understanding the motivations in academia and how to make big things happen, even in the complex political spaces of academia. So I spent a lot of time talking to him and to other mentors like Mark Daly, and it became increasingly clear that this was actually possible. There were a few different strands coming together that would make it feasible to build a resource like this. The first was the sheer amount of data being produced: we had about a hundred thousand exomes by then that had been generated at the Broad that we could potentially use. The second was that new methods had just come out that allowed us to look at genetic variation at this massive scale, so it became possible for the first time to think about actually jointly calling variants across tens of thousands of people at the same time. And the third was, I think, that we had an unusually collaborative group of investigators involved in these very large exome sequencing projects at the Broad Institute. These were people who were typically generating these data with a particular scientific goal in mind, usually a case-control study of some complex adult-onset disorder like type 2 diabetes or heart disease. But I could see the benefits of making that data available to a bigger project that would then create a reference database that anyone in the world could use to analyze their rare disease patients.
And so we set about trying to build that. That was really the genesis of those early conversations about the politics of getting this right. [00:04:54] Patrick: And how did you assemble that? I assume there was an initial core group of 10, 20 investigators and organizations. How did you assemble that core group, and what were their motivations for getting involved? [00:05:05] Daniel: The first ExAC consortium had, I think it might've been, about 25 investigators involved in it. It was a pretty pragmatic set of decisions. We really wanted to build this as a consortium from the ground up, a consortium that would be functional and actually work together well. And so we chose the investigators we worked with pretty carefully. These were all people who had developed large collections or cohorts of sequencing data, but also they were all people who were genuinely pleasant to work with. So we kind of scored them on this measurement of "has lots of data" and also "is a genuinely decent human being." And that collection of investigators was really the core for a whole host of downstream aggregation activities later. And they were amazing, actually. As you said earlier, in the space of academia there's often a reluctance for people to share data; they're often very suspicious about the idea of someone else taking that data and doing things with it. And I think we got very lucky in having a set of people who were not just willing to make the data available, but also genuinely excited about the opportunity of using this aggregation to benefit the interpretation of variants in rare diseases. [00:06:12] Patrick: There was that article about the research parasites, right?
Sometime, my memory is hazy, but the research parasites who were taking all this pesky public data and making something useful out of it. [00:06:25] Daniel: That's right. How dare they? But I think, and this is probably a point we'll come back to later, in genomics in general there was a better culture than in many areas of science around data sharing. And we were fortunate enough, I think, to be parked in a particularly collaborative corner of that community, so a lot of people were genuinely very happy to do that. [00:06:42] Patrick: If my memory is correct, the first major publication, or not a publication in the strictest sense, but the post on bioRxiv, was sometime in 2015. What were the biggest things that you all learned? What were the big surprises, in particular from assembling this very large dataset and really starting to crawl through it, and answering questions that you probably didn't even know you had when you started the project? [00:07:05] Daniel: I mean, there was the scientific stuff we learned, and there were also the meta-science aspects we learned about, which in many ways were more illuminating. One of the things we were very lucky to be able to do, again because the consortium was so willing, was to make our data available long before we actually wrote a publication about it. So we released the first version of ExAC back in 2014, and that was, I think, a year and a bit before even the preprint came out. That was extremely powerful for us, because it meant that the dataset was out there, people were using it, they were diagnosing patients using that data. It was, by an order of magnitude, the largest collection of exomes that had been put out there into the public domain at that stage.
And so it very rapidly became the default clinical reference database and got used very heavily, pretty quickly. But the thing that I learned from that was not just that you can have more impact by releasing your data early, but that it also benefits your science. As a result of having that dataset out there and people crawling all over it, poking at all the edge cases, looking at their favorite gene and exploring the variants, when they found weird stuff that didn't make any sense, they would bring it back to us and we would dig further and figure out if there was a systematic error. That sort of crowd quality control was extremely useful for improving the quality of the data, and that whole year between releasing the dataset and writing the paper was really about that iterative process of quality control and improving the data. And then we got to the science. In that first paper we wrote a lot about the new variants in particular; most of the variants that we found had never been seen before. But the other thing we were able to do, and this was thanks to some stellar work by Kaitlin Samocha, at the time a graduate student in Mark Daly's lab, was to develop a statistical model that allowed us to identify the genes where variants appeared to be missing, where particular classes of disruptive variation were less common than what we would expect to see by chance. And this is obviously an area you know well; you did scientific work in this domain yourself. But that was very important, because it pointed us immediately to a set of genes where it was clear that they were very important, where there was almost no one in the dataset who carried predicted disruptive mutations in these genes. But for about three quarters of them, we had no idea what their actual function was. There was no associated disease, no associated biological annotation.
We just know that these are really critical genes, and that list of very highly constrained genes was the basis of a lot of the work that was done in that publication. It also ended up guiding a lot of downstream disease discovery work, because we could prioritize those genes for identifying potential disease-causing variation. [00:09:38] Patrick: I think it's hard to overstate how useful that concept of loss-of-function constraint is. I think I used or heard the term every single day during my PhD, probably without fail, from 2015 to 2018 or whenever it was, because it's such a useful concept. I wonder if you could talk a little bit about that. How do you find these constrained genes? Why are they useful? And what are some of the applications to rare disease gene discovery and drug target discovery? Once it's explained, it makes a lot of sense, but it definitely blew my mind in 2014 or '15 when I first learned about it, and something clicked into place. So maybe you can talk a little bit more about that. [00:10:18] Daniel: Yes, there is something counterintuitive about it at first glance: the fact that what you're really looking for here is not just the variants that are present in a dataset, but the variants that are missing, and gathering information from the patterns of missingness in that data. But let me just take a step back. The class of variation we were particularly interested in were variants called loss-of-function variants, which is basically what it says: these are variants predicted to disrupt the normal function of a protein-coding gene. They're generally practically defined as variants that are expected to break the coding sequence in some way, either by introducing a stop signal, or disrupting the normal reading frame, or breaking the normal splicing of that gene.
So these loss-of-function variants are relatively easy to find, although you have to do some careful filtering to get rid of the artifacts. We found a lot of them, hundreds of thousands of these variants in ExAC. And then what we were able to do, and I'll just say up front here that the fundamental statistical innovations boil down to Mark Daly and Kaitlin, was to develop a model that allows us to predict, for any given gene in the genome, how many loss-of-function variants we would expect to see by chance in 60,000 people if the only force acting on the genome was mutation. In other words, if natural selection was not acting to remove harmful mutations from the population, then we would expect to see X number of loss-of-function variants in that gene. We can then compare that number to the number of loss-of-function variants that we actually observed in 60,000 people, and the difference between those two numbers, that is, the ratio of the observed to the expected, tells us how many variants have actually been removed by natural selection. And therefore that is a direct measurement of the harmfulness of disrupting that particular gene. So constraint basically ends up being a really neat mechanism for determining that this gene is important: if you mess with it, something horrible happens, and those people generally either won't be able to reproduce, or at least are not able to be present in a reference dataset like ExAC as a result. And so loss-of-function constraint, because it is a measure of the importance of a particular gene, proves to be a good way of prioritizing the genes that are most likely to be associated with very severe diseases. [00:12:29] Patrick: And I think once you understand the concept, you start to see so many interesting applications.
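The observed-versus-expected comparison Daniel describes can be sketched in a few lines of Python. This is a minimal illustration only: the actual ExAC/gnomAD constraint model, from Kaitlin Samocha's work, derives the expected count per gene from a sequence-context mutation rate model with coverage corrections, whereas here the expected count, the function name, and the numbers are all hypothetical.

```python
import math

def lof_constraint(observed, expected):
    """Compare observed vs. expected loss-of-function (LoF) variant counts
    for one gene. `expected` is the count predicted under mutation alone,
    with no selection; here it is simply passed in as a number."""
    # Observed/expected ratio: near 1 means unconstrained,
    # near 0 means LoF variants are strongly depleted (constrained).
    oe = observed / expected
    # Poisson probability of seeing `observed` or fewer LoF variants
    # if the gene were evolving neutrally with mean `expected`.
    p_depleted = sum(
        math.exp(-expected) * expected ** k / math.factorial(k)
        for k in range(observed + 1)
    )
    return oe, p_depleted

# A hypothetical highly constrained gene:
# only 2 LoF variants seen where 25 were expected by chance.
oe, p = lof_constraint(observed=2, expected=25.0)
```

A ratio near zero combined with a tiny Poisson probability flags a gene where loss-of-function variants are far rarer than mutation alone would produce, which is the depletion signature the constraint analysis looks for.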
So there's the rare disease gene discovery application, which I focused a lot of my time thinking about: if there are people who are healthy, or at least healthy enough to be in one of these databases, and none of them, or very few of them, have loss of function in these genes, then when you're looking for a rare disease gene in a cohort of rare disease patients, you might want to focus on these genes, because, as you said before, they're suggested to be harmful if disrupted. But I think there's an interesting flip side as well, which is all those other genes. You can say the opposite: drugging this gene, for example, is likely to be safe. An example, actually, that you all worked on is LRRK2 in Parkinson's. Patients who have an overactive version of LRRK2 can end up with Parkinson's disease at a much higher rate. So the question is, if we knock LRRK2 down with a drug, is it safe? And you can actually use this dataset to answer that question as well, to say there are many people in the population who have LRRK2 knocked out naturally, and they appear to be fine. Right? So what other interesting applications did you find as you opened the datasets to the community? [00:13:40] Daniel: Yes, I think that's the right way to think about this: the idea of loss-of-function variants as these natural experiments, where there is a person out there in the wild who has either one, or in some cases both, copies of that gene knocked out. And if you look at that person, you can then learn about what happens when that gene is removed. That idea of genetic variation as a series of natural experiments that we as geneticists can study is not new, obviously. We've learned a huge amount about human biology by looking at individuals who have particular diseases and then going back and figuring out what genes caused those diseases.
The interesting thing, as we start looking at these very large datasets, like ExAC and the even bigger gnomAD dataset, is that we can go in the opposite direction and find people who have disruptive variants in a particular gene we don't know much about, or, as you say, where we have a hypothesis that it might be an interesting target. We can find those rare people who do actually have a lack of expression of that gene, and then study them to figure out: are they healthy, in which case, as you say, targeting the gene is probably okay, or are they unhealthy in a way that teaches us something about the potential side effects? A great example Mark Daly often talks about, for instance, is DGAT1, a gene where we know from clinical trials that inhibitors actually cause some pretty unpleasant gastrointestinal side effects. And it turned out later that if you have loss-of-function variants in DGAT1, you get severe inflammatory bowel disease. Really informative. So I think these natural experiments are a really powerful way of learning about gene function, and we spent a lot of time exploring these loss-of-function variants in ExAC and gnomAD for that reason. LRRK2 was the example we probably chased the hardest, but we did also spend a lot of time taking most of the common loss-of-function variants that showed up in the dataset, trying to clean those up, and then going through all the genes where we found these common variants to see if we could understand what's actually happening when they're disrupted, and the degree to which these LoF variants are actually real, genuine loss-of-function variants acting in the population. [00:15:41] Patrick: And you mentioned the excruciating work of cleaning everything up there.
There are definitely many unsung heroes of building these datasets, who spent many years doing the very challenging work of going through and actually understanding what's artifact and what's real, and building the pipelines that let you do that. I was always so impressed with how open and reproducible everything you did was, that someone could actually download everything, rerun the Google Cloud data pipelines, and have everything pop out exactly the same as you'd expect, which almost never happens. For those who aren't doing science every day, it's hard to overstate how rare that is to find. So tell me more about that culture and why that was so important. [00:16:22] Daniel: The reproducibility aspect is something we believed in very strongly as a group, and we still do: the idea that whatever it is that we create, it should be possible for someone else to really thoroughly redo it with data that's publicly available. But that came about not just through altruism, wanting to make sure others could benefit, but also through a painful lesson, which is that every time you analyze a particular dataset, you can almost guarantee that's not the last time you'll analyze it. And I know you know this as well: there's a tendency to learn something new every time you run an analysis. You'll find some artifacts that didn't show up the last time around, and then you go back, you clean up the dataset, you rerun your analysis. And if you don't have pipelines that are beautifully documented and easy to rerun, you're going to spend all of your time recreating those pipelines and redoing them. So as a result, it's actually kind of a good form of self-defense to build for reproducibility, because that means you know for sure that you can go back and clean those things up. But your point is really spot on.
Actually, I think it's not talked about enough, the fact that the real work of genomics is quality control. Most of what we do day-to-day is we get a big dataset, we filter it, we clean it, we run a whole bunch of plots, we look at them, and we scratch our heads because they don't look the way we expect. And then we go back and figure out why they're wrong. And we repeat that process over and over and over again until finally we have a dataset that we feel we can trust, or we hit the point of exhaustion. In the case of these big releases, ExAC and gnomAD, that was often multiple years of work by a very talented set of individuals who became, over those years, extremely good at figuring out why puzzling results were puzzling. [00:17:59] Patrick: I'm smiling because I'm just remembering that in my PhD, both Matt and Jeff Barrett had to deal with me thinking I had discovered something amazing so many times, only to be asked, did you double-check that it wasn't a bug or an artifact? You get that drilled into you the first year, I think, of every PhD student's life: they seesaw between thinking they've discovered something amazing and then realizing they've discovered a bug their own code produced. [00:18:24] Daniel: That was a depressing mantra in our lab, and actually I think I might've learned this from Matt Hurles: the idea that the more interesting a result is, the more likely it is to be false, so be more skeptical. Exactly. The more excited you intuitively become as soon as you see that result, the more you should be jumping all over it, trying to figure out why it's wrong, because it almost certainly is. And it takes a lot, I think, to convince people, particularly those two. Matt and Jeff, trying to convince those two of anything is a tough ask.
[00:18:51] Patrick: You recently moved from Boston to Australia to become the director of the Centre for Population Genomics, and I'm really curious why you decided to make this move. What plans do you have? Why not continue to grow ExAC and gnomAD to be ever larger orders of magnitude? What made you decide to switch and pursue something a little different? [00:19:09] Daniel: I mean, it certainly wasn't that I didn't love the work we were doing with ExAC and gnomAD. At the time I moved, which was right at the end of 2019, it had been about eight and a half years at the Broad, and the work was continuing to be amazing. I loved the team we had built; it was just a great group of individuals, both as scientists and as humans. But in 2019 there were a few different things that came together to make me think about a return. The biggest and most important one was family. We had been away at that point for 12 years, and those of your listeners who are Australian will know there's a certain gravity to Australia that just kind of pulls people back. The idea, for me, of our kids being able to grow up knowing their grandparents, knowing their cousins, growing up with Australian accents, that was very appealing. So we had been keeping an eye out for opportunities in Australia for a while. But the other thing that had really changed was the genomics landscape in Australia, which shifted profoundly in the more than a decade I was away from the country. When I left, there wasn't really genomics in Australia to speak of. By 2019, we had dedicated funding systems supporting genomic medicine, and large-scale national consortia overseeing really large-scale efforts in cancer and in rare disease.
And I think there was a major shift in culture towards the idea of genomics and data science becoming a first-class science in Australia, as opposed to something that was done overseas. So the idea of being able to take the lessons we had learned in these very large-scale projects in the context of the Broad, and bring those back to Australia with the unique challenges and opportunities that existed there, that was pretty exciting as well. So in 2019 I started to explore various opportunities back in Oz, and was very lucky to have the support of two institutions, the Garvan Institute in Sydney and the Murdoch Children's Research Institute in Melbourne, which were willing to come together and jointly fund the development of a centre split across the two sites. There we would be able to build a virtual centre, with people based in both cities, very focused on building large-scale population genomics resources and tools, with the idea of not just building foundations for genomic medicine in Australia, but, importantly, doing that in an equitable fashion. So that was the genesis of the Centre for Population Genomics. And the scientific vision we've been able to build has really focused on this core idea of ensuring that as Australia moves into what is now very clearly a transformative era of genomic research and medicine, we do so in a way where we're learning the lessons from other countries, but taking advantage of the fact that we have a very diverse population, including a large set of indigenous communities, Aboriginal and Torres Strait Islander communities, as well as many more recently arrived individuals of both European and non-European ancestry. Twenty-five percent of Australians were born overseas, and more than half of us have at least one parent who was born overseas. So there's a lot of diversity we need to make sure is represented.
And if we want to get genomic medicine right, we need to make sure that all of those communities are actually engaged in the process of genomic research, and that we build resources that ensure they can benefit just as equitably as Europeans can from the practice of genomic medicine. [00:22:31] Patrick: How does achieving that aim manifest differently in terms of what you do? Because with ExAC and gnomAD, you were first and foremost coordinating existing datasets to come together and be shared. Is the challenge in Australia right now that those datasets don't exist and need to be generated? Or do they actually exist, and are already representative of the population, but need to be brought together? Or is it really about creating new representative datasets? [00:22:58] Daniel: It definitely will require creating new datasets. And the reason for that, I think, and this is true in countries all around the world, is that there is a long history in genetics of ending up with large cohorts that are fairly European-centric. It's not through maliciousness or explicit racism that this has happened, and it's also not that non-European individuals are less interested in participating in genomics; in fact, if anything, they're often more interested. The fundamental issue is that the typical approaches to recruitment in these cohort studies are not explicitly designed to engage and work with communities outside that traditional Anglo-European culture. That is the gap that needs to be filled, and we can't fill it through aggregation of existing datasets.
So the real difference in the way we've set up the centre to do science in Australia is that it begins from the outset with community engagement, with actually going out and working with communities to understand what they think about genetics, what they think about participation in research, their understanding of those concepts, and then how we can do genetic research and build resources that actually resonate with the way they think about their bodies and themselves, that they're comfortable with, and where we respect the community's desire to do this the right way. That, I think, is the most important thing. This isn't a lesson that's unique to Australia, and I think it's actually been fantastic to see projects launching all around the world that are much more focused on inclusion and equity than has ever been the case: that general recognition that if we are to do this properly, we actually need to talk to these communities from the very beginning, and not just cast the net wide and grab whatever participants are the most convenient. [00:24:43] Patrick: How do you think about, or balance, the tension between, on the one side, the value of open data, participation in research, and data sharing, but on the other side, the very real desire of communities, in particular, I believe, indigenous communities in Australia, to protect their genetic data from exploitation, and not have it, as you say, simply swept into a net and taken away? How do you balance those two ends of the spectrum, which, each from its own frame of reference, are both equally right in many ways? [00:25:12] Daniel: I think the most important thing is to acknowledge up front that there is actually a fundamental tension, and not try to sweep it under the carpet.
Or pretend that there is actually some happy middle ground where everyone can be happy. It is the case that from a purely culture-blind research perspective, it would be ideal if all data could be made fully open and any researcher could access it and reuse it for their own purposes. I've certainly benefited enormously in my career from the fact that there are many groups who have been willing to make their data available in various different forms in a way that we can reuse. But at the same time, it is also absolutely the case that there are many good reasons why groups, particularly indigenous communities, feel a strong need to protect, and think very carefully about, the governance of the data that comes from those groups. There are long histories of misuse of data arising from those communities. There are long histories of a lack of consultation, of a lack of really any respect given to those groups in designing research projects. And as a result, there's often a real lack of trust in the research system, and a desire, I think perfectly sensibly, to ensure that indigenous voices are involved in making decisions about exactly how a particular dataset is used. So for us, I think the first step is acknowledging that tension. And the second is working with those groups, thinking directly about indigenous communities and what they want. For us in the centre, what that has meant has been building a set of collaborations with indigenous researchers who are driving indigenous genomics consortia in Australia. And we're really fortunate, I think, to have some fantastic, highly collaborative, extremely thoughtful leaders in this space in Australia, people like Alex Brown, who's based in South Australia, or Azure Hermes, who's based in Canberra, who are themselves people from indigenous communities.
with strong backgrounds (in Alex's case in medicine, in Azure's case in community engagement), who deeply understand the needs of those communities and are now driving the research forward in those spaces. So our view here is that we, as genomicists, provide support to ensure that those efforts can get up and off the ground. What that essentially means is funding sequencing and building data infrastructure and long-term systems for the management and analysis of data, systems that fit with the needs of indigenous communities for self-determination and sovereignty over that data. [00:27:35] Patrick: I'd love to hear more about what you see over the next two, three, four, five years, however long your vision goes out. What are you taking from your eight and a half years at the Broad Institute, which is obviously the world's leading genomics institute and does a lot of things right? What are you importing over, saying, this works really well and we're going to run with it? And what are you doing differently in terms of the way you think about generating these large-scale, inclusive datasets and running the center as a whole, not just the datasets? I'd really love to hear more about the whole philosophy. [00:28:08] Daniel: I'm not sure there are necessarily huge differences in the way that we're thinking about it, in part because the Broad's thinking has evolved in parallel with so many other groups around the world starting to think about all these things. We've moved from a very reflexive and responsive model, where it's all about what data sets we can grab, to a model where very active thought is being put into making sure that the groups that aren't currently represented do get represented.
In fact, there's a huge amount of work now underway at the Broad, including by people like Alicia Martin and others. But anyway, in the sense of setting up the center, the way that we're approaching this is building community engagement into the process by actually having a team embedded within the center that thinks about this. It's actually the first time that I've had anthropologists and social scientists on the team, which has been enormously fun: having a set of people who come from very far from genomics as a background, but who bring really deep experience in thinking about the issues associated with engaging diverse communities in research, and then being able to bring those worlds together and think about genomics and culture and anthropology at the same time. So that's been fun. Then, in terms of modifying the way we work to the uniquely Australian aspects of the landscape, there have been a few changes. One has been that in the Broad environment we used cloud computing for everything that we did, so we had the benefits of enormous scalability. That was a lesson learned the hard way at the Broad: if you're not on the cloud, it's basically impossible to do analysis at the scale of ExAC and gnomAD. So that's something we've moved over to Australia as well, and the whole center is built on cloud computing from the outset. But what we are doing is making sure that we have data localization sorted out. That means that the data we bring into the center, particularly the data from indigenous communities, but in fact all the datasets we collect, never leaves Australian soil. So although it's on the cloud, it's always hosted in Google data centers that are physically in Australia. And that, I think, is something we're going to see a lot more of; in fact, we're already seeing a fair bit of it.
As other countries launch their national genomics projects, there will definitely be an awareness of that need to maintain jurisdictional boundaries and sovereignty over the data being collected. And then I think a lot of the rest of what we're doing is very well aligned with just making sure we're following global best practices in data security, analysis approaches, and those types of things. [00:30:26] Patrick: I'd love to get your thinking on, and this is definitely a loaded question, how useful genetics is, or isn't, in therapeutic discovery. There's a level of hype that genetics has gotten, and from some corners of Twitter it deserves all of it, while from other corners it's the exact opposite: it's been a huge waste of money. I imagine your view is probably somewhere down the middle, but I'm curious for your take on whether genomics deserves the hype that it gets and what it needs to do to fix things. [00:30:54] Daniel: Obviously I'm biased on this; I'm not sure I'm right down the middle. I think I'm a bit of a genomics evangelist generally. But it is interesting when you talk to veterans in the pharma industry: there's a weariness about them as they sometimes talk about genetics. They've seen previous fads in drug discovery come and go, and they know what it looks like. There was definitely a significant component of hype in this, as with any other major transformative technology; it's very easy for people to oversell it or deploy it in ways that aren't actually productive. But I think genetics is definitely not a purely hype-driven endeavor. In fact, I think it probably will end up being the single most important tool to date in characterizing the function of human genes, and then in understanding the mechanisms by which we can actually intervene therapeutically and create the outcomes that we want.
And the reason for that comes back to that concept of experiments of nature. We have historically had a fairly limited toolkit for understanding the function of human genes. That's often involved studying people who have a particular disease, but also spending a lot of time in animal models and cell models, various abstracted versions of underlying human biology, and all of those have their limitations. The benefit of human genomics is that, by looking at genetic variants that disrupt the function of a gene, we can actually look directly at what happens when you mess with that gene in a living human, where that gene is affected in every cell of their body over their entire lifespan, and understand what impacts that has on biology. And that is, I think, a transformative thing. It teaches us something about biology that we often couldn't learn in any other way. That's particularly true for some diseases. If you think about mental illness, schizophrenia is a great example, where we're talking about diseases of the adult human brain, a fundamentally inaccessible organ that you can't model in a dish and can't model in a mouse. You can come up with various very limited models, with all the caveats, but ultimately the best model that we have is to look at humans who have these diseases and then use genetics and other approaches to trace back the underlying genetic circuitry that's gone wrong, and therefore what the biological mechanisms are. The other nice thing about genetics is that it teaches us about direction of effect and mechanism. So if we see that a loss-of-function variant has an effect that pushes disease in one direction,
if, for instance, we have a protective loss-of-function variant, it's pretty likely that a drug that inhibits that gene is also going to have a related effect. So I think we're now in a world where, after multiple years of probably too much hype at times, including some wild promises about all common diseases being solved through genomics in a very short timeframe, we've settled into a more mature understanding of what we can do here. And we now have big success stories in genes like PCSK9, where human genetics was critical for understanding the impact that gene could have on LDL cholesterol, and where we already have some drugs, admittedly still with small market impact, but still some new, successful drugs in that space. There's a whole host of new drugs coming out now based on other discoveries in the heart disease area: APOC3 and other genes involved in triglycerides and LDL cholesterol. And these, I think, will be transformative. And this is just scratching the surface; there's a lot more to come as we start to dig more and more into these very large datasets of variation. Fundamentally, what we're doing here, as we build up a big enough data set of hundreds of thousands of humans with information at every position in the genome, plus all sorts of information about clinical phenotypes, is reconstructing the genetic wiring diagram of the human: reconstructing that map that goes from genome sequence all the way through to biology and health. And with a large enough matrix like that, we can really start to get to the point where we have a deep understanding of the mechanism of most genes in the genome, in a way that really teaches us how we could intervene to reduce the risk of disease. So yes, I think some of the hype has been superficial, but I also think this is already changing the world, and it's going to keep changing it.
[00:34:34] Patrick: It's been remarkable as well, I think, to see some of the early data that started to come out last year around one-time gene editing. You mentioned Verve, Sekar Kathiresan's company, and I think they published some of this data, but that's such an interesting and fascinating thread that I think will be pulled over the next decade: how far can we take that? [00:34:55] Daniel: I mean, it's a glorious time to be in human biology, right? You've got these two threads coming together at one particular point in history: an increasingly deep understanding from genetics of the ways the genome sequence can affect biology, and at the same time a whole set of different platforms that can be used to tweak the sequence or the expression of genes in very fine-scale ways. So CRISPR gene editing, as you said, from Verve and other companies, is a fantastic example where you can already start to do that type of work. There are also all these mRNA platforms that allow us to subtly up-regulate or down-regulate the expression of a gene. These tools, along with this very close understanding of human biology, will absolutely change the way that we approach these things. So it's remarkably exciting. And we're already starting to see an impact of that in rare disease. That's the place I'm currently most excited about, because with 95% of rare diseases still having absolutely no therapeutic approach at all, I think that's where we will have the biggest impact the fastest. [00:35:50] Patrick: That's right. And I think the thing that will make more of the pharma industry into believers will be the first set of common diseases. Rare diseases, obviously transformative impact, but the first set of common diseases where genetic subtypes of those common diseases can be treated really effectively.
LRRK2 for Parkinson's, APOE for Alzheimer's (obviously Alzheimer's has been a very tricky area), genetic subtypes or genetic drivers of NASH. I think there'll be a couple of examples in the next few years where people start to realize: actually, Alzheimer's, this enormous problem, we can start taking bites out of it by treating genetic subtypes of individuals, where we can actually address some kind of core underlying biological process. I think that might be a turning point as well, when we start to see the first few of those examples. [00:36:36] Daniel: Totally agree. The application of this technology to create fundamentally novel therapeutics with enormous market sizes, that will make a believer out of everyone, I think. [00:36:46] Patrick: As we're talking about market sizes, I think it'd be a good segue into academia versus industry, or the hybrid of the two. You had a really great tweet a couple of weeks ago, and I can tell this is an area you're really passionate about, on how the boundaries between academia and industry are now more porous than ever, and how an early career scientist, late career scientist, graduate student, whoever is thinking about these two, shouldn't think of them as completely independent fork-in-the-road tracks. I'd love to hear your thinking on this. What can people like you do to better prepare people earlier in their careers for making these decisions, and how do you think about this academia-versus-industry dichotomy? [00:37:25] Daniel: It's amazing how much it's changed over the course of my career. As a baby PhD student in Australia, I don't think I'd ever met anyone who worked in an industry setting, probably right through until the last year of my PhD. I did consider industry options as I was contemplating my postdoc transition, and actually applied and went through a couple of interviews at that stage,
but then ultimately decided to stay in academia and went into that postdoc role. And actually, in every job transition since then, I've spent a fair bit of time deciding, or sometimes agonizing, over opportunities that exist both in academia and in industry. A big part of the reason for that is, as you know better than anyone, that the opportunities to do exciting research in the industry space are now bigger than they were historically. And at the same time, the scale is also becoming greater on the academic side. I mean, a lot of the projects that I ended up doing in the academic setting were industry-scale projects; we were talking earlier about doing large, production-style engineering work. So that was often doing academic work in a way that was closer to the way you might think about production-scale work in an industry setting. So I do think that the boundaries have become grayer, and there's also an increasing ability for people to move back and forth between those worlds. The other thing that has changed, and that I really enjoyed about my time in Boston, is that the perspective of academics about moving to industry has gradually shifted. Actually, one of the slightly less pleasant surprises about moving back to Australia is that I feel like I jumped about five years back in time. When I talk sometimes to Australian academics, there is still this sense that someone who goes from an academic role into an industry role has taken a step back; you know, they must not have been good enough to stay in academia. While I was in Boston, that was absolutely not the case. In fact, for the trainees who came through my lab,
with every single one of them, I worked from the outset to ensure that they had exposure to both academic and industry opportunities. For every PhD student and postdoc who came through the group, I made sure that they were involved in at least one industry-sponsored research agreement, so they could get to know industry scientists, understand how things work, and be in touch with them. The majority of my trainees went on to industry positions out of choice, because they had great offers in both academia and industry and they went down the industry road, and I was delighted about it. So things are changing, but we still have a lot of work to do on that, particularly in Australia; we still have a lot of work to do to educate people about that shift. The point I was making in that series of tweets was encouraging people to avoid a particular mistake that a lot of young academics make, which is to think about career choices in a very serial fashion, where they first say to themselves: okay, I know that industry has good options, but first I'm going to think about academia and pursue all of my possible academic options; that's my plan. And then, if academia doesn't work out, I'll start thinking about plan B and consider my industry options. And that is a very risky approach to planning a career, because it means that you spend all of your time in that first phase not building any of the connections or the skills or the marketable qualities that are required to transition into another role, and you're leaving yourself open to the weird, arbitrary, random aspects of academic life.
The fact is that if you miss out on a couple of grants, or your papers get badly reviewed for one reason or another, or you get given bad projects by your PI, you can end up, through no fault of your own, in a situation where everything's just not working out in academia. And then suddenly you're having to make career decisions at a point where you're not necessarily feeling great about yourself, and you haven't planned for the next step. So instead, my strong encouragement to young academics now is to think very actively about all of the possible options as early as you can, and do that in parallel. Think to yourself: what are the things that I love about science? What is it that motivates me, that drives me to get things done? Consider all of the possible jobs that involve those types of activities, and explore all of those in parallel, so that ideally, as you start thinking about your next transition, you apply for jobs across all of those domains. So yes, apply for postdocs, but also make sure you've talked to industry scientists and, ideally, applied for industry roles at the same time. Now, interviews are very unpredictable, and you may well find that you get offers at different times, and it's hard to get these things right. But at the same time, that way you can be pretty confident that you've done your due diligence, and the role that you end up in is probably the one that was best suited for you, as opposed to just one you ended up in by default because you'd left yourself no other options. And then the other message I tried to get across, because I feel like I've spent so many conversations with young trainees on this, is that it's not your fault, and you should not feel bad if academia has not been the path that you followed.
Just because you're surrounded by other people who see academia as the only path, it doesn't mean you should fall for that kind of Stockholm syndrome and get trapped in the mentality that that's the only way you can be successful. You have to really think about what makes you happy and what actually allows you to have the biggest impact on the world, and in many cases that will not actually be in academia. And that's totally okay; nothing has gone wrong if you end up in a different world. [00:42:19] Patrick: I think that's so well said. I personally had a lot of great people around me who said very similar things while I was a PhD student, but I also had a lot of friends who had people around them who didn't. And it's really hard when your PhD supervisor, as an example, is an academic hard-liner and you're not actually sure if it's right for you. I think what you're saying is so important, because there are a lot of people who I think aren't hearing that directly from their supervisors. The only other thing I'd add, which was personally really helpful for me, is this concept of one-way and two-way doors: there are decisions that you make in life that are one-way doors, and you can't go back. There's a spectrum or degree to this, but when you marry someone, you've made a pretty significant commitment and it's hard to roll it back. [00:43:01] Daniel: That's another great example. [00:43:03] Patrick: An even better example. I think 10 years ago, leaving academia was a little bit more of a one-way door; it was harder to come back in. But today it's very much not. You can go work at Vertex for two years, and if you decide your heart is actually in academia, there's a million places that would take you as a postdoc. Maybe you're two years behind where you might've been
if you'd gone straight into the postdoc, but your experience is so much richer for it. So I think you should ask yourself: am I really making a decision for the rest of my life, or am I actually just going down this path for a little while, and I can always circle back? [00:43:37] Daniel: And just remember, there are people out there who value industry experience enormously in the academic setting. I can say that in the center, actually, the vast majority of the people currently working here are from non-traditional academic backgrounds: software engineers, people focused on community engagement, people who sometimes started in academia but have taken very different paths and become project managers, for instance, or done other things with a variety of skill sets. And the fact that we have people who have spent significant amounts of time working in entirely different industries, building up that different skill base, has unbelievable value. You just need to find the right group that actually really understands that about you. But yeah, I totally agree that the door is much more two-way now than it ever has been. [00:44:17] Patrick: Yeah, thank god. I was hoping that we could get to academic publishing, but actually I think this is a really good place to end, and I'm going to use it as an opportunity to get you back in a couple of weeks or months, because I think it is such an important topic, and from a kind of business-fundamentals perspective, the monopoly that the large publishers have is a really tough nut to crack. I'm really interested in it, and a lot of people have taken a crack at it. So I'm going to suggest that we save that for next time. But thank you, Daniel. I really appreciate you taking the time. As I anticipated, this was a really great conversation, and I so appreciate you spending the time today. [00:44:49] Daniel: It was a pleasure, Patrick, and I think it is wise to push that back.
If you get me started on academic publishing, I'll be here all night. [00:44:55] Patrick: That's right, you need to go to bed. I do like talking to you; looking forward to further conversations and working together on various things. Absolutely great chatting to you, and see you later. Thanks, everyone, for listening. As always, please share with a friend if you liked the episode, leave us a review on your favorite podcast player to help other people find us, or just tell somebody that you liked it and that you think they might like it. Thanks so much for listening, and we'll see you next time.