Processing Genomics Data at Melbourne Bioinformatics with Bernie Pope
Senior Research Community Coordinator Errol Lloyd speaks with Bernie Pope, Victorian Health and Medical Research Fellow at Melbourne Bioinformatics, about his experiences analyzing large amounts of genomics data in cancer research.
What kind of a research world do you live in, and what do you care about?
It’s a good question. I care about human genomics and human disease, cancer especially. I lead the human genomics and cancer group here and I’m funded to work on colorectal cancer and related diseases. I work very closely with the Colorectal Oncogenomics Group at the University of Melbourne, who are situated in the VCCC building, so I spend a fair bit of time there.

The Melbourne Bioinformatics Human Genomics Group.
How does your work with genomics relate to cancer research?
In the case of colorectal cancer, we spend a lot of time working on hereditary predisposition, so risk factors that might increase one’s risk for colorectal cancer. There are well known cancer syndromes that are familial, so we have aggregations, increased prevalence of cancer in a family far beyond what you’d expect by chance.
For example, a classic one is called Lynch syndrome. Lynch syndrome is caused by a defective DNA repair mechanism. DNA gets damaged from time to time, especially during cell copying: when cells divide and produce new cells, the DNA is copied from the cell to its progeny, and in the copying process some errors, or mutations, can happen. If those errors happen to fall in important genes that regulate cell growth, that can lead to a cascade of mechanisms that might ultimately develop into a tumor.
We’re interested in finding the mutations that might be causing Lynch syndrome, because they can be tested for in screenings at cancer clinics. Other family members may also be advised to be screened for identified mutations, and advised on lifestyle changes and other things that can help reduce the risk of cancer developing in their lifetime. Identifying cancer at an early stage, or before it develops, is the best outcome for patients. One thing we’re trying to do is clarify cases like suspected Lynch syndrome by identifying additional mutations that might be responsible. Some of them are more complex changes to the DNA that are harder to detect with standard techniques at the moment.
So, that’s where you and your research come in, outside the clinic?
Yeah. I’m a computer scientist by background, so I have a computational angle on all of this, but I collaborate very closely with biomedical researchers. We have to work together, and genomics is really the tool with which a lot of this is done. It’s a technique that works very well and the cost has come down, so it’s becoming more and more widely applied. We’re able to do whole genome sequencing now, in many cases, which gives us most of the DNA in the cells. That’s a vast amount of information, requiring lots of storage on the computer but also a fair bit of computation and clever algorithms to get answers out in reasonable amounts of time.
On a day to day basis, where do you come in? What’s the magic that you lend to all of this?
What happens is that tissue samples are taken via other collaborators. Then the DNA sample is sent for sequencing, and that can be done in many different ways. We might, depending on the study, sequence just a targeted region if we’re interested in specific genes. Alternatively, we might do whole genome sequencing, which looks at all of the DNA in the cells, searching for mutations in that DNA. This would occur in suspected Lynch syndrome, where we don’t know what the causes might be and we might be looking for new mutations.
My role is to engage with the computational aspect: running large computations to get from vast amounts of sequencing data down to biologically interpretable results, and also developing novel techniques for annotating the mutations we discover, so clever ways we can find new things. It also means incorporating other datasets where possible, since there is a wealth of biological data out there. We can try to come up with interesting ways to use that information to improve enrichment for true results. There’s a fair bit of data analytics and statistical analysis that goes into it as well.
So what are the challenges you’ve come up against to get this work done?
The challenge on the computational side of things is that we need fairly large amounts of storage. Over the many projects that I’m working on, there would easily be several hundred terabytes of data, and it’s constantly growing, so that’s a pretty sizable amount of data to manage. Even basic data management is a fair bit of work. The data has cost lots of money to obtain in the first place, if you think about all of the work that’s happened, so it’s a very precious resource.
There’s a pretty large amount of computation as well. It tends to be bursty: when we get a larger amount of data and we do analysis, we do a very big calculation. That might take days or weeks, but then there might be a period where we’re just working with the downstream results, which I trivialize a little bit, but it’s more or less looking at spreadsheets. There’s not a lot of computation going on there while we’re thinking about and analyzing that data, which you can do on a smaller computer. But then later on we might do a bigger calculation, so there are these big bursts every now and then of large-scale calculation, followed by intense analysis. There are just practical, basic issues around getting enough storage and enough computation, and we’ve benefited greatly from being at Melbourne Bioinformatics. We’ve had lots of computing resources, and we’re also now using Research Platforms a lot, especially Spartan. That’s growing, and that’s great.
The more individuals we can see, the better. But we need ever larger sample sizes, because some of the mutations that we see in the DNA are extremely rare, say one in a few hundred thousand people. Just having enough samples is challenging, and that’s a global challenge for everyone. People are aggregating their data as much as they can, so there are people publishing public datasets and so on; it’s always growing, but still a long way off the numbers we’d like to have.
More practically, it’s very noisy data. Processing the data to try to recover a true signal is challenging, as DNA itself is a very complicated system. In humans and any cellular organism, the DNA in the cell is a very complicated system, and dealing with all of the complexity in there is quite a challenge from an intellectual point of view. We tend to simplify things down a little bit, but the reality is it’s extremely complicated. Tumors themselves, depending on the cancer, can be very chaotic internally. They can be vastly different from what the normal cellular DNA looks like. For example, you can have extra copies of chromosomes, you can have total loss of other chromosomes, you can have neo-chromosomes, which join two different chromosomes together. The genome itself is a much more plastic thing than we might think.
Strange things can happen: you can have a big section of the genome inverted, where it just flips over and goes in the other direction. You can have copies, inversions, and deletions in combination, and you can have translocations, where different things join together in unexpected ways. Two genes which are not really connected to each other can break and then join to form a new gene that didn’t previously exist, which has some effect on the growth of the cell. Looking at tumor DNA and genomes is an interesting area and a very hard one to work in because it’s quite complicated. We do a fair bit of that as well, which I’m quite interested in.
Perhaps researchers on campus are unaware of the infrastructure and support communities around them. Do you think researchers need experts to use these resources, or could they dive in themselves? What are your thoughts?
I think it’s very obvious that while the infrastructure is amazing and very important, it’s not enough just to buy computing equipment and let people loose. It’s sometimes a difficult environment to use and requires specialist skills. I think there’s a strong need for computational people, experts to bridge the gap, but not just to be an intermediary between the researcher and the computing equipment. My experience is that you need to be a researcher as well, to add your knowledge and skills into the research plan. People bring skill-sets together and it adds up to a much more powerful group.
You want people who can use the computers, for sure, but you also want people who have that research mind and understand what the collaborator is trying to do.
As for accessing those resources, the infrastructure is already there; the key is awareness, knowing what exists. Organizations like Research Platforms and Melbourne Bioinformatics do some promotion as well. One of the main ways that happens is through training courses, and so we do heaps of training; Research Platforms also does lots and lots of training. That’s a great way to introduce people to the ideas and give them a gentle introduction without jumping into the deep end.
While it is impressive that you have a room full of computers all grinding away, it’s rather more important to think about how your problem maps onto how you’re going to solve it. Some problems are just very large, especially in the biomedical sciences. But on the practical side, don’t be put off by the grandiose scale of it all; just see it as a device that works for you.
Computational skills and knowledge are an increasing requirement across many disciplines, as more and more of them become digitized and automated. Inevitably, researchers in many disciplines will have to pick up more of those skills, and computational people will become more expert in other disciplines. Start asking questions like: where is the research going? What questions can we solve? What technology do we need to solve those things? Then think about writing grants around those things, and about collaborating with people, finding the right people who can provide the bits and pieces you can’t do yourself.
What do you wish you could tell your younger self? What would you tell your students here, or anyone who’s thinking of entering this world of biomedical research? What does the future hold for them?
That’s a good question. One piece of advice is not to pigeonhole yourself too much into one discipline, and in many ways the world and academia tend to pigeonhole people. There are streams that you can follow, and when you follow those streams you tend to end up a little confined. The modern world of research is very multi-disciplinary and very collaborative; the idealized version of a scientist working in a lab on their own is not something I’ve experienced. It’s been large groups of people working together over long periods of time. Research is much more dynamic than I imagined it would be, so it changes direction a lot. Don’t be afraid to cross disciplines; there are a lot of opportunities there.
You need to follow where your interests lie. When you’re interested in something, you’ll naturally study it, read about it, and think about it, and that can’t help but improve your skills and knowledge in that area, rather than doing things that you feel other people are telling you are good things to do.
One more practical thing: I wish I’d studied more statistics, because that’s a constant challenge for me. Statistics is such an important part of a lot of the work we do, so it’s something I’m always trying to improve upon.
To read more about Bernie Pope and his research, visit http://berniepope.id.au/
