Galaxy in the Cloud…it’s not as far as you think!
By Katie Ewing
Have you ever dealt with a spreadsheet so large (say, on the order of billions of entries) that it couldn’t even be opened in Microsoft Excel? This is a common problem for computational biologists who investigate the causes of diseases by looking for a change of a single DNA base in a huge gene. Luckily, there’s a tool that can help make sense of all that data. Galaxy is a web-based platform designed for data-intensive biomedical research, making analyses of large data sets more accessible and reproducible. It runs as part of the Genomics Virtual Lab (GVL) on the Australian Research Cloud.
Over two days in April, the first Galaxy in the Cloud training course was held at the University of Melbourne, in collaboration with VLSCI, HABIC, and ITS Research. Participants were shown how to launch instances on the cloud so that they could work with large data sets without using up resources on their individual machines. Researchers learned to create workflows to analyze their own data, while system administrators learned how to administer Galaxy so that they could take it back to their respective institutions.
Interviewing researchers at the Galaxy training session @ITS_Res @galaxyproject. Interesting projects in the works!! pic.twitter.com/qRtywpNntx
— Dejan (@heyDejan) April 3, 2014
Dejan talks to Ashley and Miriam from the Murdoch Childrens Research Institute (MCRI) about their research
So how does genomics data typically flow? A clinician will meet a patient who has a genetic disease such as Parkinson’s or autism. He or she will collect a sample of the patient’s DNA and then send it off to get sequenced using next-generation (“next gen”) sequencing, which enables the whole genome sequence to be read digitally (meaning quickly!). A bioinformatician will sort through the DNA bases and send the organised data back to the biologist, who is then able to identify the specific mutation that is causing the disease.
With next gen sequencing becoming more and more common, researchers want to understand exactly what is in the “black box” that they’re sending their data through. Galaxy is the bridge, or common language, between the biologists in the wet lab and the bioinformaticians in the dry lab. As one participant said, it’s not about reinventing the wheel; it’s just about learning how to use it.

Is this necessary? Probably not.
Data is more than just an Excel spreadsheet. It takes powerful yet user-friendly tools, such as Galaxy, to produce and share world-class research that ultimately leads to new discoveries.
Ready to learn more? Stay tuned for upcoming Galaxy Workshops!
