Research Platform Services Blog

Feb 21

We’re hiring!

Do you love science and research? Are you the one all your friends/colleagues come to for their typesetting solutions? Can you explain technology eloquently and kindly to the grumpiest professor? 

If you answered YES to any or all of the above, then Research Platform Services invites you to apply for one of our many Junior Research Community Co-ordinator positions! The successful applicants will grow their respective communities through regular workshops and meetups. Maintaining an online presence is also an essential part of the job. You will also be expected, and supported, to organise events within your communities, as well as Research Platforms-wide events such as the famous Research Bazaar conference. You can find out more about the Research Bazaar community in the first pages of our new publication, The Digital Research Skills Cookbook. Please see the links below for a detailed position description for each role, including how to apply.


Get in quick, applications close COB 28th February. 

 We’re hiring for: 

Git

RStudio

Python

Omeka

Dec 11

ResBaz 2019 Registrations are now open!

We’re excited to announce that registrations are now open for the ResBaz Conference at the University of Melbourne! Register here: resbaz.edu.au


 

via GIPHY


ResBaz will be held on February 20th-21st 2019 at Wilson Hall. 

Learn all about the digital tools you need to work smarter, not harder, in your graduate research, while also meeting your fellow researchers. There will be amazing food trucks, swag and fantastic speakers too.

Just like at previous conferences, submit a digital research toolbox poster and receive a free lunch voucher! The top 10 will win a $20 Coles Myer Gift Card, while the top-voted poster wins a $500 flight voucher!

Want to get a feel for what this conference is all about? Watch our promo video below:

Have more questions? Check out our FAQ here

To register head to resbaz.edu.au


 

Oct 12

Welcome from Professor Ian Gordon

image

Professor Ian Gordon, Director, Statistical Consulting Centre and Melbourne Statistical Consulting Platform.

This is just a small sample of questions researchers across the University of Melbourne are pursuing by collecting quantitative data.

The Melbourne Statistical Consulting Platform provides statistical support to this research community. Our consultants work collaboratively with graduate research students and staff across any and all stages of the research cycle – from planning, data collection and management, to analysis, interpretation and reporting. Consultants have experience working with researchers from all faculties across the University, including those with a strong tradition of quantitative research and those adopting novel and innovative quantitative approaches in traditionally non-quantitative fields. We handle data of all sizes – from small specialised experiments in animal science to big data.
Investment in quality research planning and data collection must be matched in analysis and reporting.  The practical, applied focus of the Platform services supports this goal.

In second semester, the Melbourne Statistical Consulting Platform will be running a free half-day workshop on producing quality graphs, using statistical software freely available to University of Melbourne staff and students.

The Platform supports graduate researchers doing a research higher degree (mainly PhD, some Masters) and University staff members.

You can find out more about the Platform here

Director’s Introduction - Research Platform Services October 2018 Newsletter

Dr Stephen Giugni, Associate Director, Research Platform Services.

Welcome to our Spring Newsletter!

Our focus for this issue is on increasing the efficiency of research publication via compute, community and consultancy. If I can use a little bit of executive license, our focus is on supporting you to reduce the time to research outputs and their benefits.

The ability to move, process, analyse, engage with, share, compare, interpret and explore research data is facilitated by computational environments. Whether it is a large high-performance machine, the cloud, or your laptop, the ability to compute on data is fundamental to research, be it linguistics and sentiment analysis, genomics, complex simulation and modelling applications, or artificial intelligence.

But raw computation is only part of the story. A key service we provide to the research community is consultancy: endeavouring to understand your requirements and advising on the most appropriate environment or platform to support you, or determining whether we can develop something tailored to your needs that assists in building community, enhancing interaction or accelerating outputs.

Hopefully, through the stories in this issue, you will see how we have been able to support a number of research activities – and perhaps how we might be able to work with you!

Our New GPGPU Service

This year the University of Melbourne, in partnership with Deakin University, La Trobe University, RMIT and St Vincent’s Hospital, launched a brand-new General-Purpose Graphics Processing Unit (GPGPU) cluster as part of a new high-end compute service hosted by the University of Melbourne.

Funded through a combination of ARC-LIEF, the University of Melbourne and our university partners, the service is operated on behalf of the partnership by the Melbourne School of Engineering (MSE), Melbourne Bioinformatics and Research Platform Services, and came online in July. Additionally, MSE contributed departmental funds to augment the service. The cluster consists of 72 nodes, each with four NVIDIA P100 graphics cards, which can provide a theoretical maximum of around 900 teraflops.

image

GPGPUs are a valuable resource for computational science, as each GPGPU chip contains thousands of cores that are optimised for certain kinds of tasks. For example, if the four-core CPU in a typical laptop computer were able to compute the flow of particles through a ventricle of the heart in a certain time, a GPGPU chip might be able to do the same task tens or hundreds of times faster. This is an example of Computational Fluid Dynamics (CFD), a type of process that is very well suited to the many cores in a GPGPU. Another research domain well suited to GPGPUs is Molecular Dynamics (MD), where the configurations and interactions of complex molecules and molecular chains are simulated on the GPGPU.
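To make the many-core idea concrete, here is a minimal sketch in Python (assuming the CuPy library and an NVIDIA GPU are available; it is purely illustrative and not part of the service) of the same matrix operation run on the CPU with NumPy and on a GPGPU with CuPy:

```python
import numpy as np
import cupy as cp  # GPU array library; requires an NVIDIA GPU and CUDA

n = 4096
a_cpu = np.random.rand(n, n).astype(np.float32)
b_cpu = np.random.rand(n, n).astype(np.float32)

# The same operation on a handful of CPU cores...
c_cpu = a_cpu @ b_cpu

# ...and spread across the thousands of cores on the GPU.
a_gpu = cp.asarray(a_cpu)      # copy the inputs into GPU memory
b_gpu = cp.asarray(b_cpu)
c_gpu = a_gpu @ b_gpu          # data-parallel matrix multiply on the GPGPU
c_back = cp.asnumpy(c_gpu)     # copy the result back to the host
```

Highly data-parallel workloads such as CFD and MD follow the same pattern: move the data onto the card once, then let the many cores work on it in parallel.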

Perhaps the most rapidly expanding area of research to take advantage of GPGPUs is deep learning. Deep learning is a subfield of machine learning, where a computer is trained to recognise and identify patterns in sets of data. For example, a computer can be trained to recognise a cat that it has never seen by looking at many pictures of cats. The more pictures of cats used for the training process, the better the machine’s chances of identifying a new cat. This becomes very important when you consider that self-driving cars will need to be able to identify not only cats, but all sorts of things in all sorts of configurations, in real time.
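As a rough illustration of what that training looks like in code, here is a minimal sketch using TensorFlow/Keras, with random placeholder arrays standing in for labelled cat photos; real classifiers are far deeper and are trained on millions of images, which is exactly the workload GPGPUs accelerate:

```python
import numpy as np
import tensorflow as tf

# Placeholder data standing in for labelled photos: label 1 = cat, 0 = not a cat.
images = np.random.rand(500, 64, 64, 3).astype("float32")
labels = np.random.randint(0, 2, size=500)

# A small convolutional network that learns patterns from the labelled examples.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # probability the image is a cat
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(images, labels, epochs=3, batch_size=32)
```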

image

The new GPGPU service has seen a rapid take up, reaching full capacity (usage >95%) within six weeks of launch. Coupled with high-performance storage, the service is already supporting almost 100 research groups across the five partners and has processed over 100,000 jobs.

Research Platform Services has started running training to assist researchers to prepare their jobs for the GPGPU environment, with more courses including GPGPU programming planned for 2019.

While not every computing workload is well suited to GPGPUs, more and more applications are including modules specifically for GPGPU (Matlab, anyone?). If you have a computational challenge that might benefit from a GPGPU, or would just like to know more about them, please email hpc-support@unimelb.edu.au and we will be in touch.

Oct 11

Research Output and High Performance Computing

Lev Lafayette (Research Platform Services). 

Many contemporary researchers are confronted with significant computational problems. Often their datasets have grown beyond the capacity of their desktop systems, or the complexity of their computational tasks is too great. This becomes even more challenging when one realises that both the datasets and the complexity of the computation are growing faster than improvements in desktop systems. All of a sudden many researchers discover that not only do they have their own domain specialisations, but they also need an increasingly high level of familiarity with information science.

It is at this point that many researchers may have to turn to high performance computing. For many it’s not an easy transition; they may be used to a different operating system and a very different user interface. Many come to the environment with little, if any, experience with the Linux operating system, let alone the command-line interface and batch-job submission. They might be surprised that forwarding of graphics-intensive applications comes with major latency issues, if it is available at all; that ‘data management’ is something meaningful rather than just a buzzword (buzz-phrase?); and that the version of the software being used, and even the compiler used to build it, is suddenly important.

image

Aspirin ion as produced with the molecular dynamics simulation software NAMD, viewed locally with the molecular modelling software VMD.

All of this generates a steep learning curve, but the good news is that it’s worth it. At this point of one’s research activities one is working in a very advanced environment, and the challenges and results are commensurate. The use of the command-line is no mere fancy - operating at the level of the system shell means that one is very close to the bare metal, rather than abstracted away by software and user-interface layers, which matters when performance is critical. Knowledge of the shell environment is not knowledge that goes away either; whilst incremental features have been added to the original shell from 1977, it is still fundamentally the same beast, and it will remain so for decades to come - for the rest of one’s research career and beyond.

All of this comes together with the batch job submission system. An HPC cluster is essentially a large number of commodity servers linked together to act as one system, even if partitioned according to hardware (or even ownership), and shared between many users. With many users competing to use this shared resource, some sort of queuing system is required - hence a scheduler, which receives data from a resource manager and allocates where and when jobs can run. It is because of this capability (in terms of interconnect) and capacity (in terms of processor cores) that users can run their complex or large dataset tasks. How else is one going to run a complex computational problem that requires dozens of tasks to communicate with each other via a message passing interface (MPI) application across multiple compute nodes? How about running the same processing task over dozens of datasets at the same time, as with a job array? Unless you have access to an HPC cluster, this simply can’t be done efficiently or effectively.
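To give a flavour of the message-passing style, here is a minimal Python sketch (assuming the mpi4py package and an MPI launcher such as mpirun or srun; the data is a toy array, not a real workload) in which each process works on its own slice of a problem and the partial results are combined:

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's id, 0 .. size-1
size = comm.Get_size()   # total number of processes, possibly spread over many nodes

# Each rank works independently on its own chunk of a toy dataset...
local_data = np.arange(rank * 1000, (rank + 1) * 1000, dtype="float64")
local_sum = local_data.sum()

# ...and the partial results are combined into a single answer on rank 0.
total = comm.reduce(local_sum, op=MPI.SUM, root=0)
if rank == 0:
    print(f"combined result from {size} processes: {total}")
```

Launched with, say, `mpirun -n 64 python sum.py`, the same script runs as 64 cooperating tasks across the cluster’s compute nodes. A job array is the complementary pattern: the scheduler starts many independent copies of a script, each pointed at a different dataset.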

Of course, HPC is not the solution to all research or computational tasks. Long-running single-threaded applications whose datasets are dependent on each other are not always a good fit. Nevertheless it is perhaps unsurprising to discover that both the availability of HPC systems and HPC training correlate with research output. It is almost as if having powerful computing resources, and the knowledge of how to use them, means that data can be analysed faster and interpretation by researchers can begin earlier. It is something that many of the top universities around the world have realised, and the University of Melbourne has certainly come on board with this realisation, with major upgrades to the Spartan HPC system this year and with the ‘Petascale Campus’ plans. Most of all, the Research Platforms team will continue to provide the best assistance we possibly can to help researchers get their work done efficiently and effectively.

ENIGMA Major Depressive Disorder Consortium

Neil Killeen (Research Platform Services, the University of Melbourne), Elena Pozzi (Orygen and the University of Melbourne) and  Lianne Schmaal (Orygen and the University of Melbourne)

The Enhancing Neuroimaging Genetics through Meta-analysis (ENIGMA) Major Depressive Disorder (MDD) Consortium is an international consortium that pools brain imaging data to conduct large-scale studies aimed at identifying patterns of brain alterations associated with depression, and at testing their replicability and reliability across many different samples worldwide.

The first meta-analysis of the group, on subcortical brain structures, included data from 1728 MDD patients and 7199 healthy controls from 15 institutions (see Schmaal et al., 2016, Molecular Psychiatry 1–7), and the second meta-analysis, on cortical brain structures, included >10,000 MRI scans from 20 institutions (Schmaal et al., 2017, Molecular Psychiatry). ENIGMA MDD now includes 35 institutions from 14 countries and 15 ongoing projects. New research groups are continuously encouraged to join the consortium to increase our sample size and thereby the power to detect meaningful findings. The ENIGMA MDD working group is led by Dr. Lianne Schmaal (Orygen and the University of Melbourne - chair) and Prof. Dick Veltman (VU University Medical Center, Amsterdam - co-chair), and is coordinated by Elena Pozzi (Orygen and the University of Melbourne).

Recently, the ENIGMA project team recognised that with the continued growth of the consortium, it was timely to enhance the way in which data were acquired, managed and distributed. Following discussion with Research Platform Services (ResPlat), it was concluded that one of ResPlat’s research data management (RDM) services (i.e. Mediaflux) could support the requirements regarding data contribution, sharing, retention, redundancy and security. There was no intent to significantly change the detailed way in which the data are held, merged and distributed (in spreadsheets), but rather to provide a better environment in which to operate the study’s business processes (and to enhance them as appropriate).

The end-to-end workflow is that more than 30 external groups contribute data to the study into site-specific spaces. The ENIGMA Study team merges the new data into the master spreadsheet. Then, authorised subsets of the master spreadsheet are provided to end users (ENIGMA ‘projects’) by the ENIGMA Study team.

image

A key part of this workflow is to enable external user groups to securely upload their contributed data to a specific destination space, but ensure that they have no access to any other part of the system. Similarly, only the specific data for an end user ‘project’ must be securely accessible to the users of that project.

The main features of ResPlat’s RDM service that support this workflow are secure, site-specific upload spaces for contributing groups and access controls that restrict each end-user project to its own subset of the data.

This study is a good example of how ResPlat’s RDM capabilities can be used in an agile way (a little development was required from Arcitecta, the vendor of Mediaflux) to support a bespoke workflow via generic and configurable Mediaflux frameworks. It also makes use of the unique multi-protocol capabilities of Mediaflux (in this case sFTP, SMB and HTTPS).

The only part that was slightly problematic is that a small number of the contributors work in clinical settings and their networks are heavily restricted. For the above solution, a hole in *their* firewall had to be opened to allow access to our system, because the special sFTP service uses a non-standard port (the standard sFTP service on port 22 cannot restrict users to their configured home directories).
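For contributing sites wondering what the upload step looks like, here is a minimal Python sketch of an sFTP transfer to a non-standard port using the paramiko library; the host name, port, credentials and paths are hypothetical placeholders rather than the real service details:

```python
import paramiko

# Hypothetical connection details for a site-specific upload space.
HOST, PORT = "mediaflux.example.unimelb.edu.au", 2222   # non-standard sFTP port
transport = paramiko.Transport((HOST, PORT))
transport.connect(username="site_a_uploader", password="********")
sftp = paramiko.SFTPClient.from_transport(transport)

# Each contributing group can only see, and write into, its own destination space.
sftp.put("site_a_phenotypes.csv", "/upload/site_a/site_a_phenotypes.csv")

sftp.close()
transport.close()
```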

Since the new system went live in July 2018, it has worked reliably and effectively. It has:

  1. Standardized the Study process
  2. Enhanced data security and redundancy
  3. Provided straightforward mechanisms for contributions and end-user data despatch

Processing Genomics Data at Melbourne Bioinformatics with Bernie Pope

Senior Research Community Coordinator Errol Lloyd speaks with Bernie Pope, Victorian Health and Medical Research Fellow at Melbourne Bioinformatics, about his experiences analyzing large amounts of genomics data in cancer research.

What kind of a research world do you live in, and what do you care about?

It’s a good question. I care about human genomics, and human disease and cancer especially. I lead the human genomics and cancer group here and I’m funded to work on colorectal cancer. I’m especially interested in colorectal cancer and related diseases, and I work very closely with the Colorectal Oncogenomics Group at the University of Melbourne, who are situated in the VCCC building, so I spend a fair bit of time there.

image

The Melbourne Bioinformatics Human Genomics Group. Photo source.

How does your work with genomics relate to cancer research?

In the case of colorectal cancer, we spend a lot of time working on hereditary predisposition, so factors that might increase one’s risk of colorectal cancer. There are well-known cancer syndromes that are familial, where we see aggregations, an increased prevalence of cancer in a family far beyond what you’d expect by chance.

For example, a classic one is called Lynch syndrome. Lynch syndrome is caused by a defective DNA repair mechanism, so DNA gets damaged from time to time, especially during cell copying. When cells divide and produce new cells, the DNA is copied from the cell to the progeny, and in the copying process some errors, or mutations, can happen. If those errors happen to fall into important genes that regulate cell growth, that can lead to a cascade of mechanisms that might ultimately develop into a tumor.

We’re interested in finding the mutations that might be causing Lynch syndrome, because they can be tested for in screenings at cancer clinics. Other family members may also be advised to be screened for identified mutations, and advised on lifestyle changes and other things that can help reduce the risk of cancer developing in their life. Identifying cancer development at early stages, or before cancer develops, is the best outcome for patients. One thing we’re trying to do is clarify those cases like suspected Lynch syndrome, trying to work out more mutations that might be occurring. Some of them are more complex changes to the DNA that are harder to detect with normal techniques at the moment.

So, that’s where you and research come in, outside the clinic?

Yeah. I’m a computer scientist by background, so I have a computational angle on all of this but I collaborate very closely with biomedical researchers. We have to work together, and genomics is really the tool with which a lot of this is done. It’s a technique that works very well and the cost has come down, so it’s becoming more and more widely applied. We’re able to do whole genome sequencing now, in many cases, which gives us most of the DNA in the cells, which is vast amounts of information requiring lots of storage on the computer but also a fair bit of computation, clever algorithms to deal with it to get answers out in reasonable amounts of time and so on.

On a day to day basis, where do you come in? What’s the magic that you lend to all of this?

What happens is that tissue samples are taken via other collaborators. Then the DNA sample is sent for sequencing, and that can be done in many different ways. We might, depending on the study, sequence just a targeted region if we’re interested in specific genes. Alternatively we might do whole genome sequencing, which is looking at all of the DNA in the cells and looking for mutations in that DNA. This would occur in suspected Lynch syndrome where we don’t know what the causes might be, and we might be looking for new mutations.

My role is to engage with that computational aspect, which is running some large computations to get from vast amounts of sequencing data down to biologically interpretable results and also develop novel techniques for annotating the mutation discoveries, so clever ways we can find new things. Also incorporating other datasets where possible, so we have a wealth of biological data out there. We can try to come up with interesting ways in which to use that information to improve our enrichment of truthful results. There’s a fair bit of data analytics and statistical analysis and so on that goes into it as well.

So what are the challenges you’ve come up against to get this work done?

Challenges around the computational side of things are that we need fairly large amounts of storage. Over the many projects that I’m working on, there would easily be several hundred terabytes of data, and it’s constantly growing, so that’s a pretty sizable amount of data to manage. Even just basic data management is a fair bit of work. The data has cost lots of money to obtain in the first place if you think about all of the work that’s happened, so it’s a very precious resource.

There’s a pretty large amount of computation as well. It tends to be bursty, so when we get a larger amount of data and we do analysis, we do a very big calculation. That might take days or weeks, but then there might be a period where we’re just working with the downstream results, which I’ve trivialized a little bit, but it’s more or less looking at spreadsheets. There’s not a lot of computation going on there while we’re thinking about and analyzing that data, which you can do on a smaller computer. But then later on we might do a bigger calculation, so there are these big bursts of large-scale calculation every now and then, followed by intense analysis. I think there are just practical, basic issues around getting enough storage and getting enough computation. We’ve benefited greatly from being at Melbourne Bioinformatics: we’ve had lots of computing resources, and we’re also now using a lot at Research Platforms, especially Spartan. That’s growing and that’s great.

The more individuals we can see, the better. But we need ever increasingly large sample sizes because some of these mutations that we see in the DNA are extremely rare, say one in a few hundred thousand people. Just having enough samples is challenging, that’s globally challenging for everyone. People are aggregating their data as much as they can, so there’s people publishing public datasets and so on; it’s always growing but still a long way off the numbers we’d like to have.

More practically, it’s very noisy data. Processing the data to try and reach a true signal is challenging, as DNA itself is a very complicated system. In humans, and in any cellular organism, the DNA in the cell is a very complicated system, so dealing with all of the complexity in there is quite a challenge from an intellectual point of view. We tend to simplify things down a little bit, but the reality is it’s extremely complicated. Tumors themselves, depending on the cancer, can be very chaotic internally. They can be vastly different from what the normal cellular DNA looks like. For example, you can have extra copies of chromosomes, you can have total loss of other chromosomes, you can have neo-chromosomes, which are just two different ones joined together. The genome itself is a much more plastic thing than we think it might be.

Strange things can happen: you can have a big section of the genome inverted, it just flips over and goes in the other direction. You can have copies and inversions, you can have deletions and copies and inversions, you can have translocations where different things join together in unexpected ways, and two genes which are not really connected to each other can break and then join to form a new gene that didn’t previously exist, which has some effect on the growth of the cell. Looking at tumor DNA and genomes is an interesting area and a very hard area to work in because it’s quite complicated. We do a fair bit of that as well, which I’m quite interested in.

Perhaps researchers on campus are unaware of the infrastructure and support communities around them. Do you think researchers require experts to use these resources, or could they dive in themselves? What are your thoughts?

I think it’s very obvious that while the infrastructure is amazing and it’s very important, it’s not enough just to buy computing equipment and let people loose. It’s a difficult environment sometimes to use and requires specialist skills and so on. I think there’s a strong need for computational people, experts to bridge the gap. But not just to be an intermediary between the researcher and the computing equipment. In my experience you actually need to be a researcher as well, to add your knowledge and skills into the research plan. People bring skill-sets together and it adds up to a much more powerful group.

You want people who can use the computers for sure, but also you want people who have that sort of research mind and understand what the collaborator’s trying to do.

The thing about accessing those resources that exist, the infrastructure’s already there, the key is just awareness, so knowing what exists. And I guess organizations like Research Platforms and Melbourne Bioinformatics do some promotion as well. One of the main ways that happens is through training courses. And so we do heaps of training. You guys at Research Platforms also do lots and lots of training. That’s a great way to introduce people to the ideas, and give them a sort of gentle introduction to things without jumping into the deep end.

While it is impressive that you have a room full of computers and they’re all grinding away, it’s rather more important to think about how your problem fits into how you’re going to solve it. And some problems are just very large, especially in biomedical sciences. But the practical side of things is not to be put off by the grandiose scale of stuff, just see it as a device that works for you.

Computational skills and knowledge are an increasing requirement across many disciplines, as more and more disciplines become more and more digitized and automated. And inevitably, researchers in many disciplines have got to pick up more of those skills, and computational people will become more expert in other disciplines. Start asking questions like: where’s the research going? What questions can we solve? What technology do we need to solve those things? And think about writing grants around those things and so on. Think about collaborating with people, finding the right people who can provide the bits and pieces you can’t do yourself.

What do you wish you could tell your younger self? What would you tell your students here, or anyone who’s thinking of entering this world of biomedical research? What does the future hold for them?

That’s a good question. One piece of advice is not to pigeonhole yourself too much into some discipline. And in many ways, the world and academia tends to pigeonhole people. There are streams that you can follow, and when you follow those streams, you tend to kind of end up confined a little bit. And the modern world of research is very multi-disciplinary, it’s very collaborative, the idealized version of a scientist working in a lab on their own is not something I’ve experienced. It’s been large groups of people, working together over long periods of time. Research is much more dynamic than I imagined it would be so it changes direction a lot. Don’t be afraid to kind of cross disciplines, there are a lot of opportunities there.

You need to kind of follow where your interests lie. When you’re interested in something, you’ll naturally study it, you’ll naturally read about it and just think about it and so on. And that can’t help but improve your skills and knowledge in that area, rather than doing things that you feel other people are telling you are good things to do.

One more practical thing is I wish I’d studied more statistics because that’s a constant challenge for me and statistics is such an important part of lots of the work we do, so that’s something I’m always trying to improve upon.

To read more about Bernie Pope and his research, visit http://berniepope.id.au/

Machine Learning for Research in Physics with Associate Professor Martin Sevior

image

Associate Professor Martin Sevior. Photo: Eric Jong.

Martin Sevior is Associate Professor in the School of Physics at the University of Melbourne. As part of his research in the field of Experimental Particle Physics he performs experiments with the world’s highest intensity particle accelerator, Belle II at the SuperKEKB in Japan.

This experiment probes conditions that last existed less than 1 billionth of a second after the Big Bang and investigates the cause of the Universal Matter-Antimatter asymmetry. 

Its goal is to discover fundamental new physics not encompassed by the Standard Model of particle physics; what Professor Sevior says is generically called “Looking for new physics.”

Being particularly interested in the development of Machine Learning to make the best measurements possible with the Belle and Belle II data sets, Professor Sevior has been collaborating with Research Platform Services.

Research Community Coordinator Eric Jong sat down with him to talk about his research.

Could you give us an elevator pitch style overview of what you are researching?

So what I am interested in doing is making precise measurements of processes that are well predicted by the standard model of physics, looking for cases that nevertheless give different results to what the standard model predicts.

One place I am particularly interested in looking is the phenomenon called CP violation, which is essentially the difference between how matter and antimatter behave.

Antimatter has exactly the same mass, and almost the same properties except for the opposite charge to matter. But we know that there’s an asymmetry because our universe is made of matter and not antimatter.

If you take the standard model and you put it in the model of the early universe and you run it all through, that gets the matter and antimatter asymmetry wrong by over 10 orders of magnitude.

So we know that there is some interesting new physics that is probed by making measurements in CP violation. And to do that I employ experiments called Belle and Belle II at an accelerator lab in Japan where we collide electrons and positrons.

image

Construction work on the Central Drift Chamber (CDC) of the Belle II experiment. Photo: Nanae Taniguchi.

So for your workflow, would it be right to say that you are making your data collections at Belle and Belle II in Japan, and then taking that data and processing it here at the University of Melbourne?

Some is processed on the world wide grid, on computers all around the world. Some is processed at the laboratory in Japan. And the final processing happens right here at the University of Melbourne.

So is that how you linked up with Research Platform Services?

Yes, kinda indirectly. The Centre of Excellence for Particle Physics employs two exceptionally talented computer professionals, Lucien Boland and Sean Crosby.

At one point my colleagues and I realised that we could make use of next generation machine learning technology, which really needs powerful GPUs to run. So the School of Physics has a bequest, from which we requested funds to invest in one of these systems. We were able to get matching funds from other places and were able to get a few of these.

Sean Crosby was aware that Research Platform Services was putting together a few of these systems. They came to him and said, if you put yours in with ours - you can use all of them. So we did. We’ve been using the GPU systems that we initially purchased and also the GPU systems from Research Platform Services together in collaboration.

So it sounds like the machine learning for your research has been quite a cornerstone for the processing of your data. For a lay audience (such as myself) could you speak to that a little bit?

The problem with doing all of these measurements is distinguishing our interesting signal from a whole slew of random background noise. Our signal is less than one ten-millionth of all of the data that’s actually collected. What we do with machine learning is make an important discrimination between those events where electrons and positrons collide and make processes that are interesting, and those that aren’t.

To do that we use machine learning techniques where we simulate the processes of interest. And we simulate the processes that aren’t of interest. Then we build a model that distinguishes the difference between the signal and the background, and then we train the model. It’s called classification. Every time there is a background we say this is a background and every time there is a signal we say this is a signal. And then the machine learning algorithm recognises what’s signal and what’s background and helps us make that distinction.

image

The classifier uses a neural net to combine many input variables to distinguish signal from background. The output of the neural net ranges between -1 and +1. Events near -1 are more likely background, events near +1 are more likely signal. By placing a threshold on the output of the classifier we can choose what fraction are signal and what are background. There is always a trade-off between signal efficiency and background contamination.
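As a rough sketch of that classification idea (toy data and scikit-learn rather than the Belle II analysis code), the Python example below trains a small neural network on simulated “signal” and “background” events, then sweeps a threshold on its output to show the trade-off between signal efficiency and background contamination:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(42)

# Simulated "signal" and "background" events, each described by four input variables.
n = 5000
signal = rng.normal(loc=1.0, scale=1.0, size=(n, 4))
background = rng.normal(loc=-1.0, scale=1.5, size=(n, 4))
X = np.vstack([signal, background])
y = np.concatenate([np.ones(n), np.zeros(n)])   # 1 = signal, 0 = background

clf = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=500, random_state=0)
clf.fit(X, y)

# Score the events and place a threshold on the classifier output.
scores = clf.predict_proba(X)[:, 1]             # probability of being signal
for threshold in (0.5, 0.8, 0.95):
    selected = scores > threshold
    efficiency = np.mean(selected[y == 1])       # fraction of true signal kept
    contamination = np.mean(y[selected] == 0)    # fraction of selected events that are background
    print(f"threshold={threshold}: efficiency={efficiency:.2f}, contamination={contamination:.2f}")
```

Raising the threshold keeps a purer sample but throws away more real signal, which is exactly the trade-off described above.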

How would that have been achieved previous to machine learning? Was this process of classification something that you’ve gone through before using machine learning?

We’ve been using machine learning techniques in my experiments for well over 15 years, and probably longer, over 20 years. I like to tell people we’ve been doing data science since well before it was sexy. Or you could say, before it went mainstream.

So we have been riding the wave, and these new generation algorithms that use machine learning we’re still investigating. Because what we have now works very well, but we’re looking to see how we can do better using these modern techniques. And it’s possible, I think we can do better by at least a factor of two. Which helps enormously.

Do you have any advice for people in similar fields, or perhaps for people who are working with massive data sets, who are thinking about using a service like Research Platform Services?

First off, do the work. You really have to work to understand how it all works. Learn Linux. Learn how to use the command line. Learn how to do scripting.

Because all of this stuff that we do using large data sets involves taking one file from somewhere and putting it somewhere else, and processing it. And all of that requires some sort of algorithmic flow. There are techniques for doing that that are well established that aren’t what people who get trained with Microsoft products are used to.

So you really have to put in the hard yards to learn how to use them. And I am putting in the hard yards to learn how to use these modern algorithms too, it’s real work for me. It certainly hasn’t been easy, but a factor of two is a big deal to me.

If you have a project examining large data sets and are interested in learning Linux you can register for the next workshop at Research Platform Services here.

Oct 08

Our Exciting New Initiative: ResGrants!

WIN up to $1000 to pitch your research tool and story!

 

image

ResGrants is a new initiative that rewards Graduate Researchers at the University of Melbourne who want to teach a digital research tool. 

KEY DATES: 

Expressions of Interest open: 8th October

Information Session: 25th October, 4pm - 6pm 

Applications due: 1st November 

Winners notified: Friday 9th November

Winners present at the Research Bazaar Conference, University of Melbourne in February 2019 

For questions or more information email: research.bazaar@gmail.com

What is a ResPitch?

image

A ResPitch explains the very basics of a digital research tool. It also:

- Engages the audience with a ‘challenge’
- Excites the audience about digital research tools
- Entices the audience to want to learn more

What are the 4 key ingredients of a ResPitch? …

Learning Objectives Sneak Peek

Introduction (5 mins)

Briefly introduce yourself and the tool. Understanding the origins of the tool, how the speaker was first introduced to it, and how they continue to use it, can help in a number of ways:

a) it increases the validity of the tool

b) it creates a personal touch by showing who created the tool and why

c) it shows how the tool is embedded in the broader research community.

Learning Objectives might sound boring but are essential for learners to:

a) know 

b) understand and 

c) be able to do.

It is comforting to hear: “In the next half hour you will learn how to do A, B & C.” This focuses a participant’s attention, points them in the right direction, and also gives them satisfaction at the end once they have achieved those learning objectives.

Lesson (10 mins)

Give a short demonstration or teaching section. It’s assumed there’s very little existing knowledge of each digital tool, so knowledge will be slowly built up.

This concept is called scaffolding (Lev Vygotsky). By breaking large tasks into smaller, more manageable tasks, the audience can build on their current knowledge base to learn new concepts.

In the vein of scaffolded learning, provide the audience with a quick visualisation of the digital tool, orienting them to the environment they’ll be using in the challenge.

Challenge (10 minutes)

Each ResPitch has a challenge. These are based on the principles of Problem Based Learning. These techniques will ‘activate’ learners to engage.

At ResBaz we avoid participants having to open their laptops and downloading tools – this takes away precious time from your pitch. Find three or more helpers with the tool already downloaded on their laptops. Assign participants to small teams of three or four with the helpers at the helm leading the challenge.

Learning-by-doing is a powerful tool. You will never understand the pitfalls better than if you have already made the mistakes yourself.

Plenary (5 mins)

Each ResPitch ends with a plenary that:

…Lastly, have fun during your ResPitch and enjoy connecting with like-minded people!

Working smarter with Ashton Dickerson: Using HPC for Increasing Efficiency in Research.

image

Ashton Dickerson, Biosciences PhD researcher and member of the Urban Light Lab. Photo: Eric Jong


For the last year Ashton has been using Spartan for a PhD project that examines the effect of light on the nocturnal song rate of willie wagtails.

By using an automatic song detection package in R to extract data from the over 2000 hours of audio recordings she has gathered in her field work, Ashton has been able to automate the otherwise labour-intensive handling of this data.

Then, by working with Research Platform Services, Ashton has been able to complete these processes on an HPC system where large numbers of these tasks can be run simultaneously, saving her time that she can use on other aspects of her research.

Research Community Coordinator Eric Jong sat down with Ashton to talk about her project, and how she is integrating high performance computing into her workflow.

image

A Willie Wagtail.  Photo: Timon van Asten.

Can we start with the question that I’m sure you’ve answered a million times now as a graduate researcher,  what are you doing your PhD on?

Well, I research a quite unusual behaviour of birds that not only sing during the day but at night time as well.

Some diurnal (active during the daytime) bird species also sing during the night time. This is unusual because you would instead expect these birds to be sleeping during the night.

For my PhD I aim to understand why diurnal species are singing during the night.

To answer this question I have been examining this behaviour in an iconic Australian species, the willie wagtail (Rhipidura leucophrys), which has a reputation for its prolific nocturnal song.

So it sounds like a big part of your PhD is listening to the song of the Willie Wagtail, how have you been gathering this data so far?

To measure nocturnal song, I use bioacoustics recorders from Frontier Labs that allow me to record audio for prolonged periods. I target the roosting spots of willie wagtails to record their nocturnal song.

Thus far I have gathered over 2000 hours of audio.

image

A researcher checking a bioacoustic recorder.  Photo: Justine E. Hausheer / TNC.

That is a huge amount of data, can you talk a bit about how you have been handling that volume of information for your research?

To be able to handle such large data sets I am utilizing an R package, monitoR, which automatically detects bird song.

I import templates of willie wagtail songs into this package, which is then run over my recordings and detects when a template matches a song. From this data I can extract the song rates (how often the willie wagtails are singing) and then I can examine the data to look for patterns.
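For readers curious how this kind of template detection works under the hood, here is a minimal Python sketch of the idea; it is not the monitoR package itself, and the arrays are random placeholders standing in for real spectrograms. A template is slid along the recording and frames where the normalised correlation exceeds a threshold are flagged as candidate songs:

```python
import numpy as np

def detect_song(spectrogram, template, threshold=0.6):
    """Slide a template over a longer spectrogram and return the frame
    indices where the normalised correlation exceeds the threshold."""
    width = template.shape[1]
    t = (template - template.mean()) / (template.std() + 1e-9)
    scores = np.empty(spectrogram.shape[1] - width + 1)
    for i in range(scores.size):
        window = spectrogram[:, i:i + width]
        w = (window - window.mean()) / (window.std() + 1e-9)
        scores[i] = np.mean(t * w)          # normalised cross-correlation score
    return np.where(scores > threshold)[0], scores

rng = np.random.default_rng(0)
recording = rng.random((128, 5000))          # frequency bins x time frames
template = rng.random((128, 50))             # one willie wagtail song template
hits, scores = detect_song(recording, template)
print(f"{len(hits)} candidate songs detected")   # detections per recording -> song rate
```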

image

A spectrogram showing an example of the automatic song detections from an hour-long recording. Blue line indicates where the R package, monitoR, has detected a willie wagtail.  Image courtesy Ashton Dickerson.

image

A spectrogram showing an example of the automatic song detections from an eight minute long recording. Blue boxes indicate where the R package, monitoR, has detected a willie wagtail. Image courtesy Ashton Dickerson.

I am using the Spartan HPC service at the University of Melbourne to be able to handle such large data loads. Lev Lafayette has assisted me by uploading my audio recordings to the UniMelb cloud, which is much more efficient than uploading this data via my personal computer.

The HPC system is significantly faster than running these scripts on my personal computer. It would take me about 7 minutes to process 1 hour of audio on my own machine; using HPC it is about 3 to 4 times faster.

And in addition I am able to run this script over multiple recording sets at one time thanks to the multiple nodes. Not only does this save me immense amounts of time, this also means my personal computer is free for me to use while this data is being processed.
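As a sketch of that “many recording sets at once” pattern, the snippet below assumes a Slurm-style scheduler that exports SLURM_ARRAY_TASK_ID to each task in a job array; every task picks one recording set and processes it independently, so dozens can run in parallel across the cluster (the file names and folder here are hypothetical):

```python
import os
from pathlib import Path

# One task per recording set: the scheduler sets SLURM_ARRAY_TASK_ID for each task in the array.
recordings = sorted(Path("recordings").glob("*.wav"))   # hypothetical folder of audio files
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", 0))

my_file = recordings[task_id] if task_id < len(recordings) else None
print(f"array task {task_id} processing {my_file}")
# ... run the song-detection step on my_file and write its results to its own output file ...
```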

One of our mottos at Research Platform Services is ‘work smarter not harder’, which I think you are most definitely doing by automating these processes. Do you think there are things that you are able to spend more time with now in your research because of this?

Most definitely so, it frees me up to read papers and continue researching. To form thoughts and ideas around what this data actually means.

image

Ashton using HPC to give her more time to do MAXIMUM SCIENCE.  Photo: Eric Jong.

Using HPC has allowed me to take away the manual processing and gives me time to think about what this data actually means, to analyse it and put together ideas from it.

Thus far from the data I have extracted using the HPC services, I have discovered that willie wagtails’ nocturnal song significantly increases with lunar illumination, showing that this behaviour has a relationship with light, and therefore may be related to a visual cue.

This is an interesting finding and gives insight into the possible function of nocturnal song, furthering our understanding of the evolution and function of bird song in general. I am now preparing a manuscript for this finding.

Furthermore, given that I have discovered that nocturnal song has a relationship with light, I will also examine if this behaviour also responds to artificial light at night (e.g. streetlighting), which could highlight a possible stressor for urban bird species with nocturnal song. I will again utilize the HPC services at UniMelb for data extraction.

Thanks for your time today Ashton, do you have any advice to share with other researchers?

The first step for me using HPC was just hearing whispers that something like this was possible, and then from there I looked for and found people who could help me with it and point me in the right direction, and also gave me different options to choose from.

So I guess I would say it’s all about building and engaging others in the research community.

Visit Research Platform Services for more information on HPC and other services. 

A Win for Diversity in Tech at ResPlat!

Congratulations to Sara Ogston and Koula Tsiaplias for their work on Vic ICT for Women’s 2018 Grad Girl program, which won two TechDiversity Awards on Thursday the 27th of September!

image

Sara Ogston and Koula Tsiaplias receiving the award.

The Vic ICT For Women Grad Girl 2018 program was recognised in last week’s TechDiversity Awards as a valuable contributor to diversity and inclusion in the STEM sector, for which it received two awards.

This achievement was celebrated and shared with the sponsors of Grad Girl, one of them being the University of Melbourne, for their support and commitment to making a difference, and with the Grad Girls themselves, who committed to this year-long program on top of their academic and work schedules.

image

GRAD GIRLS IS A 1 YEAR PROGRAM RUN BY VIC ICT FOR WOMEN FOR FEMALE UNDERGRADUATE STUDENTS TO DISCOVER AND UNDERSTAND THE PATHWAYS AVAILABLE WHEN TAKING THE NEXT STEP IN THEIR CAREER.

Read more about the Grad Girls program here

Oct 04

Meet Alison, researcher in Visual Arts, 3D explorer

By Emilie Walsh


Alison’s experimental film using Fusion 360

One of the exciting parts of working for Research Platform Services as a CAD and 3D printing ResCom* is getting to meet researchers working with 3D in all disciplines. Alison Kennedy is currently a Master by Research candidate at the VCA, and has been coming to our trainings for a few months now.

image

Alison Kennedy, Self-portrait, 2018

What I find fascinating with researchers in Visual Arts is how they take ownership of digital tools, push the limits of the applications and find creativity in the often frustrating glitches and bugs.

I’ve asked Alison to tell me about her use of CAD and 3D scanning in her research and art practices. It’s intriguing to see how engineers, archaeologists, designers and artists use CAD in very different ways!

Emilie: Alison, tell us a bit about your art practice?

I am particularly interested in how art gives artists a platform for commenting on, and taking a position in relation to, things happening in the world. I think that art can provide a way of suggesting a response without being didactic. For me this is because art, once created, allows the viewer to complete the artwork through their own personal experience. This tension between what is intended by the creator and what actually occurs is a constant fascination to me. My use of technology arose out of this - I started creating a series of collages and digital paintings that used and were generated from collapse, breakdown and error. These glitches represent the slippage between intention and creation and the uncovering of personal truth. We are both furthest away from and closest to ourselves.

image

“Untitled: Force of Reason” 2016  120cmx120cm digital painting/ collage limited edition giclee print.

My initial work in technology concentrated on digital painting and referred to romantic narrative paintings of the 18th century. I wanted to reconsider the human gesture - how embodied expression translated through the medium of the mouse, and stylus. I started to consider how texture and colour transformed completely through algorithmic extrapolation and started to use this quality in an intuitive way to express personal environmental concerns.

image

“Untitled” 2016 20x24 cm limited edition giclee print.

Emilie: How have 3D scanning and modelling brought new direction to your work?

I became aware of the potential of 3D technology applications to express the body in a totally new and unusual way that I believe critiques our approach to other people and to the world. I am interested in taking existing applications, hacking into them and pushing them to the point of collapse - at this point I think that something new and quite profound occurs. Again, I work with technology intuitively, and at this stage in my research I think that the constant creation and destruction inherent in the process highlights the relationship with the world in general.

image

Still from animation “Selfie”  https://alison-kennedy-gdg8.squarespace.com/config/

In the work above, for example, I wanted to show how the artist in her studio can make a stand in relation to the world, and I also wanted to suggest that at times the artist’s studio is a claustrophobic space. Personally, I love working in my studio so in a way this was quite a confronting idea for me. The figure ultimately breaks down through the algorithm and is revealed as a series of surfaces - which is an idea I’m really interested in and working with in my research.

image

Still from an experiment in Fusion 360

At this stage I’m most interested in how 3D packages critique the image and our image-saturated society. The packages I was introduced to at Research Platforms at the University of Melbourne connect engineering CAD and create surfaces and objects. Once again I am interested in how new approaches to these standardised applications expose how technology and our world interact.

If you are interested in learning to use CAD, 3D modelling, scanning or printing for your research, get in touch with us at Research Platform Services!

Oct 02

3D meet up at the VCA

by Emilie Walsh


For my last day working for Research Platform Services, I wanted to organise a meet up at the VCA, where I did my PhD. The campus is in Southbank, and sometimes researchers there find themselves a bit far away from the services that are offered at Parkville, so we make sure we offer some of our training on the other university campuses.

image

At the digital lab at the VCA

3D modelling, 3D scanning and 3D printing can be amazing tools and resources for researchers at the VCA, in Fine Arts, Music, Theatre or Dance Studies. Of course we invited researchers from all disciplines too, as we believe in the inspiration it creates, and we always get excited about researchers collaborating across disciplines!

Our meet ups at Research Platforms are not your usual tech training: it’s more about sharing research projects and talking about the digital skills we use as researchers.

First we talked about the benefits of 3D scanning for sharing and collaboration. If you are working with a fragile artefact, you may not be able to access it, manipulate it, or share it with other researchers. Having a digital 3D model is a great archive and a great tool for sharing your research.

Drag and drop script: 3.3M vertices > 20K vertices + normal + AO + displacement + centre geometry. No material editing required in Sketchfab 😀 https://t.co/OZhAfAEpoh pic.twitter.com/S7zgGC6k07

— Ben Kreunen (@OzBigBen)
9 September 2018

You can also 3D print a replica for teaching or communication purposes.

image

A 3D printed replica of a skull

If you are interested in working with object-based data sets and using 3D modelling and 3D printing, you can read more about it here

Then we welcomed Tall Ben, from the digitisation services of the university.

He presented the 3D scanning technology available for researchers. If you need a 3D model of a tiny insect or of a large building, Ben is the guy for you!

Our workstation is getting old now but still fast enough with @RealityCapture_ to make a draft while shooting, just for peace of mind. It’s the weekend now TBC… pic.twitter.com/XYaZqJ0rkN

— Ben Kreunen (@OzBigBen)
31 August 2018

Next, Mitchell Harrop, from the digital lab in Arts West in Parkville, presented some projects the Digital Studio is supporting. 3D modelling can also allow you to archive architectural artefacts, geo-locate them, and embed photos and other documentation to display online and communicate research better.

image

More about Mitchell’s project:

https://people.eng.unimelb.edu.au/mharrop/mhw/v2/

Eric Jong, Masters student at the VCA and ResCom at ResPlat, shared his latest 3D printing experiment.

image

Screenshot of Eric’s early experiment with a 3D model of a soundwave, in Fusion 360

3D printing is a cheap and fast technology that allows makers to prototype, fail fast and make adjustments much quicker than with traditional technologies.

https://www.instagram.com/p/Bl4tpeOFyau/?utm_source=ig_web_copy_link

To end the meetup, we did a quick fun demo: How to do a 3D scan with your phone!

We 3D scanned a cheese board in a few minutes, with some amazing results.


There are of course a range of options if you need a 3D scan for your research: the digitisation services at the Uni would give the highest quality, but will require a lot of time and collaboration, will generate a lot of data, and may have a cost. A 3D scan with your phone is the quickest option, with lower quality, in a DIY spirit!

If you are interested in 3D scanning some of your object-based data sets, get in touch with us at Research Platform Services and we can point you to the right option for you!

image

The whole team of researchers at the digital lab, VCA

Meetups are a great way to meet other researchers using similar tools to you, and to work on solving some of your problems together. We alternate between meetups and trainings. If you are interested in joining a training session, check our calendar or get in touch with Eric for CAD and 3D printing.

Sep 30

How does 3D printing work?!

by Emilie Walsh

image
image
image
image
image

via GIPHY


If you are looking for a hard copy of that little comic on 3D printing, come over to Colab to pick one up, and join a training session in 3D modelling and 3D printing with Eric Jong!

Next training is on the 10th of October (cake included):

https://www.eventbrite.com.au/e/introduction-to-3d-printing-with-tinkercad-tickets-50681449580