
Machine Learning for Research in Physics with Associate Professor Martin Sevior

Associate Professor Martin Sevior. Photo: Eric Jong.

Martin Sevior is Associate Professor in the School of Physics at the University of Melbourne. As part of his research in the field of Experimental Particle Physics he performs experiments with Belle II at SuperKEKB in Japan, the world’s highest-intensity particle accelerator.

This experiment probes conditions that last existed less than 1 billionth of a second after the Big Bang and investigates the cause of the Universal Matter-Antimatter asymmetry. 

Its goal is to discover fundamental new physics not encompassed by the Standard Model of particle physics; what Professor Sevior says is generically called “looking for new physics.”

Particularly interested in developing machine learning to make the best measurements possible with the Belle and Belle II data sets, Professor Sevior has been collaborating with Research Platform Services.

Research Community Coordinator Eric Jong sat down with him to talk about his research.

Could you give us an elevator-pitch-style overview of what you are researching?

So what I am interested in doing is making precise measurements of processes that are well predicted by the Standard Model of physics, to see whether they nevertheless give different results from what the Standard Model predicts.

One place I am particularly interested in looking is the phenomenon called CP violation, which is essentially the difference between how matter and antimatter behave.

Antimatter has exactly the same mass as matter, and almost the same properties, except for the opposite charge. But we know that there’s an asymmetry because our universe is made of matter and not antimatter.

If you take the standard model and you put it in the model of the early universe and you run it all through, that gets the matter and antimatter asymmetry wrong by over 10 orders of magnitude.

So we know that there is some interesting new physics that is probed by making measurements in CP violation. And to do that I employ experiments called Belle and Belle II at an accelerator lab in Japan where we collide electrons and positrons.

Construction work on the Central Drift Chamber (CDC) of the Belle II experiment. Photo: Nanae Taniguchi.

So for your workflow, would it be right to say that you are making your data collections at Belle and Belle II in Japan, and then taking that data and processing it here at the University of Melbourne?

Some is processed on the worldwide grid, on computers all around the world. Some is processed at the laboratory in Japan. And the final processing happens right here at the University of Melbourne.

So is that how you linked up with Research Platform Services?

Yes, kinda indirectly. The Centre of Excellence for Particle Physics employs two exceptionally talented computer professionals, Lucien Boland and Sean Crosby.

At one point my colleagues and I realised that we could make use of next-generation machine learning technology, which really needs powerful GPUs to run. The School of Physics has a bequest, from which we requested funds to invest in one of these systems. We were able to get matching funds from other places and were able to get a few of these.

Sean Crosby was aware that Research Platform Services was putting together a few of these systems. They came to him and said, if you put yours in with ours, you can use all of them. So we did. We’ve been using the GPU systems that we initially purchased and the GPU systems from Research Platform Services together in collaboration.

So it sounds like the machine learning for your research has been quite a cornerstone for the processing of your data. For a lay audience (such as myself) could you speak to that a little bit?

The problem with doing all of these measurements is distinguishing our interesting signal from a whole slew of random background noise. Our signal is less than one ten-millionth of all of the data that’s actually collected. What we do with machine learning is make an important discrimination between those events where electrons and positrons collide and make processes that are interesting, and those that aren’t.

To do that we use machine learning techniques where we simulate the processes of interest, and we simulate the processes that aren’t of interest. Then we build a model that distinguishes the signal from the background, and we train it. It’s called classification. Every time there is a background we say this is a background, and every time there is a signal we say this is a signal. And then the machine learning algorithm recognises what’s signal and what’s background and helps us make that distinction.
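
To make that concrete, here is a minimal sketch of the kind of signal-versus-background classification described above, using scikit-learn on toy simulated data. The features, event counts and network size are illustrative assumptions, not the actual Belle II analysis.

```python
# Toy signal/background classification sketch (illustrative only:
# the features and events are synthetic, not real Belle II data).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n = 20000

# "Simulate" events: signal and background overlap but differ on average.
signal = rng.normal(loc=0.5, scale=1.0, size=(n, 4))
background = rng.normal(loc=0.0, scale=1.0, size=(n, 4))

X = np.vstack([signal, background])
y = np.concatenate([np.ones(n), np.zeros(n)])  # 1 = signal, 0 = background

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Train a small neural-network classifier to separate the two classes.
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=300, random_state=0)
clf.fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))
```

In a real analysis the inputs would be variables computed from simulated events, and the trained model would then be applied to the recorded data.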

The classifier uses a neural net to combine many input variables to distinguish signal from background. The output of the neural net ranges between -1 and +1. Events near -1 are more likely background, events near +1 are more likely signal. By placing a threshold on the output of the classifier we can choose what fraction are signal and what are background. There is always a trade-off between signal efficiency and background contamination.
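
As a rough illustration of the trade-off described in the caption above, the sketch below scans a cut on a classifier output in the range -1 to +1 and reports signal efficiency against the fraction of background kept. The scores are toy numbers, not the output of the actual Belle II classifier.

```python
# Scan a threshold on a toy classifier output in [-1, +1]
# (illustrative scores only, not the real Belle II classifier).
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Toy outputs: background piles up near -1, signal near +1.
bkg_scores = np.clip(rng.normal(-0.6, 0.4, n), -1, 1)
sig_scores = np.clip(rng.normal(+0.6, 0.4, n), -1, 1)

for threshold in (-0.5, 0.0, 0.5, 0.9):
    sig_eff = np.mean(sig_scores > threshold)   # fraction of signal kept
    bkg_keep = np.mean(bkg_scores > threshold)  # fraction of background kept
    print(f"cut > {threshold:+.1f}: signal efficiency {sig_eff:.2f}, "
          f"background kept {bkg_keep:.3f}")
```

Tightening the cut keeps less background but also throws away more signal, which is the trade-off the caption refers to.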

How would that have been achieved previous to machine learning? Was this process of classification something that you’ve gone through before using machine learning?

We’ve been using machine learning techniques in my experiments for well over 15 years, and probably longer, over 20 years. I like to tell people we’ve been doing data science since well before it was sexy. Or you could say, before it went mainstream.

So we have been riding the wave, and we’re still investigating these new-generation algorithms that use machine learning. Because what we have now works very well, but we’re looking to see how we can do better using these modern techniques. And it’s possible, I think, that we can do better by at least a factor of two. Which helps enormously.

Do you have any advice for people in similar fields, or perhaps for people who are working with massive data sets, who are thinking about using a service like Research Platform Services?

First off, do the work. You really have to work to understand how it all works. Learn Linux. Learn how to use the command line. Learn how to do scripting.

Because all of this stuff that we do using large data sets involves taking one file from somewhere, putting it somewhere else, and processing it. And all of that requires some sort of algorithmic flow. There are well-established techniques for doing that, but they aren’t what people who are trained on Microsoft products are used to.

So you really have to put in the hard yards to learn how to use them. And I am putting in the hard yards to learn how to use these modern algorithms too, it’s real work for me. It certainly hasn’t been easy, but a factor of two is a big deal to me.
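
As a trivial illustration of the kind of scripted, file-in/file-out flow described above, here is a short Python sketch that processes every file in one directory and moves it to another. The directory names and the "processing" step are placeholders, not part of any real workflow.

```python
# Toy "take a file, process it, put it somewhere else" script
# (the paths and the processing step are placeholders).
import shutil
from pathlib import Path

incoming = Path("data/incoming")
processed = Path("data/processed")
incoming.mkdir(parents=True, exist_ok=True)
processed.mkdir(parents=True, exist_ok=True)

for src in sorted(incoming.glob("*.csv")):
    # Stand-in for a real analysis step: just count the lines.
    with src.open() as f:
        n_lines = sum(1 for _ in f)
    print(f"{src.name}: {n_lines} lines")

    # Move the file on to the next stage of the pipeline.
    shutil.move(str(src), str(processed / src.name))
```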

If you have a project examining large data sets and are interested in learning Linux you can register for the next workshop at Research Platform Services here.
