Research Computing Services Blog


A busy week in data mining (Part 3: data cleaning)

By Kim Doyle

We started our fourth day of data mining with an introduction to messy data, and why it ruins our lives and haunts our dreams… But it doesn’t have to be this way!


Yuandra introduced us to some basics of clean data. He used this handy diagram from the School of Data to show how important cleaning is to the data workflow:

image

As you can see, cleaning data is an integral part of working with data. Cleaning before analysis is vital because messy data is not only difficult to work with, but it can distort your findings. Accurate analysis requires clean data. This might sound like the most boring part of the data cycle, but there are lots of reasons to get excited about clean data. 

There are a bunch of free and exciting tools that can visualise your data in informative, interesting and beautiful ways, such as CartoDB for mapping and Plot.ly for graphs, and many, many more (speak to our awesome DataViz team for more tips to beautify your data). If you have clean data, it can be as simple as drag and drop and you’re done!

Here’s an example from our CartoDB training:

(mapping lobsters off the coast of Tasmania)

Anyway, getting back to the workshop: after Yuandra’s introduction it was my time to shine, teaching Open Refine :S

Here is Naomi getting her own back on Twitter.

My turn to document @kim_doyle1 dazzling us on the stage. @iniandra pic.twitter.com/IGp0m0imFn

— Naomi Sutanto (@nomisutanto)
April 8, 2016

Formerly Google Refine, Open Refine is a program that helps us clean up our data. It began life as a ‘blue sky’ project of Google employees, but has since blossomed into a separate open-source project. It is free to use and well documented. See our course curriculum here.

We ran our course off the cloud.

Look at the pretty cloud…

image

But you don’t have to. If you’d like to give Open Refine a try, download it to your own computer. And don’t worry about your data; Open Refine runs as a local server on your own machine, so your super-secret research is safe (sshhhh!).

After we got started with some open government data from the US and learnt how to create a project, import our data and do some basic mass edits of mislabelled data, I handed over to the capable Naomi to run us through some of the fundamentals of General Refine Expression Language (GREL).

She explained that GREL transformations are similar to formulas in Excel, except that, rather than being stored in a cell, a GREL transformation is applied to a whole column at once. In Refine, GREL lets you write powerful yet simple expressions to filter and transform your data.
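To make the "one expression, every cell" idea concrete, here is a minimal Python sketch of it. It is not how Refine works internally, and the column name and sample rows are made up for illustration. The `tidy` function is only a rough analogue of a GREL expression such as `value.trim().toTitlecase()`.

```python
def transform_column(rows, column, func):
    """Return new rows with func applied to every value in one column."""
    # Each row is copied, so the original data is left untouched.
    return [{**row, column: func(row[column])} for row in rows]


def tidy(value):
    # Rough analogue of the GREL expression value.trim().toTitlecase():
    # strip surrounding whitespace, then capitalise each word.
    return value.strip().title()


# Hypothetical sample data, not taken from the workshop dataset.
rows = [
    {"agency": "  department of energy ", "year": "2015"},
    {"agency": "DEPARTMENT OF ENERGY", "year": "2016"},
]

cleaned = transform_column(rows, "agency", tidy)

for row in cleaned:
    print(row["agency"], row["year"])
```

The point of the sketch is the shape of the operation: you write the expression once for a single `value`, and it is run over the whole column rather than being copied cell by cell as in a spreadsheet.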

Next, Naomi handed over to Yuandra in our data mining tag-team act and we finished up by learning how to filter, cluster and transpose cells in Refine.
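If you are wondering what clustering does under the hood, here is a rough Python sketch of one common approach, key-collision clustering with a "fingerprint" key. It is a simplified illustration, not Refine's actual code, and the sample values are invented. Values that normalise to the same key are treated as probable variants of the same thing.

```python
import re
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Build a normalised key so that near-duplicate spellings collide."""
    text = value.strip().lower()
    # Drop accents and any other non-ASCII characters.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Remove punctuation.
    text = re.sub(f"[{re.escape(string.punctuation)}]", "", text)
    # Split into words, drop duplicates, sort, and join back together.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values that share a fingerprint; keep groups of two or more."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical sample values, not taken from the workshop dataset.
values = ["Dept. of Energy", "dept of energy", "Energy, Dept of", "Treasury"]

for group in cluster(values):
    print(group)
```

In Refine you would then review each suggested cluster and choose a single value to merge it into; the sketch only covers the grouping step.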

We finished the week on a high and most of us made it through ‘OK’.

Not sure what’s going on here, but it looks serious 😁 @nomisutanto @iniandra #DataMiner pic.twitter.com/Pqe6nJArPL

— Kim Doyle (@kim_doyle1)
April 8, 2016

Previous: Part 2: data acquisition

