A busy week in data mining (Part 2: data acquisition)
By Kim Doyle
On Thursday we got straight into data acquisition with Yuandra and his wonderful team of data miners, including Naomi and Edith (who was wise enough not to give me her Twitter handle).
Here is a cute picture of the four (five) of us:
The data miner team for Data Acquisition @ResPlat training yesterday @iniandra @nomisutanto pretty adorbs 😊 pic.twitter.com/6TTMVjyz8Q
— Kim Doyle (@kim_doyle1) April 8, 2016
Yuandra kicked us off with a discussion about data acquisition. We talked about the methods we currently use, the problems we face and some solutions provided by Google survey forms.
.@iniandra introducing data acquisition with Google forms @ResPlat pic.twitter.com/zzXv6oyfED
Next, Naomi got us started with some of the practicals. First, connecting to Google Drive and opening a blank survey document (pssst! It’s the pretty purple one).
.@nomisutanto takes the stage with our #DataMiner platy for data acquisition training woo! @ResPlat pic.twitter.com/z8g6CxfD81
— Kim Doyle (@kim_doyle1) April 7, 2016
Next, we were encouraged to create a survey topic and title.
.@nomisutanto let me chose whatever survey question I wanted for @ResPlat data acquisition course… pic.twitter.com/jaxBh1OUgY
— Kim Doyle (@kim_doyle1) April 7, 2016
Here are some other examples from our survey participants:
Sam http://goo.gl/forms/sB0OvUL8Vh
Nusrath http://goo.gl/forms/qeKLUhcOMg
Gerard http://goo.gl/forms/qCSi7P8JQE
Naomi taught us a number of quite sophisticated survey design skills, including: creating a consent page at the beginning (essential for ethics approval for researchers), grouping questions depending on previous responses, and validating participant data. Now, this last task required us to use some Regex and understand the basic principles of regular expressions; not an easy task.
.@nomisutanto taking a fabulous lesson on regular expressions that I’m totally gonna steal 4 NLTK training @ResPlat pic.twitter.com/WGtcTEvU7r
— Kim Doyle (@kim_doyle1) April 7, 2016
Naomi explained that regular expressions (Regex) are codes to detect patterns in data. She taught us how to use these expressions to detect whether our survey participants were entering an Australian postcode. Basically, mind-reading (not really, though). If you’re interested in brushing up on your Regex skills, here is the website recommended by Naomi: http://regexr.com/
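For anyone who wants to try the postcode idea outside of a survey form, here is a minimal Python sketch. It is not the exact expression from the course (that one isn't reproduced in this post); it simply assumes that a valid Australian postcode is exactly four digits. The names `POSTCODE_PATTERN` and `is_postcode` are made up for illustration.

```python
import re

# Assumption: an Australian postcode is treated as exactly four digits.
# This checks the format only, not whether the postcode actually exists.
POSTCODE_PATTERN = re.compile(r"[0-9]{4}")


def is_postcode(answer: str) -> bool:
    """Return True if the whole answer (ignoring surrounding spaces) is four digits."""
    return POSTCODE_PATTERN.fullmatch(answer.strip()) is not None


# Try a few sample answers a participant might type.
for answer in ["3010", "0800", "301", "30100", "3O10"]:
    print(answer, is_postcode(answer))
```

Using `fullmatch` means the entire answer has to fit the pattern, so "30100" and "3O10" (with a letter O) are rejected rather than partially matched.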
We wrapped up with some discussion and questions from the floor.
.@nomisutanto taking impromptu Q&A at Data Acquisition course #awesome @ResPlat pic.twitter.com/EFUfhyQoip
— Kim Doyle (@kim_doyle1) April 7, 2016
The second half of the day focused on mobile data acquisition.

Yuandra introduced us to the world of mobile data collection. Due to the convergence of telecommunications, microelectronics and computing, multimedia devices (i.e. your smartphone) are ubiquitous. There are now many remote places in the world where people do not have access to the digital infrastructure that we take for granted, but can afford a smartphone. Yuandra gave us some examples of field surveys in development studies that would not be possible without mobile data collection. See here for examples: KoboToolBox
Then Naomi taught us how to design our own mobile survey.

Ooops, not that photo. That’s Naomi laughing at my choice of survey question…
This one!

Now that we have a bunch of data, what to do with it…
Next: Part 3: data cleaning
Previous: Part 1: nltk
