Research Computing Services Blog

Do you wish your data acquisition could be easier? Here’s how I did it

by Marcella Purnama

Hi everyone! My name’s Marcella and I was the videographer of Research Platforms for a year and a half, a role I stepped down from earlier this year. The reason? I wanted to focus on my thesis. And here’s my story.

I’m currently finishing my Master of Publishing and Communications at the University of Melbourne, and even though it’s a coursework program, working at ResPlat inspired me to do the minor thesis stream. I’ve just sent the 13,199-word paper to my supervisor for final approval, and I will be submitting it this Friday.

So yes, I’m a legit researcher now. (I wish.)

My thesis focuses on audience emotional engagement and how it affects the success of a Young Adult book. In this thesis, I’m trying to figure out whether publishers’ and authors’ attempts to engage readers through social media are effective, and how the emotions expressed by readers in Goodreads reviews relate to the success of the books. I draw my case study from John Green’s four YA books.

(P.S. If you’d like to know more about my thesis, I’ve written an article on it for The Conversation.)

But first of all, the data…

image

In determining the ‘audience emotional engagement’, I decided to use the Goodreads reviews of John Green’s four books. These were the things I needed: the reviews, the users’ names and the users’ ratings of the books.

My supervisor had no experience in scraping data from Goodreads, but a colleague of his had used the software OutWit Hub Pro to scrape reviews. This colleague showed me how to use the program, and I played around with it for a few hours before calling it quits.

Using OutWit Hub Pro, I decided, was a pain.

Here’s what my data from OutWit Hub Pro looked like:

image

The data I got was not clean at all, and it took a lot of trial and error to get even a handful of reviews. Plus, the data was doubled up, because the program scraped both the short and long versions of each review. I had headaches then, but I remembered one thing: I worked at a research department.

I went to Yuandra, one of the ResCom coordinators, and asked whether he had a better solution to my problem.

…And he did!

He taught me to use the programming language R to scrape the data.

It took him one coffee meeting to teach me how to run the program. My supervisor had estimated that it would take at least a month to get all the data that I needed. I had mine ready in a matter of hours.

Needless to say, he saved my thesis, allowed me to go on a two-month holiday (true story) and gave me the luxury of time. Here’s what my data from R looks like.

image
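
For anyone curious what "clean" means in practice, here's a minimal sketch of one tidy-up step: dropping the doubled-up short and long versions of a review. It's written in Python, not R, and it is not the script Yuandra wrote (this post doesn't show that). It assumes each scraped row holds a user name, a star rating and the review text, and that the short version of a review is simply the start of the long one, possibly ending in "...". The sample rows and the names `Review` and `dedupe_reviews` are made up for illustration.

```python
from typing import NamedTuple


class Review(NamedTuple):
    user: str    # reviewer's display name
    rating: int  # star rating given to the book
    text: str    # review text as scraped


def _stem(text):
    # Drop trailing dots, ellipsis characters and spaces so a truncated
    # copy ("I cried...") can be compared with the full review.
    return text.rstrip(".… ").strip()


def dedupe_reviews(rows):
    """Keep only the longest copy of each review per user.

    Assumption: when a scraper captures both the short and long version
    of a review, the short text is a prefix of the long one.
    """
    kept = []
    for row in rows:
        stem = _stem(row.text)
        for i, other in enumerate(kept):
            if other.user != row.user:
                continue
            other_stem = _stem(other.text)
            if other_stem.startswith(stem):
                break  # row is the shorter (or identical) copy: drop it
            if stem.startswith(other_stem):
                kept[i] = row  # row is the longer copy: keep it instead
                break
        else:
            kept.append(row)  # no earlier copy from this user matched
    return kept


# Made-up sample rows, only to show the shape of the data.
rows = [
    Review("reader_a", 5, "I cried. A lot..."),
    Review("reader_a", 5, "I cried. A lot. Then I read it again."),
    Review("reader_b", 3, "Good, but not his best."),
]

clean = dedupe_reviews(rows)
for review in clean:
    print(review.user, review.rating, review.text, sep=" | ")
```

The prefix check is a deliberately simple choice. If the short and long versions differ in more than length (extra whitespace, stripped formatting), you'd need a looser comparison.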

There’s an easier way to get your data, if only you know who to ask. I’m certainly lucky to know who.

So if you’re grappling with data acquisition, I highly recommend going to our upcoming Data Acquisition Training that is being held on 12-13 May 2016.

There, you can ask questions and talk about the most effective way to get your data. There are talks on survey tools, mobile data collection, scraping and more!

It’s a free training, but you do need to apply for it. Simply go to https://www.eventbrite.com.au/e/introduction-to-data-acquisition-beginners-tickets-24973908633 and you’ll find all the information you need.

Oh, if you’d like more information about Research Bazaar and the trainings we offer, go to melbourne.resbaz.edu.au or tweet us @ResPlat!
