Investigating video complexity using Natural Language Processing and Machine Learning
We have spoken to PhD candidate Emad Alghamdi from the School of Languages and Linguistics about his research project.

What is your PhD Research Project?
I am trying to answer a deceptively simple question: what makes a video complex or difficult for language learners? Answering it is challenging on many levels; for one, I am dealing with a dynamic and multifaceted type of data: videos.
What prompted you to choose this research topic?
Before starting my PhD, I worked as an English teacher for non-native learners for almost six years. As a teacher, I came to know that students like to watch videos and that they learn a lot from watching them. But whenever I looked for videos on the Internet, it was always a challenge to find ones that were not too difficult for my students. The process was tedious and time-consuming, and I always wished there were an automated tool that could help me find the right videos with less effort. So I decided to take up the challenge and build one for myself and for all language teachers and practitioners.
What are some of the challenges you have faced or overcome in your research project to date?
With the aim of developing a prediction model of video complexity, I searched for an approach that could help me make sense of the data (videos), and I found Machine Learning to be the most appropriate for the task. But Machine Learning is an emerging and active field, and it is very challenging to keep up with the latest approaches and techniques.
Another challenge I faced is that I could not find a video dataset that I could experiment with. So I built a video dataset myself and thought I had overcome my biggest challenge. Not long after I started analysing my data, I realised I had a very challenging problem on my hands. Hopefully, I'll get through the analysis phase soon.
What digital tools do you use to help analyse your research data?
To remind you, I am analysing videos (a lot of them), which are generally made up of three components: language, picture, and sound. I use different tools for each component. At the moment, I am focusing on analysing the language component using advanced NLP tools such as TAACO, TAALES, and Coh-Metrix.
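To give a feel for what such tools measure, here is a minimal, hypothetical sketch of two classic lexical complexity indices computed over a video transcript. The function name and the exact indices are illustrative assumptions, not a description of how TAACO, TAALES, or Coh-Metrix work internally.

```python
import re

def lexical_indices(text: str) -> dict:
    """Compute two simple lexical complexity indices for a transcript.

    This is an illustrative toy, not a reimplementation of any NLP tool.
    """
    # Lowercase the text and split it into word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    types = set(tokens)
    return {
        "token_count": len(tokens),
        # Type-token ratio: higher values suggest more varied vocabulary.
        "type_token_ratio": len(types) / len(tokens) if tokens else 0.0,
        # Mean word length in characters, a crude proxy for lexical difficulty.
        "mean_word_length": sum(map(len, tokens)) / len(tokens) if tokens else 0.0,
    }

transcript = "the cat sat on the mat and the cat slept"
print(lexical_indices(transcript))
```

Real tools compute hundreds of such indices (cohesion, sophistication, syntactic measures), but the idea is the same: turn a transcript into a numeric feature vector.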
I am also using many great Python libraries for data pre-processing, presentation, and visualisation such as NumPy, Pandas, Matplotlib, and Seaborn. For building ML models, I have been exploring Scikit-learn and TensorFlow.
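Putting those libraries together, the kind of pipeline described above might look like the following sketch: a Pandas table of per-video features feeding a Scikit-learn regressor that predicts a complexity score. The feature names, the synthetic data, and the choice of a random forest are all my own assumptions for illustration, not the project's actual model.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: in practice each row would hold NLP-derived
# features for one video, with a human-rated complexity score as the target.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "type_token_ratio": rng.uniform(0.3, 0.9, 200),
    "mean_sentence_length": rng.uniform(5, 30, 200),
    "speech_rate_wpm": rng.uniform(90, 200, 200),
})
# Toy target loosely tied to the features, plus a little noise.
df["complexity"] = (
    2.0 * df["type_token_ratio"]
    + 0.05 * df["mean_sentence_length"]
    + 0.01 * df["speech_rate_wpm"]
    + rng.normal(0, 0.1, 200)
)

# Hold out a test split and fit a simple regression model.
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="complexity"), df["complexity"], random_state=0
)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```

The same skeleton works with TensorFlow in place of Scikit-learn once the feature tables get large or the model needs to learn directly from audio and image inputs.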
Have you attended any workshops at the university to learn how to use the digital tools you need?
I am SO fortunate to be a resident at the Digital Studios, where many fascinating workshops and seminars take place. I have learnt a lot from attending those workshops and others organised by Research Platform Services. I recommend that every student take advantage of such wonderful workshops. You never know what doors they may open for you.
resbaz posted this
