Defender of the Research Universe
I guess I am showing my age here: when I was six, one of the best TV shows around was Voltron: Defender of the Universe.
It had everything a six-year-old could want: robots that looked like lions and joined together to form a giant new robot that had sword fights with space pirates.

The thing is, the TV show would not have worked if it was about just one of the robot lions working by itself. As the first episode explains, the smaller (but still formidable) robot lions were created by fragmenting the larger, almost godlike, Voltron. The giant Voltron, Defender of the Universe, had much greater power than the sum of the leonine components.

Not long ago, I wrote about LaTeX. Certainly LaTeX is fantastic. However, when I try to convince people to give LaTeX a shot, it sometimes feels like I’m trying to get across the virtues of Voltron by selling one of the robot lions in a used car lot.
Likewise, when I try to convince people that R is better than Excel, or Git is better than Dropbox, it is always like selling a beat-up old robot lion collecting dust in a used car lot. People get this look in their eye and I can tell what they’re thinking. They’re thinking: what am I going to do with a beat-up old robot lion with dust all over it? It can’t destroy fleets of space pirates, and it doesn’t even have a sword. What I really need to help me is the legendary Voltron: Defender of my Thesis.
Now, from using these tools in my own work, what I really want to get across is a vision of Voltron, made up of these other tools which in isolation don’t appear obviously superior to what people already feel comfortable with. I mean, a robot lion might be cool, but if you were told you need a bit of training to pilot one, would you really prefer it to the car you already know how to drive?
The purpose of this blog post is to let you know that Voltron does exist; that although the lions by themselves might not attract you at first, you will not be able to deny the benefits of bringing them together to form a greater whole. A greater whole that together can wield a giant sword for annihilating all the space pirates your supervisors and reviewers can throw at you.

Unconvinced? Okay, we shall have a look at each of the lions. Then we will consider what happens when we bring them together.
The Lions
Git
There are a lot of fancy words that can be used to describe Git.
Forget all that. From now on you should think of Git as the green lion.

The green lion forms the left arm of Voltron. The green lion has the youngest, smartest, smallest and fastest pilot, called Pidge. Git in turn is very fast, a bit technical, and was first developed in 2005, which makes it quite young in computer years.
Pidge’s homeworld was destroyed by nuclear war; likewise Git was born from battle. Once upon a time, people used the excellent BitKeeper software to collaborate on large programming projects. Unfortunately, it was commercial software, and a great war erupted over copyright and licensing issues. Git was created as a free replacement for BitKeeper so that no one had to worry about such matters any more.
There are all sorts of reasons to use Git, such as keeping your work synced between all your machines, and being able to roll back to earlier points in time — but of course you already use Dropbox for all that, right? And it is not like you need to collaborate on large software projects, at least, not at the moment.
That is okay, I am not telling you to stop using Dropbox. Git as a single green lion working by itself is not obviously superior to Dropbox.
What I am asking is, do you just want a single lion, or do you want Voltron? If you want Voltron at your side, then you will need to consider how well Dropbox works with the other lions.
R
R is the red lion.
The pilot of the red lion is a charismatic and reckless trickster called Lance.
As the right arm, the red lion holds Voltron’s sword and other armaments.
Likewise, when you use R, it can seem a bit tricksy at first,
but it brings in a wide variety of
external packages that can slay just about any type of data problem.

One way to think of R is as a programming language; and it is that.
However, if you haven’t done programming before,
this is perhaps not the most useful way to think of R.
It is possible to do simple things in R without ever having done programming
before.
If you have ever entered a function into a cell in Excel,
that is a way you can think of R.
The difference is that you won’t automatically see all the data
laid out before you in spreadsheet format.
Instead, like a genie in a bottle, R will keep everything tidily
hidden away until you give it commands.
A typical session in R looks something like this:
- Give a command that reads in some data from a file.
- Give some commands to explore the data, such as averages, standard deviations, and simple plots.
- Give some commands that try out different statistical models on the data.
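
To make those three steps a little more concrete, here is a minimal sketch of such a session. It is written as something you could paste into a terminal; the R commands are the lines saved into `session.R`. The file names, the column names and the numbers are all made up for illustration, and the last step only runs if R happens to be installed.

```shell
# Work in a throwaway directory so nothing of yours is overwritten.
cd "$(mktemp -d)"

# A tiny made-up data file (hypothetical values, for illustration only).
cat > measurements.csv <<'EOF'
age,height
6,115
7,121
8,128
9,133
10,138
EOF

# The R commands themselves, saved as plain text.
cat > session.R <<'EOF'
dat <- read.csv("measurements.csv")   # 1. read in some data from a file
print(mean(dat$height))               # 2. explore: an average ...
print(sd(dat$height))                 #    ... a standard deviation ...
plot(dat$age, dat$height)             #    ... and a simple plot
fit <- lm(height ~ age, data = dat)   # 3. try out a statistical model
print(summary(fit))
EOF

# Run the session, but only if R is installed on this machine.
if command -v Rscript >/dev/null 2>&1; then
  Rscript session.R
fi
```

When run this way rather than interactively, R saves the plot to a PDF file in the same directory instead of opening a window.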
In addition to having powerful statistical tools, R is able to generate
beautiful graphics.

Still, you might think, you already can do reasonable things in Excel. The plots might be hard to customise, but you still have your papers accepted by some journals, and publication is all that matters, right? And in Excel you already know your way around all the buttons and options.
Again, I am not asking you to stop using Excel
and I am not trying to say R is always obviously better than Excel.
Instead, what I want you to think about is how much you can trust Excel
to seamlessly work with other tools.
When writing your paper in Word, if you get some new idea for a plot,
can you tell Word to go and re-run your analysis in Excel and have
the new plot automatically inserted at the right place?
(Well, maybe with some time-consuming VBA magic it is at least theoretically possible, but no one would call this seamless.)
LaTeX
LaTeX is the blue lion, Voltron’s right leg, piloted by Princess Allura. Like LaTeX, Allura is the beautiful, strong-willed and principled ruler over a domain with a long history. She welcomes newcomers and helps them acquire full control over Voltron. Like Allura, LaTeX will often seem to demonstrate telepathic powers and will stubbornly resist attempts to make it do things that will not end up looking perfect.

People often try comparing LaTeX to Word. People accustomed to a traditional word processor will be inclined to approach LaTeX with some skepticism. It can seem a little baroque to write everything in plain text, scattered through with commands, and then have to convert it to a PDF before you can see what the document really looks like.
Once again, this is a matter of focusing on the individual robot lion and forgetting about the power of Voltron. Word is okay for simple tasks, but if you need to kill a horde of space pirates then you will need a robot lion that can play nicely with the rest.
Remember, your thesis advisor is a space pirate.
Your reviewers are all space pirates.
Your thesis is a space ship driven by space pirates who want to kill you.
Do you want to take them all on in a Ford Laser, or do you want Voltron?
An artist’s depiction of Microsoft Word.
The other lions
At this point, I could belabour the analogy further.
No doubt, the strong golden lion on the left leg, supporting the rest of Voltron, could well be Linux. Likewise, the black lion, forming the torso of Voltron, is the Unix command line, holding all the other lions together.
We can discuss these more in another blog post in future; for the moment let’s go with the three lions discussed so far.
The beginnings of Voltron
Git and R
Git is used by programmers to collaborate, keep a history of work, and distribute backups across multiple machines. It is especially efficient when working with plain text, which is the usual format for writing programs.
R is a programming language which, like most others,
stores its commands in plain text files.
Thus it makes sense that if you are working in R,
especially if you need to work on multiple computers or collaborate with
other researchers,
you would use Git to handle the backing up and sharing of R code.
It is true that you can collaborate on data analysis with other researchers
without using R and Git.
However, by using R with Git:
- You will find yourself at liberty to experiment with new types of analysis on your own without letting other people see your mistakes and dead-ends.
- You will always have a local copy of everything you work on without needing to search for extra non-default settings to change.
- You don’t need to entrust your data to a company that could experience outages or security breaches.
- You don’t need to pay extra to keep 100% of the history.
- Furthermore, the history is annotated, so you can easily discover which point in time you want to rewind to if you find yourself going down the wrong track.
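
As a rough sketch of what the annotated history and the rewinding look like in practice, the commands below build a throwaway repository with two commits and then roll one file back. The directory name, the file `analysis.R`, its contents and the commit messages are placeholders invented for this example, not a prescribed workflow.

```shell
# Work in a throwaway directory.
cd "$(mktemp -d)"
git init -q analysis
cd analysis

# Identity for this throwaway repository only (placeholder values).
git config user.name "Example Researcher"
git config user.email "researcher@example.org"

# First commit: a one-line R script.
echo 'dat <- read.csv("measurements.csv")' > analysis.R
git add analysis.R
git commit -q -m "Read in the raw measurements"

# Second commit: add a model to the script.
echo 'fit <- lm(height ~ age, data = dat)' >> analysis.R
git commit -q -am "Fit a linear model of height on age"

# The annotated history: one line per commit, newest first.
git log --oneline

# Going down the wrong track? Rewind analysis.R to the previous commit.
git checkout -q HEAD~1 -- analysis.R
```

Everything here happens on your own machine. Nothing is shared with anyone until you decide to push it somewhere.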
R and LaTeX
One common way to try doing research is to perform some analysis in Excel, then copy the results into Word or PowerPoint when you want to present the results. Invariably, whatever is pasted needs further massaging since there are always formatting problems. If you also want to number the figures and tables, you can only cross your fingers that everything will update correctly each time there is a change.
Compared to this, if you are working in LaTeX,
you can use tools called Sweave or knitr to mix in
any R commands you feel like.
Then the outcome of your analysis will appear directly in the final PDF,
including all your plots.
Let’s say you are using knitr; it is regarded as a bit nicer than Sweave. Then if you need to update your data or change some parameters of your analysis, all you have to do is run knitr again. This will rebuild your document from scratch, with the new data and settings, ensuring all the plots and their numbering and cross-references are updated automatically.
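
For a feel of what such a mixed document looks like, here is a minimal sketch, again written as something you could paste into a terminal. It saves a small `paper.Rnw` file, which is ordinary LaTeX with R code placed between `<<...>>=` and `@`, and `\Sexpr{}` for values inside a sentence. The file name, chunk names and data are hypothetical. The build step at the end assumes R with the knitr package and a LaTeX installation providing `pdflatex`, and is skipped if either program is missing.

```shell
# Work in a throwaway directory.
cd "$(mktemp -d)"

# A minimal, hypothetical paper.Rnw: LaTeX with R chunks mixed in.
cat > paper.Rnw <<'EOF'
\documentclass{article}
\begin{document}

<<load-data, echo=FALSE>>=
# Stand-in data; in a real paper this would be read from a file.
dat <- data.frame(age = 6:10, height = c(115, 121, 128, 133, 138))
@

The mean height was \Sexpr{round(mean(dat$height), 1)} cm
(see Figure~\ref{fig:height-plot}).

<<height-plot, echo=FALSE, fig.cap="Height against age.">>=
plot(dat$age, dat$height)
@

\end{document}
EOF

# Rebuild from scratch: knitr turns paper.Rnw into paper.tex,
# then pdflatex is run twice so the cross-reference resolves.
if command -v Rscript >/dev/null 2>&1 && command -v pdflatex >/dev/null 2>&1; then
  Rscript -e 'knitr::knit("paper.Rnw")' \
    && pdflatex paper.tex && pdflatex paper.tex \
    || echo "Build failed: is the knitr package installed?"
fi
```

If the numbers in the data change, nothing in the sentence or the figure needs editing by hand; re-running the last step picks up the new values.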
LaTeX and Git
Like R, LaTeX is written in plain text.
For the same reason that you can (and should) use Git with R,
you can also use Git with LaTeX.
Some interesting things have already been written by others on
what this allows and how to do it most effectively.
For example, since Git makes it very easy to create and merge branches, you can have branches for activities like trying out supervisor and reviewer suggestions, or changing the formatting and bibliography style to better suit particular journals. If you are using LaTeX to prepare cover letters or a curriculum vitae when looking for work, you can create a different branch for each different employer.
Overall, use Git with LaTeX if you are working on a complicated document like a thesis or paper and you have any of the following extra requirements:
- You need to work across multiple computers without relying on an external company to keep your documents safe and available.
- You want to collaborate with other people whose suggestions you will want to try before making a commitment one way or the other.
- You want the freedom to experiment with new ideas and formatting without worrying that it could break something.
- You don’t want to rely on a commercial company to always keep your data both available and safe.
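
The branch idea can be sketched in a few commands. This is only an illustration under made-up names: the `thesis` directory, the `thesis.tex` file, the `reviewer-suggestions` branch and the wording changes are all placeholders.

```shell
# Work in a throwaway directory.
cd "$(mktemp -d)"
git init -q thesis
cd thesis

# Identity for this throwaway repository only (placeholder values).
git config user.name "Example Researcher"
git config user.email "researcher@example.org"

# First draft of a very small LaTeX document.
printf '%s\n' '\documentclass{article}' '\begin{document}' \
  'Draft.' '\end{document}' > thesis.tex
git add thesis.tex
git commit -q -m "First draft"

# Remember the starting branch ("master" or "main", depending on Git settings).
main_branch=$(git symbolic-ref --short HEAD)

# Try a reviewer's suggestion on its own branch.
git checkout -q -b reviewer-suggestions
sed 's/Draft\./Draft, revised as the reviewer suggested./' thesis.tex > thesis.tmp
mv thesis.tmp thesis.tex
git commit -q -am "Try the reviewer's rewording"

# Happy with the result? Merge it back into the main line of work.
# (Unhappy? Skip the merge and run: git branch -D reviewer-suggestions)
git checkout -q "$main_branch"
git merge -q reviewer-suggestions
```

Until the merge, the main branch is untouched, so an experiment that goes nowhere costs nothing.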
It is not the only way to do things, but working with Voltron is a good way to do things.
Git + R + LaTeX
Hopefully you can see where this is going.
By bringing together all of the above points,
you can now imagine a situation where you
are writing a thesis or paper in LaTeX,
with bits of R code through it that will automate your analysis for you.
You always have a copy of this document on any computer where you are currently working, and so do all your collaborators. No one has to rely on a particular external company to keep everything available and safe.
You and all your collaborators are free to experiment with innovative ideas without affecting anyone else’s work. If anything goes wrong, anyone can delete their failures without feeling self-conscious. Everyone gets to choose which contributions they share with everyone else.
Furthermore, you don’t even have to switch around between different programs.
Once Voltron is fully formed, the individual lions are no longer noticed.
In particular, there is a program called RStudio
which has built into it not only R,
but also the ability to write documents in LaTeX
and automatically run knitr on them,
and an interface for Git.
RStudio is not the only way to form Voltron, but it offers a low barrier to entry.
Finally, all these tools are free, open source, and cross-platform. What this means for researchers is that whatever type of computer you are working on, now and into the future, you will be able to install all this software completely without charge. All the tools you use can be audited for correctness, and because your analysis can be re-run from scratch from the same files, your work can be fully reproducible.
If you want to learn more, come and speak to us in Research Platforms. We might start by telling you about the individual lions, but really we want to empower you to awaken your inner Voltron.
