In conversation with... Andrew Moles, Learning Developer in the Digital Skills Lab, has a Q&A with LSE undergraduate and DSL columnist Kara Jessup about the pros and cons of data visualisation using Excel, Tableau, Python and R.
Could you tell me a little bit more about yourself and what you do at the DSL?
I’m a Learning Developer for data science. We curate pages on Moodle, for example, where we find good resources and advertise them. I also make materials, predominantly in R with a little bit of SPSS and MATLAB as well, and I do some teaching, so I create materials for workshop series or tailored trainings. We work closely with academic departments to provide pre-course training, which is mostly in Python and R. I also manage a team of trainers.
Could you tell me a bit more about what led you here, what led you to R and data science?
I learned MATLAB originally, which is a programming language mostly used in academia. I learned it for my undergraduate in Psychology to make behavioural experiments. Then I did a Masters and learned MATLAB properly to make my own experiments from scratch.
Then, for my Masters dissertation, I was doing a project on genetic sequencing. I reviewed the tools and found that R was the best one for the job. From there it was a mega steep learning curve going from MATLAB to R; it was pretty hard. After a little bit of playing around with it, I loved it. It was so much easier to use than MATLAB.
For what I like to do, which is working with data, R is better; I find it nicer to work with. I ended up just using R predominantly from then on.
What is Python/R data visualisation useful for (what kind of projects, what kind of careers...)?
Data visualisation has many uses, such as visualising statistical information, networks, maps, tables and projects (for example with Gantt charts), and even making artwork!
Traditionally, data visualisation has been used to visualise statistics. Florence Nightingale in 1858 used data visualisation to demonstrate the importance of clean hospitals in the Crimean War; there is also a great podcast by Tim Harford about this topic. Data visualisation is also a crucial step in reviewing your data before you perform inferential statistics, as the article Same Stats, Different Graphs by Justin Matejka and George Fitzmaurice demonstrates. Datasets can have the same summary statistics (mean, standard deviation and correlation) but look very different visually. Both Python and R provide flexible, open-source environments that enable you to make some really excellent and informative visualisations that are fully reproducible and repeatable.
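To make the "same statistics, different graphs" point concrete, here is a minimal Python sketch. It does not use the datasets from the Matejka and Fitzmaurice article; instead it swaps in Anscombe's quartet, an older and much smaller example of the same idea. The function names (summarise, plot_quartet) are illustrative only, and the plotting function assumes matplotlib is installed.

```python
import math
import statistics

# Anscombe's quartet: four small datasets with near-identical summary statistics.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(x, y):
    """Pearson correlation coefficient, written out by hand."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def summarise(x, y):
    """Return the summary statistics of one dataset, rounded to 2 decimals."""
    return {
        "mean_x": round(statistics.mean(x), 2),
        "mean_y": round(statistics.mean(y), 2),
        "sd_y": round(statistics.stdev(y), 2),
        "corr": round(pearson(x, y), 2),
    }


def plot_quartet():
    """Scatter-plot the four datasets side by side (assumes matplotlib)."""
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
    for ax, (name, (x, y)) in zip(axes.flat, QUARTET.items()):
        ax.scatter(x, y)
        ax.set_title(f"Dataset {name}")
    plt.show()


# Print the summary statistics for each dataset.
for name, (x, y) in QUARTET.items():
    print(name, summarise(x, y))
```

All four datasets report a mean of y of about 7.50 and a correlation of about 0.82, yet calling plot_quartet() shows four clearly different shapes, which is why it is worth plotting data before running any inferential statistics.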
Any career that involves working with data can include data visualisation, including finance, data science, academia, data analysis and engineering.
Do you need to have knowledge of Python/R to take these sessions or are they for beginners?
In order to take the data visualisation courses in R and Python, you will need some knowledge of R or Python. For Python you should be comfortable with the pandas library, for tasks such as loading and working with datasets. For R you should be comfortable with data frames, indexing, conditional operations, and the dplyr library.
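As a rough guide to the level of pandas meant here, the sketch below loads a small dataset and does some basic work with it. The dataset and column names are made up for illustration, and the exact course prerequisites may differ; if code like this reads comfortably, you are probably in a good place for the Python sessions.

```python
import io

import pandas as pd

# Made-up example data; in practice you would read a real file,
# e.g. pd.read_csv("my_data.csv").
CSV_TEXT = """department,year,students
Economics,2021,120
Economics,2022,135
Statistics,2021,80
Statistics,2022,95
"""

# Load the data into a DataFrame.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Select rows with a condition (boolean indexing).
recent = df[df["year"] == 2022]

# Group and aggregate: total students per department.
totals = df.groupby("department")["students"].sum()

# Add a new column derived from an existing one.
df["students_hundreds"] = df["students"] / 100

print(recent)
print(totals)
```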
What makes data visualisation in Python/R different from data visualisation in Excel or Tableau?
The main difference is that you need to write code in order to make visualisations in R and Python, so the barrier to entry is higher: you have to learn a fair amount before you can make visualisations.
Excel and Tableau have drag and drop interfaces, which allow you to make good quality visualisations quickly. However, Tableau needs a licence, which comes at a cost; Excel does too (although most workplaces have Excel these days). The main downside of Tableau is that its data manipulation (preparation for visualising) isn't great, so a lot of people use Excel in conjunction with Tableau.
R and Python are more flexible than Excel and Tableau; you can do almost anything in terms of visualisation with them! R and Python are open source, and allow reproducible and repeatable visualisations. This means that you can share your code and someone else will be able to make the same visualisation as you, or you can go back a year later and make the same visualisation even if you forgot how you made it; both of which are great things. Further, R and Python integrate with document-making tools (Markdown), so you can produce reports containing your visualisations and explanations without much effort, again fully reproducible and repeatable. You can also integrate R and Python with Tableau and Excel if you want.
What is different between data visualisation in Python or in R?
R is widely considered the best tool for data visualisation. Much of this is due to the ggplot2 package and its extensions, which are based on the grammar of graphics. Making visualisations in Python is, by comparison, more convoluted, and the results can be less visually pleasing. There is now a ggplot2-style package available in Python called plotnine; however, it is still in development and is far more limited than ggplot2 in R. Both Python and R can use the excellent plotly package, which makes interactive visualisations.
How many data visualisation classes are there for Python?
We are currently running 3 data visualisation workshops in Python and 2 in R. All courses are delivered both online and in person and can be found on our Python and R web pages.
If you had one piece of advice to offer students while they're still at LSE, what would it be?
I would say, be curious about things. A lot of students at LSE are very career-driven, which is very inspiring, so I wouldn't say be driven because they already are, but be curious about what you can do. Find good blogs to look at, test stuff out, try some challenges. For example, if you want to get better at programming, a nice, friendly way of doing it is the Tidy Tuesday challenge (there is also the Makeover Monday challenge). Every week, a data set is released, and you have to do some analysis and cleaning and present your results. You can share it with the RStudio community, which is a friendly way of seeing what other people have done, as people share their code.
Try and be curious as much as you can, be curious about what other people are doing and other techniques that are happening, go to as many trainings as you think are relevant to you, go to talks, find blogs to follow, and in general just be interested in what's going on.