R

 

R is an open-source, widely used statistical programming language that is easy to learn and, thanks to its many extensions, can also be used as a general-purpose programming language. R was first released in 1993 and has since risen greatly in popularity; it is widely used in academic institutions as well as by organisations such as the BBC, NHS, Google, Facebook, Twitter, Microsoft, Wellcome Sanger Institute, New York Times and Mozilla.

R is simple to use and easy to read, which makes sharing your code easier and enables you to write your code faster. This is taken further by the Tidyverse ecosystem within R, which provides even easier-to-read code as well as excellent documentation. R is one of the best choices for data science and machine learning due to its wide pool of libraries for statistics, data manipulation and wrangling, data visualisation, and modelling; it is used for these purposes across many sectors, such as finance, healthcare, technology and retail. R is the gold standard choice for data visualisation in data science thanks to the ggplot2 library and its many extensions. R also has one of the best open-source Integrated Development Environments (IDEs) available in RStudio: not only does it make programming in R easy, it also makes it simple to create documents containing your code and outputs in various formats such as HTML, Word, PowerPoint, and PDF.

The Digital Skills Lab is currently running three series of R workshops:

  • R Fundamentals (7 part series)
  • R Data Wrangling (3 part series) 
  • R Data Visualisation (2 part series)

Workshops will take place online and in person throughout the year. Click on the links below to book your place or express an interest so that you are notified as soon as workshops are available to book. 

Please note: the content of the online and in-person workshops is the same. Unless (online) shows up in the workshop booking link, the session will take place on campus in LRB.R.08 on the lower ground floor of the Library, following current government guidelines, i.e. masks and social distancing measures will be in place to stop the spread of coronavirus.

We have also curated some self-study courses that can be accessed via our Moodle page.  

If you can't find what you're looking for below, please email digital.skills.lab@lse.ac.uk or attend one of our drop-in sessions for advice. 

We also have workshops and self-study courses for Python. See below if you're not sure which is right for you. 

Python vs R: Which is right for you? 

Python is another very popular programming language for data science. Both languages have their pros and cons, and there is no clear winner. Python is more versatile and easier to integrate with other software. It is the better choice when it comes to web applications, software development, task automation, and the integration of analysis with web applications and production databases. The R analysis ecosystem is superior and provides a much larger number of libraries specialised in different types of statistical analysis, and its ggplot2 library is still the gold standard when it comes to data visualisation.

  • Programming experience: Python and R are both easy to learn, and many online resources are available to help you continue learning independently. If you already know another object-oriented programming language, Python might feel more natural to you.

  • Your environment: Which programming language do your peers, fellow students or teachers use? What is more common in your field of study? Do future employers or job sectors that you might target after graduation have a preference? 

  • Your goals: Some models and visualisations might be better supported in R than in Python. Also consider your long-term career goals. If you aspire to become a software developer, you might prefer Python.

R Fundamentals

The R Fundamentals series teaches you the basic skills that form the foundation of statistical programming projects in R. After completing this series, you will be familiar with R syntax, feel comfortable with the key data types, know several built-in functions, be able to do simple manipulations of data, and be able to solve programming problems independently using R. This series of workshops is ideal for those with no prior experience or those looking for a refresher.

R Fundamentals 1: Numerical Variables (Workshop)

(Formerly R 1: Numerical Variables)

In this workshop you will be introduced to the basics of working with RStudio and how to use numerical variables in R. RStudio is an Integrated Development Environment (IDE) that is popular for programming in R. A variable stores data of a particular type; in this workshop we will work only with numerical data.
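As a rough illustration of the ideas above (this is a sketch, not taken from the workshop materials, and the variable names are made up), calculations and numerical variables in R look something like this:

```r
# Use R as a calculator
3 + 4 * 2          # multiplication is done before addition

# Store numbers in variables (hypothetical names, for illustration only)
price <- 12.5
quantity <- 4

# Perform a calculation with the variables and keep the result
total <- price * quantity
print(total)
```

The arrow `<-` assigns the value on its right to the name on its left, so `total` can be reused in later calculations.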

By the end of this session you will be able to:

  • Navigate the RStudio environment and R Markdown notebooks
  • Perform calculations
  • Make variables
  • Perform calculations with variables

Click on the link below to check availability and book your place: 

R Fundamentals 1: Numerical Variables 

R Fundamentals 1: Numerical Variables (online) 

R Fundamentals 2: Vectors, Functions and Indexing (Workshop)

(Formerly R 2: Vectors, Functions and Indexing)

In this workshop you will be introduced to the vector data type, functions, and how to extract information from vectors. A vector is one dimensional and contains data of a single type, such as numeric or integer. We will learn how to extract elements from a vector using numbered indices. Functions are something you will use all the time in R, which comes with lots of built-in functions, such as mean or sum. A function is a set of instructions packaged together to do a task or achieve a specific outcome.
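A minimal sketch of these ideas, using made-up exam scores (the data and variable names are assumptions for illustration only):

```r
# Create a numeric vector with c() ("combine")
scores <- c(62, 75, 58, 90, 71)

# Built-in functions take the vector as input
average <- mean(scores)
total <- sum(scores)

# Indexing: R counts positions from 1
first <- scores[1]            # the first element
picked <- scores[c(2, 4)]     # the second and fourth elements
rest <- scores[-1]            # everything except the first element

# Calculations are applied to every element at once
curved <- scores + 5
```

Note that a negative index in R drops that position rather than counting from the end.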

By the end of this session you will understand how to:

  • Use functions such as mean or sum to perform calculations
  • Create a vector
  • Extract information (index) from a vector
  • Perform calculations with vectors

Click on the link below to check availability and book your place: 

R Fundamentals 2: Vectors, Functions, and Indexing 

R Fundamentals 2: Vectors, Functions, and Indexing (online) 

R Fundamentals 3: Strings, Factors and Type Conversion (Workshop)

(Formerly R 3: Strings, Factors and Type Conversion)

In this workshop you will be introduced to the string and factor data types, and will learn how to perform type conversion. The string data type holds text-based data such as your name or country of birth. Factors are categorical variables, sometimes called dummy variables, that have a set of categories, such as day of the week, each with an associated integer value; they are often used in survey data and are important for many types of analysis in R.
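The following sketch shows one way these three ideas can look in base R. The values and variable names are invented for illustration and are not from the workshop itself:

```r
# Strings: text wrapped in quotes
country <- "United Kingdom"
shout <- toupper(country)               # change to upper case
sentence <- paste("Born in", country)   # join strings together

# Factors: categories with a defined order and an integer code behind each
days <- factor(c("Tue", "Mon", "Tue", "Wed"),
               levels = c("Mon", "Tue", "Wed"))
day_levels <- levels(days)      # the categories, in the order we set
day_codes <- as.integer(days)   # the integer value stored for each entry

# Type conversion: move between data types
as_number <- as.numeric("42")   # string to number
as_text <- as.character(3.14)   # number to string
```

Setting `levels` explicitly is what controls the order of the categories; without it, R sorts them alphabetically.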

By the end of this session you will understand how to:

  • Create string data
  • Manipulate string data
  • Create a factor and change the order of the categories
  • Perform type conversion

Click on the link below to check availability and book your place: 

R Fundamentals 3: Strings, Factors, and Type Conversion 

R Fundamentals 3: Strings, Factors, and Type Conversion (online) 

R Fundamentals 4: Data Frames (Workshop)

(Formerly R 4: Data Frames Part 1)

In this workshop you will be introduced to data frames, learning how to make a data frame, extract information from it, and add information to it. Data frames have a tabular structure, like an Excel spreadsheet, and are the main data type you will use in R.
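To give a feel for what this involves, here is a small sketch in base R. The table contents and column names are hypothetical:

```r
# Make a data frame: each argument becomes a column
students <- data.frame(
  name  = c("Ana", "Ben", "Chloe"),
  score = c(62, 75, 90)
)

# Add a new column calculated from an existing one
students$passed <- students$score >= 65

# Add a new row by binding on a one-row data frame with the same columns
students <- rbind(students,
                  data.frame(name = "Dev", score = 58, passed = FALSE))

# Extract (index) information
all_scores <- students$score              # one column, as a vector
second_name <- students[2, "name"]        # row 2 of the name column
passers <- students[students$passed, ]    # only the rows where passed is TRUE

# Change a column name
names(students)[names(students) == "score"] <- "mark"
```

Square brackets on a data frame take a row position first and a column second, so `students[2, "name"]` reads as "row 2, column name".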

By the end of this session you will understand how to:

  • Make a data frame in R
  • Add new columns and rows to a data frame
  • Extract (index) information from a data frame
  • Change a data frame's column names

Click on the link below to check availability and book your place: 

R Fundamentals 4: Data Frames 

R Fundamentals 4: Data Frames (online) 

R Fundamentals 5: Loading data and packages (Workshop)

(Formerly R 4: Data Frames Part 2)

In this workshop you will be introduced to loading data into R, exporting data from R, and working with directories. We introduce how to load different types of files into R using packages. Packages are extensions to R that allow us to do more than the standard R environment allows, and are something you will use a lot when using R. We also introduce how to use the projects feature of RStudio, which helps you manage directories and file loading in R.
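As a hedged sketch of the export-then-load round trip, the example below uses only functions that ship with base R and writes to a temporary folder, so it does not depend on any particular project set-up. The workshop itself may well use add-on packages instead, and the data and file name here are made up:

```r
# A small data frame to export (hypothetical data)
results <- data.frame(id = 1:3, score = c(62, 75, 90))

# Build a file path inside R's temporary directory
csv_path <- file.path(tempdir(), "results.csv")

# Export to CSV; row.names = FALSE avoids writing an extra index column
write.csv(results, csv_path, row.names = FALSE)

# Load the CSV back into R as a data frame
loaded <- read.csv(csv_path)
str(loaded)

# Excel files need an add-on package. One common option is readxl:
# install.packages("readxl")   # install once
# library(readxl)              # load in each new session
```

Inside an RStudio project, relative paths such as `"data/results.csv"` are resolved from the project folder, which is what makes projects convenient for directory management.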

By the end of this session you will understand how to:

  • Load CSV and Excel data into R, both locally and from the web
  • Export CSV and Excel data
  • Install and load packages
  • Set up an RStudio project for directory management

Click on the link below to check availability and book your place: 

R Fundamentals 5: Loading data and packages 

R Fundamentals 5: Loading data and packages (online) 

R Fundamentals 6: Conditionals and Logic (Workshop)

(Formerly R 5: Conditionals and Logic)

In this workshop you will be introduced to conditional operators and if-else statements, both of which are a crucial part of data science. Conditional operators, such as equal to or greater than, test whether a condition is true or false, which is very useful when filtering your data. If-else statements are used to control the flow of your analysis or to make conditional changes to your dataset.
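The sketch below illustrates both ideas in base R, using invented scores and a made-up pass mark of 65:

```r
scores <- c(62, 75, 58, 90, 71)

# A conditional operator returns TRUE or FALSE for each element
is_pass <- scores >= 65

# Filtering: keep only the elements where the condition is TRUE
high <- scores[scores >= 65]

# An if-else statement chooses between two branches for a single value
x <- 58
if (x >= 65) {
  result <- "pass"
} else {
  result <- "fail"
}

# ifelse() applies the same idea to a whole vector, e.g. to categorise data
grade <- ifelse(scores >= 65, "pass", "fail")
```

`if` expects a single TRUE or FALSE, whereas `ifelse()` works element by element, which is why the latter is the usual choice for columns of data.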

By the end of this session you will understand how to:

  • Perform conditional calculations
  • Filter your data using conditional operators
  • Categorise your data using an if-else statement

Click on the link below to check availability and book your place: 

R Fundamentals 6: Conditionals and Logic 

R Fundamentals 6: Conditionals and Logic (online) 

R Fundamentals 7: Lists and Matrices (Workshop)

In this workshop you will be introduced to the list and matrix data types, learning how to make them, use them, and access information from them. The list data type is a multi-use data type that is often used in data science for storing information from statistical tests, so learning how to access information from lists is vital. The matrix data type is a simpler form of a data frame, containing only one data type, and is required by some statistical tests.
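A brief sketch of both data types in base R follows. All names and numbers are hypothetical and chosen only to illustrate the syntax:

```r
# Matrix: a table where every value has the same type
m <- matrix(1:6, nrow = 2, ncol = 3)   # values fill column by column
cell <- m[2, 3]                        # row 2, column 3
doubled <- m * 2                       # calculations apply to every cell

# List: can hold items of different types, including other lists
person <- list(
  name    = "Ana",
  scores  = c(62, 75),
  details = list(city = "London")
)
second_score <- person[["scores"]][2]   # index into an item of the list
city <- person$details$city             # reach into a list inside a list

# Results of statistical tests come back as lists
test <- t.test(c(5.1, 4.9, 5.6, 5.8, 6.0))
p_value <- test$p.value

# Convert to another data type
m_df <- as.data.frame(m)               # matrix to data frame
flat <- unlist(list(a = 1, b = 2))     # list to vector
```

Double square brackets (or `$`) pull out the item itself, while single square brackets on a list return a smaller list.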

By the end of this session you will understand how to:

  • Create and index matrices
  • Perform calculations with matrices
  • Create and index lists and lists of lists
  • Convert a list or matrix into another data type

Click on the link below to check availability and book your place: 

R Fundamentals 7: Lists and Matrices  

R Fundamentals 7: Lists and Matrices (online)  

R Data Wrangling

The R Data Wrangling series will give you the skills to do more complex manipulation and wrangling of data, with a focus on introducing the Tidyverse ecosystem for data science in R. The Tidyverse makes working with data far simpler, providing powerful data cleaning and wrangling tools with easy-to-use syntax; this allows you to perform advanced data cleaning and wrangling techniques without being an expert in programming. The Tidyverse is a big part of why people love to use R and prefer it to other programming languages like Python.

After completing this series, you will be familiar with Tidyverse syntax and functions, understand how to transform data, be able to perform aggregations, and know about mutating joins. This series of workshops is ideal for those with some knowledge of data types and conditional operations in R (covered in R Fundamentals 1 to 7) or those looking for a refresher.

R Data Wrangling 1: Pipes and introduction to dplyr (Workshop)

In this workshop you will be introduced to the Tidyverse through the dplyr package, learning how to use pipes, select columns, and filter rows. Pipes are designed to help you chain together sequences of data cleaning steps, making your code easier to read. Selecting columns and filtering rows are two of the key steps in any data cleaning and wrangling process.
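As a minimal sketch of what a piped sequence can look like, assuming the dplyr package is installed and using an invented data frame:

```r
library(dplyr)

# Hypothetical data for illustration
students <- data.frame(
  name   = c("Ana", "Ben", "Chloe"),
  course = c("Econ", "Law", "Econ"),
  score  = c(62, 75, 90)
)

# Read the pipe (%>%) as "and then":
# take students, and then select two columns, and then keep the high scores
passed <- students %>%
  select(name, score) %>%
  filter(score >= 65)

print(passed)
```

Each step receives the result of the step before it, so the code reads top to bottom in the order the cleaning happens.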

By the end of this session you will understand how to:

  • Use the pipe operator to chain together data cleaning steps
  • Select columns you want from your data
  • Conditionally filter your data to keep only the rows you want

Click on the link below to check availability and book your place: 

R Data Wrangling 1: Pipes and introduction to dplyr  

R Data Wrangling 1: Pipes and introduction to dplyr (online)  

R Data Wrangling 2: Data wrangling with dplyr (Workshop)

In this workshop you will be introduced to the mutate function from dplyr, and to changing column names. The mutate function is designed for manipulating columns of data, that is, creating, modifying, or removing columns, all of which are crucial steps in the data wrangling process. Changing and cleaning column names is important when you are dealing with messy data, but can be a time-consuming process. We show you how to make this easier and faster using the rename function and the janitor package.
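The sketch below shows the three uses of mutate alongside rename, assuming dplyr is installed. The data and column names are made up, and the janitor package mentioned above is left out to keep the example short:

```r
library(dplyr)

# Hypothetical sales data with an inconsistently named column
sales <- data.frame(
  Item  = c("pen", "book"),
  price = c(2, 10),
  qty   = c(5, 3)
)

sales_clean <- sales %>%
  mutate(total = price * qty) %>%   # create a new column
  mutate(price = price * 1.2) %>%   # modify an existing column
  mutate(qty = NULL) %>%            # remove a column by setting it to NULL
  rename(item = Item)               # rename uses new_name = old_name

print(sales_clean)
```

Because the steps run in order, `total` is calculated from the original prices before they are changed.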

By the end of this session you will understand how to:

  • Use mutate to add new columns with calculations
  • Use mutate to modify and remove columns
  • Change and clean column names

Click on the link below to check availability and book your place: 

R Data Wrangling 2: Data wrangling with dplyr continued  

R Data Wrangling 2: Data wrangling with dplyr continued (online)  

R Data Wrangling 3: Joining and aggregation (Workshop)

In this workshop you will be introduced to relational joins, cross tabulation, and aggregation of data, which are among the most used concepts in data science. Relational joins bring together two related datasets that contain some matching columns, usually in the form of an ID. Cross tabulation counts how often each category of a categorical variable occurs. Aggregation is the summarising of data to find the sum or average of a quantitative variable, and can be grouped by a categorical variable to provide a summary per category.
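These three operations can be sketched with dplyr as follows, assuming the package is installed. The two small tables and their column names are invented for illustration:

```r
library(dplyr)

# Two related datasets that share an id column (hypothetical data)
students <- data.frame(id = 1:4,
                       course = c("Econ", "Law", "Econ", "Law"))
marks <- data.frame(id = 1:4,
                    mark = c(60, 70, 80, 50))

# Relational join: match rows from both tables using the id column
joined <- left_join(students, marks, by = "id")

# Cross tabulation: count how many rows fall in each category
counts <- joined %>%
  count(course)

# Grouped aggregation: average mark per course
averages <- joined %>%
  group_by(course) %>%
  summarise(mean_mark = mean(mark))

print(averages)
```

`left_join` keeps every row of the first table and adds matching columns from the second; other join functions in dplyr differ in which unmatched rows they keep.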

By the end of this session you will understand how to:

  • Join two related datasets
  • Perform a cross tabulation on a categorical variable
  • Perform a grouped aggregation to find an average across a categorical variable
  • Perform an aggregation across columns

Click on the link below to check availability and book your place: 

R Data Wrangling 3: Joining and aggregation 

R Data Wrangling 3: Joining and aggregation (online)  

 

R Data Visualisation

R is widely considered one of the best tools for data visualisation. The R Data Visualisation series teaches you how to use the excellent ggplot2 package, and some of the many extensions that are available, to generate visualisations such as histograms and scatter, line, bar, and box plots.

The ggplot2 package is designed around The Grammar of Graphics (a book by Leland Wilkinson), which describes a layered approach to data visualisation. This means you compose visualisations by combining independent components. This makes ggplot2 both powerful and flexible, allowing you to make almost any visualisation.
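To make the layered idea concrete, here is a small sketch that assumes ggplot2 is installed and uses the mtcars dataset that ships with R. The choice of variables, colour, and labels is arbitrary:

```r
library(ggplot2)

# Start with the data and the mapping of columns to axes,
# then add independent components with +
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(colour = "steelblue") +                         # layer 1: points
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # layer 2: trend line
  labs(x = "Weight", y = "Miles per gallon")                 # axis labels
```

Typing `p` in the RStudio console, or calling `print(p)`, draws the plot. Each component can be swapped or removed without touching the others, which is what the layered approach refers to.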

R Data Visualisation 1: Data viz with ggplot2 (Workshop)

In this workshop you will be introduced to data visualisation with the ggplot2 package, learning how to make scatter and bar plots. You will also learn how to alter the aesthetics of your visualisations, such as the colour. Scatter plots are designed for visualising the relationship between two quantitative variables, but we will also show other uses. Bar plots are designed for visualising categorical variables or summary statistics.

Learning Objectives

By the end of the session you will be able to:

  • Make basic visualisations with ggplot2
  • Make bar plots
  • Make scatter plots
  • Change colours and other features in your visualisations

Click on the link below to check availability and book your place: 

R Data Visualisation 1: Data viz with ggplot2  

R Data Visualisation 1: Data viz with ggplot2 (online) 

R Data Visualisation 2: Box, histogram and line plots (Workshop)

In this workshop you will learn how to make box plots, histograms, and time series visualisations (line plots) with ggplot2. We will also cover how to work with dates and split your plots into facets using categorical variables. Box plots are designed to visualise the relationship between a quantitative and categorical variable. Histograms are designed to show the distribution of a quantitative variable. Line plots are designed to show the change of a quantitative variable over time.

Learning Objectives

By the end of the session you will be able to:

  • Make box plots
  • Make histograms
  • Work with dates
  • Make time series (line) plots
  • Split your plots into facet grids

Click on the link below to check availability and book your place: 

R Data Visualisation 2: Box, histogram and line plots  

R Data Visualisation 2: Box, histogram and line plots (online)