Methodology - GitHub

Milena Tsvetkova, Department of Methodology

Names

Milena Tsvetkova, Kenneth Benoit, Pablo Barbera

Department

Methodology

Overview

We introduced the free online repository and version control service GitHub to share course materials, submit summative assessment, and provide feedback for most of the computational methods courses at the Department of Methodology. This innovation enables learning-by-doing and facilitates collaboration.

Target audience

The intervention concerns data science graduate students from the Statistics and Methodology departments and graduate students from across LSE who are interested in learning computational methods. We introduced it to MY470 Computer Programming (32 students in MT2017, 58 students in MT2018), the 2017/18 version of ST445 Managing and Visualizing Data (31 students in MT2017), MY472 Data for Data Scientists (42 students in MT2018), and MY459 Special Topics in Quantitative Analysis: Quantitative Text Analysis (29 students in LT2018, 37 registered for LT2019). 

Details

While developing two new courses on computational methods, we considered using GitHub rather than Moodle for sharing course materials and GitHub Classroom for managing student assignments. GitHub is an online code-hosting platform for version control and collaboration, built on the version control system git. Both the service and the underlying technology are widely used by software engineers and data scientists in academia and industry, and we decided that using them is an essential skill for data science students to master. On May 12, 2017, we hosted Joe Nash from GitHub, who presented a seminar on using GitHub Education (the event was sponsored by SEDS). We studied and tested the service over the summer and introduced it for the new courses MY470 and MY472 in MT2017, as well as for the existing MY459 in LT2018.

For all three courses we use GitHub to store and update lecture and class materials, which means that they are available year-round not only to LSE students but to anyone online (e.g. https://github.com/lse-my470). We use the lectures and classes in the first week to introduce students to the system and teach them how to use its core functionality to copy and annotate course materials, access and submit assignments, and view feedback. In MY470 and MY472 we also introduce the system as a collaboration environment by requiring students to work in pairs for two or three of the summative assignments.

Further, GitHub provides unique functionality for managing assignments that involve data and programming scripts. We create an online repository with the assignment text, starter code, and required data, and email students a link to it. When students accept the link, GitHub duplicates the content into a new private repository that only the student (or the team of students) and the course instructors can view and edit. For team assignments, GitHub offers a number of tools for tracking and reviewing changes and discussing issues. Students use the core GitHub operations to download their online repository onto their own machine, edit the files, and upload their solutions. On our end, we use the same operations to edit the students' files and provide comments and marks. GitHub allows us to use simple scripts to automate the tasks related to managing student assignments, which is invaluable given that the computational methods courses rely heavily on continuous summative assessment.
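
The kind of simple script mentioned above might look like the following minimal sketch in Python. It is an illustration under stated assumptions, not the scripts actually used in these courses. It assumes that each student repository is named after the assignment followed by the student's GitHub username, and the assignment name, usernames, and destination folder are hypothetical. The functions only build the git commands; a separate helper runs them.

```python
import subprocess
from pathlib import Path


def repo_url(org: str, assignment: str, username: str) -> str:
    """Build the HTTPS URL of one student's repository.

    Assumption: repositories are named "<assignment>-<username>".
    """
    return f"https://github.com/{org}/{assignment}-{username}.git"


def clone_commands(org, assignment, usernames, dest):
    """Return one `git clone` command per student, without running anything."""
    commands = []
    for user in usernames:
        # Each repository is cloned into its own folder under `dest`.
        target = (Path(dest) / f"{assignment}-{user}").as_posix()
        commands.append(["git", "clone", repo_url(org, assignment, user), target])
    return commands


def submit_commands(message):
    """Return the commands a student (or marker) would run inside a cloned
    repository to upload edited files."""
    return [
        ["git", "add", "--all"],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]


def run_all(commands):
    """Execute each command in turn, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical roster and assignment name, for illustration only.
    roster = ["student-a", "student-b"]
    for command in clone_commands("lse-my470", "assignment-1", roster, "submissions"):
        print(" ".join(command))
```

Keeping the command construction separate from execution makes it possible to inspect the list before anything touches the network. Passing the list to run_all would then perform the actual downloads, provided git is installed and the account has access to the private repositories.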

Because GitHub can also serve as a free web server, we used the platform to host the module-specific websites (e.g. https://lse-my459.github.io/).

The motivation behind this innovation was twofold: 1) to use the course pragmatics as a way to introduce students to a new useful technology and 2) to use technology to enhance students’ learning experience.

Because it is designed around git, which at its core is a revision control tool, GitHub also makes the entire code base and all changes completely transparent to instructors and students in the course. This serves a valuable educational function in its own right, in the same way that open-source software generally (for which GitHub is the world’s largest repository) serves to educate users about programming and motivate their participation.

When students saw a typo or possible improvement in the course materials, for instance, rather than sending an email, they could actually submit a correction with comments via GitHub, which could then be reviewed and accepted or amended by a course convenor. This act of helping to fix a problem is a much more participatory learning experience than simply reporting the problem, and much more in line with how computational teamwork happens in enterprise and other development environments.

Impact

It is common for programmers and data scientists to maintain an active GitHub account as a signal to employers of their skills and productivity. The majority of our students did not have a GitHub account before they took one of our courses. We have noticed that some of them have continued using the service to track various software projects or share code they have developed outside of the course. 

Next steps

We plan to expand this provision to a new computational methods course we are designing.

The power of this intervention is that we embedded a very useful tool in the course logistics, so that students learn and practice it without even realizing it. The idea could be translated to qualitative and other quantitative courses by asking students to submit assignments in a medium and format that are typical for the field: instead of essays and tests, students could prepare reports, write grant proposals, compose media releases, or deliver slide presentations.