Stata is a statistical software package for data wrangling, visualisation and analysis. It is most commonly used in econometrics and in political, social and epidemiological research. Stata enables you to perform a wide range of statistical analyses and present the results in a well-structured way. You can choose between using the menus and writing Stata commands in a script; a script makes it easy to reproduce your analysis and share it with colleagues or include it in publications. A major advantage of Stata is that it is generally easier to learn than Python or R. Stata might be the best choice for you if your supervisor or colleagues are using it as well. Although Stata offers very extensive documentation on its built-in commands, finding help online is not as easy as it is for Python or R. Another disadvantage of Stata is the lack of built-in commands for machine learning, although some user-written commands are now available.
Stata vs SPSS: Which is right for you?
SPSS is similar to Stata in that you can either use drop-down menus or write code to run your analysis. Stata generally has better support available from sources such as the Statalist forum, the online documentation and the built-in help. Stata is less expensive than SPSS, but SPSS deals better with very large datasets. You can request a licence and instructions on how to download and install Stata by emailing tech.support@lse.ac.uk.
Stata vs R or Python
Stata is easier to learn than R and Python and gives you easier access to some types of analysis. It is still a popular choice in the social sciences, but R and Python are catching up. R and Python are open source and freely available, whereas Stata requires a licence, which can be expensive. R and Python are better options for data visualisation and machine learning, but you will have to learn to code in order to make full use of them.
We also have workshops and self-study courses in SPSS, Python and R. See the comparisons above if you're not sure which is right for you.
Stata Workshop Series
The Stata workshop series comprises the Stata Fundamentals and Stata Data Cleaning materials. Together, the Fundamentals and the Data Cleaning lessons will teach you the basics of inspecting, summarising, cleaning, analysing and visualising data in Stata.
Each workshop is two hours long, and you will work with fellow learners, utilising your prior experience, web searches, and in-application Help features to find the solutions to real-world problems, with a Stata expert on hand if you get stuck.
Sign up here.
You can choose which skill set to work on from the list below. For learners who are new to Stata, we recommend working through the materials in the prescribed order.
Workshops will take place on campus throughout the year in LRB.R.08, on the lower ground floor of the Library.
The Stata Fundamentals series teaches you the basics of using Stata for statistical analysis. After completing the Stata Fundamentals series, you will be able to import and explore data, use do-files to store commands in a script, and perform a linear regression analysis. Our Stata Fundamentals workshop series is ideal for those with little or no prior experience of using commands in Stata.
In Stata Fundamentals 1 - Importing, inspecting and summarising data, you will learn how to:
- change the working directory.
- import Stata and CSV files.
- print an overview of the variables in a dataset.
- print a dataset and subsets of the data.
- create a summary of continuous variables.
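As a rough illustration of the steps listed above, the sketch below uses the auto dataset that ships with Stata. The folder and file names in the commented-out lines are hypothetical placeholders, not workshop materials.

```stata
* Hypothetical paths and file names: replace them with your own before uncommenting.
* cd "C:/my_project"                      // change the working directory
* use "survey.dta", clear                 // import a Stata dataset
* import delimited "survey.csv", clear    // import a CSV file

* Runnable part: load the example dataset that is installed with Stata.
sysuse auto, clear

describe                        // overview of the variables in the dataset
list make price mpg in 1/5      // print a subset: three variables, first five rows
summarize price mpg weight      // summary statistics for continuous variables
```

Run this from a do-file, because the `//` comment style is not recognised when typing in the Command window.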
In Stata Fundamentals 2 - Do files and computing variables, you will learn how to:
- use do-files to create analysis scripts.
- create frequency tables.
- compute new variables.
- save datasets.
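A minimal do-file sketch of these four skills, again assuming the built-in auto dataset. The new variable names and the output file name auto_edited.dta are made up for illustration.

```stata
* Save these lines in a do-file (for example analysis.do) and run them together.
sysuse auto, clear

tabulate foreign                    // one-way frequency table
tabulate rep78 foreign, missing     // two-way table that also counts missing values

* Compute new variables (names are illustrative).
generate price_k = price / 1000
label variable price_k "Price in thousands of USD"
generate byte heavy = weight > 3000 if !missing(weight)

* Save the edited dataset under a hypothetical file name in the working directory.
save "auto_edited.dta", replace
```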
In Stata Fundamentals 3 - Linear regression, you will learn how to:
- use familiar Stata commands to explore a dataset prior to running a regression analysis.
- create and customise scatter plots.
- run a simple linear regression and read the analysis output.
- create scatter plots for a set of variables using a scatter matrix.
- run a multiple linear regression and read the analysis output.
- create a correlation matrix.
In Stata Fundamentals 4 - Macros and for loops, you will learn how to:
- use local macros to store and retrieve values.
- use foreach to loop over a set of variables.
- apply foreach loops to create multiple plots and new variables.
- use forvalues to loop over a series of numbers.
- apply forvalues loops to select variables based on numerical suffix.
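The following sketch shows one way local macros and the two loop types can fit together. It assumes the built-in auto dataset, and the generated variable names (log_price, x1 and so on) are invented for the example.

```stata
* Run the whole block from a do-file: local macros vanish once a run ends.
sysuse auto, clear

* Store a variable name and a result in local macros, then retrieve them.
local outcome price
summarize `outcome'
local avg = r(mean)
display "Mean of `outcome': " %9.2f `avg'

* foreach: loop over a set of variables to create new variables and plots.
foreach v of varlist price mpg weight {
    generate log_`v' = ln(`v')
    histogram `v', name(hist_`v', replace)
}

* forvalues: loop over a series of numbers to build x1, x2 and x3 ...
forvalues i = 1/3 {
    generate x`i' = `i' * mpg
}

* ... and to select variables by their numerical suffix.
forvalues i = 1/3 {
    summarize x`i'
}
```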
In Stata Fundamentals 5 - If qualifiers, you will learn how to:
- use if qualifiers to run a command on a subset of your data.
- use a single logical expression as an if qualifier.
- use multiple logical expressions as an if qualifier.
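A short sketch of if qualifiers on the built-in auto dataset. The thresholds are arbitrary and chosen only to show the syntax.

```stata
sysuse auto, clear

* Single logical expression: run a command on foreign cars only.
summarize price if foreign == 1
list make price if price > 10000

* Multiple logical expressions: & means "and", | means "or".
count if foreign == 1 & mpg > 25
count if mpg < 15 | mpg > 35

* Stata treats missing values as larger than any number,
* so exclude them explicitly when using > or >=.
summarize price if rep78 >= 4 & !missing(rep78)
```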
The Stata Data Cleaning workshop series will teach you intermediate-level data cleaning techniques to prepare data for statistical analysis in Stata. You will learn what constitutes clean/tidy data in Stata, strategies to identify unclean data, and various techniques to clean data, including dealing with missing, corrupted and illegal values, reshaping between long and wide data formats, and merging data from multiple sources.
In Stata Data Cleaning 1 - Strings, you will learn how to:
- identify numerical values that have been imported as strings.
- modify string values.
- convert strings to numerical values.
- remove whitespace.
- split and combine string values.
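To make these string operations concrete, here is a small sketch on a made-up three-row dataset. The variable names and values are invented, and `strtrim()` assumes Stata 14 or later (older versions use `trim()`).

```stata
* Build a tiny, invented dataset in which income was imported as text.
clear
input str12 income str20 fullname
"1,200"  "  Ada Lovelace "
"950"    "Alan Turing"
"n/a"    " Grace Hopper"
end

describe income                                  // storage type str12 reveals a string

* Modify string values: strip the thousands separator.
replace income = subinstr(income, ",", "", .)

* Convert to numeric; force turns non-numeric text such as "n/a" into missing.
destring income, generate(income_num) force

* Remove leading and trailing whitespace.
replace fullname = strtrim(fullname)

* Split on spaces into name_part1 and name_part2, then recombine.
split fullname, parse(" ") generate(name_part)
generate sorted_name = name_part2 + ", " + name_part1
list
```

The `force` option silently discards anything that is not a number, so it is worth tabulating the original string first to see what will be lost.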
In Stata Data Cleaning 2 - Missing and duplicates, you will learn how to:
- find and delete duplicates.
- identify observations containing missing values.
- correctly label missing values.
- deal with missing values.
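The sketch below walks through these steps on an invented dataset in which -99 is assumed to be a code for "not answered". Dropping incomplete observations is only one of several possible strategies and is shown purely for illustration.

```stata
* Invented data: id 2 appears twice and -99 is an assumed missing-value code.
clear
input id age
1 34
2 -99
2 -99
3 .
4 51
end

* Find and delete observations that are duplicated on every variable.
duplicates report
duplicates drop

* Recode the numeric code -99 to the extended missing value .a.
mvdecode age, mv(-99 = .a)

* Identify variables and observations with missing values.
misstable summarize
list if missing(age)

* One simple way of dealing with them: drop incomplete observations.
drop if missing(age)
```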
In Stata Data Cleaning 3 - Variable and value labels, you will learn how to:
- recognise the importance of labelling variables and values.
- apply variable labels.
- create, apply and modify value labels.
- turn a string variable into a numerical variable with corresponding value labels.
- rename variables using their extracted variable labels.
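A possible sketch of these labelling tasks on the built-in auto dataset. The variable efficient, the value label yesno and the label texts are all invented for the example.

```stata
sysuse auto, clear

* Apply a variable label.
label variable mpg "Mileage (miles per gallon)"

* Create a value label, apply it to a new indicator, then modify it.
generate byte efficient = mpg > 25 if !missing(mpg)
label define yesno 0 "No" 1 "Yes"
label values efficient yesno
label define yesno 1 "Yes (over 25 mpg)", modify
tabulate efficient

* Turn the string variable make into a numeric variable with value labels.
encode make, generate(make_code)

* Rename a variable using its extracted variable label.
* strtoname() converts the label text into a legal Stata name.
label variable efficient "Fuel efficient"
local lbl : variable label efficient
local newname = strtoname("`lbl'")
rename efficient `newname'
describe `newname' make_code
```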
In Stata Data Cleaning 4 - Reshaping data, you will learn how to:
- differentiate between long and wide data formats.
- reshape data from wide to long.
- reshape data with one and multiple identifiers.
- reshape data from long to wide.
- reshape data with string suffixes.
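The reshape cases listed above can be sketched on two tiny invented datasets. The variable names and values are hypothetical.

```stata
* Wide format: one row per id, one income column per year.
clear
input id inc2020 inc2021
1 100 110
2 200 190
end

* Wide to long with a single identifier: one row per id and year.
reshape long inc, i(id) j(year)
list

* Long back to wide.
reshape wide inc, i(id) j(year)
list

* Multiple identifiers and string suffixes (pre/post instead of numbers).
clear
input country id score_pre score_post
1 1 10 12
1 2 20 25
2 1 30 31
end

* The string option tells reshape that the suffixes are text.
reshape long score_, i(country id) j(phase) string
list
```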
In Stata Data Cleaning 5 - Merging data, you will go through a recap of reshaping and also learn how to:
- merge multiple datasets into one using merge.
- use the preserve and restore combination to merge multiple datasets one-by-one.
- use a foreach loop to merge multiple datasets one-by-one.
- use a more elegant solution for a complex reshape by running two reshape commands in sequence.
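As a rough sketch of merging with merge inside a foreach loop, the example below builds three small invented datasets in temporary files and joins them one by one on id. It does not cover the preserve and restore approach or the two-step reshape.

```stata
* Run from a do-file: tempfile names are local macros.
* Create three invented "wave" datasets, each with an id and one score variable.
tempfile wave1 wave2 wave3
forvalues w = 1/3 {
    clear
    set obs 3
    generate id = _n
    generate score`w' = 10 * `w' + id
    save `wave`w''
}

* Start from the first dataset and merge the others one by one.
use `wave1', clear
foreach w in 2 3 {
    merge 1:1 id using `wave`w''
    tabulate _merge          // check how many observations matched
    drop _merge              // remove the indicator before the next merge
}
list
```

Checking `_merge` after every step is a cautious habit: unmatched observations often point to a problem with the identifier rather than with the data.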