Text Analysis Tools

This page outlines the resources available for text analysis, covering both programmatic and non-programmatic approaches. The flow diagram below should help you decide which approach is best for you and which software you can use.

If you need any help, we are running daily drop-in sessions via Teams. If you need help with a specific tool not listed, please email digital.skills.lab@lse.ac.uk.

[Flow diagram: choosing a text analysis approach and software]

Text Analysis Tools for Qualitative Data

Qualitative data is non-numerical data, such as text, video, photographs or audio recordings. This type of data can be collected using diary accounts or in-depth interviews, and analysed using grounded theory or thematic analysis. See the Data Collection Tools section for guidance on collecting data.

NVivo

NVivo is commercial software that provides tools for qualitative and mixed methods analysis. Features of NVivo include:

  • Management and analysis of textual, audio, video, image, email, and spreadsheet data
  • Tools for automatic identification of themes and sentiments
  • Text and coding query tools, as well as cross-tabulation
  • Mind, concept, and project maps
  • Visualisations such as word clouds, tree and sunburst maps, cluster maps, and bar or pie charts

The Digital Skills Lab has commissioned a self-study Introduction to NVivo Moodle course, which covers the basic steps needed to set up an NVivo project, add textual data and code it. The course consists of a series of videos together with step-by-step guides, some sample data and a completed NVivo project. QSR have created a number of YouTube playlists of tutorials and webinars for multiple versions of NVivo for both Mac and Windows.

Please contact DTS to request a personal license. 

QDA Miner

QDA Miner is commercial software for mixed methods analysis. It provides quantitative tools for analysing qualitative data, alongside the qualitative analysis tools common to software like NVivo. QDA Miner's features include:

  • Powerful and flexible tools to assist with coding and dealing with coded data
  • Easy-to-use text searching and writing tools
  • Easy-to-use exporting features (for example, exporting data, visualisations, or tables)
  • Geocoding, allowing the linking of geographical locations and corresponding temporal dimensions
  • Wide variety of visualisation types and styles
  • Add-on features such as WordStat
  • Support for multiple data types and formats, such as text, images, databases, reference managers (for example EndNote, Mendeley or Zotero), web survey data, and online sources such as social media

Provalis Research have a number of free video tutorials and demos that will help you familiarize yourself with the basics and some advanced features of QDA Miner.   

QDA Miner is available on the LSE Remote Desktop. Please contact DTS to request a personal license.

Quantitative Data

Quantitative research involves the process of objectively collecting and analyzing numerical data to describe, predict, or control variables of interest.

WordStat

WordStat is an add-on to QDA Miner that provides sophisticated content analysis and text mining tools. WordStat is ideal for those looking to quickly extract and analyse information from large amounts of data. Some of its features include:

  • Text mining tools that provide fast extraction of themes and patterns
  • Ability to handle multiple different data types, such as documents, online sources, data files, web surveys, and reference managers
  • Topic modeling
  • Able to relate text with structured data
  • Make or use dictionaries to categorise words or phrases
  • Integration with machine learning models such as K-Nearest Neighbours
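
To give a feel for what dictionary-based categorisation means in practice, here is a minimal sketch in plain Python. It is not WordStat's implementation or interface; the category names, word lists and the `categorise` function are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical dictionary: category name -> words that signal that category
CATEGORIES = {
    "economy": {"tax", "budget", "inflation"},
    "health": {"hospital", "nurse", "vaccine"},
}


def categorise(text, categories=CATEGORIES):
    """Count how many tokens in text fall under each dictionary category."""
    # Lowercase the text and split it into simple word tokens
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for name, words in categories.items():
            if token in words:
                counts[name] += 1
    return dict(counts)


doc = "The budget raises tax on tobacco to fund one new hospital."
print(categorise(doc))  # {'economy': 2, 'health': 1}
```

A real content analysis dictionary would usually also handle phrases, word stems and exclusions, which this sketch leaves out.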

Provalis Research have a number of free video tutorials and demos that will help you familiarize yourself with the basics and some advanced features of WordStat. 

Please contact DTS to request a personal license.

 

Text Analysis with Python

If you're not familiar with Python but would like to learn it for text analysis, the Digital Skills Lab offers online taught workshops that are run weekly during Michaelmas and Lent terms, and on an ad hoc basis the rest of the year. If there are no workshops running or you would prefer to study in your own time, the Digital Skills Lab have curated a number of free resources, which can be found in the Python Collection on Moodle.

NLTK

Python is an open-source, general-purpose programming language with many software extensions known as libraries. A common choice for text analysis in Python is the Natural Language Toolkit (NLTK) library. NLTK is a free, open-source platform for building Python programs that work with human language data.
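
As a first taste of NLTK, the sketch below counts the most frequent words in a short sample sentence. It uses NLTK's `RegexpTokenizer` and `FreqDist`, which need no extra data downloads. The `tokenize` and `top_words` helpers and the sample text are our own illustrative names, and the fallback to Python's standard library is an assumption added so the sketch still runs if NLTK is not installed.

```python
import re
from collections import Counter

try:
    # NLTK is a third-party library (install with: pip install nltk)
    from nltk import FreqDist
    from nltk.tokenize import RegexpTokenizer

    def tokenize(text):
        # Split lowercased text into runs of word characters
        return RegexpTokenizer(r"\w+").tokenize(text.lower())
except ImportError:
    # Fallback (an assumption for this sketch): FreqDist builds on Counter,
    # so the standard library can stand in when NLTK is unavailable
    FreqDist = Counter

    def tokenize(text):
        return re.findall(r"\w+", text.lower())


def top_words(text, n=3):
    """Return the n most frequent word tokens in text with their counts."""
    return FreqDist(tokenize(text)).most_common(n)


sample = "The cat sat on the mat. The dog sat on the log."
print(top_words(sample))
```

Real projects usually go further, for example by removing common stop words before counting, which NLTK supports through additional downloadable data.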

The NLTK creators have a free open source tutorial and documentation on what you can do with the NLTK library. The Digital Skills Lab are working on an introductory course and we recommend you sign up to the newsletter so you are notified once it is available.

ProQuest TDM Studio

The LSE Library has secured a licence to use ProQuest TDM Studio. ProQuest TDM Studio is a platform for text and data mining across the Library's current ProQuest subscriptions. Features of the ProQuest TDM Studio platform include:  

  • Access data from the ProQuest library of resources, which includes videos, books, journal articles, and newspapers  
  • Use pre-made scripts to extract the data you want 
  • Either extract the data to use in a different platform, such as NVivo or QDA Miner, or do analysis within the TDM Studio environment using R and Python  
  • The workbench dashboard feature allows analysis to be done across the majority of the Library's ProQuest subscriptions using R and Python 
  • Visualisation dashboard has a graphical user interface with pre-built visualisations 

ProQuest TDM Studio is available by request only. To request an account, contact the data library.

 

Text Analysis with R

R is an open-source programming language for statistical computing and graphics, with many software extensions known as libraries or packages. Two packages built for text analysis with R are quanteda and tidytext; information on each is below.

Not sure whether to use quanteda or tidytext? If speed is an important consideration, quanteda is the package to go for. Both packages work well with the tidyverse, for example through tibbles and pipes.

If you’re not familiar with R but would like to learn it for text analysis, the Digital Skills Lab offers online taught workshops that are run weekly during Michaelmas and Lent terms, and on an ad hoc basis the rest of the year. If there are no workshops running or you would prefer to study in your own time, the Digital Skills Lab have curated a number of free resources, which can be found on Moodle, such as Codecademy’s Learn R course. The Digital Skills Lab would also recommend the R for Data Science book, which is available for free online.

Quanteda

Developed by Professor Ken Benoit, Director of the LSE Data Science Institute, the quanteda package has a useful set of tools for processing and analysing textual data. These free quanteda tutorials will help you to:

  • Learn how to import various types of text data into R
  • Learn basic operations in quanteda
  • Learn how to perform statistical analysis using quanteda
  • Learn how to combine operations and analysis in quanteda
  • Learn how to derive latent positions from text data and how to classify documents
  • Learn pre-processing of texts in different languages

Tidytext

Developed by Julia Silge, Data Scientist for RStudio, the tidytext package, like quanteda, has a useful set of tools for processing and analysing textual data. These free tidytext tutorials will help you:

  • Practice important data handling skills with textual data
  • Learn about the ways text analysis can be applied
  • Extract relevant insights from real-world data

The tutorial is organised into four case studies, each with its own data set:

  • Transcripts of TED talks
  • A collection of comedies and tragedies by Shakespeare
  • One month of newspaper headlines
  • Song lyrics spanning five decades

Further support can be found in the Tidy Text Mining book, which is available online.
