Text and data mining

Analyse large-scale text or datasets in your research

Data mining is the process of applying open-ended computational methods to large-scale datasets to discover insights that targeted, smaller-scale analyses may not reveal. When the datasets are bodies of text, the process is often termed text mining, and it can complement traditional close readings of texts. Text and data mining (TDM) approaches can open up new areas of scholarly enquiry.
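
To make the idea concrete, here is a minimal sketch of one of the simplest text mining techniques: counting term frequencies across a small corpus. The three sample sentences and the function names (tokenize, term_frequencies) are illustrative assumptions and are not drawn from any Library data source or tool. A real project would load licensed content and would usually add steps such as stop-word removal.

```python
import re
from collections import Counter

# Hypothetical three-document corpus, used for illustration only.
corpus = [
    "The library supports text mining research.",
    "Text mining can complement close reading of texts.",
    "Data mining applies computational methods to large datasets.",
]


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(documents):
    """Count how often each token occurs across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts


frequencies = term_frequencies(corpus)
top_terms = frequencies.most_common(3)
print(top_terms)
```

Even this simple count surfaces "mining" and "text" as the dominant terms in the sample. Applied to thousands of documents, the same approach can reveal patterns that would be impractical to find through close reading alone.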

Before you start

Before you get started with TDM make sure that you:

  • Understand and have considered any issues around copyright and licensing conditions for the content that you wish to use
  • Understand and have considered any ethical concerns that might arise from your use of the content, particularly when linking datasets or working with sensitive information
  • Comply with data providers’ preferences for how to access their content

Further information about these considerations can be found on the step by step guide to text and data mining.

Library licensed data sources and tools

Text and data mining is permitted in a number of the databases that the Library provides to University staff and students. Check the full list of databases available for text and data mining, including licence and access conditions, to see which might be useful for your project. Some data sources take considerable time and work to apply for and access, and their data may need preparation before it is ready for mining, so factor this into your project timelines. Please note that Factiva doesn’t allow text and data mining.

The Library provides access to the Gale Digital Scholar Lab. The Lab allows you to clean some Gale Primary Sources content and apply TDM methods to it, or to upload content from elsewhere. No programming is required to use the Gale Digital Scholar Lab.

Help with TDM

Get started by checking out the step by step guide to text and data mining.

Library

The Library can support you with:

  • Understanding text and data mining concepts
  • Finding out which library licensed data sources can be mined
  • Advice on forming a search strategy for corpora creation
  • Using the Gale Digital Scholar Lab

Contact your Academic Liaison Librarian or email researchdatasupport@sydney.edu.au for assistance.

Sydney Informatics Hub

The Sydney Informatics Hub provides free training courses, from introductory to advanced level, including courses on programming and collecting web data for research. You can see all available training courses on the Sydney Informatics Hub website.