Corpora of English

Quick link:

A comprehensive list of bookmarks for corpora, corpus linguistics and corpus tools


Corpus-based Linguistics Links

 

The Santa Barbara Corpus of Spoken American English (SBCSAE)

  • Size: 249,000 words
  • Variety: American English
  • Medium: Spoken English
  • Genres: Naturally occurring, mostly informal face-to-face conversations
  • Sampling period: 1990s

The International Corpus of English (ICE)

  • Size: 1 million words per variety
  • Variety: Parallel corpus series for over 20 varieties of English
  • Medium: Spoken (600,000 words) and written English (400,000 words)
  • Genres: Varied (with the same structure for every variety)
  • Sampling period: 1991-present

The British National Corpus (BNC)

  • Size: 100 million words
  • Variety: British English
  • Medium: 90% written and 10% spoken language
  • Genres: Varied
  • Sampling period: 1970s - 1993

The Vienna-Oxford International Corpus of English (VOICE)

  • Size: 1 million words
  • Variety: English spoken as a lingua franca by speakers of more than 50 L1s
  • Medium: Spoken English
  • Genres: Naturally occurring, non-scripted face-to-face interactions
  • Sampling period: 2001-2007

The Corpus of Historical American English (COHA)

  • Size: 400 million words
  • Variety: American English
  • Medium: Written English
  • Genres: Evenly balanced by genre (fiction, magazine, newspaper, other non-fiction)
  • Sampling period: 1810-2009

The Corpus of Contemporary American English (COCA)

  • Size: 560 million words (still growing)
  • Variety: American English
  • Medium: Written and spoken English
  • Genres: Evenly balanced by genre (spoken, fiction, popular magazines, newspapers, and academic texts)
  • Sampling period: 1990-2017

The TIME Magazine Corpus

  • Size: More than 100 million words
  • Variety: American English
  • Medium: Written English
  • Genres: Articles from TIME magazine
  • Sampling period: 1923-2006

The Michigan Corpus of Academic Spoken English (MICASE)

  • Size: 1.8 million words
  • Variety: American English
  • Medium: Spoken English
  • Genres: Academic English (lectures, seminars, counselling sessions, etc.)
  • Sampling period: 1997-2001
(Changed: 20 Jun 2024)