Corpora of English
Quick link:
A comprehensive list of bookmarks for corpora, corpus linguistics and corpus tools
→ Corpus-based Linguistics Links
The Santa Barbara Corpus of Spoken American English (SBCSAE)
- Size: 249,000 words
- Variety: American English
- Medium: Spoken English
- Genres: Naturally occurring mostly informal face-to-face conversations
- Sampling period: 1990s
The International Corpus of English (ICE)
- Size: 1 million words per variety
- Variety: Parallel corpus series for over 20 varieties of English
- Medium: Spoken (600,000 words) and written English (400,000 words)
- Genres: Varied (with the same structure for every variety)
- Sampling period: 1991-present
The British National Corpus (BNC)
- Size: 100 million words
- Variety: British English
- Medium: 90% written and 10% spoken language
- Genres: Varied (a wide range of written and spoken genres)
- Sampling period: 1970s-1993
The Vienna-Oxford International Corpus of English (VOICE)
- Size: 1 million words
- Variety: English spoken as a lingua franca by speakers of more than 50 L1s
- Medium: Spoken English
- Genres: Naturally occurring, non-scripted face-to-face interactions
- Sampling period: 2001-2007
The Corpus of Historical American English (COHA)
- Size: 400 million words
- Variety: American English
- Medium: Written English
- Genres: Evenly balanced by genre (fiction, magazine, newspaper, other non-fiction)
- Sampling period: 1810-2009
The Corpus of Contemporary American English (COCA)
- Size: 560 million words (still growing)
- Variety: American English
- Medium: Written and spoken English
- Genres: Evenly balanced by genre (spoken, fiction, popular magazines, newspapers, and academic texts)
- Sampling period: 1990-2017
The TIME Magazine Corpus
- Size: More than 100 million words
- Variety: American English
- Medium: Written English
- Genres: Articles from TIME magazine
- Sampling period: 1923-2006
The Michigan Corpus of Academic Spoken English (MICASE)
- Size: 1.8 million words
- Variety: American English
- Medium: Spoken English
- Genres: Academic English (lectures, seminars, counselling sessions, etc.)
- Sampling period: 1997-2001