The list is constantly updated.
Strictly speaking, some of the resources listed here are not corpora but archives, databases, or even dictionaries. Because some entries are collections of corpora, overlaps in my classification are inevitable, and some resources naturally belong to more than one category.
1. Generic Corpora
Corpus of Global Web-Based English (GloWbE): http://corpus2.byu.edu/glowbe/
COCA:http://www.americancorpus.org/ http://www.wordandphrase.info/frequencyList.asp
COHA: http://corpus.byu.edu/coha/
Download N-Grams from COCA and COHA: http://www.ngrams.info/
BYU-TIME:http://corpus.byu.edu/time/
Bank of English (BoE): http://www.collinslanguage.com/wordbanks/ (1-month free trial)
British National Corpus(BNC)
BYU-BNC:http://corpus.byu.edu/bnc/
SlopeQ BNC: http://pelcra.pl/res
JustTheWord:http://www.just-the-word.com/
BNCweb:http://bncweb.lancs.ac.uk/bncwebSignup/user/login.php
BNC Sampler:
http://cqpweb.lancs.ac.uk/bncsampler/ contact: a.hardie@lancaster.ac.uk
Lexchecker: http://www.lexchecker.org/
IWILL Corpus: http://research.iwillnow.org/project/bncrce/default.htm
Wordneighbors: http://wordneighbors.ust.hk/
Leeds Corpora: http://corpus.leeds.ac.uk/protected/query.html
Sketch Engine Corpora: http://www.sketchengine.co.uk/ (1-month free trial)
Includes the BNC, the TenTen corpora, and more
Phrases in English (PIE): http://phrasesinenglish.org/
A. Search the BNC for concordances: http://phrasesinenglish.org/searchBNC.html
B. N-Grams: http://phrasesinenglish.org/explore.html
C. Phrase Frames: http://phrasesinenglish.org/explorep.html
D. POS-Grams: http://phrasesinenglish.org/explorepg.html
E. Chargrams: http://phrasesinenglish.org/explorec.html
Corpuseye: http://corp.hum.sdu.dk/cqp.en.html
Lextutor: http://www.lextutor.ca/concordancers/concord_e.html
BNC Simple Search: http://www.natcorp.ox.ac.uk/
Strathy Corpus Of Canadian English: http://corpus2.byu.edu/can/
Australian National Corpus (ANC): http://www.ausnc.org.au/corpora/ausnc
Scottish Corpus Of Texts and Speech (SCOTS):http://www.scottishcorpus.ac.uk/
Corpus Of Modern Scottish Writing (CMSW):http://www.scottishcorpus.ac.uk/cmsw/
Corpus of Electronic Texts (CELT): http://www.ucc.ie/celt/
Open American National Corpus:http://www.anc.org/da
Brown/LOB Corpus: full version
http://ec-concord.ied.edu.hk/paraconc/monoconcE.htm
http://vlc.polyu.edu.hk/concordance/WWWconcappE.htm
Brown Corpus: http://124.193.83.252/cqp/brown1/ ID: test password: test
Brown Corpus: full version
http://the.sketchengine.co.uk/open/corpus/brown/ske/first_form
Brown Corpus: All Brown 15 sublists
http://www.lextutor.ca/range/range_corpus/
http://211.86.103.26/linwei_web/application/conc/index.php
CLOB Corpus: http://124.193.83.252/cqp/clob/ name: test password: test
Crown Corpus: http://124.193.83.252/cqp/crown/ name: test password: test
American English 2006 (AmE06): http://cqpweb.lancs.ac.uk/ame06/ contact: a.hardie@lancaster.ac.uk
British English 2006 (BE06): http://cqpweb.lancs.ac.uk/be2006/ contact: a.hardie@lancaster.ac.uk
National Taiwan Normal University Corpora:
http://llrc.eng.ntnu.edu.tw/English/search/Default.htm
International Corpus of English (ICE): http://ice-corpora.net/ice/avail.htm
http://nltk.googlecode.com/svn/trunk/nltk_da
Frown Corpus (download): http://ishare.iask.sina.com.cn/f/11862592.html?from=like
FLOB Corpus (download): http://ishare.iask.sina.com.cn/f/11862602.html?from=like
http://www.fleric.org.cn/powerconc/
2. Parallel/Comparable Corpora
Chinese-English and English-Chinese
General
Lu Wei 卢伟: http://www.luweixmu.com/ec-corpus/index.htm
Zhishan 至善: http://www.superfection.com/
Babel:http://corpus.nie.edu.sg/cgi-bin/babel/paraconc.pl
CEO:http://www.fleric.org.cn/ceo/
HKIED: http://ec-concord.ied.edu.hk/paraconc/index.htm
Novels
Hong Lou Meng 红楼梦: http://corpus.usx.edu.cn/hongloumeng/index.asp
Hong Lou Meng 红楼梦: http://corpus.nie.edu.sg/hlm/index.htm#
Hong Lou Meng 红楼梦: http://www.superfection.com/ click on art
Romance of the Three Kingdoms 三国演义: http://corpus.usx.edu.cn/sanguo/index.asp
Journey to the West 西游记: http://corpus.usx.edu.cn/xiyouji/index.asp
Lu Xun's Fiction 鲁迅小说: http://corpus.usx.edu.cn/luxun/index.asp
Water Margin 水浒传: http://corpus.usx.edu.cn/shuihu/index.asp
Romance of the Western Chamber 西厢记: http://corpus.usx.edu.cn/xixiangji/index.asp
Works of Political Leaders
Selected Works of Deng Xiaoping 邓小平文选: http://corpus.usx.edu.cn/dengxiaoping/index.asp
Selected Works of Mao Zedong 毛泽东选集: http://corpus.usx.edu.cn/maozedong/index.asp
Classics
The Great Learning 大学: http://corpus.usx.edu.cn/daxue/index.asp
Laozi 老子: http://corpus.usx.edu.cn/laozi/index.asp
I Ching 易经: http://corpus.usx.edu.cn/yijing/index.asp
Law
Chinese Laws and Regulations 中国法律法规 (mainland China):
http://corpus.usx.edu.cn/lawcorpus1/index.asp
Chinese Laws and Regulations 中国法律法规 (mainland China):
http://corpus.nie.edu.sg/law/index.htm
Chinese Laws and Regulations 中国法律法规 (Taiwan):
http://corpus.usx.edu.cn/lawcorpus2/index.asp
Chinese Laws and Regulations 中国法律法规 (Hong Kong):
http://corpus.usx.edu.cn/lawcorpus3/index.asp
Hong Kong Hansard:
http://langbank.engl.polyu.edu.hk/Concordance/ParallelTexts/default.htm
Record of HK Legislative Council: http://candle.fl.nthu.edu.tw/totalrecall/totalrecall/totalrecall.aspx
Hong Kong News & Law: http://candle.fl.nthu.edu.tw/collocation/webform2.aspx
Miscellaneous
Corpus of Newspaper Advertisements: http://corpus.nie.edu.sg/ads/index.htm
Parallel Corpus of Political Speeches (CSLG) 汉英政治平行语料库: http://pcpt.cslg.cn/
TED Speeches:
Chinese-English: http://124.193.83.252/cqp/tedctoe/
English-Chinese: http://124.193.83.252/cqp/tedetoc2/
English-Chinese Bilingual Corpus Collection 英汉双语语料汇集:
http://corpus.usx.edu.cn/lawcorpus4/index.asp
Jukuu:http://www.jukuu.com/
Bing Dict: http://dict.bing.com.cn
Dict: http://dict.cn/
English-Chinese Bilingual Sentence Bank for Biomedicine 生物医药专业英汉双语句库: http://dict.bioon.com/sentence/
English-Non-Chinese
Open Parallel Corpus (OPUS): http://opus.lingfil.uu.se/
Korean/English Parallel Concordancer (MOA):
http://arts.monash.edu.au/korean/moa/show.php
KAIST Corpus: http://semanticweb.kaist.ac.kr/home/index.php/KAIST_Corpus
COMPARA: Parallel Corpus of English and Portuguese:
http://www.linguateca.pt/COMPARA/Welcome.html
Technical Scientific Corpus in English and Portuguese:
http://www.nilc.icmc.usp.br/cortec/ibusca.php
English-Japanese Parallel Corpora:
http://www.manythings.org/corpus/
CLUVI Parallel Corpus: http://sli.uvigo.es/CLUVI/index_en.html#correo
Polish-English Parallel Corpora: http://pelcra.pl/res
MSC Concordancer: http://multisemcor.fbk.eu/frameset2.php
JRC-Acquis corpus: http://langtech.jrc.it/JRC-Acquis.html
A Six-Language Parallel Corpus:http://www.uncorpora.org/
The Unbound Bible: http://unbound.biola.edu/
European Parliament Proceedings Parallel Corpus 1996-2006:
http://www.statmt.org/europarl/
English-Russian Parallel Corpus: http://www.ruscorpora.ru/search-para.html
English-Inuktitut Parallel Corpus: http://www.inuktitutcomputing.ca/NunavutHansard/en/index.html
EVROKORPUS Parallel Corpora:
http://evrokorpus.gov.si/index.php?jezik=angl
WebTCE (Translation Corpus Explorer)
http://khnt.hit.uib.no/webtce.htm
German(-English) parallel corpora (Europarl and German News)
http://corpus.leeds.ac.uk/paraquery.html
German-English Address Corpus: http://www.nlpado.de/~sebastian/da
Lextutor: http://www.lextutor.ca/concordancers/
Natura corpora: http://linguateca.di.uminho.pt/nat/nat.pl
MyMemory: http://mymemory.translated.net/
Termsearch
http://www.bible-study-in-geneva.info/termsearch/
Linguee: http://www.linguee.com/
WeBiText: http://webitext.com/
Linguatools: http://www.linguatools.de/
Slovene-English Parallel Corpus: http://nl.ijs.si/elan/
TEP: Tehran English-Persian Parallel Corpus: http://ece.ut.ac.ir/NLP/resources.htm
Japanese-English Corpus of Presentations in Science and Engineering:
http://www.jecprese.sci.waseda.ac.jp/index.aspx
Corpus of Multilingual Texts (Little Prince): http://langbank.engl.polyu.edu.hk/corpus/little_prince.html
3. Business and Financial Corpora
Corpus of Business Correspondence:
http://langbank.engl.polyu.edu.hk/corpus/business_correspondence.html
PolyU Business Corpus:
http://langbank.engl.polyu.edu.hk/engine.aspx?Submit=Search&lang=1&corpus=30
Hong Kong Financial Services Corpus:
http://langbank.engl.polyu.edu.hk/hkfsc/
Learner Corpus of English for Business Communication:
http://langbank.engl.polyu.edu.hk/engine.aspx?Submit=Search&lang=1&corpus=15
SCMP Corpus of Business Reports: http://langbank.engl.polyu.edu.hk/engine.aspx?Submit=Search&lang=1&corpus=32
Business Letter Corpus:
http://www.someya-net.com/concordancer/
4. Literary/Historical Corpora
COHA:http://corpus.byu.edu/coha/
Hong Lou Meng Corpus: http://124.193.83.252/cqp/hlmyangs/ name: test password: test
CAPA (Contemporary American Poetry Archive):
SETIS Australian Literary and Historical Texts:
http://setis.library.usyd.edu.au/oztexts/search.html
Corpus of Middle English Prose and Verse:
http://quod.lib.umich.edu/c/cme/
Web Concordances:
http://www.concordancesoftware.co.uk/webconcordances/
Chaucer:
http://www.umm.maine.edu/faculty/necastro/chaucer/concordance/
Sherlockian: http://www.sherlockian.net/
Dickens: http://124.193.83.252/cqp/dickens/ name: test password:test
The Complete Corpus of Anglo-Saxon Poetry:
http://www.sacred-texts.com/neu/ascp/
Bartleby: http://www.bartleby.com/
Internet Classics Archive: http://classics.mit.edu/index.html
Concordance of Shakespeare's complete works:http://www.opensourceshakespeare.org/concordance/
Shakespeare's Words: http://www.shakespeareswords.com/
Shakespeare's Sonnets Corpus:
http://www.luweixmu.com/ecorpus/sonnets/framconc.asp
Alex Catalogue of Electronic Texts:http://infomotions.com/alex/
Corpus of Electronic Texts (CELT):
http://www.ucc.ie/celt/search.html
Modern English Collection at the University of Virginia Electronic Text Center:
http://etext.lib.virginia.edu/etcbin.../modengpub.o2w
OED: www.oed.com user’s name:Coastline password:Oed789
Middle English Dictionary: http://quod.lib.umich.edu/m/med/
MEMEM (Michigan Early Modern English Materials):
http://www.hti.umich.edu/m/memem/
American Civil War Collection at the Electronic Text Center:
http://etext.virginia.edu/civilwar/#Letters
Zurich English Newspaper Corpus:http://www.helsinki.fi/varieng/CoRD/corpora/ZEN/index.html
Corpus of Late Modern English Texts: https://perswww.kuleuven.be/~u0044428/
Lu Wei 卢伟: http://www.luweixmu.com/ecorpus/index.htm