
jinchangge's Blog

Fun College English

 
 
 


 
 

A Collection of Free English Corpora: Open English Corpora (1)

2010-06-28 18:06:45 | Category: Corpora

The list is constantly updated.

Strictly speaking, some of these are not corpora but archives, databases, or even dictionaries. Because some entries are collections of corpora, overlaps in my classification are inevitable, and some resources naturally belong to more than one category.

1. Large/Generic Corpora

Corpus of Global Web-Based English (GloWbE): http://corpus2.byu.edu/glowbe/

Corpus of Contemporary American English (COCA): http://www.americancorpus.org/   http://www.wordandphrase.info/frequencyList.asp

Corpus of Historical American English (COHA): http://corpus.byu.edu/coha/

Download N-Grams from COCA and COHA: http://www.ngrams.info/

BYU-TIME: http://corpus.byu.edu/time/

Bank of English (BoE): http://www.collins.co.uk/page/Wordbanks+Online (1-month free trial)

Oxford English Corpus (OEC), freely available to a few researchers:

http://www.oxforddictionaries.com/us/words/the-oxford-english-corpus

International Corpus of English (ICE): http://128.95.69.131/itweb-ice/htdocs/Query.html

http://ice-corpora.net/ICE/INDEX.HTM


British National Corpus (BNC)

BYU-BNC: http://corpus.byu.edu/bnc/

SlopeQ BNC: http://212.191.73.200/PPHome/corpora/bnc.jsp

BNCweb: http://bncweb.lancs.ac.uk/bncwebSignup/user/login.php

BNC Sampler: http://cqpweb.lancs.ac.uk/bncsampler/  contact: a.hardie@lancaster.ac.uk

BNC Sampler: http://www.lextutor.ca/concordancers/concord_e.html

JustTheWord: http://www.just-the-word.com/
StringNet: http://nav.stringnet.org/

Wordneighbors: http://wordneighbors.ust.hk/

Leeds Corpora: http://corpus.leeds.ac.uk/protected/query.html

Sketch Engine Corpora: http://www.sketchengine.co.uk/ (1-month free trial; includes the BNC, the TenTen corpora, and more)

Phrases in English (PIE):  http://phrasesinenglish.org/

A. Search the BNC for concordances: http://phrasesinenglish.org/searchBNC.html

B. N-Grams: http://phrasesinenglish.org/explore.html

C. Phrase Frames: http://phrasesinenglish.org/explorep.html

D. POS-Grams: http://phrasesinenglish.org/explorepg.html

E. Chargrams: http://phrasesinenglish.org/explorec.html

Corpuseye: http://corp.hum.sdu.dk/cqp.en.html

BNC Simple Search: http://www.natcorp.ox.ac.uk/  

Audio BNC: http://www.phon.ox.ac.uk/AudioBNC


Strathy Corpus Of Canadian English: http://corpus2.byu.edu/can/

Australian National Corpus (ANC): http://www.ausnc.org.au/corpora/ausnc

Scottish Corpus Of Texts and Speech (SCOTS): http://www.scottishcorpus.ac.uk/

Corpus Of Modern Scottish Writing (CMSW): http://www.scottishcorpus.ac.uk/cmsw/

Open American National Corpus: http://www.anc.org/data/oanc/ngram/

Open American National Corpus: http://www.anc.org/data/oanc/download/


Corpora of the Brown Family: http://www.sketchengine.co.uk/

Brown/LOB Corpus (full version):

http://ec-concord.ied.edu.hk/paraconc/monoconcE.htm

Brown Corpus: http://124.193.83.252/cqp/brown1/  ID: test password: test

Brown Corpus (full version):

http://the.sketchengine.co.uk/open/corpus/brown/ske/first_form

Brown Corpus (full version):

http://www.icorpus.net/

Brown Corpus: All Brown 15 sublists

http://www.lextutor.ca/range/range_corpus/

Online Corpus Concordancer: 

http://211.86.103.26/linwei_web/application/conc/index.php

CLOB Corpus: http://124.193.83.252/cqp/clob/  name: test  password: test 

Crown Corpus: http://124.193.83.252/cqp/crown/   name: test  password: test 


American English 2006 (AmE06): http://cqpweb.lancs.ac.uk/ame06/    contact: a.hardie@lancaster.ac.uk

British English 2006 (BE06): http://cqpweb.lancs.ac.uk/be2006/    contact: a.hardie@lancaster.ac.uk 

Birmingham Blog Corpus (BBC): http://wse1.webcorp.org.uk/cgi-bin/BLOG/index.cgi

Synchronic English Web Corpus: http://wse1.webcorp.org.uk/cgi-bin/SYN/index.cgi


National Taiwan Normal University Corpora:
http://llrc.eng.ntnu.edu.tw/English/search/Default.htm

Shanghai Jiaoda Corpus: http://corpus.sjtu.edu.cn/WebCast/   click on "guest"

NLTK Corpora: 

http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml

Frown Corpus (download): http://ishare.iask.sina.com.cn/f/11862592.html?from=like

FLOB Corpus (download): http://ishare.iask.sina.com.cn/f/11862602.html?from=like

http://www.fleric.org.cn/powerconc/

2. Parallel/Comparable Corpora

Chinese-English and English-Chinese

General 

Lu Wei (卢伟): http://www.luweixmu.com/ec-corpus/index.htm

Superfection (至善): http://search.superfection.com/

Babel: http://124.193.83.252/cqp/babel1/ (en-ch) ID: test  password: test

Babel: http://124.193.83.252/cqp/babel1c/ (ch-en) ID: test  password: test

http://corpus.nie.edu.sg/cgi-bin/babel/paraconc.pl
CEO: http://www.fleric.org.cn/ceo/

HKIED: http://ec-concord.ied.edu.hk/paraconc/index.htm

Novels 

Hong Lou Meng 红楼梦: http://corpus.usx.edu.cn/hongloumeng/index.asp 

Hong Lou Meng 红楼梦: http://corpus.nie.edu.sg/hlm/index.htm#

Hong Lou Meng 红楼梦: http://www.superfection.com/ click on art

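Romance of the Three Kingdoms (三国演义): http://corpus.usx.edu.cn/sanguo/index.asp

Journey to the West (西游记): http://corpus.usx.edu.cn/xiyouji/index.asp

Lu Xun's fiction (鲁迅小说): http://corpus.usx.edu.cn/luxun/index.asp

Water Margin (水浒传): http://corpus.usx.edu.cn/shuihu/index.asp

Romance of the Western Chamber (西厢记): http://corpus.usx.edu.cn/xixiangji/index.asp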

Works by Major Political Figures

Selected Works of Deng Xiaoping (邓小平文选): http://corpus.usx.edu.cn/dengxiaoping/index.asp

Selected Works of Mao Zedong (毛泽东选集): http://corpus.usx.edu.cn/maozedong/index.asp

Classics 

The Great Learning (大学): http://corpus.usx.edu.cn/daxue/index.asp

Laozi (老子): http://corpus.usx.edu.cn/laozi/index.asp

I Ching (易经): http://corpus.usx.edu.cn/yijing/index.asp

Law 

Chinese laws and regulations (mainland China):

http://corpus.usx.edu.cn/lawcorpus1/index.asp

Chinese laws and regulations (mainland China):

http://corpus.nie.edu.sg/law/index.htm

Chinese laws and regulations (Taiwan):

http://corpus.usx.edu.cn/lawcorpus2/index.asp

Chinese laws and regulations (Hong Kong):

http://corpus.usx.edu.cn/lawcorpus3/index.asp

Hong Kong Hansard:  

http://langbank.engl.polyu.edu.hk/Concordance/ParallelTexts/default.htm

http://140.122.83.190/bisearch/bisearch3.pl

Record of HK Legislative Council: http://candle.fl.nthu.edu.tw/totalrecall/totalrecall/totalrecall.aspx

Hong Kong News & Law: http://candle.fl.nthu.edu.tw/collocation/webform2.aspx


Miscellaneous  

Corpus of Newspaper Advertisements: http://corpus.nie.edu.sg/ads/index.htm

Parallel Corpus of Political Speeches (CSLG), Chinese-English: http://pcpt.cslg.cn/

TED Speeches, Chinese-English: http://124.193.83.252/cqp/tedctoe/

TED Speeches, English-Chinese: http://124.193.83.252/cqp/tedetoc2/

 

Collection of English-Chinese bilingual corpora:

http://corpus.usx.edu.cn/lawcorpus4/index.asp

National Corpus of Public Sign Translation (全国公示语翻译语料库): http://www.bisu.edu.cn/Item/10115.aspx

                                     

Jukuu: http://www.jukuu.com/

Bing Dict: http://dict.bing.com.cn

Dict: http://dict.cn/

English-Chinese bilingual sentence bank for biomedicine (生物医药专业英汉双语句库): http://dict.bioon.com/sentence/

 

English-Non-Chinese

Open Parallel Corpus (OPUS): http://opus.lingfil.uu.se/

Korean/English Parallel Concordancer (MOA):  

http://arts.monash.edu.au/korean/moa/show.php

KAIST Corpus: http://semanticweb.kaist.ac.kr/home/index.php/KAIST_Corpus

COMPARA: Parallel Corpus of English and Portuguese: 

http://www.linguateca.pt/COMPARA/Welcome.html

Technical Scientific Corpus in English and Portuguese: 

http://www.nilc.icmc.usp.br/cortec/ibusca.php

English-Japanese Parallel Corpora: 

http://www.manythings.org/corpus/

CLUVI Parallel Corpus: http://sli.uvigo.es/CLUVI/index_en.html#correo

Polish-English Parallel Corpora: http://pelcra.pl/res

MSC Concordancer: http://multisemcor.fbk.eu/frameset2.php

JRC-Acquis corpus: http://langtech.jrc.it/JRC-Acquis.html

A Six-Language Parallel Corpus: http://www.uncorpora.org/

The Unbound Bible: http://unbound.biola.edu/

European Parliament Proceedings Parallel Corpus 1996-2006:

http://www.statmt.org/europarl/

English-Russian Parallel Corpus
http://www.ruscorpora.ru/search-para.html
English-Inuktitut Parallel Corpus
http://www.inuktitutcomputing.ca/NunavutHansard/en/index.html
EVROKORPUS Parallel Corpora

http://evrokorpus.gov.si/index.php?jezik=angl

WebTCE (Translation Corpus Explorer)

http://khnt.hit.uib.no/webtce.htm

German(-English) parallel corpora (Europarl and German News)

http://corpus.leeds.ac.uk/paraquery.html

German-English Address Corpus: http://www.nlpado.de/~sebastian/data/tv_data.shtml

Lextutor: http://www.lextutor.ca/concordancers/

Natura corpora: http://linguateca.di.uminho.pt/nat/nat.pl
MyMemory: http://mymemory.translated.net/

Termsearch

http://www.bible-study-in-geneva.info/termsearch/

Linguee: http://www.linguee.com/

WeBiText: http://webitext.com/
Linguatools: http://www.linguatools.de/
Slovene-English Parallel Corpus:
http://nl.ijs.si/elan/
TEP: Tehran English-Persian Parallel Corpus:
http://ece.ut.ac.ir/NLP/resources.htm

Japanese-English Corpus of Presentations in Science and Engineering:

http://www.jecprese.sci.waseda.ac.jp/index.aspx

Corpus of Multilingual Texts (Little Prince): http://langbank.engl.polyu.edu.hk/corpus/little_prince.html

 

3. Business and Financial Corpora

Corpus of Business Correspondence:

http://langbank.engl.polyu.edu.hk/corpus/business_correspondence.html

PolyU Business Corpus: 

http://langbank.engl.polyu.edu.hk/engine.aspx?Submit=Search&lang=1&corpus=30

Hong Kong Financial Services Corpus: 

http://langbank.engl.polyu.edu.hk/hkfsc/

Learner Corpus of English for Business Communication:

http://langbank.engl.polyu.edu.hk/engine.aspx?Submit=Search&lang=1&corpus=15

SCMP Corpus of Business Reports: http://langbank.engl.polyu.edu.hk/engine.aspx?Submit=Search&lang=1&corpus=32

Business Letter Corpus: 

http://www.someya-net.com/concordancer/


4. Literary/Historical Corpora

COHA: http://corpus.byu.edu/coha/

Hong Lou Meng Corpus: http://124.193.83.252/cqp/hlmyangs/ name: test password: test

Online Corpus of Old English Poetry

http://www.oepoetry.ca/

CAPA (Contemporary American Poetry Archive):

http://capa.conncoll.edu/

SETIS Australian Literary and Historical Texts: 

http://setis.library.usyd.edu.au/oztexts/search.html

Corpus of Middle English Prose and Verse: 

http://quod.lib.umich.edu/c/cme/

Web Concordances: 

http://www.concordancesoftware.co.uk/webconcordances/

Chaucer: 

http://www.umm.maine.edu/faculty/necastro/chaucer/concordance/
Sherlockian: http://www.sherlockian.net/

Dickens: http://124.193.83.252/cqp/dickens/ name: test  password: test

The Complete Corpus of Anglo-Saxon Poetry: 

http://www.sacred-texts.com/neu/ascp/

Bartleby: http://www.bartleby.com/

Internet Classics Archive: http://classics.mit.edu/index.html

Concordance of Shakespeare's complete works: http://www.opensourceshakespeare.org/concordance/

Shakespeare's Words: http://www.shakespeareswords.com/

Shakespeare's Sonnets Corpus: 

http://www.luweixmu.com/ecorpus/sonnets/framconc.asp

Alex Catalogue of Electronic Texts: http://infomotions.com/alex/

Corpus of Electronic Texts (CELT):

http://www.ucc.ie/celt/search.html

Modern English Collection at the University of Virginia Electronic Text Center
http://etext.lib.virginia.edu/etcbin.../modengpub.o2w

OED: www.oed.com  username: Coastline  password: Oed789

Middle English Dictionary: http://quod.lib.umich.edu/m/med/

MEMEM (Michigan Early Modern English Materials):

http://www.hti.umich.edu/m/memem/

American Civil War Collection at the Electronic Text Center

http://etext.virginia.edu/civilwar/#Letters

Zurich English Newspaper Corpus: http://www.helsinki.fi/varieng/CoRD/corpora/ZEN/index.html

Corpus of Late Modern English Texts: https://perswww.kuleuven.be/~u0044428/

Lu Wei (卢伟): http://www.luweixmu.com/ecorpus/index.htm

The Arabian Nights: http://cqpweb.lancs.ac.uk/aldine1001/  contact: a.hardie@lancaster.ac.uk

The Arabian Nights: http://cqpweb.lancs.ac.uk/burton1001/  contact: a.hardie@lancaster.ac.uk