Using the Web as a language corpus/Economist article

The Economist currently has an article on using the Web as a language corpus. It quotes Language Log, which also reproduces the whole article (that link should keep working).

bq. Search engines, unlike the tools linguists use to analyse standard corpora, do not allow searching for a particular linguistic structure, such as “[Noun phrase] far from [verb phrase]”. This requires indirect searching via samples like “He far from succeeded”. But Philip Resnik, of the University of Maryland, has created a “Linguist’s Search Engine” (LSE) to overcome this. When trying to answer, for example, whether a certain kind of verb is generally used with a direct object, the LSE grabs a chunk of web pages (say a thousand, with perhaps a million words) that each include an example of the verb. The LSE then parses the sample, allowing the linguist to find examples of a given structure, such as the verb without an object. In short, the LSE allows a user to create and analyse a custom-made corpus within minutes.
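
To make the contrast in the quote concrete, here is a minimal Python sketch of the kind of pattern search that an ordinary search engine does not offer. It is not how the LSE works: the LSE parses its sample, whereas this only approximates "far from [verb phrase]" with a regular expression that looks for "far from" followed by a word ending in -ed. The sample sentences, the pattern and the function name @find_far_from_verb@ are all made up for illustration.

```python
import re

# Hypothetical mini-corpus standing in for sentences collected from web pages.
SENTENCES = [
    "He far from succeeded in the exam.",
    "The plan far from satisfied the board.",
    "The village is far from the coast.",
    "She lives far from here.",
]

# Crude stand-in for "[noun phrase] far from [verb phrase]":
# "far from" immediately followed by a word ending in -ed.
# A real tool would use a parser rather than a suffix heuristic.
PATTERN = re.compile(r"\bfar from (\w+ed)\b", re.IGNORECASE)


def find_far_from_verb(sentences):
    """Return (sentence, verb) pairs where 'far from' precedes an -ed word."""
    hits = []
    for sentence in sentences:
        match = PATTERN.search(sentence)
        if match:
            hits.append((sentence, match.group(1)))
    return hits


if __name__ == "__main__":
    for sentence, verb in find_far_from_verb(SENTENCES):
        print(f"{verb}: {sentence}")
```

The suffix heuristic misses irregular verbs and would wrongly accept adjectives such as "tired", which is exactly why a parsed corpus is more useful than string matching for this kind of question.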

