Summary
A famous dataset of Reuters articles from the 1980s includes “Blah blah blah.” in place of some stories. Why?
Show notes
- The link Jess sent
- SGML
- This is what the blahs look like and this is what all the entries look like.
- FTP
- Linguistic Data Consortium
- RCV1 at NIST and David D. Lewis’s README
- Construe-TIS: A System for Content-Based Indexing of a Database of News Stories (Phil Hayes and Steven Weinstein)