A famous Reuters dataset from the 1980s includes “Blah blah blah.” in place of some stories.

8:46 – This is what the blahs look like and this is what all the entries look like.
30:22 – Construe-TIS: A System for Content-Based Indexing of a Database of News Stories (Phil Hayes and Steven Weinstein)

Adrianne: We got an email from a listener and I called dibs on it, but I think everyone read it anyway.

Adrianne: Does someone want to read this email?

John: Here it is, a website contact form message from Jess.

John: Why does Thomson Reuters newswire say “blah-blah-blah”? Reuters-21578, with a link, is a dataset containing Reuters and newswire items, short businessy headlines and descriptions from 1987. It’s very popular for machine learning research because it’s extensive and well labeled. For some reason, some articles in the dataset have article bodies containing only the words, blah blah blah. How did this happen? Was it in the Reuters database? Or did the academics they worked with introduce it? Why blah blah blah instead of just leaving the article body blank?

Adrianne: I am very into this because I love machine learning training datasets. It’s like this essential distillation of humans and computers trying to communicate with each other and I just think it’s really lovely. So I wrote about this back at The Outline and I’m just going to read from this story, which is from 2017, so I’m going to quote myself here.

Adrianne: “As machine learning research accelerates, scientists have started pooling their resources. ImageNet is a popular data set produced by researchers at Stanford and Princeton that contains 14 million images grouped by nouns in synonym sets such as “kid, child,” “woman, adult female,” “office, business office.”