This is a collection of 7,032 English sentences rated by human annotators from Amazon Mechanical Turk for formality, informativeness, and implicature on a 1-7 Likert scale. The sentences come from three genres: blogs, news articles, and forum threads. More details on the annotation process can be found in this paper.

The dataset is available from the link below. If you use this dataset, please cite the following work:


@article{DBLP:journals/corr/Lahiri15,
  author    = {Shibamouli Lahiri},
  title     = {{SQUINKY! A Corpus of Sentence-level Formality, Informativeness,
               and Implicature}},
  journal   = {CoRR},
  volume    = {abs/1506.02306},
  year      = {2015},
  url       = {http://arxiv.org/abs/1506.02306},
  timestamp = {Wed, 01 Jul 2015 15:10:24 +0200},
  biburl    = {http://dblp.uni-trier.de/rec/bib/journals/corr/Lahiri15},
  bibsource = {dblp computer science bibliography, http://dblp.org}
}

Link to Dataset



We also release more than 2,000 free-form comments that Turkers provided as part of the annotation. We believe these comments will help spur interesting future research in this domain. The comments are available here.

This is the only authentic version of the datasets as of their creation. Any other versions, even if public, are to be deemed non-authentic. (May 1, 2021)

A new version of the draft is available here (Aug 30, 2016).



UPDATE: A modified version of the dataset has been released by Ellie Pavlick and Joel Tetreault as part of their TACL 2016 paper. The dataset is hosted on Ellie's publications page, as of June 3, 2016.


