
#hardtoparse: POS tagging and parsing the twitterverse

Description:

We evaluate the statistical dependency parser, Malt, on a new dataset of sentences taken from tweets. We use a version of Malt which is trained on gold standard phrase structure Wall Street Journal (WSJ) trees converted to Stanford labelled dependencies. We observe a drastic drop in performance moving from our in-domain WSJ test set to the new Twitter dataset, much of which has to do with the propagation of part-of-speech tagging errors. Retraining Malt on dependency trees produced by a state-of-the-art phrase structure parser, which has itself been self-trained on Twitter material, results in a significant improvement. We analyse this improvement by examining in detail the effect of the retraining on individual dependency types.

Format:

application/pdf

Related: http://doras.dcu.ie/16484/1/Foster-AAAI11.pdf
Suggested citation:

#hardtoparse: POS tagging and parsing the twitterverse [Online]. Available from: http://www.thehealthwell.info/node/858295 [Accessed: 17th November 2018].

  

