I am trying to implement relation extraction between verb pairs. I want to use the dependency path from one verb to the other as a feature for a classifier (which predicts whether relation X exists or not), but I am not sure how to encode the dependency path as a feature. The following are some example dependency paths, given as space-separated relation annotations from StanfordCoreNLP collapsed dependencies:
nsubj acl nmod:from acl nmod:by conj:and nsubj nmod:into nsubj acl:relcl advmod nmod:of
It is important to keep in mind that these paths are of variable length, and a relation can reappear without any restriction.
Two compromise ways of encoding this feature that come to mind are:
1) Ignore the sequence, and have one feature for each relation, with its value being the number of times it appears in the path.
2) Have a sliding window of length n, and have one feature for each possible pair of relations, with its value being the number of times those two relations appeared consecutively. I suppose this is how one encodes n-grams. However, the number of possible relations is 50, which means I cannot really go with this approach.
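For concreteness, here is a minimal sketch of the two encodings above in plain Python. The function names and the short example path are made up for illustration, and the features are kept as sparse dictionaries rather than dense vectors:

```python
from collections import Counter


def relation_counts(path):
    """Option 1: count each relation in the path, ignoring order."""
    return Counter(path.split())


def relation_ngram_counts(path, n=2):
    """Option 2: count each run of n consecutive relations (sliding window)."""
    rels = path.split()
    # zip over n shifted copies of the list yields every window of length n
    windows = zip(*(rels[i:] for i in range(n)))
    return Counter("|".join(w) for w in windows)


# Illustrative path only, built from relation labels like those shown above
path = "nsubj acl nmod:from acl"
print(relation_counts(path))
print(relation_ngram_counts(path, n=2))
```

Because a dictionary only stores the pairs that actually occur in a path, the 50 x 50 = 2,500 possible pairs never have to be materialised as a dense vector, which may make option 2 more workable than it first looks.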
Any suggestions are welcome.
We had a project that built a classifier based off of dependency paths. I asked the group member who developed the system, and he said:
He used an indicator feature for the whole path.
So if you have the training data point (verb1 -e1-> w1 -e2-> w2 -e3-> w3 -e4-> verb2, relation1), the feature is (e1-e2-e3-e4).
He also did n-gram sequences, so for that same data point you would also have (e1), (e2), (e3), (e4), (e1-e2), (e2-e3), (e3-e4), (e1-e2-e3), (e2-e3-e4).
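A rough sketch of that feature set, assuming n-grams up to length 3 (which matches the list above) and binary indicator values for the n-grams as well as for the whole path. The function name `path_features` and the `path=` / `ngram=` key prefixes are my own, not from the original system:

```python
def path_features(edges, max_n=3):
    """Build indicator features for one dependency path.

    edges: list of edge labels from verb1 to verb2, e.g. ["e1", "e2", "e3", "e4"].
    max_n: longest n-gram to emit (assumed to be 3 here).
    """
    # One indicator for the whole path
    feats = {"path=" + "-".join(edges): 1}
    # One indicator for every contiguous n-gram of length 1..max_n
    for n in range(1, max_n + 1):
        for i in range(len(edges) - n + 1):
            feats["ngram=" + "-".join(edges[i:i + n])] = 1
    return feats


feats = path_features(["e1", "e2", "e3", "e4"])
for name in feats:
    print(name)
```

Dictionaries like this can be passed to any vectorizer that accepts name-to-value mappings, so the feature space only grows with the paths and n-grams actually seen in training.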
He also recommended collapsing appositive edges to make the paths smaller.
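He did not spell out how the collapsing was done. One possible reading, sketched below as an assumption, is to drop `appos` edges from the path before building features, so that a noun and its appositive are treated as a single node. The function name and the default label set are hypothetical:

```python
def collapse_appositives(edges, labels=("appos",)):
    """Return the path with appositive edges removed (one possible interpretation)."""
    return [e for e in edges if e not in labels]


# Illustrative path only
print(collapse_appositives(["nsubj", "appos", "nmod:of"]))
```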
Also, I should note that he developed a set of high-precision rules for each relation, and used them to create a large set of training data.