machine learning - How to encode dependency path as a feature for classification?


I am trying to implement relation extraction between verb pairs. I want to use the dependency path from one verb to the other as a feature for a classifier (which predicts whether relation X exists or not). I am not sure how to encode the dependency path as a feature. The following are example dependency paths, given as space-separated relation annotations from Stanford CoreNLP collapsed dependencies:

nsubj acl nmod:from acl nmod:by conj:and nsubj nmod:into nsubj acl:relcl advmod nmod:of 

It is important to keep in mind that these paths are of variable length, and a relation can reappear without restriction.

Two compromising ways of encoding the feature that come to mind are:

1) Ignore the sequence, and have one feature for each relation, with its value being the number of times that relation appears in the path.

2) Have a sliding window of length n, and have one feature for each possible pair of relations, with its value being the number of times the two relations appeared consecutively. I suppose this is how one encodes n-grams. However, the number of possible relations is 50, which means I cannot go with this approach.
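To make the two options concrete, here is a minimal sketch using only the Python standard library. The function names and the sample path are hypothetical, and option 2 is shown for n = 2 only.

```python
from collections import Counter

def relation_counts(path):
    """Option 1: count each relation, ignoring its position in the path."""
    return Counter(path.split())

def bigram_counts(path):
    """Option 2 with n = 2: count each pair of consecutive relations."""
    rels = path.split()
    return Counter(zip(rels, rels[1:]))

# Hypothetical path, made up for illustration only.
path = "nsubj acl nmod:from acl"
unigrams = relation_counts(path)
bigrams = bigram_counts(path)
```

A `Counter` stores only the relations or pairs that were actually observed, so the feature set grows with the data rather than with every possible pair of relations.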

Any suggestions are welcome.

We had a project where we built a classifier based on dependency paths. I asked the group member who developed the system, and he said:

  1. Use an indicator feature for the whole path.

    So if you have the training data point (verb1 -e1-> w1 -e2-> w2 -e3-> w3 -e4-> verb2, relation1), the feature is (e1-e2-e3-e4).

  2. He also used n-gram sequences, so for the same data point you would have (e1), (e2), (e3), (e4), (e1-e2), (e2-e3), (e3-e4), (e1-e2-e3), (e2-e3-e4).

    He also recommended collapsing appositive edges to make the paths smaller.
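The two feature types above could be sketched roughly as follows. This is not the original system's code: the function name, the feature-name prefixes, the choice of binary values for the n-grams, and the reading of "collapsing appositive edges" as dropping edges labelled `appos` are all assumptions.

```python
def path_features(edges, max_n=3, drop=("appos",)):
    """Build indicator features from a list of dependency edge labels."""
    # Assumption: "collapsing" appositive edges means skipping `appos` edges.
    edges = [e for e in edges if e not in drop]
    # 1. One indicator feature for the whole path.
    feats = {"path=" + "-".join(edges): 1}
    # 2. Indicator features for every n-gram of edges, for n = 1..max_n.
    for n in range(1, max_n + 1):
        for i in range(len(edges) - n + 1):
            feats[f"{n}gram=" + "-".join(edges[i:i + n])] = 1
    return feats

# The data point from the answer: verb1 -e1-> w1 -e2-> w2 -e3-> w3 -e4-> verb2
feats = path_features(["e1", "e2", "e3", "e4"])
```

For this data point the dictionary holds the whole-path feature plus the nine n-grams listed above. A dictionary like this can then be converted to a sparse vector by whatever vectorizer the classifier library provides.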

Also, I should note that he developed a set of high-precision rules for each relation, and used them to create a large set of training data.

