I'm working on an iOS Swift project that takes OCR data and searches the text for key phrases. The OCR output looks like this:
ingredients water, brown sugar, red ripe
tomato concentrate, apple cidervinegar
w01cestershlwsmjce(waterw4egar corn
syrup, salt, molasse, spice, natural flavor
garlic powder, caramel color, anchovies
cflsril,tamarin0), molasses, lemon juice,
onion, honey, modified tavioca starch,
When I search for the string "corn syrup", nothing is found. Searching for "corn" or "syrup" separately produces positive results.
I have tried
tesseract.recognizedText.stringByTrimmingCharactersInSet(NSCharacterSet.whitespaceAndNewlineCharacterSet())
to no avail.
Any thoughts on how to format the text so that a search for "corn syrup" will be identified? The exact-phrase qualifier is useful; after all, there are corn, corn starch, maple syrup, etc. as potential ingredients.
Thanks.
OK, here is the solution that worked:
textView.text = tesseract.recognizedText.stringByReplacingOccurrencesOfString("\n", withString: " ", options: NSStringCompareOptions.LiteralSearch, range: nil)
I thought my initial code was accomplishing the same task.
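If you want to search for "corn syrup", you need to replace the newlines with spaces (and ideally check for double spaces and replace them with a single space). Trimming only removes characters from the start and end of the string, so the newline between "corn" and "syrup" stays in place.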
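As a rough sketch of that idea, the snippet below collapses every run of spaces and newlines into a single space before searching. It is written in current Swift syntax rather than the older API names used above, and `normalizeWhitespace` is a hypothetical helper name, not part of Tesseract or Foundation.

```swift
import Foundation

// Hypothetical helper: collapse each run of whitespace/newlines into one space.
func normalizeWhitespace(_ text: String) -> String {
    return text
        .components(separatedBy: .whitespacesAndNewlines) // split on spaces and newlines
        .filter { !$0.isEmpty }                           // drop empty pieces left by repeated separators
        .joined(separator: " ")
}

// Two lines taken from the OCR output in the question.
let ocrText = "w01cestershlwsmjce(waterw4egar corn\nsyrup, salt, molasse, spice, natural flavor"
let normalized = normalizeWhitespace(ocrText)

// Case-insensitive phrase search on the flattened text.
let found = normalized.range(of: "corn syrup", options: .caseInsensitive) != nil
print(found)
```

Splitting and rejoining handles newlines and double spaces in one pass, so no separate step is needed to clean up repeated spaces.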
The quality of the character recognition is not great, and I think the text deserves more cleanup before being used for searching. You might, for example, split the phrases into an array of individual strings, trim spaces etc. at the beginning and end, and perhaps use UITextChecker to identify misspelled terms and fix them...
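One possible way to do the splitting and trimming is sketched below, again in current Swift syntax. `ingredientList(from:)` is a hypothetical helper, and the choice of commas and parentheses as separators is an assumption based on the sample output. Spell correction is left out.

```swift
import Foundation

// Hypothetical helper: turn raw OCR text into a list of trimmed, lowercased ingredient strings.
func ingredientList(from text: String) -> [String] {
    // Flatten newlines and repeated spaces first, so phrases broken across lines are rejoined.
    let flattened = text
        .components(separatedBy: .whitespacesAndNewlines)
        .filter { !$0.isEmpty }
        .joined(separator: " ")
    // Assumption: ingredients are delimited by commas or parentheses.
    return flattened
        .components(separatedBy: CharacterSet(charactersIn: ",()"))
        .map { $0.trimmingCharacters(in: .whitespaces).lowercased() }
        .filter { !$0.isEmpty }
}

let sample = "garlic powder, caramel color, corn\nsyrup, corn starch,\nmaple syrup"
let ingredients = ingredientList(from: sample)

// Whole-entry comparison: "corn syrup" matches, but a bare "corn" does not.
let hasCornSyrup = ingredients.contains("corn syrup")
let hasCorn = ingredients.contains("corn")
print(ingredients, hasCornSyrup, hasCorn)
```

Comparing whole entries keeps "corn", "corn starch" and "corn syrup" distinct. The catch is that an entry polluted by misrecognized characters, such as "waterw4egar corn syrup" in the output above, will not match exactly, so a substring search on the flattened text may still be needed as a fallback.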