I want to convert an array of morphemes produced by a PTB-style tokenizer:
["The", "house", "is", "n't", "on", "fire", "."]
into a sentence:
"The house isn't on fire."
What is a sensible way to accomplish this?
If we take @sawa's advice on the apostrophe and make your array this:
["The", "house", "isn't", "on", "fire", "."]
You can get what you're looking for (with punctuation support!) with this:
    def sentence(array)
      str = ""
      array.each_with_index do |w, i|
        case w
        when '.', '!', '?' # Sentence enders; also insert a space if more words follow.
          str << w
          str << ' ' unless i == array.length - 1
        when ',', ';' # Inline separators
          str << w
          str << ' '
        when '--' # Dash
          str << ' -- '
        else # It's a word
          str << ' ' unless str[-1] == ' ' || str.length == 0
          str << w
        end
      end
      str
    end
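If you would rather not edit the array by hand, the apostrophe step can be automated too. The following is only a rough sketch: `merge_clitics` is a hypothetical helper name, and the list of clitics in the regex (`n't`, `'s`, `'re`, `'ve`, `'ll`, `'d`, `'m`) is an assumption about what your tokenizer emits, so adjust it to your data.

```ruby
# Hypothetical helper: glue PTB-style clitic tokens onto the preceding word.
# The clitic list below is an assumption, not an exhaustive PTB inventory.
CLITIC = /\A(?:n't|'(?:s|re|ve|ll|d|m))\z/i

def merge_clitics(tokens)
  tokens.each_with_object([]) do |tok, out|
    if !out.empty? && tok.match?(CLITIC)
      out[-1] += tok # builds a new string, so the input tokens are not mutated
    else
      out << tok
    end
  end
end
```

Calling `merge_clitics(["The", "house", "is", "n't", "on", "fire", "."])` gives `["The", "house", "isn't", "on", "fire", "."]`, which you can then pass to `sentence` to get "The house isn't on fire."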