Converting an array of morphemes to a sentence in Ruby

Tags: ruby, nlp

I want to convert an array of morphemes produced by a PTB-style tokenizer:

["The", "house", "is", "n't", "on", "fire", "."]

into a sentence:

"The house isn't on fire."

What is a sensible way to accomplish this?

user2398029 asked Nov 04 '22 01:11


1 Answer

If we take @sawa's advice on the apostrophe and merge "n't" into the word before it, so that your array becomes this:

["The", "house", "isn't", "on", "fire", "."]

then you can get what you're looking for (with punctuation support!) with this:

def sentence(array)
  str = ""
  array.each_with_index do |w, i|
    case w
    when '.', '!', '?' # Sentence enders: add a trailing space only if more tokens follow.
      str << w
      str << ' ' unless i == array.length - 1
    when ',', ';' # Inline separators: attach to the previous word, then add a space.
      str << w
      str << ' '
    when '--' # Dash: pad with a space on each side.
      str << ' -- '
    else # Anything else is a word: separate it from the previous word with a space.
      str << ' ' unless str.empty? || str.end_with?(' ')
      str << w
    end
  end
  str
end
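
The method above expects the contraction to be merged already. As a rough sketch of that pre-processing step, something like the following could glue PTB-style clitics onto the preceding token. The merge_clitics name and the short list of clitics are assumptions made for illustration; they are not part of the original answer or of any tokenizer's API.

```ruby
# Hypothetical helper (an assumption, not from the original answer):
# attach PTB-style clitic tokens such as "n't" or "'s" to the token before them.
def merge_clitics(tokens)
  # Assumed, deliberately small list of clitics; extend it to match your tokenizer.
  clitic = /\A(?:n't|'(?:s|re|ve|ll|d|m))\z/i

  tokens.each_with_object([]) do |token, merged|
    if token.match?(clitic) && !merged.empty?
      # Build a new string so the strings in the input array are not mutated.
      merged[-1] += token
    else
      merged << token
    end
  end
end

tokens = ["The", "house", "is", "n't", "on", "fire", "."]
p merge_clitics(tokens)
# prints ["The", "house", "isn't", "on", "fire", "."]
```

The merged array can then be passed to sentence. A clitic that appears as the very first token is left alone, since there is nothing to attach it to.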
Linuxios answered Nov 09 '22 07:11