Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I generate multi-word terms recursively?

Say I have a string of words: 'a b c d e f'. I want to generate a list of multi-word terms from this string.

Word order matters. The term 'f e d' shouldn't be generated from the above example.

Edit: Also, words should not be skipped. 'a c', or 'b d f' shouldn't be generated.

What I have right now:

doc = 'a b c d e f'
terms= []
one_before = None
two_before = None
for word in doc.split(None):
    terms.append(word)
    if one_before:
        terms.append(' '.join([one_before, word]))
    if two_before:
        terms.append(' '.join([two_before, one_before, word]))
    two_before = one_before
    one_before = word

for term in terms:
    print term

Prints:

a
b
a b
c
b c
a b c
d
c d
b c d
e
d e
c d e
f
e f
d e f

How would I make this a recursive function so that I can pass it a variable maximum number of words per term?

Application:

I'll be using this to generate multi-word terms from readable text in HTML documents. The overall goal is a latent semantic analysis of a large corpus (about two million documents). This is why keeping word order matters (Natural Language Processing and whatnot).

like image 240
tgray Avatar asked Mar 31 '09 19:03

tgray


People also ask

Is it possible to have multiple recursive cases?

A recursive implementation may have more than one base case, or more than one recursive step. For example, the Fibonacci function has two base cases, n=0 and n=1.

How do you create a recursive function?

Writing a recursive function is almost the same as reading one: Create a regular function with a base case that can be reached with its parameters. Pass arguments into the function that immediately trigger the base case. Pass the next arguments that trigger the recursive call just once.

What is recursive example?

A classic example of recursionThe factorial of a number is computed as that number times all of the numbers below it up to and including 1. For example, factorial(5) is the same as 5*4*3*2*1 , and factorial(3) is 3*2*1 .

Can generators be recursive?

Just like regular functions, generators may be recursive, which means they can call themselves inside their function body.


1 Answers

This isn't recursive, but I think it does what you want.

doc = 'a b c d e f'
words = doc.split(None)
max = 3          


for index in xrange(len(words)):    
    for n in xrange(max):
        if index + n < len(words):           
            print ' '.join(words[index:index+n+1])   

And here's a recursive solution:

def find_terms(words, max_words_per_term):       
    if len(words) == 0: return []
    return [" ".join(words[:i+1]) for i in xrange(min(len(words), max_words_per_term))] + find_terms(words[1:], max_words_per_term)


doc = 'a b c d e f'
words = doc.split(None) 
for term in find_terms(words, 3):
    print term

Here's the recursive function again, with some explaining variables and comments.

def find_terms(words, max_words_per_term):   

    # If there are no words, you've reached the end. Stop.    
    if len(words) == 0:
        return []      

    # What's the max term length you could generate from the remaining 
    # words? It's the lesser of max_words_per_term and how many words 
    # you have left.                                                         
    max_term_len = min(len(words), max_words_per_term)       

    # Find all the terms that start with the first word.
    initial_terms = [" ".join(words[:i+1]) for i in xrange(max_term_len)]

    # Here's the recursion. Find all of the terms in the list 
    # of all but the first word.
    other_terms = find_terms(words[1:], max_words_per_term)

    # Now put the two lists of terms together to get the answer.
    return initial_terms + other_terms 
like image 155
Patrick McElhaney Avatar answered Sep 29 '22 00:09

Patrick McElhaney