
Average number of characters per word in a list

I'm new to Python and I need to calculate the average number of characters per word in a list of strings, using the definitions below and the helper function clean_up.

A token is a str that you get from calling the string method split on a line of a file.

A word is a non-empty token from the file that isn't made up entirely of punctuation. Find the "words" in a file by using str.split to find the tokens and then removing the punctuation from them with the helper function clean_up.

A sentence is a sequence of characters that is terminated by (but doesn't include) the characters !, ?, . or the end of the file, excludes whitespace on either end, and is not empty.
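To make the token/word distinction concrete, here is a minimal sketch. The names tokens_in and words_in are hypothetical (they are not part of the assignment), and the punctuation set in this copy of clean_up is assumed to end in a backslash.

```python
def clean_up(s):
    # Copy of the assignment helper; the trailing backslash in the
    # punctuation set is an assumption.
    punctuation = "!\"',;:.-?)([]<>*#\n\\"
    return s.lower().strip(punctuation)

def tokens_in(line):
    # A token is whatever str.split returns for one line.
    return line.split()

def words_in(line):
    # A word is a cleaned token that is still non-empty,
    # so tokens made up only of punctuation are dropped.
    cleaned = (clean_up(token) for token in tokens_in(line))
    return [word for word in cleaned if word]

line = 'Peter, Paul -- and Mary!\n'
print(tokens_in(line))  # ['Peter,', 'Paul', '--', 'and', 'Mary!']
print(words_in(line))   # ['peter', 'paul', 'and', 'mary']
```

Note that '--' is a token but not a word, because nothing is left of it after clean_up.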

This is a homework question from my college computer science class.

The clean_up function is:

def clean_up(s):
    punctuation = """!"',;:.-?)([]<>*#\n\\"""
    result = s.lower().strip(punctuation)
    return result

My code is:

def average_word_length(text):
    """ (list of str) -> float

    Precondition: text is non-empty. Each str in text ends with \n and at
    least one str in text contains more than just \n.

    Return the average length of all words in text. Surrounding punctuation
    is not counted as part of the words. 


    >>> text = ['James Fennimore Cooper\n', 'Peter, Paul and Mary\n']
    >>> average_word_length(text)
    5.142857142857143 
    """

    for ch in text:
        word = ch.split()
        clean = clean_up(ch)
        average = len(clean) / len(word)
    return average

I get 5.0, but I am really confused. Some help would be greatly appreciated :) PS: I'm using Python 3.

dev_prabh asked Dec 19 '22 17:12


2 Answers

Let's clean up some of these functions with imports and generator expressions, shall we?

import string

def clean_up(s):
    # I'm assuming you REQUIRE this function as per your assignment
    # otherwise, just substitute str.strip(string.punctuation) anywhere
    # you'd otherwise call clean_up(str)
    return s.strip(string.punctuation)

def average_word_length(text):
    total_length = sum(len(clean_up(word)) for sentence in text for word in sentence.split())
    num_words = sum(len(sentence.split()) for sentence in text)
    return total_length/num_words

You may notice this actually condenses to a lengthy and unreadable one-liner:

average = sum(len(word.strip(string.punctuation)) for sentence in text for word in sentence.split()) / sum(len(sentence.split()) for sentence in text)

It's gross and disgusting, which is why you shouldn't do it ;). Readability counts and all that.
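As for why the loop in the question returns 5.0, the following sketch traces it line by line. It assumes clean_up behaves as in the question (with a punctuation set ending in a backslash); the list per_line is a hypothetical name added only for the trace.

```python
def clean_up(s):
    # Copy of the question's helper; the trailing backslash is an assumption.
    punctuation = "!\"',;:.-?)([]<>*#\n\\"
    return s.lower().strip(punctuation)

text = ['James Fennimore Cooper\n', 'Peter, Paul and Mary\n']

per_line = []
for line in text:
    tokens = line.split()
    # clean_up on the whole line only strips its two ends, so the
    # spaces and the comma inside the line are still counted by len().
    cleaned_line = clean_up(line)
    per_line.append(len(cleaned_line) / len(tokens))

# The original loop overwrites `average` on each pass, so only the
# value for the last line survives: 20 characters / 4 tokens.
print(per_line[-1])  # 5.0
```

So two things go wrong: the lengths include characters that are not part of any word, and nothing is accumulated across lines.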

Adam Smith answered Jan 02 '23 11:01


This is a short and sweet method to solve your problem that is still readable.

def clean_up(word, punctuation="!\"',;:.-?)([]<>*#\n\\"):
    return word.lower().strip(punctuation)  # you don't really need ".lower()"

def average_word_length(text):
    cleaned_words = [clean_up(w) for w in (w for l in text for w in l.split())]
    return sum(map(len, cleaned_words))/len(cleaned_words)  # in Python 2, convert one operand to float first

>>> average_word_length(['James Fennimore Cooper\n', 'Peter, Paul and Mary\n'])
5.142857142857143

The burden of handling all those preconditions falls on you.
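If you do want to guard against input that breaks the preconditions, one possible sketch is below. The name average_word_length_checked and the choice to raise ValueError are assumptions, not part of the assignment. It also drops tokens that are empty after clean_up, since by the question's definition those are not words.

```python
def clean_up(word, punctuation="!\"',;:.-?)([]<>*#\n\\"):
    return word.lower().strip(punctuation)

def average_word_length_checked(text):
    # Hypothetical variant: clean every token, then keep only the
    # non-empty results (punctuation-only tokens are not words).
    cleaned = (clean_up(token) for line in text for token in line.split())
    words = [word for word in cleaned if word]
    if not words:
        # Precondition violated: no str in text contains a word.
        raise ValueError("text contains no words")
    return sum(map(len, words)) / len(words)

print(average_word_length_checked(['James Fennimore Cooper\n', 'Peter, Paul and Mary\n']))
```

With the sample input this should match the doctest value, and a line such as 'a -- bc\n' is averaged over two words rather than three tokens.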

Inbar Rose answered Jan 02 '23 12:01