
Capitalization of sentences in Python

Tags:

python

I'm tinkering with a simple piece of code, but I can't seem to get it right.

I want the user to enter a string in the prompt in the form of a sentence. For example:

hey. how are you? the c.i.a. is watching! lol. 

And it returns:

Hey. How are you? The C.I.A. Is watching! Lol.

So its requirements are:

  1. Capitalize the first character of the string if it is a letter
  2. Capitalize after every period, question mark or exclamation mark
  3. Capitalize the letter if there is a period after it and no letters before it

So far I only have

def fix_capitalization():
    s = raw_input("Enter string: ")
    if s[0:1] == 'a' < [char] < 'z': 
        capitalize(s)

My thought process for how I would do this is as follows:

Call capitalize(s) to capitalize the first letter, then go through the string, and whenever there is a period, question mark, or exclamation mark, capitalize the next letter. If there is a letter directly before a period and no letter before that one (rule 3), capitalize the letter before the period.

James Lalonde asked Nov 21 '25 16:11


2 Answers

The code below implements your 3 rules, but I think the rules are incomplete: the character 'i' in 'is' matches rule 2 (it follows a period), yet it shouldn't be capitalized.

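import re

def uppercase(matchobj):
    # Uppercase the whole match; punctuation and whitespace are unaffected.
    return matchobj.group(0).upper()

def capitalize(s):
    # Rule 1: first letter of the string; rule 2: letter after . ? or !;
    # rule 3: letter after whitespace that is itself followed by a period.
    return re.sub(r'^([a-z])|[.?!]\s*([a-z])|\s+([a-z])(?=\.)', uppercase, s)

s = """hey. how are you? the c.i.a. is watching! lol. """
print(capitalize(s))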

Output:

Hey. How are you? The C.I.A. Is watching! Lol. 
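
To see why 'Is' ends up capitalized, it can help to list which rule matched each letter. This is only a rough sketch using the same three alternatives as the pattern above; the names RULES_PATTERN and matched_rules are made up for illustration and are not part of the original code.

```python
import re

# Same three alternatives as the pattern above; each capture group
# corresponds to one rule from the question (group 1 = rule 1, and so on).
# RULES_PATTERN and matched_rules are hypothetical names for this sketch.
RULES_PATTERN = re.compile(r'^([a-z])|[.?!]\s*([a-z])|\s+([a-z])(?=\.)')

def matched_rules(text):
    """Return (rule_number, letter) for every letter the pattern would capitalize."""
    # Only one group takes part in each match, so m.lastindex is its number.
    return [(m.lastindex, m.group(m.lastindex))
            for m in RULES_PATTERN.finditer(text)]

for rule, letter in matched_rules("hey. how are you? the c.i.a. is watching! lol. "):
    print("rule %d capitalizes %r" % (rule, letter))
```

The 'i' of 'is' is reported under rule 2 because it follows the final period of 'c.i.a.', which the rules cannot tell apart from the end of a sentence.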

Timothy Zhang answered Nov 23 '25 05:11


This is an improvement over Timothy Zhang's answer that correctly deals with a few more cases. See the inline comments. That said, dealing with all the exceptions and oddities in capitalization is a pretty complex linguistic problem. It's probably better to use a premade solution (someone suggested the Python Natural Language Toolkit, NLTK) or to avoid this problem altogether.

import re

s1 = "hey. how are you? the c.i.a. is watching! lol."

# Raw strings keep the backslashes intact; adjacent literals are joined.
print(re.sub(r"(\A\w)|"                   # start of string
             r"(?<!\.\w)([.?!] )\w|"      # after a ?/!/. and a space,
                                          # but not after an acronym
             r"\w(?:\.\w)|"               # start/middle of acronym
             r"(?<=\w\.)\w",              # end of acronym
             lambda x: x.group().upper(),
             s1))

Output:

Hey. How are you? The C.I.A. is watching! Lol.
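
For reuse, the substitution above could be wrapped in a function. This is a minimal sketch, assuming Python 3; the names SENTENCE_START and fix_capitalization (here taking the text as a parameter rather than prompting for it) are chosen for illustration only. The last two calls show inputs the pattern does not handle, which is the kind of oddity mentioned above.

```python
import re

# Same pattern as above, compiled once. SENTENCE_START is a hypothetical name.
SENTENCE_START = re.compile(
    r"(\A\w)|"                 # start of string
    r"(?<!\.\w)([.?!] )\w|"    # after ./?/! and one space, but not after an acronym
    r"\w(?:\.\w)|"             # start/middle of acronym
    r"(?<=\w\.)\w"             # end of acronym
)

def fix_capitalization(text):
    """Uppercase every span matched by SENTENCE_START."""
    return SENTENCE_START.sub(lambda m: m.group().upper(), text)

print(fix_capitalization("hey. how are you? the c.i.a. is watching! lol."))
# -> Hey. How are you? The C.I.A. is watching! Lol.

# Limitation: exactly one space is expected after the punctuation mark.
print(fix_capitalization("wait!  what?"))
# -> Wait!  what?

# Limitation: a new sentence right after an acronym is not capitalized.
print(fix_capitalization("i work for the f.b.i. you too?"))
# -> I work for the F.B.I. you too?
```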

Junuxx answered Nov 23 '25 05:11