Regex to replace first lowercase character in a line into uppercase

I have a very large file containing thousands of sentences. In all of them, the first word of each sentence begins with a lowercase letter, but I need it to begin with an uppercase one.

I searched the site for a regex that does this but couldn't find one. I learned a lot about regex in the process, which is always a plus for my job, but nothing I found does exactly what I need.

I tried to combine the code from several answers, including the following:

  • Convert first lowercase to uppercase and uppercase to lowercase (regex?)
  • how to change first two uppercase character to lowercase character on each line in vim
  • Regex, two uppercase characters in a string
  • Convert a char to upper case using regular expressions (EditPad Pro)

For various reasons, none of them served my purpose.

I am working with a translation-specific application which accepts regex.

Do you think this is possible at all? It would save me hours of tedious work.

asked Apr 17 '19 06:04 by CanoEE





2 Answers

You can use this regex to search for the first letters of sentences:

(?<=[\.!?]\s)([a-z])

It matches a lowercase letter [a-z] that follows the end of the previous sentence (a sentence ending in one of the characters [\.!?]) and a single whitespace character \s.

Then replace each match with \U$1, which converts the captured letter to uppercase.

The only sentence it misses is the very first one, because nothing precedes it. I intentionally kept the regex simple, because it's easy to capitalize that first letter manually.

Working example: https://regex101.com/r/hqwK26/1
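Outside a regex-enabled editor, the same lookbehind idea can be sketched in Python. This is a minimal illustration, not part of the original answer, and the function name capitalize_sentences is hypothetical. Python's re module does not accept \U in replacement strings, so a function is passed to re.sub to uppercase the captured letter instead. Like the regex above, it leaves the very first sentence untouched.

```python
import re

# A lowercase letter preceded by sentence-ending punctuation and one whitespace char.
SENTENCE_START = re.compile(r"(?<=[\.!?]\s)([a-z])")


def capitalize_sentences(text: str) -> str:
    # Python's re has no \U escape in replacement strings, so a callable
    # uppercases the captured letter instead.
    return SENTENCE_START.sub(lambda m: m.group(1).upper(), text)


print(capitalize_sentences("first one. second one! third one? fourth one."))
# -> first one. Second one! Third one? Fourth one.
```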

UPD: If your software doesn't support \U, you might want to copy your text to Notepad++ and do the replacement there. Notepad++ fully supports \U; I just checked.

UPD2: According to the comments, the task is slightly different: only the first letter of each line should be capitalized.

There is a simple regex for that: ^([a-z]), with the same substitution pattern.

Here is a working example: https://regex101.com/r/hqwK26/2
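A rough Python equivalent of the line-start version is sketched below, again as an illustration rather than the answerer's code; capitalize_lines is a hypothetical name. In Python, ^ only matches at the start of every line when the re.MULTILINE flag is set, which is an assumption you may not need in an editor where ^ already behaves per line.

```python
import re

# With re.MULTILINE, ^ matches at the start of every line, not just the string.
LINE_START = re.compile(r"^([a-z])", re.MULTILINE)


def capitalize_lines(text: str) -> str:
    # Uppercase the captured first letter of each line that starts lowercase.
    return LINE_START.sub(lambda m: m.group(1).upper(), text)


sample = "first line.\nsecond line.\nAlready capitalized.\n3rd line."
print(capitalize_lines(sample))
```

Lines that already start with an uppercase letter or a non-letter (such as "3rd line.") are left as they are, because [a-z] does not match them.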

answered Oct 13 '22 20:10 by Ildar Akhmetov



Taking Ildar's answer and combining both of his patterns should work with no compromises:

(?<=[\.!?]\s)([a-z])|^([a-z])

This says: match the first pattern OR the second pattern. Because the regex now has two capture groups instead of one, the letter matched by the second pattern ends up in group 2, which you refer to as $2. That's fine, because only one of the alternatives matches at a time, so the other group contributes nothing to the replacement. Your substitution pattern then becomes:

\U$1$2

Here's a working example, again based on Ildar's answer: https://regex101.com/r/hqwK26/13
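The combined pattern can be sketched in Python the same way. This is a hypothetical illustration (capitalize_all is not from the answer), assuming re.MULTILINE so that ^ matches at each line start. Since Python cannot use \U$1$2, the callable picks whichever group actually participated in the match; the other one is None.

```python
import re

# Alternation: letter after sentence-ending punctuation + whitespace, OR at a line start.
COMBINED = re.compile(r"(?<=[\.!?]\s)([a-z])|^([a-z])", re.MULTILINE)


def capitalize_all(text: str) -> str:
    # Exactly one group participates in each match; the other is None.
    return COMBINED.sub(lambda m: (m.group(1) or m.group(2)).upper(), text)


sample = "first sentence. second sentence.\nnew line here! and more."
print(capitalize_all(sample))
# -> First sentence. Second sentence.
#    New line here! And more.
```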

answered Oct 13 '22 19:10 by MShoukry
