capturing complex names

Question

my data:

Caterina Guonçallvez braçeyro 
Francisco Ro[dr]í[gueJz luveyro
Johao de Miranda calçeteyro 
Lucas Martinz Mal-Cuzinhado, braçeyro 
Francisquo d[e] Arruda braçeyro 
Francisquo de Miranda braçeyro

-first name, last name
-first name, last name with brackets and a "J" (OCR recognition artifacts)
-first name, last name with hyphen
-first name, last name with particle
-first name, last name with particle containing brackets

Expected output

Caterina Guonçallvez
Francisco Ro[dr]í[gueJz
Johao de Miranda
Lucas Martinz Mal-Cuzinhado
Francisquo d[e] Arruda
Francisquo de Miranda

Names begin with an uppercase letter.
The last part of the name is followed by a space (or a comma and a space) and a word beginning with a lowercase character, like "braçeyro" or "calçeteyro" (people's jobs).

data <- readLines("clipboard" , encoding = "latin1")

What I tried:

^([a-zA-ZàáâäãåąčćęèéêëėįìíîïłńòóôöõøùúûüųūÿýżźñçčšžÀÁÂÄÃÅĄĆČĖĘÈÉÊËÌÍÎÏĮŁŃÒÓÔÖÕØÙÚÛÜŲŪŸÝŻŹÑßÇŒÆČŠŽ∂ð])\w+[A-Z ,.'-]\w+

giving
Antonio Guomez
Caterina Guon
Francisco Ro
Johao de
Francisquo d

L3viathan · Accepted Answer

The pattern (([A-Z][\w-]+|de|d\[e\])\s?)+ returns:

'Caterina Guonçallvez '
'Francisco Ro[dr]í[gueJz '
'Johao de Miranda '
'Lucas Martinz Mal-Cuzinhado'
'Francisquo d[e] Arruda '
'Francisquo de Miranda '

This assumes your locale is set correctly, so that \w also matches accented letters such as ç and í.

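The regex matches one or more groups, each either a run of word characters and hyphens starting with an uppercase letter, or the particle "de" (also in its bracketed form "d[e]"), followed by an optional space. Because that trailing space is part of the match, you will need to strip the results to remove it.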
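One caveat: as written, the class [\w-] does not contain square brackets, so a name such as Ro[dr]í[gueJz may be cut at the first bracket. A minimal sketch of a variant follows. It assumes that adding "[" and "]" to the class is acceptable for this data; the names `ocr_lines`, `name_pattern` and `captured_names` are only illustrative, and the groups are made non-capturing because only the whole match is needed.

```r
library(stringr)

# Sample lines from the question, including the OCR brackets.
ocr_lines <- c(
  "Caterina Guonçallvez braçeyro ",
  "Francisco Ro[dr]í[gueJz luveyro",
  "Johao de Miranda calçeteyro ",
  "Lucas Martinz Mal-Cuzinhado, braçeyro ",
  "Francisquo d[e] Arruda braçeyro ",
  "Francisquo de Miranda braçeyro"
)

# Assumption: "[" and "]" are added to the character class so that
# bracketed OCR fragments stay inside the name. Backslashes are doubled
# because the pattern is written as an R string.
name_pattern <- "(?:(?:[A-Z][\\w\\[\\]\\-]+|de|d\\[e\\])\\s?)+"

# Take the first match on each line, then strip the trailing space.
captured_names <- str_trim(str_extract(ocr_lines, name_pattern))
print(captured_names)
```

Note that a job title that itself begins with "de" would have those two letters pulled into the match, so the particle alternatives may need tightening for other data.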

edit: Proof it works in R:

> Sys.setlocale("LC_ALL","en_US.UTF-8")
> library(stringr)
> x <- "Caterina Guonçallvez braçeyro "
> str_match(x, '(([A-Z][\\w-]+|de|d\\[e\\])\\s?)+')
     [,1]                    [,2]           [,3]         
[1,] "Caterina Guonçallvez " "Guonçallvez " "Guonçallvez"
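The rule stated in the question (the name ends just before a word that starts with a lowercase letter) can also be applied from the other side, by deleting the trailing job word instead of matching the name. This is only a sketch in base R under that assumption, not part of the answer above; `job_suffix` and `names_only` are illustrative names.

```r
# Sample lines from the question, including the OCR brackets.
ocr_lines <- c(
  "Caterina Guonçallvez braçeyro ",
  "Francisco Ro[dr]í[gueJz luveyro",
  "Johao de Miranda calçeteyro ",
  "Lucas Martinz Mal-Cuzinhado, braçeyro ",
  "Francisquo d[e] Arruda braçeyro ",
  "Francisquo de Miranda braçeyro"
)

# One or more trailing words that start with a lowercase letter, each
# preceded by whitespace (and optionally a comma), up to the end of the line.
# Particles such as "de" are left alone because an uppercase word follows them.
job_suffix <- "(?:,?\\s+\\p{Ll}\\S*)+\\s*$"

# Remove the job suffix; whatever remains is treated as the name.
names_only <- sub(job_suffix, "", ocr_lines, perl = TRUE)
print(names_only)
```

Because the pattern is anchored at the end of the line, brackets and hyphens inside the name need no special handling here.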

capturing complex names

Tags:

regex

r

Wilcar
