
Matching multi-language (Latin Extended) characters in Lua

I can't find a way to match all extended Latin alphabet characters without listing them explicitly. For example, consider matching the tag språk.

tag = "språk"
tag:match([[%w+]])

This doesn't work because å is not contained within %w. It can be matched with tag:match([[[%wå]+]]), but then you have to explicitly add every special character.

One can also extend the range. This works: tag:match([[[a-å]+]]), but I'm not 100% clear on why, or at least not on what that range actually covers in the character table.
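A rough sketch of what that range looks like at the byte level may help here. It assumes the source is UTF-8 encoded, so å is stored as the two bytes 0xC3 0xA5. The helper name hex_bytes is hypothetical and only used for display. Lua patterns work on single bytes, so the class is read as the byte range 0x61-0xC3 plus the single byte 0xA5.

```lua
-- Assumes this file is saved as UTF-8.
-- hex_bytes is a hypothetical helper that shows a string as hex bytes.
local function hex_bytes(s)
    local out = {}
    for i = 1, #s do
        out[#out + 1] = string.format("%02X", s:byte(i))
    end
    return table.concat(out, " ")
end

print(hex_bytes("å"))      -- C3 A5
print(hex_bytes("språk"))  -- 73 70 72 C3 A5 6B

-- [a-å] spelled with byte escapes: range 'a'..0xC3, then the byte 0xA5.
local class = "[a-\195\165]+"
print(("språk"):match(class))  -- språk

-- The range also covers ASCII punctuation that sorts after 'a'.
print(("{~}"):match(class))    -- {~}
```

So the range works for språk largely by accident: both bytes of å happen to fall inside it, along with everything else between 0x61 and 0xC3.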

So what is the correct way to match a range that includes all ASCII letters plus all Latin Extended characters?


The best solution I've come up with so far is:

tag = "språk"
tag:match([[[a-zA-ZÀ-ÿ]+]])

But I'm still unsure if that is completely correct, and it would be ideal if there is a shortcut class for this I'm simply overlooking.
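One way to probe that class is to try it on a word outside Latin-1, then look at the bytes it returns. The sketch below assumes UTF-8 source text, and hex_bytes is a hypothetical display helper. Read byte by byte, the class amounts to a-z, A-Z, the byte 0xC3, the byte range 0x80-0xC3, and the byte 0xBF, so it can start or stop in the middle of a character.

```lua
-- Assumes this file is saved as UTF-8.
-- hex_bytes is a hypothetical helper that shows a string as hex bytes.
local function hex_bytes(s)
    local out = {}
    for i = 1, #s do
        out[#out + 1] = string.format("%02X", s:byte(i))
    end
    return table.concat(out, " ")
end

local class = "[a-zA-ZÀ-ÿ]+"

-- Works for Latin-1 letters, whose bytes are 0xC3 plus 0x80..0xBF.
print(("språk"):match(class))   -- språk

-- "Łódź" is C5 81 C3 B3 64 C5 BA. The lead byte 0xC5 is not in the class,
-- so the match begins at the stray continuation byte 0x81.
local partial = ("Łódź"):match(class)
print(hex_bytes(partial))       -- 81 C3 B3 64
```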

theherk asked Dec 01 '25 16:12


1 Answer

I suggest building a set of the letters in the Latin-1 Supplement block. By analogy, you can build sets for the other blocks you need (Latin Extended-A, B, C, D, E).

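------------------------ just generate set Latin-1 Supplement
-- U+00C0..U+00FF are encoded in UTF-8 as the lead byte 0xC3
-- followed by one continuation byte in 0x80..0xBF.
local set = ""
for x = 0x80, 0xBF do
    -- string.char takes numeric byte values, so pass numbers rather than strings
    set = set .. string.char(0xC3, x)
end
print(set)
--------------------------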

--- copied from the output above, with the non-letters × and ÷ removed
local ex = [[ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ]]

-- By analogy you can get Latin Extended-A:
-- local ext_latin_A = [[ĀāĂăĄąĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıĲĳĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅņŇňŉŊŋŌōŎŏŐőŒœŔŕŖŗŘřŚśŜŝŞşŠšŢţŤťŦŧŨũŪūŬŭŮůŰűŲųŴŵŶŷŸŹźŻżŽžſ]]

tag = "språk"

print("-----")
print( tag:match("[%w".. ex .."]+") )
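Lua patterns operate on bytes, so a class built from multi-byte characters matches their individual bytes rather than whole characters. As a possible alternative, here is a minimal sketch that checks code points instead. It assumes Lua 5.3 or later (for the utf8 library) and valid UTF-8 input. The names is_letter and match_word are hypothetical, and the range U+00C0..U+024F (Latin-1 Supplement letters through Latin Extended-B) is one illustrative choice of what counts as a letter. Unlike string.match, this sketch only looks at the start of the string.

```lua
-- Requires Lua 5.3+ (utf8 library). Helper names are hypothetical.
local function is_letter(cp)
    if cp < 0x80 then
        -- ASCII: reuse %w, as in the pattern above
        return string.char(cp):match("%w") ~= nil
    end
    -- U+00C0..U+024F, excluding the signs × (U+00D7) and ÷ (U+00F7)
    return cp >= 0xC0 and cp <= 0x24F and cp ~= 0xD7 and cp ~= 0xF7
end

-- Returns the longest prefix of s made only of accepted code points, or nil.
local function match_word(s)
    local stop = #s + 1
    for pos, cp in utf8.codes(s) do  -- raises an error on invalid UTF-8
        if not is_letter(cp) then
            stop = pos
            break
        end
    end
    if stop == 1 then return nil end
    return s:sub(1, stop - 1)
end

print(match_word("språk"))      -- språk
print(match_word("Łódź 2025"))  -- Łódź
```

Extending the accepted range to further blocks (for example Latin Extended Additional) would only need another comparison in is_letter.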
Mike V. answered Dec 04 '25 18:12