
Truncate string in Julia

Tags:

string

julia

Is there a convenience function for truncating strings to a certain length?

It would be equivalent to something like this:

test_str = "test"
if length(test_str) > 8
   out_str = test_str[1:8]
else
   out_str = test_str
end
tlnagy asked Sep 15 '16 01:09


1 Answer

In the naive ASCII world:

truncate_ascii(s,n) = s[1:min(sizeof(s),n)]

would do. If it's preferable to share memory with the original string and avoid copying, SubString can be used:

truncate_ascii(s,n) = SubString(s,1,min(sizeof(s),n))
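To see why byte-based indexing is only safe for ASCII, here is a small sketch, assuming Julia 1.x. The sample strings are made up for illustration:

```julia
# Definition repeated from above so the snippet runs on its own.
truncate_ascii(s, n) = s[1:min(sizeof(s), n)]

ascii_str = "hello world"
greek_str = "αβγδ"          # every character here occupies 2 bytes in UTF-8

short = truncate_ascii(ascii_str, 5)   # fine: one byte per character

# Byte index 2 falls in the middle of 'α', so the range 1:2 does not end
# on a character boundary and indexing throws instead of truncating.
failed = try
    truncate_ascii(greek_str, 2)
    false
catch err
    err isa StringIndexError
end
```

Here `short` is "hello" and `failed` is true, which is the reason the character-aware versions walk the string with nextind rather than slicing by byte count.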

But in a Unicode world (and it is a Unicode world) this is better:

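function truncate_utf8(s, n)
    eo = lastindex(s); neo = 0   # lastindex replaces endof (renamed in Julia 1.0)
    for i in 1:n
        neo < eo || break        # stop early when the string is shorter than n
        neo = nextind(s, neo)    # advance by one character, not one byte
    end
    return SubString(s, 1, neo)  # a view into s, no copy
end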
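On Julia 1.0 and later, Base also provides first(s, n), which returns the first n characters (not bytes) and returns the whole string when it is shorter. A minimal sketch, assuming Julia 1.x; truncate_chars and truncate_chars_view are hypothetical names used only for illustration:

```julia
# Copying version: first(s, n) counts characters and tolerates short strings.
truncate_chars(s, n) = first(s, n)

# Non-copying version: nextind(s, 0, n) is the index of the n-th character
# (or a value past the end if s is shorter), so clamp it to the last valid index.
truncate_chars_view(s, n) = SubString(s, 1, min(lastindex(s), nextind(s, 0, n)))

truncate_chars("αβγδ", 2)        # "αβ"
truncate_chars_view("αβγδ", 2)   # "αβ", sharing memory with the original
truncate_chars("αβ", 8)          # "αβ"
```

The first form is the shortest to write; the second keeps the no-copy property of the SubString approach.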

Finally, @IsmaelVenegasCastelló reminded us of grapheme complexity (arrrgh), and then this is what's needed:

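function truncate_grapheme(s, n)
    eo = lastindex(s); neo = 0
    for i in 1:n
        neo < eo || break
        neo = nextind(s, neo)    # first code point of the next grapheme
        tt = nextind(s, neo)
        # Absorb following code points that do not start a new grapheme.
        # Pairwise (stateless) check between adjacent code points;
        # Base.Unicode replaces Base.UTF8proc (renamed in Julia 0.7).
        while tt <= eo && !Base.Unicode.isgraphemebreak(s[neo], s[tt])
            neo, tt = tt, nextind(s, tt)
        end
    end
    return SubString(s, 1, neo)
end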
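The Unicode standard library's graphemes iterator offers another way to get a similar result. A minimal sketch, assuming Julia 1.x; truncate_graphemes_stdlib is a hypothetical name, and unlike the SubString version it allocates a new string:

```julia
using Unicode  # standard library; provides graphemes

# Keep the first n user-perceived characters (extended grapheme clusters).
# Iterators.take stops early, so a shorter string comes back whole.
truncate_graphemes_stdlib(s, n) = join(Iterators.take(graphemes(s), n))

s = "e\u0301x"                     # 'e' + combining acute accent, then 'x'
truncate_graphemes_stdlib(s, 1)    # "é" (two code points, one grapheme)
truncate_graphemes_stdlib(s, 5)    # the whole string
```

This trades the no-copy property for brevity, which may be acceptable when the strings are short.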

These last two implementations avoid calling length(s), which has to scan the whole string for UTF-8 data, avoid allocating or copying, and stop as soon as the string ends instead of always looping n times.

This answer draws on contributions of @MichaelOhlrogge, @FengyangWang, @Oxinabox and @IsmaelVenegasCastelló

Dan Getz answered Oct 02 '22 13:10