Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Latex - Apply an operation to every character in a string

Tags:

latex

I am using LaTeX and I have a problem concerning string manipulation. I want to have an operation applied to every character of a string, specifically I want to replace every character "x" with "\discretionary{}{}{}x". I want to do this because I have a long string (DNA) which I want to be able to separate at any point without hyphenation.

Thus I would like to have a command called "myDNA" that will do this for me instead of inserting manually \discretionary{}{}{} after every character.

Is this possible? I have looked around the web and there wasnt much helpful information on this topic (at least not any I could understand) and I hoped that you could help.

--edit To clarify: What I want to see in the finished document is something like this:


    the dna sequence is CTAAAGAAAACAGGACGATTAGATGAGCTTGAGAAAGCCATCACCACTCA
    AATACTAAATGTGTTACCATACCAAGCACTTGCTCTGAAATTTGGGGACTGAGTACACCAAATACGATAG
    ATCAGTGGGATACAACAGGCCTTTACAGCTTCTCTGAACAAACCAGGTCTCTTGATGGTCGTCTCCAGGT
    ATCCCATCGAAAAGGATTGCCACATGTTATATATTGCCGATTATGGCGCTGGCCTGATCTTCACAGTCAT
    CATGAACTCAAGGCAATTGAAAACTGCGAATATGCTTTTAATCTTAAAAAGGATGAAGTATGTGTAAACC
    CTTACCACTATCAGAGAGTTGAGACACCAGTTTTGCCTCCAGTATTAGTGCCCCGACACACCGAGATCCT
    AACAGAACTTCCGCCTCTGGATGACTATACTCACTCCATTCCAGAAAACACTAACTTCCCAGCAGGAATT

just plain linebreaks, without any hyphens. The DNA sequence will be one long string without any spaces or anything but it can break at any point. This is why my idea was to inesert a "\discretionary{}{}{}" after every character, so that it can break at any point without inserting any hyphens.

like image 428
hroest Avatar asked Mar 16 '10 20:03

hroest


2 Answers

This takes a string as an argument and calls \discretionary{}{}{} after each character. The input string stops at the first dollar sign, so you should not use that.

\def\hyphenateWholeString #1{\xHyphenate#1$\wholeString}

\def\xHyphenate#1#2\wholeString {\if#1$%
\else\say{#1}\discretionary{}{}{}%
\takeTheRest#2\ofTheString
\fi}

\def\takeTheRest#1\ofTheString\fi
{\fi \xHyphenate#1\wholeString}

\def\say#1{#1}

You’d call it like \hyphenateWholeString{CTAAAGAAAACAGGACG}.

Instead of \discretionary{}{}{} you can also try \hspace{0pt}, if you like that more (and are in a latex environment). In order to align the right margin, I think you’d need to do some more fine tuning (but see below). The effect is of course minimised by using a font of fixed width.

Revision:

\def\hyphenateWholeString #1{\xHyphenate#1$\wholeString\unskip}

\def\xHyphenate#1#2\wholeString {\if#1$%
\else\transform{#1}%
\takeTheRest#2\ofTheString\fi}

\def\takeTheRest#1\ofTheString\fi
{\fi \xHyphenate#1\wholeString}

\def\transform#1{#1\hskip 0pt plus 1pt}

Steve’s suggestion of using \hskip sounds like a very good idea to me, so I made a few corrections. Note that I’ve renamed the \say macro and made it more useful in that it now actually does the transformation. (However, if you remove the \hskip from \transform, you’ll also need to remove the \unskip in the main macro definition.


Edit:

There is also the seqsplit package which seems to be made for printing DNA data or long numbers. They also bring a few options for nicer output, so maybe that is what you’re looking for…

like image 97
Debilski Avatar answered Nov 02 '22 08:11

Debilski


Debilski's post is definitely a solid way to do it, although the \say is not necessary. Here's a shorter way that makes use of some LaTeX internal shortcuts (\@gobble and \@ifnextchar):

\makeatletter
\def\hyphenatestring#1{\xHyphen@te#1$\unskip}
\def\xHyphen@te{\@ifnextchar${\@gobble}{\sw@p{\hskip 0pt plus 1pt\xHyphen@te}}}
\def\sw@p#1#2{#2#1}
\makeatother

Note the use of \hskip 0pt plus 1pt instead of \discretionary - when I tried your example I ended up with a ragged margin because there's no stretchability. The \hskip adds some stretchable glue in between each character (and the \unskip afterwards cancels the extra one we added). Also note the LaTeX style convention that "end user" macros are all lowercase, while internal macros have an @ in them somewhere so that users don't accidentally call them.

If you want to figure out how this works, \@gobble just eats whatever's in front of it (in this case the $, since that branch is only run when a $ is the next char). The main point is that \sw@p is only given one argument in the "else" branch, so it swaps that argument with the next char (that isn't a $). We could just as well have written \def\hyphenate#next#1{#1\hskip...\xHyphen@te} and put that with no args in the "else" branch, but (in my opinion) \sw@p is more general (and I'm surprised it's not in standard LaTeX already).

like image 28
Steve Avatar answered Nov 02 '22 08:11

Steve