I am using LaTeX and I have a problem concerning string manipulation. I want to have an operation applied to every character of a string, specifically I want to replace every character "x" with "\discretionary{}{}{}x". I want to do this because I have a long string (DNA) which I want to be able to separate at any point without hyphenation.
Thus I would like to have a command called "myDNA" that will do this for me instead of inserting manually \discretionary{}{}{} after every character.
Is this possible? I have looked around the web and there wasnt much helpful information on this topic (at least not any I could understand) and I hoped that you could help.
--edit To clarify: What I want to see in the finished document is something like this:
the dna sequence is CTAAAGAAAACAGGACGATTAGATGAGCTTGAGAAAGCCATCACCACTCA AATACTAAATGTGTTACCATACCAAGCACTTGCTCTGAAATTTGGGGACTGAGTACACCAAATACGATAG ATCAGTGGGATACAACAGGCCTTTACAGCTTCTCTGAACAAACCAGGTCTCTTGATGGTCGTCTCCAGGT ATCCCATCGAAAAGGATTGCCACATGTTATATATTGCCGATTATGGCGCTGGCCTGATCTTCACAGTCAT CATGAACTCAAGGCAATTGAAAACTGCGAATATGCTTTTAATCTTAAAAAGGATGAAGTATGTGTAAACC CTTACCACTATCAGAGAGTTGAGACACCAGTTTTGCCTCCAGTATTAGTGCCCCGACACACCGAGATCCT AACAGAACTTCCGCCTCTGGATGACTATACTCACTCCATTCCAGAAAACACTAACTTCCCAGCAGGAATT
just plain linebreaks, without any hyphens. The DNA sequence will be one long string without any spaces or anything but it can break at any point. This is why my idea was to inesert a "\discretionary{}{}{}" after every character, so that it can break at any point without inserting any hyphens.
This takes a string as an argument and calls \discretionary{}{}{}
after each character. The input string stops at the first dollar sign, so you should not use that.
\def\hyphenateWholeString #1{\xHyphenate#1$\wholeString}
\def\xHyphenate#1#2\wholeString {\if#1$%
\else\say{#1}\discretionary{}{}{}%
\takeTheRest#2\ofTheString
\fi}
\def\takeTheRest#1\ofTheString\fi
{\fi \xHyphenate#1\wholeString}
\def\say#1{#1}
You’d call it like \hyphenateWholeString{CTAAAGAAAACAGGACG}.
Instead of \discretionary{}{}{} you can also try \hspace{0pt}, if you like that more (and are in a latex environment). In order to align the right margin, I think you’d need to do some more fine tuning (but see below). The effect is of course minimised by using a font of fixed width.
Revision:
\def\hyphenateWholeString #1{\xHyphenate#1$\wholeString\unskip}
\def\xHyphenate#1#2\wholeString {\if#1$%
\else\transform{#1}%
\takeTheRest#2\ofTheString\fi}
\def\takeTheRest#1\ofTheString\fi
{\fi \xHyphenate#1\wholeString}
\def\transform#1{#1\hskip 0pt plus 1pt}
Steve’s suggestion of using \hskip
sounds like a very good idea to me, so I made a few corrections. Note that I’ve renamed the \say
macro and made it more useful in that it now actually does the transformation. (However, if you remove the \hskip
from \transform
, you’ll also need to remove the \unskip
in the main macro definition.
Edit:
There is also the seqsplit package which seems to be made for printing DNA data or long numbers. They also bring a few options for nicer output, so maybe that is what you’re looking for…
Debilski's post is definitely a solid way to do it, although the \say
is not necessary. Here's a shorter way that makes use of some LaTeX internal shortcuts (\@gobble
and \@ifnextchar
):
\makeatletter \def\hyphenatestring#1{\xHyphen@te#1$\unskip} \def\xHyphen@te{\@ifnextchar${\@gobble}{\sw@p{\hskip 0pt plus 1pt\xHyphen@te}}} \def\sw@p#1#2{#2#1} \makeatother
Note the use of \hskip 0pt plus 1pt
instead of \discretionary
- when I tried your example I ended up with a ragged margin because there's no stretchability. The \hskip
adds some stretchable glue in between each character (and the \unskip
afterwards cancels the extra one we added). Also note the LaTeX style convention that "end user" macros are all lowercase, while internal macros have an @
in them somewhere so that users don't accidentally call them.
If you want to figure out how this works, \@gobble
just eats whatever's in front of it (in this case the $
, since that branch is only run when a $
is the next char). The main point is that \sw@p
is only given one argument in the "else" branch, so it swaps that argument with the next char (that isn't a $
). We could just as well have written \def\hyphenate#next#1{#1\hskip...\xHyphen@te}
and put that with no args in the "else" branch, but (in my opinion) \sw@p
is more general (and I'm surprised it's not in standard LaTeX already).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With