
How to eliminate ALL line breaks in string?

I need to get rid of all line breaks that appear in my strings (which come from a database). I currently do it with the code below:

value.Replace("\r\n", "").Replace("\n", "").Replace("\r", "")

I can see that at least one character that acts as a line ending survives this. Its char code is 8232.

I'm embarrassed to admit it, but this is the first time I have come across this char. Obviously I could just replace it directly, but I would rather extend my current approach (replacing combinations of "\r" and "\n") into something more robust, so that it covers not only the 8232 char but also any others I haven't run into yet.

Do you have a bullet-proof approach for such a problem?

EDIT#1:

It seems to me that there are several possible solutions:

  1. use Regex.Replace
  2. remove every char for which IsSeparator or IsControl returns true
  3. replace every char for which IsWhiteSpace returns true with " "
  4. create a list of all possible line endings (LF, VT, FF, CR, CR+LF, NEL, LS, PS) and replace each of them with an empty string. That's a lot of replaces.

I would say the best results will come from the 1st or the 4th approach, but I cannot decide which will be faster. Which one do you think is the most complete?

EDIT#2:

I posted an answer below.

IamDeveloper asked Jul 19 '11 15:07



8 Answers

Below is the extension method that solved my problem. lineSeparator and paragraphSeparator can of course be defined somewhere else, for example as static values.

public static string RemoveLineEndings(this string value)
{
    if(String.IsNullOrEmpty(value))
    {
        return value;
    }
    string lineSeparator = ((char) 0x2028).ToString();
    string paragraphSeparator = ((char)0x2029).ToString();

    return value.Replace("\r\n", string.Empty)
                .Replace("\n", string.Empty)
                .Replace("\r", string.Empty)
                .Replace(lineSeparator, string.Empty)
                .Replace(paragraphSeparator, string.Empty);
}
IamDeveloper answered Oct 25 '22 22:10


According to Wikipedia, there are numerous line terminators you may need to handle (including the one you mention):

LF: Line Feed, U+000A
VT: Vertical Tab, U+000B
FF: Form Feed, U+000C
CR: Carriage Return, U+000D
CR+LF: CR (U+000D) followed by LF (U+000A)
NEL: Next Line, U+0085
LS: Line Separator, U+2028
PS: Paragraph Separator, U+2029
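
As a rough sketch of how the terminators listed above could be handled in one pass, the snippet below uses a single regular expression instead of a chain of Replace calls. The class and method names (LineBreakStripper, StripLineTerminators) are hypothetical and not from the original post; treat it as one possible arrangement rather than the definitive one.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineBreakStripper
{
    // CR+LF comes first so the pair is consumed as a single match;
    // the character class covers LF, VT, FF, CR, NEL, LS and PS.
    private static readonly Regex LineTerminators =
        new Regex("\r\n|[\n\v\f\r\u0085\u2028\u2029]", RegexOptions.Compiled);

    // Hypothetical helper: removes the terminators, or swaps each one
    // for the given replacement (for example a single space).
    public static string StripLineTerminators(string value, string replacement = "")
    {
        if (string.IsNullOrEmpty(value))
        {
            return value;
        }
        return LineTerminators.Replace(value, replacement);
    }
}
```

Matching CR+LF before the single characters only matters when the replacement is non-empty: with " " as the replacement, "a\r\nb" becomes "a b" rather than "a  b".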

Tremmors answered Oct 25 '22 22:10


8232 (0x2028) and 8233 (0x2029) are the only other ones you might want to eliminate. See the documentation for char.IsSeparator.
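
To see how .NET classifies these two characters, a small probe like the one below may help. SeparatorCheck and Describe are hypothetical names used only for illustration; the calls themselves (char.GetUnicodeCategory, char.IsSeparator, char.IsControl) are standard.

```csharp
using System;
using System.Globalization;

public static class SeparatorCheck
{
    // Hypothetical helper: reports the Unicode category of a char and
    // what char.IsSeparator / char.IsControl say about it.
    public static string Describe(char c)
    {
        UnicodeCategory category = char.GetUnicodeCategory(c);
        return $"U+{(int)c:X4} {category} IsSeparator={char.IsSeparator(c)} IsControl={char.IsControl(c)}";
    }
}
```

Describe('\u2028') reports the LineSeparator category with IsSeparator=True and IsControl=False, while Describe('\n') reports Control with IsSeparator=False and IsControl=True. Note that char.IsSeparator is also true for an ordinary space, so removing every separator would strip the spaces between words as well.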

Ed Bayiates answered Oct 25 '22 21:10


Props to Yossarian on this one, I think he's right. Replace all whitespace with a single space:

data = Regex.Replace(data, @"\s+", " ");
csharptest.net answered Oct 25 '22 22:10


I'd recommend finding ALL the whitespace (char.IsWhiteSpace) and replacing each run of it with a single space. IsWhiteSpace takes care of all the weird Unicode whitespace characters.
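
A minimal sketch of that idea without regular expressions, assuming that collapsing each run of whitespace into one space is acceptable for the data. WhitespaceCollapser and CollapseWhitespace are hypothetical names, not part of the answer.

```csharp
using System.Text;

public static class WhitespaceCollapser
{
    // Hypothetical helper: replaces every run of whitespace characters
    // (as defined by char.IsWhiteSpace) with a single space.
    public static string CollapseWhitespace(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            return value;
        }

        var result = new StringBuilder(value.Length);
        bool inWhitespaceRun = false;

        foreach (char c in value)
        {
            if (char.IsWhiteSpace(c))
            {
                // Emit one space for the whole run, then skip the rest of it.
                if (!inWhitespaceRun)
                {
                    result.Append(' ');
                    inWhitespaceRun = true;
                }
            }
            else
            {
                result.Append(c);
                inWhitespaceRun = false;
            }
        }

        return result.ToString();
    }
}
```

With this sketch, CollapseWhitespace("a \r\n\t b\u2028c") returns "a b c". Leading and trailing whitespace is reduced to one space, not trimmed, so a Trim() call may still be wanted afterwards.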

nothrow answered Oct 25 '22 21:10


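This is my first attempt at this, but I think it will do what you want:

// Collect every control character (category Cc) that occurs in the string.
// Note: this covers \r, \n and U+0085 (and also \t), but not U+2028 or U+2029,
// which are separators rather than control characters.
var controlChars = from c in value.ToCharArray() where Char.IsControl(c) select c;
foreach (char c in controlChars)
    value = value.Replace(c.ToString(), "");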

Also, see this link for details on other methods you can use: Char Methods

Robert Iver answered Oct 25 '22 20:10


Have you tried string.Replace(Environment.NewLine, "")? That usually gets a lot of them for me.

Josh answered Oct 25 '22 20:10


Check out this link: http://msdn.microsoft.com/en-us/library/844skk0h.aspx

You will have to play around and build a regular expression that works for you. But here's the skeleton...

static void Main(string[] args)
{
    // Build a sample string containing common line breaks plus char 8232 (U+2028).
    StringBuilder txt = new StringBuilder();
    txt.Append("Hello \n\n\r\t\t");
    txt.Append(Convert.ToChar(8232));

    System.Console.WriteLine("Original: <" + txt.ToString() + ">");

    System.Console.WriteLine("Cleaned: <" + CleanInput(txt.ToString()) + ">");

    System.Console.Read();
}

static string CleanInput(string strIn)
{
    // Remove every character that is not a word character, '.', '@' or '-'.
    // Note that this also strips spaces and most punctuation, not only line breaks.
    return Regex.Replace(strIn, @"[^\w\.@-]", "");
}
BBC answered Oct 25 '22 22:10