I'm reading clipboard data coming from excel using
var stream = (System.IO.Stream) ( Forms.Clipboard.GetDataObject() ).GetData( Forms.DataFormats.CommaSeparatedValue );
,
but unfortunately, excel is passing cell text instead of cell values. When the cells are using special formatting (such as the thousands seperator), the clipboard data for a series of cells in columns that looks like this:
1,234,123.00 2,345.00 342.00 12,345.00
is stored as this:
\" 1,234,123.00 \",\" 2,345.00 \", 342.00 ,\" 12,345.00 \"
when what I really want is this:
1234123.00, 2345.00, 342.00, 12345.00
I had been previously using the clipData.Split(new string[] { "," }, StringSllitOptions.None))
function to turn my CSV clipboard data into a series of cells, but this fails when there is escaped formatted text containing commas.
I'm asking if anyone can think of a way to split this string into a set of cells, ignoring the commas escaped within the \"
bits, since this is how Excel is choosing to escape cells containing commas.
In short, how can I turn a single string containing this:
\" 1,234,123.00 \",\" 2,345.00 \", 342.00 ,\" 12,345.00 \"
into an array of strings containing this:
{ "1,234,123.00", "2,345.00", "342.00", "12,345.00" }
Without ruining my ability to parse a simple comma delimited string.
*****edit***
Follow up question (formulated as a DFA) here: Split a string based on each time a Deterministic Finite Automata reaches a final state?
An escape delimiter is simply a sequence that is recognized and ignored during parsing. Its purpose is to allow the use of escape sequences to embed byte sequences in data that would otherwise be seen as delimiter occurrences.
The Split method extracts the substrings in this string that are delimited by one or more of the strings in the separator parameter, and returns those substrings as elements of an array.
First off I've dealt with data from Excel before and what you typically see is comma separated values and if the value is considered to be a string it will have double quotes around it (and can contain commas and double quotes). If it is considered to be numeric then there are not double quotes. Additionally if the data contains a double quote that will be delimited by a double quote like ""
. So assuming all of that here's how I've dealt with this in the past
public static IEnumerable<string> SplitExcelRow(this string value)
{
value = value.Replace("\"\"", """);
bool quoted = false;
int currStartIndex = 0;
for (int i = 0; i < value.Length; i++)
{
char currChar = value[i];
if (currChar == '"')
{
quoted = !quoted;
}
else if (currChar == ',')
{
if (!quoted)
{
yield return value.Substring(currStartIndex, i - currStartIndex)
.Trim()
.Replace("\"","")
.Replace(""","\"");
currStartIndex = i + 1;
}
}
}
yield return value.Substring(currStartIndex, value.Length - currStartIndex)
.Trim()
.Replace("\"", "")
.Replace(""", "\"");
}
Of course this assumes the data coming in is valid so if you have something like "fo,o"b,ar","bar""foo"
this will not work. Additionally if your data contains "
then it will be turned into a " which may or may not be desirable.
There are a lot of ways to do this. One inelegant way that would work is:
Now you have what you wanted in first place
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With