I've got data which looks like this...
1 TESTAAA SERNUM A DESCRIPTION
2 TESTBBB ANOTHR ANOTHER DESCRIPTION
3 TESTXXX BLAHBL
My question is: what is the most efficient way to split this data into its smaller substrings? There will be hundreds of lines, and some of them will be missing the last column. I tried a regex but wasn't successful with the pattern I used for the widths. The data above should break down into these fields (the length of each column is listed below its name):
{id} {firsttext} {serialhere} {description}
4 22 6 30+
Can anyone lend a hand or suggest a good regex matching pattern to extract the information?
Thanks, Simon
Try the following regex:
(.{4})(.{22})(.{6})(.+)?
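For what it's worth, here is a minimal sketch of how that pattern might be applied in C#. The class and method names (FixedWidthRegexDemo, Parse) are made up for illustration, and trimming the padding is my own assumption about what you want. Note that the pattern only matches lines that are at least 32 characters long (4 + 22 + 6), so this assumes the first three columns are always fully padded.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FixedWidthRegexDemo
{
    // Pattern taken verbatim from the answer: widths 4, 22 and 6,
    // followed by an optional group holding the rest of the line.
    private static readonly Regex RowPattern =
        new Regex(@"(.{4})(.{22})(.{6})(.+)?");

    // Hypothetical helper: returns the four trimmed fields,
    // or an empty array when the line is too short to match.
    public static string[] Parse(string line)
    {
        Match m = RowPattern.Match(line);
        if (!m.Success)
        {
            return Array.Empty<string>();
        }

        return new[]
        {
            m.Groups[1].Value.Trim(),
            m.Groups[2].Value.Trim(),
            m.Groups[3].Value.Trim(),
            // Group 4 is optional; its Value is "" when the description is missing.
            m.Groups[4].Value.Trim()
        };
    }
}
```

A line without a description still yields four entries, with the last one empty, which keeps the calling code uniform.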
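If the values are always nonempty and separated by whitespace (that is, they don't run into each other), then try something simpler. Because the description itself can contain spaces, cap the split at four parts so the description stays in one piece:
line.Split(new[] { ' ' }, 4, StringSplitOptions.RemoveEmptyEntries)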
I would actually recommend writing a method to do this via String.Substring directly. This will likely be more efficient at giving you the exact required widths.
This would likely work (though it's untested, and purposefully does not strip the string padding):
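using System;

// Splits 'original' into consecutive fields of the given widths.
// Pass spaceBetweenItems = true if one separator character sits between columns.
public static string[] SplitFixedWidth(string original, bool spaceBetweenItems, params int[] widths)
{
    string[] results = new string[widths.Length];
    int current = 0; // start index of the next field

    for (int i = 0; i < widths.Length; ++i)
    {
        if (current < original.Length)
        {
            // Clip the field so Substring never runs past the end of the line.
            int len = Math.Min(original.Length - current, widths[i]);
            results[i] = original.Substring(current, len);
            current += widths[i] + (spaceBetweenItems ? 1 : 0);
        }
        else
        {
            // The line ended early: missing trailing columns become empty strings.
            results[i] = string.Empty;
        }
    }

    return results;
}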
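The method above keeps the padding and needs an explicit width for every column. If you'd rather get trimmed values and treat the description as "everything after column 32", a variation along these lines might do. It's only a sketch: the names FixedWidthRow, Slice and Parse are hypothetical, and the offsets 0, 4, 26 and 32 are simply the running totals of the widths 4, 22 and 6 given in the question.

```csharp
using System;

public static class FixedWidthRow
{
    // Returns the trimmed text starting at 'start' and spanning 'width' characters,
    // clipped to the end of the line. A negative width means "to the end of the line".
    private static string Slice(string line, int start, int width)
    {
        if (start >= line.Length)
        {
            return string.Empty; // column is missing entirely
        }

        int available = line.Length - start;
        int len = width < 0 ? available : Math.Min(width, available);
        return line.Substring(start, len).Trim();
    }

    // Assumed layout from the question: id (4), firsttext (22), serial (6), description (rest).
    public static string[] Parse(string line)
    {
        return new[]
        {
            Slice(line, 0, 4),
            Slice(line, 4, 22),
            Slice(line, 26, 6),
            Slice(line, 32, -1)
        };
    }
}
```

Unlike the regex, this copes with lines that are shorter than 32 characters, since every missing column simply comes back as an empty string.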
That being said, if you're reading this from a Stream or a text file directly, TextFieldParser (in the Microsoft.VisualBasic.FileIO namespace) will let you read the data directly as fixed-width data.
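A rough sketch of the TextFieldParser route, assuming the widths from the question and a reference to the Microsoft.VisualBasic assembly. FixedWidthFileReader and ReadRows are names I made up; the last width of -1 asks the parser to treat the final field as variable width, which seems like the natural fit for the "30+" description column. I haven't run this against your data, so treat it as a starting point.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class FixedWidthFileReader
{
    // Hypothetical helper: reads every row from 'reader' as fixed-width fields.
    public static List<string[]> ReadRows(TextReader reader)
    {
        var rows = new List<string[]>();

        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.FixedWidth;

            // Widths from the question; -1 marks the last field as variable width.
            parser.SetFieldWidths(4, 22, 6, -1);

            while (!parser.EndOfData)
            {
                // ReadFields throws MalformedLineException when a line
                // cannot be parsed with the declared widths.
                rows.Add(parser.ReadFields());
            }
        }

        return rows;
    }
}
```

You could call it with something like FixedWidthFileReader.ReadRows(new StreamReader(path)). By default the parser trims whitespace from each field (the TrimWhiteSpace property), so the padding is removed for you. You may want to wrap ReadFields in a try/catch for MalformedLineException if some lines might not fit the layout.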