
Ruby String Split on "\t" loses "\n"

Tags: arrays, split, ruby

I'm trying to split this tab-delimited data set:

171 1000    21  
269 1000    25  
389 1000    40  
1020    1-03    30  1
1058    1-03    30  1
1074    1-03    30  1
200 300     500

(For clarity, here it is with the tabs and newlines written out:)

171\t1000\t21\t\n
269\t1000\t25\t\n
389\t1000\t40\t\n
1020\t1-03\t30\t1\n
1058\t1-03\t30\t1\n
1074\t1-03\t30\t1\n
200\t300\t\t500\n

a = text.split(/\n/)
a.each do |i|
  u = i.split(/\t/)
  puts u.size
end

==>
3
3
3
4
4
4
4

The \t\n combination seems to shave off the last \t, which I need for further importation. How can I get around this? Cheers

Edited: This is what I was expecting:

4
4
4
4
4
4
4
Rich_F asked Mar 10 '23 13:03


1 Answer

If this is for production, you should be using the CSV class as @DmitryZ pointed out in the comments. CSV processing has a surprising number of caveats and you should not do it by hand.
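As a minimal sketch of the CSV route, assuming the data is plain tab-separated text with no quote characters (the two sample rows and the variable names are only for illustration), the standard library's CSV.parse accepts a col_sep option:

```ruby
require "csv"

# Two rows from the question, including one with a trailing empty column.
text = "171\t1000\t21\t\n" \
       "200\t300\t\t500\n"

# col_sep switches the parser from commas to tabs.
rows = CSV.parse(text, col_sep: "\t")

rows.each { |row| puts row.size }   # each row has 4 fields
```

Note that CSV returns empty unquoted fields as nil rather than "", so downstream code may need to account for that.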

But let's go through it as an exercise...


The problem is that split does not keep the delimiter, and by default it also drops trailing empty fields. You've hit both issues.

When you run a = text.split(/\n/), the elements of a no longer contain the newlines:

a = [
  "171\t1000\t21\t",
  "269\t1000\t25\t",
  "389\t1000\t40\t",
  "1020\t1-03\t30\t1",
  "1058\t1-03\t30\t1",
  "1074\t1-03\t30\t1",
  "200\t300\t\t500"
]

Then, as documented in String#split, "if the limit parameter is omitted, trailing null fields are suppressed", so u = i.split(/\t/) drops that last empty field unless you give it a limit.
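To see that only trailing empty fields are affected, here is a small sketch using two rows from the question (the variable names are illustrative):

```ruby
with_trailing_tab = "171\t1000\t21\t"
with_interior_gap = "200\t300\t\t500"

# No limit: the empty field after the final tab is dropped.
p with_trailing_tab.split(/\t/)   # => ["171", "1000", "21"]

# An empty field in the middle is kept, because it is not trailing.
p with_interior_gap.split(/\t/)   # => ["200", "300", "", "500"]
```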

If you know it's always going to be 4 fields, you can pass 4 as the limit.

u = i.split(/\t/, 4)

But it's probably more flexible to use -1, because "If [the limit is] negative, there is no limit to the number of fields returned, and trailing null fields are not suppressed." That keeps the empty fields without hard-coding the number of columns.

u = i.split(/\t/, -1)
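Putting it together, here is a sketch of the question's loop with the negative limit. It assumes text holds the seven rows shown in the question, rebuilt here so the snippet runs on its own:

```ruby
text = [
  "171\t1000\t21\t",
  "269\t1000\t25\t",
  "389\t1000\t40\t",
  "1020\t1-03\t30\t1",
  "1058\t1-03\t30\t1",
  "1074\t1-03\t30\t1",
  "200\t300\t\t500"
].join("\n") + "\n"

# -1 keeps trailing empty fields, so every row reports 4 columns.
sizes = text.split(/\n/).map { |line| line.split(/\t/, -1).size }
puts sizes   # prints 4 seven times, one per line

# Caveat for a positive limit: extra columns are folded into the last field.
p "a\tb\tc\td\te".split(/\t/, 4)   # => ["a", "b", "c", "d\te"]
```

The last line is why -1 tends to be the safer choice when the column count may vary.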
Schwern answered Mar 16 '23 03:03