Ruby String Split on "\t" loses "\n"

Question

Trying to split this tab-delimited data set:

171 1000    21  
269 1000    25  
389 1000    40  
1020    1-03    30  1
1058    1-03    30  1
1074    1-03    30  1
200 300     500

(for clarity, with the tabs and newlines written as \t and \n:)

171\t1000\t21\t\n
269\t1000\t25\t\n
389\t1000\t40\t\n
1020\t1-03\t30\t1\n
1058\t1-03\t30\t1\n
1074\t1-03\t30\t1\n
200\t300\t\t500


a = text.split(/\n/)
a.each do |i|
  u = i.split(/\t/)
  puts u.size
end

==>
3
3
3
4
4
4
4

The combination seems to shave off the last \t, which I need for further importation. How can I get around this? Cheers

Edited: This is what I was expecting:

Schwern · Accepted Answer

If this is for production, you should be using the CSV class as @DmitryZ pointed out in the comments. CSV processing has a surprising number of caveats and you should not do it by hand.
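A minimal sketch of the CSV route, assuming the whole data set is already in a string (called text here, with the question's rows typed in using escapes). Ruby's standard csv library accepts a col_sep: option, so tab-separated data needs no manual splitting, and an empty unquoted field comes back as nil instead of being dropped:

```ruby
require "csv"

# The question's data, written with explicit \t and \n escapes.
text = "171\t1000\t21\t\n" \
       "269\t1000\t25\t\n" \
       "389\t1000\t40\t\n" \
       "1020\t1-03\t30\t1\n" \
       "1058\t1-03\t30\t1\n" \
       "1074\t1-03\t30\t1\n" \
       "200\t300\t\t500\n"

# col_sep: "\t" makes the parser treat tabs as the field separator.
rows = CSV.parse(text, col_sep: "\t")

rows.each { |row| puts row.size }  # prints 4 for every row
p rows.first  # => ["171", "1000", "21", nil]
p rows.last   # => ["200", "300", nil, "500"]
```

Note that from Ruby 3.4 onward csv ships as a bundled gem rather than a default gem, so in a Bundler-managed project it has to be listed in the Gemfile.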

But let's go through it as an exercise...

The problem is that split does not keep the delimiter, and by default it does not keep trailing null fields. You've hit both issues.

When you run a = text.split(/\n/), the elements of a no longer contain the newlines:

a = [
    "171\t1000\t21\t",
    "269\t1000\t25\t",
    "389\t1000\t40\t",
    "1020\t1-03\t30\t1",
    "1058\t1-03\t30\t1",
    "1074\t1-03\t30\t1",
    "200\t300\t\t500"
]
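To see the first issue in isolation, here is a small sketch using two of the rows (the text value is just sample input). split consumes the delimiter it matches, while String#lines is one way to keep each line terminator if you actually need it:

```ruby
text = "171\t1000\t21\t\n200\t300\t\t500\n"

# split consumes every "\n" it matches, so no element keeps a newline.
a = text.split(/\n/)
p a  # => ["171\t1000\t21\t", "200\t300\t\t500"]

# String#lines keeps each line terminator, if you need it.
with_newlines = text.lines
p with_newlines  # => ["171\t1000\t21\t\n", "200\t300\t\t500\n"]
```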

Then, as documented in String#split, "if the limit parameter is omitted, trailing null fields are suppressed," so u = i.split(/\t/) drops that last empty field unless you give it a limit.
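A quick check of that rule on two rows from the data set (sample strings typed in with \t escapes). Only trailing empty fields are suppressed; an empty field in the middle survives, which is why the last row in the question still reports 4:

```ruby
trailing_empty = "171\t1000\t21\t"   # 4 fields; the last one is empty
inner_empty    = "200\t300\t\t500"   # 4 fields; the third one is empty

# No limit: the empty field at the end is suppressed.
no_limit_trailing = trailing_empty.split(/\t/)
p no_limit_trailing  # => ["171", "1000", "21"]

# No limit: an empty field in the middle is kept.
no_limit_inner = inner_empty.split(/\t/)
p no_limit_inner     # => ["200", "300", "", "500"]
```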

If you know there will always be 4 fields, you can pass 4 as the limit.

u = i.split(/\t/, 4)
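A sketch of the fixed limit on a sample row. One thing worth knowing before hard-coding it: a positive limit is also a maximum, so if a row ever carries extra fields, everything after the third tab ends up, tabs included, in the last element:

```ruby
row = "171\t1000\t21\t"

# A positive limit keeps trailing empty fields, up to that many fields.
fixed = row.split(/\t/, 4)
p fixed  # => ["171", "1000", "21", ""]

# The limit is also a cap: a wider row is only split 3 times,
# so the leftover text (tab included) lands in the 4th element.
wide = "a\tb\tc\td\te".split(/\t/, 4)
p wide   # => ["a", "b", "c", "d\te"]
```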

But it's probably more flexible to use -1, because "If [the limit is] negative, there is no limit to the number of fields returned, and trailing null fields are not suppressed." That keeps the empty fields without hard-coding the number of columns.

u = i.split(/\t/, -1)
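Putting it together, a sketch of the question's loop with -1 applied (the text string is the question's data typed in with escapes; collecting the rows with map instead of printing inside each is just a choice for this example). Every row now has 4 fields, with an empty string "" standing in for the missing value:

```ruby
text = "171\t1000\t21\t\n" \
       "269\t1000\t25\t\n" \
       "389\t1000\t40\t\n" \
       "1020\t1-03\t30\t1\n" \
       "1058\t1-03\t30\t1\n" \
       "1074\t1-03\t30\t1\n" \
       "200\t300\t\t500"

# -1: no cap on the number of fields, and trailing empty fields are kept.
rows = text.split(/\n/).map { |line| line.split(/\t/, -1) }

rows.each { |fields| puts fields.size }  # prints 4 for every row
p rows.first  # => ["171", "1000", "21", ""]
p rows.last   # => ["200", "300", "", "500"]
```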


Tags: arrays, split, ruby

Rich_F
