
Parsing a very particular string structure in ruby (baseball data)

Tags: string, regex, ruby

I'm trying to write a small baseball statistics program using data from retrosheet.org, but I'm having trouble parsing their line score data. In a game where a team does not score double digits in any inning, their line score looks like this: 001003000 (they scored 1 run in the third inning and 3 runs in the sixth). If, however, a team scores double-digit runs in an inning, that inning is wrapped in parentheses, like this: 00100(10)000 (1 run in the third and 10 runs in the sixth).

For now, I'm just trying to parse out the score for each inning and put it in an array. Here's what I have so far:

scores = %w{00100300800 32004300X 00(11)34000 0000(15)000X 0000(15)000(13) 10(18)47(11)8(10)3}

scores.each do |s|
  game = []
  if s.include? "("
    # HELP!
  else 
    s.each_char { |c| game << c }
  end
  puts game.join("+")
end

I'm sure the solution involves regex, which I'm terrible at, so I've been trying all sorts of terrible string manipulation methods. In the end, I think it's going to be better to just ask for help.

So, how would you guys do this?

bjork24 asked Mar 28 '26 16:03


2 Answers

You can parse those with scan:

s.scan(/\(\d+\)|\d/)

For example:

>> scores = %w{00100300800 32004300X 00(11)34000 0000(15)000X 0000(15)000(13) 10(18)47(11)8(10)3}
>> scores.each { |s| puts s.scan(/\(\d+\)|\d/).inspect }
["0", "0", "1", "0", "0", "3", "0", "0", "8", "0", "0"]
["3", "2", "0", "0", "4", "3", "0", "0"]
["0", "0", "(11)", "3", "4", "0", "0", "0"]
["0", "0", "0", "0", "(15)", "0", "0", "0"]
["0", "0", "0", "0", "(15)", "0", "0", "0", "(13)"]
["1", "0", "(18)", "4", "7", "(11)", "8", "(10)", "3"]

And then just strip off the parentheses and call to_i:

s.scan(/\(\d+\)|\d/).map { |inning| inning[/\d+/].to_i }

For example:

>> scores.each { |s| puts s.scan(/\(\d+\)|\d/).map { |inning| inning[/\d+/].to_i }.inspect }
[0, 0, 1, 0, 0, 3, 0, 0, 8, 0, 0]
[3, 2, 0, 0, 4, 3, 0, 0]
[0, 0, 11, 3, 4, 0, 0, 0]
[0, 0, 0, 0, 15, 0, 0, 0]
[0, 0, 0, 0, 15, 0, 0, 0, 13]
[1, 0, 18, 4, 7, 11, 8, 10, 3]
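
To tie this back to the loop in the question, here is a minimal sketch that wraps the scan-and-convert step in a method and prints each game joined with "+". The method name parse_line_score is hypothetical, not part of any library. Note that a trailing X matches neither alternative in the regex, so it is skipped, just as in the output above.

```ruby
# Hypothetical helper: turn a line score string into an array of runs per inning.
# Parenthesized groups such as "(11)" become one integer; single digits become one each.
def parse_line_score(line)
  line.scan(/\(\d+\)|\d/).map { |inning| inning[/\d+/].to_i }
end

scores = %w{00100300800 32004300X 00(11)34000 0000(15)000X 0000(15)000(13) 10(18)47(11)8(10)3}

scores.each do |s|
  innings = parse_line_score(s)
  # Array#sum assumes Ruby 2.4 or later; use innings.inject(0, :+) on older versions.
  puts "#{innings.join('+')} = #{innings.sum}"
end
```

Keeping the parsing in one small method means any later decision about the X marker (for example, recording it as nil) only has to change in one place.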
mu is too short answered Mar 31 '26 08:03


You can do something like this:

str = '00(11)34000'
str.scan(/\d{1}|\(\d{2}\)/).map { |a| a.gsub(/[()]/, '') }
# => ["0", "0", "11", "3", "4", "0", "0", "0"]

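Here scan returns an array like ["0", "0", "(11)", "3", "4", "0", "0", "0"], and the gsub call then removes every ( and ). The results are still strings, so call to_i on each one if you need integers. I avoided a single, more complex regex because it would be harder to read.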
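
As an alternative to stripping the parentheses afterwards, the regex can use capture groups so that scan returns only the digits. This is a sketch of that variation, not what the answer above does; the block parameter names multi and single are arbitrary.

```ruby
str = '00(11)34000'

# With capture groups, scan returns one [multi, single] pair per match:
# multi holds the digits from a parenthesized inning, single holds a lone digit,
# and exactly one of the two is non-nil for each match.
runs = str.scan(/\((\d+)\)|(\d)/).map { |multi, single| (multi || single).to_i }

p runs
# => [0, 0, 11, 3, 4, 0, 0, 0]
```

The trade-off is a slightly denser pattern in exchange for dropping the second pass over each element.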

Aliaksei Kliuchnikau answered Mar 31 '26 09:03