I'm parsing a file with Ruby to change the data formatting. I created a regex with three match groups that I want to temporarily store in variables. I'm having trouble storing the matches, as everything comes back nil.
Here is what I have so far from what I've read.
regex = '^"(\bhttps?://[-\w+&@#/%?=~_|$!:,.;]*[\w+&@#/%=~_|$])","(\w+|[\w._%+-]+@[\w.-]+\.[a-zA-Z]{2,4})","(\w{1,30})'
begin
  file = File.new("testfile.csv", "r")
  while (line = file.gets)
    puts line
    match_array = line.scan(/regex/)
    puts $&
  end
  file.close
end
Here is some sample data that I'm using for testing.
"https://mail.google.com","Master","password1","","https://mail.google.com","",""
"https://login.sf.org","[email protected]","password2","https://login.sf.org","","ctl00$ctl00$ctl00$body$body$wacCenterStage$standardLogin$tbxUsername","ctl00$ctl00$ctl00$body$body$wacCenterStage$standardLogin$tbxPassword"
"http://www.facebook.com","Beast","12345678","https://login.facebook.com","","email","pass"
"http://www.own3d.tv","Earth","passWOrd3","http://www.own3d.tv","","user_name","user_password"
Thank you,
LF4
This won't work:
match_array = line.scan(/regex/)
/regex/ is a regex literal that matches the literal text "regex"; it has nothing to do with what's in your regex variable, which is why nothing matches and everything is nil. You can either put the big ugly regex right into your scan or create a Regexp instance:
regex = Regexp.new('^"(\bhttps?://[-\w+&@#/%?=~_|$!:,.;]*[\w+&@#/%=~_|$])","(\w+|[\w._%+-]+@[\w.-]+\.[a-zA-Z]{2,4})","(\w{1,30})')
# ...
match_array = line.scan(regex)
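To get the three groups into variables, one option is String#match with MatchData#captures. This is only a minimal sketch: the pattern and the sample row are copied from the question, while the variable names url, user and password are my own assumption about what the groups mean.

```ruby
# The pattern from the question, compiled once into a Regexp object.
# Single quotes keep the backslashes intact for Regexp.new.
regex = Regexp.new('^"(\bhttps?://[-\w+&@#/%?=~_|$!:,.;]*[\w+&@#/%=~_|$])","(\w+|[\w._%+-]+@[\w.-]+\.[a-zA-Z]{2,4})","(\w{1,30})')

# One of the sample rows from the question.
line = '"http://www.facebook.com","Beast","12345678","https://login.facebook.com","","email","pass"'

# String#match returns a MatchData object, or nil when the line does not match.
if (m = line.match(regex))
  # captures returns the groups in order, without the full match.
  url, user, password = m.captures
  puts "#{url} | #{user} | #{password}"
else
  puts "no match: #{line}"
end
```

With scan you would get an array of arrays instead (one inner array of groups per match), so line.scan(regex).first gives the same three values for a line that matches.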
You should also probably use a CSV library for parsing CSV files (one comes with Ruby, in both 1.8.7 and 1.9), and then apply a regular expression to each column it returns. You'll run into fewer quoting and escaping issues that way.
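Here is a rough sketch of the CSV approach, assuming the csv library is available through require "csv". The rows are two of the sample lines from the question, inlined so the snippet runs without testfile.csv; the hash keys and the url_pattern check are hypothetical and only there to illustrate a per-column regex.

```ruby
require "csv"

# Two sample rows from the question, inlined for the sake of the example.
data = <<~ROWS
  "https://mail.google.com","Master","password1","","https://mail.google.com","",""
  "http://www.own3d.tv","Earth","passWOrd3","http://www.own3d.tv","","user_name","user_password"
ROWS

# Hypothetical per-column check: the first column should look like an http(s) URL.
url_pattern = %r{\Ahttps?://}

records = []
CSV.parse(data).each do |row|
  # The CSV parser has already stripped the quotes and split on commas.
  url, user, password = row.first(3)
  next unless url =~ url_pattern

  records << { url: url, user: user, password: password }
end

records.each { |r| puts "#{r[:url]} | #{r[:user]} | #{r[:password]}" }
```

To read from the file instead, CSV.foreach("testfile.csv") yields one row array at a time in the same way.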