I'm doing some web scraping; this is the format for the data:
Sr.No. Course_Code Course_Name Credit Grade Attendance_Grade
The actual string that I receive has the following form:
1 CA727 PRINCIPLES OF COMPILER DESIGN 3 A M
The things I'm interested in are the Course_Code, Course_Name and Grade; in this example, the values would be:
Course_Code : CA727
Course_Name : PRINCIPLES OF COMPILER DESIGN
Grade : A
Is there a regular expression or some other technique I can use to extract this information easily, instead of manually parsing the string? I'm using JRuby in 1.9 mode.
Let's use Ruby's named captures and a self-describing regex!
course_line = /
^ # Starting at the front of the string
(?<SrNo>\d+) # Capture one or more digits; call the result "SrNo"
\s+ # Eat some whitespace
(?<Code>\S+) # Capture all the non-whitespace you can; call it "Code"
\s+ # Eat some whitespace
(?<Name>.+\S) # Capture as much as you can
# (while letting the rest of the regex still work)
# Make sure you end with a non-whitespace character.
# Call this "Name"
\s+ # Eat some whitespace
(?<Credit>\S+) # Capture all the non-whitespace you can; call it "Credit"
\s+ # Eat some whitespace
(?<Grade>\S+) # Capture all the non-whitespace you can; call it "Grade"
\s+ # Eat some whitespace
(?<Attendance>\S+) # Capture all the non-whitespace; call it "Attendance"
$ # Make sure that we're at the end of the line now
/x
str = "1 CA727 PRINCIPLES OF COMPILER DESIGN 3 A M"
parts = str.match(course_line)
raise "Line doesn't match the expected format: #{str}" unless parts  # match returns nil on failure
puts "
Course Code: #{parts['Code']}
Course Name: #{parts['Name']}
Grade: #{parts['Grade']}".strip
#=> Course Code: CA727
#=> Course Name: PRINCIPLES OF COMPILER DESIGN
#=> Grade: A
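Since scraping usually yields a whole table rather than a single line, here is a minimal sketch that assumes the scraped rows arrive as one newline-separated string with the header line first. It reuses the same named-capture pattern (condensed, without the comments), skips any line that doesn't match (such as the header), and keeps only Code, Name and Grade. For comparison it also includes a non-regex alternative based on `String#split`, which relies on the course name being the only field that can contain spaces. The names `COURSE_LINE`, `parse_rows` and `parse_row_split` are hypothetical, chosen for illustration.

```ruby
# Hypothetical helper names (COURSE_LINE, parse_rows, parse_row_split),
# chosen for illustration only.

# Same named-capture pattern as above, condensed (whitespace ignored via /x).
COURSE_LINE = /
  ^(?<SrNo>\d+) \s+
  (?<Code>\S+) \s+
  (?<Name>.+\S) \s+
  (?<Credit>\S+) \s+
  (?<Grade>\S+) \s+
  (?<Attendance>\S+)$
/x

# Regex approach: match every line, drop the ones that don't fit
# (e.g. the header row), and keep only the three fields of interest.
def parse_rows(text)
  matches = text.each_line.map { |line| COURSE_LINE.match(line) }.compact
  matches.map { |m| { code: m['Code'], name: m['Name'], grade: m['Grade'] } }
end

# Split approach: the first two fields and the last three never contain
# spaces, so everything in between is the course name.
# Note: runs of spaces inside the name are collapsed to single spaces.
def parse_row_split(line)
  fields = line.split
  # Skip the header and anything too short to be a course row.
  return nil if fields.size < 6 || fields[0] !~ /\A\d+\z/
  { code: fields[1], name: fields[2...-3].join(' '), grade: fields[-2] }
end

scraped = <<TEXT
Sr.No. Course_Code Course_Name Credit Grade Attendance_Grade
1 CA727 PRINCIPLES OF COMPILER DESIGN 3 A M
TEXT

rows = parse_rows(scraped)
rows.each { |r| puts "#{r[:code]} | #{r[:name]} | #{r[:grade]}" }
# prints: CA727 | PRINCIPLES OF COMPILER DESIGN | A

split_rows = scraped.each_line.map { |line| parse_row_split(line) }.compact
puts split_rows == rows  # prints: true
```

The split version is shorter, but it collapses repeated spaces inside a course name; the regex version returns the name exactly as it appears between the code and credit fields, and its named groups document the row layout.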