My HTML file does not have any classes. I am trying to get the number from the plain HTML:
<html>
<head></head>
<body>
PO Number : [4587958]
</body>
</html>
I am able to find the "PO Number" text by using:
require 'rubygems'
require 'nokogiri'
PAGE_URL = "a.html"
page = Nokogiri::HTML(open(PAGE_URL))
data = page.css("body").text
puts data
test = data
ponumber = test.scan('PO Number')
puts ponumber
However, I am not able to get the number itself.
You can get the number by scanning with a regexp that matches runs of digits:
page.css('body').text.scan(/\d+/)
# ["4587958"]
page.css('body').text.scan(/\d+/).first.to_i
# 4587958
scan returns an array with all matches. If you have multiple numbers in your document, just choose the element you want to pick:
# Example:
# Invoice Number : [78945824] PO Number : [4587958]
page.css('body').text.scan(/\d+/)
# ["78945824", "4587958"]
page.css('body').text.scan(/\d+/)[1].to_i
# 4587958
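Picking by index depends on the order of the numbers in the page. If that order might change, one alternative is to anchor the regexp on the "PO Number" label and capture only the digits inside the brackets. This is a minimal sketch, assuming the label and square brackets always appear as in the sample; extract_po_number is a hypothetical helper name, and the string stands in for page.css('body').text:

```ruby
# Hypothetical helper: returns the PO number as an Integer, or nil if not found.
# Assumes the text looks like "PO Number : [digits]" as in the sample above.
def extract_po_number(text)
  # String#[] with a regexp and a group index returns that capture group (or nil).
  digits = text[/PO Number\s*:\s*\[(\d+)\]/, 1]
  digits && digits.to_i
end

# Stand-in for page.css('body').text
text = "Invoice Number : [78945824] PO Number : [4587958]"
puts extract_po_number(text)
```

Because the match is tied to the label, it does not matter whether other numbers appear before or after the PO number.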