I am working on a challenging problem: finding a way to get data after a booking process. Basically, I have a page with a form (SLIM FORM) that I need to fill automatically with information coming from a provider's form (e.g. easyjet.com or hotels.com, any booking site basically). For instance, on https://secure.booking.com/hotel/es/royal.html?sid=1c2bab12a0c64a541728840f52cd6401;errorc_checkin_invalid=checkin;errorc_intro_error_message_invalid=intro_error_message;errorv_stage=1;errorv_checkin=2011-07-05;errorv_hotel_id=90228;errorv_installment_count=1;errorv_hostname=www.booking.com;errorv_nr_rooms_9022801_80638194_0=1;errorv_interval=1 the information in "my Booking" is what I need to get.
I ran some tests, and here is what I have found so far:
It's not possible to have both on the same page: with cURL, there is no communication with the external server, and with iframes, it leaves the page as soon as the src of the iframe changes.
So, I decided that the booking process should happen on a dedicated page, in the domain of the booking provider (easyjet.com...)
1) Am I right to consider performing the booking on the real site, or is there a way to include the external website on my page and perform the whole process of booking in it (basically filling forms on departure, arrival date etc...)?
If that is not possible: I made some tests with cURL and came to this conclusion:
- I will have to define a tailored regex for each provider, and I am under the impression that some providers have mechanisms to identify cURL and block it (e.g. lufthansa.com). But it works quite well with others (e.g. booking.com).
I have 2 additional questions:
2) Are there better solutions than cURL to parse some HTML in a page (especially since it doesn't work if the URL doesn't include sessionID)? I was thinking maybe of using something like Selenium...
3) How can I trigger my cURL parsing on another tab or window? (I was thinking about a system similar to bookmarks that can trigger some JavaScript code)
Thanks for your answers and sorry for the length :-)
Update: Based on the answers I received, here are some fresh thoughts. For big providers (easyjet, hotels.com etc.), I will use an API if one is available. For small providers (e.g. http://www.hotel-gare-clermont.com/en,1,6217.html ), I think the proxy solution is as good as any other, and I don't expect to receive any legal complaints from "Hotel de la Gare", while I would be adding visibility to those small providers. What do you think?
1) This is possible, but it has the side effect of being borderline illegal. You cannot just scrape providers' forms and re-serve their pages in an iframe. If providers caught you doing it, you would likely be sued.
What you need is a partnering agreement with the various providers. With this agreement, they would likely open up an API (Application Programming Interface) for you to use. This would allow you to query their site more directly and make bookings in a clean and approved way.
2) cURL is a great library, which does the job of fetching web pages very well. There are many examples around the internet for fetching a page into a string. In terms of parsing that string, in an ideal world you could use an XML parser. Unfortunately, many HTML pages are badly constructed (unclosed tags, invalid nesting), which makes them difficult to parse as XML. Most coders, when they have to parse HTML chunks, tend to use regular expressions.
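As a middle ground between a strict XML parser and hand-written regexes, a lenient HTML parser can cope with sloppy markup. Here is a minimal sketch in Python using only the standard library; the class names `hotel-name` and `checkin` and the sample markup are made up for illustration, and a real provider page would need its own selectors.

```python
from html.parser import HTMLParser

# Hypothetical snippet of a booking page. Note the unclosed <p> tag:
# a strict XML parser would reject this, html.parser does not.
SAMPLE = """
<div id="booking">
  <p>Your stay
  <span class="hotel-name">Hotel Royal</span>
  <span class="checkin">2011-07-05</span>
</div>
"""


class FieldExtractor(HTMLParser):
    """Collect the text that follows each start tag carrying a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self._capture = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.wanted_class in classes:
            self._capture = True
            self.values.append("")

    def handle_endtag(self, tag):
        # Simplification: stop capturing at the first closing tag we meet.
        self._capture = False

    def handle_data(self, data):
        if self._capture:
            self.values[-1] += data


def extract(html, wanted_class):
    """Return the stripped text of every element with the given class."""
    parser = FieldExtractor(wanted_class)
    parser.feed(html)
    parser.close()
    return [value.strip() for value in parser.values]


if __name__ == "__main__":
    print(extract(SAMPLE, "hotel-name"))
    print(extract(SAMPLE, "checkin"))
```

You would still need one set of class names or ids per provider, but the extraction no longer breaks on whitespace or attribute-order changes the way a regex tends to.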
In order to get the session ID, your first cURL request should be to a login form on example.com. Simulate submitting the login form by requesting http://example.com?username=bob&pass=secret. You can check for a valid login by looking for the text "successful login" or similar in the server response. You can get the session ID (if it is stored in a cookie) from the response headers. Subsequent cURL requests should send that cookie back.
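The cookie handling described above can be sketched as follows, in Python with the standard library only. The cookie name `PHPSESSID`, the marker text and the sample headers are assumptions for illustration; in a live script the header list would come from the HTTP response of the login request.

```python
from http.cookies import SimpleCookie


def session_id_from_headers(headers, cookie_name="PHPSESSID"):
    """Return the session cookie value found in Set-Cookie headers, or None.

    `headers` is a list of (name, value) pairs from an HTTP response.
    The default cookie name is an assumption; each site picks its own.
    """
    for name, value in headers:
        if name.lower() != "set-cookie":
            continue
        jar = SimpleCookie()
        jar.load(value)
        if cookie_name in jar:
            return jar[cookie_name].value
    return None


def login_succeeded(body, marker="successful login"):
    """Crude check: look for a marker text in the response body."""
    return marker.lower() in body.lower()


def cookie_header(session_id, cookie_name="PHPSESSID"):
    """Build the header to send with every subsequent request."""
    return {"Cookie": f"{cookie_name}={session_id}"}


if __name__ == "__main__":
    # Made-up response data standing in for a real login response.
    sample_headers = [
        ("Content-Type", "text/html"),
        ("Set-Cookie", "PHPSESSID=abc123; path=/; HttpOnly"),
    ]
    sample_body = "<p>Successful login</p>"

    if login_succeeded(sample_body):
        sid = session_id_from_headers(sample_headers)
        print(cookie_header(sid))
```

If the site stores the session ID in the URL instead of a cookie, you would have to pull it out of the redirect location or the page links rather than the headers.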
3) cURL runs on the server side, so it has absolutely no knowledge of the tabs you have open. You could use JavaScript to query tabs, but I bet most browsers will not allow you to do this, for security reasons.
Sending the user directly to the provider is a much more reliable solution, because you give your user control of the process. But of course, you lose control of the process :)
Alternatively, you have to create a proxy on your server that queries the site on behalf of your user:
end-user          yourdomain    easyjet
|                 |             |
|-----search----->|             |
|<--booking form--|             |
|---user's data-->|             |
|                 |---forward-->|
|                 |<--result----|
|<--pass to user--|             |
|                 |             |
v                 v             v
To the end-user, the booking happens with you; to easyjet/lufthansa/whoever, you appear to be a customer. The problem is, each website is different, and you will have a lot of work adapting your system to what every (or most) site requires, and as you have already noticed, airlines do not want you to take their custom. That's why many brokers' sites (kelkoo, gocompare...) started out doing what you're planning, but ended up as glorified advertising.
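To give an idea of the per-site adaptation work, here is a rough Python sketch of the "forward" step of such a proxy: your own form fields are translated into whatever each provider expects. The provider names, URLs and field names are entirely hypothetical; every real site would need its own entry, found by inspecting its forms.

```python
from urllib.parse import urlencode

# Hypothetical per-provider configuration: where to send the request and
# how our own field names map onto the provider's form field names.
PROVIDERS = {
    "provider_a": {
        "url": "https://provider-a.example/search",
        "fields": {"departure": "from", "arrival": "to", "date": "outbound"},
    },
    "provider_b": {
        "url": "https://provider-b.example/book",
        "fields": {"departure": "orig", "arrival": "dest", "date": "d1"},
    },
}


def build_forward_request(provider, user_data):
    """Translate the user's data into (url, encoded form body) for a provider.

    Fields the provider does not know about are silently dropped.
    """
    config = PROVIDERS[provider]
    payload = {
        config["fields"][key]: value
        for key, value in user_data.items()
        if key in config["fields"]
    }
    return config["url"], urlencode(payload)


if __name__ == "__main__":
    user_data = {"departure": "LGW", "arrival": "BCN", "date": "2011-07-05"}
    for name in PROVIDERS:
        print(build_forward_request(name, user_data))
```

The mapping table is the easy part; the result pages coming back still have to be parsed per provider, and any change on their side breaks your entry.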