A product I'm helping to develop will basically work like this:

1. A Web publisher adds a <script> tag to their page that loads a JavaScript file from our server.
2. The <script> gathers the text content of the page and sends it to our server via a POST request (cross-domain, using a <form> inside of an <iframe>).
3. Whenever the page changes, we rely on the <script> on the page to gather and POST the text content again.

The problem is that this system seems inherently insecure. In theory, anyone could spoof the HTTP POST request (including the referer header, so we couldn't just check for that) that sends a page's content to our server. This could include any text content, which we would then use to generate the related content links for that page.
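For context, the content-gathering and POST step is essentially the following (a rough sketch; the endpoint URL and field names here are just placeholders, not our real ones):

```javascript
// Rough sketch of the gathering/POST step: collect the page text and submit it
// cross-domain through a <form> targeted at a hidden <iframe>.
(function () {
  var iframe = document.createElement('iframe');
  iframe.name = 'content-sink';
  iframe.style.display = 'none';
  document.body.appendChild(iframe);

  var form = document.createElement('form');
  form.method = 'POST';
  form.action = 'https://example.com/collect'; // placeholder endpoint
  form.target = 'content-sink';                // submit into the hidden iframe

  var field = document.createElement('input');
  field.type = 'hidden';
  field.name = 'content';
  field.value = document.body.innerText || document.body.textContent;
  form.appendChild(field);

  document.body.appendChild(form);
  form.submit();
})();
```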
The primary difficulty in making this secure is that our JavaScript is publicly visible. We can't use any kind of private key or other secret identifier or pattern, because it won't stay secret.
Ideally, we need a method that somehow verifies that a POST request corresponding to a particular Web page is authentic. We can't just scrape the Web page and compare the content with what's been POSTed, since the purpose of having JavaScript submit the content is that it may be behind a login system.
Any ideas? I hope I've explained the problem well enough. Thanks in advance for any suggestions.
There is no silver bullet for this. However, where big guns don't exist, major annoyance can. Hackers like a challenge, but they prefer an easy target. Be annoying enough that they give up.
Google and others do this effectively with AdWords: create an API token and have them send it with every request. Have a "verification" process for sites using your script that requires the registrant to allow their site to be profiled before the script can be used. You can then collect every bit of information you can about the server in question, and if the server's profile doesn't match the one on record, can the request.
Get everything you can about the browser and client and create a profile for it. If there is any sign of browser spoofing, drop the request. If the profile repeats but the cookie is gone, ignore the input. If you get more than one request with the same token in a short period (i.e. the rapid page refreshes inherent in hack attempts), ignore the request.
Then go one step further and ping the actual domain to verify that it exists and is an authorized domain. Even if the page is behind a login, the domain itself will still respond. This won't stop hackers by itself, but it is done server-side and is therefore hidden.
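As a rough illustration of the token check and the domain ping (this assumes Node/Express on your side; the token registry and field names are made up):

```javascript
// Sketch only: reject POSTs whose token is unknown or whose registered domain
// no longer resolves. Express and the `tokens` registry are assumptions here.
const dns = require('dns').promises;
const express = require('express');
const app = express();
app.use(express.urlencoded({ extended: false }));

// Hypothetical registry built during your "verification" process.
const tokens = new Map([
  ['abc123', { domain: 'example.com' }],
]);

app.post('/collect', async (req, res) => {
  const record = tokens.get(req.body.token);
  if (!record) return res.status(403).end();   // unknown token: can the request

  try {
    await dns.resolve(record.domain);           // does the registered domain still exist?
  } catch (e) {
    return res.status(403).end();               // unresolvable domain: can the request
  }

  // ...browser/client profiling, rate checks, and content profiling go here...
  res.status(204).end();
});

app.listen(3000);
```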
Also, you might consider profiling the content of a page. If a site dedicated to kitchen utensils starts sending back content about adult dating, raise a red flag.
Lastly, when a request comes in that you've profiled as bad, respond with the JSONP that a good request for that page would have produced, based on data you know is good (a 24-hour-old version of the page, etc.). Don't tell the hacker you know they are there; act as if everything is fine. It will take them quite a while to figure that one out!
None of these ideas fulfills the exact needs of your question, but hopefully they will inspire some insidious and creative thinking on your part.
How about this? The <script/> tag that a third-party site includes has a dynamic src attribute. So, instead of serving a static JavaScript resource, the request comes to your server, which generates a unique key as an identifier for the website and sends it back in the JS response. You save the same key in the user session or in your database. The form created and submitted by this JS code submits that key as a parameter too. Your backend then rejects any POST request whose key does not match the one in your db/session.
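A rough sketch of that flow (Node/Express assumed; the route names and the in-memory key store are just for illustration):

```javascript
// Sketch: the response to <script src=".../widget.js"> generates a one-time key,
// remembers it, and bakes it into the JavaScript it returns. The POST handler
// then rejects any submission whose key it never issued.
const crypto = require('crypto');
const express = require('express');
const app = express();
app.use(express.urlencoded({ extended: false }));

const issuedKeys = new Set(); // in practice: the user session or your database

app.get('/widget.js', (req, res) => {
  const key = crypto.randomUUID();
  issuedKeys.add(key);
  res.type('application/javascript').send(
    `window.__WIDGET_KEY__ = ${JSON.stringify(key)};\n` +
    '/* ...rest of the content-gathering script, which adds the key as a form field... */'
  );
});

app.post('/collect', (req, res) => {
  if (!issuedKeys.delete(req.body.key)) {
    return res.status(403).end(); // no matching key on record: reject
  }
  // ...process req.body.content...
  res.status(204).end();
});

app.listen(3000);
```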
Give people keys on a per-domain basis.
Make people include in their requests a hash of [key string + request parameters]. (The hash should be computed on the server, not in the publicly visible JavaScript.)
When they send you the request, you, knowing both the parameters and the key, can verify its validity.
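For example (a sketch only; the per-domain key store and the way parameters are serialized are up to you), the check on your side could be an HMAC comparison:

```javascript
// Sketch: verify that the submitted signature equals
// HMAC(per-domain key, canonicalized request parameters).
const crypto = require('crypto');

const keysByDomain = { 'example.com': 'per-domain-secret' }; // hypothetical key store

function isValidRequest(domain, params, signature) {
  const key = keysByDomain[domain];
  if (!key) return false;

  // Both sides must serialize the parameters identically, e.g. sorted key=value pairs.
  const canonical = Object.keys(params).sort()
    .map((k) => `${k}=${params[k]}`)
    .join('&');

  const expected = crypto.createHmac('sha256', key).update(canonical).digest('hex');
  return expected.length === signature.length &&
    crypto.timingSafeEqual(Buffer.from(expected), Buffer.from(signature));
}

// e.g. isValidRequest('example.com', { page: '/recipes', content: '...' }, req.body.sig)
```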
The primary weakness of the system as you've described it is that you are "given" the page content. Why not go and get the page content yourself?
Fetching the content yourself stops malicious content from being "fed" to your server, and it lets you issue some form of API key that ties requests to domains or pages (i.e. API key 123 only works for referrers on mydomain.com; anything else is obviously spoofed). If you cache the fetched content (or sit it behind a proxy), your app is also protected to some degree from DOS-type attacks, because the page content is only processed once each time the cache TTL expires (and you can handle increasing load by extending the TTL until you can bring additional processing capacity online).

Now your client-side script is insanely small and simple: no more scraping content and POSTing it, just send an Ajax request and maybe populate a couple of parameters (API key / page).
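A minimal sketch of that shape (Node/Express and the global fetch of Node 18+ assumed; the key table and TTL are arbitrary):

```javascript
// Sketch: the client sends only (apiKey, page); the server fetches the page
// itself and caches the result until the TTL expires.
const express = require('express');
const app = express();

const apiKeys = { '123': 'mydomain.com' };  // api key -> the only domain it works for
const cache = new Map();                    // page URL -> { body, fetchedAt }
const TTL_MS = 15 * 60 * 1000;

app.get('/related', async (req, res) => {
  const { apiKey, page } = req.query;
  const allowedDomain = apiKeys[apiKey];
  let hostname;
  try { hostname = new URL(page).hostname; } catch (e) { hostname = null; }
  if (!allowedDomain || hostname !== allowedDomain) {
    return res.status(403).end();           // key/domain mismatch: obviously spoofed
  }

  let entry = cache.get(page);
  if (!entry || Date.now() - entry.fetchedAt > TTL_MS) {
    const body = await (await fetch(page)).text();  // go and get the content yourself
    entry = { body, fetchedAt: Date.now() };
    cache.set(page, entry);
  }

  // ...generate related content links from entry.body...
  res.json({ links: [] });
});

app.listen(3000);
```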
First of all, I would validate the domain (and maybe the "server profile") as suggested by others here, and obviously very strictly validate the content of the POST (as I hope you're already doing anyway).
If you make the URL for your script file point to something that's dynamically generated by your server, you can also include a time-sensitive session key to be sent along with the POST. This won't completely foil a determined attacker, but if you're able to make the session expire quickly enough it will be a lot more difficult to exploit (and if I understand your application correctly, sessions should only need to last long enough for the user to enter something after loading a page).
After typing this, I realize it's basically what avlesh already suggested with the addition of an expiry.
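A sketch of that combination (avlesh's generated key, plus an expiry; the secret and the five-minute window below are arbitrary):

```javascript
// Sketch: the dynamically generated script embeds a timestamped, signed token,
// and the POST handler rejects tokens that are forged or too old.
const crypto = require('crypto');

const SECRET = 'server-side-secret';   // never leaves your server
const MAX_AGE_MS = 5 * 60 * 1000;      // arbitrary expiry window

function issueToken() {
  const ts = Date.now().toString();
  const sig = crypto.createHmac('sha256', SECRET).update(ts).digest('hex');
  return `${ts}.${sig}`;               // embedded into the generated script response
}

function tokenIsValid(token) {
  const [ts, sig] = String(token).split('.');
  const expected = crypto.createHmac('sha256', SECRET).update(ts || '').digest('hex');
  if (sig !== expected) return false;              // forged or mangled
  return Date.now() - Number(ts) <= MAX_AGE_MS;    // expired?
}
```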