I noticed that when I use curl to fetch content from GitHub with a URL in this format:
https://raw.githubusercontent.com/${org}/${repo}/${branch}/path/to/file
It will sometimes return cached/stale content. For example with this sequence of operations:
Step 3 will return the same content as step 1 and not reflect the new commit.
How can I avoid getting a stale version?
I noticed that the GitHub web UI adds a token to the URL, e.g. ?token=AABCIPALAGOZX5R, which presumably avoids getting cached content. What is the nature of this token, and how can I emulate it? Would tacking on ?token=$(date +%s) work?
Also I'm looking for a way to avoid the stale content without having to switch to a commit hash in the url, since it will require more changes. However, if that's the only way to achieve it, then I'll go that route.
According to my tests, raw addresses impose a cache of about 5 minutes per IP, which cannot be bypassed in any way.
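One way to check the cache lifetime yourself is to look at the Cache-Control header that comes back with the response. The sketch below is only an illustration: cache_max_age is a hypothetical helper name, and it assumes the server reports its cache lifetime as a max-age value, which may not always be the case.

```shell
#!/bin/sh
# Hypothetical helper: read HTTP response headers on stdin and print the
# max-age value (in seconds) from the Cache-Control header, if present.
cache_max_age() {
  # Strip carriage returns first, since HTTP header lines end in CRLF.
  tr -d '\r' |
    sed -n 's/^[Cc]ache-[Cc]ontrol:.*max-age=\([0-9][0-9]*\).*/\1/p'
}

# Example usage (org, repo and branch are placeholders you must set):
# curl -sI "https://raw.githubusercontent.com/${org}/${repo}/${branch}/path/to/file" | cache_max_age
```

If the helper prints nothing, the response carried no max-age directive and the cache lifetime cannot be read from the headers this way.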
The solution I use is GitHub Pages. I published the project to the github.io domain, and because the site is rebuilt automatically every time the content changes, the changes appear at the served URL as soon as the build finishes. My URL is cleaner, too. The only cost is that build, which takes about 30 to 60 seconds after each change and has not caused me any problems.
GitHub caches this data because otherwise frequently requested files would involve serving a request to the backend service each time and this is more expensive than serving a cached copy. Using a CDN provides improved performance and speed. You cannot bypass it.
The token you're seeing in the URL is a temporary token that is issued for the logged-in user. You cannot use a random token, since that won't pass authentication.
If you need the version of that file in a specific commit, then you'll need to explicitly specify that commit. However, do be aware that you should not do this with some sort of large-scale automated process as a way to bypass caching. For example, you should not try to do this to always get the latest version of a file for the purposes of a program you're distributing or multiple instances of a service you're running. You should provide that data yourself, using a CDN if necessary. That way, you can decide for yourself when the cache needs to be expired and get both good performance and the very latest data.
If you run such a process anyway, you may cause an outage or overload, and your repository or account may be suspended or blocked.
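For occasional, small-scale use, pinning the URL to a specific commit as described above might look like the following sketch. The helper names raw_url and branch_sha are hypothetical, and my-org, my-repo and main are placeholders. It assumes the repository is public and that git is installed; the caution above about large-scale automated use still applies.

```shell
#!/bin/sh
# Hypothetical helper: build a raw URL for a given ref (branch or commit SHA).
# $1 = org, $2 = repo, $3 = ref, $4 = path within the repository
raw_url() {
  printf 'https://raw.githubusercontent.com/%s/%s/%s/%s\n' "$1" "$2" "$3" "$4"
}

# Hypothetical helper: resolve a branch to its current commit SHA without
# cloning. git ls-remote prints "<sha><TAB><refname>" per matching ref.
# $1 = org, $2 = repo, $3 = branch
branch_sha() {
  git ls-remote "https://github.com/$1/$2.git" "refs/heads/$3" | cut -f1
}

# Example usage (placeholders, not a real repository):
# sha=$(branch_sha my-org my-repo main)
# curl -fsSL "$(raw_url my-org my-repo "$sha" path/to/file)"
```

Because the commit SHA is part of the URL, each new commit produces a different URL, so a cached copy of the branch URL does not come into play.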