I'd like to recursively download JSON resources from a RESTful HTTP endpoint and store these in a local directory structure, following links to related resources in the form of JSON strings containing HTTP URLs. Wget would seem to be a likely tool for the job, though its recursive download is apparently limited to HTML hyperlinks and CSS url() references.
The resources in question are Swagger documentation files similar to this one, though in my case all of the URLs are absolute. The Swagger schema is fairly complicated, but it would be sufficient to follow any string that looks like an absolute HTTP(S) URL. Even better would be to follow absolute or relative paths specified in 'path' properties.
Can anyone suggest a general purpose recursive crawler that would do what I want here, or a lightweight way of scripting wget or similar to achieve it?
I ended up writing a shell script to solve the problem:
API_ROOT_URL="http://petstore.swagger.wordnik.com/api/api-docs"
OUT_DIR=$(pwd)

# Fetch the URL in $1 and pretty-print it to $OUT_DIR$2.json
download_json() {
  echo "Downloading $1 to $OUT_DIR$2.json"
  curl -sS "$1" | jq . > "$OUT_DIR$2.json"
}

# Fetch the index, then each API declaration it lists
download_json "$API_ROOT_URL" /api-index
jq -r '.apis[].path' "$OUT_DIR/api-index.json" | while read -r API_PATH; do
  # Strip the root URL prefix in case the path is an absolute URL
  API_PATH=${API_PATH#"$API_ROOT_URL"}
  download_json "$API_ROOT_URL$API_PATH" "$API_PATH"
done
This uses jq to extract the API paths from the index file, and also to pretty-print the JSON as it is downloaded. As webron mentions, this will probably only be of interest to people still using the 1.x Swagger schema, though I can see myself adapting this script for other problems in the future.
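For the more general case from the question (following any string that looks like an absolute HTTP(S) URL), jq can also list the candidate links in a downloaded file. This is only a sketch: the function name list_urls is made up for illustration, it assumes a jq build with regex support (1.5 or later), and it does no crawling or loop detection itself.

```shell
# Hypothetical helper: print every string value in the JSON file $1 that
# starts with http:// or https://, one per line, sorted and de-duplicated.
list_urls() {
  jq -r '[.. | strings | select(test("^https?://"))] | unique | .[]' "$1"
}
```

Running list_urls "$OUT_DIR/api-index.json" should print the absolute URLs found anywhere in the index, which could then be fed to download_json in a loop.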
One problem I've found with the script for Swagger is that the order of entries in our API docs is apparently not stable. Running the script several times in a row against our API docs (generated by swagger-springmvc) results in minor changes to property order. This can be partly fixed by sorting the JSON objects' property keys with jq's --sort-keys option, but this doesn't cover all cases, e.g. a model schema's required property, which is a plain array of string property names.
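One possible way to cover the required arrays as well is to sort them explicitly in the jq filter. This is a sketch, not something I have run against swagger-springmvc output: normalize_json is a hypothetical name, it assumes jq 1.5 or later for walk, and it assumes the order of a required array carries no meaning.

```shell
# Hypothetical helper: read JSON on stdin and pretty-print it with object keys
# sorted and every array-valued "required" property sorted too.
# Boolean "required" flags and all other arrays are left untouched.
normalize_json() {
  jq --sort-keys 'walk(if type == "object" and (.required | type) == "array"
                       then .required |= sort else . end)'
}
```

In download_json, the jq . step could be replaced by normalize_json. Other arrays whose order is unstable would need similar handling, added to the filter case by case.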