
RESTful Collection Resources - idiomatic JSON representations and roundtripping

I have a collection resource called Columns. A GET with Accept: application/json can't directly return a collection, so my representation needs to nest it in a property:-

{ "propertyName": [
   { "Id": "Column1", "Description": "Description 1" },
   { "Id": "Column2", "Description": "Description 2" }
  ]
}

Questions:

  1. what is the best name to use for the identifier propertyName above? should it be:

    • d (i.e. is d an established convention, or is it specific to particular frameworks (MS WCF and MS ASP.NET AJAX)?)
    • results (i.e. is results an established convention or is it specific to some particular specifications (MS OData)?)
    • Columns (i.e. the top level property should have a clear name and it helps to disambiguate my usage of generic application/json as the Media Type)

    NB I feel pretty comfortable that there should be something wrapping it, and as pointed out by @tuespetre, XML or any other representation would force you to wrap it to some degree anyway

  2. when PUTting the content back, should the same wrapping in said property be retained [given that it's not actually necessary for security reasons, and conventional JSON usage might be to drop such nesting for PUT and POST, since it isn't needed there to guard against scripting attacks]?

    my gut tells me it should be symmetric, as for every other representation, but there may be prior art for dropping the d/results [assuming that's the answer to part 1]

    ... Or should a PUT-back (or POST) drop the need for a wrapping property and just go with:-

     [
           { "Id": "Column1", "Description": "Description 1" },
           { "Id": "Column2", "Description": "Description 2" }
     ]
    
    • Where would any root-level metadata go if one wished to add that?
    • How would a person crafting a POST Just Know that it needs to be symmetric?

EDIT: I'm specifically interested in an answer with a reasoned rationale that specifically takes into account the impacts on client usage with JSON. For example, HAL takes care to define a binding that makes sense for both target representations.

EDIT 2: Not accepted yet, why? The answers so far don't have citations or anything that makes them stand out over me doing a search and picking something out of the top 20 hits that seems reasonable. Am I just too picky? I guess I am (or more likely I just can't ask questions properly :D). It's a bit mad that a week and 3 days, even with an (admittedly measly) bonus on, still only gets 123 views (from which 3 answers ain't bad)

asked Sep 21 '12 11:09 by Ruben Bartelink



1 Answer

Updated Answer

Addressing your questions (as opposed to going off on a bit of a tangent in my original answer :D), here are my opinions:

1) My main opinion on this is that I dislike d. As a client consuming the API I would find it confusing. What does it even stand for anyway? data?

The other options look good. Columns is nice because it mirrors back to the user what they requested.

If you are doing pagination, another option might be something like page or slice, as it makes it clear to the client that they are not receiving the entire contents of the collection.

{
    "offset": 0,
    "limit": 100,
    "page" : [
        ...
    ]
}
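A minimal sketch of how a server might build such an envelope, in Python. The function name `paginate` and the sample column data are assumptions made up for illustration, not part of any framework:

```python
import json


def paginate(items, offset=0, limit=100):
    """Wrap one slice of `items` in an envelope with paging metadata."""
    return {
        "offset": offset,
        "limit": limit,
        "page": items[offset:offset + limit],
    }


# Hypothetical sample data, mirroring the Columns example in the question.
columns = [
    {"Id": "Column1", "Description": "Description 1"},
    {"Id": "Column2", "Description": "Description 2"},
    {"Id": "Column3", "Description": "Description 3"},
]

envelope = paginate(columns, offset=1, limit=2)
print(json.dumps(envelope, indent=4))
```

Naming the array page rather than Columns signals that the client is looking at a slice, and offset and limit sit beside it as read-only metadata.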

2) TBH, I don't think it makes that much difference which way you go on this. If it were me, though, I probably wouldn't bother sending back the envelope: I don't think there is any need (see below), and why make the request structure any more complicated than it needs to be?

I think POSTing back the envelope would be odd. POST should let you add items into the collection, so why would the client need to post the envelope to do this?
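To make that asymmetry concrete, here is a rough Python sketch. The `ColumnCollection` class and its method names are hypothetical and only illustrate the idea: reads return the enveloped form, while a POST body is just the bare item.

```python
import json


class ColumnCollection:
    """Hypothetical in-memory collection resource."""

    def __init__(self):
        self._items = []

    def get(self):
        # Reads wrap the array in a named property (the envelope).
        return json.dumps({"Columns": self._items})

    def post(self, body):
        # Writes take a single bare item; no envelope is required.
        item = json.loads(body)
        if not isinstance(item, dict):
            raise ValueError("expected a single JSON object")
        self._items.append(item)
        return item


collection = ColumnCollection()
collection.post('{"Id": "Column1", "Description": "Description 1"}')
print(collection.get())
```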

PUTting the envelope back could make sense from a RESTful standpoint, as it could be seen as updating metadata associated with the collection as a whole. I think it is worth thinking about the sort of metadata you will be exposing in the envelope. All the stuff I think would fit well in this envelope (like pagination, aggregations, search facets and similar metadata) is read-only, so it doesn't make sense for the client to send this back to the server. If you find yourself with a lot of data in the envelope that the client is able to mutate, then it could be a sign to break that data out into a separate resource with the list as a sub-collection. Rubbish example:

/animals

{
    "farmName": "farm",
    "paging": {},
    "animals": [
        ...
    ]
}

Could be broken up into:

/farm/1

{
    "id": 1,
    "farmName": "farm"
}

and

/farm/1/animals

{
    "paging": {},
    "animals": [
        ...
    ]
}
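The same split, sketched in Python. The helper name `split_farm` is made up for illustration, and the empty animals list is only a placeholder for the elided items:

```python
def split_farm(combined, farm_id):
    """Split a combined /animals payload into a farm resource and a sub-collection."""
    farm = {"id": farm_id, "farmName": combined["farmName"]}
    animals = {"paging": combined["paging"], "animals": combined["animals"]}
    # Key each representation by the URI it would be served from.
    return {
        f"/farm/{farm_id}": farm,
        f"/farm/{farm_id}/animals": animals,
    }


combined = {"farmName": "farm", "paging": {}, "animals": []}
resources = split_farm(combined, 1)
print(resources["/farm/1"])  # {'id': 1, 'farmName': 'farm'}
```

The mutable data (farmName) ends up on its own resource, and the sub-collection keeps only the read-only paging metadata alongside the list.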

Note: Even with this split, you could still return both combined as a single response using something like Facebook's or LinkedIn's field expansion syntax. E.g. http://example.com/api/farm/1?field=animals.offset(0).limit(10)

In response to your question about how the client should know what the JSON payload they are POSTing and PUTting should look like - this should be reflected in your API documentation. I'm not sure if there is a better tool for this, but Swagger provides a spec that allows you to document what your request bodies should look like using JSON Schema - check out this page for how to define your schemas and this page for how to reference them as a parameter of type body. Unfortunately, Swagger doesn't visualise the request bodies in its fancy web UI yet, but it is open source, so you could always add something to do this.

Original Answer

Check out William's comment in the discussion thread on that page - he suggests a way to avoid the exploit altogether, which means you can safely use a JSON array at the root of your response and then you need not worry about either of your questions.

The exploit you link to relies on your API using a Cookie to authenticate a user's session - just use a query string parameter instead and you remove the exploit. It's probably worth doing this anyway since using Cookies for authentication on an API isn't very RESTful - some of your clients may not be web browsers and may not want to deal with cookies.

Why does this fix work?

The exploit is a form of CSRF attack which relies on the attacker being able to add a script tag on his/her own page that points at a sensitive resource on your API.

<script src="http://mysite.com/api/columns"></script> 

The victim's web browser will send all cookies stored under mysite.com to your server, and to your server this will look like a legitimate request - you will check the session_id cookie (or whatever your server-side framework calls the cookie) and see the user is authenticated. The request will look like this:

GET http://mysite.com/api/columns
Cookie: session_id=123456789;

If you change your API to ignore cookies and use a session_id query string parameter instead, the attacker will have no way of tricking the victim's web browser into sending the session_id to your API.

A valid request will now look like this:

GET http://mysite.com/api/columns?session_id=123456789

If using a JavaScript client to make the above request, you could get the session_id from a cookie. An attacker using JavaScript from another domain will not be able to do this, as you cannot get cookies for other domains (see here).

Now that we have fixed the issue and are ignoring session_id cookies, the script tag on the attacker's website will still send a similar request with a GET line like this:

GET http://mysite.com/api/columns

But your server will respond with a 403 Forbidden since the GET is missing the required session_id query string parameter.
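The check described above could be sketched like this in Python. The `authorize` function and the `VALID_SESSIONS` set are assumptions standing in for whatever session store your server-side framework really uses:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical session store; a real server would look sessions up elsewhere.
VALID_SESSIONS = {"123456789"}


def authorize(url, headers):
    """Return an HTTP status code for a GET, ignoring any Cookie header."""
    query = parse_qs(urlparse(url).query)
    session_ids = query.get("session_id", [])
    # Cookies in `headers` are deliberately never consulted.
    if len(session_ids) == 1 and session_ids[0] in VALID_SESSIONS:
        return 200
    return 403


# Legitimate client: session_id travels in the query string.
print(authorize("http://mysite.com/api/columns?session_id=123456789", {}))  # 200

# Script-tag request: the browser attaches the cookie, but it is ignored.
print(authorize("http://mysite.com/api/columns",
                {"Cookie": "session_id=123456789;"}))  # 403
```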

What if I'm not authenticating users for this API?

If you are not authenticating users, then your data cannot be sensitive and anyone can call the URI. CSRF should be a non-issue, since with no authentication, even if you prevent CSRF attacks, an attacker could just call your API server-side to get your data and use it in any way he/she wants.

answered Nov 02 '22 19:11 by theon