Transactional batch processing with OData

Working with Web API OData, I have $batch processing working; however, persistence to the database is not transactional. If I include multiple requests in a change set and one of those items fails, the others still complete, because each separate call to the controller gets its own DbContext.

For example, if I submit a batch with two change sets:

    Batch 1
      ChangeSet 1
        Patch valid object
        Patch invalid object
      End ChangeSet 1
      ChangeSet 2
        Insert valid object
      End ChangeSet 2
    End Batch

I would expect the first valid patch to be rolled back, since the change set could not be completed in its entirety. However, because each call gets its own DbContext, the first patch is committed, the second is not, and the insert is committed.
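
For illustration, a representative Patch action looks something like this (ProductsController, Product and MyDbContext are stand-ins for the actual types); each batch subrequest gets a fresh controller instance and hence its own context, so each SaveChanges commits on its own:

    using System.Web.Http;
    using System.Web.Http.OData;

    public class ProductsController : ODataController
    {
        // New context per controller instance, i.e. per batch subrequest.
        private readonly MyDbContext _db = new MyDbContext();

        public IHttpActionResult Patch([FromODataUri] int key, Delta<Product> patch)
        {
            var product = _db.Products.Find(key);
            if (product == null)
                return NotFound();

            patch.Patch(product);

            // Commits immediately; a failure in a later subrequest of the
            // same change set cannot roll this back.
            _db.SaveChanges();
            return Updated(product);
        }

        protected override void Dispose(bool disposing)
        {
            if (disposing) _db.Dispose();
            base.Dispose(disposing);
        }
    }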

Is there a standard way to support transactions through a $batch request with OData?

asked Jan 27 '14 by Gareth Suarez

1 Answer

  1. The theory: let's make sure we're talking about the same thing.
  2. In practice: addressing the problem as far as I could (no definitive answer).
  3. In practice, really (update): a canonical way of implementing the backend-specific parts.
  4. Wait, does it solve my problem?: let's not forget that the implementation (3) is bound by the specification (1).
  5. Alternatively: the usual "do you really need it?" (no definitive answer).

The theory

For the record, here is what the OData spec has to say about it (emphasis mine):

All operations in a change set represent a single change unit so a service MUST successfully process and apply all the requests in the change set or else apply none of them. It is up to the service implementation to define rollback semantics to undo any requests within a change set that may have been applied before another request in that same change set failed and thereby apply this all-or-nothing requirement. The service MAY execute the requests within a change set in any order and MAY return the responses to the individual requests in any order. (...)

http://docs.oasis-open.org/odata/odata/v4.0/cos01/part1-protocol/odata-v4.0-cos01-part1-protocol.html#_Toc372793753

This is V4, which barely updates V3 regarding batch requests, so the same considerations apply to V3 services AFAIK.

To understand this, you need a tiny bit of background:

  • Batch requests are sets of ordered requests and change sets.
  • Change Sets themselves are atomic units of work consisting of sets of unordered requests, though these requests can only be Data Modification requests (POST, PUT, PATCH, DELETE, but not GET) or Action Invocation requests.
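
To make the client-side picture concrete: with the WCF Data Services client of that era, SaveChangesOptions.Batch sends all pending operations as a single $batch request containing a single change set. A minimal sketch, assuming a generated container class MyServiceContext exposing a Products set:

    using System;
    using System.Linq;
    using System.Data.Services.Client; // WCF Data Services (OData v3) client

    class BatchClientSketch
    {
        static void Main()
        {
            // MyServiceContext stands in for a generated DataServiceContext subclass.
            var context = new MyServiceContext(new Uri("http://example.org/odata/"));

            var existing = context.Products.Where(p => p.Id == 1).Single();
            existing.Price = 9.99m;
            context.UpdateObject(existing);

            context.AddToProducts(new Product { Name = "New product" });

            // One HTTP request, one change set: the server is expected to
            // apply both operations atomically, or neither.
            context.SaveChanges(SaveChangesOptions.Batch);
        }
    }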

You might raise an eyebrow at the fact that requests within change sets are unordered, and quite frankly I don't have proper rationale to offer. The examples in the spec clearly show requests referencing each other, which would imply that an order in which to process them must be deduced. In reality, my guess is that change sets must really be thought of as single requests themselves (hence the atomic requirement) that are parsed together and are possibly collapsed into a single backend operation (depending on the backend of course). In most SQL databases, we can reasonably start a transaction and apply each subrequest in a certain order defined by their interdependencies, but for some other backends, it may be required that these subrequests be mangled and made sense of together before sending any change to the platters. This would explain why they aren't required to be applied in order (the very notion might not make sense to some backends).

An implication of that interpretation is that all your change sets must be logically consistent on their own; for example, you can't have a PUT and a PATCH touching the same properties in the same change set, as that would be ambiguous. It's thus the responsibility of the client to merge operations as efficiently as possible before sending the requests to the server. This should always be possible.

(I'd love someone to confirm this.) Update: I'm now fairly confident that this is the correct interpretation.

While this may seem like an obvious good practice, it's not generally how people think of batch processing. I stress again that all of this applies to requests within change sets, not requests and change sets within batch requests (which are ordered and work pretty much as you'd expect, minus their non-atomic/non-transactional nature).

In practice

To come back to your question, which is specific to ASP.NET Web API, it seems they claim full support of OData batch requests. More information here. It also seems true that, as you say, a new controller instance is created for each subrequest (well, I take your word for it), which in turn brings a new context and breaks the atomicity requirement. So, who's right?

Well, as you rightly point out too, if you're going to have SaveChanges calls in your handlers, no amount of framework hackery will help much. It looks like you're supposed to handle these subrequests yourself with the considerations I outlined above (looking out for inconsistent change sets). Quite obviously, you'd need to (1) detect that you're processing a subrequest that is part of a changeset (so that you can conditionally commit) and (2) keep state between invocations.

Update: See next section for how to do (2) while keeping controllers oblivious to the functionality (no need for (1)). The next two paragraphs may still be of interest if you want more context on the problems that are solved by the HttpMessageHandler solution.

I don't know whether you can detect that you're in a change set (1) with the current APIs they provide. I don't know whether you can force ASP.NET to keep the controller alive for (2) either. What you could do for the latter, however (if you can't keep it alive), is keep a reference to the context elsewhere (for example in Request.Properties) and reuse it conditionally (update: or unconditionally, if you manage the transaction at a higher level; see below). I realize this is probably not as helpful as you might have hoped, but at least you should now have the right questions to direct to the developers/documentation writers of your implementation.
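
For (2), a sketch of what the controller side could look like, assuming some upstream component stashed a shared context in Request.Properties under a made-up key:

    using System.Web.Http.OData;

    public class ProductsController : ODataController
    {
        private const string SharedContextKey = "SharedDbContext"; // made-up key

        private MyDbContext _ownedDb; // only set when we created it ourselves

        private MyDbContext GetDb()
        {
            object shared;
            if (Request.Properties.TryGetValue(SharedContextKey, out shared))
                return (MyDbContext)shared; // batch subrequest: reuse, don't dispose

            return _ownedDb ?? (_ownedDb = new MyDbContext()); // normal request
        }
    }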

Dangerously rambling: instead of conditionally calling SaveChanges, you could conditionally create and complete a TransactionScope for every change set. This doesn't remove the need for (1) or (2); it's just another way of doing things. It sort of follows that the framework could technically implement this automatically (as long as the same controller instance can be reused), but without knowing the internals well enough, I'll stand by my statement that the framework doesn't yet have enough to go on to do everything itself. After all, the semantics of TransactionScope might be too specific, irrelevant or even undesired for certain backends.
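
To illustrate the TransactionScope semantics only (processChangeSet is a stand-in for executing a change set's subrequests):

    using System;
    using System.Transactions;

    static class ChangeSetScopeSketch
    {
        public static void RunAtomically(Action processChangeSet)
        {
            // Ambient transaction: SaveChanges calls made inside it enlist
            // automatically; nothing commits unless Complete() is reached.
            using (var scope = new TransactionScope(TransactionScopeAsyncFlowOption.Enabled))
            {
                processChangeSet();
                scope.Complete(); // skipped if it throws -> full rollback
            }
        }
    }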

Update: This is indeed what the proper way of doing things looks like. The next section shows a sample implementation that uses the Entity Framework explicit transaction API instead of TransactionScope, with the same end result. Although I feel there are ways to make a generic Entity Framework implementation, ASP.NET currently doesn't provide any EF-specific functionality, so you're required to implement this yourself. If you ever extract your code to make it reusable, please do share it outside of the ASP.NET project if you can (or convince the ASP.NET team that they should include it in their tree).

In practice, really (update)

See snow_FFFFFF's helpful answer, which references a sample project.

To put it in the context of this answer, it shows how to use an HttpMessageHandler to implement requirement (2) outlined above (keeping state between invocations of the controllers within a single request). This works by hooking in at a higher level than controllers and splitting the request into multiple "subrequests", all the while keeping some state hidden from the controllers (the transactions) and even exposing other state to them (the Entity Framework context, in this case via HttpRequestMessage.Properties). The controllers happily process each subrequest without knowing whether it is a normal request, part of a batch request, or even part of a change set. All they need to do is use the Entity Framework context from the properties of the request instead of their own.

Note that you actually have a lot of built-in support to achieve this. This implementation builds on top of the DefaultODataBatchHandler, which builds on top of the ODataBatchHandler code, which builds on top of the HttpBatchHandler code, which is an HttpMessageHandler. The relevant requests are explicitly routed to that handler using Routes.MapODataServiceRoute.

How does this implementation map to the theory? Quite well, actually. You can see that each subrequest is either sent to be processed as-is by the relevant controller if it is an "operation" (normal request), or handled by more specific code if it is a changeset. At this level, they are processed in order, but not atomically.

The change-set handling code, however, does indeed wrap each of its own subrequests in a transaction (one transaction per change set). While the code could at this point try to figure out the order in which to execute statements in the transaction by looking at the Content-ID headers of each subrequest to build a dependency graph, this implementation takes the more straightforward approach of requiring the client to put the subrequests in the right order itself, which is fair enough.
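
Condensed, the shape of that canonical implementation is something like this (modeled on the sample rather than copied from it; the class name, the property key and MyDbContext are mine, and the exact Web API OData 5.x signatures are worth double-checking against your version):

    using System.Collections.Generic;
    using System.Linq;
    using System.Threading;
    using System.Threading.Tasks;
    using System.Web.Http;
    using System.Web.Http.OData.Batch;

    public class TransactionalBatchHandler : DefaultODataBatchHandler
    {
        public TransactionalBatchHandler(HttpServer server) : base(server) { }

        public override async Task<IList<ODataBatchResponseItem>> ExecuteRequestMessagesAsync(
            IEnumerable<ODataBatchRequestItem> requests, CancellationToken cancellationToken)
        {
            var responses = new List<ODataBatchResponseItem>();

            foreach (var item in requests)
            {
                var changeSet = item as ChangeSetRequestItem;
                if (changeSet == null)
                {
                    // "Operation" (plain request): executed as-is, no transaction.
                    responses.Add(await item.SendRequestAsync(Invoker, cancellationToken));
                    continue;
                }

                // One context and one EF6 explicit transaction per change set.
                using (var db = new MyDbContext())
                using (var transaction = db.Database.BeginTransaction())
                {
                    // Expose the shared context to the controllers.
                    foreach (var subRequest in changeSet.Requests)
                        subRequest.Properties["SharedDbContext"] = db;

                    var response = (ChangeSetResponseItem)
                        await changeSet.SendRequestAsync(Invoker, cancellationToken);
                    responses.Add(response);

                    if (response.Responses.All(r => r.IsSuccessStatusCode))
                        transaction.Commit();
                    else
                        transaction.Rollback();
                }
            }

            return responses;
        }
    }

It would then be registered as the batch handler in the Routes.MapODataServiceRoute call mentioned above.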

Wait, does it solve my problem?

If you can wrap all your operations in a single changeset, then yes, the request will be transactional. If you can't, then you must modify this implementation so that it wraps the entire batch in a single transaction. While the specification supposedly doesn't preclude this, there are obvious performance considerations to take into account. You could also add a non-standard HTTP header to flag whether you want the batch request to be transactional or not, and have your implementation act accordingly.
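
Continuing the sketch from the previous section, such a non-standard opt-in could hang off ProcessBatchAsync (the header name is made up; also note that a per-route batch handler is effectively a singleton, so per-request mutable state like this is racy and is shown only to convey the idea):

    using System.Net.Http;
    using System.Threading;
    using System.Threading.Tasks;
    using System.Web.Http;

    public class OptInBatchHandler : TransactionalBatchHandler
    {
        // Read by ExecuteRequestMessagesAsync (not shown) to choose between
        // one transaction per change set and one for the whole batch.
        protected bool WrapWholeBatch { get; private set; }

        public OptInBatchHandler(HttpServer server) : base(server) { }

        public override Task<HttpResponseMessage> ProcessBatchAsync(
            HttpRequestMessage request, CancellationToken cancellationToken)
        {
            // "X-Transactional-Batch" is a made-up, non-standard header.
            WrapWholeBatch = request.Headers.Contains("X-Transactional-Batch");
            return base.ProcessBatchAsync(request, cancellationToken);
        }
    }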

In any case, this wouldn't be standard, and you couldn't count on it if you ever wanted to use other OData servers in an interoperable manner. To fix this, you need to argue for optional atomic batch requests to the OData committee at OASIS.

Alternatively

If you can't find a way to branch your code when processing a change set, can't convince the developers to provide you with a way to do so, or can't keep change-set-specific state in any satisfactory way, then you may alternatively want to expose a brand new HTTP resource with semantics specific to the operation you need to perform.

You probably know this, and it's likely what you're trying to avoid, but this involves using DTOs (Data Transfer Objects) populated with the data in the requests. You then interpret these DTOs to manipulate your entities within a single controller action, and hence with full control over the atomicity of the resulting operations.
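
For instance (all names made up), a process-oriented resource that applies several changes in one action, and therefore in one transaction:

    using System.Web.Http;

    // DTO: one request payload describing the whole unit of work.
    public class PriceAdjustmentDto
    {
        public int[] ProductIds { get; set; }
        public decimal NewPrice { get; set; }
    }

    public class PriceAdjustmentsController : ApiController
    {
        [HttpPost]
        public IHttpActionResult Post(PriceAdjustmentDto dto)
        {
            using (var db = new MyDbContext())
            {
                foreach (var id in dto.ProductIds)
                {
                    var product = db.Products.Find(id);
                    if (product == null)
                        return BadRequest("Unknown product: " + id);
                    product.Price = dto.NewPrice;
                }

                // A single SaveChanges runs in a single implicit transaction:
                // either every adjustment is applied or none are.
                db.SaveChanges();
                return Ok();
            }
        }
    }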

Note that some people actually prefer this approach (more process-oriented, less data-oriented), though it can be very difficult to model. There's no right answer; it always depends on the domain and use cases, and it's easy to fall into traps that would make your API not very RESTful. It's the art of API design. Unrelated: the same can be said of data modeling, which some people actually find harder. YMMV.

Summary

There are a few approaches to explore, a canonical implementation technique to use, an opportunity to create a generic Entity Framework implementation, and a non-generic alternative.

It'd be nice if you could update this thread as you gather answers elsewhere (well, if you feel motivated enough) and with what you eventually decide to do, as it seems like something many people would love to have definitive guidance on.

Good luck ;).

answered Oct 18 '22 by tne