How to design generic filtering operators in the query string of an API?

Tags: rest, sql, database, api

I'm building a generic API with content and a schema that can be user-defined. I want to add filtering logic to API responses, so that users can query for specific objects they've stored in the API. For example, if a user is storing event objects, they could do things like filter on:

Array contains: Whether properties.categories contains Engineering
Greater than: Whether properties.created_at is later than 2016-10-02
Not equal: Whether properties.address.city is not Washington
Equal: Whether properties.name is Meetup
etc.

I'm trying to design filtering into the query string of API responses, and coming up with a few options, but I'm not sure which syntax for it is best...

1. Operator as Nested Key

/events?properties.name=Harry&properties.address.city.neq=Washington

This example uses just a nested key to specify the operators (like neq as shown). This is nice in that it is very simple and easy to read.

But in cases where the properties of an event can be defined by the user, it runs into an issue where there is a potential clash between a property named address.city.neq using a normal equal operator, and a property named address.city using a not equal operator.

Example: Stripe's API
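To make the clash in option 1 concrete, here is a minimal Python sketch. The operator names and the `parse_nested_key` helper are hypothetical, chosen only for illustration, and are not taken from Stripe's or any other real API. It treats a trailing `.<operator>` segment of the key as the operator.

```python
from urllib.parse import parse_qsl

# Hypothetical operator names, for illustration only.
OPERATORS = {"eq", "neq", "gt", "lt", "contains"}

def parse_nested_key(query):
    """Treat a trailing '.<operator>' segment of each key as the operator."""
    filters = []
    for key, value in parse_qsl(query):
        path, _, last = key.rpartition(".")
        if path and last in OPERATORS:
            filters.append((path, last, value))
        else:
            # No recognised operator segment: plain equality on the whole key.
            filters.append((key, "eq", value))
    return filters

# Two different intents produce this same query string:
#   1. property "address.city" is NOT Washington
#   2. property "address.city.neq" EQUALS Washington
# The parser can only ever pick the first reading.
ambiguous = parse_nested_key("properties.address.city.neq=Washington")
```

With user-defined property names, the second reading can never be expressed, which is the collision described above.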

2. Operator as Key Suffix

/events?properties.name=Harry&properties.address.city+neq=Washington

This example is similar to the first one, except it uses a + delimiter (which is equivalent to a space) for operations, instead of . so that there is no confusion, since keys in my domain can't contain spaces.

One downside is that it is slightly harder to read, although that's arguable since it might be construed as more clear. Another might be that it is slightly harder to parse, but not that much.
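As a rough idea of the parsing cost for option 2, the sketch below uses Python's standard `urllib.parse.parse_qsl`, which decodes `+` to a space in both keys and values. The `parse_suffix_key` name and the default `eq` operator are assumptions made for this example.

```python
from urllib.parse import parse_qsl

def parse_suffix_key(query):
    """Split each decoded key on its last space into (path, operator)."""
    filters = []
    for key, value in parse_qsl(query):
        path, sep, op = key.rpartition(" ")
        if sep:
            filters.append((path, op, value))
        else:
            # No space in the key: treat it as plain equality.
            filters.append((key, "eq", value))
    return filters

filters = parse_suffix_key(
    "properties.name=Harry&properties.address.city+neq=Washington"
)
# A property literally named "address.city.neq" stays unambiguous,
# because the operator delimiter is a space rather than a dot.
literal = parse_suffix_key("properties.address.city.neq=Washington")
```

This only holds as long as property names can never contain spaces, which is the constraint stated above.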

3. Operator as Value Prefix

/events?properties.name=Harry&properties.address.city=neq:Washington

This example is very similar to the previous one, except that it moves the operator syntax into the value of the parameter instead of the key. This has the benefit of eliminating a bit of the complexity in parsing the query string.

But this comes at the cost of no longer being able to differentiate between an equal operator checking for the literal string neq:Washington and a not equal operator checking for the string Washington.

Example: Sparkpay's API
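The ambiguity in option 3 can be sketched as follows. The operator set and `parse_value_prefix` are hypothetical names, not Sparkpay's actual behaviour. One possible workaround, which is an assumption of this sketch rather than something from the options above, is to always accept an explicit `eq:` prefix so that a literal value starting with an operator name can still be expressed.

```python
from urllib.parse import parse_qsl

# Hypothetical operator names, for illustration only.
OPERATORS = {"eq", "neq", "gt", "lt", "contains"}

def parse_value_prefix(query):
    """Read an optional '<operator>:' prefix from each value."""
    filters = []
    for key, value in parse_qsl(query):
        op, sep, rest = value.partition(":")
        if sep and op in OPERATORS:
            filters.append((key, op, rest))
        else:
            # No recognised prefix: the whole value is compared for equality.
            filters.append((key, "eq", value))
    return filters

# Parsed as "city is not Washington"; the literal string "neq:Washington"
# cannot be matched without some escape mechanism.
not_equal = parse_value_prefix("properties.address.city=neq:Washington")

# Assumed escape hatch: an explicit "eq:" prefix keeps the rest verbatim,
# because only the first colon is treated as the delimiter.
literal = parse_value_prefix("properties.address.city=eq:neq:Washington")
```

The cost of the escape hatch is that clients must know to add `eq:` whenever a value might begin with an operator name.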

4. Custom Filter Parameter

/events?filter=properties.name==Harry;properties.address.city!=Washington

This example uses a single top-level query parameter, filter, to namespace all of the filtering logic under. This is nice in that you never have to worry about the top-level namespace colliding. (Although in my case, everything custom is nested under properties. so this isn't an issue in the first place.)

But this comes at a cost of having a harder query string to type out when you want to do basic equality filtering, which will probably result in having to check the documentation most of the time. And relying on symbols for the operators might lead to confusion for non-obvious operations like "near" or "within" or "contains".

Example: Google Analytics's API
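A minimal sketch of parsing the option 4 syntax is below. It assumes the `filter` value has already been URL-decoded, that clauses are separated by `;`, and that values never contain `;`. The symbol table and the `parse_filter_expression` name are made up for this example and are not Google Analytics's grammar.

```python
import re

# Hypothetical symbol-to-operator mapping.
SYMBOLS = {"==": "eq", "!=": "neq", ">=": "gte", "<=": "lte", ">": "gt", "<": "lt"}

# Two-character symbols are listed before ">" and "<" so they win the match.
CLAUSE = re.compile(r"^(?P<path>[\w.]+)(?P<op>==|!=|>=|<=|>|<)(?P<value>.*)$")

def parse_filter_expression(expression):
    """Parse 'path<symbol>value;path<symbol>value' into filter tuples."""
    filters = []
    for clause in expression.split(";"):
        if not clause:
            continue  # tolerate a trailing or doubled separator
        match = CLAUSE.match(clause)
        if match is None:
            raise ValueError(f"cannot parse filter clause: {clause!r}")
        filters.append((match["path"], SYMBOLS[match["op"]], match["value"]))
    return filters

filters = parse_filter_expression(
    "properties.name==Harry;properties.address.city!=Washington"
)
```

Operations such as "near" or "contains" have no obvious symbol, so a table like `SYMBOLS` is where the readability concern above would show up.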

5. Custom Verbose Filter Parameter

/events?filter=properties.name eq Harry; properties.address.city neq Washington

This example uses a similar top-level filter parameter as the previous one, but it spells out the operators with words instead of defining them with symbols, and has spaces between them. This might be slightly more readable.

But this comes at a cost of having a longer URL, and a lot of spaces that will need to be encoded?

Example: OData's API
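On the encoding question for option 5: the spaces and the `;` do need encoding before the expression goes on the wire. The sketch below uses Python's standard `urllib.parse` to show the two common encodings of the example expression; both decode back to the same string.

```python
from urllib.parse import parse_qs, quote, urlencode

expression = "properties.name eq Harry; properties.address.city neq Washington"

# Default form encoding: spaces become "+", ";" becomes "%3B".
plus_encoded = urlencode({"filter": expression})
# filter=properties.name+eq+Harry%3B+properties.address.city+neq+Washington

# Percent-encoding only: spaces become "%20".
percent_encoded = urlencode({"filter": expression}, quote_via=quote)
# filter=properties.name%20eq%20Harry%3B%20properties.address.city%20neq%20Washington

# Either form round-trips to the original expression.
decoded = parse_qs(plus_encoded)["filter"][0]
```

Browsers typically encode spaces typed into the URL bar automatically, so the extra length is mostly a concern for hand-written or logged URLs.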

6. Object Filter Parameter

/events?filter[1][key]=properties.name&filter[1][eq]=Harry&filter[2][key]=properties.address.city&filter[2][neq]=Washington

This example also uses a top-level filter parameter, but instead of creating a completely custom syntax for it that mimics programming, it builds up an object definition of filters using a more standard query string syntax. This has the benefit of being slightly more "standard".

But it comes at the cost of being very verbose to type and hard to parse.

Example: Magento's API
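For a sense of what parsing option 6 involves without a framework that understands bracket keys, here is a minimal Python sketch. The `filter[<index>][<field>]` shape is taken from the example URL above; the `parse_object_filters` helper and its error handling are assumptions, not Magento's implementation.

```python
import re
from urllib.parse import parse_qsl

# Matches keys shaped like filter[1][key] or filter[2][neq].
BRACKET_KEY = re.compile(r"^filter\[(\d+)\]\[(\w+)\]$")

def parse_object_filters(query):
    """Group filter[i][...] parameters by index, then emit filter tuples."""
    grouped = {}
    for key, value in parse_qsl(query):
        match = BRACKET_KEY.match(key)
        if match is None:
            continue  # not a filter parameter
        index, field = int(match.group(1)), match.group(2)
        grouped.setdefault(index, {})[field] = value

    filters = []
    for index in sorted(grouped):
        entry = dict(grouped[index])
        if "key" not in entry:
            raise ValueError(f"filter[{index}] has no [key] entry")
        path = entry.pop("key")
        # Every remaining field is read as an operator on that path.
        for op, value in entry.items():
            filters.append((path, op, value))
    return filters

filters = parse_object_filters(
    "filter[1][key]=properties.name&filter[1][eq]=Harry"
    "&filter[2][key]=properties.address.city&filter[2][neq]=Washington"
)
```

Many web frameworks parse bracketed keys into nested objects automatically, in which case only the second loop is needed.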

Given all of those examples, or a different approach, which syntax is best? Ideally it would be easy to construct the query parameter, so that playing around in the URL bar is doable, but also not pose problems for future interoperability.

I'm leaning towards #2 since it seems like it is legible, but also doesn't have some of the downsides of other schemes.

asked Nov 15 '16 19:11

Ian Storm Taylor

1 Answer

I might not answer the "which one is best" question, but I can at least give you some insights and other examples to consider.

First, you are talking about "generic API with content and a schema that can be user-defined".

That sounds a lot like Solr / Elasticsearch, which are both high-level wrappers over Apache Lucene, which basically indexes and aggregates documents.

Those two took totally different approaches to their REST APIs; I happened to work with both of them.

Elasticsearch:

They made an entire JSON-based Query DSL, which currently looks like this:

GET /_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title":   "Search"        }},
        { "match": { "content": "Elasticsearch" }}
      ],
      "filter": [
        { "term":  { "status": "published" }},
        { "range": { "publish_date": { "gte": "2015-01-01" }}}
      ]
    }
  }
}

Taken from their current doc. I was surprised that you can actually put data in a GET request... It actually looks better now; in earlier versions it was much more hierarchical.

From my personal experience, this DSL was powerful, but rather hard to learn and use fluently (especially in older versions). And to actually get some results you need to do more than just play with the URL, starting with the fact that many clients don't even support a body in a GET request.

Solr:

They put everything into query params, which basically looks like this (taken from the doc):

q=*:*&fq={!cache=false cost=5}inStock:true&fq={!frange l=1 u=4 cache=false cost=50}sqrt(popularity)

Working with that was more straightforward. But that's just my personal taste.

Now about my experience. We were implementing another layer above those two and we took approach #4. Actually, I think #4 and #5 should be supported at the same time. Why? Because whatever you pick, people will complain, and since you will have your own "micro-DSL" anyway, you might as well support a few more aliases for your keywords.

Why not #2? Having a single filter param with the query inside gives you total control over the DSL. Half a year after we built our resource, we got a "simple" feature request: logical OR and parentheses (). Query parameters are basically a list of AND operations, and a logical OR like city=London OR age>25 doesn't really fit there. On the other hand, parentheses introduce nesting into the DSL structure, which would also be a problem in a flat query string structure.
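To illustrate why a single filter value copes with OR and parentheses where flat parameters do not, here is a small recursive-descent parser sketch in Python. It is not the implementation described in this answer; the grammar, operator symbols, and tuple-shaped output are all assumptions made for the example, and values containing spaces or the words "and"/"or" are not handled.

```python
import re

# Hypothetical comparison symbols, in the style of option #4.
OPS = {"==": "eq", "!=": "neq", ">=": "gte", "<=": "lte", ">": "gt", "<": "lt"}
TOKEN = re.compile(r"\s*(\(|\)|==|!=|>=|<=|>|<|[\w.\-]+)")

def tokenize(text):
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse(text):
    """Parse into nested tuples; 'and' binds tighter than 'or'."""
    tokens, pos = tokenize(text), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of filter")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        node = parse_and()
        while peek() is not None and peek().lower() == "or":
            take()
            node = ("or", node, parse_and())
        return node

    def parse_and():
        node = parse_atom()
        while peek() is not None and peek().lower() == "and":
            take()
            node = ("and", node, parse_atom())
        return node

    def parse_atom():
        if peek() == "(":
            take()
            node = parse_or()  # parentheses restart at the lowest precedence
            if take() != ")":
                raise ValueError("expected ')'")
            return node
        path, op, value = take(), take(), take()
        if op not in OPS:
            raise ValueError(f"unknown operator: {op!r}")
        return (OPS[op], path, value)

    tree = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token: {peek()!r}")
    return tree

flat = parse("city==London or age>25")
nested = parse("(city==London or city==Paris) and age>25")
```

The result is a tree rather than a list, which is exactly the shape a flat list of key=value pairs cannot carry without extra conventions.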

Well, those were the problems we stumbled upon; your case might be different. But it is still worth considering what the future expectations of this API will be.

answered Oct 03 '22 06:10

James Cube
