
How do I use Etags for Youtube v3 Data API?

I am building an extension that makes a lot of requests. The feature I'm working on displays the total time it would take to watch a playlist. For a playlist of 1000 videos, I have to make 40 requests just to get this information (the API returns at most 50 items per request, so that is 20 calls to /v3/playlistItems for the video IDs and 20 calls to /v3/videos for the durations). As far as I can tell, that one playlist costs me 600 quota per page load. I know it's nothing to get worked up about, because I'm allowed 50,000,000 quota per day, but I want to optimize early. It is also a speed issue: it takes a solid minute and a half just to get the playlist length.
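
As a quick check of the request count above, here is a minimal sketch of the arithmetic. It assumes one /v3/playlistItems call plus one /v3/videos call per page of 50 items; the function name is made up for illustration.

```python
import math

PAGE_SIZE = 50  # maximum items per request, as stated in the question


def requests_needed(playlist_size, page_size=PAGE_SIZE):
    """Requests for one uncached pass over a playlist.

    Each page costs one /playlistItems call (video ids) and
    one /videos call (durations for those ids).
    """
    pages = math.ceil(playlist_size / page_size)
    return 2 * pages


print(requests_needed(1000))  # 20 pages, two calls each
```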

Now, ETags. For some reason, almost every time I make a request to YouTube's Data API for videos or playlistItems, I get a completely new ETag (most of the time; I have had cases where it returns the same ETag), regardless of the playlist (I haven't tried private playlists, since I haven't set up OAuth yet). I'm assuming the reason is that something in the playlist changes very quickly, which produces a new ETag. Views? PlaylistItems doesn't even return views!

Here are example API calls to a macaroni playlist. The ETags are always different! How am I supposed to use them if they don't work? The requests are specific; there is no way that the length of the videos changes between requests. The API key is omitted because you can make your own API key.

```
Playlist Items, give me video id's, page tokens, and Etag for playlist for items 100-150
https://www.googleapis.com/youtube/v3/playlistItems?part=contentDetails&maxResults=50&playlistId=PLF-hTvh6KCehzImlI2pAKsOFPR62QZTv-&fields=etag%2Citems%2FcontentDetails%2CnextPageToken%2CprevPageToken&key={YOUR_API_KEY}&pageToken=CGQQAA

Videos, give me durations and Etag for these video ids
https://www.googleapis.com/youtube/v3/videos?part=contentDetails&id=SswxpqGX1F0,3Hy5BuFTBbI,ZnlW1fSXZZM,8sb_YOrReZ4,6IN_mupBjh8,VzoqsRLY5Qk,5m8H9YrPvPA,JdRbtGdR68E,hEzPBiYPsDU,bJuioKFYv-c,1N8O8OOG2_U,QDgqSL8nU5U,gP4gB45Z52M,pI1oB2y9c0M,WZGn5Vh_mc4,A0KpbS5WjSU,b0yoIOX8Bk0,5Y7iQt7vtOE,qIijCwjUApQ,RgHjqvznjxg,QzceROWtn5o,8z0VnMQFGR8,5olHoTWB1Hw,vz0T59Ql7fQ,LhktiZYQraU,WIuuZOD9ahI,rwEHW6GRH1Q,FjT1BpKvfgo,FRZL2yaZyZk,U5-vjCDwDUU,b21Lj9bfDWc,yox3-U7r_i8,rXJ5ph83Vrs,nXrk2finMcA,VfagTkQWHuI,K_ZaRAtZQOg,_JIcREsn9pU,y9WGvudeDAM,O08jNtrieI4,9UkEzW1AY7Y,jOaBdnYsobg,y7dSbhc-8h0,IfpPiCGcF8g,2rTRmb9nKbY,bHgv3A26O6Y,hFQmV-zvcbM,Osc4y45oQxw,GHusS6Yd5A8,T2Z3CuUWUQc,OPV-DopMqxs&fields=etag%2Citems%2FcontentDetails%2Fduration&key={YOUR_API_KEY}
```

I want to cache this data. I'm thinking of making an extra initial request for the playlist's total number of videos, because that is directly correlated with the total length of the playlist. But that feels like a lot of logic. Which video was added or removed? How many? If a video was added at the beginning, I imagine that, to optimize, I have to compare the first 50 video IDs with my cached video IDs and durations. If something changed in the middle, I have to keep querying. Maybe I should cache something else to make this easier? Multiple playlists can contain the same videos, and a playlist can contain the same video more than once, I dunno. Maybe there is no way around querying the entire playlist, and I should just cache the calls to /v3/videos. The thing is that I want to optimize the call to /v3/playlistItems, because it is the slow one (it takes 3x as long as /v3/videos).
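
The comparison described above can be sketched as a multiset diff, which also copes with the same video appearing more than once in a playlist. This is only an illustration; the function names and the shape of `duration_cache` (a dictionary of video id to length) are assumptions.

```python
from collections import Counter


def diff_video_ids(cached_ids, fresh_ids):
    """Return (added, removed) as Counters, so duplicates are counted."""
    cached, fresh = Counter(cached_ids), Counter(fresh_ids)
    added = fresh - cached    # ids that occur more often now than before
    removed = cached - fresh  # ids that occur less often now than before
    return added, removed


def ids_to_fetch(fresh_ids, duration_cache):
    """Unique ids, in playlist order, whose duration is not cached yet."""
    return [v for v in dict.fromkeys(fresh_ids) if v not in duration_cache]
```

With a per-video duration cache, only the ids returned by `ids_to_fetch` need a /v3/videos call, regardless of where in the playlist the change happened.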

My main questions are: what do I cache to optimize getting the playlist length, how do I do that, and what's up with the ETags?

Ignat asked Apr 11 '16 19:04


4 Answers

I figured out how to cache the data a while ago, sorry!

You can make a call to /playlists to get the total count of items in a playlist. In addition, its etag changes if and only if the playlist itself changed, which is what I want: I only want to make new requests if the base playlist changed.
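
A minimal sketch of building such a /playlists request URL, in the same style as the URLs in the question. The `fields` filter is an assumption chosen to return only the etags and `contentDetails.itemCount`; the function name is made up.

```python
from urllib.parse import urlencode

API_BASE = "https://www.googleapis.com/youtube/v3"


def playlists_url(playlist_id, api_key):
    """URL for a /playlists call that returns the etag and the item count."""
    params = {
        "part": "contentDetails",
        "id": playlist_id,
        # Assumed filter: keep the response small, like the question's URLs do.
        "fields": "etag,items(etag,contentDetails/itemCount)",
        "key": api_key,
    }
    return f"{API_BASE}/playlists?{urlencode(params)}"
```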

A call to /playlistItems always generates a new etag, regardless of changes. I think this endpoint is meant for temporary querying to figure out metadata of a video as it relates to a playlist, not for static data lookup. Playlists are very flexible and I think YouTube decided against caching this data since calls to /playlistItems are often on a case-by-case basis. It's likely their backend automatically generates an etag, but doesn't actually save anything for this endpoint.

So, these are the steps to get the total length of time of a playlist, plus caching:

  1. get playlist id
  2. lookup etag in cache by playlist id
  3. call /playlists with the etag in the If-None-Match header (should work even if etag is empty)
    • if the api returns 304, use cached playlist length
    • if the api returns 200, save the new etag in cache
    • You can do more caching!
  4. call /playlistItems with playlist id (with all the pageTokens)
  5. lookup each videoId in cache to get video length
    • Cache is defined as a dictionary of videoId:videoLength
    • if videoLength not found, add videoId to a videos array
    • if videoLength is found, add to a lengths array
  6. call /videos with all the video id's that are not found in cache up to 50 elements
    • Could be done right after /playlistItems call or when all calls are done, I think it's ok to be lazy right now and do it right after each call
    • Also you can cache video calls with etags and save that to check if the length hasn't changed, but then you would have to call the api per each video. I dunno, but I think this is over-optimizing. Still might want to keep in mind that video length can change via YouTube's editing tools when debugging
  7. (continued from 6) For each video in the response, cache the video length in a dictionary as a videoId:videoLength pair, then add length to a lengths array
  8. Reduce lengths array into a moment.js duration object
  9. Save a formatted string of the length of the playlist to cache by etag as key
  10. Return the formatted string of the length of the playlist
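
The steps above can be sketched roughly as follows. This is an illustration, not the linked implementation: the network calls are hidden behind a hypothetical `api` object whose three methods are assumptions, and plain seconds stand in for the moment.js duration object. The durations are parsed from the ISO 8601 strings (for example `PT4M13S`) that /videos returns in `contentDetails.duration`.

```python
import re

# ISO 8601 durations as returned in contentDetails.duration, e.g. "PT4M13S".
_DURATION = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")


def parse_duration(iso):
    """Convert an ISO 8601 duration such as 'PT1H2M3S' to whole seconds."""
    match = _DURATION.match(iso)
    if match is None:
        raise ValueError(f"unsupported duration: {iso!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def format_seconds(total):
    """Format a number of seconds as H:MM:SS."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


class PlaylistLengthCache:
    """Steps 1-10 with the network hidden behind a hypothetical `api` object.

    Assumed interface (not part of any real client library):
      api.playlist_etag(playlist_id, known_etag) -> new etag, or None on 304
      api.playlist_video_ids(playlist_id)        -> list of ids, all pages
      api.video_durations(ids)                   -> {id: ISO 8601 duration}
    """

    def __init__(self, api):
        self.api = api
        self.etags = {}      # playlist id -> last seen etag
        self.totals = {}     # etag -> formatted playlist length
        self.durations = {}  # video id -> length in seconds

    def playlist_length(self, playlist_id):
        known = self.etags.get(playlist_id)                   # step 2
        etag = self.api.playlist_etag(playlist_id, known)     # step 3
        if etag is None:                                      # 304: unchanged
            return self.totals[known]
        self.etags[playlist_id] = etag                        # 200: new etag

        video_ids = self.api.playlist_video_ids(playlist_id)  # step 4
        # Step 5: unique ids with no cached length, in playlist order.
        missing = [v for v in dict.fromkeys(video_ids) if v not in self.durations]
        for start in range(0, len(missing), 50):              # step 6
            batch = missing[start:start + 50]
            for vid, iso in self.api.video_durations(batch).items():
                self.durations[vid] = parse_duration(iso)     # step 7

        # Step 8: ids the API did not return a duration for count as 0.
        total = sum(self.durations.get(v, 0) for v in video_ids)
        self.totals[etag] = format_seconds(total)             # step 9
        return self.totals[etag]                              # step 10
```

On a second call for an unchanged playlist, only the /playlists request is made and the cached string is returned.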

Here is the implementation on my github

Ignat answered Oct 20 '22 06:10


The ETag does not change on every request. Instead, you get a limited set of different ETags for a given request. The reason is that some elements inside the response change their order, so the algorithm that creates the ETag produces different values. As soon as two responses have their content elements in exactly the same order, the ETag will be the same again. I recorded a bunch of requests and came to exactly this conclusion. Tested with a channel request by id and part=brandingSettings,snippet. If you select multiple parts with lots of nested elements in the response, you'll get more different combinations and therefore more different ETags.
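
If element order is indeed what makes the ETags differ, one client-side workaround is to compute your own fingerprint that ignores key order. The sketch below is an assumption-laden illustration, not how YouTube computes its ETags: it drops every `etag` field, serializes the rest with sorted object keys, and hashes the result. Note that it normalizes the order of object keys only; the order of list elements still matters.

```python
import hashlib
import json


def _strip_etags(value):
    """Recursively drop 'etag' fields so they do not affect the fingerprint."""
    if isinstance(value, dict):
        return {k: _strip_etags(v) for k, v in value.items() if k != "etag"}
    if isinstance(value, list):
        return [_strip_etags(v) for v in value]
    return value


def content_fingerprint(payload):
    """SHA-256 of the response body with object keys in sorted order."""
    canonical = json.dumps(_strip_etags(payload), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two responses with the same content then compare equal even when their ETags and key order differ.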

Bjoern Borg answered Oct 20 '22 05:10


When you run the same query and the content hasn't changed, the YouTube Data API returns an always-changing Etag. So, it looks like the Etag implementation is broken.

But, in fact, it's not. If you provide a previously received Etag in the request, then the YouTube Data API will behave correctly. It will recognize the Etag and will answer with an HTTP status 304 Not Modified.
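
A minimal sketch of such a conditional request using only the Python standard library. The function names and the injectable `opener` parameter are made up for illustration; `urllib` reports a 304 response as an `HTTPError`, so that case is caught and turned into `None`.

```python
import json
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def conditional_request(url, etag=None):
    """Build a GET request that sends a previously received ETag, if any."""
    headers = {"If-None-Match": etag} if etag else {}
    return Request(url, headers=headers)


def fetch_if_changed(url, etag=None, opener=urlopen):
    """Return the parsed JSON body, or None if the server answers 304.

    `opener` defaults to urllib's urlopen and can be replaced in tests.
    """
    try:
        with opener(conditional_request(url, etag)) as response:
            return json.load(response)
    except HTTPError as err:
        if err.code == 304:  # Not Modified: the cached copy is still valid
            return None
        raise
```

A caller would store the `etag` field of each 200 response next to the cached data and pass it back on the next call.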

lacton answered Oct 20 '22 06:10


I've found that the ETag works correctly only when you use "part=id" and do not use "maxResults=NN". Otherwise, every call to the API returns a new ETag.

Элёржон Кимсанов answered Oct 20 '22 06:10