
How to query a playlist properly and safely

I want to extract the information from a YouTube playlist, but querying the whole playlist at once seems quite unreliable even with the ignoreerrors flag: it sometimes gets stuck, especially if the internet connection is a bit shaky. Should I instead download the playlist one entry at a time, by setting the playliststart and playlistend values and processing it in a loop?

My current code looks like this:

import youtube_dl

simulate_ydl_opts = {
    'format': "251",
    'playlistend': 50,
    'ignoreerrors': True,
    'simulate': True
}
youtube_dl_object = youtube_dl.YoutubeDL(simulate_ydl_opts)
test_info = youtube_dl_object.extract_info("https://www.youtube.com/user/Rasenfunk")
csabinho asked Mar 01 '19


2 Answers

I think you can use ratelimit (e.g. 'ratelimit': 50000) as a configuration option, since the problem may depend on your download speed, coupled with playliststart, playlistend, retries and continuedl.

As per youtube-dl's common options module on GitHub, ratelimit is:

Download speed limit, in bytes/sec.

If you already know the speed the download should be capped at, set it and retry the same download. Throttling helps when your bandwidth can't keep up with an unthrottled download.

If you are not sure of your maximum speed, I suggest using something like speedtest-cli to measure your download speed and apply that value for throttling. I put together this sample code to try it out:

import speedtest
import youtube_dl

s = speedtest.Speedtest()
s.get_best_server()
s.download()

# This gets the measured download speed in bits per second
download_bps = s.results.dict()['download']

simulate_ydl_opts = {
    'format': "251",
    'playlistend': 50,
    'ignoreerrors': True,
    'verbose': True,  # might give you a clue about the download speed as it prints
    'ratelimit': download_bps / 8,  # ratelimit expects bytes/sec; speedtest reports bits/sec
    'retries': 10,  # retry 10 times if there is a failure
    'continuedl': True  # try to continue downloads if possible
}

youtube_dl_object = youtube_dl.YoutubeDL(simulate_ydl_opts)
test_info = youtube_dl_object.extract_info("https://www.youtube.com/user/Rasenfunk")
Nagaraj Tantri answered Nov 12 '22


Yes, you should probably try doing it one at a time. I also recommend making your program keep track of the last URL it processed so it can continue from where it left off. A threading-based timeout-and-restart system (for each video, spawn a new thread with a timeout) would help the process run more smoothly.
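A minimal sketch of that batch-by-batch approach, assuming youtube_dl is installed. The helper names (batch_ranges, extract_in_batches), the batch size, and the resume comment are illustrative placeholders, not part of the original question:

import youtube_dl

def batch_ranges(total, batch_size, start=1):
    """Yield (playliststart, playlistend) pairs covering entries start..total."""
    index = start
    while index <= total:
        yield index, min(index + batch_size - 1, total)
        index += batch_size

def extract_in_batches(url, total, batch_size=10):
    """Query a playlist in small slices so one hang doesn't lose everything."""
    collected = []
    for first, last in batch_ranges(total, batch_size):
        opts = {
            'ignoreerrors': True,
            'simulate': True,
            'playliststart': first,
            'playlistend': last,
        }
        with youtube_dl.YoutubeDL(opts) as ydl:
            info = ydl.extract_info(url, download=False)
            if info and info.get('entries'):
                # ignoreerrors leaves None entries for failed videos
                collected.extend(e for e in info['entries'] if e)
        # `last` could be written to a resume file here, so a crashed run
        # can pick up again at the first incomplete batch
    return collected

For example, batch_ranges(50, 20) yields (1, 20), (21, 40), (41, 50), so each extract_info call only has to survive one small slice of the playlist.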

Dorijan Cirkveni answered Nov 12 '22