I'm fetching the list of available formats for a URL with yt-dlp, and I want to load the terminal output into a pandas DataFrame. I think I have to convert it to a dictionary first. How can I achieve that?
Here's the code:
import subprocess
import pandas as pd
url = 'https://www.youtube.com/watch?v=kjYW63CVbsE'
command = subprocess.getoutput('yt-dlp --list-formats "{url}"'.format(url=url))
print(command)
Here's the output

I tried splitting the output into lines, but the result is not what I expected:
import subprocess
import pandas as pd
url = 'https://www.youtube.com/watch?v=kjYW63CVbsE'
command = subprocess.getoutput('yt-dlp --list-formats "{url}"'.format(url=url))
output_lines = command.split('\n')
headers = output_lines[6].split()
format_list = []
for line in output_lines[::]:
    values = line.split()
    format_info = {}
    for i in range(len(headers)):
        format_info[headers[i]] = values[i]
    format_list.append(format_info)
df = pd.DataFrame(format_list)
print(df)
I would recommend against parsing the raw string. Use yt_dlp.YoutubeDL instead (or --dump-json or --print "%()j", as in the answer @AndrejKesely provided). As the yt-dlp documentation says:
Your program should avoid parsing the normal stdout since they may change in future versions. Instead they should use options such as -J, --print, --progress-template, --exec etc to create console output that you can reliably reproduce and parse. From a Python program, you can embed yt-dlp in a more powerful fashion:
import pandas as pd
from yt_dlp import YoutubeDL

url = 'https://www.youtube.com/watch?v=kjYW63CVbsE'
# 'listformats' also prints the format table to the terminal
ydl_opts = {'listformats': True}
with YoutubeDL(ydl_opts) as ydl:
    # Fetch the metadata only; nothing is downloaded
    info_dict = ydl.extract_info(url, download=False)
    # "formats" is a list of dictionaries, one per available format
    df = pd.DataFrame(ydl.sanitize_info(info_dict).get("formats"))
print(df)
Prints:
format_id format_note ext protocol acodec vcodec ... audio_channels language_preference dynamic_range container downloader_options filesize_approx
0 sb3 storyboard mhtml mhtml none none ... NaN NaN NaN NaN NaN NaN
1 sb2 storyboard mhtml mhtml none none ... NaN NaN NaN NaN NaN NaN
2 sb1 storyboard mhtml mhtml none none ... NaN NaN NaN NaN NaN NaN
3 sb0 storyboard mhtml mhtml none none ... NaN NaN NaN NaN NaN NaN
4 233 Default mp4 m3u8_native NaN none ... NaN NaN NaN NaN NaN NaN
5 234 Default mp4 m3u8_native NaN none ... NaN NaN NaN NaN NaN NaN
6 293 Default mp4 m3u8_native NaN none ... NaN NaN NaN NaN NaN NaN
7 294 Default mp4 m3u8_native NaN none ... NaN NaN NaN NaN NaN NaN
8 599 ultralow m4a https mp4a.40.5 none ... 2.0 -1.0 None m4a_dash {'http_chunk_size': 10485760} NaN
9 600 ultralow webm https opus none ... 2.0 -1.0 None webm_dash {'http_chunk_size': 10485760} NaN
10 139 low m4a https mp4a.40.5 none ... 2.0 -1.0 None m4a_dash {'http_chunk_size': 10485760} NaN
11 249 low webm https opus none ... 2.0 -1.0 None webm_dash {'http_chunk_size': 10485760} NaN
12 250 low webm https opus none ... 2.0 -1.0 None webm_dash {'http_chunk_size': 10485760} NaN
13 140 medium m4a https mp4a.40.2 none ... 2.0 -1.0 None m4a_dash {'http_chunk_size': 10485760} NaN
14 251 medium webm https opus none ... 2.0 -1.0 None webm_dash {'http_chunk_size': 10485760} NaN
15 17 144p 3gp https mp4a.40.2 mp4v.20.3 ... 1.0 -1.0 SDR NaN {'http_chunk_size': 10485760} NaN
16 602 NaN mp4 m3u8_native none vp09.00.10.08 ... NaN NaN SDR NaN NaN NaN
17 597 144p mp4 https none avc1.4d400b ... NaN -1.0 SDR mp4_dash {'http_chunk_size': 10485760} NaN
18 598 144p webm https none vp9 ... NaN -1.0 SDR webm_dash {'http_chunk_size': 10485760} NaN
19 269 NaN mp4 m3u8_native none avc1.4D400C ... NaN NaN SDR NaN NaN NaN
20 281 NaN mp4 m3u8_native none avc1.4D400C ... NaN NaN SDR NaN NaN NaN
21 603 NaN mp4 m3u8_native none vp09.00.11.08 ... NaN NaN SDR NaN NaN NaN
22 394 144p mp4 https none av01.0.00M.08 ... NaN -1.0 SDR mp4_dash {'http_chunk_size': 10485760} NaN
23 160 144p mp4 https none avc1.4D400C ... NaN -1.0 SDR mp4_dash {'http_chunk_size': 10485760} NaN
24 278 144p webm https none vp09.00.11.08 ... NaN -1.0 SDR webm_dash {'http_chunk_size': 10485760} NaN
25 229 NaN mp4 m3u8_native none avc1.4D4015 ... NaN NaN SDR NaN NaN NaN
26 282 NaN mp4 m3u8_native none avc1.4D4015 ... NaN NaN SDR NaN NaN NaN
27 604 NaN mp4 m3u8_native none vp09.00.20.08 ... NaN NaN SDR NaN NaN NaN
28 395 240p mp4 https none av01.0.00M.08 ... NaN -1.0 SDR mp4_dash {'http_chunk_size': 10485760} NaN
29 133 240p mp4 https none avc1.4D4015 ... NaN -1.0 SDR mp4_dash {'http_chunk_size': 10485760} NaN
30 242 240p webm https none vp09.00.20.08 ... NaN -1.0 SDR webm_dash {'http_chunk_size': 10485760} NaN
31 230 NaN mp4 m3u8_native none avc1.4D401E ... NaN NaN SDR NaN NaN NaN
32 283 NaN mp4 m3u8_native none avc1.4D401E ... NaN NaN SDR NaN NaN NaN
33 605 NaN mp4 m3u8_native none vp09.00.21.08 ... NaN NaN SDR NaN NaN NaN
34 396 360p mp4 https none av01.0.01M.08 ... NaN -1.0 SDR mp4_dash {'http_chunk_size': 10485760} NaN
35 134 360p mp4 https none avc1.4D401E ... NaN -1.0 SDR mp4_dash {'http_chunk_size': 10485760} NaN
36 18 360p mp4 https mp4a.40.2 avc1.42001E ... 2.0 -1.0 SDR NaN {'http_chunk_size': 10485760} 9195945.0
37 243 360p webm https none vp09.00.21.08 ... NaN -1.0 SDR webm_dash {'http_chunk_size': 10485760} NaN
38 231 NaN mp4 m3u8_native none avc1.4D401F ... NaN NaN SDR NaN NaN NaN
39 284 NaN mp4 m3u8_native none avc1.4D401F ... NaN NaN SDR NaN NaN NaN
40 606 NaN mp4 m3u8_native none vp09.00.30.08 ... NaN NaN SDR NaN NaN NaN
41 397 480p mp4 https none av01.0.04M.08 ... NaN -1.0 SDR mp4_dash {'http_chunk_size': 10485760} NaN
42 135 480p mp4 https none avc1.4D401F ... NaN -1.0 SDR mp4_dash {'http_chunk_size': 10485760} NaN
43 244 480p webm https none vp09.00.30.08 ... NaN -1.0 SDR webm_dash {'http_chunk_size': 10485760} NaN
44 287 NaN mp4 m3u8_native none avc1.4D401F ... NaN NaN SDR NaN NaN NaN
45 232 NaN mp4 m3u8_native none avc1.4D401F ... NaN NaN SDR NaN NaN NaN
46 609 NaN mp4 m3u8_native none vp09.00.31.08 ... NaN NaN SDR NaN NaN NaN
47 22 720p mp4 https mp4a.40.2 avc1.64001F ... 2.0 -1.0 SDR NaN {'http_chunk_size': 10485760} 16860413.0
48 398 720p mp4 https none av01.0.05M.08 ... NaN -1.0 SDR mp4_dash {'http_chunk_size': 10485760} NaN
49 136 720p mp4 https none avc1.4D401F ... NaN -1.0 SDR mp4_dash {'http_chunk_size': 10485760} NaN
50 247 720p webm https none vp09.00.31.08 ... NaN -1.0 SDR webm_dash {'http_chunk_size': 10485760} NaN
51 290 NaN mp4 m3u8_native none avc1.640028 ... NaN NaN SDR NaN NaN NaN
52 270 NaN mp4 m3u8_native none avc1.640028 ... NaN NaN SDR NaN NaN NaN
53 614 NaN mp4 m3u8_native none vp09.00.40.08 ... NaN NaN SDR NaN NaN NaN
54 399 1080p mp4 https none av01.0.08M.08 ... NaN -1.0 SDR mp4_dash {'http_chunk_size': 10485760} NaN
55 137 1080p mp4 https none avc1.640028 ... NaN -1.0 SDR mp4_dash {'http_chunk_size': 10485760} NaN
56 248 1080p webm https none vp09.00.40.08 ... NaN -1.0 SDR webm_dash {'http_chunk_size': 10485760} NaN
57 616 Premium mp4 m3u8_native none vp09.00.40.08 ... NaN NaN SDR NaN NaN NaN
[58 rows x 37 columns]
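If you would rather keep calling the command-line tool through subprocess, the -J option mentioned in the quoted documentation gives you JSON instead of the human-readable table. Below is a minimal sketch of that route, not a definitive implementation. The helper names parse_formats, fetch_formats and formats_to_dataframe are my own and not part of yt-dlp, and I am assuming the JSON carries the same "formats" key as the info dict used above.

```python
import json
import subprocess


def parse_formats(json_text):
    """Return the list of format dictionaries found in yt-dlp's JSON output."""
    info = json.loads(json_text)
    # Assumption: the JSON has the same "formats" key as the info dict above.
    return info.get("formats") or []


def fetch_formats(url):
    """Run `yt-dlp -J URL` and return its formats as a list of dictionaries."""
    result = subprocess.run(
        ["yt-dlp", "-J", url],  # list form avoids shell quoting problems
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if yt-dlp exits with an error
    )
    return parse_formats(result.stdout)


def formats_to_dataframe(formats):
    """Build a DataFrame with one row per format dictionary."""
    import pandas as pd  # imported here so the helpers above work without pandas

    return pd.DataFrame(formats)


# Usage (needs network access and yt-dlp on the PATH):
# url = "https://www.youtube.com/watch?v=kjYW63CVbsE"
# print(formats_to_dataframe(fetch_formats(url)))
```

Each element returned by fetch_formats is already a dictionary, so no manual conversion step is needed before handing the list to pandas.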