Determine if url is a pdf or html file

I am requesting URLs using the requests package in Python (e.g. file = requests.get(url)). The URLs do not include a file extension, and sometimes an HTML file is returned and sometimes a PDF.

Is there a way to determine whether the returned file is a PDF or an HTML file, or more generally, what the file format is? The browser is able to tell, so I assume it must be indicated in the response.

asked Aug 01 '16 03:08

kyrenia

1 Answer

The format is reported in the response's Content-Type header, which will contain either text/html or application/pdf (possibly followed by parameters such as a charset):

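 import requests

 r = requests.get('http://example.com/file')
 # Default to '' so a missing header does not raise TypeError below.
 content_type = r.headers.get('content-type', '')

 # Substring match tolerates parameters such as '; charset=utf-8'.
 if 'application/pdf' in content_type:
     ext = '.pdf'
 elif 'text/html' in content_type:
     ext = '.html'
 else:
     ext = ''
     print('Unknown type: {}'.format(content_type))

 # Use r.content: r.raw is already consumed unless stream=True was passed.
 with open('myfile' + ext, 'wb') as f:
     f.write(r.content)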
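
Servers sometimes omit the header or send a generic type such as application/octet-stream, so it may help to fall back to the first bytes of the body. The following is only a sketch under that assumption: guess_extension is a hypothetical helper (not part of requests), and the byte prefixes it checks ("%PDF-" for PDF, "<!doctype html" or "<html" for HTML) are a rough heuristic, not a full format detector.

```python
def guess_extension(content_type, body):
    """Return '.pdf', '.html', or '' from a Content-Type value and body bytes.

    Hypothetical helper for illustration; not part of the requests library.
    """
    # Drop parameters such as '; charset=utf-8' and normalise case.
    media_type = (content_type or '').split(';')[0].strip().lower()
    if media_type == 'application/pdf':
        return '.pdf'
    if media_type in ('text/html', 'application/xhtml+xml'):
        return '.html'

    # Header missing or unhelpful: inspect the start of the body instead.
    head = body[:1024].lstrip().lower()
    if head.startswith(b'%pdf-'):
        return '.pdf'
    if head.startswith((b'<!doctype html', b'<html')):
        return '.html'
    return ''
```

With a response r from requests.get, this could be called as guess_extension(r.headers.get('content-type'), r.content).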

answered Sep 26 '22 19:09

Wayne Werner

Tags:

python-3.x

python-requests
