I want to display the full path of a PDF file together with its contents in the browser. My script has an input HTML form where the user enters a file name and submits it. The script searches for the file and, if it is found in the subdirectories, outputs the file contents to the browser and should also display its name. I am able to display the contents, but I cannot display the full file name at the same time; if I display the file name, the contents show up as garbage characters. Please guide.
script a.py:
import os
import cgi
import cgitb
cgitb.enable()
import sys
import webbrowser

def check_file_extension(display_file):
    input_file = display_file
    nm,file_extension = os.path.splitext(display_file)
    return file_extension

form = cgi.FieldStorage()

type_of_file =''
file_nm = ''
nm =''
not_found = 3

if form.has_key("file1"):
    file_nm = form["file1"].value

type_of_file = check_file_extension(file_nm)

pdf_paths = [ '/home/nancy/Documents/',]
# Change the path while executing on the server , else it will throw error 500
image_paths = [ '/home/nancy/Documents/']

if type_of_file == '.pdf':
    search_paths = pdf_paths
else:
    # .jpg
    search_paths = image_paths

for path in search_paths:
    for root, dirnames, filenames in os.walk(path):
        for f in filenames:
            if f == str(file_nm).strip():
                absolute_path_of_file = os.path.join(root,f)
                # print 'Content-type: text/html\n\n'
                # print '<html><head></head><body>'
                # print absolute_path_of_file
                # print '</body></html>'
                # print """Content-type: text/html\n\n
                # <html><head>absolute_path_of_file</head><body>
                # <img src=file_display.py />
                # </body></html>"""
                not_found = 2
                if search_paths == pdf_paths:
                    print 'Content-type: application/pdf\n'
                else:
                    print 'Content-type: image/jpg\n'
                file_read = file(absolute_path_of_file,'rb').read()
                print file_read
                print 'Content-type: text/html\n\n'
                print absolute_path_of_file
                break
        break
    break
if not_found == 3:
    print 'Content-type: text/html\n'
    # absolute_path_of_file is never assigned when nothing matched,
    # so report the requested name instead.
    print '%s not found' % file_nm
The HTML page is a regular form with just one input field for the file name.
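It is not possible, at least not that simply. An HTTP response has exactly one Content-type header, so the second Content-type line printed after the PDF data simply becomes part of the body, and a PDF sent after a text/html header is shown as garbage text. Beyond that, some web browsers don't display PDFs but ask the user to download the file, some display them themselves, some embed an external PDF viewer component, and some start an external PDF viewer. There is no standard, cross-browser way to embed a PDF into HTML, which is what you would need to display arbitrary text together with the PDF content.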
A fallback solution that works in every browser is to render the PDF pages as images on the server and serve those to the client. This puts some load on the server (processor, memory/disk for caching, bandwidth).
Some modern, HTML5 capable browsers can render PDFs with Mozilla's pdf.js on a canvas element.
For others you could try <embed>/<object> with Adobe's plugin, as described on Adobe's The PDF Developer Junkie Blog.
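One way to show the path and the PDF together is to answer with an HTML page that prints the path and points an <object> element at a second URL, which then returns only the PDF bytes. The sketch below only builds that page. It is written for Python 3, and the function name build_wrapper_page, the file and type=pdf query parameters, and the script URL are assumptions for illustration, not part of the original script. Whether the PDF is shown inline still depends on the browser, so the element contains a plain download link as a fallback.

```python
import html
from urllib.parse import urlencode


def build_wrapper_page(display_path, file_name, script_url):
    """Return an HTML page that shows the path and embeds the PDF."""
    # Hypothetical second endpoint: the same script, asked for the raw PDF.
    query = urlencode({'file': file_name, 'type': 'pdf'})
    pdf_url = html.escape('{0}?{1}'.format(script_url, query), quote=True)
    return (
        '<html><head><title>{path}</title></head><body>'
        '<p>{path}</p>'
        '<object data="{url}" type="application/pdf" width="100%" height="800">'
        '<a href="{url}">Download the PDF</a>'
        '</object>'
        '</body></html>'
    ).format(path=html.escape(display_path), url=pdf_url)


page = build_wrapper_page(
    '/home/nancy/Documents/sub/a b.pdf', 'a b.pdf', '/cgi-bin/a.py'
)
```

A CGI script would print a single Content-type: text/html header followed by this page, and answer the type=pdf request separately with a single application/pdf response.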
Rendering and serving the PDF pages as images needs some software on the server to query the number of pages and to extract and render a given page as image.
The number of pages can be determined with the pdfinfo program from Xpdf or the Poppler command line utilities. Converting a page of the PDF file to a JPG image can be done with convert from the ImageMagick tools. A very simple CGI program using these programs:
#!/usr/bin/env python
import cgi
import cgitb; cgitb.enable()
import os
from itertools import imap
from subprocess import check_output

# Adjust these paths to the server the script runs on.
PDFINFO = '/usr/bin/pdfinfo'
CONVERT = '/usr/bin/convert'
DOC_ROOT = '/home/bj/Documents'

BASE_TEMPLATE = (
    'Content-type: text/html\n\n'
    '<html><head><title>{title}</title></head><body>{body}</body></html>'
)
PDF_PAGE_TEMPLATE = (
    '<h1>{filename}</h1>'
    '<p>{prev_link} {page}/{page_count} {next_link}</p>'
    '<p><img src="{image_url}" style="border: solid thin gray;"></p>'
)

SCRIPT_NAME = os.environ['SCRIPT_NAME']

def create_page_url(filename, page_number, type_):
    return '{0}?file={1}&page={2}&type={3}'.format(
        cgi.escape(SCRIPT_NAME, True),
        cgi.escape(filename, True),
        page_number,
        type_
    )


def create_page_link(text, filename, page_number):
    text = cgi.escape(text)
    # No target page: show the link text grayed out instead of a link.
    if page_number is None:
        return '<span style="color: gray;">{0}</span>'.format(text)
    else:
        return '<a href="{0}">{1}</a>'.format(
            create_page_url(filename, page_number, 'html'), text
        )


def get_page_count(filename):
    # pdfinfo prints "Key: value" lines; collect them into a dictionary.
    def parse_line(line):
        key, _, value = line.partition(':')
        return key, value.strip()

    info = dict(
        imap(parse_line, check_output([PDFINFO, filename]).splitlines())
    )
    return int(info['Pages'])


def get_page(filename, page_index):
    # Render one page (zero-based index) as JPEG data on standard output.
    return check_output(
        [
            CONVERT,
            '-density', '96',
            '{0}[{1}]'.format(filename, page_index),
            'jpg:-'
        ]
    )


def send_error(message):
    print BASE_TEMPLATE.format(
        title='Error', body='<h1>Error</h1>{0}'.format(message)
    )


def send_page_html(_pdf_path, filename, page_number, page_count):
    body = PDF_PAGE_TEMPLATE.format(
        filename=cgi.escape(filename),
        page=page_number,
        page_count=page_count,
        image_url=create_page_url(filename, page_number, 'jpg'),
        prev_link=create_page_link(
            '<<', filename, page_number - 1 if page_number > 1 else None
        ),
        next_link=create_page_link(
            '>>',
            filename,
            page_number + 1 if page_number < page_count else None
        )
    )
    print BASE_TEMPLATE.format(title='PDF', body=body)


def send_page_image(pdf_path, _filename, page_number, _page_count):
    image_data = get_page(pdf_path, page_number - 1)
    # image/jpeg is the registered MIME type for JPEG data.
    print 'Content-type: image/jpeg'
    print 'Content-Length:', len(image_data)
    print
    print image_data


# Maps the "type" request parameter to the function producing the response.
TYPE2SEND_FUNCTION = {
    'html': send_page_html,
    'jpg': send_page_image,
}
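

def main():
    form = cgi.FieldStorage()
    # Default to an empty name so a missing parameter ends in the error page.
    filename = form.getfirst('file', '')
    page_number = int(form.getfirst('page', 1))
    type_ = form.getfirst('type', 'html')

    pdf_path = os.path.abspath(os.path.join(DOC_ROOT, filename))
    # The trailing separator keeps sibling directories whose names merely
    # start with DOC_ROOT from passing the prefix check.
    if os.path.isfile(pdf_path) and pdf_path.startswith(DOC_ROOT + os.sep):
        page_count = get_page_count(pdf_path)
        page_number = min(max(1, page_number), page_count)
        # Unknown type values fall back to the HTML page.
        send = TYPE2SEND_FUNCTION.get(type_, send_page_html)
        send(pdf_path, filename, page_number, page_count)
    else:
        send_error(
            '<p>PDF file <em>{0!r}</em> not found.</p>'.format(
                cgi.escape(filename)
            )
        )


main()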
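The image branch has to emit raw bytes after the headers, and the same applies if a PDF is served directly. A minimal Python 3 sketch of that step follows, assuming a binary output stream such as sys.stdout.buffer. The helper name write_binary_response is hypothetical, and the example writes into an in-memory buffer instead of the real CGI output.

```python
import io


def write_binary_response(out, content_type, payload):
    """Write one complete CGI response (headers plus binary body) to out."""
    header = 'Content-Type: {0}\r\nContent-Length: {1}\r\n\r\n'.format(
        content_type, len(payload)
    )
    out.write(header.encode('ascii'))
    # The body is written unchanged; no newline is appended after it.
    out.write(payload)
    out.flush()


buffer = io.BytesIO()
write_binary_response(buffer, 'image/jpeg', b'\xff\xd8data')
response = buffer.getvalue()
```

In a real CGI script the first argument would be sys.stdout.buffer, and exactly one such response is written per request.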
There are Python bindings for libpoppler, so the call to the external pdfinfo program could be replaced quite easily with that module. It may also be used to extract more information about the pages, such as the links on the PDF pages, to create HTML image maps for them. With the libcairo Python bindings installed it may even be possible to render a page without an external process.
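As a rough illustration of the image map idea, the sketch below turns link rectangles into an HTML <map> for a page rendered at 96 dpi. It does not call any PDF library: the input format (tuples of left, bottom, right, top in PDF points with the origin at the bottom left of an unrotated page, plus a target URL) and the function name build_image_map are assumptions, and real bindings may report link areas differently. It is written for Python 3.

```python
import html

POINTS_PER_INCH = 72.0


def build_image_map(name, links, page_height, density=96):
    """Return <map> markup for links given as (x0, y0, x1, y1, url) tuples."""
    scale = density / POINTS_PER_INCH
    areas = []
    for x0, y0, x1, y1, url in links:
        # PDF y grows upwards, image y grows downwards, so flip the axis.
        coords = (
            round(x0 * scale),
            round((page_height - y1) * scale),
            round(x1 * scale),
            round((page_height - y0) * scale),
        )
        areas.append(
            '<area shape="rect" coords="{0}" href="{1}">'.format(
                ','.join(str(c) for c in coords), html.escape(url, quote=True)
            )
        )
    return '<map name="{0}">{1}</map>'.format(
        html.escape(name, quote=True), ''.join(areas)
    )


markup = build_image_map(
    'page1', [(72, 720, 144, 756, 'http://example.com/?a=1&b=2')], page_height=792
)
```

The page image would then reference the map with usemap="#page1" on its <img> element.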