Parse a PDF using Python

1 Answer

Use PyPDF2:

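from PyPDF2 import PdfFileReader

# Open in binary mode; PyPDF2 reads raw bytes.
with open('CT1-All.pdf', 'rb') as f:
    reader = PdfFileReader(f)
    # Extract the text of the first page and split it into lines.
    # (PyPDF2 3.x renamed these to PdfReader, reader.pages[0], extract_text().)
    contents = reader.getPage(0).extractText().split('\n')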

When you print contents, it will look like this (I have trimmed it here):

[u'Serial NoRoll NoNameCT1 Marks (50)111MA20026KARADI KALYANI212AR10029MUKESH K
MAR5', u'312MI31004DEEPAK KUMAR7', u'413AE10008FADKE PRASAD DIPAK27', u'513AE10
22RAHUL DUHAN37', u'613AE30005HIMANSHU PRABHAT26.5', u'713AE30019VISHAL KUMAR39
, u'813AG10014HEMANT17', u'913AG10028SHRESTH KR KRISHNA37.51013AG30009HITESH ME
RA33.5', u'1113AG30023RACHIT MADHUKAR40.5', u'1213AR10002ACHARY SUDHEER11', u'1
13AR10004AMAN ASHISH20.5', u'1413AR10008ANKUR44', u'1513AR10010CHUKKA SHALEM RA
U11.5', u'1613AR10012DIKKALA VIJAYA RAGHAVA20.5', u'1713AR10014HRISHABH AMRODIA
1', u'1813AR10016JAPNEET SINGH CHAHAL19.5', u'1913AR10018K VIGNESH42.5', u'2013
R10020KAARTIKEY DWIVEDI49.5', u'2113AR10024LAKSHMISRI KEERTI MANNEY49', u'2213A
10026MAJJI DINESH9.5', u'2313AR10028MOUNIKA BHUKYA17.5', u'2413AR10030PARAS PRA
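
A possible next step is to split each extracted string into fields. The sketch below is only an illustration under assumptions inferred from the sample output above: each string holds exactly one record, laid out as serial number, roll number (two digits, two letters, five digits), an upper-case name, and marks. The names `RECORD` and `parse_record` are hypothetical helpers, and the code targets Python 3. Strings that do not fit (the header, or strings where two records ran together) return None rather than a guess.

```python
import re

# Assumed layout, inferred from the sample output (not confirmed by the PDF):
# serial number, roll number (2 digits + 2 letters + 5 digits), name, marks.
RECORD = re.compile(r"(\d+?)(\d{2}[A-Z]{2}\d{5})([A-Z ]+)(\d+(?:\.\d+)?)")


def parse_record(line):
    """Return (serial, roll_no, name, marks), or None if the line does not fit."""
    match = RECORD.fullmatch(line)
    if match is None:
        return None
    serial, roll_no, name, marks = match.groups()
    return int(serial), roll_no, name.strip(), float(marks)


# A few single-record strings copied from the output above.
sample = [
    u'312MI31004DEEPAK KUMAR7',
    u'413AE10008FADKE PRASAD DIPAK27',
    u'1113AG30023RACHIT MADHUKAR40.5',
]
records = [parse_record(line) for line in sample]
```

The serial number is matched lazily so that the regex engine can backtrack until the remaining digits fit the assumed roll-number shape; this is what separates `11` from `13AG30023` in the last sample string.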

answered Sep 21 '22 11:09

Burhan Khalid

Tags:

python

pdf

IcyFlame
