I want to extract the information from a scanned table and store it a csv. Right now my table extraction algorithm does the following steps.
This algorithm is working fine for digital born pdfs and most of the scanned documents. But, Some of the documents have a noisy table and thus its not identifying the lines correctly.
Here is a sample image in which my algorithm fails.
These are the operations I am doing on this table. 1.Gaussian blur
2.Otsu thresholding
3.Morphological opening
4.Canny edge detection
5.filtered lines,as you can see the lines are clearly not identified correctly.
Can anyone please suggest better method for extracting horizontal and vertical lines from this kind of less quality scans.
Thanks in advance!!
I found a perfect solution in this blog. https://medium.com/coinmonks/a-box-detection-algorithm-for-any-image-containing-boxes-756c15d7ed26
Here,We are doing morphological transformations using a vertical kernel to detect vetical lines and horizontal kernel to detect horizontal lines and then combining them to get all the required lines.
Vertical lines
Horizontal lines
required output
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With