I am a beginner in R programming and am supposed to write code that reads text from images. I am using the tesseract and magick packages for this, and I am facing an issue where the code converts an "&" to "8:". I have attached the image that I am using as input.
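Below is the code that I am running:
library(magick)  # image_read(), image_ocr() and the %>% pipe; image_ocr() needs the tesseract package installed
test2 <- image_read("C:/Users/admin/Desktop/testimage.jpg") %>%
  image_resize("2000") %>%                # scale to a width of 2000 px
  image_convert(colorspace = 'gray') %>%  # convert to grayscale
  image_trim() %>%                        # trim uniform borders
  image_ocr()                             # extract the text with Tesseract
cat(test2)
write.table(test2, "C:/Users/admin/Desktop/output2.txt", sep="\t")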
Below is the output that I am getting:
No relation between boycotting
panchayat polls 8: Article 35A:
Subramanian Swamy
I have referred to the following source to gain some understanding but did not find any suitable solution for this specific problem.
I have also gone through this website but did not find much help in reading in special characters.
If someone can help me, that would be really helpful.
Can you use ImageMagick with a TIF instead of a JPG? I used the code below, with a TIF input and a larger resize of 4000, and it worked.
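library(magick)  # image_ocr() needs the tesseract package installed
test20 <- image_read("E:/xx/image.tif") %>%
  image_resize("4000") %>%                # scale to a width of 4000 px
  image_convert(colorspace = 'gray') %>%
  image_trim() %>%
  image_ocr()
cat(test20)
# Write the result of this run (test20, not test2 from the question)
write.table(test20, "E:/xx/output.txt", sep="\t")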
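If a TIF input or a larger resize does not help with your image, a crude fallback is to patch the OCR text afterwards. The sketch below uses only base R. `fix_ampersand` is a hypothetical helper name, not something provided by magick or tesseract, and it assumes that a standalone "8:" surrounded by whitespace is always a misread "&". That assumption breaks for text such as "Chapter 8: Introduction", so check it against your own data first.

```r
# Hypothetical helper (not part of magick or tesseract): replace a
# standalone "8:" token with "&" in OCR output.
fix_ampersand <- function(text) {
  # The lookbehind and lookahead require whitespace on both sides,
  # so tokens such as "35A:" or "18:" are left untouched.
  gsub("(?<=\\s)8:(?=\\s)", "&", text, perl = TRUE)
}

# The OCR output reported in the question, used here as sample input
ocr_text <- "No relation between boycotting\npanchayat polls 8: Article 35A:\nSubramanian Swamy\n"
cat(fix_ampersand(ocr_text))
```

This only hides the symptom. Improving the input image is still the better fix when it is possible.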