At my work, I sometimes have to take some printed source code and manually type the source code into a text editor. Do not ask why.
Obviously, typing it up takes a long time, and it always takes extra time to debug typing errors (oops, missed a "$" sign there).
I decided to try some OCR solutions like:
I feel like source code would be very easy to OCR given the font is sans serif and monospace.
Have any of you found a good OCR solution that works well on source code?
Maybe I just need a better OCR solution (not necessarily source code specific)?
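Whichever tool ends up doing the OCR, one way to cut down the debugging time is to compare two independent transcriptions of the same listing (say, the output of two OCR tools, or an OCR pass and a hand-typed copy) and only look at the lines where they disagree. Below is a minimal sketch using Python's standard `difflib`. The function name `report_disagreements` and the sample Perl lines are made up for illustration and are not part of any OCR tool.

```python
import difflib


def report_disagreements(text_a, text_b):
    """Compare two transcriptions line by line.

    Returns a list of (tag, first_line_in_a, lines_from_a, lines_from_b)
    tuples, one per region where the transcriptions differ.
    Line numbers are 1-based and refer to text_a.
    """
    lines_a = text_a.splitlines()
    lines_b = text_b.splitlines()
    matcher = difflib.SequenceMatcher(None, lines_a, lines_b, autojunk=False)
    report = []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # both transcriptions agree here
        report.append((tag, a1 + 1, lines_a[a1:a2], lines_b[b1:b2]))
    return report


if __name__ == "__main__":
    # Hypothetical sample: the second transcription lost a "$" sign.
    first = "my $x = 1;\nprint $x;\n"
    second = "my $x = 1;\nprint x;\n"
    for entry in report_disagreements(first, second):
        print(entry)
```

This does not tell you which transcription is right, only where to look, but that is usually much faster than proofreading the whole listing.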
What is this? Program List OCR is a piece of OCR (Optical Character Recognition) software made specifically for computer program listings published in the 1980s. It converts scanned program listing images into plain text. You can convert this text into an emulator's input file, e.g. a cassette tape image.
Under Apache Licence 2.0. MMOCR is another open-source OCR tool that was developed under the famous OpenMMLab project. The project is developed by the team from The Chinese University of Hong Kong and has been one of the leading projects in the area of Computer Vision.
For example, PaddleOCR provides PPOCRLabel for quickly labeling the text in an image. Because data is important for training an OCR model, it also includes a tool called Style-Text for quickly synthesizing images, so that you have more images to train your model and make it robust enough for a production environment.
The app lets users scan questions they have on paper and convert them into a machine-readable format. OCR can be done using either traditional computer vision techniques or more advanced deep learning techniques. This article focuses only on tools that use deep learning models.
Google Drive's built-in OCR worked pretty well for me. Just convert scans to a PDF, upload to Google Drive, and choose "Open with... Google Docs". There are some weird things with color and text size, but it still includes semicolons and such.
The original screenshot: The Google Docs OCR:
Plaintext version:
    #include <stdio.h>
    int main(void) {
        char word[51];
        int contains = -1;
        int i = 0;
        int length = 0;
        scanf("%s", word);
        while (word[length] != "\0") i ++;
        while ((contains == 1 || contains == 2) && word[i] != "\0") {
            if (word[i] == "t" || word[i] == "T") {
                if (i <= length / 2) {
                    contains = 1;
                } else contains = 2;
        return 0;
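OCR can drop or mangle brackets and braces, so before compiling the result it can help to run a quick delimiter check that points at the lines to proofread first. This is a rough sketch in Python, not a feature of Google Docs or any OCR tool. The function name `check_delimiters` is made up for illustration, and it assumes C-like syntax where string and character literals do not span lines. It does not understand comments, so an apostrophe inside a comment can produce a false report.

```python
def check_delimiters(source):
    """Return a list of problem descriptions; an empty list means balanced."""
    pairs = {")": "(", "]": "[", "}": "{"}
    openers = set(pairs.values())
    stack = []     # (opening character, line number) still waiting to be closed
    problems = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        quote = None     # quote character while inside a literal
        escaped = False  # True right after a backslash inside a literal
        for ch in line:
            if quote is not None:
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == quote:
                    quote = None
                continue  # ignore delimiters inside literals
            if ch in ('"', "'"):
                quote = ch
            elif ch in openers:
                stack.append((ch, line_no))
            elif ch in pairs:
                if stack and stack[-1][0] == pairs[ch]:
                    stack.pop()
                else:
                    problems.append(f"line {line_no}: unexpected '{ch}'")
    for ch, line_no in stack:
        problems.append(f"line {line_no}: '{ch}' never closed")
    return problems


if __name__ == "__main__":
    # Hypothetical OCR output that lost the final closing brace.
    sample = 'int main(void) {\n    if (x == "}") {\n        return 0;\n    }\n'
    for problem in check_delimiters(sample):
        print(problem)
```

A clean report does not mean the transcription is correct (a "t" read as "T" will still pass), but an unbalanced one is a cheap early signal that part of the page was misread or cut off.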