I'm trying to use MODI to OCR a program's window. It works fine for screenshots I grab programmatically using Win32 interop like this:
    public string SaveScreenShotToFile()
    {
        RECT rc;
        GetWindowRect(_hWnd, out rc);
        int width = rc.right - rc.left;
        int height = rc.bottom - rc.top;

        using (Bitmap bmp = new Bitmap(width, height))
        {
            // Render the window into the bitmap through its device context.
            using (Graphics gfxBmp = Graphics.FromImage(bmp))
            {
                IntPtr hdcBitmap = gfxBmp.GetHdc();
                PrintWindow(_hWnd, hdcBitmap, 0);
                gfxBmp.ReleaseHdc(hdcBitmap);
            }

            string fileName = @"c:\temp\screenshots\" + Guid.NewGuid().ToString() + ".bmp";
            bmp.Save(fileName);
            return fileName;
        }
    }
This image is then saved to a file and run through MODI like this:
    private string GetTextFromImage(string fileName)
    {
        MODI.Document doc = new MODI.DocumentClass();
        doc.Create(fileName);
        doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);
        MODI.Image img = (MODI.Image)doc.Images[0];
        MODI.Layout layout = img.Layout;

        // Join the recognized words with single spaces.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < layout.Words.Count; i++)
        {
            MODI.Word word = (MODI.Word)layout.Words[i];
            sb.Append(word.Text);
            sb.Append(" ");
        }

        // Drop the trailing space.
        if (sb.Length > 1)
            sb.Length--;
        return sb.ToString();
    }
This part works fine. However, I don't want to OCR the entire screenshot, just portions of it. I tried cropping the image programmatically like this:
    private string SaveToCroppedImage(Bitmap original)
    {
        Bitmap result = original.Clone(new Rectangle(0, 0, 250, 250), original.PixelFormat);
        var fileName = "c:\\" + Guid.NewGuid().ToString() + ".bmp";
        result.Save(fileName, original.RawFormat);
        return fileName;
    }
and then OCRing this smaller image. However, MODI throws an exception, 'OCR running error', with error code -959967087.
Why can MODI handle the original bitmap but not the smaller version taken from it?
Looks as though the answer is in giving MODI a bigger canvas. I was also trying to take a screenshot of a control and OCR it and ran into the same problem. In the end I took the image of the control, copied the image into a larger bitmap and OCRed the larger bitmap.
Another issue I found was that you must have a proper extension for your image file. In other words, .tmp doesn't cut it.
I kept the work of creating a larger source inside my OCR method, which looks something like this (I deal directly with Image objects):
    public static string ExtractText(this Image image)
    {
        var tmpFile = Path.GetTempFileName();
        string text;
        try
        {
            // Copy the image onto a canvas of at least 1024 x 768 so MODI accepts it.
            using (var bmp = new Bitmap(Math.Max(image.Width, 1024), Math.Max(image.Height, 768)))
            {
                using (var gfxResize = Graphics.FromImage(bmp))
                {
                    gfxResize.DrawImage(image, new Rectangle(0, 0, image.Width, image.Height));
                }
                // MODI needs a proper image extension, so save next to the .tmp file.
                bmp.Save(tmpFile + ".bmp", ImageFormat.Bmp);
            }

            var doc = new MODI.Document();
            doc.Create(tmpFile + ".bmp");
            doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);
            var img = (MODI.Image)doc.Images[0];
            var layout = img.Layout;
            text = layout.Text;
        }
        finally
        {
            File.Delete(tmpFile);
            File.Delete(tmpFile + ".bmp");
        }
        return text;
    }
I'm not sure exactly what the minimum size is, but it appears as though 1024 x 768 does the trick.
Yes, the posts in this thread helped me get it working. Here is what I have to add:
I was trying to download small images and then OCR them.
- When processing images, it seems that their dimensions must be powers of 2. I was able to OCR images of 512x512, 128x128 and 256x64; other sizes, such as 1103x334, mostly failed.
- A transparent background also caused trouble. I got the best results by creating a new TIFF with power-of-2 dimensions and a white background, pasting the downloaded image into it, and saving.
- Scaling the image did not work for me, since the OCR then returned wrong results, especially for German characters like "ü".
- In the end I also used: doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, false, false);
- I am using MODI from Office 2003.
Greetings,
womd
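The padding approach described in this post (power-of-2 canvas, white background, image pasted in unscaled) could be sketched roughly as follows. The class and method names (`OcrCanvas`, `NextPowerOfTwo`, `PadToPowerOfTwo`) are made up for illustration and are not part of MODI or System.Drawing. Whether MODI really requires power-of-2 dimensions is only what was observed above, not something confirmed by documentation.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;

public static class OcrCanvas
{
    // Smallest power of two that is >= value. Hypothetical helper.
    public static int NextPowerOfTwo(int value)
    {
        if (value < 1 || value > (1 << 30))
            throw new ArgumentOutOfRangeException(nameof(value));

        int result = 1;
        while (result < value)
            result <<= 1;
        return result;
    }

    // Returns a new opaque bitmap with power-of-two dimensions and a white
    // background, with the source drawn unscaled in the top-left corner.
    // The caller is responsible for disposing the returned bitmap.
    public static Bitmap PadToPowerOfTwo(Image source)
    {
        var canvas = new Bitmap(
            NextPowerOfTwo(source.Width),
            NextPowerOfTwo(source.Height),
            PixelFormat.Format24bppRgb); // no alpha channel, so no transparency

        using (var g = Graphics.FromImage(canvas))
        {
            g.Clear(Color.White);
            // Passing the source's own size avoids any DPI-based rescaling.
            g.DrawImage(source, new Rectangle(0, 0, source.Width, source.Height));
        }
        return canvas;
    }
}
```

For the 1103x334 image mentioned above this would produce a 2048x512 canvas. The result could then be written out with something like `padded.Save(path, ImageFormat.Tiff)`, using a file name that ends in a real image extension, before handing it to `doc.Create`.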