
Different text detection result from Google Vision API

I get different results for text detection from my .NET code and from the demo app for the same image (Google Vision API result vs. .NET result).

This is my code:

var response = vision.Images.Annotate(
    new BatchAnnotateImagesRequest()
    {
        Requests = new[]
        {
            new AnnotateImageRequest()
            {
                Features = new[]
                {
                    new Feature()
                    {
                        Type = "TEXT_DETECTION"
                    }
                },
                Image = image
            }
        }
    }).Execute();
eugenekv asked Dec 08 '25 17:12

2 Answers

As noted in Emil's answer, you want the DOCUMENT_TEXT_DETECTION feature rather than TEXT_DETECTION. However, you can do it all rather more simply than with the current code.

Rather than using Google.Apis.Vision.V1 (which it looks like you're using, and which calls the REST endpoint), I'd suggest Google.Cloud.Vision.V1, which uses the gRPC endpoint and is hopefully easier to work with. (Disclaimer: I work on the latter library.) You can do most work with the REST endpoint too, mind you.

Here's a complete example using the latter library.

using Google.Cloud.Vision.V1;
using System;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        var client = ImageAnnotatorClient.Create();
        var image = Image.FromUri("https://i.sstatic.net/H21rL.png");
        // Runs DOCUMENT_TEXT_DETECTION and returns the full TextAnnotation.
        var annotations = client.DetectDocumentText(image);
        // Flatten the page/block/paragraph hierarchy.
        var paragraphs = annotations.Pages
            .SelectMany(page => page.Blocks)
            .SelectMany(block => block.Paragraphs);
        foreach (var para in paragraphs)
        {
            var box = para.BoundingBox;
            Console.WriteLine($"Bounding box: {string.Join(" / ", box.Vertices.Select(v => $"({v.X}, {v.Y})"))}");
            // Concatenate every symbol in the paragraph (note: spaces are lost here).
            var symbols = string.Join("", para.Words.SelectMany(w => w.Symbols).Select(s => s.Text));
            Console.WriteLine($"Paragraph: {symbols}");
            Console.WriteLine();
        }
    }
}

That loses the spaces between the symbols, but shows that all the text is being detected - and the method call to perform the actual detection is very simple:

var client = ImageAnnotatorClient.Create();
var image = Image.FromUri("https://i.sstatic.net/H21rL.png");
var annotations = client.DetectDocumentText(image);

Most of the code above is handling the response.
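On the lost spaces: each Symbol in the response carries a detected break (what whitespace, if any, follows it) in its Property.DetectedBreak field. Here's a minimal sketch of how you might reassemble readable text from symbols plus their breaks; it uses a local stand-in enum rather than the real TextAnnotation.Types.DetectedBreak.Types.BreakType, so it compiles without the client library.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Stand-in for Vision's TextAnnotation.Types.DetectedBreak.Types.BreakType,
// kept local so this sketch is self-contained.
enum BreakKind { None, Space, SureSpace, LineBreak }

static class TextAssembler
{
    // Rebuilds readable text from (symbol text, following break) pairs.
    public static string Assemble(IEnumerable<(string Text, BreakKind Break)> symbols)
    {
        var sb = new StringBuilder();
        foreach (var (text, brk) in symbols)
        {
            sb.Append(text);
            if (brk == BreakKind.Space || brk == BreakKind.SureSpace)
                sb.Append(' ');
            else if (brk == BreakKind.LineBreak)
                sb.Append('\n');
        }
        return sb.ToString();
    }
}
```

With the real API you'd map each symbol's `s.Property.DetectedBreak.Type` into the equivalent branch; the page/block/paragraph walk stays the same as in the example above.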

Jon Skeet answered Dec 11 '25 11:12


Found the problem: as per the documentation, when using the Vision API on dense text we should use DOCUMENT_TEXT_DETECTION:

+-------------------------+-----------------------------------------------------------------------------------------------------------------+
| TEXT_DETECTION          | Run OCR.                                                                                                        |
+-------------------------+-----------------------------------------------------------------------------------------------------------------+
| DOCUMENT_TEXT_DETECTION | Run dense text document OCR. Takes precedence when both DOCUMENT_TEXT_DETECTION and TEXT_DETECTION are present. |
+-------------------------+-----------------------------------------------------------------------------------------------------------------+

Therefore the code should look like this:

var response = vision.Images.Annotate(
    new BatchAnnotateImagesRequest()
    {
        Requests = new[]
        {
            new AnnotateImageRequest()
            {
                Features = new[]
                {
                    new Feature()
                    {
                        Type = "DOCUMENT_TEXT_DETECTION"
                    }
                },
                Image = image
            }
        }
    }).Execute();
Emil Borconi answered Dec 11 '25 12:12