Detecting Text in Images with Google Cloud Vision API and Python in Colab

The Modern Developer Academy - by Alex Madrazo

4 min read · Feb 26, 2024

Photo by Possessed Photography on Unsplash

In this tutorial, we'll use the Google Cloud Vision API to detect text in images, using Python in a Google Colab notebook. This capability is useful for a variety of applications, such as automating data entry, improving accessibility features, or building content analysis tools.

Prerequisites

A Google Cloud account.
A project created in the Google Cloud Console.
The Google Cloud Vision API enabled for your project.

Step 1: Enable the Vision API and Create Credentials

Navigate to the Google Cloud Console.
Select or create a new project.
Go to APIs & Services > Dashboard and click on + ENABLE APIS AND SERVICES to enable the Vision API for your project.
In the Credentials section, create a service account for your project, then create a key for it and download the JSON file containing the key. Keep this file private, since anyone who has it can call the API on behalf of your project.

Step 2: Set Up Your Colab Environment

First, we need to install the Google Cloud Vision client library in our Colab notebook:

!pip install --upgrade google-cloud-vision

Step 3: Authenticate Your Session

Upload your service account key JSON file to Colab. Then, authenticate your session with the following code:

import os
# Replace the path below with the actual path to your service account key.
# Files uploaded through the Colab file panel are placed under /content/ by default.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/YOUR_SERVICE_ACCOUNT_KEY.json"
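
If authentication fails later, the cause is often a wrong path or the wrong kind of JSON file. The following is a minimal sketch of a sanity check you could run before calling the API. The helper name check_service_account_key is hypothetical (it is not part of the Vision client library), and it only checks a few fields that service account key files normally contain. It does not verify that the key is actually valid.

```python
import json
import os


def check_service_account_key(path):
    """Return a list of problems found with the key file (empty if none)."""
    if not os.path.isfile(path):
        return [f"file not found: {path}"]
    try:
        with open(path, "r", encoding="utf-8") as key_file:
            data = json.load(key_file)
    except (OSError, json.JSONDecodeError) as exc:
        return [f"could not read JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    problems = []
    # Service account keys declare their type explicitly.
    if data.get("type") != "service_account":
        problems.append('"type" is not "service_account"')
    # A few fields that a service account key file normally contains.
    for field in ("project_id", "private_key", "client_email"):
        if not data.get(field):
            problems.append(f'missing field "{field}"')
    return problems


# Check whatever path the environment variable currently points to.
key_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS", "")
for problem in check_service_account_key(key_path):
    print("Key problem:", problem)
```

If nothing is printed, the file exists and looks like a service account key.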

Step 4: Detecting Text in an Image

Now that we’re set up, let’s write a function to detect text in an image:

from google.cloud import vision

def detect_text(path):
    """Detects text in the file."""
    client = vision.ImageAnnotatorClient()
    # Read the image as raw bytes.
    with open(path, 'rb') as image_file:
        content = image_file.read()
    image = vision.Image(content=content)
    response = client.text_detection(image=image)
    # Check for an API error before using the results.
    if response.error.message:
        raise Exception(f'{response.error.message}\nFor more info on error messages, check: https://cloud.google.com/apis/design/errors')
    texts = response.text_annotations
    print('Texts:')
    for text in texts:
        print('\n"{}"'.format(text.description))
        vertices = [f'({vertex.x},{vertex.y})' for vertex in text.bounding_poly.vertices]
        print('bounds: {}'.format(','.join(vertices)))

# Replace 'PATH_TO_YOUR_IMAGE' with the path to the image file you want to analyze.
detect_text('PATH_TO_YOUR_IMAGE')

Replace 'PATH_TO_YOUR_IMAGE' with the path to your image file. This function will print the detected texts and their bounding polygons.
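
Printing is fine for a first look, but you will usually want the results as data. In the response, the first annotation holds the full detected text and the remaining annotations hold the individual words. The sketch below separates the two. The helper split_annotations is a hypothetical name, and the stand-in objects built with SimpleNamespace only mimic the description and bounding_poly.vertices attributes so the example runs without calling the API. The box given for the full text is made up for illustration. In real use you would pass response.text_annotations instead.

```python
from types import SimpleNamespace


def split_annotations(annotations):
    """Split text annotations into (full_text, words).

    The first annotation is treated as the full text; every later one
    becomes a dict with the word and its polygon as (x, y) tuples.
    """
    annotations = list(annotations)
    if not annotations:
        return "", []
    full_text = annotations[0].description
    words = [
        {
            "text": annotation.description,
            "vertices": [(v.x, v.y) for v in annotation.bounding_poly.vertices],
        }
        for annotation in annotations[1:]
    ]
    return full_text, words


def _fake(text, points):
    """Build a stand-in object shaped like a text annotation (illustration only)."""
    vertices = [SimpleNamespace(x=x, y=y) for x, y in points]
    return SimpleNamespace(description=text,
                           bounding_poly=SimpleNamespace(vertices=vertices))


sample = [
    _fake("GLA 250", [(391, 277), (487, 277), (487, 291), (391, 291)]),
    _fake("GLA", [(391, 279), (434, 278), (434, 290), (391, 291)]),
    _fake("250", [(443, 278), (487, 277), (487, 289), (443, 290)]),
]

full_text, words = split_annotations(sample)
print(full_text)                   # GLA 250
print([w["text"] for w in words])  # ['GLA', '250']
```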

Example:

Using this image,

Output:

Texts:
"eis
GLA 250
QALBETION
O
SN66 XMZ
4MATIC"
bounds: (209,67),(766,67),(766,631),(209,631)
"eis"
bounds: (216,67),(263,78),(255,109),(209,97)
"GLA"
bounds: (391,279),(434,278),(434,290),(391,291)
"250"
bounds: (443,278),(487,277),(487,289),(443,290)
"QALBETION"
bounds: (345,620),(408,619),(408,630),(345,631)
"O"
bounds: (453,468),(462,468),(462,478),(453,478)
"SN66"
bounds: (476,341),(587,342),(587,380),(476,379)
"XMZ"
bounds: (600,342),(685,343),(685,381),(600,380)
"4MATIC"
bounds: (699,280),(766,281),(766,292),(699,291)
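
The bounds are the four corners of a polygon, which may be slightly rotated, as the "eis" entry shows. For cropping or drawing, an axis-aligned rectangle is often easier to work with. Here is a minimal sketch in plain Python, using the "SN66" and "XMZ" bounds from the output above. The function names bounding_box and merge_boxes are illustrative, not part of the Vision client library.

```python
def bounding_box(vertices):
    """Return (left, top, right, bottom) enclosing a list of (x, y) points."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def merge_boxes(boxes):
    """Return the smallest box that encloses all given boxes."""
    lefts, tops, rights, bottoms = zip(*boxes)
    return min(lefts), min(tops), max(rights), max(bottoms)


# Vertices copied from the example output above.
sn66 = [(476, 341), (587, 342), (587, 380), (476, 379)]
xmz = [(600, 342), (685, 343), (685, 381), (600, 380)]

print(bounding_box(sn66))  # (476, 341, 587, 380)
print(merge_boxes([bounding_box(sn66), bounding_box(xmz)]))  # (476, 341, 685, 381)
```

Merging the boxes of adjacent words gives one region covering the whole phrase, which you could then pass to an image library for cropping.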

Extra: Detecting Handwriting

In the Google Cloud Vision client library, the text_detection and document_text_detection methods serve distinct purposes for text recognition within images.

text_detection: This is the method our detect_text function calls. It is optimized for detecting and extracting text across a wide array of items, such as street signs, product labels, and informational panels.
document_text_detection: Tailored for more complex text extraction tasks, especially useful for documents or images dense with text. This method shines when dealing with structured documents or handwriting, as it not only detects text but also captures the layout and organization of the text, such as blocks, paragraphs, and words.

In essence, use text_detection for straightforward, scene-based text recognition, and opt for document_text_detection when dealing with documents, structured text, or handwriting.

def detect_handwriting(path):
    """Detects handwriting in the file."""
    client = vision.ImageAnnotatorClient()

    # Read the image as raw bytes.
    with open(path, 'rb') as image_file:
        content = image_file.read()

    image = vision.Image(content=content)
    response = client.document_text_detection(image=image)  # Use document_text_detection for handwriting

    # Check for an API error before using the results.
    if response.error.message:
        raise Exception('{}\nFor more info on error messages, check: https://cloud.google.com/apis/design/errors'.format(response.error.message))

    texts = response.text_annotations

    print('Detected Text:')

    for text in texts:
        print('\n"{}"'.format(text.description))

        # Optional: print the bounds of the detected text
        vertices = ['({},{})'.format(vertex.x, vertex.y) for vertex in text.bounding_poly.vertices]
        print('Bounds: {}'.format(','.join(vertices)))

# Replace 'PATH_TO_YOUR_IMAGE' with the path to your image file.
detect_handwriting('PATH_TO_YOUR_IMAGE')

Example for handwriting:

Input:

Output:

Detected Text:

"My mother, Jancy Adkulty, always
told her daughters, "Done marry
him until you see how he treats
we
we're allowed to mistreat is the
measure of who we all.
ме
Caming somellyy
#handwritingsday #moleskine"
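
Besides text_annotations, the response from document_text_detection carries a full_text_annotation organized as pages, blocks, paragraphs, words, and symbols. The sketch below walks that hierarchy to rebuild the text paragraph by paragraph. The function name paragraphs_from_annotation is hypothetical, and the stand-in objects only mimic the attribute names so the example runs without calling the API. In real use you would pass response.full_text_annotation. Joining words with a single space is a simplification: the API also reports break information for symbols, which this sketch ignores.

```python
from types import SimpleNamespace


def paragraphs_from_annotation(full_text_annotation):
    """Return one string per paragraph, in reading order."""
    paragraphs = []
    for page in full_text_annotation.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                # Each word is a sequence of symbols (single characters).
                words = ["".join(symbol.text for symbol in word.symbols)
                         for word in paragraph.words]
                paragraphs.append(" ".join(words))
    return paragraphs


def _word(text):
    """Stand-in for a word: one symbol per character (illustration only)."""
    return SimpleNamespace(symbols=[SimpleNamespace(text=ch) for ch in text])


def _paragraph(sentence):
    """Stand-in for a paragraph built from a space-separated string."""
    return SimpleNamespace(words=[_word(w) for w in sentence.split()])


fake_annotation = SimpleNamespace(pages=[
    SimpleNamespace(blocks=[
        SimpleNamespace(paragraphs=[_paragraph("GLA 250"),
                                    _paragraph("SN66 XMZ")]),
    ]),
])

print(paragraphs_from_annotation(fake_annotation))  # ['GLA 250', 'SN66 XMZ']
```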

Conclusion

You’ve just learned how to use the Google Cloud Vision API to detect printed and handwritten text in images using Python in a Colab notebook. The same approach applies to a wide range of applications, from data processing to content analysis.

Remember to review the pricing details of the Vision API, as charges may apply depending on your usage.

Happy coding, and I look forward to seeing what you build with this powerful tool!