
Yes, Google Can Read Text in Images

We were surprised to find not one, but several online marketing people sharing stories on social media recently about “new developments in search engine optimization.” Those developments, while interesting, are neither new nor especially important for the legal marketing field.

Google Can “Read” Text in Images

The big SEO development is that Google has learned to “read” text that is embedded – or actually inside – an image.

As we discussed on our blog back in 2016, computers and search engines used to “see” an image only as a line of computer code, like this:

<img class="aligncenter size-full wp-image-2430" src="" alt="Google can read the text in this picture of a Manhattan street" width="1280" height="853">

Web browsers turn that text into an image that human beings can see, like this:

[Image: a Manhattan street. Alt text: “Google can read the text in this picture of a Manhattan street”]
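To make the “line of computer code” point concrete, here is a minimal sketch of what a program that cannot look inside the picture is able to pull out of such a tag. It uses Python's standard html.parser module. The class name ImgTagReader and the src filename are our own placeholders for illustration, not anything Google uses.

```python
from html.parser import HTMLParser


class ImgTagReader(HTMLParser):
    """Collects the attributes of every <img> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "img":
            self.images.append(dict(attrs))


# Tag modeled on the one in this post; the src value is a placeholder assumption.
html = (
    '<img class="aligncenter size-full wp-image-2430" '
    'src="manhattan-street.jpg" '
    'alt="Google can read the text in this picture of a Manhattan street" '
    'width="1280" height="853">'
)

reader = ImgTagReader()
reader.feed(html)
image = reader.images[0]

# All the program knows about the picture is what the attributes say.
print(image["alt"])
print(image["width"], "x", image["height"])
```

Nothing in those attributes mentions the words printed on the street signs, which is why text inside the picture used to be invisible to search.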

Recently, though, search engine companies like Google have been using artificial intelligence (AI) to “learn” to recognize different shapes and objects in images. One of the first things that Google taught its AI machines to recognize was text – the written words that we know and understand and the letters that make up those words.

As a result, you may find the image above in a Google Images search for “one way sign,” or potentially even in a search for “no standing anytime sign,” even though neither of those phrases is in the title or the alt text for the picture. Google can do this because its AI machines have learned that those particular patterns are letters and that, when put together in specific ways, they form words.
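The idea of matching patterns to letters, and then stringing letters into words, can be sketched with a toy example. This is emphatically not how Google's systems work; it is a bare-bones template matcher, assuming tiny 3x5 pixel letters of our own invention, evenly spaced, with no gaps between them. The names TEMPLATES, render, and recognize are hypothetical.

```python
# Each letter is a 3-wide, 5-tall grid: "#" is ink, "." is background.
TEMPLATES = {
    "O": ("###", "#.#", "#.#", "#.#", "###"),
    "N": ("#.#", "###", "###", "#.#", "#.#"),
    "E": ("###", "#..", "##.", "#..", "###"),
    "W": ("#.#", "#.#", "#.#", "###", "#.#"),
    "A": (".#.", "#.#", "###", "#.#", "#.#"),
    "Y": ("#.#", "#.#", ".#.", ".#.", ".#."),
}


def render(word):
    """Draw a word as five rows of pixels, one 3x5 glyph per letter."""
    return ["".join(TEMPLATES[ch][row] for ch in word) for row in range(5)]


def pixel_distance(glyph, template):
    """Count the pixels where two 3x5 grids disagree."""
    return sum(
        a != b
        for glyph_row, template_row in zip(glyph, template)
        for a, b in zip(glyph_row, template_row)
    )


def recognize(image):
    """Cut the image into 3-pixel-wide slices and pick the closest letter for each."""
    letters = []
    for start in range(0, len(image[0]), 3):
        glyph = [row[start:start + 3] for row in image]
        best = min(TEMPLATES, key=lambda ch: pixel_distance(glyph, TEMPLATES[ch]))
        letters.append(best)
    return "".join(letters)


sign = render("ONEWAY")
noisy = list(sign)
noisy[0] = "." + noisy[0][1:]  # smudge one pixel, as a real photo might

print(recognize(noisy))  # prints ONEWAY
```

Real signs come in countless fonts, sizes, angles, and lighting conditions, which is why the real thing takes machine learning rather than a lookup table of six letters.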

A Fascinating Development, But Far From New

The news that Google had come up with this ability was shared on social media channels recently. However, one online marketer noticed it as early as July 2017, when he stumbled on the fact that some of his images were appearing in the results for Google Image searches that mentioned the text in those pictures, but not the file’s title or alt text.

But Google had probably been developing the capability for a decade before then.

Recall that, when you search for things in a regular, non-image Google search, you occasionally discover PDF documents. Some of these PDFs are text-searchable (these let you highlight the words or lines of text, much like you can in a word processor; here’s an example). However, some PDF documents, mostly scanned documents and especially older ones, like this one, are not text-searchable. Instead, they’re technically images.

Back on October 30, 2008, Google announced that it had developed an Optical Character Recognition (OCR) technology that could turn a scanned PDF image into words. Those words could then be indexed by the search engine, letting them turn up in Google searches.
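Why does turning a scan into words matter so much? A search index can only match words it has been given. The sketch below is a toy word index of our own, not Google's, and it assumes the text of a hypothetical scanned PDF has already been produced by some OCR step. The file names, field names, and sample text are all made up for illustration.

```python
from collections import defaultdict

# Two hypothetical PDFs. "page_text" for the scan is what an OCR step
# is assumed to have returned; before OCR, a search engine would not have it.
documents = {
    "scanned-notice.pdf": {
        "title": "scan 0042",
        "page_text": "Notice of hearing in landlord tenant court",
    },
    "typed-memo.pdf": {
        "title": "office memo",
        "page_text": "Reminder about the holiday schedule",
    },
}


def build_index(docs, fields):
    """Map each lowercase word to the set of documents that contain it."""
    index = defaultdict(set)
    for name, doc in docs.items():
        for field in fields:
            for word in doc.get(field, "").lower().split():
                index[word].add(name)
    return index


def search(index, query):
    """Return the documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    return set.intersection(*(index.get(word, set()) for word in words))


titles_only = build_index(documents, ["title"])
with_page_text = build_index(documents, ["title", "page_text"])

print(search(titles_only, "notice of hearing"))     # no matches: the words were never indexed
print(search(with_page_text, "notice of hearing"))  # the scanned PDF now turns up
```

The scanned document itself never changed. The only difference between the two searches is whether the words inside it were handed to the index.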

Search engines “reading” text in images? Yep, it all started with images in PDF format.

But it almost certainly did not start in 2008. That was just when Google announced its development.

Remember Google’s ambitious project to scan 25 million books and post them online in Google Books? It was launched in 2002, and was almost certainly connected to this effort to teach AI to recognize text in images.

Why Is This a Red Herring for Legal Blogging?

The lede for this article said that these developments were also not that important for legal blogging and online legal content marketing.

The reason these developments are fairly trivial is actually very simple: Potential clients rarely look for lawyers in Google Image searches. Instead, they do a traditional search.

Developments in image searches tend to be red herrings for the legal marketing field. Image searches are just not how law firm websites get their traffic. At best, the traffic they bring in is negligible.

That’s not to say that images are irrelevant for online legal marketing: Legal blogs especially benefit from imagery. Studies routinely show that a post shared on social media gets twice the traffic if it has an image, which can help sites that are struggling to break into the self-feeding cycle of high rankings and good traffic.