Scanned PDFs Will Now Be Visible in Google

Glenn Paul


Optical Character Recognition (OCR) in Google

In its effort to make every bit of the world's information searchable, Google has turned on OCR for crawling scanned PDF documents. Google has been experimenting with OCR technologies over the last couple of years. Today, the search engine giant officially launched OCR to handle scanned PDF documents and images.

As Google’s official blog puts it: “In the past, scanned documents were rarely included in search results as we couldn't be sure of their content. We had occasional clues from references to the document-- so you might get a search result with a title but no snippet highlighting your query. Today, that changes. We are now able to perform OCR on any scanned documents that we find stored in Adobe's PDF format. This Optical Character Recognition (OCR) technology lets us convert a picture (of a thousand words) into a thousand words -- words that can be searched and indexed, so that these valuable documents are more easily found. This is a small but important step forward in our mission of making all the world's information accessible and useful.”

However, regarding the accuracy of the process, Google said: “To people reading these pdf documents, the distinction between words and pictures makes little difference, but for a computer the picture is almost unintelligible. Consider a circle. Should it be read as a zero, the letter 'O', just a circle? People learn to answer this kind of question very quickly, but for the computer it is a painstaking and error-prone process.”
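To make the circle example concrete, here is a minimal Python sketch of one way software might guess between a zero and the letter 'O' by looking at neighboring characters. This is an illustration only, not Google's method: the function name `disambiguate` and the majority-vote rule are assumptions made for this example.

```python
# Hypothetical illustration: resolve 'O' vs '0' from context.
# This is NOT Google's OCR pipeline, just a toy majority-vote heuristic.

AMBIGUOUS = {"O", "0"}


def disambiguate(token: str) -> str:
    """Rewrite every 'O'/'0' in a token to match its unambiguous neighbors."""
    others = [c for c in token if c not in AMBIGUOUS]
    digits = sum(c.isdigit() for c in others)
    letters = sum(c.isalpha() for c in others)

    if digits > letters:
        target = "0"  # mostly digits around it: read the circle as zero
    elif letters > digits:
        target = "O"  # mostly letters around it: read the circle as 'O'
    else:
        return token  # no evidence either way; leave the token unchanged

    return "".join(target if c in AMBIGUOUS else c for c in token)


if __name__ == "__main__":
    for raw in ["2O09", "G0OGLE", "O"]:
        print(raw, "->", disambiguate(raw))
```

A lone circle with no neighbors stays unresolved, which is exactly the case the quote describes as error-prone for a computer.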

The new system has been put to work and can be seen in the following search results:

http://www.google.com/search?q=repairing+aluminum+wiring
http://www.google.com/search?q=spin+lock+performance
http://www.google.com/search?q=Mumps+and+Severe+Neutropenia
http://www.google.com/search?q=Steady+success+in+a+volatile+world

This OCR addition to Google will certainly make more content searchable and accessible.


About Glenn Paul

The author is a Search Engine Optimization researcher with more than 5 years of expertise in the ....

