Languages - DevSource
Using The Office 2007 OCR Component in C#
By Rick Leinecker



If you have Office 2007 installed on Vista, Optical Character Recognition (OCR) is easier than you might think.

I recently went to a training seminar to learn how to use Microsoft Office 2007. It's a great package, but what really impressed me was its ability to turn images into text with almost no effort. This technique is known as Optical Character Recognition (OCR), and it has been under development for decades. Finally, it is accurate and easy to use.

I've never included OCR capabilities in software until now. The main reason was its inaccuracy. I tried several packages in the early 90s. After carefully converting scanned pages into text and then correcting the errors, I found that it was faster to just type the content than to convert and correct it.


Another obstacle in the early days of OCR was the cost of third-party libraries. Most of them came at a high price, usually more than $500. For me, knowing that OCR was fairly inaccurate anyway, that price was too high. If the libraries had been inexpensive, I might have been willing to add OCR as a feature to some projects just so I'd have another marketing bullet point.

If you have Office 2007 installed, the OCR component is available for you to use. The only dependency that's added to your software is Office 2007. Requiring Office 2007 to be installed in order for your software to work may or may not fit your situation. But if your client can guarantee that the machines your software will run on have Office 2007 installed, you're gold. I've encountered many situations where this is the case. I've even encountered a few situations where clients were willing to install Office 2007 in order to use my applications.

One important point, though, is that Office 2003 has its own OCR component, too. It's not as accurate as the Office 2007 version, but it works. The issue for a developer is that the two versions are different COM objects with different GUIDs. Software written against one won't work with the other. I'm only supporting Office 2007 in this article, but you could easily support Office 2003 by adding a reference to that COM component.
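Because the imaging component is a hard dependency, an application may want to check for it at startup rather than fail later with a COM exception. The following is a minimal sketch of one way to do that, not something taken from the article's own code. It assumes the component registers the ProgID "MODI.Document"; the class and method names (ModiCheck, IsModiInstalled) are hypothetical. Note that this check only reports whether some version of the component is registered; it does not distinguish the Office 2003 version from the Office 2007 version.

```csharp
using System;

static class ModiCheck
{
    // Assumption: the imaging component registers the ProgID "MODI.Document".
    // Type.GetTypeFromProgID returns null when the ProgID cannot be resolved.
    public static bool IsModiInstalled()
    {
        return Type.GetTypeFromProgID("MODI.Document") != null;
    }

    static void Main()
    {
        if (IsModiInstalled())
        {
            Console.WriteLine("Document Imaging component found.");
        }
        else
        {
            Console.WriteLine("Document Imaging component not found. " +
                              "Install it from the Office setup program.");
        }
    }
}
```

A check like this lets the application show a helpful message, or disable its OCR features, on machines where the component is missing.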

This article targets the Vista operating system, but the technique works on Windows XP systems, too.

Adding a Reference to the Office 2007 Component

The name of the COM object that you need to add as a reference is Microsoft Office Document Imaging 12.0 Type Library. By default, Office 2007 doesn't install it. You'll need to make sure that it's added by using the Office 2007 installation program. Just run the installer, select "Add or Remove Features", click the Continue button, and ensure that the imaging component is installed as shown in the figure to the right.

The next step is to create a Windows C# application project in Visual Studio. Once the project has been created, you'll need to add a reference to the Microsoft Office Document Imaging (MODI) component. In the Visual Studio Solution Explorer window, right-click the References folder and choose Add Reference. When the dialog box appears, select the COM tab. Finally, select the object named Microsoft Office Document Imaging 12.0 Type Library as shown in the following figure.

Now we can talk about code. For this section, I'm going to assume that you have an image named SampleForOCR.tiff, containing some text, in the c:\SampleImages directory. If you don't, you can download one here (the file is SampleForOCR.tiff, compressed into a zip file).
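To give a sense of where this is heading, here is a minimal console sketch of an OCR call against the sample image. It is an illustration under stated assumptions, not necessarily the code developed in the rest of this article: it assumes the project already references the Microsoft Office Document Imaging 12.0 Type Library (which exposes the MODI namespace), that the sample file exists at the path above, and that the text is in English. The class name OcrSketch is hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;

class OcrSketch
{
    static void Main()
    {
        // Assumed location of the sample image described in the article.
        const string path = @"c:\SampleImages\SampleForOCR.tiff";

        MODI.Document doc = new MODI.Document();
        try
        {
            // Load the image file into the MODI document.
            doc.Create(path);
            try
            {
                // Run recognition. The two flags ask MODI to correct the
                // page orientation and to straighten a skewed scan.
                doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);

                // A single-page image has one entry in the Images collection.
                MODI.Image image = (MODI.Image)doc.Images[0];

                // Layout.Text holds the recognized text for that page.
                Console.WriteLine(image.Layout.Text);
            }
            finally
            {
                // Close without saving changes back to the image file.
                doc.Close(false);
            }
        }
        finally
        {
            // Release the underlying COM object promptly.
            Marshal.ReleaseComObject(doc);
        }
    }
}
```

The nested try/finally blocks are a design choice for this sketch: the document is closed only if it was opened successfully, and the COM object is released even when recognition throws.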



 
 
 


