Using Windows 10 OCR within a BCX application?

Started by Quin, July 09, 2024, 03:12:09 PM


Quin

Hi,
I want to make an application that uses the Windows 10 OCR technology to OCR PDF documents and computer screens, completely independently of any screen reader. I found some PureBasic code that does this, but I would really like to use BCX for it. I'm just nowhere near that good with COM in any language. Can someone at least point me in the right direction to get started? I've used the COM and UNCOM functions before to talk to SAPI and a particular screen reader, but honestly, whenever I've had to go any further than that (even getting into those ugly CLSIDs or whatever), I've run away screaming and covering my eyes. COM and I do not get along.
Any help would be much appreciated!
-Quin.

MrBcx

#1
Hi Quin,

I think you have a tough road ahead of you. I spent a bit of time looking at OCR options
and concluded that the quickest road to success would be to write a command-line OCR tool
in C# that targets the Windows 10 OCR engine; any front end, written in any programming
language, could then shell out to it. Don't get your hopes up, though. I'm not a C#
programmer and I'm not volunteering for anything.
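As a rough illustration of the "shell out to a command-line tool" approach, here is a minimal Python sketch of the front-end side: it runs an external OCR program and captures what it prints. The tool name `ocrcli.exe` is hypothetical (it stands in for the C# wrapper around the Windows 10 OCR engine that would still have to be written), and the sketch assumes that tool prints the recognized text to stdout and exits with a non-zero code on failure.

```python
import subprocess
from typing import Sequence


def run_ocr_tool(command: Sequence[str], timeout: float = 60.0) -> str:
    """Run an external OCR command and return what it printed to stdout.

    `command` is the full argument list, e.g. the hypothetical
    ["ocrcli.exe", "page1.png"]. Assumes the tool writes recognized text
    to stdout and exits non-zero on failure.
    """
    result = subprocess.run(
        list(command),
        capture_output=True,  # collect stdout/stderr instead of inheriting the console
        text=True,            # decode output to str using the locale's encoding
        timeout=timeout,      # raises subprocess.TimeoutExpired if the tool hangs
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"OCR tool exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout
```

Usage would look like `text = run_ocr_tool(["ocrcli.exe", "page1.png"])`. The same pattern (start a process, read its stdout, check the exit code) works from any language that can launch a program and read its output, which is what makes the command-line boundary attractive.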

Another avenue of research that might help is an OCR discussion over at the PowerBasic
forum. You might need to sign up, which could be next to impossible, as PowerBasic seems
to be in its death throes.

But if you can reach it, Gary Beene has written a lot of code for the blind, and continues
to do so.  Notably:

https://forum.powerbasic.com/forum/user-to-user-discussions/source-code/820142-gbcapture

is his take on OCR'ing PDF pages using the Tesseract OCR engine. Gary is a very competent
PB coder and is active on the PB forum. I quickly examined the source of GBCAPTURE:
about 95% of it is user interface, and the rest is shelling out to Tesseract plus pre-processing and post-processing.
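GBCAPTURE itself is PowerBasic code and is not reproduced here. The following is an independent, minimal Python sketch of the same shell-to-Tesseract pattern, assuming the `tesseract` executable is installed and on the PATH. Passing `stdout` as the output base makes the Tesseract CLI print the recognized text instead of writing a .txt file, and `-l eng` selects the English language data. Pre-processing (for example, rendering PDF pages to images) is left out, and `clean_ocr_text` is a hypothetical example of post-processing, not what GBCAPTURE does.

```python
import subprocess
from typing import List


def tesseract_command(image_path: str, lang: str = "eng") -> List[str]:
    # "stdout" as the output base makes Tesseract print the text
    # instead of writing <outputbase>.txt; -l selects the language data.
    return ["tesseract", image_path, "stdout", "-l", lang]


def clean_ocr_text(raw: str) -> str:
    # Hypothetical post-processing: treat the form feed Tesseract typically
    # emits after each page as a line break, strip trailing spaces, and
    # collapse runs of blank lines into one.
    cleaned: List[str] = []
    for line in raw.replace("\f", "\n").splitlines():
        line = line.rstrip()
        if line == "" and (not cleaned or cleaned[-1] == ""):
            continue  # skip leading blank lines and repeated blank lines
        cleaned.append(line)
    while cleaned and cleaned[-1] == "":
        cleaned.pop()  # drop trailing blank lines
    return "\n".join(cleaned)


def ocr_image(image_path: str, lang: str = "eng", timeout: float = 120.0) -> str:
    # Shell out to Tesseract (assumed to be on PATH) and clean its output.
    result = subprocess.run(
        tesseract_command(image_path, lang),
        capture_output=True,
        encoding="utf-8",  # assumes Tesseract's text output is UTF-8
        timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(f"tesseract failed: {result.stderr.strip()}")
    return clean_ocr_text(result.stdout)
```

For example, `print(ocr_image("page1.png"))`. A front end in any language, BCX included, can follow the same recipe as long as it can run the command and read its output.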

Hope that helps.