
All the available OCR engines are pre-installed in Power Automate and work locally without connecting to the cloud. However, you may need to download language packs or data files to extract text in specific languages. The default OCR engine in Power Automate is the Windows OCR engine.

To extract text using the Windows OCR engine, you need to install the language pack for the language you want to extract. If the appropriate language pack isn't installed, Power Automate throws an error, prompting you to install it. You can find more information about downloading and installing language packs in Language packs for Windows. After you install the appropriate language pack, expand the OCR engine settings of the OCR action and then select the language you want.

The Windows OCR engine supports 25 languages, including Chinese (Simplified and Traditional), Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian (Cyrillic and Latin), Slovak, Spanish, Swedish, and Turkish.

Using the Tesseract OCR engine

Apart from the Windows OCR engine, Power Automate supports the Tesseract engine. This engine can extract text in five languages without further configuration: English, German, Spanish, French, and Italian. To extract text in a language outside this list, enable the Use other languages option in the OCR engine settings of the OCR action. When this option is enabled, the action displays two more settings: the Language abbreviation and Language data path fields. The Language abbreviation field tells the engine which language to look for during OCR. The Language data path field points to the language data files (.traineddata) that the OCR engine uses for that language. You can find the language data files for all the available languages in this GitHub repository. The Tesseract engine can also be used to extract text from multilingual documents.
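To make the roles of a language abbreviation and a language data path more concrete, here is a minimal Python sketch that assembles a call to the standalone Tesseract command-line tool. It illustrates how Tesseract itself consumes these two values, not how Power Automate works internally. The function name `build_tesseract_command` and the file names used with it are hypothetical. The sketch assumes that every abbreviation has a matching `<abbreviation>.traineddata` file in the data folder and that several languages are joined with `+`, as the Tesseract CLI expects for multilingual documents.

```python
from pathlib import Path
from typing import List


def build_tesseract_command(image: str, output_base: str,
                            languages: List[str], data_path: str) -> List[str]:
    """Build a Tesseract CLI call for one or more language abbreviations.

    Hypothetical helper for illustration only; it does not run Tesseract.
    """
    if not languages:
        raise ValueError("at least one language abbreviation is required")

    # Each abbreviation (for example 'ell' for Greek) needs a matching
    # <abbreviation>.traineddata file inside the language data folder.
    missing = [lang for lang in languages
               if not (Path(data_path) / f"{lang}.traineddata").is_file()]
    if missing:
        raise FileNotFoundError(
            f"missing language data for: {', '.join(missing)}")

    # Tesseract joins several languages with '+', e.g. 'eng+ell'.
    return ["tesseract", image, output_base,
            "-l", "+".join(languages),
            "--tessdata-dir", str(data_path)]
```

Under these assumptions, calling the helper with the abbreviations `eng` and `ell` and a folder that contains both data files yields a command that could be passed to `subprocess.run` on a machine where Tesseract is installed. If a data file is missing, the helper raises an error before any OCR is attempted.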
