Getting Started with Tesseract OCR

The following applies only to Windows; it has not been tested on Linux.

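tesserocr and pytesseract are Python OCR libraries, but each is really just a Python API wrapper around Tesseract; pytesseract is a wrapper for Google's Tesseract-OCR engine. Their core is Tesseract, so Tesseract itself has to be installed before installing tesserocr or using pytesseract.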

On Windows, run the installer and click Next through each step.

Add the environment variable: add the installation directory C:\Program Files (x86)\Tesseract-OCR to the PATH environment variable.
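To confirm that the directory really ended up in PATH, a small check like the one below can help. It is only a sketch: the helper name dir_on_path is made up for illustration, and the comparison ignores case and trailing slashes on the assumption that Windows treats paths that way.

```python
import os


def dir_on_path(directory, path_value=None, sep=os.pathsep):
    """Return True if `directory` is one of the entries of a PATH-style string."""
    if path_value is None:
        # Default to the PATH of the current process
        path_value = os.environ.get("PATH", "")

    def norm(entry):
        # Drop quotes and trailing slashes, ignore case (Windows-style comparison)
        return entry.strip().strip('"').rstrip("\\/").lower()

    target = norm(directory)
    return any(norm(entry) == target
               for entry in path_value.split(sep) if entry.strip())


# Check the installation directory used in this post
print(dir_on_path(r"C:\Program Files (x86)\Tesseract-OCR"))
```

Note that a terminal opened before the variable was changed still sees the old PATH, so run the check from a newly opened terminal.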

At this step, we need to select the additional language: Chinese ******

Then go to the installation directory and run .\tesseract
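Beyond running the executable once, it can be useful to check that the Chinese language data was actually installed. The sketch below calls tesseract --list-langs and parses the result. The function names are hypothetical, and it assumes the language list may arrive on either stdout or stderr depending on the Tesseract version, so both are read.

```python
import re
import subprocess


def parse_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Language codes contain no spaces, so the header line and any
        # warning messages are skipped by this pattern
        if re.fullmatch(r"[\w/\-]+", line):
            langs.append(line)
    return langs


def installed_langs(cmd="tesseract"):
    """Run tesseract and return the list of installed language codes."""
    result = subprocess.run([cmd, "--list-langs"],
                            capture_output=True, text=True, check=True)
    return parse_langs(result.stdout + "\n" + result.stderr)


# Usage, once tesseract is on PATH:
#   if "chi_sim" not in installed_langs():
#       print("Simplified Chinese data is missing; rerun the installer")
```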

Install the pytesseract module from a whl file or with conda.

Run pip install pytesseract
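A quick way to see whether the installation landed in the interpreter you are actually using (which matters with virtual environments) is to query the package metadata. This is a minimal sketch; installed_version is an illustrative helper, and Pillow is included because the example script in this post imports PIL.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report on the two packages the example script depends on
print("pytesseract:", installed_version("pytesseract") or "not installed")
print("Pillow:", installed_version("Pillow") or "not installed")
```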

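If pytesseract cannot find the tesseract executable at run time, which usually happens inside a virtual environment, either add the Tesseract-OCR executable tesseract.exe to the Windows PATH environment variable, or edit pytesseract.py and set its tesseract_cmd field to the full path of tesseract.exe.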
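Before changing anything, it may help to find out which of the two situations applies. The sketch below first looks for tesseract on PATH and then falls back to a list of candidate locations. The function name resolve_tesseract is made up for illustration, and the candidate path is simply the installation directory mentioned above, so adjust it to your own setup.

```python
import os
import shutil


def resolve_tesseract(candidates=()):
    """Return the path of the tesseract executable, or None if it cannot be found."""
    # shutil.which searches PATH (and honours PATHEXT on Windows)
    found = shutil.which("tesseract")
    if found:
        return found
    # Fall back to explicitly listed locations
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


# Assumed install location; change it if Tesseract lives elsewhere
path = resolve_tesseract([r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"])
print(path or "tesseract was not found")
```

If this prints a path that came from the fallback list rather than from PATH, that path is the value to use for tesseract_cmd.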

import pytesseract
from PIL import Image


def main():
    # Open the image to be recognized
    image = Image.open("test01.png")
    # Recognize the text using Simplified Chinese
    text = pytesseract.image_to_string(image, lang='chi_sim')
    print(text)
    # Save the recognized text to a local file
    with open("output.txt", "w", encoding="utf-8") as f:
        f.write(str(text))


if __name__ == '__main__':
    main()

Error: pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH

Fix for the error: in pytesseract.py, change

# CHANGE THIS IF TESSERACT IS NOT IN YOUR PATH, OR IS NAMED DIFFERENTLY
tesseract_cmd = 'tesseract'

to:

tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'

Alternatively, set tesseract_cmd directly in the calling script instead of editing pytesseract.py.
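The original snippet for this approach did not survive, so what follows is only a sketch of how it might look. It assigns pytesseract.pytesseract.tesseract_cmd at run time and catches TesseractNotFoundError. The function name ocr_image and its parameters are hypothetical, and the imports sit inside the function so that a missing package only surfaces when OCR is actually requested.

```python
def ocr_image(image_path, tesseract_exe, lang="chi_sim"):
    """Recognize the text in an image, or return None if tesseract cannot be run."""
    # Imported lazily: both packages must be installed for this call to work
    import pytesseract
    from PIL import Image

    # Point pytesseract at the executable without touching pytesseract.py
    pytesseract.pytesseract.tesseract_cmd = tesseract_exe
    try:
        with Image.open(image_path) as image:
            return pytesseract.image_to_string(image, lang=lang)
    except pytesseract.TesseractNotFoundError:
        # Raised when tesseract_exe does not point at a runnable tesseract
        return None


# Usage, with the install path from this post:
#   text = ocr_image("test01.png",
#                    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe")
```

The advantage of this variant is that the setting lives in your own code, so it is not lost when pytesseract is upgraded or reinstalled.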
