文章詳情頁(yè)

Python 實(shí)現(xiàn)任意區(qū)域文字識(shí)別(OCR)操作

瀏覽：51日期：2022-06-25 17:38:24

本文的OCR當(dāng)然不是自己從頭開發(fā)的，是基于百度智能云提供的API（我感覺是百度在中國(guó)的人工智能領(lǐng)域值得稱贊的一大貢獻(xiàn)），其提供的API完全可以滿足個(gè)人使用，相對(duì)來(lái)說(shuō)簡(jiǎn)潔準(zhǔn)確率高。

安裝OCR Python SDK

OCR Python SDK目錄結(jié)構(gòu)

├── README.md├── aip //SDK目錄│ ├── __init__.py //導(dǎo)出類│ ├── base.py //aip基類│ ├── http.py //http請(qǐng)求│ └── ocr.py //OCR└── setup.py //setuptools安裝

支持Python版本：2.7.+ ,3.+

安裝使用Python SDK有如下方式：

如果已安裝pip，執(zhí)行pip install baidu-aip即可。

如果已安裝setuptools，下載后執(zhí)行python setup.py install即可。

代碼實(shí)現(xiàn)

下面讓我們來(lái)看一下代碼實(shí)現(xiàn)。

主要使用的模塊有

import os # 操作系統(tǒng)相關(guān)import sys # 系統(tǒng)相關(guān)import time # 時(shí)間獲取import signal # 系統(tǒng)信號(hào)import winsound # 提示音from aip import AipOcr # 百度OCR APIfrom PIL import ImageGrab # 捕獲剪切板中的圖片import win32clipboard as wc # WINDOWS 剪切板操作import win32con # 這里用于獲取 WINDOWS 剪貼板數(shù)據(jù)的標(biāo)準(zhǔn)格式

第一步這里的APP_ID,API_KEY,SECRET_KEY是通過(guò)登陸百度智能云后自己在OCR板塊申請(qǐng)的, 實(shí)現(xiàn)基本的OCR程序，可以通過(guò)圖片獲取文字。

''' 你的 APPID AK SK '''APP_ID = ’xxx’API_KEY = ’xxx’SECRET_KEY = ’xxx’client = AipOcr(APP_ID, API_KEY, SECRET_KEY)''' 讀取圖片 '''def get_file_content(filePath): with open(filePath, ’rb’) as fp: return fp.read()''' 從API的返回字典中獲取文字 '''def getOcrText(txt_dict): txt = '' if type(txt_dict) == dict: for i in txt_dict[’words_result’]: txt = txt + i['words'] if len(i['words']) < 25: # 這里使用字符串長(zhǎng)度決定了文本是否換行，讀者可以根據(jù)自己的喜好控制回車符的輸出，實(shí)現(xiàn)可控的文本顯示形式 txt = txt + 'nn' return txt''' 調(diào)用通用/高精度文字識(shí)別, 圖片參數(shù)為本地圖片 '''def BaiduOcr(imageName,Accurate=True): image = get_file_content(imageName) if Accurate: return getOcrText(client.basicGeneral(image)) else: return getOcrText(client.basicAccurate(image)) ''' 帶參數(shù)調(diào)用通用文字識(shí)別, 圖片參數(shù)為遠(yuǎn)程url圖片 '''def BaiduOcrUrl(url): return getOcrText(client.basicGeneralUrl(url))

第二步，實(shí)現(xiàn)快捷鍵獲取文字，將識(shí)別文字放入剪切板中，提示音提醒以及快捷鍵退出程序

''' 剪切板操作函數(shù) '''def get_clipboard(): wc.OpenClipboard() txt = wc.GetClipboardData(win32con.CF_UNICODETEXT) wc.CloseClipboard() return txtdef empty_clipboard(): wc.OpenClipboard() wc.EmptyClipboard() wc.CloseClipboard()def set_clipboard(txt): wc.OpenClipboard() wc.EmptyClipboard() wc.SetClipboardData(win32con.CF_UNICODETEXT, txt) wc.CloseClipboard() ''' 截圖后,調(diào)用通用/高精度文字識(shí)別'''def BaiduOcrScreenshots(Accurate=True,path='./',ifauto=False): if not os.path.exists(path): os.makedirs(path) image = ImageGrab.grabclipboard() if image != None: print('rThe image has been obtained. Please wait a moment!',end=' ') filename = str(time.time_ns()) image.save(path+filename+'.png') if Accurate: txt = getOcrText(client.basicAccurate(get_file_content(path+filename+'.png'))) else: txt = getOcrText(client.basicGeneral(get_file_content(path+filename+'.png'))) os.remove(path+filename+'.png') # f = open(os.path.abspath(path)+''+filename+'.txt',’w’) # f.write(txt) set_clipboard(txt) winsound.PlaySound(’SystemAsterisk’,winsound.SND_ASYNC) # os.startfile(os.path.abspath(path)+''+filename+'.txt') # empty_clipboard() return txt else : if not ifauto: print('Please get the screenshots by Shift+Win+S! ',end='') return '' else: print('rPlease get the screenshots by Shift+Win+S ! ',end='')def sig_handler(signum, frame): sys.exit(0) def removeTempFile(file = ['.txt','.png'],path='./'): if not os.path.exists(path): os.makedirs(path) pathDir = os.listdir(path) for i in pathDir: for j in file: if j in i: os.remove(path+i)def AutoOcrFile(path='./',filetype=['.png','.jpg','.bmp']): if not os.path.exists(path): os.makedirs(path) pathDir = os.listdir(path) for i in pathDir: for j in filetype: if j in i: f = open(os.path.abspath(path)+''+str(time.time_ns())+'.txt',’w’) f.write(BaiduOcr(path+i)) breakdef AutoOcrScreenshots(): signal.signal(signal.SIGINT, sig_handler) signal.signal(signal.SIGTERM, sig_handler) print('Waiting For Ctrl+C to exit ater removing all picture files and txt files!') print('Please get the screenshots by Shift+Win+S !',end='') while(1): try: BaiduOcrScreenshots(ifauto=True) time.sleep(0.1) except SystemExit: removeTempFile() break else : pass finally: pass

最終運(yùn)行函數(shù) AutoOcrScreenshots 函數(shù)便可以實(shí)現(xiàn)了：

if __name__ == ’__main__’: AutoOcrScreenshots()使用方法

使用 Windows 10 系統(tǒng)時(shí)，將以上代碼放置在一個(gè) .py 文件下，然后運(yùn)行便可以使用Shift+Win+S快捷鍵實(shí)現(xiàn)任意區(qū)域截取，截取后圖片將暫時(shí)存放在剪切板中，程序自動(dòng)使用Windows API獲取圖片內(nèi)容，之后使用百度的OCR API獲取文字，并將文字放置在剪切版內(nèi)存中后發(fā)出提示音。

使用者則可以在開啟程序后，使用快捷鍵截圖后靜待提示音后使用Ctrl+V將文字內(nèi)容放置在自己所需的位置。

補(bǔ)充：Python 中文OCR

有個(gè)需求，需要從一張圖片中識(shí)別出中文，通過(guò)python來(lái)實(shí)現(xiàn)，這種這么高大上的黑科技我們普通人自然搞不了，去github找了一個(gè)似乎能滿足需求的開源庫(kù)-tesseract-ocr：

Tesseract的OCR引擎目前已作為開源項(xiàng)目發(fā)布在Google Project，其項(xiàng)目主頁(yè)在這里查看https://github.com/tesseract-ocr，

它支持中文OCR，并提供了一個(gè)命令行工具。python中對(duì)應(yīng)的包是pytesseract. 通過(guò)這個(gè)工具我們可以識(shí)別圖片上的文字。

筆者的開發(fā)環(huán)境如下：

macosx

python 3.6

brew

安裝tesseract

brew install tesseract

安裝python對(duì)應(yīng)的包：pytesseract

pip install pytesseract

Python 實(shí)現(xiàn)任意區(qū)域文字識(shí)別(OCR)操作

怎么用？

如果要識(shí)別中文需要下載對(duì)應(yīng)的訓(xùn)練集：https://github.com/tesseract-ocr/tessdata，下載”chi_sim.traineddata”，然后copy到訓(xùn)練數(shù)據(jù)集的存放路徑，如：

Python 實(shí)現(xiàn)任意區(qū)域文字識(shí)別(OCR)操作

具體代碼就幾行:

#!/usr/bin/env python3# -*- coding: utf-8 -*-import pytesseractfrom PIL import Image# open imageimage = Image.open(’test.png’)code = pytesseract.image_to_string(image, lang=’chi_sim’)print(code)

OCR速度比較慢，大家可以拿一張包含中文的圖片試驗(yàn)一下。

以上為個(gè)人經(jīng)驗(yàn)，希望能給大家一個(gè)參考，也希望大家多多支持好吧啦網(wǎng)。如有錯(cuò)誤或未考慮完全的地方，望不吝賜教。

Python 編程

上一條：Python locust工具使用詳解下一條：如何用Python中Tushare包輕松完成股票篩選(詳細(xì)流程操作)

相關(guān)文章：

1. 詳解CSS開發(fā)過(guò)程中的20個(gè)快速提升技巧2. 一篇文章帶你了解JavaScript-語(yǔ)句3. .Net Core使用Coravel實(shí)現(xiàn)任務(wù)調(diào)度的完整步驟4. 常見 PHP ORM 框架與簡(jiǎn)單代碼實(shí)現(xiàn)5. asp讀取xml文件和記數(shù)6. XML 取得元素的字符數(shù)據(jù)7. Ajax實(shí)現(xiàn)動(dòng)態(tài)顯示并操作表信息的方法8. 如何使用ASP.NET Core 配置文件9. .Net core 的熱插拔機(jī)制的深入探索及卸載問(wèn)題求救指南10. ASP+ajax實(shí)現(xiàn)頂一下、踩一下同支持與反對(duì)的實(shí)現(xiàn)代碼

排行榜

					
					Ajax實(shí)現(xiàn)動(dòng)態(tài)顯示并操作表信息的方法
asp讀取xml文件和記數(shù)
常見 PHP ORM 框架與簡(jiǎn)單代碼實(shí)現(xiàn)
詳解CSS開發(fā)過(guò)程中的20個(gè)快速提升技巧
一篇文章帶你了解JavaScript-語(yǔ)句
XML 取得元素的字符數(shù)據(jù)
.Net core 的熱插拔機(jī)制的深入探索及卸載問(wèn)題求救指南
.Net Core使用Coravel實(shí)現(xiàn)任務(wù)調(diào)度的完整步驟
PHP下載CSS文件中的圖片
ASP+ajax實(shí)現(xiàn)頂一下、踩一下同支持與反對(duì)的實(shí)現(xiàn)代碼
JSwiff 0.9 - open source Flash framework