
Examples of using Python re.match()

Date: 2022-06-29 08:25:32

While learning web scraping with Python, I ran into a problem. The book gives the following example:

import re

line = 'Cats are smarter than dogs'
matchObj = re.match(r'(.*)are(.*?).*', line)
if matchObj:
    print('matchObj.group():', matchObj.group())
    print('matchObj.group(1):', matchObj.group(1))
    print('matchObj.group(2):', matchObj.group(2))
else:
    print('No match!')

The output the book expects is:

matchObj.group(): Cats are smarter than dogs
matchObj.group(1): Cats
matchObj.group(2): smarter

But when I ran it on my own computer, the output was:

matchObj.group(): Cats are smarter than dogs
matchObj.group(1): Cats
matchObj.group(2):

So I set out to get to the bottom of this difference.

First I needed to understand these few lines of code, and the key is this one:

matchObj = re.match(r'(.*)are(.*?).*', line)

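The regular expression being matched is (.*)are(.*?).* and the r in front of it makes the pattern a raw string, so Python hands backslashes to the regex engine unchanged instead of treating them as escape sequences. The string being matched is line, i.e. Cats are smarter than dogs. After that, group(num) is used. My understanding is that each pair of parentheses in the regular expression defines one capturing group, and calling group(num) returns the content of the corresponding group; group(0) is the string matched by the whole expression, which in this example is 'Cats are smarter than dogs'.

According to the symbol meanings you can find online, . matches any character except a newline, * repeats the preceding character zero or more times, and ? repeats the preceding character zero times or once. So the first group should match everything before are in the target string (except newlines), and the second group should match what comes after are, but what exactly it is meant to capture is unclear. The unclear part is the combination of * and ?. By precedence these two symbols are at the same level, so they should take effect in order; read that way, this part matches an arbitrary string of any length from zero upward.

To see what the program actually decides here, I also put parentheses around the trailing .* so its content can be extracted, and added a matching line to the output: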

import re

line = 'Cats are smarter than dogs'
matchObj = re.match(r'(.*)are(.*?)(.*)', line)
if matchObj:
    print('matchObj.group():', matchObj.group())
    print('matchObj.group(1):', matchObj.group(1))
    print('matchObj.group(2):', matchObj.group(2))
    print('matchObj.group(3):', matchObj.group(3))
else:
    print('No match!')

The result is:

matchObj.group(): Cats are smarter than dogs
matchObj.group(1): Cats
matchObj.group(2):
matchObj.group(3): smarter than dogs

So the second group comes out empty. If I then delete that ?, the result becomes:

matchObj.group(): Cats are smarter than dogs
matchObj.group(1): Cats
matchObj.group(2): smarter than dogs
matchObj.group(3):

Does this mean that ? most likely defaults to zero repetitions? And if so, what is ? actually for?
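The runs above can be reproduced in a single script. One point worth knowing when reading them: in Python's re, a ? written directly after * is not a second quantifier applied in sequence; *? is the documented non-greedy (lazy) version of *, which matches as few characters as the rest of the pattern allows. A minimal sketch on the same line (the variable names are illustrative); note that group(1) is really 'Cats ' with a trailing space, which the printed output above hides:

```python
import re

line = 'Cats are smarter than dogs'

# Greedy (.*) in the middle takes everything after "are", leaving group 3 empty
greedy = re.match(r'(.*)are(.*)(.*)', line)
# Lazy (.*?) takes as little as possible; nothing after it forces it to grow
lazy = re.match(r'(.*)are(.*?)(.*)', line)
# A lazy group does grow when the rest of the pattern requires it
forced = re.match(r'(.*)are(.*?)dogs', line)

print(greedy.groups())  # ('Cats ', ' smarter than dogs', '')
print(lazy.groups())    # ('Cats ', '', ' smarter than dogs')
print(forced.groups())  # ('Cats ', ' smarter than ')
print(lazy.group(0))    # Cats are smarter than dogs
```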

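On reflection, that claim isn't very rigorous. Trying .? on its own shows that this combination can be used to extract a single character that may or may not be present, and the following code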

import re

line = 'Cats are smarter than dogs'
matchObj = re.match(r'(.*) are(.*)?', line)
if matchObj:
    print('matchObj.group():', matchObj.group())
    print('matchObj.group(1):', matchObj.group(1))
    print('matchObj.group(2):', matchObj.group(2))

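also correctly extracts the characters after are into group 2. But with one small change, moving the ? inside the second pair of parentheses, nothing is extracted at all, and the match in group(0) stops at Cats are (in other words, the second group ends up matching only an empty string).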
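The difference between (.*)? and (.*?) can be checked directly. In the first, ? makes the whole group optional while the .* inside stays greedy; in the second, .*? is lazy, and because re.match only anchors the match at the start of the string, nothing forces the lazy group past the empty string. re.fullmatch, which also requires the match to reach the end of the string, makes the same lazy group expand. A minimal sketch (variable names are illustrative):

```python
import re

line = 'Cats are smarter than dogs'

# '?' outside the group: the whole group is optional, but the .* inside stays greedy
outside = re.match(r'(.*) are(.*)?', line)
# '?' inside the group: .*? is lazy and settles for the empty string
inside = re.match(r'(.*) are(.*?)', line)
# re.fullmatch must consume the whole string, so the lazy group has to expand
full = re.fullmatch(r'(.*) are(.*?)', line)

print(repr(outside.group()), repr(outside.group(2)))
# 'Cats are smarter than dogs' ' smarter than dogs'
print(repr(inside.group()), repr(inside.group(2)))
# 'Cats are' ''
print(repr(full.group(2)))
# ' smarter than dogs'
```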

Strangely, if I change the code above to

import re

line = 'Cats are smarter than dogs'
matchObj = re.match(r'(.*) are (.*)+', line)
if matchObj:
    print('matchObj.group():', matchObj.group())
    print('matchObj.group(1):', matchObj.group(1))
    print('matchObj.group(2):', matchObj.group(2))

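that is, merely changing the ? to +, the whole line is matched successfully, yet group(2) is empty.

If I move the + inside the second pair of parentheses, an error is raised and the match fails. (This holds for Python versions before 3.11; since 3.11, *+ is accepted as a possessive quantifier.)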
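Two documented facts may help interpret these results. First, the re documentation states that when a group matches several times, group() returns only the last match; with (.*)+ the final repetition appears to be an empty match at the end of the line, which is consistent with the empty group(2) described above. Second, the error for (.*+) depends on the Python version: before 3.11, *+ is rejected as a "multiple repeat", while Python 3.11 added possessive quantifiers, so the same pattern compiles there. A minimal sketch (variable names are illustrative):

```python
import re
import sys

line = 'Cats are smarter than dogs'

# A repeated capturing group keeps only its LAST repetition
last = re.match(r'(\w)+', 'abc').group(1)
print(last)  # c

# '*+' inside the group: an error before Python 3.11,
# a possessive quantifier from Python 3.11 onwards
try:
    possessive = re.match(r'(.*) are (.*+)', line).group(2)
except re.error as exc:
    possessive = None
    print('re.error:', exc)
print(sys.version_info[:2], possessive)
```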

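So can we conclude that chaining the three symbols .*? is simply non-standard usage, which, because ? is special, happens not to raise an error and even matches successfully?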
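On the question just raised: Python's re documents *?, +?, ?? and {m,n}? as the non-greedy counterparts of *, +, ? and {m,n}, so .*? is standard syntax rather than an accident. A small sketch comparing each quantifier with its lazy form on the string 'aaa':

```python
import re

s = 'aaa'
patterns = ['a*', 'a*?', 'a+', 'a+?', 'a?', 'a??', 'a{1,3}', 'a{1,3}?']

# re.match anchors at the start of s; each lazy form stops as early as it can
results = {p: re.match(p, s).group() for p in patterns}

for p, matched in results.items():
    print(p, '->', repr(matched))
# a* -> 'aaa'
# a*? -> ''
# a+ -> 'aaa'
# a+? -> 'a'
# a? -> 'a'
# a?? -> ''
# a{1,3} -> 'aaa'
# a{1,3}? -> 'a'
```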

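The details would probably require studying the mechanism of the code itself, so I'll set that aside for now. Another question is how to achieve what the example itself intended: using the second group to extract a single word.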

Considering only this example, replacing the ? in the original second group with r does the job, i.e. the following code:

import re

line = 'Cats are smarter than dogs'
matchObj = re.match(r'(.*) are (.*r).*', line)
if matchObj:
    print('matchObj.group():', matchObj.group())
    print('matchObj.group(1):', matchObj.group(1))
    print('matchObj.group(2):', matchObj.group(2))
else:
    print('No match!')
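The (.*r) version works here because the greedy .* backs off only as far as the last r it can reach, and in 'smarter than dogs' that last r happens to end the word smarter. A quick check, with a lazy .*?r added for comparison (it stops at the first r instead):

```python
import re

line = 'Cats are smarter than dogs'

# Greedy .*r backs off only as far as the LAST 'r' it can reach
greedy_r = re.match(r'(.*) are (.*r).*', line).group(2)
# Lazy .*?r stops at the FIRST 'r'
lazy_r = re.match(r'(.*) are (.*?r).*', line).group(2)

print(greedy_r)  # smarter
print(lazy_r)    # smar
```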

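For more generality I tried changing the r to a space, but the result was 'smarter than '. So I replaced . with [a-zA-Z], which matches any letter, and that extracts the single word smarter (group(2) actually holds 'smarter ' with a trailing space, because the space sits inside the parentheses). The code is: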

import re

line = 'Cats are smarter than dogs'
# [a-zA-Z]* can only match letters, so group 2 stops at the first space;
# that space is inside the parentheses, so group 2 is 'smarter '
matchObj = re.match(r'(.*) are ([a-zA-Z]* ).*', line)
if matchObj:
    print('matchObj.group():', matchObj.group())
    print('matchObj.group(1):', matchObj.group(1))
    print('matchObj.group(2):', matchObj.group(2))
else:
    print('No match!')
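For comparison, a few other ways to capture only the word after are. The book's listing is not reproduced here, so it is an assumption that its pattern has spaces around are; pattern 1 below is simply one pattern that yields the expected Cats / smarter output. Compared with (.*)are(.*?).* as run at the start of this article, the space after (.*?) is what forces the lazy group to grow up to the next space. Patterns 2 and 3 are alternatives, not taken from the book:

```python
import re

line = 'Cats are smarter than dogs'

# 1) Spaces around "are", plus a lazy group that must be followed by a space
book = re.match(r'(.*) are (.*?) .*', line)
# 2) \w+ matches one run of word characters, so the capture stops at the space
word = re.search(r' are (\w+)', line)
# 3) [^ ]+ means "one or more characters that are not a space"
non_space = re.match(r'(.*) are ([^ ]+)', line)

print(book.group(1), book.group(2))  # Cats smarter
print(word.group(1))                 # smarter
print(non_space.group(2))            # smarter
```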
