python3.x - How should maketrans be used in Python with UTF-8 files?

Views: 189 | Date: 2022-07-05 10:59:36

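Problem description

I wrote a script that processes text by replacing every symbol in it with a space, using Python's maketrans and translate. It works fine on ASCII-encoded files, but on a UTF-8 file it fails with an error saying the maketrans arguments are not the same length, even though they clearly look the same length to me: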

      File "/Users/lgq/Desktop/p3.py", line 10, in text_to_words
        "abcdefghijklmnopqrstuvwxyz ")
    ValueError: the first two maketrans arguments must have equal length

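I looked it up and read that maketrans cannot be used with UTF-8. How should I replace these characters in a UTF-8 file, then? Any pointers would be appreciated.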

    def text_to_words(the_text):
        """ Return a list of words with all punctuation removed,
            and all in lowercase.
        """
        my_substitutions = the_text.maketrans(
            # If you find any of these
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~'",
            # Replace them by these
            "abcdefghijklmnopqrstuvwxyz ")

        # Translate the text now.
        cleaned_text = the_text.translate(my_substitutions)
        wds = cleaned_text.split()
        return wds

    def get_words_in_book(filename):
        """ Read a book from filename, and return a list of its words. """
        f = open(filename, "r", encoding="utf-8")
        content = f.read()
        f.close()
        wds = text_to_words(content)
        return wds

    book_words = get_words_in_book("alice.txt")
    print("There are {0} words in the book, the first 100 are\n{1}".format(
        len(book_words), book_words[:100]))

Answers

Answer 1:

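First of all, the two strings are not the same length. Note that \" is a single character, and \\ is also a single character; you can check with len(). Also, for any question involving strings, it is best to state which Python version you are using.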
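The len() check mentioned above can be sketched as follows for Python 3. The names find_chars, replace_chars and replace_padded, and the shortened strings, are made up for illustration and are not the arguments from the question. Padding with ljust is just one possible way to make the lengths match.

```python
# An escape sequence takes two characters to type but yields one character.
print(len("\""))   # escaped double quote
print(len("\\"))   # escaped backslash

# Hypothetical, shortened arguments (not the ones from the question).
find_chars = "ABC!\"'"
replace_chars = "abc "

# Compare the lengths before calling maketrans.
print(len(find_chars), len(replace_chars))

# Pad the replacement string with spaces so both arguments match.
replace_padded = replace_chars.ljust(len(find_chars))
table = str.maketrans(find_chars, replace_padded)
print("AB!C".translate(table))
```

If the two printed lengths differ, maketrans raises the ValueError shown in the question, whatever the encoding of the input file.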

The maketrans arguments have different lengths:

    my_substitutions = the_text.maketrans(
        # If you find any of these
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~'",
        # Replace them by these
        "abcdefghijklmnopqrstuvwxyz ")

Test code:

    # -*- coding: utf-8 -*-
    from string import maketrans

    def text_to_words(the_text):
        """ Return a list of words with all punctuation removed,
            and all in lowercase.
        """
        # If you find any of these
        find_chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~'"
        # Replace them by these (padded with spaces to the same length)
        replace_chars = "abcdefghijklmnopqrstuvwxyz".ljust(len(find_chars))
        my_substitutions = maketrans(find_chars, replace_chars)

        # Translate the text now.
        cleaned_text = the_text.translate(my_substitutions)
        wds = cleaned_text.split()
        return wds

    print(text_to_words("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~'测试"))

Output:

    ['abcdefghijklmnopqrstuvwxyz', '\xe6\xb5\x8b\xe8\xaf\x95']

This is the result under Python 2.
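For Python 3, which the question is tagged with, one possible approach is to pass str.maketrans a single dict, which avoids the equal-length requirement altogether. This is only a sketch and not part of the original answer: it uses string.punctuation and string.digits in place of the hand-typed character list, lowercases with str.lower() instead of mapping A-Z in the table, and the extra_chars parameter is a hypothetical addition for non-ASCII symbols such as typographic quotes.

```python
import string

def text_to_words(the_text, extra_chars=""):
    """Return a list of lowercase words with punctuation and digits removed.

    extra_chars is a hypothetical parameter for additional symbols
    (for example typographic quotes) that should also become spaces.
    """
    # Map each unwanted character to a space. With a dict there is no
    # pair of strings whose lengths have to match.
    unwanted = string.punctuation + string.digits + extra_chars
    table = str.maketrans({ch: " " for ch in unwanted})
    # Python 3 strings are Unicode, so non-ASCII text passes through untouched.
    return the_text.lower().translate(table).split()

print(text_to_words("Hello, World! 测试 123 it's"))
print(text_to_words("Alice’s cat", extra_chars="’"))
```

Once the file has been opened with encoding="utf-8", its content is an ordinary str, so translate works on it in the same way as on ASCII text.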
