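python3.x - How should maketrans be used in Python with UTF-8 files?
Problem description
I wrote a script that processes text by replacing every symbol in it with a space, using Python's maketrans and translate. It works fine on ASCII-encoded files, but on a UTF-8 file it raises an error saying the maketrans arguments are not the same length, even though they clearly look the same length to me: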
  File "/Users/lgq/Desktop/p3.py", line 10, in text_to_words
    "abcdefghijklmnopqrstuvwxyz ")
ValueError: the first two maketrans arguments must have equal length
I looked it up and read that maketrans cannot be used with UTF-8. How should I replace characters in UTF-8 text, then? Any pointers would be appreciated.
def text_to_words(the_text):
    """ Return a list of words with all punctuation removed,
        and all in lowercase.
    """
    my_substitutions = the_text.maketrans(
        # If you find any of these
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~'\\",
        # Replace them by these
        "abcdefghijklmnopqrstuvwxyz ")

    # Translate the text now.
    cleaned_text = the_text.translate(my_substitutions)
    wds = cleaned_text.split()
    return wds

def get_words_in_book(filename):
    """ Read a book from filename, and return a list of its words. """
    f = open(filename, "r", encoding="utf-8")
    content = f.read()
    f.close()
    wds = text_to_words(content)
    return wds

book_words = get_words_in_book("alice.txt")
print("There are {0} words in the book, the first 100 are\n{1}".format(
    len(book_words), book_words[:100]))
Answers
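Answer 1: First, the two strings are not the same length. \" is one character and \\ is also one character; you can check this with len(). Also, for questions about strings it is best to state which Python version you are using.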
The maketrans arguments are not the same length:
my_substitutions = the_text.maketrans(
    # If you find any of these
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~'\\",
    # Replace them by these
    "abcdefghijklmnopqrstuvwxyz ")
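The len() check suggested above can be sketched as follows, assuming Python 3. The names targets, replacements and table are illustrative and not from the original post. Deriving the replacement string from the target string, rather than typing the padding spaces by hand, is one way to guarantee the two arguments have equal length.

```python
# Characters to replace. \" and \\ are escape sequences: one character each.
targets = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~'\\"

# Each escape sequence counts as a single character.
print(len("\""), len("\\"))
print(len(targets))

# Build the replacements from the targets: lowercase for the 26 letters,
# one space for every remaining character (digits and punctuation).
replacements = targets[:26].lower() + " " * (len(targets) - 26)

# Both arguments now have the same length, so maketrans accepts them.
table = str.maketrans(targets, replacements)
words = "Hello, World! 123".translate(table).split()
print(words)
```

Because the padding is computed from len(targets), adding or removing a punctuation character later cannot bring the length mismatch back.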
Test code:
# -*- coding: utf-8 -*-
from string import translate, maketrans

def text_to_words(the_text):
    """ Return a list of words with all punctuation removed,
        and all in lowercase.
    """
    my_substitutions = maketrans(
        # If you find any of these
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~'\\",
        # Replace them by these
        "abcdefghijklmnopqrstuvwxyz                                          ")

    # Translate the text now.
    cleaned_text = the_text.translate(my_substitutions)
    wds = cleaned_text.split()
    return wds

print(text_to_words('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!"#$%&()*+,-./:;<=>?@[]^_`{|}~\'测试'))
Output:
['abcdefghijklmnopqrstuvwxyz', '\xe6\xb5\x8b\xe8\xaf\x95']
This is the result of running it under Python 2.
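For Python 3, a minimal sketch of the same cleanup is shown below. In Python 3, str is Unicode and str.maketrans works on text that has already been decoded, so reading the file with encoding="utf-8" is not in itself a problem. This version is an alternative and not the code from the question: it uses the single-argument dict form of str.maketrans, which has no equal-length requirement, and takes the character set from string.punctuation and string.digits. The sample string is made up for illustration.

```python
import string

def text_to_words(the_text):
    """Return a list of lowercase words with ASCII punctuation and digits removed."""
    # Dict form of maketrans: map each unwanted character to a space.
    table = str.maketrans({ch: " " for ch in string.punctuation + string.digits})
    # Note: lower() also lowercases non-ASCII letters, unlike an A-Z-only table.
    return the_text.translate(table).lower().split()

# Hypothetical sample text containing non-ASCII characters.
sample = "Alice's 2 cats: 测试, naïve!"
words = text_to_words(sample)
print(words)
```

Non-ASCII punctuation such as the curly apostrophe ’ is not part of string.punctuation, so it would have to be added to the dict explicitly if it should also be replaced.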
