python2.7 - Chinese text is garbled after Python writes it to a file
Problem description
A very simple little crawler program:
    import urllib2

    # L is the list of domain names to look up (defined elsewhere)
    for i in L:
        content = urllib2.urlopen('http://X.X.X.X/cgi-bin/GetDomainOwnerInfo?domain=%s' % i)
        html = content.read()
        with open('domain_test.xml', 'a') as f:
            f.write(html)
        print html
The print output shows the Chinese text correctly:
<domaininfo strDomain='XXX.com.' strOwner='XXX' strDepartment='云臺部' strBusiness='[互聯網業務系統 - XXX' strUser='XXX;'>
But when I open the XML file directly, the text is garbled:
<domaininfo strDomain='XXX.com.' strOwner='XXX' strDepartment='?o‘?13??°é?¨' strBusiness='[?o’è?”??‘???????3???? - ?????‰?–1?o”?”¨]' StrUser='XXX;'>
Operating system: Windows 7, Python 2.7
How can this problem be solved?
Answers
Answer 1: You need to know how content is encoded and consider whether it has to be converted.
You need to open the file as UTF-8 and then write to it.
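One rough way to find out how the response is encoded is to try a short list of candidate encodings in order. The sketch below is only a heuristic: guess_decode, the candidate list, and the sample bytes are illustrative assumptions, not part of the original program. UTF-8 is tried first because GBK will accept many byte sequences without raising an error. The last line shows how the same UTF-8 bytes look when a viewer reads them as Windows-1252, which appears consistent with the garbled output in the question.

```python
def guess_decode(raw_bytes, candidates=('utf-8', 'gbk')):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for enc in candidates:
        try:
            return raw_bytes.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError('no candidate encoding matched')


# Hypothetical stand-in for content.read(): UTF-8 bytes of a short Chinese string
utf8_bytes = b'\xe4\xba\x91\xe5\xb9\xb3\xe5\x8f\xb0\xe9\x83\xa8'
text, encoding = guess_decode(utf8_bytes)

# The same bytes interpreted as Windows-1252 come out as mojibake;
# bytes that have no mapping in that code page become U+FFFD
garbled = utf8_bytes.decode('cp1252', 'replace')
```

If the server sends a charset in its Content-Type header, that value is a more reliable source than guessing.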
codecs.open(filename, mode[, encoding[, errors[, buffering]]])
Open an encoded file using the given mode and return a wrapped version providing transparent encoding/decoding. The default file mode is 'r' meaning to open the file in read mode.
Note: The wrapped version will only accept the object format defined by the codecs, i.e. Unicode objects for most built-in codecs. Output is also codec-dependent and will usually be Unicode as well.
Note: Files are always opened in binary mode, even if no binary mode was specified. This is done to avoid data loss due to encodings using 8-bit values. This means that no automatic conversion of '\n' is done on reading and writing.
encoding specifies the encoding which is to be used for the file.
errors may be given to define the error handling. It defaults to 'strict' which causes a ValueError to be raised in case an encoding error occurs.
buffering has the same meaning as for the built-in open() function. It defaults to line buffered.
    import codecs
    f = codecs.open('domain_test.xml', 'w', 'utf-8')
Answer 2: Try adding # -*- coding: utf-8 -*- at the top of the file.
Answer 3: Add #coding:utf-8 at the top of the file.
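To tie the codecs.open suggestion from Answer 1 together, here is a minimal sketch of decoding the downloaded bytes first and then appending them through a UTF-8 wrapped file. The helper name append_utf8, the default source_encoding, the temporary path, and the sample bytes are assumptions made for illustration; the real source encoding depends on what the server returns. As the quoted documentation notes, the wrapped file expects Unicode objects, so the raw bytes are decoded before writing. A coding declaration at the top of the script tells the interpreter how to read the source file itself, so data fetched from the network still has to be decoded explicitly.

```python
import codecs
import os
import tempfile


def append_utf8(path, raw_bytes, source_encoding='utf-8'):
    """Decode raw_bytes from source_encoding and append the text to path as UTF-8."""
    text = raw_bytes.decode(source_encoding)   # bytes -> Unicode text
    f = codecs.open(path, 'a', 'utf-8')        # wrapped file encodes on write
    try:
        f.write(text)
    finally:
        f.close()
    return text


# Hypothetical stand-in for content.read(): UTF-8 bytes of a short Chinese string
raw = b'\xe4\xba\x91\xe5\xb9\xb3\xe5\x8f\xb0\xe9\x83\xa8'
path = os.path.join(tempfile.mkdtemp(), 'domain_test.xml')
text = append_utf8(path, raw)
```

If the file still looks garbled after this, the editor or viewer may be assuming a different encoding than UTF-8, so it is worth checking the encoding setting there as well.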
