Web crawler - A crawler under Python 3.6 keeps re-crawling the first page

Views: 171 | Date: 2022-06-30 17:08:03

Problem description

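The problem is as stated in the title. I tried rewriting the loop with while and made many other changes, but nothing had any effect. Any help would be appreciated.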

    # coding:utf-8
    # from lxml import etree
    import os

    import lxml.html
    import requests


    class MyError(Exception):
        def __init__(self, value):
            self.value = value

        def __str__(self):
            return repr(self.value)


    def get_lawyers_info(url):
        r = requests.get(url)
        html = lxml.html.fromstring(r.content)
        # phones = html.xpath("//span[@class='law-tel']")
        phones = html.xpath("//span[@class='phone pull-right']")
        # names = html.xpath("//p[@class='fl']/p/a")
        names = html.xpath("//h4[@class='text-center']")
        if len(phones) == len(names):
            phone_infos = [(n.text, p.text_content()) for n, p in zip(names, phones)]
        else:
            error = 'Lawyers amount are not equal to the amount of phone_nums: ' + url
            raise MyError(error)
        phone_infos_list = []
        for phone_info in phone_infos:
            # .text is None for an empty element, so test truthiness, not == ''
            if not phone_info[0]:
                info = 'No name given' + ': ' + phone_info[1] + '\r\n'
            else:
                info = phone_info[0] + ': ' + phone_info[1] + '\r\n'
            print(info)
            phone_infos_list.append(info)
        return phone_infos_list


    dir_path = os.path.abspath(os.path.dirname(__file__))
    print(dir_path)
    file_path = os.path.join(dir_path, 'lawyers_info.txt')
    print(file_path)
    if os.path.exists(file_path):
        os.remove(file_path)
    # Open the same path that was just removed, not one relative to the working directory
    with open(file_path, 'ab') as file:
        for i in range(1000):
            url = 'http://www.xxxx.com/cooperative_merchants?searchText=&industry=100&provinceId=19&cityId=0&areaId=0&page=' + str(i + 1)
            # r = requests.get(url)
            # html = lxml.html.fromstring(r.content)
            # phones = html.xpath("//span[@class='phone pull-right']")
            # names = html.xpath("//h4[@class='text-center']")
            # if phones or names:
            info = get_lawyers_info(url)
            for each in info:
                file.write(each.encode('gbk'))
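A quick way to confirm the symptom is to compare what each page actually returned. The sketch below is not part of the original post: `find_repeated_pages` is a hypothetical helper, and it assumes the rows for each page have already been collected into a dict, for example by calling `get_lawyers_info` once per page number.

```python
def find_repeated_pages(rows_by_page):
    """Return (page, earlier_page) pairs whose extracted rows are identical.

    rows_by_page is assumed to map a page number to the list of rows
    scraped from that page (hypothetical structure, for illustration).
    """
    first_seen = {}
    repeats = []
    for page in sorted(rows_by_page):
        key = tuple(rows_by_page[page])  # lists are unhashable, tuples are not
        if key in first_seen:
            repeats.append((page, first_seen[key]))
        else:
            first_seen[key] = page
    return repeats
```

If every page beyond the first is reported as a repeat of page 1, the loop itself is probably fine and the server is likely ignoring the `page` value in the request, which points at the URL or its parameters rather than at `for` versus `while`.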

Answers

Answer 1:

    # coding: utf-8
    import requests
    from pyquery import PyQuery as Q

    url = 'http://www.51myd.com/cooperative_merchants?industry=100&provinceId=19&cityId=0&areaId=0&page='

    # The file is opened for appending, but this snippet only prints the results
    with open('lawyers_info.txt', 'ab') as f:
        for i in range(1, 5):
            r = requests.get('{}{}'.format(url, i))
            usernames = Q(r.text).find('.username').text().split()
            phones = Q(r.text).find('.phone').text().split()
            # print is a function in Python 3, and zip returns a lazy iterator
            print(list(zip(usernames, phones)))
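The loop in this answer can also be written so that it stops on its own once the site runs out of pages. The following is a minimal sketch, not the answerer's code. `build_page_url` and `crawl_pages` are hypothetical names, and `fetch_rows` is any callable you supply that downloads a URL and returns a list of rows, for example one wrapping the `requests` and `PyQuery` calls above. The stop rule (an empty page, or a page identical to the previous one) is an assumption about how the site behaves past its last page. The query parameters follow the answer's URL, which, unlike the URL in the question, has no empty `searchText=` parameter.

```python
from urllib.parse import urlencode

# Base address taken from the answer above
BASE_URL = "http://www.51myd.com/cooperative_merchants"


def build_page_url(page, base_url=BASE_URL):
    """Build the listing URL for a 1-based page number."""
    params = {
        "industry": 100,
        "provinceId": 19,
        "cityId": 0,
        "areaId": 0,
        "page": page,
    }
    return base_url + "?" + urlencode(params)


def crawl_pages(fetch_rows, max_pages=1000):
    """Collect rows page by page.

    Stops at the first empty page or the first page that repeats the
    previous one (assumed end-of-data signals, not confirmed site behavior).
    """
    all_rows = []
    previous = None
    for page in range(1, max_pages + 1):
        rows = fetch_rows(build_page_url(page))
        if not rows or rows == previous:
            break
        all_rows.extend(rows)
        previous = rows
    return all_rows
```

Passing the fetcher in as an argument keeps the paging logic testable without network access: a stub that returns canned rows is enough to check that the loop advances and terminates.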
