Example Python code for scraping 《純陽劍尊》 from Dingdian Novel Network (頂點小說網)
Scraping 《純陽劍尊》 from Dingdian Novel Network
Code
import requests
from bs4 import BeautifulSoup

# Anti-scraping workaround: send a browser-like User-Agent header
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36'
}


# Send the request and return the page HTML
def open_url(url):
    response = requests.get(url, headers=headers)
    # Let requests guess the encoding from the page content
    response.encoding = response.apparent_encoding
    html = response.text
    return html


# Extract the chapter title (the <h1> inside the first <dd>)
def get_title(html):
    soup = BeautifulSoup(html, 'lxml')
    title_tag = soup.find('dd')
    title = '\n' + title_tag.h1.get_text() + '\n'
    return title


# Extract the chapter text (every <dd id="contents"> tag)
def get_texts(html):
    soup2 = BeautifulSoup(html, 'lxml')
    text_tags = soup2.find_all('dd', id='contents')
    return text_tags


# Save the title (append to the output file)
def save_title(filename, title):
    with open(filename, 'a+', encoding='utf-8') as file:
        file.write(title)


# Save the text (append to the output file)
def save_text(filename, text):
    with open(filename, 'a+', encoding='utf-8') as file:
        file.write(text)


# Main function
def main():
    num = input('Which chapter of 《純陽劍尊》 do you want to download? (1-802) ')
    num = int(num)
    # The page ID in the URL is a fixed offset plus the chapter number
    number = 8184027 + num
    url = 'https://www.23us.so/files/article/html/15/15905/' + str(number) + '.html'
    filename = '純陽劍尊.txt'
    r = open_url(url)
    title = get_title(r)
    tags = get_texts(r)
    save_title(filename, title)
    for text_tag in tags:
        text = text_tag.get_text() + '\n'
        save_text(filename, text)
    print('Chapter {} has been downloaded!'.format(num))


if __name__ == '__main__':
    main()
Scraping result: the chapter title and text are appended to 純陽劍尊.txt.
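The URL construction in main() can be pulled out into a small helper that also rejects chapter numbers outside the 1-802 range shown in the prompt. This is only a sketch: the name chapter_url and the constants are hypothetical, and it assumes, as the script above does, that the page ID is always 8184027 plus the chapter number.

```python
# Hypothetical helper; the base URL, offset, and range are taken from the script above.
BASE_URL = 'https://www.23us.so/files/article/html/15/15905/'
PAGE_ID_OFFSET = 8184027  # assumption carried over from main(): page ID = offset + chapter
FIRST_CHAPTER, LAST_CHAPTER = 1, 802


def chapter_url(num):
    """Build the page URL for a chapter number, validating the range first."""
    if not FIRST_CHAPTER <= num <= LAST_CHAPTER:
        raise ValueError('chapter must be between {} and {}'.format(
            FIRST_CHAPTER, LAST_CHAPTER))
    return '{}{}.html'.format(BASE_URL, PAGE_ID_OFFSET + num)


if __name__ == '__main__':
    print(chapter_url(1))
```

Checking the range before sending a request avoids fetching a page that does not belong to the novel when the user mistypes a number.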
