python使用百度進行爬蟲簡單學習例子

2021-09-21 04:43:41 字數 2059 閱讀 9627

分析頁面:

獲取每一頁的鏈結。

#輸入python關鍵字進行查詢

text = "python"

starurl = "" % text

html = urllib.urlopen(starurl).read()

pageurllist =

page = etree.html(html.lower().decode('utf-8'))

#crapy pageurl list

#解析出id為page的所有div下的a標籤的href屬性,如果要顯示a標籤的內容則把「@href」替換成「text()」即可

hrefs = page.xpath("//div[@id='page']//a/@href")

for href in hrefs:

hrefurl = ""+href

print "list:"

print pageurllist

執行結果

root@kali:~/py# python table.py 

list:

['&pn=10&oq=python&ie=utf-8&usm=4&rsv_pq=897a8df20000da9b&rsv_t=075dbfoz2dplnlb7ts%2boyopf06je%2bi1j1whmgcrvjurdkieecwvsl%2bhdvum', '&pn=20&oq=python&ie=utf-8&usm=4&rsv_pq=897a8df20000da9b&rsv_t=075dbfoz2dplnlb7ts%2boyopf06je%2bi1j1whmgcrvjurdkieecwvsl%2bhdvum', '&pn=30&oq=python&ie=utf-8&usm=4&rsv_pq=897a8df20000da9b&rsv_t=075dbfoz2dplnlb7ts%2boyopf06je%2bi1j1whmgcrvjurdkieecwvsl%2bhdvum', '&pn=40&oq=python&ie=utf-8&usm=4&rsv_pq=897a8df20000da9b&rsv_t=075dbfoz2dplnlb7ts%2boyopf06je%2bi1j1whmgcrvjurdkieecwvsl%2bhdvum', '&pn=50&oq=python&ie=utf-8&usm=4&rsv_pq=897a8df20000da9b&rsv_t=075dbfoz2dplnlb7ts%2boyopf06je%2bi1j1whmgcrvjurdkieecwvsl%2bhdvum', '&pn=60&oq=python&ie=utf-8&usm=4&rsv_pq=897a8df20000da9b&rsv_t=075dbfoz2dplnlb7ts%2boyopf06je%2bi1j1whmgcrvjurdkieecwvsl%2bhdvum', '&pn=70&oq=python&ie=utf-8&usm=4&rsv_pq=897a8df20000da9b&rsv_t=075dbfoz2dplnlb7ts%2boyopf06je%2bi1j1whmgcrvjurdkieecwvsl%2bhdvum', '&pn=80&oq=python&ie=utf-8&usm=4&rsv_pq=897a8df20000da9b&rsv_t=075dbfoz2dplnlb7ts%2boyopf06je%2bi1j1whmgcrvjurdkieecwvsl%2bhdvum', '&pn=90&oq=python&ie=utf-8&usm=4&rsv_pq=897a8df20000da9b&rsv_t=075dbfoz2dplnlb7ts%2boyopf06je%2bi1j1whmgcrvjurdkieecwvsl%2bhdvum', '&pn=10&oq=python&ie=utf-8&usm=4&rsv_pq=897a8df20000da9b&rsv_t=075dbfoz2dplnlb7ts%2boyopf06je%2bi1j1whmgcrvjurdkieecwvsl%2bhdvum&rsv_page=1']

Python爬蟲 百度貼吧

get請求 from urllib import request import urllib import time 第一頁 第二頁 2 1 50 第三頁 3 1 50 第四頁 4 1 50 第n頁 n 1 50 推測第一頁 headers 根據url傳送請求,獲取伺服器響應檔案 defloadpa...

python百度貼吧爬蟲

coding utf 8 coding utf 8 import urllib import urllib2 import reimport thread import time class bdtb def init self,baseurl,seelz self.baseurl baseurl ...

爬蟲 requests 使用百度翻譯

import requests import json if name main 1.指定url post url 2.ua偽裝 headers 3.傳送請求 word input enter a word if not word print 您沒有輸入任何單詞 data if word respo...