Crawler Column 3: Scraping Maoyan with XPath

2021-10-05 19:18:57 · 1322 words · 2623 views

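from lxml import etree
import requests
import time

url = ''      # target URL not preserved in this copy of the post
headers = {}  # request headers not preserved in this copy of the post

response = requests.get(url, headers=headers)
html = response.text

movie_name_xpath = ''  # XPath expression not preserved in this copy of the post
s = etree.HTML(html)   # note: the parser function is etree.HTML (upper case)
movie_name = s.xpath(movie_name_xpath)
print(movie_name)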
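The URL and XPath above did not survive in this copy. To see what `etree.HTML` and `xpath()` return without requesting the live site, here is a minimal sketch that runs on an inline HTML string. The markup and class names are hypothetical stand-ins, not Maoyan's real page structure. It shows why the script prints a list: an XPath that ends in `/text()` returns a list of the matching text nodes.

```python
from lxml import etree

# Hypothetical markup standing in for a downloaded page (not Maoyan's real HTML)
html = """
<html><body>
  <dl class="board">
    <dd><p class="name"><a>Movie A</a></p></dd>
    <dd><p class="name"><a>Movie B</a></p></dd>
  </dl>
</body></html>
"""

# etree.HTML parses the string into an element tree using lxml's lenient HTML parser
selector = etree.HTML(html)

# An XPath ending in /text() yields a list with one string per matching text node
movie_names = selector.xpath('//p[@class="name"]/a/text()')
print(movie_names)  # ['Movie A', 'Movie B']
```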

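def get_one_page(url):
    response = requests.get(url, headers=headers)
    selector = etree.HTML(response.text)

    # Scrape using a parent node combined with child nodes
    film = selector.xpath('')   # parent-node XPath not preserved in this copy of the post
    items = selector.xpath('')  # XPath not preserved in this copy of the post
    # print(items)
    for item in items:
        print(item)
    # print(a)

    # Alternative: parent node + child nodes. This approach is used mainly because
    # there are many fields: storing each one in a separate list makes the output
    # awkward to print, and otherwise all you get is one long list.
    # for div in film:
    #     number = div.xpath('i/text()')[0]
    #     title = div.xpath('div/div/div[1]/p[1]/a/text()')[0]
    #     star = div.xpath('div/div/div[1]/p[2]/text()')[0]
    #     # Not sure why, but adding [0] here raises "list index out of range",
    #     # which means this XPath returned an empty list for at least one item
    #     release_time = div.xpath('div/div/div[1]/p[3]/text()')
    #     print(" ", str(number), " ", title, " ", star, " ", release_time)


if __name__ == '__main__':
    for i in range(3):
        # paginated URL template (with a {} offset placeholder) not preserved in this copy
        url = ''.format(i * 10)
        get_one_page(url)
        print('Page {} scraped'.format(i + 1))
        # time.sleep(t): t is the number of seconds to delay execution
        time.sleep(0.5)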
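The commented-out loop in `get_one_page` uses the "parent node + child nodes" approach. It selects each movie's container once and then evaluates short XPaths relative to it, so every movie's fields stay together in one record instead of being spread across several parallel lists. The sketch below applies that idea to inline, hypothetical markup (not the real Maoyan page). It also adds a small helper, `first_or_default`, which is hypothetical and not part of lxml. When a field's XPath matches nothing, the helper returns a default value. Indexing an empty list with `[0]` would instead raise `IndexError: list index out of range`, the error described in the comment above.

```python
from lxml import etree

# Hypothetical markup: the second movie has no release-time <p>
html = """
<dl>
  <dd><i>1</i><div class="info">
    <p class="name"><a>Movie A</a></p>
    <p class="star">Starring: X</p>
    <p class="releasetime">1993-01-01</p>
  </div></dd>
  <dd><i>2</i><div class="info">
    <p class="name"><a>Movie B</a></p>
    <p class="star">Starring: Y</p>
  </div></dd>
</dl>
"""


def first_or_default(results, default=''):
    """Hypothetical helper: first XPath result, or `default` if nothing matched."""
    return results[0] if results else default


def parse_movies(html_text):
    selector = etree.HTML(html_text)
    movies = []
    # Parent node: one <dd> element per movie
    for dd in selector.xpath('//dl/dd'):
        # Child nodes: XPaths relative to the current <dd> (no leading //)
        movies.append({
            'number': first_or_default(dd.xpath('i/text()')),
            'title': first_or_default(dd.xpath('div/p[@class="name"]/a/text()')),
            'star': first_or_default(dd.xpath('div/p[@class="star"]/text()')),
            'release_time': first_or_default(dd.xpath('div/p[@class="releasetime"]/text()')),
        })
    return movies


for movie in parse_movies(html):
    print(movie)
```

The relative paths have no leading `//`. A path like `//p` would search the whole document again from the root and return every movie's `<p>`, not just the current one's.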
