Python web scraping: extracting data with XPath and saving it to a database

2021-10-01 09:49:30   Words: 1756   Views: 3819

Reference

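Install the lxml library (pip install lxml). The script below also uses requests and pymysql, which are installed the same way.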
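A quick way to check that lxml works, and to see the XPath patterns the script below relies on, is to parse a small HTML string with etree.HTML and query it. The markup, SAMPLE, parse_items and first_or_default are all hypothetical, made up for illustration; they only mimic the shape of the real page. Note that relative paths starting with ./ are evaluated from the current element, and XPath indices such as div[1] count from 1, not 0.

```python
from lxml import etree

# Hypothetical markup standing in for one list page of the real site
SAMPLE = """
<html><body>
  <ul>
    <li><a href="/v/1"><img _src="cover1.jpg"/><span>03:15</span></a>
        <p>First video</p></li>
    <li><a href="/v/2"><img _src="cover2.jpg"/></a>
        <p>Second video</p></li>
  </ul>
</body></html>
"""


def first_or_default(nodes, default):
    """Return the first XPath result, or default when nothing matched."""
    return nodes[0] if nodes else default


def parse_items(html_text):
    # etree.HTML (upper case) parses an HTML string and returns the root element
    root = etree.HTML(html_text)
    items = []
    # Absolute path: select every <li> under the <ul>
    for li in root.xpath("/html/body/ul/li"):
        items.append({
            # Relative paths (./...) are evaluated from the current <li>
            "title": li.xpath("./p/text()")[0],
            # @name selects an attribute value
            "cover": li.xpath("./a/img/@_src")[0],
            # Same fallback pattern the script uses for optional fields
            "duration": first_or_default(li.xpath("./a/span/text()"), "No info"),
        })
    return items


if __name__ == "__main__":
    for item in parse_items(SAMPLE):
        print(item)
```

Indexing with [0] on a required field raises IndexError when the page layout changes, which is why the optional fields fall back to a default instead.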

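import pymysql
import requests
from lxml import etree


def get_movies(page):
    # List-page URL. The address was elided in the original; it must contain
    # a %d placeholder that receives the page number.
    url = "" % page

    # Fetch the page content
    response = requests.get(url)
    html_content = response.text

    # Parse the HTML so it can be queried with XPath (note: etree.HTML, upper case)
    html = etree.HTML(html_content)

    # Extract every <li> in the list according to the page layout
    movies = html.xpath("/html/body/div[8]/div[2]/ul/li")

    # Save to the database.
    # The connection parameters were elided in the original; fill in your own
    # host, user, password, database, charset, etc.
    dbparams = {}
    # ** unpacks the dict into keyword arguments
    conn = pymysql.connect(**dbparams)
    # Get a cursor
    cursor = conn.cursor()

    for movie in movies:
        title = movie.xpath("./div/div[1]/a/p/text()")[0]
        cover_image = movie.xpath("./a/img/@_src")[0]

        # Duration is optional on the page; fall back to a default
        durations = movie.xpath("./a/span/text()")
        if durations:
            duration = durations[0]
        else:
            duration = "No info"

        publish_time = movie.xpath("./a/div[2]/p/text()")[0]
        cate = movie.xpath("./div/div[1]/div[1]/span[1]/text()")[0]
        play_num = movie.xpath("./div/div[1]/div[2]/span[1]/text()")[0]
        like_num = movie.xpath("./div/div[1]/div[2]/span[2]/text()")[0]

        # Description is optional as well
        descriptions = movie.xpath("./a/div[2]/div/text()")
        if descriptions:
            description = descriptions[0]
        else:
            description = "No description"

        print(title, cover_image, duration, description, publish_time, cate, play_num, like_num)

        # Execute the SQL. pymysql connections default to autocommit=False,
        # so the row is not saved until conn.commit() is called.
        # The INSERT statement itself was elided in the original; only its
        # argument list survives:
        # ... % (cover_image, duration, description, publish_time, title, cate, play_num, like_num))
        # Commit
        # conn.commit()

    # Release the cursor and connection opened for this page
    cursor.close()
    conn.close()


if __name__ == '__main__':
    for i in range(2, 10):
        get_movies(i)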
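The INSERT statement and table were elided in the original, leaving only the argument tuple and a commented-out conn.commit(). Below is a minimal sketch of that step under stated assumptions: to stay runnable without a MySQL server it uses Python's built-in sqlite3 as a stand-in, and the table name movies and its columns are hypothetical. The cursor.execute / conn.commit pattern is the same DB-API flow pymysql follows, but pymysql uses %s placeholders where sqlite3 uses ?. Passing values as parameters, instead of formatting them into the SQL string with % as the surviving fragment does, avoids quoting bugs and SQL injection.

```python
import sqlite3

# Hypothetical table; the original's table and column names were elided.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS movies (
    title TEXT, cover_image TEXT, duration TEXT, description TEXT,
    publish_time TEXT, cate TEXT, play_num TEXT, like_num TEXT
)
"""

# sqlite3 uses ? placeholders; with pymysql write %s instead.
INSERT_SQL = (
    "INSERT INTO movies (title, cover_image, duration, description, "
    "publish_time, cate, play_num, like_num) VALUES (?, ?, ?, ?, ?, ?, ?, ?)"
)


def save_movies(conn, rows):
    """Insert scraped rows (tuples in column order) and commit once."""
    cursor = conn.cursor()
    try:
        for row in rows:
            # Runs inside an open transaction; not persisted yet
            cursor.execute(INSERT_SQL, row)
        # Nothing is saved until commit()
        conn.commit()
    except Exception:
        # Undo the partial batch if any insert fails
        conn.rollback()
        raise
    finally:
        cursor.close()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(CREATE_SQL)
    save_movies(conn, [
        ("Sample title", "cover.jpg", "03:15", "No description",
         "2021-10-01", "demo", "100", "5"),
    ])
    print(conn.execute("SELECT COUNT(*) FROM movies").fetchone()[0])
    conn.close()
```

Committing once per page, rather than after every row, keeps a page's rows together: either all of them are stored or, on error, none are.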
