Scraping Tencent Hot Topics with a Crawler

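1. Understand how AJAX loading works

2. Use Chrome's developer tools to monitor the network requests and analyze them

3. Write the crawler with selenium

4. Implementation: use selenium to scrape the featured hot topics (熱點精選), collecting at least 50 items, and save them as a CSV file in which each row has the form: index (starting from 1), title, link, ... (the first three fields are required; anything after them is up to you)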
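The row format from step 4 can be sketched with the standard library alone. This is only an illustration under assumptions: the function name `write_rows` and the sample titles and links are made up here, and the post itself builds the file with pandas instead.

```python
import csv
import io


def write_rows(items, out):
    """Write (title, url) pairs as CSV rows whose index starts at 1."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["index", "title", "url"])
    for index, (title, url) in enumerate(items, start=1):
        writer.writerow([index, title, url])


# Hypothetical sample data, used only to show the row layout.
sample = [
    ("First headline", "https://example.com/a"),
    ("Headline, with a comma", "https://example.com/b"),
]

buffer = io.StringIO()
write_rows(sample, buffer)
print(buffer.getvalue())
```

Going through `csv.writer` rather than joining strings with "," means a title that itself contains a comma is quoted and stays in one column.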

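import time

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver

# Selenium 3 style call; Selenium 4 expects a Service object instead of executable_path
driver = webdriver.Chrome(executable_path=r'd:\anaconda\scripts\chromedriver.exe')
driver.get("")  # the page loads its content via AJAX

# Scroll down in 200-pixel steps so the AJAX requests fire and more items load
for i in range(1, 100):
    time.sleep(2)
    driver.execute_script("window.scrollTo(window.scrollX, %d);" % (i * 200))

html = driver.page_source

# Parse the HTML
bsobj = BeautifulSoup(html, "lxml")
jxtits = bsobj.find_all("div")[0].find_next_sibling().find_all("li")

print("index", ",", "title", ",", "url")
csvrow_index = []
csvrow_title = []
csvrow_url = []
for i, jxtit in enumerate(jxtits):
    # Prefer the image alt text as the title; fall back to the text of the first div
    try:
        text = jxtit.find_all("img")[0]["alt"]
    except (IndexError, KeyError):
        text = jxtit.find_all("div")[0].text
    try:
        url = jxtit.find_all("a")[0]["href"]
    except (IndexError, KeyError):
        url = ""  # no link found; print the item for inspection
        print(jxtit)
    print(i + 1, ",", text, ",", url)
    # Collect the row so it ends up in the CSV file
    csvrow_index.append(i + 1)
    csvrow_title.append(text)
    csvrow_url.append(url)

# Save the collected rows to a CSV file
csv_file = pd.DataFrame()
csv_file['index'] = csvrow_index
csv_file['title'] = csvrow_title
csv_file['url'] = csvrow_url
csv_file.to_csv('csv_file.csv', index=False)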
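The script above always scrolls 99 times with a 2-second pause, whether or not the 50 required items have already loaded. A possible variation, sketched here under assumptions, stops as soon as enough items are present. The helper `scroll_until`, its parameters, and the `FakePage` stand-in are hypothetical names introduced for illustration; they are not part of the original script or of selenium.

```python
import time


def scroll_until(count_items, scroll_to, target=50, step=200,
                 max_steps=100, pause=2.0):
    """Scroll in fixed steps until count_items() reaches target.

    count_items: callable returning the number of items currently loaded.
    scroll_to:   callable taking a vertical offset in pixels.
    Returns the final item count, which may be below target if
    max_steps is exhausted first.
    """
    count = count_items()
    for i in range(1, max_steps + 1):
        if count >= target:
            break
        scroll_to(i * step)
        time.sleep(pause)  # give the AJAX request time to finish
        count = count_items()
    return count


class FakePage:
    """Stand-in for a browser page: every scroll loads 5 more items."""

    def __init__(self):
        self.items = 10

    def scroll_to(self, offset):
        self.items += 5

    def count(self):
        return self.items


page = FakePage()
total = scroll_until(page.count, page.scroll_to, target=50, pause=0)
print(total)
```

With a real driver, `scroll_to` could wrap `driver.execute_script("window.scrollTo(window.scrollX, arguments[0]);", offset)` and `count_items` could count the matching list items on the page; which elements to count depends on the page structure, which is not fully visible in this post.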

