Web crawler: fetching article titles and links

bs4: basic usage

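# -*- coding: utf-8 -*-
# @time : 2019/7/30 1:04
# @author : hakim
# @file : pq.py
import requests
from bs4 import BeautifulSoup

link = ""     # target URL (left empty in the original post; set it before running)
headers = {}  # request headers were elided in the original post
if __name__ == '__main__':
    pos = 1
    while True:
        key_dict = {}  # query parameters were elided in the original post; presumably they carry the page number `pos`
        r = requests.get(link, headers=headers, params=key_dict, timeout=1)
        # print(r.url)
        soup = BeautifulSoup(r.text, "html.parser")  # parse the response HTML with BeautifulSoup
        res = soup.find_all("a", class_="posttitle2")  # ResultSet of bs4.element.Tag
        # print(res)  # print every <a> tag whose class is "posttitle2"
        # print(type(res))
        if not res:
            break  # page numbers can grow almost without bound, so stop at the first page with no matching tags
        for tag in res:
            title = tag.text.strip()  # strip() removes leading and trailing whitespace
            print('[' + title + '](' + tag['href'] + ')')
        pos = pos + 1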
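
The stop condition in the script above (keep requesting pages until one returns no matching tags) can be tried without any network access. The following is only a sketch of that loop: `crawl_pages`, `fake_fetch`, `FAKE_SITE` and the `max_pages` safety cap are hypothetical names and data introduced for illustration, and `fake_fetch` stands in for the `requests` + `BeautifulSoup` step.

```python
from typing import Callable, Iterator, List, Tuple

Link = Tuple[str, str]  # (title, href)


def crawl_pages(fetch_page: Callable[[int], List[Link]],
                max_pages: int = 1000) -> Iterator[Link]:
    """Yield links page by page until a page comes back empty."""
    for pos in range(1, max_pages + 1):
        links = fetch_page(pos)
        if not links:  # same exit test as `if not len(res)` in the script
            return
        yield from links


# Hypothetical stand-in for a real site: two pages of made-up data,
# every later page is empty.
FAKE_SITE = {
    1: [("First post", "/p/1"), ("Second post", "/p/2")],
    2: [("Third post", "/p/3")],
}


def fake_fetch(pos: int) -> List[Link]:
    return FAKE_SITE.get(pos, [])


if __name__ == "__main__":
    for title, href in crawl_pages(fake_fetch):
        print(f"[{title}]({href})")
```

Passing the fetch step in as a function keeps the paging logic separate from the HTTP and parsing code, and the `max_pages` cap guards against a site that never returns an empty page.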

# -*- coding: utf-8 -*-
# @time : 2019/7/30 1:04
# @author : hakim
# @file : pq.py
import re

import requests
from bs4 import BeautifulSoup

link = ""     # base URL (left empty in the original post); the page number is appended to it
headers = {}  # request headers were elided in the original post
if __name__ == '__main__':
    pos = 1
    while True:
        tag_links = link + str(pos)
        r = requests.get(tag_links, headers=headers, timeout=2)
        # print(r.url)
        # print(r.text)
        soup = BeautifulSoup(r.text, "html.parser")  # parse the response HTML with BeautifulSoup
        # all <a> tags whose id contains "postslist1_rpposts_titleurl"
        res = soup.find_all("a", id=re.compile("postslist1_rpposts_titleurl"))
        # print(res)  # print every matching <a> tag
        # print(type(res))
        if not res:
            break  # page numbers can grow almost without bound, so stop at the first page with no matching tags
        for tag in res:
            title = tag.text.strip()  # strip() removes leading and trailing whitespace
            print('[' + title + '](' + tag['href'] + ')')
        pos = pos + 1
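
When a compiled regular expression is passed as an attribute filter, Beautiful Soup tests it with the pattern's `search()` method, so the filter above matches any `id` that contains the string, not only ids that start with it. The sketch below shows that behaviour with the standard `re` module alone; the id values in `candidate_ids` are made up for illustration.

```python
import re

# Same pattern as in the script above.
pattern = re.compile("postslist1_rpposts_titleurl")

# Hypothetical id values, invented for this example.
candidate_ids = [
    "postslist1_rpposts_titleurl_0",
    "ctl00_postslist1_rpposts_titleurl_7",
    "sidebar_link",
]

# search() succeeds when the pattern occurs anywhere in the string.
matching = [i for i in candidate_ids if pattern.search(i)]

# Anchoring with ^ restricts the match to ids that start with the string.
anchored = re.compile(r"^postslist1_rpposts_titleurl")
prefixed = [i for i in candidate_ids if anchored.search(i)]

print(matching)
print(prefixed)
```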

# -*- coding: utf-8 -*-
# @time : 2019/8/6 15:15
# @author : hakim
# @file : jianshu.py
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

# run Chrome without a visible window (headless)
options = webdriver.ChromeOptions()
options.add_argument('--headless')
browser = webdriver.Chrome(options=options)
browser.get("")  # target URL (left empty in the original post; set it before running)
# scroll to the bottom a few times so that lazily loaded entries appear
for i in range(3):
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)
# print(browser)
# click the "load-more" button up to 10 times
for j in range(10):
    try:
        browser.execute_script("var a = document.getElementsByClassName('load-more'); a[0].click();")
        time.sleep(2)
    except Exception:
        pass  # no "load-more" button left to click
titles = browser.find_elements(By.CLASS_NAME, "title")
with open("article_jianshu.txt", "w", encoding="utf-8") as f:
    for t in titles:
        try:
            line = '[' + t.text + '](' + t.get_attribute("href") + ')'
            print(line)
            f.write(line + "\n")
        except TypeError:
            pass  # get_attribute("href") returned None, so there is no link to write
browser.quit()
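
The `try`/`except TypeError` in the script above presumably guards against elements whose `href` attribute is missing, since concatenating `None` with a string raises `TypeError`. One alternative, sketched here under that assumption, is to check the values explicitly before building the Markdown link. `to_markdown_link` is a hypothetical helper, not part of Selenium or of the original script.

```python
from typing import Optional


def to_markdown_link(title: Optional[str], href: Optional[str]) -> Optional[str]:
    """Return "[title](href)", or None when the title or the link is missing."""
    title = (title or "").strip()
    if not title or not href:
        return None
    return f"[{title}]({href})"


if __name__ == "__main__":
    # Made-up sample values standing in for t.text and t.get_attribute("href").
    samples = [(" Hello ", "/p/1"), ("No link", None), ("   ", "/p/2")]
    for text, href in samples:
        line = to_markdown_link(text, href)
        if line is not None:
            print(line)
```

In the Selenium loop this would be called as `to_markdown_link(t.text, t.get_attribute("href"))`, skipping the entry when the result is `None`.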
