Web Scraping in Practice (2): Scraping Qiushibaike Jokes

2021-08-21 19:38:00 · 1,385 characters · 1,335 reads

The source code is as follows:

from urllib.request import Request, urlopen
import re
import time
import json

def gethtml(url):
    # Spoof a browser User-Agent so the site does not reject the request
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
    request = Request(url, headers=headers)
    response = urlopen(request)
    html = response.read().decode('utf-8')
    return html

def write_to_file(content):
    with open('duanzi.txt', 'a', encoding='utf-8') as f:
        f.write(json.dumps(content, ensure_ascii=False) + '\n')
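A quick aside on `ensure_ascii=False`: by default, `json.dumps` escapes every non-ASCII character into `\uXXXX` sequences, which makes a file of Chinese jokes unreadable. The sketch below (with a made-up joke dict) shows the difference:

```python
import json

# Hypothetical record in the same shape the scraper collects
joke = {'author': '匿名', 'content': '一個段子'}

# Default behaviour: Chinese characters become \uXXXX escapes
escaped = json.dumps(joke)

# With ensure_ascii=False the text stays readable in duanzi.txt
readable = json.dumps(joke, ensure_ascii=False)

print(escaped)
print(readable)
```

Either form round-trips through `json.loads`; the flag only affects how the file looks to a human reader.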

def gettext(pagenum=1):
    text_list = []
    for page in range(1, pagenum + 1):
        # The listing URL was lost when this post was republished;
        # fill in the target site's page URL before running
        url = '' + str(page)
        html = gethtml(url)
        time.sleep(1)  # be polite: pause between page requests
        # The HTML tags inside the original pattern were stripped by the
        # blog platform; this is a plausible reconstruction that captures
        # three fields per joke: author, joke text, and vote count
        pattern = re.compile(
            '<h2>(.*?)</h2>.*?<div class="content">.*?<span>(.*?)</span>'
            '.*?<i class="number">(.*?)</i>',
            re.S)  # re.S lets '.' also match newlines
        items = re.findall(pattern, html)
        text_list.append(items)
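The `re.S` (DOTALL) flag matters here because the fields we want span multiple lines of HTML. A minimal sketch with a made-up two-line snippet shows what happens with and without it:

```python
import re

# Hypothetical HTML fragment where the content spans several lines
html = '<h2>\nAlice\n</h2>\n<span>Hello\nworld</span>'

# Without re.S, '.' stops at newlines, so nothing matches
plain = re.findall('<h2>(.*?)</h2>.*?<span>(.*?)</span>', html)

# With re.S, '.' also matches '\n', so the multi-line block is captured
dotall = re.findall('<h2>(.*?)</h2>.*?<span>(.*?)</span>', html, re.S)

print(plain)   # []
print(dotall)  # [('\nAlice\n', 'Hello\nworld')]
```

With multiple capture groups, `re.findall` returns a list of tuples, one tuple per joke — which is why the printing loop below iterates over the fields of each item.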

    for each_items in text_list:  # iterate over the jokes of every page
        for item in each_items:
            count = 0
            for i in item:  # clean up the text for easier reading
                i = i.strip('\n')  # drop stray '\n' so blank lines do not pile up
                # <br/> is HTML's line-break tag; replace it with '\n'
                # to keep the original paragraph layout in plain text
                i = i.replace('<br/>', '\n')
                print(i)
                count += 1
                if count % 3 == 0:  # one joke = 3 fields; separate jokes
                    print('----' * 20)
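The two-step cleanup above can be seen in isolation. With a made-up scraped string, `strip('\n')` removes the leading/trailing newlines the HTML source leaves behind, and the `<br/>` replacement restores the paragraph breaks:

```python
# Hypothetical raw field as it comes out of the regex capture
raw = '\n第一段<br/>第二段\n'

cleaned = raw.strip('\n').replace('<br/>', '\n')
print(cleaned)  # 第一段
                # 第二段
```

Note that `strip('\n')` only trims the ends of the string; newlines in the middle (including the ones we just inserted for `<br/>`) are preserved.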

if __name__ == '__main__':
    try:
        num = int(input('How many pages would you like to scrape? '))
        gettext(num)
    except Exception as e:
        print('Sorry, something went wrong:', e)
