29 Binge-Watching Fanatic

2022-05-30 19:06:14

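Exercise overview

Requirements: Use multiple coroutines and a queue to scrape the Mtime (時光網) Top 100 TV series data (title, director, lead actors, and synopsis), and save the data with the csv module.

Goals: 1. Practice using gevent.

2. Practice using queue.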
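As a warm-up for the two goals above, here is a minimal sketch of the pattern the exercise asks for: several gevent coroutines draining one shared `Queue`. The names `run_workers` and `worker` are hypothetical, and `gevent.sleep(0)` is only a stand-in for a network request, so nothing here touches the real site.

```python
import gevent
from gevent.queue import Queue


def run_workers(items, worker_count=3):
    """Drain `items` with `worker_count` coroutines; return (worker_id, item) pairs."""
    work = Queue()
    for item in items:
        work.put_nowait(item)

    results = []

    def worker(worker_id):
        # gevent only switches coroutines at blocking calls, so no other
        # coroutine can run between empty() and get_nowait().
        while not work.empty():
            item = work.get_nowait()
            gevent.sleep(0)  # stand-in for a blocking request; yields to other coroutines
            results.append((worker_id, item))

    tasks = [gevent.spawn(worker, worker_id) for worker_id in range(worker_count)]
    gevent.joinall(tasks)  # block until every coroutine has finished
    return results


if __name__ == '__main__':
    for worker_id, item in run_workers(['page-1', 'page-2', 'page-3', 'page-4']):
        print(worker_id, item)
```

In a real crawler the `gevent.sleep(0)` line would presumably become a `requests.get(url)` call, with `monkey.patch_all()` run first so that the request yields to the other coroutines while it waits.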

from gevent import monkey
monkey.patch_all()  # patch blocking I/O first so that requests cooperates with gevent

import csv

import gevent
import requests
from bs4 import BeautifulSoup
from gevent.queue import Queue

# The site URLs did not survive in this post; fill in the Top 100 page
# addresses before running.
url_list = ['']
for i in range(2, 11):
    url_list.append('index-{}.html'.format(i))

# Put every page URL into the shared work queue.
work = Queue()
for url in url_list:
    work.put_nowait(url)


def pachong():
    # Each coroutine keeps pulling URLs until the queue is empty.
    while not work.empty():
        url = work.get_nowait()
        res = requests.get(url)
        items = BeautifulSoup(res.text, 'html.parser').find_all('div', class_='mov_con')
        for item in items:
            title = item.find('h2').text.strip()
            director = 'null'
            actor = 'null'
            remarks = 'null'
            tag_ps = item.find_all('p')
            for tag_p in tag_ps:
                if tag_p.text[:2] == '導演':
                    director = tag_p.text[3:].strip()
                elif tag_p.text[:2] == '主演':
                    actor = tag_p.text[3:].strip().replace('\t', '')
                elif tag_p.get('class'):  # any other <p> with a class attribute is taken as the synopsis
                    remarks = tag_p.text.strip()
            with open('top100.csv', 'a', newline='', encoding='utf-8-sig') as csv_file:
                writer = csv.writer(csv_file)
                writer.writerow([title, director, actor, remarks])


# Write the header row before any coroutine starts appending data rows.
with open('top100.csv', 'w', newline='', encoding='utf-8-sig') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(['電視劇集名', '導演', '主演', '簡介'])

# Start three coroutines and wait for all of them to finish.
task_list = []
for x in range(3):
    task = gevent.spawn(pachong)
    task_list.append(task)
gevent.joinall(task_list)
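The field-extraction branch and the CSV output in the crawler can be tried without any network access. The sketch below is one way to do that, under some assumptions: the helper names `extract_fields` and `rows_to_csv` are hypothetical, each `<p>` tag is represented as a plain `(text, css_classes)` pair instead of a BeautifulSoup tag, and the sample values and the class name `mt3` are invented for illustration.

```python
import csv
import io


def extract_fields(paragraphs):
    """Return (director, actor, remarks) from (text, css_classes) pairs."""
    director = actor = remarks = 'null'
    for text, css_classes in paragraphs:
        label = text[:2]
        if label == '導演':
            director = text[3:].strip()  # skip the two-character label and the colon
        elif label == '主演':
            actor = text[3:].strip().replace('\t', '')
        elif css_classes:
            # Any other paragraph that carries a class is treated as the synopsis.
            remarks = text.strip()
    return director, actor, remarks


def rows_to_csv(rows):
    """Write the header plus `rows` to an in-memory CSV and return its text."""
    buffer = io.StringIO(newline='')
    writer = csv.writer(buffer)
    writer.writerow(['電視劇集名', '導演', '主演', '簡介'])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == '__main__':
    sample = [
        ('導演:Director A', []),
        ('主演:Actor B\tActor C', []),
        ('A short synopsis.', ['mt3']),  # hypothetical class name
    ]
    print(rows_to_csv([('Show', *extract_fields(sample))]))
```

Keeping the parsing in a pure function like this makes it easy to check the `'null'` defaults for entries that lack a director, cast list, or synopsis.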

The teacher's code