Python regex practice: scraping Maoyan Movies

2021-10-07 12:57:15

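I didn't know what project to work on, so this is just a practice run. The plan is to get comfortable with requests first, set scrapy aside for a few days, and practice regular expressions by scraping the Maoyan Movies top 100 and writing the results to a CSV file. The IP proxy pool comes tomorrow or the day after.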

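import re
import time

import requests


def gethtml(url):
    """Fetch one page and return its HTML text."""
    # The start of the cookie string and the user-agent value are missing from
    # the original post, so both stay elided here. The 'cookie' key name is assumed.
    header = {
        'cookie': '...; mojo-trace-id=2; hm_lpvt_703e94591e87be68cc8da0da7cbd0be2=1593172421; _lxsdk_s=172f079e98b-3b8-0b8-141%7c%7c3',
        'user-agent': '...',
    }
    try:
        html = requests.get(url, headers=header, timeout=30).text
        return html
    except requests.RequestException:
        # The regex in getpage finds nothing in this message, so a failed page is skipped.
        return 'An exception occurred'


def getpage(ulist, html):
    """Extract title, cast, and score from one page and append them to ulist."""
    # The HTML tags inside this pattern were stripped from the original post.
    # The tag fragments are an assumed reconstruction, based on the surviving
    # 'integer">' and 'fraction">' pieces and the four groups unpacked below.
    pattern = re.compile(
        'class="name".*?<a.*?>(.*?)</a>.*?class="star">(.*?)</p>'
        '.*?integer">(.*?)</i>.*?fraction">(.*?)</i>',
        re.S)
    results = re.findall(pattern, html)
    for result in results:
        title, author, num1, num2 = result
        author = re.sub(r'\s+', '', author)  # strip whitespace from the cast line
        number = num1 + num2  # join the integer and fraction parts of the score
        ulist.append([title, author, number])
    return ulist


def info(ulist):
    """Print every row and write it to movie.csv."""
    with open('movie.csv', 'w', encoding='utf-8-sig') as f:
        for row in ulist:
            print(row)
            res = ','.join(row)
            f.write(res + '\n')


def main():
    # The base URL is blank in the original post; the page offset
    # (0, 10, ..., 90) is appended to it.
    starturl = ''
    depth = 10
    ulist = []
    for i in range(depth):
        url = starturl + str(i * 10)
        html = gethtml(url)
        ulist = getpage(ulist, html)
        time.sleep(2)  # pause between requests
    info(ulist)


main()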
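To check the regex without hitting the site, it can be run against a small hand-written snippet. This is only a sketch: the sample markup is hypothetical (modeled on the `integer">` and `fraction">` fragments that `getpage` looks for), the pattern is an assumed reconstruction, and `parse_entries` is an illustrative helper, not part of the script above.

```python
import re

# Hypothetical markup for two board entries; the real page may differ.
SAMPLE_HTML = '''
<dd>
  <p class="name"><a href="/films/1" title="Movie A">Movie A</a></p>
  <p class="star">
      Starring: Actor One,Actor Two
  </p>
  <p class="score"><i class="integer">9.</i><i class="fraction">5</i></p>
</dd>
<dd>
  <p class="name"><a href="/films/2" title="Movie B">Movie B</a></p>
  <p class="star">
      Starring: Actor Three
  </p>
  <p class="score"><i class="integer">8.</i><i class="fraction">7</i></p>
</dd>
'''

# Assumed pattern: four lazy groups for title, cast, and the two score halves.
# re.S lets '.' match newlines, so one match can span several lines.
PATTERN = re.compile(
    r'class="name".*?<a.*?>(.*?)</a>.*?class="star">(.*?)</p>'
    r'.*?integer">(.*?)</i>.*?fraction">(.*?)</i>',
    re.S,
)


def parse_entries(html):
    """Return [title, cast, score] lists for every entry found in html."""
    rows = []
    for title, cast, num1, num2 in PATTERN.findall(html):
        cast = re.sub(r'\s+', '', cast)  # drop all whitespace, as the script does
        rows.append([title, cast, num1 + num2])
    return rows


if __name__ == '__main__':
    for row in parse_entries(SAMPLE_HTML):
        print(row)
```

Because every group is lazy (`.*?`), each match stops at the nearest closing tag instead of swallowing the rest of the page, which is what lets `findall` return one tuple per movie.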

A very simple implementation in about 40 lines of code. It hardly counts as a project, only as practice. Keep going!
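One caveat with building each line through `','.join(...)`: if a field itself contains a comma (cast lists often do), the row ends up with more columns than intended. A minimal sketch of an alternative uses the standard library's `csv` module, which quotes such fields. `write_rows` and the sample row are hypothetical names for illustration.

```python
import csv
import io


def write_rows(rows, fileobj):
    """Write rows as CSV, quoting any field that contains a comma."""
    writer = csv.writer(fileobj)
    writer.writerows(rows)


if __name__ == '__main__':
    sample = [['Movie A', 'Starring:ActorOne,ActorTwo', '9.5']]
    buffer = io.StringIO()
    write_rows(sample, buffer)
    print(buffer.getvalue())
    # Reading the text back yields the original three columns.
    print(list(csv.reader(io.StringIO(buffer.getvalue()))))
```

When writing to a real file instead of a `StringIO`, the `csv` documentation recommends opening it with `newline=''`, for example `open('movie.csv', 'w', encoding='utf-8-sig', newline='')`.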
