Python crawler: the top 100 highly rated movies on Douban

2021-10-03 07:29:30

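Processing the JSON

First, jsonpath is used to extract the movie titles.

Because the author is a beginner and has not yet worked out how to get the title and the rating in a single query, jsonpath.jsonpath is called twice.

The two resulting lists are then interleaved; chain is used for this.

A small loop then inserts separators and line breaks into the merged list.

Finally, the result is saved to a local file.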
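The interleaving and separator steps can be tried offline before any request is made. The sketch below uses made-up titles and ratings in place of the two jsonpath results, and a hypothetical helper, insert_every, that mirrors the script's insert loops (an index counter walking a list that grows as tokens are inserted). It is an illustration of the idea, not the script itself.

```python
from itertools import chain

# Made-up sample values standing in for the two jsonpath results.
name = ["Movie A", "Movie B", "Movie C"]
rate = ["9.7", "9.6", "9.5"]

# Interleave the two lists: [title, rating, title, rating, ...]
want = list(chain.from_iterable(zip(name, rate)))


def insert_every(items, step, offset, token):
    """Hypothetical helper: return a copy of items with token inserted at
    every index i of the growing list where i % step == offset."""
    out = list(items)
    i = 0
    while i < len(out):
        if i % step == offset:
            out.insert(i, token)
        i += 1
    return out


# ":" lands at indexes 1, 4, 7, ... (between each title and its rating).
with_colons = insert_every(want, 3, 1, ":")
# "\n" lands at indexes 0, 4, 8, ... (in front of each group).
with_breaks = insert_every(with_colons, 4, 0, "\n")

text = " ".join(with_breaks)
print(repr(text))
```

Because the pieces are joined with a space, every line break ends up surrounded by spaces and the text starts with a newline.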

import json
from itertools import chain

import jsonpath
import requests

url = ""  # the URL is left blank in the original post
headers = {}  # the header values are omitted in the original post

r = requests.get(url=url, headers=headers)
# print(r.content.decode())

# Parse the response once and reuse the result below.
data = r.json()

# Dump the parsed object rather than the raw text, so that douban.json
# holds indented JSON instead of a single escaped string.
ret = json.dumps(data, ensure_ascii=False, indent=4)
with open("douban.json", "w", encoding="utf-8") as f:
    f.write(ret)

# with open("douban.json", "r", encoding="utf-8") as f:
#     ret4 = json.load(f)
#     print(ret4)
#     print(type(ret4))

# res = data['subjects'][0]['title']
# print(res)
# print(type(res))
# print(data)
# print(type(data))

# Two separate queries: one for the titles, one for the ratings.
name = jsonpath.jsonpath(data, '$..title')
rate = jsonpath.jsonpath(data, '$..rate')
# print(name)
# print(rate)
# print(type(name))

# Interleave the two lists: [title, rate, title, rate, ...]
want = list(chain.from_iterable(zip(name, rate)))
# print(want)

# Insert ":" between each title and its rating. The list grows while it is
# being iterated, so the positions filled are indexes 1, 4, 7, ...
count1 = 0
for w in want:
    if count1 % 3 == 1:
        want.insert(count1, ":")
    count1 += 1

# Insert "\n" in front of each "title : rate" group, at indexes 0, 4, 8, ...
count2 = 0
for w in want:
    if count2 % 4 == 0:
        want.insert(count2, "\n")
    count2 += 1

print(want)
str1 = " ".join(want)
print(str1)

with open("want.txt", "w", encoding="utf-8") as f:
    f.write(str1)
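Since the post mentions not yet knowing how to fetch the title and the rating together, one possible single-pass alternative is sketched here. It assumes that each item of the subjects list carries both a title and a rate key, which the script suggests but does not confirm. The sample data is made up and format_movies is a hypothetical name.

```python
# Made-up sample shaped like the response the script appears to read:
# a "subjects" list whose items hold "title" and "rate" (an assumption).
sample = {
    "subjects": [
        {"title": "Movie A", "rate": "9.7"},
        {"title": "Movie B", "rate": "9.6"},
    ]
}


def format_movies(data):
    """Build one 'title : rate' line per movie in a single pass."""
    return "\n".join(
        f"{movie['title']} : {movie['rate']}" for movie in data["subjects"]
    )


text = format_movies(sample)
print(text)
```

Reading both fields from the same item keeps each title paired with its own rating even if one query were to return fewer matches than the other, and building the lines directly avoids inserting into a list while looping over it. The resulting string could be written to want.txt in the same way as str1.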
