Handling web files with Python

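I have been following an online course recently. Here is a summary of my notes:

Steps for fetching a web page with Python and extracting data from it: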

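1. Download the web page, which can be done with the requests library

2. Parse the page markup and extract the data from it, which can be done with the beautifulsoup4 library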

Installation:

pip install requests
pip install beautifulsoup4

Example:

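import requests
res = requests.get("")  # put the address of the page to download here
res.raise_for_status()  # raise an exception if the request failed
file = open("c:\\users\\gyf\\desktop\\doc\\weather.txt", 'wb')
for line in res.iter_content():  # read the page content as bytes and write it to the file
    file.write(line)
file.close()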

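Note: why does line 4 open the file with 'b'? Because res.iter_content() yields raw bytes rather than text, so the file has to be opened in binary mode to write them.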
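To see why the mode matters, here is a minimal sketch that uses only the standard library. The bytes literal stands in for one chunk coming from res.iter_content(), and the file name demo.html is made up for illustration.

```python
import os
import tempfile

# Stand-in for bytes downloaded from a web page (assumption, not real data).
chunk = b"<html><head><title>demo</title></head></html>"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.html")  # hypothetical file name

    # Text mode ('w') only accepts str, so writing bytes raises TypeError.
    try:
        with open(path, "w") as f:
            f.write(chunk)
        text_mode_error = None
    except TypeError as exc:
        text_mode_error = type(exc).__name__

    # Binary mode ('wb') accepts the bytes unchanged.
    with open(path, "wb") as f:
        f.write(chunk)

    # Read the file back in binary mode to check nothing was altered.
    with open(path, "rb") as f:
        round_trip = f.read()

print(text_mode_error)
print(round_trip == chunk)
```

Opening with 'w' fails on the first write, while 'wb' stores exactly the bytes that were received.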

Next, let's read the content of a tag, taking the head tag as an example:

import requests
res = requests.get("")
res.raise_for_status()
file = open("c:\\users\\gyf\\desktop\\doc\\weather.txt", 'wb')
for line in res.iter_content():
    file.write(line)
file.close()
from bs4 import BeautifulSoup
res = open("c:\\users\\gyf\\desktop\\doc\\weather.txt", 'r', encoding="utf-8").read()  # res is the page content as a string
soup = BeautifulSoup(res, features="html.parser")  # soup is a BeautifulSoup object
head = soup.find('head')  # first head tag, or None if the page has none
print(head)

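(To make the code easy to tweak while learning, and mostly because I was too lazy to create a new source file, I did not follow PEP 8 here.) The second-to-last line uses the find method to grab the specified tag.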

Result:

Running it prints the entire head tag, including everything nested inside it.
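The same behaviour can be checked without downloading anything. This is a small sketch on a made-up HTML string (the page content is an assumption, not the weather page from the example):

```python
from bs4 import BeautifulSoup

# Made-up page used only for illustration.
html = "<html><head><title>Weather</title></head><body><p>Sunny</p></body></html>"

soup = BeautifulSoup(html, features="html.parser")
head = soup.find("head")  # the first <head> tag, or None if there is none

print(head)               # the whole tag, children included
print(head.title.string)  # the text inside the nested <title>
```

Printing head shows the tag together with the title nested inside it, which matches what the file-based example does on a real page.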

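An HTML file usually contains many tags with the same name. find only returns the first one it finds (as a Tag object that prints as its HTML string, or None if nothing matches). To get all of them, use find_all(), which returns every match in a list.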
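A short sketch of the difference, again on made-up HTML (the list items are invented for illustration):

```python
from bs4 import BeautifulSoup

# Made-up snippet with a repeated tag.
html = """
<ul>
  <li>Monday</li>
  <li>Tuesday</li>
  <li>Wednesday</li>
</ul>
"""

soup = BeautifulSoup(html, features="html.parser")

first_item = soup.find("li")     # a single Tag: the first match only
all_items = soup.find_all("li")  # a list-like ResultSet with every match
missing = soup.find("table")     # no <table> here, so this is None

print(first_item.get_text())
print([li.get_text() for li in all_items])
print(missing)
```

Because find can return None, it is worth checking the result before calling methods on it when the tag may be absent.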

In a link such as <a href="...">, a is the tag name and href is an attribute of the a tag. So how do we get the web address?

import requests
res = requests.get("")
res.raise_for_status()
file = open("c:\\users\\gyf\\desktop\\doc\\weather.txt", 'wb')
for line in res.iter_content():
    file.write(line)
file.close()
from bs4 import BeautifulSoup
res = open("c:\\users\\gyf\\desktop\\doc\\weather.txt", 'r', encoding="utf-8").read()  # res is the page content as a string
soup = BeautifulSoup(res, features="html.parser")  # soup is a BeautifulSoup object
resultlist = soup.find_all('a')  # every a tag in the page
geturls = []  # will hold the addresses
for i in range(len(resultlist)):
    geturls.append(resultlist[i].attrs['href'])  # KeyError if an a tag has no href
print(geturls)

In the snippet above, resultlist holds all the a tags, and the empty list geturls collects the addresses we want. As you can see, resultlist[i].attrs['href'] is what gets us each address.
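One thing to watch: indexing attrs['href'] raises KeyError when an a tag has no href attribute. Here is a hedged variant that skips such tags, using made-up HTML and the Tag.get method (the links and the variable names urls and urls_filtered are my own, for illustration only):

```python
from bs4 import BeautifulSoup

# Made-up links; the second <a> deliberately has no href.
html = """
<a href="https://example.com/today">Today</a>
<a name="top">Anchor without href</a>
<a href="/forecast">Forecast</a>
"""

soup = BeautifulSoup(html, features="html.parser")

urls = []
for tag in soup.find_all("a"):
    href = tag.get("href")  # returns None instead of raising when href is missing
    if href is not None:
        urls.append(href)

# Alternative: let find_all keep only the tags that have an href attribute.
urls_filtered = [tag["href"] for tag in soup.find_all("a", href=True)]

print(urls)
print(urls_filtered)
```

Both approaches collect the same two addresses here. Note that relative links such as /forecast come back exactly as written in the page.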

To be continued.
