Getting the href value of every a tag on a page with Python

# -*- coding: utf-8 -*-
# Python 3 (urllib.request does not exist in Python 2.7)
# Tag operations
from bs4 import BeautifulSoup
import urllib.request
import re
# If the page comes from a URL, it can be read as follows
#html_doc = ""
#req = urllib.request.Request(html_doc)
#webpage = urllib.request.urlopen(req)
#html = webpage.read()
html="""
the dormouse's story
once upon a time there were three little sisters; and their names were
,lacie and
tillie;
lacie
and they lived at the bottom of a well.
..."""
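soup = BeautifulSoup(html, 'html.parser')  # document object
# soup.a looks up an a tag, but returns only the first one
#print(soup.a)
for k in soup.find_all('a'):
    print(k)
    print(k['class'])  # the a tag's class attribute (returned as a list)
    print(k['id'])  # the a tag's id value
    print(k['href'])  # the a tag's href value
    print(k.string)  # the a tag's string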
If the a tag contains other tags (for example ..), k.string can return None, so use k.get_text() to extract the text inside it.
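The difference between k.string and k.get_text() can be sketched as follows. The markup here is hypothetical and made up purely for illustration (it is not the page's original sample): the second link wraps part of its text in an em tag.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only: the second link
# contains a nested <em> tag followed by more text.
html = """
<p>
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2"><em>Lacie</em> Smith</a>
</p>
"""

soup = BeautifulSoup(html, 'html.parser')
links = soup.find_all('a')

plain_string = links[0].string     # one text child, so the text itself is returned
nested_string = links[1].string    # two children (<em> and text), so this is None
nested_text = links[1].get_text()  # joins all text found inside the tag

print(plain_string)
print(nested_string)
print(nested_text)
```

In this sketch get_text() still returns the full link text for the nested case, which is why it is the safer choice when the structure of the a tag is not known in advance.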

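The following pattern usually works as well. It uses get(), which returns None instead of raising an error when an a tag has no href attribute.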

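from urllib.request import urlopen
from bs4 import BeautifulSoup

# url is the address of the page to fetch
html = urlopen(url)
soup = BeautifulSoup(html, 'html.parser')
t1 = soup.find_all('a')  # every a tag on the page
print(t1)
href_list = []
for t2 in t1:
    t3 = t2.get('href')  # None if the tag has no href
    href_list.append(t3)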
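As a minimal sketch of the get() pattern above, the loop can be wrapped in a helper that skips a tags without an href. The function name collect_hrefs, the base_url parameter, and the sample markup are all assumptions introduced for illustration; resolving relative links with urljoin is an optional extra that the original does not do.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup


def collect_hrefs(html, base_url=None):
    """Return the href of every a tag that has one (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    hrefs = []
    for tag in soup.find_all('a'):
        href = tag.get('href')  # None when the attribute is missing
        if href is None:
            continue
        if base_url is not None:
            # Optional: turn relative links into absolute ones
            href = urljoin(base_url, href)
        hrefs.append(href)
    return hrefs


# Made-up sample: one relative link, one anchor without href, one absolute link
sample = (
    '<a href="/about">About</a>'
    '<a name="top">no href here</a>'
    '<a href="http://example.com/x">X</a>'
)
result = collect_hrefs(sample, base_url='http://example.com/')
print(result)
```

Working on an HTML string keeps the helper independent of how the page was fetched, so the output of urlopen(url).read() can be passed in directly.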
