Python Data Extraction: PyQuery

1.1 Introduction

If you are familiar with CSS selectors and jQuery, there is another parsing library that may suit you: pyquery.

Official website

1.2 Installation

pip install pyquery

1.3 Usage

1.3.1 Initialization

from pyquery import PyQuery as pq
html = '<div><p>hello</p></div>'  # any HTML string
doc = pq(html)  # initialize from a string
print(doc('p'))  # select by tag name

from pyquery import PyQuery as pq
doc = pq(url='')  # initialize from a URL; the page is fetched and parsed
print(doc('title'))

from pyquery import PyQuery as pq
doc = pq(filename='demo.html')  # initialize from a local HTML file
print(doc('title'))

1.3.2 Selecting nodes

from pyquery import PyQuery as pq
doc = pq(filename='demo.html')
doc('#main #top')  # the element with id "top" nested inside the element with id "main"

from pyquery import PyQuery as pq
doc = pq(filename='demo.html')
doc('#main #top').children()  # the direct child elements of #top

Getting sibling nodes

1.3.3 Getting attributes

from pyquery import PyQuery as pq
doc = pq(filename='demo.html')
a = doc('#main #top')
print(a[0].attrib['href'])  # through the underlying lxml element
print(a.attr('href'))  # through the jQuery-style attr() method

1.3.4 Getting content

from pyquery import PyQuery as pq
doc = pq(filename='demo.html')
div = doc('#main #top')
print(div.html())  # inner HTML of the first match
print(div.text())  # text content with tags stripped

1.3.5 Examples

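from pyquery import PyQuery as pq
# 1. Load an HTML string, an HTML file, or a URL:
d = pq("<html><title>hello</title></html>")
d = pq(filename=path_to_html_file)  # path_to_html_file: path to a local HTML file
d = pq(url='')  # note: the URL apparently has to be written out in full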
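# 2. html() and text() return the corresponding HTML block or text block:
p = pq("<head><title>hello</title></head>")
p('head').html()  # returns <title>hello</title>
p('head').text()  # returns hello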
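# 3. Select elements by HTML tag name:
d = pq('<div><p>test 1</p><p>test 2</p></div>')
d('p')  # returns [<p>, <p>]
print(d('p'))  # prints <p>test 1</p><p>test 2</p>
print(d('p').html())  # prints test 1
# Note: when more than one element matches, html() returns only the content of the first one.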
# 4. eq(index) gets the element at the given index. Continuing the example above, to get the content of the second <p>:
print(d('p').eq(1).html())  # prints test 2
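# 5. filter() narrows a selection by class or id
# (ids and classes start with a letter so the CSS selectors stay valid):
d = pq("<div><p id='p1'>test 1</p><p class='p2'>test 2</p></div>")
d('p').filter('#p1')  # returns [<p#p1>]
d('p').filter('.p2')  # returns [<p.p2>]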
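# 6. find() searches nested (descendant) elements:
d = pq("<div><p id='p1'>test 1</p><p class='p2'>test 2</p></div>")
d('div').find('p')  # returns [<p#p1>, <p.p2>]
d('div').find('p').eq(0)  # returns [<p#p1>]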
# 7. Select elements directly by class or id:
d = pq("<div><p id='p1'>test 1</p><p class='p2'>test 2</p></div>")
d('#p1').html()  # returns test 1
d('.p2').html()  # returns test 2
# 8. Read attribute values:
d = pq("<p id='my_id'><a href=''>hello</a></p>")
d('a').attr('href')  # returns the value of href
d('p').attr('id')  # returns my_id
# 9. Modify an attribute value:
d('a').attr('href', '')  # changes the href attribute to the baidu URL
# 10. addClass(value) adds a class to the element:
d = pq('<div></div>')
d.addClass('my_class')  # returns [<div.my_class>]
# 11. hasClass(name) checks whether the element has the given class:
d = pq("<div class='my_class'></div>")
d.hasClass('my_class')  # returns True
# 12. children(selector=None) gets the child elements:
d = pq("<span><p id='p1'>hello</p><p id='p2'>world</p></span>")
d.children()  # returns [<p#p1>, <p#p2>]
d.children('#p2')  # returns [<p#p2>]
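# 13. parents(selector=None) gets the ancestor (parent) elements:
d = pq("<span><p id='p1'>hello</p><p id='p2'>world</p></span>")
d('p').parents()  # returns [<span>]
d('#p1').parents('span')  # returns [<span>]
d('#p1').parents('p')  # returns []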
# 14. clone() returns a copy of a node
# 15. empty() removes a node's contents
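# 16. nextAll(selector=None) returns all following sibling elements:
d = pq("<p id='p1'>hello</p><p id='p2'>world</p><img />")
d('p:first').nextAll()  # returns [<p#p2>, <img>]
d('p:last').nextAll()  # returns [<img>]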
# 17. not_(selector) returns the elements that do NOT match the selector:
d = pq("<p id='p1'>test 1</p><p id='p2'>test 2</p>")
d('p').not_('#p2')  # returns [<p#p1>]
