Developing a Simple Web Crawler in Python: Introduction to Crawlers (Part 1)

This post is based on the 慕課網 (imooc) course "python開發簡單爬蟲" (Developing a Simple Web Crawler in Python).

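The main scenarios a crawler has to handle:

- Static pages that do not require login

- Content loaded asynchronously with Ajax

- Pages that can only be accessed after the user logs in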

The rest of this post covers only the first case: static pages that do not require login.
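For a static page, downloading comes down to one HTTP GET followed by decoding the bytes. The sketch below uses Python 3's standard-library urllib.request. It makes some assumptions: the helper name fetch_static_page and the User-Agent string are illustrative. It also has no retry, cookie, or JavaScript handling, since none of those are needed for the static case.

```python
import urllib.request

# Illustrative User-Agent (assumption); some sites refuse urllib's default one
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (simple-crawler demo)"}


def fetch_static_page(url, timeout=10):
    """Hypothetical helper: download a page with a single GET and return text."""
    request = urllib.request.Request(url, headers=DEFAULT_HEADERS)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read()  # raw bytes from the server
        # Use the charset declared in Content-Type, falling back to UTF-8
        charset = response.headers.get_content_charset() or "utf-8"
    return body.decode(charset, errors="replace")


if __name__ == "__main__":
    # A data: URL keeps the demo offline; pass a real http(s) URL to crawl
    print(fetch_static_page("data:text/html;charset=utf-8,<title>demo</title>"))
```

Reading the charset from the response headers, instead of hard-coding UTF-8, keeps the decode correct for pages that declare another encoding.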

Three implementation approaches:

Because `class` is already a reserved keyword in Python, write `class_` instead when searching by the class attribute.
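Why the trailing underscore is needed: `class` cannot be used as a keyword argument, so a filter API has to accept `class_` and map it back to `class`. The sketch below shows that mapping in a hypothetical, standard-library-only mini `find_all` built on html.parser. It is not Beautiful Soup's actual code. It only compares attribute values and ignores text content.

```python
import keyword
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Record every start tag as (tag name, {attribute: value})."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


def _attr_matches(attrs, key, value):
    actual = attrs.get(key)
    if actual is None:
        return False
    if key == "class":
        # class holds space-separated names; matching any one of them counts
        return value in actual.split()
    return actual == value


def find_all(html, name, **filters):
    """Hypothetical mini find_all: return attribute dicts of matching <name> tags."""
    wanted = {}
    for key, value in filters.items():
        # `class` is a reserved word, so callers pass class_; strip the
        # trailing underscore whenever the remainder is a Python keyword
        if key.endswith("_") and keyword.iskeyword(key[:-1]):
            key = key[:-1]
        wanted[key] = value

    collector = StartTagCollector()
    collector.feed(html)
    return [
        attrs
        for tag, attrs in collector.tags
        if tag == name and all(_attr_matches(attrs, k, v) for k, v in wanted.items())
    ]


doc = '<p class="title story">A</p><p class="story">B</p>'
print(find_all(doc, "p", class_="title"))  # [{'class': 'title story'}]
# find_all(doc, "p", class="title") would be rejected with a SyntaxError
```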

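```python
# -*- coding: utf-8 -*-
import re

from bs4 import BeautifulSoup

# Sample page: the "three sisters" example from the Beautiful Soup documentation
html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""


def run_demo():
    # from_encoding only applies to bytes input, so it is omitted for this str
    soup = BeautifulSoup(html_doc, 'html.parser')

    # Every <a> tag: tag name, href attribute, link text
    for link in soup.find_all('a'):
        print(link.name, link['href'], link.get_text())

    print("Get Lacie's link")
    link_node = soup.find('a', href='http://example.com/lacie')
    print(link_node.name, link_node['href'], link_node.get_text())

    print('Regex match')
    # First <a> whose href contains "ill"
    link_node = soup.find('a', href=re.compile(r'ill'))
    print(link_node.name, link_node['href'], link_node.get_text())
    # prints: a http://example.com/tillie Tillie

    print('Get the <p> paragraph text')
    p_node = soup.find('p', class_='title')
    print(p_node.name, p_node.get_text())


if __name__ == '__main__':
    run_demo()
```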
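The link-listing loop above depends on the bs4 package. Where it is not installed, a rough equivalent can be built on the standard library's html.parser. This swaps in a different technique from the course's approach. The `LinkExtractor` class is hypothetical, and it only records `<a>` tags that carry an href.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, text) for each <a href=...>, roughly like
    soup.find_all('a') followed by link['href'] and link.get_text()."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


extractor = LinkExtractor()
extractor.feed('<p><a href="http://example.com/elsie">Elsie</a>, '
               '<a href="http://example.com/lacie">Lacie</a></p>')
print(extractor.links)
# [('http://example.com/elsie', 'Elsie'), ('http://example.com/lacie', 'Lacie')]
```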
