Python Sina Weibo Crawler: Simulated Login

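At the moment, the procedure I have personally verified to work is: (1) send a prelogin request with the GET method to obtain the servertime, nonce, pubkey, and rsakv values that login requires; (2) Base64-encode the username and encrypt the password with the RSA algorithm; (3) log in.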

Step 1:

The response has the following format (I added the line breaks myself):

sinassocontroller.prelogincallback()

The Python code is:

prelogin_url = '' % username  # prelogin URL omitted
response = urllib2.urlopen(prelogin_url)
p = re.compile(r'\((.*?)\)')  # extracts the JSON content inside the parentheses
strurl = p.search(response.read()).group(1)
dic = dict(eval(strurl))  # the response is in JSON format
pubkey = str(dic.get('pubkey'))
servertime = str(dic.get('servertime'))
nonce = str(dic.get('nonce'))
rsakv = str(dic.get('rsakv'))
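As a possible variant of the parsing step above, the following Python 3 sketch reads the callback payload with json.loads rather than eval, which avoids executing text received from the network. The sample response string and its values are invented for illustration, and the names parse_prelogin and CALLBACK_RE are hypothetical; only the four field names come from the text above.

```python
import json
import re

# Captures everything between the first "(" and the last ")" of the callback call.
CALLBACK_RE = re.compile(r'\((.*)\)', re.S)


def parse_prelogin(body):
    """Extract the login parameters from a callback-wrapped JSON response."""
    match = CALLBACK_RE.search(body)
    if match is None:
        raise ValueError('no callback payload found')
    data = json.loads(match.group(1))
    # Convert every value to str, mirroring str(dic.get(...)) in the original code.
    return {name: str(data.get(name))
            for name in ('servertime', 'nonce', 'pubkey', 'rsakv')}


# Hypothetical sample body; the values are invented, not real server output.
sample = 'callback({"servertime": 1400000000, "nonce": "ABC123", "pubkey": "EB2A", "rsakv": "12345"})'
params = parse_prelogin(sample)
print(params['servertime'], params['nonce'])
```

The greedy pattern with re.S is a deliberate choice here: it tolerates line breaks and parentheses inside the JSON body, which the non-greedy pattern would cut short.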

Step 2:

Encode the username:

username_ = urllib.quote(username)
username = base64.encodestring(username_)[:-1]  # [:-1] drops the trailing newline that encodestring appends
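The two lines above use Python 2 APIs. A minimal Python 3 sketch of the same encoding might look like the following; the function names encode_username and decode_username are hypothetical, and the account name is a placeholder.

```python
import base64
from urllib.parse import quote, unquote


def encode_username(username):
    """URL-quote the username, then Base64-encode the result."""
    quoted = quote(username)  # for example, '@' becomes '%40'
    # b64encode appends no trailing newline, so no [:-1] slice is needed.
    return base64.b64encode(quoted.encode('ascii')).decode('ascii')


def decode_username(su):
    """Inverse of encode_username, handy for checking the result."""
    return unquote(base64.b64decode(su).decode('ascii'))


su = encode_username('user@example.com')  # placeholder account name
print(su)
print(decode_username(su))
```

Because Base64 is reversible, this step only encodes the username for transport; it does not protect it.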

Encrypt the password:

rsapublickey = int(pubkey, 16)
key = rsa.PublicKey(rsapublickey, 65537)  # create the public key
message = servertime + '\t' + nonce + '\n' + password  # plaintext layout, taken from the JS encryption file
passwd = rsa.encrypt(message, key)  # encrypt
passwd = binascii.b2a_hex(passwd)  # convert the ciphertext to hexadecimal
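For readers on Python 3, the password step could be sketched roughly as follows. This assumes the third-party rsa package (python-rsa) is installed; the function names build_plaintext and encrypt_password are hypothetical, and the values passed in the example are placeholders rather than real server output.

```python
import binascii


def build_plaintext(servertime, nonce, password):
    """Join the fields as servertime, tab, nonce, newline, password."""
    return '{}\t{}\n{}'.format(servertime, nonce, password)


def encrypt_password(pubkey_hex, servertime, nonce, password):
    """Return the hex-encoded RSA ciphertext of the joined plaintext."""
    # Third-party package; imported here so build_plaintext works without it.
    import rsa

    key = rsa.PublicKey(int(pubkey_hex, 16), 65537)
    # In Python 3, rsa.encrypt expects bytes, so the text is encoded first.
    message = build_plaintext(servertime, nonce, password).encode('utf-8')
    cipher = rsa.encrypt(message, key)
    return binascii.b2a_hex(cipher).decode('ascii')


# Placeholder values for illustration only.
print(repr(build_plaintext('1400000000', 'ABC123', 'secret')))
```

rsa.encrypt adds random padding, so encrypting the same password twice yields different ciphertexts; that is expected and does not indicate an error.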

Complete code:

...
        key = rsa.PublicKey(rsapublickey, 65537)  # create the public key
        message = self.servertime + '\t' + self.nonce + '\n' + self.password  # plaintext layout, taken from the JS encryption file
        passwd = rsa.encrypt(message, key)  # encrypt
        passwd = binascii.b2a_hex(passwd)  # convert the ciphertext to hexadecimal
        return passwd

    def __get_suser(self):
        username_ = urllib.quote(self.username)
        username = base64.encodestring(username_)[:-1]
        return username

    def __prelogin(self):
        prelogin_url = '' % self.username  # prelogin URL omitted
        response = urllib2.urlopen(prelogin_url)
        p = re.compile(r'\((.*?)\)')
        strurl = p.search(response.read()).group(1)
        dic = dict(eval(strurl))  # the response is in JSON format
        self.pubkey = str(dic.get('pubkey'))
        self.servertime = str(dic.get('servertime'))
        self.nonce = str(dic.get('nonce'))
        self.rsakv = str(dic.get('rsakv'))

    def login(self):
        url = ''  # login URL omitted
        try:
            self.__prelogin()  # prelogin
        except:
            print 'prelogin error'
            return
        global postdata
        postdata['servertime'] = self.servertime
        postdata['nonce'] = self.nonce
        postdata['su'] = self.__get_suser()
        postdata['sp'] = self.__get_spwd()
        postdata['rsakv'] = self.rsakv
        postdata = urllib.urlencode(postdata)
        headers = {}  # request headers omitted
        req = urllib2.Request(
            url=url,
            data=postdata,
            headers=headers
        )
        result = urllib2.urlopen(req)
        text = result.read()
        p = re.compile(r"location\.replace\('(.*?)'\)")
        try:
            login_url = p.search(text).group(1)
            urllib2.urlopen(login_url)
            print "login succeed!"
        except:
            print 'login error!'

if __name__ == '__main__':
    uid = 'your username'
    psw = 'your password'
    simlogin = weibologin(uid, psw)
    simlogin.login()
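In the listing, a missing redirect and a failed follow-up request both end up in the same bare except branch. One way to tell them apart is to isolate the redirect extraction, as in this minimal Python 3 sketch. The function name find_redirect and both sample pages are hypothetical; only the location.replace pattern comes from the code above.

```python
import re

# Same pattern as in the listing: the URL inside location.replace('...').
REDIRECT_RE = re.compile(r"location\.replace\('(.*?)'\)")


def find_redirect(text):
    """Return the URL passed to location.replace(), or None if absent."""
    match = REDIRECT_RE.search(text)
    return match.group(1) if match else None


# Hypothetical response bodies for illustration.
ok_page = "<script>location.replace('http://example.com/next?retcode=0');</script>"
bad_page = "<html>no redirect here</html>"
print(find_redirect(ok_page))
print(find_redirect(bad_page))
```

A caller can then treat a None result as "the login response contained no redirect" and handle network errors from the follow-up request separately.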
