Scraping nn-online data with a Python crawler

2021-09-03 02:38:16 · 3373 characters · 7908 reads

Write a crawler that scrapes the PWA93 theoretical values of each partial wave for np scattering from nn-online.org.

You need to install python3, python3-pip, selenium, and geckodriver first.

The Python crawler code is as follows:

# Wang Jianfeng  Dec 14 2018
# Python 3
# Install selenium first: pip3 install selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import NoAlertPresentException
import unittest, time, re
from urllib.request import urlopen

# Install geckodriver first
driver = webdriver.Firefox()  # the driver is Firefox

# Partial waves to fetch. The scraped copy repeated '3p2, 3p2, 3f2' in the
# J = 3 block, which looks like a copy error; following the pattern of the
# other J blocks it should read 3D3, 3F3, 3G3.
phaselist = ['1S0', '3P0',
             '1P1', '3S1', '3P1', '3D1', 'E1',
             '1D2', '3P2', '3D2', '3F2', 'E2',
             '1F3', '3D3', '3F3', '3G3', 'E3',
             '1G4', '3F4', '3G4', '3H4', 'E4',
             '1H5', '3G5', '3H5', '3I5', 'E5']

url1 = ""  # the base query URL was stripped from the scraped copy; it is the
           # nn-online.org phase-shift query URL, ending in the tmin parameter
url2 = "&tmax="
url3 = "&tint=0.01&ps="
nntype = "np_"
txt = ".txt"
tmin = 0.01
tmax = 10.00

for phase in phaselist:
    fw = open(nntype + phase + txt, "w", encoding="utf-8")
    fw.write("\b\b")  # preserved from the original; consumed by the clean-up script below
    tmin = 0.01
    tmax = 10
    # walk the energy range in 10 MeV windows up to 300 MeV
    while tmax <= 300:
        url = url1 + str(round(tmin, 2)) + url2 + str(tmax) + url3 + phase
        driver.get(url)
        html = driver.page_source
        # Extract the PWA93 table from the rendered page. The tail of this
        # regex was lost in the scraped copy; "</pre>" as the closing
        # delimiter is an assumption about the page markup.
        res = re.findall(r"PWA93(.+?)</pre>", html, flags=re.DOTALL)
        fw.write(res[0].strip())
        fw.write("\n")
        if tmax < 100:
            fw.write("\b")  # preserved from the original
        tmin = tmin + 10
        tmax = tmax + 10
    fw.close()
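The extraction step can be checked offline against a mocked page source. Note the mock HTML below is invented for illustration, and the closing </pre> delimiter is an assumption, since the real marker was lost in the scraped copy:

```python
import re

# Mocked page source; the real nn-online markup is assumed (not confirmed)
# to wrap the PWA93 table in a <pre> block. The numbers are made up.
html = "<pre>PWA93\n   10.00   59.9564\n   20.00   50.9024\n</pre>"

# Same extraction as in the crawler: capture everything between "PWA93"
# and the closing </pre>, with DOTALL so the match spans newlines.
res = re.findall(r"PWA93(.+?)</pre>", html, flags=re.DOTALL)
print(res[0].strip())  # prints the two data rows
```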

Because the last line of the data returned by nn-online is sometimes duplicated, write another Python script to re-edit the output files.

The code is as follows:

# Same partial-wave list as above, with the J = 3 block corrected
phaselist = ['1S0', '3P0',
             '1P1', '3S1', '3P1', '3D1', 'E1',
             '1D2', '3P2', '3D2', '3F2', 'E2',
             '1F3', '3D3', '3F3', '3G3', 'E3',
             '1G4', '3F4', '3G4', '3H4', 'E4',
             '1H5', '3G5', '3H5', '3I5', 'E5']

nntype = "np_"
txt = ".txt"
dat = ".dat"
path = "out/"

for phase in phaselist:
    ii = 0
    fr = open(nntype + phase + txt, "r", encoding="utf-8")
    fw = open(path + nntype + phase + dat, "w", encoding="utf-8")
    # readline(8) reads the fixed-width energy column, readline(11) the
    # phase-shift column, and the plain readline() consumes the rest of the row
    line1 = fr.readline(8)
    line2 = fr.readline(11)
    line3 = fr.readline()
    while line1:
        line11 = line1
        line22 = line2
        line1 = fr.readline(8)
        line2 = fr.readline(11)
        line3 = fr.readline()
        if line11 != line1:  # skip rows whose energy repeats in the next row
            fw.write(line11 + line22 + '\b\n')
            ii = ii + 1
    fr.close()
    fw.close()
    print(phase + " complete. line = " + str(ii))
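The duplicate-row filter above can be sketched on in-memory data (the sample numbers are made up for illustration): each row is compared on its first 8 characters, the fixed-width energy column, and a row is kept only when the next row's energy differs:

```python
# Hypothetical fixed-width rows; the last one duplicates the row before it,
# which is the situation nn-online sometimes produces.
rows = [
    "   10.00    59.9564\n",
    "   20.00    50.9024\n",
    "   20.00    50.9024\n",
]

kept = []
for i, row in enumerate(rows):
    # energy column of the following row ("" after the last row)
    nxt = rows[i + 1][:8] if i + 1 < len(rows) else ""
    if row[:8] != nxt:  # keep only when the next energy differs
        kept.append(row)

print(len(kept))  # 2: the duplicated 20.00 row collapses to one
```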
