curl: getting past cookie-based anti-scraping checks

<?php
header('Content-Type: text/html; charset=utf-8');
$cookie_file = dirname(__FILE__).'/cookie.txt';
//$cookie_file = tempnam("tmp", "cookie");
// First request: fetch the cookies and save them
$url = "";
$ch = curl_init($url); // initialize
curl_setopt($ch, CURLOPT_HEADER, 0); // do not include the headers in the output
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return a string instead of printing directly
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file); // save the cookies to this file
curl_exec($ch);
curl_close($ch);
unset($ch); // on PHP 8+ the cookie jar is written when the handle is destroyed
// Second request: visit again using the cookies saved above
$url = "/search?oe=utf8&ie=utf8&source=uds&hl=zh-cn&q=qq";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file); // send the cookies obtained above
$response = curl_exec($ch);
curl_close($ch);
echo $response;
?>
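
The two requests can also share a single handle, so that cookies set by the first response are sent with the second. The following is a minimal sketch, not the original author's code: `cookie_session_options` and `fetch_with_cookies` are hypothetical helper names, and the redirect and timeout settings are illustrative assumptions that do not appear in the script above.

```php
<?php
// Hypothetical helper: options shared by every request in one cookie session.
function cookie_session_options(string $cookieFile): array
{
    return [
        CURLOPT_HEADER         => false,       // body only
        CURLOPT_RETURNTRANSFER => true,        // return a string instead of printing
        CURLOPT_COOKIEJAR      => $cookieFile, // write cookies here when the handle is freed
        CURLOPT_COOKIEFILE     => $cookieFile, // read cookies from here and enable the cookie engine
        CURLOPT_FOLLOWLOCATION => true,        // assumption: follow redirects
        CURLOPT_TIMEOUT        => 10,          // assumption: 10-second limit per request
    ];
}

// Hypothetical helper: fetch the URLs in order on one handle.
// Returns an array of url => body, with null for a failed request.
function fetch_with_cookies(array $urls, string $cookieFile): array
{
    $ch = curl_init();
    curl_setopt_array($ch, cookie_session_options($cookieFile));

    $responses = [];
    foreach ($urls as $url) {
        curl_setopt($ch, CURLOPT_URL, $url);
        $body = curl_exec($ch);
        $responses[$url] = ($body === false) ? null : $body;
    }

    curl_close($ch);
    return $responses;
}
```

Passing the landing page first and the search URL second reproduces the flow of the script above: the first response sets the cookies and the second request sends them back.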

Alternatively, the cookie can be read from the response headers as follows:

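// Initialize curl
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
// Include the response headers in the output
curl_setopt($ch, CURLOPT_HEADER, 1);
// Return the raw output as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Execute the request and get the result
$content = curl_exec($ch);
// Close curl
curl_close($ch);
// Split the HTTP response into headers and body (limit 2 keeps the body intact)
list($header, $body) = explode("\r\n\r\n", $content, 2);
// Extract the cookie from the first Set-Cookie header
preg_match("/set-cookie:([^\r\n]*)/i", $header, $matches);
$cookie = isset($matches[1]) ? trim($matches[1]) : '';
// It can then be passed directly on later curl requests:
// curl_setopt($ch, CURLOPT_COOKIE, $cookie);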
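
A response often carries several Set-Cookie headers, each followed by attributes such as Path or HttpOnly that should not be sent back. The sketch below shows one possible way to collect every name=value pair into a string for CURLOPT_COOKIE. The function name `cookies_from_headers` and the sample response are made up for illustration.

```php
<?php
// Hypothetical helper: collect the name=value pair of every Set-Cookie header
// in a raw header block and join them for use with CURLOPT_COOKIE.
// Attributes (Path, Expires, HttpOnly, ...) are dropped.
function cookies_from_headers(string $rawHeaders): string
{
    // Capture everything between "Set-Cookie:" and the first ";" or line end.
    preg_match_all('/^Set-Cookie:[ \t]*([^;\r\n]+)/mi', $rawHeaders, $matches);

    $pairs = [];
    foreach ($matches[1] as $pair) {
        $pair  = trim($pair);
        $parts = explode('=', $pair, 2);
        // Keyed by cookie name, so a later header replaces an earlier one.
        $pairs[trim($parts[0])] = $pair;
    }
    return implode('; ', $pairs);
}

// Made-up sample response, for illustration only.
$sample = "HTTP/1.1 200 OK\r\n"
        . "Set-Cookie: sid=abc123; Path=/; HttpOnly\r\n"
        . "Set-Cookie: lang=zh-cn; Path=/\r\n"
        . "Content-Type: text/html\r\n\r\n"
        . "<html></html>";

list($header, $body) = explode("\r\n\r\n", $sample, 2);
echo cookies_from_headers($header), "\n"; // prints: sid=abc123; lang=zh-cn
```

The returned string can be passed as-is to `curl_setopt($ch, CURLOPT_COOKIE, ...)` on a later request.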

