lxml XPath: scraping a page and displaying Chinese content correctly

2021-09-22 13:30:30

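import os
import lxml

try:
    from urllib.request import Request, urlopen  # win (Python 3)
except ImportError:
    from urllib2 import urlopen  # mac (Python 2)

from lxml import etree

hfile = urlopen('').read()  # page URL goes between the quotes; read() returns the raw bytes
tree = etree.HTML(hfile)  # parse the HTML into an element tree
strs = tree.xpath("//title")  # a list of matching <title> elements
strs = strs[0]
# strs = etree.tostring(strs)  # Chinese is not displayed correctly
strs = etree.tostring(strs, encoding="utf-8", pretty_print=True, method="html")  # Chinese is displayed correctly
print(strs.decode("utf-8"))  # tostring returns bytes here, so decode before printing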

If tostring is not configured correctly, the Chinese characters are printed as numeric character references:

&#30334;度一下,你就知道

The correct output should be:

百度一下,你就知道
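The difference can be reproduced offline with a minimal sketch. It assumes a small hard-coded HTML string (SAMPLE_HTML, a hypothetical stand-in for the downloaded page) instead of a live request, and it also shows encoding="unicode", which is not used in the post but is another way to get readable text from tostring in Python 3.

```python
from lxml import etree

# Hypothetical stand-in for the page fetched with urlopen(...).read()
SAMPLE_HTML = "<html><head><title>百度一下,你就知道</title></head><body></body></html>"

tree = etree.HTML(SAMPLE_HTML)
title = tree.xpath("//title")[0]

# Default encoding is ASCII, so non-ASCII characters become &#NNNNN; references.
escaped = etree.tostring(title, method="html")

# With encoding="utf-8" the characters are kept; the result is UTF-8 bytes.
readable = etree.tostring(title, encoding="utf-8", method="html")

# With encoding="unicode" the result is already a str, no decode needed.
as_text = etree.tostring(title, encoding="unicode", method="html")

print(escaped.decode("ascii"))
print(readable.decode("utf-8"))
print(as_text)
```

The first line printed contains only character references, while the other two show the Chinese title as written.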

Source: grandyang
