Outputting and saving Chinese text in Scrapy

#!/usr/bin/python
#coding=utf-8
#author=dahu
import json
with open('huxiu.json','r') as f:
    data=json.load(f)
print data[0]['title']
for key in data[0]:
    print '\"%s\":\"%s\",'%(key,data[0][key])

read_from_json
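The script above uses Python 2 print statements. For readers on Python 3, the same read might look roughly like the sketch below. It writes a small sample file first so that it runs on its own; the file name `sample_huxiu.json` and the sample record are made up for illustration and are not taken from the crawl.

```python
import json
import os
import tempfile

# Hypothetical sample data standing in for the crawler output.
sample = [{"title": "中文標題", "link": "/article/1.html"}]

path = os.path.join(tempfile.mkdtemp(), "sample_huxiu.json")

# Write the sample as UTF-8, keeping the Chinese characters unescaped.
with open(path, "w", encoding="utf-8") as f:
    json.dump(sample, f, ensure_ascii=False)

# Read it back; the explicit encoding avoids relying on the platform default.
with open(path, "r", encoding="utf-8") as f:
    data = json.load(f)

title = data[0]["title"]
pairs = ['"%s":"%s",' % (key, data[0][key]) for key in data[0]]

print(title)
for line in pairs:
    print(line)
```

Passing `encoding="utf-8"` explicitly is a design choice here: without it, Python 3 falls back to the locale's preferred encoding, which may not be UTF-8 on every system.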

Writing Chinese text to JSON:

#!/usr/bin/python
#coding=utf-8
#author=dahu
import json
data=
with open('tmp.json','w') as f:
    json.dump(data,f,ensure_ascii=False) # specify ensure_ascii

write_to_json
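To see what `ensure_ascii` changes, here is a small Python 3 sketch that serializes the same record both ways. The record itself is a made-up sample.

```python
import json

record = {"title": "中文"}  # hypothetical sample record

# Default behaviour (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(record)

# With ensure_ascii=False the characters are written as they are.
readable = json.dumps(record, ensure_ascii=False)

print(escaped)
print(readable)

# Both strings decode back to the same data; only the on-disk form differs.
same = json.loads(escaped) == json.loads(readable)
```

Both outputs are valid JSON, so the escaped form is not corrupt. It is only hard for a person to read.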

For example:

scrapy crawl huxiu --nolog -o huxiu.json

$ head huxiu.json
[,

Combining this with the technique above for saving Chinese text in a JSON file:

Changes to settings.py:

ITEM_PIPELINES =

Uncomment this setting.
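In Scrapy, `ITEM_PIPELINES` is a dict that maps the import path of each pipeline class to an integer, and pipelines with lower numbers run first. The uncommented setting might look something like the sketch below. The package name `coolscrapy` and the value 300 are assumptions for illustration; use the path generated for your own project.

```python
# settings.py (sketch)
# "coolscrapy" is an assumed project package name, not taken from the original post.
ITEM_PIPELINES = {
    "coolscrapy.pipelines.coolscrapypipeline": 300,
}
```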

Change pipelines.py to the following:

# -*- coding: utf-8 -*-
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See:
import json
#import codecs
class coolscrapypipeline(object):
    #def __init__(self):
    #    self.file = codecs.open('data_cn.json', 'wb', encoding='utf-8')
    def process_item(self, item, spider):
        #line = json.dumps(dict(item),ensure_ascii=False) + '\n'
        #self.file.write(line)
        with open('data_cn1.json', 'a') as f:
            json.dump(dict(item), f, ensure_ascii=False)
            f.write(',\n')
        return item

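The commented-out lines are an alternative way to write it. The key point is that once the pipeline is enabled in settings, Scrapy calls process_item automatically for every item, so we can save the data in whatever format we want.

Now enter this command in the terminal: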
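The commented-out variant keeps one file handle open instead of reopening the file for every item. A Python 3 sketch of that idea follows, using the `open_spider` and `close_spider` hooks that Scrapy calls on a pipeline when the spider starts and finishes. The class name `JsonLinesCnPipeline` and the constructor's `path` argument are hypothetical, and the demo at the bottom drives the hooks by hand rather than through Scrapy.

```python
import json
import os
import tempfile


class JsonLinesCnPipeline(object):
    """Write each item as one UTF-8 JSON line (hypothetical pipeline name)."""

    def __init__(self, path="data_cn.json"):
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Open the output file once, when the spider starts.
        self.file = open(self.path, "w", encoding="utf-8")

    def close_spider(self, spider):
        # Close the file when the spider finishes.
        self.file.close()

    def process_item(self, item, spider):
        # ensure_ascii=False keeps Chinese characters readable in the file.
        line = json.dumps(dict(item), ensure_ascii=False) + "\n"
        self.file.write(line)
        return item


# Stand-alone demo without Scrapy: call the hooks manually.
demo_path = os.path.join(tempfile.mkdtemp(), "data_cn.json")
pipeline = JsonLinesCnPipeline(demo_path)
pipeline.open_spider(spider=None)
returned = pipeline.process_item({"title": "中文"}, spider=None)
pipeline.close_spider(spider=None)

with open(demo_path, encoding="utf-8") as f:
    written = f.read()
```

Writing one JSON object per line, without the trailing comma, means each line can be parsed on its own with `json.loads`.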

scrapy crawl huxiu --nolog

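If you still add -o file.json, both that file and the file defined in the pipeline are generated, but the Chinese text in file.json is still garbled.

This leads to another conclusion: ITEM_PIPELINES in settings controls which pipelines run. So what happens if we enable several of them?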
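As a side note that is not part of the workflow above: newer Scrapy releases have a `FEED_EXPORT_ENCODING` setting that controls the encoding used by the `-o` feed export. Assuming your installed version supports it, a settings.py line like the following sketch should make the `-o` file readable as well.

```python
# settings.py (sketch)
# Ask the feed exporter behind "-o" to write UTF-8 instead of \uXXXX escapes.
# Assumes a Scrapy version that supports this setting.
FEED_EXPORT_ENCODING = "utf-8"
```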

ITEM_PIPELINES =

# -*- coding: utf-8 -*-
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See:
import json
#import codecs
class coolscrapypipeline(object):
    #def __init__(self):
    #    self.file = codecs.open('data_cn.json', 'wb', encoding='utf-8')
    def process_item(self, item, spider):
        #line = json.dumps(dict(item),ensure_ascii=False) + '\n'
        #self.file.write(line)
        with open('data_cn1.json', 'a') as f:
            json.dump(dict(item), f, ensure_ascii=False)
            f.write(',\n')
        return item
class coolscrapypipeline1(object):
    def process_item(self, item, spider):
        with open('data_cn2.json', 'a') as f:
            json.dump(dict(item), f, ensure_ascii=False)
            f.write(',hehe\n')
        return item

pipelines.py

Run:

$ scrapy crawl huxiu --nolog

$ head -n 2 data_cn*
==> data_cn1.json <==
,
,

==> data_cn2.json <==
,hehe
,hehe

You can see that both files were generated, and in exactly the format we wanted!
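The way one item passes through several enabled pipelines can be imitated without Scrapy. The Python 3 sketch below is only an approximation of the idea, not Scrapy's actual engine: `FirstPipeline`, `SecondPipeline`, `append_item`, `run_pipelines` and the priority numbers are all hypothetical names and values, and the output goes to a temporary directory.

```python
import json
import os
import tempfile

# Temporary directory so the sketch does not touch real project files.
out_dir = tempfile.mkdtemp()


def append_item(filename, item, suffix):
    """Append one item as JSON, keeping Chinese characters readable."""
    with open(os.path.join(out_dir, filename), "a", encoding="utf-8") as f:
        json.dump(dict(item), f, ensure_ascii=False)
        f.write(suffix)


class FirstPipeline(object):  # hypothetical stand-in for coolscrapypipeline
    def process_item(self, item, spider):
        append_item("data_cn1.json", item, ",\n")
        return item


class SecondPipeline(object):  # hypothetical stand-in for coolscrapypipeline1
    def process_item(self, item, spider):
        append_item("data_cn2.json", item, ",hehe\n")
        return item


# Assumed priorities, shaped like the values in ITEM_PIPELINES.
priorities = {FirstPipeline: 300, SecondPipeline: 400}


def run_pipelines(item, spider=None):
    """Pass the item through each pipeline, lowest priority number first."""
    for cls in sorted(priorities, key=priorities.get):
        item = cls().process_item(item, spider)
    return item


result = run_pipelines({"title": "中文"})

with open(os.path.join(out_dir, "data_cn1.json"), encoding="utf-8") as f:
    first_output = f.read()
with open(os.path.join(out_dir, "data_cn2.json"), encoding="utf-8") as f:
    second_output = f.read()
```

Each pipeline returns the item so that the next one receives it, which is why every `process_item` in the examples above ends with `return item`.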
