Data Preprocessing: JSON to CSV

Now we will explore converting JSON data to CSV format. The file nobel-laureates.json holds data on more than 900 Nobel laureates. Let's load it and take a look:

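import json
from pprint import pprint

# Load the whole file into a Python dict; the context manager closes the file.
with open("data/nobel-laureates.json", "r", encoding="utf-8") as f:
    nobel_laureates = json.load(f)

# Pretty-print the first two laureate entries.
pprint(nobel_laureates["laureates"][:2])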

The output looks like this (abridged):

[{...
  'prizes': [{...
              'category': 'physics',
              'motivation': '"in recognition of the extraordinary services he '
                            'has rendered by the discovery of the remarkable '
                            'rays subsequently named after him"',
              'share': '1',
              'year': '1901'}],
  'surname': 'röntgen'},
 {...
  'prizes': [{...
              'category': 'physics',
              'motivation': '"in recognition of the extraordinary service they '
                            'rendered by their researches into the influence '
                            'of magnetism upon radiation phenomena"',
              'share': '2',
              'year': '1902'}],
  'surname': 'lorentz'}]
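
Before deciding which fields to keep, it can help to check how consistently each key appears across entries. The following is a minimal sketch of one way to do that. It uses a small made-up sample in place of the real file, so the names and years in it are hypothetical and only the key names (firstname, surname, prizes, year) come from the data shown above.

```python
import json
from collections import Counter

# Hypothetical sample that mimics the file's structure (not real data).
sample = json.loads("""
{"laureates": [
  {"id": "1", "firstname": "ada", "surname": "example",
   "prizes": [{"year": "1950"}]},
  {"id": "2", "firstname": "some organization",
   "prizes": [{"year": "1960"}]},
  {"id": "3"}
]}
""")

# Count how many entries carry each top-level key.
key_counts = Counter(key for entry in sample["laureates"] for key in entry)
total = len(sample["laureates"])

for key, count in sorted(key_counts.items()):
    print(f"{key}: present in {count} of {total} entries")
```

Running the same count on nobel_laureates["laureates"] would show which keys are safe to access directly and which need a fallback.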

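We want to put some of this data into a table. We are interested in each laureate's first name, surname, and the year of the award. If we inspect the data, however, we find that some entries carry no information at all; they do not even list a name. We therefore need to build a condition into the conversion logic: a valid laureate must have a first name. Sometimes a prize is awarded to an organization, and organizations have no surname. In those cases we leave the surname empty. Some laureates, such as Marie Curie, received more than one Nobel Prize in different years. For these we build the year string so that it contains every year, separated by spaces. With all of these considerations in mind, we can convert the data into a tabular format: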

laureates_table = []
for laureate in nobel_laureates["laureates"]:
    # Skip entries without a first name; they carry no usable information.
    if "firstname" not in laureate:
        continue
    # Organizations have no surname, so fall back to an empty string.
    surname = laureate.get("surname", "")
    # Join the years of all prizes with spaces, e.g. "1903 1911".
    years = " ".join(prize["year"] for prize in laureate["prizes"])
    laureates_table.append([surname, laureate["firstname"], years])

# Show the first ten rows.
laureates_table[:10]

Output:

[['röntgen', 'wilhelm conrad', '1901'],
 ['lorentz', 'hendrik antoon', '1902'],
 ['zeeman', 'pieter', '1902'],
 ['becquerel', 'antoine henri', '1903'],
 ['curie', 'pierre', '1903'],
 ['curie, née sklodowska', 'marie', '1903 1911'],
 ['(john william strutt)', 'lord rayleigh', '1904'],
 ['von lenard', 'philipp eduard anton', '1905'],
 ['thomson', 'joseph john', '1906'],
 ['michelson', 'albert abraham', '1907']]
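
The selection rules described above (require a first name, allow an empty surname, join multiple years with spaces) can also be packaged as a small function and checked on hand-made entries. This is only a sketch under stated assumptions: the function name laureate_to_row and the sample entries are hypothetical, and unlike the loop above it also tolerates a missing "prizes" key.

```python
def laureate_to_row(laureate):
    """Return [surname, firstname, years], or None if there is no first name."""
    if "firstname" not in laureate:
        return None
    # Organizations have no surname, so use an empty string instead.
    surname = laureate.get("surname", "")
    # Assumption: treat a missing "prizes" key as an empty list of prizes.
    years = " ".join(prize["year"] for prize in laureate.get("prizes", []))
    return [surname, laureate["firstname"], years]


# Hypothetical entries covering the three cases discussed in the text.
samples = [
    {"firstname": "ada", "surname": "example",
     "prizes": [{"year": "1950"}, {"year": "1962"}]},
    {"firstname": "some organization", "prizes": [{"year": "1960"}]},
    {"id": "3"},
]

rows = [row for row in map(laureate_to_row, samples) if row is not None]
print(rows)
```

Keeping the rule in one function makes each case easy to test in isolation before running it over the full file.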

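Some laureates' names actually contain commas, as in 'curie, née sklodowska' above, so a comma does not separate the fields reliably here. Instead, we can use the tab character **\t** as the delimiter and save the data to a "tab-separated values", or .tsv, file.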

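# Write one tab-separated line per laureate; UTF-8 keeps names like "röntgen" intact.
with open("data/nobel-laureates-info.tsv", "w", encoding="utf-8") as f:
    for laureate in laureates_table:
        f.write("\t".join(laureate) + "\n")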
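
As an alternative to joining the fields by hand, Python's standard csv module can write and read tab-delimited data, and it quotes a field automatically if that field ever contains the delimiter. The sketch below is not part of the original walkthrough. It uses two rows taken from the output above and an in-memory buffer in place of a real file.

```python
import csv
import io

# Two rows from the table above, including the surname that contains a comma.
rows = [
    ["curie, née sklodowska", "marie", "1903 1911"],
    ["röntgen", "wilhelm conrad", "1901"],
]

# An in-memory buffer stands in for a file opened with newline="".
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerows(rows)

# Read the data back to confirm that the fields survive the round trip.
buffer.seek(0)
read_back = list(csv.reader(buffer, delimiter="\t"))
print(read_back == rows)
```

With a real file, the same writer can be used inside open("data/nobel-laureates-info.tsv", "w", newline="", encoding="utf-8").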
