Basic pandas operations (Part 3)

Based on the book 《python3人工智慧入門到實戰破冰》.

First, create a DataFrame and assign some missing values to it.

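import pandas as pd
import numpy as np

# 5x3 frame of random integers in [1, 10); cast to float so NaN can be stored
df = pd.DataFrame(np.random.randint(1, 10, [5, 3]).astype(float),
                  index=['a', 'c', 'e', 'f', 'h'],
                  columns=['one', 'two', 'three'])

# Introduce missing values at selected positions
df.loc['a', 'one'] = np.nan
df.loc['c', 'two'] = np.nan
df.loc['c', 'three'] = np.nan
df.loc['a', 'two'] = np.nan

df['four'] = 'bar'
df['five'] = df['one'] > 0

# Reindexing adds rows b, d and g, which are filled entirely with NaN
df2 = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
print(df2)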

Here is the output:

   one  two  three four   five
a  NaN  NaN    8.0  bar  False
b  NaN  NaN    NaN  NaN    NaN
c  2.0  NaN    NaN  bar   True
d  NaN  NaN    NaN  NaN    NaN
e  5.0  2.0    2.0  bar   True
f  1.0  3.0    3.0  bar   True
g  NaN  NaN    NaN  NaN    NaN
h  3.0  9.0    4.0  bar   True

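There are a lot of NaN values. The outputs in the following sections come from separate runs, so the random values differ between them. Some of them also show -100, -90 and -80 where the code above assigns NaN, so they appear to come from a variant of this code.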

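The dropna() function

1 Drop the rows (or columns) that contain missing values. Yes, you read that right: the entire row or column is dropped. The optional parameter axis takes 0 or 1.

0 drops rows, 1 drops columns; the default is 0.

Effect of dropping rows with missing values:
print(df2.dropna(axis=0))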

   one   two  three four  five
c  6.0 -90.0  -80.0  bar  True
e  2.0   1.0    7.0  bar  True
f  7.0   9.0    1.0  bar  True
h  4.0   2.0    3.0  bar  True

Effect of dropping columns with missing values:
print(df2.dropna(axis=1))

Empty DataFrame
Columns: []
Index: [a, b, c, d, e, f, g, h]

Every column contains at least one missing value, so all of the data is dropped.
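To make the difference between the two axis values easier to see, here is a minimal sketch on a smaller frame. The frame `toy` and its values are hypothetical and not taken from the book; only `dropna` and its `axis` parameter come from the text above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: the only NaN sits in row "b", column "two".
toy = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0], "two": [4.0, np.nan, 6.0]},
    index=["a", "b", "c"],
)

rows_dropped = toy.dropna(axis=0)  # removes row "b"
cols_dropped = toy.dropna(axis=1)  # removes column "two"

print(rows_dropped)
print(cols_dropped)
```

Note that dropna() returns a new frame; `toy` itself is left unchanged.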

2 Drop a row only when every element in it is NaN: dropna(how='all')
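The text gives no output for how='all', so the following is a small sketch of how it compares with the default how='any'. The frame `toy` is a hypothetical example, not the df2 used above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: row "b" is entirely NaN, row "a" only partly.
toy = pd.DataFrame(
    {"one": [np.nan, np.nan, 3.0], "two": [4.0, np.nan, 6.0]},
    index=["a", "b", "c"],
)

all_nan_removed = toy.dropna(how="all")  # removes only row "b"
any_nan_removed = toy.dropna(how="any")  # default: removes rows "a" and "b"

print(all_nan_removed)
print(any_nan_removed)
```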

3 Set a threshold: drop every row that has fewer than 4 non-NaN values. In other words, only rows with at least 4 non-NaN values are kept.

print(df2.dropna(thresh=4))

Result:

   one    two  three four   five
a  NaN -100.0    9.0  bar  False
c  5.0  -90.0  -80.0  bar   True
e  5.0    8.0    9.0  bar   True
f  4.0    4.0    9.0  bar   True
h  9.0    7.0    4.0  bar   True
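One way to check what thresh does is to count the non-NaN values per row and compare the result with dropna. This is a sketch on a hypothetical frame `toy` with a threshold of 2, chosen only to keep the example short.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with 3, 2 and 0 non-NaN values per row.
toy = pd.DataFrame(
    {
        "one": [1.0, np.nan, np.nan],
        "two": [2.0, 5.0, np.nan],
        "three": [3.0, 6.0, np.nan],
    },
    index=["a", "b", "c"],
)

# Number of non-NaN values in each row.
non_nan_counts = toy.notna().sum(axis=1)

# Keep rows with at least 2 non-NaN values.
kept = toy.dropna(thresh=2)

print(non_nan_counts)
print(kept)
```

Filtering with `toy[non_nan_counts >= 2]` selects the same rows, which is a convenient way to read the thresh parameter.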

4 Drop the rows in which any of the specified columns is empty:
print(df2.dropna(subset=['one','five']))

   one   two  three four  five
c  1.0 -90.0  -80.0  bar  True
e  1.0   3.0    5.0  bar  True
f  5.0   1.0    1.0  bar  True
h  3.0   2.0    5.0  bar  True
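The point of subset is that NaN values in other columns are ignored. A minimal sketch, using a hypothetical frame `toy` in which row "b" has a NaN outside the checked column:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: row "a" has NaN in "one", row "b" only in "two".
toy = pd.DataFrame(
    {"one": [np.nan, 2.0, 3.0], "two": [np.nan, np.nan, 6.0]},
    index=["a", "b", "c"],
)

# Only column "one" is checked, so row "b" survives despite its NaN in "two".
checked_one = toy.dropna(subset=["one"])

print(checked_one)
```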

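fillna(): replace missing values with a specified value

For example, print(df2.fillna(1)) replaces every missing value with 1. You can try other fill values as well.

Result: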

   one    two  three four   five
a  1.0 -100.0    9.0  bar  False
b  1.0    1.0    1.0    1      1
c  7.0  -90.0  -80.0  bar   True
d  1.0    1.0    1.0    1      1
e  8.0    6.0    4.0  bar   True
f  4.0    8.0    1.0  bar   True
g  1.0    1.0    1.0    1      1
h  8.0    7.0    7.0  bar   True
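In the output above, the text column four also receives the number 1, which is rarely what you want. As a sketch of one alternative, fillna also accepts a dict that maps column names to fill values. The frame `toy` and the fill values 0 and "missing" are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one numeric and one text column.
toy = pd.DataFrame(
    {"one": [1.0, np.nan, 3.0], "four": ["bar", np.nan, "bar"]},
    index=["a", "b", "c"],
)

# A different fill value per column: 0 for "one", "missing" for "four".
filled = toy.fillna({"one": 0, "four": "missing"})

print(filled)
```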

Now for the key point: the replace() function substitutes a specified value with another. For example, to replace -100 with 100:
print(df2.replace(-100,100))

   one    two  three four   five
a  NaN  100.0    7.0  bar  False
b  NaN    NaN    NaN  NaN    NaN
c  5.0  -90.0  -80.0  bar   True
d  NaN    NaN    NaN  NaN    NaN
e  9.0    1.0    9.0  bar   True
f  9.0    9.0    7.0  bar   True
g  NaN    NaN    NaN  NaN    NaN
h  1.0    7.0    5.0  bar   True

Of course, we can also use the row and column selection covered earlier to replace the data at one specific position.
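The difference between the two approaches can be sketched as follows: replace() changes every matching value, while a .loc assignment changes a single cell. The frame `toy` is hypothetical and contains -100 in two places so that the difference is visible.

```python
import pandas as pd

# Hypothetical toy frame with -100 in two different cells.
toy = pd.DataFrame(
    {"one": [1.0, -100.0], "two": [-100.0, 4.0]},
    index=["a", "b"],
)

# replace() swaps every -100 in the frame.
everywhere = toy.replace(-100, 100)

# .loc changes only the cell at row "a", column "two".
targeted = toy.copy()
targeted.loc["a", "two"] = 100.0

print(everywhere)
print(targeted)
```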

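duplicated() and drop_duplicates(): duplicated() marks each row that repeats an earlier row, and drop_duplicates() removes those rows.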

print(df2.duplicated())
print(df2.drop_duplicates())

Result:

a    False
b    False
c    False
d     True
e    False
f    False
g     True
h    False
dtype: bool
   one    two  three four   five
a  NaN -100.0    4.0  bar  False
b  NaN    NaN    NaN  NaN    NaN
c  4.0  -90.0  -80.0  bar   True
e  1.0    1.0    8.0  bar   True
f  4.0    2.0    9.0  bar   True
h  1.0    7.0    2.0  bar   True
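In the output above, rows d and g are flagged because they repeat the all-NaN row b, so NaN values count as equal when rows are compared. The sketch below reproduces this on a hypothetical frame `toy` and also shows the keep parameter, which the text does not cover: it decides which copy of a repeated row survives.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: "c" repeats "a", and "d" repeats the all-NaN row "b".
toy = pd.DataFrame(
    {"one": [1.0, np.nan, 1.0, np.nan], "two": [2.0, np.nan, 2.0, np.nan]},
    index=["a", "b", "c", "d"],
)

flags = toy.duplicated()                       # True for rows "c" and "d"
deduped = toy.drop_duplicates()                # keeps the first copies: "a", "b"
keep_last = toy.drop_duplicates(keep="last")   # keeps the last copies: "c", "d"

print(flags)
print(deduped)
print(keep_last)
```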
