pandas notes: missing data

Reading the data (with data types)

import pandas as pd

df = pd.read_csv('data/missing_data_two.csv').convert_dtypes()
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36 entries, 0 to 35
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -------
 0   編號      36 non-null     Int64
 1   地區      36 non-null     string
 2   身高      36 non-null     float64
 3   體重      28 non-null     float64
 4   年齡      27 non-null     Int64
 5   工資      28 non-null     Int64
dtypes: Int64(3), float64(2), string(1)
memory usage: 1.9 KB

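The columns are 編號 (ID), 地區 (region), 身高 (height), 體重 (weight), 年齡 (age) and 工資 (salary). With ***convert_dtypes()***, missing values no longer dictate a column's data type: each column gets a nullable dtype such as Int64, so an integer column with gaps (年齡, 工資) stays integer instead of being turned into float64. No further type conversion is then needed before computing or writing to a database.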
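A minimal sketch of the dtype difference, using a tiny inline CSV instead of the original file (the `id` and `age` columns are made up for illustration): read_csv alone stores an integer column with a gap as float64, while convert_dtypes() turns it into the nullable Int64.

```python
import io

import pandas as pd

# Hypothetical data: the age of id 2 is missing
csv_text = "id,age\n1,25\n2,\n3,40\n"

plain = pd.read_csv(io.StringIO(csv_text))
converted = plain.convert_dtypes()

print(plain["age"].dtype)      # float64: NaN forces ints to float
print(converted["age"].dtype)  # Int64: integers kept, gap stored as <NA>
print(converted["age"].isna().sum())
```

Missing-value checks such as isna() work the same way on the nullable dtypes.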

3. Counting missing values

df.isna().sum()
編號    0
地區    0
身高    0
體重    8
年齡    9
工資    8
dtype: int64
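The count alone does not show how large the gaps are relative to the data. A small sketch, with made-up `age` and `salary` columns, of taking the mean of the same boolean frame instead: each True counts as 1, so the mean is the share of missing values per column.

```python
import pandas as pd

# Hypothetical frame with gaps
df = pd.DataFrame({
    "age": [25, None, 40, None],
    "salary": [3000, 4000, None, 5000],
})

missing_count = df.isna().sum()   # number of missing values per column
missing_ratio = df.isna().mean()  # fraction of missing values per column

print(missing_count)
print(missing_ratio)
```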

Selecting rows that contain missing values

# any: rows with at least one missing value
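One common way to express the "any" selection, sketched on a hypothetical frame (`region`, `weight`, `age` are illustrative names): isna() marks each cell, any(axis=1) collapses each row to True if any of its cells is missing, and the result is used as a boolean mask.

```python
import pandas as pd

# Hypothetical frame: row 1 lacks a weight, row 2 lacks an age
df = pd.DataFrame({
    "region": ["a", "b", "c"],
    "weight": [60.0, None, 70.0],
    "age": [30, 40, None],
})

# any(axis=1): True for rows with at least one missing value
rows_with_na = df[df.isna().any(axis=1)]
# all(axis=1) would instead keep only rows where every value is missing

print(rows_with_na)
```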
4. Handling missing values
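# Forward fill: use the previous value
# (older code writes fillna(method='ffill'), which is deprecated since pandas 2.1)
df['年齡'].ffill().head()
0    47
1    25
2    25
3    77
4    62
Name: 年齡, dtype: Int64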
# Backward fill: use the next value
df['體重'].bfill().head()
0    91.80
1    91.80
2    62.18
3    59.95
4    78.42
Name: 體重, dtype: float64
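The two fill directions behave differently at the edges of a column. A short sketch on a made-up Series: ffill cannot fill a gap before the first valid value, bfill cannot fill one after the last, and the `limit` parameter caps how many consecutive gaps are filled.

```python
import pandas as pd

# Hypothetical series with gaps at both ends and in the middle
s = pd.Series([None, 1.0, None, None, 4.0, None])

forward = s.ffill()         # previous valid value; index 0 stays NaN
backward = s.bfill()        # next valid value; index 5 stays NaN
limited = s.ffill(limit=1)  # fills at most one consecutive NaN

print(pd.DataFrame({"raw": s, "ffill": forward, "bfill": backward, "ffill_1": limited}))
```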
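# Drop rows whose 工資 (salary) is missing; subset limits which columns are checked
df2 = df.dropna(subset=['工資'])
df2.head()
# To be added: filling with the mean, 0, the minimum, the variance, etc.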
The right way to handle missing values depends on the actual data and on the business requirements.
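A sketch of some of these options on a hypothetical frame (`age` and `salary` are illustrative names): dropna with `subset`, `how` and `thresh`, and fillna with a constant 0, the column mean or the column minimum. Which one is appropriate still depends on the data.

```python
import pandas as pd

# Hypothetical data: row 2 is entirely missing
df = pd.DataFrame({
    "age": [25.0, None, None, 40.0],
    "salary": [3000.0, 4000.0, None, None],
})

# Dropping rows
by_salary = df.dropna(subset=["salary"])  # only the salary column is checked
all_missing = df.dropna(how="all")        # drop rows where every value is NaN
at_least_two = df.dropna(thresh=2)        # keep rows with >= 2 non-NaN values

# Filling one column
age = df["age"]
age_zero = age.fillna(0)
age_mean = age.fillna(age.mean())  # mean() skips NaN: (25 + 40) / 2
age_min = age.fillna(age.min())

print(by_salary, all_missing, at_least_two, sep="\n\n")
print(pd.DataFrame({"zero": age_zero, "mean": age_mean, "min": age_min}))
```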
5. Conditional assignment (commonly used)
Filling missing values by region
df.loc[condition, column]
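# Rows in region a whose weight is known
mask = (df['地區'] == 'a') & (df['體重'].notna())
# xs: mean height-to-weight ratio of those rows
xs = (df.loc[mask, '身高'] / df.loc[mask, '體重']).mean()
# Estimate each missing weight in region a as height / xs.
# This returns a new Series; df is not modified until the result is assigned back.
df.loc[df['地區'] == 'a', '體重'].fillna(df['身高'] / xs)
0     61.330405
3     59.950000
5     78.420000
7     75.420000
11    92.300000
15    64.515692
18    65.550000
21    49.170000
23    64.465070
25    57.990000
29    75.360000
33    87.000000
Name: 體重, dtype: float64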
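To actually write the estimates back, the filled Series can be assigned to the same df.loc[condition, column] selection. A minimal sketch of the same ratio approach on a hypothetical frame (English column names `region`, `height`, `weight` stand in for 地區, 身高, 體重):

```python
import pandas as pd

# Hypothetical data: one missing weight in region a, one in region b
df = pd.DataFrame({
    "region": ["a", "a", "a", "b"],
    "height": [170.0, 160.0, 180.0, 175.0],
    "weight": [68.0, None, 72.0, None],
})

in_a = df["region"] == "a"
known = in_a & df["weight"].notna()

# Mean height/weight ratio of region-a rows with a known weight
ratio = (df.loc[known, "height"] / df.loc[known, "weight"]).mean()

# Fill region a only and assign the result back into df
df.loc[in_a, "weight"] = df.loc[in_a, "weight"].fillna(df.loc[in_a, "height"] / ratio)

print(df)
```

Region b is left untouched because the assignment is restricted to the `in_a` rows.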
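# Wrong approach 1: & binds more tightly than ==, so this is parsed as
# df['地區'] == ('a' & df['體重'].notna()), and 'a' & <bool Series> fails
df[df['地區'] == 'a' & df['體重'].notna()]
TypeError                                 Traceback (most recent call last)
...
    273     # (xint or xbool) and (yint or bool)
--> 274     result = op(x, y)
    275 except TypeError:
...
TypeError: Cannot perform 'rand_' with a dtyped [bool] array and scalar of type [bool]
# Wrong approach 2: using `and` instead of &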
# `and` calls bool() on the first Series, and a Series has no single truth value
df[(df['地區'] == 'a') and (df['體重'].notna())]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
----> 1 df[(df['地區'] == 'a') and (df['體重'].notna())]
...
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
# Correct: wrap each condition in parentheses and combine them with &
df[(df['地區'] == 'a') & (df['體重'].notna())]
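A self-contained check of the correct and the `and` variants, on a hypothetical frame with `region` and `weight` columns: & compares the two boolean Series element by element, while `and` raises ValueError before any indexing happens.

```python
import pandas as pd

df = pd.DataFrame({"region": ["a", "a", "b"], "weight": [60.0, None, 70.0]})

# Correct: parenthesized conditions combined element-wise with &
rows = df[(df["region"] == "a") & (df["weight"].notna())]

# `and` needs one True/False for the whole Series, which pandas refuses
try:
    df[(df["region"] == "a") and (df["weight"].notna())]
    and_raised = False
except ValueError:
    and_raised = True

print(rows)
print(and_raised)
```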