Data cleaning with pandas

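(1) Flag duplicated rows: df.duplicated()

(2) Flag rows that repeat an earlier value in one column: df.duplicated('column_label')

(3) Drop duplicated rows: df.drop_duplicates()

(4) Drop rows that repeat an earlier value in one column: df.drop_duplicates('column_label')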
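The four duplicate-handling calls above can be sketched on a small frame. The sample data and the column names "name" and "city" are made up for illustration; by default both methods keep the first occurrence and treat later ones as duplicates.

```python
import pandas as pd

# Hypothetical sample data; the column names are placeholders.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Ann", "Ann"],
    "city": ["Oslo", "Rome", "Oslo", "Paris"],
})

# Boolean mask: True for rows that repeat an earlier row in every column.
full_dupes = df.duplicated()

# Boolean mask based on the "name" column only.
name_dupes = df.duplicated("name")

# Drop fully duplicated rows, keeping the first occurrence.
deduped = df.drop_duplicates()

# Keep only the first row seen for each distinct "name".
deduped_by_name = df.drop_duplicates("name")

print(deduped)
print(deduped_by_name)
```

Both methods return new objects, so df itself is left unchanged unless the result is assigned back.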

(1) Detect missing values: df.isnull()

(2) Detect non-missing values: df.notnull()

(3) Drop the rows that contain missing values: df.dropna()

(4) Replace missing values with '?': df.fillna('?')

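(5) Replace NaN with the previous value: df.ffill() (older code writes df.fillna(method='pad'), which recent pandas releases deprecate)

(6) Replace NaN with the next value: df.bfill() (older code writes df.fillna(method='bfill'))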

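(7) Fill missing values with the column means: df.fillna(df.mean(numeric_only=True))

(8) Fill only a selected range of columns with their means: df.fillna(df.mean(numeric_only=True)['col1':'col2'])

This fills the columns from col1 through col2, each with its own mean. To fill the gaps in col1 with the mean of col2 instead, use df['col1'].fillna(df['col2'].mean()).

(9) Fill different columns with different values by passing a dict: df.fillna({'col1': value1, 'col2': value2})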

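(10) Strip characters from the left or right end of a string column: df['column'].str.lstrip() and df['column'].str.rstrip() (pass the characters to strip as the argument; with no argument, whitespace is stripped)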
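A minimal sketch of the missing-value and string-trimming calls above, assuming a small frame with two numeric columns and one text column. The data and the column names "a", "b" and "label" are hypothetical. It uses ffill() and bfill() rather than the method= argument of fillna().

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names are placeholders.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 4.0, 8.0],
    "label": ["--x--", None, "y--"],
})

missing_mask = df.isnull()     # True where a value is missing
complete_rows = df.dropna()    # keep only rows with no missing values

forward = df.ffill()           # copy the previous value downwards
backward = df.bfill()          # copy the next value upwards

# Fill each numeric column with its own mean; the text column is untouched.
mean_filled = df.fillna(df.mean(numeric_only=True))

# A different fill value per column, passed as a dict.
per_column = df.fillna({"a": 0.0, "b": -1.0, "label": "?"})

# Strip a given character from the left or right end of each string.
left_stripped = df["label"].str.lstrip("-")
right_stripped = df["label"].str.rstrip("-")

print(mean_filled)
print(per_column)
```

Note that a leading gap survives ffill() and a trailing gap survives bfill(), because there is no neighbouring value to copy from.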

(1) Extract part of a field: slice(start, stop)

df['column'].str.slice(0, 3)

(2) Split a field: split(sep, n, expand=False)

df['ip'] = df['ip'].str.strip()

newdf = df['ip'].str.split('.', n=1, expand=True)

newdf.columns = ['ip1', 'ip2-ip4']
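The extraction and splitting steps above can be sketched as follows. The IP addresses are made-up sample values; n and expand are passed by keyword because newer pandas versions accept them only that way.

```python
import pandas as pd

# Hypothetical sample data with stray whitespace around the first value.
df = pd.DataFrame({"ip": [" 192.168.0.1 ", "10.0.0.254"]})

# Remove surrounding whitespace before slicing or splitting.
df["ip"] = df["ip"].str.strip()

# First three characters of each string.
prefix = df["ip"].str.slice(0, 3)

# Split once on the first dot; expand=True returns a DataFrame of parts.
newdf = df["ip"].str.split(".", n=1, expand=True)
newdf.columns = ["ip1", "ip2-ip4"]

print(prefix)
print(newdf)
```

With n=1 only the first separator is used, so everything after the first dot stays together in the second column.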

Use a column as the index: df.set_index('column_name')
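As a small illustration of set_index, assuming a hypothetical frame with "name" and "age" columns:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 45]})

# set_index returns a new DataFrame; df keeps its original integer index.
indexed = df.set_index("name")

# Rows can now be looked up by the values of the former "name" column.
bob_age = indexed.loc["Bob", "age"]
print(bob_age)
```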

Filter rows by a condition: df[condition]

df[df['column'] == 13322252452]

df[df['column'] > 13322252452]

df[df['column'].between(13322252452, 13999999999)]
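The three filters above can be tried on a small frame. The column name "tel" and the extra values are assumptions for illustration, since the original column name is not shown.

```python
import pandas as pd

# Hypothetical column name and sample values.
df = pd.DataFrame({"tel": [13322252452, 13500000000, 18800000000]})

# Each comparison builds a boolean Series; df[...] keeps the True rows.
equal = df[df["tel"] == 13322252452]
greater = df[df["tel"] > 13322252452]

# between() includes both endpoints by default.
in_range = df[df["tel"].between(13322252452, 13999999999)]

print(in_range)
```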

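Randomly sample rows: r = numpy.random.randint(start, end, num)

df.loc[r, :]

Here r may contain repeated values, and df.loc selects by index label, so this assumes the default integer index.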
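A minimal sketch of the random-sampling step, assuming a frame with the default integer index. The data and the seed are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with the default RangeIndex 0..9.
df = pd.DataFrame({"value": range(100, 110)})

np.random.seed(0)  # fixed seed only so the run is repeatable

# Three random integers in [0, len(df)); the upper bound is exclusive
# and the same number may be drawn more than once.
r = np.random.randint(0, len(df), 3)

# Select the rows whose index labels match the drawn numbers.
sample = df.loc[r, :]
print(sample)
```

If repeated rows are not wanted, df.sample(n=3) draws without replacement by default and does not depend on the index labels.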
