Pandas data processing

Data processing with pandas:

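import pandas as pd
from sklearn.preprocessing import MinMaxScaler

data = [[-1, 2],
        [-0.5, 6],
        [0., 10],
        [1, 18]]

# Convert the data (a nested list here; a NumPy array works the same way) into a pandas DataFrame

df = pd.DataFrame(data)

# Normalize to the range [0, 1]

scaler = MinMaxScaler()                       # instantiate
scaler = scaler.fit(data)                     # fit: computes min(x) and max(x) for each column
result = scaler.transform(data)               # export the result through the transform interface
data2 = scaler.inverse_transform(result)      # invert back to the original data
scaler = MinMaxScaler(feature_range=(5, 10))  # normalize to the range [5, 10] instead
result = scaler.fit_transform(data2)
df.info()                                     # inspect basic information about the DataFrame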
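MinMaxScaler rescales each column as x' = (x - min) / (max - min) * (b - a) + a, where (a, b) is feature_range. The sketch below recomputes this by hand with NumPy on the same data and compares it with the scaler's output; the helper name `min_max_scale` is hypothetical, not part of scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([[-1, 2],
                 [-0.5, 6],
                 [0., 10],
                 [1, 18]])

def min_max_scale(x, feature_range=(0, 1)):
    """Hypothetical helper: column-wise min-max scaling written out by hand."""
    a, b = feature_range
    col_min = x.min(axis=0)   # per-column minimum
    col_max = x.max(axis=0)   # per-column maximum
    return (x - col_min) / (col_max - col_min) * (b - a) + a

manual = min_max_scale(data, (5, 10))
sk = MinMaxScaler(feature_range=(5, 10)).fit_transform(data)
print(np.allclose(manual, sk))  # both approaches agree
print(manual)                   # each column becomes 5, 6.25, 7.5, 10
```

This hand-written version divides by zero on a constant column; MinMaxScaler guards against that case internally, which is one reason to prefer the library class.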

# Check whether any NaN value exists

df.isnull().values.any()
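`df.isnull().values.any()` collapses the whole table to a single boolean. To see where the missing values are, summing the null mask per column is often more useful. A sketch on a small hypothetical DataFrame:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two missing values in column "b"
df = pd.DataFrame({"a": [1, 2, 3], "b": [np.nan, 5.0, np.nan]})

has_nan = df.isnull().values.any()  # one True/False for the whole table
nan_per_column = df.isnull().sum()  # number of NaN values in each column
print(has_nan)
print(nan_per_column)
```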

# Read the raw data

train = pd.read_csv(path)

# drop_duplicates removes repeated customer_id values; dropna then removes missing ones

all_customer = train[['customer_id']].drop_duplicates(['customer_id']).dropna()

# Convert to pandas datetime format

train['order_pay_time'] = pd.to_datetime(train['order_pay_time'])
train['order_pay_date'] = train['order_pay_time'].dt.date

# Cast to string with .astype(str), then compare with <= (the result is a boolean mask)

train['order_pay_date'].astype(str) <= '2013-07-03'
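In the drop_duplicates step above, pandas keeps the first occurrence of each customer_id by default (keep='first'), and dropna then removes the row whose id is missing. The toy frame below is hypothetical; it only illustrates what `all_customer` ends up containing.

```python
import numpy as np
import pandas as pd

# Hypothetical order table: customer 6 ordered twice, one order has no customer_id
train = pd.DataFrame({"customer_id": [6, 7, 6, np.nan]})

all_customer = (
    train[["customer_id"]]
    .drop_duplicates(["customer_id"])  # one row per distinct customer_id (NaN counts once)
    .dropna()                          # drop the row whose customer_id is NaN
)
print(all_customer["customer_id"].tolist())
```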
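`pd.to_datetime` parses the strings into datetime64 values, `.dt.date` keeps only the calendar date, and the string comparison works because zero-padded ISO dates (YYYY-MM-DD) sort lexicographically in chronological order. The sketch uses hypothetical timestamps, applies the mask to filter rows, and shows an alternative that compares datetimes directly without the string cast.

```python
import pandas as pd

# Hypothetical payment timestamps
train = pd.DataFrame({
    "order_pay_time": ["2013-07-01 10:00:00", "2013-07-03 23:59:59",
                       "2013-07-04 00:00:01"],
})

train["order_pay_time"] = pd.to_datetime(train["order_pay_time"])
train["order_pay_date"] = train["order_pay_time"].dt.date  # datetime.date objects

# String comparison: 'YYYY-MM-DD' strings compare in date order
mask = train["order_pay_date"].astype(str) <= "2013-07-03"
before_cutoff = train[mask]  # keep only rows paid on or before the cutoff date

# Alternative without the string cast: compare against a Timestamp
mask2 = train["order_pay_time"] < pd.Timestamp("2013-07-04")
print(mask.tolist(), mask2.tolist())
```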

# iloc[r_start:r_end, c_start:c_end] selects by integer position; here x is a row position and 20 a column position

status2 = train.iloc[x, 20]
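`iloc` indexes purely by integer position, so `train.iloc[x, 20]` is the cell at row position x and column position 20, whatever the labels are. The hypothetical frame below has a non-default index to show the contrast with label-based `loc`.

```python
import pandas as pd

# Hypothetical frame whose index labels differ from row positions
df = pd.DataFrame(
    {"customer_id": [6, 7, 8], "status": [1, 0, 1]},
    index=[10, 20, 30],
)

x = 1
by_position = df.iloc[x, 1]       # row position 1, column position 1
by_label = df.loc[20, "status"]   # the same cell addressed by labels
block = df.iloc[0:2, 0:2]         # rows 0-1 and columns 0-1 (end is exclusive)
print(by_position, by_label, block.shape)
```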

# Modify a specific value: set mermber_status for the customer whose customer_id is 6

all_customer.loc[all_customer['customer_id'] == 6, 'mermber_status'] = status2
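`df.loc[row_mask, column] = value` writes the value into every row where the mask is True; if the column does not exist yet, pandas creates it and fills the remaining rows with NaN. The toy data and the value of `status2` below are hypothetical.

```python
import pandas as pd

# Hypothetical customer table with no status column yet
all_customer = pd.DataFrame({"customer_id": [5, 6, 7]})
status2 = 1  # stands in for a value read earlier with iloc

# Only the row with customer_id == 6 receives the value; the others become NaN
all_customer.loc[all_customer["customer_id"] == 6, "mermber_status"] = status2
print(all_customer)
```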

# Drop all rows where customer_id is null (drop returns a new DataFrame, so reassign it)

train = train.drop(train[train['customer_id'].isnull()].index)
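Dropping by the index of the null rows is equivalent to `dropna` restricted to one column. The sketch checks that both approaches keep the same rows on a hypothetical frame; `dropna(subset=...)` is the shorter spelling.

```python
import numpy as np
import pandas as pd

# Hypothetical orders, one with a missing customer_id
train = pd.DataFrame({"customer_id": [6, np.nan, 7], "amount": [10.0, 20.0, 30.0]})

via_drop = train.drop(train[train["customer_id"].isnull()].index)
via_dropna = train.dropna(subset=["customer_id"])  # only NaN in customer_id matters

print(via_drop.equals(via_dropna))
```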

# Group by customer_id and count the number of rows in each group

customer_counts = train.groupby('customer_id').size()
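`size()` counts the rows in each group, while `count()` counts the non-null values of a selected column, so the two differ when that column contains NaN. A sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical orders: customer 6 has three orders, one with no rating recorded
train = pd.DataFrame({
    "customer_id": [6, 6, 7, 6],
    "is_customer_rate": [1, np.nan, 0, 1],
})

orders_per_customer = train.groupby("customer_id").size()                       # rows per group
rated_per_customer = train.groupby("customer_id")["is_customer_rate"].count()   # non-null per group
print(orders_per_customer)
print(rated_per_customer)
```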

# Whether the customer has left a rating: 0 = not rated, 1 = rated; the column has missing values (NaN), which are set to 0

train.loc[train['is_customer_rate'].isnull(), 'is_customer_rate'] = 0
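Assigning 0 through a null mask gives the same result as `fillna(0)` on that column. A sketch with hypothetical ratings that runs both versions on copies and compares them:

```python
import numpy as np
import pandas as pd

# Hypothetical ratings with two missing values
train = pd.DataFrame({"is_customer_rate": [1, np.nan, 0, np.nan]})

a = train.copy()
a.loc[a["is_customer_rate"].isnull(), "is_customer_rate"] = 0  # mask-based assignment

b = train.copy()
b["is_customer_rate"] = b["is_customer_rate"].fillna(0)        # fillna equivalent

print(a.equals(b), a["is_customer_rate"].tolist())
```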

# Add a new column and assign it a value

all_customer['mermber_status'] = 'one_values'
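Assigning a scalar to a new column broadcasts it to every row. `assign()` is a non-mutating alternative that returns a new DataFrame; the `is_active` column below is a hypothetical name used only for illustration.

```python
import pandas as pd

# Hypothetical customer table
all_customer = pd.DataFrame({"customer_id": [5, 6, 7]})

all_customer["mermber_status"] = "one_values"    # in place: every row gets the same value
with_flag = all_customer.assign(is_active=True)  # returns a copy with one extra column

print(all_customer["mermber_status"].tolist())
print(with_flag.columns.tolist())
```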
