Pandas idioms: selecting rows by multiple conditions

import functools
import pandas as pd
import numpy as np
df = pd.read_excel("examples.xls")
# Review of what was covered yesterday
df["level"] = np.where(df.年級 <= 2013, "old", "new")
df.to_excel("example_new.xls")
# Splitting
# Select the rows that satisfy a condition
df_new = df.loc[df.年級 > 2013]
# Building criteria
# Select rows that satisfy several conditions (this was also covered yesterday)
df_new = df[(df.年級 == 2013) & (df.是否在職生 == 0)]
# Modify a column based on a condition
df.loc[(df.年級 == 2013) | (df.學習形式 == 1), "註冊狀態"] = 1
# Add a column based on a condition
df["滿足條件"] = np.where((df.年級 == 2013) | (df.學習形式 == 1), "是", "否")
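# Sort by a condition
# Illustrative sample data (an assumption; the original left the constructor empty)
df2 = pd.DataFrame({"aaa": [2, 5, 7, 10], "bbb": [10, 20, 30, 40], "ccc": [100, 50, -30, -50]})
# argsort returns positions; .loc works here only because df2 has the default 0..n-1 index
df2_sort = df2.loc[(df2.aaa - 5.5).abs().argsort()]
df2_sort2 = df2.loc[(df2.aaa - 5.5).argsort()]
a = df2.aaa  # this is a Series
print(df2_sort2)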
# Select by multiple conditions
crit1 = df2.aaa <= 5.5
crit2 = df2.bbb == 10
crit3 = df2.ccc > -40.0
critlist = [crit1, crit2, crit3]
allcrit = functools.reduce(lambda x, y: x & y, critlist)  # reduce computes crit1 & crit2 & crit3
print(df2.loc[allcrit])
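
The boolean-mask idioms above depend on an Excel file that is not included here, so the following is a minimal, self-contained sketch of the same three patterns (filter with &, conditional assignment with .loc, and a new column with np.where). The frame and its column names (year, part_time, mode, registered, matches) are hypothetical stand-ins, not the columns of the original spreadsheet.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
students = pd.DataFrame({
    "year": [2012, 2013, 2013, 2014],
    "part_time": [0, 0, 1, 0],
    "mode": [1, 2, 2, 2],
})

# AND: both conditions must hold. The parentheses are required because
# & and | bind more tightly than the comparison operators.
full_time_2013 = students[(students["year"] == 2013) & (students["part_time"] == 0)]

# OR: set a flag on rows where either condition holds, leaving other rows alone.
mask = (students["year"] == 2013) | (students["mode"] == 1)
students["registered"] = 0
students.loc[mask, "registered"] = 1

# np.where builds a whole new column from the same mask.
students["matches"] = np.where(mask, "yes", "no")
```

Storing the mask in a variable, as done here, avoids repeating the same condition in both the .loc assignment and the np.where call.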

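Most of today's material was already covered yesterday; only two points are new:
1. A quicker way to sort by a column: df.loc[df.aaa.argsort()]  # argsort() first produces an array of integer positions, which is then used to index the frame
2. A quicker way to filter on several conditions:
df.loc[functools.reduce(lambda x, y: x & y, critlist)]
# lambda x, y: x & y is a single unit: the function passed to reduce
# critlist is the sequence
# How reduce works: for example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5)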
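
The two points can be tried on a small frame as follows. This is a sketch under stated assumptions: the data values are made up for illustration, and the use of .iloc and of np.logical_and.reduce are alternatives added here, not something the original notes use.

```python
import functools
import numpy as np
import pandas as pd

# Illustrative data; the values are assumptions chosen so that no two
# rows are equally close to 5.5.
df2 = pd.DataFrame({
    "aaa": [2, 5, 7, 10],
    "bbb": [10, 20, 30, 40],
    "ccc": [100, 50, -30, -50],
})

# argsort returns integer positions, so .iloc is the safe indexer;
# .loc gives the same rows only while the index is the default 0..n-1.
order = (df2["aaa"] - 5.5).abs().argsort()
closest_first = df2.iloc[order.to_numpy()]

# reduce folds the list pairwise: (crit1 & crit2) & crit3.
critlist = [df2["aaa"] <= 5.5, df2["bbb"] == 10, df2["ccc"] > -40.0]
allcrit = functools.reduce(lambda x, y: x & y, critlist)
selected = df2.loc[allcrit]

# NumPy alternative: AND all masks together in a single call.
allcrit_np = np.logical_and.reduce([c.to_numpy() for c in critlist])
```

The reduce form is handy when the list of criteria is built at run time, since it works for any number of masks without writing each & by hand.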


Other notes:

(1) Adding a column with a loop

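def filter(df, target1, category_name1, target2, category_name2, xls_name):
    # NOTE: the list initialiser and the branch bodies were lost in the source;
    # the appends below are a reconstruction inferred from the unused parameters.
    fangxiangma = []
    for course0 in df.業務課二.values:
        if course0 in target1:
            fangxiangma.append(category_name1)
        elif course0 in target2:
            fangxiangma.append(category_name2)
        else:
            fangxiangma.append(None)  # assumed placeholder for unmatched courses
    df["方向碼"] = fangxiangma
    df.to_excel("added" + xls_name)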
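
The same column can usually be built without an explicit loop. The sketch below uses a lookup dictionary with Series.map; the helper name add_direction_code, its parameters, and the sample course names are all hypothetical, and leaving unmatched rows as NaN is an assumption made here.

```python
import pandas as pd

def add_direction_code(df, source_col, target1, name1, target2, name2):
    """Return a copy of df with a '方向碼' column derived from source_col."""
    # Build a lookup table; target1 entries are written last so they win,
    # mirroring the if/elif precedence of a loop-based version.
    lookup = {course: name2 for course in target2}
    lookup.update({course: name1 for course in target1})
    out = df.copy()
    # Series.map leaves values that are missing from the lookup as NaN.
    out["方向碼"] = out[source_col].map(lookup)
    return out

# Hypothetical usage with made-up course names.
courses = pd.DataFrame({"course": ["algebra", "optics", "history"]})
coded = add_direction_code(courses, "course", ["algebra"], "math", ["optics"], "physics")
```

Returning a copy keeps the caller's frame unchanged, which makes the helper easier to reuse before deciding where to write the result.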

(2) Sorting within groups

For example, sort by "總成績" within each "方向碼":

df.sort_values(['方向碼', '總成績'], ascending=[True, False], inplace=True)
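
A small self-contained sketch of the same group-wise sort is shown here. The frame, the column names group and score, and the extra rank_in_group column are illustrative assumptions, not part of the original data.

```python
import pandas as pd

# Made-up data: two groups with two scores each.
scores = pd.DataFrame({
    "group": ["B", "A", "B", "A"],
    "score": [70, 85, 90, 60],
})

# Sort by group ascending, then by score descending inside each group.
# Returning a new frame avoids the side effects of inplace=True.
ranked = scores.sort_values(["group", "score"], ascending=[True, False])

# Optional: position of each row inside its group (0 = highest score).
ranked["rank_in_group"] = ranked.groupby("group").cumcount()
```

Because the frame is already sorted, cumcount simply numbers the rows of each group in their sorted order.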
