pandas Study Notes, Part 2

Time series

Merging on the row index (join) and on a column (merge)

In [16]: df1 = pd.DataFrame(np.ones((2,4)),index=["a","b"],columns=list('abcd'))
In [17]: df1
Out[17]:
     a    b    c    d
a  1.0  1.0  1.0  1.0
b  1.0  1.0  1.0  1.0
In [18]: df2 = pd.DataFrame(np.ones((3,3)),index=["a","b","c"],columns=list('xyz'))
In [19]: df2
Out[19]:
     x    y    z
a  1.0  1.0  1.0
b  1.0  1.0  1.0
c  1.0  1.0  1.0
In [20]: df2.join(df1)
Out[20]:
     x    y    z    a    b    c    d
a  1.0  1.0  1.0  1.0  1.0  1.0  1.0
b  1.0  1.0  1.0  1.0  1.0  1.0  1.0
c  1.0  1.0  1.0  NaN  NaN  NaN  NaN
In [21]: df1.join(df2)
Out[21]:
     a    b    c    d    x    y    z
a  1.0  1.0  1.0  1.0  1.0  1.0  1.0
b  1.0  1.0  1.0  1.0  1.0  1.0  1.0
In [23]: df3 = pd.DataFrame(np.zeros((3,3)),columns=list("fax"))
In [24]: df3
Out[24]:
     f    a    x
0  0.0  0.0  0.0
1  0.0  0.0  0.0
2  0.0  0.0  0.0
In [25]: df1.merge(df3,on='a')
Out[25]:
Empty DataFrame
Columns: [a, b, c, d, f, x]
Index: []
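The session above suggests that join aligns on the row index and keeps the calling frame's rows, while merge matches on column values and keeps only matching rows by default. The following is a minimal sketch of how the `how` parameter changes both results; it rebuilds the same three frames, and the variable names for the results are only illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((2, 4)), index=["a", "b"], columns=list("abcd"))
df2 = pd.DataFrame(np.ones((3, 3)), index=["a", "b", "c"], columns=list("xyz"))
df3 = pd.DataFrame(np.zeros((3, 3)), columns=list("fax"))

# join aligns on the row index; the default how="left" keeps the caller's rows.
left_join = df2.join(df1)                 # row "c" is filled with NaN
inner_join = df2.join(df1, how="inner")   # only the shared labels "a" and "b"

# merge matches on column values; the default how="inner" keeps matches only.
inner_merge = df1.merge(df3, on="a")                # 1.0 never equals 0.0, so empty
outer_merge = df1.merge(df3, on="a", how="outer")   # keeps the rows of both frames

print(left_join.shape, inner_join.shape, inner_merge.shape, outer_merge.shape)
```

With an outer merge, columns that have no match on the other side are filled with NaN instead of the row being dropped.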

Aggregation:

sum

mean

median

std, var

min, max
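As a rough illustration of the aggregation functions listed above, the sketch below applies them to a grouped column. The `sales` frame and its column names are made up for this example and do not come from the original notes.

```python
import pandas as pd

# Hypothetical data, used only to demonstrate the aggregation methods.
sales = pd.DataFrame({
    "country": ["US", "US", "CN", "CN", "CN"],
    "amount": [10.0, 20.0, 5.0, 7.0, 9.0],
})

grouped = sales.groupby("country")["amount"]

totals = grouped.sum()        # one total per country
means = grouped.mean()        # one average per country
medians = grouped.median()    # one median per country

# Several aggregations at once; each name becomes a column of the result.
spread = grouped.agg(["std", "var", "min", "max"])

print(totals, means, medians, spread, sep="\n\n")
```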

Grouping by multiple keys

Returning a DataFrame

grouped1 = data[["brand"]].groupby(by=[data["country"], data["state/province"]]).count()
grouped2 = data.groupby(by=[data["country"], data["state/province"]])[["brand"]].count()
grouped3 = data.groupby(by=[data["country"], data["state/province"]]).count()[["brand"]]
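The point of the double brackets in the statements above appears to be the return type: selecting with a list of column names yields a DataFrame, while a single name yields a Series. The sketch below checks this on a small stand-in for `data`; the rows are invented, and the grouping keys are passed as column names rather than as Series, which is an equivalent spelling assumed here for brevity.

```python
import pandas as pd

# Hypothetical stand-in for the `data` frame used in the notes.
data = pd.DataFrame({
    "brand": ["x", "x", "y", "x"],
    "country": ["US", "US", "US", "CN"],
    "state/province": ["CA", "CA", "NY", "11"],
})

keys = ["country", "state/province"]

as_series = data.groupby(keys)["brand"].count()    # single brackets: Series
as_frame = data.groupby(keys)[["brand"]].count()   # double brackets: DataFrame

print(type(as_series).__name__, type(as_frame).__name__)
print(as_frame)
```

Both results carry a two-level index built from the grouping keys.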

Specifying the index

Reindexing

Using a column as the index

Getting the unique values of the index

The index is an iterable object
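The index operations listed above can be sketched as follows. The frame and its labels are illustrative, and the mapping of each heading to a specific call (`df.index = ...`, `reindex`, `set_index`, `Index.unique`) is an assumption about what the notes had in mind.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["x", "y"])

# Specify the index by assignment; the length must match the number of rows.
df.index = ["a", "b", "c"]

# reindex selects rows by label; labels that do not exist become NaN rows.
reindexed = df.reindex(["a", "c", "z"])

# set_index uses a column as the index; drop=False keeps it as a column too.
by_x = df.set_index("x")
by_x_kept = df.set_index("x", drop=False)

# unique() returns the distinct index values in order of first appearance.
dup = pd.DataFrame({"k": ["p", "p", "q"], "v": [1, 2, 3]}).set_index("k")
unique_labels = dup.index.unique()

# An Index can be iterated, measured with len(), or converted to a list.
labels = [label for label in df.index]
print(reindexed, by_x, list(unique_labels), labels, sep="\n\n")
```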

Selecting values from a MultiIndex

Selecting by label: Series

DataFrame
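A minimal sketch of label-based selection on a two-level index, first on a Series and then on a DataFrame. The frame below is invented for illustration; only `set_index`, `.loc` and `swaplevel` are pandas calls.

```python
import pandas as pd

# Hypothetical frame; columns "c" and "d" become the two index levels.
df = pd.DataFrame({
    "a": range(7),
    "b": range(7, 0, -1),
    "c": ["one", "one", "one", "two", "two", "two", "two"],
    "d": list("hjklmno"),
})
indexed = df.set_index(["c", "d"])

# Series: a full tuple returns a scalar, an outer label returns a sub-Series.
s = indexed["a"]
single = s.loc[("one", "j")]
outer = s.loc["one"]

# DataFrame: pass the tuple as the row label and the column name second.
cell = indexed.loc[("one", "j"), "b"]
block = indexed.loc["one"]

# To select by the inner level first, swap the two levels.
inner = indexed.swaplevel().loc["h"]

print(single, cell)
print(outer, block, inner, sep="\n\n")
```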

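freq: the frequency, in units of days, months, or years

M: month (each generated date falls on the month end)

start is usually combined with either end or periods; end and periods are generally not used together.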

In [3]: import pandas as pd
In [4]: pd.date_range(start="20200826",end='20200930',freq="D")
Out[4]:
DatetimeIndex(['2020-08-26', '2020-08-27', '2020-08-28', '2020-08-29',
               '2020-08-30', '2020-08-31', '2020-09-01', '2020-09-02',
               '2020-09-03', '2020-09-04', '2020-09-05', '2020-09-06',
               '2020-09-07', '2020-09-08', '2020-09-09', '2020-09-10',
               '2020-09-11', '2020-09-12', '2020-09-13', '2020-09-14',
               '2020-09-15', '2020-09-16', '2020-09-17', '2020-09-18',
               '2020-09-19', '2020-09-20', '2020-09-21', '2020-09-22',
               '2020-09-23', '2020-09-24', '2020-09-25', '2020-09-26',
               '2020-09-27', '2020-09-28', '2020-09-29', '2020-09-30'],
              dtype='datetime64[ns]', freq='D')
In [5]: pd.date_range(start="20200826",end='20200930',freq="10D")
Out[5]: DatetimeIndex(['2020-08-26', '2020-09-05', '2020-09-15', '2020-09-25'], dtype='datetime64[ns]', freq='10D')
In [6]: pd.date_range(start="20200826",periods=10,freq="D")
Out[6]:
DatetimeIndex(['2020-08-26', '2020-08-27', '2020-08-28', '2020-08-29',
               '2020-08-30', '2020-08-31', '2020-09-01', '2020-09-02',
               '2020-09-03', '2020-09-04'],
              dtype='datetime64[ns]', freq='D')
In [7]: pd.date_range(start="20200826",periods=10,freq="M")
Out[7]:
DatetimeIndex(['2020-08-31', '2020-09-30', '2020-10-31', '2020-11-30',
               '2020-12-31', '2021-01-31', '2021-02-28', '2021-03-31',
               '2021-04-30', '2021-05-31'],
              dtype='datetime64[ns]', freq='M')
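To make the relationship between end and periods concrete, the sketch below builds the same index both ways, reusing the dates from the session above. The variable names are illustrative.

```python
import pandas as pd

# Bounded by an end date: steps of 10 days that do not go past 2020-09-30.
by_end = pd.date_range(start="20200826", end="20200930", freq="10D")

# Bounded by a count: exactly 4 dates, 10 days apart.
by_periods = pd.date_range(start="20200826", periods=4, freq="10D")

print(by_end)
print(by_end.equals(by_periods))
```

Using end is convenient when the range is known, and periods when the number of points is known.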

pd.PeriodIndex()
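The notes give no example for PeriodIndex, so the following is only a small sketch under assumed inputs: it builds a monthly PeriodIndex from strings and converts it to timestamps. The month strings and the series values are made up.

```python
import pandas as pd

# Each element represents a whole month rather than a single instant.
months = pd.PeriodIndex(["2020-08", "2020-09", "2020-10"], freq="M")

# A PeriodIndex can serve as the index of a Series or DataFrame.
counts = pd.Series([1, 2, 3], index=months)

# to_timestamp converts each period to a timestamp (its start by default).
stamps = months.to_timestamp()

print(counts)
print(stamps)
```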

Resampling in pandas
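Resampling converts a time series from one frequency to another, for example by aggregating daily values into longer bins. The sketch below is an assumed example rather than one from the original notes: it sums a 10-day daily series into 7-day bins.

```python
import numpy as np
import pandas as pd

# Ten daily observations starting on 2020-08-26, with values 0..9.
idx = pd.date_range(start="20200826", periods=10, freq="D")
daily = pd.Series(np.arange(10), index=idx)

# Downsample to 7-day bins and sum the values that fall into each bin.
weekly = daily.resample("7D").sum()

print(weekly)
```

Other aggregations such as mean() or count() can be applied to the resampler in the same way.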
