Notes on the pandas DataFrame Data Structure

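A DataFrame is displayed as a table. Because pasting the output here is cumbersome, the results are not reproduced below; everything runs without problems in Jupyter Notebook.

The comments in the code explain each step to deepen understanding and make it easier to remember.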

import numpy as np
import pandas as pd
d = ...  # (dict literal not preserved)
df = pd.DataFrame(d)  # build a DataFrame from a dict; column 'one' has no 'd' label,
# so that cell is NaN
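The dictionary itself did not survive in these notes. As a minimal sketch, assuming a dict of two Series with different indexes (the values below are hypothetical, not the original data), the label that is missing from one Series shows up as NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'one' has no label 'd', 'two' does.
d = {
    'one': pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c']),
    'two': pd.Series([1.0, 2.0, 3.0, 4.0], index=['a', 'b', 'c', 'd']),
}
df = pd.DataFrame(d)

# The row index is the union of both Series indexes.
print(df)
print(df.loc['d', 'one'])  # missing in 'one', so this cell is NaN
```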
df = pd.DataFrame(d, index=['d', 'b', 'a'])  # rebuild from the same data, keeping only the
# listed row labels; index is the row index
df = pd.DataFrame(d, columns=['two', 'three'])  # columns can be selected the same way; a requested
# column that does not exist is added as the next column
# and filled with NaN; columns is the column index
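A short sketch of both selections, assuming the same kind of hypothetical dict of Series as the notes describe (the names `by_rows` and `by_cols` are illustrative only):

```python
import pandas as pd

# Hypothetical data standing in for the dict that is not preserved in the notes.
d = {
    'one': pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c']),
    'two': pd.Series([1.0, 2.0, 3.0, 4.0], index=['a', 'b', 'c', 'd']),
}

# index= keeps only the listed row labels, in the given order.
by_rows = pd.DataFrame(d, index=['d', 'b', 'a'])

# columns= keeps only the listed columns; 'three' is not in d, so it is all NaN.
by_cols = pd.DataFrame(d, columns=['two', 'three'])

print(by_rows)
print(by_cols)
```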
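d = ...  # (dict literal not preserved) in a DataFrame built from a dict, the keys are the
# column labels; if no row index is given, DataFrame assigns a default one
df = pd.DataFrame(d)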
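To illustrate the default row index, here is a minimal sketch that assumes a dict of equal-length lists (hypothetical values):

```python
import pandas as pd

# Hypothetical data: equal-length lists keyed by column name.
d = {'one': [1.0, 2.0, 3.0, 4.0], 'two': [4.0, 3.0, 2.0, 1.0]}
df = pd.DataFrame(d)

print(df.columns.tolist())  # the dict keys
print(df.index.tolist())    # default integer labels starting at 0
```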
data = [(1, 2.2, 'hello'), (2, 3., 'world')]  # each of the two tuples in this list becomes one row
df = pd.DataFrame(data, index=['one', 'two'], columns=list('abc'))
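data = [..., ...]  # (dict literals not preserved) the keys of the dicts in this list become
# the column labels, and each dict is treated as one row
df = pd.DataFrame(data, index=['a', 'b'], columns=['a', 'b', 'e'])
# When columns is given explicitly, a column with no matching key in the dicts is filled
# with NaN, and the resulting columns are exactly those listed in columns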
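A minimal sketch of the list-of-dicts case, using hypothetical records and row labels since the original dicts are not preserved:

```python
import pandas as pd

# Hypothetical records: the second dict has a key ('c') that the first lacks.
data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]

# Each dict is one row; the union of the keys becomes the columns.
df_all = pd.DataFrame(data, index=['first', 'second'])

# Restricting columns drops 'c' and adds an all-NaN column 'e'.
df_sel = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b', 'e'])

print(df_all)
print(df_sel)
```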
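# The first element of each outer tuple key becomes the first level of the column index,
# and the second element becomes the second level.
# The dicts used as values are treated as rows: the first element of each inner tuple key
# becomes the first level of the row index, and the second element becomes the second level.
# (The first entry and all of the values are not preserved.)
d = {...: ...,
     ('a', 'a'): ...,
     ('a', 'c'): ...,
     ('b', 'a'): ...,
     ('b', 'b'): ...}
df = pd.DataFrame(d)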
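Because the values of this dict are missing, the following is only a sketch with hypothetical keys and values. It shows how tuple keys at both nesting levels produce a MultiIndex on the columns and on the rows:

```python
import pandas as pd

# Hypothetical nested dict: outer tuple keys -> column levels,
# inner tuple keys -> row levels.
d = {
    ('a', 'b'): {('A', 'B'): 1, ('A', 'C'): 2},
    ('a', 'a'): {('A', 'C'): 3, ('A', 'B'): 4},
    ('b', 'a'): {('A', 'C'): 7, ('A', 'B'): 8},
}
df = pd.DataFrame(d)

print(df.columns.nlevels)                 # two column levels
print(df.index.nlevels)                   # two row levels
print(df.loc[('A', 'B'), ('a', 'b')])     # one cell, addressed by row tuple and column tuple
```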
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
pd.DataFrame(s, columns=['a'], index=list('acd'))
# take only part (a, c, d) of the table built from the Series
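df = pd.DataFrame(np.random.randn(6, 4), columns=['one', 'two', 'three', 'four'])
df['one']  # indexing a DataFrame with [] selects a column by default
df.loc[1]  # the loc indexer selects a row by its index label
df['three'] = df['one'] + df['two']  # assigning through an index modifies the original table; columns can be added element-wise
del df['three']  # a column can also be deleted
df['flag'] = df['one'] > 0.2  # a comparison produces a boolean column
df['five'] = 5  # assigning a scalar to a column sets every row of that column to that value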
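The same column operations can be checked on fixed values. This sketch uses a small hypothetical frame instead of random numbers so the result is reproducible:

```python
import pandas as pd

# Hypothetical fixed values so the results are reproducible.
df = pd.DataFrame({'one': [0.1, 0.5, -0.3], 'two': [1.0, 2.0, 3.0]})

df['three'] = df['one'] + df['two']  # element-wise sum stored as a new column
del df['three']                      # removes the column in place
df['flag'] = df['one'] > 0.2         # boolean column from a comparison
df['five'] = 5                       # the scalar is broadcast to every row

print(df)
```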
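s = df.pop('four')  # pop, which removes and returns a column, also works on a DataFrame
df.insert(1, 'bar', df['one'] + df['two'])
# 1 is the insert position, 'bar' is the new column name, and df['one'] + df['two'] gives the column values; df is modified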
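A small sketch of `pop` and `insert` on hypothetical integer data, showing that both modify `df` in place:

```python
import pandas as pd

# Hypothetical data with the same column names as in the notes.
df = pd.DataFrame({'one': [1, 2], 'two': [10, 20], 'four': [7, 8]})

s = df.pop('four')                          # returns the column and removes it from df
df.insert(1, 'bar', df['one'] + df['two'])  # new column at position 1, in place

print(s.tolist())
print(df.columns.tolist())
```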
df.assign(ratio=df['one'] / df['two'])
# assign appends a column at the end: ratio is the column label and df['one'] / df['two'] gives
# the values, but df itself is not changed
df.assign(ratio=lambda x: x.one - x.two)
# x is the whole table; x.one and x.two are the corresponding columns
df.assign(abratio=df.one / df.two).assign(barvalue=lambda x: x.abratio * x.bar)  # assign calls can also be chained
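The point that `assign` leaves `df` unchanged is easy to verify. A minimal sketch with hypothetical values (the names `with_ratio` and `chained` are illustrative):

```python
import pandas as pd

# Hypothetical data with the column names used in the notes.
df = pd.DataFrame({'one': [1.0, 4.0], 'two': [2.0, 8.0], 'bar': [3.0, 12.0]})

# assign returns a new DataFrame; df itself is untouched.
with_ratio = df.assign(ratio=df['one'] / df['two'])

# A callable receives the frame being assigned to, so the second assign
# can see the 'abratio' column created by the first.
chained = (
    df.assign(abratio=df.one / df.two)
      .assign(barvalue=lambda x: x.abratio * x.bar)
)

print(df.columns.tolist())
print(chained)
```

Using a lambda in the second step matters here: `df.abratio` would fail, because `abratio` exists only on the intermediate result.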
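df = pd.DataFrame(np.random.randint(1, 10, (6, 4)), index=list('abcdef'), columns=list('abcd'))
df['a']  # select a column by label
df.loc['a']  # select a row by label
df.iloc[1]  # select a row by integer position
df[1:4]  # select rows with a positional slice
df.iloc[1:4]  # same result as df[1:4]
df.a > 4  # test which values in column a are greater than 4
df[df.a >= 4]  # the index can also be an expression; returns, as a table, the rows where column a is at least 4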
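A reproducible sketch of these selections. It assumes fixed values and upper-case column names, chosen here only to keep columns visually distinct from the lower-case row labels:

```python
import numpy as np
import pandas as pd

# Hypothetical fixed values; rows are labelled a..f, columns A..D.
df = pd.DataFrame(np.arange(1, 25).reshape(6, 4),
                  index=list('abcdef'), columns=list('ABCD'))

col = df['A']        # column by label
row = df.loc['a']    # row by label
row1 = df.iloc[1]    # row by integer position (the row labelled 'b')
rows = df.iloc[1:4]  # rows at positions 1, 2 and 3
big = df[df.A >= 9]  # rows where column A is at least 9

print(rows)
print(big)
```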
df1 = pd.DataFrame(np.random.randn(10, 4), index=list('abcdefghij'), columns=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame(np.random.randn(7, 3), index=list('cdefghi'), columns=['a', 'b', 'c'])
df1 + df2  # cells whose row or column label is not present in both frames become NaN
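Label alignment in arithmetic can be seen on a smaller pair of frames. This is a sketch with hypothetical shapes and labels, not the frames from the notes:

```python
import numpy as np
import pandas as pd

# Hypothetical frames that only partly overlap in rows and columns.
df1 = pd.DataFrame(np.ones((3, 2)), index=list('abc'), columns=['A', 'B'])
df2 = pd.DataFrame(np.ones((2, 1)), index=list('bc'), columns=['A'])

# Addition aligns on both row and column labels; unmatched cells are NaN.
total = df1 + df2

print(total)
```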
df1 - df1.iloc[0]  # a single row can be subtracted from a DataFrame
np.exp(df2)  # NumPy functions can be applied to a DataFrame
np.sin(df2)
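A minimal sketch of both behaviours on hypothetical values: subtracting a row applies it to every row of the frame, and a NumPy function applied to a DataFrame returns a DataFrame:

```python
import numpy as np
import pandas as pd

# Hypothetical fixed values.
df = pd.DataFrame([[1.0, 2.0], [4.0, 6.0]], columns=['A', 'B'])

# The row is matched to the columns by label and subtracted from every row.
diff = df - df.iloc[0]

# NumPy ufuncs work element-wise and keep the index and columns.
expd = np.exp(df)

print(diff)
print(type(expd))
```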
df2.values  # returns an array
type(df2.values)
# output: numpy.ndarray
np.asarray(df2) == df2.values
# output:
array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])
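The same check on a small hypothetical frame. The sketch also calls `DataFrame.to_numpy()`, which the pandas documentation recommends over `.values`; that method is not used in the notes themselves:

```python
import numpy as np
import pandas as pd

# Hypothetical fixed values.
df = pd.DataFrame({'a': [1.0, 2.0], 'b': [3.0, 4.0]})

arr = df.values                     # the underlying data as a NumPy array
same = np.asarray(df) == df.values  # element-wise comparison of the two arrays
alt = df.to_numpy()                 # documented alternative to .values

print(type(arr))
print(same)
```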
