Discretizing strings

2022-04-01 03:40:25

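1. Get the de-duplicated list of strings

2. Build an all-zeros array (DataFrame) whose columns are that list of strings

3. Assign values into the all-zeros array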

Step 1

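import pandas as pd
import numpy as np

# The original DataFrame definition was lost in extraction; only column 'c'
# is rebuilt here, taken from the printed output below.
df = pd.DataFrame({'c': ['one,two,three', 'one,two', 'two,four',
                         'two,five,four,six', 'seven,eight,one',
                         'nine,ten,six,four', 'ten,six,two,seven']})
# print(df)
print('=' * 40)
print(df['c'])
"""
0        one,two,three
1              one,two
2             two,four
3    two,five,four,six
4      seven,eight,one
5    nine,ten,six,four
6    ten,six,two,seven
Name: c, dtype: object
"""
# Split each string on commas; every row becomes a list of labels
a = df['c'].str.split(',')
print(a)
"""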

0 [one, two, three]

1 [one, two]

2 [two, four]

3 [two, five, four, six]

4 [seven, eight, one]

5 [nine, ten, six, four]

6 [ten, six, two, seven]

Name: c, dtype: object
"""
print('=' * 50)
# Convert the Series of lists into a plain nested Python list
a_lst = df['c'].str.split(',').tolist()
print(a_lst)
# [['one', 'two', 'three'], ['one', 'two'], ['two', 'four'],
#  ['two', 'five', 'four', 'six'], ['seven', 'eight', 'one'],
#  ['nine', 'ten', 'six', 'four'], ['ten', 'six', 'two', 'seven']]
print('*' * 60)
# Collect each label once, in order of first appearance
new_lst = []
for i in a_lst:
    for j in i:
        if j not in new_lst:
            new_lst.append(j)
print(new_lst)
# ['one', 'two', 'three', 'four', 'five',
#  'six', 'seven', 'eight', 'nine', 'ten']
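The nested loop above can also be written more compactly. The following is only a sketch, not the author's method: it repeats the nested list so that it runs on its own, flattens it with `itertools.chain`, and relies on `dict.fromkeys` to drop duplicates while keeping first-seen order.

```python
from itertools import chain

# Same nested list as a_lst above, repeated so the example is self-contained.
a_lst = [['one', 'two', 'three'], ['one', 'two'], ['two', 'four'],
         ['two', 'five', 'four', 'six'], ['seven', 'eight', 'one'],
         ['nine', 'ten', 'six', 'four'], ['ten', 'six', 'two', 'seven']]

# Flatten the nested list, then de-duplicate while keeping first-seen order
# (dict keys are unique and preserve insertion order in Python 3.7+).
new_lst = list(dict.fromkeys(chain.from_iterable(a_lst)))
print(new_lst)
```

Because a dict lookup replaces the `if j not in new_lst` list scan, this variant should also scale better when there are many distinct labels.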

Step 2

# One row per row of df, one zero-filled column per label
df_zeros = pd.DataFrame(data=np.zeros((df.shape[0], len(new_lst))), columns=new_lst)
print(df_zeros)
"""
   one  two  three  four  five  six  seven  eight  nine  ten
0  0.0  0.0    0.0   0.0   0.0  0.0    0.0    0.0   0.0  0.0
1  0.0  0.0    0.0   0.0   0.0  0.0    0.0    0.0   0.0  0.0
2  0.0  0.0    0.0   0.0   0.0  0.0    0.0    0.0   0.0  0.0
3  0.0  0.0    0.0   0.0   0.0  0.0    0.0    0.0   0.0  0.0
4  0.0  0.0    0.0   0.0   0.0  0.0    0.0    0.0   0.0  0.0
5  0.0  0.0    0.0   0.0   0.0  0.0    0.0    0.0   0.0  0.0
6  0.0  0.0    0.0   0.0   0.0  0.0    0.0    0.0   0.0  0.0
"""

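The zeros print as 0.0 because `np.zeros` defaults to a float dtype. If integer 0/1 flags are preferred, one option is to pass `dtype=int`. A small sketch, where `labels` and `n_rows` are shortened hypothetical stand-ins for `new_lst` and `df.shape[0]`:

```python
import numpy as np
import pandas as pd

labels = ['one', 'two', 'three']  # hypothetical, shortened label list
n_rows = 4                        # hypothetical row count

# dtype=int makes every cell an integer 0 instead of a float 0.0
df_zeros = pd.DataFrame(np.zeros((n_rows, len(labels)), dtype=int), columns=labels)
print(df_zeros)
print(df_zeros.dtypes)
```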
Method 2 (for use when the dataset is large)

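for i in new_lst:
    # Set column i to 1 on every row whose string contains label i.
    # A single .loc call avoids the chained df_zeros[i][mask] = 1, which may write to a copy.
    df_zeros.loc[df['c'].str.contains(i), i] = 1
print(df_zeros)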
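One caveat with Method 2: `str.contains` matches substrings, so a label that happens to sit inside a longer label would be flagged too. The sample data in this post does not trigger this, so the sketch below uses hypothetical data to show the effect, along with one possible stricter pattern that requires the label to be bounded by commas or by the ends of the string.

```python
import re
import pandas as pd

# Hypothetical data chosen to expose the issue: 'one' is a substring of 'someone'.
s = pd.Series(['one,two', 'someone,three'])

# Plain substring match: flags both rows.
loose = s.str.contains('one')

# Exact-token match: the label must be delimited by commas or the string ends.
# Non-capturing groups are used so pandas does not warn about match groups.
pattern = r'(?:^|,)' + re.escape('one') + r'(?:,|$)'
strict = s.str.contains(pattern)

print(loose.tolist())   # [True, True]
print(strict.tolist())  # [True, False]
```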

Step 3

# Row by row: set the columns named in a_lst[i] to 1
for i in range(df_zeros.shape[0]):
    df_zeros.loc[i, a_lst[i]] = 1
print(df_zeros)
"""
   one  two  three  four  five  six  seven  eight  nine  ten
0  1.0  1.0    1.0   0.0   0.0  0.0    0.0    0.0   0.0  0.0
1  1.0  1.0    0.0   0.0   0.0  0.0    0.0    0.0   0.0  0.0
2  0.0  1.0    0.0   1.0   0.0  0.0    0.0    0.0   0.0  0.0
3  0.0  1.0    0.0   1.0   1.0  1.0    0.0    0.0   0.0  0.0
4  1.0  0.0    0.0   0.0   0.0  0.0    1.0    1.0   0.0  0.0
5  0.0  0.0    0.0   1.0   0.0  1.0    0.0    0.0   1.0  1.0
6  0.0  1.0    0.0   0.0   0.0  1.0    1.0    0.0   0.0  1.0
"""

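As an alternative to the three manual steps, pandas also ships `Series.str.get_dummies`, which splits on a separator and builds the 0/1 columns in one call. This is a swapped-in technique rather than the approach used in this post; the sketch below uses a shortened, hypothetical version of column `c` and then sums each column to count how many rows carry each label.

```python
import pandas as pd

# Shortened, hypothetical version of the 'c' column used in this post
df = pd.DataFrame({'c': ['one,two,three', 'one,two', 'two,four']})

# Split each string on ',' and return one 0/1 indicator column per label
dummies = df['c'].str.get_dummies(sep=',')
print(dummies)

# Column sums give the number of rows that contain each label
counts = dummies.sum()
print(counts)
```

Note that the column order produced this way may differ from the first-seen order of `new_lst`, so reorder the columns explicitly if the order matters.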