Converting an ARFF file to a CSV file with Python

2021-10-10 10:43:10

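Datasets are sometimes stored in ARFF format (the format used by Weka). Most machine learning work in Python relies on numpy, pandas, and sklearn, none of which reads ARFF files directly, so scipy.io.arff.loadarff serves as a bridge.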

from scipy.io import arff
import pandas as pd

file_name = '/users/schillerxu/documents/sourcecode/python/pandas/cm1.arff'

# loadarff returns the records (a NumPy structured array) and the metadata
data, meta = arff.loadarff(file_name)
# print(data)
print(meta)

df = pd.DataFrame(data)
print(df.head())
# print(df)

# Save as a CSV file
# out_file = '/users/schillerxu/documents/sourcecode/python/pandas/cm1.csv'
# output = pd.DataFrame(df)
# output.to_csv(out_file, index=False)
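The commented-out lines in the script are the ones that actually write the CSV. As a self-contained sketch of the full conversion, the example below uses a small hypothetical in-memory ARFF document (SAMPLE_ARFF) in place of cm1.arff, and a hypothetical helper named arff_to_csv_text. It also decodes nominal values, which loadarff returns as bytes, so that they are written as plain text rather than as b'...'. That decoding step is an addition and is not part of the original script.

```python
import io

import pandas as pd
from scipy.io import arff

# Hypothetical miniature ARFF document standing in for cm1.arff.
SAMPLE_ARFF = """@relation demo
@attribute loc_total numeric
@attribute defective {y,n}
@data
25.0,n
32.0,y
"""


def arff_to_csv_text(arff_source):
    """Load ARFF data and return it as CSV text with nominal values decoded."""
    data, _meta = arff.loadarff(arff_source)
    df = pd.DataFrame(data)
    # Nominal and string attributes arrive as bytes (e.g. b'n'); decode to str.
    for col in df.columns:
        if df[col].dtype == object:
            df[col] = df[col].str.decode("utf-8")
    # With no path given, to_csv returns the CSV content as a string.
    return df.to_csv(index=False)


csv_text = arff_to_csv_text(io.StringIO(SAMPLE_ARFF))
print(csv_text)
```

To write a file instead, pass a path to to_csv, as the commented-out lines of the script do.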

Running the script (arff_to_csv.py) produces the following output:

[running] python -u "/users/schillerxu/documents/sourcecode/python/pandas/arff_to_csv.py"
dataset: cm1
loc_blank's type is numeric
branch_count's type is numeric
call_pairs's type is numeric
loc_code_and_comment's type is numeric
loc_comments's type is numeric
condition_count's type is numeric
cyclomatic_complexity's type is numeric
cyclomatic_density's type is numeric
decision_count's type is numeric
decision_density's type is numeric
design_complexity's type is numeric
design_density's type is numeric
edge_count's type is numeric
essential_complexity's type is numeric
essential_density's type is numeric
loc_executable's type is numeric
parameter_count's type is numeric
halstead_content's type is numeric
halstead_difficulty's type is numeric
halstead_effort's type is numeric
halstead_error_est's type is numeric
halstead_length's type is numeric
halstead_level's type is numeric
halstead_prog_time's type is numeric
halstead_volume's type is numeric
maintenance_severity's type is numeric
modified_condition_count's type is numeric
multiple_condition_count's type is numeric
node_count's type is numeric
normalized_cylomatic_complexity's type is numeric
num_operands's type is numeric
num_operators's type is numeric
num_unique_operands's type is numeric
num_unique_operators's type is numeric
number_of_lines's type is numeric
percent_comments's type is numeric
loc_total's type is numeric
defective's type is nominal, range is ('y', 'n')

   loc_blank  branch_count  call_pairs  ...  percent_comments  loc_total  defective
0        6.0           9.0         2.0  ...              4.00       25.0       b'n'
1       15.0           7.0         3.0  ...             39.22       32.0       b'y'
2       27.0           9.0         1.0  ...             47.27       33.0       b'y'
3        7.0           3.0         2.0  ...              0.00       12.0       b'n'
4       51.0          25.0        13.0  ...             11.67      106.0       b'n'

[5 rows x 38 columns]

[done] exited with code=0 in 0.664 seconds

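As the output shows, meta holds the dataset's basic information: the dataset name and the name and type of every attribute.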
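Beyond printing it, the metadata can also be queried programmatically. The sketch below is illustrative only: it uses a hypothetical two-attribute ARFF document (SAMPLE_ARFF) rather than cm1.arff, and reads the relation name, attribute names, and attribute types from the object that loadarff returns.

```python
import io

from scipy.io import arff

# Hypothetical miniature ARFF document standing in for cm1.arff.
SAMPLE_ARFF = """@relation demo
@attribute loc_total numeric
@attribute defective {y,n}
@data
25.0,n
32.0,y
"""

data, meta = arff.loadarff(io.StringIO(SAMPLE_ARFF))

relation_name = meta.name        # name given on the @relation line
attribute_names = meta.names()   # attribute names, in file order
attribute_types = meta.types()   # attribute types, in the same order

print(relation_name)
for name, type_name in zip(attribute_names, attribute_types):
    print(f"{name}: {type_name}")
```

This can be handy for picking out the nominal columns before further processing, for example the defective label column in the dataset above.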
