Getting one or several columns from a CSV file in Python

2021-09-12 20:07:17, 3023 characters, 5503 views

Merge the feature values from three CSV files into a single file, appending the corresponding label to each row.

# -*- coding: utf-8 -*-
import csv

# Each source file and the label appended to every one of its rows
sources = [
    ("./dataset/f02.csv", "1"),
    ("./dataset/g03.csv", "2"),
    ("./dataset/normal05.csv", "3"),
]
header = ["feature%d" % i for i in range(1, 11)] + ["label"]

# Open the output file once; 'w' rather than 'a' so rerunning the script does not duplicate the header
with open("./dataset/datatime2.csv", "w", newline="") as rfile:
    writer = csv.writer(rfile)
    writer.writerow(header)
    for path, label in sources:
        # Text mode: under Python 3, 'rb' yields bytes, which cannot be joined with the str label
        with open(path, "r", newline="") as file:
            for row in csv.reader(file):
                if not row:  # skip blank lines instead of stopping at the first one
                    continue
                writer.writerow(row + [label])
                # To put the label first instead: writer.writerow([label] + row)
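For comparison, the same merge can be sketched with pandas `pd.concat`. This assumes, as the code above does, that each source file has no header row and exactly ten numeric columns; the helper name `merge_with_labels` and the in-memory sample rows are hypothetical stand-ins for the real files.

```python
import io

import pandas as pd

# Column names used in the merged file (same header as the csv-module version)
FEATURES = ["feature%d" % i for i in range(1, 11)]


def merge_with_labels(sources):
    """Hypothetical helper. sources: list of (path or file-like object, label) pairs."""
    frames = []
    for src, label in sources:
        # header=None: the source files are assumed to contain only data rows
        df = pd.read_csv(src, header=None, names=FEATURES)
        # assign() appends the label as the last column
        frames.append(df.assign(label=label))
    # Stack the frames vertically and renumber the index 0..n-1
    return pd.concat(frames, ignore_index=True)


# In-memory stand-ins for two source files (ten numeric fields per row)
row = ",".join(str(i) for i in range(10))
merged = merge_with_labels([
    (io.StringIO(row + "\n"), 1),
    (io.StringIO(row + "\n" + row + "\n"), 2),
])
# merged.to_csv("./dataset/datatime2.csv", index=False) would write the result to disk
```

Reading everything into memory is convenient for files of this size; for very large inputs the streaming csv-module loop uses less memory.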

Getting a single column from the CSV file: the code below collects every value in the column whose header is `label`.

import csv

filename = "./dataset/datatime2.csv"
with open(filename, "r", newline="") as file:
    # DictReader uses the header row as keys, so columns can be accessed by name
    reader = csv.DictReader(file)
    column = [row["label"] for row in reader]
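The csv module returns every field as a string, so numeric columns have to be converted explicitly. A small sketch with hypothetical in-memory data (the helper names `column_by_name` and `column_by_index` are illustrative, not part of any library) showing lookup by header name and by position:

```python
import csv
import io


def column_by_name(f, name):
    # DictReader takes field names from the first row (the header)
    return [row[name] for row in csv.DictReader(f)]


def column_by_index(f, index, skip_header=True):
    reader = csv.reader(f)
    if skip_header:
        next(reader, None)  # discard the header row
    # 'if row' skips blank lines, which csv.reader returns as empty lists
    return [row[index] for row in reader if row]


# Hypothetical sample data in the same layout as datatime2.csv (shortened)
data = "feature1,feature2,label\n0.5,1.2,1\n0.7,3.4,2\n"

labels = column_by_name(io.StringIO(data), "label")  # ['1', '2'] (strings)
labels_int = [int(v) for v in labels]                # [1, 2]
second = column_by_index(io.StringIO(data), 1)       # ['1.2', '3.4']
```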

Getting several columns from the CSV file: the code below gets all values except those in the column headed `label`.

import pandas as pd

odata = pd.read_csv(filename)
y = odata['label']
x = odata.drop(['label'], axis=1)  # all feature values except the label column
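Besides dropping a column, pandas can select a list of columns by name, or load only the needed columns through the `usecols` parameter of `read_csv`. A minimal sketch on hypothetical in-memory data, ending with conversion to NumPy arrays for a model:

```python
import io

import pandas as pd

# Hypothetical sample data with three features and a label column
data = "feature1,feature2,feature3,label\n0.1,0.2,0.3,1\n0.4,0.5,0.6,2\n"

# Select several columns by name after loading
df = pd.read_csv(io.StringIO(data))
subset = df[["feature1", "feature3"]]

# Or read only the needed columns in the first place (result keeps file order)
only_needed = pd.read_csv(io.StringIO(data), usecols=["feature2", "label"])

# NumPy arrays for training: feature matrix and label vector
x = df.drop(columns=["label"]).to_numpy()  # shape (2, 3)
y = df["label"].to_numpy()
```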

The data can also be turned into a list of NumPy arrays, one array per line (list[np.array]).

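import numpy as np

filename = "./dataset/datatime2.csv"
list1 = []
with open(filename, 'r') as file:
    file.readline()  # skip the header line
    a = file.readline()
    while a:
        # Each element is a string; use .astype(float) if numbers are needed
        c = np.array(a.strip("\n").split(","))
        list1.append(c)
        a = file.readline()  # advance, otherwise the loop never ends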
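The loop above produces arrays of strings. When every column is numeric, `np.loadtxt` can parse the file straight into a float array, which is then split into features and labels by slicing. A sketch assuming the label is the last column, using hypothetical in-memory data in place of the real file:

```python
import io

import numpy as np

# Hypothetical sample data; the label is assumed to be the last column
data = "feature1,feature2,label\n0.5,1.2,1\n0.7,3.4,2\n"

# skiprows=1 skips the header; every value is parsed as float
arr = np.loadtxt(io.StringIO(data), delimiter=",", skiprows=1)
x = arr[:, :-1]             # all columns except the last (features)
y = arr[:, -1].astype(int)  # last column (labels)
rows = list(arr)            # list of 1-D arrays, one per data row
```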

The data can also be turned into a TensorFlow tensor dataset.

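# -*- coding: utf-8 -*-
# TensorFlow 1.x queue-based pipeline; under TF 2.x use the tf.compat.v1 equivalents
# and call tf.compat.v1.disable_eager_execution() first
import tensorflow as tf

# The header line must be skipped when reading
filename = tf.train.string_input_producer(["./dataset/datatime.csv"])
reader = tf.TextLineReader(skip_header_lines=1)
key, value = reader.read(filename)
# Ten float feature columns (default 1.0) and one required int32 label column
record_defaults = [[1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], tf.constant([], dtype=tf.int32)]
col1, col2, col3, col4, col5, col6, col7, col8, col9, col10, col11 = tf.decode_csv(
    value, record_defaults=record_defaults)
features = tf.stack([col1, col2, col3, col4, col5, col6, col7, col8, col9, col10])

with tf.Session() as sess:
    # Start populating the filename queue.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    trainx = []
    trainy = []
    # The producer cycles through the file indefinitely, so rows repeat if it has fewer than 81000
    for i in range(81000):
        # Retrieve a single instance:
        example, label = sess.run([features, col11])
        trainx.append(example)
        trainy.append(label)
    coord.request_stop()
    coord.join(threads)

# Final length is 81000; each entry of trainx holds 10 features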
