Introduction to Data Analysis with pandas

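start time: when the trip began (str, format yyyy-mm-dd hh:mm:ss)

end time: when the trip ended (str, format yyyy-mm-dd hh:mm:ss)

trip duration: length of the trip (int, in seconds)

start station: where the trip began (for example, Broadway and Barry Avenue)

end station: where the trip ended (for example, Sedgwick Street and North Avenue)

user type: subscriber/registered or customer/casual

gender: male or female

birth year: the rider's year of birth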
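To make the column layout concrete, here is a minimal sketch that builds a two-row sample with the fields listed above and parses the time strings. The rows and their values are invented for illustration and are not taken from the real dataset; the capitalisation of the category values is also an assumption.

```python
import pandas as pd

# Two made-up rows that follow the column layout described above.
sample = pd.DataFrame({
    "start time": ["2017-01-01 09:07:57", "2017-06-23 15:09:32"],
    "end time": ["2017-01-01 09:20:53", "2017-06-23 15:14:53"],
    "trip duration": [776, 321],
    "start station": ["Broadway & Barry Ave", "Sedgwick St & North Ave"],
    "end station": ["Sedgwick St & North Ave", "Broadway & Barry Ave"],
    "user type": ["Subscriber", "Customer"],
    "gender": ["Male", "Female"],
    "birth year": [1992, 1985],
})

# The time columns are plain strings, so convert them before any date arithmetic.
start = pd.to_datetime(sample["start time"], format="%Y-%m-%d %H:%M:%S")
end = pd.to_datetime(sample["end time"], format="%Y-%m-%d %H:%M:%S")

# Sanity check: end minus start, in seconds, should match "trip duration".
elapsed_seconds = (end - start).dt.total_seconds().astype(int)
print(elapsed_seconds.tolist())
print(sample["trip duration"].tolist())
```

A check like this is a cheap way to confirm that the duration column really is in seconds before summing or averaging it.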

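When you receive a dataset to analyze, you usually pose a set of questions around the topic of the analysis and draw your conclusions from the answers. For the bike-share data, the questions are as follows (note: I copied these questions from elsewhere):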

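1) Which month is most common in the start times (the start time column)?

2) Which day of the week (for example Monday or Tuesday) is most common in the start times?

3) Which hour of the day is most common in the start times?

4) What is the total trip duration, and what is the mean trip duration?

5) Which start station is the most popular, and which end station is the most popular?

6) Which trip is the most popular (that is, which combination of start station and end station occurs most often)?

7) How many users are there of each user type?

8) How many users are there of each gender?

9) What are the earliest, the latest, and the most common birth years?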

Import the required packages

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
city_data=

Read the data

def read_file(city_name):
    # Look up the CSV path for the given city and load it into a DataFrame.
    data = pd.read_csv(city_data[city_name])
    return data, city_name
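As a usage sketch for read_file: the contents of city_data are not shown in the post, so the mapping below, including the "chicago" key and the file name, is a hypothetical placeholder. The example writes a one-row CSV to a temporary directory so that it runs without the real data files.

```python
import os
import tempfile

import pandas as pd

# Write a tiny CSV so the example runs without the real data files.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "chicago.csv")
with open(csv_path, "w", encoding="utf-8") as f:
    f.write("start time,trip duration\n")
    f.write("2017-01-01 09:07:57,776\n")

# Hypothetical mapping from city name to CSV path (placeholder values).
city_data = {"chicago": csv_path}


def read_file(city_name):
    """Load the CSV registered for city_name and return it with the name."""
    data = pd.read_csv(city_data[city_name])
    return data, city_name


data, city_name = read_file("chicago")
print(city_name, data.shape)
```

Passing a name that is not a key of city_data raises KeyError, so a caller that takes the city from user input would probably want to validate it first.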

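To answer the questions above, we define a few helper functions:

1. time_stats() answers:

1) Which month is most common in the start times (the start time column)?

2) Which day of the week (for example Monday or Tuesday) is most common in the start times?

3) Which hour of the day is most common in the start times?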

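def time_stats(data):
    # The timestamps look like 'yyyy-mm-dd hh:mm:ss', so characters 5-6 hold the month.
    data['start month'] = data['start time'].str[5:7]
    common_start_month = int(data['start month'].mode().values[0])  # mode = most frequent value
    # Parse the strings once and reuse the result for weekday and hour.
    start_time = pd.to_datetime(data['start time'])
    # dt.weekday runs from 0 (Monday) to 6 (Sunday); add 1 so Monday is reported as day 1.
    data['week_day'] = start_time.dt.weekday + 1
    common_week_day = int(data['week_day'].mode().values[0])
    data['hour'] = start_time.dt.hour
    common_hour = int(data['hour'].mode().values[0])
    print('The most common start month is month %d' % common_start_month)
    print('The most common start day of the week is day %d (1 = Monday, 7 = Sunday)' % common_week_day)
    print('The most common start hour is hour %d' % common_hour)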
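As an alternative sketch of the same three statistics, the version below derives everything from a single parsed datetime column and reports the weekday by name through dt.day_name(), which avoids having to remember that dt.weekday counts from 0 for Monday. The three sample timestamps are made up for the example.

```python
import pandas as pd

# Made-up sample: two Monday-morning trips in June and one Tuesday trip in July.
data = pd.DataFrame({
    "start time": [
        "2017-06-05 08:15:00",
        "2017-06-12 08:40:00",
        "2017-07-04 17:05:00",
    ]
})

# Parse once, then read month, weekday name and hour from the .dt accessor.
start = pd.to_datetime(data["start time"])

common_month = int(start.dt.month.mode().iloc[0])
common_day = start.dt.day_name().mode().iloc[0]
common_hour = int(start.dt.hour.mode().iloc[0])

print(common_month, common_day, common_hour)
```

Note that mode() returns every tied value in sorted order, so taking the first element picks the smallest one when there is a tie.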

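2. station_stats() answers:

4) What is the total trip duration, and what is the mean trip duration?

5) Which start station is the most popular, and which end station is the most popular?

6) Which trip is the most popular (that is, which combination of start station and end station occurs most often)?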

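def station_stats(data):
    # Trip durations are recorded in seconds.
    sum_trip_duration = data['trip duration'].sum()
    mean_trip_duration = data['trip duration'].mean()
    # mode() can return several values on a tie; the first one is reported below.
    common_end_station = data['end station'].mode().values
    common_start_station = data['start station'].mode().values
    # Join both stations into one label so each start/end pair is counted as a unit.
    data['trip line'] = data['start station'] + '->' + data['end station']
    common_trip_line = data['trip line'].mode().values
    print('Total trip duration: %f seconds, mean trip duration: %f seconds' % (sum_trip_duration, mean_trip_duration))
    print('Most popular start station: %s; most popular end station: %s' % (common_start_station[0], common_end_station[0]))
    print('Most popular trip: %s' % (common_trip_line[0]))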
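Two variations that may be useful here, shown as a sketch on made-up data: counting start/end pairs with groupby instead of building a combined string column, and converting the second counts to Timedelta values so they print in a readable form. The station names "A" and "B" are placeholders.

```python
import pandas as pd

# Made-up sample: two trips from A to B and one from B to A.
data = pd.DataFrame({
    "trip duration": [600, 1200, 300],
    "start station": ["A", "A", "B"],
    "end station": ["B", "B", "A"],
})

# Count each (start, end) pair directly; the result is indexed by the pair.
pair_counts = data.groupby(["start station", "end station"]).size()
top_start, top_end = pair_counts.idxmax()
top_count = int(pair_counts.max())

# Convert the second counts to Timedelta for human-readable output.
total = pd.to_timedelta(int(data["trip duration"].sum()), unit="s")
mean = pd.to_timedelta(float(data["trip duration"].mean()), unit="s")

print(top_start, "->", top_end, top_count)
print(total, mean)
```

The groupby form also leaves the DataFrame unchanged, whereas adding a 'trip line' column modifies the caller's data in place.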

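3. usr_stats() answers:

7) How many users are there of each user type?

8) How many users are there of each gender?

9) What are the earliest, the latest, and the most common birth years?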

def usr_stats(data):
    gender = data['gender'].value_counts()
    user_type = data['user type'].value_counts()
    latest_year = int(data['birth year'].max())
    earliest_year = int(data['birth year'].min())
    # mode() returns an array; take the first element so int() also works when years tie.
    common_year = int(data['birth year'].mode().values[0])
    print('**** Gender distribution: ****\n', gender)
    print('**** User type distribution: ****\n', user_type)
    print('Earliest birth year: %d, latest birth year: %d, most common birth year: %d' % (earliest_year, latest_year, common_year))
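The post does not say whether the gender and birth year columns contain missing values. Assuming they might, the sketch below shows one way to make the handling explicit: value_counts(dropna=False) reports the number of missing entries alongside the categories, and dropna() removes missing birth years before the statistics are computed. The sample rows are invented.

```python
import numpy as np
import pandas as pd

# Made-up sample with one rider whose gender and birth year are missing.
data = pd.DataFrame({
    "user type": ["Subscriber", "Customer", "Subscriber"],
    "gender": ["Male", np.nan, "Female"],
    "birth year": [1990.0, np.nan, 1990.0],
})

# dropna=False keeps a separate count for missing values.
gender_counts = data["gender"].value_counts(dropna=False)
missing_gender = int(data["gender"].isna().sum())

# Drop missing years explicitly before computing min, max and mode.
birth_year = data["birth year"].dropna()
earliest_year = int(birth_year.min())
latest_year = int(birth_year.max())
common_year = int(birth_year.mode().iloc[0])

print(gender_counts)
print(earliest_year, latest_year, common_year)
```

If a column could be entirely empty, birth_year would be an empty Series and the int() conversions would fail, so a real script would want to check birth_year.empty first.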

Finally, run the Python script to get the results.
