Python (10): Text Processing and Regular Expressions

os.getcwd() returns the current working directory.

>>> import os
>>> os.getcwd()


os.listdir(directory) returns a list of the names of the files and subdirectories in directory.

>>> a = os.getcwd()
>>> os.listdir(a)
['dlls', 'doc', 'include', 'lib', 'libs', 'license.txt', 'news.txt', 'python.exe', 'python3.dll', 'python36.dll', 'pythonw.exe', 'readme.txt', 'scripts', 'tcl', 'tools', 'vcruntime140.dll']
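Since os.listdir() returns files and subdirectories mixed together, it can help to separate them. The following is a minimal sketch of one way to do that; the helper name split_entries and the temporary directory contents are invented for this illustration.

```python
import os
import tempfile

def split_entries(directory):
    """Return (subdirectory names, file names) found directly in directory."""
    names = sorted(os.listdir(directory))
    dirs = [n for n in names if os.path.isdir(os.path.join(directory, n))]
    files = [n for n in names if os.path.isfile(os.path.join(directory, n))]
    return dirs, files

# Build a throwaway directory so the example does not depend on a real path.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "doc"))
    open(os.path.join(tmp, "readme.txt"), "w").close()
    dirs, files = split_entries(tmp)
    print(dirs, files)
```

The names returned by os.listdir() are relative to the directory listed, which is why each one is joined back onto directory before it is tested.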

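os.stat(path) returns a tuple-like os.stat_result object holding information about the file:

st_mode: file type and permission bits

st_ino: inode number (Unix)

st_dev: device identifier

st_nlink: number of hard links (Unix)

st_uid: user ID of the owner

st_gid: group ID of the owner

st_size: file size in bytes

st_atime: time of last access

st_mtime: time of last modification

st_ctime: creation time on Windows; on Unix, the time of the last metadata change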

>>> b = os.listdir(a)
>>> os.stat(b[1])
os.stat_result(st_mode=16895, st_ino=281474976800257, st_dev=3537546670, st_nlink=1, st_uid=0, st_gid=0, st_size=0, st_atime=1486357558, st_mtime=1486357558, st_ctime=1486357558)
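The time fields are seconds since the epoch, so they usually need formatting before they are readable. Below is a rough sketch of reading st_size and st_mtime; the helper name describe_file and the sample file are made up for this example and are not part of the os module.

```python
import os
import tempfile
import time

def describe_file(path):
    """Return (size in bytes, last-modified time as a string) for path."""
    info = os.stat(path)
    modified = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(info.st_mtime))
    return info.st_size, modified

# Create a throwaway file so the example does not depend on any real path.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "w") as f:
        f.write("hello")
    size, modified = describe_file(sample)
    print(size, modified)
```

Accessing the fields by name (info.st_size) is generally easier to read than indexing the result as a plain tuple.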

os.path.split(path) splits path into a (head, tail) pair according to the current operating system's path conventions and returns it as a tuple.

>>> os.path.split(a)

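In effect, it splits off the last file or directory name from the rest of the path.
os.path.join(path, *paths) joins one or more names into a path that follows the current operating system's conventions.
os.path.normcase(path) normalizes the case of a path. On Unix, file names are case-sensitive, so the path is returned unchanged. On Windows, where the operating system ignores case when comparing file names, it returns the path in all lowercase (and converts forward slashes to backslashes).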
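To see how these three functions differ between platforms without switching machines, one option is to call the platform-specific modules directly: os.path is ntpath on Windows and posixpath on Unix, and both can be imported anywhere. The sketch below does this; the paths are illustrative only.

```python
import ntpath      # Windows path rules, importable on any platform
import posixpath   # Unix path rules, importable on any platform

# split: separate the last component from the rest
win_head, win_tail = ntpath.split(r"C:\Python36\Doc")
unix_head, unix_tail = posixpath.split("/usr/lib/python3")
print(win_head, win_tail)    # C:\Python36 Doc
print(unix_head, unix_tail)  # /usr/lib python3

# join: glue components together with the platform's separator
win_joined = ntpath.join("C:\\Python36", "Doc", "index.html")
unix_joined = posixpath.join("/usr", "lib", "python3")
print(win_joined)   # C:\Python36\Doc\index.html
print(unix_joined)  # /usr/lib/python3

# normcase: lowercases (and flips slashes) on Windows, no-op on Unix
win_norm = ntpath.normcase("C:/Python36/Doc")
unix_norm = posixpath.normcase("/usr/Lib/Python3")
print(win_norm)   # c:\python36\doc
print(unix_norm)  # /usr/Lib/Python3
```

In ordinary code, os.path is the right choice, since it picks the correct rules for the platform the script runs on.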
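os.walk(top, topdown=True, onerror=None, followlinks=False)
Walks a directory tree either top-down or bottom-up. For each directory it yields a 3-tuple (dirpath, dirnames, filenames).
dirpath is a string holding the path to the directory.
dirnames is a list of the names of the subdirectories in dirpath, excluding "." and "..".
filenames is a list of the names of the non-directory files in dirpath.
To find all PDF files under a directory: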

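#!/usr/bin/env python3
import os  # operating-system interfaces; os.path is available through it
import re  # regular expressions

def print_pdf(root, dirs, files):
    """Print the full path of every PDF file in files."""
    for file in files:
        path = os.path.join(root, file)  # build the full path
        path = os.path.normcase(path)  # normalize case (lowercases on Windows)
        if re.search(r"\.pdf$", path):  # does the path end in .pdf?
            print(path)

# Walk the current directory tree and print every PDF file found
for root, dirs, files in os.walk('.'):
    print_pdf(root, dirs, files)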
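A variation that may be handy is to collect the matches into a list instead of printing them, and to compare the extension case-insensitively with a plain string method rather than a regular expression. This is only a sketch under those assumptions; find_by_suffix and the sample file names are hypothetical.

```python
import os
import tempfile

def find_by_suffix(top, suffix=".pdf"):
    """Collect paths under top whose file name ends with suffix, ignoring case."""
    matches = []
    for root, dirs, files in os.walk(top):
        dirs.sort()  # visit subdirectories in a stable order
        for name in sorted(files):
            if name.lower().endswith(suffix):
                matches.append(os.path.join(root, name))
    return matches

# Build a small throwaway tree to search.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.pdf", "notes.txt", os.path.join("sub", "B.PDF")):
        open(os.path.join(tmp, rel), "w").close()
    found = [os.path.relpath(p, tmp) for p in find_by_suffix(tmp)]
    print(found)
```

Sorting dirs in place works because, in top-down mode, os.walk() consults that same list when deciding which subdirectories to visit next.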

A few simple, practical regular expression examples:

import re  # the regular expression module
# A tuple of strings to search through
s = ('xxx','abcxxxabc','xyx','abc','x.x','axa','axxxxa','axxya')
a = filter((lambda s: re.match(r"xxx",s)), s)  # match: succeeds only at the start of the string
print(*a)  # xxx
a = filter((lambda s: re.search(r"xxx",s)), s)  # search: finds the pattern anywhere
print(*a)  # xxx abcxxxabc axxxxa
a = filter((lambda s: re.search(r"x.x",s)), s)  # search: '.' matches any character
print(*a)  # xxx abcxxxabc xyx x.x axxxxa
a = filter((lambda s: re.search(r"x\.x",s)), s)  # search: '\.' matches a literal '.'
print(*a)  # x.x
a = filter((lambda s: re.search(r"x.*x",s)), s)  # search: '*' means zero or more repetitions
print(*a)  # xxx abcxxxabc xyx x.x axxxxa axxya
a = filter((lambda s: re.search(r"x.+x",s)), s)  # search: '+' means one or more repetitions
print(*a)  # xxx abcxxxabc xyx x.x axxxxa
a = filter((lambda s: re.search(r"c+",s)), s)  # search: at least one 'c'
print(*a)  # abcxxxabc abc
# [] denotes a set of characters to match; a leading ^ inside it negates the set
# Anchoring with ^ and $ makes the pattern cover the whole string: no 'c' from start to end
a = filter((lambda s: re.search(r"^[^c]*$",s)), s)  # search: strings containing no 'c'
print(*a)  # xxx xyx x.x axa axxxxa axxya
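The difference between matching at the start, matching anywhere, and matching the entire string is easy to mix up, so a small side-by-side comparison may help. The sketch below uses an illustrative word list and the standard re.match, re.search and re.fullmatch functions.

```python
import re

words = ("abc", "abcdef", "xyzabc")

# re.match: the pattern must match at the beginning of the string
starts_with = [w for w in words if re.match(r"abc", w)]
# re.search: the pattern may match anywhere in the string
contains = [w for w in words if re.search(r"abc", w)]
# re.fullmatch: the pattern must match the whole string
exact = [w for w in words if re.fullmatch(r"abc", w)]

print(starts_with)  # ['abc', 'abcdef']
print(contains)     # ['abc', 'abcdef', 'xyzabc']
print(exact)        # ['abc']
```

Writing re.search(r"^abc$", w) gives the same result as re.fullmatch here, which is the idea behind the ^ and $ anchors.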
