Processing Every Word in a File with Python 3

'''
Created on Dec 21, 2012
Process every word in a file
@author: liury_lab
'''
import re

# Built-in open() decodes the UTF-8 text; "with" closes the file afterwards
with open('d:/text.txt', encoding='utf-8') as the_file:
    for line in the_file:
        for word in line.split():  # split on runs of whitespace
            print(word, end="|")

# If the definition of a word changes, use a regular expression instead,
# e.g. a word is a sequence of letters, digits, hyphens or apostrophes
# (\w matches letters, digits and the underscore)
print()
print('*' * 72)
re_word = re.compile(r"[\w'-]+")
with open('d:/text.txt', encoding='utf-8') as the_file:
    for line in the_file:
        for word in re_word.finditer(line):
            print(word.group(0), end="|")

# Wrap the logic in an iterator (a generator function)
def words_of_file(file_path, line_to_words=str.split):
    with open(file_path, encoding='utf-8') as the_file:
        for line in the_file:
            for word in line_to_words(line):
                yield word

print()
print('*' * 72)
for word in words_of_file('d:/text.txt'):
    print(word, end='|')

def words_by_re(file_path, repattern=r"[\w'-]+"):
    re_word = re.compile(repattern)
    def line_to_words(line):
        for mo in re_word.finditer(line):
            yield mo.group(0)  # The original book used return here; the results were wrong, so it was changed to yield
    return words_of_file(file_path, line_to_words)

print()
print('*' * 72)
for word in words_by_re('d:/text.txt'):
    print(word, end='|')
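
The two word definitions can be compared without needing d:/text.txt. The minimal sketch below assumes the generators are reworked to take any iterable of lines instead of a path. That iterable can be an open file or an io.StringIO. The names words_of_lines and words_by_re_lines are hypothetical, not part of the original code.

```python
import io
import re
from typing import Callable, Iterable, Iterator


def words_of_lines(lines: Iterable[str],
                   line_to_words: Callable[[str], Iterable[str]] = str.split) -> Iterator[str]:
    """Hypothetical variant of words_of_file: yield every word from an iterable of lines."""
    for line in lines:
        yield from line_to_words(line)


def words_by_re_lines(lines: Iterable[str], repattern: str = r"[\w'-]+") -> Iterator[str]:
    """Hypothetical variant of words_by_re: yield every match of repattern."""
    re_word = re.compile(repattern)

    def line_to_words(line: str) -> Iterator[str]:
        for mo in re_word.finditer(line):
            yield mo.group(0)

    return words_of_lines(lines, line_to_words)


text = "It's a well-known fact, isn't it?\nYes: 42 times.\n"

split_words = list(words_of_lines(io.StringIO(text)))   # whitespace definition
re_words = list(words_by_re_lines(io.StringIO(text)))   # regex definition

print("|".join(split_words))  # It's|a|well-known|fact,|isn't|it?|Yes:|42|times.
print("|".join(re_words))     # It's|a|well-known|fact|isn't|it|Yes|42|times
```

With str.split, punctuation stays attached to the word ("fact,", "it?"). The regex keeps only letters, digits, hyphens and apostrophes. Taking lines rather than a path separates word splitting from file handling. To read the real file, pass an open file object, e.g. with open('d:/text.txt', encoding='utf-8') as f: print('|'.join(words_by_re_lines(f))).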

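The comment in words_by_re says that return gave wrong results. The small sketch below, using the same pattern, shows why. A return inside the loop exits on the first match and hands back a single string, so words_of_file then iterates over that string one character at a time. For a line with no match the function returns None, and iterating over None raises TypeError. The function names here are hypothetical. re_word.findall(line) is an equivalent alternative that returns all matches as a list.

```python
import re

re_word = re.compile(r"[\w'-]+")


def line_to_words_return(line):
    # Wrong: exits on the first match and returns one plain string (or None)
    for mo in re_word.finditer(line):
        return mo.group(0)


def line_to_words_yield(line):
    # Right: a generator that produces every match in turn
    for mo in re_word.finditer(line):
        yield mo.group(0)


line = "hello big world"
print(list(line_to_words_return(line)))  # ['h', 'e', 'l', 'l', 'o']
print(list(line_to_words_yield(line)))   # ['hello', 'big', 'world']
print(re_word.findall(line))             # ['hello', 'big', 'world']
print(line_to_words_return("!!!"))       # None
```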