Python Text Processing (1)

Processing One Character at a Time

Solution:

Build a list

thestring='abcdefg'
thelist=list(thestring)
print thelist

Result

['a', 'b', 'c', 'd', 'e', 'f', 'g']

Iterate with a for loop

thestring='abcdefg'
for c in thestring:
    print c

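Use map, or the equivalent list comprehension [ord(c) for c in thestring] (ord converts a character to its character value, for example a to 97)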

thestring='abcdefg'
results=map(ord,thestring)
print results
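The snippets above use Python 2 syntax. As a rough sketch for Python 3 (an assumption about the reader's interpreter, not part of the original notes): map returns a lazy iterator there, so it has to be wrapped in list before printing. The variable name same is introduced here only for illustration.

```python
thestring = 'abcdefg'

# List comprehension: works the same way on Python 2 and Python 3.
results = [ord(c) for c in thestring]
print(results)

# On Python 3, map() is lazy, so wrap it in list() to see the values.
same = list(map(ord, thestring))
print(same == results)
```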

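Discussion:

To get the set of all characters in a string, call sets.Set (since Python 2.4 the built-in set type does the same job)

import sets
magic_chars=sets.Set('abracadabra')
poppins_chars=sets.Set('supercalifragilisticexpialidocious')
print ''.join(magic_chars & poppins_chars)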
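A minimal sketch of the same intersection using the built-in set type, which is what Python 3 offers since the sets module was removed there. Sorting the result is an addition made here so the output order is predictable; sets themselves are unordered. The name common is hypothetical.

```python
# Built-in set replaces the old sets module (assumes Python 2.4+ or Python 3).
magic_chars = set('abracadabra')
poppins_chars = set('supercalifragilisticexpialidocious')

# Characters that occur in both strings; sorted() fixes the output order.
common = ''.join(sorted(magic_chars & poppins_chars))
print(common)
```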

Converting Between Characters and Character Values

Solution:

Use the built-in functions ord and chr

print ord('a')
print chr(97)
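The two functions are inverses of each other, which the following sketch illustrates with a round trip. It assumes Python 3, where chr accepts any Unicode code point; on Python 2, chr only covers 0 to 255 and unichr handles the rest. The names codes and restored are introduced here for illustration.

```python
# Character -> integer code for every character in a string.
codes = [ord(c) for c in 'abc']

# Integer code -> character, joined back into a string.
restored = ''.join(chr(n) for n in codes)

print(codes)
print(restored)

# Python 3 assumption: chr() also works beyond the ASCII range.
print(ord('\u4e2d'), chr(0x4e2d) == '\u4e2d')
```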

Testing Whether an Object Is String-Like (whether it behaves like a string)

Solution:

Check with isinstance and basestring

def isastring(anobj):
    return isinstance(anobj,basestring)
anobj='abcde'
print isastring(anobj)
otherobj=list(anobj)
print isastring(otherobj)
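basestring exists only in Python 2. The sketch below, written for Python 3, contrasts a plain isinstance check against str with a duck-typing test that simply tries to concatenate the object with an empty string. The function names is_a_string and is_string_like are hypothetical, and the duck-typing test is one possible approach rather than the only one.

```python
from collections import UserString


def is_a_string(anobj):
    """Type check: True only for str and its subclasses (Python 3)."""
    return isinstance(anobj, str)


def is_string_like(anobj):
    """Behaviour check: True if anobj can be concatenated with a str."""
    try:
        anobj + ''
    except (TypeError, ValueError):
        return False
    return True


wrapped = UserString('abcde')      # acts like a string but is not a str
print(is_a_string('abcde'), is_string_like('abcde'))
print(is_a_string(wrapped), is_string_like(wrapped))
print(is_a_string(list('abcde')), is_string_like(list('abcde')))
```

The type check rejects UserString even though it behaves like a string, which is the case the behaviour check is meant to catch.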

Aligning Strings (left, center, right)

Solution:

Use the string methods ljust, rjust, and center; the argument gives the width

print '|','hey'.ljust(20),'|','hey'.rjust(20),'|','hey'.center(20),'|'

Discussion:

To pad with a character other than a space, pass it as a second argument

print 'hey'.center(20,'+')
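The same padding can be expressed with str.format, which may be handier when the aligned text is part of a larger template. This is a sketch, assuming Python 2.7 or Python 3; the variable names are introduced here. In the format spec, < means left, > means right, ^ means center, and a character placed before the alignment sign is used as the fill.

```python
word = 'hey'

left = '{:<20}'.format(word)       # same result as word.ljust(20)
right = '{:>20}'.format(word)      # same result as word.rjust(20)
centered = '{:+^20}'.format(word)  # centered, padded with '+'

print('|' + left + '|' + right + '|' + centered + '|')
```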

Stripping Whitespace from Both Ends of a String

Solution:

Use the string methods lstrip, rstrip, and strip

x='     hey     '
print '|',x.lstrip(),'|',x.rstrip(),'|',x.strip(),'|'
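These methods also accept an optional argument listing the characters to remove instead of whitespace. A small sketch (the sample string and the name stripped are made up for illustration); note that the argument is treated as a set of characters, not as a prefix or suffix, and stripping stops at the first character that is not in it.

```python
x = 'xyxxyy hejyx yyx'

# Remove any run of 'x' and 'y' from both ends; the inner spaces stop it.
stripped = x.strip('xy')
print('|' + stripped + '|')
```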

Joining Strings

Solution:

Use the string method join

x=['i','love','python']
largestring=' '.join(x)
print largestring

Likewise, the basic % operator achieves the same effect

x=('i','love','python')
largestring='%s %s %s !' % x
print largestring

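Discussion:

Concatenating with the + operator may look more concise, but remember that Python strings are immutable: every change creates a new copy of the string. When a large number of small pieces are added together one at a time, the total amount of copying grows roughly with the square of the final length, so join becomes the necessary choice. When extra content has to be inserted into the new string, % is more convenient.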
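The difference can be sketched as follows. Both approaches produce the same string; what differs is how much copying happens along the way. Some interpreters (CPython among them) optimize += in certain cases, so treat this as an illustration of the principle rather than a benchmark. The names slow, fast, and formatted are introduced here.

```python
pieces = ['i', 'love', 'python']

# Repeated concatenation: each += may copy everything accumulated so far.
slow = ''
for piece in pieces:
    slow += piece

# join computes the final size once and copies each piece a single time.
fast = ''.join(pieces)

# % is convenient when fixed text surrounds the pieces.
formatted = '%s %s %s !' % tuple(pieces)

print(slow == fast)
print(formatted)
```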

Reversing a String by Characters or by Words

Solution:

Use a slice with a step of -1

astring='i love python'
revchars=astring[::-1]
print revchars
Result: nohtyp evol i

To reverse by words, build a list of the words, reverse the list, and merge it back with join

astring='i love python'
revwords=' '.join(astring.split()[::-1])
print revwords
Result: python love i

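To reverse by words while keeping the original whitespace, split the string with a regular expression. Because the pattern is in a capturing group, the whitespace runs are kept in the list, so the pieces are joined with an empty string

import re
astring='i love python'
revwords=''.join(re.split(r'(\s+)',astring)[::-1])
print revwords
Result: python love i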
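The effect of keeping the whitespace is easier to see when the input has uneven spacing. The sketch below uses a made-up sample string and the built-in reversed in place of the [::-1] slice; the names revnormal and revkeep are introduced here.

```python
import re

astring = 'i  love   python'   # two spaces, then three spaces

# split() with no argument discards the original whitespace runs.
revnormal = ' '.join(reversed(astring.split()))

# A capturing group makes re.split keep the separators in the result list,
# so joining with '' restores each whitespace run exactly.
parts = re.split(r'(\s+)', astring)
revkeep = ''.join(reversed(parts))

print(repr(revnormal))
print(repr(revkeep))
```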

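Checking Whether a String Contains Any Character from a Set

Solution:

The simplest approach is as follows

def containany(seq,aset):
    # Return True as soon as one item of seq is found in aset
    for c in seq:
        if c in aset:
            return True
    return False
seq='abc'
aset='hjkyuia'
print containany(seq,aset)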

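You can also use the itertools module from the standard library, although it is essentially the same approach

import itertools
def containany(seq,aset):
    # ifilter yields only the items of seq that are in aset
    for item in itertools.ifilter(aset.__contains__,seq):
        return True
    return False
seq='abc'
aset='ghjka'
print containany(seq,aset)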
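On Python 3, itertools.ifilter no longer exists. Two shorter variants that run on Python 2.5+ and Python 3 are sketched below: one with the built-in any, which stops at the first match just like the loops above, and one with a set intersection, which always examines every item. The function names are hypothetical.

```python
def contains_any(seq, aset):
    """Return True if at least one item of seq is in aset (short-circuits)."""
    return any(c in aset for c in seq)


def contains_any_set(seq, aset):
    """Same answer via set intersection; requires hashable items."""
    return bool(set(aset).intersection(seq))


print(contains_any('abc', 'hjkyuia'))
print(contains_any_set('abc', 'xyz'))
```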

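Checking Whether a String Is Text or Binary

Solution:

There is no exact algorithm, but a heuristic works reasonably well: if the string contains a null byte, or if more than 30% of its characters have the high bit set or are unusual control codes, treat the data as binary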
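The heuristic above can be sketched as follows for Python 3 bytes objects. Only the null-byte rule and the 30% threshold come from the text; the function name looks_like_text, the choice of which bytes count as ordinary text, and the decision to treat empty input as text are all assumptions made for illustration.

```python
# Printable ASCII plus common whitespace/control characters (an assumption).
TEXT_BYTES = bytes(range(32, 127)) + b'\n\r\t\b\f'


def looks_like_text(data, threshold=0.30):
    """Guess whether a bytes object holds text rather than binary data."""
    if b'\0' in data:
        return False      # a null byte is treated as a sure sign of binary
    if not data:
        return True       # assumption: empty input counts as text
    # Delete every ordinary text byte; what is left is high-bit
    # or unusual control bytes.
    odd = data.translate(None, TEXT_BYTES)
    return len(odd) / len(data) <= threshold


print(looks_like_text(b'i love python\n'))
print(looks_like_text(b'\x00\x01\x02abc'))
print(looks_like_text(bytes(range(128, 256))))
```

A heuristic like this will misjudge some inputs, for example text in an encoding that uses many high-bit bytes, so the threshold is best treated as tunable.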

Controlling Case (case conversion)

Solution:

The upper and lower methods are straightforward; capitalize and title are often the more useful ones in practice

print 'one two three'.capitalize()
print 'one two three'.title()
Result:
One two three
One Two Three
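For comparison, here are the four methods side by side on a mixed-case input (the sample string is made up). Note that capitalize and title also lower-case the letters they do not capitalize.

```python
s = 'one tWo three'

upper = s.upper()              # every letter upper-case
lower = s.lower()              # every letter lower-case
capitalized = s.capitalize()   # first letter upper-case, the rest lower-case
titled = s.title()             # first letter of each word upper-case

print(upper)
print(lower)
print(capitalized)
print(titled)
```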
