Batch-converting the Total-Text dataset annotation format

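The annotation format of the Total-Text dataset differs from that of CTW-1500 and the ICDAR family. The latter store the coordinates directly in .txt files, whereas a Total-Text annotation looks like this (taking the ground truth of one image as an example):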

x: [[115 503 494 115]], y: [[322 346 426 404]], ornt: [u'm'], transcriptions: [u'naughty']
x: [[734 1058 1061 744]], y: [[360 369 449 430]], ornt: [u'm'], transcriptions: [u'nuris']
x: [[558 682 682 557]], y: [[370 375 404 398]], ornt: [u'm'], transcriptions: [u'nuris']
x: [[562 595 651 687 653 637 604 588]], y: [[347 304 305 360 366 334 332 361]], ornt: [u'c'], transcriptions: [u'naughty']
x: [[603 632 630 603]], y: [[408 413 426 423]], ornt: [u'h'], transcriptions: [u'est']
x: [[599 638 637 596]], y: [[419 422 441 437]], ornt: [u'h'], transcriptions: [u'1996']
x: [[583 602 633 656 679 648 594 558]], y: [[410 445 445 411 428 476 472 432]], ornt: [u'c'], transcriptions: [u'warung']
x: [[543 583 660 701 691 653 592 557]], y: [[347 288 288 347 358 308 302 355]], ornt: [u'#'], transcriptions: [u'#']
x: [[557 580 640 683 698 649 583 537]], y: [[419 470 481 422 432 497 491 432]], ornt: [u'#'], transcriptions: [u'#']

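Each line describes one text instance and stores, in order, all of its x coordinates, all of its y coordinates, the text orientation (h, m and c mark horizontal, multi-oriented and curved text; # marks a "do not care" region), and the transcription, i.e. the characters the region contains.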
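To make this layout concrete, here is a minimal sketch that parses one such line with a regular expression. It assumes each text instance occupies a single line of the form x: [[...]], y: [[...]], ornt: [u'...'], transcriptions: [u'...'], with the orientation and transcription written as Python 2 unicode literals in single quotes; a transcription containing a single quote may be written differently and would not match. The helper name parse_totaltext_line is hypothetical, not part of any official Total-Text tooling.

```python
import re

# Hypothetical pattern for one Total-Text ground-truth line (assumed layout):
# x: [[...]], y: [[...]], ornt: [u'...'], transcriptions: [u'...']
LINE_RE = re.compile(
    r"x:\s*\[\[([\d\s]+)\]\],\s*"
    r"y:\s*\[\[([\d\s]+)\]\],\s*"
    r"ornt:\s*\[u'(.*?)'\],\s*"
    r"transcriptions:\s*\[u'(.*)'\]"
)


def parse_totaltext_line(line):
    """Return the x list, y list, orientation and transcription of one line."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("unrecognised Total-Text line: %r" % line)
    xs = [int(v) for v in m.group(1).split()]  # space-separated x values
    ys = [int(v) for v in m.group(2).split()]  # space-separated y values
    return {"x": xs, "y": ys, "ornt": m.group(3), "transcription": m.group(4)}


sample = ("x: [[115 503 494 115]], y: [[322 346 426 404]], "
          "ornt: [u'm'], transcriptions: [u'naughty']")
print(parse_totaltext_line(sample))
```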

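CTW-1500 and ICDAR 2015, by contrast, give the coordinates directly in the form x1,y1,x2,y2,x3,y3,x4,y4,… (in ICDAR 2015 each line also ends with the transcription, where ### marks a region to be ignored). The task here is therefore to batch-convert the Total-Text annotations into the CTW-1500 style.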
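The heart of the conversion is interleaving the two coordinate lists of an instance into x1,y1,x2,y2,…. A minimal sketch, using a hypothetical helper to_ctw_line; the trailing comma is kept only to match the layout the script in this post writes, and is an assumption rather than a requirement of CTW-1500.

```python
def to_ctw_line(xs, ys):
    """Interleave x and y coordinates into 'x1,y1,x2,y2,...,' (trailing comma)."""
    if len(xs) != len(ys):
        raise ValueError("x and y must contain the same number of points")
    coords = []
    for x, y in zip(xs, ys):
        # each point contributes its x followed by its y
        coords.extend([str(x), str(y)])
    return ",".join(coords) + ","


print(to_ctw_line([115, 503, 494, 115], [322, 346, 426, 404]))
```

Checking the lengths up front guards against a malformed line silently producing a polygon with a missing coordinate.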

As usual, straight to the code:

# re: regular-expression library, used to pull the integers out of each field
import re
import os

root_path = './'
label_dir = os.path.join(root_path, 'train_rename_totaltext_labels_sqfree')

# file names without the extension, e.g. 'img1' for 'img1.txt'
_indexes = sorted([f.split('.')[0] for f in os.listdir(label_dir)])

for index in _indexes:
    print('processing: ' + index)
    anno_file = os.path.join(label_dir, index + '.txt')
    with open(anno_file, 'r') as f:
        # lines: the non-empty lines of this file, one text instance per line
        lines = [line for line in f.readlines() if line.strip()]

    all_list = []
    for i, line in enumerate(lines):
        # if i == 0:
        #     continue
        # parts: the comma-separated fields of the line (x, y, ornt, transcriptions)
        parts = line.strip().split(',')
        xy_list = []
        for a, part in enumerate(parts):
            # only the first two fields (x and y) hold coordinates
            if a > 1:
                break
            numberlist = re.findall(r'\d+', part)
            xy_list.extend(numberlist)
        # first half of xy_list = x coordinates, second half = y coordinates
        n = len(xy_list) // 2
        x_list = xy_list[:n]
        y_list = xy_list[n:]
        # interleave into x1, y1, x2, y2, ...
        single_list = [None] * (len(x_list) + len(y_list))
        single_list[::2] = x_list
        single_list[1::2] = y_list
        all_list.append(single_list)

    # overwrite the annotation file in place, one instance per line
    with open(anno_file, 'w') as w:
        for all_list_piece in all_list:
            for string in all_list_piece:
                w.write(string)
                w.write(',')
            w.write('\n')
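Note that the script above overwrites each label file in place, so running it twice, or on the wrong folder, destroys the original annotations. If you would rather keep the originals, one option is a sketch like the following, which applies essentially the same per-line logic to the file contents in memory and writes the result into a separate directory. The function names and the dst_dir parameter are hypothetical, and it assumes every non-empty line has at least the x and y fields.

```python
import os
import re


def convert_totaltext_text(text):
    """Convert the contents of one Total-Text label file into CTW-1500-style
    lines: the integers in the first two comma-separated fields are taken as
    the x and y coordinates and interleaved."""
    out_lines = []
    for line in text.splitlines():
        if not line.strip():
            continue
        parts = line.strip().split(',')
        xs = re.findall(r'\d+', parts[0])  # field 0: x: [[...]]
        ys = re.findall(r'\d+', parts[1])  # field 1: y: [[...]]
        coords = [c for pair in zip(xs, ys) for c in pair]
        out_lines.append(','.join(coords) + ',')
    return '\n'.join(out_lines) + '\n' if out_lines else ''


def convert_dir(src_dir, dst_dir):
    """Write converted copies of every .txt file in src_dir into dst_dir,
    leaving the originals untouched."""
    os.makedirs(dst_dir, exist_ok=True)
    for name in sorted(os.listdir(src_dir)):
        if not name.endswith('.txt'):
            continue
        with open(os.path.join(src_dir, name)) as f:
            converted = convert_totaltext_text(f.read())
        with open(os.path.join(dst_dir, name), 'w') as w:
            w.write(converted)
```

For example, convert_dir('./train_rename_totaltext_labels_sqfree', './train_ctw_style_labels') would leave the source folder unchanged; the output folder name here is made up.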

After running it, the annotation shown earlier becomes:

115,322,503,346,494,426,115,404,
734,360,1058,369,1061,449,744,430,
558,370,682,375,682,404,557,398,
562,347,595,304,651,305,687,360,653,366,637,334,604,332,588,361,
603,408,632,413,630,426,603,423,
599,419,638,422,637,441,596,437,
583,410,602,445,633,445,656,411,679,428,648,476,594,472,558,432,
543,347,583,288,660,288,701,347,691,358,653,308,592,302,557,355,
557,419,580,470,640,481,683,422,698,432,649,497,583,491,537,432,

This makes the subsequent processing much easier.
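As one example of that later processing, a converted file can be read back into polygons with a few lines of Python. This is a sketch assuming the one-instance-per-line, comma-separated layout produced by the conversion (a trailing comma is tolerated); parse_ctw_line and read_ctw_polygons are hypothetical helper names.

```python
def parse_ctw_line(line):
    """Turn 'x1,y1,x2,y2,...,' into [(x1, y1), (x2, y2), ...]."""
    # empty pieces (e.g. after the trailing comma) are skipped
    values = [int(v) for v in line.strip().split(',') if v.strip()]
    if len(values) % 2 != 0:
        raise ValueError("odd number of coordinates: %r" % line)
    # pair even-indexed values (x) with odd-indexed values (y)
    return list(zip(values[0::2], values[1::2]))


def read_ctw_polygons(path):
    """Read every non-empty line of a converted label file as one polygon."""
    polygons = []
    with open(path) as f:
        for line in f:
            if line.strip():
                polygons.append(parse_ctw_line(line))
    return polygons


print(parse_ctw_line("115,322,503,346,494,426,115,404,"))
```

Because the point count is not fixed, the same reader handles both the 4-point quadrilaterals and the 8-point curved polygons in the example above.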