Phrase frequencies in Tang poetry: the high-frequency words turn out to resemble those of Song ci

2022-07-19 03:54:14

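Now that the R program runs, I might as well run the same statistics on the Tang poems. (When I have time I would still like to rewrite it in C++. R is very concise, but the checks are not very precise.)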

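# Read the corpus, one line of the file per element.
l = scan("tangshi.txt", "character", sep = "\n")

l.len = nchar(l)

# Some lines hold only an author or a title, so keep lines longer than 10 characters.
# The text file is also not very tidy (it contains some ** and the like),
# so lines that are too long are excluded as well.
ci = l[l.len > 10 & l.len < 500]

# Split the lines into sentences at punctuation marks.
# A character class avoids a bare "?" inside an alternation;
# it lists both the full-width and the ASCII forms of the marks.
sentences = strsplit(ci, "[,。!?、,!?]")

sentences = unlist(sentences)

# Drop the empty strings left between adjacent punctuation marks.
sentences = sentences[sentences != ""]

s.len = nchar(sentences)

# Length, in characters, of the phrases to extract.
group = 2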
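The listing stops at `group = 2`, so the counting step itself is not shown. Here is a minimal sketch of one way to do it, assuming every run of `group` consecutive characters inside a sentence counts as a candidate phrase. The helper name `count_ngrams` and the `word`/`freq` data-frame layout are my own choices for illustration, not taken from the original program.

```r
# Hypothetical helper (not from the original post): count every run of
# `n` consecutive characters inside each sentence.
count_ngrams = function(sentences, n) {
  len = nchar(sentences)
  # Sentences shorter than n contain no run of n characters.
  keep = len >= n
  sentences = sentences[keep]
  len = len[keep]
  if (length(sentences) == 0) {
    return(data.frame(word = character(0), freq = integer(0),
                      stringsAsFactors = FALSE))
  }
  # For each sentence, take the substrings starting at 1, 2, ..., len - n + 1.
  grams = unlist(lapply(seq_along(sentences), function(i) {
    starts = seq_len(len[i] - n + 1)
    substring(sentences[i], starts, starts + n - 1)
  }))
  # Tabulate and sort from most to least frequent.
  counts = sort(table(grams), decreasing = TRUE)
  data.frame(word = names(counts), freq = as.integer(counts),
             stringsAsFactors = FALSE)
}
```

With the variables above this would be called as `top = count_ngrams(sentences, group)`, and `head(top, 100)` would give the most frequent entries. The quoted layout of the tables that follow looks like default `write.table` output, though the post does not show that step.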

Two-character phrases

"word" "freq"

"1" "何處" 1653

"2" "不知" 1457

"3" "萬里" 1439

"6" "千里" 1294

"7" "今日" 1150

"8" "不見" 1139

"9" "不可" 1133

"10" "春風" 1118

"11" "白雲" 1099

"12" "不得" 942

"13" "明月" 888

"14" "人間" 879

"15" "無人" 869

"16" "風吹" 831

"17" "故人" 784

"18" "惆悵" 768

"19" "秋風" 745

"20" "悠悠" 733

"21" "相思" 723

"22" "長安" 721

"23" "白日" 687

"24" "如何" 683

"25" "十年" 674

"26" "青山" 662

"27" "何人" 655

"28" "少年" 628

"29" "相逢" 627

"30" "平生" 585

"31" "寂寞" 584

"32" "天子" 584

"33" "天地" 581

"34" "**" 578

"35" "年年" 578

"36" "人不" 576

"37" "何事" 573

"38" "江上" 555

"39" "流水" 548

"40" "回首" 531

"41" "可憐" 531

"42" "主人" 521

"43" "如此" 520

"44" "白髮" 516

"45" "今朝" 513

"46" "從此" 503

"47" "日月" 502

"48" "月明" 502

"49" "行人" 500

"50" "落日" 493

"51" "不如" 492

"52" "將軍" 492

"53" "歸去" 489

"54" "日暮" 482

"55" "別離" 478

"56" "洛陽" 476

"57" "不能" 471

"58" "此時" 470

"59" "天下" 470

"60" "何時" 469

"61" "無事" 467

"62" "芳草" 466

"63" "江南" 463

"64" "相見" 462

"65" "歸來" 461

"66" "夕陽" 458

"67" "當時" 454

"68" "楊柳" 451

"69" "風雨" 448

"70" "》)" 445

"71" "東風" 436

"72" "洞庭" 433

"73" "青雲" 432

"74" "花落" 428

"75" "參差" 427

"76" "天涯" 426

"77" "芙蓉" 425

"78" "落花" 424

"79" "清風" 421

"80" "不是" 416

"81" "煙霞" 416

"82" "三十" 414

"83" "白頭" 413

"84" "桃花" 411

"85" "不相" 410

"86" "唯有" 407

"87" "何如" 404

"88" "南山" 397

"89" "誰能" 395

"90" "君不" 394

"91" "千年" 391

"92" "天上" 389

"93" "如今" 385

"94" "花開" 382

"95" "桃李" 380

"96" "與君" 380

"97" "此地" 378

"98" "殷勤" 378

"99" "浮雲" 376

"100" "君王" 375

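The list above contains entries such as "》)" that are leftover punctuation rather than phrases. One possible way to filter them out is to keep only strings made up entirely of Han characters. This is a sketch under the assumption that the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) is enough for this corpus; rarer characters outside that block would be dropped too. The name `is_han` is hypothetical.

```r
# Hypothetical helper: TRUE for strings consisting only of characters in
# the basic CJK Unified Ideographs block (U+4E00..U+9FFF).
is_han = function(x) {
  grepl("^[\u4e00-\u9fff]+$", x, perl = TRUE)
}
```

Applied to a frequency table with a `word` column, for example a data frame named `top`, the filter would read `top[is_han(top$word), ]`.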
Three-character phrases

"word" "freq"

"6" "君不見" 224

"11" "不知何" 127

"13" "行路難" 108

"14" "三千里" 108

"17" "不可見" 100

"22" "知何處" 90

"23" "在何處" 89

"24" "二十年" 87

"28" "三十六" 85

"30" "三十年" 75

"31" "無消息" 74

"32" "不相見" 73

"33" "何處去" 70

"34" "無一事" 70

"35" "洛陽城" 69

"36" "千萬里" 69

"38" "何處是" 68

"40" "水東流" 67

"44" "歸未得" 65

"45" "向人間" 63

"46" "歌一曲" 62

"49" "千里外" 61

"50" "一杯酒" 61

"52" "明月夜" 58

"53" "歸何處" 57

"54" "從此去" 56

"55" "東風吹" 56

"56" "今何在" 55

"57" "皮日休" 55

"58" "人不知" 55

"59" "春風吹" 54

"61" "不知誰" 53

"62" "草萋萋" 53

"63" "歸去來" 53

"64" "不得意" 52

"65" "人不見" 52

"66" "無人知" 52

"67" "長安道" 52

"68" "復何如" 51

"69" "人間事" 51

"70" "與君同" 51

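To put a number on how similar two high-frequency lists are, for instance the Tang list here and a Song ci list produced the same way, one option is the share of phrases the two top-k lists have in common. This is only a sketch: the function name `top_overlap` is hypothetical, both inputs are assumed to be character vectors already sorted by descending frequency, and the Jaccard index is my choice of measure, not something the post itself uses.

```r
# Hypothetical helper: overlap between the top-k entries of two ranked
# phrase lists `a` and `b` (character vectors, most frequent first).
top_overlap = function(a, b, k = 100) {
  a = unique(head(a, k))
  b = unique(head(b, k))
  shared = intersect(a, b)
  total = length(union(a, b))
  # Jaccard index: shared phrases divided by all distinct phrases.
  jaccard = if (total == 0) NA_real_ else length(shared) / total
  list(shared = shared, jaccard = jaccard)
}
```

A value near 1 would mean the two lists are almost identical, and a value near 0 that they share hardly any phrases.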