Examples: Collecting a Directory and a File into HDFS with Flume


Collecting a directory with Flume requires the HDFS cluster to be running first.
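A minimal sketch of bringing HDFS up and verifying it, assuming a standard Hadoop installation whose sbin directory is on the PATH:

start-dfs.sh
jps    # the NameNode and DataNode processes should be listed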

vi spool-hdfs.conf

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
# Note: never drop a file whose name was already processed into the monitored directory
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /root/logs2
a1.sources.r1.fileHeader = true

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/
a1.sinks.k1.hdfs.filePrefix = events-

# Control how often the target directory rolls over
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
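# Illustration (assuming an event stamped 2021-08-21 07:04): rounding down
# to the nearest 10 minutes gives 07:00, so the %y-%m-%d/%H%M escapes in
# hdfs.path above resolve to the bucket /flume/events/21-08-21/0700/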

# Control file rolling; whichever threshold is reached first triggers a roll:
# rollInterval is a time limit (seconds), rollSize a file-size limit (bytes),
# rollCount an event-count limit
a1.sinks.k1.hdfs.rollInterval = 3
a1.sinks.k1.hdfs.rollSize = 20
a1.sinks.k1.hdfs.rollCount = 5
a1.sinks.k1.hdfs.batchSize = 1
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Type of the generated files; the default is SequenceFile, while DataStream
# writes plain text
a1.sinks.k1.hdfs.fileType = DataStream

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

mkdir /root/logs2

The spooldir source watches the specified directory; whenever a new file appears in it, the file is collected.

Start command:

bin/flume-ng agent -c ./conf -f ./conf/spool-hdfs.conf -n a1 -Dflume.root.logger=INFO,console
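With the agent running, a quick smoke test looks like this (the file name is only an example, and the hdfs client is assumed to be configured on the same machine):

echo "hello flume" > /root/test1.log
mv /root/test1.log /root/logs2/
hdfs dfs -ls -R /flume/events

Once a file has been consumed, the spooldir source renames it in place with a .COMPLETED suffix, which is also why a file with an already-used name must never be dropped into the directory again.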

vi tail-hdfs.conf

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -f /root/logs/test.log
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/tailout/%y-%m-%d/%H-%M/
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.rollInterval = 3
a1.sinks.k1.hdfs.rollSize = 20
a1.sinks.k1.hdfs.rollCount = 5
a1.sinks.k1.hdfs.batchSize = 1
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Type of the generated files; the default is SequenceFile, while DataStream
# writes plain text
a1.sinks.k1.hdfs.fileType = DataStream

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

mkdir /root/logs
Start command:

bin/flume-ng agent -c conf -f conf/tail-hdfs.conf -n a1
The exec source runs a shell command (e.g. tail -f sx.log) and collects the file's new content in real time.

Script to simulate data generation:

while true; do date >> /root/logs/test.log; sleep 0.5; done

Or, as a script file:

#!/bin/bash
while true
do
    date >> /root/logs/test.log
    sleep 1
done
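With both the generator and the agent running, the sink output can be inspected as follows (an illustrative check; the exact file names depend on the roll settings above):

hdfs dfs -ls -R /flume/tailout
hdfs dfs -cat /flume/tailout/*/*/events-*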
