Flume Getting Started Guide
Original: https://cwiki.apache.org/confluence/display/FLUME/Getting+Started
What is Flume NG?
Flume NG aims to be significantly simpler, smaller, and easier to deploy than Flume OG. In doing so, we do not commit to backward compatibility of Flume NG with Flume OG. At this time, we welcome feedback from anyone interested in testing Flume NG for correctness, ease of use, and possible integration with other systems.
What has changed?
Flume NG (Next Generation) retains many of the original concepts, but its implementation differs substantially from Flume OG (Original Generation). If you are already familiar with Flume, here is what you will want to know:
- You still have sources and sinks, and they still do the same thing. They are now connected by channels.
- Channels are pluggable and dictate durability. Flume NG ships with an in-memory channel for fast but non-durable event delivery, and a file-based channel for durable event delivery.
- There are no more logical or physical nodes. All physical nodes are called agents, and an agent can run zero or more sources and sinks.
- There is no master, and no ZooKeeper dependency. At this time, Flume runs with a simple file-based configuration system.
- Everything is a plugin: some face end users, some face tool and system developers. Pluggable components include channels, sources, sinks, interceptors, sink processors, and event serializers.
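Because all wiring lives in the file-based configuration, swapping a component is just a property change. A minimal skeleton, using made-up placeholder names (a1, s1, c1, k1), might look like this:

```properties
# One agent named a1 with one source, one channel, and one sink.
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# Each component is selected purely by its "type" property; changing
# "memory" to "file" here would change durability without any code changes.
a1.channels.c1.type = memory

a1.sources.s1.type = netcat
a1.sources.s1.bind = 127.0.0.1
a1.sources.s1.port = 44444
a1.sources.s1.channels = c1

a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```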
Getting Flume NG
Flume is available as both a source package and binary artifacts on the download page. If you do not plan to create patches for Flume, the binary artifact is probably the easiest way to get started.
Building from source
To build from source, you need git, the Sun JDK 1.6, Apache Maven 3.x, about 90MB of local disk space, and an Internet connection.
1. Check out the source
$ git clone https://git-wip-us.apache.org/repos/asf/flume.git flume
$ cd flume
$ git checkout trunk
2. Compile the project
Building Apache Flume requires more memory than the default Maven configuration allows. We recommend setting the following Maven options:
export MAVEN_OPTS="-Xms512m -Xmx1024m -XX:PermSize=256m -XX:MaxPermSize=512m"
# Build the code and run the tests (note: use mvn install, not mvn package,
# because we deploy Jenkins SNAPSHOT jars daily, and Flume is a multi-module project)
$ mvn install
# ...or install without running the tests
$ mvn install -DskipTests
(Please note that Flume requires the Google Protocol Buffers compiler to be on the path for the build to succeed. You can download and install it by following the instructions here.)
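Before starting the build, you can confirm the Protocol Buffers compiler is visible on your PATH:

```shell
# Prints the installed protoc version if it is on the PATH;
# a "command not found" error here means the Flume build will fail.
protoc --version
```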
This produces two kinds of packages in flume-ng-dist/target. They are:
- apache-flume-ng-dist-1.4.0-SNAPSHOT-bin.tar.gz - A binary distribution of Flume, ready to run
- apache-flume-ng-dist-1.4.0-SNAPSHOT-src.tar.gz - A source-only distribution of Flume
If you are a user and just want to run Flume, you most likely want the -bin version. Copy one out, decompress it, and you are ready to go.
$ cp flume-ng-dist/target/apache-flume-1.4.0-SNAPSHOT-bin.tar.gz .
$ tar -zxvf apache-flume-1.4.0-SNAPSHOT-bin.tar.gz
$ cd apache-flume-1.4.0-SNAPSHOT-bin
3. Create your properties file based on the working template (or create one from scratch)
$ cp conf/flume-conf.properties.template conf/flume.conf
4. (Optional) Create your flume-env.sh file based on the template (or create one from scratch). The flume-ng executable looks for a file named "flume-env.sh" in the conf directory specified by the --conf/-c command-line option. One use case for flume-env.sh is to specify debugging or profiling options via JAVA_OPTS when developing your own Flume NG components, such as sources and sinks.
$ cp conf/flume-env.sh.template conf/flume-env.sh
5. Configure and run Flume NG
After you have configured Flume NG (see below), you can run it with the bin/flume-ng executable. This script has a number of arguments and modes.
Configuration
Flume uses a Java property file based configuration format. When running an agent, you are required to tell Flume which file to use via the -f <file> option (see above). The file can live anywhere, but historically - and in the future - the conf directory is the correct place for configuration files.
Let's start with a simple example. Copy and paste the following into conf/flume.conf:
# Define a memory channel called ch1 on agent1
agent1.channels.ch1.type = memory
# Define an Avro source called avro-source1 on agent1 and tell it
# to bind to 0.0.0.0:41414. Connect it to channel ch1.
agent1.sources.avro-source1.channels = ch1
agent1.sources.avro-source1.type = avro
agent1.sources.avro-source1.bind = 0.0.0.0
agent1.sources.avro-source1.port = 41414
# Define a logger sink that logs all events it receives
# and connect it to the other end of the same channel.
agent1.sinks.log-sink1.channel = ch1
agent1.sinks.log-sink1.type = logger
# Finally, now that we've defined all of our components, tell
# agent1 which ones we want to activate.
agent1.channels = ch1
agent1.sources = avro-source1
agent1.sinks = log-sink1
This example creates an in-memory channel (i.e., an unreliable or "best effort" transport), an Avro RPC source, and a logger sink, and connects them. Any events the Avro source receives are routed to channel ch1 and delivered to the logger sink. It is important to note that defining components is only the first half of configuring Flume; they must be activated by being listed in the <agent>.channels, <agent>.sources, and <agent>.sinks sections. (Multiple sources, sinks, and channels may be listed, separated by spaces.)
For full details, please see the Javadoc for the org.apache.flume.conf.properties.PropertiesFileConfigurationProvider class.
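As an illustration of those space-separated activation lists, an agent that fans the same events out to two sinks might be sketched like this (ch2 and log-sink2 are made-up names, not part of the example above):

```properties
# Activate two channels and two sinks by listing them separated by spaces.
agent1.sources = avro-source1
agent1.channels = ch1 ch2
agent1.sinks = log-sink1 log-sink2

# The default (replicating) channel selector copies each event from
# the source into every channel the source is connected to.
agent1.sources.avro-source1.type = avro
agent1.sources.avro-source1.bind = 0.0.0.0
agent1.sources.avro-source1.port = 41414
agent1.sources.avro-source1.channels = ch1 ch2

agent1.channels.ch1.type = memory
agent1.channels.ch2.type = memory

# Each sink drains its own channel.
agent1.sinks.log-sink1.type = logger
agent1.sinks.log-sink1.channel = ch1
agent1.sinks.log-sink2.type = logger
agent1.sinks.log-sink2.channel = ch2
```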
This is a list of the sources, sinks, and channels implemented at this time. Each plugin has its own options and required configuration properties, so please see the Javadoc (for now).
| Component | Type | Description | Implementation Class |
|---|---|---|---|
| Channel | memory | In-memory, fast, non-durable event transport | MemoryChannel |
| Channel | file | A channel for reading, writing, mapping, and manipulating a file | FileChannel |
| Channel | jdbc | JDBC-based, durable event transport (Derby-based) | JDBCChannel |
| Channel | recoverablememory | A durable channel implementation that uses the local file system for its storage | RecoverableMemoryChannel |
| Channel | org.apache.flume.channel.PseudoTxnMemoryChannel | Mainly for testing purposes; not for production use | PseudoTxnMemoryChannel |
| Channel | (custom type as FQCN) | Your own Channel implementation | (custom FQCN) |
| Source | avro | Avro Netty RPC event source | AvroSource |
| Source | exec | Execute a long-lived Unix process and read from stdout | ExecSource |
| Source | netcat | Netcat style TCP event source | NetcatSource |
| Source | seq | Monotonically incrementing sequence generator event source | SequenceGeneratorSource |
| Source | org.apache.flume.source.StressSource | Mainly for testing purposes; not for production use. Serves as a continuous source of events where each event has the same payload. The payload consists of some number of bytes (specified by the size property, defaults to 500) where each byte has the signed value Byte.MAX_VALUE (0x7F, or 127). | org.apache.flume.source.StressSource |
| Source | syslogtcp | | SyslogTcpSource |
| Source | syslogudp | | SyslogUDPSource |
| Source | org.apache.flume.source.avroLegacy.AvroLegacySource | | AvroLegacySource |
| Source | org.apache.flume.source.thriftLegacy.ThriftLegacySource | | ThriftLegacySource |
| Source | org.apache.flume.source.scribe.ScribeSource | | ScribeSource |
| Source | (custom type as FQCN) | Your own Source implementation | (custom FQCN) |
| Sink | hdfs | Writes all events received to HDFS (with support for rolling, bucketing, HDFS-200 append, and more) | HDFSEventSink |
| Sink | org.apache.flume.sink.hbase.HBaseSink | A simple sink that reads events from a channel and writes them to HBase. | org.apache.flume.sink.hbase.HBaseSink |
| Sink | org.apache.flume.sink.hbase.AsyncHBaseSink | | org.apache.flume.sink.hbase.AsyncHBaseSink |
| Sink | logger | Log events at INFO level via configured logging subsystem (log4j by default) | LoggerSink |
| Sink | avro | Sink that invokes a pre-defined Avro protocol method for all events it receives (when paired with an avro source, forms tiered collection) | AvroSink |
| Sink | file_roll | | RollingFileSink |
| Sink | irc | | IRCSink |
| Sink | null | /dev/null for Flume - blackhole all events received | NullSink |
| Sink | (custom type as FQCN) | Your own Sink implementation | (custom FQCN) |
| ChannelSelector | replicating | | ReplicatingChannelSelector |
| ChannelSelector | multiplexing | | MultiplexingChannelSelector |
| ChannelSelector | (custom type) | Your own ChannelSelector implementation | (custom FQCN) |
| SinkProcessor | default | | DefaultSinkProcessor |
| SinkProcessor | failover | | FailoverSinkProcessor |
| SinkProcessor | load_balance | Provides the ability to load-balance flow over multiple sinks | LoadBalancingSinkProcessor |
| SinkProcessor | (custom type as FQCN) | Your own SinkProcessor implementation | (custom FQCN) |
| Interceptor$Builder | host | | HostInterceptor$Builder |
| Interceptor$Builder | timestamp | TimestampInterceptor | TimestampInterceptor$Builder |
| Interceptor$Builder | static | | StaticInterceptor$Builder |
| Interceptor$Builder | regex_filter | | RegexFilteringInterceptor$Builder |
| Interceptor$Builder | (custom type as FQCN) | Your own Interceptor$Builder implementation | (custom FQCN) |
| EventSerializer$Builder | text | | BodyTextEventSerializer$Builder |
| EventSerializer$Builder | avro_event | | FlumeEventAvroEventSerializer$Builder |
| EventSerializer | org.apache.flume.sink.hbase.SimpleHbaseEventSerializer | | SimpleHbaseEventSerializer |
| EventSerializer | org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer | | SimpleAsyncHbaseEventSerializer |
| EventSerializer | org.apache.flume.sink.hbase.RegexHbaseEventSerializer | | RegexHbaseEventSerializer |
| HbaseEventSerializer | Custom implementation of serializer for HBaseSink. | Your own HbaseEventSerializer implementation | (custom FQCN) |
| AsyncHbaseEventSerializer | Custom implementation of serializer for AsyncHBaseSink. | Your own AsyncHbaseEventSerializer implementation | (custom FQCN) |
| EventSerializer$Builder | Custom implementation of serializer for all sinks except for HBaseSink and AsyncHBaseSink. | Your own EventSerializer$Builder implementation | (custom FQCN) |
flume-ng lets you run a Flume NG agent or an Avro client, both of which are useful for testing and experimentation. In either case, you need to specify a command (e.g., agent or avro-client) and a conf directory (--conf <conf dir>). All other options are specified on the command line.
Start the Flume server with the flume.conf above:
bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -Dflume.root.logger=DEBUG,console -n agent1
Note that the agent name, specified by -n agent1, must match a name given in -f conf/flume.conf.
Your output should look something like this:
$ bin/flume-ng agent --conf conf/ -f conf/flume.conf -n agent1
2012-03-16 16:36:11,918 (main) [INFO - org.apache.flume.lifecycle.LifecycleSupervisor.start(LifecycleSupervisor.java:58)] Starting lifecycle supervisor 1
2012-03-16 16:36:11,921 (main) [INFO - org.apache.flume.node.FlumeNode.start(FlumeNode.java:54)] Flume node starting - agent1
2012-03-16 16:36:11,926 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.start(DefaultLogicalNodeManager.java:110)] Node manager starting
2012-03-16 16:36:11,928 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.lifecycle.LifecycleSupervisor.start(LifecycleSupervisor.java:58)] Starting lifecycle supervisor 10
2012-03-16 16:36:11,929 (lifecycleSupervisor-1-0) [DEBUG - org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.start(DefaultLogicalNodeManager.java:114)] Node manager started
2012-03-16 16:36:11,926 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.conf.file.AbstractFileConfigurationProvider.start(AbstractFileConfigurationProvider.java:67)] Configuration provider starting
2012-03-16 16:36:11,930 (lifecycleSupervisor-1-1) [DEBUG - org.apache.flume.conf.file.AbstractFileConfigurationProvider.start(AbstractFileConfigurationProvider.java:87)] Configuration provider started
2012-03-16 16:36:11,930 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:189)] Checking file:conf/flume.conf for changes
2012-03-16 16:36:11,931 (conf-file-poller-0) [INFO - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:196)] Reloading configuration file:conf/flume.conf
2012-03-16 16:36:11,936 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.properties.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:225)] Starting validation of configuration for agent: agent1, initial-configuration: AgentConfiguration[agent1]
SOURCES: {avro-source1=ComponentConfiguration[avro-source1]
  CONFIG: {port=41414, channels=ch1, type=avro, bind=0.0.0.0}
  RUNNER: ComponentConfiguration[runner]
    CONFIG: {}
}
CHANNELS: {ch1=ComponentConfiguration[ch1]
  CONFIG: {type=memory}
}
SINKS: {log-sink1=ComponentConfiguration[log-sink1]
  CONFIG: {type=logger, channel=ch1}
  RUNNER: ComponentConfiguration[runner]
    CONFIG: {}
}
2012-03-16 16:36:11,936 (conf-file-poller-0) [INFO - org.apache.flume.conf.properties.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:119)] Post-validation flume configuration contains configuation for agents: [agent1]
2012-03-16 16:36:11,937 (conf-file-poller-0) [DEBUG - org.apache.flume.channel.DefaultChannelFactory.create(DefaultChannelFactory.java:67)] Creating instance of channel ch1 type memory
2012-03-16 16:36:11,944 (conf-file-poller-0) [DEBUG - org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:73)] Creating instance of source avro-source1, type avro
2012-03-16 16:36:11,957 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:69)] Creating instance of sink log-sink1 type logger
2012-03-16 16:36:11,963 (conf-file-poller-0) [INFO - org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.onNodeConfigurationChanged(DefaultLogicalNodeManager.java:52)] Node configuration change:{ sourceRunners:{avro-source1=EventDrivenSourceRunner: { source:AvroSource: { bindAddress:0.0.0.0 port:41414 } }} sinkRunners:{log-sink1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@79f6f296 counterGroup:{ name:null counters:{} } }} channels:{ch1=org.apache.flume.channel.MemoryChannel@43b09468} }
2012-03-16 16:36:11,974 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.source.AvroSource.start(AvroSource.java:122)] Avro source starting:AvroSource: { bindAddress:0.0.0.0 port:41414 }
2012-03-16 16:36:11,975 (Thread-1) [DEBUG - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:123)] Polling sink runner starting
2012-03-16 16:36:12,352 (lifecycleSupervisor-1-1) [DEBUG - org.apache.flume.source.AvroSource.start(AvroSource.java:132)] Avro source started
flume-ng global options
| Option | Description |
|---|---|
| --conf,-c <conf> | Use configs in the <conf> directory |
| --classpath,-C <cp> | Append to the classpath |
| --dryrun,-d | Do not actually start Flume, just print the command |
| -Dproperty=value | Set a Java system property value |
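For example, --dryrun is a handy way to inspect the exact java invocation without starting an agent; the sketch below assumes the flume.conf from the earlier example:

```shell
# Prints the java command line that would be executed (classpath,
# JAVA_OPTS, main class, and arguments) without launching the agent.
bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -n agent1 -d
```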
flume-ng agent options
When given the agent command, a Flume NG agent is started with a given configuration file (required).
| Option | Description |
|---|---|
| --conf-file,-f <file> | Indicate which configuration file you want to run with (required) |
| --name,-n <agentname> | Indicate the name of the agent to run (required) |
flume-ng avro-client options
Runs an Avro client that sends either a file or data from standard input to a specified host and port where a Flume NG Avro source is listening.
| Option | Description |
|---|---|
| --host,-H <hostname> | Specify the hostname of the Flume agent (may be localhost) |
| --port,-p <port> | Specify the port the Avro source is listening on |
| --filename,-F <filename> | Send each line of <filename> to Flume (optional) |
| --headerFile,-R <file> | Header file containing key/value pairs on each line |
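A sketch of the --headerFile workflow: create a plain-text file of key/value pairs (the header names below are invented for illustration), then pass it alongside the data file so every event carries those headers. This assumes the agent from the earlier example is listening on localhost:41414.

```shell
# Each line of the header file becomes a header on every event sent.
cat > /tmp/flume-headers.txt <<'EOF'
datacenter = us-east-1
host = web01
EOF

# Send /etc/passwd line by line, attaching the headers above to each event.
bin/flume-ng avro-client --conf conf -H localhost -p 41414 \
    -F /etc/passwd -R /tmp/flume-headers.txt
```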
The Avro client treats each line (terminated by \n, \r, or \r\n) as an event. Think of the avro-client command as cat for Flume. For example, the following creates one event per Linux user (one per line of /etc/passwd) and sends them to Flume's avro source listening on port 41414 on the local machine.
In a new window, type:
$ bin/flume-ng avro-client --conf conf -H localhost -p 41414 -F /etc/passwd -Dflume.root.logger=DEBUG,console
You should see something like this:
2012-03-16 16:39:17,124 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:175)] Finished
2012-03-16 16:39:17,127 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:178)] Closing reader
2012-03-16 16:39:17,127 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:183)] Closing transceiver
2012-03-16 16:39:17,129 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:73)] Exiting
In your first window, the one running the server:
2012-03-16 16:39:16,738 (New I/O server boss #1 ([id: 0x49e808ca, /0:0:0:0:0:0:0:0:41414])) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /127.0.0.1:39577 => /127.0.0.1:41414] OPEN
2012-03-16 16:39:16,742 (New I/O server worker #1-1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /127.0.0.1:39577 => /127.0.0.1:41414] BOUND: /127.0.0.1:41414
2012-03-16 16:39:16,742 (New I/O server worker #1-1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /127.0.0.1:39577 => /127.0.0.1:41414] CONNECTED: /127.0.0.1:39577
2012-03-16 16:39:17,129 (New I/O server worker #1-1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /127.0.0.1:39577 :> /127.0.0.1:41414] DISCONNECTED
2012-03-16 16:39:17,129 (New I/O server worker #1-1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /127.0.0.1:39577 :> /127.0.0.1:41414] UNBOUND
2012-03-16 16:39:17,129 (New I/O server worker #1-1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /127.0.0.1:39577 :> /127.0.0.1:41414] CLOSED
2012-03-16 16:39:17,302 (Thread-1) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:68)] Event: { headers:{} body:[B@5c1ae90c }
2012-03-16 16:39:17,302 (Thread-1) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:68)] Event: { headers:{} body:[B@6aba4211 }
2012-03-16 16:39:17,302 (Thread-1) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:68)] Event: { headers:{} body:[B@6a47a0d4 }
2012-03-16 16:39:17,302 (Thread-1) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:68)] Event: { headers:{} body:[B@48ff4cf }
...
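Because avro-client reads standard input when no --filename is given, you can also pipe data straight into it; a quick sketch, assuming the agent above is still listening on localhost:41414:

```shell
# Each line on stdin becomes one Flume event, so this sends exactly one
# event, which should then appear in the server window's logger sink.
echo "hello flume" | bin/flume-ng avro-client --conf conf -H localhost -p 41414
```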
Congratulations! You're running Apache Flume!