HDFS — enables you to read and write messages from/to an HDFS file system
HDFS is the distributed file system at the heart of Hadoop. The component can only be built with JDK 1.6 or later, because this is a strict requirement of Hadoop itself. The component is hosted at http://github.com/dgreco/camel-hdfs. We decided to host it temporarily on GitHub because Apache Camel is currently built and tested with JDK 1.5, so we could not include the component in the official Apache Camel distribution.
The URI format for an HDFS endpoint is:
hdfs://hostname[:port][/path][?options]
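To make the optional parts of the URI format concrete, here is a minimal Java sketch that assembles an endpoint URI from its pieces. The helper `buildHdfsUri`, the host `localhost`, the port `9000`, and the path are hypothetical illustration values, not part of the component. The sketch does not URL-encode option values and needs Java 8 or later.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class HdfsUriExample {

    // Hypothetical helper: assembles hdfs://hostname[:port][/path][?options].
    // port <= 0 means "omit the port"; path and options may be null or empty.
    static String buildHdfsUri(String host, int port, String path, Map<String, String> options) {
        StringBuilder uri = new StringBuilder("hdfs://").append(host);
        if (port > 0) {
            uri.append(':').append(port);
        }
        if (path != null && !path.isEmpty()) {
            // Ensure exactly one slash separates the host (and port) from the path.
            uri.append(path.startsWith("/") ? path : "/" + path);
        }
        if (options != null && !options.isEmpty()) {
            // Options become a query string: ?name1=value1&name2=value2
            StringJoiner query = new StringJoiner("&", "?", "");
            for (Map.Entry<String, String> e : options.entrySet()) {
                query.add(e.getKey() + "=" + e.getValue());
            }
            uri.append(query.toString());
        }
        return uri.toString();
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("replication", "2");
        options.put("overwrite", "false");
        // prints hdfs://localhost:9000/user/camel/output?replication=2&overwrite=false
        System.out.println(buildHdfsUri("localhost", 9000, "/user/camel/output", options));
        // prints hdfs://namenode
        System.out.println(buildHdfsUri("namenode", 0, null, new LinkedHashMap<>()));
    }
}
```

Each bracketed part of the format simply disappears when it is not supplied, so a bare `hdfs://namenode` is also a valid endpoint. A string built this way is what you would pass as the endpoint URI in a Camel route, for example to `from(...)` or `to(...)`.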
The path is treated in the following way:
As a consumer: if the path is a file, the consumer just reads that file; if it is a directory, the consumer scans all the files under it that match the configured pattern. All the files under that directory must be of the same type.
As a producer: if at least one split strategy is defined, the path is treated as a directory, and the producer creates a separate file under it for each split, named seg0, seg1, seg2, and so on.
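The two path roles described above can be sketched with java.nio on the local file system. This is an analogy, not the HDFS client API: the consumer method returns either the single file or the directory entries that match a pattern, and the producer method derives the segment file for a given split index. Both method names are hypothetical, treating `pattern` as a glob is an assumption, and the sketch does not check that all files are of the same type. It needs Java 8 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class HdfsPathSketch {

    // Consumer side (hypothetical): a plain file is read as-is; a directory is
    // scanned for regular files whose names match the glob pattern (default "*").
    static List<Path> filesToConsume(Path path, String pattern) throws IOException {
        List<Path> result = new ArrayList<>();
        if (Files.isRegularFile(path)) {
            result.add(path);
        } else if (Files.isDirectory(path)) {
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(path, pattern)) {
                for (Path entry : entries) {
                    if (Files.isRegularFile(entry)) {
                        result.add(entry);
                    }
                }
            }
            Collections.sort(result); // deterministic order, for this sketch only
        }
        return result;
    }

    // Producer side (hypothetical): with a split strategy, the path is a directory
    // and split number n is written to a file named "seg" + n under it.
    static Path segmentFile(Path directory, int splitIndex) {
        return directory.resolve("seg" + splitIndex);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("hdfs-sketch");
        Files.write(dir.resolve("a.txt"), "alpha".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("b.txt"), "beta".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("notes.log"), "skip".getBytes(StandardCharsets.UTF_8));

        System.out.println(filesToConsume(dir, "*.txt").size());             // 2
        System.out.println(filesToConsume(dir.resolve("a.txt"), "*").size()); // 1
        System.out.println(segmentFile(dir, 2).getFileName());                // seg2
    }
}
```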
Table 15, “HDFS options”, lists the options for the HDFS endpoint.
Table 15. HDFS options
Name | Default Value | Description |
---|---|---|
overwrite | true | Specifies whether the file can be overwritten. |
bufferSize | 4096 | Specifies the buffer size used by HDFS. |
replication | 3 | Specifies the HDFS replication factor. |
blockSize | 67108864 | Specifies the size of the HDFS blocks, in bytes (67108864 bytes is 64 MB). |
fileType | NORMAL_FILE | Specifies the type of file to use. Valid values are NORMAL_FILE, SEQUENCE_FILE, MAP_FILE, ARRAY_FILE, and BLOOMMAP_FILE. See the Hadoop documentation for more information. |
fileSystemType | HDFS | Specifies the file system type. Set it to LOCAL to use the local file system. |
keyType | NULL | Specifies the type of the key for sequence or map files. |
valueType | TEXT | Specifies the type of the value for sequence or map files. |
splitStrategy | | A string describing the strategy for splitting the file based on different criteria. |
openedSuffix | opened | While a file is open for reading or writing, it is renamed with this suffix so that it is not read during the writing phase. |
readSuffix | read | Once a file has been read, it is renamed with this suffix so that it is not read again. |
initialDelay | 0 | Specifies how long a consumer waits, in milliseconds, before it starts scanning the directory. |
delay | 0 | Specifies the interval, in milliseconds, between directory scans. |
pattern | * | The pattern used for scanning the directory. |
chunkSize | 4096 | When a normal file is read, it is split into chunks of this size, producing one message per chunk. |
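The openedSuffix, readSuffix, and chunkSize rows together describe how a consumer handles a normal file. As a rough illustration only, the Java sketch below replays that cycle on the local file system with java.nio rather than HDFS: rename the file with the opened suffix, read it in chunks of at most chunkSize bytes (one would-be message per chunk), then rename it with the read suffix. The class and method names are hypothetical, measuring chunkSize in bytes and appending the suffix after a dot are assumptions, and the component's real implementation may differ. It needs Java 8 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChunkedReadSketch {

    // Hypothetical consumer cycle for one normal file on the local file system:
    // 1. rename it to <name>.<openedSuffix> while it is being read,
    // 2. split its content into chunks of at most chunkSize bytes,
    // 3. rename it to <name>.<readSuffix> so that later scans skip it.
    static List<byte[]> consume(Path file, int chunkSize, String openedSuffix, String readSuffix)
            throws IOException {
        Path opened = file.resolveSibling(file.getFileName() + "." + openedSuffix);
        Files.move(file, opened);

        List<byte[]> chunks = new ArrayList<>();
        byte[] buffer = new byte[chunkSize];
        try (InputStream in = Files.newInputStream(opened)) {
            int filled;
            while ((filled = readFully(in, buffer)) > 0) {
                chunks.add(Arrays.copyOf(buffer, filled)); // the last chunk may be shorter
            }
        }

        Files.move(opened, file.resolveSibling(file.getFileName() + "." + readSuffix));
        return chunks;
    }

    // Fills the buffer as far as the stream allows; returns the byte count (0 at end of stream).
    private static int readFully(InputStream in, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            int n = in.read(buffer, total, buffer.length - total);
            if (n < 0) {
                break;
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("chunk-sketch");
        Path file = dir.resolve("data.txt");
        Files.write(file, "0123456789".getBytes(StandardCharsets.US_ASCII));

        List<byte[]> chunks = consume(file, 4, "opened", "read");
        System.out.println(chunks.size());                             // 3 (4 + 4 + 2 bytes)
        System.out.println(Files.exists(dir.resolve("data.txt.read"))); // true
    }
}
```

Renaming before reading is what keeps a concurrent directory scan, or a writer still producing the file, from handing the same file to a second consumer.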