Name

File — provides access to file systems

Overview

The File component provides access to file systems, allowing files to be processed by any other Apache Camel component, and messages from other components to be saved to disk.

URI format

The URI format for a file endpoint is one of:

file:directoryName[?options]
file://directoryName[?options]
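As a rough illustration of the URI format above, the following sketch assembles a file endpoint URI from a directory name and an ordered set of options. The class and method names (FileUriSketch, fileUri) are hypothetical. The helper appends option values exactly as given and does not URL-encode them.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of Camel): builds "file:directoryName[?options]".
class FileUriSketch {

    static String fileUri(String directoryName, Map<String, String> options) {
        StringBuilder uri = new StringBuilder("file:").append(directoryName);
        if (!options.isEmpty()) {
            // Options follow a single '?' and are separated by '&'.
            StringJoiner query = new StringJoiner("&", "?", "");
            for (Map.Entry<String, String> option : options.entrySet()) {
                // Values are appended verbatim; this sketch does not URL-encode them.
                query.add(option.getKey() + "=" + option.getValue());
            }
            uri.append(query);
        }
        return uri.toString();
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>(); // keeps insertion order
        options.put("noop", "true");
        options.put("delay", "5000");
        System.out.println(fileUri("data/inbox", options));   // file:data/inbox?noop=true&delay=5000
        System.out.println(fileUri("data/outbox", Map.of())); // file:data/outbox
    }
}
```

In a Camel route, a string like this would typically be passed to from(...) or to(...) in the Java DSL.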

Common options

Table 7, “Common file options” lists the options that can be set on any file endpoint.

Table 7. Common file options

Name Default Value Description
autoCreate true Automatically create missing directories in the file's pathname. For the file consumer, that means creating the starting directory. For the file producer, it means the directory to where the files should be written.
bufferSize 128kb Write buffer size, in bytes.
fileName null Uses an expression language to set the filename dynamically. For consumers, it is used as a filename filter. For producers, it is used to evaluate the filename to write. If an expression is set, it takes precedence over the CamelFileName header. (Note: the header itself can also be an expression.) The expression options support both String and Expression types. If the expression is a String, it is always evaluated using the File Language. If the expression is an Expression object, that Expression is used as-is, which allows you, for instance, to use OGNL expressions. For the consumer, you can use it to filter filenames; for example, to consume today's file using the File Language syntax: mydata-${date:now:yyyyMMdd}.txt.
flatten false Flattens the filename path by stripping any leading paths, so that only the file name remains. This allows you to consume recursively into subdirectories and then, for example, write all the files into a single directory elsewhere. Setting this to true on the producer ensures that any leading paths are stripped from the filename received in the CamelFileName header.
charset null Specifies the encoding of the file; Camel sets the Exchange.CHARSET_NAME property to this value.
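To make the fileName and flatten descriptions concrete, here is a small sketch that expands only the ${date:now:PATTERN} token from the example above and strips leading paths the way flatten=true is described. It is not Camel's File Language: the class and method names are hypothetical, and the sketch assumes the date pattern is compatible with java.time.format.DateTimeFormatter (which holds for yyyyMMdd).

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch (not Camel code) of resolving a consumer filename.
class FileNameSketch {

    // Matches the ${date:now:PATTERN} token used in the fileName example.
    private static final Pattern DATE_NOW = Pattern.compile("\\$\\{date:now:([^}]+)\\}");

    // Expands only ${date:now:PATTERN}; the real File Language supports far more.
    static String expandDateNow(String expression, LocalDate today) {
        Matcher m = DATE_NOW.matcher(expression);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String formatted = today.format(DateTimeFormatter.ofPattern(m.group(1)));
            m.appendReplacement(out, Matcher.quoteReplacement(formatted));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Mimics flatten=true: drop any leading path, keeping only the file name.
    static String flatten(String fileName) {
        int slash = Math.max(fileName.lastIndexOf('/'), fileName.lastIndexOf('\\'));
        return slash < 0 ? fileName : fileName.substring(slash + 1);
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2024, 3, 7);
        System.out.println(expandDateNow("mydata-${date:now:yyyyMMdd}.txt", day)); // mydata-20240307.txt
        System.out.println(flatten("orders/2024/mydata-20240307.txt"));           // mydata-20240307.txt
    }
}
```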

Consumer options

Table 8, “File consumer options” lists the options that can be set on a file consuming endpoint.

Table 8. File consumer options

Name Default Value Description
initialDelay 1000 Milliseconds before polling the file/directory starts.
delay 500 Milliseconds before the next poll of the file/directory.
useFixedDelay false Set to true to use a fixed delay between polls; otherwise, a fixed rate is used. See ScheduledExecutorService in the JDK for details.
recursive false If the endpoint is a directory, also look for files in all of its subdirectories.
delete false If true, the file is deleted after it is processed.
noop false If true, the file is not moved or deleted in any way. This option is good for read-only data, or for ETL-type requirements. If noop=true, Apache Camel sets idempotent=true as well, to avoid consuming the same files over and over again.
preMove null Uses an expression to set the filename dynamically when moving the file before processing. For example, to move in-progress files into the order directory, set this value to order.
move .camel Uses an expression to set the filename dynamically when moving the file after processing. To move files into a .done subdirectory, just enter .done.
moveFailed null Uses an expression to set the filename dynamically when moving failed files after processing. To move files into an error subdirectory, just enter error. Note: moving a failed file to another location means the error has been handled, so Apache Camel will not pick up the file again.
include null Includes only files whose filename matches the regex pattern.
exclude null Excludes files whose filename matches the regex pattern.
antInclude null Ant-style filter inclusion. For example, antInclude=**/*.txt. You can use a comma-delimited format to specify multiple inclusions. This option is also available in the FTP component.
antExclude null Ant-style filter exclusion. For example, antExclude=**/*.txt. antExclude takes precedence over antInclude when both are used. You can use a comma-delimited format to specify multiple exclusions. This option is also available in the FTP component.
idempotent false Uses the Idempotent Consumer EIP pattern to let Apache Camel skip files that have already been processed. By default, a memory-based LRUCache that holds 1000 entries is used. If noop=true, idempotent is enabled as well, to avoid consuming the same files over and over again.
idempotentRepository null Pluggable repository as an org.apache.camel.processor.idempotent.MessageIdRepository class. If none is specified and idempotent is true, MemoryMessageIdRepository is used by default.
inProgressRepository memory Pluggable in-progress repository as an org.apache.camel.processor.idempotent.MessageIdRepository class. The in-progress repository keeps track of the files currently being consumed. By default, a memory-based repository is used.
filter null

Pluggable filter as a GenericFileFilter class. Files are skipped if the filter's accept() method returns false. Apache Camel also ships with an Ant path matcher filter in the camel-spring component.

As of Apache Camel 2.10, the filter can also be used to filter directories, by checking GenericFile.isDirectory() inside the accept() method.

This option is also available in the FTP component.

sorter null Pluggable sorter as a java.util.Comparator<org.apache.camel.component.file.GenericFile> class.
sortBy null Built-in sort using the File Language. Supports nested sorts, so you can sort by file name and then, as a second group, by modified date. See the sorting section below for details.
readLock markerFile

Used by the consumer to poll only files for which it has an exclusive read lock (that is, the file is not in progress or being written). Apache Camel waits until the file lock is granted.

The readLock option supports these built-in strategies:

  • markerFile makes Apache Camel create a marker file and hold a lock on it. This option is not available for the FTP component.

  • changed uses a length/modification timestamp check to detect whether the file is currently being copied. It waits at least 1 second to determine this, so this option cannot consume files as fast as the others, but it can be more reliable, as the JDK IO API cannot always determine whether a file is currently being used by another process.

    As of Apache Camel 2.10, a consumer with readLock=changed considers any zero-length file to be in progress.

    This option is not available for the FTP component.

  • fileLock uses java.nio.channels.FileLock. This option is not available for the FTP component.

  • rename attempts to rename the file to test whether Apache Camel can get an exclusive read lock.

  • none is for no read locks at all.

As of Apache Camel 2.10, the changed, fileLock, and rename options also use a marker file, so that a file being processed by a Camel consumer on one node is not picked up by another Camel consumer running on a different node (for example, in a cluster). This option is not available for the FTP component.

readLockTimeout 0 Optional timeout, in milliseconds, for the read lock, if supported by the read-lock strategy. If the read lock cannot be granted before the timeout triggers, Apache Camel skips the file. At the next poll, Apache Camel tries the file again, and this time the read lock may be granted. Currently, fileLock, changed, and rename support the timeout.
exclusiveReadLockStrategy null Pluggable read-lock as an implementation of the GenericFileExclusiveReadLockStrategy interface.
maxMessagesPerPoll 0 An integer that defines the maximum number of messages to gather per poll. By default (0), no maximum is set. Can be used to set a limit of, for example, 1000 to avoid having the server read thousands of files as it starts up. To disable this option, set it to 0 or a negative integer.
eagerMaxMessagesPerPoll true

Specifies whether the limit defined by maxMessagesPerPoll is eager. When set to true (default), the limit is eagerly applied during file scanning. When set to false, the limit is applied after all files have been scanned and sorted, which increases memory consumption since sorting is performed on file details in memory.

This option is also available in the FTP component.

processStrategy null A pluggable GenericFileProcessStrategy that allows you to implement your own readLock option or similar. Can also be used when special conditions must be met before a file can be consumed, such as a special ready file existing. If this option is set, the readLock option does not apply.
consumer.bridgeErrorHandler false

Enables the consumer to bridge over to the Camel error handler, so that exceptions that occur while the consumer attempts to pick up files are processed as messages and handled by the route's error handler.

By default (false), the consumer uses org.apache.camel.spi.ExceptionHandler to deal with exceptions, which logs them at the WARN/ERROR level, then ignores them.

This option is also available in the FTP component.

scheduledExecutorService null

Enables you to configure a custom thread pool that multiple file consumers can share, reducing the overall number of threads in a JVM. By default, each consumer has its own single-threaded thread pool.

This option is also available in the FTP component.

startingDirectoryMustExist false Whether the starting directory must exist. Note that the autoCreate option is enabled by default, which means the starting directory is normally auto-created if it doesn't exist. You can disable autoCreate and enable this option to ensure that the starting directory exists. If the directory doesn't exist, an exception is thrown.
directoryMustExist false Similar to startingDirectoryMustExist, but applies when polling subdirectories recursively.
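The changed read-lock strategy from Table 8 can be approximated with java.nio.file alone: sample the file's length and last-modified time, and treat the file as ready only when two consecutive samples match and the file is not empty. This is a simplified illustration, not Camel's implementation. The class name, the waitUntilStable method, and its interval and timeout parameters are hypothetical, and the interval stands in for Camel's own wait of at least 1 second.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch (not Camel code) of a "changed"-style read-lock check.
class ChangedReadLockSketch {

    // Returns true once length and last-modified time are unchanged across two samples
    // taken intervalMillis apart; returns false if that does not happen before the timeout.
    static boolean waitUntilStable(Path file, long intervalMillis, long timeoutMillis)
            throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        long lastLength = -1;
        long lastModified = -1;
        while (System.currentTimeMillis() < deadline) {
            long length = Files.size(file);
            long modified = Files.getLastModifiedTime(file).toMillis();
            // Zero-length files are always treated as still in progress.
            if (length > 0 && length == lastLength && modified == lastModified) {
                return true;
            }
            lastLength = length;
            lastModified = modified;
            Thread.sleep(intervalMillis);
        }
        return false; // timed out: skip the file and try again on a later poll
    }

    public static void main(String[] args) throws Exception {
        Path done = Files.createTempFile("ready", ".txt");
        Files.writeString(done, "complete content");
        System.out.println(waitUntilStable(done, 100, 1000)); // true
        Path empty = Files.createTempFile("empty", ".txt");
        System.out.println(waitUntilStable(empty, 100, 500)); // false
        Files.delete(done);
        Files.delete(empty);
    }
}
```

Returning false on timeout mirrors the readLockTimeout behavior described above: the file is skipped for this poll rather than blocking the consumer indefinitely.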

Producer options

Table 9, “File producer options” lists the options that can be set on a file producing endpoint.

Table 9. File producer options

Name Default Value Description
fileExist Override

Specifies what to do if a file with the same name already exists. The following values can be specified:

  • Override: replace the existing file

  • Append: add content to the existing file

  • Fail: throw a GenericFileOperationException to indicate that there is an existing file

  • Ignore: silently ignore the problem and do not overwrite the existing file

tempPrefix null Writes the file using a temporary name and then, once the write is complete, renames it to the real name. Can be used to identify files that are being written, and also to prevent consumers (that do not use exclusive read locks) from reading files that are in progress. Often used by FTP when uploading big files.
tempFileName null Same as the tempPrefix option, but offers finer-grained control over the naming of the temporary file, as it uses the File Language.
keepLastModified false Specifies whether the written file keeps the last modified timestamp of the source file (if any). The Exchange.FILE_LAST_MODIFIED header holds the timestamp. If the header exists and the option is enabled, that timestamp is set on the written file.
eagerDeleteTargetFile true Specifies whether to eagerly delete any existing target file. This option only applies when you use fileExist=Override and the tempFileName option.
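The interplay of tempPrefix and fileExist can be sketched with java.nio.file: write under a temporary name in the target directory, then move it onto the real name. The enum, class, and method names are hypothetical. FileAlreadyExistsException stands in for the GenericFileOperationException that Camel throws for Fail, and the Append branch writes directly to the existing file without a temporary name, which is a simplification rather than documented Camel behavior.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch (not Camel code) of tempPrefix combined with the fileExist strategies.
class TempPrefixWriteSketch {

    enum FileExist { OVERRIDE, APPEND, FAIL, IGNORE }

    static void write(Path dir, String name, String body, String tempPrefix, FileExist mode)
            throws IOException {
        Path target = dir.resolve(name);
        if (Files.exists(target)) {
            switch (mode) {
                case FAIL:
                    throw new FileAlreadyExistsException(target.toString());
                case IGNORE:
                    return; // leave the existing file untouched
                case APPEND:
                    // Simplification: append in place, without a temporary file.
                    Files.writeString(target, body, StandardOpenOption.APPEND);
                    return;
                case OVERRIDE:
                    break; // continue with the temporary-file write below
            }
        }
        // Write under the temporary name first, so the real name only appears once complete.
        Path temp = dir.resolve(tempPrefix + name);
        Files.writeString(temp, body);
        // Rename to the real name, replacing any existing file (Override).
        Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("outbox");
        write(dir, "report.txt", "v1", "inprogress-", FileExist.OVERRIDE);
        write(dir, "report.txt", "v2", "inprogress-", FileExist.OVERRIDE);
        write(dir, "report.txt", "ignored", "inprogress-", FileExist.IGNORE);
        write(dir, "report.txt", "+more", "inprogress-", FileExist.APPEND);
        System.out.println(Files.readString(dir.resolve("report.txt"))); // v2+more
    }
}
```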

Message Headers

The following headers are supported by this component:

Table 10. File producer headers

Header Description
CamelFileName Specifies the name of the file to write (relative to the endpoint directory). The name can be a String; a String with a File Language or Simple expression; or an Expression object. If it's null then Apache Camel will auto-generate a filename based on the message unique ID.
CamelFileNameProduced The actual absolute filepath (path + name) for the output file that was written. This header is set by Camel and its purpose is providing end-users with the name of the file that was written.

Table 11. File consumer headers

Header Description
CamelFileName Name of the consumed file as a relative file path with offset from the starting directory configured on the endpoint.
CamelFileNameOnly Only the file name (the name with no leading paths).
CamelFileAbsolute A boolean option specifying whether the consumed file denotes an absolute path. Should normally be false for relative paths. Absolute paths should normally not be used, but support for them was added to the move option to allow moving files to absolute paths. They can be used elsewhere as well.
CamelFileAbsolutePath The absolute path to the file. For relative files this path holds the relative path instead.
CamelFilePath The file path. For relative files this is the starting directory + the relative filename. For absolute files this is the absolute path.
CamelFileRelativePath The relative path.
CamelFileParent The parent path.
CamelFileLength A long value containing the file size.
CamelFileLastModified A Date value containing the last modified timestamp of the file.
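The relationships between several of the consumer headers in Table 11 can be approximated with java.nio.file.Path operations, as in the sketch below. This illustrates the table's descriptions rather than reproducing Camel's own code; the class and method names are hypothetical, and only CamelFileName, CamelFileNameOnly, CamelFilePath, CamelFileParent, and CamelFileLength are modeled.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch (not Camel code) approximating some consumer header values.
class ConsumerHeadersSketch {

    static Map<String, Object> headersFor(Path startingDir, String relativeName) throws IOException {
        Path file = startingDir.resolve(relativeName);
        Map<String, Object> headers = new LinkedHashMap<>();
        // Relative path with offset from the starting directory, e.g. "2024/a.txt".
        headers.put("CamelFileName", startingDir.relativize(file).toString());
        // Only the file name, with no leading paths.
        headers.put("CamelFileNameOnly", file.getFileName().toString());
        // Starting directory + relative file name.
        headers.put("CamelFilePath", file.toString());
        headers.put("CamelFileParent", file.getParent().toString());
        // File size as a long.
        headers.put("CamelFileLength", Files.size(file));
        return headers;
    }

    public static void main(String[] args) throws IOException {
        Path start = Files.createTempDirectory("inbox");
        Path sub = Files.createDirectories(start.resolve("2024"));
        Files.writeString(sub.resolve("a.txt"), "hello");
        headersFor(start, "2024/a.txt").forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```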

Exchange properties

As the file consumer is a BatchConsumer, it supports batching the files it polls. Batching means that Apache Camel adds properties to the exchange so you know the number of files polled and the index of the current file in that order.

Table 12. Exchange properties used by a file consumer

Property Description
CamelBatchSize The total number of files that were polled in this batch.
CamelBatchIndex The current index of the batch. Starts from 0.
CamelBatchComplete A boolean value indicating the last exchange in the batch. Is only true for the last entry.

This allows you, for instance, to know how many files exist in this batch, and to let the Aggregator aggregate that number of files.
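The batch properties in Table 12 can be modeled without Camel to show how a downstream step might use them. The sketch assigns CamelBatchSize, CamelBatchIndex, and CamelBatchComplete to each polled file as the table describes, and a hypothetical collector closes a group when it sees the file marked complete. PolledFile is a stand-in, not org.apache.camel.Exchange, and the collector is not the Camel Aggregator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not Camel code) of batch properties and a batch-aware collector.
class BatchSketch {

    // Stand-in for an exchange: the file name plus its batch properties.
    static final class PolledFile {
        final String name;
        final Map<String, Object> properties;

        PolledFile(String name, Map<String, Object> properties) {
            this.name = name;
            this.properties = properties;
        }
    }

    // Assigns batch properties to every file returned by one poll.
    static List<PolledFile> batch(List<String> fileNames) {
        List<PolledFile> result = new ArrayList<>();
        int size = fileNames.size();
        for (int i = 0; i < size; i++) {
            result.add(new PolledFile(fileNames.get(i), Map.<String, Object>of(
                    "CamelBatchSize", size,
                    "CamelBatchIndex", i,               // starts from 0
                    "CamelBatchComplete", i == size - 1 // true only for the last file
            )));
        }
        return result;
    }

    // Groups file names until the file marked CamelBatchComplete=true arrives.
    static List<List<String>> collect(List<PolledFile> files) {
        List<List<String>> groups = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (PolledFile f : files) {
            current.add(f.name);
            if (Boolean.TRUE.equals(f.properties.get("CamelBatchComplete"))) {
                groups.add(current);
                current = new ArrayList<>();
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<PolledFile> all = new ArrayList<>(batch(List.of("a.txt", "b.txt", "c.txt")));
        all.addAll(batch(List.of("d.txt")));
        System.out.println(collect(all)); // [[a.txt, b.txt, c.txt], [d.txt]]
    }
}
```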

Related topics

Expression and Predicates Languages
FTP/SFTP