Datavolo DevCenter



Connector processors

Processors documentation


| Component Name | Datavolo or Community | Structured or Unstructured | System / Protocol | Description |
| --- | --- | --- | --- | --- |
| PutSnowflakeInternalStageFile | Datavolo | Unstructured | Snowflake | Puts files into a Snowflake internal stage. The internal stage must be created in the Snowflake account beforehand. This processor can be connected to a StartSnowflakeIngest processor to ingest the file in the internal stage. |
| DeleteDBFSResource | Datavolo | Unstructured | Databricks | Deletes DBFS files and directories. |
| GetDBFSFile | Datavolo | Unstructured | Databricks | Reads a DBFS file. |
| ListDBFSDirectory | Datavolo | Unstructured | Databricks | Lists file names in a DBFS directory and outputs a new FlowFile with the filename. |
| PutDBFSFile | Datavolo | Unstructured | Databricks | Writes FlowFile content to DBFS. |
| DeleteUnityCatalogResource | Datavolo | Unstructured | Databricks | Deletes a Unity Catalog file or directory. |
| GetUnityCatalogFile | Datavolo | Unstructured | Databricks | Reads a Unity Catalog file of up to 5 GiB. |
| GetUnityCatalogFileMetadata | Datavolo | Unstructured | Databricks | Checks for Unity Catalog file metadata. |
| ListUnityCatalogDirectory | Datavolo | Unstructured | Databricks | Lists file names in a Unity Catalog directory and outputs a new FlowFile with the filename. |
| PutUnityCatalogFile | Datavolo | Unstructured | Databricks | Writes FlowFile content with a maximum size of 5 GiB to Unity Catalog. |
| PutDatabricksSQL | Datavolo | Structured | Databricks | Submits a SQL execution using the Databricks REST API, then writes the JSON response to the FlowFile content. For high-performance SELECT or INSERT queries, use ExecuteSQL instead. |
| CaptureGoogleDriveChanges | Datavolo | Unstructured | Google Drive | Captures changes to a Shared Google Drive and emits a FlowFile for each change that occurs. This includes the addition and deletion of files, as well as changes to file metadata and permissions. The processor is designed to be used in conjunction with the FetchGoogleDrive processor. |
| PutHubSpot | Datavolo | Structured | HubSpot | Upserts a HubSpot object. |
| CaptureSharepointChanges | Datavolo | Unstructured | Microsoft SharePoint | Captures changes from a SharePoint Document Library and emits a FlowFile for each change that occurs. This includes additions and deletions of files and folders, as well as changes to permissions, metadata, and file content. |
| FetchSharepointFile | Datavolo | Unstructured | Microsoft SharePoint | Fetches the contents of a file from a SharePoint Drive, optionally downloading a PDF or HTML version of the file when applicable. Any FlowFile that represents a SharePoint folder will be routed to success without fetching contents. |
| DeletePinecone | Datavolo | Structured | Pinecone | Deletes vectors from a Pinecone index. |
| QueryPinecone | Datavolo | Structured | Pinecone | Queries Pinecone for vectors that are similar to the input vector, or retrieves a vector by ID. |
| UpsertPinecone | Datavolo | Structured | Pinecone | Publishes vectors, including metadata and optionally text, to a Pinecone index. |
| PutVectaraDocument | Datavolo | Unstructured | Vectara | Generates and uploads a JSON document to Vectara's upload endpoint. The input text can be in JSON object, JSON array, or JSONL format. |
| PutVectaraFile | Datavolo | Unstructured | Vectara | Uploads FlowFile content to Vectara's index endpoint. Document filter attributes and metadata attributes can be set by referencing FlowFile attributes. |
| ConsumeAMQP | Community | Unstructured | AMQP | Consumes AMQP Messages from an AMQP Broker using the AMQP 0.9.1 protocol. Each message that is received from the AMQP Broker will be emitted as its own FlowFile to the 'success' relationship. |
| PublishAMQP | Community | Unstructured | AMQP | Creates an AMQP Message from the contents of a FlowFile and sends the message to an AMQP Exchange. In a typical AMQP exchange model, the message sent to the AMQP Exchange will be routed based on the 'Routing Key' to its final destination in the queue (the binding). If, due to some misconfiguration, the binding between the Exchange, Routing Key and Queue is not set up, the message will have no final destination and will return (i.e., the data will not make it to the queue). If that happens, a log entry to that effect will appear in both the app log and a bulletin, and the FlowFile will be routed to the 'failure' relationship. |
| PutCloudWatchMetric | Community | Structured | AWS CloudWatch | Publishes metrics to Amazon CloudWatch. The metric can be either a single value or a StatisticSet comprising minimum, maximum, sum, and sample count. |
| DeleteDynamoDB | Community | Structured | AWS DynamoDB | Deletes a document from DynamoDB based on hash and range key. The key can be a string or a number. The request requires all the primary keys for the operation (hash, or hash and range key). |
| GetDynamoDB | Community | Structured | AWS DynamoDB | Retrieves a document from DynamoDB based on hash and range key. The key can be a string or a number. For any get request, all the primary keys are required (hash, or hash and range, based on the table keys). A JSON Document ('Map') attribute of the DynamoDB item is read into the content of the FlowFile. |
| PutDynamoDB | Community | Structured | AWS DynamoDB | Puts a document into DynamoDB based on hash and range key. The table can have either hash and range keys or a hash key alone. Currently the supported key types are string and number, and the value can be a JSON document. When hash and range keys are used, both keys are required for the operation. The FlowFile content must be JSON and is mapped to the specified JSON Document attribute in the DynamoDB item. |
| PutDynamoDBRecord | Community | Structured | AWS DynamoDB | Inserts items into DynamoDB based on record-oriented data. The record fields are mapped into DynamoDB item fields, including partition and sort keys if set. Depending on the number of records, the processor might execute the insert in multiple chunks in order to overcome DynamoDB's limitation on batch writing. This might result in partially processed FlowFiles, in which case the FlowFile will be transferred to the 'unprocessed' relationship with the necessary attributes to retry later without duplicating the already executed inserts. |
| PutKinesisFirehose | Community | Unstructured | AWS Kinesis | Sends the contents to a specified Amazon Kinesis Firehose. In order to send data to Firehose, the Firehose delivery stream name has to be specified. |
| ConsumeKinesisStream | Community | Unstructured | AWS Kinesis | Reads data from the specified AWS Kinesis stream and outputs a FlowFile for every processed record (raw), or a FlowFile for a batch of processed records if a Record Reader and Record Writer are configured. Provides at-least-once delivery of all Kinesis records within the stream while the processor is running. The AWS Kinesis Client Library can take several seconds to initialise before starting to fetch data. Uses DynamoDB for checkpointing and CloudWatch (optional) for metrics. Ensure that the credentials provided have access to DynamoDB and CloudWatch (optional) along with Kinesis. |
| PutKinesisStream | Community | Unstructured | AWS Kinesis | Sends the contents to a specified Amazon Kinesis stream. In order to send data to Kinesis, the stream name has to be specified. |
| PutLambda | Community | Unstructured | AWS Lambda | Sends the contents to a specified AWS Lambda function. The AWS credentials used for authentication must have permission to execute the Lambda function (lambda:InvokeFunction). The FlowFile content must be JSON. |
| DeleteS3Object | Community | Unstructured | AWS S3 | Deletes a file from an Amazon S3 bucket. If attempting to delete a file that does not exist, the FlowFile is routed to success. |
| FetchS3Object | Community | Unstructured | AWS S3 | Retrieves the contents of an S3 object and writes it to the content of a FlowFile. |
| ListS3 | Community | Unstructured | AWS S3 | Retrieves a listing of objects from an S3 bucket. For each object that is listed, creates a FlowFile that represents the object so that it can be fetched in conjunction with FetchS3Object. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. |
| PutS3Object | Community | Unstructured | AWS S3 | Writes the contents of a FlowFile as an S3 object to an Amazon S3 bucket. |
| TagS3Object | Community | Unstructured | AWS S3 | Adds or updates a tag on an Amazon S3 object. |
| PutSNS | Community | Structured | AWS SNS | Sends the content of a FlowFile as a notification to the Amazon Simple Notification Service. |
| DeleteSQS | Community | Structured | AWS SQS | Deletes a message from an Amazon Simple Queue Service (SQS) queue. |
| GetSQS | Community | Structured | AWS SQS | Fetches messages from an Amazon Simple Queue Service (SQS) queue. |
| PutSQS | Community | Structured | AWS SQS | Publishes a message to an Amazon Simple Queue Service (SQS) queue. |
| PutAzureCosmosDBRecord | Community | Structured | Azure Cosmos DB | This processor is a record-aware processor for inserting data into Cosmos DB with the Core SQL API. It uses a configured record reader and schema to read an incoming record set from the body of a FlowFile and then inserts those records into a configured Cosmos DB container. |
| PutAzureDataExplorer | Community | Structured | Azure Data Explorer | Acts as an Azure Data Explorer sink which sends FlowFiles to the provided endpoint. Data can be sent through queued ingestion or streaming ingestion to the Azure Data Explorer cluster. |
| QueryAzureDataExplorer | Community | Structured | Azure Data Explorer | Queries Azure Data Explorer and streams JSON results to output FlowFiles. |
| ConsumeAzureEventHub | Community | Unstructured | Azure Event Hub | Receives messages from Microsoft Azure Event Hubs with checkpointing to ensure consistent event processing. Checkpoint tracking avoids consuming a message multiple times and enables reliable resumption of processing in the event of intermittent network failures. Checkpoint tracking requires external storage and provides the preferred approach to consuming messages from Azure Event Hubs. In a clustered environment, ConsumeAzureEventHub processor instances form a consumer group and the messages are distributed among the cluster nodes (each message is processed on one cluster node only). |
| GetAzureEventHub | Community | Unstructured | Azure Event Hub | Receives messages from Microsoft Azure Event Hubs without reliable checkpoint tracking. In a clustered environment, GetAzureEventHub processor instances work independently and all cluster nodes process all messages (unless running the processor in Primary Only mode). ConsumeAzureEventHub offers the recommended approach to receiving messages from Azure Event Hubs. This processor creates a thread pool for connections to Azure Event Hubs. |
| PutAzureEventHub | Community | Unstructured | Azure Event Hub | Sends FlowFile contents to Azure Event Hubs. |
| CopyAzureBlobStorage_v12 | Community | Unstructured | Azure Blob Storage | Copies a blob in Azure Blob Storage from one account/container to another. The processor uses the Azure Blob Storage client library v12. |
| DeleteAzureBlobStorage_v12 | Community | Unstructured | Azure Blob Storage | Deletes the specified blob from Azure Blob Storage. The processor uses the Azure Blob Storage client library v12. |
| DeleteAzureDataLakeStorage | Community | Unstructured | Azure Data Lake Storage | Deletes the provided file from Azure Data Lake Storage. |
| FetchAzureBlobStorage_v12 | Community | Unstructured | Azure Blob Storage | Retrieves the specified blob from Azure Blob Storage and writes its content to the content of the FlowFile. The processor uses the Azure Blob Storage client library v12. |
| FetchAzureDataLakeStorage | Community | Unstructured | Azure Data Lake Storage | Fetches the specified file from Azure Data Lake Storage. |
| ListAzureBlobStorage_v12 | Community | Unstructured | Azure Blob Storage | Lists blobs in an Azure Blob Storage container. Listing details are attached to an empty FlowFile for use with FetchAzureBlobStorage. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. The processor uses the Azure Blob Storage client library v12. |
| ListAzureDataLakeStorage | Community | Unstructured | Azure Data Lake Storage | Lists a directory in an Azure Data Lake Storage Gen 2 filesystem. |
| MoveAzureDataLakeStorage | Community | Unstructured | Azure Data Lake Storage | Moves content within an Azure Data Lake Storage Gen 2 filesystem. After the move, files will no longer be available at the source location. |
| PutAzureBlobStorage_v12 | Community | Unstructured | Azure Blob Storage | Puts content into a blob on Azure Blob Storage. The processor uses the Azure Blob Storage client library v12. |
| PutAzureDataLakeStorage | Community | Unstructured | Azure Data Lake Storage | Writes the contents of a FlowFile as a file on Azure Data Lake Storage Gen 2. |
| GetAzureQueueStorage_v12 | Community | Unstructured | Azure Queue Storage | Retrieves messages from an Azure Queue Storage. The retrieved messages will be deleted from the queue by default. If the requirement is to consume messages without deleting them, set 'Auto Delete Messages' to 'false'. Note: duplicates may be received when a message is retrieved but cannot be deleted from the queue due to an unexpected condition. |
| PutAzureQueueStorage_v12 | Community | Unstructured | Azure Queue Storage | Writes the content of the incoming FlowFiles to the configured Azure Queue Storage. |
| FetchBoxFile | Community | Unstructured | Box | Fetches files from a Box folder. Designed to be used in tandem with ListBoxFile. |
| ListBoxFile | Community | Unstructured | Box | Lists files in a Box folder. Each listed file may result in one FlowFile, with the metadata written as FlowFile attributes; or, if the 'Record Writer' property is set, the entire result is written as records to a single FlowFile. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. |
| PutBoxFile | Community | Unstructured | Box | Puts content to a Box folder. |
| FetchDropbox | Community | Unstructured | Dropbox | Fetches files from Dropbox. Designed to be used in tandem with ListDropbox. |
| ListDropbox | Community | Unstructured | Dropbox | Retrieves a listing of files from Dropbox (shortcuts are ignored). Each listed file may result in one FlowFile, with the metadata written as FlowFile attributes. When the 'Record Writer' property is set, the entire result is written as records to a single FlowFile. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. |
| PutDropbox | Community | Unstructured | Dropbox | Puts content to a Dropbox folder. |
| ConsumeElasticsearch | Community | Structured | Elastic | A processor that repeatedly runs a paginated query against a field using a Range query to consume new documents from an Elasticsearch index/query. The processor will retrieve multiple pages of results until either no more results are available or the Pagination Keep Alive expiration is reached, after which the Range query will automatically update the field constraint based on the last retrieved document value. |
| DeleteByQueryElasticsearch | Community | Structured | Elastic | Deletes from an Elasticsearch index using a query. The query can be loaded from the FlowFile body or from the Query parameter. |
| GetElasticsearch | Community | Structured | Elastic | Elasticsearch get processor that uses the official Elastic REST client libraries to fetch a single document from Elasticsearch by _id. Note that the full body of the document will be read into memory before being written to a FlowFile for transfer. |
| JsonQueryElasticsearch | Community | Structured | Elastic | A processor that allows the user to run a query (with aggregations) written with the Elasticsearch JSON DSL. It does not automatically paginate queries for the user. If an incoming relationship is added to this processor, it will use the FlowFile's content for the query. Care should be taken with the size of the query because the entire response from Elasticsearch will be loaded into memory all at once and converted into the resulting FlowFiles. |
| PaginatedJsonQueryElasticsearch | Community | Structured | Elastic | A processor that allows the user to run a paginated query (with aggregations) written with the Elasticsearch JSON DSL. It will use the FlowFile's content for the query unless the QUERY attribute is populated. Search After/Point in Time queries must include a valid "sort" field. |
| PutElasticsearchJson | Community | Structured | Elastic | An Elasticsearch put processor that uses the official Elastic REST client libraries. Each FlowFile is treated as a document to be sent to the Elasticsearch _bulk API. Multiple FlowFiles can be batched together into each request sent to Elasticsearch. |
| PutElasticsearchRecord | Community | Structured | Elastic | A record-aware Elasticsearch put processor that uses the official Elastic REST client libraries. Each record within the FlowFile is converted into a document to be sent to the Elasticsearch _bulk API. Multiple documents can be batched into each request sent to Elasticsearch. Each document's bulk operation can be configured using Record Path expressions. |
| SearchElasticsearch | Community | Structured | Elastic | A processor that allows the user to repeatedly run a paginated query (with aggregations) written with the Elasticsearch JSON DSL. Search After/Point in Time queries must include a valid "sort" field. The processor will retrieve multiple pages of results until either no more results are available or the Pagination Keep Alive expiration is reached, after which the query will restart with the first page of results being retrieved. |
| UpdateByQueryElasticsearch | Community | Structured | Elastic | Updates documents in an Elasticsearch index using a query. The query can be loaded from the FlowFile body or from the Query parameter. The loaded query can contain any JSON accepted by Elasticsearch's _update_by_query API, for example a "query" object to identify which documents are to be updated, plus a "script" to define the updates to perform. |
| ConsumeIMAP | Community | Unstructured | IMAP | Consumes messages from an email server using the IMAP protocol. The raw bytes of each received email message are written as the contents of the FlowFile. |
| ConsumePOP3 | Community | Unstructured | POP3 | Consumes messages from an email server using the POP3 protocol. The raw bytes of each received email message are written as the contents of the FlowFile. |
| PutBigQuery | Community | Structured | Google BigQuery | Writes the contents of a FlowFile to a Google BigQuery table. The processor is record based, so the schema that is used is driven by the Record Reader. Attributes that are not matched to the target schema are skipped. Exactly-once delivery semantics are achieved via stream offsets. |
| FetchGoogleDrive | Community | Unstructured | Google Drive | Fetches files from a Google Drive folder. Designed to be used in tandem with ListGoogleDrive. Please see Additional Details to set up access to Google Drive. |
| ListGoogleDrive | Community | Unstructured | Google Drive | Performs a listing of concrete files (shortcuts are ignored) in a Google Drive folder. If the 'Record Writer' property is set, a single output FlowFile is created, and each file in the listing is written as a single record to the output file. Otherwise, for each file in the listing, an individual FlowFile is created, with the metadata written as FlowFile attributes. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. Please see Additional Details to set up access to Google Drive. |
| PutGoogleDrive | Community | Unstructured | Google Drive | Writes the contents of a FlowFile as a file in Google Drive. |
| ConsumeGCPubSub | Community | Unstructured | Google PubSub | Consumes messages from the configured Google Cloud PubSub subscription. If the 'Batch Size' is set, the configured number of messages will be pulled in a single request; otherwise only one message will be pulled. |
| PublishGCPubSub | Community | Unstructured | Google PubSub | Publishes the content of the incoming FlowFile to the configured Google Cloud PubSub topic. The processor supports dynamic properties. If any dynamic properties are present, they will be sent along with the message in the form of 'attributes'. |
| ConsumeGCPubSubLite | Community | Unstructured | Google PubSub Lite | Consumes messages from the configured Google Cloud PubSub Lite subscription. |
| PublishGCPubSubLite | Community | Unstructured | Google PubSub Lite | Publishes the content of the incoming FlowFile to the configured Google Cloud PubSub Lite topic. The processor supports dynamic properties. If any dynamic properties are present, they will be sent along with the message in the form of 'attributes'. |
| DeleteGCSObject | Community | Unstructured | Google Cloud Storage | Deletes objects from a Google Cloud Storage bucket. If attempting to delete a file that does not exist, the FlowFile is routed to success. |
| FetchGCSObject | Community | Unstructured | Google Cloud Storage | Fetches a file from a Google Cloud Storage bucket. Designed to be used in tandem with ListGCSBucket. |
| ListGCSBucket | Community | Unstructured | Google Cloud Storage | Retrieves a listing of objects from a GCS bucket. For each object that is listed, creates a FlowFile that represents the object so that it can be fetched in conjunction with FetchGCSObject. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. |
| PutGCSObject | Community | Unstructured | Google Cloud Storage | Writes the contents of a FlowFile as an object in a Google Cloud Storage bucket. |
| GetGcpVisionAnnotateFilesOperationStatus | Community | Unstructured | Google Vision | Retrieves the current status of a Google Vision operation. |
| GetGcpVisionAnnotateImagesOperationStatus | Community | Unstructured | Google Vision | Retrieves the current status of a Google Vision operation. |
| StartGcpVisionAnnotateFilesOperation | Community | Unstructured | Google Vision | Triggers a Vision operation on file input. It should be followed by the GetGcpVisionAnnotateFilesOperationStatus processor in order to monitor the operation status. |
| StartGcpVisionAnnotateImagesOperation | Community | Unstructured | Google Vision | Triggers a Vision operation on image input. It should be followed by the GetGcpVisionAnnotateImagesOperationStatus processor in order to monitor the operation status. |
| GetHubSpot | Community | Structured | HubSpot | Retrieves JSON data from a private HubSpot application. This processor is intended to be run on the Primary Node only. |
| ConsumeJMS | Community | Unstructured | JMS | Consumes a JMS Message of type BytesMessage, TextMessage, ObjectMessage, MapMessage or StreamMessage, transforming its content to a FlowFile and transitioning it to the 'success' relationship. JMS attributes such as headers and properties will be copied as FlowFile attributes. MapMessages will be transformed into JSON and then into byte arrays. The other types will have their raw contents transferred into the FlowFile as a byte array. |
| PublishJMS | Community | Unstructured | JMS | Creates a JMS Message from the contents of a FlowFile and sends it to a JMS Destination (queue or topic) as a JMS BytesMessage or TextMessage. FlowFile attributes will be added as JMS headers and/or properties to the outgoing JMS message. |
| ConsumeKafka | Community | Unstructured | Kafka | Consumes messages from Apache Kafka using the Kafka Consumer API. The complementary NiFi processor for sending messages is PublishKafka. The Processor supports consumption of Kafka messages, optionally interpreted as NiFi records. Please note that, at this time (in read record mode), the Processor assumes that all records retrieved from a given partition have the same schema. For this mode, if any of the Kafka messages are pulled but cannot be parsed or written with the configured Record Reader or Record Writer, the contents of the message will be written to a separate FlowFile, and that FlowFile will be transferred to the 'parse.failure' relationship. Otherwise, each FlowFile is sent to the 'success' relationship and may contain many individual messages within the single FlowFile. A 'record.count' attribute is added to indicate how many messages are contained in the FlowFile. No two Kafka messages will be placed into the same FlowFile if they have different schemas, or if they have different values for a message header that is included by the property. |
| PublishKafka | Community | Unstructured | Kafka | Sends the contents of a FlowFile as either a message or as individual records to Apache Kafka using the Kafka Producer API. The messages to send may be individual FlowFiles, may be delimited using a user-specified delimiter (such as a new-line), or may be record-oriented data that can be read by the configured Record Reader. The complementary NiFi processor for fetching messages is ConsumeKafka. |
| DeleteMongo | Community | Structured | MongoDB | Executes a delete query against a MongoDB collection. The query is provided in the body of the FlowFile, and the user can select whether it will delete one or many documents that match it. |
| GetMongo | Community | Structured | MongoDB | Creates FlowFiles from documents in MongoDB loaded by a user-specified query. |
| GetMongoRecord | Community | Structured | MongoDB | A record-based version of GetMongo that uses the Record writers to write the MongoDB result set. |
| PutMongo | Community | Structured | MongoDB | Writes the contents of a FlowFile to MongoDB. |
| PutMongoBulkOperations | Community | Structured | MongoDB | Writes the contents of a FlowFile to MongoDB as a bulk update. |
| PutMongoRecord | Community | Structured | MongoDB | This processor is a record-aware processor for inserting/upserting data into MongoDB. It uses a configured record reader and schema to read an incoming record set from the body of a FlowFile and then inserts/upserts batches of those records into a configured MongoDB collection. This processor does not support deletes. The number of documents to insert/upsert at a time is controlled by the 'Batch Size' configuration property. This value should be set to a reasonable size to ensure that MongoDB is not overloaded with too many operations at once. |
| RunMongoAggregation | Community | Structured | MongoDB | A processor that runs an aggregation query whenever a FlowFile is received. |
| DeleteGridFS | Community | Unstructured | MongoDB | Deletes a file from GridFS using a file name or a query. |
| FetchGridFS | Community | Unstructured | MongoDB | Retrieves one or more files from a GridFS bucket by file name or by a user-defined query. |
| PutGridFS | Community | Unstructured | MongoDB | Writes a file to a GridFS bucket. |
| ConsumeMQTT | Community | Unstructured | MQTT | Subscribes to a topic and receives messages from an MQTT broker. |
| PublishMQTT | Community | Unstructured | MQTT | Publishes a message to an MQTT topic. |
| ListenOTLP | Community | Structured | OpenTelemetry | Collects OpenTelemetry messages over HTTP or gRPC. Supports standard Export Service Request messages for logs, metrics, and traces. Implements OpenTelemetry OTLP Specification 1.0.0 with OTLP/gRPC and OTLP/HTTP. Provides protocol detection using the HTTP Content-Type header. |
| PutRedisHashRecord | Community | Structured | Redis | Puts record field data into Redis using a specified hash value, which is determined by a RecordPath to a field in each record containing the hash value. The record fields and values are stored as key/value pairs associated with the hash value. NOTE: neither the evaluated hash value nor any of the field values can be null. If the hash value is null, the FlowFile will be routed to failure. For each of the field values, if the value is null, that field will not be set in Redis. |
| PutSalesforceObject | Community | Structured | Salesforce | Creates new records for the specified Salesforce sObject. The type of the Salesforce object must be set in the input FlowFile's 'objectType' attribute. This processor cannot update existing records. |
| QuerySalesforceObject | Community | Structured | Salesforce | Retrieves records from a Salesforce sObject. Users can add arbitrary filter conditions by setting the 'Custom WHERE Condition' property. The processor can also run a custom query, although record processing is not supported in that case. Supports incremental retrieval: users can define a field in the 'Age Field' property that will be used to determine when the record was created. When this property is set, the processor will retrieve new records. Incremental loading and record-based processing are only supported in property-based queries. It is also possible to define an initial cutoff value for the age, filtering out all older records even for the first run. In case of a 'Property Based Query', this processor should run on the Primary Node only. The FlowFile attribute 'record.count' indicates how many records were retrieved and written to the output. The processor can accept an optional input FlowFile and reference the FlowFile attributes in the query. |
| GetShopify | Community | Structured | Shopify | Retrieves objects from a custom Shopify store. The processor's yield time must be set according to the account's rate limit. |
| ConsumeSlack | Community | Unstructured | Slack | Retrieves messages from one or more configured Slack channels. The messages are written out in JSON format. See Usage / Additional Details for more information about how to configure this Processor and enable it to retrieve messages from Slack. |
| ListenSlack | Community | Unstructured | Slack | Retrieves real-time messages or Slack commands from one or more Slack conversations. The messages are written out in JSON format. Note that this Processor should be used to obtain real-time messages and commands from Slack and does not provide a mechanism for obtaining historical messages. The ConsumeSlack Processor should be used for an initial load of messages from a channel. See Usage / Additional Details for more information about how to configure this Processor and enable it to retrieve messages and commands from Slack. |
| PublishSlack | Community | Unstructured | Slack | Posts a message to the specified Slack channel. The content of the message can be either a user-defined message that makes use of Expression Language, or the contents of the FlowFile can be sent as the message. If sending a user-defined message, the contents of the FlowFile may also be optionally uploaded as a file attachment. |
| FetchSmb | Community | Unstructured | SMB | Fetches files from an SMB share. Designed to be used in tandem with ListSmb. |
| GetSmbFile | Community | Unstructured | SMB | Reads files from a Samba network location into FlowFiles. Use this processor instead of a CIFS mount if share access control is important. Configure the Hostname, Share and Directory accordingly: \\Hostname\Share\path\to\Directory |
| ListSmb | Community | Unstructured | SMB | Lists concrete files shared via the SMB protocol. Each listed file may result in one FlowFile, with the metadata written as FlowFile attributes; or, if the 'Record Writer' property is set, the entire result is written as records to a single FlowFile. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. |
| PutSmbFile | Community | Unstructured | SMB | Writes the contents of a FlowFile to a Samba network location. Use this processor instead of a CIFS mount if share access control is important. Configure the Hostname, Share and Directory accordingly: \\Hostname\Share\path\to\Directory |
| ConsumeTwitter | Community | Unstructured | Twitter / X | Streams tweets from Twitter's streaming API v2. The stream provides a sample stream or a search stream based on previously uploaded rules. This processor also provides a pass-through for certain fields of the tweet to be returned as part of the response. See https://developer.twitter.com/en/docs/twitter-api/data-dictionary/introduction for more information regarding the Tweet object model. |
| GetSplunk | Community | Structured | Splunk | Retrieves data from Splunk Enterprise. |
| PutSplunk | Community | Structured | Splunk | Sends logs to Splunk Enterprise over TCP, TCP + TLS/SSL, or UDP. If a Message Delimiter is provided, then this processor will read messages from the incoming FlowFile based on the delimiter and send each message to Splunk. If a Message Delimiter is not provided, then the content of the FlowFile will be sent directly to Splunk as if it were a single message. |
| PutSplunkHTTP | Community | Structured | Splunk | Sends FlowFile content to the specified Splunk server over HTTP or HTTPS. Supports HEC Index Acknowledgement. |
| QuerySplunkIndexingStatus | Community | Structured | Splunk | Queries the Splunk server in order to acquire the status of indexing acknowledgement. |
| ExecuteSQL | Community | Structured | JDBC | Executes a provided SQL select query. The query result will be converted to Avro format. Streaming is used, so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. If it is triggered by an incoming FlowFile, then attributes of that FlowFile will be available when evaluating the select query, and the query may use ? placeholders for parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention sql.args.N.type and sql.args.N.value, where N is a positive integer. The sql.args.N.type is expected to be a number indicating the JDBC Type. The content of the FlowFile is expected to be in UTF-8 format. The FlowFile attribute 'executesql.row.count' indicates how many rows were selected. |
| ExecuteSQLRecord | Community | Structured | JDBC | Executes a provided SQL select query. The query result will be converted to the format specified by a Record Writer. Streaming is used, so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. If it is triggered by an incoming FlowFile, then attributes of that FlowFile will be available when evaluating the select query, and the query may use ? placeholders for parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention sql.args.N.type and sql.args.N.value, where N is a positive integer. The sql.args.N.type is expected to be a number indicating the JDBC Type. The content of the FlowFile is expected to be in UTF-8 format. The FlowFile attribute 'executesql.row.count' indicates how many rows were selected. |
| FetchFTP | Community | Unstructured | FTP | Fetches the content of a file from a remote FTP server and overwrites the contents of an incoming FlowFile with the content of the remote file. |
| FetchFile | Community | Unstructured | Local file | Reads the contents of a file from disk and streams it into the contents of an incoming FlowFile. Once this is done, the file is optionally moved elsewhere or deleted to help keep the file system organized. |
| FetchSFTP | Community | Unstructured | SFTP | Fetches the content of a file from a remote SFTP server and overwrites the contents of an incoming FlowFile with the content of the remote file. |
| GenerateTableFetch | Community | Structured | JDBC | Generates SQL select queries that fetch "pages" of rows from a table. The partition size property, along with the table's row count, determines the size and number of pages and generated FlowFiles. In addition, incremental fetching can be achieved by setting Maximum-Value Columns, which causes the processor to track the columns' maximum values, thus only fetching rows whose columns' values exceed the observed maximums. This processor is intended to be run on the Primary Node only. The processor can accept incoming connections, and its behavior depends on whether they are provided: if no incoming connections are specified, the processor generates SQL queries on the configured schedule; Expression Language is supported for many fields, but no FlowFile attributes are available (properties are evaluated using the Environment/System properties). If incoming connections are specified and no FlowFile is available to a processor task, no work is performed. If incoming connections are specified and a FlowFile is available to a processor task, the FlowFile's attributes may be used in Expression Language for fields such as Table Name; however, the Max-Value Columns and Columns to Return fields must be empty or refer to columns that are available in each specified table. |
| GetFTP | Community | Unstructured | FTP | Fetches files from an FTP server and creates FlowFiles from them. |
| GetFile | Community | Unstructured | Local file | Creates FlowFiles from files in a directory. NiFi will ignore files for which it doesn't have at least read permissions. |
| GetSFTP | Community | Unstructured | SFTP | Fetches files from an SFTP server and creates FlowFiles from them. |
| HandleHttpRequest | Community | Unstructured | HTTP | Starts an HTTP server and listens for HTTP requests. For each request, creates a FlowFile and transfers it to 'success'. This Processor is designed to be used in conjunction with the HandleHttpResponse Processor in order to create a web service. In case of a multipart request, one FlowFile is generated for each part. |
| HandleHttpResponse | Community | Unstructured | HTTP | Sends an HTTP response to the requestor that generated a FlowFile. This Processor is designed to be used in conjunction with HandleHttpRequest in order to create a web service. |
| InvokeHTTP | Community | Unstructured | HTTP | An HTTP client processor which can interact with a configurable HTTP endpoint. The destination URL and HTTP method are configurable. When the HTTP method is PUT, POST or PATCH, the FlowFile contents are included as the body of the request, and FlowFile attributes are optionally converted to HTTP headers based on configuration properties. |
| ListDatabaseTables | Community | Structured | JDBC | Generates a set of FlowFiles, each containing attributes corresponding to metadata about a table from a database connection. Once metadata about a table has been fetched, it will not be fetched again until the Refresh Interval (if set) has elapsed, or until state has been manually cleared. |
| ListFTP | Community | Unstructured | FTP | Performs a listing of the files residing on an FTP server. For each file that is found on the remote server, a new FlowFile will be created with the filename attribute set to the name of the file on the remote server. This can then be used in conjunction with FetchFTP in order to fetch those files. |
| ListFile | Community | Unstructured | Local file | Retrieves a listing of files from the input directory. For each file listed, creates a FlowFile that represents the file so that it can be fetched in conjunction with FetchFile. This Processor is designed to run on Primary Node only in a cluster when 'Input Directory Location' is set to 'Remote'. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all the data. When 'Input Directory Location' is 'Local', the 'Execution' mode can be anything, and synchronization won't happen. Unlike GetFile, this Processor does not delete any data from the local filesystem. |
| ListSFTP | Community | Unstructured | SFTP | Performs a listing of the files residing on an SFTP server. For each file that is found on the remote server, a new FlowFile will be created with the filename attribute set to the name of the file on the remote server. This can then be used in conjunction with FetchSFTP in order to fetch those files. |
| ListenFTP | Community | Unstructured | FTP | Starts an FTP server that listens on the specified port and transforms incoming files into FlowFiles. The URI of the service will be ftp://{hostname}:{port}. The default port is 2221. |
| ListenHTTP | Community | Unstructured | HTTP | Starts an HTTP server and listens on a given base path to transform incoming requests into FlowFiles. The default URI of the service will be http://{hostname}:{port}/contentListener. Only HEAD and POST requests are supported. GET, PUT, DELETE, OPTIONS and TRACE will result in an error and the HTTP response status code 405; CONNECT will also result in an error and the HTTP response status code 400. GET is supported on /healthcheck. If the service is available, it returns "200 OK" with the content "OK". The health check functionality can be configured to be accessible via a different port. For details, see the documentation of the "Listening Port for health check requests" property. A Record Reader and Record Writer property can be enabled on the processor to process incoming requests as records. Record processing is not allowed for multipart requests and requests in FlowFileV3 format (MiNiFi). |
| ListenSyslog | Community | Structured | Syslog | Listens for Syslog messages being sent to a given port over TCP or UDP. Incoming messages are checked against regular expressions for RFC5424 and RFC3164 formatted messages. The format of each message is: (PRIORITY)(VERSION )(TIMESTAMP) (HOSTNAME) (BODY), where the version is optional. The timestamp can be an RFC5424 timestamp with a format of "yyyy-MM-dd'T'HH:mm:ss.SZ" or "yyyy-MM-dd'T'HH:mm:ss.S+hh:mm", or it can be an RFC3164 timestamp with a format of "MMM d HH:mm:ss". If an incoming message matches one of these patterns, the message will be parsed and the individual pieces will be placed in FlowFile attributes, with the original message in the content of the FlowFile. If an incoming message does not match one of these patterns, it will not be parsed and the syslog.valid attribute will be set to false, with the original message in the content of the FlowFile. Valid messages will be transferred on the success relationship, and invalid messages will be transferred on the invalid relationship. |
| ListenTCP | Community | Unstructured | TCP | Listens for incoming TCP connections and reads data from each connection using a line separator as the message demarcator. The default behavior is for each message to produce a single FlowFile; however, this can be controlled by increasing the Batch Size to a larger value for higher throughput. The Receive Buffer Size must be set as large as the largest messages expected to be received, meaning if every 100kb there is a line separator, then the Receive Buffer Size must be greater than 100kb. The processor can be configured to use an SSL Context Service to only allow secure connections. When connected clients present certificates for mutual TLS authentication, the Distinguished Names of the client certificate's issuer and subject are added to the outgoing FlowFiles as attributes. The processor does not perform authorization based on Distinguished Name values, but since these values are attached to the outgoing FlowFiles, authorization can be implemented based on these attributes. |
| ListenUDP | Community | Unstructured | UDP | Listens for Datagram Packets on a given port. The default behavior produces a FlowFile per datagram; however, for higher throughput the Max Batch Size property may be increased to specify the number of datagrams to batch together in a single FlowFile. This processor can be restricted to listening for datagrams from a specific remote host and port by specifying the Sending Host and Sending Host Port properties; otherwise it will listen for datagrams from all hosts and ports. |
| ListenUDPRecord | Community | Structured | UDP | Listens for Datagram Packets on a given port and reads the content of each datagram using the configured Record Reader. Each record will then be written to a FlowFile using the configured Record Writer. This processor can be restricted to listening for datagrams from a specific remote host and port by specifying the Sending Host and Sending Host Port properties; otherwise it will listen for datagrams from all hosts and ports. |
| PutEmail | Community | Unstructured | Email | Sends an e-mail to configured recipients for each incoming FlowFile. |
| PutFTP | Community | Unstructured | FTP | Sends FlowFiles to an FTP server. |
| PutFile | Community | Unstructured | Local file | Writes the contents of a FlowFile to the local file system. |
| PutRecord | Community | Structured | JDBC | The PutRecord processor uses a specified RecordReader to read (possibly multiple) records from an incoming FlowFile and sends them to a destination specified by a Record Destination Service (i.e. record sink). |
| PutSFTP | Community | Unstructured | SFTP | Sends FlowFiles to an SFTP server. |
| PutSQL | Community | Structured | JDBC | Executes a SQL UPDATE or INSERT command. The content of an incoming FlowFile is expected to be the SQL command to execute. The SQL command may use ? placeholders for parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention sql.args.N.type and sql.args.N.value, where N is a positive integer. The sql.args.N.type is expected to be a number indicating the JDBC Type. The content of the FlowFile is expected to be in UTF-8 format. See the parameter-attribute sketch following this table. |
| PutSyslog | Community | Structured | Syslog | Sends Syslog messages to a given host and port over TCP or UDP. Messages are constructed from the "Message ___" properties of the processor, which can use Expression Language to generate messages from incoming FlowFiles. The properties are used to construct messages of the form: (PRIORITY)(VERSION )(TIMESTAMP) (HOSTNAME) (BODY), where the version is optional. The constructed messages are checked against regular expressions for RFC5424 and RFC3164 formatted messages. The timestamp can be an RFC5424 timestamp with a format of "yyyy-MM-dd'T'HH:mm:ss.SZ" or "yyyy-MM-dd'T'HH:mm:ss.S+hh:mm", or it can be an RFC3164 timestamp with a format of "MMM d HH:mm:ss". If a message is constructed that does not form a valid Syslog message according to the above description, then it is routed to the invalid relationship. Valid messages are sent to the Syslog server; successes are routed to the success relationship and failures are routed to the failure relationship. |
| PutTCP | Community | Unstructured | TCP | Sends serialized FlowFiles or records over TCP to a configurable destination, with optional support for TLS. |
| PutUDP | Community | Unstructured | UDP | The PutUDP processor receives a FlowFile and packages the FlowFile content into a single UDP datagram packet, which is then transmitted to the configured UDP server. The user must ensure that the FlowFile content being fed to this processor is not larger than the maximum size for the underlying UDP transport. The maximum transport size will vary based on the platform setup but is generally just under 64KB. FlowFiles will be marked as failed if their content is larger than the maximum transport size. |
| QueryDatabaseTable | Community | Structured | JDBC | Generates a SQL select query, or uses a provided statement, and executes it to fetch all rows whose values in the specified Maximum Value column(s) are larger than the previously-seen maxima. The query result will be converted to Avro format. Expression Language is supported for several properties, but no incoming connections are permitted. The Environment/System properties may be used to provide values for any property containing Expression Language. If it is desired to leverage FlowFile attributes to perform these queries, the GenerateTableFetch and/or ExecuteSQL processors can be used for this purpose. Streaming is used, so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer or cron expression, using the standard scheduling methods. This processor is intended to be run on the Primary Node only. The FlowFile attribute 'querydbtable.row.count' indicates how many rows were selected. |
| QueryDatabaseTableRecord | Community | Structured | JDBC | Generates a SQL select query, or uses a provided statement, and executes it to fetch all rows whose values in the specified Maximum Value column(s) are larger than the previously-seen maxima. The query result will be converted to the format specified by the Record Writer. Expression Language is supported for several properties, but no incoming connections are permitted. The Environment/System properties may be used to provide values for any property containing Expression Language. If it is desired to leverage FlowFile attributes to perform these queries, the GenerateTableFetch and/or ExecuteSQL processors can be used for this purpose. Streaming is used, so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer or cron expression, using the standard scheduling methods. This processor is intended to be run on the Primary Node only. The FlowFile attribute 'querydbtable.row.count' indicates how many rows were selected. |
| QueryRecord | Community | Structured | JDBC | Evaluates one or more SQL queries against the contents of a FlowFile. The result of the SQL query then becomes the content of the output FlowFile. This can be used, for example, for field-specific filtering, transformation, and row-level filtering. Columns can be renamed, simple calculations and aggregations performed, etc. The Processor is configured with a Record Reader Controller Service and a Record Writer service so as to allow flexibility in incoming and outgoing data formats. The Processor must be configured with at least one user-defined property: the name of the property is the relationship to route data to, and the value of the property is a SQL SELECT statement that is used to specify how input data should be transformed/filtered (see the sketch following this table). The SQL statement must be valid ANSI SQL and is powered by Apache Calcite. If the transformation fails, the original FlowFile is routed to the 'failure' relationship; otherwise, the data selected will be routed to the associated relationship. If the Record Writer chooses to inherit the schema from the Record, it is important to note that the schema that is inherited will be from the ResultSet rather than the input Record. This allows a single instance of the QueryRecord processor to have multiple queries, each of which returns a different set of columns and aggregations. As a result, though, the schema that is derived will have no schema name, so it is important that the configured Record Writer not attempt to write the Schema Name as an attribute if inheriting the Schema from the Record. See the Processor Usage documentation for more information. |
| UpdateDatabaseTable | Community | Structured | JDBC | This processor uses a JDBC connection and incoming records to generate any database table changes needed to support the incoming records. It expects a 'flat' record layout, meaning none of the top-level record fields has nested fields that are intended to become columns themselves. |
| ConnectWebSocket | Community | Unstructured | WebSocket | Acts as a WebSocket client endpoint to interact with a remote WebSocket server. FlowFiles are transferred to downstream relationships according to the received message types as the WebSocket client configured with this processor receives messages from the remote WebSocket server. If a new FlowFile is passed to the processor, the previous sessions will be closed and any data being sent will be aborted. |
| ListenWebSocket | Community | Unstructured | WebSocket | Acts as a WebSocket server endpoint to accept client connections. FlowFiles are transferred to downstream relationships according to the received message types as the WebSocket server configured with this processor receives client requests. |
| PutWebSocket | Community | Unstructured | WebSocket | Sends messages to a WebSocket remote endpoint using a WebSocket session that is established by either ListenWebSocket or ConnectWebSocket. |
| GetWorkdayReport | Community | Structured | Workday | A processor which can interact with a configurable Workday report. The processor can forward the content without modification, or you can transform it by providing specific Record Reader and Record Writer services based on your needs. You can also remove fields by defining a schema in the Record Writer. Supported Workday report formats are: csv, simplexml, and json. |
| GetZendesk | Community | Structured | Zendesk | Incrementally fetches data from the Zendesk API. |
| PutZendeskTicket | Community | Structured | Zendesk | Creates Zendesk tickets using the Zendesk API. |
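
The SQL processors above (PutSQL, ExecuteSQL, ExecuteSQLRecord) rely on the sql.args.N.type / sql.args.N.value FlowFile attribute convention for parameterized statements, as described in their entries. Below is a minimal sketch for PutSQL; the table name, column names, and values are hypothetical, and the JDBC type codes 12 and 4 correspond to VARCHAR and INTEGER in java.sql.Types.

```
FlowFile content (the statement PutSQL will execute):
  UPDATE customers SET region = ? WHERE customer_id = ?

FlowFile attributes:
  sql.args.1.type  = 12      -- JDBC Type 12 = VARCHAR
  sql.args.1.value = EMEA
  sql.args.2.type  = 4       -- JDBC Type 4 = INTEGER
  sql.args.2.value = 42
```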
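Similarly, QueryRecord routes data via user-defined properties whose name becomes a relationship and whose value is a SQL SELECT statement. A minimal sketch follows, with a hypothetical property name, field name, and threshold; in QueryRecord queries the incoming FlowFile's records are typically referenced as the FLOWFILE table.

```
Property name  : large_orders                                (added as a new relationship)
Property value : SELECT * FROM FLOWFILE WHERE amount > 1000
```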