Datavolo DevCenter



Connector processors

Processors documentation


| Component Name | Datavolo or Community | Structured or Unstructured | System / Protocol | Description |
| --- | --- | --- | --- | --- |
| PutSnowflakeInternalStageFile | Datavolo | Unstructured | Snowflake | Puts files into a Snowflake internal stage. The internal stage must be created in the Snowflake account beforehand. This processor can be connected to a StartSnowflakeIngest processor to ingest the file in the internal stage. |
| DeleteDBFSResource | Datavolo | Unstructured | Databricks | Deletes DBFS files and directories. |
| GetDBFSFile | Datavolo | Unstructured | Databricks | Reads a DBFS file. |
| ListDBFSDirectory | Datavolo | Unstructured | Databricks | Lists file names in a DBFS directory and outputs a new FlowFile with the filename. |
| PutDBFSFile | Datavolo | Unstructured | Databricks | Writes FlowFile content to DBFS. |
| DeleteUnityCatalogResource | Datavolo | Unstructured | Databricks | Deletes a Unity Catalog file or directory. |
| GetUnityCatalogFile | Datavolo | Unstructured | Databricks | Reads a Unity Catalog file of up to 5 GiB. |
| GetUnityCatalogFileMetadata | Datavolo | Unstructured | Databricks | Checks for Unity Catalog file metadata. |
| ListUnityCatalogDirectory | Datavolo | Unstructured | Databricks | Lists file names in a Unity Catalog directory and outputs a new FlowFile with the filename. |
| PutUnityCatalogFile | Datavolo | Unstructured | Databricks | Writes FlowFile content with a maximum size of 5 GiB to Unity Catalog. |
| PutDatabricksSQL | Datavolo | Structured | Databricks | Submits a SQL execution using the Databricks REST API, then writes the JSON response to the FlowFile content. For high-performance SELECT or INSERT queries, use ExecuteSQL instead. |
| CaptureGoogleDriveChanges | Datavolo | Unstructured | Google Drive | Captures changes to a Shared Google Drive and emits a FlowFile for each change that occurs. This includes the addition and deletion of files, as well as changes to file metadata and permissions. The processor is designed to be used in conjunction with the FetchGoogleDrive processor. |
| PutHubSpot | Datavolo | Structured | HubSpot | Upserts a HubSpot object. |
| CaptureSharepointChanges | Datavolo | Unstructured | Microsoft SharePoint | Captures changes from a SharePoint Document Library and emits a FlowFile for each change that occurs. This includes additions and deletions of files and folders, as well as changes to permissions, metadata, and file content. |
| FetchSharepointFile | Datavolo | Unstructured | Microsoft SharePoint | Fetches the contents of a file from a SharePoint Drive, optionally downloading a PDF or HTML version of the file when applicable. Any FlowFile that represents a SharePoint folder will be routed to success without fetching contents. |
| DeletePinecone | Datavolo | Structured | Pinecone | Deletes vectors from a Pinecone index. |
| QueryPinecone | Datavolo | Structured | Pinecone | Queries Pinecone for vectors that are similar to the input vector, or retrieves a vector by ID. |
| UpsertPinecone | Datavolo | Structured | Pinecone | Publishes vectors, including metadata and optionally text, to a Pinecone index. |
| PutVectaraDocument | Datavolo | Unstructured | Vectara | Generates and uploads a JSON document to Vectara's upload endpoint. The input text can be in JSON object, JSON array, or JSONL format. |
| PutVectaraFile | Datavolo | Unstructured | Vectara | Uploads FlowFile content to Vectara's index endpoint. Document filter attributes and metadata attributes can be set by referencing FlowFile attributes. |
| ConsumeAMQP | Community | Unstructured | AMQP | Consumes AMQP Messages from an AMQP Broker using the AMQP 0.9.1 protocol. Each message that is received from the AMQP Broker will be emitted as its own FlowFile to the 'success' relationship. |
| PublishAMQP | Community | Unstructured | AMQP | Creates an AMQP Message from the contents of a FlowFile and sends the message to an AMQP Exchange. In a typical AMQP exchange model, the message sent to the AMQP Exchange will be routed based on the 'Routing Key' to its final destination in the queue (the binding). If, due to some misconfiguration, the binding between the Exchange, Routing Key and Queue is not set up, the message will have no final destination and will return (i.e., the data will not make it to the queue). If that happens, a log entry to that effect will appear in both the app log and a bulletin, and the FlowFile will be routed to the 'failure' relationship. |
| PutCloudWatchMetric | Community | Structured | AWS CloudWatch | Publishes metrics to Amazon CloudWatch. The metric can be either a single value or a StatisticSet comprising minimum, maximum, sum, and sample count. |
| DeleteDynamoDB | Community | Structured | AWS DynamoDB | Deletes a document from DynamoDB based on hash and range key. The key can be a string or a number. The request requires all the primary keys for the operation (hash, or hash and range key). |
| GetDynamoDB | Community | Structured | AWS DynamoDB | Retrieves a document from DynamoDB based on hash and range key. The key can be a string or a number. For any get request, all the primary keys are required (hash, or hash and range, based on the table keys). A JSON Document ('Map') attribute of the DynamoDB item is read into the content of the FlowFile. |
| PutDynamoDB | Community | Structured | AWS DynamoDB | Puts a document into DynamoDB based on hash and range key. The table can have either hash and range keys or a hash key alone. Currently the supported key types are string and number, and the value can be a JSON document. When hash and range keys are used, both keys are required for the operation. The FlowFile content must be JSON and is mapped to the specified JSON Document attribute in the DynamoDB item. |
| PutDynamoDBRecord | Community | Structured | AWS DynamoDB | Inserts items into DynamoDB based on record-oriented data. The record fields are mapped into DynamoDB item fields, including partition and sort keys if set. Depending on the number of records, the processor might execute the insert in multiple chunks in order to overcome DynamoDB's limitation on batch writing. This might result in partially processed FlowFiles, in which case the FlowFile will be transferred to the 'unprocessed' relationship with the necessary attributes to retry later without duplicating the already executed inserts. |
| PutKinesisFirehose | Community | Unstructured | AWS Kinesis | Sends the contents to a specified Amazon Kinesis Firehose. In order to send data to Firehose, the Firehose delivery stream name has to be specified. |
| ConsumeKinesisStream | Community | Unstructured | AWS Kinesis | Reads data from the specified AWS Kinesis stream and outputs a FlowFile for every processed record (raw), or a FlowFile for a batch of processed records if a Record Reader and Record Writer are configured. Provides at-least-once delivery of all Kinesis records within the stream while the processor is running. The AWS Kinesis Client Library can take several seconds to initialise before starting to fetch data. Uses DynamoDB for checkpointing and CloudWatch (optional) for metrics. Ensure that the credentials provided have access to DynamoDB and CloudWatch (optional) along with Kinesis. |
| PutKinesisStream | Community | Unstructured | AWS Kinesis | Sends the contents to a specified Amazon Kinesis stream. In order to send data to Kinesis, the stream name has to be specified. |
| PutLambda | Community | Unstructured | AWS Lambda | Sends the contents to a specified AWS Lambda function. The AWS credentials used for authentication must have permission to execute the Lambda function (lambda:InvokeFunction). The FlowFile content must be JSON. |
| DeleteS3Object | Community | Unstructured | AWS S3 | Deletes a file from an Amazon S3 bucket. If attempting to delete a file that does not exist, the FlowFile is routed to success. |
| FetchS3Object | Community | Unstructured | AWS S3 | Retrieves the contents of an S3 object and writes it to the content of a FlowFile. |
| ListS3 | Community | Unstructured | AWS S3 | Retrieves a listing of objects from an S3 bucket. For each object that is listed, creates a FlowFile that represents the object so that it can be fetched in conjunction with FetchS3Object. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. |
| PutS3Object | Community | Unstructured | AWS S3 | Writes the contents of a FlowFile as an S3 object to an Amazon S3 bucket. |
| TagS3Object | Community | Unstructured | AWS S3 | Adds or updates a tag on an Amazon S3 object. |
| PutSNS | Community | Structured | AWS SNS | Sends the content of a FlowFile as a notification to the Amazon Simple Notification Service. |
| DeleteSQS | Community | Structured | AWS SQS | Deletes a message from an Amazon Simple Queue Service (SQS) queue. |
| GetSQS | Community | Structured | AWS SQS | Fetches messages from an Amazon Simple Queue Service (SQS) queue. |
| PutSQS | Community | Structured | AWS SQS | Publishes a message to an Amazon Simple Queue Service (SQS) queue. |
| PutAzureCosmosDBRecord | Community | Structured | Azure Cosmos DB | This processor is a record-aware processor for inserting data into Cosmos DB with the Core SQL API. It uses a configured record reader and schema to read an incoming record set from the body of a FlowFile and then inserts those records into a configured Cosmos DB container. |
| PutAzureDataExplorer | Community | Structured | Azure Data Explorer | Acts as an Azure Data Explorer sink which sends FlowFiles to the provided endpoint. Data can be sent through queued ingestion or streaming ingestion to the Azure Data Explorer cluster. |
| QueryAzureDataExplorer | Community | Structured | Azure Data Explorer | Queries Azure Data Explorer and streams JSON results to output FlowFiles. |
| ConsumeAzureEventHub | Community | Unstructured | Azure Event Hub | Receives messages from Microsoft Azure Event Hubs with checkpointing to ensure consistent event processing. Checkpoint tracking avoids consuming a message multiple times and enables reliable resumption of processing in the event of intermittent network failures. Checkpoint tracking requires external storage and provides the preferred approach to consuming messages from Azure Event Hubs. In a clustered environment, ConsumeAzureEventHub processor instances form a consumer group and the messages are distributed among the cluster nodes (each message is processed on one cluster node only). |
| GetAzureEventHub | Community | Unstructured | Azure Event Hub | Receives messages from Microsoft Azure Event Hubs without reliable checkpoint tracking. In a clustered environment, GetAzureEventHub processor instances work independently and all cluster nodes process all messages (unless running the processor in Primary Only mode). ConsumeAzureEventHub offers the recommended approach to receiving messages from Azure Event Hubs. This processor creates a thread pool for connections to Azure Event Hubs. |
| PutAzureEventHub | Community | Unstructured | Azure Event Hub | Sends FlowFile contents to Azure Event Hubs. |
| CopyAzureBlobStorage_v12 | Community | Unstructured | Azure Blob Storage | Copies a blob in Azure Blob Storage from one account/container to another. The processor uses the Azure Blob Storage client library v12. |
| DeleteAzureBlobStorage_v12 | Community | Unstructured | Azure Blob Storage | Deletes the specified blob from Azure Blob Storage. The processor uses the Azure Blob Storage client library v12. |
| DeleteAzureDataLakeStorage | Community | Unstructured | Azure Data Lake Storage | Deletes the provided file from Azure Data Lake Storage. |
| FetchAzureBlobStorage_v12 | Community | Unstructured | Azure Blob Storage | Retrieves the specified blob from Azure Blob Storage and writes its content to the content of the FlowFile. The processor uses the Azure Blob Storage client library v12. |
| FetchAzureDataLakeStorage | Community | Unstructured | Azure Data Lake Storage | Fetches the specified file from Azure Data Lake Storage. |
| ListAzureBlobStorage_v12 | Community | Unstructured | Azure Blob Storage | Lists blobs in an Azure Blob Storage container. Listing details are attached to an empty FlowFile for use with FetchAzureBlobStorage. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. The processor uses the Azure Blob Storage client library v12. |
| ListAzureDataLakeStorage | Community | Unstructured | Azure Data Lake Storage | Lists a directory in an Azure Data Lake Storage Gen 2 filesystem. |
| MoveAzureDataLakeStorage | Community | Unstructured | Azure Data Lake Storage | Moves content within an Azure Data Lake Storage Gen 2 filesystem. After the move, files will no longer be available at the source location. |
| PutAzureBlobStorage_v12 | Community | Unstructured | Azure Blob Storage | Puts content into a blob on Azure Blob Storage. The processor uses the Azure Blob Storage client library v12. |
| PutAzureDataLakeStorage | Community | Unstructured | Azure Data Lake Storage | Writes the contents of a FlowFile as a file on Azure Data Lake Storage Gen 2. |
| GetAzureQueueStorage_v12 | Community | Unstructured | Azure Queue Storage | Retrieves messages from an Azure Queue Storage. The retrieved messages will be deleted from the queue by default. If the requirement is to consume messages without deleting them, set 'Auto Delete Messages' to 'false'. Note: duplicates may be received when a message is retrieved but cannot be deleted from the queue due to an unexpected condition. |
| PutAzureQueueStorage_v12 | Community | Unstructured | Azure Queue Storage | Writes the content of the incoming FlowFiles to the configured Azure Queue Storage. |
| FetchBoxFile | Community | Unstructured | Box | Fetches files from a Box folder. Designed to be used in tandem with ListBoxFile. |
| ListBoxFile | Community | Unstructured | Box | Lists files in a Box folder. Each listed file may result in one FlowFile, with the metadata written as FlowFile attributes; or, if the 'Record Writer' property is set, the entire result is written as records to a single FlowFile. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. |
| PutBoxFile | Community | Unstructured | Box | Puts content to a Box folder. |
| FetchDropbox | Community | Unstructured | Dropbox | Fetches files from Dropbox. Designed to be used in tandem with ListDropbox. |
| ListDropbox | Community | Unstructured | Dropbox | Retrieves a listing of files from Dropbox (shortcuts are ignored). Each listed file may result in one FlowFile, with the metadata written as FlowFile attributes. When the 'Record Writer' property is set, the entire result is written as records to a single FlowFile. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. |
| PutDropbox | Community | Unstructured | Dropbox | Puts content to a Dropbox folder. |
| ConsumeElasticsearch | Community | Structured | Elastic | A processor that repeatedly runs a paginated query against a field using a Range query to consume new documents from an Elasticsearch index/query. The processor will retrieve multiple pages of results until either no more results are available or the Pagination Keep Alive expiration is reached, after which the Range query will automatically update the field constraint based on the last retrieved document value. |
| DeleteByQueryElasticsearch | Community | Structured | Elastic | Deletes from an Elasticsearch index using a query. The query can be loaded from the FlowFile body or from the Query parameter. |
| GetElasticsearch | Community | Structured | Elastic | Elasticsearch get processor that uses the official Elastic REST client libraries to fetch a single document from Elasticsearch by _id. Note that the full body of the document will be read into memory before being written to a FlowFile for transfer. |
| JsonQueryElasticsearch | Community | Structured | Elastic | A processor that allows the user to run a query (with aggregations) written with the Elasticsearch JSON DSL. It does not automatically paginate queries for the user. If an incoming relationship is added to this processor, it will use the FlowFile's content for the query. Care should be taken with the size of the query because the entire response from Elasticsearch will be loaded into memory all at once and converted into the resulting FlowFiles. |
| PaginatedJsonQueryElasticsearch | Community | Structured | Elastic | A processor that allows the user to run a paginated query (with aggregations) written with the Elasticsearch JSON DSL. It will use the FlowFile's content for the query unless the QUERY attribute is populated. Search After/Point in Time queries must include a valid "sort" field. |
| PutElasticsearchJson | Community | Structured | Elastic | An Elasticsearch put processor that uses the official Elastic REST client libraries. Each FlowFile is treated as a document to be sent to the Elasticsearch _bulk API. Multiple FlowFiles can be batched together into each request sent to Elasticsearch. |
| PutElasticsearchRecord | Community | Structured | Elastic | A record-aware Elasticsearch put processor that uses the official Elastic REST client libraries. Each record within the FlowFile is converted into a document to be sent to the Elasticsearch _bulk API. Multiple documents can be batched into each request sent to Elasticsearch. Each document's bulk operation can be configured using Record Path expressions. |
| SearchElasticsearch | Community | Structured | Elastic | A processor that allows the user to repeatedly run a paginated query (with aggregations) written with the Elasticsearch JSON DSL. Search After/Point in Time queries must include a valid "sort" field. The processor will retrieve multiple pages of results until either no more results are available or the Pagination Keep Alive expiration is reached, after which the query will restart with the first page of results being retrieved. |
| UpdateByQueryElasticsearch | Community | Structured | Elastic | Updates documents in an Elasticsearch index using a query. The query can be loaded from the FlowFile body or from the Query parameter. The loaded query can contain any JSON accepted by Elasticsearch's _update_by_query API, for example a "query" object to identify which documents are to be updated, plus a "script" to define the updates to perform. |
| ConsumeIMAP | Community | Unstructured | IMAP | Consumes messages from an email server using the IMAP protocol. The raw bytes of each received email message are written as the contents of the FlowFile. |
| ConsumePOP3 | Community | Unstructured | POP3 | Consumes messages from an email server using the POP3 protocol. The raw bytes of each received email message are written as the contents of the FlowFile. |
| PutBigQuery | Community | Structured | Google BigQuery | Writes the contents of a FlowFile to a Google BigQuery table. The processor is record based, so the schema that is used is driven by the Record Reader. Attributes that are not matched to the target schema are skipped. Exactly-once delivery semantics are achieved via stream offsets. |
| FetchGoogleDrive | Community | Unstructured | Google Drive | Fetches files from a Google Drive folder. Designed to be used in tandem with ListGoogleDrive. Please see Additional Details to set up access to Google Drive. |
| ListGoogleDrive | Community | Unstructured | Google Drive | Performs a listing of concrete files (shortcuts are ignored) in a Google Drive folder. If the 'Record Writer' property is set, a single output FlowFile is created, and each file in the listing is written as a single record to the output file. Otherwise, for each file in the listing, an individual FlowFile is created, with the metadata written as FlowFile attributes. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. Please see Additional Details to set up access to Google Drive. |
| PutGoogleDrive | Community | Unstructured | Google Drive | Writes the contents of a FlowFile as a file in Google Drive. |
| ConsumeGCPubSub | Community | Unstructured | Google PubSub | Consumes messages from the configured Google Cloud PubSub subscription. If the 'Batch Size' is set, the configured number of messages will be pulled in a single request; otherwise only one message will be pulled. |
| PublishGCPubSub | Community | Unstructured | Google PubSub | Publishes the content of the incoming FlowFile to the configured Google Cloud PubSub topic. The processor supports dynamic properties. If any dynamic properties are present, they will be sent along with the message in the form of 'attributes'. |
| ConsumeGCPubSubLite | Community | Unstructured | Google PubSub Lite | Consumes messages from the configured Google Cloud PubSub Lite subscription. |
| PublishGCPubSubLite | Community | Unstructured | Google PubSub Lite | Publishes the content of the incoming FlowFile to the configured Google Cloud PubSub Lite topic. The processor supports dynamic properties. If any dynamic properties are present, they will be sent along with the message in the form of 'attributes'. |
| DeleteGCSObject | Community | Unstructured | Google Cloud Storage | Deletes objects from a Google Cloud Storage bucket. If attempting to delete a file that does not exist, the FlowFile is routed to success. |
| FetchGCSObject | Community | Unstructured | Google Cloud Storage | Fetches a file from a Google Cloud Storage bucket. Designed to be used in tandem with ListGCSBucket. |
| ListGCSBucket | Community | Unstructured | Google Cloud Storage | Retrieves a listing of objects from a GCS bucket. For each object that is listed, creates a FlowFile that represents the object so that it can be fetched in conjunction with FetchGCSObject. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. |
| PutGCSObject | Community | Unstructured | Google Cloud Storage | Writes the contents of a FlowFile as an object in a Google Cloud Storage bucket. |
| GetGcpVisionAnnotateFilesOperationStatus | Community | Unstructured | Google Vision | Retrieves the current status of a Google Vision operation. |
| GetGcpVisionAnnotateImagesOperationStatus | Community | Unstructured | Google Vision | Retrieves the current status of a Google Vision operation. |
| StartGcpVisionAnnotateFilesOperation | Community | Unstructured | Google Vision | Triggers a Vision operation on file input. It should be followed by the GetGcpVisionAnnotateFilesOperationStatus processor in order to monitor the operation status. |
| StartGcpVisionAnnotateImagesOperation | Community | Unstructured | Google Vision | Triggers a Vision operation on image input. It should be followed by the GetGcpVisionAnnotateImagesOperationStatus processor in order to monitor the operation status. |
| GetHubSpot | Community | Structured | HubSpot | Retrieves JSON data from a private HubSpot application. This processor is intended to be run on the Primary Node only. |
| ConsumeJMS | Community | Unstructured | JMS | Consumes a JMS Message of type BytesMessage, TextMessage, ObjectMessage, MapMessage or StreamMessage, transforming its content to a FlowFile and transitioning it to the 'success' relationship. JMS attributes such as headers and properties will be copied as FlowFile attributes. MapMessages will be transformed into JSON and then into byte arrays. The other types will have their raw contents transferred into the FlowFile as a byte array. |
| PublishJMS | Community | Unstructured | JMS | Creates a JMS Message from the contents of a FlowFile and sends it to a JMS Destination (queue or topic) as a JMS BytesMessage or TextMessage. FlowFile attributes will be added as JMS headers and/or properties to the outgoing JMS message. |
| ConsumeKafka | Community | Unstructured | Kafka | Consumes messages from Apache Kafka using the Kafka Consumer API. The complementary NiFi processor for sending messages is PublishKafka. The Processor supports consumption of Kafka messages, optionally interpreted as NiFi records. Please note that, at this time (in read record mode), the Processor assumes that all records retrieved from a given partition have the same schema. For this mode, if any of the Kafka messages are pulled but cannot be parsed or written with the configured Record Reader or Record Writer, the contents of the message will be written to a separate FlowFile, and that FlowFile will be transferred to the 'parse.failure' relationship. Otherwise, each FlowFile is sent to the 'success' relationship and may contain many individual messages within the single FlowFile. A 'record.count' attribute is added to indicate how many messages are contained in the FlowFile. No two Kafka messages will be placed into the same FlowFile if they have different schemas, or if they have different values for a message header that is included by the property. |
| PublishKafka | Community | Unstructured | Kafka | Sends the contents of a FlowFile as either a message or as individual records to Apache Kafka using the Kafka Producer API. The messages to send may be individual FlowFiles, may be delimited using a user-specified delimiter (such as a new-line), or may be record-oriented data that can be read by the configured Record Reader. The complementary NiFi processor for fetching messages is ConsumeKafka. |
| DeleteMongo | Community | Structured | MongoDB | Executes a delete query against a MongoDB collection. The query is provided in the body of the FlowFile, and the user can select whether it will delete one or many documents that match it. |
| GetMongo | Community | Structured | MongoDB | Creates FlowFiles from documents in MongoDB loaded by a user-specified query. |
| GetMongoRecord | Community | Structured | MongoDB | A record-based version of GetMongo that uses the Record writers to write the MongoDB result set. |
| PutMongo | Community | Structured | MongoDB | Writes the contents of a FlowFile to MongoDB. |
| PutMongoBulkOperations | Community | Structured | MongoDB | Writes the contents of a FlowFile to MongoDB as a bulk update. |
| PutMongoRecord | Community | Structured | MongoDB | This processor is a record-aware processor for inserting/upserting data into MongoDB. It uses a configured record reader and schema to read an incoming record set from the body of a FlowFile and then inserts/upserts batches of those records into a configured MongoDB collection. This processor does not support deletes. The number of documents to insert/upsert at a time is controlled by the 'Batch Size' configuration property. This value should be set to a reasonable size to ensure that MongoDB is not overloaded with too many operations at once. |
| RunMongoAggregation | Community | Structured | MongoDB | A processor that runs an aggregation query whenever a FlowFile is received. |
| DeleteGridFS | Community | Unstructured | MongoDB | Deletes a file from GridFS using a file name or a query. |
| FetchGridFS | Community | Unstructured | MongoDB | Retrieves one or more files from a GridFS bucket by file name or by a user-defined query. |
| PutGridFS | Community | Unstructured | MongoDB | Writes a file to a GridFS bucket. |
| ConsumeMQTT | Community | Unstructured | MQTT | Subscribes to a topic and receives messages from an MQTT broker. |
| PublishMQTT | Community | Unstructured | MQTT | Publishes a message to an MQTT topic. |
| ListenOTLP | Community | Structured | OpenTelemetry | Collects OpenTelemetry messages over HTTP or gRPC. Supports standard Export Service Request messages for logs, metrics, and traces. Implements OpenTelemetry OTLP Specification 1.0.0 with OTLP/gRPC and OTLP/HTTP. Provides protocol detection using the HTTP Content-Type header. |
| PutRedisHashRecord | Community | Structured | Redis | Puts record field data into Redis using a specified hash value, which is determined by a RecordPath to a field in each record containing the hash value. The record fields and values are stored as key/value pairs associated with the hash value. NOTE: neither the evaluated hash value nor any of the field values can be null. If the hash value is null, the FlowFile will be routed to failure. For each of the field values, if the value is null, that field will not be set in Redis. |
| PutSalesforceObject | Community | Structured | Salesforce | Creates new records for the specified Salesforce sObject. The type of the Salesforce object must be set in the input FlowFile's 'objectType' attribute. This processor cannot update existing records. |
| QuerySalesforceObject | Community | Structured | Salesforce | Retrieves records from a Salesforce sObject. Users can add arbitrary filter conditions by setting the 'Custom WHERE Condition' property. The processor can also run a custom query, although record processing is not supported in that case. Supports incremental retrieval: users can define a field in the 'Age Field' property that will be used to determine when the record was created. When this property is set, the processor will retrieve new records. Incremental loading and record-based processing are only supported in property-based queries. It is also possible to define an initial cutoff value for the age, filtering out all older records even for the first run. In case of a 'Property Based Query', this processor should run on the Primary Node only. The FlowFile attribute 'record.count' indicates how many records were retrieved and written to the output. The processor can accept an optional input FlowFile and reference the FlowFile attributes in the query. |
| GetShopify | Community | Structured | Shopify | Retrieves objects from a custom Shopify store. The processor's yield time must be set according to the account's rate limit. |
| ConsumeSlack | Community | Unstructured | Slack | Retrieves messages from one or more configured Slack channels. The messages are written out in JSON format. See Usage / Additional Details for more information about how to configure this Processor and enable it to retrieve messages from Slack. |
| ListenSlack | Community | Unstructured | Slack | Retrieves real-time messages or Slack commands from one or more Slack conversations. The messages are written out in JSON format. Note that this Processor should be used to obtain real-time messages and commands from Slack and does not provide a mechanism for obtaining historical messages. The ConsumeSlack Processor should be used for an initial load of messages from a channel. See Usage / Additional Details for more information about how to configure this Processor and enable it to retrieve messages and commands from Slack. |
| PublishSlack | Community | Unstructured | Slack | Posts a message to the specified Slack channel. The content of the message can be either a user-defined message that makes use of Expression Language, or the contents of the FlowFile can be sent as the message. If sending a user-defined message, the contents of the FlowFile may also be optionally uploaded as a file attachment. |
| FetchSmb | Community | Unstructured | SMB | Fetches files from an SMB share. Designed to be used in tandem with ListSmb. |
| GetSmbFile | Community | Unstructured | SMB | Reads files from a Samba network location into FlowFiles. Use this processor instead of a CIFS mount if share access control is important. Configure the Hostname, Share and Directory accordingly: \\Hostname\Share\path\to\Directory |
| ListSmb | Community | Unstructured | SMB | Lists concrete files shared via the SMB protocol. Each listed file may result in one FlowFile, with the metadata written as FlowFile attributes; or, if the 'Record Writer' property is set, the entire result is written as records to a single FlowFile. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. |
| PutSmbFile | Community | Unstructured | SMB | Writes the contents of a FlowFile to a Samba network location. Use this processor instead of a CIFS mount if share access control is important. Configure the Hostname, Share and Directory accordingly: \\Hostname\Share\path\to\Directory |
| ConsumeTwitter | Community | Unstructured | Twitter / X | Streams tweets from Twitter's streaming API v2. The stream provides a sample stream or a search stream based on previously uploaded rules. This processor also provides a pass-through for certain fields of the tweet to be returned as part of the response. See https://developer.twitter.com/en/docs/twitter-api/data-dictionary/introduction for more information regarding the Tweet object model. |
| GetSplunk | Community | Structured | Splunk | Retrieves data from Splunk Enterprise. |
| PutSplunk | Community | Structured | Splunk | Sends logs to Splunk Enterprise over TCP, TCP + TLS/SSL, or UDP. If a Message Delimiter is provided, then this processor will read messages from the incoming FlowFile based on the delimiter and send each message to Splunk. If a Message Delimiter is not provided, then the content of the FlowFile will be sent directly to Splunk as if it were a single message. |
| PutSplunkHTTP | Community | Structured | Splunk | Sends FlowFile content to the specified Splunk server over HTTP or HTTPS. Supports HEC Index Acknowledgement. |
| QuerySplunkIndexingStatus | Community | Structured | Splunk | Queries the Splunk server in order to acquire the status of indexing acknowledgement. |
| ExecuteSQL | Community | Structured | JDBC | Executes a provided SQL select query. The query result will be converted to Avro format. Streaming is used, so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. If it is triggered by an incoming FlowFile, then attributes of that FlowFile will be available when evaluating the select query, and the query may use ? placeholders for parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention sql.args.N.type and sql.args.N.value, where N is a positive integer. The sql.args.N.type is expected to be a number indicating the JDBC Type. The content of the FlowFile is expected to be in UTF-8 format. The FlowFile attribute 'executesql.row.count' indicates how many rows were selected. |
| ExecuteSQLRecord | Community | Structured | JDBC | Executes a provided SQL select query. The query result will be converted to the format specified by a Record Writer. Streaming is used, so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. If it is triggered by an incoming FlowFile, then attributes of that FlowFile will be available when evaluating the select query, and the query may use ? placeholders for parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention sql.args.N.type and sql.args.N.value, where N is a positive integer. The sql.args.N.type is expected to be a number indicating the JDBC Type. The content of the FlowFile is expected to be in UTF-8 format. The FlowFile attribute 'executesql.row.count' indicates how many rows were selected. |
| FetchFTP | Community | Unstructured | FTP | Fetches the content of a file from a remote FTP server and overwrites the contents of an incoming FlowFile with the content of the remote file. |
| FetchFile | Community | Unstructured | Local file | Reads the contents of a file from disk and streams it into the contents of an incoming FlowFile. Once this is done, the file is optionally moved elsewhere or deleted to help keep the file system organized. |
| FetchSFTP | Community | Unstructured | SFTP | Fetches the content of a file from a remote SFTP server and overwrites the contents of an incoming FlowFile with the content of the remote file. |
| GenerateTableFetch | Community | Structured | JDBC | Generates SQL select queries that fetch "pages" of rows from a table. The partition size property, along with the table's row count, determines the size and number of pages and generated FlowFiles. In addition, incremental fetching can be achieved by setting Maximum-Value Columns, which causes the processor to track the columns' maximum values, thus only fetching rows whose columns' values exceed the observed maximums. This processor is intended to be run on the Primary Node only. The processor can accept incoming connections, and its behavior depends on whether they are provided: if no incoming connections are specified, the processor generates SQL queries on the configured schedule; Expression Language is supported for many fields, but no FlowFile attributes are available (properties are evaluated using the Environment/System properties). If incoming connections are specified and no FlowFile is available to a processor task, no work is performed. If incoming connections are specified and a FlowFile is available to a processor task, the FlowFile's attributes may be used in Expression Language for fields such as Table Name; however, the Max-Value Columns and Columns to Return fields must be empty or refer to columns that are available in each specified table. |
| GetFTP | Community | Unstructured | FTP | Fetches files from an FTP server and creates FlowFiles from them. |
| GetFile | Community | Unstructured | Local file | Creates FlowFiles from files in a directory. NiFi will ignore files for which it doesn't have at least read permissions. |
| GetSFTP | Community | Unstructured | SFTP | Fetches files from an SFTP server and creates FlowFiles from them. |
| HandleHttpRequest | Community | Unstructured | HTTP | Starts an HTTP server and listens for HTTP requests. For each request, creates a FlowFile and transfers it to 'success'. This Processor is designed to be used in conjunction with the HandleHttpResponse Processor in order to create a web service. In case of a multipart request, one FlowFile is generated for each part. |
| HandleHttpResponse | Community | Unstructured | HTTP | Sends an HTTP response to the requestor that generated a FlowFile. This Processor is designed to be used in conjunction with HandleHttpRequest in order to create a web service. |
| InvokeHTTP | Community | Unstructured | HTTP | An HTTP client processor which can interact with a configurable HTTP endpoint. The destination URL and HTTP method are configurable. When the HTTP method is PUT, POST or PATCH, the FlowFile contents are included as the body of the request, and FlowFile attributes are optionally converted to HTTP headers based on configuration properties. |
| ListDatabaseTables | Community | Structured | JDBC | Generates a set of FlowFiles, each containing attributes corresponding to metadata about a table from a database connection. Once metadata about a table has been fetched, it will not be fetched again until the Refresh Interval (if set) has elapsed, or until state has been manually cleared. |
| ListFTP | Community | Unstructured | FTP | Performs a listing of the files residing on an FTP server. For each file that is found on the remote server, a new FlowFile will be created with the filename attribute set to the name of the file on the remote server. This can then be used in conjunction with FetchFTP in order to fetch those files. |
| ListFile | Community | Unstructured | Local file | Retrieves a listing of files from the input directory. For each file listed, creates a FlowFile that represents the file so that it can be fetched in conjunction with FetchFile. This Processor is designed to run on Primary Node only in a cluster when 'Input Directory Location' is set to 'Remote'. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all the data. When 'Input Directory Location' is 'Local', the 'Execution' mode can be anything, and synchronization won't happen. Unlike GetFile, this Processor does not delete any data from the local filesystem. |
| ListSFTP | Community | Unstructured | SFTP | Performs a listing of the files residing on an SFTP server. For each file that is found on the remote server, a new FlowFile will be created with the filename attribute set to the name of the file on the remote server. This can then be used in conjunction with FetchSFTP in order to fetch those files. |
| ListenFTP | Community | Unstructured | FTP | Starts an FTP server that listens on the specified port and transforms incoming files into FlowFiles. The URI of the service will be ftp://{hostname}:{port}. The default port is 2221. |
| ListenHTTP | Community | Unstructured | HTTP | Starts an HTTP server and listens on a given base path to transform incoming requests into FlowFiles. The default URI of the service will be http://{hostname}:{port}/contentListener. Only HEAD and POST requests are supported. GET, PUT, DELETE, OPTIONS and TRACE will result in an error and the HTTP response status code 405; CONNECT will also result in an error and the HTTP response status code 400. GET is supported on /healthcheck. If the service is available, it returns "200 OK" with the content "OK". The health check functionality can be configured to be accessible via a different port. For details, see the documentation of the "Listening Port for health check requests" property. A Record Reader and Record Writer property can be enabled on the processor to process incoming requests as records. Record processing is not allowed for multipart requests and requests in FlowFileV3 format (MiNiFi). |
| ListenSyslog | Community | Structured | Syslog | Listens for Syslog messages being sent to a given port over TCP or UDP. Incoming messages are checked against regular expressions for RFC5424 and RFC3164 formatted messages. The format of each message is: (PRIORITY)(VERSION )(TIMESTAMP) (HOSTNAME) (BODY), where the version is optional. The timestamp can be an RFC5424 timestamp with a format of "yyyy-MM-dd'T'HH:mm:ss.SZ" or "yyyy-MM-dd'T'HH:mm:ss.S+hh:mm", or it can be an RFC3164 timestamp with a format of "MMM d HH:mm:ss". If an incoming message matches one of these patterns, the message will be parsed and the individual pieces will be placed in FlowFile attributes, with the original message in the content of the FlowFile. If an incoming message does not match one of these patterns, it will not be parsed and the syslog.valid attribute will be set to false, with the original message in the content of the FlowFile. Valid messages will be transferred on the success relationship, and invalid messages will be transferred on the invalid relationship. |
| ListenTCP | Community | Unstructured | TCP | Listens for incoming TCP connections and reads data from each connection using a line separator as the message demarcator. The default behavior is for each message to produce a single FlowFile; however, this can be controlled by increasing the Batch Size to a larger value for higher throughput. The Receive Buffer Size must be set as large as the largest messages expected to be received, meaning if every 100kb there is a line separator, then the Receive Buffer Size must be greater than 100kb. The processor can be configured to use an SSL Context Service to only allow secure connections. When connected clients present certificates for mutual TLS authentication, the Distinguished Names of the client certificate's issuer and subject are added to the outgoing FlowFiles as attributes. The processor does not perform authorization based on Distinguished Name values, but since these values are attached to the outgoing FlowFiles, authorization can be implemented based on these attributes. |
| ListenUDP | Community | Unstructured | UDP | Listens for Datagram Packets on a given port. The default behavior produces a FlowFile per datagram; however, for higher throughput the Max Batch Size property may be increased to specify the number of datagrams to batch together in a single FlowFile. This processor can be restricted to listening for datagrams from a specific remote host and port by specifying the Sending Host and Sending Host Port properties; otherwise it will listen for datagrams from all hosts and ports. |
| ListenUDPRecord | Community | Structured | UDP | Listens for Datagram Packets on a given port and reads the content of each datagram using the configured Record Reader. Each record will then be written to a FlowFile using the configured Record Writer. This processor can be restricted to listening for datagrams from a specific remote host and port by specifying the Sending Host and Sending Host Port properties; otherwise it will listen for datagrams from all hosts and ports. |
| PutEmail | Community | Unstructured | Email | Sends an e-mail to configured recipients for each incoming FlowFile. |
| PutFTP | Community | Unstructured | FTP | Sends FlowFiles to an FTP server. |
| PutFile | Community | Unstructured | Local file | Writes the contents of a FlowFile to the local file system. |
| PutRecord | Community | Structured | JDBC | The PutRecord processor uses a specified RecordReader to read (possibly multiple) records from an incoming FlowFile and sends them to a destination specified by a Record Destination Service (i.e. record sink). |
| PutSFTP | Community | Unstructured | SFTP | Sends FlowFiles to an SFTP server. |
| PutSQL | Community | Structured | JDBC | Executes a SQL UPDATE or INSERT command. The content of an incoming FlowFile is expected to be the SQL command to execute. The SQL command may use ? placeholders for parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention sql.args.N.type and sql.args.N.value, where N is a positive integer. The sql.args.N.type is expected to be a number indicating the JDBC Type. The content of the FlowFile is expected to be in UTF-8 format. See the parameter-attribute sketch following this table. |
| PutSyslog | Community | Structured | Syslog | Sends Syslog messages to a given host and port over TCP or UDP. Messages are constructed from the "Message ___" properties of the processor, which can use Expression Language to generate messages from incoming FlowFiles. The properties are used to construct messages of the form: (PRIORITY)(VERSION )(TIMESTAMP) (HOSTNAME) (BODY), where the version is optional. The constructed messages are checked against regular expressions for RFC5424 and RFC3164 formatted messages. The timestamp can be an RFC5424 timestamp with a format of "yyyy-MM-dd'T'HH:mm:ss.SZ" or "yyyy-MM-dd'T'HH:mm:ss.S+hh:mm", or it can be an RFC3164 timestamp with a format of "MMM d HH:mm:ss". If a message is constructed that does not form a valid Syslog message according to the above description, then it is routed to the invalid relationship. Valid messages are sent to the Syslog server; successes are routed to the success relationship and failures are routed to the failure relationship. |
| PutTCP | Community | Unstructured | TCP | Sends serialized FlowFiles or records over TCP to a configurable destination, with optional support for TLS. |
| PutUDP | Community | Unstructured | UDP | The PutUDP processor receives a FlowFile and packages the FlowFile content into a single UDP datagram packet, which is then transmitted to the configured UDP server. The user must ensure that the FlowFile content being fed to this processor is not larger than the maximum size for the underlying UDP transport. The maximum transport size will vary based on the platform setup but is generally just under 64KB. FlowFiles will be marked as failed if their content is larger than the maximum transport size. |
| QueryDatabaseTable | Community | Structured | JDBC | Generates a SQL select query, or uses a provided statement, and executes it to fetch all rows whose values in the specified Maximum Value column(s) are larger than the previously-seen maxima. The query result will be converted to Avro format. Expression Language is supported for several properties, but no incoming connections are permitted. The Environment/System properties may be used to provide values for any property containing Expression Language. If it is desired to leverage FlowFile attributes to perform these queries, the GenerateTableFetch and/or ExecuteSQL processors can be used for this purpose. Streaming is used, so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer or cron expression, using the standard scheduling methods. This processor is intended to be run on the Primary Node only. The FlowFile attribute 'querydbtable.row.count' indicates how many rows were selected. |
| QueryDatabaseTableRecord | Community | Structured | JDBC | Generates a SQL select query, or uses a provided statement, and executes it to fetch all rows whose values in the specified Maximum Value column(s) are larger than the previously-seen maxima. The query result will be converted to the format specified by the Record Writer. Expression Language is supported for several properties, but no incoming connections are permitted. The Environment/System properties may be used to provide values for any property containing Expression Language. If it is desired to leverage FlowFile attributes to perform these queries, the GenerateTableFetch and/or ExecuteSQL processors can be used for this purpose. Streaming is used, so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer or cron expression, using the standard scheduling methods. This processor is intended to be run on the Primary Node only. The FlowFile attribute 'querydbtable.row.count' indicates how many rows were selected. |
| QueryRecord | Community | Structured | JDBC | Evaluates one or more SQL queries against the contents of a FlowFile. The result of the SQL query then becomes the content of the output FlowFile. This can be used, for example, for field-specific filtering, transformation, and row-level filtering. Columns can be renamed, simple calculations and aggregations performed, etc. The Processor is configured with a Record Reader Controller Service and a Record Writer service so as to allow flexibility in incoming and outgoing data formats. The Processor must be configured with at least one user-defined property: the name of the property is the relationship to route data to, and the value of the property is a SQL SELECT statement that is used to specify how input data should be transformed/filtered (see the sketch following this table). The SQL statement must be valid ANSI SQL and is powered by Apache Calcite. If the transformation fails, the original FlowFile is routed to the 'failure' relationship; otherwise, the data selected will be routed to the associated relationship. If the Record Writer chooses to inherit the schema from the Record, it is important to note that the schema that is inherited will be from the ResultSet rather than the input Record. This allows a single instance of the QueryRecord processor to have multiple queries, each of which returns a different set of columns and aggregations. As a result, though, the schema that is derived will have no schema name, so it is important that the configured Record Writer not attempt to write the Schema Name as an attribute if inheriting the Schema from the Record. See the Processor Usage documentation for more information. |
| UpdateDatabaseTable | Community | Structured | JDBC | This processor uses a JDBC connection and incoming records to generate any database table changes needed to support the incoming records. It expects a 'flat' record layout, meaning none of the top-level record fields has nested fields that are intended to become columns themselves. |
| ConnectWebSocket | Community | Unstructured | WebSocket | Acts as a WebSocket client endpoint to interact with a remote WebSocket server. FlowFiles are transferred to downstream relationships according to the received message types as the WebSocket client configured with this processor receives messages from the remote WebSocket server. If a new FlowFile is passed to the processor, the previous sessions will be closed and any data being sent will be aborted. |
| ListenWebSocket | Community | Unstructured | WebSocket | Acts as a WebSocket server endpoint to accept client connections. FlowFiles are transferred to downstream relationships according to the received message types as the WebSocket server configured with this processor receives client requests. |
| PutWebSocket | Community | Unstructured | WebSocket | Sends messages to a WebSocket remote endpoint using a WebSocket session that is established by either ListenWebSocket or ConnectWebSocket. |
| GetWorkdayReport | Community | Structured | Workday | A processor which can interact with a configurable Workday report. The processor can forward the content without modification, or you can transform it by providing specific Record Reader and Record Writer services based on your needs. You can also remove fields by defining a schema in the Record Writer. Supported Workday report formats are: csv, simplexml, and json. |
| GetZendesk | Community | Structured | Zendesk | Incrementally fetches data from the Zendesk API. |
| PutZendeskTicket | Community | Structured | Zendesk | Creates Zendesk tickets using the Zendesk API. |
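
The SQL processors above (PutSQL, ExecuteSQL, ExecuteSQLRecord) rely on the sql.args.N.type / sql.args.N.value FlowFile attribute convention for parameterized statements, as described in their entries. Below is a minimal sketch for PutSQL; the table name, column names, and values are hypothetical, and the JDBC type codes 12 and 4 correspond to VARCHAR and INTEGER in java.sql.Types.

```
FlowFile content (the statement PutSQL will execute):
  UPDATE customers SET region = ? WHERE customer_id = ?

FlowFile attributes:
  sql.args.1.type  = 12      -- JDBC Type 12 = VARCHAR
  sql.args.1.value = EMEA
  sql.args.2.type  = 4       -- JDBC Type 4 = INTEGER
  sql.args.2.value = 42
```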
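Similarly, QueryRecord routes data via user-defined properties whose name becomes a relationship and whose value is a SQL SELECT statement. A minimal sketch follows, with a hypothetical property name, field name, and threshold; in QueryRecord queries the incoming FlowFile's records are typically referenced as the FLOWFILE table.

```
Property name  : large_orders                                (added as a new relationship)
Property value : SELECT * FROM FLOWFILE WHERE amount > 1000
```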