Connector processors
The sortable table below provides the Name and Description of the processors included in the Datavolo runtime, which:
- Are Datavolo exclusive or sourced from the Apache NiFi community
- Support Structured or Unstructured data formats
- Are aligned to a specific or generic System / Protocol
- Function as Readable sources and/or Writeable targets
This table of connector processors is also available as a downloadable PDF document. The full list of processors can be found in the Processor documentation.
Processor Name | Datavolo or NiFi | Struct or Unstruct | System / Protocol | R | W | Description |
---|---|---|---|---|---|---|
PutIcebergTable | Datavolo | Struct | Snowflake | ✓ | Store records in Iceberg using a configurable Catalog for managing namespaces and tables. | |
PutSnowflakeInternalStageFile | Datavolo | Unstruct | Snowflake | ✓ | Puts files into a Snowflake internal stage. The internal stage must be created in the Snowflake account beforehand. This processor can be connected to a StartSnowflakeIngest processor to ingest the file in the internal stage. | |
DeleteDBFSResource | Datavolo | Unstruct | Databricks | ✓ | Delete DBFS files and directories. | |
GetDBFSFile | Datavolo | Unstruct | Databricks | ✓ | Read a DBFS file. | |
ListDBFSDirectory | Datavolo | Unstruct | Databricks | ✓ | List file names in a DBFS directory and output a new FlowFile with the filename. | |
PutDBFSFile | Datavolo | Unstruct | Databricks | ✓ | Write FlowFile content to DBFS. | |
DeleteUnityCatalogResource | Datavolo | Unstruct | Databricks | ✓ | Delete a Unity Catalog file or directory. | |
GetUnityCatalogFile | Datavolo | Unstruct | Databricks | ✓ | Read a Unity Catalog file up to 5 GiB. | |
GetUnityCatalogFileMetadata | Datavolo | Unstruct | Databricks | ✓ | Checks for Unity Catalog file metadata. | |
ListUnityCatalogDirectory | Datavolo | Unstruct | Databricks | ✓ | List file names in a Unity Catalog directory and output a new FlowFile with the filename. | |
PutUnityCatalogFile | Datavolo | Unstruct | Databricks | ✓ | Write FlowFile content with max size of 5 GiB to Unity Catalog. | |
PutDatabricksSQL | Datavolo | Struct | Databricks | ✓ | ✓ | Submit a SQL Execution using Databricks REST API then write the JSON response to FlowFile Content. For high performance SELECT or INSERT queries use ExecuteSQL instead. |
CaptureGoogleDriveChanges | Datavolo | Unstruct | Google Drive | ✓ | Captures changes to a Shared Google Drive and emits a FlowFile for each change that occurs. This includes addition and deletion of files, as well as changes to file metadata and permissions. The processor is designed to be used in conjunction with the FetchGoogleDrive processor. | |
PutHubSpot | Datavolo | Struct | HubSpot | ✓ | Upsert a HubSpot object. | |
CaptureSharepointChanges | Datavolo | Unstruct | Microsoft SharePoint | ✓ | Captures changes from a SharePoint Document Library and emits a FlowFile for each change that occurs. This includes additions and deletions of files and folders, as well as changes to permissions, metadata, and file content. | |
FetchSharepointFile | Datavolo | Unstruct | Microsoft SharePoint | ✓ | Fetches the contents of a file from a SharePoint Drive, optionally downloading a PDF or HTML version of the file when applicable. Any FlowFile that represents a SharePoint folder will be routed to success without fetching contents. | |
DeletePinecone | Datavolo | Struct | Pinecone | ✓ | Deletes vectors from a Pinecone index. | |
QueryPinecone | Datavolo | Struct | Pinecone | ✓ | Queries Pinecone for vectors that are similar to the input vector, or retrieves a vector by ID. | |
UpsertPinecone | Datavolo | Struct | Pinecone | ✓ | Publishes vectors, including metadata, and optionally text, to a Pinecone index. | |
PutVectaraDocument | Datavolo | Unstruct | Vectara | ✓ | Generate and upload a JSON document to Vectara's upload endpoint. The input text can be JSON Object, JSON Array, or JSONL format. | |
PutVectaraFile | Datavolo | Unstruct | Vectara | ✓ | Upload a FlowFile content to Vectara's index endpoint. Document filter attributes and metadata attributes can be set by referencing FlowFile attributes. | |
ConsumeAMQP | Community | Unstruct | AMQP | ✓ | Consumes AMQP Messages from an AMQP Broker using the AMQP 0.9.1 protocol. Each message that is received from the AMQP Broker will be emitted as its own FlowFile to the 'success' relationship. | |
PublishAMQP | Community | Unstruct | AMQP | ✓ | Creates an AMQP Message from the contents of a FlowFile and sends the message to an AMQP Exchange. In a typical AMQP exchange model, the message that is sent to the AMQP Exchange will be routed based on the 'Routing Key' to its final destination in the queue (the binding). If due to some misconfiguration the binding between the Exchange, Routing Key and Queue is not set up, the message will have no final destination and will return (i.e., the data will not make it to the queue). If that happens you will see a log in both app-log and bulletin stating to that effect, and the FlowFile will be routed to the 'failure' relationship. | |
PutCloudWatchMetric | Community | Struct | AWS Cloudwatch | ✓ | Publishes metrics to Amazon CloudWatch. Metric can be either a single value, or a StatisticSet comprised of minimum, maximum, sum and sample count. | |
DeleteDynamoDB | Community | Struct | AWS Dynamo DB | ✓ | Deletes a document from DynamoDB based on hash and range key. The key can be string or number. The request requires all the primary keys for the operation (hash or hash and range key) | |
GetDynamoDB | Community | Struct | AWS Dynamo DB | ✓ | Retrieves a document from DynamoDB based on hash and range key. The key can be string or number. For any get request all the primary keys are required (hash or hash and range based on the table keys). A JSON Document ('Map') attribute of the DynamoDB item is read into the content of the FlowFile. | |
PutDynamoDB | Community | Struct | AWS Dynamo DB | ✓ | Puts a document into DynamoDB based on hash and range key. The table can have either hash and range or hash key alone. Currently the keys supported are string and number and the value can be a JSON document. In case of hash and range keys both keys are required for the operation. The FlowFile content must be JSON. FlowFile content is mapped to the specified JSON Document attribute in the DynamoDB item. | |
PutDynamoDBRecord | Community | Struct | AWS Dynamo DB | ✓ | Inserts items into DynamoDB based on record-oriented data. The record fields are mapped into DynamoDB item fields, including partition and sort keys if set. Depending on the number of records the processor might execute the insert in multiple chunks in order to overcome DynamoDB's limitation on batch writing. This might result in partially processed FlowFiles, in which case the FlowFile will be transferred to the "unprocessed" relationship with the necessary attribute to retry later without duplicating the already executed inserts. | |
PutKinesisFirehose | Community | Unstruct | AWS Kinesis | ✓ | Sends the contents to a specified Amazon Kinesis Firehose. In order to send data to firehose, the firehose delivery stream name has to be specified. | |
ConsumeKinesisStream | Community | Unstruct | AWS Kinesis | ✓ | Reads data from the specified AWS Kinesis stream and outputs a FlowFile for every processed Record (raw) or a FlowFile for a batch of processed records if a Record Reader and Record Writer are configured. Provides at-least-once delivery of all Kinesis Records within the Stream while the processor is running. The AWS Kinesis Client Library can take several seconds to initialise before starting to fetch data. Uses DynamoDB for checkpointing and CloudWatch (optional) for metrics. Ensure that the credentials provided have access to DynamoDB and CloudWatch (optional) along with Kinesis. | |
PutKinesisStream | Community | Unstruct | AWS Kinesis | ✓ | Sends the contents to a specified Amazon Kinesis. In order to send data to Kinesis, the stream name has to be specified. | |
PutLambda | Community | Unstruct | AWS Lambda | ✓ | Sends the contents to a specified Amazon Lambda Function. The AWS credentials used for authentication must have permission to execute the Lambda function (lambda:InvokeFunction). The FlowFile content must be JSON. | |
DeleteS3Object | Community | Unstruct | AWS S3 | ✓ | Deletes a file from an Amazon S3 Bucket. If attempting to delete a file that does not exist, FlowFile is routed to success. | |
FetchS3Object | Community | Unstruct | AWS S3 | ✓ | Retrieves the contents of an S3 Object and writes it to the content of a FlowFile | |
ListS3 | Community | Unstruct | AWS S3 | ✓ | Retrieves a listing of objects from an S3 bucket. For each object that is listed, creates a FlowFile that represents the object so that it can be fetched in conjunction with FetchS3Object. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. | |
PutS3Object | Community | Unstruct | AWS S3 | ✓ | Writes the contents of a FlowFile as an S3 Object to an Amazon S3 Bucket. | |
TagS3Object | Community | Unstruct | AWS S3 | ✓ | Adds or updates a tag on an Amazon S3 Object. | |
PutSNS | Community | Struct | AWS SNS | ✓ | Sends the content of a FlowFile as a notification to the Amazon Simple Notification Service | |
DeleteSQS | Community | Struct | AWS SQS | ✓ | Deletes a message from an Amazon Simple Queuing Service Queue | |
GetSQS | Community | Struct | AWS SQS | ✓ | Fetches messages from an Amazon Simple Queuing Service Queue | |
PutSQS | Community | Struct | AWS SQS | ✓ | Publishes a message to an Amazon Simple Queuing Service Queue | |
PutAzureCosmosDBRecord | Community | Struct | Azure CosmosDB | ✓ | This processor is a record-aware processor for inserting data into Cosmos DB with Core SQL API. It uses a configured record reader and schema to read an incoming record set from the body of a Flowfile and then inserts those records into a configured Cosmos DB Container. | |
PutAzureDataExplorer | Community | Struct | Azure Data Explorer | ✓ | Acts as an Azure Data Explorer sink which sends FlowFiles to the provided endpoint. Data can be sent through queued ingestion or streaming ingestion to the Azure Data Explorer cluster. | |
QueryAzureDataExplorer | Community | Struct | Azure Data Explorer | ✓ | Query Azure Data Explorer and stream JSON results to output FlowFiles | |
ConsumeAzureEventHub | Community | Unstruct | Azure Event Hub | ✓ | Receives messages from Microsoft Azure Event Hubs with checkpointing to ensure consistent event processing. Checkpoint tracking avoids consuming a message multiple times and enables reliable resumption of processing in the event of intermittent network failures. Checkpoint tracking requires external storage and provides the preferred approach to consuming messages from Azure Event Hubs. In a clustered environment, ConsumeAzureEventHub processor instances form a consumer group and the messages are distributed among the cluster nodes (each message is processed on one cluster node only). | |
GetAzureEventHub | Community | Unstruct | Azure Event Hub | ✓ | Receives messages from Microsoft Azure Event Hubs without reliable checkpoint tracking. In a clustered environment, GetAzureEventHub processor instances work independently and all cluster nodes process all messages (unless running the processor in Primary Only mode). ConsumeAzureEventHub offers the recommended approach to receiving messages from Azure Event Hubs. This processor creates a thread pool for connections to Azure Event Hubs. | |
PutAzureEventHub | Community | Unstruct | Azure Event Hub | ✓ | Send FlowFile contents to Azure Event Hubs | |
CopyAzureBlobStorage_v12 | Community | Unstruct | Azure Blob Storage | ✓ | Copies a blob in Azure Blob Storage from one account/container to another. The processor uses Azure Blob Storage client library v12. | |
DeleteAzureBlobStorage_v12 | Community | Unstruct | Azure Blob Storage | ✓ | Deletes the specified blob from Azure Blob Storage. The processor uses Azure Blob Storage client library v12. | |
DeleteAzureDataLakeStorage | Community | Unstruct | Azure DataLake Storage | ✓ | Deletes the provided file from Azure Data Lake Storage | |
FetchAzureBlobStorage_v12 | Community | Unstruct | Azure Blob Storage | ✓ | Retrieves the specified blob from Azure Blob Storage and writes its content to the content of the FlowFile. The processor uses Azure Blob Storage client library v12. | |
FetchAzureDataLakeStorage | Community | Unstruct | Azure DataLake Storage | ✓ | Fetch the specified file from Azure Data Lake Storage | |
ListAzureBlobStorage_v12 | Community | Unstruct | Azure Blob Storage | ✓ | Lists blobs in an Azure Blob Storage container. Listing details are attached to an empty FlowFile for use with FetchAzureBlobStorage. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. The processor uses Azure Blob Storage client library v12. | |
ListAzureDataLakeStorage | Community | Unstruct | Azure DataLake Storage | ✓ | Lists a directory in an Azure Data Lake Storage Gen 2 filesystem | |
MoveAzureDataLakeStorage | Community | Unstruct | Azure DataLake Storage | ✓ | Moves content within an Azure Data Lake Storage Gen 2 filesystem. After the move, files will no longer be available at the source location. | |
PutAzureBlobStorage_v12 | Community | Unstruct | Azure Blob Storage | ✓ | Puts content into a blob on Azure Blob Storage. The processor uses Azure Blob Storage client library v12. | |
PutAzureDataLakeStorage | Community | Unstruct | Azure DataLake Storage | ✓ | Writes the contents of a FlowFile as a file on Azure Data Lake Storage Gen 2 | |
GetAzureQueueStorage_v12 | Community | Unstruct | Azure Queue Storage | ✓ | Retrieves the messages from an Azure Queue Storage. The retrieved messages will be deleted from the queue by default. If the requirement is to consume messages without deleting them, set 'Auto Delete Messages' to 'false'. Note: duplicates may be received if a message is retrieved but cannot be deleted from the queue due to an unexpected failure. | |
PutAzureQueueStorage_v12 | Community | Unstruct | Azure Queue Storage | ✓ | Writes the content of the incoming FlowFiles to the configured Azure Queue Storage. | |
FetchBoxFile | Community | Unstruct | Box | ✓ | Fetches files from a Box Folder. Designed to be used in tandem with ListBoxFile. | |
ListBoxFile | Community | Unstruct | Box | ✓ | Lists files in a Box folder. Each listed file may result in one FlowFile, the metadata being written as FlowFile attributes. Or - in case the 'Record Writer' property is set - the entire result is written as records to a single FlowFile. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. | |
PutBoxFile | Community | Unstruct | Box | ✓ | Puts content to a Box folder. | |
FetchDropbox | Community | Unstruct | Dropbox | ✓ | Fetches files from Dropbox. Designed to be used in tandem with ListDropbox. | |
ListDropbox | Community | Unstruct | Dropbox | ✓ | Retrieves a listing of files from Dropbox (shortcuts are ignored). Each listed file may result in one FlowFile, the metadata being written as FlowFile attributes. When the 'Record Writer' property is set, the entire result is written as records to a single FlowFile. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. | |
PutDropbox | Community | Unstruct | Dropbox | ✓ | Puts content to a Dropbox folder. | |
ConsumeElasticsearch | Community | Struct | Elastic | ✓ | A processor that repeatedly runs a paginated query against a field using a Range query to consume new Documents from an Elasticsearch index/query. The processor will retrieve multiple pages of results until either no more results are available or the Pagination Keep Alive expiration is reached, after which the Range query will automatically update the field constraint based on the last retrieved Document value. | |
DeleteByQueryElasticsearch | Community | Struct | Elastic | ✓ | Delete from an Elasticsearch index using a query. The query can be loaded from a flowfile body or from the Query parameter. | |
GetElasticsearch | Community | Struct | Elastic | ✓ | Elasticsearch get processor that uses the official Elastic REST client libraries to fetch a single document from Elasticsearch by _id. Note that the full body of the document will be read into memory before being written to a FlowFile for transfer. | |
JsonQueryElasticsearch | Community | Struct | Elastic | ✓ | A processor that allows the user to run a query (with aggregations) written with the Elasticsearch JSON DSL. It does not automatically paginate queries for the user. If an incoming relationship is added to this processor, it will use the flowfile's content for the query. Care should be taken on the size of the query because the entire response from Elasticsearch will be loaded into memory all at once and converted into the resulting flowfiles. | |
PaginatedJsonQueryElasticsearch | Community | Struct | Elastic | ✓ | A processor that allows the user to run a paginated query (with aggregations) written with the Elasticsearch JSON DSL. It will use the flowfile's content for the query unless the QUERY attribute is populated. Search After/Point in Time queries must include a valid "sort" field. | |
PutElasticsearchJson | Community | Struct | Elastic | ✓ | An Elasticsearch put processor that uses the official Elastic REST client libraries. Each FlowFile is treated as a document to be sent to the Elasticsearch _bulk API. Multiple FlowFiles can be batched together into each Request sent to Elasticsearch. | |
PutElasticsearchRecord | Community | Struct | Elastic | ✓ | A record-aware Elasticsearch put processor that uses the official Elastic REST client libraries. Each Record within the FlowFile is converted into a document to be sent to the Elasticsearch _bulk API. Multiple documents can be batched into each Request sent to Elasticsearch. Each document's Bulk operation can be configured using Record Path expressions. | |
SearchElasticsearch | Community | Struct | Elastic | ✓ | A processor that allows the user to repeatedly run a paginated query (with aggregations) written with the Elasticsearch JSON DSL. Search After/Point in Time queries must include a valid "sort" field. The processor will retrieve multiple pages of results until either no more results are available or the Pagination Keep Alive expiration is reached, after which the query will restart with the first page of results being retrieved. | |
UpdateByQueryElasticsearch | Community | Struct | Elastic | ✓ | Update documents in an Elasticsearch index using a query. The query can be loaded from a flowfile body or from the Query parameter. The loaded Query can contain any JSON accepted by Elasticsearch's _update_by_query API, for example a "query" object to identify what documents are to be updated, plus a "script" to define the updates to perform. | |
ConsumeIMAP | Community | Unstruct | IMAP | ✓ | Consumes messages from Email Server using IMAP protocol. The raw-bytes of each received email message are written as contents of the FlowFile | |
ConsumePOP3 | Community | Unstruct | POP3 | ✓ | Consumes messages from Email Server using POP3 protocol. The raw-bytes of each received email message are written as contents of the FlowFile | |
PutBigQuery | Community | Struct | Google BigQuery | ✓ | Writes the contents of a FlowFile to a Google BigQuery table. The processor is record based so the schema that is used is driven by the RecordReader. Attributes that are not matched to the target schema are skipped. Exactly once delivery semantics are achieved via stream offsets. | |
FetchGoogleDrive | Community | Unstruct | Google Drive | ✓ | Fetches files from a Google Drive Folder. Designed to be used in tandem with ListGoogleDrive. Please see Additional Details to set up access to Google Drive. | |
ListGoogleDrive | Community | Unstruct | Google Drive | ✓ | Performs a listing of concrete files (shortcuts are ignored) in a Google Drive folder. If the 'Record Writer' property is set, a single Output FlowFile is created, and each file in the listing is written as a single record to the output file. Otherwise, for each file in the listing, an individual FlowFile is created, the metadata being written as FlowFile attributes. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. Please see Additional Details to set up access to Google Drive. | |
PutGoogleDrive | Community | Unstruct | Google Drive | ✓ | Writes the contents of a FlowFile as a file in Google Drive. | |
ConsumeGCPubSub | Community | Unstruct | Google PubSub | ✓ | Consumes messages from the configured Google Cloud PubSub subscription. If the 'Batch Size' is set, the configured number of messages will be pulled in a single request, else only one message will be pulled. | |
PublishGCPubSub | Community | Unstruct | Google PubSub | ✓ | Publishes the content of the incoming flowfile to the configured Google Cloud PubSub topic. The processor supports dynamic properties. If any dynamic properties are present, they will be sent along with the message in the form of 'attributes'. | |
ConsumeGCPubSubLite | Community | Unstruct | Google PubSub Lite | ✓ | Consumes messages from the configured Google Cloud PubSub Lite subscription. | |
PublishGCPubSubLite | Community | Unstruct | Google PubSub Lite | ✓ | Publishes the content of the incoming FlowFile to the configured Google Cloud PubSub Lite topic. The processor supports dynamic properties. If any dynamic properties are present, they will be sent along with the message in the form of 'attributes'. | |
DeleteGCSObject | Community | Unstruct | Google Cloud Storage | ✓ | Deletes objects from a Google Cloud Bucket. If attempting to delete a file that does not exist, FlowFile is routed to success. | |
FetchGCSObject | Community | Unstruct | Google Cloud Storage | ✓ | Fetches a file from a Google Cloud Bucket. Designed to be used in tandem with ListGCSBucket. | |
ListGCSBucket | Community | Unstruct | Google Cloud Storage | ✓ | Retrieves a listing of objects from a GCS bucket. For each object that is listed, creates a FlowFile that represents the object so that it can be fetched in conjunction with FetchGCSObject. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. | |
PutGCSObject | Community | Unstruct | Google Cloud Storage | ✓ | Writes the contents of a FlowFile as an object in a Google Cloud Storage bucket. | |
GetGcpVisionAnnotateFilesOperationStatus | Community | Unstruct | Google Vision | ✓ | Retrieves the current status of a Google Vision operation. | |
GetGcpVisionAnnotateImagesOperationStatus | Community | Unstruct | Google Vision | ✓ | Retrieves the current status of a Google Vision operation. | |
StartGcpVisionAnnotateFilesOperation | Community | Unstruct | Google Vision | ✓ | ✓ | Trigger a Vision operation on file input. It should be followed by GetGcpVisionAnnotateFilesOperationStatus processor in order to monitor operation status. |
StartGcpVisionAnnotateImagesOperation | Community | Unstruct | Google Vision | ✓ | ✓ | Trigger a Vision operation on image input. It should be followed by GetGcpVisionAnnotateImagesOperationStatus processor in order to monitor operation status. |
GetHubSpot | Community | Struct | HubSpot | ✓ | Retrieves JSON data from a private HubSpot application. This processor is intended to be run on the Primary Node only. | |
ConsumeJMS | Community | Unstruct | JMS | ✓ | Consumes JMS Message of type BytesMessage, TextMessage, ObjectMessage, MapMessage or StreamMessage transforming its content to a FlowFile and transitioning it to 'success' relationship. JMS attributes such as headers and properties will be copied as FlowFile attributes. MapMessages will be transformed into JSONs and then into byte arrays. The other types will have their raw contents as byte array transferred into the flowfile. | |
PublishJMS | Community | Unstruct | JMS | ✓ | Creates a JMS Message from the contents of a FlowFile and sends it to a JMS Destination (queue or topic) as JMS BytesMessage or TextMessage. FlowFile attributes will be added as JMS headers and/or properties to the outgoing JMS message. | |
ConsumeKafka | Community | Unstruct | Kafka | ✓ | Consumes messages from Apache Kafka via the Kafka Consumer API. The complementary NiFi processor for sending messages is PublishKafka. The Processor supports consumption of Kafka messages, optionally interpreted as NiFi records. Please note that, at this time (in read record mode), the Processor assumes that all records that are retrieved from a given partition have the same schema. For this mode, if any of the Kafka messages are pulled but cannot be parsed or written with the configured Record Reader or Record Writer, the contents of the message will be written to a separate FlowFile, and that FlowFile will be transferred to the 'parse.failure' relationship. Otherwise, each FlowFile is sent to the 'success' relationship and may contain many individual messages within the single FlowFile. A 'record.count' attribute is added to indicate how many messages are contained in the FlowFile. No two Kafka messages will be placed into the same FlowFile if they have different schemas, or if they have different values for a message header that is configured to be added as a FlowFile attribute. | |
PublishKafka | Community | Unstruct | Kafka | ✓ | Sends the contents of a FlowFile as either a message or as individual records to Apache Kafka using the Kafka Producer API. The messages to send may be individual FlowFiles, may be delimited using a user-specified delimiter (such as a new-line), or may be record-oriented data that can be read by the configured Record Reader. The complementary NiFi processor for fetching messages is ConsumeKafka. | |
DeleteMongo | Community | Struct | MongoDB | ✓ | Executes a delete query against a MongoDB collection. The query is provided in the body of the flowfile and the user can select whether it will delete one or many documents that match it. | |
GetMongo | Community | Struct | MongoDB | ✓ | Creates FlowFiles from documents in MongoDB loaded by a user-specified query. | |
GetMongoRecord | Community | Struct | MongoDB | ✓ | A record-based version of GetMongo that uses the Record writers to write the MongoDB result set. | |
PutMongo | Community | Struct | MongoDB | ✓ | Writes the contents of a FlowFile to MongoDB | |
PutMongoBulkOperations | Community | Struct | MongoDB | ✓ | Writes the contents of a FlowFile to MongoDB as bulk-update | |
PutMongoRecord | Community | Struct | MongoDB | ✓ | This processor is a record-aware processor for inserting/upserting data into MongoDB. It uses a configured record reader and schema to read an incoming record set from the body of a flowfile and then inserts/upserts batches of those records into a configured MongoDB collection. This processor does not support deletes. The number of documents to insert/upsert at a time is controlled by the "Batch Size" configuration property. This value should be set to a reasonable size to ensure that MongoDB is not overloaded with too many operations at once. | |
RunMongoAggregation | Community | Struct | MongoDB | ✓ | A processor that runs an aggregation query whenever a flowfile is received. | |
DeleteGridFS | Community | Unstruct | MongoDB | ✓ | Deletes a file from GridFS using a file name or a query. | |
FetchGridFS | Community | Unstruct | MongoDB | ✓ | Retrieves one or more files from a GridFS bucket by file name or by a user-defined query. | |
PutGridFS | Community | Unstruct | MongoDB | ✓ | Writes a file to a GridFS bucket. | |
ConsumeMQTT | Community | Unstruct | MQTT | ✓ | Subscribes to a topic and receives messages from an MQTT broker | |
PublishMQTT | Community | Unstruct | MQTT | ✓ | Publishes a message to an MQTT topic | |
ListenOTLP | Community | Struct | Open Telemetry | ✓ | Collect OpenTelemetry messages over HTTP or gRPC. Supports standard Export Service Request messages for logs, metrics, and traces. Implements OpenTelemetry OTLP Specification 1.0.0 with OTLP/gRPC and OTLP/HTTP. Provides protocol detection using the HTTP Content-Type header. | |
PutRedisHashRecord | Community | Struct | Redis | ✓ | Puts record field data into Redis using a specified hash value, which is determined by a RecordPath to a field in each record containing the hash value. The record fields and values are stored as key/value pairs associated by the hash value. NOTE: Neither the evaluated hash value nor any of the field values can be null. If the hash value is null, the FlowFile will be routed to failure. For each of the field values, if the value is null that field will not be set in Redis. | |
PutSalesforceObject | Community | Struct | Salesforce | ✓ | Creates new records for the specified Salesforce sObject. The type of the Salesforce object must be set in the input flowfile's 'objectType' attribute. This processor cannot update existing records. | |
QuerySalesforceObject | Community | Struct | Salesforce | ✓ | Retrieves records from a Salesforce sObject. Users can add arbitrary filter conditions by setting the 'Custom WHERE Condition' property. The processor can also run a custom query, although record processing is not supported in that case. Supports incremental retrieval: users can define a field in the 'Age Field' property that will be used to determine when the record was created. When this property is set the processor will retrieve new records. Incremental loading and record-based processing are only supported in property-based queries. It's also possible to define an initial cutoff value for the age, filtering out all older records even for the first run. In case of 'Property Based Query' this processor should run on the Primary Node only. FlowFile attribute 'record.count' indicates how many records were retrieved and written to the output. The processor can accept an optional input FlowFile and reference the FlowFile attributes in the query. | |
GetShopify | Community | Struct | Shopify | ✓ | Retrieves objects from a custom Shopify store. The processor yield time should be set according to the account's rate limit. | |
ConsumeSlack | Community | Unstruct | Slack | ✓ | Retrieves messages from one or more configured Slack channels. The messages are written out in JSON format. See Usage / Additional Details for more information about how to configure this Processor and enable it to retrieve messages from Slack. | |
ListenSlack | Community | Unstruct | Slack | ✓ | Retrieves real-time messages or Slack commands from one or more Slack conversations. The messages are written out in JSON format. Note that this Processor should be used to obtain real-time messages and commands from Slack and does not provide a mechanism for obtaining historical messages. The ConsumeSlack Processor should be used for an initial load of messages from a channel. See Usage / Additional Details for more information about how to configure this Processor and enable it to retrieve messages and commands from Slack. | |
PublishSlack | Community | Unstruct | Slack | ✓ | Posts a message to the specified Slack channel. The content of the message can be either a user-defined message that makes use of Expression Language or the contents of the FlowFile can be sent as the message. If sending a user-defined message, the contents of the FlowFile may also be optionally uploaded as a file attachment. | |
FetchSmb | Community | Unstruct | SMB | ✓ | Fetches files from a SMB Share. Designed to be used in tandem with ListSmb. | |
GetSmbFile | Community | Unstruct | SMB | ✓ | Reads files from a Samba network location into FlowFiles. Use this processor instead of a CIFS mount if share access control is important. Configure the Hostname, Share and Directory accordingly: \\Hostname\Share\path\to\Directory | |
ListSmb | Community | Unstruct | SMB | ✓ | Lists concrete files shared via SMB protocol. Each listed file may result in one flowfile, the metadata being written as flowfile attributes. Or - in case the 'Record Writer' property is set - the entire result is written as records to a single flowfile. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. | |
PutSmbFile | Community | Unstruct | SMB | ✓ | Writes the contents of a FlowFile to a Samba network location. Use this processor instead of a CIFS mount if share access control is important. Configure the Hostname, Share and Directory accordingly: \\Hostname\Share\path\to\Directory | |
ConsumeTwitter | Community | Unstruct | Twitter / X | ✓ | Streams tweets from Twitter's streaming API v2. The stream provides a sample stream or a search stream based on previously uploaded rules. This processor also provides a pass through for certain fields of the tweet to be returned as part of the response. See https://developer.twitter.com/en/docs/twitter-api/data-dictionary/introduction for more information regarding the Tweet object model. | |
GetSplunk | Community | Struct | Splunk | ✓ | Retrieves data from Splunk Enterprise. | |
PutSplunk | Community | Struct | Splunk | ✓ | Sends logs to Splunk Enterprise over TCP, TCP + TLS/SSL, or UDP. If a Message Delimiter is provided, then this processor will read messages from the incoming FlowFile based on the delimiter, and send each message to Splunk. If a Message Delimiter is not provided then the content of the FlowFile will be sent directly to Splunk as if it were a single message. | |
PutSplunkHTTP | Community | Struct | Splunk | ✓ | Sends flow file content to the specified Splunk server over HTTP or HTTPS. Supports HEC Index Acknowledgement. | |
QuerySplunkIndexingStatus | Community | Struct | Splunk | ✓ | Queries Splunk server in order to acquire the status of indexing acknowledgement. | |
ExecuteSQL | Community | Struct | JDBC | ✓ | ✓ | Executes provided SQL select query. Query result will be converted to Avro format. Streaming is used so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer, or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. If it is triggered by an incoming FlowFile, then attributes of that FlowFile will be available when evaluating the select query, and the query may use the ? to escape parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention sql.args.N.type and sql.args.N.value, where N is a positive integer. The sql.args.N.type is expected to be a number indicating the JDBC Type. The content of the FlowFile is expected to be in UTF-8 format. FlowFile attribute 'executesql.row.count' indicates how many rows were selected. |
ExecuteSQLRecord | Community | Struct | JDBC | ✓ | ✓ | Executes provided SQL select query. Query result will be converted to the format specified by a Record Writer. Streaming is used so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer, or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. If it is triggered by an incoming FlowFile, then attributes of that FlowFile will be available when evaluating the select query, and the query may use the ? to escape parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention sql.args.N.type and sql.args.N.value, where N is a positive integer. The sql.args.N.type is expected to be a number indicating the JDBC Type. The content of the FlowFile is expected to be in UTF-8 format. FlowFile attribute 'executesql.row.count' indicates how many rows were selected. |
FetchFTP | Community | Unstruct | FTP | ✓ | Fetches the content of a file from a remote FTP server and overwrites the contents of an incoming FlowFile with the content of the remote file. | |
FetchFile | Community | Unstruct | Local file | ✓ | Reads the contents of a file from disk and streams it into the contents of an incoming FlowFile. Once this is done, the file is optionally moved elsewhere or deleted to help keep the file system organized. | |
FetchSFTP | Community | Unstruct | SFTP | ✓ | Fetches the content of a file from a remote SFTP server and overwrites the contents of an incoming FlowFile with the content of the remote file. | |
GenerateTableFetch | Community | Struct | JDBC | ✓ | Generates SQL select queries that fetch "pages" of rows from a table. The partition size property along with the table's row count determine the size and number of pages and generated FlowFiles. In addition, incremental fetching can be achieved by setting Maximum-Value Columns, which causes the processor to track the columns' maximum values, thus only fetching rows whose columns' values exceed the observed maximums. This processor is intended to be run on the Primary Node only. This processor can accept incoming connections; its behavior differs depending on whether incoming connections are provided: if no incoming connections are specified, the processor will generate SQL queries on the specified processor schedule (Expression Language is supported for many fields, but no FlowFile attributes are available and properties will be evaluated using the Environment/System properties); if incoming connections are specified and no FlowFile is available to a processor task, no work will be performed; if incoming connections are specified and a FlowFile is available to a processor task, the FlowFile's attributes may be used in Expression Language for fields such as Table Name, however the Max-Value Columns and Columns to Return fields must be empty or refer to columns that are available in each specified table. | |
GetFTP | Community | Unstruct | FTP | ✓ | Fetches files from an FTP Server and creates FlowFiles from them | |
GetFile | Community | Unstruct | Local file | ✓ | Creates FlowFiles from files in a directory. NiFi will ignore files it doesn't have at least read permissions for. | |
GetSFTP | Community | Unstruct | SFTP | ✓ | Fetches files from an SFTP Server and creates FlowFiles from them | |
HandleHttpRequest | Community | Unstruct | HTTP | ✓ | ✓ | Starts an HTTP Server and listens for HTTP Requests. For each request, creates a FlowFile and transfers to 'success'. This Processor is designed to be used in conjunction with the HandleHttpResponse Processor in order to create a Web Service. In case of a multipart request, one FlowFile is generated for each part. |
HandleHttpResponse | Community | Unstruct | HTTP | ✓ | ✓ | Sends an HTTP Response to the Requestor that generated a FlowFile. This Processor is designed to be used in conjunction with the HandleHttpRequest in order to create a web service. |
InvokeHTTP | Community | Unstruct | HTTP | ✓ | ✓ | An HTTP client processor which can interact with a configurable HTTP Endpoint. The destination URL and HTTP Method are configurable. When the HTTP Method is PUT, POST or PATCH, the FlowFile contents are included as the body of the request and FlowFile attributes are converted to HTTP headers, optionally, based on configuration properties. |
ListDatabaseTables | Community | Struct | JDBC | ✓ | Generates a set of flow files, each containing attributes corresponding to metadata about a table from a database connection. Once metadata about a table has been fetched, it will not be fetched again until the Refresh Interval (if set) has elapsed, or until state has been manually cleared. | |
ListFTP | Community | Unstruct | FTP | ✓ | Performs a listing of the files residing on an FTP server. For each file that is found on the remote server, a new FlowFile will be created with the filename attribute set to the name of the file on the remote server. This can then be used in conjunction with FetchFTP in order to fetch those files. | |
ListFile | Community | Unstruct | Local file | ✓ | Retrieves a listing of files from the input directory. For each file listed, creates a FlowFile that represents the file so that it can be fetched in conjunction with FetchFile. This Processor is designed to run on Primary Node only in a cluster when 'Input Directory Location' is set to 'Remote'. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all the data. When 'Input Directory Location' is 'Local', the 'Execution' mode can be anything, and synchronization won't happen. Unlike GetFile, this Processor does not delete any data from the local filesystem. | |
ListSFTP | Community | Unstruct | SFTP | ✓ | Performs a listing of the files residing on an SFTP server. For each file that is found on the remote server, a new FlowFile will be created with the filename attribute set to the name of the file on the remote server. This can then be used in conjunction with FetchSFTP in order to fetch those files. | |
ListenFTP | Community | Unstruct | FTP | ✓ | Starts an FTP server that listens on the specified port and transforms incoming files into FlowFiles. The URI of the service will be ftp://{hostname}:{port}. The default port is 2221. | |
ListenHTTP | Community | Unstruct | HTTP | ✓ | Starts an HTTP Server and listens on a given base path to transform incoming requests into FlowFiles. The default URI of the Service will be http://{hostname}:{port}/contentListener. Only HEAD and POST requests are supported. GET, PUT, DELETE, OPTIONS and TRACE will result in an error and the HTTP response status code 405; CONNECT will also result in an error and the HTTP response status code 400. | |
ListenSyslog | Community | Struct | Syslog | ✓ | Listens for Syslog messages being sent to a given port over TCP or UDP. Incoming messages are checked against regular expressions for RFC5424 and RFC3164 formatted messages. | |
ListenTCP | Community | Unstruct | TCP | ✓ | Listens for incoming TCP connections and reads data from each connection using a line separator as the message demarcator. The default behavior is for each message to produce a single FlowFile, however this can be controlled by increasing the Batch Size to a larger value for higher throughput. The Receive Buffer Size must be set as large as the largest messages expected to be received, meaning if every 100kb there is a line separator, then the Receive Buffer Size must be greater than 100kb. The processor can be configured to use an SSL Context Service to only allow secure connections. When connected clients present certificates for mutual TLS authentication, the Distinguished Names of the client certificate's issuer and subject are added to the outgoing FlowFiles as attributes. The processor does not perform authorization based on Distinguished Name values, but since these values are attached to the outgoing FlowFiles, authorization can be implemented based on these attributes. | |
ListenUDP | Community | Unstruct | UDP | ✓ | Listens for Datagram Packets on a given port. The default behavior produces a FlowFile per datagram, however for higher throughput the Max Batch Size property may be increased to specify the number of datagrams to batch together in a single FlowFile. This processor can be restricted to listening for datagrams from a specific remote host and port by specifying the Sending Host and Sending Host Port properties, otherwise it will listen for datagrams from all hosts and ports. | |
ListenUDPRecord | Community | Struct | UDP | ✓ | Listens for Datagram Packets on a given port and reads the content of each datagram using the configured Record Reader. Each record will then be written to a flow file using the configured Record Writer. This processor can be restricted to listening for datagrams from a specific remote host and port by specifying the Sending Host and Sending Host Port properties, otherwise it will listen for datagrams from all hosts and ports. | |
PutEmail | Community | Unstruct | SMTP | ✓ | Sends an e-mail to configured recipients for each incoming FlowFile | |
PutFTP | Community | Unstruct | FTP | ✓ | Sends FlowFiles to an FTP Server | |
PutFile | Community | Unstruct | Local file | ✓ | Writes the contents of a FlowFile to the local file system | |
PutRecord | Community | Struct | JDBC | ✓ | The PutRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file, and sends them to a destination specified by a Record Destination Service (i.e. record sink). | |
PutSFTP | Community | Unstruct | SFTP | ✓ | Sends FlowFiles to an SFTP Server | |
PutSQL | Community | Struct | JDBC | ✓ | Executes a SQL UPDATE or INSERT command. The content of an incoming FlowFile is expected to be the SQL command to execute. The SQL command may use the ? to escape parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention sql.args.N.type and sql.args.N.value, where N is a positive integer. The sql.args.N.type is expected to be a number indicating the JDBC Type. The content of the FlowFile is expected to be in UTF-8 format. | |
PutSyslog | Community | Struct | Syslog | ✓ | Sends Syslog messages to a given host and port over TCP or UDP. Messages are constructed from the "Message ___" properties of the processor, which can use expression language to generate messages from incoming FlowFiles. | |
PutTCP | Community | Unstruct | TCP | ✓ | Sends serialized FlowFiles or Records over TCP to a configurable destination with optional support for TLS | |
PutUDP | Community | Unstruct | UDP | ✓ | The PutUDP processor receives a FlowFile and packages the FlowFile content into a single UDP datagram packet which is then transmitted to the configured UDP server. The user must ensure that the FlowFile content being fed to this processor is not larger than the maximum size for the underlying UDP transport. The maximum transport size will vary based on the platform setup but is generally just under 64KB. FlowFiles will be marked as failed if their content is larger than the maximum transport size. | |
QueryDatabaseTable | Community | Struct | JDBC | ✓ | Generates a SQL select query, or uses a provided statement, and executes it to fetch all rows whose values in the specified Maximum Value column(s) are larger than the previously-seen maxima. Query result will be converted to Avro format. Expression Language is supported for several properties, but no incoming connections are permitted. The Environment/System properties may be used to provide values for any property containing Expression Language. If it is desired to leverage flow file attributes to perform these queries, the GenerateTableFetch and/or ExecuteSQL processors can be used for this purpose. Streaming is used so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer or cron expression, using the standard scheduling methods. This processor is intended to be run on the Primary Node only. FlowFile attribute 'querydbtable.row.count' indicates how many rows were selected. | |
QueryDatabaseTableRecord | Community | Struct | JDBC | ✓ | Generates a SQL select query, or uses a provided statement, and executes it to fetch all rows whose values in the specified Maximum Value column(s) are larger than the previously-seen maxima. Query result will be converted to the format specified by the record writer. Expression Language is supported for several properties, but no incoming connections are permitted. The Environment/System properties may be used to provide values for any property containing Expression Language. If it is desired to leverage flow file attributes to perform these queries, the GenerateTableFetch and/or ExecuteSQL processors can be used for this purpose. Streaming is used so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer or cron expression, using the standard scheduling methods. This processor is intended to be run on the Primary Node only. FlowFile attribute 'querydbtable.row.count' indicates how many rows were selected. | |
QueryRecord | Community | Struct | JDBC | ✓ | Evaluates one or more SQL queries against the contents of a FlowFile. The result of the SQL query then becomes the content of the output FlowFile. This can be used, for example, for field-specific filtering, transformation, and row-level filtering. Columns can be renamed, simple calculations and aggregations performed, etc. The Processor is configured with a Record Reader Controller Service and a Record Writer service so as to allow flexibility in incoming and outgoing data formats. The Processor must be configured with at least one user-defined property. The name of the Property is the Relationship to route data to, and the value of the Property is a SQL SELECT statement that is used to specify how input data should be transformed/filtered. The SQL statement must be valid ANSI SQL and is powered by Apache Calcite. If the transformation fails, the original FlowFile is routed to the 'failure' relationship. Otherwise, the data selected will be routed to the associated relationship. If the Record Writer chooses to inherit the schema from the Record, it is important to note that the schema that is inherited will be from the ResultSet, rather than the input Record. This allows a single instance of the QueryRecord processor to have multiple queries, each of which returns a different set of columns and aggregations. As a result, though, the schema that is derived will have no schema name, so it is important that the configured Record Writer not attempt to write the Schema Name as an attribute if inheriting the Schema from the Record. See the Processor Usage documentation for more information. | |
UpdateDatabaseTable | Community | Struct | JDBC | ✓ | This processor uses a JDBC connection and incoming records to generate any database table changes needed to support the incoming records. It expects a 'flat' record layout, meaning none of the top-level record fields has nested fields that are intended to become columns themselves. | |
ConnectWebSocket | Community | Unstruct | Websocket | ✓ | ✓ | Acts as a WebSocket client endpoint to interact with a remote WebSocket server. FlowFiles are transferred to downstream relationships according to received message types as WebSocket client configured with this processor receives messages from remote WebSocket server. If a new flowfile is passed to the processor, the previous sessions will be closed and any data being sent will be aborted. |
ListenWebSocket | Community | Unstruct | Websocket | ✓ | Acts as a WebSocket server endpoint to accept client connections. FlowFiles are transferred to downstream relationships according to received message types as the WebSocket server configured with this processor receives client requests | |
PutWebSocket | Community | Unstruct | Websocket | ✓ | Sends messages to a WebSocket remote endpoint using a WebSocket session that is established by either ListenWebSocket or ConnectWebSocket. | |
GetWorkdayReport | Community | Struct | Workday | ✓ | A processor which can interact with a configurable Workday Report. The processor can forward the content without modification, or you can transform it by providing the specific Record Reader and Record Writer services based on your needs. You can also remove fields by defining schema in the Record Writer. Supported Workday report formats are: csv, simplexml, json | |
GetZendesk | Community | Struct | Zendesk | ✓ | Incrementally fetches data from Zendesk API. | |
PutZendeskTicket | Community | Struct | Zendesk | ✓ | Create Zendesk tickets using the Zendesk API. |
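The ExecuteSQL, ExecuteSQLRecord, and PutSQL entries above describe a shared parameter convention: each `?` placeholder in the SQL statement is bound from a numbered pair of FlowFile attributes, `sql.args.N.type` (a number indicating the JDBC type) and `sql.args.N.value`. The sketch below only illustrates that convention; the table name and values are hypothetical and not taken from the processor documentation.

```
# FlowFile content (the statement PutSQL executes)
INSERT INTO users (name, age) VALUES (?, ?)

# FlowFile attributes supplying the positional parameters
# (JDBC type codes from java.sql.Types: 12 = VARCHAR, 4 = INTEGER)
sql.args.1.type  = 12
sql.args.1.value = Alice
sql.args.2.type  = 4
sql.args.2.value = 42
```

In a typical flow these attributes would be set by an upstream processor (for example UpdateAttribute) before the FlowFile reaches PutSQL, ExecuteSQL, or ExecuteSQLRecord.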