Apache NiFi: Data Ingestion Tool

  • It consists of atomic elements that can be combined into groups to build a data flow
  • Processors
  • Flow File
    - NiFi propagates data in the form of flow files
    - A flow file can contain any form of data: CSV, JSON, XML, text, or even binary data
    - Because of the flow file abstraction, NiFi can propagate any form of data from any source to any destination
    - A processor can process a flow file and generate a new flow file as a result
  • Connection
    - Processors are linked by connections to form a data flow; each connection acts as a queue for flow files
  • Process Group
    - A process group bundles one or more connected processors, which makes a complex data flow easier to maintain
  • Controller Service
    - It is a shared service that processors can use
    - For example, a processor that reads from and writes to a SQL database can use a controller service that holds the required DB connection details
  • A flow file carries data
  • It is composed of two components
    - Content: the data itself
    - Attributes: metadata about the flow file, stored as key-value pairs
  • A processor can manipulate a flow file’s attributes (update, add, or remove them), change its content, or do both
  • Lifecycle
    - Persisted on disk
    - Passed by reference
    - Whenever a processor generates a new flow file, it is immediately persisted to disk, and NiFi passes only a reference to the flow file to the next processor
    - Ingesting new data into an existing flow file, or changing its content, generates a new flow file
    - Manipulating only the attributes does not create a new flow file
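The lifecycle above can be sketched as a toy model in Python. This is not NiFi's API: `ContentStore`, `FlowFile`, `update_attribute`, and `replace_content` are hypothetical names, and an in-memory dict stands in for the on-disk storage. The sketch only illustrates that content is stored once and handed around by reference, that an attribute-only change keeps the same flow file, and that a content change produces a new one.

```python
import uuid
from dataclasses import dataclass, field, replace


class ContentStore:
    """Toy stand-in for persisted content: maps a reference id to bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        ref = str(uuid.uuid4())
        self._blobs[ref] = data
        return ref

    def get(self, ref: str) -> bytes:
        return self._blobs[ref]


@dataclass(frozen=True)
class FlowFile:
    """Hypothetical flow file: a content reference plus key-value attributes."""
    content_ref: str
    attributes: dict = field(default_factory=dict)
    flow_file_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def update_attribute(ff: FlowFile, key: str, value: str) -> FlowFile:
    # Attribute-only change: same flow file id, same content reference.
    return replace(ff, attributes={**ff.attributes, key: value})


def replace_content(store: ContentStore, ff: FlowFile, data: bytes) -> FlowFile:
    # Content change: the new content is stored and a new flow file is produced.
    return FlowFile(content_ref=store.put(data), attributes=dict(ff.attributes))


store = ContentStore()
original = FlowFile(store.put(b"id,name\n1,alice\n"), {"filename": "users.csv"})
tagged = update_attribute(original, "mime.type", "text/csv")
converted = replace_content(store, tagged, b'[{"id": 1, "name": "alice"}]')
```

Here `tagged` shares both its id and its content reference with `original`, while `converted` has a new id and points at newly stored content; the original bytes remain readable through `store.get(original.content_ref)`.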
  • Types of Processors
    - ( https://www.nifi.rocks/apache-nifi-processors/ )
    - NiFi has 280+ processors (the number increases with each release)
    - It can distribute data to many systems
    - Some of them are
  • Standard configuration
    - Common across all processors
  • Unique Configuration
    - Unique for specific processors
  • Relationships
    - Each processor has zero or more relationships defined for it
    - Once a processor has finished processing, it routes one or more flow files to its relationships
    - The flow designer is responsible for handling these relationships by connecting each of them to another processor. A relationship that is not needed can be terminated instead
    - NiFi reports an error if a processor has any unhandled relationship, and the processor cannot be started until every relationship is handled
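The relationship rules above can be illustrated with a small Python sketch. It is a simplified model, not NiFi's API: the `Processor` class, its methods, and the `ValidateRow` name with its `success` and `failure` relationships are all assumptions made for illustration.

```python
class Processor:
    """Hypothetical processor with named relationships (not NiFi's API)."""

    def __init__(self, name, relationships):
        self.name = name
        self.relationships = set(relationships)
        self.connections = {}        # relationship -> downstream queue (a list)
        self.auto_terminated = set()

    def connect(self, relationship, queue):
        self.connections[relationship] = queue

    def auto_terminate(self, relationship):
        self.auto_terminated.add(relationship)

    def unhandled(self):
        return self.relationships - set(self.connections) - self.auto_terminated

    def start(self):
        # Refuse to start while any relationship is neither connected nor terminated.
        missing = self.unhandled()
        if missing:
            raise RuntimeError(f"{self.name}: unhandled relationships {sorted(missing)}")

    def route(self, flow_file, relationship):
        # Terminated relationships drop the flow file; connected ones enqueue it.
        if relationship in self.connections:
            self.connections[relationship].append(flow_file)


success_queue = []
validate = Processor("ValidateRow", ["success", "failure"])
validate.connect("success", success_queue)

try:
    validate.start()            # "failure" is still unhandled
    started_early = True
except RuntimeError:
    started_early = False

validate.auto_terminate("failure")
validate.start()                # every relationship is now handled
validate.route({"row": 1}, "success")
validate.route({"row": 2}, "failure")
```

The first `start()` call fails because `failure` is unhandled; after it is terminated, the processor starts, and only the flow file routed to `success` reaches the downstream queue.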
  • In NiFi, each processor takes its own time to process, depending on the complexity involved
  • To handle this, NiFi has a backpressure configuration; each connection can define its own backpressure thresholds
    - Object threshold
    - Size threshold
  • NiFi stops scheduling the processor that feeds a connection once either threshold is reached
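A minimal Python sketch of the two thresholds follows. The `Connection` class, `run_upstream`, and the threshold values (3 objects, 1024 bytes) are assumptions chosen for illustration, not NiFi's implementation or defaults. It shows the upstream side pausing as soon as either limit is hit and resuming once the downstream side drains the queue.

```python
from collections import deque


class Connection:
    """Toy queue with the two backpressure thresholds (hypothetical API)."""

    def __init__(self, object_threshold, size_threshold_bytes):
        self.object_threshold = object_threshold
        self.size_threshold_bytes = size_threshold_bytes
        self.queue = deque()
        self.queued_bytes = 0

    def back_pressure_applied(self):
        # Either threshold being reached is enough to apply backpressure.
        return (len(self.queue) >= self.object_threshold
                or self.queued_bytes >= self.size_threshold_bytes)

    def enqueue(self, content: bytes):
        self.queue.append(content)
        self.queued_bytes += len(content)

    def dequeue(self) -> bytes:
        content = self.queue.popleft()
        self.queued_bytes -= len(content)
        return content


def run_upstream(connection, pending):
    """Emit pending items until backpressure applies; return how many were sent."""
    sent = 0
    while pending and not connection.back_pressure_applied():
        connection.enqueue(pending.pop(0))
        sent += 1
    return sent


conn = Connection(object_threshold=3, size_threshold_bytes=1024)
pending = [b"a" * 10 for _ in range(5)]
first_run = run_upstream(conn, pending)   # stops once 3 objects are queued
conn.dequeue()                            # downstream consumes one flow file
second_run = run_upstream(conn, pending)  # upstream can send one more
```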
  • Here I’m allowing all values, including null. In your case, you can exclude null by keeping the type as string/integer only.
  • So the processor will route the flow file to the failure relationship
  1. The records are now partitioned based on the partition_dt value as the processor reads the incoming data.
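Partitioning by `partition_dt` amounts to grouping records on that field's value. The Python sketch below shows the idea only; the `partition_records` helper and the sample records are made up for illustration and say nothing about how the NiFi processor is implemented.

```python
from collections import defaultdict


def partition_records(records, field_name="partition_dt"):
    """Group records by the value of one field, one group per distinct value."""
    partitions = defaultdict(list)
    for record in records:
        partitions[record.get(field_name)].append(record)
    return dict(partitions)


# Sample records (hypothetical data).
records = [
    {"id": 1, "partition_dt": "2021-01-01"},
    {"id": 2, "partition_dt": "2021-01-02"},
    {"id": 3, "partition_dt": "2021-01-01"},
]
groups = partition_records(records)
```

Each distinct `partition_dt` value ends up with its own group of records, and records within a group keep their incoming order.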
  • Data ingestion into Hive tables is very slow
  • It does not support non-transactional tables




Your Digital Transformation partner. We are here to share knowledge on varied technologies, updates; and to stay in touch with the tech-space.

Payoda Technology Inc
