One of the most important things to understand in Apache NiFi (incubating) is the concept of FlowFile attributes. You may already have a general understanding of what attributes are or know them by the term “metadata”, which is data about the data. There is also a good description in this Wikipedia article. However, since this blog is all about keeping things simple… I like to use the simple analogy of a letter in an envelope when describing NiFi FlowFile attributes.
Let’s say that all the data objects being processed in your NiFi dataflow are like letters you have received. In this analogy, these objects, or FlowFiles, are each made up of the content (the letter inside the envelope) and the attributes (the details written on the outside of the envelope). So, for example, you have the sender’s name, the sender’s address, the recipient’s name, and the recipient’s address. You could think of these as attributes. If the letter has been mailed, you might even have details like the date it was mailed, the post office that processed it, etc. All of these things are written on the outside of the envelope.
So, without even making the effort of opening the letter, NiFi knows a lot about it and can make decisions about it. For example, we could tell NiFi that if it’s from John Smith, route it so that it gets filed in a directory called “Love Letters”; whereas, if it’s from Smitty John, delete it.
(For more on routing FlowFiles based on attributes, see the usage documentation for the RouteOnAttribute processor.)
The fact that NiFi can just inspect the attributes (keeping only the attributes in memory) and perform actions without even looking at the content means that NiFi dataflows can be very fast and efficient.
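To make the envelope idea concrete, here is a minimal Python sketch of the routing rule above. It is not NiFi code and not how RouteOnAttribute is implemented; the `FlowFile` class, the `sender.name` key, and the `expensive_read` helper are all hypothetical names invented for illustration. The point is simply that the routing decision reads the attributes and never calls the content loader.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class FlowFile:
    """Toy stand-in for a FlowFile: attributes in memory, content loaded on demand."""
    attributes: Dict[str, str]
    # Hypothetical loader; calling it represents opening the envelope.
    load_content: Callable[[], bytes] = field(repr=False)


def route(flowfile: FlowFile) -> str:
    """Pick a destination by looking only at the attributes, never the content."""
    sender = flowfile.attributes.get("sender.name")  # hypothetical key name
    if sender == "John Smith":
        return "Love Letters"
    if sender == "Smitty John":
        return "delete"
    return "unmatched"


content_reads = 0  # counts how many times any envelope was opened


def expensive_read() -> bytes:
    """Pretend content read; increments the counter so we can see if it ran."""
    global content_reads
    content_reads += 1
    return b"Dear ..."


letters = [
    FlowFile({"sender.name": "John Smith"}, expensive_read),
    FlowFile({"sender.name": "Smitty John"}, expensive_read),
]

decisions = [route(letter) for letter in letters]
print(decisions)      # ['Love Letters', 'delete']
print(content_reads)  # 0, because no content was ever loaded
```

Both letters get routed, and the counter stays at zero: the content was never touched.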
Within the dataflow, the user can also add or change the attributes on a FlowFile to make it possible to perform other actions. For example, let’s say all of John Smith’s letters include both a letter and a newspaper clipping, whereas Smitty John’s letters do not. The user could add an attribute to “flag” each of John Smith’s letters, indicating that somewhere later in the dataflow, the newspaper clipping should be separated from the letter content. In this case, NiFi would then delve into the contents inside the envelope, but only for letters that require it. (Doing something with the content of the FlowFile is often more resource-intensive than just inspecting the attributes. So, by flagging only those letters that need such an action, the dataflow will be more efficient overall.)
(For more on manipulating attributes, see the NiFi Expression Language Guide and the usage documentation for the UpdateAttribute processor.)
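The flagging idea can be sketched in a few lines of Python. Again, this is only an illustration under stated assumptions, not the UpdateAttribute processor itself: the `sender.name` and `has.clipping` keys and the `---CLIPPING---` separator are made up for this example. One step adds the flag based on attributes alone, and a later step opens the content only when the flag is present.

```python
from typing import Dict, List, Tuple

Attributes = Dict[str, str]
Letter = Tuple[Attributes, str]  # (attributes, content)

# Assumed content layout for this toy: letter and clipping split by a marker.
SEPARATOR = "\n---CLIPPING---\n"


def flag_clippings(attributes: Attributes) -> Attributes:
    """Return a copy of the attributes, flagged when the sender is John Smith."""
    updated = dict(attributes)
    if updated.get("sender.name") == "John Smith":
        updated["has.clipping"] = "true"  # hypothetical flag attribute
    return updated


def split_if_flagged(attributes: Attributes, content: str) -> List[str]:
    """Open the content only for flagged letters; others pass through untouched."""
    if attributes.get("has.clipping") != "true":
        return [content]
    return content.split(SEPARATOR)


letters: List[Letter] = [
    ({"sender.name": "John Smith"}, "Dear reader" + SEPARATOR + "Local news"),
    ({"sender.name": "Smitty John"}, "Hello again"),
]

results = [split_if_flagged(flag_clippings(attrs), body) for attrs, body in letters]
print(results)  # [['Dear reader', 'Local news'], ['Hello again']]
```

Only John Smith’s letter is split into two pieces; Smitty John’s letter passes through without anyone looking inside.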
Each attribute is a key-value pair. The key is the name of the attribute, and the value is the information assigned to that key. So, for the letter shown in the image above, we might have the following keys and values:
||456 Street Cir., Elsewhere, ST 00000
||123 Road Ave., Anytown, ST 11111
If we wanted to add an attribute to flag certain letters so that NiFi would know to separate the newspaper clipping from the letter, we might have an attribute like the following:
So, let’s leave our letter analogy now and look at how attributes actually appear in NiFi. Below is a bulletin from a LogAttribute processor. This processor is mainly used for testing; it logs the attributes of every FlowFile it processes. This bulletin pertains to a FlowFile that was produced by the GenerateFlowFile processor. (It’s important to note which processor the FlowFile has come from, because different types of processors can add or change different attributes.)
In the image above, we can see each key and value listed. For example, the first one is Key: ‘entryDate’ and Value: ‘Tue Jan 20 10:21:48 EST 2015’. You can think of all the key-value pairs that are currently on a FlowFile as the FlowFile’s “Attribute Map”. The LogAttribute processor can help you determine whether the Attribute Map is currently what you expect, so you know how to build and adjust your dataflow from that point forward.
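If it helps to picture the Attribute Map as a data structure, a plain dictionary of string keys to string values is a reasonable mental model. The sketch below is a toy in Python, not the LogAttribute processor, and the line format it prints is only an approximation of the key and value listing described above. The `entryDate` pair is the one quoted from the bulletin; `sender.name` is a hypothetical attribute carried over from the letter analogy.

```python
from typing import Dict


def format_attribute_map(attributes: Dict[str, str]) -> str:
    """Render each key-value pair on its own line for quick inspection."""
    return "\n".join(
        f"Key: '{key}' Value: '{value}'" for key, value in attributes.items()
    )


attribute_map = {
    # Taken from the bulletin described above.
    "entryDate": "Tue Jan 20 10:21:48 EST 2015",
    # Hypothetical attribute, added only for illustration.
    "sender.name": "John Smith",
}

report = format_attribute_map(attribute_map)
print(report)
```

Dumping the map like this at a given point in a flow is the same habit LogAttribute encourages: check what is actually on the FlowFile before deciding what the next step should do with it.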
Having this understanding about FlowFile attributes and how they can be used is an important part of building dynamic, efficient, and powerful dataflows in NiFi.