Architecture: Make loosely coupled Applications with Apache NiFi – Lumiq-tech – Medium
Three years ago, one of our solutions looked like this:
The business use case: we were receiving real-time feeds from smart electric meters at hundreds of locations, each with tens of meters. We needed to process this data, which meant filtering, aggregating, and transforming it according to business rules, and then serve the results to business users.
The system became a success and a useful tool for the business. Within a year, the use cases had grown tremendously, and the solution looked like this:
We started receiving data from water, gas, and temperature meters too, which put huge pressure on our ingestion and processing pipeline. Real-time predictive analysis was also added to the pipeline, so we needed to develop new workflows for both real-time and batch processing. To maintain the same latency and throughput, we introduced some new elements and decoupled the system into more logical and functional modules.
This is where Apache NiFi came to our rescue. We did a POC in which we moved our pre-processing logic to Apache NiFi and made some architectural changes. Let me be clear: NiFi did not magically make the system better, but it gave us the flexibility to build a loosely coupled system.
We configured our Nginx server to forward every POST request it received to a Kafka topic. In NiFi, we started performing the following operations:
- Consume the message from the Kafka topic
- Store the raw message in a database, in parallel, for audit and re-run purposes
- Download the file from the URL received in the message
- Validate the file against business rules
- Filter, split, and attribute the data
- Distribute the files to multiple Kafka topics according to filter type and load (the number of files in the queue)
- Log through a common Process Group
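Outside NiFi, the per-message part of these steps can be sketched as plain functions. The message fields (`url`, `meter_type`), the topic names, and the validation rule below are all hypothetical; this is a minimal sketch of the routing logic, not our actual flow:

```python
import json

# Hypothetical downstream topics, keyed by filter type (illustrative names).
TOPICS = {"electric": "meters.electric", "water": "meters.water", "gas": "meters.gas"}

def validate(record):
    """Apply simple business-rule checks; the real rules were domain-specific."""
    return record.get("url", "").startswith("http") and record.get("meter_type") in TOPICS

def route(raw_message, queue_depths):
    """Parse a raw Kafka message, validate it, and pick a target topic.

    When several topics serve the same filter type, the one with the
    fewest queued files wins (load-based distribution).
    """
    record = json.loads(raw_message)
    if not validate(record):
        return None  # would go to an error/audit path in the real flow
    base = TOPICS[record["meter_type"]]
    candidates = [t for t in queue_depths if t.startswith(base)]
    return min(candidates, key=queue_depths.get) if candidates else base

# Example: two queues for electric meters; the shorter one is chosen.
depths = {"meters.electric.1": 40, "meters.electric.2": 7}
msg = json.dumps({"url": "http://files.example/batch-17.csv", "meter_type": "electric"})
```

In NiFi itself, the same decisions are made visually with processors such as ConsumeKafka, RouteOnAttribute, and PublishKafka.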
Alerting and calls to external REST APIs during processing were also moved to NiFi, with communication happening through Kafka. Our daily reconciliation reports of ingestion and processing were now generated by workflows designed in NiFi. We designed our data flows so that continuous real-time processing of data never stops, while all slow and non-critical operations run in parallel. Introducing Kafka decoupled our different applications so that each can produce and consume messages at its own pace.
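That decoupling benefit can be illustrated with an in-memory queue standing in for a Kafka topic: the producer publishes at its own rate while a slower consumer drains messages independently, and neither blocks the other. This is a toy sketch, not Kafka client code:

```python
import queue
import threading
import time

buffer = queue.Queue()  # stands in for a Kafka topic

def producer(n):
    # Fast producer: publishes n messages without waiting for the consumer.
    for i in range(n):
        buffer.put(f"reading-{i}")

consumed = []

def consumer(n):
    # Slower consumer: drains messages at its own pace.
    for _ in range(n):
        consumed.append(buffer.get())
        time.sleep(0.001)

p = threading.Thread(target=producer, args=(100,))
c = threading.Thread(target=consumer, args=(100,))
p.start(); c.start()
p.join(); c.join()
```

With a real Kafka topic the messages are also durable, so a consumer that falls behind or restarts can resume where it left off.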
We could now write complex flows easily, like this one, in which we load-balance messages across different systems based on the number of currently pending sites.
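The pending-sites balancing in that flow boils down to a small routing rule: send the next message to the system with the fewest pending sites, then update the count. The system names and counts here are made up for illustration:

```python
def pick_target(pending_by_system):
    """Return the system with the fewest pending sites (ties: first listed)."""
    return min(pending_by_system, key=pending_by_system.get)

def dispatch(messages, pending_by_system):
    """Assign each message to the least-loaded system, tracking counts as we go."""
    assignments = []
    pending = dict(pending_by_system)
    for msg in messages:
        target = pick_target(pending)
        pending[target] += 1
        assignments.append((msg, target))
    return assignments, pending

# Example: sys-2 starts empty, so it absorbs messages until the load evens out.
result, final = dispatch(["a", "b", "c"], {"sys-1": 2, "sys-2": 0})
```

In the NiFi flow, the pending count came from the live queue sizes; here it is passed in explicitly to keep the sketch self-contained.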
We were already using Apache Storm for stream processing, so we could easily scale up the processing side by fine-tuning the flow as needed.
Over this time, NiFi has helped us enable new functionality, add features, build custom workflows, and debug in difficult times (thanks to data provenance).
It has become a key component of most of our products: the true Data Orchestrator.