Work, Flow, Business!
Business processes are the fundamental building blocks of every business. To achieve business objectives, processes are implemented and then performed by actors, which can be systems or employees.
These processes help teams perform their tasks in a systematic, predictable way, which eventually contributes to consistent performance.
But as they say, with consistent performance comes consistent inefficiency.
As a first step toward closing these inefficiencies, management needs to stay vigilant about process gaps, which today is best achieved with data. Accumulated data provides visibility and, over time, starts showing patterns and insights. This helps businesses recognize systemic problems, focus on them, and implement solutions for them.
At Blackbuck, before July 2020, our order fulfillment operations didn't have enough automated visibility to answer the following questions in real time:
- How are my team members performing?
- What are the root causes of performance gaps or what can we do to improve our efficiency and performance?
To address both issues, we built a workflow on-boarding platform, or workflow engine, AKA Karma!
So without further ado, let's dive right in …
Terminology
A Business Process Model or Workflow explicitly defines the steps/tasks and the execution strategy of a business process.
A Tenant is any service which defines a business process model in the workflow engine.
Workflow Management Systems or Workflow Engines are software systems whose objective is to assist the actors in performing tasks, and to collect and manage data and the associated configuration based on the tenant's requirements.
Workflow Patterns are design patterns specialized for Workflow Management Systems. One way to classify them is by execution strategy, e.g. serial, parallel, multiple paths, synchronized executions, etc.
Design
Objectives of our workflow engine are to
- Implement business processes as a configuration
- Onboard employees and services which can perform the tasks defined in a workflow
- Allocate tasks between team members based on permissions in a load balanced, round-robin manner
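The third objective can be illustrated with a minimal sketch, assuming team members are modeled as a name plus a set of permissions. The class RoundRobinAllocator, the Member record and the permission strings are hypothetical names chosen for illustration, not Karma's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: none of these names come from Karma itself.
class RoundRobinAllocator {
    // A team member and the permissions they hold (illustrative model).
    record Member(String name, Set<String> permissions) {}

    private final List<Member> members;
    private int cursor = 0; // index of the member to consider first on the next allocation

    RoundRobinAllocator(List<Member> members) {
        this.members = new ArrayList<>(members);
    }

    // Returns the next member, in rotation order, who holds the required permission.
    Optional<Member> allocate(String requiredPermission) {
        for (int i = 0; i < members.size(); i++) {
            int index = (cursor + i) % members.size();
            Member candidate = members.get(index);
            if (candidate.permissions().contains(requiredPermission)) {
                cursor = (index + 1) % members.size(); // resume after the chosen member
                return Optional.of(candidate);
            }
        }
        return Optional.empty(); // nobody is permitted to perform this task
    }
}
```

Members without the required permission are skipped, so the rotation only spreads load across people who are actually allowed to perform the task.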
Workflow Pattern
Imagine you are in a bread processing factory. The factory might be highly automated with minimal manual labor, or it might have a lot of people working in it. Regardless, you'll see that the bread-making process remains more or less the same: Doughing, Molding, Baking, Packaging and Shipping. Each of these steps can have multiple steps within it, depending on the sizes and shapes of bread required.
Here, execution of each step is in a well-defined sequence which makes this a business process model or a workflow pattern.
To incorporate a majority of Blackbuck's order fulfillment processes, we implemented a particular flavor of workflow pattern, defined as follows:
A graph where nodes are tasks and edges are execution conditions. Execution can happen
1. either in parallel and independently, or
2. in serial, where the next task starts only on completion of the previous task, or
3. in a combination of both.
There can be multiple start points and multiple end points in a workflow.
A workflow doesn't complete until all executed tasks have been completed.
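The pattern above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: the class WorkflowGraphSketch, its methods and the use of a Map as task context are hypothetical, and the sketch starts a successor as soon as any incoming edge condition holds, which may differ from how the engine synchronizes tasks.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch of the pattern; class and method names are not Karma's.
class WorkflowGraphSketch {
    // An edge carries the condition under which its target task is started.
    record Edge(String target, Predicate<Map<String, Object>> condition) {}

    private final Map<String, List<Edge>> edges = new HashMap<>();
    private final Set<String> running = new HashSet<>();
    private final Set<String> completed = new HashSet<>();

    void addEdge(String from, String to, Predicate<Map<String, Object>> condition) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(to, condition));
    }

    // A workflow may have several start points; each one starts independently.
    void start(String... startTasks) {
        running.addAll(List.of(startTasks));
    }

    // Completing a task starts every successor whose edge condition holds.
    void complete(String task, Map<String, Object> context) {
        if (!running.remove(task)) {
            throw new IllegalStateException(task + " is not running");
        }
        completed.add(task);
        for (Edge edge : edges.getOrDefault(task, List.of())) {
            if (edge.condition().test(context) && !completed.contains(edge.target())) {
                running.add(edge.target());
            }
        }
    }

    // The workflow is done only when every task that was started has completed.
    boolean isComplete() {
        return running.isEmpty() && !completed.isEmpty();
    }

    Set<String> runningTasks() {
        return Set.copyOf(running);
    }
}
```

Tasks that are never started, because no edge condition leading to them held, do not block completion; only executed tasks count.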
Within this workflow pattern, we also introduced support for two kinds of tasks, namely,
- Form Submissions, utilized for Data Entry, Request Approvals within organization, and
- HTTP Calls, to execute logic or fetch data present with tenants
Business Process as a Config
To ease development for tenants, a business process can be on-boarded as a configuration. If the configuration complies with the internal constraints of the workflow engine, tenants can run workflows without any development.
For the configuration definition, we used a relational schema to store the business process model and execution information, for instance
- Allocation Strategy
- Employee Hierarchy
- Permissions, for performing manual tasks
- SLA
- Scheduled Delays
- HTTP API information for HTTP call task
- Notifications
As a result, most of the configuration changes take effect at runtime without any extra effort.
To provide dynamic behavior on processes, we use
- Jayway's JsonPath syntax, since the majority of applications and services communicate with each other in JSON format
- Spring Expression Language (SpEL), to execute logic based on the state available in different contexts of workflow execution
- Tenant's HTTP API Information, stored in DB, to give the tenants control over individual features of their workflows
For instance,
Consider a workflow w with a task t1 that calls an API to perform some action on a tenant. When t1 completes and we need to decide whether the next task should be t2 or t3, we can write conditions on the context associated with t2 and t3.
Say t1 is the verification of a document image with data entry. Once the data is entered, it is submitted to the tenant, and the tenant internally decides to either APPROVE or REJECT the document. Based on each possible result from the tenant, the execution conditions for t2 and t3 can be as follows
For t2, where we want to trigger payment after approval
#responseEntity.getStatusCodeValue() == 200 && #expressionEvaluator.getContext().read("$.status") == 'APPROVED'
And for t3, where we want an employee to upload the correct documents based on the rejection criteria
#responseEntity.getStatusCodeValue() == 200 && #expressionEvaluator.getContext().read("$.status") == 'REJECTED'
Here, responseEntity and expressionEvaluator are Java variables available in the context in which the expression is evaluated at runtime.
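To make the routing logic concrete without pulling in Spring or JsonPath, here is a plain-Java stand-in for the two conditions. It is a sketch under assumptions: NextTaskSelector and TenantResponse are hypothetical names, the response body is modeled as an already parsed Map, and java.util.function.Predicate takes the place of SpEL evaluation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Plain-Java stand-in for the SpEL conditions; all names here are illustrative.
class NextTaskSelector {
    // Minimal model of the tenant's HTTP response: status code plus parsed JSON body.
    record TenantResponse(int statusCode, Map<String, Object> body) {}

    // Insertion-ordered so conditions are checked in the order they were registered.
    private final Map<String, Predicate<TenantResponse>> conditions = new LinkedHashMap<>();

    void register(String task, Predicate<TenantResponse> condition) {
        conditions.put(task, condition);
    }

    // Returns every registered task whose condition matches the response.
    List<String> select(TenantResponse response) {
        List<String> next = new ArrayList<>();
        for (Map.Entry<String, Predicate<TenantResponse>> entry : conditions.entrySet()) {
            if (entry.getValue().test(response)) {
                next.add(entry.getKey());
            }
        }
        return next;
    }

    // Mirrors the two example conditions: HTTP 200 and a matching "status" field.
    static NextTaskSelector forDocumentVerification() {
        NextTaskSelector selector = new NextTaskSelector();
        selector.register("t2", r -> r.statusCode() == 200 && "APPROVED".equals(r.body().get("status")));
        selector.register("t3", r -> r.statusCode() == 200 && "REJECTED".equals(r.body().get("status")));
        return selector;
    }
}
```

Because the conditions are data rather than compiled branches, a tenant could change routing by editing stored expressions, which is the point of keeping them in configuration.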
State Transitions
Within the engine, we have defined states and transitions that a node or a task in the workflow graph can undergo. All tasks in all workflows go through this one finite-state machine.
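The text does not list the actual states, so the following is only a sketch of how a single finite-state machine shared by all tasks could be expressed. The state names (CREATED, ALLOCATED, IN_PROGRESS, COMPLETED, FAILED) and the allowed transitions are assumptions made for illustration.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical states and transitions; the engine's real state machine is not listed in the text.
class TaskStateMachine {
    enum State { CREATED, ALLOCATED, IN_PROGRESS, COMPLETED, FAILED }

    // For each state, the set of states a task may move to next.
    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);
    static {
        ALLOWED.put(State.CREATED, EnumSet.of(State.ALLOCATED));
        ALLOWED.put(State.ALLOCATED, EnumSet.of(State.IN_PROGRESS));
        ALLOWED.put(State.IN_PROGRESS, EnumSet.of(State.COMPLETED, State.FAILED));
        ALLOWED.put(State.COMPLETED, EnumSet.noneOf(State.class));
        ALLOWED.put(State.FAILED, EnumSet.noneOf(State.class));
    }

    private State current = State.CREATED;

    State current() {
        return current;
    }

    // Moves to the next state, rejecting any transition not in the table above.
    void transitionTo(State next) {
        if (!ALLOWED.get(current).contains(next)) {
            throw new IllegalStateException(current + " -> " + next + " is not allowed");
        }
        current = next;
    }
}
```

Keeping the transition table in one place means every task type is validated by the same rules.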
Depth of Configuration
Workflow configuration has various levels, based on the elements of the workflow graph and the context of the business object:
- L1 - the default behavior of the system with the default configuration format.
For example, the SLA of a task is decided by a static configuration in the DB. Based on this, the system escalates the task if it's not completed within the given SLA minutes.
- L2 - specialized configuration which overrides the default behavior at a workflow level, task level or execution condition level.
For example, the SLA of a task might be dynamic, depending on the tenant's requirements. So the DB config has a provision where tenants can add HTTP API information, which the workflow engine uses to calculate the SLA at runtime.
- L3 - request body or event message, which overrides all configuration based on data provided to the engine by the tenant for an instance of the task.
For example, team members can dynamically decide the SLA based on the situation on the ground and live contract negotiation for a particular order.
These levels of flexibility in configuration help tenants to control the workflow in any manner they want.
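The precedence between the three levels, using SLA as the example, might be sketched as follows. This is an assumption-laden illustration: SlaResolver is a hypothetical class, and a java.util.function.Supplier stands in for the tenant's HTTP API call.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch of the three-level lookup; names and types are illustrative.
class SlaResolver {
    // Level 1: static default read from the DB configuration (in minutes).
    private final int defaultSlaMinutes;
    // Level 2: optional tenant-provided calculation, standing in for an HTTP API call.
    private final Optional<Supplier<Integer>> tenantSlaApi;

    SlaResolver(int defaultSlaMinutes, Optional<Supplier<Integer>> tenantSlaApi) {
        this.defaultSlaMinutes = defaultSlaMinutes;
        this.tenantSlaApi = tenantSlaApi;
    }

    // Level 3: a value in the request body or event message wins over everything else.
    int resolve(Optional<Integer> requestSlaMinutes) {
        return requestSlaMinutes
                .or(() -> tenantSlaApi.map(Supplier::get)) // fall back to level 2 if configured
                .orElse(defaultSlaMinutes);                // otherwise use the level 1 default
    }
}
```

The tenant API is only invoked when the request carries no SLA of its own, so the most specific source is always consulted first.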
Workflows in Production
We have on-boarded 8 individual processes, related to 4 different business use cases, which are running in production.
These include:
- On-Ground Fulfillment Process
- Internal Document Verification Process for Fulfillment
- Driver Verification Process
- Information of Supply Charges to Partners
Future Enhancements
In upcoming sprints, we'll be extending the workflow pattern definition to allow more flexibility in execution. This includes work such as
- Infrastructure Orchestration for different workflows and tenants
- Making workflow on-boarding process seamless with a clean UI
- Experiments on different workflow versions
TL;DR
Business Processes are key components of any business. And they are driven by consistent performance over consistent efficiency.
We developed a workflow engine which maintains business processes as a config. Our tech stack consists of:
- MySQL and Mongo, with JsonPath and SpEL notation used wherever required
- Spring Boot, as the development framework
- Redis, for distributed cache
- Kafka, for asynchronous messaging
Our workflow engine can store and execute multiple business processes simultaneously and independently, which might belong to different teams and tenants. Supported task types are Form Submissions and HTTP Calls.
This has increased our visibility into and control over the work our teams do, which has improved communication and SLA adherence, especially in operations.
Overall, it's a low-code development platform for business process models.