How to Achieve Seamless Data Integration
Think about all the data in your personal life – passwords, calendars, finances, phone numbers, and so much more. Then think about how that data intersects with the other people in your circle. It could be a family calendar with all the doctor appointments and soccer games. Or it’s a shared bank account with a partner for which you both have the login.
But maybe one day, someone forgets to add a game to the calendar or share a password change. Suddenly, the whole process breaks down because the various parties are not bringing the necessary data together.
And even though we’re talking about a family, the same scenario may be true for your work family, where data isn’t always flowing effortlessly among the various systems and processes. Without proper integration, a company can feel a bit lost and unsure about its data. But there are steps to set your organization on the right path to achieving seamless data integration.
WHY DATA INTEGRATION MATTERS
Data integration should occur when data or information from one business context or department generates value if used in another context or department. This may sound a bit abstract, but bear with me, as we are going to cover a lot of ground – and I want to build a shared foundation.
Integration is an immense problem space because most companies require a multitude of systems to operate. The cost structure of business models often depends on how effectively and efficiently these integrations are implemented with constrained resources. Most companies recognize that as the marginal cost of compute moves to zero, the effective digitization of workflows will yield market winners. Marc Andreessen, a progenitor of the modern browser, described this mega-trend back in 2011 in his article “Why Software Is Eating the World.”
As our global economy moves into the cloud ecosystem, “digital transformations” are accelerating the demand curve for integrated operations and analysis. In the early phases of these “digital transformations,” companies were shifting away from manual processes like spreadsheets and using modern, governable systems to store and analyze their data.
These opportunities are buoyed by the 100-billion-dollar market cap of the growing data and analytics space where the major players are Snowflake, Microsoft Azure, AWS, Databricks, and Google Cloud. Snowflake represents the most innovative approach to databasing since PostgreSQL was introduced in the 1990s. Virtual database technology fundamentally changes the velocity of innovation and experimentation with a company’s harvested analytical data (more on that later). Luckily, Snowflake’s competitors realized this and have made their own substantial investments in building products and services in this space as well.
According to Gartner, 89% of board directors say digital business is now embedded in all business growth strategies. Still, a mere 35% of board directors report that they have achieved or are on track to achieving digital transformation goals. This may be because managing data across the enterprise and the timely integration of information among departments are critical aspects of all “digital transformations.” We often see companies look at the complexity of this task and try to over-engineer a perfect solution. This is a mistake.
In the cloud, infrastructure can be spun up and torn down multiple times a day, and because this problem space is so novel, mistakes can be contained and remediated. Furthermore, those mistakes represent the best way to learn and grow your company’s capabilities. It’s time to take on this challenge, and that can mean starting small.
INTEGRATION OF OPERATIONAL AND ANALYTICAL DATA
So, with that context in place, let’s get back to that shared foundation we were building. There are two main types of data that are valuable to an organization:
- Operational data: This is the data that powers your company’s day-to-day operations. It is largely transactional and event-based data produced by internal functions and processes of a business, such as purchase orders in the ERP, accounting data from the financial systems, and events in the CRM. Integrating operational data usually means maintaining consistency of the same data points in multiple systems or triggering downstream workflows in a coordinated way.
- Analytical data: This is data consolidated from different sources, including operational systems, so that it can be analyzed for decision support and operational intelligence. For example, integrating Salesforce CRM data with Google site traffic gives a holistic view of customers and their behavior. By bringing data into the analytical plane and integrating data from different sources, an organization can extract insights and model data for applied use-cases like predictive analysis.
When integrating operational data, you are attempting to provide an accurate and relevant model of the world as it currently is. The relevancy of the data or an event to the domain being integrated is a critical limiting factor on whether a particular integration should be implemented. If irrelevant data is moved from one system to another, you’ve incurred an operational cost with no benefit, and over time this behavior will erode data quality in the analytical plane.
In the analytical plane, you look at data describing the past. Data should be moved from the operational to the analytical plane, where cross-domain analysis and experimentation can proceed without interrupting business operations. This is where baselines and benchmarks get created. As new data flows from the operational to the analytical plane, we can assess how the business is measuring against those established KPIs and indexes.
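To make the CRM-plus-site-traffic example concrete, here is a minimal sketch of joining two extracts in the analytical plane and comparing a new period against a baseline. The record layouts, field names, and the `visits_by_segment` function are all hypothetical, invented for illustration; real extracts would come from your CRM and web analytics exports and would likely live in a warehouse rather than in Python lists.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical extracts already landed in the analytical plane.
crm_accounts = [
    {"account_id": "A1", "segment": "enterprise"},
    {"account_id": "A2", "segment": "smb"},
    {"account_id": "A3", "segment": "smb"},
]
site_visits = [
    {"account_id": "A1", "week": 1, "visits": 40},
    {"account_id": "A2", "week": 1, "visits": 10},
    {"account_id": "A3", "week": 1, "visits": 14},
    {"account_id": "A1", "week": 2, "visits": 50},
    {"account_id": "A2", "week": 2, "visits": 6},
    {"account_id": "A3", "week": 2, "visits": 12},
]


def visits_by_segment(accounts, visits, week):
    """Join visits to accounts on account_id, then average visits per segment."""
    segment_of = {a["account_id"]: a["segment"] for a in accounts}
    grouped = defaultdict(list)
    for v in visits:
        # Skip visits that cannot be matched to a known CRM account.
        if v["week"] == week and v["account_id"] in segment_of:
            grouped[segment_of[v["account_id"]]].append(v["visits"])
    return {segment: mean(values) for segment, values in grouped.items()}


# Week 1 serves as the baseline; week 2 is the newly arrived data.
baseline = visits_by_segment(crm_accounts, site_visits, week=1)
current = visits_by_segment(crm_accounts, site_visits, week=2)
change = {segment: current[segment] - baseline[segment] for segment in baseline}
```

Because the join runs against copies of the data, none of this analysis touches the CRM or the website while they are serving the business.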
HOW OPERATIONAL DATA INTEGRATION STREAMLINES PROCESSES
Business operations require different people to do different jobs and use different systems for various aspects of an organization’s wealth-generating and value-producing activities. Sometimes the activities from one department are an input to another department’s work. To execute workflows efficiently and distribute work optimally across resources, an organization must carefully consider how operational data from one department is used by, and is useful to, another department’s operations.
Activities supported by operational data may include scheduling services, ordering products, or shipping materials. For example, when operational data is integrated, it allows orders placed on an e-commerce site to make it to the ERP system for fulfillment. Tracking numbers created by the warehouse team can be accessed by the customer using the Amazon marketplace.
Data integration is a hot topic because of the explosion of software systems and digitization. Software systems generally produce operational data that is stored within those systems, and they provide back-end support for the operational workflows employees are executing.
Operational systems generally have dissimilar perspectives of the external world and require different data to model those aspects effectively. CRM systems don’t care about bank account numbers, and financial systems don’t care about how many candidates the HR team might be recruiting. But this does not imply that data from one system will not be useful in another.
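One way to picture this is a field mapping between two systems: the integration carries over only what the receiving system needs and leaves the rest behind. The sketch below assumes a hypothetical e-commerce order and a hypothetical ERP sales-order layout; none of the field names come from a real product.

```python
# Hypothetical mapping: e-commerce order field -> ERP sales-order field.
ORDER_TO_ERP_FIELDS = {
    "order_id": "external_ref",
    "sku": "item_code",
    "quantity": "qty",
    "ship_to": "delivery_address",
}


def to_erp_order(ecommerce_order):
    """Translate an e-commerce order, keeping only the fields the ERP needs."""
    missing = [f for f in ORDER_TO_ERP_FIELDS if f not in ecommerce_order]
    if missing:
        raise ValueError(f"order is missing required fields: {missing}")
    return {erp: ecommerce_order[src] for src, erp in ORDER_TO_ERP_FIELDS.items()}


order = {
    "order_id": "WEB-1001",
    "sku": "WIDGET-9",
    "quantity": 3,
    "ship_to": "12 Example St",
    "browser": "Firefox",        # irrelevant to fulfillment, so it is not carried over
    "coupon_banner_seen": True,  # likewise dropped
}
erp_order = to_erp_order(order)
```

Keeping the mapping explicit makes the relevancy decision visible: a field reaches the ERP only if someone deliberately added it to the mapping.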
Finding the right tools for operational data integration depends on your data architecture, cloud footprint, and application landscape. Here are some technologies & patterns to consider:
- Custom development: micro-services, events, and messaging queues
- Platform features: webhooks and APIs (ERPs, CRMs, and other SaaS solutions)
- iPaaS solutions: MuleSoft, Jitterbit, Synapse Pipelines, and Informatica for integrating multiple systems with a variety of patterns and transformation logic.
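As a rough illustration of the events-and-queues pattern from the list above, the sketch below uses Python’s standard `queue` module as a stand-in for a real message broker. The event names, handlers, and tracking-number format are assumptions made for the example; a production setup would use a managed queue or an iPaaS tool rather than an in-process queue.

```python
import queue

events = queue.Queue()   # stand-in for a real message broker
handlers = {}            # event type -> list of subscriber callables


def subscribe(event_type, handler):
    handlers.setdefault(event_type, []).append(handler)


def publish(event_type, payload):
    events.put((event_type, payload))


def drain():
    """Deliver each queued event to its subscribers; return the number processed."""
    processed = 0
    while not events.empty():
        event_type, payload = events.get()
        for handler in handlers.get(event_type, []):
            handler(payload)
        processed += 1
    return processed


erp_orders = []
customer_notifications = []


def create_erp_order(payload):
    # The ERP records the order, then the warehouse step emits a shipment event.
    erp_orders.append(payload["order_id"])
    publish("shipment.created", {
        "order_id": payload["order_id"],
        "tracking_number": "TRK-" + payload["order_id"],  # placeholder format
    })


def notify_customer(payload):
    customer_notifications.append(payload["tracking_number"])


subscribe("order.placed", create_erp_order)
subscribe("shipment.created", notify_customer)

publish("order.placed", {"order_id": "WEB-1001"})
processed = drain()
```

The storefront never calls the ERP directly. It only publishes an event, so new downstream consumers can be added by subscribing, without changing the system that placed the order.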
HOW ANALYTICAL DATA INTEGRATION POWERS BUSINESS DECISIONS
Returning to the shipping example, a company needs more than the integration of operational data to ensure that raw materials physically show up at the manufacturer’s warehouse. It also needs to be able to use that data and combine it with other information known to be true. This allows the company to draw deeper insights into its own business.
In much the same way that derivatives in calculus give us an understanding of functions, analytical data gives us an understanding of operational processes. Analytical data is generally represented and accessed in schemas, models, and views using database technology. That modeling and analysis requires high-quality data, applied in the correct context, to accurately represent the state of any business.
Modeled data describes the real world the way a map describes terrain. A model’s “fitness” is a measure of how well it lets the “map holder” navigate the current environment using a map derived from historical information. The key to evolving a company’s map-making capabilities is having the correct toolkit and framework for mapping the terrain. Here are some suggestions that we have implemented at Kenway:
- Key concepts: Data mesh and domain-driven design
- Key toolsets: Synapse, Snowflake, Google BigQuery, Redshift
- Concepts to be aware of: Data sharing and data clean rooms
HOW KENWAY HELPS WITH DATA INTEGRATION
Kenway offers a flexible and tailored approach to data integration by guiding clients with a data strategy that aligns with corporate objectives and drives long-term, sustainable value. Based on our experience with a wide array of data integration projects, we generally keep the following in mind when handling data in the analytical plane:
- Conceptualize data as products: Reports and other applied use-cases should be decoupled from stored data using a product framework. Data products can be leveraged in both operational integration and analytical modeling without impacting other applications using the same data. Data product consumption patterns should be considered for different business needs, such as data science operations and downstream applications.
- Scope teams for the technical skills and the throughput needed: The availability of business stakeholders and subject matter experts (SMEs) in data engineering, data modeling, cloud architecture, and infrastructure is often the constraining factor.
- Rationalize the skills needed in-house vs. what can be outsourced: Pipeline development should be easily repeatable, which helps to remove major dependencies. Data modeling with SME involvement should be cultivated in-house, where domain knowledge about your data is high; that domain knowledge is often the limiting factor.
- Groom cross-functional requirements with dev teams and the business: Build product roadmaps before project plans and empower development teams to become more product oriented instead of relying entirely on project management processes to tie requirements together.
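To illustrate the first point above, conceptualizing data as products, here is a minimal sketch of a versioned data product that exposes only a contracted set of columns to its consumers. The `DataProduct` class, the `publish_product` function, and the column names are hypothetical and are not drawn from any specific platform or from Kenway’s tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """A published, versioned view over stored data (hypothetical structure)."""
    name: str
    version: int
    columns: tuple
    rows: tuple

    def read(self):
        # Consumers receive copies, so they cannot alter the published rows.
        return [dict(row) for row in self.rows]


def publish_product(name, version, columns, stored_rows):
    """Expose only the contracted columns, failing fast if storage lacks one."""
    published = []
    for row in stored_rows:
        missing = [c for c in columns if c not in row]
        if missing:
            raise KeyError(f"stored row lacks contracted columns: {missing}")
        published.append({c: row[c] for c in columns})
    return DataProduct(name, version, tuple(columns), tuple(published))


# Stored data carries internal details (etl_batch) that consumers should not depend on.
stored = [
    {"customer_id": 1, "region": "Midwest", "lifetime_value": 1200.0, "etl_batch": "b-17"},
    {"customer_id": 2, "region": "South", "lifetime_value": 450.0, "etl_batch": "b-17"},
]
customer_value = publish_product(
    "customer_value", 1, ["customer_id", "region", "lifetime_value"], stored
)
report_rows = customer_value.read()
```

Reports and downstream applications read from the product rather than from storage, so the underlying tables can be reorganized without breaking them as long as the contracted columns are still served.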
If you’re looking for data integration solutions for your organization, connect with us to discover how to complement your business objectives and maximize return on investment while minimizing operational overhead.