Data Feeds

A business intelligence (BI) system is nothing without a valid data source. When designing a BI system, we first need to determine what data we want to consume for analysis. Most organizations run various information systems that support their day-to-day operations, and the internal data from these systems is usually a good candidate for a BI project. Data can also come from external or even public sources. The sources that provide the information driving a BI implementation are referred to as data feeds.

Data feed sources can be anything that provides the required data in a well-structured format. They can be exposed in a variety of forms, such as databases, XML files, CSV files, and even API (application programming interface) service calls. There is no one-size-fits-all format for a data feed. For example, XML files are good sources for smaller data sets that don’t change much, whereas large data sets that change rapidly are usually better sourced from a database.
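
Because the structure of the data matters more than the container, a common first step is to normalize every feed into one record shape. The minimal Python sketch below reads the same hypothetical time-entry data from a CSV feed and from an XML feed and produces identical records. The field names (employee_id, hours), the <entry> element, and the function names are assumptions for illustration, not part of any particular system.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical record layout: every feed row becomes {"employee_id": str, "hours": float}.

def read_csv_feed(text):
    """Parse a CSV feed with a header row into normalized records."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"employee_id": row["employee_id"], "hours": float(row["hours"])}
        for row in reader
    ]

def read_xml_feed(text):
    """Parse an XML feed of <entry> elements into the same record shape."""
    root = ET.fromstring(text)
    return [
        {"employee_id": entry.findtext("employee_id"),
         "hours": float(entry.findtext("hours"))}
        for entry in root.iter("entry")
    ]

csv_text = "employee_id,hours\nE1,7.5\nE2,8\n"
xml_text = """<timesheet>
  <entry><employee_id>E1</employee_id><hours>7.5</hours></entry>
  <entry><employee_id>E2</employee_id><hours>8</hours></entry>
</timesheet>"""

print(read_csv_feed(csv_text) == read_xml_feed(xml_text))  # True
```

Normalizing early keeps the rest of the BI pipeline independent of how each source happens to be exposed.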

In some cases, the BI architect might not have a choice in how external data is consumed. For example, if you want to consume public salary guidance data from a website, you may have to use that site’s API. A vendor that provides data as a service is unlikely to hand over a nightly backup of its entire database; it is far more likely to offer an external-facing API.
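
When an API is the only option, the consuming side typically fetches a response and converts it into records. The sketch below assumes a hypothetical JSON endpoint: the api.example.com URL and the results, region, and median fields are invented for illustration, and a real vendor API will have its own URL, authentication, and schema. Parsing is kept separate from fetching so it can be checked without network access.

```python
import json
import urllib.request

# Hypothetical endpoint; a real vendor API defines its own URL, auth, and schema.
SALARY_API_URL = "https://api.example.com/v1/salaries?role=consultant"

def parse_salary_payload(payload):
    """Turn a JSON payload (assumed shape: {"results": [...]}) into records."""
    data = json.loads(payload)
    return [
        {"region": item["region"], "median_salary": float(item["median"])}
        for item in data.get("results", [])
    ]

def fetch_salary_feed(url=SALARY_API_URL, timeout=30):
    """Call the (hypothetical) vendor API and parse the response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_salary_payload(response.read().decode("utf-8"))

sample = '{"results": [{"region": "Midwest", "median": 85000}]}'
print(parse_salary_payload(sample))
# [{'region': 'Midwest', 'median_salary': 85000.0}]
```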

The figure below shows three separate feeds going into our BI system. Two feeds are internal and are sourced from database repositories. In addition, we consume a web service API data feed with salary information for consultants. At this stage we don’t know how the data gets into each source, but we have architecturally defined what data we want to consume.
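
One lightweight way to record this architectural decision is to declare the feeds as data before any extraction code exists. This is a hypothetical Python sketch: the feed names, the SourceKind enum, and the placeholder locations are illustrative assumptions modeled on the three feeds in the figure (time entry, HR, and consultant salaries).

```python
from dataclasses import dataclass
from enum import Enum

class SourceKind(Enum):
    DATABASE = "database"
    WEB_API = "web_api"

@dataclass(frozen=True)
class DataFeed:
    name: str
    kind: SourceKind
    internal: bool
    location: str  # connection string or URL; placeholders only

# Hypothetical declarations for the three feeds in the figure.
FEEDS = [
    DataFeed("time_entry", SourceKind.DATABASE, True, "timeentry-db (placeholder)"),
    DataFeed("hr", SourceKind.DATABASE, True, "hr-db (placeholder)"),
    DataFeed("consultant_salaries", SourceKind.WEB_API, False,
             "https://api.example.com/v1/salaries"),
]

internal = [f.name for f in FEEDS if f.internal]
print(internal)  # ['time_entry', 'hr']
```

Declaring *what* is consumed separately from *how* it is extracted mirrors the point above: the architecture is defined even though the delivery mechanism is not yet known.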


Time entry and HR systems are highly transactional, with multiple updates happening every minute, so using their underlying databases directly as data sources is not a good idea. The data feeds need to pull data that is transactionally consistent, and pulling it from a live system does not guarantee that. Furthermore, large-scale data pulls can degrade the performance of the underlying system.
 
Most BI implementations therefore use a snapshot or backup of the data taken at a given point in time. Snapshots can take the form of ongoing synchronization, which gives an almost real-time feed of the data, or they can be taken at fixed intervals, such as monthly. Either way, the data feed acts as a “bridge” that keeps analysis separate from the operations and transactions of the live system.
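
Both snapshot styles can be sketched with Python’s built-in sqlite3 module, used here only as a stand-in for whatever database backs the operational system. take_snapshot copies the live database into a separate point-in-time copy with Connection.backup, and pull_changes approximates synchronization by fetching only rows modified after a stored high-water mark. The time_entry table, its columns, and both function names are hypothetical.

```python
import sqlite3

def take_snapshot(live, snapshot_path=":memory:"):
    """Copy the live database into a separate point-in-time snapshot.

    BI queries then run against the copy instead of the operational system.
    In practice snapshot_path would be a file, not an in-memory database.
    """
    snapshot = sqlite3.connect(snapshot_path)
    live.backup(snapshot)
    return snapshot

def pull_changes(live, last_sync):
    """Incremental sync: fetch only rows modified after the last high-water mark."""
    rows = live.execute(
        "SELECT id, employee_id, hours, modified_at FROM time_entry "
        "WHERE modified_at > ? ORDER BY modified_at",
        (last_sync,),
    ).fetchall()
    new_mark = rows[-1][3] if rows else last_sync
    return rows, new_mark

# Simulated operational database (hypothetical schema).
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE time_entry (id INTEGER PRIMARY KEY, employee_id TEXT, "
             "hours REAL, modified_at TEXT)")
live.executemany(
    "INSERT INTO time_entry (employee_id, hours, modified_at) VALUES (?, ?, ?)",
    [("E1", 7.5, "2011-05-01T09:00"), ("E2", 8.0, "2011-05-01T10:00")],
)
live.commit()

snap = take_snapshot(live)
# The live system keeps changing after the snapshot is taken.
live.execute("INSERT INTO time_entry (employee_id, hours, modified_at) "
             "VALUES ('E1', 4.0, '2011-05-02T09:00')")
live.commit()

print(snap.execute("SELECT COUNT(*) FROM time_entry").fetchone()[0])  # 2
rows, mark = pull_changes(live, "2011-05-01T10:00")
print(len(rows), mark)  # 1 2011-05-02T09:00
```

A timestamp watermark is the simplest form of incremental sync, but it can miss rows that share the last timestamp and commit later, which is why production systems often use change-data-capture or log-based replication instead.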
 

Copyright © 2011. BI Articles and Study Case - All Rights Reserved