Monday 28 March 2011

Trends in Extract Transform Load

Extract, transform, load (ETL) is the process that enables companies to move data from multiple, disparate sources, reformat and cleanse it, and then load it into another database, a data mart or data warehouse for analysis, or another operational system to support a business process. The process involves:

Extracting data from outside sources or source operational / archive systems that are the primary source of data for the data warehouse

Transforming the data to fit business needs, which may involve data migration, data cleaning, filtering, validating, reformatting, standardization, aggregation or applying business rules

Loading the data into the end target (i.e., a data warehouse or any other database or application that houses data)
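The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular ETL tool works: the sample data, the table name sales_by_country and the function names are all assumptions made for the example, and an in-memory SQLite database stands in for the target.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real job would read from a file or a source system.
SOURCE_CSV = """customer_id,country,amount
1, be ,10.50
2,NL,
3,be,4.50
"""

def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: filter incomplete rows, standardize country codes, aggregate per country."""
    totals = {}
    for row in rows:
        amount = row["amount"].strip()
        if not amount:  # filtering: skip records with no amount
            continue
        country = row["country"].strip().upper()  # standardization
        totals[country] = totals.get(country, 0.0) + float(amount)  # aggregation
    return sorted(totals.items())

def load(conn, records):
    """Load: write the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_country (country TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO sales_by_country VALUES (?, ?)", records)
    conn.commit()

def run_etl(conn, text=SOURCE_CSV):
    """Run the three steps in order and return the contents of the target table."""
    load(conn, transform(extract(text)))
    return conn.execute(
        "SELECT country, total FROM sales_by_country ORDER BY country"
    ).fetchall()
```

Calling run_etl(sqlite3.connect(":memory:")) on the sample data leaves a single row for BE in the target, because the NL record has no amount and is filtered out during the transform step.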

Trends in ETL

Evolving towards generic data integration tools: Although ETL tools are aimed specifically at the business intelligence market, they have evolved rapidly over the last few years. According to Gartner, “The stand-alone data integration technology markets — such as extraction, transformation and loading (ETL), replication and federation, and enterprise information integration — will rapidly implode into a single market for multi-mode, multipurpose data integration platforms.” Indeed, a look at the top vendors in the market shows that this is happening or has already happened. Informatica has added a real-time module to its product line, allowing it to brand PowerCenter as an EAI tool. IBM has placed DataStage, acquired from Ascential, in the WebSphere family. Oracle has also greatly improved its Warehouse Builder in the 11g version.

Data quality: Another obvious trend in ETL (and in data integration in general) is the linkage with data quality tools, both cleansing and profiling. Awareness of the impact of poor-quality data on decision making and operations has risen enormously in recent years. As a result, most ETL vendors have incorporated data profiling functionality into their tools (allowing developers to assess data quality before they develop data transformations), as well as integration with data cleansing tools (allowing developers to build complex cleansing and standardization features into the transformation process). This trend is evident in the investments and acquisitions that ETL vendors have made over the last few years.
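To give an idea of what data profiling means in practice, here is a small Python sketch that computes a few basic quality metrics for one column before any transformation is written. The function name profile_column and the chosen metrics are assumptions for illustration; commercial profiling modules report far more than this.

```python
from collections import Counter

def profile_column(rows, column):
    """Return simple quality metrics for one column of a list of dict rows."""
    values = [row.get(column) for row in rows]
    # Treat None and blank strings as missing values.
    present = [str(v).strip() for v in values if v is not None and str(v).strip() != ""]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "most_common": Counter(present).most_common(1)[0][0] if present else None,
    }
```

A high missing count or an unexpectedly large number of distinct values in a column such as a country code is the kind of signal that tells a developer a cleansing or standardization step is needed.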

Lower latency requirement: In the early stages of their data integration maturity and infrastructure, organizations tend to focus on batch-oriented, high-latency activities, such as nightly population of data warehouses and data extracts for interfacing between applications. However, as business pressures demand rapid response and reduced cycle times, the demand for lower-latency data integration grows. Although ETL vendors are having a hard time offering true real-time integration, many offer near real-time functionality.
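One common way to approximate near real-time behaviour, without claiming it is what any specific vendor does, is to replace the nightly full extract with small incremental extracts that run every few seconds or minutes and remember how far they got. The Python sketch below assumes a hypothetical source table named events with an increasing id column; the table, columns and function name are illustrative only.

```python
import sqlite3

def extract_since(conn, last_seen_id):
    """Return the rows added after last_seen_id, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    # If nothing new arrived, keep the previous high-water mark.
    new_watermark = rows[-1][0] if rows else last_seen_id
    return rows, new_watermark
```

A scheduler would call extract_since repeatedly, passing in the high-water mark returned by the previous call, so each run only moves the records that arrived since the last one. The shorter the interval, the lower the latency, at the cost of more frequent queries against the source.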

Market consolidation: It is also worth noting that independent ETL vendors are disappearing. Informatica PowerCenter remains an independent market leader, but other companies offer ETL tools as part of a wider range of BI tools or as part of their database offering. Indeed, Microsoft, Oracle and IBM all have an ETL offering, with the first two vendors even offering the ETL engine ‘free of charge’ with the database.

Phase out home-built, phase in open source: Quite a few companies invested in self-built ETL tools during the 1980s and 1990s. These were mostly simple, sometimes metadata-based, SQL generators that executed scheduled SQL scripts against the database. In recent years these tools have been disappearing, first making way for commercial ETL tools and more recently for open source ETL tools, a market that has seen quite a few successes with Pentaho Data Integration (previously Kettle), Talend and others.
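For readers who never saw one of these home-built tools, the idea of a metadata-based SQL generator can be sketched roughly as follows in Python. The mapping structure, table names and column expressions are hypothetical, made up for the example; real in-house generators varied widely.

```python
# Hypothetical mapping metadata: target column -> source expression.
MAPPING = {
    "target": "dw_customer",
    "source": "stg_customer",
    "columns": [
        ("customer_id", "id"),
        ("full_name", "TRIM(name)"),
        ("country", "UPPER(country_code)"),
    ],
}

def generate_insert_select(mapping):
    """Build an INSERT ... SELECT statement from the mapping metadata."""
    target_cols = ", ".join(col for col, _ in mapping["columns"])
    source_exprs = ", ".join(expr for _, expr in mapping["columns"])
    return (
        f"INSERT INTO {mapping['target']} ({target_cols}) "
        f"SELECT {source_exprs} FROM {mapping['source']}"
    )
```

The generated statement would then be written to a script and run by a scheduler. Because the identifiers are interpolated directly into the SQL text, a generator like this is only safe when the metadata comes from a trusted, internally maintained source.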
