Wednesday, December 11, 2013

Informatica Performance Tuning

 

Finding/Fixing Performance Bottlenecks

 
The goal of performance tuning is to optimize session performance so that sessions run within the available load window and complete in the shortest time that can reasonably be expected. Performance tuning means identifying the bottlenecks and eliminating them to achieve an acceptable ETL load time.

The first step in performance tuning is to identify performance bottlenecks. Bottlenecks can occur in the source and target, the mapping, the session, and the system.

Whenever a session is triggered, the Integration Service starts the Data Transformation Manager (DTM) process, which is responsible for starting the reader, transformation, and writer threads.
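One way to reason about which of the three threads is holding the others up is to compare how busy each one is. The sketch below is only an illustration: it assumes that a run time and an idle time are available for each thread (for example from the thread statistics that a session log typically reports), and the ThreadStats class, the function names, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ThreadStats:
    name: str           # "reader", "transformation" or "writer"
    run_time_s: float   # total time the thread existed, in seconds
    idle_time_s: float  # time the thread spent waiting on other threads


def busy_percentage(stats: ThreadStats) -> float:
    """Share of the run time in which the thread was actually working."""
    if stats.run_time_s <= 0:
        return 0.0
    return 100.0 * (stats.run_time_s - stats.idle_time_s) / stats.run_time_s


def likely_bottleneck(all_stats: List[ThreadStats]) -> ThreadStats:
    """The busiest thread is the one the other threads are waiting on."""
    return max(all_stats, key=busy_percentage)


# Hypothetical numbers, for illustration only.
sample = [
    ThreadStats("reader", run_time_s=100.0, idle_time_s=5.0),
    ThreadStats("transformation", run_time_s=100.0, idle_time_s=80.0),
    ThreadStats("writer", run_time_s=100.0, idle_time_s=70.0),
]

suspect = likely_bottleneck(sample)
print(suspect.name, busy_percentage(suspect))
```

With these made-up numbers the reader thread is almost never idle while the other two mostly wait, which points to the source side.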



 

Source Bottlenecks 

 
A source bottleneck occurs when the reader thread runs slowly with poor throughput, leaving the transformation and writer threads waiting for data. The performance of the whole session suffers as a result.
An inefficient source query, poor performance of the source database server, or a small database network packet size can cause source bottlenecks.
 
 
 
We can upgrade the source database infrastructure and provide enough network bandwidth to obtain better performance. Most importantly, the Source Qualifier query can be tuned so that it retrieves the complete data set in as little time as possible.
Using optimizer hints or adding indexes can improve the performance of the SELECT query.
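The effect of an index on a source query can be checked by looking at the query plan before and after creating it. The following is a minimal sketch that uses an in-memory SQLite database as a stand-in for the real source database; the orders table, its columns, and the index name are hypothetical. Optimizer hints are specific to each database and are not shown.

```python
import sqlite3

# In-memory SQLite database standing in for the real source system.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, i % 100, float(i)) for i in range(1000)],
)


def query_plan(connection, sql, params=()):
    """Return the plan SQLite reports for a query, as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


source_query = "SELECT order_id, amount FROM orders WHERE customer_id = ?"

plan_before = query_plan(conn, source_query, (42,))
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, source_query, (42,))

print("before:", plan_before)  # a full table scan
print("after: ", plan_after)   # a search that uses the new index
```

The exact wording of the plan differs between SQLite versions and between database products, so on a real source system the database's own explain facility should be used.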
 
Target Bottlenecks
 
A target bottleneck occurs when the writer thread runs slowly with poor throughput, leaving the transformation and reader threads waiting for memory space to be freed. The performance of the whole session suffers as a result.
 
The database network packet size, poor performance of the target database server due to inadequate infrastructure, or a heavy load on the target database can cause target bottlenecks.
 
 
We can schedule the job for a time when the target database server is lightly loaded and its resources are free, so that more bandwidth is available to our jobs. Loading in bulk mode, increasing the commit interval, dropping indexes and key constraints before the load, or using an external loader utility can improve target performance to some extent.
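The reason a larger commit interval helps is that the writer issues fewer commits for the same number of rows. The sketch below illustrates the idea with an in-memory SQLite database; it is not how the Integration Service writes to a target, and target_table, load_rows, and the sample rows are hypothetical names used only for this example.

```python
import sqlite3
from typing import Iterable, Tuple


def load_rows(conn: sqlite3.Connection,
              rows: Iterable[Tuple[int, str]],
              commit_interval: int) -> int:
    """Insert rows, committing once per commit_interval rows.

    Returns the number of commits that were issued.
    """
    if commit_interval < 1:
        raise ValueError("commit_interval must be at least 1")
    commits = 0
    pending = 0
    for row in rows:
        conn.execute("INSERT INTO target_table VALUES (?, ?)", row)
        pending += 1
        if pending == commit_interval:
            conn.commit()
            commits += 1
            pending = 0
    if pending:  # commit the final, partially filled batch
        conn.commit()
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target_table (id INTEGER, name TEXT)")
conn.commit()

sample_rows = [(i, f"row-{i}") for i in range(10)]
commits_small_interval = load_rows(conn, sample_rows, commit_interval=1)
commits_large_interval = load_rows(conn, sample_rows, commit_interval=4)
print(commits_small_interval, commits_large_interval)
```

A larger interval also means more work is lost and redone if the load fails between commits, so the value is a trade-off rather than "the bigger the better".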
 
Transformation Bottlenecks
 
A development approach that ignores transformation best practices, or an overly complex implementation of the business logic, can lead to a serious transformation bottleneck.
In this case the transformation thread processes incoming data very slowly. The reader thread has to wait until memory is freed, and the writer thread has to wait for transformed data to write. Together these degrade the performance of the session.
 
                    
We may redesign the business logic and use the transformations in the most efficient way to get optimal output. For example, filtering out unwanted rows as early as possible in the pipeline avoids spending processing time on data that will be discarded anyway, and can improve mapping performance.
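The benefit of discarding unwanted rows early in the pipeline can be shown with a small sketch that counts how often an expensive transformation runs. This is plain Python rather than a mapping, and run_pipeline, the row layout, and the tax calculation are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def run_pipeline(rows: List[Row],
                 keep: Callable[[Row], bool],
                 transform: Callable[[Row], Row],
                 filter_first: bool) -> Tuple[List[Row], int]:
    """Run filter and transform in either order.

    Returns the output rows and the number of transform calls made.
    """
    transform_calls = 0
    output: List[Row] = []
    for row in rows:
        if filter_first and not keep(row):
            continue  # dropped before any expensive work is done
        transformed = transform(row)
        transform_calls += 1
        if not filter_first and not keep(transformed):
            continue  # dropped only after the work was already done
        output.append(transformed)
    return output, transform_calls


def is_active(row: Row) -> bool:
    return row["status"] == "ACTIVE"


def add_tax(row: Row) -> Row:
    # Stand-in for a costly transformation; keeps the status column intact.
    return {**row, "amount_with_tax": float(row["amount"]) * 1.2}


rows = [
    {"id": i, "status": "ACTIVE" if i % 2 == 0 else "CLOSED", "amount": 10 * i}
    for i in range(10)
]

late_output, late_calls = run_pipeline(rows, is_active, add_tax, filter_first=False)
early_output, early_calls = run_pipeline(rows, is_active, add_tax, filter_first=True)
print(late_calls, early_calls)
```

Both orderings produce the same output rows, but filtering first halves the number of transformation calls for this sample data.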
Session Bottlenecks
 
Informatica may encounter session bottlenecks even when there are no reader, writer, or transformation bottlenecks. This can happen when the session memory configuration is set incorrectly. For example, the DTM buffer size, the default buffer block size, the Informatica data cache and index cache sizes, or the size of the relevant UNIX mount can be increased as required to remove the bottleneck.
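The relationship between the DTM buffer size and the buffer block size can be sanity-checked with a little arithmetic. The sketch below relies on two assumptions that should be verified against the PowerCenter tuning documentation for your version: that a session needs at least two buffer blocks per source and target, and that roughly 90% of the DTM buffer is available for blocks. The function names and the sample sizes are hypothetical, and the calculation assumes a single partition.

```python
def required_buffer_blocks(num_sources: int, num_targets: int) -> int:
    """Assumed rule of thumb: two blocks for every source and target."""
    return (num_sources + num_targets) * 2


def available_buffer_blocks(dtm_buffer_size: int, buffer_block_size: int) -> int:
    """Assumed rule of thumb: about 90% of the DTM buffer holds blocks.

    Integer arithmetic is used so the result does not depend on
    floating-point rounding. Sizes are in bytes.
    """
    if buffer_block_size <= 0:
        raise ValueError("buffer_block_size must be positive")
    return (9 * dtm_buffer_size) // (10 * buffer_block_size)


def buffer_is_sufficient(num_sources: int, num_targets: int,
                         dtm_buffer_size: int, buffer_block_size: int) -> bool:
    needed = required_buffer_blocks(num_sources, num_targets)
    available = available_buffer_blocks(dtm_buffer_size, buffer_block_size)
    return available >= needed


# Hypothetical session: 2 sources, 1 target, 12,000,000-byte DTM buffer,
# 64,000-byte buffer blocks.
needed = required_buffer_blocks(2, 1)
available = available_buffer_blocks(12_000_000, 64_000)
print(needed, available, buffer_is_sufficient(2, 1, 12_000_000, 64_000))
```

If the available count falls below the required count, either the DTM buffer size would need to grow or the block size would need to shrink.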
 
To improve performance further and reduce task execution time, other techniques can be applied as the business requirements dictate, for example session partitioning, pushdown optimization, and concurrent workflows.
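The idea behind partitioning is to split the rows into groups that can be processed in parallel. The sketch below shows one simple way to do that, hashing a key column so that rows with the same key always land in the same partition. It is an illustration of the general idea only, not a description of how PowerCenter partitions a session; hash_partition and the sample rows are hypothetical.

```python
import zlib
from typing import Dict, List

Row = Dict[str, object]


def hash_partition(rows: List[Row], key: str, num_partitions: int) -> List[List[Row]]:
    """Split rows into num_partitions groups based on a hash of row[key]."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    partitions: List[List[Row]] = [[] for _ in range(num_partitions)]
    for row in rows:
        # crc32 is stable across runs, unlike Python's built-in hash() for strings.
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        partitions[digest % num_partitions].append(row)
    return partitions


rows = [{"customer_id": i % 7, "amount": i} for i in range(50)]
parts = hash_partition(rows, key="customer_id", num_partitions=3)
print([len(p) for p in parts])
```

Each partition could then be handed to its own worker, with the caveat that a badly skewed key leaves one worker with most of the rows.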


 
 

Wednesday, December 4, 2013

Informatica Architecture




Domain: The domain is the primary unit for management and administration of services in PowerCenter. A domain consists of one or more nodes, the Service Manager, and application services.


Node: A node is the logical representation of a machine in a domain. A domain can have multiple nodes. The master gateway node is the one that hosts the domain. You can configure nodes to run application services such as the Integration Service or the Repository Service. All requests from other nodes go through the master gateway node.

 
Integration Service: This is the heart of the Informatica architecture. It accepts requests from the PowerCenter Client, processes all transformation requests, and loads data into the target. It starts the load balancer and the DTM process to manage all the tasks involved in ETL.

The DTM uses multiple threads to process data in a session. The main DTM thread is called the master thread. The master thread can create the following types of threads:


 Mapping Threads, Pre and Post-Session Threads, Reader Threads, Transformation Threads, Writer Threads

Repository & Repository Services:
The repository is a relational database that stores all the metadata created in PowerCenter. Whenever we develop and save a mapping, session, workflow, or any other object, entries are made in the repository.

The Repository Service understands the content of the repository, fetches data from it, and sends that data back to the requesting components, which are mostly the client tools and the Integration Service.

Global Repository: The global repository is the hub of the repository domain. It can contain common objects that are shared throughout the domain through global shortcuts. Once created, a global repository cannot be changed to a local repository.

Local Repository: A local repository is any repository within the domain that can connect to the global repository and use objects in its shared folders. You can promote a local repository to a global repository.