Wednesday, December 11, 2013

Informatica Performance Tuning

 

Finding/Fixing Performance Bottlenecks

 
                The goal of performance tuning is to optimize session performance so that sessions run within the available load window and complete in the minimum expected time. Performance tuning is essentially identifying the bottlenecks and eliminating them to achieve an acceptable ETL load time.

              The first step in performance tuning is to identify performance bottlenecks. Performance bottlenecks can occur in the source and target, the mapping, the session, and the system.

              Whenever a session is triggered, the Integration Service starts the Data Transformation Manager (DTM) process, which is responsible for starting the reader thread, the transformation thread and the writer thread.
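              A quick way to see which of these threads is the slow one is to check the run-time thread statistics that the Integration Service writes to the session log; the thread reported with the highest busy percentage is the likely bottleneck. A minimal sketch (the session log name s_m_load_customers.log is only a placeholder):

              # Show the busy percentage reported for the reader, transformation and writer threads
              $ grep -i "Busy Percentage" s_m_load_customers.log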



 

Source Bottlenecks 

 
               A source-side bottleneck occurs when the reader thread runs slowly with poor throughput, causing the transformation thread and the writer thread to wait for data. The overall session performance is impacted by a source bottleneck.
                An inefficient source query, poor source database server performance or a small database network packet size can cause source bottlenecks.
 
 
 
                 We can upgrade the source database infrastructure to obtain better performance and ensure enough network bandwidth. Most importantly, the source qualifier query can be tuned to retrieve the complete data set in the minimum time.
                 Using hints or adding indexes can boost the performance of the select query as expected.
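                 A simple way to confirm that the query itself is the slow part is to run the source qualifier SQL directly against the database and time it, outside Informatica. A minimal sketch, assuming an Oracle source and the sqlplus client (the user, connect string and src_query.sql, which holds the source qualifier SQL followed by an exit, are placeholders):

                 # Time the source qualifier query at the database; if it is slow here as well,
                 # the bottleneck lies in the query or the database, not in the mapping.
                 $ time sqlplus -s app_user/secret@SRCDB @src_query.sql > /dev/null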
 
Target Bottlenecks
 
                 A target-side bottleneck occurs when the writer thread runs slowly with poor throughput, causing the transformation thread and the reader thread to wait for free buffer memory. The overall session performance is impacted by a target bottleneck.
 
                   A small database network packet size, or poor performance of the target database server due to inadequate infrastructure or heavy data load, can cause target bottlenecks.
 
 
                      We can schedule the job for a time when the target database server has a light load and other resources are free, so that our job gets better bandwidth. Loading in bulk mode, increasing the commit interval, dropping indexes and key constraints, or using an external loader utility can improve target performance to some extent.
 
Transformation Bottlenecks
 
                     Not following transformation best practices, or implementing business logic in an overly complex way, can lead to a critical transformation bottleneck.
                     In this case the transformation thread processes incoming data very slowly, keeping the reader thread waiting until buffer memory is freed up and the writer thread waiting for transformed data to write. Altogether, session performance is degraded.
 
                    
                   We may redesign the business logic and use the transformations in the most efficient way to get the optimal output. For example, placing a Sorter transformation at the very beginning of the pipeline, so that downstream transformations receive sorted input and avoid unnecessary processing time, can improve mapping performance.
Session Bottlenecks
 
                   Informatica may encounter session bottlenecks even when there is no reader, writer or transformation bottleneck. This can happen due to incorrect session memory configuration. For example, the DTM buffer size, the default buffer block size, the data cache and index cache sizes, or other UNIX mount sizes may need to be increased to remove the bottleneck.
 
                  For better performance and reduced task execution time, other techniques can be applied as the business requires, for example session partitioning, pushdown optimization and concurrent workflows.


 
 

Wednesday, December 4, 2013

Informatica Architecture




Domain: The domain is the primary unit for management and administration of services in PowerCenter. The components of a domain are one or more nodes, the Service Manager and the application services.


Node: A node is a logical representation of a machine in a domain. A domain can have multiple nodes. The master gateway node is the one that hosts the domain. You can configure nodes to run application services such as the Integration Service or the Repository Service. All requests from other nodes go through the master gateway node.

 
Integration Service: It is the heart of the Informatica architecture. It accepts requests from the PowerCenter Client, processes all transformation requests and loads data into the target. It starts the Load Balancer and the DTM process to manage all the tasks involved in ETL.

The DTM uses multiple threads to process data in a session. The main DTM thread is called the master thread. The master thread can create the following types of threads:


 Mapping Threads, Pre and Post-Session Threads, Reader Threads, Transformation Threads, Writer Threads

Repository & Repository Services:
The repository is simply a relational database that stores all the metadata created in PowerCenter. Whenever we develop a mapping, session or workflow, or do anything meaningful and save it, entries are made in the repository.

The Repository Service is the component that understands the content of the repository; it fetches data from the repository and sends it back to the requesting components, mostly the client tools and the Integration Service.

Global Repository: The global repository is the hub of the repository domain. The global repository can contain common objects to be shared throughout the domain through global shortcuts. Once created, you cannot change a global repository to a local repository.

Local Repository: A local repository is any repository within the domain that can connect to the global repository and use objects in its shared folders. You can promote a local repository to a global repository.

Wednesday, May 22, 2013

UNIX Command: AWK

Why AWK ?

  • The name AWK comes from the initials of its designers: Alfred V. Aho, Peter J. Weinberger and Brian W. Kernighan. The original version of AWK was written in 1977.
  • UNIX has many utilities and AWK is one of them.
  • AWK is an excellent tool for processing rows and columns of data, and it is easier to use than most conventional programming languages.
  • AWK can be considered to be a pseudo-C interpreter, as it understands the same arithmetic operators as C.
  • AWK also has string manipulation functions, so it can search for particular strings and modify the output.

Syntax of AWK:

                   awk '/search pattern1/ {Actions}
                        /search pattern2/ {Actions}' file

In the above syntax, the search pattern is a regular expression.
Actions - statements to be performed. Several patterns and actions are possible in AWK.
File - the input file.

Example 1:  awk  '{print;}' employee.txt

Here no pattern is given, so the action applies to all lines.

Working Methodology of AWK :

  • AWK reads the input file one line at a time.
  • For each line, it checks the given patterns in order; if a pattern matches, it performs the corresponding action.
  • If no pattern matches, no action is performed.
  • In the above syntax, either the search pattern or the action is optional, but not both.
  • If the search pattern is not given, AWK performs the given actions for each line of the input.
  • If the action is not given, the default action is to print all lines that match the given pattern.
  • Each statement in the actions should be delimited by a semicolon.
  • Each word in a line is a field: $1, $2, ..., $NF, where $NF is the last field (see the short example below).
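
For instance, with the same employee.txt, the first and the last field of every line can be printed as follows (a small illustration; NF is a built-in variable holding the number of fields, so $NF is the last field):

awk '{print $1, $NF;}' employee.txt
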
Example 2: Print the lines that match a pattern

awk '/Thomas|Nisha/' employee.txt

It will print the lines that contain Thomas and the lines that contain Nisha.

Example 3: Print only selective fields.

AWK treats each line as a record and each column as a field.

awk '{print $2,$5;}' employee.txt

It will print only the 2nd and 5th fields of employee.txt.

Example 4: Initialization and final action

Syntax: BEGIN {Actions}
        {Actions}
        END {Actions}
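
For example, assuming the 1st field of employee.txt is a name and the 5th field is a salary (placeholder layout), BEGIN can print a header before any line is read, the main block handles each line, and END prints a total after the last line:

awk 'BEGIN {print "Name  Salary"} {print $1, $5; total += $5} END {print "Total:", total}' employee.txt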


The Built-in Variables of AWK: 

1. FS (Input Field Separator): By default AWK assumes that fields in a file are separated by spaces. If the fields are separated by any other character, we can use the FS variable to specify the delimiter.

$ awk 'BEGIN {FS="|"} {print $2}' input_file  

It will print the second word of the input_file provided the field separator is "|".
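
Equivalently, the delimiter can be passed on the command line with the -F option, which sets FS without needing a BEGIN block:

$ awk -F'|' '{print $2}' input_file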

Tuesday, May 21, 2013

UNIX Command: tr [translate]

Syntax:

The syntax of tr command is:

$ tr [OPTION] [SET1] [SET2] 
 

Examples:

1. Convert lower case to upper case:

$ tr '[:lower:]' '[:upper:]'
 

2. Translate braces into parentheses

$ tr '{}' '()' < inputfile > outputfile
 

3. Translate white-space to tabs

$ tr '[:space:]' '\t'
 

4. Squeeze repetition of characters using -s

$ tr -s '[:space:]' '\t'
 

5. Delete specified characters (digits in this example) using the -d option

$ tr -d '[:digit:]'
 

6. Complement the sets using the -c option

$ tr -cd '[:digit:]'
 

7. Remove all non-printable characters from a file

$ tr -cd '[:print:]' < file.txt
 

8. Join all the lines in a file into a single line

$ tr -s '\n' ' ' < file.txt
 

9. To replace every sequence of one or more new lines with a single new line

$ tr -s '\n' < textfile > newfile


10. To delete all NULL characters from a file
 
$ tr -d '\0' < textfile > newfile
 
-------------------------------------------------------------------------------