Showing posts with label Approaches to Development. Show all posts

Thursday, July 26, 2012

Get top 5 records to target without using rank

How do you load the top 5 records (by salary) into the target without using a Rank transformation?
Solution:
  1. Drag the source into the mapping and connect it to a Sorter transformation.

[Image: source connected to a Sorter transformation]

  2. In the Sorter, arrange the salary port in descending order and pass the sorted records to an Expression transformation.

  3. Add the NEXTVAL port of a Sequence Generator to the Expression transformation (set the Start Value to 1 in the Sequence Generator).


  4. Connect the Expression transformation to a Filter or Router transformation. In its properties, set the condition so that only the first five sequence values pass through (i.e., the sequence port <= 5).
  5. Finally, connect the Filter/Router output to the target.
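Outside PowerCenter, the same Sorter → Sequence Generator → Filter logic can be sketched in a few lines of Python. This is only a simulation of the pipeline above, with made-up records and illustrative column names:

```python
# Simulate the Sorter -> Sequence Generator -> Filter pipeline
# for picking the top 5 salaries without a Rank transformation.
records = [
    {"ename": "A", "sal": 3000}, {"ename": "B", "sal": 5000},
    {"ename": "C", "sal": 1500}, {"ename": "D", "sal": 4000},
    {"ename": "E", "sal": 2500}, {"ename": "F", "sal": 4500},
    {"ename": "G", "sal": 2000},
]

# Sorter: arrange salary in descending order
sorted_recs = sorted(records, key=lambda r: r["sal"], reverse=True)

# Sequence Generator: attach NEXTVAL, starting from 1
numbered = [dict(r, nextval=i) for i, r in enumerate(sorted_recs, start=1)]

# Filter: keep only rows where NEXTVAL <= 5
target = [r for r in numbered if r["nextval"] <= 5]

print([r["ename"] for r in target])  # → ['B', 'F', 'D', 'A', 'E']
```

Note that, unlike a Rank transformation, this approach does not handle ties specially: the sequence number is assigned purely by sort order.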

Tuesday, July 24, 2012

Change your Target Flat File name Dynamically


Whenever you create a flat file using a PowerCenter workflow, the name of the file is static: it is whatever is specified in the session's 'Output Filename' property. There can be business cases where you need to generate flat files with a dynamic file name, such as a timestamp appended to the name. In this article we will discuss how to generate flat files whose names change dynamically.

As the first step, let's create the flat file definition using the Target Designer.

[Image: Informatica Target Designer]

Now let's add a new column, 'FileName', using the 'Add File Name Column' button highlighted at the top right corner in the image below. Don't be confused: this column will not appear in the target file; it is the column based on which the file name is dynamically changed.

[Image: flat file target definition with the FileName column]

The next step is to create the mapping that generates the flat file, as shown below.

[Image: PowerCenter mapping for dynamic file name]

We won't worry about any transformation in this mapping except EXP_FILE_NAME. This Expression transformation is responsible for generating the target file name dynamically.

Let's look at the Expression transformation below.

[Image: EXP_FILE_NAME Expression transformation]

In this transformation, create an output port 'FileName' and give it the expression 'Customer_Master_' || TO_CHAR(SYSDATE,'YYYY-MM-DD') || '.csv'. This expression dynamically attaches the current date to the file name; customize it as needed to change the naming pattern.

Now, from the Expression transformation, map the 'FileName' port to the target, along with all the remaining ports.

We are all done. Now create and run the workflow; you will see the file generated in your target file directory with the date as part of the file name. No specific setting is required at the session level.
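To see what name the expression produces, the same logic can be mirrored in Python. This is only an illustration of the naming pattern, not PowerCenter code; the strftime mask mirrors the TO_CHAR date format used in the expression:

```python
from datetime import date

def target_file_name(prefix="Customer_Master_", ext=".csv"):
    # Mirrors: 'Customer_Master_' || TO_CHAR(SYSDATE,'YYYY-MM-DD') || '.csv'
    return prefix + date.today().strftime("%Y-%m-%d") + ext

print(target_file_name())  # e.g. Customer_Master_2012-07-24.csv
```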

Design Approach to Update Large Tables


One of the issues we come across during ETL design is updating large tables. This is a very common ETL scenario, especially when you deal with large volumes of data, such as loading an SCD Type 2 dimension table. In this article, let's discuss an approach to updating large tables using an Informatica mapping.

High level Design Approach.

  1. Use a database join to identify the records to be updated.
  2. Insert the records identified for UPDATE into a TEMP table.
  3. Use a post-session SQL statement to update the target table.

[Image: high-level design for updating a large table]

Design Assumptions.

  1. Source and target tables are relational tables.
  2. Both source and target tables are in the same schema.
  3. The tables are accessible using a single database user.

Informatica Implementation.

For demonstration purposes, let's consider the customer dimension table load T_DIM_CUST, which has 100 million records. In each load we expect to update 100 thousand records in the dimension table.

Let's start building the mapping. As the first step, LEFT OUTER JOIN the source table CUST_STAGE to the target table T_DIM_CUST, using the SQL below as the SQL override in the Source Qualifier.

SELECT
--Columns From Source Tables
CUST_STAGE.CUST_ID,
CUST_STAGE.CUST_NAME,
CUST_STAGE.ADDRESS1,
CUST_STAGE.ADDRESS2,
CUST_STAGE.CITY,
CUST_STAGE.STATE,
CUST_STAGE.ZIP,
...
--Columns from Target Tables.
--If any column has NULL value, record to be set as INSERT else UPDATE
T_DIM_CUST.CUST_ID,
T_DIM_CUST.AS_OF_START_DT,
T_DIM_CUST.AS_OF_END_DT
FROM CUST_STAGE
--Outer Join is Used
LEFT OUTER JOIN T_DIM_CUST
ON CUST_STAGE.CUST_ID = T_DIM_CUST.CUST_ID
AND T_DIM_CUST.AS_OF_END_DT = TO_DATE('12-31-4000','MM-DD-YYYY')


Now, using a Router transformation, route the records into INSERT and UPDATE paths: records identified as INSERT are mapped to T_DIM_CUST, and records identified as UPDATE are mapped to T_DIM_CUST_TEMP. Use T_DIM_CUST_CUST_ID, the column coming from the target table, to identify whether a record is to be inserted or updated. Below are the Router group filter conditions.

INSERT : IIF(ISNULL( T_DIM_CUST_CUST_ID ), TRUE, FALSE)
UPDATE : IIF(NOT ISNULL( T_DIM_CUST_CUST_ID ), TRUE, FALSE)
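The router logic amounts to a NULL check on the target-side key after the outer join. A small Python sketch of the two router groups, with illustrative data (a missing match from the LEFT OUTER JOIN shows up as None):

```python
# Each joined row carries the staging key plus the target-side key,
# which is None when the LEFT OUTER JOIN found no open target record.
joined_rows = [
    {"CUST_ID": 1, "T_DIM_CUST_CUST_ID": 1},     # existing -> UPDATE
    {"CUST_ID": 2, "T_DIM_CUST_CUST_ID": None},  # new      -> INSERT
    {"CUST_ID": 3, "T_DIM_CUST_CUST_ID": 3},     # existing -> UPDATE
]

# INSERT group: ISNULL(T_DIM_CUST_CUST_ID) -> routed to T_DIM_CUST
inserts = [r for r in joined_rows if r["T_DIM_CUST_CUST_ID"] is None]

# UPDATE group: NOT ISNULL(T_DIM_CUST_CUST_ID) -> routed to T_DIM_CUST_TEMP
updates = [r for r in joined_rows if r["T_DIM_CUST_CUST_ID"] is not None]

print(len(inserts), len(updates))  # → 1 2
```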


Now that the mapping development is complete, add the SQL below as the post-session SQL statement during session configuration. This correlated UPDATE will update the records in T_DIM_CUST with the values from T_DIM_CUST_TEMP.

UPDATE T_DIM_CUST SET
  (T_DIM_CUST.AS_OF_END_DT,T_DIM_CUST.UPDATE_DT) =
  (SELECT T_DIM_CUST_TEMP.AS_OF_END_DT,T_DIM_CUST_TEMP.UPDATE_DT
  FROM T_DIM_CUST_TEMP
  WHERE T_DIM_CUST_TEMP.CUST_ID = T_DIM_CUST.CUST_ID
  AND T_DIM_CUST.AS_OF_END_DT   = TO_DATE('12-31-4000','MM-DD-YYYY'))
WHERE EXISTS
  (SELECT 1
  FROM T_DIM_CUST_TEMP
  WHERE T_DIM_CUST_TEMP.CUST_ID = T_DIM_CUST.CUST_ID
  AND T_DIM_CUST.AS_OF_END_DT   = TO_DATE('12-31-4000','MM-DD-YYYY'))
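The effect of this correlated UPDATE can be simulated with in-memory dictionaries. This is a sketch with made-up dates; the open-record end date 12-31-4000 matches the SQL above, and only open target rows with a matching TEMP row are touched:

```python
OPEN_END = "4000-12-31"  # AS_OF_END_DT value for the current (open) record

t_dim_cust = {  # target rows, keyed by CUST_ID
    1: {"AS_OF_END_DT": OPEN_END, "UPDATE_DT": "2012-01-01"},
    2: {"AS_OF_END_DT": OPEN_END, "UPDATE_DT": "2012-01-01"},
    3: {"AS_OF_END_DT": "2011-06-30", "UPDATE_DT": "2011-06-30"},  # closed
}

t_dim_cust_temp = {  # rows the mapping staged for UPDATE
    1: {"AS_OF_END_DT": "2012-07-23", "UPDATE_DT": "2012-07-24"},
    3: {"AS_OF_END_DT": "2012-07-23", "UPDATE_DT": "2012-07-24"},
}

# Correlated update: a target row changes only when a matching TEMP row
# exists AND the target row is still open (mirrors the WHERE EXISTS clause).
for cust_id, row in t_dim_cust.items():
    temp = t_dim_cust_temp.get(cust_id)
    if temp is not None and row["AS_OF_END_DT"] == OPEN_END:
        row["AS_OF_END_DT"] = temp["AS_OF_END_DT"]
        row["UPDATE_DT"] = temp["UPDATE_DT"]
```

Customer 1 is closed out with the new end date, customer 2 has no TEMP row and is untouched, and customer 3 is skipped because its record was already closed.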


That is all we need. Hope you enjoyed this design technique; please let us know if you have any difficulties implementing it.