Saturday, September 29, 2012

Regular Expressions

What Are Regular Expressions?
A regular expression is a pattern template you define that a Linux utility uses to filter text. A Linux utility (such as the sed editor or the gawk program) matches the regular expression pattern against data as that data flows into the utility. If the data matches the pattern, it is accepted for processing. If the data doesn't match the pattern, it is rejected. The regular expression pattern makes use of wildcard characters to represent one or more characters in the data stream.
Types of regular expressions:

There are two popular regular expression engines:
  • The POSIX Basic Regular Expression (BRE) engine
  • The POSIX Extended Regular Expression (ERE) engine
Defining BRE Patterns:
The most basic BRE pattern is matching text characters in a data stream.
Eg 1: Plain text

$ echo "This is a test" | sed -n ’/test/p’
This is a test.
$ echo "This is a test" | sed -n ’/trial/p’
$
$ echo "This is a test" | gawk ’/test/{print $0}’
This is a test.
$ echo "This is a test" | gawk ’/trial/{print $0}’
$
Eg 2: Special characters

The special characters recognized by regular expressions are:
.*[]^${}\+?|()
For example, if you want to search for a dollar sign in your text, just precede it with a backslash character:
$ cat data2
The cost is $4.00
$ sed -n '/\$/p' data2
The cost is $4.00
$
Eg 3: Looking for the ending

The dollar sign ($) special character defines the end anchor.

$ echo "This is a good book" | sed -n ’/book$/p’
This is a good book
$ echo "This book is good" | sed -n ’/book$/p’
$
Eg 4: Using ranges

You can use a range of characters within a character class by using the dash symbol.
Now you can simplify the zip code example by specifying a range of digits:
$ sed -n '/^[0-9][0-9][0-9][0-9][0-9]$/p' data8
60633
46201
45902
$
Extended Regular Expressions:

The POSIX ERE patterns include a few additional symbols that are used by some Linux applications and utilities. The gawk program recognizes the ERE patterns, but the sed editor doesn’t.
Eg 1: The question mark

The question mark indicates that the preceding character can appear zero or one time, but that’s all. It doesn’t match repeating occurrences of the character:
$ echo "bt" | gawk ’/be?t/{print $0}’
bt
$ echo "bet" | gawk ’/be?t/{print $0}’
Bet
$ echo "beet" | gawk ’/be?t/{print $0}’
$
$ echo "beeet" | gawk ’/be?t/{print $0}’
$
Eg 2: The plus sign

The plus sign indicates that the preceding character can appear one or more times, but must be present at least once. The pattern doesn't match if the character is not present:
$ echo "beeet" | gawk ’/be+t/{print $0}’
beeet
$ echo "beet" | gawk ’/be+t/{print $0}’
beet
$ echo "bet" | gawk ’/be+t/{print $0}’
bet
$ echo "bt" | gawk ’/be+t/{print $0}’
$
Eg 3: The pipe symbol

The pipe symbol allows you to specify two or more patterns that the regular expression engine uses in a logical OR formula when examining the data stream. If any of the patterns match the data stream text, the text passes. If none of the patterns match, the data stream text fails.
The format for using the pipe symbol is:
expr1|expr2|...
Here’s an example of this:
$ echo "The cat is asleep" | gawk ’/cat|dog/{print $0}’
The cat is asleep
$ echo "The dog is asleep" | gawk ’/cat|dog/{print $0}’
The dog is asleep
$ echo "The sheep is asleep" | gawk ’/cat|dog/{print $0}’
$

Eg 4: Grouping expressions

When you group a regular expression pattern, the group is treated like a standard character. You can apply a special character to the group just as you would to a regular character.
For example:
$ echo "Sat" | gawk ’/Sat(urday)?/{print $0}’
Sat
$ echo "Saturday" | gawk ’/Sat(urday)?/{print $0}’
Saturday
$

Wednesday, August 1, 2012

Informatica Testing

Unit Test Plan(UTP)

Testing regimens:
1. Unit Testing
2. Functional Testing
3. System Integration Testing
4. User Acceptance Testing

Unit Testing: Testing, by development, of the application modules to verify that each unit (module) meets the accepted user requirements and the design and development standards.
Functional Testing: Testing of all the application modules individually to ensure that the modules, as released from development to QA, work together as designed and meet the accepted user requirements and system standards.
System Integration Testing: Testing of all the application modules in the same environment, database instance, network, and inter-related applications, as they would function in production. This includes security, volume, and stress testing.
User Acceptance Testing (UAT): Testing of the entire application by the end users to ensure that the application functions as set forth in the system requirements documents and that the system meets the business needs.

UTP Template:


The template columns are: Step #, Description, Test Conditions, Expected Results, Actual Results (Pass or Fail), and Tested By.

SAP-CMS Interfaces

Step 1
Description: Check that the total count of records fetched from the source tables matches the total count of records in the PRCHG table for a particular session timestamp.
Test Conditions:
SOURCE:
SELECT count(*) FROM XST_PRCHG_STG
TARGET:
SELECT count(*) FROM _PRCHG
Expected Results: The source and target table load record counts should match.
Actual Results: Same as expected. Pass
Tested By: Stev

Step 2
Description: Check that all the target columns are populated correctly with the source data.
Test Conditions:
SELECT PRCHG_ID,
       PRCHG_DESC,
       DEPT_NBR,
       EVNT_CTG_CDE,
       PRCHG_TYP_CDE,
       PRCHG_ST_CDE
FROM T_PRCHG
MINUS
SELECT PRCHG_ID,
       PRCHG_DESC,
       DEPT_NBR,
       EVNT_CTG_CDE,
       PRCHG_TYP_CDE,
       PRCHG_ST_CDE
FROM PRCHG
Expected Results: The MINUS of the source and target record values should return zero records.
Actual Results: Same as expected. Pass
Tested By: Stev

Step 3
Description: Check the insert strategy for loading records into the target table.
Test Conditions: Identify one record from the source which is not in the target table, then run the session.
Expected Results: The session should insert a record into the target table with the source data.
Actual Results: Same as expected. Pass
Tested By: Stev

Step 4
Description: Check the update strategy for loading records into the target table.
Test Conditions: Identify one record from the source which is already present in the target table with a different PRCHG_ST_CDE or PRCHG_TYP_CDE value, then run the session.

What are the differences between stored procedures and triggers?


A stored procedure is normally used to perform a specific task, whereas a trigger is normally used for tracing and auditing.

A stored procedure must be called explicitly by the user in order to execute, whereas a trigger is invoked implicitly based on the events defined on the table.

A stored procedure can run independently, whereas a trigger runs only as part of a DML event on its table.

A stored procedure can be executed from a trigger, but a trigger cannot be executed from a stored procedure.

Stored procedures can have parameters; triggers cannot have parameters.

Stored procedures are compiled collections of programs or SQL statements stored in the database.

Using a stored procedure we can access and modify data present in many tables, and a stored procedure is not associated with any particular database object. Triggers, on the other hand, are event-driven special procedures attached to a specific database object, say a table.

Stored procedures are not run automatically; they have to be called explicitly by the user. Triggers get executed when the event they are associated with fires.
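To make the contrast concrete, here is a minimal Oracle PL/SQL sketch. The object names (emp, emp_audit, raise_salary, trg_emp_audit) are hypothetical, used only for illustration.

-- A stored procedure: called explicitly, accepts parameters.
CREATE OR REPLACE PROCEDURE raise_salary (p_empno IN NUMBER, p_pct IN NUMBER) AS
BEGIN
  UPDATE emp SET sal = sal * (1 + p_pct / 100) WHERE empno = p_empno;
END;
/

-- A trigger: fires implicitly on a DML event, takes no parameters.
CREATE OR REPLACE TRIGGER trg_emp_audit
AFTER UPDATE OF sal ON emp
FOR EACH ROW
BEGIN
  INSERT INTO emp_audit (empno, old_sal, new_sal, changed_on)
  VALUES (:OLD.empno, :OLD.sal, :NEW.sal, SYSDATE);
END;
/

-- The procedure is invoked explicitly; the trigger then fires on its own.
EXEC raise_salary(7369, 10);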

Sunday, July 29, 2012

Snapshot Facts

The snapshot fact contains a reflection of the state of an entity at a given point in time. A classic example of this would be a "daily_balance" fact in a banking system. This would, on a daily basis, record the balance of each account; it would NOT list the individual transactions that happened on the account.
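A minimal sketch of such a periodic snapshot fact; the table and column names (fact_daily_balance, dim_date, dim_account) are hypothetical.

CREATE TABLE fact_daily_balance (
  date_key     INTEGER NOT NULL,   -- FK to dim_date (the snapshot day)
  account_key  INTEGER NOT NULL,   -- FK to dim_account
  balance_amt  NUMBER(15,2),       -- balance of the account at end of day
  PRIMARY KEY (date_key, account_key)
);

-- One row per account per day is loaded, regardless of how many
-- transactions occurred on the account that day.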

The differences between a logical data model and physical data model


Logical vs Physical Data Modeling

A logical data model represents business information and defines business rules; a physical data model represents the physical implementation of the model in a database. The constructs map as follows:

Logical Data Model -> Physical Data Model
Entity -> Table
Attribute -> Column
Primary Key -> Primary Key Constraint
Alternate Key -> Unique Constraint or Unique Index
Inversion Key Entry -> Non-Unique Index
Rule -> Check Constraint, Default Value
Relationship -> Foreign Key
Definition -> Comment

Factless Facts


A factless fact is one where the fact table does not store an actual numerical measure; the mere existence of a fact record indicates that an event you wish to track has happened. The classic example of this would be an "Attendance" fact. If you had dimensions to record date, scheduled_course, instructor and delegate, then you could create a fact table that held the permutations of these dimensions. From this you could evaluate the number of courses you run, the number of delegates, the number of courses by instructor, and so on.
I would never simply leave a factless fact as a bare collection of foreign key columns; I would always add a dummy measure column, set to 1, which you would then sum.
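A minimal sketch of such an attendance factless fact, including the dummy measure; the table and column names are hypothetical.

CREATE TABLE fact_attendance (
  date_key              INTEGER NOT NULL,  -- FK to dim_date
  scheduled_course_key  INTEGER NOT NULL,  -- FK to dim_scheduled_course
  instructor_key        INTEGER NOT NULL,  -- FK to dim_instructor
  delegate_key          INTEGER NOT NULL,  -- FK to dim_delegate
  attendance_count      INTEGER DEFAULT 1  -- dummy measure, always 1
);

-- Counting delegates per instructor then becomes a simple SUM:
SELECT instructor_key, SUM(attendance_count) AS delegates
FROM fact_attendance
GROUP BY instructor_key;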

Conformed dimensions

Conformed dimensions can be used to analyze facts from two or more data marts. Suppose you have a “shipping” data mart (telling you what you’ve shipped to whom and when) and a “sales” data mart (telling you who has purchased what and when). Both marts require a “customer” dimension and a “time” dimension. If they’re the same dimension, then you have conforming dimensions, allowing you to extract and manipulate facts relating to a particular customer from both marts, answering questions such as whether late shipments have affected sales to that customer.

Suppose now that you add a “marketing” data mart to help you analyze product promotions. Again, with conformed customer and time dimensions, you’re able to analyze the effects of a particular product promotion on sales. (Analyzing facts from more than one fact table in this way is termed “drilling across.” My previous article, “Thinking dimensionally aids business intelligence design and use,” explains the function of facts and dimensions.)


As this example shows, the very same conformed dimensions (in this case, the time and customer dimensions) have meaning in the context of three independently developed data marts. These dimensions become enterprise property and can be used later in other marts as you evolve the enterprise data warehouse.
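As a sketch of drilling across, assuming hypothetical fact_shipments and fact_sales fact tables that share a conformed dim_customer dimension (all names are mine, not from the article), each fact is aggregated separately to the conformed customer grain and the results are then joined:

SELECT c.customer_name,
       s.late_shipments,
       o.total_sales
FROM (SELECT customer_key, COUNT(*) AS late_shipments
      FROM fact_shipments
      WHERE ship_date_key > promised_date_key
      GROUP BY customer_key) s
JOIN (SELECT customer_key, SUM(sales_amt) AS total_sales
      FROM fact_sales
      GROUP BY customer_key) o
  ON o.customer_key = s.customer_key
JOIN dim_customer c
  ON c.customer_key = s.customer_key;

-- This only works because both marts share the very same customer dimension.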

What is Junk Dimension?


A "junk" dimension is a collection of random transactional codes, flags and/or text attributes that are unrelated to any particular dimension. The junk dimension is simply a structure that provides a convenient place to store the junk attributes. A good example would be a trade fact in a company that brokers equity trades.
When you consolidate lots of small dimensions and instead of having 100s of small dimensions, that will have few records in them, cluttering your database with these mini ‘identifier’ tables, all records from all these small dimension tables are loaded into ONE dimension table and we call this dimension table Junk dimension table.  (Since we are storing all the junk in this one table) For example: a company might have handful of manufacture plants, handful of order types, and so on, so forth, and we can consolidate them in one dimension table called junked dimension table
It’s a dimension table which is used to keep junk attributes
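A minimal sketch of a junk dimension holding a few unrelated flags and codes; the table and column names (dim_order_junk, etc.) are hypothetical.

CREATE TABLE dim_order_junk (
  order_junk_key   INTEGER PRIMARY KEY,  -- surrogate key referenced by the fact table
  order_type_cde   VARCHAR(10),          -- e.g. 'WEB', 'PHONE', 'STORE'
  payment_flag     CHAR(1),              -- 'Y'/'N'
  gift_wrap_flag   CHAR(1)               -- 'Y'/'N'
);

-- The fact table then carries a single order_junk_key instead of three
-- separate mini-dimension keys.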

Informatica PowerCenter Repository tables

Every PowerCenter developer, at some point, has an intention or a necessity to know about the Informatica metadata tables and where the information is stored. For starters, all the objects that we create in Informatica PowerCenter, be they sources, targets, mappings, workflows, sessions, or expressions, anything related to PowerCenter, get stored in a set of database tables (call them metadata tables, OPB tables, or repository tables).

* I want to know all the sessions in my folder that are calling some shell script/command in the Post-Session command task.
* I want to know how many mappings have transformations that contain "STOCK_CODE" defined as a port.
* I want to know all unused ports in my repository of 100 folders.

In repositories with a large number of sessions, workflows, or mappings, it gets difficult to answer such questions with the help of the Informatica PowerCenter client tools. After all, the whole of this data is stored in some form in the metadata tables. So if you know the data model of these repository tables, you will be in a better position to answer these questions.

Before we proceed further, let me stress something very important: the data in the repository/metadata/OPB tables is very sensitive, and modifications such as inserts or updates must be made using the PowerCenter tools ONLY. DO NOT DIRECTLY USE UPDATE OR INSERT COMMANDS AGAINST THESE TABLES.

Please also note that there is no official documentation from Informatica Corporation on how these tables work. What follows is based purely on my own assumptions, research, and experience. I will not be responsible for any damage caused if you use any statement other than SELECT based on the details in this blog article. This is my disclaimer. Let us move on to the contents now.

There are around a couple of hundred OPB tables in the 7.x versions of PowerCenter, but in 8.x this number crosses 400. I am going to talk about a few important tables in this article; this is not a small topic to cover in one article, so I shall write a few more to cover other important tables like OPB_TDS, OPB_SESSLOG, etc.
We shall start with OPB_SUBJECT now.

OPB_SUBJECT - PowerCenter folders table

This table stores the name of each PowerCenter repository folder.

Usage: Join any of the repository tables that have SUBJECT_ID as column with that of SUBJ_ID in this table to know the folder name.
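As a quick sketch, a folder listing might look like this (SUBJ_NAME is my assumption for the column holding the folder name; only SUBJ_ID is confirmed above):

SELECT subj_id,
       subj_name   -- assumed: the folder name
FROM opb_subject
ORDER BY subj_name;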

OPB_MAPPING - Mappings table

This table stores the name and ID of each mapping and its corresponding folder.

Usage: Join any of the repository tables that have MAPPING_ID as column with that of MAPPING_ID in this table to know the mapping name.
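A sketch of listing mappings per folder; MAPPING_NAME, SUBJ_NAME, and the SUBJECT_ID column on OPB_MAPPING are assumed names on my part:

SELECT s.subj_name   AS folder_name,
       m.mapping_name
FROM opb_mapping m
JOIN opb_subject s
  ON s.subj_id = m.subject_id
ORDER BY s.subj_name, m.mapping_name;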

OPB_TASK - Tasks table (sessions, workflows, etc.)

This table stores the name and ID of each task like session, workflow and its corresponding folder.

Usage: Join any of the repository tables that have TASK_ID as a column with TASK_ID/SESSION_ID in this table to know the task name. Observe that sessions and workflows are both stored as tasks in the repository. TASK_TYPE for a session is 68 and for a workflow is 71.
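A sketch of listing sessions and workflows per folder; TASK_NAME, SUBJ_NAME, and the SUBJECT_ID column on OPB_TASK are assumed names, while the TASK_TYPE codes are as noted above:

SELECT s.subj_name AS folder_name,
       t.task_name,
       CASE t.task_type WHEN 68 THEN 'Session' WHEN 71 THEN 'Workflow' END AS task_kind
FROM opb_task t
JOIN opb_subject s
  ON s.subj_id = t.subject_id
WHERE t.task_type IN (68, 71);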

OPB_SESSION - Session & Mapping linkage table

This table stores the linkage between the session and the corresponding mapping. As informed in the earlier paragraph, you can use the SESSION_ID in this table to join with TASK_ID of OPB_TASK table.
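A sketch of resolving each session to its mapping, assuming OPB_SESSION carries SESSION_ID and MAPPING_ID and that TASK_NAME / MAPPING_NAME hold the object names (column names are assumptions):

SELECT t.task_name   AS session_name,
       m.mapping_name
FROM opb_session se
JOIN opb_task t
  ON t.task_id = se.session_id
JOIN opb_mapping m
  ON m.mapping_id = se.mapping_id;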

OPB_TASK_ATTR - Task attributes tables

This is the table that stores the attribute values (like Session log name etc) for tasks.

Usage: Use the ATTR_ID of this table to that of the ATTR_ID of OPB_ATTR table to find what each attribute in this table means. You can know more about OPB_ATTR table in the next paragraphs.
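A sketch of decoding task attributes; TASK_ID, ATTR_VALUE, and ATTR_NAME are assumed column names, and as noted under OPB_ATTR below you may also need to match on ATTR_TYPE/OBJECT_TYPE_ID:

SELECT ta.task_id,
       a.attr_name,    -- assumed: human-readable attribute name from OPB_ATTR
       ta.attr_value   -- assumed: the attribute value, e.g. the session log name
FROM opb_task_attr ta
JOIN opb_attr a
  ON a.attr_id = ta.attr_id;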

OPB_WIDGET - Transformations table

This table stores the names and IDs of all the transformations with their folder details.

Usage: Use WIDGET_ID from this table to that of the WIDGET_ID of any of the tables to know the transformation name and the folder details. Use this table in conjunction with OPB_WIDGET_ATTR or OPB_WIDGET_EXPR to know more about each transformation etc.

OPB_WIDGET_FIELD - Transformation ports table

This table stores the names and IDs of all the transformation fields for each of the transformations.

Usage: Take the FIELD_ID from this table and match it against the FIELD_ID of any of the tables like OPB_WIDGET_DEP and you can get the corresponding information.
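Combining the two transformation tables with OPB_SUBJECT gives a port-level inventory. This is only a sketch: WIDGET_NAME, FIELD_NAME, SUBJ_NAME, and the SUBJECT_ID column on OPB_WIDGET are assumed names.

SELECT s.subj_name   AS folder_name,
       w.widget_name AS transformation_name,
       f.field_name  AS port_name
FROM opb_widget w
JOIN opb_subject s
  ON s.subj_id = w.subject_id
JOIN opb_widget_field f
  ON f.widget_id = w.widget_id
ORDER BY s.subj_name, w.widget_name, f.field_name;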

OPB_WIDGET_ATTR - Transformation properties table

This table stores all the properties details about each of the transformations.

Usage: Use the ATTR_ID of this table to that of the ATTR_ID of OPB_ATTR table to find what each attribute in this transformation means.

OPB_EXPRESSION - Expressions table

This table stores the details of the expressions used anywhere in PowerCenter.

Usage: Use this table in conjunction with OPB_WIDGET/OPB_WIDGET_INST and OPB_WIDGET_EXPR to get the expressions in the Expression transformation for a particular mapping or a set of mappings.

OPB_ATTR - Attributes

This table has a list of attributes and their default values if any. You can get the ATTR_ID from this table and look it up against any of the tables where you can get the attribute value. You should also make a note of the ATTR_TYPE, OBJECT_TYPE_ID before you pick up the ATTR_ID. You can find the same ATTR_ID in the table, but with different ATTR_TYPE or OBJECT_TYPE_ID.

OPB_COMPONENT - Session Component

This table stores the component details like Post-Session-Success-Email, commands in Post-Session/pre-Session etc.

Usage: Match the TASK_ID with that of the SESSION_ID in OPB_SESSION table to get the SESSION_NAME and to get the shell command or batch command that is there for the session, join this table with OPB_TASK_VAL_LIST table on TASK_ID.
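A sketch following the joins described above; TASK_NAME is an assumed column name, and I select all columns from OPB_TASK_VAL_LIST because I am not certain which column carries the command text:

SELECT t.task_name AS session_name,
       v.*          -- the shell/batch command text lives in OPB_TASK_VAL_LIST
FROM opb_component c
JOIN opb_task t
  ON t.task_id = c.task_id
JOIN opb_task_val_list v
  ON v.task_id = c.task_id;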

OPB_CFG_ATTR - Session Configuration Attributes

This table stores the attribute values for Session Object configuration like "Save Session log by", Session log path etc.


Top 5 MS Power Point Tips

PowerPoint has become so common a tool that every Windows user clicks on the MS PowerPoint icon when asked to research, create a presentation, draw a flowchart, or even just take notes.

However, there is so much power under the hood that hardly gets utilized. So here are the top 5 PowerPoint tips to upgrade your PowerPoint experience.

1. Stop using Ctrl+C (Copy) and Ctrl+V (Paste). Select an object and do a Ctrl+D (Duplicate). Not only does this reduce the number of key presses; the Duplicate shortcut also aligns every newly duplicated object to create a pattern of equidistant objects.

2. Pasted an image that has an irritating white background? Relax. You can clear the white background by double-clicking the image, going to Colors on the menu bar, selecting "Set Transparent Color", and clicking anywhere on the white background of the image.

3. Do you have a table that does not fit into a single slide? Do not fret. You can divide the table across two slides and use a slide transition to let the audience know that the table continues over two slides. Select the first slide, go to Transitions on the menu bar, and select "Push" as the transition.

4. Need to fit the letters of a word onto a curve? Type your word, select the text box, go to Format on the menu bar, and select Text Effects --> Transform --> Follow Path and the layout you need.

5. You are projecting your slides and suddenly need the audience to stop looking at the slides and focus on you, the speaker. Press "B" or "W" on the keyboard to produce a black or white screen respectively. This works only in Presentation mode.

What are the Types of Triggers?


This section describes the different types of triggers:

Row Triggers
A row trigger is fired each time the table is affected by the triggering statement. For example, if an UPDATE statement updates multiple rows of a table, a row trigger is fired once for each row affected by the UPDATE statement. If a triggering statement affects no rows, a row trigger is not run.
BEFORE and AFTER Triggers
When defining a trigger, you can specify the trigger timing--whether the trigger action is to be run before or after the triggering statement. BEFORE and AFTER apply to both statement and row triggers.
BEFORE and AFTER triggers fired by DML statements can be defined only on tables, not on views.
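As a small illustrative sketch (the orders table and the trigger names are hypothetical), here is a statement-level AFTER trigger next to a row-level BEFORE trigger:

-- Statement trigger: fires once per UPDATE statement, regardless of rows affected.
CREATE OR REPLACE TRIGGER trg_orders_stmt
AFTER UPDATE ON orders
BEGIN
  DBMS_OUTPUT.PUT_LINE('orders table was updated');
END;
/

-- Row trigger with BEFORE timing: fires once for each row the UPDATE touches.
CREATE OR REPLACE TRIGGER trg_orders_row
BEFORE UPDATE ON orders
FOR EACH ROW
BEGIN
  DBMS_OUTPUT.PUT_LINE('Row ' || :OLD.order_id || ': ' || :OLD.amount || ' -> ' || :NEW.amount);
END;
/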
Difference between Trigger and Procedure

Triggers: There is no need to execute a trigger manually; triggers are fired automatically. They run implicitly when an INSERT, UPDATE, or DELETE statement is issued against the associated table.

Stored Procedures: A procedure has to be executed manually, by an explicit call.
Differences between stored procedures and functions

Stored Procedure: May or may not return values.
Function: Should return at least one value; it can return additional values using OUT arguments.

Stored Procedure: Used to implement business logic.
Function: Used for calculations.

Stored Procedure: A pre-compiled object, stored in the database in compiled (pseudo-code) form.
Function: Not pre-compiled; parsed and compiled at run time.

Stored Procedure: Can accept any number of IN, OUT, and IN OUT parameters.
Function: Also accepts parameters, but returns its result through RETURN.

Stored Procedure: Mainly used to process tasks.
Function: Mainly used to compute values.

Stored Procedure: Cannot be invoked from SQL statements such as SELECT.
Function: Can be invoked from SQL statements such as SELECT.

Stored Procedure: Can affect the state of the database using COMMIT.
Function: Cannot affect the state of the database.
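A small sketch of the invocation difference in Oracle PL/SQL; the object names (get_bonus, log_message, emp) are hypothetical.

-- A function can be called from a SQL statement...
CREATE OR REPLACE FUNCTION get_bonus (p_sal IN NUMBER) RETURN NUMBER IS
BEGIN
  RETURN p_sal * 0.10;
END;
/

SELECT empno, get_bonus(sal) AS bonus FROM emp;

-- ...whereas a procedure has to be called explicitly, e.g. from PL/SQL or SQL*Plus:
CREATE OR REPLACE PROCEDURE log_message (p_msg IN VARCHAR2) IS
BEGIN
  DBMS_OUTPUT.PUT_LINE(p_msg);
END;
/

EXEC log_message('bonus calculation complete');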