Monday, September 8, 2014


1. Sampling is to select a random subset of given set of data. Suppose we have 1000 rows in our table. And we have to select any 10 rows randomly, then we use sampling transformation.

                In SSIS, we can do sampling in 2 ways i.e.  Row Sampling and Percentage Sampling.

In this BOK, we are going to see how to use Row Sampling transformation in SSIS Packaging. Row sampling is used to randomly select some rows and move it as output to the required process as and when required to divide the rows. Example of using this transformation is say example if we want to select some randomly 10 users of a community for a random prize then we can use this transformation. In this process we are going to see an example on how to use this process for the selected output values.

 

Steps:

First select the right project to work on integration services project. Now once the projects is opened drag and drop a source and a row sampling task as shown in the screen below


 

Now we can see the red mark on to the control which indicates that the controls are yet to be configured. Now configure the source connection with selecting any table of your choice.  Now your output will look like below


 

 

 

 

 

 

 

 

 

 

 

 

 

 

Now we need to configure the Row Sampling task, double click the task will open the screen as below –


 

Here we can specify the number of random rows can be selected and the name for the Sample output selected and unselected. Since in our example we are going to see the Selected Output we will see on how to use it. Now after specifying the number of rows just clicks on OK button. Now drag and drop the Flat File Destination as shown in the below screen –

 


Now drag the green line from the Row Sampling to the Flat File destination task. It will open a window as shown in the below screen


 

Here we have selected the Selected output as the Output to get the data as we can see in the screen above. We can also select only that columns that we want . We don’t need to fetch all the columns if not required as below in columns tab of row sampling transformation-

 

 

 

 

 

 

 

 

 

 

Now click on OK to do the further configuration of the destination as shown in the below screen –


 

 

 

 

 

 

 

 

 

 

 

 

Once the configuration is done we can see the screen as below-


 

 

 

 

 

 

 

 

Now our package is ready to execute. Press F5 and execute the project we can see the screen as below which indicates that the package is executed successfully –




 

We can see only 10 rows are randomly selected and passed to the destination. We can see the output by opening the file which we gave in the destination configuration section. You can see the result as shown in the screen below –


 


Conclusion: So in this article we have seen how to use the Row Sampling to execute dataset and split based on the number of rows and uses it across the requirement.
2. Sampling is to select a random subset of given set of data. Suppose we have 1000 rows in our table. And we have to select any 10 rows randomly, then we use sampling transformation.
                In SSIS, we can do sampling in 2 ways i.e.  Row Sampling and Percentage Sampling.
In this BOK, we are going to see how to use Percentage Sampling transformation in SSIS Packaging. Percentage sampling is used to randomly select some rows and move it as output to the required process as and when required to divide the rows. Example of using this transformation is say example if we want to select some randomly 20% users of a community for a random prize then we can use this transformation. In this process we are going to see an example on how to use this process for the selected output values.
 
Steps:
First select the right project to work on integration services project. Now once the projects is opened drag and drop a source, a destination and a percentage sampling task as shown in the screen below

We can see some red marks on each task which indicates that the tasks are not configured. We need to configure each task so that while execution we can have a smooth process. Now let’s configure each and every task to execute the package. First let us start with the OLEDB Source as shown in the screen below –
 
 
 
Now we are done with the source, we need to configure the percentage sampling task now. To do that double click on the task will open the window as shown in the screen below. Here we need to specify the percentage of rows to be affected in this transformation and to proceed further. In our sample we are going to select as 40 as shown in the screen below – 
 
 
 


 
 
 
 
 
 
 
 
 
 
 
Now we are done with the Percentage sampling task, we need to configure the destination section where the results are expected. To do that drag and drop the green arrow to the destination task which we created earlier. It will open a configuration window to select the output name from the percentage sampling task as shown in the screen below
 
Now we need to configure the destination excel as shown in the screens below which is self-explanatory.
 
 
Now we are ready with our package. We need to build and execute it to see the desired result. So our screen will look like below
 

 
Now to build and execute press F5 and we can see the result window as shown in the screen below
 

 
Here you can see that out of 29 rows, 40% rows that are approximate to 12 rows randomly have been sent to excel destination.
 
We can see the number of rows affected and used across. To see the result in the excel navigate to the path where we configured our destination and open the excel, we can see the result as shown in the screen below
 
 
Conclusion: So in this article we have seen how to use the Percentage Sampling to execute data set and split based on the percent and uses it across the requirement