Search This Blog

Monday, September 30, 2013

Data Mining: What is Data Mining?

Overview

Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue, cuts costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.

Continuous Innovation

Although data mining is a relatively new term, the technology is not. Companies have used powerful computers to sift through volumes of supermarket scanner data and analyze market research reports for years. However, continuous innovations in computer processing power, disk storage, and statistical software are dramatically increasing the accuracy of analysis while driving down the cost.

Example

For example, one Midwest grocery chain used the data mining capacity of Oracle software to analyze local buying patterns. They discovered that when men bought diapers on Thursdays and Saturdays, they also tended to buy beer. Further analysis showed that these shoppers typically did their weekly grocery shopping on Saturdays. On Thursdays, however, they only bought a few items. The retailer concluded that they purchased the beer to have it available for the upcoming weekend. The grocery chain could use this newly discovered information in various ways to increase revenue. For example, they could move the beer display closer to the diaper display. And, they could make sure beer and diapers were sold at full price on Thursdays.

Thursday, September 26, 2013

Import data from Excel to SQLServer

Many times we come across a situation where we have readily available excel sheets having enormous data that needs to be imported in database system for enhanced querying capabilities or as a backend datasource for some sort of software development.

We could achieve this goal either by copy-pasting the excel rows directly to the destined database table in SQL Server Management Studio or by querying directly on the excel sheet from SQL Server Management Studio itself!

The former way (copy-pasting) looks pretty simple but consider a situation where you are having 10K records in a single excel sheet and wanted to fetch some selective/filtered records only! At that time, the earlier way is much efficient and ideal.

I've created this sample excel file to import/retrieve the data in SQL server, and is also the file I will be using in this article. Here is the screenshot for the same.

Excel sheet - Cars

OPENROWSET() to query excel files

Here, I've used the OPENROWSET() function to query excel files. This is a T-SQL function that can be used to access any OLE DB data source. All you need is the right OLE DB driver.

The following are the queries to retrieve data from Excel files.

--Excel 2007-2010
SELECT * --INTO #Cars
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
'Excel 12.0 Xml;HDR=YES;Database=C:\cars.xlsx','SELECT * FROM [cars$]');

--Excel 97-2003
SELECT * --INTO #Cars
FROM OPENROWSET('Microsoft.ACE.OLEDB.4.0',
'Excel 8.0;HDR=YES;Database=C:\cars.xls','SELECT * FROM [cars$]');

SQL Server blocked access to STATEMENT OpenRowset OpenDatasource of component Ad Hoc Distributed Queries



Error :


SQL Server blocked access to STATEMENT 'OpenRowset/OpenDatasource' of component 'Ad Hoc Distributed Queries' because this component is turned off as part of the security configuration for this server. A system administrator can enable the use of 'Ad Hoc Distributed Queries' by using sp_configure. For more information about enabling 'Ad Hoc Distributed Queries', see "Surface Area Configuration" in SQL Server Books Online.

Solution :


Solution for this common error in Sql Server is to enable the OpenRowset in Sql Server. We can do it by enabling theShowAdvanced Options and Ad hoc Distributed Queries. this error also appears when you execute some Adhoc queries in Sql Server Query analyzer. If you enable this for the first time, The further statements using this option will go fine without flaw.

To enable the ShowAdvanced Options, Please follow/use the below TSQL statements



SP_CONFIGURE 'SHOW ADVANCED OPTIONS', 1 
GO 
RECONFIGURE 
GO



After executing the above staement if you try to execute or use OpenRowSet or any TSQL commands that requires the Ad Hoc Queries option to be enabled, you would get the below error

Msg 15281, Level 16, State 1, Line 1 SQL Server blocked access to STATEMENT 'OpenRowset/OpenDatasource' of component 'Ad Hoc Distributed Queries' because this component is turned off as part of the security configuration for this server. A system administrator can enable the use of 'Ad Hoc Distributed Queries' by using sp_configure. For more information about enabling 'Ad Hoc Distributed Queries', see "Surface Area Configuration" in SQL Server Books Online.

We can get Rid of the above error using the below TSQL commands


SP_CONFIGURE 'Ad Hoc Distributed Queries', 1 
GO 
RECONFIGURE 
GO 



Once the above TSQL statements are getting executed, you would be able to use the OpenRowSet Statement successfully. To Enable these two options you must have the sysadmin role. If you do not have this role, please contact your database administrator for doing this operation

Monday, September 16, 2013

Understanding SQL Server Configuration Manager

By , 15 Sep 2013 on codeproject

Introduction
SQL Server Configuration Manager is a tool to configure the network protocols used by the SQL Server, and to manage network connectivity.
We will find SQL Server Configuration Manager API in the below path:
Start -> Microsoft SQL Server 2008 R2 -> Configuration Tools -> SQL Server Configuration Manager
SQL Server Configuration Manager gets information for WMI (Windows Management Instrumentation) Provider which takes information for “SQLServerManager10.msc” which is located in “C:\Windows\System32\” path. Generally SQL Server Configuration Manager will use to Start, Stop, resume and Configure services of SQL Server.

Left pane has list of configurations we can do for SQL Server.
  • SQL Server Services
  • SQL Server Network Configuration
  • SQL Server Native Client Configuration

Find Which Column in Which Table Updated from logfile

Introduction

Sometimes, database developers need to work with a database in which they don't have any permission to add any procedure or trigger, because that database is for someone else but they need change tracking. In this tip, I will show a solution for this problem.

Background

First of all, I want to appreciate Muhammad Imran for the great information on his site:http://raresql.com/2012/02/01/how-to-recover-modified-records-from-sql-server-part-1.
Then, as a brief history and description: In SQLServer, the log file (LDF) is a Microsoft solution for implementing transactions, and rollback action uses the log file for rolling back, so we can use the LDF file to track data.

Using the Code

SQL Server has undocumented T-function sys.fn_dblog that is used for reading log file and procedure DBCC PAGE that is used for details of data page like allocation_UnitIdFieldsValuesSlotId.
Create PROCEDURE Find_UpdatedColumn_In_Table
@Database_Name NVARCHAR(MAX),
@SchemaName_n_TableName NVARCHAR(MAX),
@Date_From datetime='1900/01/01',
@Date_To datetime ='9999/12/31',
@FeildName nvarchar(max)
AS
begin
DECLARE @parms nvarchar(1024)
DECLARE @Fileid INT
DECLARE @Pageid INT
DECLARE @Slotid INT

DECLARE @ConsolidatedPageID VARCHAR(MAX)
Declare @AllocUnitID as bigint
Declare @TransactionID as VARCHAR(MAX)
Declare @Operation as VARCHAR(MAX)
Declare @DatabaseCollation VARCHAR(MAX)

/*  Pick The actual data
*/
declare @temppagedata table 
(
[ParentObject] sysname,
[Object] sysname,
[Field] sysname,
[Value] sysname)

declare @pagedata table 
(
[Page ID] sysname,
[AllocUnitId] bigint,
[ParentObject] sysname,
[Object] sysname,
[Field] sysname,
[Value] sysname)

    DECLARE Page_Data_Cursor CURSOR FOR 
    /*We need to filter LOP_MODIFY_ROW,LOP_MODIFY_COLUMNS from log 
    for modified records & Get its Slot No, Page ID & AllocUnit ID*/
    SELECT [PAGE ID],[Slot ID],[AllocUnitId]
    FROM    sys.fn_dblog(NULL, NULL)  
    WHERE    
    AllocUnitId IN 
    (Select [Allocation_unit_id] from sys.allocation_units allocunits
    INNER JOIN sys.partitions partitions ON (allocunits.type IN (1, 3)  
    AND partitions.hobt_id = allocunits.container_id) OR (allocunits.type = 2 
    AND partitions.partition_id = allocunits.container_id)  
    Where object_id=object_ID('' + @SchemaName_n_TableName + ''))
    AND Operation IN ('LOP_MODIFY_ROW','LOP_MODIFY_COLUMNS')  _
    AND [Context] IN ('LCX_HEAP','LCX_CLUSTERED')
    /*Use this subquery to filter the date*/

    AND [TRANSACTION ID] IN (SELECT DISTINCT _
    [TRANSACTION ID] FROM    sys.fn_dblog(NULL, NULL) 
    WHERE Context IN ('LCX_NULL') AND Operation in ('LOP_BEGIN_XACT')  
    --AND [Transaction Name]='UPDATE'
    AND  CONVERT(NVARCHAR(11),[Begin Time]) BETWEEN @Date_From AND @Date_To)

    /****************************************/

    GROUP BY [PAGE ID],[Slot ID],[AllocUnitId]
    ORDER BY [Slot ID]    
 
    OPEN Page_Data_Cursor

    FETCH NEXT FROM Page_Data_Cursor INTO @ConsolidatedPageID, @Slotid,@AllocUnitID

    WHILE @@FETCH_STATUS = 0
    BEGIN
        DECLARE @hex_pageid AS VARCHAR(Max)
        /*Page ID contains File Number and page number It looks like 0001:00000130.
          In this example 0001 is file Number &  
          00000130 is Page Number & These numbers are in Hex format*/
        SET @Fileid=SUBSTRING(@ConsolidatedPageID,0,CHARINDEX_
        (':',@ConsolidatedPageID)) -- Separate File ID from Page ID
        SET @hex_pageid ='0x'+ SUBSTRING(@ConsolidatedPageID,CHARINDEX_
        (':',@ConsolidatedPageID)+1,Len(@ConsolidatedPageID))  ---Separate the page ID
       
        SELECT @Pageid=Convert(INT,cast('' AS XML).value('xs:hexBinary_
        (substring(sql:variable("@hex_pageid"),sql:column("t.pos")) )', _
        'varbinary(max)')) -- Convert Page ID from hex to integer
        FROM (SELECT CASE substring(@hex_pageid, 1, 2) WHEN '0x' THEN 3 ELSE 0 END) AS t(pos) 
                    
        DELETE @temppagedata
        -- Now we need to get the actual data (After modification) from the page
        INSERT INTO @temppagedata EXEC( 'DBCC PAGE(' + @DataBase_Name + ', ' + _
        @fileid + ', ' + @pageid + ', 3) with tableresults,no_infomsgs;'); 
        -- Add Page Number and allocUnit ID in data to identity which one page it belongs to.                    
        INSERT INTO @pagedata SELECT @ConsolidatedPageID,@AllocUnitID,_
        [ParentObject],[Object],[Field] ,[Value] FROM @temppagedata

        FETCH NEXT FROM Page_Data_Cursor INTO  @ConsolidatedPageID, @Slotid,@AllocUnitID
    END

CLOSE Page_Data_Cursor
DEALLOCATE Page_Data_Cursor
DECLARE @Newhexstring VARCHAR(MAX);

--select * from  @PageData
--select * from  @temppagedata
DECLARE @ModifiedRawData TABLE
(
  [ID] INT IDENTITY(1,1),
  [PAGE ID] VARCHAR(MAX),
  [Slot ID] INT,
  [AllocUnitId] BIGINT,
  [RowLog Contents 0_var] VARCHAR(MAX),
  [RowLog Contents 0] VARBINARY(8000)
)
--The modified data is in multiple rows in the page, 
--so we need to convert it into one row as a single hex value.
--This hex value is in string format
INSERT INTO @ModifiedRawData ([PAGE ID],[Slot ID],[AllocUnitId]
,[RowLog Contents 0_var])
SELECT B.[PAGE ID],A.[Slot ID],A.[AllocUnitId]
,
(
SELECT REPLACE(STUFF((SELECT REPLACE(SUBSTRING([VALUE],CHARINDEX(':',[Value])+1,48),'†','')
FROM @pagedata C  
WHERE B.[Page ID]= C.[Page ID] And A.[Slot ID] =LTRIM(RTRIM_
(SUBSTRING(C.[ParentObject],5,3))) And [Object] Like '%Memory Dump%' 
Group By [Value] FOR XML PATH('') ),1,1,'') ,' ','')
) AS [Value]

FROM sys.fn_dblog(NULL, NULL) A
INNER JOIN @pagedata B On A.[PAGE ID]=B.[PAGE ID]
AND A.[AllocUnitId]=B.[AllocUnitId]
AND A.[Slot ID] =LTRIM(RTRIM(Substring(B.[ParentObject],5,3)))
AND B.[Object] Like '%Memory Dump%'
WHERE A.AllocUnitId IN 
(Select [Allocation_unit_id] from sys.allocation_units allocunits
INNER JOIN sys.partitions partitions ON (allocunits.type IN (1, 3)
AND partitions.hobt_id = allocunits.container_id) OR (allocunits.type = 2 
AND partitions.partition_id = allocunits.container_id)  
Where object_id=object_ID('' + @SchemaName_n_TableName + ''))
AND Operation in ('LOP_MODIFY_ROW','LOP_MODIFY_COLUMNS') _
AND [Context] IN ('LCX_HEAP','LCX_CLUSTERED')
/*Use this subquery to filter the date*/

AND [TRANSACTION ID] IN (Select DISTINCT [TRANSACTION ID] FROM sys.fn_dblog(NULL, NULL) 
Where Context IN ('LCX_NULL') AND Operation IN ('LOP_BEGIN_XACT')  
AND [Transaction Name]='UPDATE'
AND  CONVERT(NVARCHAR(11),[Begin Time]) BETWEEN @Date_From AND @Date_To)

/****************************************/
GROUP BY B.[PAGE ID],A.[Slot ID],A.[AllocUnitId]--,[Transaction ID]
ORDER BY [Slot ID]

-- Convert the hex value data in string, convert it into Hex value as well. 
UPDATE @ModifiedRawData  SET [RowLog Contents 0] = cast('' AS XML).value_
('xs:hexBinary(substring(sql:column("[RowLog Contents 0_var]"), 0) )', 'varbinary(max)')
FROM @ModifiedRawData

-- 
select Field, Value  from @temppagedata t
inner join @ModifiedRawData m on LTRIM(RTRIM(SUBSTRING_
(t.[ParentObject],5,3)))  = cast(m.[Slot ID] as nvarchar(1000))
where t.Field =  @FeildName

end

EXEC Find_UpdatedColumn_In_Table  'Test','dbo.Test','2000/01/01','9999/12/31',N'id'

Note

Something to note is that this proc just works with the current log, and if we need to work with log back up, we need to restore.
By 13 Sep 2013