Monday, May 11, 2009
Comparing Files from Different Folders Using SQL Server and XML
Sometimes there is a need to process files in a folder, but first you need to determine which files need to be processed compared to older files that have already been processed. There are several ways that this can be done, but in this tip I show you a way this can be done using SQL Server and XML.
To further illustrate the issue I have set this up into three folders:
Source - files to be processed
Destination - files that have been processed
NewFiles - holding folder for new files to be processed after they have been compared
Elaborating the problem in depth, let’s say I have three text files (or any type of file) in the Source folder and the Destination folder contains two files. The problem is to identify differences between the files in the Source and Destination folders and copy all the modified or new files from the Source Folder to the folder NewFiles.
Shown below are the set of files with their modified date in the Source folder:
Shown below are the set of files with their modified date in Destination folder:
Looking at the images above, it is clear that Source Folder contains one extra file ‘text3’ and one modified file ‘text2’.
I want to compare the files in these two folders and copy the modified file (text2) and the new file (text3) into folder NewFiles.
SolutionMy first task was to get all the files from the specified folder. This can be achieved via the ‘DIR’ command of DOS. DOS commands can be executed in SQL Server through xp_cmdshell system stored procedure, but it needs to be configured first.
The below code creates a new stored procedure called "MakeXML" which will create a XML document of all of the files in a folder. The code does a few things which you can either turn on or off as desired.
It first drops stored procedure MakeXML so it can create a new sp with this name
It enables xp_cmshell so we can shell out to DOS to get the folder listing
It does a DIR command against the specified folder with the /O-D parameter to order the list by modified date
It then inserts the data into a temporary table
Deletes any rows from this table that are not files that we want to compare
Parses the modified date
Parses the filename
Then it creates this data as a temporary XML object
IF OBJECT_ID (N'MakeXML') IS NOT NULL DROP PROCEDURE MakeXML GO -------INPUT THE FILEPATH FOLDER AND IT WILL RETURN AN XML CREATE PROCEDURE MakeXML @FilePath VARCHAR(2000), @xmlFile XML OUTPUT AS DECLARE @myfiles TABLE (MyID INT IDENTITY(1,1) PRIMARY KEY,FullPath VARCHAR(2000),ModifiedDate datetime, FileName VARCHAR(100)) DECLARE @CommandLine VARCHAR(4000) ----Configure xp_cmdshell in 2005 -- To allow advanced options to be changed. EXEC sp_configure 'show advanced options', 1 -- To update the currently configured value for advanced options. RECONFIGURE -- To enable the feature. EXEC sp_configure 'xp_cmdshell', 1 -- To update the currently configured value for this feature. RECONFIGURE ----- run dir command on the path specified by ordering on the date modified SELECT @CommandLine =LEFT('dir "' + @FilePath + '" /O-D',10000) ------use xp_cmdshell stored procedure to run dos commands INSERT INTO @MyFiles (FullPath) EXECUTE xp_cmdshell @CommandLine ------Delete all the files that are other than .txt or can specify the file type you want to compare on DELETE FROM @MyFiles WHERE fullpath IS NULL OR Fullpath NOT LIKE '%.txt%' ------Separate the modified date from the fullpath UPDATE @myfiles SET ModifiedDate=SUBSTRING(fullpath,1,PATINDEX('%[AP]M%',fullpath)+1) ------Separate out FileName from fullpath UPDATE @myfiles SET FileName=SUBSTRING(fullpath,40,LEN(fullpath)) -----Make an xml out of separated ModifiedDate,FileName columns and SET the OUTPUT parameter SELECT @xmlFile=(SELECT ModifiedDate,FileName FROM @MyFiles FOR XML PATH('FILE'), ROOT('ROOT') ) GO
For our example the stored procedure would return XML for the Source folder ordered on modified date as shown below.
For our example the stored procedure would return XML for the Destination folder ordered on modified date as shown below.
Now, the next task is to compare the two XML documents from Source and Destination, this is achieved by the following stored procedure.
It takes three input parameters
@Sourcepath which is ‘D:\Source’
@DestPath is ‘D:\Destination’ and
@NewFilesPath as ‘D:\NewFiles’
IF OBJECT_ID (N'CompareXML') IS NOT NULL DROP PROCEDURE CompareXML GO --------COMPARE XML CREATE PROCEDURE CompareXML @SourcePath VARCHAR(255), @DestPath VARCHAR(255), @NewFilesPath VARCHAR(255) AS DECLARE @SourceXML AS XML,@DestXML AS XML ---Call the MakeXML sp to return the XML for Source and Destination Folders EXECUTE MakeXML @SourcePath,@SourceXML OUTPUT EXECUTE MakeXML @DestPath,@DestXML OUTPUT DECLARE @CURSOR_File AS CURSOR ---------Join both the tables retrieved from xml on filename and get all the files having modifieddate mismatch ---------Fetch all the values into a cursor SET @CURSOR_File= CURSOR FOR SELECT A.FNAME FROM ( SELECT N.value('ModifiedDate[1]','varchar(100)') AS MDATE , N.value('FileName[1]','varchar(100)') AS FNAME FROM @SourceXML.nodes('ROOT/FILE') AS X(N) )A INNER JOIN ( SELECT N.value('ModifiedDate[1]','varchar(100)') AS MDATE , N.value('FileName[1]','varchar(100)') AS FNAME FROM @DestXML.nodes('ROOT/FILE') AS X(N) )B ON A.FNAME=B.FNAME AND A.MDATE<>B.MDATE UNION ----------Fetch all the extra files in the source folder SELECT A.FNAME FROM ( SELECT N.value('ModifiedDate[1]','varchar(100)') AS MDATE , N.value('FileName[1]','varchar(100)') AS FNAME FROM @SourceXML.nodes('ROOT/FILE') AS X(N) )A LEFT OUTER JOIN ( SELECT N.value('ModifiedDate[1]','varchar(100)') AS MDATE , N.value('FileName[1]','varchar(100)') AS FNAME FROM @DestXML.nodes('ROOT/FILE') AS X(N) )B ON A.FNAME=B.FNAME WHERE b.fname IS NULL DECLARE @file AS VARCHAR(255) DECLARE @Command AS VARCHAR(500) --------First copy all the files found mismatched on modified date into @NewFilesPath OPEN @CURSOR_File FETCH NEXT FROM @CURSOR_File INTO @file WHILE @@Fetch_Status=0 BEGIN SET @Command='xcopy "'+@SourcePath+'\'+ @file+ '" "' + @NewFilesPath + '"' EXEC xp_cmdshell @Command,no_output FETCH NEXT FROM @CURSOR_File INTO @file END CLOSE @CURSOR_File DEALLOCATE @CURSOR_File GO
The CompareXML stored procedure above will first call the ‘MakeXML’ sp and get the source and destination xml. It will fetch all the values from the nodes of both the XML documents and make two types of comparison on those.
First the comparison would be on the basis of filename having different modified date values.
The second comparison will fetch all the files from Source folder that do not exist in Destination folder.
The output of this is put into a Cursor so that such files are then copied to the ‘NewFiles’ folder one by one using the DOS xcopy command.
After the two stored procedures have been created the code is executed as follows:
EXECUTE CompareXML @SourcePath='D:\Source', @DestPath='D:\Destination', @NewFilesPath='D:\NewFiles'
Based on our example the "NewFiles" folder will now look like this.
The NewFiles folder has the same set of files which had a different modified date and the file which was extra in the Source folder. Now, just copy the files from this NewFiles folder to Destination folder.
The problem of comparing files based on modified date and filename has been drastically reduced compared to doing it manually.
Friday, May 8, 2009
How to Export data from SQL Server to Excel
Problem
Exporting data from SQL Server to Excel seems like a reasonably simple request.  I just need to write out a few reports for users on a regular basis, nothing too fancy, the same basic report with a few different parameters.  What native SQL Server options are available to do so?  Do I need to learn another tool or can I use some T-SQL commands?  Does SQL Server 2005 offer any new options to enhance this process?
Solution
Exporting data from SQL Server to Excel can be achieved in a variety of ways.  Some of these options include Data Transformation Services (DTS), SQL Server Integration Services (SSIS) and Bulk Copy (BCP).  Data Transformation Services (SQL Server 2000) and SQL Server Integration Services (SQL Server 2005) offers a GUI where widgets can be dragged and dropped Each option has advantages and disadvantages, but all can do the job.  It is just a matter of your comfort level with the tools and the best solution to meet the need.  
Another option that is available directly via the T-SQL language is the OPENROWSET command (SQL Server 2000 and SQL Server 2005). This command can be called directly in any stored procedure, script or SQL Server Job from T-SQL. Below outlines the full syntax available:
Source - SQL Server 2005 Books Online
Below is a simple example of writing out the Job name and date to Sheet1 of an Excel spreadsheet in either SQL Server 2005 or 2000:
| INSERT INTO OPENROWSET('Microsoft.Jet.OLEDB.4.0', 'Excel 8.0;Database=C:\testing.xls;', 'SELECT Name, Date FROM [Sheet1$]') SELECT [Name], GETDATE() FROM msdb.dbo.sysjobs GO | 
Using the OPENROWSET command creates two caveats. The first caveat is the need to have an Excel spreadsheet serve as a template in the needed directory with the correct worksheets and columns. Without this the you would receive an error message. The second caveat is that it is necessary to enable the OPENROWSET command with the SQL Server 2005 Surface Area Configuration utility. Note - This step is not needed for SQL Server 2000. With the loop example you could copy and paste the Excel spreadsheet and load the data as needed.
Although the example above is very simple, you could integrate this simple example into your code. For example, if you had the need to write out a number of reports in the same format, you could loop (WHILE loop or Cursor) over the records and write them out to different Excel spreadsheets based on the name or report type. In addition, you could integrate this code with either SQL Server 2000 mail or Database mail (SQL Server 2005) and mail the results to the users with very little effort and build the process entirely with T-SQL.

 
