Google
 

Monday, May 11, 2009

Comparing Files from Different Folders Using SQL Server and XML

Problem
Sometimes there is a need to process files in a folder, but first you need to determine which files need to be processed compared to older files that have already been processed. There are several ways that this can be done, but in this tip I show you a way this can be done using SQL Server and XML.
To further illustrate the issue I have set this up into three folders:
Source - files to be processed
Destination - files that have been processed
NewFiles - holding folder for new files to be processed after they have been compared
Elaborating the problem in depth, let’s say I have three text files (or any type of file) in the Source folder and the Destination folder contains two files. The problem is to identify differences between the files in the Source and Destination folders and copy all the modified or new files from the Source Folder to the folder NewFiles.
Shown below are the set of files with their modified date in the Source folder:
Shown below are the set of files with their modified date in Destination folder:
Looking at the images above, it is clear that Source Folder contains one extra file ‘text3’ and one modified file ‘text2’.
I want to compare the files in these two folders and copy the modified file (text2) and the new file (text3) into folder NewFiles.
SolutionMy first task was to get all the files from the specified folder. This can be achieved via the ‘DIR’ command of DOS. DOS commands can be executed in SQL Server through xp_cmdshell system stored procedure, but it needs to be configured first.
The below code creates a new stored procedure called "MakeXML" which will create a XML document of all of the files in a folder. The code does a few things which you can either turn on or off as desired.
It first drops stored procedure MakeXML so it can create a new sp with this name
It enables xp_cmshell so we can shell out to DOS to get the folder listing
It does a DIR command against the specified folder with the /O-D parameter to order the list by modified date
It then inserts the data into a temporary table
Deletes any rows from this table that are not files that we want to compare
Parses the modified date
Parses the filename
Then it creates this data as a temporary XML object
IF OBJECT_ID (N'MakeXML') IS NOT NULL DROP PROCEDURE MakeXML GO -------INPUT THE FILEPATH FOLDER AND IT WILL RETURN AN XML CREATE PROCEDURE MakeXML @FilePath VARCHAR(2000), @xmlFile XML OUTPUT AS DECLARE @myfiles TABLE (MyID INT IDENTITY(1,1) PRIMARY KEY,FullPath VARCHAR(2000),ModifiedDate datetime, FileName VARCHAR(100)) DECLARE @CommandLine VARCHAR(4000) ----Configure xp_cmdshell in 2005 -- To allow advanced options to be changed. EXEC sp_configure 'show advanced options', 1 -- To update the currently configured value for advanced options. RECONFIGURE -- To enable the feature. EXEC sp_configure 'xp_cmdshell', 1 -- To update the currently configured value for this feature. RECONFIGURE ----- run dir command on the path specified by ordering on the date modified SELECT @CommandLine =LEFT('dir "' + @FilePath + '" /O-D',10000) ------use xp_cmdshell stored procedure to run dos commands INSERT INTO @MyFiles (FullPath) EXECUTE xp_cmdshell @CommandLine ------Delete all the files that are other than .txt or can specify the file type you want to compare on DELETE FROM @MyFiles WHERE fullpath IS NULL OR Fullpath NOT LIKE '%.txt%' ------Separate the modified date from the fullpath UPDATE @myfiles SET ModifiedDate=SUBSTRING(fullpath,1,PATINDEX('%[AP]M%',fullpath)+1) ------Separate out FileName from fullpath UPDATE @myfiles SET FileName=SUBSTRING(fullpath,40,LEN(fullpath)) -----Make an xml out of separated ModifiedDate,FileName columns and SET the OUTPUT parameter SELECT @xmlFile=(SELECT ModifiedDate,FileName FROM @MyFiles FOR XML PATH('FILE'), ROOT('ROOT') ) GO
For our example the stored procedure would return XML for the Source folder ordered on modified date as shown below.
For our example the stored procedure would return XML for the Destination folder ordered on modified date as shown below.
Now, the next task is to compare the two XML documents from Source and Destination, this is achieved by the following stored procedure.
It takes three input parameters
@Sourcepath which is ‘D:\Source’
@DestPath is ‘D:\Destination’ and
@NewFilesPath as ‘D:\NewFiles’
IF OBJECT_ID (N'CompareXML') IS NOT NULL DROP PROCEDURE CompareXML GO --------COMPARE XML CREATE PROCEDURE CompareXML @SourcePath VARCHAR(255), @DestPath VARCHAR(255), @NewFilesPath VARCHAR(255) AS DECLARE @SourceXML AS XML,@DestXML AS XML ---Call the MakeXML sp to return the XML for Source and Destination Folders EXECUTE MakeXML @SourcePath,@SourceXML OUTPUT EXECUTE MakeXML @DestPath,@DestXML OUTPUT DECLARE @CURSOR_File AS CURSOR ---------Join both the tables retrieved from xml on filename and get all the files having modifieddate mismatch ---------Fetch all the values into a cursor SET @CURSOR_File= CURSOR FOR SELECT A.FNAME FROM ( SELECT N.value('ModifiedDate[1]','varchar(100)') AS MDATE , N.value('FileName[1]','varchar(100)') AS FNAME FROM @SourceXML.nodes('ROOT/FILE') AS X(N) )A INNER JOIN ( SELECT N.value('ModifiedDate[1]','varchar(100)') AS MDATE , N.value('FileName[1]','varchar(100)') AS FNAME FROM @DestXML.nodes('ROOT/FILE') AS X(N) )B ON A.FNAME=B.FNAME AND A.MDATE<>B.MDATE UNION ----------Fetch all the extra files in the source folder SELECT A.FNAME FROM ( SELECT N.value('ModifiedDate[1]','varchar(100)') AS MDATE , N.value('FileName[1]','varchar(100)') AS FNAME FROM @SourceXML.nodes('ROOT/FILE') AS X(N) )A LEFT OUTER JOIN ( SELECT N.value('ModifiedDate[1]','varchar(100)') AS MDATE , N.value('FileName[1]','varchar(100)') AS FNAME FROM @DestXML.nodes('ROOT/FILE') AS X(N) )B ON A.FNAME=B.FNAME WHERE b.fname IS NULL DECLARE @file AS VARCHAR(255) DECLARE @Command AS VARCHAR(500) --------First copy all the files found mismatched on modified date into @NewFilesPath OPEN @CURSOR_File FETCH NEXT FROM @CURSOR_File INTO @file WHILE @@Fetch_Status=0 BEGIN SET @Command='xcopy "'+@SourcePath+'\'+ @file+ '" "' + @NewFilesPath + '"' EXEC xp_cmdshell @Command,no_output FETCH NEXT FROM @CURSOR_File INTO @file END CLOSE @CURSOR_File DEALLOCATE @CURSOR_File GO
The CompareXML stored procedure above will first call the ‘MakeXML’ sp and get the source and destination xml. It will fetch all the values from the nodes of both the XML documents and make two types of comparison on those.
First the comparison would be on the basis of filename having different modified date values.
The second comparison will fetch all the files from Source folder that do not exist in Destination folder.
The output of this is put into a Cursor so that such files are then copied to the ‘NewFiles’ folder one by one using the DOS xcopy command.
After the two stored procedures have been created the code is executed as follows:
EXECUTE CompareXML @SourcePath='D:\Source', @DestPath='D:\Destination', @NewFilesPath='D:\NewFiles'
Based on our example the "NewFiles" folder will now look like this.
The NewFiles folder has the same set of files which had a different modified date and the file which was extra in the Source folder. Now, just copy the files from this NewFiles folder to Destination folder.
The problem of comparing files based on modified date and filename has been drastically reduced compared to doing it manually.

No comments: