Careful File Copy

This area of the site was getting really chatty, so I've removed it from the Software book, and moved it under DIY notes. The remainder of the experience will be put on the blog. Consider all the material on this page obsolete.

Prologue

I was asked to help migrate a large batch of ArcMap GIS files to a new server. The problems: the files contain references to other files, and all those files must also be copied. Also, these files are mixed in with other files, not pertinent to GIS, on a single server. To manage growth, it's necessary to move the GIS files out.

Also, it's not a simple file copy. ArcMap can store the references in absolute or relative form. At this office they were stored as absolute paths, because that's more reliable. Thus, for each map document, it's necessary to flip this setting to "relative," copy the files over, and then flip it back to "absolute."
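To make the absolute/relative distinction concrete, here is a minimal Python sketch of how a stored layer path can be written relative to the map document's folder and resolved again after the move. This is not ArcMap's own logic, only an illustration of the idea. The function names and example paths are hypothetical, and ntpath is used so that Windows-style paths behave the same on any platform.

```python
import ntpath  # Windows path rules, usable on any OS


def to_relative(layer_path, mxd_path):
    """Express layer_path relative to the folder holding the .mxd."""
    return ntpath.relpath(layer_path, start=ntpath.dirname(mxd_path))


def to_absolute(relative_path, mxd_path):
    """Resolve a stored relative path against the .mxd's (new) folder."""
    base = ntpath.dirname(mxd_path)
    return ntpath.normpath(ntpath.join(base, relative_path))


# Hypothetical locations before and after the move
old_mxd = r"G:\proj\maps\site.mxd"
new_mxd = r"H:\gis\proj\maps\site.mxd"
rel = to_relative(r"G:\proj\data\roads.shp", old_mxd)
print(rel)
print(to_absolute(rel, new_mxd))
```

A relative reference survives the move only if the data keeps the same position relative to the map document, which is why the referenced files have to be copied along with it.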

Due to the large number of files, and the slow speed of ArcMap, I decided to try and script the process. This sub-site details some of what I've learned in the process.

For a little more info: http://www.acadweb.wwu.edu/gis/tutorials/ArcMap_File_Mgmt.htm

Part 1

These are the product of the first iteration of this project. It succeeded in some ways, but failed in other ways. It succeeded in processing around 150 files before it crashed and failed to process the remaining 1,400 or so.

I concluded that a longer-term project was justified, due to the tedious and slow nature of this task. (That is, the tedium of copying files exceeded the tedium of reading hundreds of pages of VB help docs, which use "enterprise"-style code examples that usually don't do anything useful, as presented.)

The code

Taken together, these scripts form an almost-functional system. Some of the scripts are installed in an Excel document, and others are installed into the Normal.mxt template in ArcMap. The perl file copier script should be run as a scheduled task. It runs under ActivePerl.

The big problems, so far: VBA doesn't handle OLE server timeouts well; ArcMap chokes on some files; the scripts use the IMxDocument interface instead of the IMapDocument, which might be faster; the scripts pause for one minute while the mxd file loads, instead of polling the app to see if the file is loaded.
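The fixed one-minute pause could be replaced by polling with a timeout. The Python sketch below shows the general pattern only; the readiness check is a stand-in, since how to ask ArcMap whether a document has finished loading is not worked out here. The names wait_until, timeout_s and interval_s are hypothetical.

```python
import time


def wait_until(is_ready, timeout_s=60.0, interval_s=0.5):
    """Poll is_ready() until it returns True or timeout_s elapses.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Stand-in for "has the document finished loading?" (hypothetical check)
start = time.monotonic()
loaded = wait_until(lambda: time.monotonic() - start > 0.05,
                    timeout_s=2.0, interval_s=0.01)
print("loaded" if loaded else "timed out")
```

Returning False on timeout, instead of waiting forever, gives the caller a chance to mark the file as failed and move on to the next one.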

The small problems, so far: it'd be better to have the file copier written in VBA; the file format for the manifest files (generated by a script from the esri dev site) should be written for computer processing; using Excel as the process db is kind of cheesy.

Being a noob, I didn't realize that Microsoft's idea of "Automation" was not very thorough. OLE automation, as implemented with Excel and ArcMap, isn't stable enough to do real batch processing. With VB (not VBA) driving ArcMap, I suspect it's possible, but ArcMap will still not provide good error handling.

The code below has the following interesting features:

References

The LayerSourceArray code from the ESRI dev site.

http://search.cpan.org/~jdb/libwin32-0.26/OLE/lib/Win32/OLE.pm

about using multiple interfaces via scripting (you don't)

http://support.microsoft.com/kb/193247/en-us?spid=2513&sid=946

http://www.microsoft.com/technet/scriptcenter/guide/sas_wmi_jgfx.mspx?mfr=true

Attachment                   Size
sapphos.bas.txt              1.29 KB
filecopy.pl.txt              6.1 KB
FileBatcher.cls.txt          2.82 KB
FileSystemScanner.cls.txt    1.56 KB
main.bas.txt                 4.84 KB

More VBA Sample Code

Here's some more code to use.


Sub test()
    Dim pDoc As IDocument
    Dim pApp As IApplication
    
    Set pDoc = New MxDocument
    Set pApp = pDoc.Parent
    
    pApp.Visible = True
    pApp.OpenDocument ("G:\1217\1217-014\GISFiles\SEIFiles\ArcGISProjects\FieldTransects2.mxd")
    pApp.RefreshWindow
End Sub

Sub setRelativePaths()
    Dim pMxDoc As IMxDocument
    Set pMxDoc = ThisDocument
    pMxDoc.RelativePaths = True
End Sub


#! perl

use strict;
use Win32::OLE qw(in with);
use Win32::OLE::Const 'ESRI ArcMapUI Object Library';
use Data::Dumper;

# my $class = 'esriCarto.IMapDocument';
# my $class = 'esriArcMap.Application';
# my $class = 'esriFramework.IApplication';
# 'esriArcMapUI.MxDocument'

# print Dumper( Win32::OLE::Const->Load('ESRI ArcMapUI Object Library') );

my $pDoc = Win32::OLE->new( 'esriArcMapUI.MxDocument', 'Shutdown' ); # || die Win32::OLE->LastError()." no $class";

print Dumper( $pDoc );

my $pApp = $pDoc->Parent();
$pApp->{Visible} = 1;

print Dumper( $pApp );

$pApp->Shutdown();

exit;

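# Never reached because of the exit above; kept as a note for the next experiment.
# (Properties are set with hash syntax in Win32::OLE, not by assigning to a method call.)
# $pApp->{Visible} = 1;
# $pApp->OpenDocument( '' );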



Private Sub test()
Dim pDoc As IDocument
Dim pMxDoc As IMxDocument
Dim pApp As esriFramework.IApplication
Dim pDocDS As IDocumentDatasets
Dim pEnumDS As IEnumDataset
Dim pDS As IDataset
Dim pWS As IWorkspace

    ' get a ref to a new ArcMap application
    Set pDoc = New MxDocument
    Set pApp = pDoc.Parent

    ' Loop thru your .mxd documents here

        ' Open an existing document
        pApp.OpenDocument "c:\MyMap.mxd"
        Set pMxDoc = pApp.Document

        ' Iterate thru the datasets and display details
        Set pDocDS = pMxDoc
        Set pEnumDS = pDocDS.Datasets
        Set pDS = pEnumDS.Next
        While Not pDS Is Nothing
    
            On Error Resume Next
            Set pWS = pDS.Workspace
            If Err.Number = 0 Then
                Debug.Print pDS.Workspace.PathName + " : " + pDS.Name
            Else
                Debug.Print pDS.BrowseName + " : Error with datasource"
            End If
            On Error GoTo 0
        
            Set pDS = pEnumDS.Next
        Wend

    ' End of your loop

    ' Shut down the ArcMap application
    pApp.Shutdown

End Sub


--------------------------------------------------------------

Sub muliplemxds()

    Dim sDir As String
    Dim sFile As String
    Dim DocPath As String

    sDir = "C:\Myfolder\TestFolder\"
    sFile = Dir(sDir & "*.mxd", vbNormal)

    Do While sFile <> ""
        DocPath = sDir & sFile
        OpenMXDDoc DocPath

        sFile = Dir
    Loop

End Sub
Private Sub OpenMXDDoc(sFileName As String)
    On Error Resume Next
    
    Dim pDoc As IMapDocument
    Set pDoc = New MapDocument
     
    pDoc.Open sFileName
    
    
    Documentation pDoc
    
    pDoc.Close
    Set pDoc = Nothing
    
End Sub
Private Sub Documentation(pMxDoc As IMapDocument)
    Dim mapcount As Long, LayerCount As Long, text As String
    Dim pLayer As ILayer
    Dim pFL As IFeatureLayer
    Dim pRL As IRasterLayer
    Dim pFC As IFeatureClass
    Dim pDS As IDataset
    Dim pMap As IMap

    text = ""
    text = text & vbCrLf & pMxDoc.DocumentFilename
    For mapcount = 0 To pMxDoc.mapcount - 1
        Set pMap = pMxDoc.Map(mapcount)

        For LayerCount = 0 To pMap.LayerCount - 1
            Set pLayer = pMap.Layer(LayerCount)
            If TypeOf pLayer Is IFeatureLayer Then
                Set pFL = pLayer
                Set pFC = pFL.FeatureClass
                Set pDS = pFC
                text = text & vbCrLf & pFC.AliasName & vbCrLf & pDS.BrowseName & vbCrLf & pDS.Workspace.PathName
            ElseIf TypeOf pLayer Is IRasterLayer Then
                Set pRL = pLayer
                text = text & vbCrLf & pRL.FilePath
            Else
                text = text & vbCrLf & pLayer.name
            End If
        Next
    Next
    WriteToTextFile "C:\textfile.txt", text

End Sub
Sub WriteToTextFile(sFileName As String, text As String)
    Dim fso
    Set fso = CreateObject("Scripting.FileSystemObject")
    'Set fso = New Scripting.FileSystemObject
    Dim ts
    'Create File if doesn't exist, if it does, append to the current File
    Set ts = fso.OpenTextFile(sFileName, 8, True)
    ts.WriteLine text
    
    ts.Close
    Set ts = Nothing
    Set fso = Nothing

End Sub

-------------------------------

use Win32::OLE;
my $class = "esriGeoprocessing.GpDispatch.1";
my $gp = Win32::OLE->new($class) || die "Could not create a COM $class object";
$gp->{overwriteoutput} = 1;
print $gp->{overwriteoutput};

-------------------------------

New Manifest Format

Note: this information is obsolete.

I'd written a parser to read those manifest files, but the format was kind of irregular, so I'm going to spend an hour making it more regular. That way, it's easier to parse, and safer to parse, ultimately. The new manifest format is CSV.

The first field is the type:
m means the mxd file
s means a shape, and implies that the related files must be figured out later
f means a file

The second field is the name.
The third field is the filename.

We'll assume that no stray quotes or commas are in the data.

m,,G:\blah
s,shape name,G:\blah
f,,G:\blah

Use perl/bin/ppm.bat to install Text-CSV-Simple to get a csv reader. Attached is some untested code that reads a manifest and returns a list of files, sort of. There's still an issue of getting all the different types of shape files. It's just not fully determined.
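For comparison, here is a minimal Python sketch of a reader for the three-field format described above. It is not the attached Perl script. The function name read_manifest is hypothetical, and working out the related files for shape entries is left out, as in the text.

```python
import csv
import io


def read_manifest(lines):
    """Parse manifest rows of the form: type,name,filename.

    Returns a list of (type, name, filename) tuples and rejects
    rows with an unknown type or the wrong number of fields.
    """
    entries = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError("expected 3 fields, got %r" % (row,))
        kind, name, filename = row
        if kind not in ("m", "s", "f"):
            raise ValueError("unknown type %r" % kind)
        entries.append((kind, name, filename))
    return entries


# The sample rows from the format description
sample = io.StringIO("m,,G:\\blah\ns,shape name,G:\\blah\nf,,G:\\blah\n")
entries = read_manifest(sample)
files = [filename for kind, _, filename in entries if kind in ("m", "f")]
shapes = [filename for kind, _, filename in entries if kind == "s"]
print(files, shapes)
```

Failing loudly on a malformed row is deliberate: a manifest that is silently misread could lead to files being skipped during the copy.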

Attachment             Size
readmanifest.pl.txt    703 bytes

OLE/ActiveX Scripting Notes

I'm still working on this. These are just notes, and I'm a noob.

The ESRI ArcObjects don't fully support scripting. They support some basic level of scripting, but they don't fully support scripting via contemporary OLE Automation, which is what Perl and other languages use.

Historically, there are three phases of COM/OLE that should help explain this situation a little.

First is COM. COM is a way to factor applications into objects that can be used across languages. Normally, you're constrained by the language.

Second is DCOM or OLE. OLE, and later, Distributed COM allowed for the objects to be located on different computers, or within another application. You could issue a method call to a remote program. The technology to do this involved "interfaces". An interface, in this situation, is a lightweight object that communicates with a remote concrete class, aka, coclass. The interface presents a "local face" for the remote object. To access the objects, you "instantiate an interface." Complex objects typically implement several interfaces, and, to access such an object, you needed to instantiate each interface separately, and then set the instance to the object.

Dim foo as IFooThing
Set foo = CreateObject("Foo.FooThing")

Dim bar as IApplication
Set bar = foo

The first two lines set up an object called foo that is accessing Foo.FooThing via the IFooThing interface. The last two lines set up the bar object to also access Foo.FooThing, but via an IApplication interface.

Third is ActiveX and scripting. This is where we are today. Scripting requires a single interface to the entire object. ActiveX objects have a single interface to the entire object, called IDispatch.

Historically, many of the ESRI applications are stuck back in the second period, where the objects lack an IDispatch interface. Thus, ArcGIS apps are difficult to write using scripting tools that expect it.
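The idea of a single, name-based entry point can be illustrated without any COM machinery. The toy Python class below is not COM and does not talk to ArcMap; it only mimics an object whose methods are looked up by string name at run time, which is what a script host relies on. All class and method names are hypothetical.

```python
class Dispatchable:
    """Toy stand-in for an object reachable through one name-based entry point."""

    def invoke(self, method_name, *args):
        # Look the method up by its string name at call time (late binding)
        method = getattr(self, method_name, None)
        if method is None or method_name.startswith("_") or not callable(method):
            raise AttributeError("no such method: %s" % method_name)
        return method(*args)


class FooThing(Dispatchable):
    def open_document(self, path):
        return "opened " + path

    def shutdown(self):
        return "bye"


foo = FooThing()
# A script host only knows method names as strings
print(foo.invoke("open_document", "C:\\MyMap.mxd"))
print(foo.invoke("shutdown"))
```

An object that exposes its functionality only through several separate typed interfaces has no such entry point, so a caller that works purely by name cannot reach those methods.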

The alternatives are to use VB for Applications, .NET, Java, C++, and VB.

I'm not certain if Python has support for COM interfaces. I believe it does, according to what some sites say.

Part 2: a VB.NET Version of this Project

After a while, it became obvious that there was no way to drive the ArcMap application from Excel -- timeouts from errors wouldn't get handled, so bad runs would hang.

A real app could raise errors on timeouts, so, I had to learn VB OLE programming. Fortunately there's a free version of VB called VB Express Edition. It's a complete VB environment, that uses .NET. Unfortunately, there aren't references for the old VB classes included. .NET is, in parts, a bit more complex than VB - it's a victim of feature-itis. There are also fewer VB.NET tutorials out there.

Here's a diagram of the "new" system, which is, mostly, going to be an iteration of the "old" system.

The app is broken into three parts. One part manages a list of files. One part is a bunch of "scripts" that do the actual work of analyzing, copying, and deleting files. One part is a scheduler that will run the scripts only at specified times, so that it won't interrupt the normal workday.
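The scheduler's core decision is whether the current time falls inside the allowed window. A minimal Python sketch of that check follows. The 19:00 to 06:00 window and the function name in_run_window are assumptions for illustration, not settings from the project.

```python
from datetime import datetime, time


def in_run_window(now, start=time(19, 0), end=time(6, 0)):
    """Return True if now falls inside the allowed window.

    Handles windows that cross midnight (start later than end).
    """
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


# Hypothetical timestamps: late evening, then midday
print(in_run_window(datetime(2007, 1, 15, 23, 30)))
print(in_run_window(datetime(2007, 1, 15, 12, 0)))
```

A script runner would call this before starting each file, so a batch that is still running when the workday starts simply stops picking up new work.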

File Batching

This code fits into the larger goal of a project that will reliably run an application on a set of files, over the course of several nights.

The first thing I've written, so far, is something that will scan the file system for file names, to create a "batch". The batch is stored in a Microsoft Access .mdb file.

The coolest feature is that you don't need Access to run it. It creates the .mdb file from scratch, and inserts data into it.

Another cool feature is the call to System.IO.Directory.GetFiles. That does all the scanning that, in the original project, required custom code.

This is very alpha code, but, it might help someone out there.

FileBatch.vb

Imports System.Data
Imports System.Data.OleDb


Public Class FileBatch

    Private Const StatusNone = 0
    Private Const StatusProcessed = 1
    Private Const StatusSkip = 2


    Private Sub CreateNewDatabase(ByVal dbPath As String)
        ' delete the file first
        If System.IO.File.Exists(dbPath) = True Then
            System.IO.File.Delete(dbPath)
        End If

        Dim dbCatalog As New ADOX.Catalog()
        dbCatalog.Create("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & dbPath)

        Dim objFirstTable As New ADOX.Table()
        objFirstTable.Name = "FileBatch"
        objFirstTable.Columns.Append("File", ADOX.DataTypeEnum.adLongVarWChar, 1024)
        objFirstTable.Columns.Append("DestinationFile", ADOX.DataTypeEnum.adLongVarWChar, 1024)
        objFirstTable.Columns.Append("Status", ADOX.DataTypeEnum.adInteger)
        objFirstTable.Columns.Append("ProcessingDate", ADOX.DataTypeEnum.adDate)
        objFirstTable.Columns.Append("Comment", ADOX.DataTypeEnum.adVarWChar, 255)
        objFirstTable.Keys.Append("PK_File", 1, "File")

        dbCatalog.Tables.Append(objFirstTable)

        'cleanup
        dbCatalog = Nothing
        objFirstTable = Nothing
    End Sub


    Public Function CreateBatch(ByVal dbPath As String, _
        ByVal pathStart As String, _
        ByVal ext As String, _
        Optional ByVal statusBox As TextBox = Nothing)
        Dim ar, element

        CreateNewDatabase(dbPath)

        If statusBox IsNot Nothing Then
            statusBox.Text = "Scanning for *." & ext & " in " & pathStart & "."
            statusBox.Refresh()
        End If

        ar = System.IO.Directory.GetFiles(pathStart, "*." & ext, IO.SearchOption.AllDirectories)

        Dim cs
        Dim conn As OleDb.OleDbConnection
        Dim command As OleDb.OleDbCommand
        Dim sql As String

        cs = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & dbPath
        conn = New OleDb.OleDbConnection(cs)
        conn.Open()

        For Each element In ar
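            ' Double any apostrophes so file names containing them don't break the SQL string
            sql = "INSERT INTO FileBatch (File,DestinationFile,Status,ProcessingDate,Comment) VALUES ('" _
                  & Replace(CStr(element), "'", "''") & "','',0,'1/1/1899','')"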
            ' Console.WriteLine(sql)
            command = New OleDb.OleDbCommand()
            With command
                .Connection = conn
                .CommandText = sql
                .ExecuteNonQuery()
                .Dispose()
            End With
        Next
        conn.Close()


        CreateBatch = 1
    End Function

End Class

Here's the code that calls it (from a form button):

    Private Sub Button1_Click(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles Button1.Click
        Dim fb As FileBatch
        fb = New FileBatch
        fb.CreateBatch("C:\tmp\text.mdb", "C:\Documents and Settings\johnkuser\", "jpg", Me.StatusMessage)
        Close()
    End Sub
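As a cross-check of the design, here is a rough Python analogue of the same batch-creation step. It is a sketch under different assumptions, not a port: it uses SQLite instead of an Access .mdb file, os.walk instead of System.IO.Directory.GetFiles, and stores no processing date. The function name create_batch and the demo file names are hypothetical.

```python
import os
import sqlite3
import tempfile

STATUS_NONE = 0


def create_batch(db_path, path_start, ext):
    """Scan path_start recursively for *.ext and record each file in a new database."""
    if os.path.exists(db_path):
        os.remove(db_path)  # start from scratch, as CreateNewDatabase does
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE FileBatch (File TEXT PRIMARY KEY, DestinationFile TEXT,"
            " Status INTEGER, ProcessingDate TEXT, Comment TEXT)"
        )
        suffix = "." + ext.lower()
        count = 0
        for folder, _dirs, names in os.walk(path_start):
            for name in names:
                if name.lower().endswith(suffix):
                    # Placeholders keep odd characters in file names out of the SQL text
                    conn.execute(
                        "INSERT INTO FileBatch VALUES (?, '', ?, NULL, '')",
                        (os.path.join(folder, name), STATUS_NONE),
                    )
                    count += 1
        conn.commit()
        return count
    finally:
        conn.close()


# Small demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("a.mxd", os.path.join("sub", "b's map.mxd"), "notes.txt"):
        open(os.path.join(root, rel), "w").close()
    db = os.path.join(root, "batch.db")
    found = create_batch(db, root, "mxd")
    print(found)
```

The Status column is what lets a later run pick up where a crashed run stopped, instead of starting the whole batch over.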

References

http://www.4guysfromrolla.com/webtech/013101-1.2.shtml

Attachment               Size
Form1.vb.txt             343 bytes
Form1.Designer.vb.txt    2.24 KB
FileBatch.vb.txt         2.49 KB
filebatcher.jpg          8.02 KB

Some COM and .NET Notes

This document explains some terminology used on other pages.

ActiveX
A technology layered on OLE that supports an interface, IDispatch, through which a caller can look up a method by name (a string argument) and invoke it at run time. IDispatch solved the problem of scripting languages being late bound, and not able to handle multiple interfaces. ActiveX also covered other technical things, but the IDispatch feature is relevant to this topic.
Assemblies
A group of classes. The classes generally work together, and form a namespace. Analogous to a Java package. The .NET assemblies are analogous to the Java class libraries.
CLR, Common Language Runtime
A "virtual machine" that executes programs coded in CIL (Common Intermediate Language), a platform-neutral intermediate language produced by compilers. The CLR is also called a "managed environment" because the virtual machine takes care of many runtime issues like allocating memory.
COM - Component Object Model
Microsoft's object technology that allows code objects written in different languages to interact with each other. The idea was that you could instantiate an object written in C++ from within VB.
OLE - Object Linking and Embedding
A technology layered on COM that defined how independently running objects would interact with each other. One example is how code in MS Excel can execute a macro in MS Word.
Late Binding - Dynamic Typing
The type of an object is not known until it is used. This contrasts with early binding, or static typing, where you declare that an object is of a specific type, first, then use it. Early binding in the COM environment is used when you declare that an object uses a specific interface. That allows the compiler to check that your method calls conform to the interface.
Managed
See CLR. Managed code is any code that runs within the CLR. The execution is "managed" because the CLR takes care of things like memory allocation and threads.
Multiple Interfaces
The technique used by MS VB and COM to implement objects. An object implements an interface, and may implement more than one. To interact with the object, you instantiate the object with the specific interface, and that defines how you interact with it.