My, what an unwieldy blog title.
While working on my current project, I wanted to be able to generate and fill my SQLite database from scratch, in an instant, by calling a single script. I found a convenient way of doing this is to have a folder of Python scripts – each script does something to the database, typically creating a table and filling it with initial data which it reads from a text file.
As development continues, I’m constantly adding more tables or more data, so I wanted new scripts in that folder to automatically be include in the database building process. To do this, I wrote another script to find and run everything it finds in that folder. This has been very convenient, and has saved me a lot of effort over the past couple of months. It works like this:
- I call RunDatabaseScripts.py, giving it a reference to the database file and a reference to the folder containing the scripts I want to run.
- RunDatabaseScripts.py scans the script folder for files that match Script*.py, and loads them all.
- RunDatabaseScripts.py opens a connection to the database. It will create the file if it does not already exist.
- RunDatabaseScripts.py iterates over the loaded scripts, passing each of them the open database connection.
- RunDatabaseScripts.py closes the database and prints a little run report.
Here is the directory structure for a contrived example with only two database scripts:
root | |-- RunDatabaseScripts.py | |-- scripts | |-- Script_One.py | |-- Script_Two.py | |-- __init__.py
The Database Scripts
Before we get to RunDatabaseScripts.py, lets have a look at the subject scripts that are run by it. Here is the code for Script_Two.py, a simple script that creates a new table in the database:
Name = "Script Two" def CanRun(connection, listOfExecutedScriptNames): ''' Return False if this script can not yet run, True if it can.''' return True def Run(connection): print "\n\tStarting %s Script" % Name dbCursor = connection.cursor() # Create table: SQLScript = r""" DROP TABLE IF EXISTS DemoTable; CREATE TABLE DemoTable ( [ID] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, [DemoString] TEXT NOT NULL ) """ dbCursor.executescript(SQLScript) connection.commit() print "\tCreated DemoTable" print "\tEnd of %s Script" % (Name)
There is a bit an interface that RunDatabaseScripts.py expects the database scripts to adhere to:
- The Name property in line 1 is needed by RunDatabaseScripts.py for reporting failures and for ensuring the scripts execute in the right order.
- The CanRun function in line 3 is called by RunDatabaseScripts.py before the script is run – the script can implement custom code here to make sure any prerequisites have been satisfied. It receives an open connection to the database, and a string list of Names of those scripts which have already been run. Script_One.py below demonstrates the use of this.
- The Run function in line 7 is called by RunDatabaseScripts.py, and is where the meat of the script goes. It receives an open database connection.
Below is the code for Script_One.py, which adds some data rows to the table created by Script_Two.py:
Name = "Script One" def CanRun(connection, listOfExecutedScriptNames): ''' Return False if this script can not yet run, True if it can.''' if "Script Two" not in listOfExecutedScriptNames: return False else: return True def Run(connection): print "\n\tStarting %s Script" % Name dbCursor = connection.cursor() changesPre = connection.total_changes # Count Changed Rows dataToAdd = ["These", "Are", "Some", "Strings", "To", "Add", "To", "The", "Database"] for word in dataToAdd: # Write to DB: cmd = "INSERT INTO DemoTable VALUES (null, ?)" values = (word,) dbCursor.execute(cmd, values) connection.commit() changesPost = connection.total_changes # Count Changed Rows changes = changesPost - changesPre # Count Changed Rows print "\tAdded some rows to DB" print "\tEnd of %s Script" % (Name)
Of interest here is the fact that CanRun now checks that Script_Two.py has already been run – it needs the table to be created before it can add rows. CanRun uses the Name assigned to Script_Two.py and the listOfExecutedScriptNames received from RunDatabaseScripts.py to check this. If Script_Two.py has not been run, CanRun returns False – RunDatabaseScripts.py will then move on to the other scripts before returning to this one.
This empty file is needed for Python to recognise the scripts folder as an importable module.
The launcher script: RunDatabaseScripts.py
Below are the two major functions of the launcher script. You can view and download the full script code here.
Loading the script files as modules
The LoadScripts function scans the script folder for Script*.py files, loads them, and returns a list of loaded modules:
def LoadScripts(scriptFolder): ''' Returns a list of module references ''' # Find Script Files: scriptFiles = glob.glob(os.path.join(scriptFolder, "Script*.py")) print "Found %d Script Files" % (len(scriptFiles)) # Load scripts: loadedScripts =  for scriptFile in scriptFiles: try: # Get full module name: scriptName = os.path.basename(scriptFile) scriptName = os.path.splitext(scriptName) folderName = os.path.basename(scriptFolder) fullModuleName = "%s.%s" % (folderName, scriptName) # Import Script: script_module = __import__(fullModuleName, fromlist=["fullModuleName"]) # Store reference to Script: loadedScripts.append(script_module) except Exception, e: print "Something went wrong while loading script file \'%s\'" % scriptFile print e return loadedScripts
Line 21 forms the full module name of each script, which consists of the folder name and the filename, e.g. scripts.Script_One. It pops the module name in a variable, and the name of this variable is then put in a single element list given to the actual __import__ call in line 24.
If the subject script file fails to parse due to syntax errors or such, the error will be caught in line 29, and the next script loaded.
Running the loaded scripts
Actually running the loaded script is a simple matter of getting a reference to the module object created in line 24 above, and calling the function you want, in this case the Run function shown in the Script_One.py and Script_Two.py samples:
Most of the code in the RunScripts function deals with making sure each script gets run, that they run in the right order, that there are no circular dependencies between scripts, and that the results are reported. Here is the code, with some comments below:
def RunScripts(databaseFilename, loadedScripts): try: # Open DB: connection = sqlite.connect(databaseFilename) # Keep a list of completed scripts so scripts can test if they are ready to run: namesOfCompletedScripts =  success = True # Keep passing over list until everything has been run: while (len(namesOfCompletedScripts) < len(loadedScripts)): nothingExecutedOnThisPass = True # Iterate over scripts: for script in loadedScripts: try: # Check if this script has been run or not: if (script.Name not in namesOfCompletedScripts): # Ask script if it is ready to run - # - give it a list of completed scripts: if (script.CanRun(connection, namesOfCompletedScripts)): # Run script: script.Run(connection) namesOfCompletedScripts.append(script.Name) nothingExecutedOnThisPass = False else: print "\n%s not ready to run" % script.Name except Exception, e: print "Something went wrong while running script %s" % script.Name print e success = False if (nothingExecutedOnThisPass): # If nothing executed on this pass, we're stuck - # - the remaining scripts won't run break print "\nThe following scripts were successfully run:" print namesOfCompletedScripts if (len(namesOfCompletedScripts) < len(loadedScripts)): print "Some scripts failed:" for script in loadedScripts: if (script.Name not in namesOfCompletedScripts): print "\t" + script.Name except Exception, e: print "Something went wrong while applying scripts to DB" print e finally: # Close DB: connection.close()
The database connection
The database connection is opened in line 38. This connection is then passed to each script’s CanRun function in line 53 and to each scripts Run function in line 55. Finally, it is closed again in line 85.
Running scripts in the correct order
Some scripts may depend on database elements created by other scripts – in this example, Script_Two.py must create the DemoTable table before Script_One.py can add rows to it.
Each script is allowed to check it’s own prerequisites in its CanRun function, called in line 53. To assist it, RunScripts keeps a list of successfully completed scripts’ names in namesOfCompletedScripts. It also provides CanRun with a database connection, in case the subject script wants to manually check the structure of the database.
If a script indicates that it is not ready to be run, RunScripts skips over it and continues iterating over the rest of the scripts. It then keeps returning for another pass, revisiting all those scripts not yet listed in namesOfCompletedScripts. This includes both those scripts that were previously not ready to run, and those that raised errors during execution.
In the two script example given here, the first pass skips over Script_One.py due to Script_Two.py not having been run. On the second pass, Script_One.py detect that its prerequisites are satisfied, and successfully executes.
RunScripts will continue to pass over the list of scripts until all have been executed. However, a CanRun requirement that cannot be satisfied, or a script that persistently raises an error upon execution, will cause this loop to be infinite. To halt execution in these cases, RunScripts monitors the boolean variable nothingExecutedOnThisPass, which is set to True at the start of each pass, and negated as soon as a script successfully completes. If an entire pass completes without a single script being successfully run, the remaining scripts will never be ready to run. Execution is halted, and the list of failed scripts is printed in lines 72-76.
There you have it
I found this an easy and straightforward way to automate the execution of new scripts. It frees the developer from manually adding scripts to the execution list, and also solves order-of-execution problems automatically. While it was used for database building here, I’m sure to use the same code in future for any scenarios where I have a rapidly changing list of scripts to run.
If you find this helpful, or have any cool ideas, please do leave a comment!
UPDATE 2013-02-28: I figured out that pysqlite2 is deprecated and has been replaced by the sqlite3 library, which is bundled with Python 2.5 and up by default, and thus requires no download. I’ve removed the links to pysqlite2 and modified the downloadable script to use sqlite3 instead.