Introduction to the
1901 Canadian Census Workbench


Purpose for Building, and Description of, the Workbench

Briefly: Workbench for easier and faster transcription of information from the 1901 Canadian Census into machine-readable form.

Overall, the product is intended to make it easier and faster to transcribe information accurately from the 1901 Canadian Census forms into machine-readable form.

The 1901 Canadian Census is available as a large collection of MrSID-format images which show entire pages (of 50 lines each, one line per person) of the handwritten census. One of the primary problems that users of these images face is that the software available for viewing MrSID images is not made to be embedded in, or controlled by, other software that might aid in the transcription process. In particular, the full-scale images will not fit within ordinary computer screens, which obliges the person doing a transcription to zoom and pan repeatedly and, at the same time, to manipulate whatever GUI is being used for data entry.

The product will display selected rows or columns of one of these forms, along with a conventional computer input form, to facilitate entry. The basic output of the product will be appropriately defined XML.
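To give a rough idea of what XML output for one census line might look like, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The element and attribute names (person, page, line, surname, age) and the sample values are invented for this illustration; the Workbench's actual XML definition is not described in this document.

```python
import xml.etree.ElementTree as ET


def row_to_xml(page_id, line_number, fields):
    """Build one element for a transcribed census line.

    All element and attribute names here are hypothetical; they are
    not the Workbench's real schema.
    """
    person = ET.Element("person", {"page": page_id, "line": str(line_number)})
    for name, value in fields.items():
        # One child element per transcribed column of the form.
        child = ET.SubElement(person, name)
        child.text = value
    return person


# Made-up sample data, purely for illustration.
row = row_to_xml("example-page", 1, {"surname": "Smith", "age": "34"})
xml_text = ET.tostring(row, encoding="unicode")
print(xml_text)
```

Keeping one element per census line, with one child per column, would make it easy to save work line by line, although other layouts are equally possible.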

The product is being written in Python, with a wxPython-based GUI. The product transforms MrSID images to JPEG, so that they can be manipulated using PIL (the Python Imaging Library). The DOS program used to make the transformation from MrSID format is MrSIDDECODE, which is freely available from the company with a proprietary interest in MrSID.

We intend to support the Windows, Mac OS X and Linux platforms, which is to say, all of those platforms for which MrSIDDECODE is made available. Initial development has been for the Windows platform.

Since most of the code is quite straightforward, and since we have already prototyped the parts of the code needed to process the MrSID format, we foresee no big obstacles to development.

Chief features:

The prototype of the product comprises more than 700 lines of code plus a few small graphic files.

Using the Workbench (as it exists as of 2004-02-03)

NB: The Workbench is not yet operational, mainly because it lacks facilities for saving and retrieving transcribed data.

The Workbench can mediate the retrieval of MrSID-format census form images from the Canadian federal government web server on which they are stored. Click on "Retrieve" in the "File" menu to begin.

Click on the name of the province or on "The Territories", then indicate the records that you seek, as shown, and press 'OK'.

If you have successfully identified this series of images before, the Workbench will display the dialog box shown to the left. If you select 'Yes', the Workbench will refresh its list from the government server. Normally this is unnecessary.

The Workbench lists the census form images that it identified corresponding to your search. Check the ones that you want to download.

If you have already downloaded one of the census images that you have just identified, the Workbench will tell you so using the dialog box shown to the left. Normally you would not need to download it again.

At this point the MrSID census form image should be available on your disc. Here are some typical entries in a listing of such files that would appear in a Windows Explorer pane.

Notice that each file is named using the numbering system for the image files on the government server, as well as with identifying information.

The next step is to prepare an image for transcription, and then to open it for transcription. Click on "Open" in the "File" menu to begin.

At this point you can select one of the files that you have downloaded using the Workbench, or you can select one that you have downloaded by other means. Here is how the file open dialog appears.

Because the Workbench displays just a few lines from the census form at a time, it needs to have alignment information. If you had aligned the current census form image previously, you would see the dialog that is shown to the left.

If this were the first time that you were opening this particular census form image the dialog would not appear. Let us press "Yes" so that we can continue with alignment (as if this were the first time we had met this image).

If you had converted the MrSID image for processing previously then you would see the following dialog.

If this were the first time that you were opening this particular census form image the dialog would not appear. Let us again press "Yes" so that we can continue with alignment (as if this were the first time we had met this image).

Whilst the Workbench is using another program, called "mrsiddecode", to convert the image for you, the following window will be displayed.

Notice how progress is indicated as a percentage.

When the MrSIDDECODE window closes, press 'OK' in the dialog shown to the right.

The Workbench now expands to fill the screen (if necessary), and displays the entire census form for alignment, as shown in the graphic to the left.

Please follow the instructions that appear on this screen. Essentially, you need to click on each of the four corners of the area that contains the (up to) fifty lines of census information. The Workbench uses this information to identify individual lines in the converted image for display during the transcription process. Notice especially that it is necessary to download and align a census form image only once. The Workbench retains the information that you have supplied between sessions.
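One way the four clicked corners could be turned into the position of an individual census line is by interpolating down the left and right edges of the marked area. The following is a minimal sketch of that idea, not the Workbench's actual code; the function names, the corner ordering and the sample coordinates are all assumptions made for this illustration, and it assumes the fifty lines are evenly spaced.

```python
def lerp(p, q, t):
    """Return the point a fraction t of the way from p to q."""
    return (p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t)


def row_quad(corners, line, total_lines=50):
    """Return the four corners of the strip holding census line `line`.

    `corners` is (top_left, top_right, bottom_right, bottom_left), each an
    (x, y) pixel position; `line` is 1-based. Lines are assumed to be
    evenly spaced between the top and bottom edges (a simplification).
    """
    if not 1 <= line <= total_lines:
        raise ValueError("line must be between 1 and %d" % total_lines)
    tl, tr, br, bl = corners
    t0 = (line - 1) / total_lines  # fraction at the top of the line
    t1 = line / total_lines        # fraction at the bottom of the line
    return (lerp(tl, bl, t0), lerp(tr, br, t0),
            lerp(tr, br, t1), lerp(tl, bl, t1))


def bounding_box(quad):
    """Return (left, upper, right, lower) enclosing the quadrilateral."""
    xs = [x for x, _ in quad]
    ys = [y for _, y in quad]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical corner clicks on an axis-aligned form, for illustration only.
corners = ((0, 0), (1000, 0), (1000, 500), (0, 500))
first_box = bounding_box(row_quad(corners, 1))
last_box = bounding_box(row_quad(corners, 50))
print(first_box, last_box)
```

A box of this kind, rounded to whole pixels, is the sort of (left, upper, right, lower) rectangle that PIL's Image.crop accepts, so the same four corners could serve every line of a form once they have been recorded.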

The above is a portion of the image that appears when you open a census image. (As a matter of fact, this is a composite image, to save space on this web page.) In this case line 1 of the form is presented in the graphic above the computer form. Notice that each field in the computer form is aligned with the column it represents in the form above. What you cannot see in this image is that, as you tab through the computer form, the image above it scrolls so that the appropriate item to be entered is kept in view.

Other notes:

Please note that this is a work-in-progress. Clearly there are lots of ways to introduce efficiencies in this process; however, this is the way the program works as of the first half of February 2004.
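The scrolling behaviour described earlier, in which the image follows the field being entered, could be approximated by centring the current column in the visible area and clamping at the edges of the image. This is a minimal sketch under that assumption; the function name, its parameters and the sample pixel values are hypothetical and are not taken from the Workbench itself.

```python
def scroll_offset(col_left, col_right, view_width, image_width):
    """Return a horizontal scroll offset, in pixels, for the image strip.

    The offset centres the column spanning col_left..col_right in a
    viewport of view_width pixels, without scrolling past either edge
    of an image that is image_width pixels wide.
    """
    centre = (col_left + col_right) / 2
    offset = centre - view_width / 2
    max_offset = max(0, image_width - view_width)
    return min(max(0, offset), max_offset)


# Hypothetical column positions on a 3000-pixel-wide strip, 800-pixel viewport.
print(scroll_offset(0, 100, 800, 3000))
print(scroll_offset(1400, 1600, 800, 3000))
print(scroll_offset(2900, 3000, 800, 3000))
```

Clamping matters for the first and last columns of the form, which cannot be centred without showing blank space beside the image.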
