Archiving images with retrieval in mind
On this page—
- Storage, Retrieval—What’s the Difference?
- Design Criteria
- Archiving on CD-Rs
- Folders, Filenames, Filing and Browsing
- References and Links
Storage, Retrieval—What’s the Difference?
What’s the point of taking all those pictures if they’re viewed once and then forever after relegated to the family picture drawer, or that precarious stack of slide carousels in the hall closet? To my mind,
The ability to enjoy your photos depends critically on your ability to retrieve them at will.
Our picture drawer is an abysmal retrieval system, and would probably remain so even if we took the time to organize it. Why? Because hardcopy just doesn’t lend itself to efficient retrieval.
Fortunately, digital images are different. With powerful, flexible, affordable and readily available computer-based tools and a little planning, drilling down on any specific image becomes a snap. And reorganizations that would take hours in hardcopy take seconds on your hard drive.
The rest of this article describes the digital retrieval system I’ve worked out for myself, not as a recommendation but as an illustration of the issues and some possible solutions. Many valid approaches are possible.
A well-known organizing consultant interviewed on NPR a while back said that organizing should be more about retrieval than about storage. She then went on to tell of a 3-foot stack of papers and file folders on the desk of a client who could put his hands on any document in the stack in under 10 seconds. It was an ugly, inelegant storage system, but somehow, it was an effective retrieval system for him, and that was good enough for her. They left the stack alone.
The retrieval system that works for you may differ substantially from mine. (To you, my method may look like that 3-foot stack in the preceding paragraph.) The important thing is to figure out what does work for you and then put it in place before you find yourself swimming in digital pictures. If you store images properly as they come in, the burden will seem light.
For me, an efficient image retrieval system means
- Stable, inexpensive media with a long shelf life, low $/MB ratio and small physical size
- A rational folder structure that makes sense, so I’ll know intuitively where to look for things
- A rational file-naming system guaranteeing each image a unique name with some name recognition
- Fast random access to any file in any folder on any disc or tape
- Consumption of the fewest possible discs or tapes to minimize the physical storage space and media handling involved
CD-Rs remain by far the best overall long-term retrieval solution, and they win on all but one of the individual criteria as well. Tape wins big on the last count but falls down miserably on the random access criterion. Random access is much more important than physical storage density in an efficient retrieval system.
Bang for the Buck
Once you face the fact that you’ll be storing images in gigabyte quantities, you’ll find that storage costs vary widely with the choice of medium, as the table below clearly demonstrates.
digital photo storage options as of 2Q2000
|brand||iomega||iomega||iomega||smart & friendly||smart & friendly|
|spec||w/ case||w/ case||w/ case||700 mb 8x w/ case||4x w/ case||cds without cases are even cheaper|
|gb to store||2.0||2.0||2.0||2.0||2.0||not even a year’s worth with my 2.1MP camera|
|mb to store||2048||2048||2048||2048||2048||1024 mb/gb|
|mb per formatted blank||38.8||97||242.5||600||600||guessing based on my 650 mb cd-rs and zip100s|
|$$ per pack||$ 99.95||$ 87.95||$ 129.95||$ 10.95||$ 14.95||necx prices as of 3/24/00|
|blanks per pack||10||10||8||10||10|
|$$ per blank shipped||$ 10.50||$ 9.30||$ 16.87||$ 1.60||$ 2.00||$ 5.00 s&h|
|$$ for blanks||$ 553.96||$ 196.25||$ 142.46||$ 5.44||$ 6.81|
|$$ lost relative to CD-R||$ 548.52||$ 190.80||$ 137.02||$ –||$ 1.37|
|$$ per gb stored||$ 276.98||$ 98.12||$ 71.23||$ 2.72||$ 3.40|
The last row tells most of the story, and CD-R prices have fallen dramatically since this table was calculated. As of 4Q2003, DVD burners remain beyond the reach of most budgets, so I won’t cover DVD storage here.
|250MB Zip discs offer some pluses here—mainly speed with ATAPI and SCSI drives—but their $/MB ratio is way too high for the potentially huge volume of image data involved, and the wisdom of long-term storage on any magnetic medium is dubious.|
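The arithmetic behind the table is easy to rerun with current prices. What follows is only a sketch: the figures are copied from the table, while the `Medium` class, the function names and the media labels are made up for illustration (the table identifies the media only by brand and formatted capacity). Like the table, it assumes 1024 MB per GB, $5.00 shipping per pack and fractional blanks.

```python
from dataclasses import dataclass

MB_PER_GB = 1024          # the table's convention
SHIPPING_PER_PACK = 5.00  # the table's s&h figure


@dataclass
class Medium:
    label: str            # illustrative label: brand plus formatted capacity
    mb_per_blank: float   # formatted capacity of one blank
    pack_price: float     # price of one pack, before shipping
    blanks_per_pack: int


def price_per_blank(m: Medium) -> float:
    """Shipped cost of a single blank."""
    return (m.pack_price + SHIPPING_PER_PACK) / m.blanks_per_pack


def cost_to_store(m: Medium, gb: float) -> float:
    """Cost of enough blanks to hold `gb` gigabytes.

    Fractional blanks are allowed, which is how the table's figures
    work out. Round the blank count up for a real purchase.
    """
    blanks_needed = gb * MB_PER_GB / m.mb_per_blank
    return blanks_needed * price_per_blank(m)


MEDIA = [
    Medium("iomega, 38.8 MB", 38.8, 99.95, 10),
    Medium("iomega, 97 MB", 97.0, 87.95, 10),
    Medium("iomega, 242.5 MB", 242.5, 129.95, 8),
    Medium("smart & friendly, 600 MB (8x)", 600.0, 10.95, 10),
    Medium("smart & friendly, 600 MB (4x)", 600.0, 14.95, 10),
]

if __name__ == "__main__":
    for m in MEDIA:
        print(f"{m.label:32s} ${cost_to_store(m, 1.0):8.2f} per GB")
```

Swapping in today's pack prices and capacities should show at a glance whether the ranking in the last row still holds.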
Bare CDs take up much less physical storage space than any form of removable disc I know of, especially if stored in paper envelopes or in a high-density 30-40 disc box with double-sided pullout CD holders. I’d be hesitant to store bare archival CDs in rack-like holders, and I’m very leery of vinyl sleeve solutions.
Archiving on CD-Rs
I like chronologically ordered CD-Rs for final storage. Once images accumulate to critical mass on my hard drive, they’re off-loaded to one or more CD-Rs. Since the CD-Rs usually get filled in the process, writing to them once is generally all that’s needed. (I save my expensive and considerably less reliable CD-RW blanks for other applications that require less data security and benefit from the rewritability.)
I can routinely archive 1200-1400 raw Oly C-2020Z images (1600×1200 JPEGs at around 400KB each with HQ recording) onto a single 650MB CD-R in one or more recording sessions.
Depending on your equipment, burning and reading CD-Rs is probably slower than writing and reading Zip discs, for example, but the CD speed penalty on the reading side (mostly at spin-up) is tolerable and the writing happens only once. CD-R burners with 48x recording speeds are becoming commonplace at affordable prices as of 4Q2003. The recording time hit is diminishing rapidly. Some CD-R burners even label the CDs for you.
Archival File Formats
FWIW, here’s my take on storage formats for edited and unedited images…
- For images that have been manipulated in any way other than lossless rotation, PNG (Portable Network Graphics) makes an excellent archival file format by virtue of its lossless compression, its high degree of standardization and its alpha channel support. But storing PNGs assumes that
- Your tools support PNG, and most do nowadays.
- Saving space is more important to you than retaining the future editing options that may be preserved only via your editor’s own lossless “native” file format. For example, PhotoShop and PHOTO-PAINT native formats allow returns to previous editing stages, keep layers separate, retain saved masks and selections, etc. Unfortunately, native editor files can easily exceed 20MB for a 2MP camera image that started out as a measly 400KB JPEG.
The format output by your camera becomes largely irrelevant once images are edited. It goes without saying that one should
|Always work on copies, never on original images.|
Also, bear in mind that
|Repeated saves to JPEG or any other format with lossy compression cause a slow but sure accumulation of compression artifacts that will sooner or later lead to visible image degradation.|
- An image recorded initially as a TIFF (Tagged Image File Format) and not yet edited can be safely archived as a PNG to save space if space is more valuable than the time and effort involved. Since both are lossless 24-bit formats, the conversion won’t introduce computation errors. Once you edit a copy of an original TIFF, the first item above applies.
- An image recorded initially in JPEG (Joint Photographic Experts Group) format and not yet edited beyond lossless rotation should be archived as is. Converting such JPEGs to a lossless format adds nothing but work. The image data certainly won’t get any better and are potentially subject to quantization and other computational errors on reformatting, depending on the decompression algorithm used and the final format involved. Of course, once you edit a copy of an original JPEG, the first item above applies.
- Images recorded initially in your camera’s RAW format should be archived as RAWs to retain all the valuable post-processing advantages RAW files afford. And since edited images can’t be saved to RAW format, the first item above once again applies.
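The rules of thumb in this list boil down to a small decision function. The version below is a sketch only: the function name, its parameters and the returned labels are all invented for illustration, and "edited" means changed in any way other than lossless rotation.

```python
def archive_format(recorded_as, edited,
                   keep_editing_options=False, convert_tiff=True):
    """Suggest an archival format for one image.

    recorded_as          -- "JPEG", "TIFF" or "RAW", as written by the camera
    edited               -- True if changed beyond lossless rotation
    keep_editing_options -- True if layers, masks and the like matter
                            more than disc space
    convert_tiff         -- True if saving space is worth converting
                            unedited TIFFs
    """
    source = recorded_as.strip().upper()
    if source == "JPG":
        source = "JPEG"
    if source not in ("JPEG", "TIFF", "RAW"):
        raise ValueError(f"unrecognized camera format: {recorded_as!r}")

    if edited:
        # The camera's format no longer matters once an image is edited.
        return "editor native" if keep_editing_options else "PNG"
    if source == "TIFF":
        # Lossless to lossless, so converting costs only time and effort.
        return "PNG" if convert_tiff else "TIFF"
    # Unedited JPEGs and RAWs are archived exactly as recorded.
    return source
```

For example, `archive_format("JPEG", edited=False)` returns `"JPEG"`, while `archive_format("JPEG", edited=True)` returns `"PNG"`.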
Folders, Filenames, Filing and Browsing
The choice of storage media and hardware is easy. Designing data structures that foster retrieval is the hard part. What you do here can easily make or break your retrieval system, and the best approach is likely to be a highly personal one.
If you shoot a lot, you’ll do well to invest in a specialized image management utility to help you create, populate, maintain and above all enjoy your electronic image retrieval system. Variously known as image viewers, browsers or thumbnailers, these invaluable tools are swift and agile enough for day-to-day image housekeeping. The slow-loading software behemoths used for image post-processing are usually far too unwieldy for this task, and most operating systems lack the necessary skills.
The articles on Gene and Marion Wilburn’s Northern Journey document a very carefully planned and executed “homebrew” digital image retrieval system. You may not have the time or resources to pull off a database like theirs, but examining what they’ve done will no doubt tip you off to the many but not always obvious practical issues involved.
Of course, there are many ways to skin this cat, and you’re the best judge of what will work for you. By way of example only, the subsections below document an approach that’s worked well for me.
After a few false starts, I settled on a logical hierarchy of topically organized hard drive storage folders with names descriptive enough to help me drill down on any desired image with a reasonable chance of success on the first try. I duplicate the pertinent portions of this folder structure on each of my archival CD-Rs.
I keep a separate database of event and session dates to take full advantage of the date-time filenames I’ve used since day one. Since many-to-one relationships among sessions and topics can easily go in either direction — many topics can arise in one session, and many sessions can include a single favorite topic — this combination allows a very effective two-pronged retrieval strategy based on topic, time or both using nothing more elaborate than a simple session-date spreadsheet and your operating system’s file finder.
My personal folder structure looks something like this:
\c-2020z (raw unsorted images from this camera land here first)
\d-340l (raw unsorted images from this camera land here first)
\family (pictures of more than one of us go here)
\pets (pictures of more than one pet go here)
Choose categories, hierarchies and folder names that make sense to you from a retrieval standpoint.
Carefully chosen filenames greatly simplify image retrieval, but descriptive filenames are difficult to automate and manage. I name my image files with a date-time stamp with one-second resolution to guarantee unique file names and to ensure that files sort temporally when sorted alphabetically.
Holger Junck’s excellent shareware utility Picture Information Extractor (PIE) does this date-time renaming automatically as the images download from my camera. (PIE is discussed in more detail here and here.)
For instance, the PIE filename 20000101-1541-14.jpg refers to an image taken at 3:41:14 PM on 1/1/2000.
You can easily customize the filename mask PIE uses to rename downloaded images to suit your own filing scheme.
Sooner or later in a retrieval system relying on filenames, you’re going to have to rename entire batches of files to maintain consistency and retrievability. I’m usually presented with this chore as punishment for forgetting to restore my camera’s internal date and time after a prolonged battery outage. PIE ends up stamping all the images with the same mysterious date and time in 1969. (Don’t worry—PIE will automatically suffix a counter field to otherwise identical filenames to keep them unique.)
Fortunately, PIE’s flexible batch file renaming feature can also be applied profitably to image files already on your hard drive. Another batch renaming utility worth looking into is MultiRen, a free 32-bit Windows Magazine utility written by Gregory A. Wolking. MultiRen functions as a Windows Explorer extension accessed via the right-mouse menu.
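For anyone scripting their own renaming, the date-time naming scheme described above can be sketched in a few lines. This is not PIE's code: the function names are invented, and the `-2`, `-3` counter suffix is just one guess at how otherwise identical names might be kept unique.

```python
from datetime import datetime

STAMP = "%Y%m%d-%H%M-%S"   # yields stems like 20000101-1541-14
STEM_LENGTH = 16           # characters in a stem, counter suffix excluded


def stamp_name(taken, ext=".jpg"):
    """Build a date-time filename from the moment the picture was taken."""
    return taken.strftime(STAMP) + ext


def unique_name(taken, existing, ext=".jpg"):
    """Like stamp_name, but add a counter if the name is already in use.

    `existing` is a set of filenames already present in the folder.
    The counter format is an assumption made for this sketch.
    """
    base = taken.strftime(STAMP)
    name = base + ext
    counter = 1
    while name in existing:
        counter += 1
        name = f"{base}-{counter}{ext}"
    return name


def taken_at(filename):
    """Recover the date and time from a filename built by the functions above."""
    stem = filename.rsplit(".", 1)[0]
    return datetime.strptime(stem[:STEM_LENGTH], STAMP)
```

Because the stamp runs from year down to second with fixed-width fields, an ordinary alphabetical sort of these names is also a chronological sort.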
Naming for Uploads
If you’re planning on returning images to your camera for slide-show displays on TV, for instance, be aware that workable filename choices suddenly become severely restricted. My C-2020Z completely ignores its own images if they’re renamed outside certain rules detailed here. Other cameras seem to be equally finicky about uploaded filenames.
Keyword searches could easily supplement any image retrieval system. A number of image management utilities allow you to attach meaningful keywords to your image files. Most do so via a dedicated relational database (e.g., ThumbsPlus), but some actually embed keywords in the image file header itself. (The EXIF image file format used by most digital cameras allows a limited amount of free text to be stored directly in the image file header.) In theory, the latter approach would allow you to locate image files carrying embedded keywords using only the “text within file” search your operating system’s file finder probably offers. How well it works in practice, I don’t know, but I’ve found the more elaborate database approach to be very flexible and reliable.
Since my simple date-session retrieval scheme continues to work well enough for my purposes, I’ve not investigated the keyword approach further, but it garners considerable interest on RPD.
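For the curious, the “text within file” idea can be tried with a naive byte search like the sketch below. It assumes the keyword was stored in the file as plain ASCII text, which may not hold for every tool or header field. The function name is invented, the match is case-sensitive, and each file is read whole for simplicity.

```python
from pathlib import Path


def files_containing(root, keyword, pattern="*.jpg"):
    """List files under `root` whose raw bytes contain `keyword`.

    This does not parse the EXIF header. It only looks for the
    keyword's ASCII bytes anywhere in the file.
    """
    needle = keyword.encode("ascii")
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        if path.is_file() and needle in path.read_bytes():
            hits.append(path)
    return hits
```

A search of this kind can turn up false hits when the same bytes happen to occur in the image data, so treat the results as candidates to eyeball in a thumbnailer.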
Thumbnailers like PIE, ACDSee, ThumbsPlus and Qimage play important roles in image maintenance, batch processing and printing, among many other uses. In my hands, nothing’s faster than PIE in single-image thumbnailing mode when it’s time to cull, rotate and file away a batch of images fresh off the camera. The fast, familiar Windows Explorer-like interfaces in PIE and ACDSee let me distribute new images to their destination folders quickly and accurately.
For large reorganizing jobs at the folder level, a fast file transfer application like LapLink comes in very handy—especially when image version control issues come into play. Experienced Windows 9x and NT users may find the old Windows File Manager (winfile.exe) more convenient than Windows Explorer. A nicely enhanced 32-bit version still lurks in the main Windows folder.
Once they’re tucked away in a suitable folder structure, I find looking for images according to when I might have taken them as good a way as any.
To jog my memory, I log notable temporal “landmarks” such as birthdays, holidays, hikes, trips, memorable shots, tests, and so on in a simple text “date database”, each line of which has the format
date [tab] note
to allow for importation into a spreadsheet or true database down the line.
To drill down on specific images, I consult the “date database” as needed to identify the appropriate CD or hard drive folder and then turn to a thumbnailing utility to find exactly what I’m looking for.
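The lookup just described can also be scripted. The sketch below rests on two assumptions: that the dates in the “date database” are written as YYYYMMDD so they match the leading characters of the date-time filenames, and that a keyword search of the notes is how you want to find the date. All function names are invented for illustration.

```python
from pathlib import Path


def load_date_db(path):
    """Read 'date [tab] note' lines into a list of (date, note) pairs."""
    entries = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        date, tab, note = line.partition("\t")
        if tab and date.strip():          # skip blank or malformed lines
            entries.append((date.strip(), note.strip()))
    return entries


def find_dates(entries, keyword):
    """Dates whose note mentions `keyword`, ignoring case."""
    wanted = keyword.lower()
    return [date for date, note in entries if wanted in note.lower()]


def find_images(root, date_stamp):
    """Files anywhere under `root` whose names start with `date_stamp`."""
    matches = Path(root).rglob(f"{date_stamp}*")
    return sorted(p for p in matches if p.is_file())
```

With those pieces, looking up a hike means calling `find_dates(entries, "hike")` and then passing each date it returns to `find_images` along with the hard drive or CD folder to search.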
PIE’s familiar Windows Explorer-like interface and its very fast on-the-fly single-image thumbnailing let me browse for the image(s) I need and then manipulate, open or print the files as I would any file in Windows Explorer. PIE fully supports Windows file extension associations.
Occasionally, I resort to whole-folder thumbnailers like ACDSee, ThumbsPlus, Qimage and PIE (in whole-folder mode) to find images. ThumbsPlus can even scan a folder for images similar to the one selected.
Thumbnailing (from “thumbnail sketch”) helps you browse your image files by attaching a small image to a file name. Among the many available thumbnailing applications and utilities, I’m familiar with ACDSee, PIE, ThumbsPlus and Qimage, all high-quality shareware. Another shareware thumbnailer with an enthusiastic RPD following is Max Lyons’ Thumber, but I won’t be covering it further here for lack of first-hand experience. Finally, recent versions of Windows offer built-in thumbnailing, but among other catches, it’s very slow.
PIE (Picture Information Extractor)
I think of PIE as a utility for the manipulation of image files—as opposed to the images themselves—and find it absolutely invaluable for that purpose.
In the single-image thumbnailing mode I use most often, PIE shows
- a familiar 2-pane folder-and-file interface very similar to the one used by Windows Explorer
- a very fast, generous, well-rendered thumbnail of the selected single image file
- extensive exposure and camera information extracted from the selected image’s EXIF file header
Highlight an image file, hit enter, and you’ll be looking at the same image full-screen, again with great speed and little aliasing. You can quickly browse through any folder, switching quickly between thumbnail and full-screen views as you please.
Recent versions of PIE can also show thumbnails of all the images in the selected folder at impressive speed. I consider the flexibility to go both ways (single-image vs. whole-folder) a great virtue in any tool routinely used for downloading images from my camera and then for preliminary inspection, lossless rotation (to get those vertical shots oriented properly), culling and filing.
There’s much more to PIE, and the enhanced PIE Studio version adds flexible paper-conserving image printing. Unfortunately, the interpolation algorithm Studio uses to size images for printing doesn’t measure up to the flexible interpolation offered by Qimage.
Anyone who shoots a lot will appreciate PIE’s winning combination of great speed and detailed EXIF data display relative to other whole-folder thumbnailers like ACDSee, ThumbsPlus and Qimage. You really have to download the PIE demo and try it out to appreciate just how handy a tool PIE can be in day-to-day digital photography.
BTW, I have no personal interest in PIE. I just appreciate a good value in software when I see one. Support has been sketchy in the past, but I don’t know where that stands now.
I’m still exploring ACDSee v. 3.1. This flexible, well-designed, feature-packed thumbnailer excels at fast image review, especially in whole-folder mode. ACDSee is more powerful than my old favorite PIE in many respects, particularly on the editing front, but…
- ACDSee’s single-image thumbnailing is noticeably slower than PIE’s
- ACDSee doesn’t automate image transfers and file renaming as adroitly as PIE, and
- it doesn’t display exposure data as clearly and conveniently as PIE.
Still, ACDSee has great potential. Versions later than 3.1 may well have resolved these issues.
ThumbsPlus and Qimage
These affordable 32-bit Windows shareware whole-folder thumbnailers are much more ambitious than PIE, particularly along editing lines, but are noticeably slower to launch and use as a result. Their thumbnail caches can also take up gobs of disk space. The same can be said of ACDSee. The sketchy descriptions below only hint at what these feature-rich programs have to offer.
Both programs generate thumbnails for entire folders with user-controlled thumbnail caching to improve performance on return visits to cached folders. Both offer folder-based file management, basic to intermediate image editing, lossless JPEG rotations, EXIF file information display, slide shows and batch image processing among many, many other common features.
Unfortunately, neither offers a fast single-image thumbnailing mode like PIE and ACDSee. Qimage forces you to choose whole-folder thumbnails or none at all. In ThumbsPlus, whole-folder thumbnailing is always on. In my book, that gives PIE a big advantage as a day-to-day image organizer.
ThumbsPlus features a well-designed automated thumbnail table generator for web pages. The folder-based ThumbsPlus interface is very intuitive. Its powerful batch processing proves very useful in an archival context. The intermediate post-processing facilities fall short of PhotoShop or PHOTO-PAINT, of course, but go well beyond the routine. The “General Enhancement” (GE) feature is by far the best I’ve seen in the “Instant Fix”, “Auto-balance” genre. In fact, it’s so consistently good, it’s almost eerie. Even when GE falls short of the result I’m after, it’s often an excellent starting point for manual fine-tuning, in ThumbsPlus or elsewhere. Batch GE can be a huge time-saver, especially for the non-masterpieces that tend to dominate my photos from gatherings, group outings and vacations.
If you archive images on CD-R or other removable media, ThumbsPlus by default creates automatically updated off-line volumes of thumbnails stored on your hard drive for fast and easy access, even with the disc unmounted. Off-line volumes are a very powerful image management tool.
Mike Chaney’s Qimage shareware from DDI Software offers powerful paper-saving print layout control and after-the-fact noise reduction. The user-selectable interpolation algorithms used to resize images for printing are some of the best around. The Qimage interface takes some getting used to, but these features alone are worth the effort.
In Windows XP and 2000, even Windows Explorer (WE) can produce thumbnails, either of the selected image or of all the images in the current folder. Furthermore, this feature can be enabled on a folder-by-folder basis. WE offered more primitive thumbnailing in Windows 98 and in Windows NT 4.0 with the Active Desktop update installed.
As a thumbnail generator, WE is quite slow, even in Windows XP. Earlier versions of WE cached thumbnails in a hidden system file named ‘thumbs.db’ to speed viewing on return visits. The bad news about ‘thumbs.db’ was that it could only grow. Thumbnails were never removed, even when the parent file had been deleted. If you turned on WE thumbnailing (via a folder’s Properties context menu), you might have to make some ugly choices about ‘thumbs.db’ down the road. I’ve heard that the XP version of WE has addressed most of these issues, but I’m no expert on XP.
WE thumbnailing performance may be tolerable for relatively static image folders, but it can be a big problem for commonly accessed folders in constant flux, like the receiving folders I use to hold images fresh off the camera. Most of my subject-oriented storage folders (family members, pets, local hiking haunts, etc.) also end up taking on new images fairly frequently, so they’d bog down as well.
All whole-folder thumbnailing schemes suffer from performance and footprint issues. Most of the time, I prefer the speed and storage economy of a fast, flexible single-image thumbnailer and transfer engine like PIE.
References and Links
(See also the home page links.)
Archiving—Michael Reichmann’s tutorial on archiving options for digital images covers media and techniques not included here.
Octave’s CD Recording library—a large collection of helpful articles and FAQs on CD-R and CD-RW technology and techniques.