
Class Six
File Formats

Since all the data a computer uses is contained in files, there are many different types of file, called file formats. As I mentioned in discussing filenames in Class Three, different file formats are identified by different extensions, the three letters (or four, on mainframes) after the last dot. In preparing this class, I downloaded from the Web a dictionary of file extensions; it contained over 2,500 different extensions, and was far from complete. Most of these, however, are either obscure or obsolete, or would be handled automatically by the programs they belong to. In this class, I will cover about forty formats that you might come across on the Internet, or in your programs, and need to identify, to know whether you can use them or which you should be using yourself.
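
To make the idea of "the letters after the last dot" concrete, here is a minimal sketch in Python. The function name extension_of and the sample filenames are my own, made up for illustration; only os.path.splitext comes from the standard library.

```python
import os.path

def extension_of(filename):
    """Return the text after the last dot, lowercased, without the dot itself."""
    _, ext = os.path.splitext(filename)
    return ext[1:].lower()

# Sample filenames (hypothetical) to show the common cases.
for name in ["letter.doc", "photo.JPG", "archive.tar.gz", "README"]:
    print(name, "->", repr(extension_of(name)))
```

Note that only the part after the last dot counts, so archive.tar.gz is reported as "gz", and a name with no dot has no extension at all.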

Most file formats can be divided into two groups: text formats, which are made up of human-readable characters, and binary formats, or binaries, which are made up of 0's and 1's to be read directly by the computer. (Of course, text files are also made up of 0's and 1's, but they are designed to be interpreted as characters by the software. If you try to read a binary file in a text editor, it will assume it is a text file and convert it to a meaningless string of characters, including many unusual letters and symbols.)
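
As a small illustration of the point that text is also stored as 0's and 1's, the following Python sketch shows the bits behind a two-letter text. The sample text "Hi" and the helper name to_bits are just assumptions for the example.

```python
def to_bits(data):
    """Show each byte of data as eight 0's and 1's."""
    return " ".join(format(byte, "08b") for byte in data)

text = "Hi"
data = text.encode("ascii")   # the bytes actually stored in a .txt file
bits = to_bits(data)
print(bits)                   # 01001000 01101001
```

The software reads the first group of eight bits as the character "H" and the second as "i"; a binary file contains the same kind of bits, but they were never meant to be read as characters.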

I. Program Files

Most programs have the extension .exe (for executable). Other possible extensions are .com, .vbx, .ovl, .drv, .cla, .cpl, .scr, .sys and .vxd. These are extensions your virus scanner will check for file viruses.

A program is independent; you just double click on it in Explorer, say, and it runs by itself. Program files are binaries.

Another type of file, which is auxiliary to programs rather than a program in its own right, is the .dll (dynamic link library). A .dll holds code that programs load when they need it; it does not run by itself. Windows will give you a warning if you try to delete an .exe file, and if you drag and drop one, it just makes a shortcut rather than copying or moving it. Windows doesn't give you the same protection for .dll files, so be careful with these.


II. Text and Word-Processing Files

When you use the Save As command in your word processor, you get a list of possible formats you can save in. The one at the top is the default extension for your type of word processor: .doc for MS Word, .wpd for WordPerfect, etc. Others are for older versions or for other word processors, many of which are obsolete, so that you can choose a format that someone with that program can read. I've already mentioned the .txt extension for ASCII (American Standard Code for Information Interchange) text files, which have just characters without formatting. This is the lowest common denominator: nearly any program on any computer can read a text file. The next higher step is Rich Text Format (.rtf). This preserves most of your simple formatting, such as bold, italic, etc., and is readable by nearly all programs except very old DOS programs. If you're sending someone a document and don't know what software they have, .rtf is a pretty safe choice.

In general, a newer version of a word processing program can import the formats of its older versions, and older versions of its competitors, but not vice versa. Desktop Publishing applications can usually import a large number of word processing formats - something to remember if you have one, and come across a file your word processor can't open - and some of the more expensive ones like PageMaker use separate import filter files, which gives them forward as well as backward compatibility: if a new version comes out of WordPerfect, for example, you can go to the PageMaker website and download a new filter to handle it in your old version of PageMaker.


III. Web Pages; Portable Documents

Most web pages are in HTML, with the extension .htm or .html. This is a text format; you can open these files in Notepad and edit them, or even make new web pages from scratch, if you know HTML.

Most word processors and Desktop Publishing applications today allow you to save in HTML format, although the results are not always the best. Both Netscape Communicator and Internet Explorer include simple HTML editors (Composer and FrontPage Express, respectively), and there are other editors available as freeware or shareware which allow you to produce simple web pages without knowing HTML code.

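Some web pages, especially on sites whose content changes often, have the extension .asp instead of .html. This stands for Active Server Pages, a Microsoft technology in which the web server runs a script to build the page each time it is requested. What reaches your browser is still ordinary HTML, so you do not need any special software to view an .asp page.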

An advantage to HTML, as opposed to word processing document format, is cross-platform compatibility; that is, the same web page can be viewed by browsers running in Windows, Linux, OS/2, or on a Macintosh or a mainframe. To accomplish this, however, it leaves much of the formatting and choice of colors, fonts, and so forth to the browser. The same page may be viewable on all computers, but it will not look exactly the same on all of them. Each successive version of the HTML specification has increased the web page designer's control over the appearance of the page; but for duplicating the appearance of a document exactly, while retaining cross-platform compatibility, the best solution is probably Adobe's Portable Document Format (.pdf).

PDF's can be created from any program you can print from, using a program called Adobe Acrobat. They are viewed with a free program called Acrobat Reader, which is available for every platform and which nearly everyone has.


IV. Compression

Three things that no one ever has enough of are money, hard disk space, and bandwidth. To help with the last two, there is file compression. I am referring here to lossless compression - compression which does not lose any information. Without getting into the technical details, this works more or less like shorthand, replacing a frequently occurring pattern by a single code.
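
Here is a minimal sketch of lossless compression in Python, using the standard zlib module. The sample data (one phrase repeated fifty times) is an assumption chosen to have an obvious repeating pattern; real files will compress more or less well depending on their contents.

```python
import zlib

# 1000 bytes of sample data with a frequently occurring pattern.
original = b"the quick brown fox " * 50

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), "bytes before,", len(compressed), "bytes after")
print("restored exactly:", restored == original)
```

The important part is the last line: after decompression every byte is back exactly as it was, which is what "lossless" means.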

The most common compression format for PC's is .zip. Originally, the main use for zipping files was to put multiple files into a single "container" or archive for storage; the compression was just an added benefit. Hence the name ".zip", as in "zipper" or "ziplock bag". This has nothing to do with Zip disks, where the name presumably is from "zippy" or fast. Most software and documents you download from the Internet will be in .zip archives.
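
The "container" idea can be sketched with Python's standard zipfile module. The file names and contents below are hypothetical, and the archive is built in memory rather than on disk just to keep the example self-contained.

```python
import io
import zipfile

# Two made-up text files to put into one archive.
files = {
    "letter.txt": "Dear friend, " * 40,
    "notes.txt": "remember the milk\n" * 40,
}

# Write both files into a single compressed .zip archive held in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    for name, text in files.items():
        archive.writestr(name, text)

# Open the archive again, list what is inside, and extract the contents.
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    restored = {name: archive.read(name).decode("utf-8") for name in names}

print(names)
print("contents unchanged:", restored == files)
```

One archive holds several files, each stored under its own name, and each comes back out unchanged.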

There are a number of programs which can zip and unzip files, but by far the most popular is WinZip, which is the most downloaded program on the Internet.

There are several older archiving and compression formats which existed before .zip; you will run across these mainly on foreign websites. There are also some which are used mainly on UNIX or Linux systems, such as .z, .gz, .tar, .tgz, and .taz. WinZip can handle most of these.

There are also self-extracting .zip files, used for downloading software, which do not require an unzipping program. These are programs, with an .exe extension, and as with all .exe files you have to be careful as they can harbor viruses or Trojan horses (programs that pretend to do something useful while doing something else behind your back).

At the time that .zip was becoming popular among DOS users, PC and Mac users were in different worlds, and so a different compression format developed for the Mac, called Stuff-it (.sit). The freeware program for stuffing and unstuffing files was called Stuff-it Expander (now Aladdin Expander). This is now available for Windows as well as the Mac and can handle .zip and the older formats as well as .sit files.

These programs are much better at compressing text files than binaries. Two formats which are used for compressing large programs are Microsoft's Cabinet files (.cab), which are used for Microsoft's programs (the Windows installation disks are made up of .cab files), and .jar files, which are used by other programs (for instance, Netscape Communicator). These are decompressed by the install programs and do not require any special software on your computer.

A compression format you will come across occasionally on the Internet, which is optimized for binary graphics and multimedia files, is .rar. The shareware program for making and opening .rar archives is called WinRAR. It gets very good compression. The only disadvantage is that most people are not familiar with it and do not have the WinRAR program.

E-mail and newsgroup posts are text only. So when you include a binary as an attachment to an e-mail or a news posting (to a binary group) it has to be coded into ASCII text, then decoded at the other end. Today, this is generally done with something called MIME, which is transparent to the user. An older method, which you will sometimes run across in binary newsgroups, is called UUE, and uses files with a .uue or .uu extension. Usually, these are handled automatically by your mail or news program. This is not a compression format, but I mention it here because if you do have to encode or decode a UUE file manually, you could use a compression program like WinZip.
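
The coding of a binary into ASCII text and back can be sketched in Python as follows. The six sample bytes are made up; base64 is the encoding MIME commonly uses for binary attachments, and binascii.b2a_uu produces a single line in the older UUE style (a complete .uue file also has begin and end lines, which this sketch leaves out).

```python
import base64
import binascii

# Six arbitrary bytes standing in for a binary attachment.
binary = bytes([0, 255, 137, 80, 78, 71])

# MIME style: base64 turns the bytes into plain ASCII characters.
mime_text = base64.b64encode(binary).decode("ascii")

# UUE style: one uuencoded line of text (at most 45 bytes per line).
uu_line = binascii.b2a_uu(binary).decode("ascii")

print("base64:", mime_text)
print("uuencoded:", uu_line.rstrip("\n"))

# Decoding at the other end gives back the original bytes.
print(base64.b64decode(mime_text) == binary)
print(binascii.a2b_uu(uu_line) == binary)
```

Neither encoding makes the data smaller; in fact the text version is longer than the binary, which is why this is not a compression format.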

 

V. Graphics Formats

There are two basic types of graphics formats, Raster (or bitmapped) and Vector. Raster graphics formats divide up a picture into a grid, or raster, of small squares known as pixels. They then give the color information for each individual pixel. Vector formats describe the picture by giving mathematical formulas for the lines and curves which make it up.

Vector graphics are produced by drawing programs, such as CorelDraw or Adobe Illustrator. They are best for simple line graphics, produce very small files, and can be scaled with no loss of quality. The two vector formats you will meet with most often are Windows Metafiles (.wmf) and Computer Graphics Metafiles (.cgm). A new vector format which has recently been introduced for use on the Internet is Scalable Vector Graphics (.svg).

Raster graphics are best for more complicated images, and especially photographs. They are produced by Paint programs, or by scanning and photo-editing applications. You will meet with a much larger number of raster formats. The two most important factors in describing raster graphics formats are the color depth and the compression. (I explained about color depth in class two, but to review: the number of colors which a format can contain depends upon the number of bits it uses to describe them.)
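
The arithmetic behind color depth is simple enough to sketch in Python. The function names and the 640 by 480 image size are assumptions for the example; the size calculation is for the raw pixel data only and ignores any file header.

```python
def colors_for(bits):
    """Number of different colors that can be described with this many bits."""
    return 2 ** bits

def uncompressed_bytes(width, height, bits):
    """Bytes needed for the raw pixel data of an uncompressed raster image."""
    return width * height * bits // 8

print(colors_for(8))                      # 256
print(colors_for(24))                     # 16777216
print(uncompressed_bytes(640, 480, 24))   # 921600
```

So a 24 bit image can hold about 16.7 million colors, and even a modest 640 by 480 picture takes nearly a megabyte before any compression is applied.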

One of the oldest formats still common today is the Windows Bitmap (.bmp), which was developed for Windows 3.x wallpaper. (Note the ambiguity of the term bitmap, which can be used generically as a synonym for raster graphic, or specifically for the .bmp format.) This is a 24 bit format, with no compression; it gives the largest file sizes of any format for a given size and resolution image. .Bmp is the default format for many scanner programs; if it is for yours, change it to .tif (see below) - .bmp is the worst possible format for almost any use, except as Windows 3.x wallpaper.

A variant of the .bmp format is .rle (run length encoded), an 8 bit format with a fairly primitive compression.
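
The idea of run length encoding can be sketched in a few lines of Python. This shows only the general principle (store a count and a value instead of repeating the value); it is not the actual byte layout of an .rle file, and the sample row of pixel values is made up.

```python
def rle_encode(pixels):
    """Collapse each run of identical values into a (count, value) pair."""
    runs = []
    for value in pixels:
        if runs and runs[-1][1] == value:
            runs[-1][0] += 1          # same value as before: lengthen the run
        else:
            runs.append([1, value])   # new value: start a new run
    return [(count, value) for count, value in runs]

def rle_decode(runs):
    """Expand (count, value) pairs back into the original list of values."""
    pixels = []
    for count, value in runs:
        pixels.extend([value] * count)
    return pixels

row = [7, 7, 7, 7, 7, 0, 0, 3]
encoded = rle_encode(row)
print(encoded)                        # [(5, 7), (2, 0), (1, 3)]
print(rle_decode(encoded) == row)     # True
```

This works well for images with large areas of a single color and poorly for photographs, where neighboring pixels are rarely identical.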

Many scanners and photo-editing programs default to Tagged Image File Format (.tif or .tiff), and this is usually your best choice for any photo you want to edit or print. Tif's can be saved either uncompressed (for virtually universal compatibility with any graphics program) or with one of four lossless compression options. Tif's are 24 bit.

Although tif's are smaller than .bmp's, they are still quite large - far too large for e-mail or putting on a web page. For photographs to put on web pages, you want to use the JPEG format (.jpg, .jpe, or .jpeg). This stands for Joint Photographic Experts Group. JPEG's are 24 bit color files. Unlike the other graphics formats, they use lossy as well as lossless compression; that is, they use the characteristics of vision to discard the information which is least noticeable. JPEG's can get very high compression rates, as much as 25 or 30 to 1. I've printed out .tifs and .jpgs of the same photograph, and on a standard inkjet printer like mine the difference is not perceptible. However, you would not want to send JPEGs to a printshop. The main thing to remember about JPEGs is that every time you make changes and resave, they get recompressed, which is like making a xerox of a xerox. After two or three saves, the image is extremely degraded. Thus, if you are scanning a photo to use on a web page, you should scan it to a .tif file, make all your editing changes, and then convert to JPEG. If you think you may need to make more changes later on, keep the .tif and make a new .jpg when needed.

JPEG is your best choice when a photo is the point of your web page; if you're just using a small photo or graphic as a design element, you want to choose the .gif format. This stands for Graphics Interchange Format, and it was developed by CompuServe for its own online service before being adopted on the Web. .Gifs are 8 bit (256 colors) and use lossless compression.

A newer format, called Portable Network Graphics (.png), has been developed for the purpose of replacing the .gif format, whose compression method is covered by a patent. It supports 24 bit color (as well as 8 bit), uses lossless compression only, and has a few extra features such as variable transparency. While .png has been around for several years, it is only beginning to become common, and is still far less common than the .gif and .jpeg formats.

JPEG is designed for "natural" images, such as photographs, with subtle gradations of color. "Synthetic" images, such as lettering or clip art, may look very fuzzy in JPEG, and the files may actually be larger rather than smaller. For synthetic images, your best choice is .pcx, the format of the Paintbrush program. This handles 24 bit color.

Of course, any image file can be viewed in the program that produced it. Photo-editors usually can view and edit at least .tif, .bmp, and .jpg; your browser can handle at least .gif and .jpg. There are many graphics viewer programs which are available as freeware or shareware, and they can all handle the formats mentioned here. The one I use is Graphics Workshop (a $40 Cdn shareware program), which can view about fifty different formats, and convert most of them from one to another. I have seldom found an image format on the Internet that Graphics Workshop couldn't open. A very similar commercial program I have tried (it was preloaded on my computer when I bought it) is ACDSee, which handles about twenty of the most common formats with a very nice interface. Other popular programs are LviewPro and PaintShopPro, which have somewhat more editing capabilities but which have other problems. All these are in the $40-60 range.



VI. Video Formats

There are only a few really common video formats. The basic video format for Windows is Microsoft's Audio Video Interleaved (.avi). .Avi files can be viewed with a program which comes as part of Windows. However, the .avi format can contain a number of different codecs (coder-decoders) so not all .avi files will play with the included Windows software, unless you can find and download the appropriate codecs. .Avi files are usually extremely large (but see DivX below.).

A second format, which is also a codec in itself, is MPEG (.mpg or .mpeg). This stands for Moving Picture Experts Group. (Actually, .mpg files refer to the MPEG v.1 codec.) Roughly speaking, the way MPEG works is that a complete frame is saved, and then only the changes are saved from frame to frame. This means that a "talking heads" video may have a relatively small file, while an "action" movie may be nearly as large as an .avi.
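
The "save only the changes" idea can be sketched in Python as below. This is a deliberately simplified illustration and not how a real MPEG encoder works internally (real encoders use much more elaborate techniques); the frames here are just made-up lists of 100 pixel values, and the function names are my own.

```python
def frame_delta(previous, current):
    """Return {position: new_value} for the pixels that changed between frames."""
    return {i: new for i, (old, new) in enumerate(zip(previous, current)) if old != new}

def apply_delta(previous, delta):
    """Rebuild the next frame from the previous frame plus the saved changes."""
    frame = list(previous)
    for position, value in delta.items():
        frame[position] = value
    return frame

# A "talking heads" style pair of frames: only three pixels differ.
frame1 = [5] * 100
frame2 = list(frame1)
frame2[10:13] = [9, 9, 9]

delta = frame_delta(frame1, frame2)
print(len(delta), "changed pixels saved instead of", len(frame2))
print(apply_delta(frame1, delta) == frame2)
```

When little changes between frames, the saved changes are tiny; when nearly every pixel changes, as in an action scene, there is almost nothing to gain.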

Microsoft's Advanced Streaming Format (.asf) (formerly known as Active Streaming Format) was based on MPEG v.4, a newer codec than the MPEG v.2 used on DVDs, which gives smaller files; the major disadvantage is that it only plays back with Windows Media Player, which excludes people using the Macintosh or other non-Windows operating systems. A similar proprietary format from Microsoft is Windows Media Video (.wmv).

A non-proprietary version of the MPEG v.4 codec is DivX. (Not to be confused with DIVX, the late unlamented proprietary DVD format.) This is rapidly becoming the favorite format on the Internet for longer movies. Many new .avi movies use the DivX codec, which is not included with the standard Windows Media Player and the free Real Player, so this codec must be downloaded from the Internet.

A third popular format is Apple's Quick Time movies (.mov, .qt or .qtm). To view these, you need the free Quick Time for Windows program.

A fourth type, which is designed for "streaming" over the Internet, is Real Media (.rm), which requires at least the free Real Player software. One thing you can do with Real Media is to watch TV programs from around the world - the poor man's satellite dish. Unfortunately, the size of the image tends to be very small, although the latest versions (helped by the spread of higher bandwidth connections) are improving on this.



VII. Audio Formats

There are many audio formats used on the Internet, most of which require a specific plug-in. I will only mention here the four or five types which are "standard" types.

The sounds on your computer, like the ones you hear when you start up or open a program, are .wav files (pronounced "wave"). These are just digital recordings of sounds; you can record .wav files from a microphone with the Sound Recorder program, or other programs. Like .avi, the .wav format uses various codecs, and the files are rather large for the length of the sounds.
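
To give a feel for the sizes involved, here is a rough calculation in Python for an uncompressed recording. The default settings (44,100 samples per second, 16 bits per sample, stereo) are assumed CD-quality values, not something every .wav file uses, and the small file header is ignored.

```python
def wav_bytes(seconds, sample_rate=44100, bits_per_sample=16, channels=2):
    """Approximate size of the sound data in an uncompressed recording."""
    return seconds * sample_rate * (bits_per_sample // 8) * channels

print(wav_bytes(1))     # 176400
print(wav_bytes(60))    # 10584000
```

At these settings one minute of sound comes to about 10 megabytes, which is why long recordings are rarely distributed as uncompressed .wav files.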

The equivalent of .wav files on the Mac are .snd (sound) files. There is a free program called snd2wav which can translate them for Windows.

A totally different approach is taken by MIDI files (.mid, .mdi). This stands for Musical Instrument Digital Interface, and was originally designed for connecting to separate music synthesizers. A midi file is not a recording, but more like a musical score; the sound card synthesizes the sound when you play it. The quality of midi playback is highly dependent on the quality of your sound card. Of course, midi files cannot contain vocal music or non-musical sounds. The files are relatively small for the length of the music they contain. Midi was the main way of downloading music on the Web until very recently.

Real Audio (.ra) files are for streaming audio; they are used for both voice and music, and require the Real Player software.

Finally, there is MP3, which is short for MPEG Audio Layer 3 (the audio part of the MPEG v.1 standard, later extended in MPEG v.2). Probably everyone knows about this format; it is the new downloadable music format that is supposed to revolutionize the music industry and make CD's obsolete. There is a proliferation of MP3 software on the Internet today, and you can also buy a machine like a portable CD player to play your downloaded music on.