March 29, 2007
(XHTMLized by Andux)
The overall file is composed of a main header followed by blocks of audio and video data.
Bytes 1 - 15:   Header
Bytes 16 - EOF: Various Blocks ...
The type of block is identified by the first byte in the block.
0x01 = Video, Compression #1
0x02 = Palette
0x03 = Video, Compression #2
0x04 = Video, Compression #3
0x14 = End of File
0x7C = Audio, Start Frame
0x7D = Audio
The header is always the first 15 bytes of the VID file.
Bytes 1 - 3:   ASCII  Always "VID"
Bytes 4 - 5:   INT16  Always 0x00 02 (512)
Bytes 6 - 7:   INT16  Number of Frames
Bytes 8 - 9:   INT16  Video Frame Width (256 or 320)
Bytes 10 - 11: INT16  Video Frame Height (200)
Bytes 12 - 13: INT16  Global Delay value?
Bytes 14 - 15: INT16  0x0E 00 (14) in all DF VIDs
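The header layout above can be read in a single unpack call. The following is a minimal Python sketch, not a reference implementation: the names `VidHeader` and `parse_vid_header` are hypothetical, and it assumes the INT16 fields are unsigned and little-endian (the "0x00 02 (512)" byte order above is consistent with little-endian).

```python
import struct
from collections import namedtuple

# Hypothetical container for the seven header fields described above.
VidHeader = namedtuple(
    "VidHeader",
    "magic version frame_count width height global_delay unknown",
)

def parse_vid_header(data):
    """Parse the first 15 bytes of a VID file into a VidHeader."""
    if len(data) < 15:
        raise ValueError("a VID header needs 15 bytes")
    # "<" = little-endian without padding; 3s = 3 raw bytes; 6H = six 16-bit values.
    # Assumption: the INT16 fields are treated as unsigned.
    header = VidHeader(*struct.unpack("<3s6H", data[:15]))
    if header.magic != b"VID":
        raise ValueError("missing 'VID' signature")
    return header
```

For example, `parse_vid_header(open(path, "rb").read(15))` would give access to `width`, `height`, and `frame_count` by name.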
This usually only appears immediately after the header.
Bytes 1 - 1:   CHAR8  Block Type (0x02)
Bytes 2 - 769: RGB    256-color VGA palette. Each palette entry is a triplet of 3 bytes (R/G/B), each ranging in value from 0 to 63.
Experimentation has revealed that the Daggerfall engine does correctly interpret palette blocks inserted later in the file, though none of the videos included with the game make use of this functionality.
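Because the palette components range from 0 to 63, they need scaling before use on a modern 8-bit-per-channel display. The sketch below shows one way to do this in Python; `parse_palette` is a hypothetical helper, and the linear scaling to 0..255 is an assumption rather than something the format specifies.

```python
def parse_palette(block):
    """Turn a 769-byte palette block into 256 (R, G, B) tuples in 0..255."""
    if len(block) != 769 or block[0] != 0x02:
        raise ValueError("expected a 769-byte block of type 0x02")
    entries = block[1:]
    # Assumption: scale each 6-bit VGA component linearly so that 63 maps to 255.
    return [
        tuple(c * 255 // 63 for c in entries[i:i + 3])
        for i in range(0, 768, 3)
    ]
```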
This is usually the first audio block in the file.
Bytes 1 - 1:   CHAR8  Block Type (0x7C).
Bytes 2 - 3:   INT16  Unknown; always 0.
Bytes 4 - 4:   CHAR8  Sound Blaster DAC init value (usually 0xA6)
Bytes 5 - 6:   INT16  Audio block data length
Bytes 7 - ???: AUDIO  8-bit audio data
The DAC init value corresponds to the audio sample rate as follows:
InitVal = 256 - (1000000 DIV SampleRate)
SampleRate = 1000000 / (256 - InitVal)
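As a worked example of the two formulas, the usual init value 0xA6 (166) gives 1000000 / (256 - 166) = 1000000 / 90, or roughly 11111 Hz. A small Python sketch follows; the function names are hypothetical, and rounding the sample rate down to an integer is an assumption.

```python
def sample_rate_from_init(init_val):
    """Sample rate in Hz for a Sound Blaster DAC init value (integer result)."""
    return 1000000 // (256 - init_val)

def init_from_sample_rate(sample_rate):
    """DAC init value for a sample rate in Hz (DIV is integer division)."""
    return 256 - (1000000 // sample_rate)
```

Note that the mapping is lossy in one direction: both 11025 Hz and 11111 Hz produce the init value 0xA6.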
The audio blocks hold the audio for each frame.
Bytes 1 - 1:   CHAR8  Block Type (0x7D).
Bytes 2 - 3:   INT16  Audio block data length
Bytes 4 - ???: AUDIO  8-bit audio data
The basic structure of the video blocks is all the same:
Bytes 1 - 1:   CHAR8  Block Type (0x01, 0x03 or 0x04).
Bytes 2 - 3:   INT16  Display time (in 60ths of a second)
Bytes 4 - ???: VIDEO  Compressed video data.
Each type of video block has a slightly different compression format (see below for details). Unfortunately there is no record size so you must completely parse the video data in order to find the next block. There is only one frame per block, so you can stop parsing video when InputByte == 0 or Bytes Copied >= (Frame Width * Frame Height).
It is important to note exactly how the video frames go together. Only the 0x03 video block actually contains a full frame of data. The 0x01 and 0x04 types only contain the pixels that have changed from the previous frame. Thus, generally in order to render any one frame you must also render all previous frames up to the first 0x03 video block.
The video data is stored in a form of RLE compression, which is decompressed with the following algorithm:
InputByte = Read 1 Byte From File
if ( InputByte >= 0x80 )
    RunLength = InputByte - 0x80
    Skip RunLength Pixels in the Video Frame
else if ( InputByte == 0 )
    End of Video Frame
else
    RunLength = InputByte
    Read RunLength Bytes From File and Copy to Frame
endif
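The pseudocode above handles a single control byte; a decoder repeats it until the frame ends. Here is a hedged Python sketch of that loop. The name `decode_delta_rle` and its parameters are hypothetical, the frame is assumed to be a `bytearray` of width * height pixels already holding the previous frame, and the "frame is full" stop condition is interpreted as the write position reaching the end of the buffer.

```python
def decode_delta_rle(data, pos, frame, start=0):
    """Apply skip/copy RLE data to `frame` in place.

    `data` is the file contents, `pos` the offset of the first control byte,
    and `start` the pixel index at which writing begins.
    Returns the offset just past the consumed video data.
    """
    out = start
    size = len(frame)
    while out < size:
        control = data[pos]
        pos += 1
        if control >= 0x80:
            # Skip: these pixels keep their value from the previous frame.
            out += control - 0x80
        elif control == 0:
            break  # end of video frame
        else:
            if pos + control > len(data):
                raise ValueError("truncated video data")
            # Copy: literal pixels follow; clamp so the buffer never grows.
            n = min(control, size - out)
            frame[out:out + n] = data[pos:pos + n]
            pos += control
            out += control
    return pos
```

Returning the new offset matters because, as noted earlier, video blocks carry no size field, so the decoder is also what locates the next block.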
This is usually the first video block in the file and contains a full frame of video in regular RLE format. The basic uncompressing algorithm is:
InputByte = Read 1 Byte From File
/* RLE compression */
if (InputByte >= 0x80)
    NumberofBytes = InputByte - 0x80
    InputByte = Read 1 Byte From File
    Add NumberofBytes of InputByte to Video Frame
else if (InputByte == 0)
    /* Should never happen in 0x03 */
    End of Video Frame
else
    NumberofBytes = InputByte
    Read NumberofBytes from File and Copy to Video Frame
endif
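Looped over a whole frame, the full-frame variant differs from the delta variant only in what a control byte of 0x80 or above means: a fill run rather than a skip. A minimal Python sketch, with the hypothetical name `decode_full_rle` and the assumption that `frame` is a `bytearray` of width * height pixels:

```python
def decode_full_rle(data, pos, frame):
    """Decode regular RLE data (block type 0x03) into `frame` in place.

    Returns the offset just past the consumed video data.
    """
    out = 0
    size = len(frame)
    while out < size:
        control = data[pos]
        pos += 1
        if control >= 0x80:
            # Run: the next byte is repeated (control - 0x80) times.
            n = min(control - 0x80, size - out)
            value = data[pos]
            pos += 1
            frame[out:out + n] = bytes([value]) * n
            out += n
        elif control == 0:
            break  # should not occur in a 0x03 block, per the note above
        else:
            if pos + control > len(data):
                raise ValueError("truncated video data")
            # Literal: copy the next `control` bytes as they are.
            n = min(control, size - out)
            frame[out:out + n] = data[pos:pos + n]
            pos += control
            out += n
    return pos
```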
This format has an extra header variable:
Bytes 4 - 5: INT16 Y-Offset. The video frame data starts at this line in the frame.
What follows is the video frame data in the same format as the 0x01 type.
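To illustrate how the Y-Offset changes decoding, the sketch below reads the 0x04 block header and begins writing at the start of the given line. It is an illustration under assumptions: `decode_block_04` is a hypothetical name, the header fields are read as unsigned little-endian, and the starting pixel index is taken to be Y-Offset * Frame Width.

```python
import struct

def decode_block_04(data, pos, frame, width):
    """Decode one 0x04 block whose type byte is at `pos`.

    Returns (display_time, offset of the next block).
    """
    # Type byte, display time, and Y-Offset: 5 bytes, no padding.
    kind, display_time, y_offset = struct.unpack_from("<BHH", data, pos)
    if kind != 0x04:
        raise ValueError("not a 0x04 video block")
    pos += 5
    out = y_offset * width  # assumption: first pixel of line y_offset
    while out < len(frame):
        control = data[pos]
        pos += 1
        if control == 0:
            break  # end of video frame
        if control >= 0x80:
            out += control - 0x80  # skip unchanged pixels
        else:
            if pos + control > len(data):
                raise ValueError("truncated video data")
            n = min(control, len(frame) - out)
            frame[out:out + n] = data[pos:pos + n]
            pos += control
            out += control
    return display_time, pos
```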
This block always occupies the last byte of the file.
Bytes 1 - 1: CHAR8 Block Type (0x14).
This is only included for completeness.
For those of you (like me) who find it difficult to get your heads around a file format with specifications alone, here is a breakdown (in English) of the inner workings of a VID reader, based on my own audio/frame dumper app. Hopefully, you will find it helpful.
Be a good little programmer and initialize your variables:
Read 3 ASCII bytes from the beginning of the VID File; if they read "VID", continue.
Read an INT16; it should have the value 512.
Read an INT16 containing the Frame Count.
Read an INT16 containing the Frame Width.
Read an INT16 containing the Frame Height.
Read an INT16 containing the Global Delay.
Read an INT16 containing Unknown Value 1.
Calculate FrameBuffer Size by multiplying Frame Width and Frame Height.
Do the following:
Note that this is just the basic functionality for interpreting VID files. In an actual player, you would probably want to cache the next couple KB or so of the file to improve speed on video blocks, and generally optimize the hell out of the code.
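A block-walking loop consistent with the layouts described earlier might look like the following Python sketch. It only locates blocks and does not render anything. The names `skip_video` and `iter_blocks` are hypothetical, fields are assumed to be unsigned little-endian, and a 0x04 block is assumed to start counting pixels at Y-Offset * Frame Width.

```python
import struct

def skip_video(data, pos, frame_size, start, full_frame):
    """Return the offset just past one block's compressed video data."""
    out = start
    while out < frame_size:
        control = data[pos]
        pos += 1
        if control == 0:
            break  # end of video frame
        if control >= 0x80:
            out += control - 0x80
            if full_frame:
                pos += 1  # 0x03 runs carry one fill byte
        else:
            out += control
            pos += control  # literal pixels
    return pos

def iter_blocks(data):
    """Yield (block_type, offset) for every block after the header."""
    magic, _, _, width, height = struct.unpack_from("<3s4H", data, 0)
    if magic != b"VID":
        raise ValueError("missing 'VID' signature")
    frame_size = width * height
    pos = 15  # blocks start right after the 15-byte header
    while pos < len(data):
        kind, start = data[pos], pos
        if kind == 0x14:  # end of file
            yield kind, start
            return
        elif kind == 0x02:  # palette: type byte + 768 bytes
            pos += 769
        elif kind == 0x7C:  # audio start: length at bytes 5 - 6
            (length,) = struct.unpack_from("<H", data, pos + 4)
            pos += 6 + length
        elif kind == 0x7D:  # audio: length at bytes 2 - 3
            (length,) = struct.unpack_from("<H", data, pos + 1)
            pos += 3 + length
        elif kind in (0x01, 0x03):
            pos = skip_video(data, pos + 3, frame_size, 0, kind == 0x03)
        elif kind == 0x04:
            (y_offset,) = struct.unpack_from("<H", data, pos + 3)
            pos = skip_video(data, pos + 5, frame_size, y_offset * width, False)
        else:
            raise ValueError("unknown block type 0x%02X at %d" % (kind, start))
        yield kind, start
```

A dumper could run `for kind, offset in iter_blocks(open(path, "rb").read()):` and hand each offset to the matching decoder.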
Andux,
Had a comparison of your format versus mine - I'd split up my frames in a kind of odd way - I'd been bundling the audio and video together and decoding based upon that - but the same result was achieved. I had a feeling there were some more similarities between frame types to be discovered.
Anyway, as payback - here's a couple of "unknowns" in your document for you.
In the header:
Bytes 12 - 13: INT16 Unknown
In the video block:
Bytes 2 - 3: INT16 Unknown value
These are both used to calculate the amount of time each frame should be displayed (and presumably the audio frame is heard). They are both in little-endian format. The header delay value is added to the video block delay value to produce the total time for the delay. So by doubling the header value in a file, it halves the overall frame rate. Equally, you can make adjustments to the individual frames by modifying that value on a per-frame basis.
Now - you'll be wanting to know what these "time delay" units are, right? Well - now we enter my approximation zone. To the best of what I've been able to tell, these units should be 16 milliseconds each. So, if the total of header and frame delays is 0x0100 (which would be very high btw), then the delay should be 256 * 16 = 4096 milliseconds = about 4 seconds. Just don't quote me on that 16 millisecond bit ;o)
Anyway - I'm tired and I've rambled enough - I realised I'd attached a slightly old exe to my last mail - so the most current one is attached this time.
Thanks again,
CC
Looking at the format, it seems like each video frame is supposed to be displayed for the length of the next audio block (i.e., until the next frame overwrites it). Maybe the delay values are used to prevent audio skipping or something (e.g., "Start working on the next frame after X milliseconds so it's ready before the audio finishes.").
Correction: Each video frame is linked to the audio block immediately before it in the file. Comparing the results of CC's audio_block_size calculation (see below) to the actual DF videos confirms this.
Further update: After doing some calculations based on the new (relatively speaking) sample rate formulas, I have determined that one delay unit is equal to 1/60th of a second (or as close to it as possible, given all the wacky sample rates and integer division). Thus, CC's formulas may be adapted to other sample rates as follows:
AudioGranuleSize = SampleRate DIV 60
AudioBlockSize = (HeaderDelay + FrameDelay) * AudioGranuleSize
FrameDisplayTimeInSeconds = (HeaderDelay + FrameDelay) / 60
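As a worked example of these formulas: at a sample rate of 11111 Hz the granule is 11111 DIV 60 = 185 bytes, so a combined delay of 4 units corresponds to a 740-byte audio block and a display time of 4/60 of a second (about 67 ms). The Python sketch below computes the same values; `frame_timing` is a hypothetical name.

```python
def frame_timing(sample_rate, header_delay, frame_delay):
    """Return (audio block size in bytes, display time in seconds)."""
    granule = sample_rate // 60            # AudioGranuleSize (DIV = integer division)
    units = header_delay + frame_delay     # total delay in 1/60 s units
    return units * granule, units / 60
```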
Note, however, that I have not yet had a chance to test this with the Daggerfall player.
This document is hosted as part of Andux's Daggerfall Studio.