Daggerfall VID File Format

March 29, 2007

(XHTMLized by Andux)

Overall VID Format

The overall file is composed of a main header followed by blocks of audio and video data.

	Bytes   1 -  15:	Header
	Bytes  16 - EOF: 	Various Blocks
		...

The type of block is identified by the first byte in the block.

	0x01 = Video, Compression #1
	0x02 = Palette
	0x03 = Video, Compression #2
	0x04 = Video, Compression #3
	0x14 = End of File
	0x7C = Audio, Start Frame
	0x7D = Audio

The header is always the first 15 bytes of the VID file.

	Bytes   1 -   3:	ASCII	Always "VID"
	Bytes   4 -   5:	INT16	Always 0x00 02 (512)
	Bytes   6 -   7:	INT16	Number of Frames
	Bytes   8 -   9:	INT16	Video Frame Width (256 or 320)
	Bytes  10 -  11:	INT16	Video Frame Height (200)
	Bytes  12 -  13:	INT16	Global Delay value?
	Bytes  14 -  15:	INT16	0x0E 00 (14) in all DF VIDs
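
As an illustration only, the header table above could be read along these lines. This is a minimal sketch: the names `VidHeader` and `parse_header` are made up for this example, and treating the INT16 fields as unsigned little-endian values is an assumption (the byte order matches the "0x00 02 (512)" note).

```python
import struct
from typing import NamedTuple

HEADER_SIZE = 15  # the header is always the first 15 bytes


class VidHeader(NamedTuple):
    """Hypothetical container for the header fields listed above."""
    frame_count: int
    width: int
    height: int
    global_delay: int
    unknown: int


def parse_header(data: bytes) -> VidHeader:
    """Parse the 15-byte VID header from the start of `data`."""
    if len(data) < HEADER_SIZE:
        raise ValueError("file too short for a VID header")
    # "<3s6H": little-endian, 3-byte signature, then six 16-bit fields.
    # Assumption: the INT16 fields are read as unsigned.
    magic, version, frames, width, height, delay, unknown = struct.unpack_from(
        "<3s6H", data, 0
    )
    if magic != b"VID":
        raise ValueError("missing 'VID' signature")
    if version != 512:
        raise ValueError(f"unexpected value {version} in bytes 4-5")
    return VidHeader(frames, width, height, delay, unknown)
```

The first block then starts at offset 15 (zero-based) in the file.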

Palette Block - 0x02

This usually only appears immediately after the header.

	Bytes  1  -  1 :	CHAR8	Block Type (0x02)
	Bytes  2  - 769:	RGB	256-color VGA palette. Each palette entry is
	                		a triplet of 3 bytes (R/G/B), each ranging in
	                		value from 0 to 63.

Experimentation has revealed that the Daggerfall engine does correctly interpret palette blocks inserted later in the file, though none of the videos included with the game make use of this functionality.
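
A rough sketch of reading a palette block follows. The function names are hypothetical, and the 6-bit to 8-bit scaling shown here is a common convention for VGA palettes rather than something this format specifies.

```python
def scale_6bit_to_8bit(value: int) -> int:
    """Expand a 0-63 VGA component to 0-255 (a common convention, assumed here)."""
    value &= 0x3F  # assumption: ignore any stray high bits
    return (value << 2) | (value >> 4)


def parse_palette_block(data: bytes, offset: int = 0):
    """Read a 0x02 block at `offset`; return (palette, offset of the next block)."""
    if data[offset] != 0x02:
        raise ValueError("not a palette block")
    raw = data[offset + 1:offset + 769]  # 256 entries * 3 bytes
    if len(raw) != 768:
        raise ValueError("palette block is truncated")
    palette = [
        (
            scale_6bit_to_8bit(raw[i]),
            scale_6bit_to_8bit(raw[i + 1]),
            scale_6bit_to_8bit(raw[i + 2]),
        )
        for i in range(0, 768, 3)
    ]
    return palette, offset + 769
```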

Audio Block - 0x7C

This is usually the first audio block in the file.

	Bytes   1 -   1:	CHAR8	Block Type (0x7C).
	Bytes   2 -   3:	INT16	Unknown; always 0.
	Bytes   4 -   4:	CHAR8	Sound Blaster DAC init value (usually 0xA6)
	Bytes   5 -   6:	INT16	Audio block data length
	Bytes   7 - ???:	AUDIO	8-bit audio data

The DAC init value corresponds to the audio sample rate as follows:

	InitVal = 256 - (1000000 DIV SampleRate)
	SampleRate = 1000000 / (256 - InitVal)
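
The two formulas can be checked with a few lines of code. This is only a sketch with made-up function names; it uses integer division in both directions, which is an assumption for the second formula.

```python
def sample_rate_to_init_val(sample_rate: int) -> int:
    """InitVal = 256 - (1000000 DIV SampleRate)."""
    if sample_rate <= 0:
        raise ValueError("sample rate must be positive")
    return 256 - (1_000_000 // sample_rate)


def init_val_to_sample_rate(init_val: int) -> int:
    """SampleRate = 1000000 / (256 - InitVal), truncated to an integer here."""
    if not 0 <= init_val <= 255:
        raise ValueError("init value must fit in one byte")
    return 1_000_000 // (256 - init_val)
```

With the usual init value of 0xA6 (166), this works out to 1000000 / 90, or about 11111 Hz.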

Audio Block - 0x7D

The audio blocks hold the audio for each frame.

	Bytes   1 -   1:	CHAR8	Block Type (0x7D).
	Bytes   2 -   3:	INT16	Audio block data length
	Bytes   4 - ???:	AUDIO	8-bit audio data
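
Both audio block layouts could be read with one helper, sketched below. The name `parse_audio_block` and the returned tuple are illustrative choices, and the length fields are assumed to be unsigned little-endian.

```python
import struct


def parse_audio_block(data: bytes, offset: int):
    """Read a 0x7C or 0x7D block at `offset`.

    Returns (init_val, samples, next_offset); init_val is None for 0x7D
    blocks, which carry no DAC init value.
    """
    block_type = data[offset]
    if block_type == 0x7C:
        # Unknown INT16, DAC init byte, then the data length.
        _unknown, init_val, length = struct.unpack_from("<HBH", data, offset + 1)
        start = offset + 6
    elif block_type == 0x7D:
        (length,) = struct.unpack_from("<H", data, offset + 1)
        init_val = None
        start = offset + 3
    else:
        raise ValueError(f"not an audio block: 0x{block_type:02X}")
    samples = data[start:start + length]
    if len(samples) != length:
        raise ValueError("audio block is truncated")
    return init_val, samples, start + length
```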

Video Block - 0x01, 0x03 and 0x04

All video blocks share the same basic structure:

	Bytes   1 -   1:	CHAR8	Block Type (0x01, 0x03 or 0x04).
	Bytes   2 -   3:	INT16	Display time (in 60ths of a second)
	Bytes   4 - ???:	VIDEO	Compressed video data.

Each type of video block has a slightly different compression format (see below for details). Unfortunately, there is no record size field, so you must completely parse the video data in order to find the next block. There is only one frame per block, so you can stop parsing video when InputByte == 0 or Bytes Copied >= (Frame Width * Frame Height).

It is important to note exactly how the video frames go together. Only the 0x03 video block actually contains a full frame of data. The 0x01 and 0x04 types only contain the pixels that have changed from the previous frame. Thus, generally in order to render any one frame you must also render all previous frames up to the first 0x03 video block.

Video compression - 0x01

The video data is stored in a form of RLE compression. The following algorithm is used to decompress it:

	InputByte = Read 1 Byte From File

	if ( InputByte >= 0x80 )
		RunLength = InputByte - 0x80
		Skip RunLength Pixels in the Video Frame

	else if ( InputByte == 0 )
		End of Video Frame

	else 
		RunLength = InputByte
		Read RunLength Bytes From File and Copy to Frame
	endif
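
The pseudocode above handles a single input byte; a decoder has to repeat it until the frame ends. The sketch below is one way to do that. The function name is hypothetical, and it assumes that the frame position counts skipped pixels as well as copied ones when checking against Frame Width * Frame Height.

```python
def decode_delta_rle(data: bytes, offset: int, frame: bytearray, pos: int = 0) -> int:
    """Apply 0x01-style data to `frame` in place.

    `offset` points at the first compressed byte (after the block's type
    and display time). Returns the offset of the first unread byte.
    """
    size = len(frame)  # Frame Width * Frame Height
    while pos < size:
        code = data[offset]
        offset += 1
        if code == 0:
            break  # end of video frame
        if code >= 0x80:
            pos += code - 0x80  # skip pixels unchanged since the last frame
            continue
        if offset + code > len(data):
            raise ValueError("compressed data is truncated")
        # Clamp the copy so a bad run cannot grow the frame buffer.
        n = min(code, size - pos)
        frame[pos:pos + n] = data[offset:offset + n]
        offset += code
        pos += code
    return offset
```

The frame buffer passed in should already hold the previous frame, since only changed pixels are stored.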

Video compression - 0x03

This is usually the first video block in the file and contains a full frame of video in regular RLE format. The basic decompression algorithm is:

	InputByte = Read 1 Byte From File

	/* RLE compression */
	if (InputByte >= 0x80) 
		NumberofBytes = InputByte - 0x80
		InputByte = Read 1 Byte From File
		Add NumberofBytes of InputByte to Video Frame

	else if (InputByte == 0) /* Should never happen in 0x03 */
		End of Video Frame

	else 
		NumberofBytes = InputByte
		Read NumberofBytes from File and Copy to Video Frame
	endif
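
Wrapped in a loop, the steps above might look like the following sketch. The function name and return shape are illustrative, and it assumes decoding simply stops once Frame Width * Frame Height pixels have been produced, with no terminator byte to consume.

```python
def decode_full_rle(data: bytes, offset: int, width: int, height: int):
    """Decode 0x03-style data starting at `offset`.

    Returns (frame, offset of the first unread byte).
    """
    size = width * height
    frame = bytearray(size)
    pos = 0
    while pos < size:
        code = data[offset]
        offset += 1
        if code == 0:
            break  # should never happen in 0x03
        if code >= 0x80:
            # Run: repeat the next byte (code - 0x80) times.
            count = code - 0x80
            value = data[offset]
            offset += 1
            n = min(count, size - pos)
            frame[pos:pos + n] = bytes([value]) * n
        else:
            # Literal: copy the next `code` bytes as-is.
            count = code
            if offset + count > len(data):
                raise ValueError("compressed data is truncated")
            n = min(count, size - pos)
            frame[pos:pos + n] = data[offset:offset + n]
            offset += count
        pos += count
    return frame, offset
```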

Video compression - 0x04

This format has an extra header variable:

	Bytes   4 -   5:	INT16	Y-Offset. The video frame data 
	                	starts at this line in the frame.

What follows is the video frame data in the same format as the 0x01 type.
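
A sketch of a complete 0x04 block decoder is given below. It assumes the Y-offset is a zero-based row number, so decoding starts at pixel Y-offset * Frame Width; the function name and return values are made up for this example.

```python
import struct


def decode_block_0x04(data: bytes, offset: int, frame: bytearray, width: int):
    """Decode a 0x04 block at `offset` into `frame` (the previous frame).

    Returns (display_time, offset of the next block).
    """
    if data[offset] != 0x04:
        raise ValueError("not a 0x04 video block")
    display_time, y_offset = struct.unpack_from("<HH", data, offset + 1)
    pos = y_offset * width  # assumption: rows are counted from zero
    offset += 5  # type byte + display time + Y-offset
    size = len(frame)
    while pos < size:
        code = data[offset]
        offset += 1
        if code == 0:
            break  # end of video frame
        if code >= 0x80:
            pos += code - 0x80  # skip unchanged pixels
            continue
        if offset + code > len(data):
            raise ValueError("compressed data is truncated")
        n = min(code, size - pos)
        frame[pos:pos + n] = data[offset:offset + n]
        offset += code
        pos += code
    return display_time, offset
```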

End of File - 0x14

This block always occupies the last byte of the file.

	Bytes   1 -   1:	CHAR8	Block Type (0x14).

This is only included for completeness.


Programming Example

For those of you (like me) who find it difficult to get your heads around a file format with specifications alone, here is a breakdown (in English) of the inner workings of a VID reader, based on my own audio/frame dumper app. Hopefully, you will find it helpful.

Be a good little programmer and initialize your variables: Ensure the VID File exists, and open it.

Read 3 ASCII bytes from the beginning of the VID File; if they read "VID", continue.
Read an INT16; it should have the value 512.
Read an INT16 containing the Frame Count.
Read an INT16 containing the Frame Width.
Read an INT16 containing the Frame Height.
Read an INT16 containing the Global Delay.
Read an INT16 containing Unknown Value 1.

Calculate FrameBuffer Size by multiplying Frame Width and Frame Height.

Do the following: Repeat until you reach the end of the VID File.

Note that this is just the basic functionality for interpreting VID files. In an actual player, you would probably want to cache the next couple KB or so of the file to improve speed on video blocks, and generally optimize the hell out of the code.
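
The individual steps of the loop are not reproduced here, so the following is only a sketch of how such a loop might walk the file, built from the block layouts described earlier. It records where each block starts without decoding anything. The names `skip_video` and `list_blocks` are hypothetical, and the end-of-frame handling assumes skipped pixels count toward the frame size.

```python
import struct


def skip_video(data: bytes, pos: int, start: int, size: int, fill_runs: bool) -> int:
    """Scan compressed video data at `pos`; return the offset just past it."""
    while start < size:
        code = data[pos]
        pos += 1
        if code == 0:
            break  # end of video frame
        count = code & 0x7F
        if code >= 0x80:
            if fill_runs:
                pos += 1  # 0x03: one fill byte follows the run length
        else:
            pos += count  # literal bytes
        start += count
    return pos


def list_blocks(data: bytes):
    """Return a list of (block type, offset) pairs for a whole VID file."""
    magic, _ver, _frames, width, height = struct.unpack_from("<3s4H", data, 0)
    if magic != b"VID":
        raise ValueError("missing 'VID' signature")
    size = width * height  # FrameBuffer Size
    pos = 15
    blocks = []
    while pos < len(data):
        kind = data[pos]
        blocks.append((kind, pos))
        if kind == 0x14:
            break  # End of File
        elif kind == 0x02:
            pos += 769
        elif kind == 0x7C:
            pos += 6 + struct.unpack_from("<H", data, pos + 4)[0]
        elif kind == 0x7D:
            pos += 3 + struct.unpack_from("<H", data, pos + 1)[0]
        elif kind in (0x01, 0x03):
            pos = skip_video(data, pos + 3, 0, size, kind == 0x03)
        elif kind == 0x04:
            y = struct.unpack_from("<H", data, pos + 3)[0]
            pos = skip_video(data, pos + 5, y * width, size, False)
        else:
            raise ValueError(f"unknown block type 0x{kind:02X} at offset {pos}")
    return blocks
```

A real reader would call the audio, palette and video decoders at each branch instead of only advancing the offset.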


Community Response

Re: DF movies

From: Corroded Coder
Date: December 14, 2003
Andux,

Had a comparison of your format versus mine - I'd split up my frames in a
kind of odd way - I'd been bundling the audio and video together and
decoding based upon that - but the same result was achieved. I had a feeling
there were some more similarities between frame types to be discovered.

Anyway, as payback - here's a couple of "unknowns" in your document for you.

In the header:
Bytes  12 -  13:	INT16	Unknown
In the video block:
Bytes   2 -   3:	INT16	Unknown value

These are both used to calculate the amount of time each frame should be
displayed (and presumably audio frame is heard). They are both little endian
format. The header delay value is added to the video block delay value to
produce the total time for the delay. So by doubling the header value in a
file, it halves the overall frame rate. Equally you can make adjustments to
the individual frames by modifying that value on a per frame basis.

Now - you'll be wanting to know what these "time delay" units are right?
Well - now we enter my approximation zone. To the best of what I've been
able to tell - these units should be 16 millisecond each. So, if the total
of header and frame delays is 0x0100 (which would be very high btw) then
delay should be 256 * 16  = 4096 milliseconds = about 4 seconds. Just don't
quote me on that 16 millisecond bit ;o)

Anyway - I'm tired and I've rambled enough - I realised I'd attached a
slightly old exe to my last mail - so the most current one is attached this
time.

Thanks again,
CC

Looking at the format, it seems like each video frame is supposed to be displayed for the length of the next audio block (i.e., until the next frame overwrites it). Maybe the delay values are used to prevent audio skipping or something (e.g., "Start working on the next frame after X milliseconds so it's ready before the audio finishes.").

Correction: Each video frame is linked to the audio block immediately before it in the file. Comparing the results of CC's audio_block_size calculation (see below) to the actual DF videos confirms this.

Further update: After doing some calculations based on the new (relatively speaking) sample rate formulas, I have determined that one delay unit is equal to 1/60th of a second (or as close to it as possible, given all the wacky sample rates and integer division). Thus, CC's formulas may be adapted to other sample rates as follows:

	AudioGranuleSize = SampleRate DIV 60

	AudioBlockSize = (HeaderDelay + FrameDelay) * AudioGranuleSize
	FrameDisplayTimeInSeconds = (HeaderDelay + FrameDelay) / 60

Note, however, that I have not yet had a chance to test this with the Daggerfall player.
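
For reference, the adapted formulas translate directly into code. This is a sketch with made-up function names, and like the formulas themselves it is untested against the Daggerfall player.

```python
def audio_block_size(sample_rate: int, header_delay: int, frame_delay: int) -> int:
    """AudioBlockSize = (HeaderDelay + FrameDelay) * (SampleRate DIV 60)."""
    granule = sample_rate // 60  # AudioGranuleSize: samples per delay unit
    return (header_delay + frame_delay) * granule


def frame_display_seconds(header_delay: int, frame_delay: int) -> float:
    """FrameDisplayTimeInSeconds = (HeaderDelay + FrameDelay) / 60."""
    return (header_delay + frame_delay) / 60
```

For example, at 11111 Hz the granule is 185 samples, so a total delay of 5 units gives a 925-byte audio block and a display time of 5/60 of a second.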

Notes for Andux - re: Skynet .vid file formats.

From: Corroded Coder
Date: January 11, 2004

This document is hosted as part of Andux's Daggerfall Studio.