Time Stamping and Synchronization
Transport Protocols and File Wrappers
Large Volume Motion Imagery (LVMI)
What is the Motion Imagery Standards Board (MISB)?
The Motion Imagery Standards Board (MISB) was established in accordance with DoD Directive 5105.60 "to formulate, review, and
recommend standards for Motion Imagery, associated Metadata, Audio, and other related systems" for the Department of Defense (DoD), Intelligence
Community (IC), and National System for Geospatial-Intelligence (NSG). The MISB exists under the Geospatial Intelligence Standards Working Group (GWG)
which is operated by the GEOINT Standards Center of Excellence division at NGA.
The MISB meets three times a year (typically February, June, and October) in the Washington, D.C., metropolitan area. The MISB consists of working
groups that address different functional areas of Motion Imagery.
Why should I care about the MISB? Where and when do MISB requirements apply?
Any Motion Imagery (MI) system subject to the DoD IT Standards Registry (DISR) is subject to MISB standards and requirements. If you
are manufacturing Motion Imagery systems or components for use within the DoD/IC communities, those systems and components are subject to MISB
standards and requirements.
What constitutes a Motion Imagery system (as defined by the MISB)?
Any imaging system that provides the functionality of collecting, encoding, processing, controlling, exploiting, viewing, and/or
storing Motion Imagery as defined in MISP-2015.1 or later. This explicitly includes, but is not limited to, phenomenologies such as Electro-optical (EO),
Infrared (IR), Synthetic Aperture Radar (SAR), Multi-spectral (MSI), and Hyper-spectral (HSI). Video Teleconference (VTC), Video Telemedicine, and Video
Support Services applications DO NOT fall within the purview of the MISB and are not subject to its requirements.
What is the difference between Motion Imagery and Full Motion Video?
Motion Imagery is a sequence of Images that, when viewed (e.g. with a media player), must have the potential for providing informational or
intelligence value. This implies the Images composing the Motion Imagery are: (1) generated from sensed data, and (2) related to each other both in time
and in space. Some sensed data, such as Visible Light and Infrared, can be used directly to form Images, while others, such as SAR and LIDAR, require a
conversion to a viewable Image. To satisfy the time and space relationship, the capture times (i.e. the times the Images were taken) of successive Images
must be in sequential order, and each Image must have some recognizable visual overlap with the previous Image.
Full Motion Video (FMV) is a term used within the military and intelligence communities. As used, FMV implies a very narrow subset of Motion Imagery;
one that assumes geo-spatial metadata, commercial image formats and playback rates. FMV has no formal definition and conveys different meanings to different
communities; therefore, the term FMV should not be used in any contractual language.
Building a new Motion Imagery system: At a high level what do I have to do to be MISB compliant?
To be MISB compliant, any Motion Imagery system must:
- Be digital
- Produce a compliant MPEG-2 Transport Stream (TS). Note: this does not apply to JPEG 2000-based or RTP-based systems.
- Use MPEG-2, MPEG-4 Part 10 (H.264/AVC), or JPEG 2000 image compression
- Produce non-destructive (not "burned in") metadata
- Comply with MISB ST 0902
- Add metadata elements as needed for the task (e.g., MISB ST 0601, MISB ST 0801, etc.)
Older systems used MISB EG 0104, which has been deprecated. The Motion Imagery Standards Profile (MISP) codifies all MISB requirements, Standards,
and Recommended Practices. The MISP is found on the MISB website, and is cited in the DISR.
Building a new Motion Imagery system: What should I avoid doing?
Do not build systems with any of the following:
- Analog image capture/processing
- Digital systems that use interlaced scanning
- Destructive ("burned in") metadata
- MISB EG 0104
- Systems that utilize proprietary file formats, metadata encodings or compression algorithms
- Systems that utilize standardized file formats, metadata encodings, and compression algorithms not cited in the MISP. The existence of a standard
does not mean the MISB has endorsed it for use within the community.
Where is the MISB website, and what can I find there?
The MISB website is http://www.gwg.nga.mil/misb. The MISP (Motion Imagery Standards Profile) and all current Standards, Recommended Practices
(RPs), and Technical Reference Material (TRMs) can be found there. A good starting point is to review the Motion Imagery Handbook, which provides some
fundamentals on Motion Imagery and sets the stage for a better understanding of the MISP. The MISP defines requirements that programs can use in the acquisition
phase; it includes references to all subsequent MISB STs, RPs and TRMs. For access to draft documents, test files, and other support documentation, follow
the instructions on the website to apply for an account on the MISB protected website.
What is the difference between a Standard and a Recommended Practice?
A document is eligible to be a Standard when it meets at least one of the following criteria:
- Facilitates interoperability and consistency
- Defines metadata elements
Where the MISP term Standard (ST) is used, the MISP item mandates binding technical implementation policy, and as such, should be identified in Government
procurement actions as a mandatory conformance item in order for vendor offerings to be accepted by the Government.
A document begins the standards process as "Developing", where it is authored and presented for community review and approval. Once adopted the developing
Standard moves to an "Approved" status. Standards that are obsolete or replaced are declared "Deprecated", while those no longer in use are "Retired".
A document is considered an RP when it:
- Provides guidance that facilitates the implementation of a Standard
- Is not required for interoperability but, when used, states requirements for its usage
Recommended Practices should be considered technical implementation policy. They may be identified in Government procurement actions as a mandatory conformance
item in order for vendor offerings to be accepted by the Government.
Need a viewer to play the Motion Imagery and Metadata: What should I choose?
There are several COTS and GOTS tools available from a variety of government contracting companies as well as a few commercial companies. The
MISB cannot make recommendations regarding software and hardware solutions.
Need to get a Motion Imagery system certified: How do I start the process?
NGA is responsible for overseeing conformance testing to GEOINT standards.
- Results from conformance testing are submitted to the NSG GEOINT Functional Manager Standards Assessment (GFMSA) program, which ultimately
provides NGA's recommendations on GEOINT conformance to JITC's Interoperability Certification process on a per program basis.
- NGA oversees the process whereby a Motion Imagery System achieves and sustains conformance through the NGA Conformance Program for Motion Imagery.
- The Motion Imagery Standards Board (MISB), acting as NGA's delegate for Motion Imagery, is instituting the NGA Conformance Program for Motion Imagery.
The NGA Conformance Program for Motion Imagery defines the testing policies and procedures to meet conformance to the MISP as issued by the MISB.
- The MISP states requirements and specifies standards for maximizing interoperability in the production, exchange and use of Motion Imagery.
The NGA Conformance Program for Motion Imagery prescribes test policies, defines the roles and responsibilities of participating organizations, outlines test
processes, and identifies artifact repositories for test reports and certificates of conformance. A companion document, the NGA Conformance Test Plan for Motion
Imagery, defines the baseline suite of tests, test procedures, test equipment and test report templates to document results of conformance testing. The NGA
Conformance Test Plan for Motion Imagery is specifically tailored to measure conformance to the MISP issued by the MISB.
Of the approved compression algorithms (MPEG-2, H.264 and JPEG 2000), which one should I use?
H.264 yields the best quality for low bandwidth applications. For similar video quality MPEG-2 compression needs roughly twice the bandwidth.
H.264 is quickly replacing MPEG-2 in the commercial world.
JPEG 2000 uses intraframe compression only, rather than the intra- and interframe compression found in MPEG-2 and H.264. JPEG 2000 therefore consumes 2-3 times the bandwidth
of MPEG-2. However, JPEG 2000 accommodates very large frame sizes (Gpixels) and has low (1 frame) latency. JPEG 2000 has features that make it very useful in Large
Volume Motion Imagery (LVMI) applications, while H.264 and MPEG-2 are typically used elsewhere. For more information regarding LVMI systems see the section below.
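The rules of thumb above can be turned into a quick budgeting calculation. The sketch below is only an illustration of that arithmetic: the function name and the dictionary layout are hypothetical, and the ratios (MPEG-2 at roughly twice H.264, JPEG 2000 at 2-3 times MPEG-2) are the approximate figures quoted in this answer, not measured values for any particular encoder or scene.

```python
def estimate_bitrates(h264_mbps):
    """Estimate bitrates for similar visual quality from an H.264 bitrate.

    Uses only the approximate ratios quoted in this FAQ:
      MPEG-2    ~ 2x H.264
      JPEG 2000 ~ 2x to 3x MPEG-2 (returned as a (low, high) range)
    """
    mpeg2_mbps = 2.0 * h264_mbps
    jpeg2000_range = (2.0 * mpeg2_mbps, 3.0 * mpeg2_mbps)
    return {
        "H.264": h264_mbps,
        "MPEG-2": mpeg2_mbps,
        "JPEG 2000": jpeg2000_range,
    }


if __name__ == "__main__":
    # Hypothetical 2 Mbit/s H.264 stream used as the starting point.
    for codec, rate in estimate_bitrates(2.0).items():
        print(codec, rate)
```

Real bitrates depend heavily on content, frame size, and encoder settings, so numbers like these are best treated as a first estimate for link planning.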
If I have a choice between MPEG-2 and H.264, which one is recommended?
H.264 offers better performance than MPEG-2, needing less bandwidth for similar quality (about 2 to 1). This improvement comes with increased
complexity in the encoder and decoder, which affects overall cost, but with the complete adoption of H.264 in the commercial world this is becoming less of
an issue. DISA is pushing H.264 over MPEG-2 throughout its networks. For HD applications, H.264 offers the best performance overall.
What's wrong with MISB EG 0104? The MISB promulgated it, after all.
MISB EG 0104: Predator UAV Basic Universal Metadata Set was the first step in moving away from the analog metadata used by the initial
RQ-1s. Although still supported by the MISB for legacy systems, there is no reason to use it in a new system. Any information conveyed with MISB EG 0104 can
be conveyed with MISB ST 0601 with greater precision and bit-efficiency.
What is KLV metadata?
KLV stands for Key-Length-Value. KLV metadata comes in self-contained binary units. The Key identifies the metadata element, the Length gives
the size of the data in bytes, and the Value contains the actual data. KLV metadata is very bit-efficient. The Society of Motion Picture and Television Engineers
(SMPTE) standard, SMPTE ST 336: Data Encoding Protocol Using Key-Length-Value, defines the KLV data encoding protocol.
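As an illustration of the Key-Length-Value layout, here is a minimal Python sketch that packs and unpacks a single KLV unit. It assumes a 16-byte key and a BER-encoded length (one byte for lengths below 128, otherwise a count byte followed by the length in big-endian order), which is our understanding of SMPTE ST 336; the standard itself remains the authority. The function names and the example key are hypothetical.

```python
def encode_ber_length(n):
    """Encode a length in BER short form (< 128) or long form."""
    if n < 0:
        raise ValueError("length must be non-negative")
    if n < 128:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_klv(key, value):
    """Concatenate a 16-byte key, the BER length, and the value bytes."""
    if len(key) != 16:
        raise ValueError("this sketch assumes a 16-byte key")
    return key + encode_ber_length(len(value)) + value


def decode_klv(buf, offset=0):
    """Return (key, value, next_offset) for the KLV unit at offset."""
    key = buf[offset:offset + 16]
    if len(key) != 16 or offset + 16 >= len(buf):
        raise ValueError("truncated KLV unit")
    pos = offset + 16
    first = buf[pos]
    pos += 1
    if first < 0x80:
        length = first                      # short form
    else:
        count = first & 0x7F                # long form: number of length bytes
        if count == 0 or pos + count > len(buf):
            raise ValueError("unsupported or truncated length field")
        length = int.from_bytes(buf[pos:pos + count], "big")
        pos += count
    value = buf[pos:pos + length]
    if len(value) != length:
        raise ValueError("truncated value")
    return key, value, pos + length


if __name__ == "__main__":
    # Placeholder key for illustration only; not a registered key.
    demo_key = bytes.fromhex("060e2b34") + bytes(12)
    unit = encode_klv(demo_key, b"abc")
    print(unit.hex(" "))
    print(decode_klv(unit))
```

Because each unit carries its own length, a parser can skip keys it does not recognize by jumping to the returned next offset.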
KLV metadata isn't human readable! Why?
KLV is expressed in binary, which provides a very efficient representation of data. XML, by contrast, carries a great deal of padding to make it
"human readable", and that padding wastes precious bandwidth. KLV metadata can be translated into human-readable XML (and vice versa) without loss of information.
Where do I find definitions for KLV keys?
The structure of KLV metadata is defined in SMPTE ST 336. The actual metadata dictionaries are SMPTE RP 210 and MISB ST
Why are there two metadata dictionaries?
SMPTE created the standard for KLV encoding of metadata. SMPTE produces and maintains a KLV metadata dictionary (SMPTE RP 210). Various
organizations are allowed to buy part of the KLV domain name-space to maintain private metadata dictionaries. The DoD was the first organization to take advantage
of this offer. Initially, most of the metadata keys used by the MISB were registered in SMPTE RP 210, but, over time, several issues became apparent.
First, it can take 12-24 months to get a new KLV metadata key approved by SMPTE. Second, SMPTE does not give tight definitions to their metadata elements.
MISB ST 0807 is the metadata dictionary for elements in the DoD private domain space. The MISB can assign keys quickly if necessary (a week is
common), and can define their meaning and usage to whatever exactitude is necessary. Finally, because the keys in MISB ST 0807 are not published to the
general public, it is possible to maintain classified keys.
Which metadata dictionary (MISB or SMPTE) has precedence?
MISB ST 0807 has precedence over SMPTE RP 210.
How can I tell if a key is in SMPTE RP 210 or MISB ST 0807?
All KLV Keys are 16 bytes long. All SMPTE keys (including the DoD private keys in MISB ST 0807) begin with the 4-byte sequence
06 0E 2B 34 (in hexadecimal). Keys from MISB ST 0807 have the ninth byte set to 0E and the tenth byte set to 01, 02, or 03. A MISB key will,
therefore, have the form 06 0E 2B 34 xx xx xx xx 0E [01, 02, or 03] xx xx xx xx xx xx. As a general rule, older MISB documents have SMPTE RP 210 keys,
and newer MISB documents have their keys registered in MISB ST 0807.
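The byte pattern described above is easy to check in code. The following Python sketch applies exactly the rule stated in this answer (4-byte prefix 06 0E 2B 34, ninth byte 0E, tenth byte 01, 02, or 03). The function name, the returned labels, and the example keys are hypothetical; a key with the SMPTE prefix that does not match the MISB pattern is simply reported as a non-MISB SMPTE key, without claiming it is actually registered in SMPTE RP 210.

```python
SMPTE_PREFIX = bytes.fromhex("060e2b34")


def classify_key(key):
    """Classify a 16-byte KLV key using the rule given in this FAQ."""
    if len(key) != 16:
        raise ValueError("KLV keys are 16 bytes long")
    if key[:4] != SMPTE_PREFIX:
        return "not a SMPTE key"
    # Ninth byte is index 8, tenth byte is index 9 (zero-based indexing).
    if key[8] == 0x0E and key[9] in (0x01, 0x02, 0x03):
        return "MISB ST 0807"
    return "SMPTE (non-MISB)"


if __name__ == "__main__":
    # Both keys are made up for illustration; only the tested bytes matter.
    misb_like = bytes.fromhex("060e2b34020b01010e01030101000000")
    smpte_like = bytes.fromhex("060e2b34010101010702010101050000")
    print(classify_key(misb_like))
    print(classify_key(smpte_like))
```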
If I need new keys registered, should I go to SMPTE or the MISB?
Go to the MISB. Keys can be created faster and their usage defined unambiguously.
What KLV metadata do I need to use?
MISB ST 0902: Motion Imagery Sensor Minimum Metadata Set is a required metadata set. Depending on your mission requirements and CONOPS,
you may need to support elements beyond the MISB ST 0902 baseline, as defined in other MISB documents.
What is time stamping?
All Motion Imagery and Metadata are required to have a time stamp. The MISB standards define the time format and how a time stamp should be
inserted into a Motion Imagery stream. MISB ST 0603: Common Time Reference for Digital Motion Imagery Using Coordinated Universal Time (UTC) defines UTC
as the preferred time stamp. MISB ST 0605: Encoding and Inserting Time Stamps and KLV Metadata in Class 0 Motion Imagery describes how to insert time
stamps into uncompressed Motion Imagery, while MISB ST 0604: Time Stamping Compressed Motion Imagery describes the insertion of time stamps into
compressed Motion Imagery.
I don't understand time stamping, why do I need it?
Time stamping aids the search and discovery process. Time stamps provide a means to align metadata with collected Motion Imagery for event
analysis and exploitation. It is not uncommon for platform metadata to be collected asynchronously relative to the Motion Imagery. For example, platform
elevation, heading and speed might be collected at 7 Hz, while the Motion Imagery might be collected at 30 Hz. Clearly the metadata will not temporally align
with the Motion Imagery frames. Time stamping will allow for interpolation of the metadata, if needed, for processing or exploiting a given Motion Imagery frame.
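To make the 7 Hz versus 30 Hz example concrete, here is a minimal Python sketch of linear interpolation between time-stamped metadata samples. The function name, the sample layout as (time in seconds, value) pairs, and the altitude figures are assumptions made for illustration; the MISB documents do not prescribe this particular method, and quantities that wrap around (such as heading near 0/360 degrees) would need extra handling.

```python
from bisect import bisect_right


def interpolate(samples, t):
    """Linearly interpolate a metadata value at time t.

    samples: list of (time_s, value) pairs sorted by strictly increasing time.
    t: frame time stamp in the same time base as the samples.
    """
    times = [s[0] for s in samples]
    if not times[0] <= t <= times[-1]:
        raise ValueError("frame time lies outside the metadata time span")
    i = bisect_right(times, t)
    if i == len(times):
        return samples[-1][1]           # t equals the last sample time
    t0, v0 = samples[i - 1]
    t1, v1 = samples[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


if __name__ == "__main__":
    # Hypothetical platform elevation sampled at 7 Hz for one second.
    elevation = [(k / 7.0, 1000.0 + 2.0 * k) for k in range(8)]
    # Estimate the elevation at the first few 30 Hz frame times.
    for n in range(4):
        frame_time = n / 30.0
        print(round(frame_time, 4), round(interpolate(elevation, frame_time), 3))
```

The same lookup works for any metadata element as long as both the metadata and the frames carry time stamps in a common time reference.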
What is the difference between asynchronous and synchronous metadata?
In general, asynchronous metadata is not registered to a particular frame in the Motion Imagery (MI). Units of metadata travel in close
proximity to corresponding events in the MI, but this proximity can vary depending on how the MI and metadata information is processed. If the asynchronous
metadata has time stamp information associated with it, the metadata can be correlated with the MI frames (some interpolation of the metadata may also be required).
Synchronous metadata is registered in temporal alignment with MI frames. Events in the imagery can then be accurately associated with the corresponding metadata.
It is preferred that all future MI systems employ synchronous metadata.
Why is MPEG-2 Transport Stream a desired carrier for Motion Imagery?
MPEG-2 Transport Stream (TS) was designed originally for digital television transmission. As an international standard, it is widely supported
and many tools are available for testing and compliance. The value of Motion Imagery is greatly increased when augmented with metadata, and MPEG-2 TS provides
an excellent vehicle to carry Motion Imagery and Metadata as a unified package over IP networks.
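For readers who have not looked inside a Transport Stream, the Python sketch below walks a buffer of TS packets and reports which packet identifier (PID) each one carries. It relies on facts from the MPEG-2 Systems standard (ISO/IEC 13818-1) that are not spelled out in this FAQ: packets are 188 bytes long, start with the sync byte 0x47, and carry a 13-bit PID that tells the receiver which elementary stream (for example video or metadata) the packet belongs to. The function name and the sample packets are hypothetical, and this is a sanity check, not a conformance test.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def iter_ts_packets(data):
    """Yield (pid, payload_unit_start, continuity_counter) per TS packet."""
    if len(data) % TS_PACKET_SIZE != 0:
        raise ValueError("buffer is not a whole number of 188-byte packets")
    for offset in range(0, len(data), TS_PACKET_SIZE):
        packet = data[offset:offset + TS_PACKET_SIZE]
        if packet[0] != SYNC_BYTE:
            raise ValueError("lost sync at byte offset %d" % offset)
        pid = ((packet[1] & 0x1F) << 8) | packet[2]     # 13-bit PID
        payload_unit_start = bool(packet[1] & 0x40)
        continuity_counter = packet[3] & 0x0F
        yield pid, payload_unit_start, continuity_counter


if __name__ == "__main__":
    # Two made-up packets with zero-filled bodies, on PIDs 256 and 17.
    first = bytes([0x47, 0x41, 0x00, 0x10]) + bytes(184)
    second = bytes([0x47, 0x00, 0x11, 0x17]) + bytes(184)
    for fields in iter_ts_packets(first + second):
        print(fields)
```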
What is RTP/RTCP/RTSP used for?
Real-time Transport Protocol (RTP) is designed to deliver real time media, such as video and audio, over internet protocol (IP). Specifically,
RTP addresses the public internet, where quality-of-service (QoS) is not guaranteed. RTP is a protocol layer added (typically) on top of UDP that adds a time
stamp and sequence count to every data packet to aid the receiver in reconstructing the stream when packets are delayed, reordered, or lost in the network.
MPEG-2 Transport Stream does not do as well in such environments because it was designed for constant-delay networks like broadcast. Some systems do use RTP to
carry MPEG-2 Transport Stream, at the expense of additional data overhead and possibly reduced robustness in the presence of lost packets.
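The role of the per-packet count and time stamp can be sketched in a few lines of Python. The example below builds and parses the 12-byte fixed RTP header as laid out in RFC 3550 (version, payload type, 16-bit sequence number, 32-bit time stamp, 32-bit SSRC), then puts received packets back in order and lists the missing sequence numbers. The function names, the payload type 96, the SSRC value, and the 3000-tick time stamp step (one 30 Hz frame on a 90 kHz clock) are illustrative assumptions. The sketch ignores CSRC lists, header extensions, and sequence-number wraparound.

```python
import struct

RTP_VERSION = 2
HEADER = struct.Struct("!BBHII")    # 12-byte fixed RTP header


def build_rtp_packet(seq, timestamp, ssrc, payload, payload_type=96):
    """Build an RTP packet with no padding, extension, CSRCs, or marker."""
    first = RTP_VERSION << 6
    second = payload_type & 0x7F
    header = HEADER.pack(first, second, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def parse_rtp_packet(packet):
    """Parse the fixed header; reject layouts this sketch does not handle."""
    first, second, seq, timestamp, ssrc = HEADER.unpack_from(packet)
    if first >> 6 != RTP_VERSION:
        raise ValueError("not an RTP version 2 packet")
    if first & 0x1F:
        raise ValueError("header extension or CSRC list not handled here")
    return {"seq": seq, "timestamp": timestamp, "ssrc": ssrc,
            "payload_type": second & 0x7F, "payload": packet[HEADER.size:]}


def reorder_and_find_gaps(packets):
    """Sort packets by sequence number and report missing numbers."""
    parsed = sorted((parse_rtp_packet(p) for p in packets),
                    key=lambda p: p["seq"])
    seen = {p["seq"] for p in parsed}
    expected = range(parsed[0]["seq"], parsed[-1]["seq"] + 1)
    missing = [s for s in expected if s not in seen]
    return parsed, missing


if __name__ == "__main__":
    # Packets 10, 11, and 13 arrive out of order; packet 12 was lost.
    arrived = [build_rtp_packet(s, s * 3000, 0x1234, bytes([s]))
               for s in (13, 10, 11)]
    ordered, missing = reorder_and_find_gaps(arrived)
    print([p["seq"] for p in ordered], missing)
```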
RTP is generally accompanied by the bi-directional server/client protocol RTCP (RTP Control Protocol). RTCP provides network and timing information between
video senders (servers) and receivers (clients). Clients and servers use this information to determine QoS operating points and to maintain
real-world time synchronization. Finally, RTSP (Real Time Streaming Protocol) provides information that allows clients and servers to describe and establish
video streaming sessions, and it gives clients TiVo-like control to record, rewind, stop, play, and fast-forward the stream. MISB ST 0804
addresses the use of RTP.
What is JPIP used for?
JPIP (JPEG 2000 Interactive Protocol) is similar in spirit to RTP and RTSP (there currently is no RTCP equivalent within JPIP). JPIP is a
client/server streaming protocol that provides interactive delivery of JPEG 2000 compressed imagery. It allows a client to specify a region of interest out
of a large image, at a desired resolution and image quality and have the data streamed to the client. Using JPEG 2000 and JPIP together it is possible to
browse very large images (1 Gpixel and up) on lightweight clients (PDAs). This is possible because only small portions of the compressed image are streamed
from the server to the client. As the client changes their viewing region, the server streams new information to the client to update the image display. The
MISB anticipates that JPIP will be very useful for LVMI applications (see below).
What are the differences between file transfer, progressive download, and streaming?
File transfer is based on FTP, a protocol that guarantees complete delivery of the file to a receiver. FTP operates over TCP/IP,
which retransmits lost packets so that all data is received as transmitted. Because of this, the download of a file using FTP can take a long time, and the user
must wait for the content to be delivered in its entirety prior to viewing. Progressive download improves on this by buffering in the receiver and beginning to
display the content once sufficient data has been received; the user must still wait, however.
Streaming is designed to accommodate real time delivery of content and is appropriate for live events and time-critical applications. Streaming operates
over UDP/IP, and for this reason cannot guarantee that all packets transmitted will be received. The quality of content received via streaming may be low as
the server/client attempt to deliver the stream as fast as possible to meet real time delivery. Image size, frame rate, and bits assigned to preserve image
detail all may be adjusted to meet the channel bandwidth. See MISB TRM 0803 Delivery of Low Bandwidth Motion Imagery and TRM 0703 Low Bandwidth Motion
Imagery - Technologies for more information.
What is the best format to store/archive Motion Imagery?
At this time, the MISB advocates the MPEG-2 Transport Stream (TS), AAF (Advanced Authoring Format) and MXF (Material eXchange Format) as file
wrappers. MPEG-2 TS is a delivery format that also serves as a container. AAF can accommodate historical editing and updates of content as it moves through its
production. MXF is emerging as a format that can manage complex content and metadata, and is also designed for exchange of motion imagery. The "best"
format to choose is application dependent.
What is Large Volume Motion Imagery (LVMI)?
LVMI systems typically collect very large frame imagery (100 Mpixels to 10 Gpixels per frame) using arrays of smaller cameras or
multiple focal plane sensors, whose outputs are then processed to form a single large image mosaic.
LVMI systems may incorporate more than one sensor modality. For example, an HD MI (i.e. 24 - 60 Hz frame rate) camera might be present in addition
to the large image array. LVMI systems typically collect very large volumes of data (terabytes to petabytes during a collection) and may provide Motion
Imagery streaming services off of the platform during data collects.
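A back-of-the-envelope calculation shows how frame sizes in this range lead to terabytes of data. The Python sketch below multiplies pixels per frame by bytes per pixel, frame rate, and duration. Only the 100 Mpixel and 10 Gpixel frame sizes come from this answer; the 1 byte per pixel, 2 Hz frame rate, and one-hour collection are assumptions picked for illustration, and the figures are for uncompressed data.

```python
def raw_volume_bytes(pixels_per_frame, bytes_per_pixel, frame_rate_hz, duration_s):
    """Uncompressed data volume in bytes for a collection."""
    return pixels_per_frame * bytes_per_pixel * frame_rate_hz * duration_s


if __name__ == "__main__":
    one_hour = 3600
    for pixels in (100_000_000, 10_000_000_000):
        # Assumed: 1 byte per pixel, 2 frames per second, one-hour collect.
        volume = raw_volume_bytes(pixels, 1, 2, one_hour)
        print(pixels, "pixels/frame ->", volume / 1e12, "TB per hour")
```

Under these assumptions the low end of the range produces 0.72 TB per hour and the high end 72 TB per hour.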
Wait a minute; you said JPEG 2000 compression was not as good as MPEG-2 or H.264 for motion imagery. Why are you using it for LVMI?
The reasons for this choice are many. JPEG 2000 provides a multi-resolution representation of the compressed image. This is very important when
dealing with 100 Mpixel - 10 Gpixel images where it is impossible to view the full image at full resolution. When dealing with imagery of this size you typically
look at the full image at reduced resolution and zoom in to increased resolutions as you define your areas of interest. JPEG 2000 excels at this. It is trivial
to extract reduced resolution data sets (RRDS) from a JPEG 2000 compressed image. JPEG 2000 also provides easy region of interest access and decoding and the
ability to adjust decoded/transmitted image quality on the fly. This allows users to select a desired spatial region of interest in a large image and even
control the visual quality received.
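The multi-resolution idea can be illustrated with simple arithmetic. In JPEG 2000, each additional resolution level halves the image width and height (rounded up, for an image whose origin is at 0,0), so a viewer can pick the first level that fits its display. The Python sketch below computes those dimensions. The function names, the 40000 x 25000 pixel (1 Gpixel) frame, and the 1920 x 1080 display are hypothetical, and the sketch assumes the encoder actually produced enough decomposition levels.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def resolution_levels(width, height, levels):
    """Return (level, width, height) for level 0 (full size) up to levels."""
    return [(r, ceil_div(width, 1 << r), ceil_div(height, 1 << r))
            for r in range(levels + 1)]


def first_level_fitting(width, height, display_w, display_h, max_levels=32):
    """Return the first (level, width, height) that fits the display."""
    for r, w, h in resolution_levels(width, height, max_levels):
        if w <= display_w and h <= display_h:
            return r, w, h
    raise ValueError("no level fits within max_levels")


if __name__ == "__main__":
    for level in resolution_levels(40000, 25000, 5):
        print(level)
    print(first_level_fitting(40000, 25000, 1920, 1080))
```

For the hypothetical 1 Gpixel frame, the fifth reduced level (1250 x 782 pixels) is the first that fits the assumed display, which is why an overview can be served without touching most of the compressed data.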
Furthermore, JPEG 2000 allows for images with bit depths up to 32 bits/pixel/color component and it allows up to 16,000 color components, so it is suited for
multi-spectral image compression. JPEG 2000 can even compress an image losslessly when needed and extract from the losslessly compressed image a reduced quality
version to save on transmission bandwidth. The JPIP protocol allows for very efficient transmission of portions of large compressed images over modest bandwidth
links. Most of these features are simply not available within the MPEG-2 and H.264 standards.