MoMuSys (MObile MUltimedia SYStems)
The Video Object Generation Tool with User Environment
VOGUE is an interactive tool for the creation of video objects
suitable for MPEG-4 encoding.
Example of video object produced using VOGUE.
Top: some frames of the original image sequence;
Middle: masks created with VOGUE;
Bottom: final video object.
The Video Object Generation Tool with User Environment was developed
in the framework of the ACTS098 MoMuSys project. The recent MPEG-4 standard
supports content-based functionalities. This means that the different video
objects in a video sequence (e.g. the presenter of a news programme without
the background) can be individually coded and manipulated.
However, the standard intentionally leaves open the issue of object definition,
i.e. the segmentation of a video sequence into different video objects,
and this has created the need for the development of efficient segmentation
tools.
Automatic segmentation is an ill-posed problem: for the same scene, the
object of interest can differ depending on the user or the application.
Automatic segmentation is therefore a problem without a general solution,
at least at the current state of the art. User-assisted segmentation offers
an attractive alternative by letting the user introduce semantic knowledge
while keeping an important part of the process automatic.
VOGUE is an integrated framework for user-assisted segmentation of video
sequences. It combines several different algorithms under a common graphical
user interface. A fully interactive static segmentation algorithm allows
the user to quickly create an initial mask defining the object of interest.
A tracking algorithm then follows the object throughout
the video sequence. An algorithm for the detection of moving objects (temporal
segmentation) is also included; it is especially suited to those cases
where the objects of interest are the moving ones. User interaction
is possible during the whole process: the tracking or the temporal segmentation
can be stopped at any time to make corrections using the tools provided
by the static segmentation module.
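The interactive workflow described above can be sketched as a simple driver loop. This is a hypothetical illustration, not VOGUE's actual code: `track` and `correct` stand in for the tracking algorithm and the user's optional touch-ups.

```python
def segment_sequence(frames, initial_mask, track, correct=None):
    """Propagate an initial user-drawn mask through a sequence.

    frames: list of frames; initial_mask: mask for frames[0];
    track: function (mask, frame) -> new mask for that frame;
    correct: optional user-correction callback (frame, mask) -> mask.
    """
    masks = [initial_mask]
    for frame in frames[1:]:
        mask = track(masks[-1], frame)   # automatic tracking step
        if correct is not None:
            mask = correct(frame, mask)  # user stops and refines the mask
        masks.append(mask)
    return masks
```

The corrected mask, not the raw tracker output, seeds the next iteration, which is how user corrections improve subsequent automatic performance.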
Figure 1 shows the
graphical user interface. The left-hand window displays the original video
sequence. The right-hand (working) window displays the
state of the current segmentation in the form of colour labels. On the
left-hand part of the working window, the colour labels are superimposed
on the luminance of the original image for better evaluation of contour
accuracy.
Figure 1. VOGUE Graphical User Interface.
The spatial segmentation algorithm allows the user to define the segmentation mask
of the object(s) of interest. It is based on a multi-scale segmentation
scheme [5, 6]. A family of nested
partitions is constructed (Fig. 2). The coarsest
level considers the image as a whole (a single region), and finer partitions
are always included in coarser ones. This means that a finer level is obtained
by re-segmentation of regions of the previous level. The user is then given
the tools to navigate between the different resolution levels of
the family, to create the desired partition (Fig. 3).
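The nesting property and the region-picking interaction can be made concrete with label maps. This is only a toy model under the assumption that each resolution level is stored as an integer label image; the function names are illustrative, not VOGUE's API.

```python
import numpy as np

def is_nested(coarse, fine):
    """True iff each region of the fine partition lies entirely inside
    exactly one region of the coarse partition (finer refines coarser)."""
    for lab in np.unique(fine):
        if np.unique(coarse[fine == lab]).size != 1:
            return False
    return True

def compose_mask(levels, picks):
    """Combine regions picked by the user at different resolution levels
    into one object mask.  picks is a list of (level_index, region_label)."""
    mask = np.zeros(levels[0].shape, dtype=bool)
    for lvl, lab in picks:
        mask |= levels[lvl] == lab
    return mask
```

Because the partitions are nested, a pick at a fine level is always a subset of some coarser region, so mixing picks from different levels still yields a consistent mask.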
As an alternative, the user can select the regions of the partition
by roughly drawing a marker for each object of interest and a marker
for the background. The spatial segmentation algorithm then automatically
finds the actual contours of the selected objects (Fig. 4).
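A toy version of this marker-driven extraction can be written as a flooding from the markers, assigning each pixel to a label in order of gray-level similarity, in the spirit of a watershed from markers. The algorithm actually used in VOGUE is more elaborate; this sketch only illustrates the principle.

```python
import heapq
import numpy as np

def marker_segmentation(image, markers):
    """Grow labelled regions from user markers (nonzero entries of
    `markers`), flooding pixels in increasing order of gray-level
    difference to their already-labelled neighbour."""
    h, w = image.shape
    labels = markers.copy()
    heap, counter = [], 0  # counter breaks ties in the priority queue
    def push_neighbours(y, x, lab):
        nonlocal counter
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and labels[ny, nx] == 0:
                d = abs(int(image[ny, nx]) - int(image[y, x]))
                heapq.heappush(heap, (d, counter, ny, nx, lab))
                counter += 1
    for y in range(h):
        for x in range(w):
            if markers[y, x]:
                push_neighbours(y, x, markers[y, x])
    while heap:
        _, _, y, x, lab = heapq.heappop(heap)
        if labels[y, x]:
            continue               # already claimed by a closer flood
        labels[y, x] = lab
        push_neighbours(y, x, lab)
    return labels
```

Each marker's flood advances cheaply through pixels similar to it, so the floods meet, and the labels stop, along the strong gray-level transitions that separate the objects.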
Once the segmentation mask is available for an initial image, it can be
automatically extended to the following images of the sequence. For this
purpose, a tracking algorithm has been implemented. It is based on a partition
projection allowing the introduction of new regions, followed by a decision
on which regions belong to the new mask. The result of the automatic
tracking is displayed to the user, who has the possibility to stop the
execution and ask for refinements of the object mask. The user corrections
are then used by the automatic algorithm to improve its subsequent
performance. Figure 5 shows
video objects obtained with the tracking algorithm.
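The projection idea can be illustrated with a deliberately crude, purely global version: estimate a single translation by exhaustive search over small shifts, then project the previous mask accordingly. The actual algorithm works region by region and admits new regions; this is only a sketch of the projection step.

```python
import numpy as np

def estimate_translation(prev, curr, search=3):
    """Find the integer shift (dy, dx), |dy|,|dx| <= search, minimising
    the mean absolute difference between the shifted previous frame and
    the current frame (evaluated away from the wrapped border)."""
    h, w = prev.shape
    core = (slice(search, h - search), slice(search, w - search))
    best, best_mad = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            shifted = np.roll(np.roll(prev, dy, axis=0), dx, axis=1)
            mad = np.abs(shifted[core].astype(np.int16)
                         - curr[core].astype(np.int16)).mean()
            if mad < best_mad:
                best_mad, best = mad, (dy, dx)
    return best

def project_mask(mask, shift):
    """Project the previous object mask onto the current frame."""
    dy, dx = shift
    return np.roll(np.roll(mask, dy, axis=0), dx, axis=1)
```

The projected mask is only a prediction; a real tracker would follow it with the decision step, accepting or rejecting regions of the new partition against this prediction.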
The temporal segmentation is useful when the user is interested in moving
objects. It is based on change detection followed by motion analysis,
and consists of the following steps: (1) camera motion estimation and compensation,
assuming that the background of the scene is a rigid plane; (2) scene-cut
detection based on the evaluation of the mean absolute difference between
the camera-motion-compensated previous frame and the current frame; (3)
estimation of a change detection mask by thresholding the frame difference
between two successive frames; (4) elimination of uncovered background by analyzing
a displacement vector field; (5) adaptation of the contour to the luminance edges
of the current frame, in order to obtain a more accurate object boundary.
Figure 6 shows examples of results obtained
using temporal segmentation.
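Steps (2) and (3) of this pipeline are simple enough to sketch directly; the thresholds below are illustrative assumptions, and camera-motion compensation, uncovered-background elimination and contour adaptation (steps (1), (4), (5)) are omitted.

```python
import numpy as np

def is_scene_cut(prev_compensated, curr, mad_threshold=40.0):
    """Step (2): declare a scene cut when the mean absolute difference
    between the (camera-motion-compensated) previous frame and the
    current frame is too large for tracking to remain meaningful."""
    mad = np.abs(curr.astype(np.float64)
                 - prev_compensated.astype(np.float64)).mean()
    return mad > mad_threshold

def change_detection_mask(prev, curr, threshold=25):
    """Step (3): binary change mask by thresholding the absolute
    difference between two successive frames."""
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    return diff > threshold
```

On static-camera material the change mask already outlines the moving objects; the remaining steps refine it into an accurate object boundary.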
This demo-tutorial will guide you through VOGUE's main functionalities.
Click here to see the tutorial.
B. Marcotegui, P. Correia, F. Marques, R. Mech, R. Rosa, M. Wollborn and F.
Zanoguera, "A Video Object Generator Tool Allowing Friendly User Interaction",
in ICIP-99, Kobe, Japan, October 1999. pdf
format (256 Kbytes)
P. Correia and F. Pereira, "User Interaction
in Content-Based Video Coding and Indexing", in EUSIPCO-98, Rhodes,
Greece, September 1998.
P. Correia and F. Pereira, "The Role of Analysis
in Content-Based Video Coding and Indexing", Signal Processing, Special
Issue on Video Sequence Segmentation for Content-Based Processing and Manipulation,
Vol. 66, No. 2, pp. 125-142, April 1998.
C. Vachier and F. Meyer, "Extinction Values: A New Measurement of Persistence",
IEEE Workshop on Nonlinear Signal/Image Processing, pp. 254-257.
F. Meyer, "Morphological Multiscale and Interactive Segmentation", IEEE-EURASIP
Workshop on Nonlinear Signal and Image Processing, Antalya, Turkey.
F. Zanoguera, B. Marcotegui and F. Meyer, "A Toolbox for
Interactive Segmentation Based on Nested Partitions", in ICIP-99, Kobe, Japan, 1999.
F. Marques and M. Pardas and P. Salembier, Video Coding: The second generation
approach, chapter Coding-oriented segmentation of video sequences,
pp. 79-124. L. Torres and M. Kunt (Eds). Kluwer Academic Publishers, 1996.
F. Marques and J. Llach, "Tracking of generic objects for Video Object
generation", IEEE International Conference on Image Processing,
Chicago, USA, October 1998.
R. Mech and M. Wollborn. "A Noise Robust Method for 2D Shape Estimation
of Moving Objects in Video Sequences Considering a Moving Camera", Signal
Processing, Vol 66, No 2, pp 203-217, 1998.
ISO/IEC JTC1/SC29/WG11 Doc. 2502. Information Technology - Generic Coding
of Audio-visual Objects: Visual, ISO/IEC 14496-2, Final Draft of International
Standard, Annex F. October 1998.