Semantic interpretation of files

Diploma thesis

Contact person: Dipl.-Inform. Philipp Hahn
PDF document

Completed on 12 October 2009 by Moritz Brandt

Subject areas

  • File systems

Background

There are thousands of different file formats, and to make matters worse they can be nested inside each other. For example, a Debian GNU/Linux package is an ar archive that contains, among other things, a gzip-compressed tar archive holding the actual files. One of those files might be an OpenOffice.org document, which is nothing more than a zip archive of several XML text files plus embedded files such as PNG images or films; a film, in turn, might be a QuickTime MOV container holding an MPEG-2 video stream and several Ogg Vorbis audio tracks. The package file itself is stored in an ISO 9660 file system, provided through a loopback device from an image file on an ext3 file system, which in turn sits on a RAID 5 array spanning several SATA hard drives.
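To make the nesting concrete, here is a minimal Python sketch that identifies a layer by its magic number and recursively unpacks the layers the standard library can open (gzip, zip, tar). The signature table is a small hand-picked assumption, not a complete format database, and `sniff`, `unwrap` and `build_nested_sample` are hypothetical names. ar archives and ISO 9660 images are recognised but not unpacked, because Python's standard library has no reader for them.

```python
import gzip
import io
import tarfile
import zipfile


def sniff(data: bytes) -> str:
    """Guess the outermost format of a byte string from a few well-known magic numbers."""
    if data.startswith(b"!<arch>\n"):      # ar archive (e.g. a .deb package)
        return "ar"
    if data.startswith(b"\x1f\x8b"):       # gzip stream
        return "gzip"
    if data.startswith(b"PK\x03\x04"):     # zip local file header
        return "zip"
    if data[257:262] == b"ustar":          # POSIX/GNU tar header magic at offset 257
        return "tar"
    if data[0x8001:0x8006] == b"CD001":    # ISO 9660 volume descriptor at sector 16
        return "iso9660"
    return "unknown"


def unwrap(data: bytes, name: str = "<root>", depth: int = 0) -> list[str]:
    """Recursively peel container layers and return an indented outline of them."""
    fmt = sniff(data)
    lines = ["  " * depth + f"{name}: {fmt}"]
    if fmt == "gzip":
        lines += unwrap(gzip.decompress(data), "(decompressed)", depth + 1)
    elif fmt == "zip":
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for info in zf.infolist():
                lines += unwrap(zf.read(info), info.filename, depth + 1)
    elif fmt == "tar":
        with tarfile.open(fileobj=io.BytesIO(data)) as tf:
            for member in tf.getmembers():
                if member.isfile():
                    content = tf.extractfile(member).read()
                    lines += unwrap(content, member.name, depth + 1)
    # "ar", "iso9660" and "unknown" are leaves here: recognised, but not unpacked.
    return lines


def build_nested_sample() -> bytes:
    """Build zip -> tar.gz -> text file in memory, mimicking the nesting described above."""
    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w") as tf:
        payload = b"hello\n"
        info = tarfile.TarInfo("hello.txt")
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    zip_buf = io.BytesIO()
    with zipfile.ZipFile(zip_buf, "w") as zf:
        zf.writestr("data.tar.gz", gzip.compress(tar_buf.getvalue()))
    return zip_buf.getvalue()


print("\n".join(unwrap(build_nested_sample())))
```

Running it prints one indented line per layer: the zip archive, the gzip stream inside it, the tar archive inside that, and finally the text file. Note that the tar signature does not sit at offset 0, so a detector cannot simply compare the first few bytes.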

At the lowest level, everything is stored as a sequence of zeros and ones, which can only be interpreted correctly given the necessary context. When errors occur, troubleshooting is particularly time-consuming, because format descriptions often exist only as prose specifications that whoever is tracking down the error has to work through painstakingly by hand.
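The dependence on context can be illustrated with just four bytes. The following Python sketch reads one byte sequence under several assumed types and byte orders; the bytes themselves do not say which reading is the right one.

```python
import struct

raw = bytes([0x00, 0x00, 0x80, 0x3F])

# The same four bytes, read under four different assumptions about their type.
readings = {
    "uint32, little-endian": int.from_bytes(raw, "little"),  # 1065353216
    "uint32, big-endian": int.from_bytes(raw, "big"),        # 32831
    "float32, little-endian": struct.unpack("<f", raw)[0],   # 1.0
    "two int16, little-endian": struct.unpack("<hh", raw),   # (0, 16256)
}

for interpretation, value in readings.items():
    print(f"{interpretation:>26}: {value}")
```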

Task description

This thesis is to build on the results of a previous project. Besides extending support to additional file formats, the programme needs to be revised so that it handles certain data types, such as enumerations and character strings, better. Integration or co-operation with Strigi and Nepomuk is also an option.
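As an illustration of the kind of enumeration and string handling mentioned above, the following Python sketch decodes a few fields of a POSIX ustar header: the entry type is a one-byte code mapped onto an `Enum`, and names are fixed-width, NUL-padded strings. The ustar layout is only a convenient, well-documented stand-in; the programme from the previous project is not described here, so `TarType`, `decode_header` and the helper functions are hypothetical names, not its actual interface.

```python
import enum
import io
import tarfile


class TarType(enum.Enum):
    """Entry types encoded in the ustar typeflag byte (offset 156)."""
    REGULAR = b"0"
    HARD_LINK = b"1"
    SYMLINK = b"2"
    CHAR_DEVICE = b"3"
    BLOCK_DEVICE = b"4"
    DIRECTORY = b"5"
    FIFO = b"6"
    CONTIGUOUS = b"7"


def cstring(field: bytes) -> str:
    """Decode a fixed-width string field that is terminated or padded with NUL bytes."""
    return field.split(b"\0", 1)[0].decode("utf-8")


def octal(field: bytes) -> int:
    """Decode a numeric field stored as ASCII octal digits, padded with NUL or spaces."""
    return int(field.strip(b"\0 ") or b"0", 8)


def decode_header(block: bytes) -> dict:
    """Decode selected fields of one 512-byte ustar header block."""
    flag = block[156:157]
    if flag == b"\0":  # old tar versions used NUL for regular files
        flag = b"0"
    return {
        "name": cstring(block[0:100]),
        "size": octal(block[124:136]),
        "type": TarType(flag),  # raises ValueError for an unknown type code
        "linkname": cstring(block[157:257]),
    }


def build_tar_sample() -> bytes:
    """Create a small ustar archive in memory: a symlink followed by a regular file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
        link = tarfile.TarInfo("latest")
        link.type = tarfile.SYMTYPE
        link.linkname = "notes.txt"
        tf.addfile(link)
        payload = b"hello"
        info = tarfile.TarInfo("notes.txt")
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


data = build_tar_sample()
# The symlink carries no data, so the second header follows directly at offset 512.
for offset in (0, 512):
    print(decode_header(data[offset:offset + 512]))
```

Mapping the type byte onto an `Enum` means an unexpected code raises `ValueError` instead of being passed on silently as a raw byte, and decoded values display with a readable name such as `TarType.SYMLINK`.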

Prerequisites

  • BS1
  • (VBS)

Comment

The thesis includes practical work.

(Changed: 11 Feb 2026) Shortlink: https://uol.de/p37541en

This page contains automatically translated content.