Semantic interpretation of files
Background
There are thousands of different file formats, and to make matters worse, they can be nested inside each other. For example, a Debian GNU/Linux package is an ar archive that contains, among other things, a gzip-compressed tar archive holding the package's files. One of these files might be an OpenOffice.org document, which is nothing more than a zip archive with several XML text files and other embedded files such as PNG images or films. Such a film might in turn be a QuickTime (.mov) container holding an MPEG-2 video stream and several Ogg Vorbis audio tracks. The package file itself may reside on an ISO 9660 file system, which is provided via a loopback device from a file on an ext3 file system, which in turn sits on a RAID 5 array spanning several SATA hard drives.
At the lowest level, everything is stored as a sequence of bits, which can only be interpreted correctly with the necessary contextual knowledge. Troubleshooting is particularly time-consuming, because format descriptions often exist only as prose text that the person hunting for the error has to work through painstakingly.
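The dependence on context can be illustrated with a small sketch. The four bytes below are chosen arbitrarily, and the helper name `interpretations` is hypothetical. The point is only that the same bits yield different values depending on the assumed type and byte order.

```python
import struct

# Four example bytes; nothing in the data itself says how to read them.
raw = bytes([0x50, 0x4B, 0x03, 0x04])


def interpretations(data: bytes) -> dict:
    """Return several possible readings of the same four bytes."""
    return {
        "uint32 little-endian": struct.unpack("<I", data)[0],
        "uint32 big-endian": struct.unpack(">I", data)[0],
        "two uint16 little-endian": struct.unpack("<HH", data),
        "latin-1 text": data.decode("latin-1"),
    }


for name, value in interpretations(raw).items():
    print(f"{name:>26}: {value!r}")
```

Which of these readings is the intended one can only be decided from the surrounding format.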
For a few formats, there are special editors that offer better support for low-level editing. But as the example above shows, many formats can be chained and nested inside each other, so an editor for a single format covers only one layer.
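The nesting described above can be made visible with a minimal sketch. It builds a small zip-in-tar-in-gzip sample in memory and then peels off one layer at a time, recognising each layer by its well-known signature bytes. The function names `identify`, `unwrap` and `build_sample` are hypothetical, and only three container formats are handled. A real tool would need far more formats and more robust detection.

```python
import gzip
import io
import tarfile
import zipfile


def identify(data: bytes) -> str:
    """Guess a container format from well-known signature bytes."""
    if data[:2] == b"\x1f\x8b":
        return "gzip"
    if data[:4] == b"PK\x03\x04":
        return "zip"
    if data[257:262] == b"ustar":
        return "tar"
    return "unknown"


def unwrap(data: bytes, depth: int = 0) -> list:
    """Return (depth, format) pairs for this layer and all nested layers."""
    kind = identify(data)
    layers = [(depth, kind)]
    if kind == "gzip":
        layers += unwrap(gzip.decompress(data), depth + 1)
    elif kind == "tar":
        with tarfile.open(fileobj=io.BytesIO(data)) as tar:
            for member in tar.getmembers():
                if member.isfile():
                    layers += unwrap(tar.extractfile(member).read(), depth + 1)
    elif kind == "zip":
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for name in zf.namelist():
                layers += unwrap(zf.read(name), depth + 1)
    return layers


def build_sample() -> bytes:
    """Build a zip inside a tar inside a gzip stream, entirely in memory."""
    zip_buf = io.BytesIO()
    with zipfile.ZipFile(zip_buf, "w") as zf:
        zf.writestr("content.xml", "<doc/>")
    payload = zip_buf.getvalue()
    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w") as tar:
        info = tarfile.TarInfo("document.odt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return gzip.compress(tar_buf.getvalue())


for depth, kind in unwrap(build_sample()):
    print("  " * depth + kind)
```

The innermost XML text is reported as "unknown" because the sketch knows no signature for it, which shows that each layer needs its own interpretation rules.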
In a possible future extension, the context information could also be used to store the differences between versions of a file more efficiently.
Task description
As part of this work, a framework for displaying and, if necessary, editing data is to be created. It should be possible to store, for various formats, the information needed to interpret the data. The practicability of the resulting solution is to be demonstrated using some typical file formats.
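One conceivable way of storing interpretation information is a declarative field table that a generic routine evaluates. The following is only a sketch under that assumption, not the design of the framework itself. The `Field` class and the `interpret` function are hypothetical names, and the example describes just the fixed 10-byte header of a gzip member as specified in RFC 1952.

```python
import gzip
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str     # identifier shown to the user
    code: str     # struct format code describing size and type
    meaning: str  # human-readable explanation


# Fixed 10-byte gzip member header (RFC 1952); multi-byte values are little-endian.
GZIP_HEADER = [
    Field("magic", "2s", "signature, always 1f 8b"),
    Field("method", "B", "compression method, 8 = deflate"),
    Field("flags", "B", "flags for optional header fields"),
    Field("mtime", "I", "modification time in Unix seconds"),
    Field("xflags", "B", "extra flags"),
    Field("os", "B", "originating operating system"),
]


def interpret(data: bytes, fields: list) -> dict:
    """Decode the start of data according to a list of Field descriptions."""
    layout = "<" + "".join(f.code for f in fields)
    values = struct.unpack_from(layout, data)
    return {f.name: v for f, v in zip(fields, values)}


# The mtime keyword of gzip.compress requires Python 3.8 or newer.
sample = gzip.compress(b"hello", mtime=0)
header = interpret(sample, GZIP_HEADER)
for field in GZIP_HEADER:
    print(f"{field.name:>7} = {header[field.name]!r}  ({field.meaning})")
```

Because the description is plain data, supporting a further format would mean adding another table rather than writing another parser, at least for fixed-layout headers like this one.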