Distribution of file formats on European physics servers
Research within the scope of the DFN Project
EPRINT
Günter Rohen
Fachbereich Physik
Universität Oldenburg
Oldenburg, Tue Nov 11 14:23:09 CET 1997
The rapid development of computer technology, particularly in
data transfer, unfortunately brings a certain confusion with it.
It is difficult to say what stage of development we have reached.
In such a dynamically growing branch of information technology
it is extremely important to know the state of development. Predictions, and
the course settings based on them, can then be made at the concept planning
stage. A foresighted, forward-planning concept supports a quality-oriented
dynamic of development.
Within the scope of the German Research Network (DFN) project EPRINT such an
investigation was carried out. EPRINT aims at the exemplary set-up of
collection and communication services for preprints and other scientific and
specialist documents, as well as research into the optimal interlinking of
distributed information systems.
Two concepts, at present treated as alternatives, are to be connected: the
concept of a central preprint archive (like the preprint server at Los Alamos
National Laboratory) and the concept of decentralised, distributed gatherers
and brokers (Harvest). The enormous resource requirements of the central
server concept should be avoided or minimised by distributed methods.
One aim of EPRINT is the creation of decentralised EPRINT servers that
collect publications. Search engines tailored specifically to this purpose
should make the information accessible on the WWW.
Members of the universities of Oldenburg, Darmstadt, Augsburg and Halle
are involved in this project.
The analysis of the distribution of files and formats on 150 servers in
European physics departments should help in selecting suitable search engines.
Popular search engines such as AltaVista, which for example have only a
small search depth, are of only limited use for finding specialised scientific documents.
The depth here is the number of links that must be followed to reach the page
with the desired information.
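The notion of depth can be illustrated with a short Python sketch. It assumes that the depth of a page is the smallest number of links leading to it from the start page, and it computes this with a breadth-first search. The function name `link_depths` and the small link graph are hypothetical and serve only as an illustration.

```python
from collections import deque


def link_depths(links, start):
    """Return the minimum number of links to follow from start to each page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical link graph of a small server: page -> pages it links to.
site = {
    "/": ["/institute.html", "/staff.html"],
    "/institute.html": ["/groups.html"],
    "/groups.html": ["/preprints/paper1.ps", "/"],
    "/staff.html": [],
}

depths = link_depths(site, "/")
for page, depth in sorted(depths.items(), key=lambda item: item[1]):
    print(depth, page)
```

A search engine that follows links only to depth 2 would reach `/groups.html` in this example but not the preprint file behind it.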
The exemplary analysis of the servers in European physics departments should
also enable others to carry out investigations for specialised search engines.
To analyse the structure of the files offered on remote servers, a robot was
programmed. A robot is a program that retrieves
information autonomously. This robot is a Perl script that
requires a Uniform Resource Locator (URL) as input.
The program loads a web page by means of a helper program (urlget). This page
is searched for links, and the pages found are loaded in turn. In this manner all
pages of a server are listed. A check routine ensures that the files are counted correctly.
This yields the set of publicly offered files, from which statements can be
made about the size of the server, its depth structure and its
format structure.
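The procedure can be sketched as follows. This is not the original Perl robot but a minimal Python illustration under stated assumptions: the helper `fetch` stands in for urlget, the walk stays on the host of the start URL, a dictionary of already seen URLs plays the role of the check routine, and the format of a file is guessed from its extension. All names and the `physics.example` pages are hypothetical.

```python
from collections import Counter, deque
from html.parser import HTMLParser
from posixpath import splitext
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> tags on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch):
    """Breadth-first walk over one server; returns {url: depth}.

    fetch(url) is supplied by the caller and returns the HTML text of a
    page, or None for anything that is not an HTML page.
    """
    host = urlsplit(start_url).netloc
    seen = {start_url: 0}  # every URL is recorded once, so nothing is counted twice
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.hrefs:
            target = urldefrag(urljoin(url, href)).url  # absolute URL, no #fragment
            if urlsplit(target).netloc != host:  # assumption: stay on this server
                continue
            if target not in seen:
                seen[target] = seen[url] + 1
                queue.append(target)
    return seen


def format_counts(urls):
    """Tally URLs by file extension (assumption: extension indicates format)."""
    counts = Counter()
    for url in urls:
        ext = splitext(urlsplit(url).path)[1].lower().lstrip(".")
        counts[ext or "(none)"] += 1
    return counts


# Hypothetical in-memory server, used instead of real network access.
PAGES = {
    "http://physics.example/":
        '<a href="group.html">Group</a> <a href="http://other.example/">Ext</a>',
    "http://physics.example/group.html":
        '<a href="/">Home</a> <a href="papers/p1.ps">PS</a> <a href="papers/p1.dvi">DVI</a>',
}

found = crawl("http://physics.example/", PAGES.get)
counts = format_counts(found)
print(len(found), "files;", dict(counts))
```

The returned dictionary gives the size of the server (its length), the depth of every file (its values) and, through `format_counts`, the format structure.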
150 servers in European physics departments were investigated. The servers were selected from a list that was compiled within a project of the IOK (Cooperation of physical societies for the support of information and communication), an EU project without financial support. This list records all European physics departments. For our investigation at least one physics department from each country was selected. If a country has more than one server, more servers were investigated. A list of the investigated servers and the results for each individual server is available at http://www.physik.uni-oldenburg.de/~eprint/netz/vorkommen.html. There the format structure of each server can be seen in detail, for example that of the physics department server at Oldenburg in this figure.
How large are the servers?
The figures show the size of the investigated servers. More than 75 percent
of the servers offer fewer than 500 files, and almost 40 percent
offer fewer than 100 files. Servers offering between 500 and
1000 files account for only 18 percent.
Up to this point a trend can be recognised: small servers are the most frequent.
The big servers, offering more than 2000 files, do not follow this trend. 7 percent of
the servers are big servers, for example the server of the physics department in Oldenburg and the
server of Chalmers University in Göteborg, Sweden.
The large number of small servers is of little consequence, however: almost half of all files offered come from the big servers.
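The two kinds of percentages used here, the share of servers in a size class and the share of files contributed by that class, can be computed as in the following Python sketch. The class boundaries 100, 500, 1000 and 2000 are taken from the text; how a server with exactly 2000 files is classified is an assumption, and the file counts in `example` are invented for illustration and are not measured values.

```python
from bisect import bisect_right
from collections import Counter

EDGES = [100, 500, 1000, 2000]
LABELS = ["under 100", "100-499", "500-999", "1000-1999", "2000 and more"]


def size_class(n_files):
    """Return the size class of a server with n_files files."""
    return LABELS[bisect_right(EDGES, n_files)]


def server_shares(file_counts):
    """Percentage of servers that fall into each size class."""
    tally = Counter(size_class(n) for n in file_counts)
    return {label: 100.0 * tally[label] / len(file_counts) for label in LABELS}


def file_shares(file_counts):
    """Percentage of all files that is offered by each size class."""
    files = Counter()
    for n in file_counts:
        files[size_class(n)] += n
    total = sum(file_counts)
    return {label: 100.0 * files[label] / total for label in LABELS}


# Hypothetical file counts of eight servers (not data from the investigation).
example = [40, 80, 250, 300, 700, 2500, 90, 450]

by_server = server_shares(example)
by_files = file_shares(example)
for label in LABELS:
    print(f"{label:>14}: {by_server[label]:5.1f} % of servers, "
          f"{by_files[label]:5.1f} % of files")
```

In this invented example a single big server is only one eighth of the servers but supplies more than half of the files, which is the kind of effect described above.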
A maximum can be seen at depth 6. For most files the depth lies between about 4 and 8. The greatest depth found was 15 (physics department server at Darmstadt).
The share of files in HTML format is by far the largest. PostScript, plain text and DVI
follow behind.
The striking results are summarized in this chapter:
The results of the investigation provide information about the current position. We now know how the document formats are distributed at this time. This investigation did not examine how things will develop in the near future. It would therefore certainly be worthwhile to repeat the investigation at a later time. Then we could estimate the development and perhaps say something about the 'final' state of the servers.