The project
As part of the Master's programme at the University of Oldenburg, each student takes part in a project group once. The project lasts exactly one year and covers a complete development process in the field of Computing Science.
We are one of the few project groups with a real customer: the company CEWE Stiftung & Co. KGaA. However, at this early stage of the project we have not yet been given a precise goal or clear requirements. Our rough goal is to analyse the data from the CEWE photo stations using Apache Hadoop.
The CEWE photo stations, also known as Digi Foto Maker (DFM for short), are the kiosk stations of CEWE Stiftung & Co. KGaA, which are set up at the point of sale of retail partners such as Müller Großhandels Ltd & Co KG. Customers can use a DFM either to print photo products directly on site or to place an order that is produced in the CEWE Stiftung & Co. KGaA laboratory. The finished order is then sent to the shop, where the products can be collected. Direct printing on site is called OnSite Finishing (OSF for short).
During operation of the photo station, click behaviour is recorded anonymously and transmitted to CEWE Stiftung & Co. KGaA in a semi-structured XML file. In addition to the job data, the XML file also contains information about the DFM itself, such as the temperature of the printer.
CEWE Stiftung & Co. KGaA would like to be able to evaluate this data in order to carry out "near time" analyses, among other things. In addition, an alarm system is to be created in order to recognise failures and downtimes of the DFM at an early stage. CEWE Stiftung & Co. KGaA also hopes to be able to further improve the ordering software through a detailed analysis of user behaviour.
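To make the idea of evaluating such a file more concrete, the following is a minimal sketch of how a single DFM report could be read and checked for an alarm condition. The actual structure of the XML files is not yet known to us, so every element name, attribute name, the station identifier, and the temperature threshold below are hypothetical and serve only as an illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample report: all element and attribute names are invented
# for illustration; the real DFM XML format is not described here.
SAMPLE_XML = """<dfm id="station-001">
  <device>
    <printer temperature="71.5"/>
  </device>
  <session>
    <click target="start"/>
    <click target="select_product"/>
    <click target="order"/>
  </session>
</dfm>"""

# Assumed limit in degrees Celsius; a real value would have to come from CEWE.
MAX_PRINTER_TEMPERATURE = 65.0


def summarise_station(xml_text, max_temperature=MAX_PRINTER_TEMPERATURE):
    """Return station id, click count, printer temperature and an alarm flag."""
    root = ET.fromstring(xml_text)

    # The data is semi-structured, so the printer element or its
    # temperature attribute may be missing in a given report.
    printer = root.find("./device/printer")
    temperature = None
    if printer is not None and printer.get("temperature") is not None:
        temperature = float(printer.get("temperature"))

    clicks = root.findall("./session/click")
    return {
        "station": root.get("id"),
        "clicks": len(clicks),
        "temperature": temperature,
        "alarm": temperature is not None and temperature > max_temperature,
    }


summary = summarise_station(SAMPLE_XML)
print(summary)
```

A report without printer information does not raise an alarm in this sketch. Whether missing data should itself be treated as a sign of a failure is a design question that would need to be clarified with the customer.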
To achieve these goals and meet these requirements, we first need to find out what exactly Apache Hadoop is and what its components are. We therefore need to analyse it in detail and check whether it is really suitable for the requirements of CEWE Stiftung & Co. KGaA. Once we have familiarised ourselves with the topic, we want to set up an Apache Hadoop system and then develop algorithms that could fulfil the requirements. To do this, we also need to understand the structure of the XML files. To that end, we will work intensively with the CEWE photo stations and create our own test data tailored to our needs.