SEO-DWARF


Quality, innovative aspects and credibility of the research (including inter/multidisciplinary aspects)

 

1.1 Objectives of the MSCA-RISE project: SEO-DWARF

The goal of the SEO-DWARF project is to establish a fertile collaborative research and innovation environment by means of staff exchanges, knowledge sharing and know-how transfer. This environment will promote the resolution of scientific questions posed by the marine application domain regarding satellite image retrieval from the Copernicus program, and will advance the state of the art in processing, alerting, retrieving, fusing and serving remote sensing images and products. Specifically, the objectives of the project are centered on:

  • Creating a research and innovation collaboration network through staff exchanges for the:
      • Exploitation of the existing knowledge about remote sensing applications in the marine domain.
      • Development and implementation of automatic algorithms for extracting metadata from remote sensing images and associating them with the corresponding ontology.
      • Exploitation of the big data EO archives for the efficient retrieval and serving of data based on the semantic queries performed.
      • Data fusion among optical, SAR and ancillary data.
  • Sharing multidisciplinary knowledge and know-how among the partners, while aligning different scientific cultures, so that knowledge from the marine domain is integrated with the developed metadata extraction algorithms to retrieve the images that help to resolve the scientific questions.
  • Pushing the complementary research conducted by the partners to the market by creating added value services for the Copernicus EO program. The added value services correspond to a semantic web alert and retrieval system for EO data from the Copernicus satellites.
  • Promoting the skills and careers of the exchanged staff. The seconded staff will be re-integrated into the sending companies/institutes through an in-built return mechanism for further exploitation of their newly acquired and enhanced skills and knowledge.

The main research and innovation of the proposed project is bridging the gap between the raw information that remote sensing images provide and the knowledge gained from the marine application domain, so that data relevant to a semantic query can be retrieved. This will be achieved by developing automatic remote sensing algorithms (classification, indices, spectral matching, segmentation, endmember extraction, abundance estimation, etc.) that extract information which will then be formulated as metadata within the EO data archive system. Researchers in the marine application domain and in remote sensing will provide the knowledge on “what we study”, together with relevant information and experience from their fields. This knowledge, in terms of both “what we study” and processing, will be integrated into the system in order to provide a case-specific solution. From this perspective, the project will develop partnerships in the form of joint research and innovation activities between the participants. This joint research takes the form of international mobility, based on secondments of research and innovation staff. These secondments will contribute directly to the implementation of the project by exploiting the complementary competences of the participants. Networking activities, the acquisition of new skills, career development and the organization of workshops and conferences are parallel objectives of the proposed project.

1.1.1 Relation to the scope of the call

The SEO-DWARF project is closely aligned with the Marie Sklodowska-Curie Research and Innovation Staff Exchange (MSCA-RISE) funding scheme. It covers the main objectives of the call, i.e. the promotion of international and inter-sectoral collaboration through research and innovation staff exchanges, and the sharing of knowledge and ideas from research to market for the advancement of science and the development of innovation. The proposed project is also closely associated with the exploitation of the Copernicus flagship EU program.

1.1.2 Relevance of the research and innovation to the “state of the art”

  1.1.2.1 Semantics, Ontologies and Linked Data

The need for intelligence in human-computer interaction has already been acknowledged 1, particularly in the context of the semantic web 2. Knowledge engineering methods include knowledge formalization, encoding, reasoning and representation tools, as well as advanced querying mechanisms, to name just a few. As far as knowledge formalization is concerned, ontologies 3 have been proposed as a means to capture the conceptualization of different target groups and overcome semantic interoperability issues 4. Depending on the nature of the formalized knowledge, different levels of ontologies have been proposed; for example, Guarino 5 proposes:

  1) top-level ontologies,
  2) domain ontologies,
  3) application ontologies and
  4) task ontologies.

Our focus is on domain ontologies, which are used to formalize knowledge for a certain purpose, i.e. marine applications 6. The necessity for content-based image retrieval techniques in remote sensing calls for new methodologies that combine the low-level semantics of images with the high-level semantics of user queries. To that end, ontologies are designed to formalize knowledge, and new algorithms are introduced to relate them to the numerical data of the images. Although a wealth of research has already been conducted in the computer science domain and many ontologies for simple photos have been designed, remote sensing images have special characteristics and, therefore, more specialized methodologies are needed. Related work focuses on techniques for hyperspectral remote sensing images 7, while in the EU-funded project TELEIOS features are extracted from TerraSAR-X images and combined with image metadata and GIS data to unfold their semantics 8. The methodology of Priti and Namita (2009) applies different image processing and querying techniques to multispectral images. Object/segment-oriented techniques for relating low-level image features to ontological concepts can be found in Ruan et al. (2006), Li and Bretschneider (2007), Liu et al. (2012) and Wang et al. (2013). Furthermore, in Datcu et al. (2003), Li and Narayanan (2004) and Aksoy et al. (2005), the labeling process is applied to pixels.

  1.1.2.2 Remote Sensing

For the operational needs of the Copernicus program, ESA has launched Sentinel-1 and plans to launch the Sentinel-2 and Sentinel-3 satellite constellations this year. As a constellation of two satellites, the Sentinel-1 mission images the entire Earth every six days. The Sentinel-2 pair of satellites will have a revisit time of 5 days at the equator (under cloud-free conditions) and 2–3 days at mid-latitudes, whereas the Sentinel-3 satellites will enable a short revisit time at the equator of less than two days for the Ocean and Land Colour Instrument (OLCI) and less than one day for the Sea and Land Surface Temperature Radiometer (SLSTR). It is evident that a vast amount of remote sensing images will be accumulated, making the selection of appropriate data for scientific or operational use hard and time consuming. There is an urgent need to incorporate remote sensing methods and algorithms in a procedure that extracts metadata from images and associates them with the corresponding ontology, so that image selection becomes fast and efficient. These remote sensing methods and algorithms should be relevant to several standard processing concepts of the remote sensing community. The output of these algorithms is mostly used either for map creation or for qualitative and quantitative studies. For example, classification and spectral unmixing algorithms are commonly used for creating maps. Spectral matching, segmentation and anomaly detection algorithms are usually used to determine the presence of objects within the image scene or to discriminate between objects. Indices are used to quantify indicators, such as vegetation health or chlorophyll concentration in seawater. Pixels in a multi/hyperspectral image usually contain more than one object, and are therefore considered mixed pixels. Their spectral response is a mixture (linear or non-linear) of the endmembers present in the image scene 9.
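As a point of reference, the linear case of this mixing is commonly written as follows. The notation is introduced here for illustration only and is not taken from the project documents: x is the observed pixel spectrum, e_i are the p endmember signatures, a_i their abundance fractions and n a noise term. The non-negativity and sum-to-one constraints are the ones usually imposed on the abundances.

```latex
\mathbf{x} = \sum_{i=1}^{p} a_i \, \mathbf{e}_i + \mathbf{n},
\qquad a_i \ge 0 \;\; (i = 1, \dots, p),
\qquad \sum_{i=1}^{p} a_i = 1
```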
Depending on the spatial and spectral resolution of the image, endmembers refer to the spectral signatures of land-use classes or materials. Spectral unmixing algorithms decompose the mixed pixels into a set of abundance fractions, which correspond to the existing endmembers. These algorithms also lead to classification maps when rules are applied to the abundance fraction values. Unmixing algorithms are usually organized into the following categories, based on the way the mixing problem is approached: least squares 9, statistical 11, geometrical 12, non-linear 13 and network-based 14 approaches. The endmembers extracted by the unmixing algorithms, or more generally pixels from the images, usually need to be matched or discriminated. Matching refers to the identification of the examined signatures based on a reference spectral library 15. Discrimination refers to the separation of labels between two signatures, i.e. examining whether two signatures should be considered the same object or not 16. These procedures are based on spectral similarity measures. For example, Manolakis et al. (2003) presented a new method for labeling endmembers extracted from hyperspectral images using ground-measured spectral signatures, with the Spectral Angle (SA) algorithm as the similarity measure. Salem (2001) and Horstrad et al. (2011) used the SA and the Cross Correlation (CC) algorithms to label oil spill spectral signatures. Basener et al. (2011) developed a detection-identification algorithm to extract endmembers. These endmembers were labeled based on a reference spectral library using the Adaptive Coherence/Cosine Estimator (ACE). Bue et al. (2009) presented a new automatic labeling technique, which was applied to segmented multi/hyperspectral images. The technique compared the mean signature of each segment with a reference spectral library, using the Continuum Intact – Continuum Removed (CICR) similarity measure.
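The matching step described above can be illustrated with a minimal sketch of spectral angle matching against a reference library. This is not the implementation of any of the cited works; the function names, the library labels and the four-band reflectance values are hypothetical and serve only to show the principle.

```python
import math


def spectral_angle(a, b):
    """Spectral angle (radians) between two signatures of equal length."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same number of bands")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("signatures must be non-zero")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos_angle)


def match_signature(signature, library):
    """Return (label, angle) of the library entry with the smallest angle."""
    return min(
        ((label, spectral_angle(signature, ref)) for label, ref in library.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical four-band reflectance signatures, for illustration only.
LIBRARY = {
    "seawater": [0.06, 0.05, 0.03, 0.01],
    "oil_slick": [0.04, 0.05, 0.05, 0.04],
}

label, angle = match_signature([0.12, 0.10, 0.06, 0.02], LIBRARY)
print(label, angle)
```

Because the spectral angle depends only on the shape of the signature and not on its overall brightness, the scaled seawater spectrum in the usage line is still matched to the seawater entry.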
Multispectral images are usually segmented before other algorithms are applied. The aim is to group the pixels of spectrally homogeneous areas, which can later be associated with objects. The purpose of image segmentation algorithms is to divide the image into a set of non-overlapping regions that correspond to some interesting modalities in the data (Paclik et al., 2001). Four main categories of algorithms exist, based on the way the segmentation problem is approached: edge-based 17, neighbourhood-based 18, histogram-based 19 and cluster-based 19. The edge-based and neighbourhood-based approaches work in the spatial domain, while the histogram-based and cluster-based approaches work in the spectral domain. Edge-based approaches exploit discontinuities in the image by extracting the contours (edges) of the objects. In contrast, neighbourhood-based approaches rely on similarities between various image parts. Histogram-based approaches assume that homogeneous regions in the image correspond to modes of the histogram. Due to the high dimensionality of multi/hyperspectral images, histogram-based approaches seek a multiband threshold to perform the segmentation. Cluster-based approaches assume that the “interesting” structures in the image form clusters in the spectral space. These algorithms are similar to classification algorithms, which also group pixels based on the spectral space, but cluster-based segmentation algorithms additionally apply rules to the derived classes in order to produce the final segments.
The remote sensing community broadly uses spectral indices in order to quantify phenomena such as sea chlorophyll content, total suspended matter, etc. The main concept is to combine two or three bands from the multispectral images (as a normalized ratio or a polynomial equation) and correlate the result with the phenomenon under investigation. The bands are selected on a per-application basis. The spectral characteristics (absorption lines) of the phenomena are considered for the band selection. For example, Mahasandana et al. (2009) developed a three-band index to predict the chlorophyll-a content in the coastal waters of Thailand.
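The normalized-ratio form of such an index can be sketched in a few lines. The example below is a generic two-band normalized difference, not the three-band index of the cited study; the function name and the band values are hypothetical.

```python
def normalized_difference(band_a, band_b):
    """Per-pixel (a - b) / (a + b); None where the denominator is zero."""
    if len(band_a) != len(band_b):
        raise ValueError("bands must have the same number of pixels")
    index = []
    for a, b in zip(band_a, band_b):
        total = a + b
        index.append((a - b) / total if total != 0 else None)
    return index


# Hypothetical reflectance values for two bands over three pixels.
print(normalized_difference([0.75, 0.5, 0.0], [0.25, 0.5, 0.0]))
```

The resulting values lie between -1 and 1, which makes the index comparable across images with different overall brightness before it is correlated with the phenomenon of interest.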

  1.1.2.3 Big Data and Data Mining

Big data mining is the process of discovering interesting and previously unknown, but potentially useful, patterns in spectral, spatial and spatiotemporal data 21. The growth in remote sensing images, weather data and other spatial and spatiotemporal data has been exponential. There is a need to develop new and computationally efficient methods tailored to analysing this specific kind of big data. Several major mining algorithms have been proposed to cope with this type of data, including Spatial Autoregressive Models 22, Gaussian Process Learning 23, Clustering 24 and Complex Object Recognition 25. These types of methods are usually coupled with linked data and semantics to further explore the unknown patterns in the data. The research and innovation in these methods lies in their optimized implementation with respect to the I/O requirements.

  1.1.2.4 Data Fusion

Image fusion refers to the process of combining two or more images into one composite image, which integrates the information contained within the individual images 26. Wald (1999) defined data fusion as “a formal framework in which are expressed means and tools for the alliance of data originating from different sources. It aims at obtaining information of greater quality, although the exact definition of ‘greater quality’ will depend upon the application”. The result is an image that has a higher information content than any of the input images. The goal of the fusion process is to evaluate the information at each pixel location in the input images and retain the information from the image that best represents the true scene content or enhances the utility of the fused image for a particular application. Image fusion is a vast discipline in itself and refers to the fusion of various types of imagery that provide complementary information. For example, thermal and visible images are combined to aid in aircraft landing. Multispectral images are combined with radar imagery because of the ability of radar to ‘see’ through cloud cover. Conceptually, image fusion can be performed at three different processing levels, according to the stage at which the fusion takes place: i) pixel, ii) feature and iii) decision level.

At the lowest processing level (pixel), image data fusion corresponds to the merging of the measured physical parameters (i.e. the values of the pixels), while fusion at the feature level requires the extraction of objects recognized in the various data sources, e.g. using segmentation procedures. Each object is assigned characteristics extracted from the initial images, which depend on the spectral values of its area, its shape and its neighbourhood. Similar objects from multiple sources are assigned to each other and then fused for further assessment using statistical approaches or machine learning algorithms. Data fusion at the decision level uses the information extracted from each input image, processed individually, and then combines it by applying decision rules to reinforce the interpretation and to give a better understanding of the observed objects 27.
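To make the decision level concrete, the sketch below fuses per-pixel labels that were produced independently from each source. The decision rule used here, a majority vote with ties marked as "uncertain", is an assumption chosen for illustration; the text above does not prescribe a particular rule, and the source names and labels are hypothetical.

```python
from collections import Counter


def fuse_decisions(labels_per_source):
    """Majority vote per pixel over the label lists of several sources.

    Ties between the two most frequent labels are reported as "uncertain".
    """
    fused = []
    for votes in zip(*labels_per_source):
        ranked = Counter(votes).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            fused.append("uncertain")
        else:
            fused.append(ranked[0][0])
    return fused


# Hypothetical per-pixel labels derived separately from three sources.
optical = ["oil", "water", "water"]
sar = ["oil", "oil", "water"]
ancillary = ["oil", "water", "ship"]
print(fuse_decisions([optical, sar, ancillary]))
```

In practice the rule could weight each source by its reliability for the phenomenon in question; the unweighted vote is only the simplest instance.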

  1.1.2.5 Marine Application Domain

SEO-DWARF is aligned with the Copernicus marine services. It aims to provide users with vital information for the selection of images that meet the requirements for the study of the state and dynamics of oceans and coastal zones. It will extract features/indices/segments from Sentinel-1 imagery, which provides continuous sampling of the open ocean, offering information on wind and waves. Furthermore, it will make use of the forthcoming Sentinel-2 multispectral imagery, which comprises 13 spectral bands, showing variability in land surface conditions and minimizing any artefacts introduced by atmospheric variability. Lastly, it will use the Sentinel-3 Ocean and Land Colour Instrument (OLCI), which comprises 21 bands and whose design is optimized to minimize sun-glint, and the Sea and Land Surface Temperature Radiometer (SLSTR), which aims at determining global sea-surface temperatures to an accuracy of better than 0.3 K. Several applications will be studied, not only for each sensor separately, e.g. oil spills from S-1 data or algal blooms from S-2, but also through synergistic use, e.g. oceanic eddies, which can be imprinted in SAR images and explained by optical data. A SAR image can be featureless or contain a large variety of diverse phenomena such as waves, currents, wind and rainfall. The standard classification procedure consists of the manual inspection of the radar imagery, the detection of a specific oceanic or atmospheric phenomenon and the explanation of its presence in the image. In the present proposal, a semantic scheme for detecting and classifying oceanic phenomena automatically will be tested. The sea surface roughness is modulated either by atmospheric formations (wind variations and atmospheric phenomena) or by oceanographic processes such as mesoscale currents, fronts and eddies 28. The traditional pixel-based approach is of little use in such research, because backscattering values are similar for most of the phenomena.
Therefore, groups of pixels, i.e. semantic objects, should be used. Radar images acquired over sea areas by SAR sensors contain a wide range of information on small-scale and mesoscale phenomena in the ocean and the marine boundary layer. These phenomena fall into two main categories according to the triggering mechanism: oceanic and atmospheric phenomena. Oceanic phenomena are mesoscale sea phenomena that become visible on SAR images owing to wave–current interactions and short-wave damping by surface films. These mechanisms modulate the sea surface roughness and thus the backscattered radar signal. Atmospheric phenomena become visible because they are associated with variations of the wind speed and direction at the sea surface. The wind speed also changes due to air–sea instability, as a function of the temperature difference between water and air 29. A list of the possible phenomena can be found in Table 1.

Table 1: List of ocean and atmospheric phenomena
| Oceanic phenomena | Atmospheric phenomena |
|---|---|
| Channel plumes | Atmospheric gravity waves |
| Coastal discharges | Atmospheric convective cells |
| Coastal fronts | Atmospheric boundary layer rolls |
| Coastal rivers | Katabatic winds |
| Ocean currents | Land/Sea breeze |
| Current fronts | Atmospheric fronts |
| Estuaries | Island wakes |
| Intertidal zone | Coastal winds |
| Oceanic eddies | Rain events |
| Oceanic internal waves | |
| Oceanic wakes | |
| Oil pollution | |
| Ship wakes | |
| Underwater bottom topography | |
| Upwelling | |

The first approach to automatic detection originates from the oil spill classification problem, where SAR signatures are classified as possible oil spills or oil spill look-alikes 30. All the above-mentioned mesoscale oceanic phenomena are grouped in the “look-alike” category. Their statistical values are clustered against the statistical values of possible oil spills for classification purposes. Case-specific solutions for oil spills and oceanic phenomena will be developed in order to meet user requirements. The marine application domain will provide the relevant knowledge on small-scale and mesoscale phenomena. Multispectral and hyperspectral remote sensing imagery (mostly airborne) has repeatedly been used to identify and study oil spill occurrences on seawater, and several studies exist. Carnesecchi et al. (2008) performed an extensive interpretation of oil spills and the variations in their appearance. Palmer et al. (1994) analyzed an oil spill event with the Compact Airborne Spectrographic Imager (CASI) and concluded that the spectrum from 440 to 900 nm is effective for detecting marine oil spills. Zhao et al. (2000) concluded that the reflectance of various kinds of offshore oil slicks presents peaks in the spectral region from 500 to 580 nm. Salem (2003) demonstrated that an increase in oil quantity causes light absorption to increase, and thus the reflectance in the visible bands is reduced. The spectral region from 600 to 900 nm (red to near infrared) provides the greatest possibility for oil spill detection using remote sensing techniques. YingCheng et al. (2008) studied the change of the reflectance spectrum of an artificial offshore crude oil slick with its thickness and concluded that the spectral characteristics of oil spills are very distinct at 550 and 645 nm. Bradford et al. (2011) developed an automatic oil spill detection method using multispectral imagery, and Svejkovsky et al. (2008 and 2012) presented a real-time method to estimate the slick thickness of crude oils and fuel oils using a multispectral sensor. The proposed algorithm showed that oil thickness distributions up to 200-300 μm can be mapped with an accuracy of up to 70%. In 31, knowledge derived from CASI hyperspectral imagery was exploited to develop an automatic OBIA methodology for the detection of oil spills as well as natural oil outflows using multispectral images. Sykas et al. (2011) presented a method for estimating oil spill thickness using hyperspectral data, and Karathanassi (2014) assessed environmental and oil-slick parameters using CASI hyperspectral images. The study included laboratory spectral measurements, spectral measurements in the marine environment and, finally, the application of spectral unmixing-based methods for oil spill detection, oil type identification and oil thickness estimation using CASI hyperspectral images. By exploiting spectral measures, spectral indices or even more complicated techniques developed within oil spill studies such as those listed, automatic algorithms can be used to build appropriate image metadata and semantic information for this topic.
Algorithm research is also crucial for the detection, mapping and monitoring of phytoplankton blooms and other suspended particulate materials. Seawater containing phytoplankton has complex spectral characteristics, because these small plant organisms contain chlorophyll and other pigments used for photosynthesis. Products for the concentration of chlorophyll-a and suspended particulate matter are widely used in marine science (McClain, 2009) and water quality monitoring 32. Current research needs to overcome the limitations of the detection algorithms in coastal waters by refining detection limits in various oceanic and coastal environments 33.
Suspended particulate material may be suspended bottom sediment, river-borne particles or eroded coastal and beach material, whereas a mixture with organic constituents may result from sewage-sludge waste dumping or dredging spoil. Such materials are present in shallow waters nearer to the coast, but not in the open sea. Their spectral characteristics differ from the natural sea colour 34.
As regards the mapping of sea bottom types, a number of studies combining in situ observations and remote sensing techniques can be found in the literature. Remote sensing provides the most flexible and accurate techniques for sea bottom assessment at different scales, while ground techniques cannot complete the task within a given time 35.
Hot spots mostly refer to fire and emergency services on the mainland. In the marine application domain, and especially for the coastal ocean, temperature variations are mostly associated with thermal pollution, such as thermal plumes in seawater due to power plants, or with hot springs. Remote sensing offers a cost-effective means for the rapid assessment of thermal plume discharges from power stations. A variety of remote sensing methods are available, but the discrimination of temperature anomalies in remote sensing data has many restrictions in coastal areas 36.
Optical bathymetry is underpinned by the principle that the total amount of radiative energy reflected from a water column is a function of water depth. The physical phenomenon behind the interpretation of images for bathymetry is the attenuation of light as it travels through any medium, in this case water. The attenuation increases with the distance travelled: the longer the path, the less energy arrives, so the deeper the sea, the darker the image. Optical bathymetry takes advantage of shortwave radiation in the blue and green spectrum, which has a strong penetration capability.
Optical sensing of bathymetry, a form of passive remote sensing, requires a model relating the radiance values in satellite imagery to the depths at sampled locations. This model can be analytical, semi-analytical 37 or empirical.
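As an illustration of the empirical option, the sketch below fits a log-linear model between radiance and depth at sampled locations. It rests on stated assumptions: light attenuation in water is taken to be exponential with path length, so the logarithm of the radiance is roughly linear in depth; a single band is used; and the deep-water radiance and bottom reflectance variations are ignored. The function names and the synthetic samples are hypothetical.

```python
import math


def fit_log_linear(radiances, depths):
    """Ordinary least squares fit of depth = intercept + slope * ln(radiance)."""
    if len(radiances) != len(depths) or len(depths) < 2:
        raise ValueError("need at least two matched samples")
    xs = [math.log(r) for r in radiances]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(depths) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("radiance samples must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, depths))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_depth(radiance, intercept, slope):
    """Apply the fitted model to a new radiance value."""
    return intercept + slope * math.log(radiance)


# Synthetic samples: radiance decays exponentially with depth (illustrative only).
sample_depths = [1.0, 2.0, 5.0, 10.0]
sample_radiances = [math.exp(-0.2 * z) for z in sample_depths]
intercept, slope = fit_log_linear(sample_radiances, sample_depths)
print(predict_depth(math.exp(-0.6), intercept, slope))
```

The fitted slope is negative, which matches the observation that darker pixels correspond to deeper water; real imagery would additionally need atmospheric and sun-glint correction before such a fit is meaningful.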

1.1.3 Methodological approach and Research and Innovation activities

The research and innovation of the project is divided into 5 main steps to ensure the quality and credibility of the research and innovation activities conducted within the consortium. The main steps are structured around the creation of a product/service, i.e. the development of the Semantic EO Data Web Alert and Retrieval System (SEO-DWARS). This structure is selected in order to ensure the transfer of knowledge and know-how within the consortium and to turn creative ideas into innovative products/services. In more detail, the following steps are defined:

  1. User Requirements and System Specifications.
  2. System Design based on the Shared Knowledge Domains.
  3. Collaborative Implementation of the System.
  4. System Calibration and Validation.
  5. Pilot Service and Verification.

During the first step of the project, the user requirements and the specifications of the proposed system are defined. The users of the SEO-DWARS are either universities/companies using EO data that need the semantic retrieval capabilities, or public/private entities that need alerting on specific phenomena (e.g. the presence of an oil spill in a NATURA region, or water quality in swimming areas or at offshore industrial platforms). The user requirements are mainly defined by the marine application domain partners (NTUA, UnAeg, CUT, PKH, i-Sea) in collaboration with the semantic-oriented partners (UniBa and TWT), in order to bridge the gap between the remote sensing and marine domain community and the semantic and linked data community. Based on these requirements, the specifications of the proposed system are drawn up by the systemic partners (PKI, CloudSigma and PKH) in collaboration with the users (NTUA, UnAeg, CUT, PKH, i-Sea) and the semantic-oriented partners (UniBa and TWT). This ensures a proper transition between “what we could do in theory” and “what we can do and how we do it under the framework of a system”.
The second step of the project, i.e. System Design based on the Shared Knowledge Domains, includes the design of the SEO-DWARS, based on the complementary knowledge of the partners (Semantics [UniBa, TWT], Marine Remote Sensing [NTUA, UnAeg, CUT, PKH, i-Sea], Big Data Mining [CloudSigma], Data Fusion [PKI]) and on the infrastructure already in place, which mainly the industrial partners bring to the project as added value. Here, all partners collaborate to design the SEO-DWARS in order to ensure that both scientific (research and innovation) and technical requirements are met. The SEO-DWARS is targeted at expert users, at certain types of satellite data 38 (including Sentinel 1, 2 and 3) and at a specific application domain.
With a priori knowledge of the data constraints, the expertise of the users and the domain knowledge to be formalized, complexity can be reduced and the performance and the accuracy of the results can be increased. The project will focus on the marine applications as described by the Copernicus program and on the specific research needs of the partners (Table 2). In order to perform quality research and innovation, the marine sub-applications to be studied derive from the competitive advantage of the partners (Table 2).

Table 2: Marine Applications studied with complementary knowledge of the consortium. 
| Marine Applications | Examples of Queries and Alerts | Partners with knowledge competitive advantage |
|---|---|---|
| Oil Spill Detection | Retrieve a Sentinel 1 image with oil spills in this AOI | NTUA, CUT |
| | Send an alert when an oil spill is detected in this NATURA area | |
| Water Quality | Retrieve a Sentinel 2 and 3 image showing eutrophication in this AOI | NTUA, CUT, UnAeg, PKH, i-Sea |
| | Retrieve a Sentinel 2 and 3 image with mean x value of turbidity | |
| | Send an alert when the SST and Chl. concentration rise above a threshold at this offshore industrial platform | |
| Bathymetry Analysis | What is the optimum time period and image quality (atmospheric and illumination conditions) to extract the bathymetry from a Sentinel 2 and 3 image? | CUT, i-Sea, PKH |
| | Retrieve a Sentinel 2 and 3 image with x m² of sea grass along the coastline | |
| Image Quality | Does the image contain sun glint and at what percentage? | CUT, UnAeg |
| | What are the atmospheric conditions of the image (e.g. visibility)? | |
| Surface Activities | Find the number of ships contained in a Sentinel 1 image | NTUA, PKH |
| | Change detection in aquaculture sites | |
| Sea State | Determination of the presence of long waves in Sentinel 1 images | UnAeg, CUT, PKH |
| | Retrieve a Sentinel 1 image with mean x value of wind currents | |

The types of research and innovation in this step will include a) the ontology formalization for the specific use-cases, b) the determination of the semantic queries that are needed by the application domains (Table 1), c) the algorithm development for extracting metadata from the remote sensing images, and d) the design of a platform architecture to perform the semantic image retrieval and the storage and management of the extracted metadata. One important component of the system is the knowledge base. This refers to the application domain concepts about geographic objects and phenomena that are present and interpretable in remote sensing images. They are formalized as an ontology, i.e. the specification of a conceptualization 39, able to explicitly constrain the meaning of concepts within one domain, eliminating semantic interoperability problems 40. The ontology 41 will be strongly related to the spatial and spectral characteristics of the data, i.e. it is a remote sensing oriented ontology (Fig. 1). For example, MERIS and Sentinels 2 and 3 will have their own ontologies, due to the difference in spatial and spectral resolution. Sentinel 1 and ASAR will also have their own ontologies, which will differ from those of MERIS and Sentinels 2 and 3.

Figure 1: Ontology example.

The determination of semantic queries is also important in the framework of the project. The semantic queries are derived from the application domain and, if answered correctly, they help to retrieve the appropriate data and contribute to answering the scientific questions of the application domain. The semantic image retrieval is based on the metadata that accompany each image. These metadata are associated with the formalized ontology, i.e. the objects are linked to the metadata. The metadata are composed of spectral signatures, spectral indices and segments. The link between the ontology and the metadata is part of the research and innovation, mainly between the semantic and remote sensing experts. The research and innovation in this part lies not only in linking the ontology components with the metadata, but also in the automatic extraction of these metadata for each image. An indicative automatic metadata extraction process is shown in Figure 2, where the metadata are indicated in blue. To accomplish this, the research and innovation will focus on the design and development of specific remote sensing algorithms to perform the required processing. The topics of the algorithms include (but are not limited to) endmember extraction, spectral matching, indices and segmentation. For each category, the appropriate algorithms will be designed and developed with automation in mind.

Figure 2: Indicative automatic process for metadata extraction
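
As a rough illustration of two of the algorithm categories named above (indices and spectral matching), the following minimal sketch computes a normalized-difference index and a spectral angle between a pixel spectrum and a reference signature. The band values, the reference signature and all function names are hypothetical and are not part of the SEO-DWARF design; the spectral angle is used here only as one common spectral matching measure.

```python
import math


def normalized_difference(band_a, band_b):
    """Generic normalized-difference index: (a - b) / (a + b)."""
    total = band_a + band_b
    if total == 0:
        return 0.0  # avoid division by zero for empty or no-data pixels
    return (band_a - band_b) / total


def spectral_angle(pixel, reference):
    """Angle in radians between two spectra; smaller means more similar."""
    if len(pixel) != len(reference):
        raise ValueError("spectra must have the same number of bands")
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm_p = math.sqrt(sum(p * p for p in pixel))
    norm_r = math.sqrt(sum(r * r for r in reference))
    if norm_p == 0 or norm_r == 0:
        raise ValueError("spectra must be non-zero")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_angle = max(-1.0, min(1.0, dot / (norm_p * norm_r)))
    return math.acos(cos_angle)


def extract_metadata(pixel, reference, band_a_idx, band_b_idx):
    """Bundle the two measures into a metadata record for one pixel."""
    return {
        "index": normalized_difference(pixel[band_a_idx], pixel[band_b_idx]),
        "spectral_angle": spectral_angle(pixel, reference),
    }


# Hypothetical reflectance values for a four-band pixel and a reference signature.
pixel = [0.10, 0.20, 0.30, 0.40]
reference = [0.20, 0.40, 0.60, 0.80]  # same spectral shape as the pixel, scaled by 2
record = extract_metadata(pixel, reference, band_a_idx=3, band_b_idx=2)
```

Because the reference is a scaled copy of the pixel spectrum, the spectral angle in this example is close to zero, which is the property that makes the measure insensitive to overall brightness.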

During the third step of the project, the collaborative implementation of the SEO-DWARS is accomplished (Fig. 3). This step includes all the processes necessary to create the proposed semantic web alert and retrieval system. The infrastructure needed to implement the system is provided by the industrial partners and relies on the Data Fusion Center of PKI, the big data mining and database hosting system 42 (Copernicus Sentinel 1 [operational], Copernicus Sentinel 2A and 2B [launch 2015-16], ENVISAT [archive], European Weather [operational]) of CloudSigma, and the Cartanet platform, the INSPIRE-compliant web-GIS framework developed by PKH. The Data Fusion Center (DFC) is a cloud-based data and service hub able to deliver products through automatic complex processes with minimal human intervention. The DFC relies on open data, i.e. Copernicus, INSPIRE and the databases hosted by CloudSigma, to create the data archives. Its architecture is designed to be modular, i.e. new processes can be easily integrated in the DFC to provide added value services. The contribution of CloudSigma consists of both its system and data mining know-how, i.e. the algorithmic processing to harvest big data archives, and its off-the-shelf cloud computing services. The contribution of PKH consists of the Cartanet platform and its Spatial Data Infrastructure development know-how, built on a large portfolio of web-GIS services developed and deployed for the public and private sectors for the alerting and management of geo-spatial data.

Figure 3: Structure of the SEO-DWARS

In this step, all partners need to collaborate closely and exchange knowledge and know-how to turn the designed system into an operational one. The collaborative research and innovation activities of the first and second steps of the project are extended to implement the SEO-DWARS. The systemic partners (PKI, CloudSigma, PKH) need the scientific input from the marine remote sensing partners (NTUA, UnAeg, CUT, i-Sea) and the semantic/linked data partners (UniBa, TWT) to understand how to meet the standards of the designed system. Based on this interaction, the marine remote sensing and semantic/linked data partners will understand the capabilities of the system under implementation and re-adjust or extend their requirements. These knowledge exchange activities take place on the premises of the companies that will provide the system parts, i.e. PKI, CloudSigma and PKH. The research and innovation of the above aspects of the project will be integrated under a unified platform to perform the semantic image retrieval and the storage and management of the extracted metadata, i.e. the DFC of PKI with the hosting and connectivity capabilities of CloudSigma. This platform is based on a Geographical Information System (GIS) and expanded to meet the requirements of the proposed system. The platform will enable the users to make semantic queries. For example, queries such as “Give me the remote sensing image in the X area that contains an oil spill incident” or “Calculate the rate of increasing/decreasing chlorophyll concentration in the X area for Y range of dates” can be passed to the system using a set of drop-down lists, which contain the ontology objects and conditions such as “AND”, “OR”, “EQ”, “CONTAIN”, “NEIGHBOR” etc.
The fourth step of the project, which will partially overlap with the third step, will include the calibration and validation of the SEO-DWARS. The calibration part refers to the research and innovation conducted to connect the metadata extraction algorithms (Fig. 2) with the formulated ontologies (Fig. 1) for the specific research topics of the consortium (Table 2). Seconded staff from TWT, CloudSigma and PKI need to collaborate closely with NTUA, CUT and UniBa to understand the physical meaning of the metadata extracted by the remote sensing algorithms, in order to calibrate and validate the metadata extraction against the conducted queries and alerts. The final step of the project will include the launch and verification of the pilot service. The implemented system will start in a pilot stage, where both the partner users and other potential users (see chapter 3.3.1) will use the SEO-DWARS and evaluate the added value service. The verification part refers to the process of evaluating the performance of the service in terms of system response and retrieval accuracy. The staff seconded from NTUA, UnAeg, CUT and UniBa to CloudSigma, TWT, PKI and PKH need to collaborate in order to solve system performance issues with respect to the data mining and metadata extraction processing, while enhancing the alert and retrieval accuracy of the system.
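
The drop-down query construction described above (ontology objects combined with conditions such as “AND”, “OR”, “EQ” and “CONTAIN”) could, for instance, be represented as a small condition tree evaluated against the metadata stored for each image. The sketch below is only an assumption about one possible representation: the record fields (`id`, `area`, `objects`), the image identifiers, the object labels and the helper names are all hypothetical, and the “NEIGHBOR” condition is left out because it would require spatial geometry.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]              # metadata extracted for one image
Condition = Callable[[Record], bool]


def EQ(field: str, value: Any) -> Condition:
    """True when the metadata field equals the given value."""
    return lambda rec: rec.get(field) == value


def CONTAIN(field: str, item: Any) -> Condition:
    """True when the collection-valued field contains the item."""
    return lambda rec: item in rec.get(field, ())


def AND(*conds: Condition) -> Condition:
    """True when every sub-condition holds."""
    return lambda rec: all(c(rec) for c in conds)


def OR(*conds: Condition) -> Condition:
    """True when at least one sub-condition holds."""
    return lambda rec: any(c(rec) for c in conds)


def retrieve(archive: List[Record], query: Condition) -> List[str]:
    """Return the ids of all images whose metadata satisfy the query."""
    return [rec["id"] for rec in archive if query(rec)]


# Hypothetical metadata records, as they might be stored after extraction.
archive = [
    {"id": "img-001", "area": "X", "objects": ["oil_spill", "ship"]},
    {"id": "img-002", "area": "X", "objects": ["chlorophyll"]},
    {"id": "img-003", "area": "Y", "objects": ["oil_spill"]},
]

# "Give me the remote sensing image in the X area that contains an oil spill incident"
query = AND(EQ("area", "X"), CONTAIN("objects", "oil_spill"))
matches = retrieve(archive, query)
```

Representing each condition as a composable function keeps the query independent of how the metadata are stored, so the same tree could later be translated to a database or linked data query language.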

1.1.4 Inter/multidisciplinary types of knowledge involved

The SEO-DWARF involves several types of inter/multidisciplinary knowledge in order to accomplish its objectives (Fig. 4). The main types of knowledge involved in the project are remote sensing, ontologies and linked data, marine applications based on earth observation data, data mining and data management, and geographical information systems. All types of knowledge must be combined in order to design, implement, calibrate, validate and demonstrate the SEO-DWARS. Remote sensing is used both as theory and as application to understand the processes that take place during data acquisition and data processing. Ontologies and linked data are used to describe the marine application domain, but they are strongly related to the remote sensing knowledge. The marine applications define the requirements of the ontologies and also the type of algorithms drawn from the remote sensing knowledge. Big data mining, data fusion and WebGIS encompass the above knowledge types by combining them under a single operational platform.

Figure 4: Inter/multidisciplinary types of knowledge involved in the SEO-DWARF project.

Table 3: Work Package List

| Work Package No | Work Package Title | Activity Type (e.g. Research, Training, Management, Communication, Dissemination...) | Number of person-months involved | Start Month | End Month |
|---|---|---|---|---|---|
| 1 | Project Management and Coordination | Management | 40 | M1 | M48 |
| 2 | User Requirements and System Design | Training and Research | 102 | M1 | M14 |
| 3 | Implementation of the SEO-DWARS | Research | 90 | M15 | M31 |
| 4 | Calibration and Validation of the SEO-DWARS | Research and Training | 79 | M30 | M39 |
| 5 | Pilot Service and System Verification | Research | 78 | M40 | M48 |
| 6 | Dissemination and Exploitation activities | Dissemination | 38 | M1 | M48 |

The “Number of person-months involved” corresponds to the total person-months needed per WP to implement the SEO-DWARF successfully; across all WPs they sum to 427. The seconded person-months (as declared in Table A3.1 and as described in Table 3) are 352. The remaining 75 person-months will be funded from the Institutional Unit Cost of the budget (as defined and signed in the Consortium Agreement) in order to ensure the successful implementation of the project.
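
The person-month totals quoted above can be checked against Table 3 with a few lines of arithmetic. The sketch below simply re-adds the per-WP figures and introduces no new data; the variable names are illustrative only.

```python
# Person-months per work package, as listed in Table 3.
person_months = {1: 40, 2: 102, 3: 90, 4: 79, 5: 78, 6: 38}

total = sum(person_months.values())   # total person-months for the project
seconded = 352                        # seconded person-months (Table A3.1)
remaining = total - seconded          # funded from the Institutional Unit Cost

print(total, seconded, remaining)     # prints: 427 352 75
```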

1.2 Clarity and quality of knowledge sharing among the participants in light of the research and innovation objectives

The clarity and quality of the knowledge sharing among the partners of the consortium is based on a step-wise approach of international and cross-sectoral staff secondments (Fig. 5). More specifically, for WPs 2 and 3, secondments are mainly planned in two steps. In the first step the sending organizations second staff for 1 person-month to the hosting organizations for training (Fig. 5, orange cells). These secondments aim to train the seconded staff in the research domain activities of the hosting organizations and to initiate the collaboration among them. The following month the direction of the secondments is reversed, i.e. the sending organization becomes the hosting organization. These reciprocal international and cross-sectoral secondments aim to have all participants trained in the research and innovation domains of the consortium and actively collaborating on the same premises. The second step of the secondments is the actual research and innovation to be conducted (Fig. 5, blue cells). WP 1, i.e. project management, is performed (Fig. 5, grey cells) in parallel with WPs 2, 3, 4, 5 and 6 and includes the project management and coordination activities needed to implement the SEO-DWARF successfully. WP 6 includes secondments related to the dissemination and exploitation activities (Fig. 5, green cells), i.e. hosting, participating in and/or organizing conferences, summer schools and workshops, and is performed in alignment with the other WPs to maximize the impact.
Figure 5: Types of secondments for each WP and Project Months; (grey): Management, (orange): Training, (blue): Research, (green): Dissemination.

To further ensure the clarity and quality of the knowledge sharing among the participants in the light of the research and innovation objectives of SEO-DWARF, the following table is compiled. This table divides the project into 4 periods (10 – 18 months depending on the WP) with 2 terms for each period and provides a more detailed overview of the secondments.

Table 4: Program for transfer of knowledge

| Period | Source Participant(s) | Target Participant(s) | Transfer of Knowledge | Added Value |
|---|---|---|---|---|
| | All partners | All partners | Kick-off meeting at NTUA | Start-up, coordination |
| 1st period, 1st term | NTUA, UnAeg, CUT, UniBa | PKH, PKI, CloudSigma, TWT, i-Sea | Requirements and Specifications | Get all partners to agree on the quantitative objectives of the project and start establishing gateways for transferring knowledge |
| 1st period, 2nd term | PKH, PKI, CloudSigma, TWT, i-Sea | NTUA, UnAeg, CUT, UniBa | System Definition and Design | Get the academics and the private sector to understand and form the SEO-DWARS |
| | All partners | All partners | Year 1 meeting at UniBa | Coordination, Intermediate report |
| 2nd period, 1st term | NTUA, UnAeg, CUT, UniBa | PKI, CloudSigma, PKH | RS algorithms and data mining | Get the partners together to start the implementation of RS and data mining algorithms in the DFC |
| 2nd period, 2nd term | NTUA, UnAeg, CUT, UniBa | PKI, CloudSigma, TWT | Ontologies and formulated metadata | Get the partners together to start the implementation of the ontologies and the metadata formulation in the DFC |
| | All partners | All partners | Year 2 meeting at CUT | Coordination, Intermediate report |
| 3rd period, 1st term | PKI, CloudSigma, PKH | NTUA, UnAeg, CUT, i-Sea | Marine Remote Sensing and Data Fusion | Get the RS and marine domain partners to share their knowledge for the algorithm calibration in the frame of the project and collaborate with the data fusion partners to provide added value to the calibration process |
| 3rd period, 2nd term | NTUA, UnAeg, CUT, i-Sea | PKH, PKI, CloudSigma, TWT | Marine Remote Sensing and Data mining | Get the marine domain and systemic partners to share their best approaches to validate the results of the algorithms |
| | All partners | All partners | Year 3 meeting at TWT | Coordination, Intermediate report |
| 4th period, 1st term | TWT, UniBa, CUT | CloudSigma, PKI | Data mining, data fusion, ontologies | Improve the context of semantic querying and alerting of the system |
| 4th period, 2nd term | NTUA, UnAeg, CUT, i-Sea | CloudSigma, PKI | Data mining, data fusion, marine remote sensing | Improve the retrieval accuracy of the semantic queries and alerts of the system |
| | All partners | All partners | Year 4 meeting at PKI | Final report and project wrap-up |

Combined, Fig. 5 and Table 4 describe the approach to knowledge sharing and transfer within the SEO-DWARF. The structure of the secondments (Fig. 5 and Table 4) is designed to provide an in-built return mechanism. More specifically, all eligible secondments for each partner are performed two-way and in two steps. Two-way refers to the exchange of staff both from partner “A to B” and from partner “B to A”. Two-step refers to the exchange of staff from “A to B” in two phases: one for training and one for research. After the first phase, the seconded staff return to the sending organization to integrate the new knowledge and get feedback on the training they received. In the second phase, the same seconded staff return to the hosting organization to conduct the collaborative research. At the end of each secondment the seconded staff will organize training sessions at their sending organizations to preserve the newly acquired knowledge and help the integration process in the organization.


  1. Corcho et al., 2006; Aksoy et al., 2005; Gomez-Perez and Benjamins, 1999; Gruber et al., 1995.
  2. Berners-Lee et al., 2001.
  3. Gruber, 1993.
  4. Bishr, 1998.
  5. Gomez-Perez et al., 2004.
  6. Fonseca et al., 2002; Kuhn, 2001; Esbjorn-Hargens, 2010; Kauppinen and de Espindola, 2011.
  7. Veganzones et al., 2008.
  8. Octavian Corneliu et al., 2011.
  9. Keshava et al., 2000.
  10. Keshava et al., 2000.
  11. Manolakis et al., 2003.
  12. Parente and Plaza, 2010.
  13. Keshava, 2003.
  14. Karathanassi et al., 2011.
  15. Sykas and Karathanassi, 2012.
  16. Sykas et al., 2013.
  17. Lambert and Macairne, 2000.
  18. Pal and Pal, 1993.
  19. Matas and Kittler, 1995.
  20. Matas and Kittler, 1995.
  21. Vatsavai et al., 2012.
  22. Lesage, 1997.
  23. Rasmussen and Williams, 2006.
  24. Wayant, 2012.
  25. Anguelov et al., 2005.
  26. Hall, 1992.
  27. Pohl & Van Genderen, 1998.
  28. Kudryavtsev et al., 2005, 2014; Lyzenga, 1996.
  29. Keller, Wismann, & Alpers, 1989.
  30. Brekke & Solberg, 2005.
  31. Kolokoussis et al., 2013.
  32. Bresciania et al., 2011; McClain, 2009.
  33. Blondeau-Patissier et al., 2014; Kratzer et al., 2014; Matthews et al., 2010.
  34. Miller and McKee, 2004; Nechad et al., 2010; Volpe et al., 2011.
  35. Dekker et al., 2006; Yahyal et al., 2014.
  36. Haselwimmer et al., 2013; Ingleton and McMinn, 2012; Langford, 2001.
  37. Lafon et al., 2002; Capo, 2012; Dehouck et al., 2010; Sénéchal and Lafon, 2012.
  38. ENVISAT [ASAR, MERIS
  39. Gruber, 1993.
  40. Bishr, 1998.
  41. Gruber, 1995.
  42. Copernicus Sentinel 1 [operational