Objectives and Focus Areas

SOFIA Data Center

More than 140 TB of raw data from 783 scientific flights with 5,300 observation hours—what lies behind this huge amount of data, and how can it be structured and made usable?

Project objectives and focus areas

Project objectives

The SDC project, funded by the German Aerospace Center (DLR), has a total duration of five years and was launched on July 1, 2024. The project is scheduled for completion on June 30, 2029. The work is based on data recorded during more than ten years of operation of the SOFIA observatory. In addition to the scientific data recorded by the instruments, this also includes data generated during the development of the telescope and the operation of SOFIA. Detailed information about the astronomical observation projects based on the SOFIA Data Cycle System (DCS) is also included.

The main project objectives of the SOFIA Data Center are:

  • Development of a German SOFIA data archive for the astronomical community
  • Ensuring the quality of the data archive
  • Making the data usable for astronomers without in-depth specific knowledge of the observatory or the instruments
  • Availability of infrared data, especially in the next 10-20 years, during which there will probably be no new data sources in the wavelength range observed by SOFIA
  • Establishing an operational and technical archive containing all relevant information about the telescope, its development, performance, and operation

This results in the following areas of work for the SDC:

  • Procuring and securing the data on servers at the University of Stuttgart
  • Cleaning the data of instrument-specific artifacts, erroneous or unusable data sets, etc.
  • Modification and optimization of instrument-specific data reduction pipelines and subsequent reprocessing of scientific data
  • Improvement of data quality with regard to atmospheric water vapor and telescope pointing
  • Provision of scientific data in an archive with VO functionality (VO – Virtual Observatory)
  • Provision of technical and operational data from the telescope in a structured and easily searchable database
  • Assistance for users (astronomical, technical) through support, training, workshops, and similar activities
  • Establishment of an SDC user group to provide feedback on the needs and requirements of the astronomical community to the project

This leads to the following work packages and focus areas for the project:

Data transfer and data backup

Identifying all necessary data, exporting it from the USA to Germany, and providing and backing it up on servers at the University of Stuttgart was and remains an essential prerequisite for the establishment and operation of a SOFIA Data Center. This process has now been completed, and several types of data and software can be distinguished:

SI raw data: The SOFIA raw data comprises a total of approximately 141 TB, of which the scientific instruments account for around 24.2 TB. The raw data also includes the housekeeping data from the telescope and the MCCS (Mission Control and Communication System) as well as the images from all telescope cameras.

Metadata: This refers to the DCS (Data Cycle System), which contains important information about the observation projects. This includes the observation proposals (AORs), the order of the observations, their distribution across different flights, information on technical feasibility, flight routes, etc.

Processed scientific data: This data was processed from the raw data using an earlier version of the data reduction pipeline (REDUX) or an optimized REDUX version (from 2017) and serves as a reference for comparison with further updates to the pipeline developed as part of the SDC project.

For the sake of completeness, it should be mentioned at this point that the scientific data is already available today via NASA's multi-mission archive (IRSA – InfraRed Science Archive), but is no longer being maintained or improved.

Data reduction pipeline: The instrument-specific data reduction pipeline (REDUX) has also been backed up and is now operational at the SDC. No specific optimizations have been made to the pipeline for the US instruments (EXES, FORCAST, HAWC+) to date. Improvements to the pipeline for the FIFI-LS instrument are currently (summer 2025) in full swing, as the SDC has the complete expertise for this instrument. The pipeline for the GREAT instrument is being worked on at the University of Cologne.

Absorption of infrared radiation by the Earth's atmosphere as a function of wavelength at different altitudes

Precipitable Water Vapor (PWV)

Water vapor, or more precisely “precipitable water vapor” (PWV), refers to the total amount of water vapor in a column of the Earth's atmosphere (here along the line of sight of the telescope), expressed as the depth of liquid water that would result if all of it condensed. PWV is measured in mm and is the main cause of the absorption of long-wave infrared radiation in the atmosphere. Even at SOFIA's flight altitude (stratosphere), PWV significantly influences astronomical measurements. The SDC project will use a novel method for determining water vapor based on the reanalysis of the Earth's atmosphere by the European Centre for Medium-Range Weather Forecasts (ECMWF). The reanalysis draws on a catalog that records a large number of physical parameters of the atmosphere with a geographical resolution of 25 km. These data are then used to determine the water vapor content in the telescope's line of sight as a function of time, geographical coordinates, and flight altitude.

This information is determined for all SOFIA science flights and made available as publicly accessible water vapor products. They enable significantly improved correction of astronomical observations with regard to atmospheric transmission. The procedure has already been successfully demonstrated using data from the FIFI-LS instrument.
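As a rough illustration of how a water vapor column can be derived from gridded atmospheric data, the sketch below integrates specific humidity over pressure above the flight level and scales the result to the line of sight. It is not the SDC method: the function names, the sample profile, and the simple plane-parallel scaling with elevation are all assumptions made for this example.

```python
import math

G = 9.80665          # standard gravity in m/s^2
RHO_WATER = 1000.0   # density of liquid water in kg/m^3


def pwv_mm(pressures_pa, specific_humidity, flight_pressure_pa):
    """Zenith PWV in mm above the flight level (hypothetical helper).

    Integrates q dp / g with the trapezoidal rule over all levels whose
    pressure is at or below the pressure at flight altitude.
    """
    levels = sorted(
        (p, q)
        for p, q in zip(pressures_pa, specific_humidity)
        if p <= flight_pressure_pa
    )
    column = 0.0  # water vapor column in kg/m^2
    for (p0, q0), (p1, q1) in zip(levels, levels[1:]):
        column += 0.5 * (q0 + q1) * (p1 - p0) / G
    # kg/m^2 divided by water density gives metres; convert to mm
    return column / RHO_WATER * 1000.0


def line_of_sight_pwv_mm(zenith_pwv_mm, elevation_deg):
    """Scale zenith PWV to the line of sight (plane-parallel assumption)."""
    return zenith_pwv_mm / math.sin(math.radians(elevation_deg))


# Made-up stratospheric profile: pressure levels in Pa, humidity in kg/kg
pressures = [5000.0, 10000.0, 15000.0, 20000.0]
humidity = [3e-6, 3e-6, 4e-6, 1e-5]
zenith = pwv_mm(pressures, humidity, flight_pressure_pa=20000.0)
print(f"zenith PWV: {zenith * 1000:.1f} micron")
print(f"PWV at 40 deg elevation: {line_of_sight_pwv_mm(zenith, 40.0) * 1000:.1f} micron")
```

A real implementation would interpolate the reanalysis fields in time and position along the flight track before integrating; that step is omitted here.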

Imager and Pointing

The SOFIA telescope had a total of three cameras in the visible light wavelength range, which were in constant use during all scientific flights and captured around 25,000 images per camera during a flight. These were all stored, but were not previously available to researchers. The cameras are:

  1. Wide Field Imager (WFI) with a focal length of 136 mm (field of view 360 x 360 arcmin)
  2. Fine Field Imager (FFI) with a focal length of 733 mm (field of view 67 x 67 arcmin)
  3. Focal Plane Imager plus (FPI+) in the focal plane of the telescope; uses the telescope optics with a focal length of 5,240 mm (field of view 9.4 x 9.4 arcmin)

The two CCD cameras, WFI and FFI, were mounted on the front ring of the telescope and were mainly used by the telescope operator for orientation in the sky and to help identify star patterns. The FPI+ was the telescope's tracking camera and was also used as a scientific instrument for observing stellar occultations.
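The focal lengths and fields of view listed for the three cameras are linked by the usual relation FOV = 2·atan(d / 2f), where d is the side length of the detector. The short sketch below inverts this relation to estimate the detector size implied by each pair of values. The detector dimensions are not stated in the text, so the output is an inference, and the function names are hypothetical.

```python
import math

# (focal length in mm, field of view in arcmin) as quoted for each camera
CAMERAS = {
    "WFI": (136.0, 360.0),
    "FFI": (733.0, 67.0),
    "FPI+": (5240.0, 9.4),
}


def detector_side_mm(focal_length_mm, fov_arcmin):
    """Detector side length implied by a focal length and a full field of view."""
    half_angle = math.radians(fov_arcmin / 60.0) / 2.0
    return 2.0 * focal_length_mm * math.tan(half_angle)


def fov_arcmin(focal_length_mm, detector_mm):
    """Full field of view in arcmin for a given focal length and detector side."""
    return math.degrees(2.0 * math.atan(detector_mm / (2.0 * focal_length_mm))) * 60.0


for name, (focal_length, fov) in CAMERAS.items():
    side = detector_side_mm(focal_length, fov)
    print(f"{name}: implied detector side of about {side:.1f} mm")
```

All three cameras come out at roughly 14.3 mm, which would be consistent with similarly sized detectors, although the text does not confirm this.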

In order to make the images recorded by the three telescope cameras usable, they must first be converted from ARK format (Level 0 images) to the FITS format (Flexible Image Transport System), which is widely used in astronomy. In the process, the FITS files are enriched in their header with a variety of relevant data extracted from various housekeeping files and stored under corresponding FITS keywords in the header. This also includes the WCS (World Coordinate System) data, which provides information about the sky coordinates of the image (Level 1 Images).
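To give an idea of what the WCS part of such a Level 1 header might contain, the following sketch builds a minimal set of standard FITS WCS keywords from a commanded pointing and a camera field of view. It is only an illustration: the function name and the pixel count are hypothetical, field rotation is ignored, and the actual SDC converter may populate the header differently.

```python
def commanded_wcs_keywords(ra_deg, dec_deg, fov_arcmin, n_pixels):
    """Minimal tangent-plane WCS keywords for a square, unrotated image.

    ra_deg, dec_deg: commanded pointing of the image centre in degrees
    fov_arcmin:      full field of view along one side
    n_pixels:        number of pixels along one side (assumed value)
    """
    scale_deg = fov_arcmin / 60.0 / n_pixels   # degrees per pixel
    centre = (n_pixels + 1) / 2.0              # FITS pixel indices start at 1
    return {
        "CTYPE1": "RA---TAN",
        "CTYPE2": "DEC--TAN",
        "CUNIT1": "deg",
        "CUNIT2": "deg",
        "CRVAL1": ra_deg,
        "CRVAL2": dec_deg,
        "CRPIX1": centre,
        "CRPIX2": centre,
        "CDELT1": -scale_deg,   # right ascension increases towards the left
        "CDELT2": scale_deg,
    }


# Example with the 67 arcmin field quoted for the FFI and an assumed 1024 x 1024 image
keywords = commanded_wcs_keywords(ra_deg=83.82, dec_deg=-5.39, fov_arcmin=67.0, n_pixels=1024)
for key, value in keywords.items():
    print(f"{key:8s}= {value}")
```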

The celestial coordinates stored at this stage are the values commanded to the telescope, which are derived from the position of the observed object. The actual celestial coordinates, i.e., the true telescope pointing, can be determined more accurately using plate-solving methods. Programs such as Astrometry.net or SCAMP determine the exact position and orientation of an image by matching it against a star catalog. The results are then written to the WCS keywords in the FITS header and saved for later use (Level 2 Images). This can lead to a further improvement in the subsequent data reduction.
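Once a plate solution is available, the offset between the commanded and the solved image centre can be expressed as an angular separation on the sky. The sketch below uses the Vincenty formula, which remains numerically stable for very small angles. The function name and the sample coordinates are made up for illustration.

```python
import math


def angular_separation_arcsec(ra1_deg, dec1_deg, ra2_deg, dec2_deg):
    """Great-circle separation between two sky positions in arcseconds."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1_deg, dec1_deg, ra2_deg, dec2_deg))
    d_ra = ra2 - ra1
    # Vincenty formula: atan2 of the cross and dot terms of the two directions
    numerator = math.hypot(
        math.cos(dec2) * math.sin(d_ra),
        math.cos(dec1) * math.sin(dec2)
        - math.sin(dec1) * math.cos(dec2) * math.cos(d_ra),
    )
    denominator = (
        math.sin(dec1) * math.sin(dec2)
        + math.cos(dec1) * math.cos(dec2) * math.cos(d_ra)
    )
    return math.degrees(math.atan2(numerator, denominator)) * 3600.0


# Hypothetical commanded position versus plate-solved position
commanded = (83.8200, -5.3900)
solved = (83.8212, -5.3894)
offset = angular_separation_arcsec(*commanded, *solved)
print(f"pointing offset: {offset:.2f} arcsec")
```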

3D CAD model of the SOFIA telescope

Operations and Engineering Archive

A large amount of technical documentation was created during the development and operation of the SOFIA telescope. The Acceptance Data Package (ADP) from the developer consortium alone comprises around 150 A4 folders, while the documentation from NASA's Critical Design Review (CDR) comprises a further 50 A4 folders. This documentation was digitized during operation and kept up to date when modifications, repairs, or improvements were introduced. The documents were subject to version control and were managed in the Windchill document management tool. In addition, there are technical and operational documents from other areas that are also available in electronic form. The most important document categories are:

  • Technical telescope documentation (Windchill)
  • Documentation of maintenance work (NAMIS) including the work order system
  • Documentation of telescope electronics (EPLAN)
  • Issue tracking and configuration management (JIRA)
  • Documentation of development projects and other documents
  • CAD data from various tools such as SolidWorks, Zemax, etc.

To ensure the long-term availability of all this data, it is transferred to the University of Stuttgart's data repository (DaRUS). This is based on the open source software Dataverse and guarantees that the data is stored securely for the long term. In addition, access to the data (internal and external with access control) is ensured. The documents can be clearly structured and organized, and the database can be searched using search criteria so that individual documents or groups of documents can be found quickly.
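Because DaRUS is based on Dataverse, searches of this kind could in principle also be scripted through the Dataverse Search API. The sketch below only builds such a search URL and does not contact any server. The base URL is assumed to be the public DaRUS address, and the collection alias is a hypothetical placeholder, since the text does not name the collection.

```python
from urllib.parse import urlencode

DARUS_BASE = "https://darus.uni-stuttgart.de"  # assumed public DaRUS address


def build_search_url(query, collection_alias=None, per_page=10):
    """Build a Dataverse Search API URL for datasets matching a query."""
    params = {"q": query, "type": "dataset", "per_page": per_page}
    if collection_alias:
        # Restrict the search to one collection and its sub-collections
        params["subtree"] = collection_alias
    return f"{DARUS_BASE}/api/search?{urlencode(params)}"


# "sofia-engineering" is a made-up alias used only for this example
print(build_search_url("secondary mirror", collection_alias="sofia-engineering"))
```

Documents under access control would additionally require an API token, which is left out of this sketch.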

FORCAST mounted to SOFIA's instrument flange

Scientific instruments and Redux pipeline

During its operational phase, the SOFIA observatory had a large number of different scientific instruments (SIs), each with a dedicated functionality in the fields of spectroscopy, photometry, or polarimetry. Some of the instruments could be operated in different operating modes or wavelength ranges. The instruments and their key characteristics are described in a separate section of the SDC website.

The SI data (raw data) recorded during the observation campaign are not directly suitable for the analysis and visualization of astronomical phenomena, let alone for deriving results. Rather, extensive and complex processing of the raw data (data reduction) is necessary to remove all instrumental and atmospheric disturbances and to calibrate the data in order to obtain physically meaningful and scientifically usable images or spectra of the observed astronomical object. Typical disturbances that must be corrected by data reduction include, for example:

  • Intrinsic emission from the atmosphere and the telescope
  • Absorption by the Earth's atmosphere
  • Detector effects

For the SOFIA data, data reduction for all instruments (except GREAT, see below) took place under a uniform framework called “Redux” or “Redux Pipeline.” The aim of the SDC project is to modify the Redux pipeline in line with the latest findings and using improved algorithms, thereby eliminating instrumental peculiarities, artifacts, and calibration deficits as far as possible. In addition, the pointing and water vapor data recalculated in the SDC project will also be included in the data reduction. Faulty or incomplete data sets will also be identified and flagged if they cannot be repaired. This will allow them to be excluded from analysis and evaluation.

The data from all nine SOFIA observation cycles (OC #1 to OC #9) with all instruments will be reprocessed in this way. This will enable researchers from different astrophysical disciplines to reliably access, use, and evaluate SOFIA data even without in-depth specific knowledge of the instruments and their data reduction. This will also lead to further dissemination of the data obtained with SOFIA.

With regard to the prioritization of instruments, the German Space Agency, in cooperation with the GSSWG, has decided on the following order, which will be applied in particular in the event of capacity bottlenecks during processing: Priority will be given to instruments financed by Germany (GREAT, FIFI-LS, FPI+) ahead of US instruments in the order HAWC+, FORCAST, and EXES. Data from the HIPO and FLITECAM instruments, which were only in use for a very short time at the beginning of SOFIA's operation, will not be reprocessed. Adjustments to this prioritization can be discussed within the SDC User Group and proposed to the space agency.
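The prioritization described above can be pictured as a simple ordering rule for a reprocessing queue. The following sketch is purely illustrative: the job representation as (instrument, observation cycle) tuples is hypothetical, and the three German-funded instruments are given equal rank because the text does not specify an order among them.

```python
# Lower rank means higher priority; German-funded instruments share rank 0
PRIORITY_RANK = {
    "GREAT": 0,
    "FIFI-LS": 0,
    "FPI+": 0,
    "HAWC+": 1,
    "FORCAST": 2,
    "EXES": 3,
}
# Instruments whose data will not be reprocessed
EXCLUDED = {"HIPO", "FLITECAM"}


def order_reprocessing_queue(jobs):
    """Drop excluded instruments and sort the remaining jobs by priority.

    jobs: list of (instrument, observation_cycle) tuples (hypothetical format).
    The sort is stable, so jobs of equal rank keep their original order.
    """
    eligible = [job for job in jobs if job[0] not in EXCLUDED]
    return sorted(eligible, key=lambda job: PRIORITY_RANK[job[0]])


queue = [("EXES", 1), ("HIPO", 1), ("FIFI-LS", 3), ("HAWC+", 2), ("GREAT", 1)]
for instrument, cycle in order_reprocessing_queue(queue):
    print(f"{instrument} (OC #{cycle})")
```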

The GREAT instrument (https://upgreat.uni-koeln.de) occupies a special position. As was already the case during the active phase of the SOFIA project, its data is reprocessed by the University of Cologne. However, there are plans to transfer the GREAT data to the SDC database, thus making the data from all instruments available under a uniform interface.

The SDC's work is therefore currently (11/2025) focused on the topics of water vapor, pointing, and FIFI-LS.

The further development of the Redux pipeline is carried out on the University of Stuttgart's GitHub Enterprise Server, and the code is made available to users. The server is publicly accessible at:

https://github.com/SOFIA-Data-Center/sofia_redux

and was derived from:

https://github.com/SOFIA-USRA/sofia_redux

The reprocessing workflow is shown schematically in the following diagram.

Schematic representation of the planned reprocessing workflow for SOFIA data

SOFIA Data Archive and Virtual Observatory (VO)

The SOFIA Data Archive is designed to efficiently manage and provide access to all astronomical and technical data from the observatory, enabling further scientific use. In addition, a wide range of supporting information is also provided to facilitate and support work with the scientific data.

For example, the contents of the Data Cycle System (DCS) will be made available, including flight plans, publications, information on observation programs, and other metadata. The housekeeping data of the observatory and the telescope will also be accessible in order to obtain additional information regarding the quality of the data, if necessary. The links between data sets and already published publications will also be highlighted.

In addition to the raw data from the instruments, the file system containing the scientific data will also include data from various processing stages, up to and including composite mosaics. The reprocessing itself will take place in a largely automated environment in the background and without time-consuming interaction. This allows reprocessing to be carried out iteratively and in clear steps and to be repeated if required.

The contents of the operations and engineering archive have already been discussed above. The results of the imager pipeline, i.e., FITS files of levels 0, 1, and 2, will also be made available in the archive.

Access to the data archive is to be made compatible with the standards of the Virtual Observatory (VO). On the one hand, this has the advantage that the programming effort required to create the user interfaces can be reduced, as existing software packages can be used. On the other hand, users will find a familiar user interface, as IRSA uses the same tools. This also increases the findability of SOFIA data outside the immediate FIR community.

A possible solution has already been discussed with the Astronomical Computing Institute at the University of Heidelberg (ARI, Markus Demleitner), which hosts the German Astrophysical Virtual Observatory (GAVO). The “DaCHS” software library available there can be used to set up a VO-compatible data archive (“VO Archive SDC”). The VO client “Firefly,” developed at Caltech/IPAC and already in use at IRSA for several large projects, is planned as the user interface.
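A VO-compatible archive is typically queried through standard protocols such as TAP with ADQL. The sketch below shows what a synchronous TAP request against an ObsCore table might look like once such a service exists. The service URL is a placeholder, since no SDC endpoint has been published in the text, and the function only builds the request URL without sending it.

```python
from urllib.parse import urlencode

TAP_BASE = "https://sdc.example.org/tap"  # placeholder, not a real SDC endpoint


def build_tap_sync_url(instrument, ra_deg, dec_deg, radius_deg, limit=10):
    """Build a synchronous TAP request for ObsCore records near a sky position."""
    # Values are formatted directly into the query; real code should validate them
    adql = (
        f"SELECT TOP {int(limit)} obs_id, instrument_name, access_url "
        "FROM ivoa.obscore "
        f"WHERE instrument_name = '{instrument}' "
        "AND 1 = CONTAINS(POINT('ICRS', s_ra, s_dec), "
        f"CIRCLE('ICRS', {float(ra_deg)}, {float(dec_deg)}, {float(radius_deg)}))"
    )
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}
    return f"{TAP_BASE}/sync?{urlencode(params)}"


# Example: FIFI-LS observations within 0.5 degrees of a position in Orion
print(build_tap_sync_url("FIFI-LS", 83.82, -5.39, 0.5))
```

Clients such as Firefly would issue comparable requests on the user's behalf, so end users would not normally need to write them by hand.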

The data repository of the University of Stuttgart (DaRUS) is currently targeted as the storage location for the SOFIA data archive. After completion of the SDC project in 2029, a transfer of the data to the German Center for Astrophysics in Görlitz is under discussion, but permanent data storage in Stuttgart is also possible.

Based on the scientific and all accompanying data and documents, a comprehensive picture of the observatory's scientific observations, its technology, its operations, and its workflows can be reconstructed and studied in detail. This goes far beyond the contents of the currently available IRSA archive. The diagram shows a schematic representation of the SDC VO archive.

Schematic representation of the SDC VO archive

Contact

Bernhard Schulz

Dr. rer. nat.

Project scientist SOFIA Data Center

Benjamin Greiner

Dr.-Ing.

Research Associate SOFIA Data Center

Michael Hütwohl

Dipl.-Ing.

Project Manager SOFIA Data Center
