Merge pull request #776 from dirac-institute/doc_fixes
Bring documentation up to date
jeremykubica authored Jan 17, 2025
2 parents 06e3fb2 + 06d509c commit e4e7ac2
Showing 10 changed files with 149 additions and 86 deletions.
Binary file added docs/source/_static/workflow.png
23 changes: 16 additions & 7 deletions docs/source/user_manual/index.rst
@@ -5,6 +5,7 @@ User Manual
.. toctree::
:maxdepth: 1

overview
input_files
search_space
masking
@@ -14,6 +15,12 @@ User Manual
custom_filtering


Overview
--------

For an introduction to KBMOD and its components, see the :ref:`KBMOD Overview` page.


GPU Requirements
----------------

@@ -91,23 +98,25 @@ Running KBMOD
To run a search, KBMOD must be provided with

* appropriately pre-processed input data (see :ref:`Input Files`)
* appropriate search and filter parameters (see :ref:`Masking`)
* appropriate search and filter parameters (see :ref:`Search Parameters`)

For an introduction to the KBMOD search algorithm, see the :ref:`Search Algorithm and Search Space` page.

The search is initiated via the :py:class:`~kbmod.run_search.run_search` class and consists of several phases:

* Data is loaded from the input files as specified above (see :ref:`Input Files` for more details).
* Masks are applied to the images to remove invalid pixels (see :ref:`Masking` for more details).
* The shift and stack approach is used to search for potential trajectories originating from each pixel in the first image.
* The list of potential trajectories is filtered using various metrics.
* Remaining trajectories are clustered to remove duplicates. Only one trajectory per cluster is kept.
* The found trajectories are compared against known objects and matches are indicated.
* The found trajectories are output to result files for later analysis.
* The list of potential trajectories is filtered using various metrics (see :ref:`Results Filtering`).
* Remaining trajectories are clustered to remove duplicates. Only one trajectory per cluster is kept (see :ref:`Results Filtering`).
* The found trajectories are output to result files for later analysis (see :ref:`Output Files`).
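
The shift and stack phase listed above can be illustrated with a small, self-contained sketch. This is not the KBMOD implementation: the helper ``stack_along_trajectory``, the nearest-pixel rounding, and the plain sum of pixel values are all simplifying assumptions made for illustration. It only shows the core idea of following a linear trajectory from a starting pixel and accumulating the values found along it.

```python
def stack_along_trajectory(images, times, x0, y0, vx, vy):
    """Sum the pixel values along a linear trajectory.

    images: list of 2D lists (rows of pixel values), one per time step.
    times: times of each image relative to the first image.
    (x0, y0): starting pixel in the first image.
    (vx, vy): velocity in pixels per unit time.

    Hypothetical, simplified helper; not part of the KBMOD API.
    """
    total = 0.0
    count = 0
    for image, t in zip(images, times):
        # Predicted position at time t, rounded to the nearest pixel.
        x = int(round(x0 + vx * t))
        y = int(round(y0 + vy * t))
        # Skip time steps where the trajectory leaves the image.
        if 0 <= y < len(image) and 0 <= x < len(image[0]):
            total += image[y][x]
            count += 1
    return total, count


def blank(width, height):
    """Create an all-zero image as a list of rows."""
    return [[0.0] * width for _ in range(height)]


# Three 5x5 images with a source moving one pixel per time step in x.
images = [blank(5, 5) for _ in range(3)]
for step, image in enumerate(images):
    image[2][1 + step] = 10.0

times = [0.0, 1.0, 2.0]
matching = stack_along_trajectory(images, times, 1, 2, 1.0, 0.0)
stationary = stack_along_trajectory(images, times, 1, 2, 0.0, 0.0)
print(matching, stationary)
```

A trajectory that matches the source's motion accumulates signal from every image, while a mismatched one picks up the source only once. This difference is what lets the search separate real candidates from noise.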


Data Model
----------

KBMOD uses a hierarchy of three nested data structures to store the image data over which it searches.
The :py:class:`~kbmod.work_unit.WorkUnit` is the basic data unit for KBMOD. It includes all of the information needed for KBMOD to run, including images, time stamps, WCS, and metadata.

Within the :py:class:`~kbmod.work_unit.WorkUnit`, KBMOD uses a hierarchy of three nested data structures to store the image data over which it searches.

.. image:: ../_static/datamodel.png
:alt: schematic
63 changes: 14 additions & 49 deletions docs/source/user_manual/input_files.rst
@@ -1,65 +1,30 @@
Input Files
===========

KBMOD expects Vera C. Rubin Science Pipelines calexp-style FITS files. These are multi-extension fits files that contain:

* photometrically and astrometrically calibrated single-CCD image, usually referred to as the "science image",
* variance image, representing per-pixel noise levels, and a
* pixel bitmask

stored in the 1st, 2nd and 3rd header extension/plane respectively. The zeroth header extension is expected to contain the image metadata. A collection of science images that overlap the same area on the sky at different times is expected to be grouped into a directory, usually referred to as a "pointing group". The path to this directory is a required input to KBMOD, see :ref:`Search Parameters`.

The images are expected to be warped, i.e. geometrically transformed to a set of images with a consistent and uniform relationship between sky coordinates and image pixels on a shared pixel grid.

Visit ID
--------

In order to associate input files with auxiliary data, such as time stamps or PSFs, each visit uses a unique numeric ID. This ID string can be provided in the ``IDNUM`` field of the FITS file’s header 0. If no ``IDNUM`` field is provided, then KBMOD will attempt to derive the visit ID from the file name as described in the next section.

Naming Scheme
-------------
KBMOD expects Vera C. Rubin Science Pipelines calexp-style data. These can be provided as a set of multi-extension FITS files, references to the data's location on a Butler instance, or a saved :py:class:`~kbmod.work_unit.WorkUnit`.

Each file **must** include ``.fits`` somewhere in the file name. Additionally the file names can be used to encode the visit ID. If no ``IDNUM`` field is provided, KBMOD will look for a contiguous sequence of five or more numeric digits in the file name. If found, the first such sequence is used as the visit ID. For example a file name “my12345.fits” will map to the visit ID “12345”.
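
The file name rule described above can be sketched with a regular expression. The helper name ``visit_id_from_filename`` is hypothetical and is not part of the KBMOD API. The sketch assumes that a "contiguous sequence" means the longest run of digits starting at the first qualifying position.

```python
import re

# Five or more contiguous digits, matching the rule described in the text.
_VISIT_ID_PATTERN = re.compile(r"\d{5,}")


def visit_id_from_filename(filename):
    """Return the first run of five or more digits in ``filename``, or None.

    Hypothetical helper written for illustration only.
    """
    match = _VISIT_ID_PATTERN.search(filename)
    return match.group(0) if match else None


print(visit_id_from_filename("my12345.fits"))
print(visit_id_from_filename("img_42.fits"))
```

Shorter digit runs, such as a four digit CCD number, are skipped because the pattern requires at least five digits.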
Butler
------

Time file
---------
TODO

There are two cases where you would want to use an external time file:

* when the FITS files do not contain timestamp information
If no file is included, KBMOD will attempt to extract the timestamp from the FITS file header (in the MJD field).
* when you want to prefilter the files based on the parameter ``mjd_lims`` (see :ref:`Search Parameters`) before loading the file.
This reduces loading time when accessing a large directory.

The time file provides a mapping of visit ID to timestamp. The time file is an ASCII text file containing two space-separated columns of data: the visit IDs and MJD time of the observation. The first line is a header denoted by ``#``::

# visit_id mean_julian_date
439116 57162.42540605324
439120 57162.42863899306
439124 57162.43279313658
439128 57162.436995358796
439707 57163.41836675926
439711 57163.421717488425
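
As a rough illustration of this format, the sketch below parses time file contents into a mapping from visit ID to MJD and then applies a time window. The helpers ``parse_time_file`` and ``filter_by_mjd`` are hypothetical and not part of the KBMOD API, and treating the bounds as inclusive is an assumption.

```python
def parse_time_file(text):
    """Parse time-file contents into a dict mapping visit ID to MJD.

    Hypothetical helper for illustration only.
    """
    times = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the header line, which starts with '#'.
        if not line or line.startswith("#"):
            continue
        visit_id, mjd = line.split()
        times[visit_id] = float(mjd)
    return times


def filter_by_mjd(times, mjd_min, mjd_max):
    """Keep only visits whose timestamp lies within [mjd_min, mjd_max].

    Inclusive bounds are an assumption made for this sketch.
    """
    return {vid: t for vid, t in times.items() if mjd_min <= t <= mjd_max}


example = """# visit_id mean_julian_date
439116 57162.42540605324
439120 57162.42863899306
439707 57163.41836675926
"""

times = parse_time_file(example)
first_night = filter_by_mjd(times, 57162.0, 57163.0)
print(sorted(first_night))
```

Filtering on the parsed timestamps before opening any FITS file is what makes the external time file useful for large directories.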



PSF File
WorkUnit
--------

The PSF file is an ASCII text file containing two space-separated columns of data: the visit IDs and variance of the PSF for the corresponding observation. The first line is a header denoted by ``#``::
The :py:class:`~kbmod.work_unit.WorkUnit` objects provide functions for writing to and loading from files. In addition to image data, the :py:class:`~kbmod.work_unit.WorkUnit` includes configuration data for the run and all necessary metadata (e.g. the WCS). To load a :py:class:`~kbmod.work_unit.WorkUnit` from a file, use `WorkUnit.from_fits(input_filename)`.

# visit_id psf_val
439116 1.1
439120 1.05
439124 1.4

A PSF file is needed whenever you do not want to use the same default value for every image.
FITS Files
----------

If loading data from raw FITS files, these must be Vera C. Rubin Science Pipelines calexp-style FITS files that contain:

Data Loading
------------
* photometrically and astrometrically calibrated single-CCD image, usually referred to as the "science image",
* variance image, representing per-pixel noise levels, and a
* pixel bitmask

Data is loaded using :py:meth:`~kbmod.analysis_utils.Interface.load_images`. The method creates an :py:class:`~kbmod.search.ImageStack` object, which is a collection of :py:class:`~kbmod.search.LayeredImage` objects. Each :py:class:`~kbmod.search.LayeredImage` contains the PSF, mask and the science image while :py:class:`~kbmod.search.ImageStack` tracks the properties that apply to all images in the collection, such as global masks. The :py:class:`~kbmod.search.ImageStack` will include only those images with observation timestamps within the given MJD bounds.
stored in 1st, 2nd and 3rd header extension/plane respectively. The zeroth header extension is expected to contain the image metadata. A single FITS file can be loaded with the :py:meth:`kbmod.util_functions.load_deccam_layered_image` function, which takes the file name and a :py:class:`~kbmod.search.psf.PSF` object and produces a :py:class:`~kbmod.search.layered_image.LayeredImage`.

The :py:meth:`~kbmod.analysis_utils.Interface.load_images` method also returns helper information:
* ``img_info`` - An object containing auxiliary data from the fits files such as their WCS and the location of the observatory.
To build an :py:class:`~kbmod.image_collection.ImageCollection` from multiple FITS files, use the class's :py:meth:`~kbmod.image_collection.ImageCollection.fromDir` function. The images within a single run are expected to be warped, i.e. geometrically transformed to a set of images with a consistent and uniform relationship between sky coordinates and image pixels on a shared pixel grid.
6 changes: 0 additions & 6 deletions docs/source/user_manual/masking.rst

This file was deleted.

35 changes: 31 additions & 4 deletions docs/source/user_manual/output_files.rst
@@ -3,10 +3,37 @@ Output Files

KBMOD outputs a range of information about the discovered trajectories.


Results Table
-------------

If the ``result_filename`` is provided, KBMOD will serialize most of the :py:class:`~kbmod.Results` object into a single file. This filename should be the full or relative path and include the ``.ecsv`` suffix.
KBMOD stores all of the result information in a :py:class:`~kbmod.Results` object, which provides a wrapper around an AstroPy Table. Most users can treat the produced :py:class:`~kbmod.Results` object as a table and access columns directly. However, internally the object provides a range of helper functions to create derived columns described below.

At a minimum, the results table includes the basic trajectory information, including:

* the positions and velocities in pixel space (`x`, `y`, `vx`, and `vy`)
* basic statistics (`likelihood`, `flux`, and `obs_count`).

By default it includes additional derived information such as the series of psi and phi values from the shift and stack algorithm (`psi_curve` and `phi_curve`), a vector of which time steps were marked valid by sigma-G (`obs_valid`), coadded stamps, and the corresponding (RA, dec) in both the search images and (if applicable) the original, un-reprojected images.

The coadded stamp information is controlled by the ``coadds`` and ``stamp_radius`` configuration parameters. The ``coadds`` parameter takes a list of which coadds to include in the results table, including:

* ``mean`` - The mean pixel value.
* ``median`` - The median pixel value.
* ``sum`` - The sum of pixel values over all times (with no data mapping to 0.0).
* ``weighted`` - The weighted average of pixel values using 1.0 / variance as the weighting function.

Each coadd is stored in its own column with the name `coadd_<type>`. For more information on the stamps, see :ref:`Results Filtering`.
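
To make the four coadd types concrete, the sketch below combines the values of a single pixel across time steps. It is an illustration under stated assumptions, not KBMOD's stamp code: the helper ``coadd_pixel`` is hypothetical, missing data is represented by NaN, a time step counts as valid only when both its value and variance are usable, and a pixel with no valid data returns 0.0 for every type.

```python
import math
import statistics


def coadd_pixel(values, variances, kind):
    """Combine one pixel's values across time steps into a single value.

    Hypothetical helper for illustration; missing data is marked with NaN.
    """
    # Keep only the time steps with usable data for this pixel.
    valid = [
        (v, var)
        for v, var in zip(values, variances)
        if not math.isnan(v) and not math.isnan(var) and var > 0.0
    ]
    if not valid:
        return 0.0  # assumption: no valid data maps to 0.0 for every type
    vals = [v for v, _ in valid]
    if kind == "mean":
        return sum(vals) / len(vals)
    if kind == "median":
        return statistics.median(vals)
    if kind == "sum":
        # Missing time steps contribute nothing, i.e. they map to 0.0.
        return sum(vals)
    if kind == "weighted":
        # Inverse-variance weighting: noisier time steps count for less.
        weights = [1.0 / var for _, var in valid]
        return sum(w * v for w, v in zip(weights, vals)) / sum(weights)
    raise ValueError(f"unknown coadd type: {kind}")


values = [1.0, 3.0, math.nan, 8.0]
variances = [1.0, 2.0, 1.0, 4.0]
for kind in ("mean", "median", "sum", "weighted"):
    print(f"coadd_{kind}", coadd_pixel(values, variances, kind))
```

The weighted coadd is pulled toward the low-variance measurements, which is why it can differ noticeably from the plain mean when the noise level varies between images.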

The mapped RA, dec information consists of up to four columns. The columns `global_ra` and `global_dec` provide the (RA, dec) in the common WCS frame. If the images have been reprojected, this will be the WCS to which they were reprojected. If there is no global WCS given, these columns will not be present.

The columns `img_ra` and `img_dec` indicate the positions in the original images. These could be the same or different from the global (RA, dec) even for reprojected images. If the reprojection consists of aligning the images, such as correcting for rotation, the coordinates will be the same. In that case, the RA and dec are not actually changing, just the mapping from RA, dec to pixels. However if the reprojection includes a shift of the viewing location, such as with the barycentric reprojection, we would expect the RA and dec to also change.


Results File
------------

If the ``result_filename`` is provided, KBMOD will serialize most of the :py:class:`~kbmod.Results` object into a single file. This filename should be the full or relative path and include the ``.ecsv`` suffix.

This results file can be read as::

@@ -17,7 +44,7 @@ By default the "all_stamps" column is dropped to save space. This can disabled (
See the notebooks (especially the KBMOD analysis notebook) for examples of how to work with these results.


Legacy Text File
----------------
ML Filtering
------------

If the ``legacy_result_filename`` is provided, KBMOD will output the minimal result information (Trajectory details) in a text file format that can be read by numpy. The main results file includes the found trajectories, their likelihoods, and fluxes.
The results file can be further filtered using a neural network model trained on image stamp data via the `KBMOD ML <https://github.com/dirac-institute/kbmod-ml>`_ package. See the documentation in that repository for more information.
46 changes: 46 additions & 0 deletions docs/source/user_manual/overview.rst
@@ -0,0 +1,46 @@
KBMOD Overview
==============

KBMOD is a shift and stack algorithm for detecting faint objects from a sequence of images. However the core shift and stack algorithm is only part of the broader system. KBMOD provides support for ingesting data, creating intermediate data formats, reprojecting images, running the search, filtering the candidates, and analyzing the results. The overall KBMOD workflow is shown in the figure below.

.. image:: ../_static/workflow.png
:width: 1000
:alt: A flow diagram of the KBMOD system.


Ingesting Data
--------------

KBMOD provides multiple mechanisms for ingesting data. The preferred path is through a `Rubin Butler <https://pipelines.lsst.io/getting-started/data-setup.html>`_. However support is also available for ingesting data from raw FITS files. See :ref:`Input Files` for more details.


ImageCollection and WorkUnit
----------------------------

The :py:class:`~kbmod.work_unit.WorkUnit` is the basic data unit for KBMOD. It includes all of the information needed for KBMOD to run, including images, time stamps, WCS, and metadata. WorkUnits can be created from the Butler via an :py:class:`~kbmod.image_collection.ImageCollection` (which stores information about the data locations within the Butler) or from a set of raw FITS files.


Reprojection
------------

In order to shift and stack, KBMOD requires that all of the images align in pixel space. This is not always the case with arbitrary data. For example Rubin will produce images at arbitrary rotations. To address this, KBMOD provides the ability to reproject images to a common WCS.

In addition, we can account for the Earth's motion by reprojecting the images to a barycentric view. To do this, we need a guess of the object's distance from the Sun (e.g. 40 AU). KBMOD can then reproject the image to appear as though it was taken from the barycenter. This improves linearization of the trajectory by removing the component of motion due to the Earth's motion.


Core Search
-----------

KBMOD uses a shift and stack approach for search. For details on the search algorithm and parametrization, see :ref:`Search Algorithm and Search Space`.


Filtering
---------

The shift and stack algorithm can generate a vast number of potential candidates. KBMOD pre-filters these in multiple stages, including using a sigma-G filter to remove outliers, filters on the count of observations and the likelihood of the trajectory, filtering on the properties of coadded stamps, and clustering of similar trajectories. For details on the filtering stages see :ref:`Results analysis`.
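
As a rough sketch of the sigma-G idea mentioned above, the code below marks outliers in a series of per-time-step values. Sigma-G is a robust width estimate derived from the interquartile range, using the standard coefficient 0.7413 for the 25th and 75th percentiles. The helper names, the clipping threshold ``n_sigma``, and the use of the median as the center are assumptions made for illustration and may not match KBMOD's configuration.

```python
import statistics


def percentile(sorted_vals, pct):
    """Linearly interpolated percentile of an already sorted list."""
    pos = (len(sorted_vals) - 1) * pct / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_vals) - 1)
    frac = pos - lower
    return sorted_vals[lower] * (1.0 - frac) + sorted_vals[upper] * frac


def sigma_g_valid(values, n_sigma=2.0):
    """Return booleans marking which values pass a sigma-G clip.

    Hypothetical helper; n_sigma=2.0 is an illustrative default.
    """
    ordered = sorted(values)
    q25 = percentile(ordered, 25.0)
    q75 = percentile(ordered, 75.0)
    # Robust estimate of the standard deviation from the interquartile range.
    sigma_g = 0.7413 * (q75 - q25)
    center = statistics.median(ordered)
    return [abs(v - center) <= n_sigma * sigma_g for v in values]


# Per-time-step values for one trajectory, with one obvious outlier.
lh = [1.0, 1.1, 0.9, 1.05, 0.95, 10.0]
obs_valid = sigma_g_valid(lh)
print(obs_valid)
```

Because the width comes from percentiles rather than the mean and standard deviation, a single bright artifact does not inflate the threshold enough to hide itself.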


Analysis
--------

Candidates from the results file can be loaded and visualized with tools in the analysis directory. See the notebooks for multiple tutorials and guides.