diff --git a/docs/source/_static/workflow.png b/docs/source/_static/workflow.png
new file mode 100644
index 000000000..0f295fc61
Binary files /dev/null and b/docs/source/_static/workflow.png differ
diff --git a/docs/source/user_manual/index.rst b/docs/source/user_manual/index.rst
index ecb87f1cb..63edf8b49 100644
--- a/docs/source/user_manual/index.rst
+++ b/docs/source/user_manual/index.rst
@@ -5,6 +5,7 @@ User Manual
 .. toctree::
    :maxdepth: 1

+   overview
    input_files
    search_space
    masking
@@ -14,6 +15,12 @@ User Manual
    custom_filtering


+Overview
+--------
+
+For an introduction to KBMOD and its components, see the :ref:`KBMOD Overview` page.
+
+
 GPU Requirements
 ----------------

@@ -91,23 +98,25 @@ Running KBMOD
 To run a search, KBMOD must be provided with

 * appropriately pre-processed input data (see :ref:`Input Files`)
-* appropriate search and filter parameters (see :ref:`Masking`)
+* appropriate search and filter parameters (see :ref:`Search Parameters`)
+
+For an introduction to the KBMOD search algorithm, see the :ref:`Search Algorithm and Search Space` page.

 The search is initiated via the :py:class:`~~kbmod.run_search.run_search` class and consists of several phases:

 * Data is loaded from the input files as specified above (see :ref:`Input Files` for more details).
-* Masks are applied to the images to remove invalid pixels (see :ref:`Masking` for more details).
 * The shift and stack approach is used to search for potential trajectories originating from each pixel in the first image.
-* The list of potential trajectories is filtered using various metrics.
-* Remaining trajectories are clustered to remove duplicates. Only one trajectory per cluster is kept.
-* The found trajectories are compared against known objects and matches are indicated.
-* The found trajectories are output to result files for later analysis.
+* The list of potential trajectories is filtered using various metrics (see :ref:`Results Filtering`).
+* Remaining trajectories are clustered to remove duplicates. Only one trajectory per cluster is kept (see :ref:`Results Filtering`).
+* The found trajectories are output to result files for later analysis (see :ref:`Output Files`).

 Data Model
 ----------

-KBMOD uses an hierarchy of three nested data structures to store the image data over which it searches.
+The :py:class:`~kbmod.work_unit.WorkUnit` is the basic data unit for KBMOD. It includes all of the information needed for KBMOD to run, including images, time stamps, WCS, and metadata.
+
+Within the :py:class:`~kbmod.work_unit.WorkUnit`, KBMOD uses a hierarchy of three nested data structures to store the image data over which it searches.

 .. image:: ../_static/datamodel.png
   :alt: schematic
diff --git a/docs/source/user_manual/input_files.rst b/docs/source/user_manual/input_files.rst
index 03affdf60..63fed0cb3 100644
--- a/docs/source/user_manual/input_files.rst
+++ b/docs/source/user_manual/input_files.rst
@@ -1,65 +1,30 @@
 Input Files
 ===========

-KBMOD expects Vera C. Rubin Science Pipelines calexp-style FITS files. These are multi-extension fits files that contain:
-* photometrically and astometrically calibrated single-CCD image, usually refered to as the "science image",
-* variance image, representing per-pixel noise levels, and a
-* pixel bitmask
-
-stored in 1st, 2nd and 3rd header extension/plane respectively. The zeroth header extension is expected to contain the image metadata.
-A collection of science images that overlap the same area on the sky at different times are expected to be grouped into directories, usually refered to as "pointing groups". The path to this directory is a required input to KBMOD, see :ref:`Search Parameters`.
-
-The images are expected to be warped, i.e. geometrically transformed to a set of images with a consistent and uniform relationship between sky coordinates and image pixels on a shared pixel grid.
-
-Visit ID
---------
-
-In order to associate input files with auxiliary data, such as time stamps or PSFs, each visit uses a unique numeric ID. This ID string can be provided in the ``IDNUM`` field of the FITS file’s header 0. If no ``IDNUM`` field is provided, then KBMOD will attempt to derive the visit ID from the file name as described in the next section.
-
-Naming Scheme
--------------
+KBMOD expects Vera C. Rubin Science Pipelines calexp-style data. These can be provided as a set of multi-extension FITS files, references to the data's location on a Butler instance, or a saved :py:class:`~kbmod.work_unit.WorkUnit`.

-Each file **must** include ``.fits`` somewhere in the file name. Additionally the file names can be used to encode the visit ID. If no ``IDNUM`` field is provided, KBMOD will look for a contiguous sequence of five or more numeric digits in the file name. If found, the first such sequence is used as the visit ID. For example a file name “my12345.fits” will map to the visit ID “12345”.
+Butler
+------

-Time file
----------
+TODO

-There are two cases where you would want to use an external time file:
-* when the FITS files do not contain timestamp information
-  If no file is included, KBMOD will attempt to extract the timestamp from the FITS file header (in the MJD field).
-* when you want to prefilter the files based on the parameter ``mjd_lims`` (see :ref:`Search Parameters`) before loading the file.
-  This reduces loading time when accessing a large directory.
-
-The time file provides a mapping of visit ID to timestamp. The time file is an ASCII text file containing two space-separated columns of data: the visit IDs and MJD time of the observation. The first line is a header denoted by ``#``::
-
-  # visit_id mean_julian_date
-  439116 57162.42540605324
-  439120 57162.42863899306
-  439124 57162.43279313658
-  439128 57162.436995358796
-  439707 57163.41836675926
-  439711 57163.421717488425
-
-
-
-PSF File
+WorkUnit
 --------

-The PSF file is an ASCII text file containing two space-separated columns of data: the visit IDs and variance of the PSF for the corresponding observation. The first line is a header denoted by ``#``::
+The :py:class:`~kbmod.work_unit.WorkUnit` objects provide functions for writing to and loading from files. In addition to image data, the :py:class:`~kbmod.work_unit.WorkUnit` includes configuration data for the run and all necessary metadata (e.g. the WCS). To load a :py:class:`~kbmod.work_unit.WorkUnit` from a file, use ``WorkUnit.from_fits(input_filename)``.

-  # visit_id psf_val
-  439116 1.1
-  439120 1.05
-  439124 1.4

-A PSF file is needed whenever you do not want to use the same default value for every image.
+FITS Files
+----------
+If loading data from raw FITS files, these must be Vera C. Rubin Science Pipelines calexp-style FITS files that contain:

-Data Loading
-------------
+* photometrically and astrometrically calibrated single-CCD image, usually referred to as the "science image",
+* variance image, representing per-pixel noise levels, and a
+* pixel bitmask

-Data is loaded using :py:meth:`~kbmod.analysis_utils.Interface.load_images`. The method creates an :py:class:`~kbmod.search.ImageStack` object, which is a collection of :py:class:`~kbmod.search.LayeredImage` objects. Each :py:class:`~kbmod.search.LayeredImage` contains the PSF, mask and the science image while :py:class:`~kbmod.search.ImageStack` tracks the properties that apply to all images in the collection, such as global masks etc. The :py:class:`~kbmod.search.ImageStack` will include only those images that with observation timestamps within the given MJD bounds.
+stored in 1st, 2nd and 3rd header extension/plane respectively. The zeroth header extension is expected to contain the image metadata. A single FITS file can be loaded with the :py:meth:`kbmod.util_functions.load_deccam_layered_image` function, which takes the file name and a :py:class:`~kbmod.search.psf.PSF` object and produces a :py:class:`~kbmod.search.layered_image.LayeredImage`.

-The :py:meth:`~kbmod.analysis_utils.Interface.load_images` method also returns helper information:
-  * ``img_info`` - An object containing auxiliary data from the fits files such as their WCS and the location of the observatory.
+To build an :py:class:`~kbmod.image_collection.ImageCollection` from multiple FITS files, use the class's :py:meth:`~kbmod.image_collection.ImageCollection.fromDir` function. The images within a single run are expected to be warped, i.e. geometrically transformed to a set of images with a consistent and uniform relationship between sky coordinates and image pixels on a shared pixel grid.
diff --git a/docs/source/user_manual/masking.rst b/docs/source/user_manual/masking.rst
deleted file mode 100644
index 4a89a0be5..000000000
--- a/docs/source/user_manual/masking.rst
+++ /dev/null
@@ -1,6 +0,0 @@
-Masking
-=======
-
-The KBMOD algorithm uses a data mask to represent invalid pixel values that should be ignored during the search. Masking is applied by in the standardizers.
-
-TODO: Add more detail.
diff --git a/docs/source/user_manual/output_files.rst b/docs/source/user_manual/output_files.rst
index 460203bd6..37cbe97ca 100644
--- a/docs/source/user_manual/output_files.rst
+++ b/docs/source/user_manual/output_files.rst
@@ -3,10 +3,37 @@ Output Files

 KBMOD outputs a range of information about the discovered trajectories.

+
 Results Table
 -------------

-If the ``result_filename`` is provided, KBMOD will serialized the most of the :py:class:`~kbmod.Results` object into a single file. This filename should be the full or relative path and include the ``.ecsv`` suffix.
+KBMOD stores all of the result information in a :py:class:`~kbmod.Results` object, which provides a wrapper around an AstroPy Table. Most users can treat the produced :py:class:`~kbmod.Results` object as a table and access columns directly, as shown in the example below. However, internally the object provides a range of helper functions to create the derived columns described below.
+
+At a minimum, the results table includes the basic trajectory information, including:
+
+* the positions and velocities in pixel space (`x`, `y`, `vx`, and `vy`)
+* basic statistics (`likelihood`, `flux`, and `obs_count`).
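+
+For example, given a :py:class:`~kbmod.Results` object ``results`` produced by a search (a minimal sketch; the likelihood threshold used here is purely illustrative), the basic columns can be accessed directly, just like the columns of any AstroPy table::
+
+    # Access the basic trajectory columns directly from the results table.
+    x_vals = results["x"]
+    likelihoods = results["likelihood"]
+
+    # Column access returns array-like data, so standard numpy-style
+    # operations apply, e.g. counting the high-likelihood candidates.
+    num_bright = (likelihoods > 10.0).sum()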
+
+By default it includes additional derived information such as the series of psi and phi values from the shift and stack algorithm (`psi_curve` and `phi_curve`), a vector of which time steps were marked valid by sigma-G (`obs_valid`), coadded stamps, and the corresponding (RA, dec) in both the search images and (if applicable) the original, un-reprojected images.
+
+The coadded stamp information is controlled by the ``coadds`` and ``stamp_radius`` configuration parameters. The ``coadds`` parameter takes a list of which coadds to include in the results table, including:
+
+* ``mean`` - The mean pixel value.
+* ``median`` - The median pixel value.
+* ``sum`` - The sum of pixel values over all times (with no data mapping to 0.0).
+* ``weighted`` - The weighted average of pixel values using 1.0 / variance as the weighting function.
+
+Each coadd is stored in its own column with the name `coadd_` followed by the coadd type (e.g. `coadd_mean`). For more information on the stamps, see :ref:`Results Filtering`.
+
+The mapped RA, dec information consists of up to four columns. The columns `global_ra` and `global_dec` provide the (RA, dec) in the common WCS frame. If the images have been reprojected, this will be the WCS to which they were reprojected. If there is no global WCS given, these columns will not be present.
+
+The columns `img_ra` and `img_dec` indicate the positions in the original images. These could be the same as or different from the global (RA, dec) even for reprojected images. If the reprojection consists of aligning the images, such as correcting for rotation, the coordinates will be the same. In that case, the RA and dec are not actually changing, just the mapping from RA, dec to pixels. However, if the reprojection includes a shift of the viewing location, such as with the barycentric reprojection, we would expect the RA and dec to also change.
+
+
+Results File
+------------
+
+If the ``result_filename`` is provided, KBMOD will serialize most of the :py:class:`~kbmod.Results` object into a single file. This filename should be the full or relative path and include the ``.ecsv`` suffix.

 This results file can be read as::

@@ -17,7 +44,7 @@ By default the "all_stamps" column is dropped to save space. This can disabled (

 See the notebooks (especially the KBMOD analysis notebook) for examples of how to work with these results.

-Legacy Text File
-----------------
+ML Filtering
+------------

-If the ``legacy_result_filename`` is provided, KBMOD will output the minimal result information (Trajectory details) in a text file format that can be read by numpy. The main results file includes the found trajectories, their likelihoods, and fluxes.
+The results file can be further filtered using a neural network model trained on image stamp data via the `KBMOD ML `_ package. See the documentation in that repository for more information.
diff --git a/docs/source/user_manual/overview.rst b/docs/source/user_manual/overview.rst
new file mode 100644
index 000000000..cd4a305c6
--- /dev/null
+++ b/docs/source/user_manual/overview.rst
@@ -0,0 +1,46 @@
+KBMOD Overview
+==============
+
+KBMOD is a shift and stack algorithm for detecting faint objects from a sequence of images. However, the core shift and stack algorithm is only part of the broader system. KBMOD provides support for ingesting data, creating intermediate data formats, reprojecting images, running the search, filtering the candidates, and analyzing the results. The overall KBMOD workflow is shown in the figure below.
+
+.. image:: ../_static/workflow.png
+    :width: 1000
+    :alt: A flow diagram of the KBMOD system.
+
+
+Ingesting Data
+--------------
+
+KBMOD provides multiple mechanisms for ingesting data. The preferred path is through a `Rubin Butler `_. However, support is also available for ingesting data from raw FITS files. See :ref:`Input Files` for more details.
+
+
+ImageCollection and WorkUnit
+----------------------------
+
+The :py:class:`~kbmod.work_unit.WorkUnit` is the basic data unit for KBMOD. It includes all of the information needed for KBMOD to run, including images, time stamps, WCS, and metadata. WorkUnits can be created from the Butler via an :py:class:`~kbmod.image_collection.ImageCollection` (which stores information about the data locations within the Butler) or from a set of raw FITS files.
+
+
+Reprojection
+------------
+
+In order to shift and stack, KBMOD requires that all of the images align in pixel space. This is not always the case with arbitrary data. For example, Rubin will produce images at arbitrary rotations. To address this, KBMOD provides the ability to reproject images to a common WCS.
+
+In addition, we can account for the Earth's motion by reprojecting the images to a barycentric view. To do this, we need a guess of the object's distance from the sun (e.g. 40 AU). KBMOD can then reproject the image to appear as though it was taken from the barycenter. This improves linearization of the trajectory by removing the component of motion due to the Earth's motion.
+
+
+Core Search
+-----------
+
+KBMOD uses a shift and stack approach for its search. For details on the search algorithm and parametrization, see :ref:`Search Algorithm and Search Space`.
+
+
+Filtering
+---------
+
+The shift and stack algorithm can generate a vast number of potential candidates. KBMOD pre-filters these in multiple stages, including using a sigma-G filter to remove outliers, filters on the count of observations and the likelihood of the trajectory, filtering on the properties of coadded stamps, and clustering of similar trajectories. For details on the filtering stages, see :ref:`Results Filtering`.
+
+
+Analysis
+--------
+
+Candidates from the results file can be loaded and visualized with tools in the analysis directory. See the notebooks for multiple tutorials and guides.
\ No newline at end of file
diff --git a/docs/source/user_manual/results_filtering.rst b/docs/source/user_manual/results_filtering.rst
index 9eb08e7f7..029a7cf0a 100644
--- a/docs/source/user_manual/results_filtering.rst
+++ b/docs/source/user_manual/results_filtering.rst
@@ -1,18 +1,13 @@
-Results analysis
-================
+Results Filtering
+=================

-The output files contain the set of all trajectories discovered by KBMOD. Many of these trajectories are false positive detections, some area already known objects and, because of the way KBMOD performs the search, some are duplicates. In the following sections we describe the various steps that remove unwanted trajectories from the set of results.
+The output files contain the set of all trajectories discovered by KBMOD. Many of these trajectories are false positive detections, some are already known objects and, because of the way KBMOD performs the search, some are duplicates. In the following sections we describe the various steps that remove unwanted trajectories from the set of results. These steps are applied by KBMOD in the order listed below.

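+As a rough sketch, the filtering stages described in the sections below are controlled by configuration entries along these lines (the parameter names are taken from :ref:`Search Parameters`; the values are purely illustrative, and how the dictionary is passed to the search depends on your setup)::
+
+    # Illustrative values for the filtering-related configuration parameters
+    # described in the sections below.
+    filter_config = {
+        "sigmaG_lims": [25, 75],   # percentiles used for clipped sigma-G filtering
+        "clip_negative": False,    # whether to drop negative values before filtering
+        "num_obs": 10,             # minimum number of valid observations to keep a result
+        "lh_level": 10.0,          # minimum recomputed likelihood to keep a result
+        "do_stamp_filter": True,   # apply the coadded stamp filtering stage
+        "stamp_type": "median",    # which coadd type to use for stamp filtering
+        "do_clustering": True,     # cluster similar trajectories to remove duplicates
+        "cluster_type": "all",     # encoding of the trajectories used for clustering
+        "cluster_eps": 20.0,       # clustering distance threshold
+    }
+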
-Filtering
----------
-
-KBMOD uses two stages of filtering to reduce the number of candidate trajectories. The first stage uses the candidate trajectory's light curve and the second uses the coadded stamp generated from the trajectory's predicted positions.
-
 Clipped SigmaG Filtering
 ------------------------

-During the light curve filtering phase, KBMOD computes the predicted positions at each time steps, assembles a light curve, and looks for statistical outliers along this light curve using clipped-sigmaG filtering. This function identifies outlier points along the likelihood curve and marks them as invalid points. The candidate's overall likelihood is recomputed using only the valid points. The entire candidate trajectory is filtered if less than three valid points remain or the new likelihood is below the threshold defined by the ``lh_level`` parameter. Additional parameters, such as ``sigmaG_lims`` are used to control the light curve filtering.
+During the light curve filtering phase, KBMOD computes the predicted positions at each time step, assembles a light curve, and looks for statistical outliers along this light curve using clipped-sigmaG filtering. This function identifies outlier points along the likelihood curve and marks them as invalid points. The candidate's overall likelihood is recomputed using only the valid points. The entire candidate trajectory is filtered if fewer than ``num_obs`` valid points remain or the new likelihood is below the threshold defined by the ``lh_level`` parameter. Additional parameters, such as ``sigmaG_lims``, are used to control the light curve filtering.

 Relevant light curve filtering parameters include:

 * ``clip_negative`` - Whether to remove all negative values during filtering.
@@ -22,10 +17,11 @@ Relevant light curve filtering parameters include:
 * ``max_lh`` - The maximum likelihood to keep.
 * ``sigmaG_lims`` - The percentiles for sigmaG filtering (default of [25, 75]).
+

 Stamp Filtering
 ---------------

-The stamp filtering stage is only applied if the ``do_stamp_filter`` parameter is set to True. This stage creates a single stamp representing the sum, mean, or median of pixel values for the stamps at each time step. The stamp type is defined by the ``stamp_type`` parameter and can take on values ``median``, ``mean``, or ``sum``. All of the stamp types drop masked pixels from their computations. The mean and median sums are computed over only the valid time steps from the light curve filtering phase (dropping stamps with outlier fluxes). The sum coadd uses all the time steps regardless of the first phase of filtering.
+The stamp filtering stage is only applied if the ``do_stamp_filter`` parameter is set to True. This stage creates a single stamp representing the sum, mean, or median of pixel values for the stamps at each time step. The stamp type is defined by the ``stamp_type`` parameter and can take on values ``median``, ``mean``, ``sum``, or ``weighted`` (for variance weighted). All of the stamp types drop masked pixels from their computations. The mean and median coadds are computed over only the valid time steps from the light curve filtering phase (dropping stamps with outlier fluxes). The sum coadd uses all the time steps regardless of the first phase of filtering.

 The stamps are filtered based on how closely the pixel values in the stamp image represent a Gaussian defined with the parameters:
 * ``center_thresh`` - The percentage of flux in the central pixel. For example setting this to 0.9 will require that the central pixel of the stamp has 90 percent of all the flux in the stamp.
@@ -65,6 +61,10 @@ Most of the clustering approaches rely on predicted positions at different times

 The way DBSCAN computes distances between the trajectories depends on the encoding used. For positional encodings, such as ``position``, ``mid_position``, and ``start_end_position``, the distance is measured directly in pixels. The ``all`` encoding behaves somewhat similarly. However since it combines positions and velocities (or change in pixels per day), they are not actually in the same space.

+In addition, KBMOD provides a cheap approximate clustering algorithm called ``nn_start_end``, which does not use DBSCAN. This algorithm finds the highest likelihood trajectory in a region of 4-d space (defined by the starting and ending x, y positions) and then masks all lower likelihood trajectories. The user can think of this as only returning the "best" candidate in a given parameter region.
+
+While not a true "clustering" algorithm, it is a fast way to filter out similar trajectories. To use it, set ``cluster_type=nn_start_end``.
+
 Relevant clustering parameters include:

 * ``cluster_type`` - The types of predicted values to use when determining which trajectories should be clustered together, including position, velocity, and angles (if ``do_clustering = True``). Must be one of all, position, or mid_position.
diff --git a/docs/source/user_manual/search_params.rst b/docs/source/user_manual/search_params.rst
index c191d8ada..d2d3b39af 100644
--- a/docs/source/user_manual/search_params.rst
+++ b/docs/source/user_manual/search_params.rst
@@ -20,7 +20,7 @@ This document serves to provide a quick overview of the existing parameters and
 |                        |                             | remove all negative values prior to    |
 |                        |                             | computing the percentiles.             |
 +------------------------+-----------------------------+----------------------------------------+
-| ``cluster_eps ``       | 20.0                        | The threshold to use for clustering    |
+| ``cluster_eps``        | 20.0                        | The threshold to use for clustering    |
 |                        |                             | similar results.                       |
 +------------------------+-----------------------------+----------------------------------------+
 | ``cluster_type``       | all                         | Types of predicted values to use when  |
@@ -39,7 +39,7 @@ This document serves to provide a quick overview of the existing parameters and
 |                        |                             | These are not used in filtering, but   |
 |                        |                             | saved to columns for analysis. Can     |
 |                        |                             | include: "sum", "mean", "median", and  |
-|                        |                             | "weighted".
+|                        |                             | "weighted".                            |
 |                        |                             | The filtering coadd is controlled by   |
 |                        |                             | the ``stamp_type`` parameter.          |
 +------------------------+-----------------------------+----------------------------------------+
@@ -49,7 +49,7 @@ This document serves to provide a quick overview of the existing parameters and
 |                        |                             | remove duplicates and known objects.   |
 |                        |                             | See :ref:`Clustering` for more.        |
 +------------------------+-----------------------------+----------------------------------------+
-| ``do_mask``            | True                        | Perform masking. See :ref:`Masking`.   |
+| ``do_mask``            | True                        | Apply the mask to the raw pixels.      |
 +------------------------+-----------------------------+----------------------------------------+
 | ``do_stamp_filter``    | True                        | Apply post-search filtering on the     |
 |                        |                             | image stamps.                          |
 +------------------------+-----------------------------+----------------------------------------+
@@ -105,13 +105,8 @@ This document serves to provide a quick overview of the existing parameters and
 | ``psf_val``            | 1.4                         | The value for the standard deviation of|
 |                        |                             | the point spread function (PSF).       |
 +------------------------+-----------------------------+----------------------------------------+
-| ``repeated_flag_keys`` | default_repeated_flag_keys  | The flags used when creating the global|
-|                        |                             | mask. See :ref:`Masking`.              |
-+------------------------+-----------------------------+----------------------------------------+
 | ``result_filename``    | None                        | Full filename and path for a single    |
 |                        |                             | tabular result saves as ecsv.          |
-|                        |                             | Can be use used in addition to         |
-|                        |                             | outputting individual result files.    |
 +------------------------+-----------------------------+----------------------------------------+
 | ``results_per_pixel``  | 8                           | The maximum number of results to       |
 |                        |                             | to return for each pixel search.       |
 +------------------------+-----------------------------+----------------------------------------+
@@ -135,7 +130,7 @@ This document serves to provide a quick overview of the existing parameters and
 |                        |                             | * ``median`` - Per pixel median        |
 |                        |                             | * ``mean`` - Per pixel mean            |
 |                        |                             | * ``weighted`` - Per pixel mean        |
-|                        |                             | weighted by 1.0 / variance.
+|                        |                             | weighted by 1.0 / variance.            |
 +------------------------+-----------------------------+----------------------------------------+
 | ``track_filtered``     | False                       | A Boolean indicating whether to track  |
 |                        |                             | the filtered trajectories. Warning     |
diff --git a/docs/source/user_manual/search_space.rst b/docs/source/user_manual/search_space.rst
index 218aee193..fc365b9e4 100644
--- a/docs/source/user_manual/search_space.rst
+++ b/docs/source/user_manual/search_space.rst
@@ -33,6 +33,7 @@ Choosing Velocities
 Perhaps the most complex aspect of the KBMOD algorithm is how it defines the grid of search velocities. KBMOD allows you to define custom search strategies to best match the data. These include:
 * ``SingleVelocitySearch`` - A single predefined x and y velocity
 * ``VelocityGridSearch`` - An evenly spaced grid of x and y velocities
+* ``PencilSearch`` - A search in a small cone around a given velocity.
 * ``EclipticCenteredSearch`` - An evenly spaced grid of velocity magnitudes and angles (using a current parameterization) centered on a given or computed ecliptic angle.
 * ``KBMODV1SearchConfig`` - An evenly spaced grid of velocity magnitudes and angles (using the legacy parameterization).
 * ``RandomVelocitySearch`` - Randomly sampled x and y velocities
@@ -78,6 +79,32 @@ The ``VelocityGridSearch`` strategy searches a uniform grid of x and y velocities
 | ``max_vy``             | The maximum velocity in the y-dimension (pixels per day). |
 +------------------------+-----------------------------------------------------------+


+PencilSearch
+------------
+
+This search explores a cone around a given velocity, which allows it to refine the results for a given candidate or to search for an object whose velocity is only approximately known. The angles and velocity magnitudes are specified relative to a given center velocity.
+
++------------------------+----------------------------------------------------------+
+| **Parameter**          | **Interpretation**                                       |
++------------------------+----------------------------------------------------------+
+| ``vx``                 | The center velocity in pixels per day in the x-dimension |
++------------------------+----------------------------------------------------------+
+| ``vy``                 | The center velocity in pixels per day in the y-dimension |
++------------------------+----------------------------------------------------------+
+| ``max_ang_offset``     | The maximum offset of a candidate trajectory from the    |
+|                        | center (in radians). Default: 0.2618                     |
++------------------------+----------------------------------------------------------+
+| ``ang_step``           | The step size to explore for each angle (in radians).    |
+|                        | Default: 0.035                                           |
++------------------------+----------------------------------------------------------+
+| ``max_vel_offset``     | The maximum offset of the velocity's magnitude from the  |
+|                        | center (in pixels per day). Default: 10.0                |
++------------------------+----------------------------------------------------------+
+| ``vel_step``           | The step size to explore for each velocity magnitude     |
+|                        | (in pixels per day). Default: 0.5                        |
++------------------------+----------------------------------------------------------+
+
+
 EclipticCenteredSearch
 ----------------------
diff --git a/src/kbmod/trajectory_generator.py b/src/kbmod/trajectory_generator.py
index b57632daf..5d9fb1c04 100644
--- a/src/kbmod/trajectory_generator.py
+++ b/src/kbmod/trajectory_generator.py
@@ -240,7 +240,7 @@ class PencilSearch(TrajectoryGenerator):
     The search varies the given velocity's angle and magnitude. The angle
     includes the range: original angle +/- max_ang_offset.

-    The velcoty magnitude includes the range: original magnitude +/- max_vel_offset
+    The velocity magnitude includes the range: original magnitude +/- max_vel_offset

     Parameters
     ----------