
Cleanup dataframes #1360

Merged

Commits (31; the diff below shows changes from 18 commits)
78a967a
Update internals of AnalysisResultTable
nkanazawa1989 Jan 10, 2024
7a83179
Update internals of ScatterTable
nkanazawa1989 Jan 18, 2024
3bc9525
Removed unused mixin
nkanazawa1989 Jan 18, 2024
68013a8
Fix index mismatch issue after JSON serialization
nkanazawa1989 Jan 31, 2024
5032e86
Add more tests
nkanazawa1989 Jan 31, 2024
7736f19
Bug fixes
nkanazawa1989 Jan 31, 2024
8bbaa15
Merge branch 'main' of github.com:Qiskit/qiskit-experiments into clea…
nkanazawa1989 Jan 31, 2024
fc9273e
Unpin pandas 2.2
nkanazawa1989 Jan 31, 2024
0cae116
Update old pattern
nkanazawa1989 Jan 31, 2024
2fb28dc
Fix cross-reference
nkanazawa1989 Feb 1, 2024
ac972fd
Update curve analysis tutorial
nkanazawa1989 Feb 2, 2024
01471bb
Add shortcut methods
nkanazawa1989 Feb 2, 2024
8dc6c4f
Bugfix autosave
nkanazawa1989 Feb 2, 2024
144127a
Raise user warning when numbers contain multiple series
nkanazawa1989 Feb 2, 2024
a81f97c
Merge branch 'main' into cleanup/more_composition
nkanazawa1989 Feb 2, 2024
7c0662c
Bugfix: Missing circuit metadata in composite analysis
nkanazawa1989 Feb 2, 2024
92cfc92
Replace class_id with data_uid
nkanazawa1989 Feb 5, 2024
346d23a
Add documentation for filtering triplet
nkanazawa1989 Feb 5, 2024
ee03161
Apply review comments
nkanazawa1989 Feb 5, 2024
ee5b34d
Wording suggestions
nkanazawa1989 Feb 6, 2024
38abdff
Remove DEFAULT_
nkanazawa1989 Feb 6, 2024
9e27f16
Reorganize the doc
nkanazawa1989 Feb 6, 2024
b870be3
Remove _data
nkanazawa1989 Feb 6, 2024
cc905c6
Remove key from add_data
nkanazawa1989 Feb 6, 2024
0dc4eb2
Remove type cast depending on the entry number
nkanazawa1989 Feb 6, 2024
f8c1efe
Minor docs formatting
nkanazawa1989 Feb 6, 2024
ee92f1d
Add more tests for result table
nkanazawa1989 Feb 6, 2024
03aac67
Performance optimization
nkanazawa1989 Feb 6, 2024
ac5bdd8
name, data_uid -> series_name, series_id
nkanazawa1989 Feb 6, 2024
58671eb
Add more tests for construction
nkanazawa1989 Feb 6, 2024
7ff2c6a
Update Ramsey analysis
nkanazawa1989 Feb 6, 2024
4 changes: 2 additions & 2 deletions docs/howtos/rerun_analysis.rst
@@ -17,7 +17,7 @@ Solution
consult the `migration guide <https://docs.quantum.ibm.com/api/migration-guides/qiskit-runtime-from-provider>`_.

Once you recreate the exact experiment you ran and all of its parameters and options,
you can call the :meth:`.add_jobs` method with a list of :class:`Job
you can call the :meth:`.ExperimentData.add_jobs` method with a list of :class:`Job
<qiskit.providers.JobV1>` objects to generate the new :class:`.ExperimentData` object.
The following example retrieves jobs from a provider that has access to them via their
job IDs:
@@ -47,7 +47,7 @@ job IDs:
instead of overwriting the existing one.

If you have the job data in the form of a :class:`~qiskit.result.Result` object, you can
invoke the :meth:`.add_data` method instead of :meth:`.add_jobs`:
invoke the :meth:`.ExperimentData.add_data` method instead of :meth:`.ExperimentData.add_jobs`:

.. jupyter-input::

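The ``jupyter-input`` block above is collapsed in this diff view. An editor's sketch of the two methods it demonstrates, in the how-to's own style; ``provider``, ``job_ids``, ``jobs``, and ``experiment`` are placeholders, not names from the PR:

.. jupyter-input::

    from qiskit_experiments.framework import ExperimentData

    # Recreate the experiment exactly as originally run, then attach jobs.
    data = ExperimentData(experiment=experiment)
    data.add_jobs([provider.retrieve_job(job_id) for job_id in job_ids])

    # Or, if only Result objects are at hand:
    data.add_data([job.result() for job in jobs])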
47 changes: 25 additions & 22 deletions docs/tutorials/curve_analysis.rst
@@ -273,19 +273,19 @@ This table may look like:

.. code-block::

xval yval yerr name class_id category shots
0 0.1 0.153659 0.011258 A 0 raw 1024
1 0.1 0.590732 0.015351 B 1 raw 1024
2 0.1 0.315610 0.014510 A 0 raw 1024
3 0.1 0.376098 0.015123 B 1 raw 1024
4 0.2 0.937073 0.007581 A 0 raw 1024
5 0.2 0.323415 0.014604 B 1 raw 1024
6 0.2 0.538049 0.015565 A 0 raw 1024
7 0.2 0.530244 0.015581 B 1 raw 1024
8 0.3 0.143902 0.010958 A 0 raw 1024
9 0.3 0.261951 0.013727 B 1 raw 1024
10 0.3 0.830732 0.011707 A 0 raw 1024
11 0.3 0.874634 0.010338 B 1 raw 1024
xval yval yerr name data_uid category shots analysis
0 0.1 0.153659 0.011258 A 0 raw 1024 MyAnalysis
1 0.1 0.590732 0.015351 B 1 raw 1024 MyAnalysis
2 0.1 0.315610 0.014510 A 0 raw 1024 MyAnalysis
3 0.1 0.376098 0.015123 B 1 raw 1024 MyAnalysis
4 0.2 0.937073 0.007581 A 0 raw 1024 MyAnalysis
5 0.2 0.323415 0.014604 B 1 raw 1024 MyAnalysis
6 0.2 0.538049 0.015565 A 0 raw 1024 MyAnalysis
7 0.2 0.530244 0.015581 B 1 raw 1024 MyAnalysis
8 0.3 0.143902 0.010958 A 0 raw 1024 MyAnalysis
9 0.3 0.261951 0.013727 B 1 raw 1024 MyAnalysis
10 0.3 0.830732 0.011707 A 0 raw 1024 MyAnalysis
11 0.3 0.874634 0.010338 B 1 raw 1024 MyAnalysis

where the experiment consists of two subset series A and B, and the experiment parameter (xval)
is scanned from 0.1 to 0.3 in each subset. In this example, the experiment is run twice
@@ -295,9 +295,12 @@ for each condition. The role of each column is as follows:
- ``yval``: Nominal part of the outcome. The outcome is something like expectation value, which is computed from the experiment result with the data processor.
- ``yerr``: Standard error of the outcome, which is mainly due to sampling error.
- ``name``: Unique identifier of the result class. This is defined by the ``data_subfit_map`` option.
- ``class_id``: Numerical index corresponding to the result class. This number is automatically assigned.
- ``category``: The attribute of data set. The "raw" category indicates an output from the data processing.
- ``data_uid``: Integer number corresponding to the data unique index. This number is automatically assigned.
- ``category``: The tag of data group. The "raw" category indicates an output from the data processing.
- ``shots``: Number of measurement shots used to acquire this result.
- ``analysis``: The name of curve analysis instance that generated this data. In :class:`.CompositeCurveAnalysis`, the table is a composite of tables from all component analyses.

To find data points that belong to a particular dataset, you can follow :ref:`filter_scatter_table`.
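A hedged filtering sketch using the ``ScatterTable`` methods that appear in the ``composite_curve_analysis.py`` diff further down; ``MyAnalysis`` is the placeholder name from the table above:

.. code-block:: python

    # Keep only the raw points produced by MyAnalysis ...
    raw = table.filter(category="raw", analysis="MyAnalysis")

    # ... then walk the series one by one (A -> data_uid 0, B -> data_uid 1).
    for data_uid, series in raw.iter_by_data():
        print(data_uid, series.x, series.y, series.y_err)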

3. Formatting
^^^^^^^^^^^^^
@@ -310,7 +313,7 @@ This allows the analysis to easily estimate the slope of the curves to
create algorithmic initial guess of fit parameters.
A developer can inject extra data processing, for example, filtering, smoothing,
or elimination of outliers for better fitting.
The new class_id is given here so that its value corresponds to the fit model object index
The new data_uid is given here so that its value corresponds to the fit model object index
in this analysis class. This index mapping is done based upon the correspondence of
the data name and the fit model name.
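A minimal sketch of that mapping, assuming the ``model_names()`` accessor used in the composite analysis diff below and the series names from the example table:

.. code-block:: python

    model_names = analysis.model_names()   # e.g. ["A", "B"]
    # A formatted row named "B" is assigned
    # data_uid = model_names.index("B"), i.e. 1 in this example.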

@@ -319,12 +322,12 @@ This may return new scatter table object with the addition of rows like the following:

.. code-block::

12 0.1 0.234634 0.009183 A 0 formatted 2048
13 0.2 0.737561 0.008656 A 0 formatted 2048
14 0.3 0.487317 0.008018 A 0 formatted 2048
15 0.1 0.483415 0.010774 B 1 formatted 2048
16 0.2 0.426829 0.010678 B 1 formatted 2048
17 0.3 0.568293 0.008592 B 1 formatted 2048
12 0.1 0.234634 0.009183 A 0 formatted 2048 MyAnalysis
13 0.2 0.737561 0.008656 A 0 formatted 2048 MyAnalysis
14 0.3 0.487317 0.008018 A 0 formatted 2048 MyAnalysis
15 0.1 0.483415 0.010774 B 1 formatted 2048 MyAnalysis
16 0.2 0.426829 0.010678 B 1 formatted 2048 MyAnalysis
17 0.3 0.568293 0.008592 B 1 formatted 2048 MyAnalysis

The default :meth:`_format_data` method adds its output data with the category "formatted".
This category name must be also specified in the analysis option ``fit_category``.
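For example, a subclass whose ``_format_data`` emits a custom category would keep the option in sync. A minimal sketch; the ``smoothed`` category name is hypothetical:

.. code-block:: python

    from qiskit_experiments.curve_analysis import CurveAnalysis

    class SmoothedAnalysis(CurveAnalysis):
        """Hypothetical analysis whose _format_data outputs a "smoothed" category."""

        @classmethod
        def _default_options(cls):
            options = super()._default_options()
            options.fit_category = "smoothed"  # must match _format_data output
            return options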
1 change: 1 addition & 0 deletions qiskit_experiments/curve_analysis/__init__.py
@@ -39,6 +39,7 @@
.. autosummary::
:toctree: ../stubs/

ScatterTable
SeriesDef
CurveData
CurveFitResult
74 changes: 39 additions & 35 deletions qiskit_experiments/curve_analysis/composite_curve_analysis.py
@@ -230,34 +230,35 @@ def _create_figures(
A list of figures.
"""
for analysis in self.analyses():
sub_data = curve_data[curve_data.group == analysis.name]
for name, data in list(sub_data.groupby("name")):
full_name = f"{name}_{analysis.name}"
group_data = curve_data.filter(analysis=analysis.name)
model_names = analysis.model_names()
for uid, sub_data in group_data.iter_by_data():
full_name = f"{model_names[uid]}_{analysis.name}"
# Plot raw data scatters
if analysis.options.plot_raw_data:
raw_data = data[data.category == "raw"]
raw_data = sub_data.filter(category="raw")
self.plotter.set_series_data(
series_name=full_name,
x=raw_data.xval.to_numpy(),
y=raw_data.yval.to_numpy(),
x=raw_data.x,
y=raw_data.y,
)
# Plot formatted data scatters
formatted_data = data[data.category == analysis.options.fit_category]
formatted_data = sub_data.filter(category=analysis.options.fit_category)
self.plotter.set_series_data(
series_name=full_name,
x_formatted=formatted_data.xval.to_numpy(),
y_formatted=formatted_data.yval.to_numpy(),
y_formatted_err=formatted_data.yerr.to_numpy(),
x_formatted=formatted_data.x,
y_formatted=formatted_data.y,
y_formatted_err=formatted_data.y_err,
)
# Plot fit lines
line_data = data[data.category == "fitted"]
line_data = sub_data.filter(category="fitted")
if len(line_data) == 0:
continue
fit_stdev = line_data.yerr.to_numpy()
fit_stdev = line_data.y_err
self.plotter.set_series_data(
series_name=full_name,
x_interp=line_data.xval.to_numpy(),
y_interp=line_data.yval.to_numpy(),
x_interp=line_data.x,
y_interp=line_data.y,
y_interp_err=fit_stdev if np.isfinite(fit_stdev).all() else None,
)

@@ -354,7 +355,7 @@ def _run_analysis(
metadata["group"] = analysis.name

table = analysis._format_data(analysis._run_data_processing(experiment_data.data()))
formatted_subset = table[table.category == analysis.options.fit_category]
formatted_subset = table.filter(category=analysis.options.fit_category)
fit_data = analysis._run_curve_fit(formatted_subset)
fit_dataset[analysis.name] = fit_data

@@ -376,32 +377,35 @@

if fit_data.success:
# Add fit data to curve data table
fit_curves = []
columns = list(table.columns)
model_names = analysis.model_names()
for i, sub_data in list(formatted_subset.groupby("class_id")):
xval = sub_data.xval.to_numpy()
for data_id, sub_data in formatted_subset.iter_by_data():
xval = sub_data.x
if len(xval) == 0:
# If data is empty, skip drawing this model.
# This is the case when fit model exist but no data to fit is provided.
continue
# Compute X, Y values with fit parameters.
xval_fit = np.linspace(np.min(xval), np.max(xval), num=100)
yval_fit = eval_with_uncertainties(
x=xval_fit,
model=analysis.models[i],
xval_arr_fit = np.linspace(np.min(xval), np.max(xval), num=100, dtype=float)
uval_arr_fit = eval_with_uncertainties(
x=xval_arr_fit,
model=analysis.models[data_id],
params=fit_data.ufloat_params,
)
model_fit = np.full((100, len(columns)), np.nan, dtype=object)
fit_curves.append(model_fit)
model_fit[:, columns.index("xval")] = xval_fit
model_fit[:, columns.index("yval")] = unp.nominal_values(yval_fit)
yval_arr_fit = unp.nominal_values(uval_arr_fit)
if fit_data.covar is not None:
model_fit[:, columns.index("yerr")] = unp.std_devs(yval_fit)
model_fit[:, columns.index("name")] = model_names[i]
model_fit[:, columns.index("class_id")] = i
model_fit[:, columns.index("category")] = "fitted"
table = table.append_list_values(other=np.vstack(fit_curves))
yerr_arr_fit = unp.std_devs(uval_arr_fit)
else:
yerr_arr_fit = np.zeros_like(xval_arr_fit)
for xval, yval, yerr in zip(xval_arr_fit, yval_arr_fit, yerr_arr_fit):
Reviewer (Collaborator):

It still surprises me that it is better to iterate over numpy arrays point by point and add them to lists for a new dataframe, rather than just adding the numpy arrays to a new dataframe.

Author (nkanazawa1989):

Handling of the empty column is expensive because it requires careful handling of missing values. Without this, the shots column may be accidentally typecast to float, because numpy doesn't support nullable integers. That means we would first need to create a 2D object-dtype ndarray and populate values, then convert it into a dataframe. Since the current _lazy_add_rows buffer assumes a row-wise data list, arrays need to be converted into this form internally.
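A quick numpy/pandas illustration of the typecast hazard described in this reply (editor's sketch, not code from the PR):

.. code-block:: python

    import numpy as np
    import pandas as pd

    # numpy has no nullable integer dtype, so a single missing value
    # silently promotes the whole array to float64:
    np.array([1024, np.nan])               # array([1024.,   nan])

    # An object-dtype buffer keeps integers intact until the final
    # conversion into a dataframe:
    buf = np.full(3, np.nan, dtype=object)
    buf[0], buf[1] = 1024, 2048
    pd.DataFrame({"shots": buf})           # shots column stays integer-valued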

table.add_row(
name=model_names[data_id],
data_uid=data_id,
category="fitted",
x=xval,
y=yval,
y_err=yerr,
analysis=analysis.name,
)
analysis_results.extend(
analysis._create_analysis_results(
fit_data=fit_data,
Expand All @@ -416,11 +420,11 @@ def _run_analysis(
analysis._create_curve_data(curve_data=formatted_subset, **metadata)
)

# Add extra column to identify the fit model
table["group"] = analysis.name
curve_data_set.append(table)

combined_curve_data = pd.concat(curve_data_set)
combined_curve_data = ScatterTable.from_dataframe(
pd.concat([d.dataframe for d in curve_data_set])
)
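Usage note (editor's sketch): since ``ScatterTable`` now wraps a ``DataFrame`` rather than subclassing it, merging component tables round-trips through the underlying frame as shown above; ``tables`` is a hypothetical list of ``ScatterTable`` objects:

.. code-block:: python

    import pandas as pd
    from qiskit_experiments.curve_analysis import ScatterTable

    combined = ScatterTable.from_dataframe(
        pd.concat([t.dataframe for t in tables])
    )
    # Component results remain addressable via the analysis column:
    child = combined.filter(analysis="child_analysis")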
total_quality = self._evaluate_quality(fit_dataset)

# After the quality is determined, plot can become a boolean flag for whether
Expand Down