
Cleanup dataframes #1360

Merged

Commits (31; the diff below shows changes from 18 commits)
78a967a
Update internals of AnalysisResultTable
nkanazawa1989 Jan 10, 2024
7a83179
Update internals of ScatterTable
nkanazawa1989 Jan 18, 2024
3bc9525
Removed unused mixin
nkanazawa1989 Jan 18, 2024
68013a8
Fix index mismatch issue after JSON serialization
nkanazawa1989 Jan 31, 2024
5032e86
Add more tests
nkanazawa1989 Jan 31, 2024
7736f19
Bug fixes
nkanazawa1989 Jan 31, 2024
8bbaa15
Merge branch 'main' of github.com:Qiskit/qiskit-experiments into clea…
nkanazawa1989 Jan 31, 2024
fc9273e
Unpin pandas 2.2
nkanazawa1989 Jan 31, 2024
0cae116
Update old pattern
nkanazawa1989 Jan 31, 2024
2fb28dc
Fix cross-reference
nkanazawa1989 Feb 1, 2024
ac972fd
Update curve analysis tutorial
nkanazawa1989 Feb 2, 2024
01471bb
Add shortcut methods
nkanazawa1989 Feb 2, 2024
8dc6c4f
Bugfix autosave
nkanazawa1989 Feb 2, 2024
144127a
Raise user warning when numbers contain multiple series
nkanazawa1989 Feb 2, 2024
a81f97c
Merge branch 'main' into cleanup/more_composition
nkanazawa1989 Feb 2, 2024
7c0662c
Bugfix: Missing circuit metadata in composite analysis
nkanazawa1989 Feb 2, 2024
92cfc92
Replace class_id with data_uid
nkanazawa1989 Feb 5, 2024
346d23a
Add documentation for filtering triplet
nkanazawa1989 Feb 5, 2024
ee03161
Apply review comments
nkanazawa1989 Feb 5, 2024
ee5b34d
Wording suggestions
nkanazawa1989 Feb 6, 2024
38abdff
Remove DEFAULT_
nkanazawa1989 Feb 6, 2024
9e27f16
Reorganize the doc
nkanazawa1989 Feb 6, 2024
b870be3
Remove _data
nkanazawa1989 Feb 6, 2024
cc905c6
Remove key from add_data
nkanazawa1989 Feb 6, 2024
0dc4eb2
Remove type cast depending on the entry number
nkanazawa1989 Feb 6, 2024
f8c1efe
Minor docs formatting
nkanazawa1989 Feb 6, 2024
ee92f1d
Add more tests for result table
nkanazawa1989 Feb 6, 2024
03aac67
Performance optimization
nkanazawa1989 Feb 6, 2024
ac5bdd8
name, data_uid -> series_name, series_id
nkanazawa1989 Feb 6, 2024
58671eb
Add more tests for construction
nkanazawa1989 Feb 6, 2024
7ff2c6a
Update Ramsey analysis
nkanazawa1989 Feb 6, 2024
4 changes: 2 additions & 2 deletions docs/howtos/rerun_analysis.rst
@@ -17,7 +17,7 @@ Solution
consult the `migration guide <https://docs.quantum.ibm.com/api/migration-guides/qiskit-runtime-from-provider>`_.

Once you recreate the exact experiment you ran and all of its parameters and options,
you can call the :meth:`.add_jobs` method with a list of :class:`Job
you can call the :meth:`.ExperimentData.add_jobs` method with a list of :class:`Job
<qiskit.providers.JobV1>` objects to generate the new :class:`.ExperimentData` object.
The following example retrieves jobs from a provider that has access to them via their
job IDs:
@@ -47,7 +47,7 @@ job IDs:
instead of overwriting the existing one.

If you have the job data in the form of a :class:`~qiskit.result.Result` object, you can
invoke the :meth:`.add_data` method instead of :meth:`.add_jobs`:
invoke the :meth:`.ExperimentData.add_data` method instead of :meth:`.ExperimentData.add_jobs`:

.. jupyter-input::

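The ``jupyter-input`` block above is collapsed in this diff view. An editor's sketch of the two methods it demonstrates, in the how-to's own style; ``provider``, ``job_ids``, ``jobs``, and ``experiment`` are placeholders, not names from the PR:

.. jupyter-input::

    from qiskit_experiments.framework import ExperimentData

    # Recreate the experiment exactly as originally run, then attach jobs.
    data = ExperimentData(experiment=experiment)
    data.add_jobs([provider.retrieve_job(job_id) for job_id in job_ids])

    # Or, if only Result objects are at hand:
    data.add_data([job.result() for job in jobs])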
47 changes: 25 additions & 22 deletions docs/tutorials/curve_analysis.rst
@@ -273,19 +273,19 @@ This table may look like:

.. code-block::

xval yval yerr name class_id category shots
0 0.1 0.153659 0.011258 A 0 raw 1024
1 0.1 0.590732 0.015351 B 1 raw 1024
2 0.1 0.315610 0.014510 A 0 raw 1024
3 0.1 0.376098 0.015123 B 1 raw 1024
4 0.2 0.937073 0.007581 A 0 raw 1024
5 0.2 0.323415 0.014604 B 1 raw 1024
6 0.2 0.538049 0.015565 A 0 raw 1024
7 0.2 0.530244 0.015581 B 1 raw 1024
8 0.3 0.143902 0.010958 A 0 raw 1024
9 0.3 0.261951 0.013727 B 1 raw 1024
10 0.3 0.830732 0.011707 A 0 raw 1024
11 0.3 0.874634 0.010338 B 1 raw 1024
xval yval yerr name data_uid category shots analysis
0 0.1 0.153659 0.011258 A 0 raw 1024 MyAnalysis
1 0.1 0.590732 0.015351 B 1 raw 1024 MyAnalysis
2 0.1 0.315610 0.014510 A 0 raw 1024 MyAnalysis
3 0.1 0.376098 0.015123 B 1 raw 1024 MyAnalysis
4 0.2 0.937073 0.007581 A 0 raw 1024 MyAnalysis
5 0.2 0.323415 0.014604 B 1 raw 1024 MyAnalysis
6 0.2 0.538049 0.015565 A 0 raw 1024 MyAnalysis
7 0.2 0.530244 0.015581 B 1 raw 1024 MyAnalysis
8 0.3 0.143902 0.010958 A 0 raw 1024 MyAnalysis
9 0.3 0.261951 0.013727 B 1 raw 1024 MyAnalysis
10 0.3 0.830732 0.011707 A 0 raw 1024 MyAnalysis
11 0.3 0.874634 0.010338 B 1 raw 1024 MyAnalysis

where the experiment consists of two subset series A and B, and the experiment parameter (xval)
is scanned from 0.1 to 0.3 in each subset. In this example, the experiment is run twice
@@ -295,9 +295,12 @@ for each condition. The role of each column is as follows:
- ``yval``: Nominal part of the outcome. The outcome is something like expectation value, which is computed from the experiment result with the data processor.
- ``yerr``: Standard error of the outcome, which is mainly due to sampling error.
- ``name``: Unique identifier of the result class. This is defined by the ``data_subfit_map`` option.
- ``class_id``: Numerical index corresponding to the result class. This number is automatically assigned.
- ``category``: The attribute of data set. The "raw" category indicates an output from the data processing.
- ``data_uid``: Integer number corresponding to the data unique index. This number is automatically assigned.
- ``category``: The tag of data group. The "raw" category indicates an output from the data processing.
- ``shots``: Number of measurement shots used to acquire this result.
- ``analysis``: The name of curve analysis instance that generated this data. In :class:`.CompositeCurveAnalysis`, the table is a composite of tables from all component analyses.

To find data points that belong to a particular dataset, you can follow :ref:`filter_scatter_table`.
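A hedged filtering sketch using the ``ScatterTable`` methods that appear in the ``composite_curve_analysis.py`` diff further down; ``MyAnalysis`` is the placeholder name from the table above:

.. code-block:: python

    # Keep only the raw points produced by MyAnalysis ...
    raw = table.filter(category="raw", analysis="MyAnalysis")

    # ... then walk the series one by one (A -> data_uid 0, B -> data_uid 1).
    for data_uid, series in raw.iter_by_data():
        print(data_uid, series.x, series.y, series.y_err)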

3. Formatting
^^^^^^^^^^^^^
@@ -310,7 +313,7 @@ This allows the analysis to easily estimate the slope of the curves to
create algorithmic initial guess of fit parameters.
A developer can inject extra data processing, for example, filtering, smoothing,
or elimination of outliers for better fitting.
The new class_id is given here so that its value corresponds to the fit model object index
The new data_uid is given here so that its value corresponds to the fit model object index
in this analysis class. This index mapping is done based upon the correspondence of
the data name and the fit model name.
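A minimal sketch of that mapping, assuming the ``model_names()`` accessor used in the composite analysis diff below and the series names from the example table:

.. code-block:: python

    model_names = analysis.model_names()   # e.g. ["A", "B"]
    # A formatted row named "B" is assigned
    # data_uid = model_names.index("B"), i.e. 1 in this example.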

@@ -319,12 +322,12 @@ This may return new scatter table object with the addition of rows like the following:

.. code-block::

12 0.1 0.234634 0.009183 A 0 formatted 2048
13 0.2 0.737561 0.008656 A 0 formatted 2048
14 0.3 0.487317 0.008018 A 0 formatted 2048
15 0.1 0.483415 0.010774 B 1 formatted 2048
16 0.2 0.426829 0.010678 B 1 formatted 2048
17 0.3 0.568293 0.008592 B 1 formatted 2048
12 0.1 0.234634 0.009183 A 0 formatted 2048 MyAnalysis
13 0.2 0.737561 0.008656 A 0 formatted 2048 MyAnalysis
14 0.3 0.487317 0.008018 A 0 formatted 2048 MyAnalysis
15 0.1 0.483415 0.010774 B 1 formatted 2048 MyAnalysis
16 0.2 0.426829 0.010678 B 1 formatted 2048 MyAnalysis
17 0.3 0.568293 0.008592 B 1 formatted 2048 MyAnalysis

The default :meth:`_format_data` method adds its output data with the category "formatted".
This category name must be also specified in the analysis option ``fit_category``.
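For example, a subclass whose ``_format_data`` emits a custom category would keep the option in sync. A minimal sketch; the ``smoothed`` category name is hypothetical:

.. code-block:: python

    from qiskit_experiments.curve_analysis import CurveAnalysis

    class SmoothedAnalysis(CurveAnalysis):
        """Hypothetical analysis whose _format_data outputs a "smoothed" category."""

        @classmethod
        def _default_options(cls):
            options = super()._default_options()
            options.fit_category = "smoothed"  # must match _format_data output
            return options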
1 change: 1 addition & 0 deletions qiskit_experiments/curve_analysis/__init__.py
@@ -39,6 +39,7 @@
.. autosummary::
:toctree: ../stubs/

ScatterTable
SeriesDef
CurveData
CurveFitResult
74 changes: 39 additions & 35 deletions qiskit_experiments/curve_analysis/composite_curve_analysis.py
@@ -230,34 +230,35 @@ def _create_figures(
A list of figures.
"""
for analysis in self.analyses():
sub_data = curve_data[curve_data.group == analysis.name]
for name, data in list(sub_data.groupby("name")):
full_name = f"{name}_{analysis.name}"
group_data = curve_data.filter(analysis=analysis.name)
model_names = analysis.model_names()
for uid, sub_data in group_data.iter_by_data():
full_name = f"{model_names[uid]}_{analysis.name}"
# Plot raw data scatters
if analysis.options.plot_raw_data:
raw_data = data[data.category == "raw"]
raw_data = sub_data.filter(category="raw")
self.plotter.set_series_data(
series_name=full_name,
x=raw_data.xval.to_numpy(),
y=raw_data.yval.to_numpy(),
x=raw_data.x,
y=raw_data.y,
)
# Plot formatted data scatters
formatted_data = data[data.category == analysis.options.fit_category]
formatted_data = sub_data.filter(category=analysis.options.fit_category)
self.plotter.set_series_data(
series_name=full_name,
x_formatted=formatted_data.xval.to_numpy(),
y_formatted=formatted_data.yval.to_numpy(),
y_formatted_err=formatted_data.yerr.to_numpy(),
x_formatted=formatted_data.x,
y_formatted=formatted_data.y,
y_formatted_err=formatted_data.y_err,
)
# Plot fit lines
line_data = data[data.category == "fitted"]
line_data = sub_data.filter(category="fitted")
if len(line_data) == 0:
continue
fit_stdev = line_data.yerr.to_numpy()
fit_stdev = line_data.y_err
self.plotter.set_series_data(
series_name=full_name,
x_interp=line_data.xval.to_numpy(),
y_interp=line_data.yval.to_numpy(),
x_interp=line_data.x,
y_interp=line_data.y,
y_interp_err=fit_stdev if np.isfinite(fit_stdev).all() else None,
)

@@ -354,7 +355,7 @@ def _run_analysis(
metadata["group"] = analysis.name

table = analysis._format_data(analysis._run_data_processing(experiment_data.data()))
formatted_subset = table[table.category == analysis.options.fit_category]
formatted_subset = table.filter(category=analysis.options.fit_category)
fit_data = analysis._run_curve_fit(formatted_subset)
fit_dataset[analysis.name] = fit_data

@@ -376,32 +377,35 @@

if fit_data.success:
# Add fit data to curve data table
fit_curves = []
columns = list(table.columns)
model_names = analysis.model_names()
for i, sub_data in list(formatted_subset.groupby("class_id")):
xval = sub_data.xval.to_numpy()
for data_id, sub_data in formatted_subset.iter_by_data():
xval = sub_data.x
if len(xval) == 0:
# If data is empty, skip drawing this model.
# This is the case when fit model exist but no data to fit is provided.
continue
# Compute X, Y values with fit parameters.
xval_fit = np.linspace(np.min(xval), np.max(xval), num=100)
yval_fit = eval_with_uncertainties(
x=xval_fit,
model=analysis.models[i],
xval_arr_fit = np.linspace(np.min(xval), np.max(xval), num=100, dtype=float)
uval_arr_fit = eval_with_uncertainties(
x=xval_arr_fit,
model=analysis.models[data_id],
params=fit_data.ufloat_params,
)
model_fit = np.full((100, len(columns)), np.nan, dtype=object)
fit_curves.append(model_fit)
model_fit[:, columns.index("xval")] = xval_fit
model_fit[:, columns.index("yval")] = unp.nominal_values(yval_fit)
yval_arr_fit = unp.nominal_values(uval_arr_fit)
if fit_data.covar is not None:
model_fit[:, columns.index("yerr")] = unp.std_devs(yval_fit)
model_fit[:, columns.index("name")] = model_names[i]
model_fit[:, columns.index("class_id")] = i
model_fit[:, columns.index("category")] = "fitted"
table = table.append_list_values(other=np.vstack(fit_curves))
yerr_arr_fit = unp.std_devs(uval_arr_fit)
else:
yerr_arr_fit = np.zeros_like(xval_arr_fit)
for xval, yval, yerr in zip(xval_arr_fit, yval_arr_fit, yerr_arr_fit):
Reviewer (Collaborator):

It still surprises me that it is better to iterate over numpy arrays point by point and add them to lists for a new dataframe, rather than just adding the numpy arrays to a new dataframe.

Author (nkanazawa1989):

Handling of the empty column is expensive because it requires careful handling of missing values. Without this, the shots column may be accidentally typecast to float, because numpy doesn't support nullable integers. That means we would first need to create a 2D object-dtype ndarray and populate values, then convert it into a dataframe. Since the current _lazy_add_rows buffer assumes a row-wise data list, arrays need to be converted into this form internally.
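A quick numpy/pandas illustration of the typecast hazard described in this reply (editor's sketch, not code from the PR):

.. code-block:: python

    import numpy as np
    import pandas as pd

    # numpy has no nullable integer dtype, so a single missing value
    # silently promotes the whole array to float64:
    np.array([1024, np.nan])               # array([1024.,   nan])

    # An object-dtype buffer keeps integers intact until the final
    # conversion into a dataframe:
    buf = np.full(3, np.nan, dtype=object)
    buf[0], buf[1] = 1024, 2048
    pd.DataFrame({"shots": buf})           # shots column stays integer-valued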

table.add_row(
name=model_names[data_id],
data_uid=data_id,
category="fitted",
x=xval,
y=yval,
y_err=yerr,
analysis=analysis.name,
)
analysis_results.extend(
analysis._create_analysis_results(
fit_data=fit_data,
Expand All @@ -416,11 +420,11 @@ def _run_analysis(
analysis._create_curve_data(curve_data=formatted_subset, **metadata)
)

# Add extra column to identify the fit model
table["group"] = analysis.name
curve_data_set.append(table)

combined_curve_data = pd.concat(curve_data_set)
combined_curve_data = ScatterTable.from_dataframe(
pd.concat([d.dataframe for d in curve_data_set])
)
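Usage note (editor's sketch): since ``ScatterTable`` now wraps a ``DataFrame`` rather than subclassing it, merging component tables round-trips through the underlying frame as shown above; ``tables`` is a hypothetical list of ``ScatterTable`` objects:

.. code-block:: python

    import pandas as pd
    from qiskit_experiments.curve_analysis import ScatterTable

    combined = ScatterTable.from_dataframe(
        pd.concat([t.dataframe for t in tables])
    )
    # Component results remain addressable via the analysis column:
    child = combined.filter(analysis="child_analysis")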
total_quality = self._evaluate_quality(fit_dataset)

# After the quality is determined, plot can become a boolean flag for whether
Expand Down