DM-46073 Switch consdb to multi-column primary key #36 (+1018 −126)

Merged: 23 commits into main on Sep 26, 2024

Conversation

bbrondel (Collaborator):

This PR is also not as bad as the line counts would suggest, because a lot of that is machine-generated alembic migrations.

Take note, though, that there is handwritten code in the alembic files, at the end of upgrade().

Most of the changes are meant to migrate or interoperate between databases with exposure_id PK and day_obs + seq_num PK.

This goes along with a PR I have ready for lsst/sdm_schemas, not yet submitted.

@bbrondel marked this pull request as ready for review on September 11, 2024 03:47
# $SDM_SCHEMAS_DIR/yml/cdb_latiss.yaml,
# $SDM_SCHEMAS_DIR/yml/cdb_lsstcomcam.yaml, etc.
# 2. Load the LSST environment and setup felis.
# source loadLSST.bash
Collaborator:

When I have used lsstsw, it has directed me to source bin/envconfig; is there a different purpose for this script? Feel free to reply outside of your PR.

bbrondel (Collaborator, author):

Someone else is probably better positioned to answer this, but my understanding is that there are a few ways to install/load the LSST environment:

  • newinstall.sh
  • lsstinstall/eups
  • lsstsw

The first two initialize with loadLSST.bash, whereas the third uses envconfig.

@@ -181,6 +182,26 @@ def get_timestamp_columns(self, table: str) -> set[str]:
columns = self.timestamp_columns[table]
return columns

def get_schema_version(self, instrument: str) -> Version:
if "day_obs" in self.schemas[instrument].tables[f"cdb_{instrument}.ccdexposure"].columns:
return Version("3.2.0")
Collaborator:

What happens if we are on 3.3.0? Would we have to change this every time we upgrade?

bbrondel (Collaborator, author):

My hope is that Jeremy will modify felis so that the version number is stored in the database. We can then use a database query to get the version, with maybe something like the above as a fallback when the version number is not present. As we leave behind older schemas with no versioning in the database, this would become unnecessary.

Collaborator:

Is he aware of that request/expectation?

Resolved review threads: tests/test_hinfo.py; tests/test_pqserver.py (four threads, one outdated).
@@ -976,6 +1030,7 @@ def schema(
return {c.name: [str(c.type), c.doc] for c in schema.tables[table].columns}


if __name__ == "__main__":
logger.info(f"{__name__=}")
if __name__ == "__main__" or __name__ == "consdb_pq.pqserver":
Contributor:

Is this guard even really necessary? Is anyone else going to import this?

bbrondel (Collaborator, author), Sep 23, 2024:

It's imported by pytest. I'm hoping that adding FastAPI dependency injection will properly clean up the situation.


#
# How to use this script:
# 1. Set the SDM_SCHEMAS_DIR environment variable to point to your sdm_schemas
Contributor:

This should be after step 2, and the traditional way to do this is to setup -r /path/to/sdm_schemas (or cd /path/to/sdm_schemas; setup -r .).

It should be after step 2 in case setup felis also sets up a version of sdm_schemas.

alembic-autogenerate.py (resolved thread)
from felis.tests.postgresql import setup_postgres_test_db

if len(sys.argv) <= 1:
print()
Contributor:

Using a """ triple-quoted string would be better than the multiple print() calls here.
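The suggestion amounts to replacing a run of print() calls with one module-level string. A minimal sketch, with hypothetical wording (the real script's message may differ):

```python
import sys

# Hypothetical usage text; the actual script's message may differ.
USAGE = """\
Usage: alembic-autogenerate.py REVISION_MESSAGE

Requires SDM_SCHEMAS_DIR to point at an sdm_schemas checkout.
"""


def main(argv: list[str]) -> int:
    # One print of the triple-quoted string replaces several print() calls.
    if len(argv) <= 1:
        print(USAGE, file=sys.stderr)
        return 1
    return 0
```

This also keeps the help text editable as a single block rather than scattered across statements.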

alembic-autogenerate.py (resolved thread)
}
the_schema = "cdb_latiss"
for destination_table in ("ccdexposure", "exposure_flexdata", "visit1_quicklook"):
for column in ("day_obs", "seq_num"):
Contributor:

Isn't it a lot more efficient to do both of these in the same UPDATE?
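For illustration, a single UPDATE can indeed populate both columns in one pass. This is a simplified sketch (sqlite3, minimal stand-in tables, not the PR's real schema): the table names echo the consdb ones, but the columns and data are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Stand-ins for the real tables; columns are simplified for illustration.
    CREATE TABLE exposure (exposure_id INTEGER PRIMARY KEY,
                           day_obs INTEGER, seq_num INTEGER);
    CREATE TABLE ccdexposure (exposure_id INTEGER,
                              day_obs INTEGER, seq_num INTEGER);
    INSERT INTO exposure VALUES (1, 20240911, 42);
    INSERT INTO ccdexposure (exposure_id) VALUES (1);
    """
)
# One UPDATE fills both day_obs and seq_num, instead of one UPDATE per column.
conn.execute(
    """
    UPDATE ccdexposure
    SET day_obs = (SELECT e.day_obs FROM exposure e
                   WHERE e.exposure_id = ccdexposure.exposure_id),
        seq_num = (SELECT e.seq_num FROM exposure e
                   WHERE e.exposure_id = ccdexposure.exposure_id)
    """
)
row = conn.execute("SELECT day_obs, seq_num FROM ccdexposure").fetchone()
```

Beyond efficiency, the combined form scans the destination table once rather than once per column.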

else:
return Version("3.1.0")

def get_day_obs_and_seq_num(self, instrument: str, exposure_id: int) -> tuple[int, int]:
Contributor:

If this becomes a performance problem, the day_obs and seq_num are computable from the exposure_id. Although that function shouldn't really be exposed to the users, having it in pqserver wouldn't be a problem.
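As a sketch of what "computable from the exposure_id" could look like: the packing factor below is an assumption for illustration (some Rubin instruments encode exposure_id as day_obs * 100000 + seq_num, but verify against the actual ID scheme before relying on it).

```python
# ASSUMPTION: exposure_id = day_obs * 100000 + seq_num. This packing is
# illustrative and must be checked against the instrument's real ID scheme.
PACK_FACTOR = 100000


def unpack_exposure_id(exposure_id: int) -> tuple[int, int]:
    """Recover (day_obs, seq_num) arithmetically, with no database round trip."""
    return divmod(exposure_id, PACK_FACTOR)
```

As the reviewer notes, this belongs inside pqserver as an optimization, not in the user-facing API.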

with engine.connect() as conn:
for key, value in value_dict.items():
value_str = str(value)

values = {"obs_id": obs_id, "key": key, "value": value_str}
if has_multi_column_primary_keys:
day_obs, seq_num = instrument_tables.get_day_obs_and_seq_num(instrument_l, obs_id)
Contributor:

Doesn't seem good to call this for every key/value pair. Move it outside the loop?

bbrondel (Collaborator, author):

I don't think that works because there is no guarantee the day_obs and seq_num will be the same for all items.

Contributor:

If instrument_l and obs_id are constant, day_obs/seq_num have to be constant. The only things that are changing are key and value.
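The hoist being suggested can be sketched as follows. This is an illustrative refactoring, not the PR's code: `lookup` stands in for instrument_tables.get_day_obs_and_seq_num, and the row-dict shape is simplified.

```python
from typing import Any, Callable


def build_rows(
    obs_id: int,
    value_dict: dict[str, Any],
    lookup: Callable[[int], tuple[int, int]],
) -> list[dict[str, Any]]:
    """Build one insert-row dict per key/value pair.

    obs_id is fixed for the whole request, so the day_obs/seq_num lookup
    is loop-invariant: do it once, before iterating over value_dict.
    """
    day_obs, seq_num = lookup(obs_id)  # one lookup, not one per key
    return [
        {
            "obs_id": obs_id,
            "key": key,
            "value": str(value),
            "day_obs": day_obs,
            "seq_num": seq_num,
        }
        for key, value in value_dict.items()
    ]
```

Since only key and value vary inside the loop, this trades N database lookups for one.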

python/lsst/consdb/pqserver.py (resolved thread)

with setup_postgres_test_db() as instance:
context = DatabaseContext(md, instance.engine)
print(f"{type(instance.engine)=}")
Contributor:

We generally shouldn't have "debugging" print() calls in the code. But maybe there's a good reason for it, as I see one in test_pqserver.py that seems to be left in despite a comment.

bbrondel (Collaborator, author):

No, it's just an oversight.

# source loadLSST.bash
# setup felis
# setup -r /path/to/sdm_schemas
# 2. Set the SDM_SCHEMAS_DIR environment variable to point to your sdm_schemas
Contributor:

This is done by the setup -r command above, so it's not necessary as a separate step.
(Oh, and if the already-installed sdm_schemas is adequate and not a local clone, then setup sdm_schemas is sufficient.)

alembic-autogenerate.py (resolved thread)
command.upgrade(alembic_cfg, "head")

# Autogenerate a new migration
command.revision(alembic_cfg, autogenerate=True, message=revision_message)
Contributor:

If all the migrations are going to have the same message, then does it make sense to combine all instruments into a single migration?

I guess I can see that operationally we might want to apply them at different times.

python/lsst/consdb/pqserver.py (resolved thread)
ktlim (Contributor), Sep 25, 2024:

Still the question about the foreign (and one unique) keys getting dropped. While I think there are enough indirect constraints on their content, having a direct constraint doesn't hurt, I think.


@bbrondel merged commit fb49ac2 into main on Sep 26, 2024. 6 checks passed.