DM-46073 Switch consdb to multi-column primary key (+1018 −126) #36
Conversation
# $SDM_SCHEMAS_DIR/yml/cdb_latiss.yaml,
# $SDM_SCHEMAS_DIR/yml/cdb_lsstcomcam.yaml, etc.
# 2. Load the LSST environment and setup felis.
# source loadLSST.bash
When I have used lsstsw, it has directed me to source bin/envconfig; is there a different purpose for this script? Feel free to reply outside of your PR.
Someone else is probably better positioned to answer this, but my understanding is that there are a few ways to install/load the LSST environment:
- newinstall.sh
- lsstinstall/eups
- lsstsw
The first two initialize with loadLSST.bash, whereas the third uses envconfig.
@@ -181,6 +182,26 @@ def get_timestamp_columns(self, table: str) -> set[str]:
        columns = self.timestamp_columns[table]
        return columns

    def get_schema_version(self, instrument: str) -> Version:
        if "day_obs" in self.schemas[instrument].tables[f"cdb_{instrument}.ccdexposure"].columns:
            return Version("3.2.0")
What happens if we are on 3.3.0? Would we have to change this every time we upgrade?
My hope is that Jeremy will modify felis so that the version number is stored in the database. We can then use a database query to get the version, with maybe something like the above as a fallback when the version number is not present. As we leave behind older schemas with no versioning in the database, this would become unnecessary.
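As a hedged sketch of that direction: prefer a version recorded in the database, fall back to structural inspection. The cdb_{instrument}.schema_version table and its version column are hypothetical until felis actually records one, and the packaging import for Version is an assumption.

```python
from packaging.version import Version
from sqlalchemy import Engine, text


def get_schema_version(engine: Engine, instrument: str, ccdexposure_columns: set[str]) -> Version:
    try:
        with engine.connect() as conn:
            # Hypothetical table written by felis; not present in older schemas.
            row = conn.execute(
                text(f"SELECT version FROM cdb_{instrument}.schema_version")
            ).first()
        if row is not None:
            return Version(row[0])
    except Exception:
        pass  # no version table yet; fall back to structural inspection
    return Version("3.2.0") if "day_obs" in ccdexposure_columns else Version("3.1.0")
```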
Is he aware of that request/expectation?
Force-pushed from c9a36a4 to 938e48b.
python/lsst/consdb/pqserver.py (outdated)
@@ -976,6 +1030,7 @@ def schema(
    return {c.name: [str(c.type), c.doc] for c in schema.tables[table].columns}


if __name__ == "__main__":
    logger.info(f"{__name__=}")
if __name__ == "__main__" or __name__ == "consdb_pq.pqserver":
Is this guard even really necessary? Is anyone else going to import this?
It's imported by pytest. I'm hoping that adding FastAPI dependency injection will properly clean up the situation.
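For illustration, a minimal sketch of the dependency-injection direction mentioned here, using FastAPI's Depends with an lru_cache-backed factory. The InstrumentTables placeholder and the route are illustrative, not pqserver's actual API.

```python
from functools import lru_cache

from fastapi import Depends, FastAPI

app = FastAPI()


class InstrumentTables:
    """Placeholder for pqserver's per-instrument schema/table bookkeeping."""


@lru_cache
def get_instrument_tables() -> InstrumentTables:
    # Built lazily on first request instead of at import time, so importing the
    # module (e.g. from pytest) has no side effects and no __name__ guard is needed.
    return InstrumentTables()


@app.get("/healthz")
def healthz(tables: InstrumentTables = Depends(get_instrument_tables)) -> dict[str, str]:
    return {"status": "ok", "tables": type(tables).__name__}
```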
alembic-autogenerate.py (outdated)
#
# How to use this script:
# 1. Set the SDM_SCHEMAS_DIR environment variable to point to your sdm_schemas
This should be after step 2, and the traditional way to do this is to setup -r /path/to/sdm_schemas (or cd /path/to/sdm_schemas; setup -r .). It should be after step 2 in case setup felis also sets up a version of sdm_schemas.
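A sketch of the reordered header this suggests, written as comments for the script (paths illustrative):

```python
# 1. Load the LSST environment and set up felis:
#      source loadLSST.bash
#      setup felis
# 2. Point at your sdm_schemas checkout (after felis, in case `setup felis`
#    pulls in its own sdm_schemas):
#      setup -r /path/to/sdm_schemas    # or: cd /path/to/sdm_schemas; setup -r .
```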
alembic-autogenerate.py (outdated)
from felis.tests.postgresql import setup_postgres_test_db

if len(sys.argv) <= 1:
    print()
Use of """ would be better than the multiple print() calls here.
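A minimal sketch of that suggestion, with illustrative wording for the usage text:

```python
import sys

USAGE = """\
Usage: alembic-autogenerate.py REVISION_MESSAGE

Set up the LSST environment and sdm_schemas first (see the header comments),
then pass the message to attach to the autogenerated Alembic revisions.
"""

if len(sys.argv) <= 1:
    print(USAGE)
    sys.exit(1)
```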
}
the_schema = "cdb_latiss"
for destination_table in ("ccdexposure", "exposure_flexdata", "visit1_quicklook"):
    for column in ("day_obs", "seq_num"):
Isn't it a lot more efficient to do both of these in the same UPDATE?
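For illustration, a sketch of folding both columns into one statement per table. The join to the exposure table and the exposure_id column name are assumptions (the join key may be obs_id in some tables), and this presumes the loop lives inside the migration's upgrade().

```python
from alembic import op

the_schema = "cdb_latiss"
for destination_table in ("ccdexposure", "exposure_flexdata", "visit1_quicklook"):
    # One UPDATE per table, setting both columns at once (PostgreSQL UPDATE ... FROM).
    op.execute(
        f"UPDATE {the_schema}.{destination_table} AS t "
        f"SET day_obs = e.day_obs, seq_num = e.seq_num "
        f"FROM {the_schema}.exposure AS e "
        f"WHERE t.exposure_id = e.exposure_id"
    )
```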
        else:
            return Version("3.1.0")

    def get_day_obs_and_seq_num(self, instrument: str, exposure_id: int) -> tuple[int, int]:
If this becomes a performance problem, the day_obs and seq_num are computable from the exposure_id. Although that function shouldn't really be exposed to the users, having it in pqserver wouldn't be a problem.
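A sketch of that computation, assuming the common day_obs * 100000 + seq_num packing of exposure_id; verify against the real ID scheme before relying on it.

```python
def day_obs_seq_num_from_exposure_id(exposure_id: int) -> tuple[int, int]:
    # Assumes exposure_id = day_obs * 100000 + seq_num; not necessarily what consdb does.
    return exposure_id // 100_000, exposure_id % 100_000


assert day_obs_seq_num_from_exposure_id(2024032100042) == (20240321, 42)
```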
python/lsst/consdb/pqserver.py (outdated)
with engine.connect() as conn:
    for key, value in value_dict.items():
        value_str = str(value)

        values = {"obs_id": obs_id, "key": key, "value": value_str}
        if has_multi_column_primary_keys:
            day_obs, seq_num = instrument_tables.get_day_obs_and_seq_num(instrument_l, obs_id)
Doesn't seem good to call this for every key/value pair. Move it outside the loop?
I don't think that works because there is no guarantee the day_obs and seq_num will be the same for all items.
If instrument_l and obs_id are constant, day_obs/seq_num have to be constant. The only things that are changing are key and value.
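A sketch of the hoisted version, reusing the names from the excerpt above; where day_obs/seq_num end up in the INSERT values is an assumption here, not necessarily the PR's actual handling.

```python
def insert_flexdata(engine, instrument_tables, instrument_l, obs_id, value_dict,
                    has_multi_column_primary_keys):
    with engine.connect() as conn:
        if has_multi_column_primary_keys:
            # Looked up once: instrument_l and obs_id do not change inside the loop.
            day_obs, seq_num = instrument_tables.get_day_obs_and_seq_num(instrument_l, obs_id)
        for key, value in value_dict.items():
            values = {"obs_id": obs_id, "key": key, "value": str(value)}
            if has_multi_column_primary_keys:
                values["day_obs"] = day_obs
                values["seq_num"] = seq_num
            ...  # build and execute the INSERT/UPSERT on conn with `values`, as in the PR
```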
tests/test_hinfo.py (outdated)
with setup_postgres_test_db() as instance:
    context = DatabaseContext(md, instance.engine)
    print(f"{type(instance.engine)=}")
We generally shouldn't have "debugging" print() calls in the code. But maybe there's a good reason for it, as I see one in test_pqserver.py that seems to be left in despite a comment.
No, it's just an oversight.
Force-pushed from f4571bb to c93f653.
alembic-autogenerate.py (outdated)
# source loadLSST.bash
# setup felis
# setup -r /path/to/sdm_schemas
# 2. Set the SDM_SCHEMAS_DIR environment variable to point to your sdm_schemas
This is done by the setup -r command above, so it's not necessary as a separate step. (Oh, and if the already-installed sdm_schemas is adequate and not a local clone, then setup sdm_schemas is sufficient.)
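Folding that in, the header could shrink to something like this sketch (paths illustrative), since setup -r already sets SDM_SCHEMAS_DIR:

```python
# source loadLSST.bash
# setup felis
# setup -r /path/to/sdm_schemas    # or `setup sdm_schemas` if the installed one is adequate
```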
command.upgrade(alembic_cfg, "head")

# Autogenerate a new migration
command.revision(alembic_cfg, autogenerate=True, message=revision_message)
If all the migrations are going to have the same message, then does it make sense to combine all instruments into a single migration? I guess I can see that operationally we might want to apply them at different times.
There is still the question about the foreign (and one unique) keys getting dropped. While I think there are enough indirect constraints on their content, having a direct constraint doesn't hurt, I think.
Force-pushed from e503953 to cec5b9a.
This PR is also not as bad as the line counts would suggest, because a lot of that is machine-generated Alembic migrations. Take note, though, that there is handwritten code in the Alembic files, at the end of upgrade(). Most of the changes are meant to migrate or interoperate between databases with an exposure_id primary key and a day_obs + seq_num primary key. This goes along with a PR I have ready for lsst/sdm_schemas, not yet submitted.
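As one illustration of the interoperation described here, a hedged sketch of selecting the lookup predicate from whichever primary key the deployed schema has; names and the selection mechanism are illustrative, not the PR's actual code.

```python
import sqlalchemy as sa


def ccdexposure_predicate(table: sa.Table, *, exposure_id: int | None = None,
                          day_obs: int | None = None, seq_num: int | None = None):
    # Newer schemas (day_obs + seq_num PK) take the two-column path; older ones
    # (exposure_id PK) fall back to the single-column lookup.
    if day_obs is not None and seq_num is not None and "day_obs" in table.c:
        return sa.and_(table.c.day_obs == day_obs, table.c.seq_num == seq_num)
    return table.c.exposure_id == exposure_id
```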