DM-46073 Switch consdb to multi-column primary key #36 (+1018 −126)

Merged: 23 commits into main on Sep 26, 2024

Conversation

bbrondel (Collaborator):

This PR is also not as bad as the line counts would suggest, because a lot of that is machine-generated alembic migrations.

Take note, though, that there is handwritten code in the alembic files, at the end of upgrade().

Most of the changes are meant to migrate or interoperate between databases with exposure_id PK and day_obs + seq_num PK.

This goes along with a PR I have ready for lsst/sdm_schemas, not yet submitted.

@bbrondel marked this pull request as ready for review on September 11, 2024 03:47
# $SDM_SCHEMAS_DIR/yml/cdb_latiss.yaml,
# $SDM_SCHEMAS_DIR/yml/cdb_lsstcomcam.yaml, etc.
# 2. Load the LSST environment and setup felis.
# source loadLSST.bash
Collaborator:

When I have used lsstsw, it has directed me to source bin/envconfig; is there a different purpose for this script? Feel free to reply outside of your PR.

bbrondel (Collaborator, author):

Someone else is probably better positioned to answer this, but my understanding is that there are a few ways to install/load the LSST environment:

  • newinstall.sh
  • lsstinstall/eups
  • lsstsw

The first two initialize with loadLSST.bash, whereas the third uses envconfig.

@@ -181,6 +182,26 @@ def get_timestamp_columns(self, table: str) -> set[str]:
columns = self.timestamp_columns[table]
return columns

def get_schema_version(self, instrument: str) -> Version:
if "day_obs" in self.schemas[instrument].tables[f"cdb_{instrument}.ccdexposure"].columns:
return Version("3.2.0")
Collaborator:

What happens if we are on 3.3.0? Would we have to change this every time we upgrade?

bbrondel (Collaborator, author):

My hope is that Jeremy will modify felis so that the version number is stored in the database. We can then use a database query to get the version, with maybe something like the above as a fallback when the version number is not present. As we leave behind older schemas with no versioning in the database, this would become unnecessary.

Collaborator:

Is he aware of that request/expectation?

Resolved review threads: tests/test_hinfo.py; tests/test_pqserver.py (four threads, one outdated).
@@ -976,6 +1030,7 @@ def schema(
return {c.name: [str(c.type), c.doc] for c in schema.tables[table].columns}


if __name__ == "__main__":
logger.info(f"{__name__=}")
if __name__ == "__main__" or __name__ == "consdb_pq.pqserver":
Contributor:

Is this guard even really necessary? Is anyone else going to import this?

bbrondel (Collaborator, author), Sep 23, 2024:

It's imported by pytest. I'm hoping that adding FastAPI dependency injection will properly clean up the situation.


#
# How to use this script:
# 1. Set the SDM_SCHEMAS_DIR environment variable to point to your sdm_schemas
Contributor:

This should be after step 2, and the traditional way to do this is to setup -r /path/to/sdm_schemas (or cd /path/to/sdm_schemas; setup -r .).

It should be after step 2 in case setup felis also sets up a version of sdm_schemas.

alembic-autogenerate.py (resolved thread)
from felis.tests.postgresql import setup_postgres_test_db

if len(sys.argv) <= 1:
print()
Contributor:

Using a """ triple-quoted string would be better than the multiple print() calls here.
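The suggestion amounts to replacing a run of print() calls with one module-level string. A minimal sketch, with hypothetical wording (the real script's message may differ):

```python
import sys

# Hypothetical usage text; the actual script's message may differ.
USAGE = """\
Usage: alembic-autogenerate.py REVISION_MESSAGE

Requires SDM_SCHEMAS_DIR to point at an sdm_schemas checkout.
"""


def main(argv: list[str]) -> int:
    # One print of the triple-quoted string replaces several print() calls.
    if len(argv) <= 1:
        print(USAGE, file=sys.stderr)
        return 1
    return 0
```

This also keeps the help text editable as a single block rather than scattered across statements.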

alembic-autogenerate.py (resolved thread)
}
the_schema = "cdb_latiss"
for destination_table in ("ccdexposure", "exposure_flexdata", "visit1_quicklook"):
for column in ("day_obs", "seq_num"):
Contributor:

Isn't it a lot more efficient to do both of these in the same UPDATE?
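For illustration, a single UPDATE can indeed populate both columns in one pass. This is a simplified sketch (sqlite3, minimal stand-in tables, not the PR's real schema): the table names echo the consdb ones, but the columns and data are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Stand-ins for the real tables; columns are simplified for illustration.
    CREATE TABLE exposure (exposure_id INTEGER PRIMARY KEY,
                           day_obs INTEGER, seq_num INTEGER);
    CREATE TABLE ccdexposure (exposure_id INTEGER,
                              day_obs INTEGER, seq_num INTEGER);
    INSERT INTO exposure VALUES (1, 20240911, 42);
    INSERT INTO ccdexposure (exposure_id) VALUES (1);
    """
)
# One UPDATE fills both day_obs and seq_num, instead of one UPDATE per column.
conn.execute(
    """
    UPDATE ccdexposure
    SET day_obs = (SELECT e.day_obs FROM exposure e
                   WHERE e.exposure_id = ccdexposure.exposure_id),
        seq_num = (SELECT e.seq_num FROM exposure e
                   WHERE e.exposure_id = ccdexposure.exposure_id)
    """
)
row = conn.execute("SELECT day_obs, seq_num FROM ccdexposure").fetchone()
```

Beyond efficiency, the combined form scans the destination table once rather than once per column.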

else:
return Version("3.1.0")

def get_day_obs_and_seq_num(self, instrument: str, exposure_id: int) -> tuple[int, int]:
Contributor:

If this becomes a performance problem, the day_obs and seq_num are computable from the exposure_id. Although that function shouldn't really be exposed to the users, having it in pqserver wouldn't be a problem.
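As a sketch of what "computable from the exposure_id" could look like: the packing factor below is an assumption for illustration (some Rubin instruments encode exposure_id as day_obs * 100000 + seq_num, but verify against the actual ID scheme before relying on it).

```python
# ASSUMPTION: exposure_id = day_obs * 100000 + seq_num. This packing is
# illustrative and must be checked against the instrument's real ID scheme.
PACK_FACTOR = 100000


def unpack_exposure_id(exposure_id: int) -> tuple[int, int]:
    """Recover (day_obs, seq_num) arithmetically, with no database round trip."""
    return divmod(exposure_id, PACK_FACTOR)
```

As the reviewer notes, this belongs inside pqserver as an optimization, not in the user-facing API.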

with engine.connect() as conn:
for key, value in value_dict.items():
value_str = str(value)

values = {"obs_id": obs_id, "key": key, "value": value_str}
if has_multi_column_primary_keys:
day_obs, seq_num = instrument_tables.get_day_obs_and_seq_num(instrument_l, obs_id)
Contributor:

Doesn't seem good to call this for every key/value pair. Move it outside the loop?

bbrondel (Collaborator, author):

I don't think that works because there is no guarantee the day_obs and seq_num will be the same for all items.

Contributor:

If instrument_l and obs_id are constant, day_obs/seq_num have to be constant. The only things that are changing are key and value.
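The hoist being suggested can be sketched as follows. This is an illustrative refactoring, not the PR's code: `lookup` stands in for instrument_tables.get_day_obs_and_seq_num, and the row-dict shape is simplified.

```python
from typing import Any, Callable


def build_rows(
    obs_id: int,
    value_dict: dict[str, Any],
    lookup: Callable[[int], tuple[int, int]],
) -> list[dict[str, Any]]:
    """Build one insert-row dict per key/value pair.

    obs_id is fixed for the whole request, so the day_obs/seq_num lookup
    is loop-invariant: do it once, before iterating over value_dict.
    """
    day_obs, seq_num = lookup(obs_id)  # one lookup, not one per key
    return [
        {
            "obs_id": obs_id,
            "key": key,
            "value": str(value),
            "day_obs": day_obs,
            "seq_num": seq_num,
        }
        for key, value in value_dict.items()
    ]
```

Since only key and value vary inside the loop, this trades N database lookups for one.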

python/lsst/consdb/pqserver.py (resolved thread)

with setup_postgres_test_db() as instance:
context = DatabaseContext(md, instance.engine)
print(f"{type(instance.engine)=}")
Contributor:

We generally shouldn't have "debugging" print() calls in the code. But maybe there's a good reason for it, as I see one in test_pqserver.py that seems to be left in despite a comment.

bbrondel (Collaborator, author):

No, it's just an oversight.

# source loadLSST.bash
# setup felis
# setup -r /path/to/sdm_schemas
# 2. Set the SDM_SCHEMAS_DIR environment variable to point to your sdm_schemas
Contributor:

This is done by the setup -r command above, so it's not necessary as a separate step.
(Oh, and if the already-installed sdm_schemas is adequate and not a local clone, then setup sdm_schemas is sufficient.)

alembic-autogenerate.py (resolved thread)
command.upgrade(alembic_cfg, "head")

# Autogenerate a new migration
command.revision(alembic_cfg, autogenerate=True, message=revision_message)
Contributor:

If all the migrations are going to have the same message, then does it make sense to combine all instruments into a single migration?

I guess I can see that operationally we might want to apply them at different times.

python/lsst/consdb/pqserver.py (resolved thread)
ktlim (Contributor), Sep 25, 2024:

Still the question about the foreign (and one unique) keys getting dropped. While I think there are enough indirect constraints on their content, having a direct constraint doesn't hurt, I think.


@bbrondel merged commit fb49ac2 into main on Sep 26, 2024. 6 checks passed.