We need a better definition for artefact. #12

Open
phochste opened this issue Dec 8, 2021 · 2 comments

phochste commented Dec 8, 2021

Currently in Overview the artefact is defined as:

A web resource such as a file or document that serves as the object of exchange between actors and therefore is the smallest divisable unit on the network.

While this text is clear in our colloquial usage of the term in our discussions, it leaves the exact meaning of the term open to interpretation in light of lifecycle events and the possibility of complex objects. Even in our internal communications, the artefact sometimes means the PDF file, sometimes the PDF plus a metadata file, and sometimes the landing page (which is assumed to carry the semantics, e.g. Signposting, that make clear what the complex-object artefact is composed of).

E.g. when archiving an artefact, it is clear what this means for a single File / Bitstream and, to a lesser extent, for a Representation of the artefact. But in an archival context this becomes a bit of a slippery slope when talking about complex objects.

E.g. in PREMIS, the artefact under consideration is something other than the smallest divisible unit on the network. They talk about an Intellectual Entity that

[sic] is a distinct intellectual or artistic creation that is considered relevant to a
designated community in the context of digital preservation: for example, a particular book, map,
photograph, database, or hardware or software. An Intellectual Entity can include other
Intellectual Entities; for example, a web site can include a web page and a web page can include
an image [my emphasis]. An Intellectual Entity may have one or more digital or non-digital Representations.

Ref: https://www.loc.gov/standards/premis/v3/premis-3-0-final.pdf

In the context of interaction events (e.g. annotation of artefacts), the object of interaction can be a fragment of what we have in mind as an indivisible artefact.

E.g. in Web Annotation, the target (which would correspond to the artefact in our case) is described as follows:

In particular,
The Target or Body resource may be a specific segment of the resource.
The Target or Body resource may be styled in a specific way.
The Target or Body resource may be a specific state of the resource.
The Target or Body resource may be included in the Annotation to play a specific role.
The Target or Body resource may be any combination of the above.
Ref: https://www.w3.org/TR/annotation-model/
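
To make the "specific segment" case concrete, here is a minimal sketch of a Web Annotation whose target is a fragment of an artefact rather than the whole artefact. The `@context`, `SpecificResource`-style `source`/`selector` structure and `FragmentSelector` come from the Web Annotation model; the pod URL, the annotation `id`, the helper name `annotate_fragment` and the choice of a PDF page fragment are assumptions made up for illustration.

```python
import json


def annotate_fragment(artefact_iri, fragment, comment):
    """Build a Web Annotation (as a plain dict) targeting one segment of an artefact."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        # Hypothetical identifier for the annotation itself.
        "id": "https://pod.example.org/annotations/1",
        "type": "Annotation",
        "body": {"type": "TextualBody", "value": comment},
        "target": {
            # The artefact as a whole ...
            "source": artefact_iri,
            # ... narrowed down to a segment of it.
            "selector": {
                "type": "FragmentSelector",
                # RFC 3778 defines fragment identifiers for PDF.
                "conformsTo": "http://tools.ietf.org/rfc/rfc3778",
                "value": fragment,
            },
        },
    }


annotation = annotate_fragment(
    "https://pod.example.org/paper.pdf", "page=3", "Please check this figure."
)
print(json.dumps(annotation, indent=2))
```

Here the object of interaction is page 3, while the thing we would colloquially call the artefact is still the PDF named in `source`.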

The question: is our definition deliberately made vague to accommodate all these use cases (and, as a corollary, is it vague enough in this regard), or do we really have a more formal understanding of what an artefact is (and what it is not)?

It is quite possible that what an artefact is depends on the use case. If you just get a reference, then it is the name of the artefact in some pod that you can dereference. By dereferencing it, one can learn more about it:

  • Is it a complex object
  • Is it a versioned object
  • Is it a fragment
  • Is it a particular representation of an object
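
As a rough sketch of how those questions could be answered after dereferencing, the snippet below inspects an HTTP `Link` header of the kind Signposting uses. It is only a heuristic under stated assumptions: that the pod exposes `item` and `collection` links as in Signposting and the version relations from RFC 5829, and that the simplified header parser is good enough (it does not handle commas inside quoted parameter values). The function names and example URLs are hypothetical.

```python
import re
from urllib.parse import urlparse

# Simplified Link header grammar: <target>; param="value"; ...
LINK_RE = re.compile(r'<([^>]+)>((?:\s*;\s*[^;,]+)*)')
PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*"?([^";,]+)"?')

# Link relations for versioning defined in RFC 5829.
VERSION_RELS = ("latest-version", "predecessor-version",
                "successor-version", "version-history")


def parse_link_header(value):
    """Return a mapping from link relation to the list of target URLs."""
    links = {}
    for target, params in LINK_RE.findall(value):
        for name, val in PARAM_RE.findall(params):
            if name.lower() == "rel":
                # A rel parameter may carry several space-separated relations.
                for rel in val.split():
                    links.setdefault(rel.lower(), []).append(target)
    return links


def describe_artefact(url, link_header):
    """Heuristically answer the four questions for a dereferenced artefact."""
    links = parse_link_header(link_header)
    return {
        "complex_object": bool(links.get("item")),
        "versioned": any(rel in links for rel in VERSION_RELS),
        "fragment": bool(urlparse(url).fragment),
        # Assumption: a content resource points back to its landing page
        # with rel="collection".
        "representation_of": links.get("collection", []),
    }


header = ('<https://pod.example.org/paper.pdf>; rel="item"; type="application/pdf", '
          '<https://pod.example.org/paper.json>; rel="describedby", '
          '<https://pod.example.org/paper/v2>; rel="latest-version"')
summary = describe_artefact("https://pod.example.org/paper", header)
print(summary)
```

In a real client the header value would come from a HEAD or GET request on the artefact URL; it is hard-coded here so that the sketch runs without network access.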

We are not going to solve the problem of dealing with complex objects, but we need to be a bit clearer about what an artefact can mean in our specs.


phochste commented Dec 8, 2021

I see that PREMIS is deliberately vague too but explains this vagueness:

Event contains the identifier of the Object involved. What is important is that this association is
arbitrary and is not meant to imply that a particular implementation is required. The choice of
semantic unit is down to individual implementations.

In some cases a semantic unit takes the form of a container that groups a set of related semantic
units. For example, the semantic unit identifier groups the two semantic units identifierType and
identifierValue. The grouped subunits are called semantic components of the container. Some
containers are defined as extension containers, to allow the use of metadata encoded according
to an external schema. This enables PREMIS to be extended with metadata elements that are
more granular, non-core, or otherwise out of scope for the Data Dictionary.


mielvds commented Dec 21, 2021

The question: is our definition deliberately made vague to accommodate all these use cases (and, as a corollary, is it vague enough in this regard), or do we really have a more formal understanding of what an artefact is (and what it is not)?

Yes, deliberately vague, and no, I don't think we need a formal understanding beyond 'it needs to be identifiable'. The reasoning is that our network should really be able to consider artefacts as black boxes. If not, a decent level of scalable interop will be hard to achieve. And since it doesn't matter to the network what the artefact is, you can use its components for a complex object, a file, a fragment, ... however...

It is quite possible that what an artefact is depends on the use case.

... the use case should probably be more specific about what the possible artefacts are.
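
To illustrate the black-box idea, here is a minimal sketch in which the network layer only requires that an artefact is identifiable, and treats a file, a fragment and a landing page in exactly the same way. The class `ArtefactRef`, the function `build_message` and the message keys are hypothetical names for illustration, not something taken from our specs.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class ArtefactRef:
    """Hypothetical opaque handle: the network only knows the identifier."""
    iri: str

    def __post_init__(self):
        # The only requirement: the artefact must be identifiable,
        # here approximated as "has a scheme and an authority".
        parts = urlparse(self.iri)
        if not (parts.scheme and parts.netloc):
            raise ValueError(f"not an absolute IRI: {self.iri!r}")


def build_message(artefact, event):
    """Build a message without looking inside the artefact (illustrative keys)."""
    return {"event": event, "object": artefact.iri}


# A file, a fragment and a landing page are all handled identically.
pdf = ArtefactRef("https://pod.example.org/paper.pdf")
page = ArtefactRef("https://pod.example.org/paper.pdf#page=3")
landing = ArtefactRef("https://pod.example.org/paper")
messages = [build_message(ref, "announce") for ref in (pdf, page, landing)]
print(messages)
```

Anything more specific, such as which kinds of artefact a service accepts, would then be stated per use case rather than in the generic layer.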

If you just get a reference, then it is the name of the artefact in some pod that you can dereference. By dereferencing it, one can learn more about it:

* Is it a complex object

* Is it a versioned object

* Is it a fragment

* Is it a particular representation of an object

Yep! But then we are moving beyond the scope of this project, or at least this 'generic base'.

We are not going to solve the problem of dealing with complex objects, but we need to be a bit clearer about what an artefact can mean in our specs.

That's definitely a good idea. We can give concrete examples for the use cases.
