-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tv_grab_zz_sdjson: improve episode/season metadata handling #222
tv_grab_zz_sdjson: improve episode/season metadata handling #222
Conversation
I just noticed my commit changes the file permissions which I should probably fix before merging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite sure what the "|| !length($season)" in line 1277 purports to do. It looks like
if we have not seen the season number yet, but we did see the episode number before, we will discard the episode number previously found in favor of whatever we found this time (even if $_season is empty!).
The code seems to work as intended, but I just cannot get my head around the if stmt.
Does the fix work when the metadata contains just an episode number but no series? |
I've seen programs (using EPG data from Channel 5+1) with only the episode number getting extracted correctly. In that case $season = $_season is executed only once and passed down for storing (unless there is also a provider that only contains the Season later on in the array of providers). |
It is hard to decide if the code is correct without agreeing on the requirements. I didn’t even fully formulate a full set of requirements in my mind when I wrote and pushed my first attempt at fix. Some thoughts I had while working it: When there is more than one source of metadata, which do you choose to include in the xml output? Do you ever mix and match info from different providers? If one provider only has the season and one only has the episode do you combine them? What if the providers have conflicting information, which do you use? My initial thinking was to never mix and match info from different providers but instead try to find the most detailed entry, preferring earlier entries with the same level of detail but possibly conflicting info. But if one source only has season and one only has episode which one is “more detailed”? The API for the source data from Schedules Direct states the only mandatory field is season. So I was giving preference to season over episode and that is where the
If the source data follows the spec this should never happen. But maybe it should be considered anyway, especially since it seems @rob-vh has found a case where it happens . Right now I think it will only use an isolated episode number if it was the first metadata entry in the array. So there is probably room for improvement there. But before tweaking the code further I would be interested to hear others thoughts on possibly combining info from multiple providers or other requirements in general. |
Scanning through Channel5 +1 EPG I found an occurrence where Gracenote has only the season but TVMaze has both fields: "metadata":[{"TVmaze":{"season":2,"episode":34,"url":"https://www.tvmaze.com/episodes/2215086/the-adventures-of-paddington-2x34-paddington-feels-the-music"}},{"Gracenote":{"season":2}}] So I would not like to wipe the info collected from TVmaze when Gracenote only contains partial information. Earlier I saw isolated episode numbers with no season, but not in the data I have checked today. |
I realize why I have isolated episode numbers in my data. I also parse the Smm/Enn info from over the air EPG, esp. for sat channels that SD does not cover. Just now I saw Doctor Who mentioned on ONS with E01. |
I probably wouldn't recommend merging different sources. I have seen many instances where Gracenote, Epguides website, IMDb website have different opinions. Althoough, to be fair, it mostly seems to be IMDb which is wrong (Gracenote and Epguides usually agree). |
(In the UK at least) we used to get isolated episode numbers, particularly for things like soaps, or multi-part documentaries. Recently though it seems the soaps are coming through with the year as the season (S/E = year/num-in-year). I assume that is in the Gracenote data, and seems to be something invented by Gracenote. For example, EastEnders tonight is listed on TVGuide as Episode 6843, but I think Gracenote has it as Season 2024 Episode 12 ! TVGuide as Ep. 6843 Many guides use the isolated episode number ("6843") It seems the json feed S/E numbers are invented by Gracenote (or SD?). Especially since they are actually out of sequence: S/E 2024/11 Friday 19 January, 2024 (EP100330186088) Whether Gracenote / SD should be inventing S/E numbers is maybe a discussion for another thread. |
I don't know who invented it, but I have seen the series being the year in more than one guide source, so it seems to have become some new form of normal.
Are they really out of some, arguably valid, sequence? As discussed before, there are numerous orderings to chose from. Some may be original production numbers/order (as assigned during the planning process at the beginning of the series just to have unique numbers to fill in the blanks later), some may be order of start, or completion, of production (some shows may start production earlier than others but take longer to complete than others), some may be scheduled broadcast order, and some may be actual broadcast order (which due to preemption may not be the same as scheduled broadcast order). All of those orders may correct to some of the people some of the time for some of the shows (and I have seen some of all of those values from different sources). |
Spoken like a true academic :-) Anyone watching this serial in that s/e order is going to get very confused though (with dramatic events happening out of sequence) |
I would, however, be great to have consistency in the choice of season number. I use PVR automation logic to record episodes of favorite shows for season > season last completed. So flipping between "sequence season" and "annual season" numbers is going to hurt. |
tv_grab_na_dd uses the Gracenote "flat file" feed as it's data source.
That doesn't provide Season/Episode numbers are we humans understated
it. It passes on a producer reported (and used) episode number that
makes sense only to the series. It calls it the episode system
"onscreen". It also provides a "dd_progid" system to pass on the
Gracenote Episode-id. It doesn't know (nor does it set), the "normal"
xmltv_ns system.
I think a grabber should report all available episode-num options using
different "system" attributes, and report its best guess (based on
knowledge of the data source) in the "xmltv_ns" system.
Because
|
And then there is the order that the episodes SHOULD have been broadcast in (and should be viewed in) except for the stupidity of certain network executives (cough Firefly cough). And when a series gets rebooted, some (but not all) guide sources start with a new series (season) number of 1. Standards are wonderful things, and there are many of them: https://xkcd.com/927/ |
That sounds very sensible. Then it is up to the consuming program to pick what it thinks best. ( I can easily imagine a situation where a source has good data for US programmes but poor for UK ones (in which they are not really interested!). And vice-versa. ) BST (blue-sky thinking): Perhaps the grabber could offer a choice of which source to use for the xmltv_ns system? |
On 1/25/2024 2:34 AM, Geoff wrote:
BST (blue-sky thinking): Perhaps the grabber could offer a choice of
which source to use for the xmltv_ns system?
I'll match your BST and raise...
An external utility script that promotes your favorite xmltv_ns system
from any xmltv file. :)
(personally, I don't think it's necessary, but for those in a situation
with a bad xmltv_ns source, it could be handy)
Robert
|
I am not sure if all of the above helps or just leaves me even more confused... I think everyone pretty much agrees that mixing and matching the data from different providers for the
This sounds reasonable. And it looks like the But it is unclear to me how the SD JSON data should be formatted in the XML. Sample output from
Is this schema defined anywhere? Or does each grabber author just make up whatever they want? It seems strange to have multiple entries with the same
In some ways it seems And if we are just trying to pass on all the keys in the SD JSON data in a uniform way maybe it should be something more like:
I don't really have a strong preference myself on any of this. Maybe this is already defined a bit better somewhere. Ideally the different xmltv grabbers would be somewhat consistent. |
Some random thoughts (which should likely be ignored):
(*) At least for content originated in the US. Outside of the US Gracenote has to depend (to various extents) on the local guide source providers, but, importantly, Gracenote is still responsible for curating the data, and they pay people to do it. |
I agree. There has not been a general call for numbering systems outside of Any other numbering systems (i.e. other 'system' elements/attributes) is simply froth. It MAY be used by a downstream program but I think that would be a niche use case. The focus here should be on populating Only if Gracenote was fully empty should another provider be considered for populating |
a0c8c88
to
a3bd1b4
Compare
Thanks everyone for the feedback. I have been getting more helpful feedback than expected.
That is about the only thing I have been sure of in this discussion. It was so long ago when I wrote that code I don't remember what I was thinking at the time but I would like to think I knew it was dumb / wrong at the time and that it was just a quick temporary way to get the the other parts working and then I forget to fix it. Based on the most recent comments I am also starting to think it is best to keep things simple for now (and probably for a long time) and only populate I just pushed an updated version of my branch which implements the above. |
The program metadata from SD is an array of different providers. The previous code was specifically looking for 'Gracenote' provider in the first array element and ignoring everything else. For example, it seems the first element is now often 'TVmaze' (at least for the lineups I use) and the episode/season data was all being ignored. The code now iterates over all the metadata providers preferring Gracenote but using the first available as a fallback. Fixes: XMLTV#221
a3bd1b4
to
ad15def
Compare
And of course immediately after pushing the update I see a mistake... so I just pushed again. But with my luck there is still something stupid in there. |
@honir @rmeden @garybuhrmaster @kgroeneveld With a view to getting a new release out very soon, are we happy that the latest changes provide the expected data (if available)? |
No news is good news? I have been using the latest version of this PR without any issues on my MythTV system from around the time I pushed it. Although, I must admit I don't pay much attention to the episode / season metadata. I did also do some more specific testing with various hand created test data and everything I tried worked as per my expectations. Unless someone spots a major bug I say we should merge it before the next release. It is definitely wrong in the current release. Maybe I should just merge it myself in a couple days if I don't hear anything. I think I can merge it... I have not made an actual commit / merge since XMLTV was still in CVS! |
Thank you @kgroeneveld ! |
What type of Pull Request is this?
Does this PR close any currently open issues?
#221
Please explain what this PR does
The program metadata from SD is an array of different providers. The previous code was specifically looking for 'Gracenote' provider in the first array element and ignoring everything else. For example, it seems the first element is now often 'TVmaze' (at least for the lineups I use) and the episode/season data was all being ignored.
The code now iterates over all the metadata providers generally preferring the earlier entries unless a later one looks more complete.
Where have you tested these changes?
Operating System:
ubuntu 22.04
Perl Version:
v5.34.0
I have only tested this very quickly so far. I would like to do more testing / review before merging. But I thought I would create the PR now in case @rob-vh who reported the issue wants to test it or is very eager for a fix.