[REEF-1978] Adding Checkpoint handler for IMRU master #1429
* Adding IMRUCheckpointHandler to handle task state persistence
* Added configuration module for IMRUCheckpointHandler
* Update IMRUJobDefination to allow clients to set checkpoint configuration
* Add UpdateTaskStateCodec implementation
* Update IMRU examples to set checkpoint config and call the checkpoint handler
* Update test cases

JIRA: [REEF-1978](https://issues.apache.org/jira/browse/REEF-1978)

This closes #
I did a surface level read:
- Please document all `public` classes, interfaces and their `public` members.
- Make all new `public` classes `sealed`.
- Check constructor parameters in the constructor instead of in methods to facilitate early failure and concise code.
- Reformat all log lines not to contain `####` and such. Also, consider moving them to higher log levels.
- Reduce the number of new `public` classes and interfaces where possible.
@@ -174,5 +176,24 @@ protected virtual IConfiguration BuildMapperFunctionConfig()
        GenericType<BroadcastReceiverReduceSenderMapFunction>.Class)
        .Build();
}

/// <summary>
/// Build checkpoint configuration. Subclass can override it.
I'd prefer not to use `public abstract` classes as APIs. Consider re-structuring this using composition instead of inheritance.
This follows the existing pattern for getting configurations. It lets clients share the same CreateJobDefinitionBuilder while overriding the configuration in their own way. If we really want to change it, that needs to happen in a different PR, as the change must be consistent across the other methods.
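For reference, a minimal sketch of what the composition suggestion could look like, assuming hypothetical `ConfigSketch` and `JobDefinitionBuilderSketch` types (neither is part of REEF or of this PR). The checkpoint configuration is handed in as a delegate instead of being supplied by overriding a `protected virtual` method:

```csharp
using System;

// Hypothetical stand-in for a configuration object; not REEF's IConfiguration.
public sealed class ConfigSketch
{
    public ConfigSketch(string description)
    {
        Description = description;
    }

    public string Description { get; private set; }
}

// Hypothetical builder: the caller composes in a configuration provider
// rather than subclassing the builder to override a virtual method.
public sealed class JobDefinitionBuilderSketch
{
    private readonly Func<ConfigSketch> _checkpointConfigProvider;

    public JobDefinitionBuilderSketch(Func<ConfigSketch> checkpointConfigProvider)
    {
        if (checkpointConfigProvider == null)
        {
            throw new ArgumentNullException(nameof(checkpointConfigProvider));
        }
        _checkpointConfigProvider = checkpointConfigProvider;
    }

    // Delegates to whatever provider the client passed in.
    public ConfigSketch BuildCheckpointConfig()
    {
        return _checkpointConfigProvider();
    }
}
```

A client would then write something like `new JobDefinitionBuilderSketch(() => new ConfigSketch("file-based"))`, and the builder class itself can stay `sealed`.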
/// <summary>
/// Build checkpoint configuration. Subclass can override it.
/// </summary>
protected override IConfiguration BuildCheckpointConfig()
Same as above.
Same as above. It is a sample client class that mainly contains driver configurations. I will change the class to internal.
/// </summary>
protected override IConfiguration BuildCheckpointConfig()
{
    var filePath = Path.Combine(Path.GetTempPath(), Guid.NewGuid() + "state.txt");
Please use the temp files generated by REEF, not `System.Path`.
Done
private void PersistState()
{
    Logger.Log(Level.Info, "$$$$$$$$$$$ State to save: {0}", _taskState.Input[0]);
Please reformat all log lines and consider moving them to more fine grained log levels.
yeah
{
    var obj = (UpdateTaskState<int[], int[]>)_stateHandler.Restore(_stateCodec);

    if (obj != null)
How is `obj` used? Also, what if it is `null`?
`obj` is used to update the current state in memory:
_taskState.Update(obj);
If it is null, the checkpoint handler was not able to get any old state for whatever reason, and the current state in memory stays the same.
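The behaviour described here can be sketched roughly as follows. `TaskStateSketch` and `RestoreSketch` are hypothetical names used only for illustration, and the `restore` delegate stands in for the handler's `Restore` call; the sketch assumes a `null` result simply means "no checkpoint available":

```csharp
using System;

// Hypothetical, simplified task state; not the UpdateTaskState type from the PR.
public sealed class TaskStateSketch
{
    public TaskStateSketch(int[] input)
    {
        Input = input;
    }

    public int[] Input { get; private set; }

    // Replace the in-memory state with the restored one.
    public void Update(TaskStateSketch restored)
    {
        Input = restored.Input;
    }
}

public static class RestoreSketch
{
    // Returns true if a checkpoint was found and applied.
    // Returns false, leaving the current state untouched, when restore() yields null.
    public static bool TryRestore(TaskStateSketch current, Func<TaskStateSketch> restore)
    {
        if (current == null)
        {
            throw new ArgumentNullException(nameof(current));
        }
        if (restore == null)
        {
            throw new ArgumentNullException(nameof(restore));
        }

        var restored = restore();
        if (restored == null)
        {
            return false;
        }

        current.Update(restored);
        return true;
    }
}
```

Returning a `bool` makes the "nothing was restored" case explicit to the caller, which could also be logged at a fine-grained level.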
/// <returns></returns>
public ITaskState Restore(ICodec<ITaskState> codec)
{
    if (!string.IsNullOrEmpty(_checkpointFilePath))
This should be validated in the constructor.
It is not for validation but for backward compatibility. If the client doesn't set it, we will do nothing but return null.
Yeah, it can be moved to constructor.
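One way to reconcile the two points (check once in the constructor, but stay backward compatible when no path is configured) might look like the sketch below. `OptionalCheckpointHandlerSketch` and the `readState` delegate are hypothetical and only illustrate the idea; the real handler takes a codec and a file system:

```csharp
using System;

// Hypothetical handler: an unset path disables checkpointing instead of failing.
public sealed class OptionalCheckpointHandlerSketch
{
    private readonly string _checkpointFilePath;
    private readonly bool _isEnabled;

    public OptionalCheckpointHandlerSketch(string checkpointFilePath)
    {
        _checkpointFilePath = checkpointFilePath;
        // The readonly field is inspected once, here, rather than in every method.
        _isEnabled = !string.IsNullOrEmpty(checkpointFilePath);
    }

    public bool IsEnabled
    {
        get { return _isEnabled; }
    }

    // Returns null when checkpointing is not configured; otherwise asks the
    // supplied reader (a stand-in for the file system and codec) for the state.
    public string Restore(Func<string, string> readState)
    {
        if (!_isEnabled)
        {
            return null;
        }
        return readState(_checkpointFilePath);
    }
}
```

Under this assumption, methods such as `Restore` only branch on a single flag, and clients that never set the path keep the old behaviour.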
{
    if (!string.IsNullOrEmpty(_checkpointFilePath))
    {
        var files = _fileSystem.GetChildren(_fileSystem.CreateUriForPath(_checkpointFilePath));
The URI should have been created in the constructor.
Yes, it can be. Persist can be called many times and restore is called only once for a single recovery. So not much to optimize.
var localLatestFlagfile = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N").Substring(0, 4));
var localLatestStatefile = Path.Combine(Path.GetTempPath() + Guid.NewGuid().ToString("N").Substring(0, 4));

_fileSystem.CopyToLocal(latestFlagFile, localLatestFlagfile);
The copy to local should not be necessary. Can't you just `.Open()` the remote file?
Yeah, we can read directly from remote. I just want to match the way the data is written to ensure the state format.
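Reading straight from the remote stream, without a temporary local copy, could be sketched like this. `RemoteReadSketch` is a hypothetical helper, the `openRemote` delegate stands in for whatever call opens the remote file as a `Stream`, and the sketch assumes the state was written as UTF-8 text:

```csharp
using System;
using System.IO;
using System.Text;

public static class RemoteReadSketch
{
    // Reads the whole checkpoint directly from the stream returned by openRemote,
    // so no temp file has to be created or cleaned up.
    public static string ReadState(Func<Stream> openRemote)
    {
        if (openRemote == null)
        {
            throw new ArgumentNullException(nameof(openRemote));
        }

        using (var stream = openRemote())
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

As long as the reader uses the same encoding and layout as the writer, the state format stays consistent whether the bytes come from a local copy or from the remote stream.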
/// <returns></returns>
public bool GetResult()
{
    if (!string.IsNullOrEmpty(_checkpointFilePath) && _fileSystem.Exists(_resultFileUrl))
The checks of readonly attributes should move to the constructor.
It is not really for validation but for backward compatibility. We want to make sure the code still works if the client doesn't configure it.
}
catch (Exception e)
{
    Exceptions.Throw(e, "Unable to deserialize checkpoint configuration", Logger);
Remove the `Exceptions` use while you are at it :). Better yet, remove the useless `catch` here altogether.
That was just to follow the existing pattern. Yes, the try/catch is not necessary.
@markusweimer I have addressed your review comments.