Skip to content

[internal/hcs] Migrate package from HCS V1 to V2#2735

Open
rawahars wants to merge 5 commits into
microsoft:mainfrom
rawahars:migrate_hcs_v1_v2
Open

[internal/hcs] Migrate package from HCS V1 to V2#2735
rawahars wants to merge 5 commits into
microsoft:mainfrom
rawahars:migrate_hcs_v1_v2

Conversation

@rawahars
Copy link
Copy Markdown
Contributor

Presently, we were using HCS V1 (vmcompute) within internal/hcs but it has been deprecated and therefore, we are moving to use computecore within hcs package.

@rawahars rawahars force-pushed the migrate_hcs_v1_v2 branch 15 times, most recently from 54b7a44 to d3f62f2 Compare May 17, 2026 19:39
@rawahars rawahars force-pushed the migrate_hcs_v1_v2 branch 3 times, most recently from adb6ee4 to c399db9 Compare May 19, 2026 11:24
Presently, we were using HCS V1 (vmcompute) within internal/hcs but it has been deprecated and therefore, we are moving to use computecore within hcs package.

Signed-off-by: Harsh Rawat <harshrawat@microsoft.com>
@rawahars rawahars force-pushed the migrate_hcs_v1_v2 branch from c399db9 to e76ebeb Compare May 19, 2026 17:22
@rawahars rawahars marked this pull request as ready for review May 19, 2026 17:23
@rawahars rawahars requested a review from a team as a code owner May 19, 2026 17:23
Copy link
Copy Markdown
Contributor

@helsaawy helsaawy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change will cause the shim to always block on all operations, right? do we have any perf numbers on how much impact this will have?

Comment thread internal/hcs/notification.go Outdated
closeOnce sync.Once
exited chan struct{}
closed chan struct{}
raw string
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be cleaner to use:

Suggested change
raw string
raw json.RawMessage

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. Incorporated the same.

Comment thread internal/hcs/operation.go
// owns the operation handle leads to use-after-free crashes
// (EXCEPTION_ACCESS_VIOLATION) inside computecore.dll. Callers must not
// rely on ctx to bound the call's duration.
func runOperation(ctx context.Context, fn func(op computecore.HcsOperation) error) (resultDoc string, err error) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we always end up calling processHcsResult on the operation result; ie:

	resultJSON, err := runOperation(ctx, func(op computecore.HcsOperation) error { /* ... */ })
	if err != nil {
		return nil, makeSystemError(computeSystem, operation, err, processHcsResult(ctx, resultJSON))
	}

can we elevate hcsResult to an error type (since it already (sorta) matches the ResultError API type), and then move the processHcsResult logic into run[Process]Operation?
that would reduce a lot of boilerplate

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. Incorporated the same.

Comment thread internal/hcs/system.go Outdated
@rawahars rawahars force-pushed the migrate_hcs_v1_v2 branch from 3325c27 to 78d3a26 Compare May 23, 2026 16:42
Signed-off-by: Harsh Rawat <harshrawat@microsoft.com>
@rawahars rawahars force-pushed the migrate_hcs_v1_v2 branch from 78d3a26 to 5e7dd9c Compare May 23, 2026 16:47
@rawahars
Copy link
Copy Markdown
Contributor Author

@helsaawy You're right that every call now goes through HcsCreateOperationAPIHcsWaitForOperationResult.

Looking at ClientOperation::HcsWaitResult in computecore, the wait itself is just an event wait, so when HCS completes the work synchronously inside the API call, the wait returns immediately — wall-clock per-operation should be ~identical to V1 (which either completed inline or returned ERR_PENDING and made us wait on a callback anyway).

We are planning on creating a perf calculation tool which can run in CI and can provide us a ballpark of the figures.

rawahars added 3 commits May 23, 2026 23:52
Signed-off-by: Harsh Rawat <harshrawat@microsoft.com>
Previously, Close signaled waitBackground via a dedicated
notify.closed channel, and waitBackground had to forward that into
waitBlock with the right waitError. This left two channels doing
the same job and split the "finalize the wait state" logic across
two places.

Now Close finalizes the wait state itself: under closedWaitOnce it
sets waitError = ErrAlreadyClosed and closes waitBlock directly.
waitBackground selects on waitBlock (Close path, return early) or
notify.exited (real exit, parse status and publish). The
notify.closed channel and its signalClosed helper are removed.

Behavior is unchanged: Close still does not synthesize an exit,
so Wait/WaitChannel callers see ErrAlreadyClosed instead of a
fake exit code.

Signed-off-by: Harsh Rawat <harshrawat@microsoft.com>
The V2 notification handler only dispatched System/ProcessExited, so a
ServiceDisconnect (no payload) left waitBackground blocked forever and
Wait() hung until Close. V1 returned ErrUnexpectedProcessAbort here.

notificationState now has two channels -- exit (payload) and abort
(error) -- and waitBackground selects on both. ServiceDisconnect routes
to signalAbort(ErrUnexpectedProcessAbort), surfaced via make{System,
Process}Error.

Also stop swallowing HcsSet*Callback failures in registerNotification:
return the error so Create/Open{ComputeSystem,Process} can tear down
the handle (OpenProcess gains the missing cleanup defer).

Signed-off-by: Harsh Rawat <harshrawat@microsoft.com>
Comment thread internal/hcs/migration.go
@@ -361,36 +225,33 @@ func (computeSystem *System) FinalizeLiveMigration(ctx context.Context, resume b
}
optionsJSON, err := json.Marshal(hcsschema.MigrationFinalizedOptions{FinalizedOperation: finalOp})
Copy link
Copy Markdown

@Kavya-Bharadwaj Kavya-Bharadwaj May 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MigrationFinalizedOptions takes two parameters Origin and FinalizedOperation. HCS uses Origin to evaluate if this is a normal finalization or a rollback after failure. Omitting origin would lead to wrong recovery path. It's safer to set the origin field here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants