Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add prune wasm codes proposal #1455

Closed
wants to merge 18 commits into from
Closed

Conversation

pinosu
Copy link
Contributor

@pinosu pinosu commented Jun 16, 2023

Resolves #999

@codecov
Copy link

codecov bot commented Jun 16, 2023

Codecov Report

Merging #1455 (37b872e) into main (1ce86ca) will decrease coverage by 0.02%.
The diff coverage is 54.62%.

❗ Current head 37b872e differs from pull request most recent head e7d4b32. Consider uploading reports for the commit e7d4b32 to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1455      +/-   ##
==========================================
- Coverage   57.81%   57.79%   -0.02%     
==========================================
  Files          62       62              
  Lines        8308     8416     +108     
==========================================
+ Hits         4803     4864      +61     
- Misses       3099     3143      +44     
- Partials      406      409       +3     
Impacted Files Coverage Δ
x/wasm/client/cli/gov_tx.go 13.67% <0.00%> (-0.75%) ⬇️
x/wasm/types/wasmer_engine.go 0.00% <ø> (ø)
x/wasm/types/tx.go 46.47% <35.29%> (-0.59%) ⬇️
x/wasm/keeper/msg_server.go 61.11% <63.63%> (+0.13%) ⬆️
x/wasm/keeper/keeper.go 86.61% <88.00%> (+0.41%) ⬆️
x/wasm/types/codec.go 100.00% <100.00%> (ø)

Copy link
Member

@alpe alpe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice work. 🏄
Unfortunately the task has some more complexity with duplicate code uploads and instances. This was not well documented in the issue. I have added some notes in the review.

I have some feeling that the list of codeIDs is very hard to generate and validate by voters off chain. How about replacing them with a single max codeID that is ensured to be < current lastCodeId sequence. We can then iterate through all codeIDs up to this number and build the list of unused codeIs on proposal execution. This is going to be an expensive operation on Gas but it is executed in an end-blocker.

It would be good to collect some numbers how much gas this would cost and how long this takes for some realistic data like Juno mainnet. But this can be a seconds task

string authority = 1 [ (cosmos_proto.scalar) = "cosmos.AddressString" ];
// CodeIDs references the WASM codes
repeated uint64 code_ids = 2 [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be a large number. The proposal submitter needs to provide this list upfront, which can be a lot of work.
It would be nice if there would be a way that this list is generated automatically on execution.

expErr: true,
},
}
for name, spec := range specs {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good test cases.
We need to ensure also that no codes are pruned where an instance exists.

k.wasmVM.RemoveCode(info.CodeHash)
k.deleteCodeInfo(ctx, c)
}
return nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you would also need to ensure that no instance exists for a codeID.


info := k.GetCodeInfo(ctx, c)
if info == nil {
return types.ErrNoSuchCodeFn(c).Wrapf("code id %d", c)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's not abort the operation but skip the number. There is not much we can do in this case.


// then
if spec.expErr {
require.Error(t, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just add a return or the code after the else block is executed as well. The else can then be dropped and NoError is on the main path

if info == nil {
return types.ErrNoSuchCodeFn(c).Wrapf("code id %d", c)
}
k.wasmVM.RemoveCode(info.CodeHash)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, we are removing code by checksum and not codeID. It is very likely that multiple codeIDs for the same hash exist. We need to ensure that we only delete by a checksum when no instance for any of codeID exists.

@pinosu pinosu marked this pull request as ready for review June 20, 2023 14:33
Copy link
Member

@webmaster128 webmaster128 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙌

string authority = 1 [ (cosmos_proto.scalar) = "cosmos.AddressString" ];
// LatestCodeID references the upper limit until unpinned
// code ids will be pruned
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you be explicit here if there upper limit is included or not? "latest code ID" sounds lit it is included but the implementation seems to be <.

I think <= is a bit simpler to reason about.

pinnedCodesHash := make(map[string]struct{}, 0)
unpinnedCodes := make([]uint64, 0)

for c := uint64(1); c < codeID; c++ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also add a minimum here? This can be optional. But then you can clean

First proposal: ID <= 1000
Second proposal 1001 <= ID <= 3000
Third proposal 3001 <= ID <= 5000
Fourth proposal 2001 <= ID <= 4000 (let's say some old codes got unpinned or migrated aways from)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, looking at the other problem regarding deletion by hash I am not sure if this suggestion makes sense. We need to look through all code IDs anyways to know which checksum is used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 we would aways iterate through all contract-infos and pined codes but when we store the max-code-id of the last run, we could abort early when reached. No need to pass it with the proposal IMHO.
This would be an optimization, let's do it in a second PR

}

k.deleteCodeInfo(ctx, code)
_ = k.wasmVM.RemoveCode(info.CodeHash)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want to ignore those errors? It typically means a file could not be removed and indicates invalid state.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is either abort the whole operation or not. IMHO there is value in removing the code-info data from the chain state even in case of an error here. There may be individual reasons on single machines, like file permissions that must not affect consensus. On the next state sync, the file would not be transferred to new nodes.
Logging the error makes sense IMHO

Copy link
Member

@alpe alpe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates! There is some non optimal code when looping over the code-id range and using single get-pinned calls. It may not be very intuitive to work with iterators but they are essential for bulk operations. Please see my code example for inspiration:

// iteratePinnedCodes iterates over all pinned code ids in ascending order
func (k Keeper) iteratePinnedCodes(ctx sdk.Context, cb func(codeID uint64) bool) {
	store := prefix.NewStore(ctx.KVStore(k.storeKey), types.PinnedCodeIndexPrefix)
	iter := store.Iterator(nil, nil)
	defer iter.Close()

	l := len(types.PinnedCodeIndexPrefix)
	for ; iter.Valid(); iter.Next() {
		codeID := sdk.BigEndianToUint64(iter.Key()[l:])
		if cb(codeID) {
			return
		}
	}
}

// PruneWasmCodes iterates over all contracts and code-infos to remove unused codes that are smaller than given code id.
func (k Keeper) PruneWasmCodes(ctx sdk.Context, maxCodeID uint64) error {
	usedCodeIDs := make(map[uint64]struct{})

	// used by contracts
	k.IterateContractInfo(ctx, func(_ sdk.AccAddress, info types.ContractInfo) bool {
		if info.CodeID < maxCodeID {
			usedCodeIDs[info.CodeID] = struct{}{}
		}
		return false
	})
	// pinned to cache by gov
	k.iteratePinnedCodes(ctx, func(codeID uint64) bool {
		usedCodeIDs[codeID] = struct{}{} // may add maxCodeID but does not matter as we check later
		return codeID >= maxCodeID       // abort early
	})

	// execute prune
	done := make(map[string]struct{})
	var deleteCounter int
	k.IterateCodeInfos(ctx, func(codeID uint64, info types.CodeInfo) bool {
		if codeID >= maxCodeID {
			return true // abort early
		}
		checksum := string(info.CodeHash)
		if _, ok := usedCodeIDs[codeID]; ok {
			done[checksum] = struct{}{}
			return false
		}
		k.deleteCodeInfo(ctx, codeID) // clean up state

		if _, ok := done[checksum]; ok { // do not delete twice on disk
			return false
		}

		done[checksum] = struct{}{}
		if err := k.wasmVM.RemoveCode(info.CodeHash); err != nil {
			k.Logger(ctx).Error("failed to delete wasm file on disk", "checksum", info.CodeHash)
		} else {
			deleteCounter++
		}
		return false
	})
	k.Logger(ctx).Info("deleted unused wasm files", "total", deleteCounter)
	return nil
}

pinnedCodesHash := make(map[string]struct{}, 0)
unpinnedCodes := make([]uint64, 0)

for c := uint64(1); c < codeID; c++ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 we would aways iterate through all contract-infos and pined codes but when we store the max-code-id of the last run, we could abort early when reached. No need to pass it with the proposal IMHO.
This would be an optimization, let's do it in a second PR

}

k.deleteCodeInfo(ctx, code)
_ = k.wasmVM.RemoveCode(info.CodeHash)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is either abort the whole operation or not. IMHO there is value in removing the code-info data from the chain state even in case of an error here. There may be individual reasons on single machines, like file permissions that must not affect consensus. On the next state sync, the file would not be transferred to new nodes.
Logging the error makes sense IMHO

// PruneWasmCodes deletes code info for unpinned codes <= than latestCodeID
func (k Keeper) PruneWasmCodes(ctx sdk.Context, latestCodeID uint64) error {
usedCodeIds := make(map[uint64]struct{}, 0)
usedChecksums := make(map[string]bool, 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also remove the size parameter here and the line above. We know already that it will be >0. Let's go with stdlib defaults instead


// check if instances are used only by unpinned code ids <= latestCodeID
k.IterateCodeInfos(ctx, func(codeID uint64, info types.CodeInfo) bool {
if _, ok := usedCodeIds[codeID]; !ok && codeID <= latestCodeID {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The codeID <= latestCodeID is not correct. Just remove this part that we iterate over the whole set of code infos

For example, there can be code ID 1 and 10 for the same checksum. If only code ID 10 has an instance then a prune with max code id 2 would miss the instance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I iterate over the whole set of code infos. I need this condition to mark as "false" only the checksums with unpinned code ids <= latestCodeID. This helps because we cannot delete checksums with unpinned code ids > latestCodeID.
These will be marked as "true", so as usedChecksums.. same as if they were pinned.
This also answer to the other comment why we need map[string]bool and it is not enough to have map[string]struct{}

// check if instances are used only by unpinned code ids <= latestCodeID
k.IterateCodeInfos(ctx, func(codeID uint64, info types.CodeInfo) bool {
if _, ok := usedCodeIds[codeID]; !ok && codeID <= latestCodeID {
if _, found := usedChecksums[string(info.CodeHash)]; !found {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting solution to store state with the checksum. But is it needed? If you don't set anything here, the result would be the same for if !usedChecksums[string(info.CodeHash)]. Or do I miss something?

// delete all unpinned code ids <= latestCodeID and all the
// instances used only by unpinned code ids <= latestCodeID
k.IterateCodeInfos(ctx, func(codeID uint64, info types.CodeInfo) bool {
if _, ok := usedCodeIds[codeID]; !ok && codeID <= latestCodeID {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conditions make sense here. I would prefer some better naming for latestCodeID. Maybe maxCodeID? Latest is a moving target for me as there can be new code-infos stored while the gov proposal is in-flight

}
return false
}
usedChecksums[string(info.CodeHash)] = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A struct{} would be a more memory efficient value in usedChecksums

@pinosu
Copy link
Contributor Author

pinosu commented Jun 23, 2023

I checked logs and it fails for the issue we already raised: empty validator set.
Selected randomly generated parameters for simulated genesis: { stake_per_account: "{824677238976}", initially_bonded_validators: "0" }
So we can ignore it

@@ -167,7 +167,7 @@ func (m *MockWasmer) Unpin(checksum wasmvm.Checksum) error {
}

func (m *MockWasmer) RemoveCode(checksum wasmvm.Checksum) error {
if m.UnpinFn == nil {
if m.RemoveCodeFn == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -100,3 +102,26 @@ func BenchmarkCompilation(b *testing.B) {
})
}
}

// Calculate the time it takes to prune about 100k wasm codes
func BenchmarkPruningWasmCodes(b *testing.B) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good start to get some numbers.
This runs with a mem db, tiny contracts and no duplicate instances. I can imagine disk IO can have some impact here but let's add another DB to compare them

@pinosu pinosu force-pushed the 999-add_prune_wasm_codes_proposal branch from 52de6a2 to 4dfc1f4 Compare June 27, 2023 14:22
@pinosu
Copy link
Contributor Author

pinosu commented Jun 29, 2023

Closing because performance are very bad (See bench tests).

@pinosu pinosu closed this Jun 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

gov: add prune wasm codes proposal
3 participants