feat: register identity hash roots as existing in shards #138
This is only a proposal for the purpose of discussion around filecoin-project/boost#715. If we go this route, more testing will be needed.
Summary of the problem
Storage providers hold pieces whose CAR root is an identity multihash that is not also stored as an indexable section within the CAR. Clients treat this root as the PayloadCID, and legitimately so. This happens as part of UnixFS creation; even lotus import does it: https://github.com/filecoin-project/lotus/blob/28722de72dce22c7ef41fd5442ec3fac0f524a9f/lib/unixfs/filestore.go#L37-L40
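For readers less familiar with identity multihashes, here is a minimal sketch of why such a root needs no stored block: the "digest" is the data itself. It uses only the standard library rather than go-multihash, and the helper names (`encodeIdentity`, `decodeIdentity`, `putUvarint`) are hypothetical, not part of any Filecoin or IPLD package.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// identityCode is the multicodec code for the identity "hash" function.
const identityCode = 0x00

// putUvarint encodes v as an unsigned varint, as used by the multihash format.
func putUvarint(v uint64) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	return buf[:binary.PutUvarint(buf, v)]
}

// encodeIdentity wraps data as <code varint><length varint><digest>,
// where the digest is simply the data itself.
func encodeIdentity(data []byte) []byte {
	out := putUvarint(identityCode)
	out = append(out, putUvarint(uint64(len(data)))...)
	return append(out, data...)
}

// decodeIdentity returns the embedded bytes if mh is an identity multihash.
func decodeIdentity(mh []byte) ([]byte, error) {
	code, n := binary.Uvarint(mh)
	if n <= 0 {
		return nil, errors.New("bad multihash code")
	}
	if code != identityCode {
		return nil, errors.New("not an identity multihash")
	}
	size, m := binary.Uvarint(mh[n:])
	if m <= 0 {
		return nil, errors.New("bad multihash length")
	}
	digest := mh[n+m:]
	if uint64(len(digest)) != size {
		return nil, errors.New("digest length mismatch")
	}
	return digest, nil
}

func main() {
	node := []byte("tiny root node")
	body, err := decodeIdentity(encodeIdentity(node))
	fmt.Println(bytes.Equal(body, node), err)
}
```

Because the root's bytes can be recovered from the CID alone, a CAR writer has no reason to emit it as a section, which is how it ends up missing from the index.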
When a retrieval uses this PayloadCID, we try to map it to a piece using the normal "which pieces contain this CID" functions afforded by the Dagstore. Because that CID isn't included in a CARv2 index, it isn't found, the mapping fails, and the retrieval is rejected.
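The failure mode can be sketched with a toy index. This is not the Dagstore API: `pieceIndex`, `addPiece`, and `piecesContaining` are hypothetical names, and plain strings stand in for multihashes and piece CIDs.

```go
package main

import (
	"errors"
	"fmt"
)

// section stands in for one indexed CAR section; key stands in for its multihash.
type section struct {
	key string
}

// pieceIndex maps a block key to the pieces that contain it.
type pieceIndex map[string][]string

var errNotFound = errors.New("no piece contains this CID")

// addPiece indexes only the sections actually written to the CAR.
func (idx pieceIndex) addPiece(piece string, sections []section) {
	for _, s := range sections {
		idx[s.key] = append(idx[s.key], piece)
	}
}

// piecesContaining answers "which pieces contain this CID".
func (idx pieceIndex) piecesContaining(key string) ([]string, error) {
	pieces, ok := idx[key]
	if !ok {
		return nil, errNotFound
	}
	return pieces, nil
}

func main() {
	idx := pieceIndex{}
	// The identity root is the CAR's root but was never written as a section.
	idx.addPiece("piece-1", []section{{key: "sha256:aaa"}, {key: "sha256:bbb"}})

	_, err := idx.piecesContaining("identity:root")
	fmt.Println(err) // the lookup misses, so the retrieval would be rejected
}
```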
Solutions re Dagstore
One possible solution (there are others being considered, see filecoin-project/boost#715) is to make the Dagstore aware of these roots and get the lookup to successfully map an identity CID root to that payload. We could either:
This PR does option 2: registering identity hash roots as existing in shards. This works because CARv2's blockstore interface returns identity CID bodies without looking them up, regardless of whether they are in the blockstore (arguably the right behaviour for any blockstore, though perhaps not if you want strict "only if you have it" semantics): https://github.com/ipld/go-car/blob/1478bbd911efbe3735f3f2e909353c90137a8837/v2/blockstore/readonly.go#L271-L280. Then, when asked "which shards have this CID", the Dagstore will return the right answer for root identity CIDs, and fetching them should also work. So it's not even necessarily a hack: the CAR does have that identity CID, and the blockstore will return it when asked. We just lack a bit of explicit information about it being the PayloadCID.
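The idea can be sketched roughly as follows. This is a simplified stand-alone model, not the Dagstore or go-car code: the `multihash` struct, `shardIndex`, `registerShard`, and `blockstore` types are all hypothetical, and the real change would operate on the Dagstore's own index types.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	identityCode = 0x00 // multicodec code for identity
	sha256Code   = 0x12 // multicodec code for sha2-256
)

// multihash is a simplified stand-in: a hash code plus digest bytes.
type multihash struct {
	code   uint64
	digest string
}

// shardIndex maps a multihash to the shards known to contain it.
type shardIndex map[multihash][]string

func (idx shardIndex) add(k multihash, shard string) {
	list := idx[k]
	if n := len(list); n > 0 && list[n-1] == shard {
		return // already recorded for this shard
	}
	idx[k] = append(list, shard)
}

// registerShard indexes the CAR's sections and, additionally, any identity
// roots, even though those were never written as sections.
func (idx shardIndex) registerShard(shard string, roots, sections []multihash) {
	for _, s := range sections {
		idx.add(s, shard)
	}
	for _, r := range roots {
		if r.code == identityCode {
			idx.add(r, shard)
		}
	}
}

// blockstore models the read-only behaviour described above: identity
// bodies are returned straight from the key, without a lookup.
type blockstore map[multihash][]byte

var errNotFound = errors.New("block not found")

func (bs blockstore) get(k multihash) ([]byte, error) {
	if k.code == identityCode {
		return []byte(k.digest), nil
	}
	b, ok := bs[k]
	if !ok {
		return nil, errNotFound
	}
	return b, nil
}

func main() {
	root := multihash{code: identityCode, digest: "inline root node"}
	leaf := multihash{code: sha256Code, digest: "leaf-digest"}

	idx := shardIndex{}
	idx.registerShard("shard-1", []multihash{root}, []multihash{leaf})

	bs := blockstore{leaf: []byte("leaf bytes")}
	body, err := bs.get(root)
	fmt.Println(idx[root], string(body), err)
}
```

With the root registered, "which shards have this CID" resolves for the PayloadCID, and the subsequent fetch succeeds because the blockstore already serves identity bodies.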