feat: register identity hash roots as existing in shards #138

Draft
wants to merge 1 commit into
base: master
@rvagg rvagg commented Aug 29, 2022

This is only a proposal for the purpose of discussion around filecoin-project/boost#715. If we go this route, more testing would be needed.

Summary of the problem

Storage providers hold pieces whose CAR root is an identity multihash CID that is not also stored as an indexable section within the CAR. Clients treat this root as the PayloadCID, legitimately so. This happens as part of UnixFS creation; even lotus import does it: https://github.com/filecoin-project/lotus/blob/28722de72dce22c7ef41fd5442ec3fac0f524a9f/lib/unixfs/filestore.go#L37-L40

Then, when retrieving via this PayloadCID, we try to map it to a piece using the normal "which pieces contain this CID" functions afforded by the Dagstore. But because that CID isn't included in a CARv2 index, it's not found, the mapping fails and the retrieval is rejected.
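To make the failure concrete, here is a minimal sketch of the lookup, assuming a toy in-memory inverted index. The names `ShardKey`, `InvertedIndex`, `AddShard` and `ShardsContaining` are hypothetical stand-ins for illustration, not the Dagstore's actual API, and multihashes are represented as plain strings.

```go
package main

import (
	"errors"
	"fmt"
)

// ShardKey identifies a piece. Hypothetical stand-in, not the Dagstore type.
type ShardKey string

// InvertedIndex maps a multihash (here just a string) to the shards whose
// CARv2 index lists it.
type InvertedIndex map[string][]ShardKey

// ErrNotFound is returned when no shard lists the requested multihash.
var ErrNotFound = errors.New("no shard contains this multihash")

// AddShard registers only the multihashes that appear in the shard's CAR
// index. A root that is an identity CID and has no section in the CAR is
// never passed in, so it never becomes a key.
func (ii InvertedIndex) AddShard(key ShardKey, indexed []string) {
	for _, mh := range indexed {
		ii[mh] = append(ii[mh], key)
	}
}

// ShardsContaining answers "which pieces contain this multihash".
func (ii InvertedIndex) ShardsContaining(mh string) ([]ShardKey, error) {
	shards, ok := ii[mh]
	if !ok {
		return nil, ErrNotFound
	}
	return shards, nil
}

func main() {
	ii := InvertedIndex{}
	// The CAR's index lists two blocks; the identity root is not among them.
	ii.AddShard("piece-1", []string{"mh-block-a", "mh-block-b"})

	// A retrieval by PayloadCID asks for the root, which was never indexed.
	_, err := ii.ShardsContaining("mh-identity-root")
	fmt.Println("lookup by identity root:", err)
}
```

In this toy model the retrieval is rejected for the same reason as described above: the root was never a key in the index, even though the piece is present.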

Solutions re Dagstore

One possible solution (there are others being considered, see filecoin-project/boost#715) is to make the Dagstore aware of these roots and get the lookup to successfully map an identity CID root to that payload. We could either:

  1. Add a new property to the inverted index that allows us to explicitly query for roots, which might be a useful feature in general - "which CARs have this CID as a root?"
  2. Include the identity CID in the index for the CAR, as if it were stored as a block, with no distinction.

This PR does option 2. This works because CARv2's blockstore interface returns identity CID bodies without looking them up, regardless of whether they are in the blockstore (arguably the right behaviour for any blockstore, though maybe not if you want strict "only if you have it" semantics): https://github.com/ipld/go-car/blob/1478bbd911efbe3735f3f2e909353c90137a8837/v2/blockstore/readonly.go#L271-L280. Then, when asked "which shards have this CID", the Dagstore will return the right answer for root identity CIDs, and fetching them should also work. So it's not even necessarily a hack: the CAR does have that identity CID, and the blockstore will return it when asked for it. We just lack a bit of explicit information about it being the PayloadCID.
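The identity short-circuit can be sketched with the standard library alone. This is an illustration of the idea, not go-car's code: `identityDigest`, `store` and `get` are hypothetical names, and the sketch assumes a binary CIDv1 laid out as varint version, varint codec, then a multihash (varint function code, varint length, digest), with 0x00 as the identity function code.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errNotFound = errors.New("block not found")

// identityDigest parses a binary CIDv1 and, if its multihash uses the
// identity function (code 0x00), returns the payload embedded in the CID.
func identityDigest(cidBytes []byte) ([]byte, bool, error) {
	version, n := binary.Uvarint(cidBytes)
	if n <= 0 || version != 1 {
		return nil, false, errors.New("not a binary CIDv1")
	}
	rest := cidBytes[n:]
	if _, n = binary.Uvarint(rest); n <= 0 { // content codec, unused here
		return nil, false, errors.New("bad codec varint")
	}
	rest = rest[n:]
	code, n := binary.Uvarint(rest) // multihash function code
	if n <= 0 {
		return nil, false, errors.New("bad multihash code varint")
	}
	rest = rest[n:]
	length, n := binary.Uvarint(rest) // digest length
	if n <= 0 {
		return nil, false, errors.New("bad multihash length varint")
	}
	rest = rest[n:]
	if uint64(len(rest)) != length {
		return nil, false, errors.New("digest length mismatch")
	}
	if code != 0x00 {
		return nil, false, nil // a real hash: the data is not in the CID
	}
	return rest, true, nil
}

// store is a toy blockstore keyed by raw CID bytes.
type store map[string][]byte

// get returns identity CID bodies without consulting the store, and falls
// back to a normal lookup for every other CID.
func get(s store, cidBytes []byte) ([]byte, error) {
	data, isIdentity, err := identityDigest(cidBytes)
	if err != nil {
		return nil, err
	}
	if isIdentity {
		return data, nil
	}
	if blk, ok := s[string(cidBytes)]; ok {
		return blk, nil
	}
	return nil, errNotFound
}

func main() {
	// CIDv1 (0x01), raw codec (0x55), identity multihash (0x00), 5-byte digest.
	root := append([]byte{0x01, 0x55, 0x00, 0x05}, []byte("hello")...)
	data, err := get(store{}, root) // the store is empty on purpose
	fmt.Printf("%q %v\n", data, err)
}
```

Because the body lives inside the CID itself, a store that behaves like `get` can serve the root even when it holds no blocks, which is why listing the identity root in the index is consistent with what the blockstore will return.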
