
Split repository traits, Preserve metadata hashes #274

Merged (17 commits, Feb 13, 2020)

Conversation

wellsie1116 (Author):

This series of changes:

  • Splits the Repository trait into a read half (RepositoryProvider) and write half (RepositoryStorage), resolving #119 (Split Repository trait into local and remote components)
  • Moves hash and max length verification out of the repository trait implementations and into a Repository struct that wraps a repository implementation, allowing this crate to enforce those constraints even with externally implemented RepositoryProviders.
  • Preserves the original bytes of metadata long enough to store it locally, allowing hash checks to pass when re-reading locally stored metadata.
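The trait split described above can be sketched as follows. This is a simplified synchronous illustration, not the crate's real API: the actual traits use `AsyncRead` and futures, and these method signatures are stand-ins.

```rust
use std::io::{Cursor, Read, Result};

// Read half: anything that can serve metadata and targets.
trait RepositoryProvider {
    fn fetch_metadata(&self, path: &str, version: u32) -> Result<Box<dyn Read>>;
}

// Write half: anything that can persist metadata and targets.
trait RepositoryStorage {
    fn store_metadata(&mut self, path: &str, version: u32, metadata: &mut dyn Read) -> Result<()>;
}

// A read-only mirror only needs the provider half; before the split it
// would have had to stub out the store_* methods.
struct ReadOnlyMirror;

impl RepositoryProvider for ReadOnlyMirror {
    fn fetch_metadata(&self, _path: &str, _version: u32) -> Result<Box<dyn Read>> {
        Ok(Box::new(Cursor::new(b"{}".to_vec())))
    }
}
```

A TUF client then bounds its remote on `RepositoryProvider` alone and its local cache on both traits.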

Kevin Wells added 6 commits February 3, 2020 10:03
A TUF client has no need to store metadata in a remote repository, and
some repository implementations cannot implement the store_* methods, so
this change redefines the Repository trait as a supertrait of
RepositoryProvider (the read half) and RepositoryStorage (the write
half).  Other than requiring clients to import additional traits, this
change should not change any existing functionality.
Since the Repository traits can be implemented outside this crate, it is
not safe to assume that those implementations will verify the maximum
length or hash of metadata/targets before succeeding a fetch request.

In preparation to move the length check and hash verification into the
tuf client, this change splits the SafeReader into 2 types:
 * EnforceMinimumBitrate enforces a minimum transfer rate, currently
   utilized by only the http repository implementation
 * SafeReader, which retains the logic to enforce a maximum file size
   and hash value.
This change also defines an extension trait on AsyncRead to easily wrap
an AsyncRead in these types.

A future change will move hash and length enforcement out of the
Repository implementations.
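The wrapping pattern behind the `SafeReader` split can be sketched with a synchronous analogue. The crate's extension trait targets `AsyncRead`; the type and method names below (`MaxLengthReader`, `enforce_max_length`) are illustrative, not the crate's API.

```rust
use std::io::{Cursor, Error, ErrorKind, Read, Result};

// Wraps a reader and fails once more than `remaining` bytes have been read.
struct MaxLengthReader<R> {
    inner: R,
    remaining: u64,
}

impl<R: Read> Read for MaxLengthReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize> {
        let n = self.inner.read(buf)?;
        if n as u64 > self.remaining {
            return Err(Error::new(ErrorKind::InvalidData, "maximum file size exceeded"));
        }
        self.remaining -= n as u64;
        Ok(n)
    }
}

// Extension trait so any reader can be wrapped with method syntax.
trait ReadExt: Read + Sized {
    fn enforce_max_length(self, max: u64) -> MaxLengthReader<Self> {
        MaxLengthReader { inner: self, remaining: max }
    }
}

impl<R: Read> ReadExt for R {}
```

The real `SafeReader` also hashes the bytes as they pass through, failing the read if the digest does not match the expected value.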
The current definitions of the fetch_metadata and store_metadata repository
trait methods require the repository trait implementations to parse the
metadata, and the SignedMetadata structs and DataInterchange traits
cannot provide a guarantee that parsing and re-serializing metadata will
produce byte-for-byte identical metadata.

This behavior is an issue for TUF as during the metadata update
workflow, clients fetch metadata from a remote repository and store it
in the local repository for later use. However, for metadata referenced
by snapshot, the hash of the data must match in order to use the local
version.

This change:
 * modifies the fetch_metadata and store_metadata trait methods to
   interact with AsyncRead instead of SignedMetadata so that the
   original unparsed bytes can be stored as-is in the local repository
   as long as the metadata is determined to be valid.
 * defines the RawSignedMetadata type, which is a simple wrapper around
   raw bytes and contains type information identifying its inner
   Metadata type and serialization format.
 * introduces a new type to wrap an instance of the Repository traits
   and provide an API surface that can store the raw metadata and fetch
   both raw and parsed metadata to simplify interacting with the now
   less ergonomic trait methods.
 * moves the metadata max length checks and hash checks out of the
   repository trait implementations and into the RepositoryClient,
   ensuring invalid metadata provided by external RepositoryProviders
   will not be trusted (for metadata that is hash-checked).
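The RawSignedMetadata idea can be sketched as a byte buffer tagged with phantom type parameters. This is a hedged sketch: the field and method names, and the `Json`/`RootMetadata` marker types, are illustrative stand-ins for the crate's types.

```rust
use std::marker::PhantomData;

// Illustrative markers for the interchange format and metadata role.
struct Json;
struct RootMetadata;

// The exact bytes fetched from a repository, tagged with what they claim
// to be. Storing these bytes verbatim keeps snapshot hash checks valid
// when the local copy is re-read later.
struct RawSignedMetadata<D, M> {
    bytes: Vec<u8>,
    _marker: PhantomData<(D, M)>,
}

impl<D, M> RawSignedMetadata<D, M> {
    fn new(bytes: Vec<u8>) -> Self {
        RawSignedMetadata { bytes, _marker: PhantomData }
    }

    // The unmodified bytes, suitable for writing to local storage as-is,
    // avoiding any parse/re-serialize round trip.
    fn as_bytes(&self) -> &[u8] {
        &self.bytes
    }
}
```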
This change modifies the local/remote parameters for Client's
constructors to be `impl Into<Repository<R, D>>`, which both preserves
the existing behavior where implementations of RepositoryProvider and/or
RepositoryStorage are accepted and allows callers to provide a
Repository directly.

As long as one of the parameters is a Repository instance, there won't
be a need to turbofish or annotate the Client type to include the
DataInterchange parameter.
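The constructor pattern in this commit can be sketched with a blanket `From` impl. The types below are simplified stand-ins for the crate's, and `with_trusted_local` here is a toy free function rather than the real constructor.

```rust
use std::marker::PhantomData;

struct Json;
struct EphemeralRepository {
    name: &'static str,
}

struct Repository<R, D> {
    inner: R,
    _interchange: PhantomData<D>,
}

// The blanket impl lets callers pass either a bare repository
// implementation or an already-constructed Repository (via the identity
// `From<T> for T` impl in core).
impl<R, D> From<R> for Repository<R, D> {
    fn from(inner: R) -> Self {
        Repository { inner, _interchange: PhantomData }
    }
}

impl<R, D> Repository<R, D> {
    fn into_inner(self) -> R {
        self.inner
    }
}

// Accepts both a bare implementation and a wrapped Repository.
fn with_trusted_local<R, D>(local: impl Into<Repository<R, D>>) -> Repository<R, D> {
    local.into()
}
```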
@erickt self-requested a review February 5, 2020 00:15
src/client.rs Outdated
pub async fn with_trusted_local(config: Config<T>, local: L, remote: R) -> Result<Self> {
pub async fn with_trusted_local(
config: Config<T>,
local: impl Into<Repository<L, D>>,
Collaborator:

Since this is part of the public interface, can you switch to using where clauses? Using impl Into prevents users from turbofish-ing this type, and since this is a fairly generic type, it's possible we might need a turbofish for some complex situations.
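The reviewer's point can be sketched with toy functions (`Meters`, `wrap_named`, and `wrap_impl` are hypothetical names, not from the crate):

```rust
struct Meters(u64);

impl From<u64> for Meters {
    fn from(raw: u64) -> Self {
        Meters(raw)
    }
}

// Named type parameter with a where clause: callers can turbofish it.
fn wrap_named<T>(value: T) -> Meters
where
    T: Into<Meters>,
{
    value.into()
}

// `impl Trait` in argument position introduces an anonymous type
// parameter, so there is nothing for a turbofish to name.
fn wrap_impl(value: impl Into<Meters>) -> Meters {
    value.into()
}
```

Calling `wrap_named::<u64>(5)` compiles, while `wrap_impl::<u64>(5)` is rejected by the compiler, which is exactly the flexibility the review asks to preserve on a public, heavily generic API.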

Author:

Removed Repository from the public interface, so these are all back to L and R.

src/client.rs Outdated
let root_path = MetadataPath::from_role(&Role::Root);
let (local, remote) = (local.into(), remote.into());
Collaborator:

Small nit, I recommend doing the .into() first, since logically the root_path and root_version are more closely related than this (local, remote).

src/client.rs Outdated
@@ -356,6 +371,10 @@ where
)
.await?;

// FIXME why does with_trusted_root_keyids store using the metadata version but
// with_trusted_root_keys use the version from the caller?
let root_version = MetadataVersion::Number(trusted_root.version());
Collaborator:

I think I just forgot to copy this case into with_trusted_root_keys. We haven't gotten any checks yet that the metadata returned the version we requested (see #253), so the safest thing is to use the metadata version, since the whole system could get rather confused if we stored it with the wrong version.

Author:

Added that to with_trusted_root_keys then, thanks for clarifying.

let raw_signed_meta = self
.fetch_raw_metadata(meta_path, version, max_length, hash_data)
.await?;
let signed_meta = raw_signed_meta.parse()?;
Collaborator:

might be better to move this to the caller, to make it easier to see where the parsing happens?

Author:

The match statements in lookup_target_description get rather nasty if there isn't a single method I can call to get both.

Collaborator:

I've got a little concern that if we ever have two fetch_metadata calls in the same scope, it'd be a little easier to hash check one metadata, but use the other when trying to update the tuf database. This probably isn't a large problem in practice since we still go through signature verification and the rest of the validation, but it'd be nice to avoid this.

We can defer addressing this for a future patch though.

src/util.rs Outdated
}

/// Wraps an `AsyncRead` to detect and fail transfers slower than a minimum bitrate.
pub(crate) struct EnforceMinimumBitrate<R> {
Collaborator:

This needs a test.

let components = target_path.components();
let resp = self.get(&self.targets_prefix, &components).await?;

let stream = resp
// TODO check content length if known and fail early if the payload is too large.
Collaborator:

Can you file a ticket for this, and link to it here?

Author:

#278 (also added to the TODO)

/// A readable TUF repository.
pub trait RepositoryProvider {
Collaborator:

Why did you move the DataInterchange parameter onto the methods? This has the side effect of us not being able to create a Box<dyn RepositoryProvider>.

I suppose this would allow us to interact with a repository that supports interchanges, but I'm not sure if that's worth trying to support just yet, since we (and the tuf spec) hasn't actually figured out how alternative interchanges would work.

Author:

Because Repository<EphemeralRepository<Json>, Json> was redundant, and the repository provider is where the metadata and targets are, in whatever form they take. Yeah, it is a bit weird to allow a provider to provide different data interchanges when we don't have a story for what that looks like yet.

I'll plumb the type parameter back through the trait itself.
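The object-safety trade-off in this thread can be sketched as follows, with simplified synchronous signatures (the real methods are async, and the types are stand-ins). With `D` on the trait itself rather than on each method, a `Box<dyn RepositoryProvider<D>>` becomes possible:

```rust
use std::io::Result;

struct Json;

// The interchange parameter lives on the trait, so the methods carry no
// generics of their own and the trait stays object-safe.
trait RepositoryProvider<D> {
    fn fetch_metadata(&self, path: &str) -> Result<Vec<u8>>;
}

struct EphemeralRepository;

impl RepositoryProvider<Json> for EphemeralRepository {
    fn fetch_metadata(&self, _path: &str) -> Result<Vec<u8>> {
        Ok(b"{}".to_vec())
    }
}

// A trait object over the provider; this would be rejected if
// fetch_metadata were itself generic over the interchange type.
fn boxed_provider() -> Box<dyn RepositoryProvider<Json>> {
    Box::new(EphemeralRepository)
}
```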

@@ -105,7 +105,7 @@ where
}

async fn init_server<'a, T>(
remote: &'a mut EphemeralRepository<Json>,
remote: &'a mut Repository<EphemeralRepository, Json>,
Collaborator:

What do you think about just passing in EphemeralRepository and not wrapping it in Repository? The only thing that buys us is that we call Repository::check to see if the metadata path matches the role, but I'm not sure that check actually buys us much. If a server messed up and wrote the root metadata as timestamp.json, the client would fetch that file and it should fail to deserialize into TimestampMetadata. So really, this check is more for making sure we don't generate a malformed repository, and given how low-level the current implementation is, the whole thing probably needs to be re-architected (see #232).

Author:

Not quite. The reason I used Repository in these test binaries was because its method signatures were nicer to use (I could store RawMetadata instead of having to convert them to bytes first).

/// to a specific [`DataInterchange`](crate::interchange::DataInterchange) that will enforce
/// provided length limits and hash checks.
#[derive(Debug, Clone)]
pub struct Repository<R, D> {
Collaborator:

I think we could make this pub(crate). It's mainly about making sure the client doesn't accidentally access the repositories directly. I'm not sure if external users would need this.

Author:

Fine with me. It was a little more convenient to use in the test binaries in this crate, but it isn't too terrible to use the repository implementations directly, and I suppose I could add some similar methods to EphemeralRepository to simplify testing if we find ourselves doing a lot of repository manipulation from external tests.

/// A writable TUF repository. Most implementors of this trait should also implement
/// `RepositoryProvider`.
pub trait RepositoryStorage {
Collaborator:

As above, I think this should take D as a generic parameter.

@@ -206,10 +207,16 @@ async fn extract_keys(dir: &Path) -> Vec<KeyId> {
let remote = init_remote(dir).unwrap();

let root_path = MetadataPath::from_role(&Role::Root);
let metadata: SignedMetadata<_, RootMetadata> = remote

let mut buf = Vec::new();
Author:

This block is a good example of why I exported Repository.

Kevin Wells added 2 commits February 5, 2020 12:33
@wellsie1116 (Author):

Merged in develop and responded to the comments. Please take another look.

@erickt merged commit daa9461 into theupdateframework:develop Feb 13, 2020
@wellsie1116 deleted the repo-split branch February 13, 2020 20:47