Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory safety bug in hashtable replace #255

Open
davidchisnall opened this issue Aug 15, 2021 · 7 comments
Open

Memory safety bug in hashtable replace #255

davidchisnall opened this issue Aug 15, 2021 · 7 comments

Comments

@davidchisnall
Copy link
Contributor

The randomness in the hash implementation makes it difficult for me to reproduce this bug reliably.

If I call ucl_object_replace_key replacing property that is a string with another one of the same type, I get an intermittent use-after-free (reported by asan - without asan I get some random data in memory and when I try to call the emit function I get truncated nonsense with an invalid character in the middle).

The random number generator is seeded with the current time and so this reproduces for a few seconds and then goes away. It's probably a good idea to run a fuzzer over this.

It looks as if the replaced key (which I hold other references to, which are dropped later, and which trigger deallocation at the expected point) remains in the hash table, in spite of the new version being added.

@davidchisnall
Copy link
Contributor Author

Addendum: It looks as if the replacement simply does not take place in some cases. I tried to add a work around where I just leaked a reference to the old value. When I do this, the later emit alternates between showing the old and new values, depending on the value of the random seed.

@davidchisnall
Copy link
Contributor Author

It looks as if this is somewhat more complicated. The sequence is:

  1. Three properties in kh_ucl_hash_node_t values fields 0, 1, and 2.
  2. ucl_object_delete_key deletes the one in field 0
  3. ucl_object_replace_key replaces the one in field 2.

At the end of this, iterating over the fields shows the old and new versions of property 3.

This appears to happen only if the deleted property is the first one and another is renamed (removed / inserted at a fixed location). My guess is that the hash table is preserving insertion order by keeping an array of elements and having a hash map that points to them, but it isn't correctly updated in some corner cases.

Unfortunately, I have reached the limit of my ability to debug this code.

@vstakhov
Copy link
Owner

Yes, there is a hash + array to preserve order which makes things complicated. I will take a look shortly, thank you for your investigations!

@vstakhov
Copy link
Owner

Well, the whole deletion code is wrong in fact... I don't have a clear opinion about how to fix it. Khash pointers are not stable, so we cannot store them in an array. We also cannot store array index in ucl_object_t. The only alternative is to rewrite this data structure to probably an intrusive double-linked list.

@vstakhov
Copy link
Owner

This line is totally wrong: https://github.com/vstakhov/libucl/blob/master/src/ucl_hash.c#L515
It address hash table via kh_value using array indicies, which is just not correct. However, there is no way to do this procedure using kv_A function, as array stores ucl_object_t that has no array index stored nor access to the underlying hash element structure. Also, O(N) deletion is not something one would expect from a hash table...

@davidchisnall
Copy link
Contributor Author

O(n) deletion isn't ideal, but deleting keys in a config file is a fairly unusual case, so I'd much rather see O(n) deletion than deletion causing memory corruption.

@vstakhov
Copy link
Owner

Ok, I have updated the structure but hash table would now store pointers to entries not entries + each hash insertion would imply an additional malloc call :( It is very suboptimal so it might be revised at some point. However, I have not found any other way to keep order counting that khash uses open addressing so pointers safety is not an option, so I just cannot use pointers returned from khash in linked lists as they are invalidated on each hash resize.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants