Hi there,
We're using Trino 400 as an ad-hoc query engine. Metadata is stored in the Hive metastore, tables are created and altered by Spark, and the table data is stored as Parquet files in HDFS. We have a complex StructType column (a ROW type in Trino) called c1, and we periodically need to add new fields inside it. As mentioned, we do all ALTER operations through Spark, and old partitions written before the new fields were added can still be queried by Spark SQL (the new fields simply come back as NULL). In Trino, however, we hit a HIVE_PARTITION_SCHEMA_MISMATCH error whenever the query range includes the old partitions. The error message reads: "There is a mismatch between the table and partition schemas. The types are incompatible and cannot be coerced. The column 'c1' in table 'db1.tb1' is declared as type 'struct<...', but partition 'dt=20180207' declared column 'c1' as type 'struct<...'"
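For concreteness, here is a hypothetical sketch of the kind of evolution and the failing query. The field names (f1, f2, f_new) and the date range are made up; the real c1 has many more fields, and our actual Spark statement may differ from this Hive-style DDL:

```sql
-- Evolve the struct: append a new field to c1 (Hive-style DDL, hypothetical fields)
ALTER TABLE db1.tb1 CHANGE COLUMN c1 c1 struct<f1:string, f2:bigint, f_new:string>;

-- A Trino query whose range spans partitions written before the change fails:
SELECT c1.f_new
FROM db1.tb1
WHERE dt BETWEEN '20180101' AND '20180301';
-- => HIVE_PARTITION_SCHEMA_MISMATCH
```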
Comparing the table schema with the partition schema, I noticed that the newly added fields end up at seemingly random positions in the middle of c1 rather than at the end. Looking over the source code, there is a comment saying fields are expected to be added at the end.
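For reference, this is roughly how I compared the two schemas. In Hive, each partition carries its own storage descriptor in the metastore, which is where the per-partition column list (and the mismatch) lives:

```sql
-- Table-level schema as stored in the metastore
DESCRIBE FORMATTED db1.tb1;

-- Per-partition schema for an old partition
DESCRIBE FORMATTED db1.tb1 PARTITION (dt='20180207');
```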
But a further test is much more confusing: I created another external table, tb1_test, with the same location and the same metastore schema as tb1, ran an MSCK statement from Spark, and queried it from Trino, and it works. I also dropped tb1, recreated it, and ran MSCK again; after running 'CALL system.flush_metadata_cache()' to refresh the metadata cache, this recreated table can also be queried by Trino.
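Roughly, the workaround sequence looks like this. The column list reuses the hypothetical fields from the sketch above, and the HDFS path is a placeholder for tb1's real location:

```sql
-- In Spark SQL: new external table over the same location and schema as tb1
CREATE EXTERNAL TABLE db1.tb1_test (
  c1 struct<f1:string, f2:bigint, f_new:string>
)
PARTITIONED BY (dt string)
STORED AS PARQUET
LOCATION 'hdfs:///path/to/db1/tb1';  -- placeholder: same location as tb1

-- Re-register the existing partitions in the metastore
MSCK REPAIR TABLE db1.tb1_test;
```

```sql
-- In Trino (with the Hive catalog in use): clear the connector's metastore cache
CALL system.flush_metadata_cache();
```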
Is there some kind of mechanism that flushes the metadata caches for a newly created table?
Can you please explain why the newly created table works for the old partitions, and how I can make the old table queryable again?
Thanks a lot.