
FMEmbedding #1257

Answered by Sherry-XLL
kuzma-long asked this question in General
Apr 18, 2022 · 1 comments · 2 replies

@kuzma-long Hello!

RecBole supports four feature types (data formats): 'token', 'token_seq', 'float', and 'float_seq'.

| feat_type | Explanations | Examples |
|---|---|---|
| token | single discrete feature | user_id, age |
| token_seq | discrete features sequence | review |
| float | single continuous feature | rating, timestamp |
| float_seq | continuous feature sequence | vector |

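A token can be used directly as an ID for feature embedding. A float is a concrete floating-point value that is generally used as-is in numeric computation and needs no embedding. A float_seq is typically used to load pretrained embedding vectors.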
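To make the distinction between the feature types concrete, here is a minimal pure-Python sketch. It is not RecBole code: the table size, the embedding dimension, the helper names `embed_token` and `embed_token_seq`, and the use of mean pooling for sequences are all assumptions chosen for illustration.

```python
import random

random.seed(0)

EMBED_DIM = 4   # hypothetical embedding size
NUM_USERS = 5   # hypothetical vocabulary size for the user_id field

# token: the ID is a row index into a (normally learnable) embedding table.
user_embedding_table = [
    [random.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)]
    for _ in range(NUM_USERS)
]


def embed_token(token_id):
    """Look up the embedding row for a single token ID."""
    return user_embedding_table[token_id]


def embed_token_seq(token_ids):
    """Embed each ID in the sequence, then mean-pool (pooling choice is an assumption)."""
    rows = [user_embedding_table[i] for i in token_ids]
    return [sum(column) / len(rows) for column in zip(*rows)]


# float: a plain number, used directly with no lookup.
rating = 4.5

# float_seq: already a dense vector, e.g. a pretrained embedding loaded from file.
pretrained_item_vector = [0.12, -0.30, 0.80, 0.05]

user_vector = embed_token(2)
history_vector = embed_token_seq([1, 3, 3])
```

Only the token and token_seq paths go through a lookup table here; the float and float_seq values are passed along unchanged.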

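Because token_seq data is itself a sequence of tokens split by a separator, RecBole maps both token and token_seq fields to contiguous IDs. You can convert between the original tokens and these IDs with field2id_token and field2token_id: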

field2id_token (dict): Dict mapping feature name (str) to a :class:`np.ndarray`, which stores the original token
    of this feature. For ex…
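The two-way mapping described above can be sketched in plain Python. This only mimics the idea and is not RecBole's implementation: the helper `build_token_maps` is hypothetical, reserving ID 0 for a `[PAD]` token is an assumption, and a plain list stands in for the `np.ndarray` mentioned in the docstring.

```python
PAD_TOKEN = "[PAD]"  # assumption: ID 0 is reserved for padding


def build_token_maps(tokens):
    """Assign contiguous IDs in first-seen order; return (id_token, token_id)."""
    id_token = [PAD_TOKEN]       # position = internal ID, value = original token
    token_id = {PAD_TOKEN: 0}    # original token -> internal ID
    for token in tokens:
        if token not in token_id:
            token_id[token] = len(id_token)
            id_token.append(token)
    return id_token, token_id


raw_user_ids = ["u_42", "u_7", "u_42", "u_1003"]
id_token, token_id = build_token_maps(raw_user_ids)

# Same shape as the two attributes: field name -> mapping for that field.
field2id_token = {"user_id": id_token}
field2token_id = {"user_id": token_id}

# Original token -> internal ID
internal_ids = [field2token_id["user_id"][t] for t in raw_user_ids]
# Internal ID -> original token
restored = [field2id_token["user_id"][i] for i in internal_ids]

print(internal_ids)  # [1, 2, 1, 3]
print(restored)      # ['u_42', 'u_7', 'u_42', 'u_1003']
```

On a loaded RecBole dataset the same two lookups would go through the dataset's `field2token_id` and `field2id_token` attributes, so an internal ID seen in a model's output can be traced back to the token in the raw file.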

Answer selected by Sherry-XLL