These are my reading notes on recipe generation papers for my 10701 project.
Generating Personalized Recipes from Historical User Preferences
Input:
- the tokenized name of a specific dish (vocabulary embedding)
- a few (partial) key ingredients (ingredient embedding)
- calorie level (caloric-level embedding)
Tokenization method: Byte-Pair Encoding (BPE)
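As a refresher, BPE iteratively merges the most frequent adjacent symbol pair in the corpus. A minimal sketch of the classic merge loop; the toy corpus and merge count here are made up for illustration:

```python
import re
from collections import Counter

def get_pair_stats(vocab):
    """Count frequencies of adjacent symbol pairs across the corpus vocabulary."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Merge every occurrence of the given symbol pair into one symbol."""
    pattern = re.compile(r'(?<!\S)' + re.escape(' '.join(pair)) + r'(?!\S)')
    return {pattern.sub(''.join(pair), word): freq for word, freq in vocab.items()}

# Words are pre-split into characters with an end-of-word marker.
vocab = {'s a u t e </w>': 5, 's a u c e </w>': 4, 't o m a t o </w>': 3}
for _ in range(10):  # number of merges is a hyperparameter
    stats = get_pair_stats(vocab)
    if not stats:
        break
    vocab = merge_pair(max(stats, key=stats.get), vocab)
print(list(vocab.keys()))
```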
Output
tokenized recipe sequence
Approach
Input -> encoder-decoder -> attend over hidden user features (derived from this particular user's previous recipe reviews) -> combine them with an attention fusion layer to jointly determine text generation
Encoder
BiGRU for dish name and ingredients
Projection for calorie level
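A minimal PyTorch sketch of how I picture this encoder; the hidden sizes, module names, and the exact form of the calorie-level projection are my assumptions, not the paper's:

```python
import torch
import torch.nn as nn

class RecipeEncoder(nn.Module):
    def __init__(self, vocab_size, n_calorie_levels, emb_dim=256, hid_dim=256):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, emb_dim)        # shared BPE vocabulary
        self.cal_emb = nn.Embedding(n_calorie_levels, emb_dim)  # caloric-level embedding
        self.name_gru = nn.GRU(emb_dim, hid_dim, bidirectional=True, batch_first=True)
        self.ingr_gru = nn.GRU(emb_dim, hid_dim, bidirectional=True, batch_first=True)
        self.cal_proj = nn.Linear(emb_dim, 2 * hid_dim)         # match the BiGRU output size

    def forward(self, name_ids, ingr_ids, cal_level):
        name_feats, _ = self.name_gru(self.tok_emb(name_ids))  # (B, T_name, 2H)
        ingr_feats, _ = self.ingr_gru(self.tok_emb(ingr_ids))  # (B, T_ingr, 2H)
        cal_feat = self.cal_proj(self.cal_emb(cal_level))       # (B, 2H)
        return name_feats, ingr_feats, cal_feat
```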
Decoder (output: recipe embedding $h_t$)
two-layer GRU, where the first layer's initial hidden state $h_0$ is the concatenation of the encoder outputs; at each step, an attention term computes a weighted sum of the encoded ingredient features (see the sketch below)
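A sketch of that per-step weighted sum; dot-product scoring is my assumption here, and the paper's exact scoring function may differ:

```python
import torch
import torch.nn.functional as F

def attend(query, keys):
    """query: (B, H) decoder state; keys: (B, T, H) encoded features -> (B, H) context."""
    scores = torch.bmm(keys, query.unsqueeze(2)).squeeze(2)   # (B, T) similarity scores
    weights = F.softmax(scores, dim=1)                        # attention distribution
    return torch.bmm(weights.unsqueeze(1), keys).squeeze(1)   # weighted sum of keys
```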
Combine with user historical review data
Each previously reviewed recipe has a recipe embedding, which is used to calculate a recipe attention context
Each recipe also lists the cooking techniques it uses, which are used to calculate a technique attention context
Attention Fusion Layer
fuse all contexts calculated at time $t$ by concatenating them with the decoder GRU output and the previous token embedding:
$o_t$ is the decoder output, $a_t^i$ is the ingredient context (a weighted sum of the encoded ingredient features), and $a_t^{r_u}$ is the user recipe context (a weighted sum of the user's previously reviewed recipe representations)
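A sketch of the fusion step under my reading; the single linear projection to the vocabulary is an assumption:

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, hid_dim, emb_dim, vocab_size):
        super().__init__()
        # o_t, ingredient context a_t^i, user recipe context a_t^{r_u}, previous token embedding
        self.out = nn.Linear(3 * hid_dim + emb_dim, vocab_size)

    def forward(self, o_t, a_i, a_ru, prev_tok_emb):
        fused = torch.cat([o_t, a_i, a_ru, prev_tok_emb], dim=-1)
        return self.out(fused)  # logits over the BPE vocabulary at step t
```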
Final output
Top-k sampling sorts tokens by probability, zeroes out the probabilities of everything below the k-th token, and renormalizes before sampling. It appears to improve quality by removing the low-probability tail, making the generator less likely to go off topic
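A minimal sketch of top-k sampling over a single logits vector:

```python
import torch

def top_k_sample(logits, k=3):
    values, indices = torch.topk(logits, k)           # keep the k most likely tokens
    probs = torch.softmax(values, dim=-1)             # renormalize over the survivors
    choice = torch.multinomial(probs, num_samples=1)  # sample within the truncated set
    return indices[choice]                            # map back to the vocabulary index
```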
Experiments
Dataset
Here, we restrict to recipes with at least 3 steps, and at least 4 and no more than 20 ingredients.
We discard users with fewer than 4 reviews, giving 180K+ recipes and 700K+ reviews
In our training data, the average recipe length is 117 tokens, with a maximum of 256.
Train/Test/Validation
We order reviews by timestamp, keeping the most recent review for each user as the test set, the second most recent for validation, and the remainder for training
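A sketch of this temporal leave-one-out split, assuming a reviews table with user_id and timestamp columns (the file and column names are hypothetical):

```python
import pandas as pd

reviews = pd.read_csv('reviews.csv')          # hypothetical reviews file
reviews = reviews.sort_values('timestamp')    # oldest first
rank = reviews.groupby('user_id').cumcount(ascending=False)  # 0 = most recent per user
test = reviews[rank == 0]    # most recent review of each user
valid = reviews[rank == 1]   # second most recent
train = reviews[rank >= 2]   # everything earlier
```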
Cooking techniques
We manually construct a list of 58 cooking techniques from 384 cooking actions collected by Bosselut et al. (2018b).
Evaluation metric
BLEU (bilingual evaluation understudy) is an algorithm for evaluating the quality of text which has been machine-translated from one natural language to another. Each n-gram in the output has a maximum clip value: its count in the reference.
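A toy illustration of the clipped counting, for unigrams only; full BLEU also uses higher-order n-grams and a brevity penalty:

```python
from collections import Counter

def clipped_unigram_precision(candidate, reference):
    cand, ref = Counter(candidate), Counter(reference)
    # each candidate token's count is clipped by its count in the reference
    clipped = sum(min(count, ref[tok]) for tok, count in cand.items())
    return clipped / max(sum(cand.values()), 1)

print(clipped_unigram_precision('the the the'.split(), 'the cat'.split()))  # 1/3
```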
ROUGE-L: Longest Common Subsequence (LCS) based statistics
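The LCS length comes from standard dynamic programming; ROUGE-L then takes an F-measure of this length against the candidate and reference lengths:

```python
def lcs_len(x, y):
    """Length of the longest common subsequence of token lists x and y."""
    dp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i, xi in enumerate(x, 1):
        for j, yj in enumerate(y, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if xi == yj else max(dp[i-1][j], dp[i][j-1])
    return dp[-1][-1]  # ROUGE-L divides this by |x| and |y| for precision/recall
```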
Personalization: pair each test recipe with 9 randomly sampled user profiles plus the 1 gold user profile, and compare the model's output probability of the recipe (sentence likelihood) under each user
User Matching Accuracy (UMA): the proportion of cases where the gold user is ranked highest
Mean Reciprocal Rank (MRR) (Radev et al., 2002) of the gold user
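A sketch of both metrics, given the rank of the gold user among the 10 candidates (rank 1 = highest likelihood) for each test case:

```python
def uma(gold_ranks):
    """Fraction of test cases where the gold user is ranked first."""
    return sum(r == 1 for r in gold_ranks) / len(gold_ranks)

def mrr(gold_ranks):
    """Mean of the reciprocal rank of the gold user."""
    return sum(1.0 / r for r in gold_ranks) / len(gold_ranks)

print(uma([1, 3, 1]), mrr([1, 3, 1]))  # 0.666..., (1 + 1/3 + 1) / 3
```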