
After finishing this post I realized how obviously my writing skills have deteriorated. But what can I do? Time has stolen my talent.

The story behind this floorplan paper

The whole process of this work had a rather philosophical flavor:
I had originally planned to submit a paper on work done with Apple, but Apple would not release the data;
instead, inside that work I found a subproblem and tinkered it into this one.
Then, after the paper was finished, the submission system's gotcha-style enforcement meant I could not submit it.
By that point I had honestly resigned myself to letting it rot, but for some reason I still kept "pestering" the chair two or three times, and in the end it got submitted.
Ah, life: you never know what the next piece of chocolate will taste like, but if you never taste it, you will never know that the next one might just be the "surprise" of a successful submission and an acceptance.
As Jose said,

Fantastic. Persistence wins.

"Having done my utmost and still failing to reach it, I can be without regret." The "without regret" part matters, and so does the "having done my utmost" part.

Paper planning

Before coming to CMU, I set a five-year plan that looked perfect:

  • Get one paper accepted every year
  • Submit one new paper every semester

The plan has been proceeding in an orderly way, and I am currently on track. But I suddenly noticed a problem:
if I get one paper accepted per year but submit one per semester, doesn't that mean half of my submissions will end up on arXiv, just like my coloring paper?
Emmm...

Direction

After meeting some people and submitting this many papers, I have realized that direction really matters enormously.
Although the mathematical methods different directions need are largely the same,
publishing papers, finding faculty jobs, and pulling in funding do not just depend on your mathematical ability, implementation ability, or thinking ability; they also depend on whether your direction matches.
Thinking about this gives me a headache.
As I said before, I feel like I am just a concrete-layer with a wild imagination, lacking the big-picture view an architect should have.
Maybe I need to learn more, think more, talk more, and listen more before I can find that direction.

Romance of the Three Kingdoms 14

Back when I was hooked on Romance of the Three Kingdoms 14, I was unhappy with some of the stat settings in the game, so I taught myself CE (Cheat Engine), tweaked some values myself, and posted them online.
For the first day or two nothing happened; there was not even a single reply. I was pretty deflated: my first bit of game hacking ended just like that, without making a splash.
Then, more than half a year later, the post took off, and the pile of "master" and "legend" comments underneath almost made me lose myself.
A middle schooler even found my WeChat through my phone number and offered to pay me to tweak the values for him, haha.
Whenever I grow weary of the daily grind of work and meetings, remembering these things gives me the feeling that I am actually living. How nice.

我记得 (I Remember)

This is a song I have had on repeat lately.
At first I just thought the melody was nice, until I heard this line:

On the other side of the starry sky, the longing has never stopped, like the name on a gravestone

And then I could not stop thinking of Gaowu.
Thinking of how, back then, he texted my aunt to say I was in a bad mood and asked her to look after me;
of how he secretly reported my recent moods and studies to my mom behind my back;
of the last two months of senior year, hiding under the covers with him to listen to the radio station he loved;
of the message he left me on QQ after we started college: Li Wei, how have you been lately;
of how, when I learned that he had died, I cried until I could not stand up;
of visiting his home and seeing his father all alone in a bare, empty house.
Maybe in another timeline we are still best friends, and I would go find him for drinks whenever I returned home.
But in my timeline he is gone forever. The world is so full of wonder, yet all that is left of Gaowu is a gravestone, forever alone in those mountains.
Still, I believe, as The Three-Body Problem says, that information carved into stone often lasts longer than anything stored with modern technology.
Gaowu's gravestone will surely always be there, just like my longing for him.

The internship has been exhausting, so I went to bed early last night, only to find myself groggy but unable to sleep in the small hours. Since I couldn't sleep, I got up and finished off some pending work, swallowed a melatonin, and lay back down, and my brain only grew more and more awake.

Not good. Tomorrow is not the weekend; there are all kinds of meetings to attend.

I thought about finding a book to read, but I didn't bring my Kindle or any books to California. After mulling it over for a while: since I can't read, I might as well just write.

A human lifespan is roughly 75 years, so as of today my life has officially passed the one-third mark (assuming I make it to 75). One third is not a small number. If some task's accuracy were only 66.7%, I would most likely dismiss the algorithm as weak: it gets a third of the samples wrong, what's so impressive about that?

A third of my time has passed uneventfully; what's so impressive about that?

Indeed, looking back at this first third of my life, there doesn't seem to be much worth telling. There are so many interesting things, interesting places, and interesting foods in the world waiting for me to try, but I have explored far too little. For the past twenty-odd years I feel like I have been living in a rush, as if an invisible hand kept pushing me forward: Olympiad competitions and the gaokao in high school, planning the future in college, then reading papers, taking the TOEFL, and applying during the MPhil, then reading and writing papers during the PhD. Even in California there are all kinds of meetings and deadlines waiting for me to fulfill. The older I get, the faster time seems to pass, and the more I wish I could make time stop: then I could finish writing this paper / then I could finish running this experiment / then I could... Even in my own fantasies, all I hope for is enough time to work, rather than a little time to stop and enjoy life.

The river of time has no end in sight; like a fish, I swim but cannot go against the current

This is a contradiction many people know well: when you are young you have time and energy but no money and no vision; when you are older and have money and know what you like and what you want to do, you no longer have time.

That said, I am actually quite content with this first third of my life.

I have received too many blessings I do not deserve. There are so many people in this world going through sickness and loss, so many still worrying about making ends meet, so many who have never been abroad or had the chance at higher education, and they are all still working hard at living. Compared with those truly struggling in the mire, my current melancholy is no more than "forcing sorrow for the sake of a new verse."

PS: Zelda

A milestone summary

Hi Neway. hou nai mo giai! It is 11 pm in California right now. Unlike Hong Kong's humid heat, Hunan's fickle weather, or Pittsburgh's gloom, the California sun is wonderful; every time I ride my bike out the door I am stunned by the just-right sunshine and get the urge to go hiking immediately. I think I can see why so many people say California is so much better than Pittsburgh.

I think I need to look back at what has happened over the past half year, or the year since I came to CMU. I said earlier that I would submit two papers, and in the end I did, although I was once again a deadline fighter and writing quality was the casualty.

Ability vs. what I want to do

Honestly, I am rather dissatisfied with my current state.

Roughly speaking, I feel like someone whose "heart is higher than the sky" but whose "strength is thinner than paper." I have so many things I want to do, yet my ability cannot keep up with those ambitions.

Things I need to learn, things I want to learn, furrowed brows, anxiety

人生海海 (life, vast as the sea)

At least joy and sorrow are mine to decide

These are reading notes on recipe generation papers, for my 10701 project.

Generating Personalized Recipes from Historical User Preferences

Input:

  • the tokenized name of a specific dish, (vocabulary embedding)
  • a few (partial) key ingredients (ingredient embedding )
  • calorie level. (caloric-level embedding)

tokenization method: Byte-Pair Encoding (BPE) tokenization

Output

tokenized recipe sequence

Approach

Input -> encoder-decoder -> attend over hidden user features (which come from the recipes this particular user previously reviewed/ranked) -> combine them with an attention fusion layer to jointly determine text generation

Encoder

BiGRU for dish name and ingredients

Projection for calorie level

Decoder (output: recipe embedding $h_t$)

Two-layer GRU, where the first layer's initial hidden state $h_0$ is the concatenation of the encoder outputs. For each layer, there is an attention term that computes a weighted sum of the encoded ingredient features.
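A minimal sketch of such an ingredient-attention step (my own illustration, not the paper's code; the shapes and the additive scoring form are assumptions):

```python
import torch
import torch.nn as nn

class IngredientAttention(nn.Module):
    """Weight each encoded ingredient by its relevance to the current decoder state, then sum."""
    def __init__(self, enc_dim, dec_dim, attn_dim=128):
        super().__init__()
        self.proj_enc = nn.Linear(enc_dim, attn_dim, bias=False)
        self.proj_dec = nn.Linear(dec_dim, attn_dim, bias=False)
        self.score = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, ingr_feats, dec_hidden):
        # ingr_feats: (batch, n_ingredients, enc_dim); dec_hidden: (batch, dec_dim) at step t
        energy = torch.tanh(self.proj_enc(ingr_feats)
                            + self.proj_dec(dec_hidden).unsqueeze(1))
        weights = torch.softmax(self.score(energy).squeeze(-1), dim=-1)
        # context: weighted sum of the ingredient features, shape (batch, enc_dim)
        context = torch.bmm(weights.unsqueeze(1), ingr_feats).squeeze(1)
        return context, weights

# usage at every decoding step t:  context, _ = attn(ingr_feats, dec_hidden)
```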

Combine with user historical review data

Each previously reviewed recipe has a recipe embedding, which is used to calculate a recipe attention.

Each recipe also uses multiple cooking techniques, which are used to calculate a technique attention.

Attention Fusion Layer

Fuse all contexts computed at time $t$, concatenating them with the decoder GRU output and the previous token embedding:

(equation image omitted: the attention fusion layer)

$o_t$ is the decoder output, $a_t^i$ is the ingredient context (a weighted sum of the encoded ingredient features), and $a_t^{r_u}$ is the user recipe context (a weighted sum of the representations of the recipes the user previously reviewed).

Final output

Top k sampling means sorting by probability and zero-ing out the probabilities for anything below the k’th token. It appears to improve quality by removing the tail and making it less likely to go off topic
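A small generic sketch of top-k sampling (my own code, not the paper's):

```python
import torch

def top_k_sample(logits, k=3, temperature=1.0):
    """Keep the k most likely tokens, drop the rest, renormalize, then sample."""
    probs = torch.softmax(logits / temperature, dim=-1)
    topk_probs, topk_idx = probs.topk(k)
    topk_probs = topk_probs / topk_probs.sum()      # renormalize over the kept tokens
    return topk_idx[torch.multinomial(topk_probs, 1)].item()

# e.g. logits over a toy vocabulary of 5 tokens
next_token = top_k_sample(torch.tensor([2.0, 1.0, 0.5, -1.0, -3.0]), k=3)
```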

Experiments

Dataset

Here, we restrict to recipes with at least 3 steps, and at least 4 and no more than 20 ingredients.

We discard users with fewer than 4 reviews, giving 180K+ recipes and 700K+ reviews

in our training data, the average recipe length is 117 tokens with a maximum of 256.

Train/Test/Validation

We order reviews by timestamp, keeping the most recent review for each user as the test set, the second most recent for validation, and the remainder for training

Cooking techniques

We manually construct a list of 58 cooking techniques from 384 cooking actions collected by Bosselut et al. (2018b);

Evaluation metric

BLEU (bilingual evaluation understudy) is an algorithm for evaluating the quality of text which has been machine-translated from one natural language to another. Each word (n-gram) in the output has a maximum clipped count, which is its count in the reference.
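A toy illustration of the clipping idea (my own example, not from the paper): the candidate "the the the" gets credit for "the" only once when the reference contains it once.

```python
from collections import Counter

def clipped_unigram_precision(candidate, reference):
    """Modified (clipped) unigram precision, the building block of BLEU."""
    cand_counts = Counter(candidate.split())
    ref_counts = Counter(reference.split())
    clipped = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())
    return clipped / max(1, sum(cand_counts.values()))

print(clipped_unigram_precision("the the the", "the cat sat"))  # 1/3
```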

ROUGE-L: statistics based on the Longest Common Subsequence (LCS)

Personalization (feed 9 randomly chosen user profiles + 1 gold user profile -> compare the output probability of the reference recipe (sentence probability) under each)

User Matching Accuracy (UMA): the proportion of cases where the gold user is ranked highest

Mean Reciprocal Rank (MRR) (Radev et al., 2002) of the gold user
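A rough sketch of how UMA and MRR could be computed from such rankings (my own illustration; the array layout and names like `log_liks` are assumptions):

```python
import numpy as np

def uma_and_mrr(log_liks, gold_idx=0):
    """log_liks: (n_examples, n_users) likelihood of the reference recipe under
    each candidate user's profile; gold_idx is the true user's column."""
    order = np.argsort(-log_liks, axis=1)              # users sorted by descending likelihood
    ranks = np.argmax(order == gold_idx, axis=1) + 1   # rank of the gold user (1 = best)
    uma = np.mean(ranks == 1)                          # User Matching Accuracy
    mrr = np.mean(1.0 / ranks)                         # Mean Reciprocal Rank
    return uma, mrr

# toy example: 2 test recipes, 10 candidate users each, gold user in column 0
scores = np.random.randn(2, 10)
print(uma_and_mrr(scores))
```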

Recipe Level Coherence:

Ranking

input : recipe, user

Output: score (recommendation value) (classification -> soft label [0.2,0.1,0.4,0.1,0.1]) (regression)

Generator

Input: user

Output: recipe for this user

PAC learning tries to study the generalization ability of a learning model (given a set of training data, its ability to carry over to new test data).

I recommend reading this blog post (by the original author):

https://jeremykun.com/2014/01/02/probably-approximately-correct-a-formal-theory-of-learning/

Approximately correct means the error will be small on new samples,

Probably means that if we play the game over and over we’ll usually be able to get a good approximation.

Some Definitions (informal, just for easier understanding)

  • Error of hypothesis $h$ (with respect to the concept $c$ (our target, the "golden" solution) and the distribution $D$) is the probability, over $x$ drawn from $D$, that $h(x)$ and $c(x)$ differ.

  • concept class: a collection of concepts (functions) over $X$

  • PAC Learnable: A problem is PAC-learnable if there is an algorithm $A$ which will likely generate a hypothesis whose error is small for all distributions $D$ over $X$.

    • More formally: "small error" means error probability $\epsilon \in [0, 0.5]$; "high probability" means $1 - \delta$ with $\delta \in [0, 0.5]$. Then we require the algorithm to run in time and space polynomial in $1/\epsilon$ and $1/\delta$.

      Let $X$ be a set, and $\mathsf{C}$ be a concept class over $X$. We say that $\mathsf{C}$ is PAC-learnable if there is an algorithm $A(\varepsilon, \delta)$ with access to a query function for $\mathsf{C}$ and runtime $O(\textup{poly}(\frac{1}{\varepsilon}, \frac{1}{\delta}))$, such that for all $c \in \mathsf{C}$, all distributions $D$ over $X$, and all inputs $\varepsilon, \delta$ between 0 and 1/2, the probability that $A$ produces a hypothesis $h$ with error at most $\varepsilon$ is at least $1 - \delta$. In symbols:

      $\displaystyle \textup{P}_D\big(\textup{P}_{x \sim D}(h(x) \neq c(x)) \leq \varepsilon\big) \geq 1-\delta$

How to prove a problem/ a concept class is PAC learnable?

big idea:

  1. Propose an algorithm $A$ that draws $m$ samples, with time complexity $O(m)$
  2. Show that $m$ (and hence $A$'s running time) only needs to grow polynomially in $1/\epsilon$ and $1/\delta$. Easy, right? :)

Example 1: Interval Problem

The target is an interval $[a_0,b_0]$; each query (unit time cost) randomly draws a value $x \in [a_0,b_0]$.

We propose $A$: draw $m$ samples (hence $O(m)$ time), and take the smallest one $a_1$ and the largest one $b_1$ as the predicted interval (hypothesis) $[a_1,b_1]$.
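A tiny sketch of this algorithm $A$ (my own illustration; the target interval and $m$ are arbitrary choices):

```python
import random

def learn_interval(samples):
    """Hypothesis: the tightest interval containing all observed samples."""
    return min(samples), max(samples)

# draw m samples from the target interval [a0, b0] and fit
a0, b0, m = 2.0, 5.0, 50
samples = [random.uniform(a0, b0) for _ in range(m)]
a1, b1 = learn_interval(samples)   # [a1, b1] is contained in [a0, b0]
```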

We need to bound $P(h(x)\neq c(x))$ (a quantity that depends on $m$, call it $g(m)$); that is, we show that to achieve $P_D(g(m) \leq \epsilon) \geq 1- \delta$, it suffices for $m$ to be polynomial in $1/\epsilon$ and $1/\delta$.

$P(h(x)\neq c(x)) = g(m) = \textup{P}_{x \sim D}(x \in A) + \textup{P}_{x \sim D}(x \in B)$, where $A = [a_0,a_1]$ and $B = [b_1,b_0]$

Then, by symmetry, if we can show that $\textup{P}_{x \sim D}(x \in A) \leq \epsilon/2$ holds with high probability, we are done.

But how do we relate $\epsilon$ to $m$? The trick is to define an interval $A' = [a_0,a']$ with $P(x\in A') = \epsilon/2$.

If $a' \geq a_1$, then $A \subseteq A'$ and we are already done.

Otherwise, the probability that a single sample does not land in $A'$ is at most $1 - \epsilon/2$, i.e., $\textup{P}_{x \sim D}(x \notin A') \leq 1 - \epsilon/2$.

Then the probability that none of the $m$ samples lands in $A'$ is at most $(1 - \epsilon/2)^m$.

Note that the event that region $A$ contributes error greater than $\epsilon/2$ implies that no sample landed in $A'$, so its probability is also at most $(1 - \epsilon/2)^m$.

The same argument holds for region $B$. Since $A$ and $B$ are disjoint, by the union bound we can conclude that

the probability that the error is larger than $\epsilon$ is at most $2(1 - \epsilon/2)^m$;

equivalently, the probability that the error is at most $\epsilon$ is at least $1 - 2(1 - \epsilon/2)^m$.

We want the probability of a small error to be at least $1-\delta$, that is:

$\displaystyle 2(1 - \varepsilon / 2)^m \leq \delta$

Using $1 - x \leq e^{-x}$, it suffices that $2e^{-\varepsilon m/2} \leq \delta$, i.e., $m \geq \frac{2}{\varepsilon} \ln\frac{2}{\delta}$,

which is polynomial in $1/\epsilon$ and $1/\delta$. Therefore, we have finished the proof.
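A quick empirical check of this sample size (my own sketch; the uniform distribution and the target $[0.2, 0.8]$ are arbitrary choices):

```python
import math
import random

eps, delta = 0.1, 0.05
m = math.ceil((2 / eps) * math.log(2 / delta))          # sample size from the bound

a0, b0 = 0.2, 0.8
trials, failures = 2000, 0
for _ in range(trials):
    xs = [random.uniform(a0, b0) for _ in range(m)]     # queries land inside the target
    a1, b1 = min(xs), max(xs)
    # true error under D = Uniform[a0, b0]: probability mass of [a0,a1] and [b1,b0]
    err = ((a1 - a0) + (b0 - b1)) / (b0 - a0)
    failures += err > eps
print(failures / trials, "should be well below", delta)
```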

Example 2: hypotheses/concepts in a finite hypothesis space

The theorem is:

Let $H$ be a finite hypothesis space. Let $D$ be an arbitrary, fixed unknown probability distribution over $X$ and let $c^*$ be an arbitrary unknown target function. For any $\epsilon, \delta > 0$, if we draw a sample from $D$ of size

$\displaystyle m \geq \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right),$

then with probability at least $1 - \delta$, all hypotheses/concepts in $H$ with error $\geq \epsilon$ are inconsistent with the sampled data

(equivalently, with probability at least $1 - \delta$, any hypothesis consistent with the sampled data has error at most $\epsilon$).

Proof:

  1. Consider some specific "bad" hypothesis $h$ whose error is at least $\epsilon$; the probability that $h$ is consistent with the sampled data is at most $(1-\epsilon)^m$.
  2. There are at most $|H|$ bad hypotheses.
  3. The probability that there exists a bad hypothesis consistent with all sampled data is at most $|H|(1-\epsilon)^m$ (union bound).
  4. The probability in the third step should be at most $\delta$ by definition.
  5. That is, we require $|H|(1-\epsilon)^m \leq \delta$.
  6. Using the inequality $1 - x \leq e^{-x}$, we get the sample size in the theorem.
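Spelling out step 6: since $1-\epsilon \leq e^{-\epsilon}$, we have $|H|(1-\epsilon)^m \leq |H|e^{-\epsilon m}$, so it suffices to require

$\displaystyle |H| e^{-\epsilon m} \leq \delta \iff m \geq \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right),$

which is exactly the sample size stated in the theorem.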

Relationship between observed error rate and true error rate

In the non-realizable case (the more general case), there may not be a "golden" solution; we can only compete with the best function (the one with the smallest true error rate) in some concept class $H$. A natural hope is that picking a concept $c$ with a small observed error rate gives us a small true error rate. It is therefore useful to find a relationship between the observed error rate on a sample and the true error rate.

Hoeffding Bound

Consider a hypothesis with true error rate p (for example, a coin of bias p) observed on $m$ examples (the coin is flipped m times). Let S be the number of observed errors (the number of heads seen) so S/m is the observed error rate.

Hoeffding bounds state that for any $\epsilon \in [0, 1]$:

$\displaystyle \textup{P}(S/m > p + \epsilon) \leq e^{-2m\epsilon^2} \quad \text{and} \quad \textup{P}(S/m < p - \epsilon) \leq e^{-2m\epsilon^2}$
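A quick numerical sanity check of the two-sided version of this bound (my own sketch; the bias $p=0.3$, $m=200$, and $\epsilon=0.1$ are arbitrary):

```python
import math
import random

p, m, eps = 0.3, 200, 0.1
trials = 20000
exceed = sum(
    abs(sum(random.random() < p for _ in range(m)) / m - p) > eps
    for _ in range(trials)
)
bound = 2 * math.exp(-2 * m * eps**2)   # two-sided Hoeffding bound
print(exceed / trials, "<=", bound)
```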

The relationship implied by the Hoeffding bound

Let $H$ be a finite hypothesis space. Let $D$ be an arbitrary, fixed unknown probability distribution over $X$ and let $c^*$ be an arbitrary unknown target function. For any $\epsilon, \delta > 0$, if we draw a sample $S$ from $D$ of size

$\displaystyle m \geq \frac{1}{2\epsilon^2}\left(\ln|H| + \ln\frac{2}{\delta}\right),$

then with probability at least $1 - \delta$, all hypotheses $h$ in $H$ have

$\displaystyle |\textup{err}_D(h) - \textup{err}_S(h)| \leq \epsilon$

Proof: given a hypothesis $h$, by the Hoeffding bound, the probability that its observed error is more than $\epsilon$ away from its true error is at most

$\displaystyle 2e^{-2m\epsilon^2} \leq \delta/|H|$

(based on the Hoeffding bound and the definition of $m$).

By the union bound over all $h \in H$, we then get the desired result.

Infinite hypothesis set?

The union bound is too loose! There may be a lot of overlap between two hypotheses; that is, two hypotheses may label most data points the same way.

Some definitions

  • The growth function $g_H(m)$ of $H$ is the maximum number of distinct labelings that $H$ can induce on any set $S$ of $m$ data points (a labeling for an $h \in H$ is just the labels $h$ assigns to the given samples). (In other words: how many distinct labelings can be produced on $m$ samples at most? See the small enumeration sketch after this list.)

    • We say $H$ shatters $S$ if $|H(S)| = 2^m$; we assume binary classification, i.e., the labels are in $\{-1,1\}$. (That is, for such a set of $m$ samples, every possible labeling is produced by some $h$.)

    • $g_H(m) = 2^m$ if there exists a set $S$ of size $m$ that $H$ shatters.


    • A 2D linear separator cannot shatter 4 points

    • A $D$-dimensional linear separator can shatter $D+1$ points but cannot shatter $D+2$ points
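As promised above, a concrete toy illustration of the growth function (my own example, not from the notes): 1D threshold classifiers $h_a(x) = \mathrm{sign}(x - a)$ induce only $m+1$ of the $2^m$ possible labelings on $m$ distinct points, so they shatter 1 point but not 2.

```python
import numpy as np

def labelings_of_thresholds(points):
    """All distinct labelings that h_a(x) = sign(x - a) can induce on `points`."""
    pts = np.sort(np.asarray(points, dtype=float))
    # thresholds between and outside the points are enough to enumerate every labeling
    cuts = np.concatenate(([pts[0] - 1], (pts[:-1] + pts[1:]) / 2, [pts[-1] + 1]))
    return {tuple(np.where(pts > a, 1, -1)) for a in cuts}

pts = [0.3, 1.2, 2.7]
induced = labelings_of_thresholds(pts)
print(len(induced), "of", 2 ** len(pts), "possible labelings")   # 4 of 8 -> no shattering
```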

VC bound (relationship between data size and errors)

(equation image omitted: the VC bound)

If there is no golden target (the agnostic case):

(equation image omitted: the agnostic VC bound)

VC dimension (the complexity of the hypothesis set)

$d_{VC}(H) = $ the largest value of $m$ s.t. $g_H(m) = 2^m$, i.e., the greatest number of data points that can be shattered by $H$

For example, the VC dimension of a $D$-dimensional linear separator is $D+1$.

Statistical-learning-style bounds (fixed data size; relationship between the true error and the training error)

(equation images omitted: generalization bounds relating the true error to the training error for a fixed data size)

Agnostic case: training error is not zero

Training error decreases as the VC dimension increases (easier to fit the data).

The generalization bound increases as the VC dimension increases (harder to generalize).

(figure omitted: training error and generalization bound as the VC dimension grows)

Illustration of the two analyses (VC analysis and bias-variance)

(figure omitted: VC analysis vs. bias-variance illustration)

GNNs for attacking

So far, all the GNN-based works I know of focus on attacking: given a locked circuit, use a GNN to attack it (predict the correct key values).

These works can be roughly divided into two categories based on the target locking method.

For some logic locking methods, the attack can be formulated as a link prediction task. For example, for MUX-based locking, what the attacker is actually doing is trying to determine which input edge of each key-controlled MUX should be preserved.
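A very rough toy sketch of this link-prediction formulation in plain PyTorch (my own illustration, not the UNTANGLE/MuxLink architecture): learn node embeddings with a couple of rounds of neighborhood aggregation over the locked netlist graph, then score each candidate MUX input edge from its endpoint embeddings.

```python
import torch
import torch.nn as nn

class TinyGNNLinkPredictor(nn.Module):
    """Toy message-passing encoder + edge scorer (illustration only)."""
    def __init__(self, in_dim, hid_dim=32):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hid_dim)
        self.lin2 = nn.Linear(hid_dim, hid_dim)
        self.edge_mlp = nn.Sequential(nn.Linear(2 * hid_dim, hid_dim),
                                      nn.ReLU(), nn.Linear(hid_dim, 1))

    def forward(self, x, adj, cand_edges):
        # x: (n_nodes, in_dim) gate-type features; adj: (n_nodes, n_nodes) normalized adjacency
        h = torch.relu(self.lin1(adj @ x))           # one round of neighborhood aggregation
        h = torch.relu(self.lin2(adj @ h))           # second round
        src, dst = cand_edges                        # candidate MUX input edges to score
        pair = torch.cat([h[src], h[dst]], dim=-1)
        return self.edge_mlp(pair).squeeze(-1)       # higher score = edge more likely true

# toy usage: 4 gates with one-hot gate-type features, two candidate edges into the same MUX
x = torch.eye(4)
adj = torch.tensor([[0, 1, 0, 0], [1, 0, 1, 1], [0, 1, 0, 0], [0, 1, 0, 0]], dtype=torch.float)
adj = adj / adj.sum(dim=1, keepdim=True)
scores = TinyGNNLinkPredictor(in_dim=4)(x, adj, (torch.tensor([0, 2]), torch.tensor([3, 3])))
```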

Here are some papers (although most of them come from the same author)

  • ICCAD21 - UNTANGLE: Unlocking Routing and Logic Obfuscation Using Graph Neural Networks-based Link Prediction

  • DATE22 - MuxLink: Circumventing Learning-Resilient MUX-Locking Using Graph Neural Network-based Link Prediction

GNNs for node classification

  • DATE21- GNNUnlock: Graph Neural Networks-based Oracle-less Unlocking Scheme for Provably Secure Logic Locking
    • (Attack Anti-SAT) predict whether each node (gate) belongs to the Anti-SAT module (AN, green box) or the original design (DN, gray box)
    • (figure omitted)
    • (TTLock and SFLL-HD) predict whether each node belongs to 1) the original design (DN), 2) the perturb unit (PN), or 3) the restore unit (RN)
    • (figure omitted)

A cheatsheet for some common operations I may need to ask Google again and again…

PyTorch

Select rows based on conditions of column values

The basic idea is to use the condition to build a boolean index, and then select the rows/columns with that index.

For example, to return the rows whose first column (index 0) equals 2 (indexing with a boolean row mask selects rows):

```python
a[a[:, 0] == 2]
```

Another example: select the rows whose values are all larger than 0:

```python
a[(a > 0).all(1)]
```
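A quick end-to-end check of both snippets on a toy tensor (my own example):

```python
import torch

a = torch.tensor([[2, -1, 5],
                  [3,  4, 1],
                  [2,  7, 8]])

print(a[a[:, 0] == 2])        # rows whose first column equals 2 -> rows 0 and 2
print(a[(a > 0).all(1)])      # rows with all positive entries   -> rows 1 and 2
```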

A summary of 2021

2021 was another year spent amid the pandemic. Because of it, most of my life was locked inside the dorm. Even so, 2021 was also the most "turbulent" year of my life. I was like a sailboat covered in patches yet still eager to chase the sea: through wind and storm, I made it through the year in the end, even though the journey so far was really just me being pushed along by the storm.

2021 was a year without results: the papers made essentially no progress, I did not learn anything fresh or interesting, and even though I got married, I still often felt lonely.

But it was also a very lucky year. Actually, not just 2021; I feel I have always been a rather lucky person. The brain my parents gave me, the classmates I met in college, the friends of 913, and, after coming to the US, my labmates and two especially friendly advisors. I am truly grateful to be able to live under their shelter, never having to worry about making a living or about petty interpersonal drama.

The tone seems a bit too gloomy; let me put it positively! In 2021 I managed to do the following:

  • Became a bona fide EDA / VLSI design / VLSI testing half-expert; I can now more or less explain to others how a chip goes from RTL to silicon
  • Became the lucky kid once more: both advisors treat me very well~ (damn, that does not really count as something I did, hahaha, but I am running out of things to write, so down it goes)
  • Pretty much fully adapted to life in the US, and I have been milking capitalism's freebies for all they are worth hahahaha
  • Most likely got the Apple intern offer (I had not actually planned to apply for an internship... my advisor just asked me to try for the fellowship... then the Apple folks probably felt I was not quite qualified for it, but my research interest fit their intern needs well... so... (upd: Apple did end up giving me the fellowship after all; the encouragement about my direction and the recognition of my presentation in the email were genuinely touching, and for a moment I felt that surge of wanting to work hard to repay such kindness, like the old poem about the Golden Terrace.)
  • Although the papers made no progress, both projects are moving along steadily, and so far I do not seem to resent research or dislike my current projects, which is nice, hehe

And now this year's plan (ambitious):

  • Finish both projects in the first semester! Then submit one more paper in the second semester! You have to push yourself! Not only for yourself, but also for two such nice advisors. OK?
  • Learn video editing when there is time (hahaha, you show up on the wish list every year). It is never a bad thing.
  • That seems to be about it... Just be happy every day and don't stress too much; you are already doing great!

盗月社

Watching 盗月社 always gives me a sense of warmth. At first I did not know why; later I understood: it is because they themselves are warm. Or rather, their videos and stories are positive and sunny; like the sun, they actively warm the people around them. I think that is also why the people in their videos are always so warm and friendly.

Yes, we should try to be warm people like that too. Life is only a few decades: making money does not mean that much, and changing or saving the world is unlikely, so being a warm person, letting yourself and the people around you be bathed in sunshine, seems like a pretty reliable life choice. Proactively express your gratitude, stay optimistic and positive, and bring warmth rather than gloom to the people around you. I hope I can~

That said, I honestly do not know how the 盗月社 crew manage to stay sunny all the time. Maybe behind the scenes they are also fragile and also need warmth from others? I certainly am. Although I understood early on that life is destined to be a lonely journey that belongs only to oneself, there are still many moments when I need emotional support from others.

Regrets

Maybe that is just how life is: when you choose to be a little boat drifting with the wind, you are destined to have regrets.

Looking back on my twenty-four years, there are actually quite a few regrets. For instance, as a kid I loved Go but had no learning resources and no teacher, and could only fumble around on my own with my notion of "surrounding." In elementary school I drew dozens of maps imitating QQ三国 and designed a full set of equipment, storyline, and combat systems (even though every mechanic was based on dice rolls, hahaha), yet in college I never thought of (or never had the courage?) looking for a job in games, and none of those old maps survive. Another example: as a kid I quite liked drawing, and once copied a landscape painting from a book, only for the art teacher to fail me on the grounds that "you must have traced this directly"; I cried my heart out from the injustice, and it incidentally killed off my joy in art. And then there was the debate team, the only thing I signed up for in college: in the end I chickened out and never went to the interview...

Most of these regrets really stem from my own short-sightedness and cowardice. The short-sightedness may be shaped by family environment and education, but the cowardice is purely my own problem. Maybe that is also why Bai Xiaowen in 白鹿原 (White Deer Plain) struck me as so chilling, and why 明朝那些事 (Those Things of the Ming Dynasty) rang so deafeningly in my ears. That is why I love the phrase I have been repeating for seven years: follow your heart.

So let this passage be the ending, and also my hope for the new year: whatever decision you face, do not worry about the consequences, do not weigh all the other trivial considerations, just remember:

唯愿此心无怨尤 (May this heart hold no resentment or regret)