0%

ER model

a collection of entities and relationships

Entity

  • an object in the real world that is distinguishable from other objects. ( can be same category)
  • described using a set of attributes.

Entity set is a collection of entities of the same type. All entities in a given entity set have the same attributes ( the values may be different).

Domain: domain of possible values of each attribute in an entity set.

Key

superkey: any set of attributes which can uniquely identify an entity

key (candidate key): a minimal set of attributes whose values can uniquely identify an entity in the set. (should depend on the real life possibility 即使还有entity 加入/删除, 这个key永远identify an entity)

primary key: a candidate key chosen to serve as the key for the entity set

Relationships

an association among two or more entities.

eg. Teach: (John,CENG4567) ,(David, CSCI1234)

relationship set: a set of similar relationships: {(John,CENG4567) ,(David, CSCI1234)}

relationships can also have descriptive attributes

Recursive Relationship: relationship两边entity其实一样

One to many

图片2

Tenary:aggregation把3个变成2个

One to One

similar with one to many, but have two arrows

Key for relationship set
  • Many-many union of the primary keys for E1,…Ek
  • One-many 找many的那个primary key就行。(知道一个孩子的ID就能确定是哪个‘mother-of’)
  • One-one 随便找个primary key就行。
Participation constraint
  • total –each entity in the entity set must participate in at least one relationship. (对应的entity——relation的线加粗)
  • partial- 不需要

Weak entities

strong entity , an entity which has a super key

weak entity 相反

若需要identify一个weak entity: –by considering some of its attributes in conjunction with the primary key of employee (identifying owner).

partial key 例如:教授家属需要identify就需要(教授id,家属名字)

图片3

Class hierarchies

Overlap constraint 两个subclass能否包含相同entity (default:no)

Cover constraint 是否subclass包含了superclass里所有entity ( default:no)

Aggregation

A relationship between a collection of entities and relationships

图片3

Relational Model

main construct: a set of relations

relation consists of relation schema and relation instance

Relation instance

Simply a table with rows and columns

  • row = tuple/record, no two rows are identical
  • column = field/attribute

Relation schema

specifies relation’s name, name of each field(primary key should be underlined), domain of each field

Degree/arity of a relation: number of fields

Cardinality of a relation instance: number of tuples

Relational database : a collection of relations

Instance: a collection of relation instances (one per relation schema).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
CREATE TABLE Students
(sid CHAR(20),
name CHAR(30),
login CHAR(20),
age INTEGER,
gpa REAL)

INSERT
INTO Students (sid, name, login, age, gpa)
VALUES (‘53688’, ‘Smith’, ‘smith@ee’, 18, 3.2)

DELETE
FROM Students S
WHERE S.name = ‘Smith’

UPDATE Students S
SET S.age = S.age + 1, S.gpa = S.gpa – 1
WHERE S.sid =53688

Integrity constraints

restricts the data that can be stored in an instance of the database on a database schema

  • Specified when the schema is defined.
  • **Checked **when relations are modified.

Key constraints

Certain minimal subset of the fields (candidate key) of a relation is a unique identifier for a tuple (instance).

设计者可以自己identify a primary key(比如index)

1
2
3
4
5
6
7
8
CREATE TABLE Students
(sid CHAR(20),
name CHAR(30),
login CHAR(20),
age INTEGER,
gpa REAL,
UNIQUE (name, age),//declare a key
CONSTRAINT StudentsKey PRIMARY KEY ( sid)) //StudentKey: Constraint name(opt)

Foreign key constraints

Sometimes, the information stored in a relation is linked to the information stored in another relation.

Enrolled(sid: string, cid: string, grade: string) sid就是Enrolled这个relation的 foreign key and refer to Students

Formally, a foreign key is a set of fields (the primary key of s) in one relation r that is used to “refer” to a tuple in another relation s.

1
2
3
4
5
6
CREATE TABLE Enrolled (
sid CHAR(20),
cid CHAR(20),
grade CHAR(10),
PRIMARY KEY (sid, cid),
FOREIGN KEY (sid) REFERENCES Students)

four options on DELETE and UPDATE.

  • NO ACTION (DEFAULT) rejected
  • CASCADE(级联) 如果被删除,那么相关联的所有row也全被删除
  • SET DEFAULT 如果被删除,那么相关联的所有row设置为DEFAULT
  • SET NULL

ER to Relational

  • entity sets -> tables

  • relationship sets(without constraints) -> tables,attributes包括

    • 每个entity的primary key
    • relationship set的descriptive attributes图片1
  • with constraints

    • arrow: arrow对应的m个entity,因此又m个candidate key,随便选一个作为primary key
  • weak entity

    将weak entity及对应relation 统一为一个table 并且(ON DELETE CASCADE)

  • Class hierarchies

    • different relations (父母孩子都有)

      more general。

    • (仅有孩子)

      not always possible。can be accessible easily

  • Aggregation

Relational Algebra

Baisc Operators

selection $\sigma$

a boolean combination (an expression using logical connectives $\wedge \vee$)

图片1

Projection $\Pi(fileds)$

extract columns from a relation

duplicated row will be eliminated

图片2

Union $U$

must be union-compatible:

  • same number of fields
  • corresponding fields have the same domains
Set Difference $-$

also union-compatible

Intersection $\cap$

also union-compatible

$R\cap S = R-(R-S)$

Rename $\rho$

$\rho (R(F),E) \ or\ \rho(R,E)$

E: arbitrary relation algebra ,F:renaming list, R= E 除了F rename的东西

Cartesian product (cross product) $\times$

图片3

Join $\Join$

Join can be defined as a cross-product followed by selections and sometimes with projections.

  • Join condition $\Join_c =\sigma_c(R\times S)$
  • Equi join c is R.name1 = S.name2, S.name2 will be droped in the final resulting relation
  • Natural join: C 是R S所有fields相同的

图片4

Example

图片5

Division /

图片6

Example

图片7

SQL

图片1

  1. 先计算 relation list的cross product
  2. 考虑是否满足 qualifications
  3. 选取target list里面的attributes
  4. 如果有 DISTINCT, 把duplicated rows删掉(默认不删)

Natural Join

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
SELECT  S.sname
FROM Sailors S, Reserves R
WHERE S.sid=R.sid AND R.bid=103
=
SELECT S.sname
FROM Sailors S NATURAL JOIN Reserves R
WHERE R.bid=103

#IN operator
SELECT B.bname
FROM Boats B
WHERE B.color IN (‘red’, ‘blue’, ’green’)

#Nested Queries
Find names of sailors who’ve reserved boat #103:
SELECT S.sname
FROM Sailors S
WHERE S.sid IN (SELECT R.sid
FROM Reserves R
WHERE R.bid=103) #inner function can use the relation in outer function
> ANY : 比某一些大就行

Expressions and string

LIKE for string matching. _ = any one character, % = 0 or more arbitrary characters

Aggregate Operators

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
COUNT([DISTINCT] A)
SUM ( [DISTINCT] A)
AVG ([DISTINCT] A)
MAX (A)
MIN (A) A: any column of a relation
# cannot be netsted! MIN(AVG(A)) is not allowed!
eg.Find the name and age of the oldest sailor
SELECT S.sname, S.age
FROM Sailors S
WHERE S.age =
(SELECT MAX (S2.age)
FROM Sailors S2)
eg.Find the names of sailors who are older than the oldest sailor with a rating of 10.
SELECT S.name
FROM Sailors S
WHERE S.age > ( SELECT MAX (S2.age)
FROM Sailors S2
WHERE S2.rating = 10)

GROUP BY and Having

1
2
3
4
5
6
7
8
9
10
eg Find the age of the youngest sailor for each rating level.
SELECT S.rating, MIN (S.age)
FROM Sailors S
GROUP BY S.rating

SELECT [DISTINCT] target-list STEP 2 //target list = attribute names + terms with ahhregate operations, The attribute names must be a subset of grouping-list.
FROM relation-list STEP 1
WHERE qualification STEP 2
GROUP BY grouping-list STEP3
HAVING group-qualification STEP 4

图片2

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
SELECT  R.day
FROM Reserves R
GROUP BY R.day
HAVING COUNT(DISTINCT R.sid) = 1

Example: Find the average age of sailors for each rating level that has at least two sailors.
SELECT S.rating, AVG(S.age) AS avgage
FROM Sailor S
GROUP BY S.rating
HAVING COUNT (*) > 1

Example:Find those ratings for which the average age is the minimum over all ratings
SELECT Temp.rating, Temp.avgage
FROM (SELECT S.rating, AVG (S.age) AS avgage
FROM Sailors S
GROUP BY S.rating) AS Temp
WHERE Temp.avgage = (SELECT MIN (Temp.avgage)
FROM Temp)

Division

图片3

CREATE VIEW

图片4

Delete view:DROP VIEW Temp

Outer Join

S LEFT OUTER JOIN R: S 中不对应 R的也会加进去。 只是R的field就是 NULL

FULL OUTER JOIN: both

ENGG5189 FUZZY EXPERT SYSTEM REPORT

Li Wei 1155062148

Background

There are hundreds of programming languages to use now, which make it difficult for programmers to decide which language is the most suitable for them.

Below picture (cited from Quora) also shows that many people, no matter they are experienced programmers or not, have the question:

Which programming language should I learn next?

1519616125

Motivation

Motivated by the wide range of discussion: “ best programming language to learn” and our former hesitant experience when deciding which programming course to enroll, we decided to design a Fuzzy expert system to solve these kinds of problems.

Problem Definition

The users do not have a clear knowledge on different programming language, which make them hard to select the best/ most suitable programming language for them to learn.

System Design

The Fuzzy system is designed for recommending suitable programming languages for users to learn. And rules are provided for inference so that users need to answer some simple programming habits questions, which are used as facts in the system.

Implementation

Fuzzy type
  • Difficulty the expected difficulty of programming language.
  • Time the expected time spent to learn programming language every week.
  • Resource the expected learning resource of programming language.
  • Experience the former programming experience of the user.
  • Efficiency the expected efficiency of programming language.
  • Package the expected package(eg. numpy in python) requirements of programming language.
  • Debug the debugging ability and willing of user.
Objects
  • (conclusion)

    preset values of the result

  • expectation_of_difficulty

  • time_spent_on_learning

  • available_resource

  • experience

  • package_requirement

  • efficiency_requirement

  • upset_when_debug

  • market_requirement

    whether the reason user learns programming is for finding a job.

  • company_orientation

    the expected IT companies the user wants to work for.

  • interest_field

    the expected programming field the user wants to work in.

  • compiled_language_or_not

    whether the user is suitable for compiled language or not.

Rules

Rules in the system is divided into two perspectives: Facts and languages.

For the facts perspective, the rules help to inference the conclusion and other objects such as expectation of difficulty by facts. Here are some examples:

1
2
3
4
5
6
7
8
9
10
11
RULE CODE:difficulty_rule_easy_111
IF time_spent_on_learning is little AND available_resource is little AND experience is little
THEN expectation_of_difficulty is easy
WITH CERTAINTY: 0.9
//help to inference expectation_of_difficulty

RULE CODE:difficulty_rule_1
IF expectation_of_difficulty is easy
THEN programming_language is Python
WITH CERTAINTY: 0.75
//help to inference programming_language

For the languages perspective, the rules help to inference the conclusion by languages, this kind of rule makes use of the key features of different programming language. Here is one example:

1
2
3
4
RULE CODE:js_rule_2
IF expectation_of_difficulty is easy AND efficiency_requirement is high AND interest_field is Web_programming
THEN programming_language is Javascript
WITH CERTAINTY: 0.95
Inference structure

![Untitled Diagram](../images/Untitled Diagram.png)

Results

Given these facts:

  • time_spent_on_learning much (1.0)
  • **available_resource **much (1.0)
  • experience much (1.0)
  • **package_requirement **high (1.0)
  • efficiency_requirement high (1.0)
  • upset_when_debug No (1.0)
  • market_requirement Yes (1.0)
  • company_orientation Apple (1.0)
  • interest_field No_preference (1.0)

The results are:

1519621128(../images/1519621128(1).png)

1519621161(../images/1519621161(1).png)

1519621201(../images/1519621201(1).png)

Discussion and conclusion
Pros and cons

Our system has done a basic programming language recommendation task, which covers almost all possibilities. And the result is really practical for uses.

However, the system can not solve a few specific language requirements well, such as Database programmer and so on.

Further improvements

According to what we mentioned above, the system can be still improved by importing more language-based rules so that the system can handle some specific language requirements better.

Work distribution
Work Liu Boyi Li Wei
Brainstorm the topic yes yes
Conduct survey and collect expert knowledge yes yes
Design the system (Main workload) yes yes
Coding the Fuzzy Set and Object part yes
Coding the Rule part(Main workload) yes yes
Test and debug yes

一个人的命运 固然要靠个人的奋斗 但是也要考虑历史的进程 —呱呱呱

中大四载,明白的最深刻的道理就是上面这条了。

放大来说,先人杜甫,比我们不知道高到哪里去了,在那个年代也会沦落到被几个小毛孩欺负。

从小了来说,我们每个20出头的愣头青,有的人拼死拼活还是在B,有的人混一混就能拿个A leve。姑且先不说是不是学校差别,比如一个来自人大(苏州),一个来自中大。我们就拿中大自己人来说的话,有个深刻的道理:

选课 也是一门学问。

有的时候不是说好好上课努力学习final感觉满分就一定能拿A(说你呢 4430),因此本文是我们用拍死在沙滩上的泪水

献给 年轻的学弟学妹们,愿天堂没有GPA。

注意:本文纯属个人见解,如有冒犯,请联系。

格式:

课程代码,课程名称,课程教授

description

workload

grade

给分好,工作量小(或 难度低)

CCAN1703/CCAN2703 初阶粤语/中阶粤语 Dr LEE Siu Lun

我那个年代(14-15)内地生给分福利。老师上课有趣。

只有一些粤语阅读 和 演讲。

据说 最低A-

ESTR3108 人工智能 Prof Leung

梁老教授快退休了,但是为人和善,也是我最尊敬的几位教授之一,每次大家的weekly presentation 他都听的特别认真,真的是肃然起敬。

ELITE的AI比一般的会多一个DNN的project,不过之前有过经验的话倒是挺简单的。

如果有过DNN经验,final前好好复习(pastpaper!) 一般A level是没问题的。

大一 数学物理

基本都是高中学过的,有一些没学过的也是奥赛(自主招生)会学的。总之基本上大一的数理课是相当简单。

一般都是 weekly assignment + mid term + final

只要别完全划水,A level也是OK的

UGEB2401 天文

讲的东西特别有趣!感觉自己是当代诸葛,上知天文了哈哈哈,小时候很多为什么都是那个时候了解的。

忘了。

有兴趣的话 拿A还算没毛病。

UGEC2020 澳门历史

挺不错的一门课,看得出来老师在试图讲的有趣哈哈哈。

一篇paper + 一次开卷考试

我拿的是A-。。因为感觉基本就写了篇paper就拿了A level,有点受之有愧

IERG3280 / ESTR3302

BY Li Qi:

IE出名的给分好颓课,唯一要注意的是教授会尝试记住每个人,尤其ELITE,所以千万别翘课。他会给每个人照相,然后打印出一张座位相片名字表。。。。。。

homework + mid-term + project

反正我认识的都是A…没选这门课真是气死我了!

给分好,工作量大(或 难度高)

CSCI3100 软件工程

考试和project没有任何关系。。mid-term有点理科的感觉,final完全就是文科考试了,不过和TA拉好关系是有帮助的!

assignment + mid-term + final + project(4人抱团把天赋带到搭网站。。。那段时间连续通顶一个星期)

其实一般CS的major, final 和mid term才是关键。project大家分数差不多。所以final好好复习 A level也是没问题的。

CSCI3130/CSCI3160 Automata theory/ fund of Alg

两门课其实差别挺大的,放一起是因为给分,workload都是类似的。而且两位教授之前在中大是师生关系哈哈哈哈。

正常的weekly assignment + mid term + final,不过挺多人吐槽比较难。(不过相信我,你听到的大部分内地生吐槽难,炸了都是在装!)我反正觉得还OK。

好好学习A- 是有的,A就是可遇不可求了。

ENGG2420C/ESTR2000 复数分析 Prof Pascal

当时为了凑ELITE分数选这门课,同时想着复数应该还有点用。现在特别后悔。。是我上过workload最大!的一门课!

weekly assignment + mid term + final + presentation + paper

emm。。。虽然ELITE后面的都没听懂了,不过final好像考的并不难 基本都会做。所以A-还是可以的。

CSCI4430 Computer network

怎么说呢。。其实我是想把他放在 给分烂 难度低里面的哈哈哈。课挺有意思的,也解答了我很多最开始玩电脑时候的问题,全程顺风顺水,但最后拿个A-还是不满意的。毕竟我难得有一门专业课这么扎实的学啊。

workload一般 主要还是final得考好把。。

我前面都是满分,final估分满分,最后拿个A-。。哎。

给分烂,工作量大(或 难度高)

CSCI3150 Intro to OS Prof.Eric

Welcome to 3150! 也是出名的workload巨大。不过OS这种课本来就基础而且难,所以也怨不了Eric,而且教授讲课也是特别仔细,一个问题会整的明明白白的。不过。。final 考一个一百行代码找一个bug是什么鬼啊!!!

code assignment + mid term(巨难)+final(巨难)

带没啥好说的。CS内地生我只清楚两个拿A level的。

CSCI3180 programming language Prof.Jimmy

Whose bell rings? 这个梗懂得就懂。Jimmy是我印象里对教书最认真负责的一名教授,也难怪拿了两次 best teacher。不过必须要diss一下jimmy的是。。他那个code作业真的是太难了。。。

code assignment + mid term + final

给分已经不重要了,我只知道前面两次code assignment我是生不如死。

以上就是所有印象深刻的课程。如有遗漏/新的见解 欢迎补充~

Full Automatic Micro Calcification Detection in Mammogram Images Using Artificial Neural Network and Gabor Wavelets

Preprocess

remove seeds from images and increase contrast between normal and abnormal brain tissues

  • Histogram equalization
  • Median filter
  • Un sharp mask
  • thresholding
  • Mean filter
Feature extraction

Networks in their Surrounding Contexts

Homophily Test

计算不同category相连edge比例 与 理论比例的大小关系。如果”significantly less than”,那么就叫做同质化。

Conv sppedup

Accurate compu
Approximatation computation
Low-rank
sparse
Activation function (ReLU)
Yuzhe

$min\sum||Y_i-r((A+B)\cdot X_i)||_F^2, st.rank(A)<T_1, ||B||_0<T_2$

A is a low-rank tensor, while B is a sparse tensor

Speaker:

1
`Prof. Xiang Chen`
Low power

power barrier ( battery capacity size)

OLED

  • 不同颜色能耗比不同,相比降低亮度,可以增加红绿的比例(oled)
  • 对不同label的视频(游戏,电影,新闻——。。)可以有不同的处理方式。

GPS:

  • 不持续发送信号

GPU

  • fps: 手机越小,距离越大。需求越小。GGG
    • 距离:用camera(间歇性)+sensor(持续性)

DNN

  • FC ( memory sensitive)
    • sparsity -> cluster?
  • CNN(computational sensitive)
    • layer partition of different nodes(distributed computation)
Mobile security
  • Gesture (velocity pressure area) is different

Continuous Random Variables/Distributions

Exponential

Gamma

Normal

Chi-square

typical type of Gamma, $\alpha = r/2, \theta = 2 = 1/\lambda$

where r is called the degree of freedom

eg: Brown motion: $x = Normal,y = Normal, then, r = \sqrt{x^2 + y^2} = \sqrt{chi-square}$

Pareto distribution

$\alpha$ = power (decay) law

Future
  • Processor
    • many cores
    • heterogeneous(APU)/specialized(AI chips) hardware
  • Disk dead. Locality is still the king.
  • Cluster as a personal supercomputer
Questions
  • Can software system fully optimize for the hw
  • How to decide which hw is suitable for a new sw
  • How to transform a hw into a practical sw use
Challenge 1 -> Q1
  • Parallel alh design
  • HW features
C2 from prototype to production
  • practical / technical constrains
    • hw: power,space,costs
    • sw: maintenance
    • workload: deep learning
C3 millions of lines of legacy code

how to solve: modular code?

一位local女教授,既Jimmy之后第二位不准手机铃声的教授。。。

第一节课是拿一个英文的科普视频 放一段讲一段。。。太真实了吧?。。。

C1 Strong and Weak ties 17

Basic definition

Path
  • A path in a graph is a sequence of nodes with the property that each consecutive pair in the sequence is connected by an edge.
  • A path can have repeat nodes;
  • Path without repeat nodes is a simple path
Cycle

A cycle is a path with at least three edges, in which the first and last nodes are the same, but otherwise all nodes are distinct.

Bridge

the edge joining two nodes if deleting that edge would cause the two nodes in two different components.

Local Bridge

Bridge is rare

the edge joining nodes A and B, if its endpoints A and B have no friends in common.

Span of a local bridge:

the distance between the end points if that edge were deleted.(original:1)

Embeddedness(edge)

Embeddedness of an edge in a network is the number of common neighbors the two endpoints have.

Embeddedness(Local bridge) =0

Structure hole (几个不同contacts的中间人)
  • access information originating in multiple, non-interacting parts of the network.
  • have opportunities for innovations arose from the unexpected synthesis of multiple ideas, each of them on their own perhaps well-known, but well-known in distinct and unrelated bodies of expertise.
  • have an opportunity for a kind of social “gate-keeping”; a source of power in the organization.
Connected

A graph is connected if for every pair of nodes, there is a path between them

Traffic in a Network

For each pair of nodes A and B: 1 unit of traffic “flow” from A to B if A and B belong to same component. the flow divides itself evenly along all the possible shortest paths from A to B

Betweenness
  • Edge: total amount of flow that the edge carries, counting flow between all pairs of nodes using this edge.
  • Node: total amount of flow that the node carries, when a unit of flow between each pair of nodes is divided up evenly over shortest paths
Triadic closure

朋友的朋友更容易成为朋友

cluster coefficient(a)

a的朋友们也互相是朋友的概率

Strong Triadic Closure Property

如果a 和bc关系很好,那么bc会成为朋友。

如果不是(no matter strong or weak),violate the property

Strength of weak ties

neighborhood overlap (edge)

no of nodes who are neighbors of both A and B**/** no of nodes who are neighbors of at least one of A of B.

  • Neighborhood overlap increases with the strength
  • Local level : tie strength increases with neighborhood
  • Global level : weak ties serve to link together different tightly-knit communities ?

Weak ties provide the more crucial(关键) connective structure
• for holding together disparate communities and
• for keeping the global structure of the giant components intact.

Graph patitioning

division

移除最可能连接两大块的(betweenness最大值)

agglomerative(凝聚)

把nodes that are likely to belong to the same region merge起来

Girvan-Newman Method
  • Find the edge of highest betweenness and remove these edges from the graph.
  • Recalculate all betweennesses, again remove the edge or edges of highest betweenness.

C2 Networks in their Surrounding Contexts

Homophily

The tendency of individuals to associate and bond with similar others

Friendship that forms because two people
  • have a common friend (intrinsic network structure, triadic closure)
  • attend the same school or work for the same company (contextual factor)情境因素
Measuring Homophily

Homophily Test: If the fraction of cross-gender edges is significantly less than 2pq(p = fraction of males,q = fraction of females), then there is evidence for homophily.

The edge is heterogeneous(异质) if two end nodes do not share the same characteristic

Mechanisms Underlying Homophily
Selection

select friends with similar characteristic. tends to drive the network towards smaller clusters of like-minded individuals (balkanization)

  • individual characteristics drive the formation of links (邻居)
  • involves immutable(不变) characteristics (determined at birth)
  • 物以类聚。
Socialization and social influence

modify their behaviors to bring them more closely into alignment with the behaviors of their friends.

– reverse of selection
– Involves mutable characteristics
– 近朱者赤, 近墨者黑/lies down with dogs, rises with fleas

Social influence

can produce network-wide uniformity, as new behavior spreads across the links.

Affiliation(联系) Networks 17

Include the contextual factors (the set of activities) into the network. expressed as a bipartite(even)
graph.

  • node: person or focus(activity)
  • edge: (person,focus)

• A graph is bipartite if its nodes can be divided into two sets in such a way that every edge connects a node in one set to a node in the other set

• Two people sharing the same focus → an opportunity to become friends.

Extension:

  • Node: person/focus
  • edge: (person,focus) or (person person)
3 Closure Processes
  • Triadic closure: 朋友的朋友更容易成为朋友
  • Membership closure(social influence): 朋友参加的活动更容易参加
  • Focal closure (selction): 一个活动中的两人更容易成为朋友
  1. – For each k, identify all pairs of nodes who have exactly k friends in common, but who are not directly connected by an edge.
  2. (Difference time with step1 ) T (k) = the fraction of these pairs that have formed an edge.
  • Triadic Closure: HIGH comon friends -> higher fraction
  • Focal closure: high -> not so high

The Schelling Model

How global patterns of spatial segregation can arise from the effect of homophily operating at a local level (Thomas Schelling)

  • Each agent(population) wants to have at least t agents of its own type as neighbors
  • Unsatisfied agents: – fewer than t neighbors of the same type as itself and move to a new cell

homophily draws people together along immutable characteristics

The Schelling model creates a natural tendency for mutable characteristics

C3 Positive and Negative Relationships

clique/complete graph: graph which an edge connecting each pair of nodes

  • Balanced Structure(triangle)
    • Three mutual(相互的) friends
    • A third mutual enemy
  • Unbalanced Structure(triangle)
    • A,和BC 玩的好。BC仇敌。
    • All are enermies

A complete graph is structurally balanced if its everyone triangle is balanced.

Balance Theorem

If a labeled complete graph is balanced, then

  • either all pairs of nodes are friends,
  • or else the nodes can be divided into two groups, X and Y ,
Weak Structural Balance Property:

There is no set of three nodes such that the edges among them consist of exactly two positive edges and one negative edge . ( all are enermy are possible)

Structural Balance in Arbitrary (Non-Complete) Networks

A signed graph is balanced if and only if it contains no cycle with an odd number of negative edges.

Searching for balanced division
  • find the connected components using positive edges. Declare each component to be a supernode.
  • the supernodes form the reduced graph (negative edges only)
  • BFS: edges connect two nodes in the same layer: a cycle with odd number of nodes

Generalizing the Definition of Structural Balance

Approximately Balance Theorem

If at least 99.9% (1-$\epsilon$)of all triangles in a labeled complete graph are balanced, then

  • either there is a set consisting of at least 90%(1-$\epsilon^{1/3}$)of the nodes in which at least 90% of all pairs are friends
  • or else the nodes can be divided into two groups, X and Y ,
    • at least 90% of the pairs in X like each other
    • at least 90% of the pairs in Y like each other, and
    • at least 90% of the pairs with one end in X and the other end in Y are enemies.

C4 Information cascade 17

Reasons why individual might imitate the behavior of others
  • – Informational effect :

    • The behavior of others conveys information about what they know
    • Peer pressure
    • Imitate what others (e.g. idols) are doing
  • Network effect (direct benefit effect) :(C5)

    you incur an explicit benefit when you align behavior with the behavior of others.

Bayes Rule

$P(A|B) = \frac{A\cap B}{B}=\frac{P(A)P(B|A)}{P(B)}$, 一般 ,pa, p(b|a)题目都给了。问题是求pb:

若A好算:$p(b) = P(A)P(B|A)+P(A^c)P(B|A^C)$

A Simple General Cascade Model 17

  • States: G(Good) with probability p, and B with 1-p

  • Accepting select which state

  • Private information

  • Signal H(High) before making decision: accepting is a good idea, L(Low) : bad idea

    P(H|G) = q > 0.5, P(L|G) = 1-q < 0.5

  • Payoff: 0 reject. Vg: accepting a good option Vb: accept a bad option

Suppose one receives a H signal

  • Expected payoff = Vg * P(G|H) + Vb *P(B|H)
  • P(G|H) = $\frac{p(G)p(H|G)}{p(G)p(H|G)+p(B)p(H|B)}= \frac{pq}{pq+(1-p)(1-q)} >p$
Multiple signals

S = a high signals + b low signals

  • posterior probability Pr [G | S] is greater than the prior Pr [G] (p) when a > b;

C5 Network Effects

Externalities(和外部坏境联系的场景 如 交通事故?party)

situation that the welfare of an individual is affected by the actions of other individuals

  • Positive: welfare increases when others are joining
  • Negative: 相反。

Without network effect, only consider Reservation price

the max amount one is willing to pay for one unit of the good. If consumer x (0 ≤ x ≤ 1) has a higher reservation price r(x) than consumer y, then x < y.

**r: strictly decreasing **

p’: production cost of one unit of the good (the minimum price a producer is willing to accept to sell a good)

x’: equilibrium quantity of the good so that r(x’) = p’

The Economy with Network Effects

  • intrinsic interest. r(x). eg. r(1 ) = 0

  • the no. of other people using the good

    • ( z: fraction of people in use).
    • f(z) measures the benefit to each consumer from those who use the good.f(z) is increasing in z, f(0) = 0
  • reservation price of consumer x = r(x)f(z) . here z is the fraction expected by x

  • self-fulfilling expectations equilibrium p’ = r(z)f(z)

Stability, Instability, and Tipping Points

1524728773(1)

  • 0 < z < z’ 买家不想买。“downward pressure” on the consumption of the good : this would push demand downward.
  • z’ < z < z” 买家想买。 “upward pressure” on the consumption of the
    good: this would drive demand upward
  • z” < z < 1 “downward pressure”
  • z’ : critical point/tipping point
A Dynamic View of the Market

z^: fraction of population who buy the product (actual buyer)

r(z^)f(z) = p = >

z^ = $r^{-1}(p/f(z))$ = g(z)

For goods with network effects, the equilibria are typically not social optimal.(社会所有人的总利益)

Being the first to reach this tipping point is more important than being “best”.

Mixing Individual Effects with Population-Level Effects

f(Z) = 1 + az^2

r(X) = 1-x

C6 Power Laws and Rich-Get-Richer Models 17

Central Limit Theorem says that if we take any sequence of small independent random quantities, then in the limit their sum (or average) will be distributed according to the normal distribution.

A function that decreases as k to some fixed power, such as 1/k 2(fraction) is called a power law

Rich-Get-Richer Models

Model the growth of the popularity.

  • Created pages in order : 1, 2, 3, . . . , j.
  • When j is created, With probability p, page j, chooses a page i uniformly at random from among all earlier pages, and creates a link to this page i.
  • With probability 1-p, page j creates a link to the page that i links to. (i also uniformly random distributed)
    • new version:with probability 1 − p, page j chooses a page l with probability
      proportional to l ’s current number of in-links, and creates a link to l.

preferential attachment: : “preferentially” to pages that already have high popularity. (人们更愿意打开已经火了的页面,copy别人的)

No. of in-links of node j = $x_j = p/q[(t/j)^q-1]$ . t: simulated steps. q = 1-p

At time t, we have nodes x1, x2, … xt. The fraction of nodes with at least k links is

$[\frac{q}{p}k +1]^{-1/q}$

f(k) = the fraction of nodes with exactly k links = $1/p [1+\frac{1-p}{p}k]^{-(1+\frac{1}{1-p})}$. when p -0, power law exponent is 2.

The long tail

a small set of items that are enormously popular

Zipf’s Law : the frequency of the j th most common word in English (or most other widespread human languages) is proportional to 1/j.

C6 Cascading behavior in networks 17

Structure of the network and how individuals are influenced by their particular network neighbors (instead of everyone else in C5)

Why an innovation can fail to spread through a population ?
  • tightly-knit(严密) social community;
  • complexity
  • observability observe别人也在用
  • trialability 可试验性。(一步一步慢慢来)
  • compatibility with social system

the set of initial adopters causes a complete cascade at threshold q(大于百分之多少才换).

cluster of density p

cluster of density p = a set of nodes such that each node in the set has at least a p fraction of its network neighbors in the set.

cluster of density 1:

The Relationship between Clusters and Cascades
  • If the remaining network contains a cluster of density greater than 1 − q, then the set of initial adopters will not cause a complete cascade.(cluster 是cascade的阻碍 必要条件)
  • Moreover, whenever a set of initial adopters does not cause a complete cascade with threshold q, the remaining network must contain a cluster of density greater than 1 − q(充分条件)

Extensions of the Basic Cascade Model 17

Heterogeneous Thresholds: each node has a specific payoff and hence threshold

A blocking cluster in the network is a set of nodes for which each node v has more than a 1 − qv fraction of its friends also in the set.

cascade capacity of the network: 有限初始集能造成complete cascade的最大threshold

最大值:1/2

proof number of AB edges is decreasing when switching w from B to A

payoff from choosing A = a’(两边都是A/AB) 则double a
payoff from choosing B = b’
payoff from choosing AB= a’+b’-c

C7 Small World Phenomenon

Watts-Strogatz model

Two type of links:

Homophily(和其他r个小格内点相连的link)and

Weak ties(剩余的k 个远link).

  • d(v, w) :the number of grid steps between nodes v and w
  • Probability of an edge is proportional to d(v, w)^-q (Small q, more long distances edges)

time: at most a2(log n)^2

A Rough Calculation Motivating the Inverse-Square Network

The probability that a random edge from center point v links into any node in this group is approximately independent of the value of d. ( in d - 2d circles)

Rank-Based Friendship

rank(w) = the number of other nodes that are closer to v than w is. (比w 还离v近一点的)

social distance

dist(v, w) = the social distance between nodes v and w = the size of the smallest focus that contains both v and w

A link between each pair of nodes v and w with probability proportional to dist(v,w)^-p

Efficient decentralized search when p = 1. 17

C8 Epidemics

•The basic reproductive number ( R0) = the expected number of new cases of the disease caused by a single individual = p(probability)k(new meet num).

Percolation : static view of the model. edge v-w is open is w is infected by v

SIR . susceptible infectious removed

SIS susceptible infectious back_TO_susceptile

SIRS susceptible infectious 免疫期 back_TO_susceptile

transient contacts — contact networks in which each edge is annotated with the period of time during which it existed

Analysis of Branching Processes

qn = the probability that the epidemic survives for at least n waves

q∗ = qn when n is infinite = the probability that the epidemic persists indefinitely.

Wright-Fisher model

  1. 每一代 N个人
  2. 每个孩子都只有一个家长。一个家长可以有多个孩子也可能没有。

consider the time that k个个体是一个祖先

• Ranking of web pages

Information Retrieval

1960s: Search repositories of newspaper articles, scientific papers, patents, legal abstracts, and other document collections in response to keyword queries.

vote by link 缺点:offtopic, criticism, advertisement

each page有两种score

authorities 权威网站 更新: 所有指向他的page的hub score和

hub 首页 更新 类似

1
2
3
4
5
6
initial are score = 1
select k steps
for i in k:
update auth
update hub
normalize hub and auth

(A-\lamda I)x = 0

\lamda = eigenvalue

x = eigenvector

pagerank

每个node指出k个,其value平均分k,到新的node。

最后把收到的value sum。

Scaling Factor

sum之后乘以一个factor 。 然后把factor多出来的那一部分均匀分给每个人。

Random walks

随便选一个。然后继续选out-going的。和pagerank思路一样。停在哪个page的概率和pagerank概率一样。

  • 有相关text的超链接权重更高。
  • 用户点击数据 更能反映是否相关。

C10 Sponsored Search Market

SPONSORED SEARCH AD:关键词

branding ad :直接投放在网站上

contextual ad: 根据用户特征投放广告

拍卖

  • The Second Price Sealed Auction 由出价最高的买家获得物品,但他只需要支付所有投标者中的第二高价。

click-through rates:(r_i) The number of clicks per hour it will receive

revenue(收入) v_j per click of advertiser j,

ri*vj = benifit

Matching Market

if the search engine knew all the advertisers’ valuations for clicks

For buyer:

$v_{ij}$ = valuation for item i by buyer j

$p_i$ = price announced by the seller

v- p 就是买家收益。

Market clearing prices 所有买家选到最赚的而且不冲突

Constructing the set of market clearing price

price先重置为0

将最多人要买的slot price 加一

重新计算

重新加一

直到可以market clearing

Vickrey-Clarke-Groves (VCG) mechanism

the advertisers’ valuations are not known

算没有 A 后 BC 收益。比如600

有A后BC收益 比如 200

那么A 需要支付 600-200

GSP

bi bids per click

GSP charges a cumulative price of ri bi+1 for slot i.

Nash equilibrium

The pair of strategies (S; T) is a Nash equilibrium if S is a best response to T, and T is a best response to S.

Best response: $P_1(S,T)>= P_1(S’,T)$ T: stratage of 2

STrict: 没有等于。

Dominant strategy: a strategy that is a best response to every strategy of Player 2.

A strictly dominant strategy .