R Notes

Documentation

Data Manipulation

Data Structure

Basic

  • no 0-dim type: vector of length 1 e.g.
dimension homogeneous heterogeneous
1d atomic vector list
2d matrix data.frame
nd array
type suffix NA check coercion rule
logical is.logical(), is.atomic() 1
integer L NA_integer_ is.integer(), is.numeric(), is.atomic() 2
double NA_real_ is.double(), is.numeric(), is.atomic() 3
string $ NA_character_ is.character(), is.atomic() 4

Factor

  • create:
    f <- factor(c("a","b"), levels=c("a","b","c"))
    
    #> [1] a b 
    
    
    #> Levels:  a b c
    
    
  • view: levels(f)
  • feature:
    • built on top of integer vector
    • contains only predefined values, can’t combine factors: c(factor("a"),factor("b"))

Read/Write

csv

  • read: e.g.
    df <- read.csv(path, header=TRUE, sep=",", quote="\"", na.strings=c("", "NA"), col.names=c(), stringsAsFactors=FALSE)
    • difference: doc
      function delimited hierarchy
      read.table white space origin
      read.csv , wrapper
      read.csv2 ; wrapper
      read.delim \t wrapper
    • performance: doc
      • colClasses: int save 14 times memory than string
      • nrows: even as a mild over-estimate
      • comment.char = “”
  • write: doc
    write.table(df, file=path, sep=",", row.names=FALSE)

rda

  • read: df <- readRDS(path)e.g.
  • write: saveRDS(df, file=path)e.g.

xlsx

require(XLConnect)
wb <- loadWorkbook(path)
df <- readWorksheet(wb, sheet="Sheet1", header=FALSE)
require(XLConnect)
wb <- loadWorkbook(path, create=TRUE)
createSheet(wb, name="f")
writeWorksheet(wb, df, sheet="Sheet1", startRow=1, startCol=1)
saveWorkbook(wb, )

json

  • read: e.g.
    require(rjson)
    j <- fromJSON(file=path, method='C')
  • write:
    require(rjson)
    json <- toJSON(l)
    fileConn <- file(path)
    writeLines(json, fileConn)
    close(fileConn)

File System

work directory

variable

  • view: ls()
  • remove: rm(d)

Create

  • data.frame
    • manually:
      • fix: fix(df) e.g.
      • edit: df <- edit(df) e.g.
    • from list: e.g.
      l <- list(user=character(), id=integer())
      l$user <- c('John', 'Peter')
      l$id <- c(1, 2)
      df <- as.data.frame(l)
    • from vectors: e.g.
      user <- c('John', 'Peter') # character vecter
      id <- c(1, 2)
      df <- data.frame(user, id, stringAsFactors=FALSE) #stringAsFactor
    • from matrix: e.g.
      m <- matrix(c(1,2,3,4), ncol=2, nrow=2, dimnames=list(c('row1', 'row2'), c('var1', 'var2')))
      df <- as.data.frame(m)
  • list
    • from vector: l <- list(user=c("John","Peter"),id=c(1))
  • vector: c(1,2,3) c(1,c(2,3)) always flat
  • array
    • matrix: from vector: e.g.
    m <- matrix(1:20, nrow=5, ncol=4, byrow=TRUE, dimname=list(c("r1","r2")), c("c1","c2"))
    • array: multi-dim e.g.
      array(1:12, c(2,3,2))
      
      #>, , 1
      
      
      #>     [,1] [,2] [,3]
      
      
      #>[1,]    1    3    5
      
      
      #>[2,]    2    4    6   
      
      
      #>, , 2 
      
      
      #>     [,1] [,2] [,3]
      
      
      #>[1,]    7    9   11
      
      
      #>[2,]    8   10   12
      

View

  • general: e.g.
    • print: d
    • name: e.g.
      • create: v <- c(a=1,b=2,c=3) names(v) <- c("a","b","c") v <- setNames(1:3, c("a","b","c"))
      • remove: names(v) <- NULL unname(v)
    • type: diff
      • class(d): oop, element type
      • typeof(d): R memory
      • mode(d): Becker, Chambers & Wilks – S language
    • attribute: arbitrary additional metadata e.g.
      • attr(f, "attr_name")
      • attributes(d) e.g.
      • structure: str(d) best way to discover type
        • data.frame:
          'data.frame': 4 obs. of 2 variables:
          $ var1: num 1 2 3 4
          $ var2: num 4 3 2 1
        • list:
          List of 2
          $ var1: num [1:4] 1 2 3 4
          $ var2: num [1:2] 1 2
        • vector: num [1:4] 1 2 3 4
        • matrix: num [1:4, 1:2] 1 2 3 4 1 2 3 4
  • data.frame:
    • print: View(df)
    • first n rows: head(df, n=10)
    • last n rows: tail(df, n=10)
    • list column names:
      colnames(df)
      names(df)
    • dimension:
      dim(df)
      ncol(df), length(df)
      nrow(df)
  • list:
    • list keys: names(l)
  • vector:
  • matrix:
    • dimension: dim(m)

Index/Recode

  • data.framee.g.
    • select column:
      • [] return list:

        df[1:3] df[c("user","id")]

          user
        1 John
        2 Peter
      • [[]] $ [,] return components of list:

        df$user df[["user"]] df[,"user"] df[,1]

        John Peter
    • exclude column: df[c(-3,-5)] df[!c("user", "id")] df$user <- df$id <- NULL e.g.
    • filter: e.g.
      df <- df[which(df$user=="John" & df$id ==1), ] # which: return data.frame row index

      e.g.

      subset(df, user=="John" & id=1, select=c(user))
    • replace e.g.
      • build-in: df$user[df$id==1] <- "Bob"
      • match: df$user <- df[match(df$user, c("John","Peter"))]
      • plyr:
        library(plyr)
        df$user <- revalue(df$user, c("John"="Bob", "Peter"="David"))
        df$user <- mapvalues(df$user, from=c("John","Peter"), to=c("Bob","David"))
      • cut: df$level <- cut(df$id, breaks=c(-Inf,2,4,Inf), labels=c("low","medium","high"))
    • random sample: df[sample(1:nrow(df), 50, replace=FALSE),] e.g.
  • list: e.g.
    • column: l[[1]] l[["user"]] l$user
  • vector:
    • index: e.g.
      • positive number:
        • v[2] v[c(2,3)]
        • v[c(1,1)]is duplicated values
        • v[c(2.1,3.2)] doubles are truncated to integers
      • negative number:
        • v[-c(1,2)] remove elements
        • v[c(-1,2)] can’t mix positive & negative index
      • logical:
        • v[c(TRUE,TRUE,FALSE)] is v[c(1,2)]
        • v[c(TRUE,FALSE)] recycled to v[c(TRUE,FALSE,TRUE,FALSE...)]
        • v[c(NA)] is NA
      • nothing: v[] is v
      • zero: v[0] is numeric(0)
      • name:
        v <- setNames(v, letters[1:4])
        v[c("a","b")]
    • replace: e.g.
      • build-in: v[v=="beta"] <- "two"
      • plyr:
        library(plyr)
        revalue(v, c("beta"="two", "gamma"="three"))
        mapvalues(v, from=c("beta","gamma"), to=c("two","three"))
    • find index by value: which(v==value) ref
  • matrix: e.g.
    • column: m[,4]
    • row: m[3,]
    • subset: m[2,3] m[2:4,1:3]

Convert

factor2numeric

e.g.

as.numeric(as.character(f))
as.numeric(levels(f))[f]

factor2characer

e.g.

df <- data.frame(lapply(df, as.character), stringsAsFactors=FALSE) # convert all columns
i <- sapply(df, is.factor) # convert only factor columns
df[i] <- lapply(df[i], as.character)

data.frame2list (group by)

  • split

list2vector

  • unlist: v <- unlist(l, recursive=TRUE, use.names=FALSE) e.g.

string2vector

strsplite("a,b,c", ",") is c(“a”,”b”,”c”) e.g.

NA

  • check: is.na(d) e.g.
  • edit to missing: df$v1[df$v1==0] <- NA
  • remove missing:
    • na.omit()
    • mean(x, na.rm=TRUE)
    • df[!complete.cases(df),]

Unique/Duplicate

  • data.frame e.g.
    • duplicate by row
      • bool: duplicated(df)
      • filter: df[duplicated(df),]
      • filter w/out repeat: unique(df[duplicated(df),])
      • by column: e.g.
    • unique:
      • all columns: unique(df) df[!duplicated(df),]
      • by column: e.g.
        library(data.table)
        d <- as.data.table(df)
        unique(d, by=c("user", "id"))
  • vector e.g.
    • duplicate
      • bool: duplicated(v)
      • filter: v[duplicated(v)]
      • filter w/out repeat: unique(v[duplicated(v)])
    • unique: unique(v) v[!duplicated(v)]

Sort

  • data.frame
    • by column:

      df <- df[order(df$user, -df$id),] e.g.

      or e.g.

      library(plyr)
      arrange(df, user, id)
    • sort columns:
      • matrix-style: df <- df[, order(colnames(df))] e.g.
      • list-style: df <- df[order(colnames(df))] df <- df[c(3,2,1)] e.g.
    • trick
      • sort by all columns, from left to right: df[do.call(order, as.list(df)),] e.g.
      • reverse sort: e.g.
        • number: -df$id
        • factor: 2number -xtfrm(df$user)
        • character: 2factor2number
  • vector” sort(v, decreasing=TRUE) e.g.
  • matrix

Merge

  • data.frame
    • merge: e.g.
      df3 <- merge(df1, df2, by.x=c("var1", "var2"), by.y=c("var3", "var4"), all.x=TRUE, sort=FALSE) #sort=FALSE
    • cbind: df <- cbind(df, age=age)
    • rbind: df <- rbind(df, df2) e.g.
    • plyr::rbind.fill(): fill missing columns e.g.
  • vector

Count

  • data.frame:
    nrow(df)
    ncol(df)
    dim(df)
  • list: length(l)
  • vector:length(v)
  • matrix: dim(m)

Summarize

  • data.frame e.g.
    • ddply:
      library(plyr)
      dfs <- ddply(df, c("user", "id"), 
                       summarise, 
                       N=sum(!is.na(income)), 
                       mean=mean(income, na.rm=TRUE), 
                       sd=sd(income, na.rm=TRUE), 
                       se=sd/sqrt(N))
    • aggregate: dfs <- aggregate(df["income"], by=df[c("user","id")], FUN=length)

Trick

String

substring

substr("abcdef", 2, 4) is “bcd”

substr("abcdef", 2, 4) <- "ghi" is “aghief” e.g.

concatenate

paste("x", 1:3, sep="")is c(“x1”,”x2”,”x3”) e.g.

grep("B", c("a", "b", "c"), ignore.case=TRUE, fixed=TRUE) is 2 e.g.

replace

sub("B", "D", "abc"), ignore.case=TRUE, fixed=TRUE) is “aDc”

case

  • Uppercase: toupper("abc") e.g.
  • Lowercase: tolower("ABC")

format character

  • new line: \n
  • tab

split by character

Date

e.g.

Function

Recursive

Variable Scope

  • global environment variable: assign('var', value, envir=.GlobalEnv)
  • trace back parent scope variable: var <<- value e.g.

Closure

(http://stackoverflow.com/questions/1169534/writing-functions-in-r-keeping-scoping-in-mind)

Loop

Reshape

apply*

(http://www.statmethods.net/management/controlstructures.html)

Parallel Computing

Memory

e.g.

Graph

Advertisements

JavaScript, HTML

Fields vs. Method Arguments

  1. class fields > constantly passing data as arguments > read data from DOM elements [code]
  2. DON’T use fields: Two ‘threads’ using the object at the same time will confuse each other (e.g. ‘static variable’).  [source] SOLUTION: Create an abstract class with a method to create only one instance whenever needed.

Style Guide 

(google, idiomatic)

  1. Function:
    1. don’t declare function within blocks [code]
  2. Property:
    1. initialize properties only in the constructor for better performance [code]
    2. don’t declare implicit variable in block scope [code]
    3. access prefer object.property over object[“property”] [code]
    4. delete prefer object.property = null over delete object.property for better performance [code]
    5. replace array with object [source]
  3. Format:
    1. prefer ‘ over ” [source]
    2. use JSDoc for comments [source]

Programmer Notes

  • Have a blog, even if no one reads it.
  • Code review for each day.
  • Reading books is not high efficiency.

CSE534 Project1: Ethernet LAN Simulation

report

CSE523 Project2: Group Purchase Design

Environment Configuration: Linux+XSB Prolog

Instructions:

  1. Deductive databases in XSB Prolog CSE532 project 2 (file: tangerOutlet.pl submitted in Blackboard) = 50 points
    Due: 04/09/2012 (8:00AM EST)Write a predicate buy(Stores,Links,MinCustomers) that determines the minimum number of customers required to buy all the items on sale in an outlet with a given list of Stores and Links connecting them. The customers will buy all sales in the Stores one by one, and travel between stores along the available Links. A Store is a term store(Name,Capacity,Bankrupt,Waiting), where: Name is a unique name, Capacity is the number of customers required to buy all the sales in the store, Bankrupt is the number of customers bankrupt after buying all sales in the store, and Waiting are the number of customers that remain in the store to wait for more sales. Any remaining customers after a store sale that are not Waiting or Bankrupt, can move on to shop in the next store. A link connects two stores, and is represented by a term link(Namel,Name2) (you can assume that the Stores and Links form a connected graph). In your plan, you are free to start at any store, but you should travel at most once in either direction along a link.For example:|store(walm,5,5,5)|——|store(jcp,5,1,1)|——|store(tommy,10,5,5)|?-buy([store(walm,5,5,5),store(tommy,10,5,5),store(jcp,5,1,1)],[link(walm,jcp),link(tommy,jcp)],MinCustomers]).
    MinCustomers = 17Grading scheme (50 points): we will run in XSB Prolog 5 queries (with hidden test cases) with 10 points per query. If the query returns the right answer, then you will get 10 points, otherwise, you will get 0 points. The hidden test cases and the correct answers will be published after the project submission deadline.

Algorithm:

Find all paths -> Hamilton path -> Minimal Hamilton path

Hamiltonian path is a path in an undirected graph that visits each vertex exactly once.

Note:

I Datalog

1. basic syntax:

.    rule: R(t1, …, tn) :- Q1(s11, …, s1m), Q2(s21, …, s2m).

.    query: ?- R(t1, …, tn)

2. union: R(t1, …, tn) :- Q1(s11, …, s1m); Q2(s21, …, s2m).

3. recursion: R(t1, …, tn) :- Q(s1, …, sm), R(t1, …, tn) 

4. positive rule & negative rule: (semantics algorithms)

.    div: find all tuples in one relation that join with every tuple in another relation

Q: “find all students who took a course from every professor”

Answer(Sid) :- Student(Sid, Name, Addr),

.                          not DidNotTakeAnyCourseFromSomeProf(Sid).

not DidNotTakeAnyCourseFromSomeProf(Sid) :- Proffesor(Pid, Pname, Dept),

.                                                                                         Student(Sid, Name, Addr),

.                                                                                         not HasTaught(Pid, Sid).

HasTaught(Pid, Sid) :- Teaching(Pid, Crs, Sem),

.                                         Transcript(Sid, Crs, Sem, Grd).

?- Answer(Sid).

.    negative circle(well-defined/ill-defined): dependency graph(node: relation name, arc: P :- (not)QQ—>R/Q-=->R) ~ stratification

II XSB Prolog [tangerOutlet.P/pl]

1. Basic Syntax:

.    Variable: capital letter/_+alphanumberic

.    Relation name, Atom: lowercase letter+alphanumberic/_

.        nameless variable: store(Name, _, _, _),

.        variable name is not important, occurrence refer to the same variable

2. support Datalog: :- auto table

3. data controll:

:- dynamic store/4. //define dynamic variable

asserta(store(Name, Capacity, Bankrupt, Waiting)), //malloc

retract(store(Name, Capacity, Bankrupt, Waiting)), //delete

retractall(store(N, C, B, W)), //delete all tuples

4. negative rule: tnot(remain_sign(X)),√  not×

5. calculation: is√  =× Loose is Bankrupt + Waiting.

6. List: [A|Path1] binary tree structure

7. process control: 

.    cut: change_remain(Y, Remain_Y1),!

.    fail: asserta(nodelist(NodeList)), fail. 

.    if then else: (Capacity_Y < Remain_X) -> ;

8. output: write(X): output writeln(X): output+nl nl:nl

9. comment: /**/ //

10. debug: | ?-trace. tree structure analysis [manual1.pdf]

.    call: callinto

.    exit: exit successfully

.    fail:  exit unsuccessfully

.    redo: search for another tuple

CSE532 Project3: Social Network

Environment Configuration: Altova XMLSpy

You will design and implement an XML database system to support the following operations of a simplified social networking Web site and you will use HTML for the user interface. Users, posts, etc. will now each be represented by separate 1-column DB2 tables whose columns will have the type XML. Each particular user, post, etc. will be represented by a separate XML document (and each such document will be a tuple in an appropriate table). XQuery can create the entire output in the desired HTML format, so you do not need iterators. Instead, you can just extract the HTML snippet created by XQuery into a string and send it to JSP.

Since, in our design, data is split into several independent XML documents, we cannot specify any interesting schematic aspects, like foreign keys, for these data snippets. Instead, after you are done with the database, pull all the pieces of data
together in one consolidated XML document (you might need to add additional tags to group users, walls, etc., together). Then create the appropriate schema (keys, keyrefs, and all the types) and validate your document against the schema using XML Spy: http://www.xmlspy.com

Implement and test the following queries:

1. For each user list the posts he has on the wall and all replies to these posts (including replies to replies – represent 2 levels down). Make the output clear, but not fancy. Something like:
* John Doe has the posts:
– “A photo with a day in my life”
reply by Mary John: “Is that you?”
reply to Mary John’s post by John Doe: “Of course, I was fishing”
reply to Mary John’s post by Mary John: “Did you catch anything?”
reply to Mary John’s post by John Doe: “Only a cold”
reply by Anne Chen: “Excellent photo”
reply by Bill Pop: “Lets go fishing again next week”
– “Got married”
reply by Mary John: “Congratulations”

2. For each user list the posts he liked and the other users who also liked that post.
* John Doe likes the posts:
– “I like potatoes” on Anne Chen’s wall, also liked by: Mary Doe and Joe Public.
– “Photo with Grand Canyon” on Bob Gia’s wall, also liked by: Joe Public and Bill Pop.

3. Find the user with the highest number of friends.

4. List all the connections up to three levels down of a user: his friends, his friends’ friends and their friends.

5. List all friends of a user with their number of friends.

6. Construct the mailbox of a given user: list all emails with Message ID, Date, Subject, Sender and Receivers, and the number of replies to each message.

7. Produce a summary listing of revenue generated by a particular user (you need only to implement a query that lists the set of products bought by a user and their total).

Notes:

I. XML Schema:

1. Namespace

[Bookface.xsd] describe structure

<schema <!default namespace-no need prefix>

. xmlns=”http://www.w3.org/2001/XMLSchema” <!xmlns-default namespace, namespace for keywords>

</schema>

<xs:schema <!xs-root element need prefix>

. xmlns:xs=http://www.w3.org/2001/XMLSchema&#8221; <!xs-prefix of xmlns(xml×)>

</xs:schema>

<xs:schema

. xmlns:xs=”http://www.w3.org/2001/XMLSchema&#8221;

. xs:targetNamespace=”http://bookface.com“> <!xs-attribute need prefix, namespace defined by this schema, fake URL>

. <include xs:schemaLocation=”http://bookface.com/UserType.xsd”/&gt; <!include other schemas, must has same targetNamespace>

</xs:schema>

[BookfaceDB.xml] dataset

<DB xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance
…….xsi:noNamespaceSchemaLocation=”file:///D:/Assignment/CS/DB/Project/Project3/Social%20Network/Bookface.xsd”>

<DB xmlns=”http://bookface.com<!defualt namespace=xs:targetNamespace in .xsb>

.        xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance&#8221; <!namespace for keywords>

.        xsi:schemaLocation=”http://bookface.com http://bookface.com/DB.xsd”> <!.xsd file location>

2. Simple Type

decimal, integer, boolean, string, float, date

.    (1) list

<simpleType name=”myIntList”>

.    <list itemType=”integer”/>

</simpleType>

.    (2) union

<simpleType name=”phoneNumber”>

.    <union memberTypes=”phone7digits phone10digits”/>

</simpleType>

.    (3) restriction

<simpleType name=”phone7digits”>

.    <restriction base=”integer”>

.        <minInclusive value=”1000000″/>

.        <maxInclusive value=”999999″/>

.        <pattern value=”s[0-9]{9}“/>

.        <enumeration value=”1″/>

.        <enumeration value=”2″/>

.    </restriction>

</simpleType>

3. Complex Type

.    (1) sequence

<complexType name=”User”> <!included in element: no name>

.    <sequence> | <all>

.        <element name= , type= >

.        <element name= , type= >

.    </sequence>

.    <attribute name= , type= >

</complexType>

.    (2) all

.    (3) choice

<complexType name=”addressType”>

.    <sequence> <!no need>

.        <choice>

.            <element name= , type= > <!a>

.            <sequence> <!b>

.                <element name= , type= >

.                <element name= , type= >

.            </sequence>

.        </choice>

.    </sequence>

.    <attribute name= , type= >

</complexType>

4. Key

.    (1) key

<xs:key name=”UserPK”>
.    <xs:selector xpath=”DB/Users/User”/>
.    <xs:field xpath=”@UserID”/>
</xs:key>

.    (2) keyref

<xs:keyref name=”Friend_OwnerFK” refer=”UserPK”>
.    <xs:selector xpath=”DB/Friends/Friend”/>
.    <xs:field xpath=”@OwnerID”/>
</xs:keyref>

5. Design

.    (1) attribute vs. element

.          attribute: Primary Key <xs:attribute name=”UserID” type=”xs:integer”/>

.          element: other attributes <xs:element name=”FirstName” type=”xs:string”/>

.    (2) schema hierarchy structure: DB/Users/User

<xs:element name=”DB”>
.    <xs:complexType>
.        <xs:sequence>

.            <xs:element ref=”Users” maxOccurs=”unbounded”/>

.    (3) error: duplication of name

.    (4) prefix: xs:integer(schema defined)  addressType(user defined)

II. XQuery:

1. general structure: [Bookface1.xquery]

for $u2 in doc(“BookfaceDB.xml“)/DB/Users/User (:keywords-lowercase, doc(“current path”)/start from root element:)

where $cc/OwnerID = $u2/@UserID

return <ul>{data($u2/FirstName)}<ul>

2. well-formed:

.    <ul>{…}</ul> (:file start with tag:)

.    nested/stand side by side: within tag [Bookface4.xquery]

3. return form:

.    within {}: function, without {}: words

.    return output×<ul>output√{for…}</ul> (:output place within tag:) [Bookface1.xquery]

.    {data($u/FirstName)} (:function data()-only return data:)

.    <ul><il>tag, used in html

4. let: [Bookface3.xquery]

.    let $fs := for…where…return $u [Bookface]

5. aggregation:

.    sum: [Bookface7.xquery]

.    count: can’t use within for  [Bookface3.xquery]

let $fs := for…where…retun…

return count($fs)

.    max: [Bookface3.xquery]

let $maxf := max(for…where…return…) (:get maxvalue:)

if(count($fs) eq $maxf) then <ul>output</ul> else() (:search max argument:)

6. if then else: [Bookface3.xquery]

return (:must after return:)

if()

then <ul>output</ul> (:can’t use return:)

else()

7. >gt =eq

8. newline:

.    let $nl := “”

.    return <ul>{$nl}</ul>

9. comment: (:🙂

10. error:

.    result is empty-name typo

.    global variable vs. local variable

Database Structure

CSE532 Project1: Social Netwok

Environment Configuration: MyEclipse for Spring 9.0 + Tomcat 6.0 + Data Studio 3.1 + DB2

You will design and implement a relational database system to support the operations of a social networking Web site and you will use HTML for the user interface.

1. A person can sign in as a new user of your Web site. You should store: user ID, password, Last Name, First Name, Address (City, State, Zip Code, Telephone, E-mail Address), User Creation Date, Preferences (see advertisement below), sex, date of birth, etc.

2. Each user should be able to search and ask for the friendship of other users and group his friends into circles (for family members, for co-workers, etc). You should store: circle ID, owner, circle name and members for circles.

3. Each user has an associated Wall on which he or his friends can make posts and comment on existing posts. You should store: Post ID, wall owner, date, content, and comments (authors, date and times, contents).

4. Each user has a profile page that lists his contact info and his friends.

5. Each user has a default/personal page of updates where he can see the posts posted on his friends’ walls in chronological order and his friend requests.

6. Users should be able to send each other messages. These messages should be grouped into discussions by message subject. One user can send one message to multiple other users. They should all see the discussion. You should store: Message ID, Date, Subject, Content, Sender, Receivers.

7. Users should be able to create Special Interest Pages (SIPs), invite other users to join SIPs, search and request to join a SIP, and moderate (accept or deny) the request to join a SIP. Each SIP has one or more moderators (at least one). The creator of a SIP is the first moderator of that SIP. Any moderator of a SIP can assign/remove other participants as moderators. Each such SIP has an associated a page on which members can make posts and comment on existing posts.

8. Your social network site will also support targeted advertising as a mechanism for the site to generate revenue. Users will specify their preferences (e.g. cars, clothing, life insurance) as to the type of advertising they wouldn’t mind seeing while visiting the site. Likewise, the site will ensure that only advertisements of these types are displayed to the user. A user can click on an advertisement to purchase one or more of the items being advertised, and the site receives 10% of all such transactions. You should store: Advertisement ID, item, Company, unit price, number of units available for each ad. You should also store sales data: Transaction ID, Date and Time, Advertisement ID, number of units, user ID.

You are responsible for arranging the data items into tables, determining the relationships among tables and identifying the key attributes. In addition, you should include indices in your tables to speed up query processing. You should specify and enforce integrity constraints on the data, including referential integrity constraints.

Some User-Level Transactions:
1 Sign up on the Web site
2 Create/delete circles
3 Search for a user ask for friendship
4 Accept a friendship request / Delete a friend
5 Add/remove friends to/from circles
6 Make/remove/modify posts
7 Comment on posts
8 Search/Join SIPs
9 Moderate SIPs
10 Make/remove/modify own SIP posts
11 Comment on SIP posts
12 Send / Receive / Delete messages
13 Purchase one or more copies of an advertised item

Web Site Manager-Level Transactions: a separate Web site for site managers should allow managers to:

1 Create/delete an advertisement
2 List the sales report for a particular month
3 Produce a comprehensive listing of all items being advertised on the site
4 Produce a list of transactions by item name or by customer name
5 Produce a summary listing of revenue generated by a particular item, item type, or customer
6 Determine which customer generated most total revenue
7 Determine which product (ad/item) generated most total revenue
8 Produce a sorted list of sold items by number of units sold
9 Produce a list of all customers who have purchased a particular item
10 Produce a list of all items for a given company sorted by number of units sold
11 Produce customer mailing lists for a given product
12 Produce a list of item suggestions for a given customer (based on that customer’s past transactions) – items of the same types

The Web site should include a comprehensive Help facility both for users and managers.

Design:
1. Provide the Entity-Relationship (E-R) diagram of the complete database scheme.
2. Provide the description of the relational database scheme including a discussion of the reasoning behind your design decisions. Make clear how your design supports efficient query processing.
3. Provide a list of all functional dependencies in the relational database scheme.
4. Describe the integrity constraints including referential integrity.
All printouts should be in PDF. You can use some appropriate document-preparation software for your tables (e.g., Microsoft Visio is available for all SBU students for free).

Notes:

1. create new project: MyEclipse → File → New → Web Project

2. structure: ProjectName → src → package → Servlet

.                                              → WebRoot → (jsp folder) → folder → JSP

.                                                                    → WEB-INF → lib → db2jcc_license_cisuz.jar/db2jcc.jar/db2jcc4.jar

.                                                                                          → web.xml

3. JSP: (initial lower case: prevent URL input error)

form: <form action=”servlet/Login” method=”get”></form> <%–login,jsp–%>

input: UserName: <input type=text name=username value=Yueying />

.                  Password: <input type=”passwordname=password /”>

.                  <input type=”submitvalue=”Submit” />

.                  <input type=”buttononclick=”msg(0)” value=”SignUp” />

.                  function msg(value)

.                 {
.                       if(value==0)
.                       location.href=”jsp/Login/signup.jsp”;
.                  }

.                  <input type=”radioname=”sex” value=”male” /> Male

.                   <input type=”radio name=”sex” value=”female” /> Female

.                   <input type=”checkboxname=”semester” value=”spring” /> Spring <%–createmessage,jsp–%>

.                    <input type=”checkbox” name=”semester” value=”fall” /> Fall

.                     String semester[] = request.getParameterValues(“semester”); //CreateMessageS.java

.                     <input type=”hiddenname=”messageid_choose” value=<%=messageid%> /> <%–messagelist.jsp–%>

select: Department <select name=”department”> <%–signup.jsp–%>

.                     <option value=”CSE”>CSE</option>

.                      <option value=”MAT”>MAT</option></select>

4. Servlet: (initial capital case: like-named java class)

5. JSP → Servlet: String s = request.getParameter(“s”); //LoginS.java

6. Servlet → JSP: 1. response.sendRedirect(“jsp/Login/login.jsp”); //LoginS.java

.                                         2. request.setAttribute(“”friendid_list”, friendid_list”); //MessageS.java

.                                              request.getRequestDispatcher(“jas/Message/createmessage.jsp”);

.                                             ArrayList friendid_list = (ArrayList)request.getAttribute(“friendid_list”); <%–createmessage.jsp–%>

.                                        3. HttpSession session = request.getSession(); //LoginS.java

.                                             session.setAttribute(“hostid”, userid_db);

.                                             int senderid = 0; //CreateMessageS.java

.                                             HttpSession session = request.getSession();

.                                             senderid = (Integer)session.getAttribute(“hostid”);

7. web.xml: XML file