Kutuhal: March 2013

Friday, 29 March 2013

Session # 10 - Plotting in R

Assignment 01

Create three vectors x, y, and z of equal length, filled with any random values. Bind them into a matrix:
T <- cbind(x, y, z)
Then create a 3-dimensional plot of the result.

> sample<-rnorm(50,25,6)

> sample

[1] 30.785023 31.702170 23.528853 18.208267 32.110218 35.820121 32.404731

[8] 24.507976 14.959855 29.919671 27.677203 17.108632 27.514712 20.260337

[15] 26.557483 30.048945 23.540832 15.833124 29.411549 27.037098 29.744451

[22] 28.901576 31.999236 32.641413 24.628705 27.263692 32.895669 27.046758

[29] 20.699581 32.417177 20.637992 20.448817 29.045200 9.706208 19.479191

[36] 19.214362 30.487007 41.029803 26.190709 24.989519 28.134211 25.319421

[43] 22.595737 27.045515 20.529657 36.455755 31.249895 19.290580 24.701767

[50] 24.621257

> x<-sample(sample,10)

> y<-sample(sample,10)

> z<-sample(sample,10)

> x

[1] 30.45576 20.63799 23.52885 20.69958 41.02980 29.74445 31.24990 30.48701

[9] 32.64141 15.83312

> y

[1] 20.69958 22.59574 36.45576 30.48701 30.78502 32.64141 32.40473 24.50798

[9] 24.98952 26.55748

> z

[1] 27.03710 32.40473 27.04676 24.98952 30.04895 24.50798 36.45576 29.04520

[9] 19.29058 30.78502

> T<-cbind(x,y,z)

> T
             x        y        z
 [1,] 30.45576 20.69958 27.03710
 [2,] 20.63799 22.59574 32.40473
 [3,] 23.52885 36.45576 27.04676
 [4,] 20.69958 30.48701 24.98952
 [5,] 41.02980 30.78502 30.04895
 [6,] 29.74445 32.64141 24.50798
 [7,] 31.24990 32.40473 36.45576
 [8,] 30.48701 24.50798 29.04520
 [9,] 32.64141 24.98952 19.29058
[10,] 15.83312 26.55748 30.78502

> library(rgl)  # plot3d() is provided by the rgl package

> plot3d(T)

> plot3d(T,col=rainbow(1000))

> plot3d(T,col=rainbow(1000),type='s')
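
The session above can be made reproducible by fixing the random seed. The following is a minimal sketch, assuming the rgl package is installed for the 3D plot. The names pool_values and pts and the seed value are illustrative choices, not part of the original session; they also avoid reusing the names sample and T, which shadow the sample() function and the shorthand for TRUE.

```r
set.seed(2013)  # assumed seed, only so that reruns give the same numbers

# Pool of 50 values from a normal distribution with mean 25 and sd 6
pool_values <- rnorm(50, mean = 25, sd = 6)

# Three vectors of equal length, each drawn without replacement from the pool
x <- sample(pool_values, 10)
y <- sample(pool_values, 10)
z <- sample(pool_values, 10)

# Bind the vectors column-wise into a 10 x 3 numeric matrix
pts <- cbind(x, y, z)
print(dim(pts))

# Draw the 3D scatter plot only in an interactive session with rgl available
if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  # One colour per point, each point drawn as a sphere
  rgl::plot3d(pts, col = rainbow(nrow(pts)), type = "s")
}
```

Using rainbow(nrow(pts)) spreads the colours over the 10 points; with rainbow(1000) only the first 10 colours are used, and these are all close to red.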

Assignment 02
Read the documentation of rnorm and pnorm.
Create 2 random variables, x and y.
Create the following plots:
1. X-Y
2. X-Y|Z (introduce a variable z with 5 different categories and cbind it to x and y). Hint: ?factor
3. Color-code the graph by category
4. Add a smooth curve and a best-fit line
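
As a quick orientation for the functions named in the assignment, the sketch below shows how rnorm, pnorm, and factor relate to each other. The seed, the cut-off of 110, and the category labels are assumptions made for illustration only.

```r
set.seed(1)  # assumed seed for reproducibility

# rnorm() draws random values; pnorm() gives the cumulative probability P(X <= q)
x <- rnorm(1500, mean = 100, sd = 10)
p_theory <- pnorm(110, mean = 100, sd = 10)  # one sd above the mean, about 0.841
p_sample <- mean(x <= 110)                   # share of the draws at or below 110

# factor() turns labels into a categorical variable with a fixed set of levels
categories <- c("a", "b", "c", "d", "e")     # hypothetical category labels
z <- factor(sample(categories, 1500, replace = TRUE), levels = categories)

print(c(theory = p_theory, sample = p_sample))
print(table(z))  # number of observations in each of the 5 categories
```

The sample share should land close to the theoretical probability, which is one way to check that the simulated data behave as expected.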

> x<-rnorm(1500,100,10)
> y<-rnorm(1500,85,5)
> z1<-sample(letters,5)
> z2<-sample(z1,1500,replace=TRUE)
> z<-as.factor(z2)
> t<-cbind(x,y,z)
> library(ggplot2)  # qplot() is provided by the ggplot2 package
> qplot(x,y)

> qplot(x,z)

> qplot(x,z,alpha=I(1/10))

> qplot(x,y,geom=c("point","smooth"))

> qplot(x,y,colour=z)

> qplot(log(x),log(y),colour=z)
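
Two points about the session above are worth a sketch. First, cbind(x, y, z) produces a numeric matrix, so the factor z is stored as integer codes; a data frame keeps it as a factor. Second, qplot() is deprecated in recent ggplot2 releases, so the example below uses ggplot() instead. It assumes ggplot2 is installed; the seed and the object names d, fit, and p are illustrative.

```r
set.seed(1)  # assumed seed
x <- rnorm(1500, mean = 100, sd = 10)
y <- rnorm(1500, mean = 85, sd = 5)
z <- factor(sample(letters[1:5], 1500, replace = TRUE))

# cbind() would coerce the factor to its integer codes; a data frame keeps it
d <- data.frame(x = x, y = y, z = z)

# Best-fit straight line of y on x with base R
fit <- lm(y ~ x, data = d)
print(coef(fit))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(d, aes(x = x, y = y, colour = z)) +
    geom_point(alpha = 0.3) +                        # colour-coded points
    geom_smooth(method = "loess", formula = y ~ x,
                se = FALSE, colour = "black") +      # smooth curve
    geom_smooth(method = "lm", formula = y ~ x,
                se = FALSE, colour = "red") +        # best-fit line
    facet_wrap(~ z)                                  # one panel per category: X-Y | Z
  if (interactive()) print(p)
}
```

Because x and y are drawn independently here, the fitted slope should be close to zero and both fitted lines nearly flat.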

Saturday, 23 March 2013

Session # 09

Visualize Free

Visualize Free is a free visual analysis tool based on advanced commercial dashboard and visualization software.

Visualization is a good way to sift through multidimensional data and spot trends using simple point-and-click methods.

There are three basic steps to using this tool:

1. Upload

2. Create

3. Analyze

These steps are explained below:

1. Upload

You can upload your own private data.

Datasets can be Excel files (both XLS and XLSX) or text (CSV and tab-delimited TXT). You can also copy and paste your data, as long as it comes in as tab-delimited. Make sure the first row contains the column headers.
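
If the data live in R, a file in the expected layout can be produced with base R. This is a minimal sketch with made-up data; the data frame sales and the file name sales.txt are hypothetical.

```r
# Hypothetical example data; the column names become the header row
sales <- data.frame(
  region = c("North", "South", "East"),
  units  = c(120, 95, 143)
)

# Write a tab-delimited text file with headers and without row names or quotes
out_file <- file.path(tempdir(), "sales.txt")  # assumed file name
write.table(sales, out_file, sep = "\t", row.names = FALSE, quote = FALSE)

# Read it back to confirm that the first row holds the column headers
check <- read.delim(out_file)
print(check)
```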

2. Create

You can drag and drop components, and data fields to create an interactive dashboard...
After you create a data set, click New in the last column on the datasets page. After you name your visualization, click Edit to get started.
These example data sets are the ones used in the Visualize Free Documentation.

3. Analyze

You can view a visualization, but the real power comes from interacting with it...
Select your visualization on the datasets page, or on the visualizations page. You can Bookmark interesting views.
You can also share your datasets, visualizations, and bookmarks with others to make analysis a social activity. On the detail page for a visualization, change the mode to Protected and you can email a link to your friends and colleagues. You can also embed your visualization in your blog or intranet.

Friday, 15 March 2013

IT BAl

Session #8 - 12 Mar Assignment Submission

Problem:

Perform Panel Data Analysis of "Produc" data

Solution:

There are three types of models:
      Pooled model
      Fixed effects model
      Random effects model

We will determine which model fits best using the following functions:
       pFtest: to choose between fixed and pooled
       plmtest: to choose between pooled and random
       phtest: to choose between random and fixed

The data can be loaded using the following commands:
library(plm)
data(Produc, package = "plm")
head(Produc)

Pooled Model

pool <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data = Produc, model = "pooling", index = c("state", "year"))
summary(pool)

Fixed Effects Model:

fixed <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data = Produc, model = "within", index = c("state", "year"))

summary(fixed)
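
To illustrate what a "within" (fixed effects) estimate is, the sketch below uses a small simulated panel and base R only; it does not use the Produc data or plm. All names, the seed, and the true slope of 2 are assumptions. It shows that demeaning each variable within an individual gives the same slope as a regression with one dummy per individual, while the pooled regression ignores the individual effects.

```r
set.seed(42)  # assumed seed
n_id <- 5     # number of individuals (for example, states)
n_t  <- 8     # number of time periods per individual

id    <- factor(rep(seq_len(n_id), each = n_t))
alpha <- rnorm(n_id, sd = 3)                 # individual-specific effects
a_i   <- alpha[as.integer(id)]               # effect attached to each row
x     <- rnorm(n_id * n_t) + a_i             # regressor correlated with the effect
y     <- 2 * x + a_i + rnorm(n_id * n_t, sd = 0.5)

# Within transformation: subtract each individual's mean from x and y
x_w <- x - ave(x, id)
y_w <- y - ave(y, id)
b_within <- coef(lm(y_w ~ x_w - 1))[["x_w"]]

# Least squares dummy variables: one intercept per individual
b_lsdv <- coef(lm(y ~ x + id))[["x"]]

# Pooled regression: a single intercept, individual effects ignored
b_pooled <- coef(lm(y ~ x))[["x"]]

print(c(within = b_within, lsdv = b_lsdv, pooled = b_pooled))
```

The within and dummy-variable slopes agree up to floating-point error, and both use only the variation within each individual over time.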

Random Effects Model:

random <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data = Produc, model = "random", index = c("state", "year"))

summary(random)

Testing of Model

The models can be compared pairwise through hypothesis tests of the following form:

H0: Null Hypothesis: the individual (state) and time-based effects are all zero

H1: Alternate Hypothesis: at least one of the individual or time-based effects is non-zero

Pooled vs Fixed

Null Hypothesis: Pooled Model

Alternate Hypothesis: Fixed Effects Model

Command:

> pFtest(fixed,pool)

Result:

data: log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)
F = 56.6361, df1 = 47, df2 = 761, p-value < 2.2e-16
alternative hypothesis: significant effects

Since the p-value is negligible (< 2.2e-16), we reject the Null Hypothesis and prefer the Fixed Effects Model over the Pooled Model.

Pooled vs Random

Null Hypothesis: Pooled Model

Alternate Hypothesis: Random Effects Model

Command:

> plmtest(pool)

Result:

Lagrange Multiplier Test - (Honda)

data: log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)

normal = 57.1686, p-value < 2.2e-16
alternative hypothesis: significant effects

Since the p-value is negligible (< 2.2e-16), we reject the Null Hypothesis and prefer the Random Effects Model over the Pooled Model.

Random vs Fixed

Null Hypothesis: Random Effects Model (the individual effects are not correlated with the regressors)

Alternate Hypothesis: Fixed Effects Model

Command:

> phtest(fixed,random)

Result:

Hausman Test

data: log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)

chisq = 93.546, df = 7, p-value < 2.2e-16
alternative hypothesis: one model is inconsistent

Since the p-value is negligible (< 2.2e-16), we reject the Null Hypothesis and prefer the Fixed Effects Model over the Random Effects Model.
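
The reported p-values can be cross-checked from the test statistics with base R distribution functions. This is a rough sketch: it takes the statistics from the outputs above and assumes that the Honda test is one-sided with a standard normal reference distribution. The variable names are illustrative.

```r
# Test statistics as reported in the outputs above
p_f       <- pf(56.6361, df1 = 47, df2 = 761, lower.tail = FALSE)  # pFtest (F test)
p_honda   <- pnorm(57.1686, lower.tail = FALSE)                    # plmtest (Honda), assumed one-sided
p_hausman <- pchisq(93.546, df = 7, lower.tail = FALSE)            # phtest (Hausman)

# About 2.2e-16; smaller p-values are printed by R as "< 2.2e-16"
eps <- .Machine$double.eps

# TRUE means the p-value is below the printing threshold
print(c(pFtest = p_f, plmtest = p_honda, phtest = p_hausman) < eps)
```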

Conclusion:

After running all the tests, we conclude that the Fixed Effects Model is best suited for the panel data analysis of the "Produc" data set.

Hence, we conclude that each id, i.e. each "state", has its own effect that does not vary over time, and the coefficients are estimated from the variation within each state across years.