Friday, 29 March 2013

Session # 10 - Plotting in R

Assignment 01

Create three vectors x, y and z of equal length, filled with any random values, and bind them into a matrix:
T<- cbind(x,y,z)
Create a 3-dimensional plot of the result.

> sample<-rnorm(50,25,6)
> sample
 [1] 30.785023 31.702170 23.528853 18.208267 32.110218 35.820121 32.404731
 [8] 24.507976 14.959855 29.919671 27.677203 17.108632 27.514712 20.260337
[15] 26.557483 30.048945 23.540832 15.833124 29.411549 27.037098 29.744451
[22] 28.901576 31.999236 32.641413 24.628705 27.263692 32.895669 27.046758
[29] 20.699581 32.417177 20.637992 20.448817 29.045200  9.706208 19.479191
[36] 19.214362 30.487007 41.029803 26.190709 24.989519 28.134211 25.319421
[43] 22.595737 27.045515 20.529657 36.455755 31.249895 19.290580 24.701767
[50] 24.621257
> x<-sample(sample,10)
> y<-sample(sample,10)
> z<-sample(sample,10)
> x
 [1] 30.45576 20.63799 23.52885 20.69958 41.02980 29.74445 31.24990 30.48701
 [9] 32.64141 15.83312
> y
 [1] 20.69958 22.59574 36.45576 30.48701 30.78502 32.64141 32.40473 24.50798
 [9] 24.98952 26.55748
> z
 [1] 27.03710 32.40473 27.04676 24.98952 30.04895 24.50798 36.45576 29.04520
 [9] 19.29058 30.78502
> T<-cbind(x,y,z)
> T
             x        y        z
 [1,] 30.45576 20.69958 27.03710
 [2,] 20.63799 22.59574 32.40473
 [3,] 23.52885 36.45576 27.04676
 [4,] 20.69958 30.48701 24.98952
 [5,] 41.02980 30.78502 30.04895
 [6,] 29.74445 32.64141 24.50798
 [7,] 31.24990 32.40473 36.45576
 [8,] 30.48701 24.50798 29.04520
 [9,] 32.64141 24.98952 19.29058
[10,] 15.83312 26.55748 30.78502
> library(rgl)   # plot3d() comes from the rgl package
> plot3d(T)
> plot3d(T,col=rainbow(1000))
> plot3d(T,col=rainbow(1000),type='s')
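A reproducible version of the session, as a sketch: the seed is hypothetical, the pool is renamed so it does not reuse the name of the sample() function, and the matrix is called m instead of T because T is R's shorthand for TRUE. It also uses rainbow(nrow(m)) so each of the 10 points gets its own colour; with rainbow(1000) only the first few (nearly identical) shades are likely to be used. The rgl call only runs in an interactive session with rgl installed.

```r
# Hypothetical seed, only so the example is reproducible
set.seed(2013)
pool <- rnorm(50, mean = 25, sd = 6)   # the session called this vector "sample"

# Draw three equal-length vectors from the same pool
x <- sample(pool, 10)
y <- sample(pool, 10)
z <- sample(pool, 10)

# Named m rather than T so that T (shorthand for TRUE) is not masked
m <- cbind(x, y, z)
print(dim(m))

# rgl is assumed to be installed; only plot in an interactive session
if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  rgl::plot3d(m, col = rainbow(nrow(m)), type = "s")
}
```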


Assignment no 2:
Read the documentation of rnorm and pnorm.
Create 2 random variables.
Create the following plots:
1. X-Y
2. X-Y|Z (introduce a variable z with 5 different categories and cbind it with x and y). Hint: ?factor
3. Colour-code and draw the graph
4. Smoothed, best-fit line for the curve
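A short sketch of the difference between the two functions named above: rnorm() draws random values, while pnorm() returns the normal cumulative probability P(X <= q). The seed and the cut-off of 110 are illustrative choices; the mean and sd match the x used in the session.

```r
set.seed(1)  # hypothetical seed for reproducibility
x <- rnorm(1500, mean = 100, sd = 10)   # 1500 draws, as in the session

# pnorm() gives the theoretical CDF P(X <= q)
p_theory <- pnorm(110, mean = 100, sd = 10)   # one sd above the mean

# The share of simulated draws at or below 110 should be close to it
p_sample <- mean(x <= 110)
print(c(theory = p_theory, sample = p_sample))
```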

> library(ggplot2)   # qplot() comes from the ggplot2 package
> x<-rnorm(1500,100,10)
> y<-rnorm(1500,85,5)
> z1<-sample(letters,5)
> z2<-sample(z1,1500,replace=TRUE)
> z<-as.factor(z2)
> t<-cbind(x,y,z)   # note: cbind() stores the factor z as its integer codes
> qplot(x,y)

> qplot(x,z)

> qplot(x,z,alpha=I(1/10))
> qplot(x,y,geom=c("point","smooth"))
> qplot(x,y,colour=z)
> qplot(log(x),log(y),colour=z)
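For item 2 (X-Y|Z) and an explicit straight best-fit line, a base-R sketch is shown here as an alternative to qplot(); it regenerates comparable data with a hypothetical seed and uses coplot() for the conditioning plot, lm() for the straight line and lowess() for the smoothed curve. With ggplot2, faceting (e.g. qplot(x, y, facets = . ~ z)) is another way to get one panel per category.

```r
set.seed(1)  # hypothetical seed
x <- rnorm(1500, 100, 10)
y <- rnorm(1500, 85, 5)
z <- factor(sample(sample(letters, 5), 1500, replace = TRUE))  # 5 categories

# X-Y|Z: one scatter panel per level of z
coplot(y ~ x | z)

# Colour-coded scatter with a least-squares line and a lowess smooth
plot(x, y, col = as.integer(z), pch = 20)
fit <- lm(y ~ x)
abline(fit, lwd = 2)            # straight best-fit line
lines(lowess(x, y), lty = 2)    # smoothed curve
legend("topright", legend = levels(z), col = seq_len(nlevels(z)), pch = 20)
```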


Saturday, 23 March 2013

 Session # 09

 Visualize Free


Visualize Free is a free visual analysis tool based on advanced commercial dashboard and visualization software.
It lets you explore multidimensional data and spot trends using simple point-and-click methods.

There are three basic steps to using this tool:

1. Upload
2. Create
3. Analyze
These steps are explained below:

1. Upload
You can upload your own private data.
Datasets can be Excel files (both XLS and XLSX) or text (CSV and tab-delimited TXT). You can also copy and paste your data, as long as it comes in as tab-delimited. Make sure the first row contains the column headers.

2. Create

You can drag and drop components and data fields to create an interactive dashboard...
After you create a data set, click New in the last column on the datasets page. After you name your visualization, click Edit to get started.
The example data sets on the site are the ones used in the Visualize Free documentation.


3. Analyze

You can view a visualization, but the real power comes from interacting with it...
Select your visualization on the datasets page, or on the visualizations page. You can Bookmark interesting views.
You can also share your datasets, visualizations, and bookmarks with others to make analysis a social activity. On the detail page for a visualization, change the mode to Protected and you can email a link to your friends and colleagues. You can also embed your visualization in your blog or intranet.



Friday, 15 March 2013







IT BAL

Session #8 - 12 Mar Assignment Submission


Problem:

Perform Panel Data Analysis of "Produc" data

Solution:

There are three types of models:
      Pooled effect model
      Fixed effect model
      Random effect model

We determine which model fits best using the following tests from the plm package:
       pFtest: fixed vs. pooled
       plmtest: pooled vs. random
       phtest: random vs. fixed (Hausman test)

The data can be loaded using the following commands:
library(plm)
data(Produc, package = "plm")
head(Produc)



 

 

Pooled Effect Model

pool <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data = Produc, model = "pooling", index = c("state", "year"))
summary(pool)














Fixed Effect Model:



fixed <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data = Produc, model = "within", index = c("state", "year"))

summary(fixed)
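The "within" model gives each state its own intercept. A small sketch on simulated data (all names and values hypothetical) shows that demeaning y and x within each unit and running OLS gives the same slope as adding one dummy per unit; plm's within estimator is built on this demeaning.

```r
set.seed(42)  # hypothetical simulated panel: 6 units observed over 5 periods
n_units <- 6
n_periods <- 5
id <- rep(seq_len(n_units), each = n_periods)

unit_effect <- rep(rnorm(n_units, sd = 3), each = n_periods)
x <- rnorm(n_units * n_periods) + unit_effect   # regressor correlated with the unit effect
y <- unit_effect + 2 * x + rnorm(n_units * n_periods)

# Within transformation: subtract each unit's mean from y and x
x_within <- x - ave(x, id)
y_within <- y - ave(y, id)
b_within <- coef(lm(y_within ~ x_within - 1))[["x_within"]]

# Least-squares dummy variables: one intercept per unit
b_lsdv <- coef(lm(y ~ x + factor(id)))[["x"]]

# Pooled OLS ignores the unit effects, so it is biased in this setup
b_pooled <- coef(lm(y ~ x))[["x"]]
print(c(within = b_within, lsdv = b_lsdv, pooled = b_pooled))
```

The equality of the first two slopes is the Frisch-Waugh-Lovell result: partialling out the unit dummies is the same as demeaning by unit.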







Random Effect Model:



random <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data = Produc, model = "random", index = c("state", "year"))
summary(random)







Testing of Model



The models can be compared through hypothesis tests as follows:



H0 (null hypothesis): the individual (state) effects are all zero

H1 (alternative hypothesis): at least one individual effect is non-zero



Pooled vs Fixed



Null Hypothesis: Pooled Effect Model

Alternate Hypothesis: Fixed Effect Model



Command:



> pFtest(fixed,pool)





Result:

data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)
F = 56.6361, df1 = 47, df2 = 761, p-value < 2.2e-16
alternative hypothesis: significant effects

Since the p-value is negligible (< 2.2e-16), we reject the null hypothesis and accept the alternative: the Fixed Effect Model is preferred over the Pooled Effect Model.
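pFtest() compares the residual sums of squares of the pooled and fixed effect fits. In the usual textbook form, with n units, T periods and K regressors, F = ((RSS_pooled - RSS_within)/(n - 1)) / (RSS_within/(nT - n - K)). For Produc (48 states over 17 years, 816 observations, K = 7) this gives df1 = 47 and df2 = 816 - 48 - 7 = 761, matching the output above. The sketch checks the formula on hypothetical simulated data against anova() on a pooled fit and a dummy-variable fit.

```r
set.seed(1)  # hypothetical simulated panel
n_units <- 5; n_periods <- 4; K <- 1
id <- factor(rep(seq_len(n_units), each = n_periods))
x <- rnorm(n_units * n_periods)
y <- rep(rnorm(n_units, sd = 2), each = n_periods) + 0.5 * x + rnorm(n_units * n_periods)

pooled <- lm(y ~ x)        # common intercept
lsdv   <- lm(y ~ x + id)   # one intercept per unit (same slope as the within model)

rss_pooled <- sum(resid(pooled)^2)
rss_within <- sum(resid(lsdv)^2)
df1 <- n_units - 1
df2 <- n_units * n_periods - n_units - K
F_stat <- ((rss_pooled - rss_within) / df1) / (rss_within / df2)
p_value <- pf(F_stat, df1, df2, lower.tail = FALSE)
print(c(F = F_stat, df1 = df1, df2 = df2, p = p_value))
```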




Pooled vs Random



Null Hypothesis: Pooled Effect Model

Alternate Hypothesis: Random Effect Model



Command:

> plmtest(pool)



Result:



  Lagrange Multiplier Test - (Honda)

data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)
normal = 57.1686, p-value < 2.2e-16
alternative hypothesis: significant effects



Since the p-value is negligible (< 2.2e-16), we reject the null hypothesis and accept the alternative: the Random Effect Model is preferred over the Pooled Effect Model.




Random vs Fixed



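Null Hypothesis: no correlation between the individual effects and the regressors, so the Random Effect Model is consistent and efficient

Alternate Hypothesis: the individual effects are correlated with the regressors, so only the Fixed Effect Model is consistent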



Command:

> phtest(fixed,random)



Result:



 Hausman Test

data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)
chisq = 93.546, df = 7, p-value < 2.2e-16
alternative hypothesis: one model is inconsistent



Since the p-value is negligible (< 2.2e-16), we reject the null hypothesis and accept the alternative: the Fixed Effect Model is preferred over the Random Effect Model.
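In its textbook form the Hausman statistic is H = (b_FE - b_RE)' [Var(b_FE) - Var(b_RE)]^(-1) (b_FE - b_RE), compared with a chi-square distribution whose degrees of freedom equal the number of compared coefficients (7 here, matching df = 7 above). A minimal sketch as a hypothetical helper; phtest() may differ in details such as which coefficients it compares.

```r
# Hypothetical helper: textbook Hausman statistic from two coefficient
# vectors and their covariance matrices (same coefficients, same order)
hausman_stat <- function(b_fe, b_re, v_fe, v_re) {
  d <- b_fe - b_re
  stat <- drop(t(d) %*% solve(v_fe - v_re) %*% d)
  df <- length(d)
  c(chisq = stat, df = df, p.value = pchisq(stat, df, lower.tail = FALSE))
}

# Toy one-coefficient example with made-up numbers: 0.5^2 / (0.1 - 0.05) = 5
hausman_stat(b_fe = 1, b_re = 0.5, v_fe = matrix(0.1), v_re = matrix(0.05))
```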




Conclusion: 



So, after running all the tests, we conclude that the Fixed Effect Model is best suited for the panel data analysis of the "Produc" data set.



Hence, we conclude that there are significant state-specific effects that are correlated with the regressors, so the model should be estimated from the variation within each state over time.
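The sequence of tests used in this post can be summarised as a small decision rule. The helper below is hypothetical (not part of plm); it takes the three p-values and returns the plm model name that the tests point to at a 5% level.

```r
# Hypothetical helper encoding the test sequence used in this post
choose_panel_model <- function(p_f, p_lm, p_hausman, alpha = 0.05) {
  fixed_beats_pooled  <- p_f  < alpha    # pFtest: reject pooled in favour of fixed
  random_beats_pooled <- p_lm < alpha    # plmtest: reject pooled in favour of random
  if (!fixed_beats_pooled && !random_beats_pooled) return("pooling")
  if (fixed_beats_pooled && !random_beats_pooled) return("within")
  if (!fixed_beats_pooled && random_beats_pooled) return("random")
  # Both effect models beat pooling: the Hausman test decides
  if (p_hausman < alpha) "within" else "random"
}

# Produc results: all three p-values < 2.2e-16
choose_panel_model(2.2e-16, 2.2e-16, 2.2e-16)   # "within"
```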

Wednesday, 13 February 2013

ITBAL: SESSION 6

Assignment 1:
Compute the log returns of NIFTY data from 01 Jan 2012 to 31 Jan 2013 and calculate the historical volatility.


Solution:

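z<-read.csv(file.choose(), header = TRUE)
head(z)
closingprice<-z[,5]                                  # 5th column assumed to hold the closing price
closingprice.ts<-ts(closingprice, frequency = 252)   # ~252 trading days per year
st<-log(closingprice.ts)
stlag<-log(lag(closingprice.ts, k = -1))             # log price of the previous day
log.returns<-st - stlag                              # log return: log(P_t) - log(P_(t-1))
plot(log.returns)
annual.factor<-sqrt(252)                             # annualise the daily volatility
historicalvolatility<-sd(log.returns) * annual.factor
historicalvolatility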
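The same calculation as a compact sketch: diff(log(prices)) gives the log returns directly, and the daily standard deviation is scaled by sqrt(252). The helper name hist_vol and its periods_per_year argument are hypothetical.

```r
# Hypothetical helper: annualised historical volatility from a price vector
hist_vol <- function(prices, periods_per_year = 252) {
  r <- diff(log(prices))          # log returns log(P_t) - log(P_(t-1))
  sd(r) * sqrt(periods_per_year)  # scale daily sd to an annual figure
}

# Constant 10% growth: every log return is equal, so volatility is ~0
hist_vol(c(100, 110, 121))
```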






Assignment 2:

Create an ACF plot for the above log returns, perform the ADF test and comment on the results.

The ACF plot is produced with:

acf(log.returns)








It can be seen from the plot that the autocorrelations lie within the 95% confidence bands, so the returns are most likely stationary with little serial correlation.
The ADF test (adf.test() from the tseries package) is done using:

library(tseries)
adf.test(log.returns)
As the p-value is below the 0.05 significance level, we reject the null hypothesis of a unit root (non-stationarity).

Hence, accepting the alternative hypothesis that the series is stationary, further analysis can be done.
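What the ACF plot shows can be reproduced by hand. This sketch uses hypothetical white-noise "returns" (not the NIFTY data), computes the lag-k sample autocorrelation with the same normalisation as acf(), and compares it with the approximate 95% band qnorm(0.975)/sqrt(n) that plot(acf) draws by default.

```r
set.seed(7)  # hypothetical white-noise "returns"
r <- rnorm(250)
n <- length(r)
rc <- r - mean(r)

# Lag-k sample autocorrelation, using the same normalisation as acf()
acf_manual <- function(k) sum(rc[1:(n - k)] * rc[(1 + k):n]) / sum(rc^2)

a <- acf(r, lag.max = 10, plot = FALSE)
band <- qnorm(0.975) / sqrt(n)   # approximate 95% band drawn by plot(acf)
manual <- sapply(1:10, acf_manual)
print(cbind(lag = 1:10, manual = manual, acf = a$acf[2:11]))
print(mean(abs(manual) > band))  # share of lags outside the band
```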

Thursday, 7 February 2013

IT Lab Day5 Assignment


Assignment1

1. Using NSE data covering more than 6 months, find the returns taking the 10th data point as the start and the 95th data point as the end.

2. Plot those returns.




Commands:

> z<-read.csv(file.choose(),header=T)

> Close<-z$Close
> Close
> Close.ts<-ts(Close,deltat=1/252)       # daily closes, ~252 trading days per year
> z1<-ts(Close.ts[10:95],deltat=1/252)   # 10th to 95th data point
> z1
> z1.diff<-diff(z1)                      # P(t) - P(t-1)
> z2<-lag(z1,k=-1)                       # P(t-1), aligned to time t
> Returns<-z1.diff/z2                    # simple returns (P(t) - P(t-1)) / P(t-1)
> plot(Returns,main=" Returns from 10th to 95th day of NSE Mid-cap Index ")
> z3<-cbind(z1,z1.diff,Returns)
> plot(z3,main=" Data from 10th-95th day ; Difference ; Returns")
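A small check of the return calculation on made-up prices: the simple return is (P_t - P_(t-1)) / P_(t-1), and with ts objects the division of diff() by lag(, k = -1) lines the two series up by time. The helper name simple_returns is hypothetical.

```r
# Hypothetical helper: simple returns (P_t - P_(t-1)) / P_(t-1)
simple_returns <- function(prices) diff(prices) / head(prices, -1)

prices <- c(100, 110, 99)          # made-up prices
simple_returns(prices)             # 0.1 -0.1

# Same calculation with ts objects, as in the session
p <- ts(prices, deltat = 1/252)
r_ts <- diff(p) / lag(p, k = -1)   # ts arithmetic aligns the two series by time
```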

Assignment2
 
Data for rows 1-700 is available. Predict rows 701-850 using GLM estimation with logit analysis.

Commands:

> z<-read.csv(file.choose(),header=T)
> z1<-z[1:700,1:9]
> head(z1)
> z1$ed<-factor(z1$ed)
> z1.est<-glm(default ~ age + ed + employ + address + income, data=z1, family ="binomial")
> summary(z1.est)

> forecast<-z[701:850,1:8]
> forecast$ed<-factor(forecast$ed)
> forecast$probability<-predict(z1.est,newdata=forecast,type="response")
> head(forecast)
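With family = "binomial" the default link is the logit, so predict(..., type = "response") returns probabilities p = 1/(1 + exp(-eta)), where eta is the linear predictor. A sketch on hypothetical simulated data (not the session's file) checks this and turns the probabilities into 0/1 predictions with a 0.5 cut-off, which is a common but arbitrary choice.

```r
set.seed(3)  # hypothetical data, not the session's file
n <- 200
age <- round(runif(n, 20, 60))
default <- rbinom(n, 1, plogis(-3 + 0.05 * age))
d <- data.frame(age, default)

train <- d[1:150, ]      # rows used for estimation
newdata <- d[151:200, ]  # rows to predict

fit <- glm(default ~ age, data = train, family = binomial)
eta  <- predict(fit, newdata = newdata, type = "link")      # linear predictor
prob <- predict(fit, newdata = newdata, type = "response")  # P(default = 1)

# Logistic transform of the linear predictor gives the same probabilities
manual <- 1 / (1 + exp(-eta))
predicted_class <- ifelse(prob > 0.5, 1, 0)   # 0.5 cut-off, an arbitrary choice
```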



Tuesday, 22 January 2013

ASSIGNMENT 1a:

Fit ‘lm’ and comment on the applicability of ‘lm’.
Plot 1: residuals vs. the independent variable.
Plot 2: standardized residuals vs. the independent variable.
> file<-read.csv(file.choose(),header=T)
> file
 
Data

 mileage groove
1       0 394.33
2       4 329.50
3       8 291.00
4      12 255.17
5      16 229.33
6      20 204.83
7      24 179.00
8      28 163.83
9      32 150.33


> x<-file$groove
> x
[1] 394.33 329.50 291.00 255.17 229.33 204.83 179.00 163.83 150.33
> y<-file$mileage
> y
[1]  0  4  8 12 16 20 24 28 32
> reg1<-lm(y~x)
> res<-resid(reg1)
> res
         1          2          3          4          5          6          7          8          9
 3.6502499 -0.8322206 -1.8696280 -2.5576878 -1.9386386 -1.1442614 -0.5239038  1.4912269  3.7248633
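> plot(x,res)
> plot(x,rstandard(reg1))   # Plot 2: standardized residuals vs. the independent variable

Since the residual plot shows a clear parabolic pattern, a straight-line lm fit is not appropriate for this data.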
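Plot 2 asks for standardized residuals. rstandard() divides each residual by s * sqrt(1 - h_i), where s is the residual standard error and h_i is the leverage (hat value) of point i. A sketch using the mileage/groove values from the table above, with the same orientation as the session (mileage regressed on groove):

```r
mileage <- c(0, 4, 8, 12, 16, 20, 24, 28, 32)
groove  <- c(394.33, 329.50, 291.00, 255.17, 229.33, 204.83, 179.00, 163.83, 150.33)

reg1 <- lm(mileage ~ groove)   # same orientation as the session: y = mileage, x = groove
h <- hatvalues(reg1)           # leverage of each point
s <- summary(reg1)$sigma       # residual standard error
std_res <- resid(reg1) / (s * sqrt(1 - h))

plot(groove, std_res, ylab = "Standardized residual")
abline(h = 0, lty = 2)
```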


Assignment 1 (b) -Alpha-Pluto Data

Fit ‘lm’ and comment on the applicability of ‘lm’.
Plot 1: residuals vs. the independent variable.
Plot 2: standardized residuals vs. the independent variable.

Also do:
Q-Q plot (qqnorm)
Q-Q line (qqline)
> file<-read.csv(file.choose(),header=T)
> file
   alpha pluto
1  0.150    20
2  0.004     0
3  0.069    10
4  0.030     5
5  0.011     0
6  0.004     0
7  0.041     5
8  0.109    20
9  0.068    10
10 0.009     0
11 0.009     0
12 0.048    10
13 0.006     0
14 0.083    20
15 0.037     5
16 0.039     5
17 0.132    20
18 0.004     0
19 0.006     0
20 0.059    10
21 0.051    10
22 0.002     0
23 0.049     5
> x<-file$alpha
> y<-file$pluto
> x
 [1] 0.150 0.004 0.069 0.030 0.011 0.004 0.041 0.109 0.068 0.009 0.009 0.048
[13] 0.006 0.083 0.037 0.039 0.132 0.004 0.006 0.059 0.051 0.002 0.049
> y
 [1] 20  0 10  5  0  0  5 20 10  0  0 10  0 20  5  5 20  0  0 10 10  0  5
> reg1<-lm(y~x)
> res<-resid(reg1)
> res
         1          2          3          4          5          6          7
-4.2173758 -0.0643108 -0.8173877  0.6344584 -1.2223345 -0.0643108 -1.1852930
         8          9         10         11         12         13         14
 2.5653342 -0.6519557 -0.8914706 -0.8914706  2.6566833 -0.3951747  6.8665650
        15         16         17         18         19         20         21
-0.5235652 -0.8544291 -1.2396007 -0.0643108 -0.3951747  0.8369318  2.1603874
        22         23
 0.2665531 -2.5087486
> plot(x,res)
> plot(x,rstandard(reg1))   # Plot 2: standardized residuals vs. the independent variable


> qqnorm(res)
> qqline(res)
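What qqnorm() and qqline() draw can be reproduced by hand. Using the alpha/pluto values from the table above, each residual is paired with the standard normal quantile at the plotting position of its rank, and the reference line passes through the first and third quartile points.

```r
alpha <- c(0.150, 0.004, 0.069, 0.030, 0.011, 0.004, 0.041, 0.109, 0.068, 0.009, 0.009, 0.048,
           0.006, 0.083, 0.037, 0.039, 0.132, 0.004, 0.006, 0.059, 0.051, 0.002, 0.049)
pluto <- c(20, 0, 10, 5, 0, 0, 5, 20, 10, 0, 0, 10, 0, 20, 5, 5, 20, 0, 0, 10, 10, 0, 5)
res <- resid(lm(pluto ~ alpha))
n <- length(res)

# qqnorm(): normal quantile at the plotting position of each residual's rank
theoretical <- qnorm(ppoints(n))[order(order(res))]
qq <- qqnorm(res, plot.it = FALSE)

# qqline(): straight line through the first and third quartile points
yq <- quantile(res, c(0.25, 0.75), names = FALSE)
xq <- qnorm(c(0.25, 0.75))
slope <- diff(yq) / diff(xq)
intercept <- yq[1] - slope * xq[1]

plot(theoretical, res, xlab = "Theoretical quantiles", ylab = "Residuals")
abline(intercept, slope)
```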
 

Assignment 2: Test the null hypothesis (equal mean comfort level for all chairs) using ANOVA


> file<-read.csv(file.choose(),header=T)
> file

   Chair Comfort.Level Chair1
1      I             2      a
2      I             3      a
3      I             5      a
4      I             3      a
5      I             2      a
6      I             3      a
7     II             5      b
8     II             4      b
9     II             5      b
10    II             4      b
11    II             1      b
12    II             3      b
13   III             3      c
14   III             4      c
15   III             4      c
16   III             5      c
17   III             1      c
18   III             2      c
> file.anova<-aov(file$Comfort.Level~file$Chair1)
> summary(file.anova)

            Df Sum Sq Mean Sq F value Pr(>F)
file$Chair1  2  1.444  0.7222   0.385  0.687
P-value = 0.687

Since the p-value (0.687) is well above 0.05, the null hypothesis that all three chairs have the same mean comfort level cannot be rejected.
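The ANOVA table can be checked by hand from the data above. The group means are 3, 11/3 and 19/6 against a grand mean of 59/18, which gives a between-group sum of squares of 13/9 (about 1.444) and F = 5/13 (about 0.385), as in the table.

```r
comfort <- c(2, 3, 5, 3, 2, 3,   5, 4, 5, 4, 1, 3,   3, 4, 4, 5, 1, 2)
chair <- factor(rep(c("I", "II", "III"), each = 6))

grand_mean  <- mean(comfort)
group_means <- tapply(comfort, chair, mean)

ss_between <- sum(6 * (group_means - grand_mean)^2)   # 6 ratings per chair
ss_within  <- sum((comfort - ave(comfort, chair))^2)
df_between <- nlevels(chair) - 1                       # 2
df_within  <- length(comfort) - nlevels(chair)         # 15

F_stat  <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(F_stat, df_between, df_within, lower.tail = FALSE)
print(c(SS_between = ss_between, F = F_stat, p = p_value))
```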

Wednesday, 16 January 2013

Assignment 1


Create two matrices z1 and z2 and bind the first column of z1 with the second column of z2.

Commands

> z1<-c(1:9)
> dim(z1)<-c(3,3)
> z2<-c(32,48,01,05,10,12,15,18,23)
> dim(z2)<-c(3,3)
> z3<-cbind(z1[,1],z2[,2])

Assignment 2


Multiply the two matrices z1 and z2 (matrix product).

Commands

> z3<-z1%*%z2
> z3
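A sketch of the two matrix assignments with the same values: matrix(1:9, nrow = 3) fills column by column, just like c(1:9) followed by dim<-. The %*% operator is the matrix product (rows of z1 times columns of z2), which differs from the element-wise *.

```r
z1 <- matrix(1:9, nrow = 3)   # same as c(1:9) with dim 3 x 3, filled column by column
z2 <- matrix(c(32, 48, 1, 5, 10, 12, 15, 18, 23), nrow = 3)

z3_bind <- cbind(z1[, 1], z2[, 2])   # columns (1, 2, 3) and (5, 10, 12)
z3_prod <- z1 %*% z2                 # matrix product: rows of z1 times columns of z2
z3_elem <- z1 * z2                   # element-wise product, a different operation

# Entry [1, 1] of the matrix product is row 1 of z1 times column 1 of z2
z3_prod[1, 1] == sum(z1[1, ] * z2[, 1])   # 1*32 + 4*48 + 7*1 = 231
```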

Assignment 3  

Regression of the High price on the Open price (NSE data)

 

Commands

> nse<-read.csv(file.choose(),header=T)
> reg2<-lm(High~Open,data=nse)
> residuals(reg2)
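residuals() returns the observed values minus the fitted values of the regression. Since the NSE file is not included here, the sketch uses hypothetical Open/High prices with the same column names.

```r
set.seed(4)  # hypothetical prices; the session's NSE file is not included
open <- runif(30, 5000, 6000)
high <- open + abs(rnorm(30, mean = 20, sd = 10))
nse <- data.frame(Open = open, High = high)

reg2 <- lm(High ~ Open, data = nse)
res2 <- residuals(reg2)   # observed High minus fitted High
```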

Assignment 4


Generate normal distribution data and plot its density.



Commands

> x<-seq(0,500)
> y<-dnorm(x,mean=250,sd=50)
> plot(x,y,type="l")
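dnorm() evaluates the normal density exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)). A sketch checking this against the session's values (mean 250, sd 50): the peak sits at the mean, and because the grid covers ±5 sd in steps of 1, the summed density is very close to 1.

```r
x <- seq(0, 500)
y <- dnorm(x, mean = 250, sd = 50)

# Normal density written out by hand
mu <- 250; sigma <- 50
manual <- exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))

x[which.max(y)]   # peak at the mean, 250
sum(y)            # close to 1: the grid spans +/- 5 sd in steps of 1
```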

Tuesday, 8 January 2013

IT and Business Application Lab 

Assignment 0

Line Plot 
Histogram

Assignment 1

NSE Data Histogram

Assignment 2

Assignment 3

Assignment 4