Earlier we discussed how to conduct regression analysis in R:
Regression in R
In this article, we will go through the details of doing the same in Python.
Assumption: the dependent and independent data sets have been stored in the variables sbiR and niftyR respectively.
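scikit-learn expects the independent data as a 2-D array of shape (n_samples, n_features), while the dependent data can be 1-D. Below is a minimal sketch of preparing the two variables. It assumes, as the names suggest, that niftyR and sbiR hold daily returns of the Nifty index and the SBI share. The price arrays are made-up numbers for illustration only.

```python
import numpy as np

# Hypothetical closing prices, for illustration only
nifty_prices = np.array([100.0, 101.0, 99.0, 102.0])
sbi_prices = np.array([50.0, 50.5, 49.0, 51.0])

# Simple daily returns: (P_t - P_{t-1}) / P_{t-1}
nifty_returns = np.diff(nifty_prices) / nifty_prices[:-1]
sbiR = np.diff(sbi_prices) / sbi_prices[:-1]  # dependent variable, 1-D

# Independent variable reshaped to a single column: (n_samples, 1)
niftyR = nifty_returns.reshape(-1, 1)
print(niftyR.shape, sbiR.shape)
```

Passing a plain 1-D array as the first argument of fit() raises an error in scikit-learn, which is why the reshape(-1, 1) step matters.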
Code to Run Linear Regression
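from sklearn.linear_model import LinearRegression
# normalize=True was deprecated in scikit-learn 1.0 and removed in 1.2; scale inputs separately if needed
regression = LinearRegression()
# niftyR must be 2-D (n_samples, n_features); sbiR is the target
regression.fit(niftyR, sbiR)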
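As you might notice, scikit-learn (imported as sklearn) is the Python library we use here: its linear_model module provides the LinearRegression class, which we import, create and fit to run the regression.
You can also print the R² value with print(regression.score(niftyR, sbiR)); score() returns the coefficient of determination of the fit.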
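A minimal end-to-end sketch of the linear regression step follows. Since the real data is not available here, it generates synthetic stand-ins for niftyR and sbiR (250 hypothetical daily returns with a true slope of 1.2 plus noise) and prints the fitted slope, intercept and R².

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-ins for niftyR (independent) and sbiR (dependent)
rng = np.random.default_rng(0)
niftyR = rng.normal(0.0, 0.01, size=(250, 1))  # 2-D: (n_samples, 1)
sbiR = 1.2 * niftyR[:, 0] + rng.normal(0.0, 0.005, size=250)

regression = LinearRegression()
regression.fit(niftyR, sbiR)

# score(X, y) returns the coefficient of determination R^2
r2 = regression.score(niftyR, sbiR)
print("slope:", regression.coef_[0])
print("intercept:", regression.intercept_)
print("R^2:", r2)
```

Because the data is synthetic, the fitted slope should land close to the 1.2 used to generate it. With real niftyR and sbiR, the slope plays the role of the stock's beta against the index.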
Code to Run Logistic Regression
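from sklearn.linear_model import LogisticRegression
# LogisticRegression needs a categorical target; as an illustration, turn the
# returns into a hypothetical up/down label (1 if sbiR > 0, else 0)
sbi_up = (sbiR > 0).astype(int)
logistic = LogisticRegression()
logistic.fit(niftyR, sbi_up)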
Everything else remains the same. The choice depends on the dependent variable (the output): use linear regression when it is continuous, such as a return, and logistic regression when it is categorical, such as up/down.
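A minimal logistic regression sketch on synthetic data follows. The up/down label sbi_up is a hypothetical target built from the returns. The StandardScaler step is an addition not in the original: returns are tiny numbers (around 0.01), and without scaling, LogisticRegression's default L2 penalty would shrink the coefficient heavily.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for niftyR and sbiR
rng = np.random.default_rng(0)
niftyR = rng.normal(0.0, 0.01, size=(250, 1))
sbiR = 1.2 * niftyR[:, 0] + rng.normal(0.0, 0.005, size=250)

# Hypothetical categorical target: 1 if SBI went up, 0 otherwise
sbi_up = (sbiR > 0).astype(int)

# Standardize the feature, then fit the logistic model
logistic = make_pipeline(StandardScaler(), LogisticRegression())
logistic.fit(niftyR, sbi_up)

accuracy = logistic.score(niftyR, sbi_up)  # fraction classified correctly
prob_up = logistic.predict_proba([[0.02]])[0, 1]  # P(up) if Nifty rises 2%
print("accuracy:", accuracy)
print("P(SBI up | Nifty +2%):", prob_up)
```

Unlike linear regression, score() here reports classification accuracy rather than R², and predict_proba() gives the probability of each class.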
Cross-Validation & Train-Test Data Sets
To test a model, we first need to split the available data into two parts: training data and test data. The train_test_split function from scikit-learn performs this split:
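# sklearn.cross_validation was removed in scikit-learn 0.20; the function now lives in model_selection
from sklearn.model_selection import train_test_split
# hold out 33% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=42)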
Here X contains the independent data (for example niftyR) and Y is the output (for example sbiR). We fit the regression on the training data and then check its error on the test data to establish whether the model can be used for prediction.
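from sklearn.metrics import mean_squared_error
regression.fit(X_train, y_train)
# error on the data the model was trained on
print(mean_squared_error(y_true=y_train, y_pred=regression.predict(X_train)))
# error on unseen data; a much higher value than the training error suggests overfitting
print(mean_squared_error(y_true=y_test, y_pred=regression.predict(X_test)))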
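A self-contained sketch of the whole evaluation follows. It runs on synthetic X and Y, which are made up for illustration. It first does the single train/test split, then adds k-fold cross-validation via cross_val_score. The k-fold part goes beyond the original code: every row is used for testing exactly once, so the error estimate depends less on one lucky or unlucky split.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic data: one independent column X and a noisy linear output Y
rng = np.random.default_rng(0)
X = rng.normal(0.0, 0.01, size=(300, 1))
Y = 1.2 * X[:, 0] + rng.normal(0.0, 0.005, size=300)

# Single hold-out split: fit on 67%, evaluate on the remaining 33%
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.33, random_state=42)
regression = LinearRegression()
regression.fit(X_train, y_train)
train_mse = mean_squared_error(y_train, regression.predict(X_train))
test_mse = mean_squared_error(y_test, regression.predict(X_test))
print("train MSE:", train_mse)
print("test MSE:", test_mse)

# 5-fold cross-validation on the full data set
folds = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LinearRegression(), X, Y, cv=folds,
                         scoring="neg_mean_squared_error")
cv_mse = -scores.mean()  # scikit-learn negates MSE so that higher is better
print("5-fold CV MSE:", cv_mse)
```

Here the noise variance is 0.005² = 2.5e-5, so both the test MSE and the cross-validated MSE should sit near that value. A test error far above the training error would point to overfitting.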
This article has details on installing Python and running a Python Hello program on an Ubuntu machine.