## Algorithms and Evaluation

### Random Forest

Random Forest is a supervised learning algorithm that grows an ensemble of decision trees. Each tree in the ensemble is built from a bootstrap sample drawn with replacement from the training set, and a higher number of trees generally yields higher accuracy. Random Forest can be used for both classification and regression problems: for classification, the ensemble of simple trees votes for the most popular class; for regression, the individual tree responses are averaged to obtain an estimate of the dependent variable. Single decision trees often suffer from high variance or high bias; Random Forest mitigates both problems by averaging, finding a natural balance between the two extremes. Because Random Forests have few parameters to tune and can be used with default parameter settings, they are a simple way to produce a reasonable model quickly and efficiently.

### AdaBoost

The core principle of AdaBoost is to fit a sequence of weak learners (i.e., models that are only slightly better than random guessing, such as small decision trees) on repeatedly reweighted versions of the data. The predictions from all of them are then combined through a weighted majority vote (or sum) to produce the final prediction.

### Simple Linear Regression

Simple linear regression is a statistical method for summarising and studying the relationship between two continuous (quantitative) variables. A linear regression model assumes a linear relationship between the input variables (x) and the single output variable (y): y is calculated as a linear combination of the input variables (x). When there is a single input variable (x), the method is called simple linear regression; when there are multiple input variables, the procedure is referred to as multiple linear regression.
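The weighted-vote principle of AdaBoost can be sketched in plain Python using one-dimensional decision stumps as the weak learners. This is only an illustrative sketch, not the paper's implementation; the function names and the toy data are made up for the example.

```python
import math

def train_adaboost(xs, ys, rounds=3):
    """Fit a sequence of 1-D decision stumps on reweighted data (AdaBoost).

    xs: feature values; ys: labels in {-1, +1}.
    Returns a list of (alpha, threshold, sign) weighted weak learners.
    """
    n = len(xs)
    w = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        # pick the stump (threshold, sign) with the lowest weighted error
        best = None
        for t in set(xs):
            for sign in (1, -1):
                err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                          if (sign if xi > t else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, t, sign)
        err, t, sign = best
        err = max(err, 1e-10)              # avoid division by zero below
        alpha = 0.5 * math.log((1 - err) / err)   # vote weight of this learner
        ensemble.append((alpha, t, sign))
        # up-weight the examples this stump got wrong, then renormalise
        w = [wi * math.exp(-alpha * yi * (sign if xi > t else -sign))
             for xi, yi, wi in zip(xs, ys, w)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted majority vote (sum) over the stumps."""
    vote = sum(alpha * (sign if x > t else -sign) for alpha, t, sign in ensemble)
    return 1 if vote >= 0 else -1
```

On a toy 1-D problem such as `train_adaboost([1, 2, 3, 4, 5, 6], [-1, -1, -1, 1, 1, 1])`, the reweighting and the final weighted vote behave exactly as described above; a real application would use a library implementation rather than this sketch.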
### Multilayer Perceptron (MLP)

An MLP is a network of simple neurons called perceptrons. It is a feedforward artificial neural network model that maps sets of input data onto a set of appropriate outputs. An MLP consists of multiple layers of nodes in a directed graph, with each layer fully connected to the next one. Except for the input nodes, each node is a neuron (or processing element) with a nonlinear activation function. An MLP is trained with a supervised learning technique called backpropagation. Multi-layered neural networks are essentially used for datasets with a large number of features, especially non-linear ones; intuitively, the more hidden layers a network has, the more complex the shapes it can fit.

### Logistic Regression

The logistic regression model is a member of the supervised classification algorithm family. It measures the relationship between a categorical dependent variable and one or more independent variables by estimating probabilities using a logistic function. Logistic regression predicts the probability of an outcome that can take only two values (i.e. a dichotomy), based on one or several predictors (numerical and categorical). It is a regression model in which the dependent variable (DV) is categorical.

### Performance Metrics

1. Accuracy – the ability of the model to correctly predict the class label of new or previously unseen data: Accuracy = number of correct predictions / total number of cases to be predicted.
2. Mean Absolute Error (MAE) – a measure of the difference between two continuous variables.
3. Root Mean Squared Error (RMSE) – assumes that errors are unbiased and follow a normal distribution.
4. ROC (Receiver Operating Characteristic).
5. Kappa statistic – frequently used to test inter-rater reliability.
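The accuracy, MAE, and RMSE metrics listed above can be computed directly from their definitions. A minimal sketch in plain Python (the small label and value lists are purely illustrative):

```python
import math

def accuracy(y_true, y_pred):
    # fraction of correct predictions over all cases to be predicted
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def mean_absolute_error(y_true, y_pred):
    # average absolute difference between the two continuous variables
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_squared_error(y_true, y_pred):
    # square root of the average squared error
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# toy values, purely illustrative
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))             # → 0.75
print(mean_absolute_error([3.0, 5.0], [2.5, 5.5]))      # → 0.5
print(root_mean_squared_error([3.0, 5.0], [2.0, 6.0]))  # → 1.0
```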
- True positive (TP): sick people correctly identified as sick
- False positive (FP): healthy people incorrectly identified as sick
- True negative (TN): healthy people correctly identified as healthy
- False negative (FN): sick people incorrectly identified as healthy

### Data Analysis

The dataset used is the UCI Heart Disease dataset, which has a total of 303 instances. It contains 75 attributes, of which 14 are used. The attributes used are of real, binary, nominal, and ordered types. The attributes and their descriptions are given below.

### Interpretation and Evaluation

### Conclusion

Heart disease is fatal by its nature; it causes life-threatening complications such as heart attack and death. The importance of data mining in the medical domain has been realised, and steps have been taken to apply relevant techniques to disease prediction. Various research works by different people using effective techniques were studied. An ensemble is a technique for combining many weak learners in an attempt to produce a strong learner. Evaluating the prediction of an ensemble typically requires more computation than evaluating the prediction of a single model, so an ensemble may be thought of as a way to compensate for poor learning algorithms by performing extra computation. There are two main motivations for building an ensemble of classifiers:

i. Reduced variance: results are less dependent on the peculiarities of a single training set.
ii. Reduced bias: a combination of multiple classifiers may learn a more expressive concept class than a single classifier.
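The TP/FP/TN/FN definitions and the kappa statistic mentioned earlier can be made concrete with a small sketch. The label lists are hypothetical (1 = sick, 0 = healthy), and the kappa formula used is Cohen's standard (observed agreement minus chance agreement, divided by one minus chance agreement):

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for a binary problem where 1 = sick, 0 = healthy."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def cohen_kappa(y_true, y_pred):
    """Kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    n = tp + fp + tn + fn
    p_observed = (tp + tn) / n
    # chance agreement estimated from the marginal frequencies of each class
    p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)

# hypothetical predictions on six patients
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
print(confusion_counts(y_true, y_pred))  # → (2, 1, 2, 1)
```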