The nonparametric estimation in statsmodels relies on two main classes: **UKDE** and **CKDE**. Each class exposes the estimated probability density (**pdf**) and the bandwidth (**bw**). Currently the classes can handle mixed variable types (continuous and discrete data) and multiple bandwidth selection methods.

**UKDE** implements unconditional kernel density estimation. Suppose you would like to estimate the joint probability density of two variables, X and Y, where X is continuous and Y is an ordered discrete variable. To do this with statsmodels, create an instance of the class **UKDE**:

**udens = UKDE(tdat=[X, Y], var_type='co', bw='cv_ls')**

**tdat** is the training data (in this case a list of two arrays), **var_type** specifies the type of each variable in **tdat** (continuous and ordered) and **bw** specifies the bandwidth selection method (in this case least squares cross-validation). Now that the density has been estimated, suppose you would like to evaluate it at a particular realization X = x and Y = y. To do this:

**udens.pdf(edat=[x, y])**

where **edat** is the evaluation data. x and y can also be arrays if the user wants to evaluate the density at multiple points at once.
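To make the mixed-data case concrete, here is a minimal pure-Python sketch of a product-kernel density estimate for one continuous and one ordered variable. It is not the statsmodels implementation: the function names, the choice of a Gaussian kernel for X and a Wang-van Ryzin kernel for Y, and the fixed smoothing values `h` and `lam` are all assumptions made for illustration.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here for the continuous variable."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def wang_van_ryzin_kernel(y_i, y, lam):
    """Kernel for an ordered discrete variable; lam in [0, 1) controls smoothing."""
    if y_i == y:
        return 1.0 - lam
    return 0.5 * (1.0 - lam) * lam ** abs(y_i - y)


def mixed_density(x_train, y_train, x, y, h, lam):
    """Average of product kernels over the training sample (hypothetical helper)."""
    n = len(x_train)
    total = 0.0
    for x_i, y_i in zip(x_train, y_train):
        continuous_part = gaussian_kernel((x_i - x) / h) / h
        ordered_part = wang_van_ryzin_kernel(y_i, y, lam)
        total += continuous_part * ordered_part
    return total / n


# Illustrative data only: X continuous, Y ordered.
x_train = [0.1, 0.5, 1.2, 1.9]
y_train = [0, 1, 1, 2]
density_at_point = mixed_density(x_train, y_train, x=1.0, y=1, h=0.5, lam=0.2)
```

Setting `lam` to 0 turns the ordered kernel into a plain indicator, so only training points with exactly the same Y value contribute to the estimate.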

An important part of nonparametric estimation is the calculation of the bandwidth. This is controlled by the input parameter **bw**. Currently the user can choose among three methods: normal reference rule of thumb (**bw='normal_reference'**), maximum likelihood cross-validation (**bw='cv_ml'**) and least squares cross-validation (**bw='cv_ls'**). Alternatively, the user can specify an array of values to be used for the bandwidth. The estimated bandwidth is stored in the **bw** attribute of the **UKDE** instance. To access it:

**udens.bw**
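To give a feel for what two of these selectors compute, the sketch below works through a single continuous variable in pure Python. It is an illustration under stated assumptions rather than the statsmodels code: the constant 1.06 and the exponent -1/(4 + number of variables) follow the common normal reference rule of thumb, the population standard deviation is assumed, and the likelihood cross-validation searches a hypothetical list of candidate bandwidths instead of running a numerical optimizer.

```python
import math
import statistics


def normal_reference_bandwidth(sample, n_vars=1):
    """Rule of thumb: 1.06 * std * n ** (-1 / (4 + n_vars)) (assumed form)."""
    n = len(sample)
    return 1.06 * statistics.pstdev(sample) * n ** (-1.0 / (4 + n_vars))


def loo_log_likelihood(sample, h):
    """Leave-one-out log-likelihood of a Gaussian KDE with bandwidth h."""
    n = len(sample)
    total = 0.0
    for i, x_i in enumerate(sample):
        dens = 0.0
        for j, x_j in enumerate(sample):
            if j != i:  # leave the i-th observation out of its own estimate
                u = (x_i - x_j) / h
                dens += math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
        if dens <= 0.0:  # kernel weights underflowed, so h is far too small
            return float("-inf")
        total += math.log(dens / ((n - 1) * h))
    return total


def cv_ml_bandwidth(sample, candidates):
    """Pick the candidate with the highest leave-one-out log-likelihood."""
    return max(candidates, key=lambda h: loo_log_likelihood(sample, h))


# Illustrative data only.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
h_rule = normal_reference_bandwidth(sample)
h_cv = cv_ml_bandwidth(sample, [0.25, 0.5, 1.0, 2.0, 4.0])
```

The rule of thumb is a closed-form expression, while cross-validation has to re-evaluate the density for every candidate bandwidth, which is why the data-driven selectors are noticeably slower on larger samples.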

Conditional kernel density estimation is implemented through the class **CKDE**. For example:

**cdens = CKDE(tydat=[X, Y], txdat=[V, W], dep_type='co', indep_type='cc', bw='cv_ml')**

This will estimate the conditional probability density P(X, Y | V, W), that is, the joint density of X and Y *given* V and W. **tydat** and **txdat** are the dependent and independent data, each of which has its variable types specified by **dep_type** and **indep_type**. In this case X is continuous and Y is ordered, while both independent variables V and W are continuous. The bandwidth selection method is maximum likelihood cross-validation, which runs faster than least squares cross-validation.

To evaluate the conditional pdf at particular data x, y, v, w:

**cdens.pdf(eydat=[x, y], exdat=[v, w])**
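A conditional density estimate of this kind can be understood as the ratio of a joint kernel density to the marginal kernel density of the conditioning variables. The pure-Python sketch below shows that ratio for one continuous dependent variable and one continuous independent variable. The helper names, the Gaussian kernels and the fixed bandwidths `h_y` and `h_x` are assumptions for illustration, not the **CKDE** internals.

```python
import math


def _phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def joint_density(y_train, x_train, y, x, h_y, h_x):
    """Product-kernel estimate of the joint density f(y, x)."""
    n = len(y_train)
    total = 0.0
    for y_i, x_i in zip(y_train, x_train):
        total += _phi((y_i - y) / h_y) / h_y * _phi((x_i - x) / h_x) / h_x
    return total / n


def marginal_density(x_train, x, h_x):
    """Kernel estimate of the marginal density f(x)."""
    n = len(x_train)
    return sum(_phi((x_i - x) / h_x) / h_x for x_i in x_train) / n


def conditional_density(y_train, x_train, y, x, h_y, h_x):
    """Estimate f(y | x) as f(y, x) / f(x) (hypothetical helper)."""
    joint = joint_density(y_train, x_train, y, x, h_y, h_x)
    return joint / marginal_density(x_train, x, h_x)


# Illustrative data only.
y_train = [0.2, 0.9, 1.4, 2.1]
x_train = [0.0, 1.0, 1.5, 2.5]
value = conditional_density(y_train, x_train, y=1.0, x=1.2, h_y=0.5, h_x=0.5)
```

Because the same kernel in x appears in the numerator and the denominator, the estimate integrates to one over y for any fixed x, which is what makes it a proper conditional density.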
