Autonomio v.0.3.2 User Manual

https://travis-ci.org/autonomio/core-module.svg?branch=master https://coveralls.io/repos/github/autonomio/core-module/badge.svg?branch=master https://gemnasium.com/badges/github.com/mikkokotila/core-module.svg

This document covers in detail functionality of Autonomio. If you’re looking for a high level overview of the capabilities, you might find [Autonomio_Overview] more useful.

Autonomio is very easy to use and it’s highly recommended to memorize the namespace which is just 3 commands and less than 30 arguments combined. Yet you have an infinite number of network configurations available. To have 100% control over Autonomio’s powerful features, you just have to know three commands.

To train (and save) model:

train()

To test (and load) model:

test()

To load a dataset:

data()

INSTALLATION

At the moment, the simplest way is to install with pip from the repo directly.

For installing the development version (latest):

pip install git+https://github.com/autonomio/core-module.git

TYPES OF NEURAL NETWORKS

Currently Autonomio is focused on providing a very high level abstraction layer to training of ‘dense’ neural networks, and then using the trained models to make predictions in any environment that you see fit. Dense layers are suited for a wide range of data science problems.

DATA INPUTS

The expected input dataformat is a Pandas dataframe. Generally speaking, deep learning is at its strongest in solving classification problems, where the outcome variable is either binary categorical (0 or 1) or multi categorical. It’s recommended to convert continuous and ranged variables to categoricals first.

BINARY (default)

  • X can be text, int, or floating point
  • Y can be an int, or floating point

The default settings are optimized for making a 1 or 0 prediction and for example in the case of predicting sentiment from tweets, Autonomio gives 85% accuracy without any parameter setting for classifying tweets that rank in the most negative 20% according to NLTK Vader sentiment analysis.

CATEGORICAL

  • X can be text, integer
  • Y can be an integer or text
  • output layer neurons must match number of categories
  • change activation_out to something that works with categoricals

It’s not a good idea to have too many categories, maybe 10 is pushing it in most cases.

TRAIN

The absolute minimum use case using an Autonomio dataset is:

from autonomio.commands import *
%matplotlib inline
train('text','neg',data('random_tweets'))

Using this example and NLTK’s sentiment analyzer as an input for the ground truth, Autonomio yields 85% prediction result out of the box with with nothing but:

train('text','neg',data('random_tweets'))

There are multiple ways you can input ‘x’ with single input:

train('text' ,'neg', data) # a single column where data is string

train(5, 'neg', data) # a single column by index

train(['quality_score'], 'neg', data) # a single column by label

And few more ways where you input a list for ‘x’:

train([1,5], 'neg', data) # a range of column index

train(['quality_score', 'reach_score'], 'neg', data) # set of column labels

train([1,2,4,6,18], 'neg', data) # a list of column index

A slightly more involving example may include changing the number of epochs:

train('text','neg',data('random_tweets'),epoch=20)

For flattening the options are ‘mean’, ‘median’, ‘none’ and IQR. IQR is invoked by inputting a float:

train('text','neg',data('random_tweets'),epoch=20,flatten=.3)

Dropout is one of the most important aspects of neural network:

train('text','neg',data('random_tweets'),epoch=20,flatten=.3,dropout=.5)

You might want to change the number of layers in the network:

train('text','neg',data('random_tweets'),epoch=20,flatten=.3,dropouts=.5,layers=4)

Or change the loss of the model:

train('text','neg',data('random_tweets'),epoch=20,flatten=.3,dropouts=.5,layers=4,loss='kullback_leibler_divergence')

For a complete list of supported losses see [Keras_Losses]

If you want to save the model, be mindful of using .json ending:

train('text','neg',data('random_tweets'),epoch=20,flatten=.3,save_model='model.json')

Control the neuron size by setting the number of neurons on the input layer:

train('text','neg',data('random_tweets'),epoch=20,flatten=.3,neuron_first=50)

Sometimes changing the batch size can improve the model significantly:

train('text','neg',data('random_tweets'),epoch=20,flatten=.3,batch_size=15)

By default verbosity from Keras is at mimimum, and you may want the live mode for training:

train('text','neg',data('random_tweets'),epoch=20,flatten=.3,verbose=1)

You can add the shape in the model(the way how layers are distributed):

train('text','neg',data('random_tweets'),epoch=20,flatten=.3,verbose=1, shape='brick')

To validate the result to check the test accuracy you may use the validation:

train('text','neg',data('random_tweets'),epoch=20,flatten=.3,validation=True)

The True for validation puts the half of the data to be trained, the other - tested.

You can also define which part of the data will be validated:

train('text','neg',data('random_tweets'),epoch=20,flatten=.3,validation=.4)

To be sure about the results which you have got you can use double check:

train('text','neg',data('random_tweets'),epoch=20,flatten=.3,double_check=True)

TRAIN ARGUMENTS

Even though it’s possible to use Autonomio mostly with few arguments, there are a total 13 arguments that can be used to improving model accuracy:

def train(X,Y,data,
            dims=300,
            epoch=5,
            flatten='mean',
            dropout=.2,
            layers=3,
            model='train',
            loss='binary_crossentropy',
            save_model=False,
            neuron_first='auto',
            neuron_last=1,
            batch_size=10,
            verbose=0,
            shape='funnel',
            double_check=False,
            validation=False):
ARGUMENT REQUIRED INPUT DEFAULT
X string, int, float NA
Y int,float,categorical NA
data data object NA
epoch int 5
flatten string, float ‘mean’
dropout float .2
layers int (2 through 5 3
model int ‘train’ (OBSOLETE)
loss string [Keras_Losses] ‘binary_crossentropy’
save_model string, False
neuron_first int,float,categorical 300
neuron_last data object 1
batch_size int 10
verbose 0,1,2 0
shape string ‘funnel’
double_check True or False False
validation True,False,float(0 to 1) False

SHAPES

Shapes function takes as input number of layers, maximum value of neurons and the name of a shape. As an output it gives a list of neurons in order according to its shape.

Funnel

Funnel is the shape, which is set by default. It roughly looks like an upside-dowm pyramind, so that the first layer is defined as neuron_max, and the next layers are sligtly decreased compared to previous ones.:

+          +
\          /
 \        /
  \      /
   \    /
    |  |

As funnel shape is set by default, we do not need to input anything to use it.

Example input.:

tr = train(1,'neg',temp,layers=5,neuron_max=10)

Output list of neurons(excluding ounput layer).

neuron_count = [10, 5, 3, 2, 1]

Long Funnel

Long Funnel shape can be applied by defining shape as ‘long_funnel’. First half of the layers have the value of neuron_max, and then they have the shape similar to Funnel shape - decreasing to the last layer.:

+          +
|          |
|          |
|          |
 \        /
  \      /
   \    /
    |  |

Example input.:

tr = train(1,'neg',temp,layers=6,neuron_max=10,shape='long_funnel')

Output list of neurons(excluding ounput layer).:

neuron_count = [10, 10, 10, 5, 3, 2]

Rhombus

Rhobmus can be called by definind shape as ‘rhombus’. The first layer equals to 1 and the next layers slightly increase till the middle one which equals to the value of neuron_max. Next layers are the previous ones goin in the reversed order.

   +   +
   /   \
  /     \
 /       \
/         \
\         /
 \       /
  \     /
   \   /
   |   |

Example input.

tr = train(1,'neg',temp,layers=5,neuron_max=10,shape='rhombus')

Output list of neurons(excluding ounput layer).

neuron_count = [1, 6, 10, 6, 1]

Diamond

Defining shape as ‘diamond’ we will obtain the shape of the ‘opened rhombus’, where everything is similar to the Rhombus shape, but layers start from the larger number instead of 1.

  +     +
 /       \
/         \
\         /
 \       /
  \     /
   \   /
   |   |

Example input.

tr = train(1,'neg',temp,layers=6,neuron_max=10,shape='diamond')

Output list of neurons(excluding ounput layer).

neuron_count = [6, 6, 10, 5, 3, 2]

Hexagon

Hexagon, which we get by calling ‘hexagon’ for shape, starts with 1 as the first layer and increases till the neuron_max value. Then some next layers will have maximum value untill it starts to decrease till the last layer.

    +  +
   /    \
  /      \
 /        \
|          |
|          |
|          |
 \        /
  \      /
   \    /
    |  |

Example input.

tr = train(1,'neg',temp,layers=7,neuron_max=10,shape='hexagon')

Output list of neurons(excluding ounput layer).

neuron_count = [1, 3, 5, 10, 10, 5, 3]

Brick

All the layers have neuron_max value. Called by shape=’brick’.

+                     +
|             |
|             |
|             |
|             |
 ----     ----
     |   |

Example input.

tr = train(1,'neg',temp,layers=5,neuron_max=10,shape='brick')

Output list of neurons(excluding ounput layer).

neuron_count = [10, 10, 10, 10, 10]

Triangle

This shape, which is called by defining shape as ‘triangle’ starts with 1 and increases till the last input layer, which is neuron_max.

     +  +
    /    \
   /      \
  /        \
 /          \
/            \
----      ----
    |    |

Example input.

tr = train(1,'neg',temp,layers=5,neuron_max=10,shape='triangle')

Output list of neurons(excluding ounput layer).

neuron_count = [1, 2, 3, 5, 10]

Stairs

You can apply it defining shape as ‘stairs’. If number of layers more than four, then each two layers will have the same value, then it decreases.If the number of layers is smaller than four, then the value decreases every single layer.

+                      +
|                      |
 ---                ---
    |             |
     ---       ---
        |     |

Example input.

tr = train(1,'neg',temp,layers=6,neuron_max=10,shape='stairs')

Output list of neurons(excluding ounput layer).

neuron_count = [10, 10, 8, 8, 6, 6]

TEST

Once you’ve trained a model with train(), you can use it easily on any dataset:

test('text',data,'handle','model.json')

Or if you want to see an interactive scatter plot visualization with new y variable:

test('text',data,'handle','model.json',y_scatter='influence_score')

Whatever y_scatter is set as, will be set as the y-axis for the scatter plot.

To yield the scatter plot, you have to call it specifically:

test_result = test('text',data,'handle','model.json',y_scatter='influence_score')
test_result[1]

TEST ARGUMENTS

The only difference between the two modes of test() is if a scatter plot is called:

def test(X,data,labels,saved_model,y_scatter=False)
ARGUMENT REQUIRED INPUT DEFAULT
X variable/s in dataframe NA
data pandas dataframe NA
labels variable/s in dataframe NA
saved_model filename 5
y_scatter variable in dataframe ‘mean’

DATA

Dataset consisting of 10 minute samples of 80 million tweets:

data('election_in_twitter')

4,000 ad funded websites with word vectors and 5 categories:

data('sites_category_and_vec')

Data from both buy and sell side and over 10 other sources:

data('programmatic_ad_fraud')

9 years of monthly poll and unemployment numbers:

data('parties_and_employment')

120,000 tweets with sentiment classification from NLTK:

data('tweet_sentiment')

20,000 random tweets:

data('random_tweets')

DATA ARGUMENTS

The data command is provided for both convinience, and to give the user access to unique deep learning datasets. In addition to allowing access to Autonomio datasets, the function also supports importing from csv, json, and excel. The data importing function is for most cases we face, but is not intended as a replacement to pandas read functions:

def data(name,mode='default')
ARGUMENT REQUIRED INPUT DEFAULT
name dataset or filename NA
mode string (‘file’) NA

VALIDATION

TROUBLESHOOTING

One of the most common errors you get working with Keras is related with your output layer:

ValueError: Error when checking model target: expected dense_22 to have shape (None, 2) but got array with shape (1000,

This means that your neuron_last does not match the number of categories in ‘y’. Usually you would only see this with in cases where you have an output other than 1 or 0, or when you do have that but for some reason changed neuron_last to something else than 1 from train().

You could have a very similar error message from Keras if your dims is not same as the number of features:

ValueError: Error when checking model input: expected dense_1_input to have shape (None, 300) but got array with shape (1000, 1)

NOTE: Your dims number must be exactly the same as the number of features in your mode (‘x’) except with series of text as an input where the default setting 300 is correct.

If your dims (input layer) is smaller than output layer (neuron_last):

ValueError: Input arrays should have the same number of samples as target arrays. Found 100 input samples and 1 target samples.