
Tensorflow - Data API

Sanjeev Sarda
High Performance Developer

Some notes on the high level data API in Tensorflow.


Process

  1. Collect, clean, and process your data
  2. Prototype and develop your model architecture
  3. Train and evaluate results
  4. Prepare the model for production serving


Tensorflow provides functionality at each stage.

Dev Process with Eager Execution

Develop models in eager mode whilst iterating:

import tensorflow as tf

tf.enable_eager_execution()  # TF 1.x opt-in; eager execution is the default in TF 2.x

Operations execute immediately instead of being deferred to a graph that runs later in a session.
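
For example, an op now returns its value as soon as it is called, with no session or graph build step:

print(tf.add(2, 3))  # tf.Tensor(5, shape=(), dtype=int32)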

Loading Data

A TF Dataset is similar to a dataframe or a NumPy array, but it is optimised for consuming data during training.

Rows are tuples of different tensor types.
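
A minimal sketch of building such a dataset from a CSV file, where each element is a tuple of per-column scalar tensors (the filename and column count are illustrative):

# Hypothetical: a CSV with 55 integer columns (54 features + a label)
defaults = [tf.int32] * 55
dataset = tf.data.experimental.CsvDataset('covertype.csv', record_defaults=defaults)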

You can apply a function to the data row by row to transform each row into a set of features and a class label:

def _parse_data_function(*vals):
    # vals is one row of the dataset: a tuple of scalar column tensors.
    # Separate and group the columns into features and a class label here.
    ...

Separate and group columns of features at this stage.

e.g. say we have an enum that's been one-hot encoded in our dataset across columns 14 to 53 (the slice 14:54). We can group those 40 columns into a single tensor:

soil_type = tf.convert_to_tensor(vals[14:54])

This means soil_type can be learnt as a single feature, rather than as 40 separate features spread across those columns.


Apply Function to Dataset

dataset = dataset.map(_parse_data_function).batch(BATCH_SIZE)

map applies the parse function to each row; batch then groups the transformed rows into batches of BATCH_SIZE.

Feature Columns

A feature column is a config class that defines a mapping function for turning categorical data into continuous numeric data.

Convert categories into embedding columns:

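A minimal sketch with the tf.feature_column API, assuming soil_type has been converted to an integer id in the range 0-39 (the names and the embedding dimension are illustrative):

soil_type_col = tf.feature_column.categorical_column_with_identity('soil_type', num_buckets=40)
soil_type_embedding = tf.feature_column.embedding_column(soil_type_col, dimension=8)
elevation = tf.feature_column.numeric_column('elevation')

feature_columns = [soil_type_embedding, elevation]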

The transformations are part of the model graph, so they can be exported as part of the saved model. This is an advantage over preprocessing the data yourself.

Feature Layer

The feature columns make up an initial feature layer - this layer's responsibility is to transform the data into the representations the neural net is expecting.

feature_layer = tf.keras.layers.DenseFeatures(feature_columns)  # TF 2.x; applies the feature columns

model = tf.keras.Sequential([
    feature_layer,
    tf.keras.layers.Dense(128, activation='relu'),  # hidden layers are illustrative
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(NUM_CLASSES, activation='softmax'),  # NUM_CLASSES is a hypothetical constant
])

The feature layer is also used to train any embedding types.

Compiling the Model

Different optimizers and loss functions are available through Tensorflow.
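
A minimal sketch, assuming integer class labels (the optimizer, loss, and metric choices are illustrative):

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(dataset, epochs=5)  # epoch count is illustrative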

Validate Model

Load the validation dataset using the same process as the train dataset.

Call evaluate on the model you created.
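
A minimal sketch, mirroring the training pipeline (the filename is illustrative):

val_dataset = tf.data.experimental.CsvDataset('covertype_val.csv', record_defaults=defaults)
val_dataset = val_dataset.map(_parse_data_function).batch(BATCH_SIZE)

loss, accuracy = model.evaluate(val_dataset)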

Saving for Deployment

Tensorflow's SavedModel format is designed for serving - you can export in this format and reuse the model for serving and with the full suite of Tensorflow tools.
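
In TF 2.x a Keras model can be written straight to this format (the path is illustrative):

model.save('exported_model')  # with no file extension, TF 2.x writes a SavedModel directory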

Swapping the Model using Canned Estimators

Tensorflow comes with a bunch of canned estimators that don't necessarily fit nicely into the layered, sequential style expressed in Keras. They let you exploit existing research on model structures.

Wide and Deep Model

tf.estimator.DNNLinearCombinedClassifier(....)


Combines linear learning with deep learning: categorical features go to the linear half, numerical features to the DNN half.
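
A minimal sketch of that split, reusing the feature columns defined earlier (the hidden unit sizes are illustrative):

estimator = tf.estimator.DNNLinearCombinedClassifier(
    linear_feature_columns=[soil_type_col],                # categorical -> linear half
    dnn_feature_columns=[soil_type_embedding, elevation],  # embeddings and numerics -> DNN half
    dnn_hidden_units=[128, 64],
    n_classes=NUM_CLASSES)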

Input Receiver Function

Used to build the tensor shapes we expect at serving time - the inputs we want to run inference on.

This means you can have the inference data in a different format to training.

You can pass this input serving function to the model export so that it's included for use in the entire Tensorflow suite.
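
A minimal sketch using the built-in parsing receiver, which expects serialized tf.Example protos at serving time (the export path is illustrative):

feature_spec = tf.feature_column.make_parse_example_spec(feature_columns)
serving_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)

estimator.export_saved_model('export_dir', serving_input_fn)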


TensorFlow High Level APIs, YouTube