ML Agents in Unity

Training a reinforcement learning agent to jump over objects.

Moin · Mar 27, 2021

Made by: Moin Sul and Sahil Rana

In this article, we will look at a sample project where we simulate an environment in Unity and make a car jump over incoming traffic. The simulation uses the machine learning concepts described below.

How do ML-Agents work?

Unity ML-Agents is a new plugin for the game engine Unity that allows us to create or use pre-made environments to train our agents.

It’s developed by Unity Technologies, the makers of Unity, one of the most widely used game engines.

The three Components

With Unity ML-Agents, you have three important components.

Source: Unity ML-Agents Documentation

The first is the Learning Environment (in Unity), which contains the Unity scene and the environment elements.

The second is the Python API, which contains the RL (reinforcement learning) algorithms. We use this API to launch training, to test, etc. The third is the External Communicator, which connects the Learning Environment with the Python API.

Inside the Learning Environment

Inside the Learning Environment, we have different elements:

Source: Unity ML-Agents Documentation

The first is the Agent, the actor in the scene. We train it by optimizing its policy (the function that tells it what action to take in each state), called the Brain.

Finally, there is the Academy. This element orchestrates agents and their decision-making process. Think of the Academy as a maestro that handles the requests from the Python API.

Now, let’s imagine an agent learning to play a platform game. The RL process looks like this:

  • Our agent receives state S0 from the environment (the first frame of our game).
  • Based on state S0, the agent takes action A0 (our agent jumps).
  • The environment transitions to a new state S1.
  • The environment gives reward R1 to the agent (we’re not dead: positive reward +1).

This RL loop outputs a sequence of state, action, and reward. The goal of the agent is to maximize the expected cumulative reward.
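
The “expected cumulative reward” can be written down precisely. With a discount factor \gamma between 0 and 1 that weighs immediate rewards more heavily than distant ones, the agent maximizes the expected discounted return (a standard RL definition, not something specific to this project):

    G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}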

In fact, the Academy is the one that sends orders to our Agents and ensures that they stay in sync:

  • Collect Observations
  • Select your action using your policy
  • Take the Action
  • Reset if you reached the max step or if you’re done.

Script for Agent

Vector Actions

Vector actions come in two types: continuous and discrete. Here we are only dealing with jumping, so we only need one discrete action.

Behavior Types

Heuristic — the classical way AI in games works. Programmers think of ways the AI should behave and hard-code them. This can work very well, but it has trouble adapting to changing conditions and complex environments.

Learning — this is what we are after: the AI is trained using machine learning. During training, a neural network model is generated, which can then be used once training is finished.

Inference — the last behavior type, where the learned model is supplied but no longer changed, meaning the AI won’t learn any further. The learning behavior basically falls back to inference if no external Python training process is attached.

The team ID is only relevant if you want to use the same behavior on multiple agents playing against each other.
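
For illustration, here is a minimal sketch of how these modes can be switched from code via the BehaviorParameters component (the trainedModel field is a hypothetical model asset, not something from the original project):

    using UnityEngine;
    using Unity.Barracuda;
    using Unity.MLAgents.Policies;

    public class BehaviorSwitcher : MonoBehaviour
    {
        // Hypothetical: a .nn model asset produced by a finished training run.
        public NNModel trainedModel;

        void Start()
        {
            var bp = GetComponent<BehaviorParameters>();
            // Default: trains if a Python trainer is attached, otherwise runs inference.
            // HeuristicOnly: always uses the hand-coded Heuristic() method.
            // InferenceOnly: always uses the supplied model and never learns further.
            bp.Model = trainedModel;
            bp.BehaviorType = BehaviorType.InferenceOnly;
        }
    }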

Agent Methods

Transforming our player into an Agent

A spawner object is located in the environment. It controls which traffic cars are spawned and at what interval. The player is also located in the environment, and its script contains the actions and methods responsible for jumping.

In the script, the player class should inherit from Agent instead of plain MonoBehaviour.

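The script itself appears as a screenshot in the original post; as a rough sketch, the setup could look like the following (the class name PlayerAgent and its fields are illustrative assumptions, not the authors’ exact code):

    using UnityEngine;
    using Unity.MLAgents;
    using Unity.MLAgents.Actuators;

    // The player inherits from Agent (ML-Agents) rather than plain MonoBehaviour.
    public class PlayerAgent : Agent
    {
        public float jumpForce = 10f;   // hypothetical tuning value
        Rigidbody rb;                   // used to apply the jump impulse
        Vector3 startPosition;          // remembered for episode resets
        bool grounded = true;           // set back to true when the player lands

        // Initialize(), OnActionReceived(), Heuristic() and the reward logic
        // are filled in below as the article walks through them.
    }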

Now the Behavior Parameters component is set up, which acts as the agent’s brain.

Next, the ray cast sensor is set up; it acts like a beam shot out from the player that looks for incoming traffic cars.

The discrete action parameter is set to a size of 2, since the player has one action (jump): action 0 does nothing and action 1 jumps.

Behavior Parameters (Inspector)
Ray Perception Sensor parameters (Inspector)

Adding Logic to our Agent

The important methods in the script:

The Initialize() method places our player at its given starting position.
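
As a sketch, continuing the PlayerAgent class above:

    public override void Initialize()
    {
        rb = GetComponent<Rigidbody>();
        startPosition = transform.position; // remember the spawn point for resets
    }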

OnActionReceived() has an array parameter, where action 0 means do nothing and action 1 means jump.

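In recent ML-Agents releases this array arrives wrapped in an ActionBuffers struct; a sketch matching the description above, again continuing PlayerAgent:

    public override void OnActionReceived(ActionBuffers actions)
    {
        // Discrete branch 0 has size 2: value 0 = do nothing, value 1 = jump.
        if (actions.DiscreteActions[0] == 1 && grounded)
        {
            rb.AddForce(Vector3.up * jumpForce, ForceMode.Impulse);
            grounded = false; // reset when the player lands (collision code omitted)
        }
    }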

The input logic from Update() is moved into the Heuristic() function.

The actionsOut[] array carries the values 0 and 1, where 1 triggers a jump.

The Heuristic() function provides the basic manual behavior, used when no trained model or external trainer is attached.
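
A sketch of such a Heuristic(), mapping the space bar to the jump action (the key binding is an assumption):

    public override void Heuristic(in ActionBuffers actionsOut)
    {
        // The input polling that used to live in Update() now writes actions.
        var discrete = actionsOut.DiscreteActions;
        discrete[0] = Input.GetKey(KeyCode.Space) ? 1 : 0; // 1 = jump, 0 = idle
    }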

The OnTriggerEnter() method adds a reward if the car successfully jumps over the traffic, and the score is increased.


If the car fails to jump over the traffic and collides, a negative reward is given. The training episode ends here and the next episode begins.
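
A sketch of this reward logic, assuming the traffic cars are tagged “Car” and a trigger volume behind each car, here called “ScoreZone”, marks a successful jump (both tag names are illustrative):

    void OnTriggerEnter(Collider other)
    {
        if (other.CompareTag("ScoreZone"))
        {
            AddReward(1f);    // cleared the car: positive reward, score increases
        }
        else if (other.CompareTag("Car"))
        {
            AddReward(-1f);   // collided with traffic: negative reward
            EndEpisode();     // ends this episode; OnEpisodeBegin() starts the next
        }
    }

    public override void OnEpisodeBegin()
    {
        transform.position = startPosition; // reset the player for the new episode
        rb.velocity = Vector3.zero;
        grounded = true;
    }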

Training the Agent

Install Python and the ml-agents package. Open a terminal, cd into the repository and then into the Trainer_config folder, and start the training by putting in:

mlagents-learn trainer_config.yaml --run-id="JumperAI_1"

Change the run-id for each new training run. The first argument refers to the trainer_config.yaml file located inside the repository; this exact command only works if you are inside the Trainer_config folder.
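
The exact hyperparameters used for JumperAI aren’t shown in the post, but a trainer_config.yaml from this generation of ML-Agents typically looks roughly like the following (all values are illustrative defaults, and “Jumper” stands in for whatever Behavior Name the agent uses):

    default:
      trainer: ppo
      batch_size: 1024
      buffer_size: 10240
      learning_rate: 3.0e-4
      hidden_units: 128
      num_layers: 2
      max_steps: 5.0e5
      summary_freq: 10000

    Jumper:            # overrides keyed by the agent's Behavior Name
      max_steps: 1.0e6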

Press play, and let the A.I. train.

Training of the model

tensorboard --logdir=summaries

Run this to see the cumulative reward and episode length graphs in TensorBoard.

Summary of the episode length and cumulative reward graphs

The generated neural network file has the following specification:

NN File

Applications

Many studies and research papers have been published on this type of simulation, such as AI learning to walk, AI learning to park a car, and AI learning to jump.


These are a few of the applications, and the same techniques can be used to solve real-world problems.

Thank You

Open for Review
