# Create a data frame analytics job

**PUT /_ml/data_frame/analytics/{id}**

This API creates a data frame analytics job that performs an analysis on the
source indices and stores the outcome in a destination index.
By default, the query used in the source configuration is `{"match_all": {}}`.

If the destination index does not exist, it is created automatically when you start the job.

If you supply only a subset of the regression or classification parameters, hyperparameter optimization determines a value for each parameter you leave undefined.

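This is a minimal Python sketch (standard library only) that builds a create request but does not send it. It assumes the `http://api.example.com` server listed under Servers and an API key sent as `Authorization: ApiKey <encoded-api-key>`. The job ID, index names, and `price` field are hypothetical. Only `dependent_variable` is set for the regression, so hyperparameter optimization picks the remaining regression parameters. `source` omits `query`, so the default `{"match_all": {}}` applies.

```python
import json
import urllib.request

# Assumption: the server URL listed under "Servers"; replace with your cluster's URL.
BASE_URL = "http://api.example.com"


def build_create_job_request(job_id: str, body: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a PUT /_ml/data_frame/analytics/{id} request."""
    url = f"{BASE_URL}/_ml/data_frame/analytics/{job_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            # Assumption: API key auth using the "ApiKey" scheme.
            "Authorization": f"ApiKey {api_key}",
        },
    )


# Hypothetical job configuration, for illustration only.
body = {
    # No "query" given, so the default {"match_all": {}} is used.
    "source": {"index": "house-prices"},
    # Created automatically when the job starts, if it does not exist.
    "dest": {"index": "house-prices-predictions"},
    "analysis": {
        # Only dependent_variable is set; hyperparameter optimization
        # determines values for the undefined regression parameters.
        "regression": {"dependent_variable": "price"}
    },
}

req = build_create_job_request("house-price-regression", body, api_key="<encoded-api-key>")
print(req.get_method(), req.full_url)
```

To submit it, you could pass `req` to `urllib.request.urlopen`. A 200 response carries the fields listed under Responses.
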
## Required authorization

* Index privileges: `create_index`, `index`, `manage`, `read`, `view_index_metadata`
* Cluster privileges: `manage_ml`


## Servers
- http://api.example.com


## Authentication methods
- API key auth


## Parameters

### Path parameters
- **id** (string)
  Identifier for the data frame analytics job. This identifier can contain
  lowercase alphanumeric characters (a-z and 0-9), hyphens, and
  underscores. It must start and end with alphanumeric characters.

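You can check the identifier rules above on the client side before calling the API. `is_valid_job_id` below is a hypothetical helper and is not part of any Elasticsearch client. It encodes only the rules stated here, and the server remains the final authority and may apply checks this sketch omits. `fullmatch` is used instead of a `$` anchor so that a trailing newline is rejected.

```python
import re

# Lowercase alphanumerics, hyphens, and underscores; must start and end with
# a lowercase alphanumeric character. A single character such as "a" is allowed.
JOB_ID_PATTERN = re.compile(r"[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?")


def is_valid_job_id(job_id: str) -> bool:
    """Return True if job_id satisfies the documented identifier rules."""
    return JOB_ID_PATTERN.fullmatch(job_id) is not None


for candidate in ["house-price-regression", "job_01", "-leading", "trailing_", "Upper"]:
    print(candidate, is_valid_job_id(candidate))
```
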

### Body: application/json (object)

- **allow_lazy_start** (boolean)
  Specifies whether this job can start when there is insufficient machine
  learning node capacity for it to be immediately assigned to a node. If
  set to `false` and a machine learning node with capacity to run the job
  cannot be immediately found, the API returns an error. If set to `true`,
  the API does not return an error; the job waits in the `starting` state
  until sufficient machine learning node capacity is available. This
  behavior is also affected by the cluster-wide
  `xpack.ml.max_lazy_ml_nodes` setting.
- **analysis** (object)
  The analysis configuration, which contains the information necessary to
  perform one of the following types of analysis: classification, outlier
  detection, or regression.
- **analyzed_fields** (object)
  Specifies `includes` and/or `excludes` patterns to select which fields
  are included in the analysis. The patterns in `excludes` are applied
  last, so `excludes` takes precedence: if the same field is specified in
  both `includes` and `excludes`, it is not included in the analysis. If
  `analyzed_fields` is not set, only the relevant fields are included (for
  example, all the numeric fields for outlier detection).
  The supported fields vary for each type of analysis:
  - Outlier detection requires numeric or `boolean` data to analyze. The
    algorithms don’t support missing values, so fields that have data types
    other than numeric or boolean are ignored. Documents where included
    fields contain missing values, null values, or an array are also
    ignored. Therefore, the `dest` index may contain documents that don’t
    have an outlier score.
  - Regression supports fields of numeric, `boolean`, `text`, `keyword`,
    and `ip` data types, and it tolerates missing values. Supported fields
    are included in the analysis; other fields are ignored. Documents where
    included fields contain an array with two or more values are also
    ignored. Documents in the `dest` index that don’t contain a results
    field are not included in the regression analysis.
  - Classification supports fields of numeric, `boolean`, `text`,
    `keyword`, and `ip` data types, and it tolerates missing values.
    Supported fields are included in the analysis; other fields are
    ignored. Documents where included fields contain an array with two or
    more values are also ignored. Documents in the `dest` index that don’t
    contain a results field are not included in the classification
    analysis. Classification analysis can be improved by mapping ordinal
    variable values to a single number. For example, for age ranges, you
    can model the values as `0-14 = 0`, `15-24 = 1`, `25-34 = 2`, and so
    on.
- **description** (string)
  A description of the job.
- **dest** (object)
  The destination configuration.
- **max_num_threads** (number)
  The maximum number of threads to be used by the analysis. Using more
  threads may decrease the time necessary to complete the analysis at the
  cost of using more CPU. Note that the process may use additional threads
  for operational functionality other than the analysis itself.
- **_meta** (object)

- **model_memory_limit** (string)
  The approximate maximum amount of memory resources that are permitted for
  analytical processing. If your `elasticsearch.yml` file contains an
  `xpack.ml.max_model_memory_limit` setting, an error occurs when you try
  to create data frame analytics jobs that have `model_memory_limit` values
  greater than that setting.
- **source** (object)
  The configuration of how to source the analysis data.
- **headers** (object)

- **version** (string)

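Two hypothetical helpers illustrate the `analyzed_fields` precedence rule and the ordinal-encoding tip. Neither is part of any Elasticsearch client.

- `select_fields` applies `includes` first and `excludes` last, so a field matched by both is dropped. It approximates wildcard matching with Python's `fnmatch.fnmatchcase`, which is an assumption and not necessarily the server's exact pattern semantics. It also does not model the type-based filtering that each analysis applies.
- `age_to_ordinal` maps integer ages onto the `0-14 = 0`, `15-24 = 1`, `25-34 = 2` buckets from the example. Ages 35 and above fall into bucket 3, an assumed continuation of "and so on".

The field names are made up.

```python
from bisect import bisect_right
from fnmatch import fnmatchcase


def select_fields(fields, includes=None, excludes=None):
    """Apply includes first, then excludes (excludes take precedence).

    Assumption: wildcard patterns behave like fnmatchcase ("*" matches any
    run of characters). Type-based filtering per analysis is not modeled.
    """
    if includes:
        kept = [f for f in fields if any(fnmatchcase(f, p) for p in includes)]
    else:
        kept = list(fields)
    # Excludes are applied last, so they win over includes.
    if excludes:
        kept = [f for f in kept if not any(fnmatchcase(f, p) for p in excludes)]
    return kept


# Lower bounds of buckets 1, 2, 3: 0-14 -> 0, 15-24 -> 1, 25-34 -> 2, 35+ -> 3.
AGE_BUCKET_STARTS = [15, 25, 35]


def age_to_ordinal(age: int) -> int:
    """Map an integer age to an ordinal bucket number (hypothetical encoding)."""
    return bisect_right(AGE_BUCKET_STARTS, age)


fields = ["price", "sqft", "bedrooms", "listing_id", "agent_name"]
print(select_fields(fields, includes=["*"], excludes=["listing_id", "agent_*"]))
# ['price', 'sqft', 'bedrooms']
print(select_fields(fields, includes=["price", "sqft"], excludes=["sqft"]))
# ['price']
print([age_to_ordinal(a) for a in (0, 14, 15, 24, 25, 34, 40)])
# [0, 0, 1, 1, 2, 2, 3]
```
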

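An `xpack.ml.max_model_memory_limit` setting makes creation fail when `model_memory_limit` is greater than it, so a client might compare the two values before sending the request. This sketch is hypothetical and assumes Elasticsearch-style byte-size strings with binary units (`1kb` = 1024 bytes). It accepts only a whole number followed by `b`, `kb`, `mb`, `gb`, `tb`, or `pb`, which may be a subset of what the server accepts.

```python
import re

# Binary multiples: 1kb = 1024 bytes, 1mb = 1024kb, and so on.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4, "pb": 1024**5}
# Whole number, optional whitespace, unit suffix (case-insensitive).
_SIZE = re.compile(r"\s*(\d+)\s*(b|kb|mb|gb|tb|pb)\s*", re.IGNORECASE)


def parse_byte_size(value: str) -> int:
    """Convert a string such as "512mb" to a number of bytes."""
    match = _SIZE.fullmatch(value)
    if match is None:
        raise ValueError(f"unsupported byte size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


def exceeds_max_limit(model_memory_limit: str, max_model_memory_limit: str) -> bool:
    """True if the job's limit is strictly greater than the cluster maximum."""
    return parse_byte_size(model_memory_limit) > parse_byte_size(max_model_memory_limit)


print(exceeds_max_limit("2gb", "1gb"))    # True: creation would be rejected
print(exceeds_max_limit("512mb", "1gb"))  # False
```
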
## Responses
### 200


#### Body: application/json (object)
- **authorization** (object)

- **allow_lazy_start** (boolean)

- **analysis** (object)

- **analyzed_fields** (object)

- **create_time** (number)

- **description** (string)

- **dest** (object)

- **id** (string)

- **max_num_threads** (number)

- **_meta** (object)

- **model_memory_limit** (string)

- **source** (object)

- **version** (string)

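A 200 response echoes the job configuration. As a hypothetical example, the helper below extracts a few fields from the response body. It assumes that `create_time` is milliseconds since the Unix epoch and that `dest` carries an `index` field as in the request. The sample response is made up for illustration and is not real server output.

```python
import json
from datetime import datetime, timezone


def summarize_created_job(response_text: str) -> dict:
    """Pick a few fields out of a 200 response body."""
    job = json.loads(response_text)
    summary = {
        "id": job["id"],
        "dest_index": job.get("dest", {}).get("index"),
        "model_memory_limit": job.get("model_memory_limit"),
        "version": job.get("version"),
    }
    # Assumption: create_time is milliseconds since the Unix epoch.
    if "create_time" in job:
        summary["created"] = datetime.fromtimestamp(job["create_time"] / 1000, tz=timezone.utc)
    return summary


# Hypothetical response body, for illustration only.
sample = json.dumps({
    "id": "house-price-regression",
    "dest": {"index": "house-prices-predictions"},
    "create_time": 1700000000000,
    "model_memory_limit": "1gb",
})
print(summarize_created_job(sample))
```
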
