I am very new to Docker + Airflow. Below is what I am trying to accomplish.
I have 4 services, as shown in the compose file below. Three are related to Airflow and one is a test Ubuntu instance. The Airflow containers (airflow-database, airflow-webserver, airflow-scheduler) can communicate with each other, and I am able to run the example DAGs.
I then added a 4th service (ubuntu) to which I am trying to send a simple command "/bin/sleep 10" from a DAG using DockerOperator (the DAG file is below). But for some reason I get a Permission Denied error (the DAG error log is attached as well).
It works if I run Airflow on localhost instead of from inside a Docker container, so I am unable to figure out what I am missing. Here is what I have tried so far:
- In docker_url:
  - Replaced unix://var/run/docker.sock with tcp://172.20.0.1, thinking it would resolve via the Docker host IP; also used gateway.host.internal.
  - Even removed the docker_url option from the operator, but realized it then defaults to unix://var/run/docker.sock anyway.
  - Tried a bunch of combinations: tcp://172.20.0.1:2376, tcp://172.20.0.1:2375.
  - Mapped host ports to the Ubuntu container, like 8085:8085, etc.
- Thought maybe the airflow user of the Airflow webserver was being rejected by the Ubuntu container, so I created a group in the Ubuntu container and added the airflow user to it -- nope, that did not work either.
- api_version: the 'auto' option does not work either and keeps giving a "version not found" error, so I had to hard-code 1.41, which I found in the output of the docker version command. Not sure if that is what it is supposed to be.
Thanks in advance for any help on what else I can try to make this work :)
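For reference, here is a minimal stdlib check I can run as the airflow user inside the webserver container (e.g. via docker exec) to see whether that user can even read the mounted socket; the path is an assumption taken from docker_url:

```python
import os
import stat

def describe_socket(path="/var/run/docker.sock"):
    """Report ownership and permissions of the Docker socket path."""
    try:
        st = os.stat(path)
    except OSError as exc:
        return f"{path}: cannot stat ({exc})"
    return (f"{path}: uid={st.st_uid} gid={st.st_gid} "
            f"mode={stat.filemode(st.st_mode)}")

# If the socket's uid/gid do not match a group the airflow user belongs to,
# that would line up with the PermissionError(13) in the log.
print(describe_socket())
```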
docker-compose.yml
version: '3.2'
services:
  # Ubuntu Container
  ubuntu:
    image: ubuntu
    networks:
      - mynetwork

  # Airflow Database
  airflow-database:
    image: postgres:12
    env_file:
      - .env
    ports:
      - 5432:5432
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./airflow/database/data:/var/lib/postgresql/data/pgdata
      - ./airflow/database/logs:/var/lib/postgresql/data/log
    command: >
      postgres
      -c listen_addresses=*
      -c logging_collector=on
      -c log_destination=stderr
      -c max_connections=200
    networks:
      - mynetwork

  # Airflow DB Init
  initdb:
    image: apache/airflow:2.0.0-python3.8
    env_file:
      - .env
    depends_on:
      - airflow-database
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./airflow/metadata-airflow/dags:/opt/airflow/dags
      - ./airflow/logs:/opt/airflow/logs
    entrypoint: /bin/bash
    command: -c "airflow db init && airflow users create --firstname admin --lastname admin --email [email protected] --password admin --username admin --role Admin"
    networks:
      - mynetwork

  # Airflow Webserver
  airflow-webserver:
    image: apache/airflow:2.0.0-python3.8
    env_file:
      - .env
    depends_on:
      - airflow-database
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./airflow/metadata-airflow/dags:/opt/airflow/dags
      - ./airflow/logs:/opt/airflow/logs
    ports:
      - 8080:8080
    deploy:
      restart_policy:
        condition: on-failure
        delay: 8s
        max_attempts: 3
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /opt/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
    networks:
      - mynetwork

  # Airflow Scheduler
  airflow-scheduler:
    image: apache/airflow:2.0.0-python3.8
    env_file:
      - .env
    depends_on:
      - airflow-database
      - airflow-webserver
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./airflow/metadata-airflow/dags:/opt/airflow/dags
      - ./airflow/logs:/opt/airflow/logs
    deploy:
      restart_policy:
        condition: on-failure
        delay: 8s
        max_attempts: 3
    command: scheduler
    networks:
      - mynetwork

networks:
  mynetwork:
DAG File
from datetime import timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.providers.docker.operators.docker import DockerOperator
from airflow.utils.dates import days_ago

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email': ['[email protected]'],
    'email_on_failure': False,
    'email_on_retry': False,
}

dag = DAG(
    'docker_sample',
    default_args=default_args,
    schedule_interval=None,
    start_date=days_ago(2),
)

t1 = DockerOperator(
    task_id='docker_op_tester',
    api_version='auto',
    image='ubuntu',
    docker_url='unix://var/run/docker.sock',
    auto_remove=True,
    command=[
        "/bin/bash",
        "-c",
        "/bin/sleep 30; "],
    network_mode='bridge',
    dag=dag,
)

t1
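The failing call in the log below is ultimately an AF_UNIX connect on the socket the operator points at. It can be reproduced without Airflow using only the standard library (a sketch; the path is an assumption taken from docker_url above):

```python
import socket

def try_docker_socket(path="/var/run/docker.sock"):
    """Attempt the same AF_UNIX connect the docker SDK performs."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.connect(path)
        return "connected"
    except OSError as exc:
        # An EACCES here matches the PermissionError(13) in the Airflow log.
        return f"cannot connect: {exc}"
    finally:
        s.close()

print(try_docker_socket())
```

If this prints a permission error when run inside the webserver container, the problem is the socket's ownership/mode rather than any DockerOperator setting.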
DAG Error Log
*** Reading local file: /opt/airflow/logs/docker_sample/docker_op_tester/2021-01-09T05:16:17.174981+00:00/1.log
[2021-01-09 05:16:26,726] {taskinstance.py:826} INFO - Dependencies all met for <TaskInstance: docker_sample.docker_op_tester 2021-01-09T05:16:17.174981+00:00 [queued]>
[2021-01-09 05:16:26,774] {taskinstance.py:826} INFO - Dependencies all met for <TaskInstance: docker_sample.docker_op_tester 2021-01-09T05:16:17.174981+00:00 [queued]>
[2021-01-09 05:16:26,775] {taskinstance.py:1017} INFO -
--------------------------------------------------------------------------------
[2021-01-09 05:16:26,776] {taskinstance.py:1018} INFO - Starting attempt 1 of 1
[2021-01-09 05:16:26,776] {taskinstance.py:1019} INFO -
--------------------------------------------------------------------------------
[2021-01-09 05:16:26,790] {taskinstance.py:1038} INFO - Executing <Task(DockerOperator): docker_op_tester> on 2021-01-09T05:16:17.174981+00:00
[2021-01-09 05:16:26,794] {standard_task_runner.py:51} INFO - Started process 1057 to run task
[2021-01-09 05:16:26,817] {standard_task_runner.py:75} INFO - Running: ['airflow', 'tasks', 'run', 'docker_sample', 'docker_op_tester', '2021-01-09T05:16:17.174981+00:00', '--job-id', '360', '--pool', 'default_pool', '--raw', '--subdir', 'DAGS_FOLDER/example_docker.py', '--cfg-path', '/tmp/tmp4phq52dv']
[2021-01-09 05:16:26,821] {standard_task_runner.py:76} INFO - Job 360: Subtask docker_op_tester
[2021-01-09 05:16:26,932] {logging_mixin.py:103} INFO - Running <TaskInstance: docker_sample.docker_op_tester 2021-01-09T05:16:17.174981+00:00 [running]> on host 367f0fc7d092
[2021-01-09 05:16:27,036] {taskinstance.py:1230} INFO - Exporting the following env vars:
[email protected]
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=docker_sample
AIRFLOW_CTX_TASK_ID=docker_op_tester
AIRFLOW_CTX_EXECUTION_DATE=2021-01-09T05:16:17.174981+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2021-01-09T05:16:17.174981+00:00
[2021-01-09 05:16:27,054] {taskinstance.py:1396} ERROR - ('Connection aborted.', PermissionError(13, 'Permission denied'))
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 670, in urlopen
httplib_response = self._make_request(
File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 392, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/local/lib/python3.8/http/client.py", line 1255, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/local/lib/python3.8/http/client.py", line 1301, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.8/http/client.py", line 1250, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.8/http/client.py", line 1010, in _send_output
self.send(msg)
File "/usr/local/lib/python3.8/http/client.py", line 950, in send
self.connect()
File "/home/airflow/.local/lib/python3.8/site-packages/docker/transport/unixconn.py", line 43, in connect
sock.connect(self.unix_socket)
PermissionError: [Errno 13] Permission denied
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 726, in urlopen
retries = retries.increment(
File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/util/retry.py", line 410, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/packages/six.py", line 734, in reraise
raise value.with_traceback(tb)
File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 670, in urlopen
httplib_response = self._make_request(
File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 392, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/local/lib/python3.8/http/client.py", line 1255, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/local/lib/python3.8/http/client.py", line 1301, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.8/http/client.py", line 1250, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.8/http/client.py", line 1010, in _send_output
self.send(msg)
File "/usr/local/lib/python3.8/http/client.py", line 950, in send
self.connect()
File "/home/airflow/.local/lib/python3.8/site-packages/docker/transport/unixconn.py", line 43, in connect
sock.connect(self.unix_socket)
urllib3.exceptions.ProtocolError: ('Connection aborted.', PermissionError(13, 'Permission denied'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1086, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1260, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1300, in _execute_task
result = task_copy.execute(context=context)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/docker/operators/docker.py", line 286, in execute
if self.force_pull or not self.cli.images(name=self.image):
File "/home/airflow/.local/lib/python3.8/site-packages/docker/api/image.py", line 89, in images
res = self._result(self._get(self._url("/images/json"), params=params),
File "/home/airflow/.local/lib/python3.8/site-packages/docker/utils/decorators.py", line 46, in inner
return f(self, *args, **kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/docker/api/client.py", line 230, in _get
return self.get(url, **self._set_request_timeout(kwargs))
File "/home/airflow/.local/lib/python3.8/site-packages/requests/sessions.py", line 543, in get
return self.request('GET', url, **kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/requests/sessions.py", line 530, in request
resp = self.send(prep, **send_kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/requests/sessions.py", line 643, in send
r = adapter.send(request, **kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/requests/adapters.py", line 498, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', PermissionError(13, 'Permission denied'))
[2021-01-09 05:16:27,073] {taskinstance.py:1433} INFO - Marking task as FAILED. dag_id=docker_sample, task_id=docker_op_tester, execution_date=20210109T051617, start_date=20210109T051626, end_date=20210109T051627
[2021-01-09 05:16:27,136] {local_task_job.py:118} INFO - Task exited with return code 1
Specs:
- Docker: version 20.10.2, API version 1.41
- Airflow image: apache/airflow:2.0.0-python3.8
- Host system: macOS Big Sur