Apache Airflow and PostgreSQL with Docker and Docker Compose

Hello, in this post I will show you how to set up the official Apache Airflow image with PostgreSQL and the LocalExecutor using docker and docker-compose. I won’t be going through Airflow itself, what it is and how it is used; please check the official documentation for more information about that.

Before setting up and running Apache Airflow, please install Docker and Docker Compose.

For those in a hurry..

In this chapter I will show you the files and directories needed to run Airflow, and in the next chapter I will go file by file, line by line, explaining what is going on.

First, in the root directory create three more directories: dags, logs and scripts. Then create the following files: .env, docker-compose.yml, entrypoint.sh and dummy_dag.py. Please make sure those files and directories follow the structure below.

#project structure
root/
├── dags/
│   └── dummy_dag.py
├── scripts/
│   └── entrypoint.sh
├── logs/
├── .env
└── docker-compose.yml
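
If you prefer to create this skeleton from the terminal, here is a minimal sketch (run from the project root; marking entrypoint.sh executable matters because the webserver container will run it directly):

mkdir -p dags logs scripts
touch .env docker-compose.yml scripts/entrypoint.sh dags/dummy_dag.py
chmod +x scripts/entrypoint.sh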

Created files should contain the following:

#docker-compose.yml
version: '3.8'
services:
    postgres:
        image: postgres
        environment:
            - POSTGRES_USER=airflow
            - POSTGRES_PASSWORD=airflow
            - POSTGRES_DB=airflow
    scheduler:
        image: apache/airflow
        command: scheduler
        restart: on-failure   # service-level restart policy (restart_policy only exists under deploy, for swarm)
        depends_on:
            - postgres
        env_file:
            - .env
        volumes:
            - ./dags:/opt/airflow/dags
            - ./logs:/opt/airflow/logs
    webserver:
        image: apache/airflow
        entrypoint: ./scripts/entrypoint.sh
        restart: on-failure
        depends_on:
            - postgres
            - scheduler
        env_file:
            - .env
        volumes:
            - ./dags:/opt/airflow/dags
            - ./logs:/opt/airflow/logs
            - ./scripts:/opt/airflow/scripts
        ports:
            - "8080:8080"

#entrypoint.sh
#!/usr/bin/env bash
airflow initdb
airflow webserver

#.env
AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres/airflow
AIRFLOW__CORE__EXECUTOR=LocalExecutor

#dummy_dag.py
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from datetime import datetime

with DAG('example_dag', start_date=datetime(2016, 1, 1)) as dag:
    op = DummyOperator(task_id='op')

From the root directory, executing “docker-compose up” in a terminal should make Airflow accessible on localhost:8080. The image below shows the final result.
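
A few docker-compose commands that are handy while experimenting (a sketch; plain docker-compose up as above keeps everything in the foreground instead):

docker-compose up -d               # start all services in the background
docker-compose ps                  # check that postgres, scheduler and webserver are running
docker-compose logs -f webserver   # follow the webserver logs
docker-compose down                # stop and remove the containers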

If you encounter permission errors, please run “chmod -R 777” on all subdirectories, e.g. “chmod -R 777 logs/”

[Image: the Airflow web UI served at localhost:8080]

For the curious ones..

In layman’s terms, docker is used to manage individual containers, while docker-compose is used to manage multi-container applications. It also moves many of the options you would pass to docker run into the docker-compose.yml file for easier reuse. It works as a front-end “script” on top of the same Docker API used by docker; you can do everything docker-compose does with plain docker commands and a lot of shell scripting.
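
To make that concrete, here is a rough sketch of what starting just the postgres service by hand with plain docker could look like (airflow-net is an arbitrary network name used only for illustration; compose creates and names a network for you automatically):

# a user-defined network lets containers reach each other by service name
docker network create airflow-net
# roughly the postgres service from docker-compose.yml as a single docker run
docker run -d --name postgres --network airflow-net \
    -e POSTGRES_USER=airflow \
    -e POSTGRES_PASSWORD=airflow \
    -e POSTGRES_DB=airflow \
    postgres
# ...plus similar, longer docker run commands for scheduler and webserver;
# docker-compose.yml captures all of these options in one reusable file.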

Before running our multi-container Docker application, docker-compose.yml must be configured. In that file we define the services that will be run on docker-compose up.

The first attribute of docker-compose.yml is version, which is the Compose file format version. See the official Compose file reference for the most recent version of the file format and all configuration options.

The second attribute is services, and every attribute one level below services denotes a container used in our multi-container application. These are: postgres, scheduler and webserver. Each service has an image attribute which points to the base image used for that service.
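
A couple of commands that are useful for sanity-checking this part of the file (a sketch, run from the project root):

docker-compose config             # validate docker-compose.yml and print the resolved configuration
docker-compose config --services  # should list exactly: postgres, scheduler, webserver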

For each service we define the environment variables used inside its container. For postgres they are defined with the environment attribute, while for scheduler and webserver they are defined in the .env file. Because .env is an external file, we must point to it with the env_file attribute.
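
To confirm that the variables from .env really end up inside the containers, a quick check once the stack is running (the service name is the one defined in docker-compose.yml):

docker-compose exec webserver env | grep '^AIRFLOW__'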

Opening the .env file, we can see two variables defined. One defines the executor that will be used, and the other the connection string. Each connection string must be defined in the following manner:

dialect+driver://username:password@host:port/database

Dialect names include the identifying name of the SQLAlchemy dialect, such as sqlite, mysql, postgresql, oracle, or mssql. The driver name is the name of the DBAPI used to connect to the database, written in all lowercase letters. In our case, the connection string is defined as:

postgresql+psycopg2://airflow:airflow@postgres/airflow

Omitting the port after the host part means we will be using the default postgres port defined in its own Dockerfile.
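
For illustration only, the same .env entry with the default port written out explicitly (5432 is the port the official postgres image exposes):

AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres:5432/airflow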

Every service can define a command which will be run inside its Docker container. If a service needs to execute multiple commands, this can be done by defining an optional .sh file and pointing to it with the entrypoint attribute. In our case we have entrypoint.sh inside the scripts folder which, once executed, runs airflow initdb and airflow webserver. Both are mandatory for Airflow to run properly.
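
Once the stack is up, a quick way to check that airflow initdb actually created the metadata tables in postgres is to use the psql client that ships with the postgres image (a sketch):

docker-compose exec postgres psql -U airflow -d airflow -c '\dt'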

By defining the depends_on attribute, we can express dependencies between services. In our example, webserver starts only after both scheduler and postgres have started, and scheduler starts after postgres has started.
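
Note that depends_on only guarantees start order, not that postgres is actually ready to accept connections. If airflow initdb races the database on the very first start, one option is to extend entrypoint.sh with a small wait loop. A sketch in pure bash (the host name postgres and port 5432 follow from our compose setup):

#!/usr/bin/env bash
# wait until postgres accepts TCP connections, then bring up airflow
until (exec 3<>/dev/tcp/postgres/5432) 2>/dev/null; do
    echo "Waiting for postgres to accept connections..."
    sleep 2
done
airflow initdb
exec airflow webserver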

In case a container crashes, we can have Docker restart it automatically with the restart attribute (restart: on-failure in our file). A restart policy configures if and how containers are restarted when they exit; valid values are no, always, on-failure and unless-stopped. The swarm-mode equivalent, deploy.restart_policy, additionally accepts condition, delay, max_attempts and window.
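
To verify the policy that actually ended up on a running container, docker inspect can be used (the container name depends on your project directory and compose version, so look it up first):

docker-compose ps        # find the webserver container's name
docker inspect --format '{{.HostConfig.RestartPolicy.Name}}' <webserver-container-name>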

Once a service is running, it is served on the container's defined port. To access that service we need to expose the container's port on a host port, which is done with the ports attribute. In our case we are exposing port 8080 of the container as TCP port 8080 on the host machine. The left side of the colon defines the host machine's port and the right-hand side defines the container's port.
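
A quick check from the host once the port is published (recent Airflow versions expose a /health endpoint; otherwise just open localhost:8080 in a browser):

curl http://localhost:8080/health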

Lastly, the volumes attribute defines shared volumes (directories) between the host file system and the Docker container. Because Airflow's default working directory is /opt/airflow/, we need to map our designated directories from the root folder into the Airflow container's working directory. That is done as follows:

#general case for airflow
- ./<our-root-subdir>:/opt/airflow/<our-root-subdir>

#our case
- ./dags:/opt/airflow/dags
- ./logs:/opt/airflow/logs
- ./scripts:/opt/airflow/scripts
...

This way, when the scheduler or webserver writes logs to its logs directory, we can access them from our file system within the logs directory. When we add a new DAG to the dags folder, it will automatically be added to the container's DAG bag, and so on.
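
A small sketch to see the bind mounts in action (my_new_dag.py is a hypothetical file used only for illustration):

cp my_new_dag.py dags/                                # add a DAG on the host...
docker-compose exec webserver ls /opt/airflow/dags    # ...it is already visible inside the container
ls logs/                                              # logs written inside the container appear on the host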

Last words

This is my first post on Medium and I will be posting more soon. If you notice any mistakes, please let me know. Thank you!

Source: https://medium.com/@ivan.rezic1/apache-airflow-and-postgresql-with-docker-and-docker-compose-5651766dfa96
