At least 4 GB of memory should be allocated to DataHub.
For Docker installation, see my blog post on installing and uninstalling Docker on CentOS 7 via a yum repository.
The versions installed here are the latest at the time of writing: docker-ce-20.10.17, docker-ce-cli-20.10.17, and containerd.io-1.6.6.
About jq: jq is a command-line JSON processor used to slice, filter, map, and transform structured data, similar to sed on Linux.
The version installed here is the latest, 1.6.
[root@datahub ~]# wget https://github.com/stedolan/jq/releases/download/jq-1.6/jq-linux64
[root@datahub ~]#
[root@datahub ~]# mv jq-linux64 jq
[root@datahub ~]#
[root@datahub ~]# chmod +x jq
[root@datahub ~]#
[root@datahub ~]# cp jq /usr/bin
[root@datahub ~]#
[root@datahub ~]# jq
jq - commandline JSON processor [version 1.6]
Usage:  jq [options] <jq filter> [file...]
        jq [options] --args <jq filter> [strings...]
        jq [options] --jsonargs <jq filter> [JSON_TEXTS...]
jq is a tool for processing JSON inputs, applying the given filter to
its JSON text inputs and producing the filter's results as JSON on
standard output.
The simplest filter is ., which copies jq's input to its output
unmodified (except for formatting, but note that IEEE754 is used
for number representation internally, with all that that implies).
For more advanced filters see the jq(1) manpage ("man jq")
and/or https://stedolan.github.io/jq
Example:
        $ echo '{"foo": 0}' | jq .
        {
          "foo": 0
        }
For a listing of options, use jq --help.
[root@datahub ~]#
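As a quick illustration of the slicing and filtering mentioned above (the JSON snippets below are made up for demonstration):

```shell
# Slice: pick the second element of an array (indexes start at 0)
echo '{"nums": [10, 20, 30]}' | jq '.nums[1]'
# prints 20

# Filter and map: keep entries whose value exceeds 15, then collect the values
# (-c prints compact single-line output)
echo '[{"v": 10}, {"v": 20}, {"v": 30}]' | jq -c '[.[] | select(.v > 15) | .v]'
# prints [20,30]
```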
Python 3.6+ is required. For Python installation, see my post on installing Python 2 and Python 3 side by side on CentOS 7.
The version installed here is the latest, Python 3.10.5.
Python's pip tool is also required. The docker-compose installed below is the latest v1 release, v1.29.2. To avoid conflicts with the system Python's packages, it is installed inside a virtualenv.
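The isolation that virtualenv provides can also be sketched with Python's standard-library venv module (the path below is illustrative; the virtualenv commands that follow work the same way):

```shell
# Create an isolated environment under /tmp (illustrative path)
python3 -m venv /tmp/demo-venv
# Activate it: its bin/ directory is prepended to PATH
. /tmp/demo-venv/bin/activate
# python3 now resolves inside the environment, so pip installs stay local to it
which python3
# Leave the environment
deactivate
```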
[root@datahub ~]# pip3 install --upgrade pip
[root@datahub ~]#
[root@datahub ~]# pip3 install virtualenv
[root@datahub ~]#
[root@datahub ~]# /root/python-3.10.5/bin/virtualenv docker-compose-v1-py --python=/root/python-3.10.5/bin/python3
[root@datahub ~]#
This creates a virtual Python 3.10.5 environment directory named docker-compose-v1-py in the current directory.
Enter the docker-compose-v1-py virtual environment:
[root@datahub ~]# . docker-compose-v1-py/bin/activate
(docker-compose-v1-py) [root@datahub ~]#
(docker-compose-v1-py) [root@datahub ~]# deactivate
[root@datahub ~]#
To leave the virtual environment, run deactivate; the steps below assume the environment has been re-activated.
(docker-compose-v1-py) [root@datahub ~]# pip3 install --upgrade pip
(docker-compose-v1-py) [root@datahub ~]#
(docker-compose-v1-py) [root@datahub ~]# pip3 install docker-compose
First, install the DataHub package:
(docker-compose-v1-py) [root@datahub ~]# pip3 install --upgrade pip wheel setuptools
(docker-compose-v1-py) [root@datahub ~]# pip3 uninstall datahub acryl-datahub || true
(docker-compose-v1-py) [root@datahub ~]# pip3 install --upgrade acryl-datahub
......(output truncated)......
Installing collected packages: types-termcolor, types-Deprecated, termcolor, tabulate, ratelimiter, pytz, mypy-extensions, wrapt, tzdata, typing-extensions, toml, stackprinter, python-utils, python-dateutil, pyparsing, psutil, markupsafe, humanfriendly, expandvars, entrypoints, click, avro, typing-inspect, pytz-deprecation-shim, pydantic, progressbar2, packaging, mixpanel, Deprecated, click-default-group, tzlocal, avro-gen3, acryl-datahub
Successfully installed Deprecated-1.2.13 acryl-datahub-0.8.38 avro-1.10.2 avro-gen3-0.7.4 click-8.1.3 click-default-group-1.2.2 entrypoints-0.4 expandvars-0.9.0 humanfriendly-10.0 markupsafe-2.0.1 mixpanel-4.9.0 mypy-extensions-0.4.3 packaging-21.3 progressbar2-4.0.0 psutil-5.9.1 pydantic-1.9.1 pyparsing-3.0.9 python-dateutil-2.8.2 python-utils-3.3.3 pytz-2022.1 pytz-deprecation-shim-0.1.0.post0 ratelimiter-1.2.0.post0 stackprinter-0.2.6 tabulate-0.8.9 termcolor-1.1.0 toml-0.10.2 types-Deprecated-1.2.8 types-termcolor-1.1.4 typing-extensions-4.2.0 typing-inspect-0.7.1 tzdata-2022.1 tzlocal-4.2 wrapt-1.14.1
(docker-compose-v1-py) [root@datahub ~]#
(docker-compose-v1-py) [root@datahub ~]# python3 -m datahub version
DataHub CLI version: 0.8.38
Python version: 3.10.5 (main, Jun 18 2022, 17:36:43) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]
(docker-compose-v1-py) [root@datahub ~]#
The version installed is the latest DataHub release, 0.8.38.
Now start DataHub. If the server is rebooted, all containers stop; simply re-run the python3 -m datahub docker quickstart
command to start the containers again, and previously ingested metadata is not lost.
(docker-compose-v1-py) [root@datahub ~]# wget https://raw.githubusercontent.com/datahub-project/datahub/v0.8.38/docker/quickstart/docker-compose.quickstart.yml
(docker-compose-v1-py) [root@datahub ~]#
(docker-compose-v1-py) [root@datahub ~]# python3 -m datahub docker quickstart --quickstart-compose-file /root/docker-compose.quickstart.yml
Pulling elasticsearch          ... done
Pulling elasticsearch-setup    ... done
Pulling mysql                  ... done
Pulling datahub-gms            ... done
Pulling datahub-frontend-react ... done
Pulling datahub-actions        ... done
Pulling mysql-setup            ... done
Pulling neo4j                  ... done
Pulling zookeeper              ... done
Pulling broker                 ... done
Pulling schema-registry        ... done
Pulling kafka-setup            ... done
Creating network "datahub_network" with the default driver
Creating volume "datahub_broker" with default driver
Creating volume "datahub_esdata" with default driver
Creating volume "datahub_mysqldata" with default driver
Creating volume "datahub_neo4jdata" with default driver
Creating volume "datahub_zkdata" with default driver
Creating neo4j         ... done
Creating mysql         ... done
Creating elasticsearch ... done
Creating zookeeper     ... done
Creating mysql-setup         ... done
Creating datahub-gms         ... done
Creating elasticsearch-setup ... done
Creating broker              ... done
Creating datahub-frontend-react    ... done
Creating datahub_datahub-actions_1 ... done
Creating schema-registry           ... done
Creating kafka-setup               ... done
......(output truncated)......
✔ DataHub is now running
Ingest some demo data using `datahub docker ingest-sample-data`,
or head to http://localhost:9002 (username: datahub, password: datahub) to play around with the frontend.
Need support? Get in touch on Slack: https://slack.datahubproject.io/
(docker-compose-v1-py) [root@datahub ~]#
Then visit http://datahub:9002 (username/password: datahub/datahub), as shown below.
Next, ingest some sample metadata provided by the project:
(docker-compose-v1-py) [root@datahub ~]# python3 -m datahub docker ingest-sample-data
Downloading sample data...
Downloaded to /tmp/tmpn3k2a_di.json
Starting ingestion...
[2022-06-19 10:24:34,830] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit file:///tmp/tmpn3k2a_di.json:0
[2022-06-19 10:24:35,075] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit file:///tmp/tmpn3k2a_di.json:1
[2022-06-19 10:24:35,395] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit file:///tmp/tmpn3k2a_di.json:2
......(output truncated)......
[2022-06-19 10:24:58,419] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit file:///tmp/tmpn3k2a_di.json:94
[2022-06-19 10:24:58,578] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit file:///tmp/tmpn3k2a_di.json:95
[2022-06-19 10:24:58,727] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit file:///tmp/tmpn3k2a_di.json:96
Source (file) report:
{'workunits_produced': 97,
 'workunit_ids': ['file:///tmp/tmpn3k2a_di.json:0',
                  'file:///tmp/tmpn3k2a_di.json:1',
                  'file:///tmp/tmpn3k2a_di.json:2',
                  ......(output truncated)......
                  'file:///tmp/tmpn3k2a_di.json:94',
                  'file:///tmp/tmpn3k2a_di.json:95',
                  'file:///tmp/tmpn3k2a_di.json:96'],
 'warnings': {},
 'failures': {},
 'cli_version': '0.8.38',
 'cli_entry_location': '/root/docker-compose-v1-py/lib/python3.10/site-packages/datahub/__init__.py',
 'py_version': '3.10.5 (main, Jun 18 2022, 17:36:43) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]',
 'py_exec_path': '/root/docker-compose-v1-py/bin/python3',
 'os_details': 'Linux-3.10.0-1160.66.1.el7.x86_64-x86_64-with-glibc2.17'}
Sink (datahub-rest) report:
{'records_written': 97,
 'warnings': [],
 'failures': [],
 'downstream_start_time': datetime.datetime(2022, 6, 19, 10, 24, 34, 550628),
 'downstream_end_time': datetime.datetime(2022, 6, 19, 10, 24, 58, 727050),
 'downstream_total_latency_in_seconds': 24.176422,
 'gms_version': 'v0.8.38'}
Pipeline finished successfully producing 97 workunits
(docker-compose-v1-py) [root@datahub ~]#
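Beyond the sample data, your own metadata is ingested the same way using a recipe file passed to `python3 -m datahub ingest -c recipe.yml`. A minimal sketch for a MySQL source (the host, credentials, and database name below are placeholders; the MySQL source plugin would also need to be installed first, e.g. `pip3 install 'acryl-datahub[mysql]'`):

```yaml
# recipe.yml -- illustrative ingestion recipe; all values are placeholders
source:
  type: mysql
  config:
    host_port: "localhost:3306"
    username: datahub_reader      # placeholder credentials
    password: example_password
    database: example_db
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"   # the GMS endpoint of the quickstart deployment
```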
If you have not yet ingested your own metadata, the following command removes all of DataHub's containers, volumes, and networks (including the sample metadata just ingested):
(docker-compose-v1-py) [root@datahub ~]# python3 -m datahub docker nuke
Removing containers in the datahub project
Removing volumes in the datahub project
Removing networks in the datahub project
(docker-compose-v1-py) [root@datahub ~]#