赞
踩
- CREATE TABLE db_name.mysql_tb_mapping
- USING org.apache.spark.sql.jdbc
- OPTIONS (
- url "jdbc:mysql://xxx.xxx.xxx.xxx:3306/db_name?zeroDateTimeBehavior=convertToNull",
- dbtable "mysql_db_name.mysql_tb_name",
- driver "com.mysql.jdbc.Driver",
- user "xxxxxxx",
- password "xxxxxx"
- )
spark在2.3以后可以使用 using语法创建多数据源的映射表,这样在spark-sql 客户端可以直接查询异构数据源,实现跨数据源的联邦查询。
需要注意的是如果spark-sql 配置hive数据源,这个表会写入hiveMetadata中。但是表的映射信息是写在TBLPROPERTIES属性里,所以建议使用spark-sql客户端查询具体的列信息。
spark-sql 启动时需要配置 jdbc的jar包,直接通过 --jars 配置即可。本方法也适用于其他jdbc数据源
建表语法规则如下
- CREATE TABLE [ IF NOT EXISTS ] table_identifier
- [ ( col_name1 col_type1 [ COMMENT col_comment1 ], ... ) ]
- USING data_source
- [ OPTIONS ( key1 [ = ] val1, key2 [ = ] val2, ... ) ]
- [ PARTITIONED BY ( col_name1, col_name2, ... ) ]
- [ CLUSTERED BY ( col_name3, col_name4, ... )
- [ SORTED BY ( col_name [ ASC | DESC ], ... ) ]
- INTO num_buckets BUCKETS ]
- [ LOCATION path ]
- [ COMMENT table_comment ]
- [ TBLPROPERTIES ( key1 [ = ] val1, key2 [ = ] val2, ... ) ]
- [ AS select_statement ]
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。