华为云用户手册

MapReduce服务 MRS-开启Native Task特性后，Reduce任务在部分操作系统运行失败:回答

回答运行包含Reduce的Mapreduce任务时，通过-Dmapreduce.job.map.output.collector.class=org.apache.hadoop.mapred.nativetask.NativeMapOutputCollectorDelegator命令开启Native Task特性，任务在部分操作系统运行失败，日志中提示错误“version 'GLIBCXX_3.4.20' not found”。该问题原因是操作系统的GLIBCXX版本较低，导致该特性依赖的libnativetask.so.1.0.0库无法加载，进而导致任务失败。规避手段：设置配置项mapreduce.job.map.output.collector.class的值为org.apache.hadoop.mapred.MapTask$MapOutputBuffer。

MapReduce服务 MRS
MapReduce服务 MRS-Hudi性能调优:推荐资源配置

推荐资源配置 mor表：由于其本质上是写增量文件，调优可以直接根据hudi的数据大小（dataSize）进行调整。 dataSize如果只有几个G，推荐跑单节点运行spark，或者yarn模式但是只分配一个container。入湖程序的并行度p设置：建议 p = （dataSize）/128M, 程序分配core的数量保持和p一致即可。内存设置建议内存大小和core的比例大于1.5:1 即一个core配1.5G内存，堆外内存设置建议内存大小和core的比例大于0.5:1。 cow表： cow表的原理是重写原始数据，因此这种表的调优，要兼顾dataSize和最后重写的文件数量。总体来说core数量越大越好（和最后重写多少个文件数直接相关），并行度p和内存大小和mor设置类似。

MapReduce服务 MRS
MapReduce服务 MRS-Flink WebUI权限管理

Flink WebUI权限管理访问并使用Flink WebUI进行业务操作需为用户赋予FlinkServer相关权限，Manager的admin用户没有FlinkServer的业务操作权限。 FlinkServer中应用（租户）是最大管理范围，包含集群连接管理、数据连接管理、应用管理、流表和作业管理等。 FlinkServer中有如表1所示三种资源权限：表1 FlinkServer资源权限权限名称权限描述备注 FlinkServer管理员权限具有所有应用的编辑、查看权限。是FlinkServer的最高权限。如果已经具有FlinkServer管理员权限，则会自动具备所有应用的权限。应用编辑权限具有当前应用编辑权限的用户，可以执行创建、编辑和删除集群连接、数据连接，创建流表、创建作业及运行作业等操作。同时具有当前应用查看权限。应用查看权限具有当前应用查看权限的用户，可以查看应用。 - 父主题：配置开发Flink可视化作业

MapReduce服务 MRS
MapReduce服务 MRS-INSERT:语法

语法 INSERT { INTO | OVERWRITE } [TABLE] table_name [(column_list)] [ PARTITION (partition_clause)] {select_statement | VALUES (value [, value ...]) [, (value [, value ...]) ...] } FROM from_statement INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement FROM from_statement INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) select_statement

MapReduce服务 MRS
MapReduce服务 MRS-INSERT:描述

描述向表中插入新的数据行。如果指定了列名列表，那么这些列名列表必须与query语句产生列列表名完全匹配。表中不在列名列表中的每一列，其值会设置为null。如果没有指定列名列表，则query语句产生的列必须与将要插入的列完全匹配。使用insert into时，会往表中追加数据，而使用insert overwrite时，如果表属性“auto.purge”被设置为“true”，直接删除原表数据，再写入新的数据。如果对象表是分区表时，insert overwrite会删除对应分区的数据而非所有数据。 insert into后面的table关键字为可选，以兼容hive语法。

MapReduce服务 MRS
MapReduce服务 MRS-SHOW FUNCTIONS:示例

示例 SHOW functions; --使用LIKE子句show functions like 'boo_%'; Function | Return Type | Argument Types | Function Type | Deterministic | Description ----------|-------------|----------------|---------------|---------------|------------- bool_and | boolean | boolean | aggregate | true | bool_or | boolean | boolean | aggregate | true | (2 rows)--如果匹配字符串中有字符与通配符冲突，可以指定转义字符来标识，示例为查询default这个schema下，table_name前缀为＂t_＂的所有table，转义字符为＂\＂:SHOW FUNCTIONS LIKE 'array\_%' escape '\'; Function | Return Type | Argument Types | Function Type | Deterministic | Description | Variable Arity | Built In -----------------|-------------|---------------------------------|---------------|---------------|--------------------------------------------------------------------------------------------------------|----------------|---------- array_agg | array(T) | T | aggregate | true | return an array of values | false | true array_contains | boolean | array(T), T | scalar | true | Determines whether given value exists in the array | false | true array_distinct | array(E) | array(E) | scalar | true | Remove duplicate values from the given array | false | true array_except | array(E) | array(E), array(E) | scalar | true | Returns an array of elements that are in the first array but not the second, without duplicates. | false | true array_intersect | array(E) | array(E), array(E) | scalar | true | Intersects elements of the two given arrays | false | true array_join | varchar | array(T), varchar | scalar | true | Concatenates the elements of the given array using a delimiter and an optional string to replace nulls | false | true array_join | varchar | array(T), varchar, varchar | scalar | true | Concatenates the elements of the given array using a delimiter and an optional string to replace nulls | false | true array_max | T | array(T) | scalar | true | Get maximum value of array | false | true array_min | T | array(T) | scalar | true | Get minimum value of array | false | true array_position | bigint | array(T), T | scalar | true | Returns the position of the first occurrence of the given value in array (or 0 if not found) | false | true array_remove | array(E) | array(E), E | scalar | true | Remove specified values from the given array | false | true array_sort | array(E) | array(E) | scalar | true | Sorts the given array in ascending order according to the natural ordering of its elements. | false | true array_sort | array(T) | array(T), function(T,T,integer) | scalar | true | Sorts the given array with a lambda comparator. | false | true array_union | array(E) | array(E), array(E) | scalar | true | Union elements of the two given arrays | false | true --查看所有JAVA和JDBC函数SHOW external functions; Function | Owner ----------------------------|----------- example.namespace02.repeat | admintest hetu.default.add_two | admintest (2 rows)--查看给定函数的定义信息SHOW external function example.namespace02.repeat; External Function -------------------------------------- External FUNCTION example.namespace02.repeat ( s varchar, n integer ) RETURNS varchar COMMENT '' LANGUAGE JDBC NOT DETERMINISTIC CALLED ON NULL INPUT FUNCPROPERTIES ( owner = 'admintest' )

MapReduce服务 MRS
MapReduce服务 MRS-INSERT:示例

示例创建fruit和fruit_copy表： create table fruit (name varchar,price double);create table fruit_copy (name varchar,price double); 向fruit表中插入一行数据： insert into fruit values('LIchee',32);-- 兼容写法示例,带上table关键字insert into table fruit values('Cherry',88); 向fruit表中插入多行数据： insert into fruit values('banana',10),('peach',6),('lemon',12),('apple',7); 将fruit表中的数据行加载到fruit_copy表中，执行后表中有5条记录： insert into fruit_copy select * from fruit; 先清空fruit_copy表，再将fruit中的数据加载到表中，执行之后表中有2条记录： insert overwrite fruit_copy select * from fruit limit 2; 对于varchar类型，仅当目标表定义的列字段长度大于源表的实际字段长度时，才可以使用INSERT... SELECT...的形式从源表中查数据并且插入到目标表： create table varchar50(c1 varchar(50));insert into varchar50 values('hetuEngine');create table varchar100(c1 varchar(100));insert into varchar100 select * from varchar50; 分区表使用insert overwrite语句时，只会清理插入值所在分区的数据，而不是整个表： --创建表create table test_part (id int, alias varchar) partitioned by (dept_id int, status varchar);insert into test_part partition(dept_id=10, status='good') values (1, 'xyz'), (2, 'abc');select * from test_part order by id; id | alias | dept_id | status ----|-------|---------|-------- 1 | xyz | 10 | good 2 | abc | 10 | good (2 rows)--清理分区partition(dept_id=25, status='overwrite')，并插入一条数据insert overwrite test_part (id, alias, dept_id, status) values (3, 'uvw', 25, 'overwrite'); select * from test_part ; id | alias | dept_id | status ----|-------|---------|----------- 1 | xyz | 10 | good 2 | abc | 10 | good 3 | uvw | 25 | overwrite--清理分区partition(dept_id=10, status='good')，并插入一条数据insert overwrite test_part (id, alias, dept_id, status) values (4, 'new', 10, 'good');select * from test_part ordr; id | alias | dept_id | status ----|-------|---------|----------- 3 | uvw | 25 | overwrite 4 | new | 10 | good (2 rows)--分区表插入数据create table test_p_1(name string, age int) partitioned by (provice string, city string); create table test_p_2(name string, age int) partitioned by (provice string, city string); -- 填充数据到test_p_1 insert into test_p_1 partition (provice = 'hebei', city= 'baoding') values ('xiaobei',15),( 'xiaoming',22);-- 根据test_p_1 插入数据到test_p_2-- 方式一from test_p_1 insert into table test_p_2 partition (provice = 'hebei', city= 'baoding') select name,age; -- 方式二insert into test_p_2 partition(provice = 'hebei', city= 'baoding') select name,age from test_p_1;

MapReduce服务 MRS
MapReduce服务 MRS-SHOW CREATE TABLE:示例

示例显示能够创建orders表的SQL 语句： show create table orders; Create Table ---------------------------------------- CREATE TABLE hive.default.orders ( orderkey bigint, orderstatus varchar, totalprice double, orderdate date ) COMMENT 'tt1_tab' WITH ( external = true, format = 'ORC', location = 'hdfs://hacluster/user', orc_compress = 'GZIP', orc_compress_size = 262144, orc_row_index_stride = 10000, orc_stripe_size = 67108864 ) (1 row)

MapReduce服务 MRS
MapReduce服务 MRS-SHOW TBLPROPERTIES TABLE|VIEW:示例

示例 --查看show_table1的所有表属性SHOW TBLPROPERTIES show_table1;SHOW TBLPROPERTIES --------------------------------------------------------------------------- STATS_GENERATED_VIA_STATS_TASK workaround for potential lack of HIVE-12730 numFiles 0 numRows 0 orc.compress.size 262144 orc.compression.codec GZIP orc.row.index.stride 10000 orc.stripe.size 67108864 presto_query_id 20210322_094034_00306_s4s9e@default@HetuEngine presto_version rawDataSize 0 totalSize 0 transient_lastDdlTime 1616406034 (1 row) --查看show_table1的压缩算法SHOW TBLPROPERTIES show_table1('orc.compression.codec');SHOW TBLPROPERTIES --------------------- GZIP (1 row)

MapReduce服务 MRS
MapReduce服务 MRS-SHOW TABLE/PARTITION EXTENDED:参数说明

参数说明 IN | FROM schema_name 指定schema名称，未指定时默认使用当前的schema。 LIKE 'identifier_with_wildcards' identifier_with_wildcards只支持包含“*”和“|”的规则匹配表达式。其中“*”可以匹配单个或多个字符，“|”适用于匹配多种规则匹配表达式中的任意一种的情况，它用于分隔这些规则匹配表达式。规则匹配表达式首尾的空格，不会参与匹配计算。 partition_spec 一个可选参数，使用键值对来指定分区列表，键值对之间通过逗号分隔。需要注意，指定分区时，表名不支持模糊匹配。

MapReduce服务 MRS
MapReduce服务 MRS-SHOW TABLES:示例

示例 --创建测试表Create table show_table1(a int);Create table show_table2(a int);Create table showtable5(a int);Create table intable(a int);Create table fromtable(a int); --匹配单字符'_'show tables in default like 'show_table_'; Table ------------- show_table1 show_table2 (2 rows) --匹配多字符'*'，'%'show tables in default like 'show%';Table ------------- show_table1 show_table2 showtable5 (3 rows) show tables in default like 'show*';Table ------------- show_table1 show_table2 showtable5 (3 rows) --转义字符使用,第二个示例将'_'作为过滤条件，结果集不包含showtable5 show tables in default like 'show_%'; Table ------------- show_table1 show_table2 showtable5 (3 rows) show tables in default like 'show$_%' ESCAPE '$'; Table ------------- show_table1 show_table2 (2 rows) --同时满足多个条件，查询default中'show_'开头或者'in'开头的表show tables in default like 'show$_%|in%' ESCAPE '$'; Table ------------- intable show_table1 show_table2 (3 rows)

MapReduce服务 MRS
MapReduce服务 MRS-SHOW TABLE/PARTITION EXTENDED:示例

示例 -- 演示数据准备create schema show_schema;use show_schema;create table show_table1(a int,b string);create table show_table2(a int,b string);create table from_table1(a int,b string);create table in_table1(a int,b string);--查询表名以"show"开始的表的详细信息show table extended like 'show*'; tab_name -------------------------------------------------------------------------- tableName:show_table1 owner:admintest location:hdfs://hacluster/user/hive/warehouse/show_schema.db/show_table1 InputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat OutputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat columns:struct columns {int a,string b} partitioned:false partitionColumns: totalNumberFiles:0 totalFileSize:0 tableName:show_table2 owner:admintest location:hdfs://hacluster/user/hive/warehouse/show_schema.db/show_table2 InputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat OutputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat columns:struct columns {int a,string b} partitioned:false partitionColumns: totalNumberFiles:0 totalFileSize:0 (1 row)-- 查询表名以"from"或者"show"开头的表的详细信息 show table extended like 'from*|show*'; tab_name -------------------------------------------------------------------------- tableName:show_table1 owner:admintest location:hdfs://hacluster/user/hive/warehouse/show_schema.db/show_table1 InputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat OutputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat columns:struct columns {int a,string b} partitioned:false partitionColumns: totalNumberFiles:0 totalFileSize:0 tableName:from_table1 owner:admintest location:hdfs://hacluster/user/hive/warehouse/show_schema.db/from_table1 InputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat OutputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat columns:struct columns {int a,string b} partitioned:false partitionColumns: totalNumberFiles:0 totalFileSize:0 (1 row)-- 查询web schema下的page_views表扩展信息 show table extended from web like 'page*'; tab_name ----------------------------------------------------------------------------- tableName:page_views owner:admintest location:hdfs://hacluster/user/web.db/page_views InputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat OutputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat columns:struct columns {timestamp view_time,bigint user_id,string page_url} partitioned:true partitionColumns: struct partition_columns {date ds,string country} totalNumberFiles:0 totalFileSize:0 (1 row)

MapReduce服务 MRS
MapReduce服务 MRS-SHOW VIEWS:示例

示例创建示例所需视图： Create schema test1;Use test1;Create table t1(id int, name string);Create view v1 as select * from t1;Create view v2 as select * from t1;Create view t1view as select * from t1;Create view t2view as select * from t1; Show views;Table-------- t1view t2view v1 v2(4 rows) Show views like 'v1'; Table------- v1(1 row) Show views 'v_';Table ------- v1 v2 (2 rows)show views like 't*'; Table-------- t1view t2view Show views in test1; Table-------- t1view t2view v1 v2(4 rows)

MapReduce服务 MRS
MapReduce服务 MRS-ALTER TABLE:示例

示例将表名从users 修改为 people： ALTER TABLE users RENAME TO people; 在表users中增加名为zip的列： ALTER TABLE users ADD COLUMN zip varchar; 从表users中删除名为zip的列： ALTER TABLE users DROP COLUMN zip; 将表users中列名id更改为user_id： ALTER TABLE users RENAME COLUMN id TO user_id; 给users表添加一个creator属性，值为user1：可通过DESCRIBE EXTENDED/FORMATTED TABLE查看添加结果： ALTER TABLE users set tblproperties('creator' = 'user1'); --查看新增的属性DESCRIBE FORMATTED users; Describe Formatted Table ------------------------------------------------------------------------------ # col_name data_type comment id integer name varchar # Detailed Table Information Database: default Owner: admintest LastAccessTime: 0 Location: hdfs://hacluster/user/hive/warehouse/users Table Type: MANAGED_TABLE # Table Parameters: STATS_GENERATED_VIA_STATS_TASK workaround for potential lack of HIVE-12730 creator user1 numFiles 0 numRows 0 orc.compress.size 262144 orc.compression.codec GZIP orc.row.index.stride 10000 orc.stripe.size 67108864 presto_query_id 20210308_023136_00031_jiwsq@default@HetuEngine presto_version rawDataSize 0 totalSize 0 transient_lastDdlTime 1615170696 # Storage Information SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat Compressed: No Num Buckets: -1 Bucket Columns: [] Sort Columns: [] serialization.format: 1 (1 row) Query 20210308_032152_00046_jiwsq@default@HetuEngine, FINISHED, 1 nodeSplits: 1 total, 1 done (100.00%)0:00 [0 rows, 0B] [0 rows/s, 0B/s] 修改分区操作： --创建两个分区表CREATE TABLE IF NOT EXISTS hetu_int_table5 (eid int, name String, salary String, destination String, dept String, yoj int) COMMENT 'Employee Names' partitioned by (dt timestamp,country String, year int, bonus decimal(10,3)) STORED AS TEXTFILE; CREATE TABLE IF NOT EXISTS hetu_int_table6 (eid int, name String, salary String, destination String, dept String, yoj int) COMMENT 'Employee Names' partitioned by (dt timestamp,country String, year int, bonus decimal(10,3)) STORED AS TEXTFILE; --添加分区ALTER TABLE hetu_int_table5 ADD IF NOT EXISTS PARTITION (dt='2008-08-08 10:20:30.0', country='IN', year=2001, bonus=500.23) PARTITION (dt='2008-08-09 10:20:30.0', country='IN', year=2001, bonus=100.50) ; --查看分区show partitions hetu_int_table5; dt | country | year | bonus -------------------------|---------|------|--------- 2008-08-09 10:20:30.000 | IN | 2001 | 100.500 2008-08-08 10:20:30.000 | IN | 2001 | 500.230 (2 rows) --删除分区ALTER TABLE hetu_int_table5 DROP IF EXISTS PARTITION (dt=timestamp '2008-08-08 10:20:30.0', country='IN', year=2001, bonus=500.23); --查看分区show partitions hetu_int_table5; dt | country | year | bonus -------------------------|---------|------|--------- 2008-08-09 10:20:30.000 | IN | 2001 | 100.500 (1 row) --迁移分区示例CREATE SCHEMA part_test;CREATE TABLE hetu_exchange_partition1 (a string, b string) PARTITIONED BY (ds string);CREATE TABLE part_test.hetu_exchange_partition2 (a string, b string) PARTITIONED BY (ds string);ALTER TABLE hetu_exchange_partition1 ADD PARTITION (ds='1'); --查看分区 show partitions hetu_exchange_partition1; ds ---- 1 (1 row) show partitions part_test.hetu_exchange_partition2; ds ----(0 rows) --迁移分区，从 T1 到 T2ALTER TABLE part_test.hetu_exchange_partition2 EXCHANGE PARTITION (ds='1') WITH TABLE hetu_exchange_partition1; --再次查看分区，可以看到分区迁移成功show partitions hetu_exchange_partition1; ds ---- (0 row) show partitions part_test.hetu_exchange_partition2; ds ----1(1 rows) --重命名分区CREATE TABLE IF NOT EXISTS hetu_rename_table ( eid int, name String, salary String, destination String, dept String, yoj int) COMMENT 'Employee details' partitioned by (year int) STORED AS TEXTFILE; ALTER TABLE hetu_rename_table ADD IF NOT EXISTS PARTITION (year=2001); SHOW PARTITIONS hetu_rename_table;year ------ 2001 (1 row) ALTER TABLE hetu_rename_table PARTITION (year=2001) rename to partition (year=2020); SHOW PARTITIONS hetu_rename_table;year ------ 2020 (1 row) --修改分区表create table altercolumn4(a integer, b string) partitioned by (c integer); --修改表的文件格式 alter table altercolumn4 SET FILEFORMAT textfile; insert into altercolumn4 values (100, 'Daya', 500); alter table altercolumn4 partition (c=500) change column b empname string comment 'changed column name to empname' first; --修改分区表的存储位置（需要先在hdfs上创建目录，执行语句后，无法查到之前插入的那条数据）alter table altercolumn4 partition (c=500) set Location '/user/hive/warehouse/c500'; --修改列 b 改名为name，同时类型从integer转为string（对列属性修改需要先修改Hive数据源properties属性，新增“hive.orc.use-column-names=true”配置，否则会报错）create table altercolumn1(a integer, b integer); alter table altercolumn1 change column b name string; --修改altercolumn1的存储属性ALTER TABLE altercolumn1 CLUSTERED BY(a, name) SORTED BY(name) INTO 25 BUCKETS; --查看altercolumn1的属性describe formatted altercolumn1; Describe Formatted Table ---------------------------------------------------------------------------------------- # col_name data_type comment a integer name varchar # Detailed Table Information Database: default Owner: admintest LastAccessTime: 0 Location: hdfs://hacluster/user/hive/warehouse/altercolumn1 Table Type: MANAGED_TABLE # Table Parameters: STATS_GENERATED_VIA_STATS_TASK workaround for potential lack of HIVE-12730 numFiles 0 numRows 0 orc.compress.size 262144 orc.compression.codec GZIP orc.row.index.stride 10000 orc.stripe.size 67108864 presto_query_id 20210325_025238_00034_f63xj@default@HetuEngine presto_version rawDataSize 0 totalSize 0 transient_lastDdlTime 1616640758 # Storage Information SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe InputFormat: org.apache.hadoop.mapred.TextInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Compressed: No Num Buckets: 25 Bucket Columns: [a, name] Sort Columns: [SortingColumn{columnName=name, order=ASCENDING}] Storage Desc Params: serialization.format 1 (1 row) Query 20210325_090522_00091_f63xj@default@HetuEngine, FINISHED, 1 nodeSplits: 1 total, 1 done (100.00%)0:00 [0 rows, 0B] [0 rows/s, 0B/s]

MapReduce服务 MRS
MapReduce服务 MRS-SHOW语法使用概要

SHOW语法使用概要 SHOW语法主要用来查看数据库对象的相关信息，其中LIKE子句用来对数据库对象过滤，匹配规则如下，具体示例可参看SHOW TABLES：规则1：_可用用来匹配单个任意字符。规则2：%可以用来匹配0个或者任意个任意字符。规则3：* 可以用来匹配0个或者任意个任意字符。规则4：|可以用来配置多种规则，规则之间用“|”分隔。规则5：当想将“_”作为匹配条件时，可以使用ESCAPE 指定一个转义字符，对“_”进行转义，以免按照规则1对“_”进行解析。父主题： DDL 语法

MapReduce服务 MRS
MapReduce服务 MRS-ALTER TABLE:限制

限制 EXCHANGE PARTITION：被迁移的单个或多个分区，迁移前必须都是已存在的分区，并归属于来源表，且在目标表中不包含这些分区；该操作涉及的表需要有相同的列定义，并且有相同的分区键；如果表中包含索引，该操作会失败；来源表和目标表中任意一个为事务表时，不允许Exchange partition操作；对于目标表，在一次操作中，多个分区要么同时迁移成功，要么全部失败。对于来源表，操作成功后，所有迁移的分区都会被释放； Alter table change column不支持orc格式的表。 ALTER TABLE table_name ADD | DROP col_name命令仅对于ORC/PARQUET存储格式的非分区表可用。

MapReduce服务 MRS
MapReduce服务 MRS-SHOW SCHEMAS（DATABASES）:示例

示例列出当前catalog所有的schemas： SHOW SCHEMAS; 列出指定catalog下的schema_name前缀为＂t＂的所有schemas： SHOW SCHEMAS FROM hive LIKE 't%';--等价写法:SHOW SCHEMAS IN hive LIKE 't%'; 如果匹配字符串中有字符与通配符冲突，可以指定转义字符来标识，示例为查询hive这个catalog下，schema_name前缀为“pm_”的所有schema，转义字符为“/”： SHOW SCHEMAS IN hive LIKE 'pm/_%' ESCAPE '/';

MapReduce服务 MRS
MapReduce服务 MRS-CREATE FUNCTION:示例

示例创建一个新的JAVA函数“example.default.add_two”（需要先构建和部署UDF） CREATE FUNCTION example.default.add_two ( num integer)RETURNS integerLANGUAGE JAVADETERMINISTICSYMBOL "com.example.functions.AddTwo"URI "hdfs://hacluster/udfs/function-1.0.jar";--执行函数select hetu.default.add_two(2); 创建一个新的JDBC函数“example.namespace02.repeat” CREATE FUNCTION example.namespace02.repeat ( str VARCHAR, times BIGINT)RETURNS VARCHARLANGUAGE JDBCSPECIFIC repeatDETERMINISTIC;--执行函数select example.namespace02.repeat(t1.name,3) from mppdb.test.table1 t1;

MapReduce服务 MRS
MapReduce服务 MRS-CREATE FUNCTION:语法

语法 CREATE FUNCTION qualified_function_name ( parameter_name parameter_type [, ...] ) RETURNS return_type [ COMMENT function_description ] [ LANGUAGE [ JAVA | JDBC ] ] [ SPECIFIC specificName ] [ DETERMINISTIC | NOT DETERMINISTIC ] [ RETURNS NULL ON NULL INPUT | CALLED ON NULL INPUT ] [ SYMBOL class_name ] [ URI hdfs_path_to_jar ]

MapReduce服务 MRS
MapReduce服务 MRS-VALUES:示例

示例返回一个1列3行的表： VALUES 1, 2, 3 返回一个2列3行的表： VALUES(1, 'a'),(2, 'b'),(3, 'c') 返回具有列名id、name的表： SELECT * FROM ( (1, 'a'), (2, 'b'),(3, 'c')) AS t (id, name); 创建一个具有列名id、name的新表： CREATE TABLE example AS SELECT * FROM (VALUES (1, 'a'), (2, 'b'), (3, 'c')) AS t (id, name);

MapReduce服务 MRS
MapReduce服务 MRS-CREATE FUNCTION:描述

描述通过给定的定义创建一个新的函数。每一个函数都由其限定函数名称和参数类型列表唯一标识。“qualified_function_name”的格式需要为“catalog.schema.function_name”，函数命名空间（格式为“catalog.schema”）可以自行规划管理，与HetuEngine中的catalog、schema概念无关联；“parameter_type”需要为HetuEngine支持的数据类型。 “return_type”需要为HetuEngine支持的数据类型，要与函数的返回实际类型匹配，不做类型强制转换。可以指定一组特征来修饰函数并指定其行为，每个特征最多只能指定一次，详情请参考表1。表1 特征说明特征默认值描述 Language clause JDBC 定义函数的语言。目前支持JAVA、JDBC两种语言。 JAVA函数：需要提供函数实现的JAR文件，并将JAR文件放入HetuEngine可以读取的HDFS中。 JDBC函数：在HetuEngine中映射JDBC数据源的函数。目前仅支持映射JDBC数据源的Scalar UDF（即为结果集的每一行返回一个值）。 Specific name - JDBC函数使用，指定函数在JDBC数据源中的限定函数名称。 Deterministic characteristic NOT DETERMINISTIC 函数是否确定性。 DETERMINISTIC：如果函数在使用相同的输入集调用时总是返回相同的结果集，则该函数被视为确定性。 NOT DETERMINISTIC：如果函数在使用相同的输入集调用时不返回相同的结果集，则该函数将被视为非确定性。 Null-call clause CALLED ON NULL INPUT 函数的行为。 RETURNS NULL ON NULL INPUT：当“NULL”作为函数参数时，返回“NULL”。 CALLED ON NULL INPUT：当“NULL”作为函数参数时调用。 Symbol class_name - JAVA函数使用，指定函数实现的限定类名。 Uri hdfs_path_to_jar - JAVA函数使用，指定函数实现的JAR文件路径。

MapReduce服务 MRS
MapReduce服务 MRS-TRUNCATE TABLE:示例

示例 -- 删除原生/管控表Create table simple(id int, name string); Insert into simple values(1,'abc')(2,'def'); select * from simple; id | name----|------ 1 | abc 2 | def(2 rows) Truncate table simple; select * from simple; id | name----|------(0 rows) --删除表分区Create table tb_truncate_part (id int, name string) partitioned by (age int, state string); Insert into tb_truncate_part values (1,'abc',10,'ap'),(2,'abc',10,'up'),(3,'abc',20,'ap'),(4,'abc',20,'up'); select * from tb_truncate_part; id | name | age | state----|------|-----|------- 2 | abc | 10 | up 3 | abc | 20 | ap 1 | abc | 10 | ap 4 | abc | 20 | up(4 rows Truncate table tb_truncate_part partition (state = 'ap', age = 10); select * from tb_truncate_part;id | name | age | state----|------|-----|------- 4 | abc | 20 | up 2 | abc | 10 | up 3 | abc | 20 | ap(3 rows)

MapReduce服务 MRS
MapReduce服务 MRS-CREATE TABLE:表配置

表配置 HetuEngine连接Hive数据源的表支持如下表配置：表2 Hive数据源的表配置 Catalog名 orc表参数默认值类型描述 hive orc_compress GZIP varchar 高级别的压缩算法 (NONE, ZLIB, SNAPPY, GZIP, ZSTD, LZ4) hive orc_compress_size 262144 long 每个压缩块中的字节数 hive orc_stripe_size 67108864 long 每个stripe中的字节数 hive orc_row_index_stride 10000 long 索引条目之间的行数 hive orc_bloom_filter_columns '' varchar 逗号分隔的列名称列表，应为其创建布隆过滤器 hive orc_bloom_filter_fpp 0.05 double 布隆过滤器的positive概率（必须大于0.0且小于1.0）例如： CREATE TABLE format1(orderkey bigint, orderstatus varchar, totalprice double, orderdate date) TBLPROPERTIES (orc_compress = 'SNAPPY', orc_compress_size = 6710422, orc_bloom_filter_columns = 'orderstatus,totalprice');CREATE TABLE format2(orderkey bigint, orderstatus varchar, totalprice double, orderdate date) TBLPROPERTIES (orc_compress = 'LZ4', orc_stripe_size = 33554432, orc_bloom_filter_fpp = 0.1, orc_row_index_stride=5000, orc_compress_size = 6710422, orc_bloom_filter_columns = 'orderstatus,totalprice'); 查看表的配置参数： Show create table format2; Create Table----------------------------------------------------------------------- CREATE TABLE hive.default.format2 ( orderkey bigint, orderstatus varchar, totalprice double, orderdate date ) WITH ( external = false, format = 'ORC', location = '/user/hive/warehouse/format2', orc_bloom_filter_columns = ARRAY['orderstatus','totalprice'], orc_bloom_filter_fpp = 1E-1, orc_compress = 'LZ4', orc_compress_size = 6710422, orc_row_index_stride = 5000, orc_stripe_size = 33554432 )(1 row)

MapReduce服务 MRS
MapReduce服务 MRS-CREATE TABLE:示例

示例创建一个新表orders，使用子句with指定创建表的存储格式、存储位置、以及是否为外表。通过“auto.purge”参数可以指定涉及到数据移除操作（如DROP、DELETE、INSERT OVERWRITE、TRUNCATE TABLE）时是否清除相关数据： "auto.purge"='true'时，清除元数据和数据文件。 "auto.purge"='false'时，仅清除元数据，数据文件会移入HDFS回收站。默认值为“false”，且不建议用户修改此属性，避免数据删除后无法恢复。 CREATE TABLE orders (orderkey bigint,orderstatus varchar,totalprice double,orderdate date)WITH (format = 'ORC', location='/user',orc_compress='ZLIB',external=true, "auto.purge"='false');-- 通过DESC FORMATTED 语句，可以查看建表的详细信息desc formatted orders ; Describe Formatted Table ------------------------------------------------------------------------------ # col_name data_type comment orderkey bigint orderstatus varchar totalprice double orderdate date # Detailed Table Information Database: default Owner: admintest LastAccessTime: 0 Location: hdfs://hacluster/user Table Type: EXTERNAL_TABLE # Table Parameters: EXTERNAL TRUE auto.purge false orc.compress.size 262144 orc.compression.codec ZLIB orc.row.index.stride 10000 orc.stripe.size 67108864 presto_query_id 20220812_084110_00050_srknk@default@HetuEngine presto_version 1.2.0-h0.cbu.mrs.320.r1-SNAPSHOT transient_lastDdlTime 1660293670 # Storage Information SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat Compressed: No Num Buckets: -1 Bucket Columns: [] Sort Columns: [] Storage Desc Params: serialization.format 1 (1 row) 创建一个新表，指定Row format： --建表时，指定表的字段分隔符为‘,’号（如果创建外表，要求数据文件中的每条记录的字段是以逗号进行分隔）CREATE TABLE student(id string,birthday string,grade int,memo string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';--建表时，指定字段分隔符为'\t'，换行符为'\n'CREATE TABLE test(id int, name string ,tel string)ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'LINES TERMINATED BY '\n'STORED AS TEXTFILE; 如果表orders不存在，则创建表orders，并且增加表注释和列注释： CREATE TABLE IF NOT EXISTS orders (orderkey bigint,orderstatus varchar,totalprice double COMMENT 'Price in cents.',orderdate date)COMMENT 'A table to keep track of orders.';insert into new_orders values(202011181113,'online',9527,date '2020-11-11'),(202011181114,'online',666,date '2020-11-11'),(202011181115,'online',443,date '2020-11-11'),(202011181115,'offline',2896,date '2020-11-11'); 使用表orders的列定义创建表bigger_orders： CREATE TABLE bigger_orders (another_orderkey bigint,LIKE orders,another_orderdate date);SHOW CREATE TABLE bigger_orders ; Create Table --------------------------------------------------------------------- CREATE TABLE hive.default.bigger_orders ( another_orderkey bigint, orderkey bigint, orderstatus varchar, totalprice double, ordersdate date, another_orderdate date ) WITH ( external = false, format = 'ORC', location = 'hdfs://hacluster/user/hive/warehouse/bigger_orders', orc_compress = 'GZIP', orc_compress_size = 262144, orc_row_index_stride = 10000, orc_stripe_size = 67108864 ) (1 row) 标号① 建表示例： CREATE EXTERNAL TABLE hetu_test (orderkey bigint, orderstatus varchar, totalprice double, orderdate date) PARTITIONED BY(ds int) SORT BY (orderkey, orderstatus) COMMENT 'test' STORED AS ORC LOCATION '/user' TBLPROPERTIES (orc_compress = 'SNAPPY', orc_compress_size = 6710422, orc_bloom_filter_columns = 'orderstatus,totalprice'); 标号② 建表示例： CREATE EXTERNAL TABLE hetu_test1 (orderkey bigint, orderstatus varchar, totalprice double, orderdate date) COMMENT 'test' PARTITIONED BY(ds int) CLUSTERED BY (orderkey, orderstatus) SORTED BY (orderkey, orderstatus) INTO 16 BUCKETS STORED AS ORC LOCATION '/user' TBLPROPERTIES (orc_compress = 'SNAPPY', orc_compress_size = 6710422, orc_bloom_filter_columns = 'orderstatus,totalprice'); 标号③ 建表示例： CREATE TABLE hetu_test2 (orderkey bigint, orderstatus varchar, totalprice double, orderdate date, ds int) COMMENT 'This table is in Hetu syntax' WITH (partitioned_by = ARRAY['ds'], bucketed_by = ARRAY['orderkey', 'orderstatus'], sorted_by = ARRAY['orderkey', 'orderstatus'], bucket_count = 16, orc_compress = 'SNAPPY', orc_compress_size = 6710422, orc_bloom_filter_columns = ARRAY['orderstatus', 'totalprice'], external = true, format = 'orc', location = '/user');

MapReduce服务 MRS
MapReduce服务 MRS-CREATE TABLE:限制

限制相同序号（①、②、③）标记的表达式/关键字可以同时使用，不同序号标记的表达式/关键字不能同时使用。 session属性可以设置bucket_count，默认值为-1，表示未设置。创建表时，如果bucket_count为-1且建表语句中未设置buckets，则使用默认值16。默认外部表存储位置/user/hive/warehouse/{schema_name}/{table_name}，其中{schema_name}为建表时使用的schema，{table_name}为表名。指定属性“transactional=true”可以让表支持“原子性、一致性、隔离性、持久性”写入的事务能力，但是将表定义为事务表后，无法通过设置“transactional=false”将其退化为非事务表。 transactional='true'或 '0'在执行过程中不会进行类型转换，所以这种写法会抛出异常： Cannot convert ['true'] to boolean Cannot convert ['0'] to boolean 默认不允许向托管表（表属性external = true）插入数据，如需使用该功能，可参考注意事项，添加hive自定义属性：hive.non-managed-table-writes-enabled=true。 Mppdb有一个限制，数据库的标识符的最大长度为63，如果我们把标识符命名超过了最大长度，那么会被自动截取掉超出的部分，只留下最大长度的标识符。跨域场景不支持建表。

MapReduce服务 MRS
MapReduce服务 MRS-CREATE TABLE:语法

语法 CREATE [EXTERNAL]①,② TABLE [IF NOT EXISTS] [catalog_name.][db_name.]table_name ( { [col_name data_type [COMMENT col_comment] [ WITH③ ( property_name = expression [, ...] )③ ] | LIKE③ existing_table_name [ {INCLUDING | EXCLUDING}③ PROPERTIES]③ } [, ...]) [[PARTITIONED BY①(col_name data_type, ....)] [SORT BY① ([column [, column ...]])] ]① [COMMENT 'table_comment'] [ WITH ( property_name = expression [, ...] ) ]③ [[PARTITIONED BY② (col_name data_type, .....)] [CLUSTERED BY② (col_name, col_name, ...) [SORTED BY②③ (col_name, col_name, ...)] INTO num_buckets BUCKETS]② ]② [ [ROW FORMAT row_format] [STORED AS file_format] [LOCATION 'hdfs_path'] [TBLPROPERTIES (orc_table_property = value [, ...] ) ] ]①,②

MapReduce服务 MRS
MapReduce服务 MRS-CREATE TABLE:描述

描述使用CREATE TABLE创建一个具有指定列的、新的空表。使用CREATE TABLE AS创建带数据的表。使用可选参数IF NOT EXISTS，如果表已经存在则不会报错。 WITH子句可用于在新创建的表或单列上设置属性，如表的存储位置（location）、是不是外表（external）等。 LIKE子句用于在新表中包含来自现有表的所有列定义。可以指定多个LIKE子句，从而允许从多个表中复制列。如果指定了INCLUDING PROPERTIES，则将所有表属性复制到新表中。如果WITH子句指定的属性名称与复制的属性名称相同，则将使用WITH子句中的值。默认是EXCLUDING PROPERTIES属性，而且最多只能为一个表指定INCLUDING PROPERTIES属性。 PARTITIONED BY能够用于指定分区的列；CLUSTERED BY能够被用于指定分桶的列；SORT BY和 SORTED BY能够用于给指定的分桶列进行排序；BUCKETS能够被用于指定分桶数；EXTERNAL可用于指定创建外部表；STORED AS能被用于指定文件存储的格式；LOCATION能被用于指定在HDFS上存储的路径。想要查看当前数据源支持哪些column属性，可以运行以下命令，会显示当前对接的catalog分别支持哪些列属性。 SELECT * FROM system.metadata.column_properties; 想要查看当前数据源支持哪些table属性，可以运行以下命令： SELECT * FROM system.metadata.table_properties; 下表为catalog为hive时的查询结果。表1 查询结果 Catalog名属性名称默认值类型描述 hive avro_schema_url - varchar URI pointing to Avro schema for the table hive bucket_count 0 integer Number of buckets hive bucketed_by [] array(varchar) Bucketing columns hive bucketing_version - integer Bucketing version hive csv_escape - varchar CSV escape character hive csv_quote - varchar CSV quote character hive csv_separator - varchar CSV separator character hive external_location - varchar File system location URI for external table hive format ORC varchar Hive storage format for the table. Possible values: [ORC, PARQUET, AVRO, RCBINARY, RCTEXT, SEQUENCEFILE, JSON, TEXTFILE, CSV] hive orc_compress GZIP varchar Compression codec used. Possible values: [NONE, SNAPPY, LZ4, ZSTD, GZIP, ZLIB] hive orc_compress_size 262144 bigint orc compression size hive orc_row_index_stride 10000 integer no. of row index strides hive orc_stripe_size 67108864 bigint orc stripe size hive orc_bloom_filter_columns [] array(varchar) ORC Bloom filter index columns hive orc_bloom_filter_fpp 0.05 double ORC Bloom filter false positive probability hive partitioned_by [] array(varchar) Partition columns hive sorted_by [] array(varchar) Bucket sorting columns hive textfile_skip_footer_line_count - integer Number of footer lines hive textfile_skip_header_line_count - integer Number of header lines hive transactional false boolean Is transactional property enabled systemremote configs configurations varchar configurations for current query systemremote init_query_id initqueryid varchar init query id systemremote initiator_format initiator format varchar column metadata from initiator

MapReduce服务 MRS
MapReduce服务 MRS-ALTER VIEW:描述

描述 “ALTER VIEW view_name AS select_statement;”用于改变已存在的视图的定义，语法效果与CREATE OR REPLACE VIEW类似。 “ALTER VIEW view_name SET TBLPROPERTIES table_properties;”中table_properties格式为 (property_name = property_value, property_name = property_value, ...)。视图可以包含Limit和ORDER BY子句，如果关联视图的查询语句也包含了这类子句，则最后执行结果将根据视图的子句运算后得到。例如视图V指定了返回5条数据，而关联查询为select * from V limit 10，则最终只有5条数据返回。

MapReduce服务 MRS
MapReduce服务 MRS-ALTER VIEW:示例

示例 CREATE OR REPLACE VIEW tv_view as SELECT id,name from (values (1, 'HetuEngine')) as x(id,name); SELECT * FROM tv_view; id | name ----|------ 1 | HetuEngine (1 row) ALTER VIEW tv_view as SELECT id, brand FROM (VALUES (1, 'brand_1', 100), (2, 'brand_2', 300) ) AS x (id, brand, price); SELECT * FROM tv_view; id | brand ----|--------- 1 | brand_1 2 | brand_2 (2 rows) ALTER VIEW tv_view SET TBLPROPERTIES ('comment' = 'This is a new comment'); show tblproperties tv_view; SHOW TBLPROPERTIES -------------------------------------------------------------------- comment This is a new comment presto_query_id 20210325_034712_00040_f63xj@default@HetuEngine presto_version presto_view true transient_lastDdlTime 1616644032 (1 row)

MapReduce服务 MRS
MapReduce服务 MRS-ALTER MATERIALIZED VIEW STATUS:描述

描述修改物化视图的状态，仅支持修改处于“ENABLED”和“SUSPEND”状态的物化视图，且只能修改为其中一种状态。物化视图所有状态包含如下： SUSPEND：暂停使用状态，暂停使用的物化视图不会参与改写 ENABLED：可使用状态 REFRESHING：正在刷新物化视图数据，不可用于改写 DISABLED：关闭使用 UNKNOWN：缓存与数据库不一致，建议执行refresh catalog mv;

MapReduce服务 MRS

共100000条

undefined

意见反馈

0/200

提交取消

提交成功！非常感谢您的反馈，我们会继续努力做到更好反馈提交失败！请稍后重试！

华为云用户手册

7*24

备案

专业服务

退订

建议反馈

售前咨询热线