Hive's Data Definition Language

Published: 2025-03-28

1. Databases

The default database
Hive ships with a database named default, but its location is as follows:

0: jdbc:hive2://hive1:10000> show create database default;
+--------------------------------------+
|            createdb_stmt             |
+--------------------------------------+
| CREATE DATABASE `default`            |
| COMMENT                              |
|   'Default Hive database'            |
| LOCATION                             |
|   'hdfs://nnha/user/hive/warehouse'  |
+--------------------------------------+
5 rows selected (0.135 seconds)
0: jdbc:hive2://hive1:10000>

Changing the location of the default database

0: jdbc:hive2://hive1:10000> alter schema default set location 'hdfs://nnha/user/hive/warehouse/default.db';
No rows affected (0.264 seconds)
0: jdbc:hive2://hive1:10000>

Although the default database's location was changed, the location of a table created in the default database is still hdfs://nnha/user/hive/warehouse/test_tb

Creating a database

0: jdbc:hive2://hive1:10000> create schema if not exists base_db 
. . . . . . . . . . . . . .> comment '基础数据库' 
. . . . . . . . . . . . . .> location 'hdfs://nnha/user/hive/warehouse/base_db.db/'
. . . . . . . . . . . . . .> with dbproperties('my_name'='base', 'my_date'='2022-02-14');
No rows affected (4.018 seconds)
0: jdbc:hive2://hive1:10000>

Listing databases

0: jdbc:hive2://hive1:10000> show schemas like 'base*';
+----------------+
| database_name  |
+----------------+
| base_db        |
+----------------+
1 row selected (0.173 seconds)
0: jdbc:hive2://hive1:10000>

Viewing database details

0: jdbc:hive2://hive1:10000> desc schema extended base_db;
+----------+----------+---------------------------------------------+-------------+-------------+-------------------------------------+
| db_name  | comment  |                  location                   | owner_name  | owner_type  |             parameters              |
+----------+----------+---------------------------------------------+-------------+-------------+-------------------------------------+
| base_db  | 基础数据库    | hdfs://nnha/user/hive/warehouse/base_db.db  | root        | USER        | {my_name=base, my_date=2022-02-14}  |
+----------+----------+---------------------------------------------+-------------+-------------+-------------------------------------+
1 row selected (1.294 seconds)
0: jdbc:hive2://hive1:10000>
  • extended is optional; it additionally displays the database's parameters

Showing the current database and switching databases

0: jdbc:hive2://hive1:10000> 
0: jdbc:hive2://hive1:10000> select current_database();
+----------+
|   _c0    |
+----------+
| default  |
+----------+
1 row selected (0.998 seconds)
0: jdbc:hive2://hive1:10000> 
0: jdbc:hive2://hive1:10000> use base_db;
No rows affected (0.146 seconds)
0: jdbc:hive2://hive1:10000> 

Modifying database properties

0: jdbc:hive2://hive1:10000> 
0: jdbc:hive2://hive1:10000> alter schema base_db set dbproperties('my_date' = '2022-02-15', 'my_locale' = 'Beijing');
No rows affected (0.915 seconds)
0: jdbc:hive2://hive1:10000> 

Dropping a database

0: jdbc:hive2://hive1:10000> 
0: jdbc:hive2://hive1:10000> drop schema if exists base_db cascade;
No rows affected (0.998 seconds)
0: jdbc:hive2://hive1:10000> 

cascade is optional here; it drops the database together with its tables, even if the database still contains tables

2. Basic table operations

2.1 create table

The syntax is as follows:

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [column_constraint_specification] [COMMENT col_comment], ... [constraint_specification])]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)                  -- (Note: Available in Hive 0.10.0 and later)]
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]   -- (Note: Available in Hive 0.6.0 and later)
  [AS select_statement];   -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
 
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];
 
data_type
  : primitive_type
  | array_type
  | map_type
  | struct_type
  | union_type  -- (Note: Available in Hive 0.7.0 and later)
 
primitive_type
  : TINYINT
  | SMALLINT
  | INT
  | BIGINT
  | BOOLEAN
  | FLOAT
  | DOUBLE
  | DOUBLE PRECISION -- (Note: Available in Hive 2.2.0 and later)
  | STRING
  | BINARY      -- (Note: Available in Hive 0.8.0 and later)
  | TIMESTAMP   -- (Note: Available in Hive 0.8.0 and later)
  | DECIMAL     -- (Note: Available in Hive 0.11.0 and later)
  | DECIMAL(precision, scale)  -- (Note: Available in Hive 0.13.0 and later)
  | DATE        -- (Note: Available in Hive 0.12.0 and later)
  | VARCHAR     -- (Note: Available in Hive 0.12.0 and later)
  | CHAR        -- (Note: Available in Hive 0.13.0 and later)
 
array_type
  : ARRAY < data_type >
 
map_type
  : MAP < primitive_type, data_type >
 
struct_type
  : STRUCT < col_name : data_type [COMMENT col_comment], ...>
 
union_type
   : UNIONTYPE < data_type, data_type, ... >  -- (Note: Available in Hive 0.7.0 and later)
 
row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char]   -- (Note: Available in Hive 0.13 and later)
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
 
file_format:
  : SEQUENCEFILE
  | TEXTFILE    -- (Default, depending on hive.default.fileformat configuration)
  | RCFILE      -- (Note: Available in Hive 0.6.0 and later)
  | ORC         -- (Note: Available in Hive 0.11.0 and later)
  | PARQUET     -- (Note: Available in Hive 0.13.0 and later)
  | AVRO        -- (Note: Available in Hive 0.14.0 and later)
  | JSONFILE    -- (Note: Available in Hive 4.0.0 and later)
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
 
column_constraint_specification:
  : [ PRIMARY KEY|UNIQUE|NOT NULL|DEFAULT [default_value]|CHECK  [check_expression] ENABLE|DISABLE NOVALIDATE RELY/NORELY ]
 
default_value:
  : [ LITERAL|CURRENT_USER()|CURRENT_DATE()|CURRENT_TIMESTAMP()|NULL ] 
 
constraint_specification:
  : [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE RELY/NORELY ]
    [, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE ]
    [, CONSTRAINT constraint_name UNIQUE (col_name, ...) DISABLE NOVALIDATE RELY/NORELY ]
    [, CONSTRAINT constraint_name CHECK [check_expression] ENABLE|DISABLE NOVALIDATE RELY/NORELY ]

Notes:

  • If the table being created is empty, the column list may be omitted
  • constraint_specification: specifies column constraints
  • CLUSTERED BY: creates a bucketed table, specifying the bucketing columns, the number of buckets, and the sort columns within each bucket
  • SKEWED BY: creates a skewed table to mitigate data skew in Hive. SKEWED BY (col_name, col_name, …) names the columns that cause the skew; ON ((col_value, col_value, …), (col_value, col_value, …), …) lists the values of those columns that cause the skew; STORED AS DIRECTORIES splits the skew-causing values into subdirectories, and without it they are split into files
  • ROW FORMAT: specifies how row objects are serialized
  • STORED AS: specifies the file storage format of the Hive table at creation time
  • AS select_statement: inserts the query's result into the table as it is created
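The routing behind SKEWED BY ... STORED AS DIRECTORIES can be sketched in plain Python (a conceptual illustration, not Hive's implementation; the column name, skewed value, and directory names are all invented):

```python
# Conceptual sketch of SKEWED BY ... STORED AS DIRECTORIES (list bucketing).
# Not Hive's implementation; column name, skewed value, and directory names invented.

SKEWED_VALUES = {("GuangDong",)}   # e.g. SKEWED BY (province) ON (('GuangDong'))

def target_dir(province):
    """Skewed values get their own subdirectory; everything else shares a default one."""
    if (province,) in SKEWED_VALUES:
        return f"province={province}"   # one directory per listed skewed value
    return "DEFAULT_DIR"                # remaining values stored together

print(target_dir("GuangDong"))   # province=GuangDong
print(target_dir("HuNan"))       # DEFAULT_DIR
```

Splitting the hot values out this way lets queries that filter on a skewed value read only its directory.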

SerDe (serialization and deserialization)
A SerDe reads and writes every row of a table: reading data from HDFS deserializes the byte stream into row objects, while inserting data serializes row objects into bytes. A custom SerDe is declared by setting row_format to SERDE and naming the implementing class; the built-in one is declared by setting row_format to DELIMITED. The commonly used DELIMITED clauses are:

Built-in DELIMITED clause: description
FIELDS TERMINATED BY char [ESCAPED BY char]: FIELDS TERMINATED specifies the field delimiter; ESCAPED specifies an escape character so that data containing the field delimiter is not split incorrectly
COLLECTION ITEMS TERMINATED BY char: specifies the delimiter between elements of a collection (MAP, ARRAY, or STRUCT)
MAP KEYS TERMINATED BY char: specifies the delimiter between a MAP key and its value
LINES TERMINATED BY char: specifies the row delimiter
NULL DEFINED AS char: customizes the representation of NULL; the default is '\N'
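What these delimiters mean for one text row can be illustrated with a small sketch (plain Python, not Hive's LazySimpleSerDe; the sample line and column layout are invented, using ',' for fields, '_' for collection items, and ':' for map keys):

```python
# Illustrative sketch of what the DELIMITED clauses mean for one text row.
# This is NOT Hive's LazySimpleSerDe; the sample line and schema are invented.

def parse_row(line):
    """Split a row using ',' (fields), '_' (collection items), ':' (map keys)."""
    f = line.split(",")                            # FIELDS TERMINATED BY ','
    return {
        "id": int(f[0]),
        "name": f[1],
        "salary": float(f[2]),
        "hobby": f[3].split("_"),                  # ARRAY<string>
        "deduction": {k: float(v)                  # MAP<string,float>
                      for k, v in (kv.split(":") for kv in f[4].split("_"))},
        "address": dict(zip(("street", "city"),    # STRUCT<street:string,city:string>
                            f[5].split("_"))),
    }

row = parse_row("1,Tom,8000.0,reading_hiking,tax:500.0_rent:2000.0,MainStreet_Beijing")
print(row["hobby"])              # ['reading', 'hiking']
print(row["deduction"]["tax"])   # 500.0
print(row["address"]["city"])    # Beijing
```

Note that struct members reuse the collection delimiter, which is why a single COLLECTION ITEMS TERMINATED BY setting covers ARRAY, MAP, and STRUCT alike.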

Table properties
Hive supports both user-defined and predefined table properties. A predefined property must use a key and value that Hive recognizes:

Property: description
comment (table_comment): table description
hbase.table.name (table_name): HBase integration
immutable (true or false): if true, data can no longer be updated or added through insert
orc.compress (ZLIB, SNAPPY, or NONE): the ORC compression codec
transactional (true or false): whether the table supports ACID operations (update, insert, delete)
NO_AUTO_COMPACTION (true or false): transactional table property; setting it to true disables automatic compaction
compactor.mapreduce.map.memory.mb (mapper_memory): transactional table property; memory in MB for the compaction job's map tasks
compactorthreshold.hive.compactor.delta.num.threshold (threshold_num): transactional table property; minor compaction is triggered once there are more than threshold_num delta directories
compactorthreshold.hive.compactor.delta.pct.threshold (threshold_pct): transactional table property; major compaction is triggered once the ratio of delta file size to base file size exceeds threshold_pct (between 0 and 1)
auto.purge (true or false): if true, deleted or overwritten data bypasses the trash and is removed immediately
EXTERNAL (true or false): converts between managed (internal) and external tables

An example of creating a table:

0: jdbc:hive2://hive1:10000> create external table if not exists default.person_table(
. . . . . . . . . . . . . .> id int comment '人员id',
. . . . . . . . . . . . . .> name string,
. . . . . . . . . . . . . .> salary float,
. . . . . . . . . . . . . .> hobby array<string>,
. . . . . . . . . . . . . .> deduction map<string, float>,
. . . . . . . . . . . . . .> address struct<street:string, city:string>
. . . . . . . . . . . . . .> )
. . . . . . . . . . . . . .> row format delimited 
. . . . . . . . . . . . . .> fields terminated by ',' 
. . . . . . . . . . . . . .> collection items terminated by '_' 
. . . . . . . . . . . . . .> map keys terminated by ':' 
. . . . . . . . . . . . . .> lines terminated by '\n' 
. . . . . . . . . . . . . .> stored as textfile 
. . . . . . . . . . . . . .> location 'hdfs://nnha/user/hive/warehouse/default.db/person_table/' 
. . . . . . . . . . . . . .> tblproperties('comment'='person table');
No rows affected (3.24 seconds)
0: jdbc:hive2://hive1:10000> 

To create a managed (internal) table, omit the external clause; the location clause is also optional.

2.2 Viewing tables

0: jdbc:hive2://hive1:10000> 
0: jdbc:hive2://hive1:10000> show tables like 'person*';
+---------------+
|   tab_name    |
+---------------+
| person_table  |
+---------------+
1 row selected (0.334 seconds)
0: jdbc:hive2://hive1:10000> 
0: jdbc:hive2://hive1:10000> desc formatted person_table;
+-------------------------------+----------------------------------------------------+-----------------------+
|           col_name            |                     data_type                      |        comment        |
+-------------------------------+----------------------------------------------------+-----------------------+
| # col_name                    | data_type                                          | comment               |
| id                            | int                                                | 人员id                  |
| name                          | string                                             |                       |
| salary                        | float                                              |                       |
| hobby                         | array<string>                                      |                       |
| deduction                     | map<string,float>                                  |                       |
| address                       | struct<street:string,city:string>                  |                       |
|                               | NULL                                               | NULL                  |
| # Detailed Table Information  | NULL                                               | NULL                  |
| Database:                     | default                                            | NULL                  |
| OwnerType:                    | USER                                               | NULL                  |
| Owner:                        | root                                               | NULL                  |
| CreateTime:                   | Tue Feb 22 11:05:45 CST 2022                       | NULL                  |
| LastAccessTime:               | UNKNOWN                                            | NULL                  |
| Retention:                    | 0                                                  | NULL                  |
| Location:                     | hdfs://nnha/user/hive/warehouse/default.db/person_table | NULL                  |
| Table Type:                   | EXTERNAL_TABLE                                     | NULL                  |
| Table Parameters:             | NULL                                               | NULL                  |
|                               | EXTERNAL                                           | TRUE                  |
|                               | bucketing_version                                  | 2                     |
|                               | comment                                            | person table          |
|                               | transient_lastDdlTime                              | 1645499145            |
|                               | NULL                                               | NULL                  |
| # Storage Information         | NULL                                               | NULL                  |
| SerDe Library:                | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL                  |
| InputFormat:                  | org.apache.hadoop.mapred.TextInputFormat           | NULL                  |
| OutputFormat:                 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL                  |
| Compressed:                   | No                                                 | NULL                  |
| Num Buckets:                  | -1                                                 | NULL                  |
| Bucket Columns:               | []                                                 | NULL                  |
| Sort Columns:                 | []                                                 | NULL                  |
| Storage Desc Params:          | NULL                                               | NULL                  |
|                               | collection.delim                                   | _                     |
|                               | field.delim                                        | ,                     |
|                               | line.delim                                         | \n                    |
|                               | mapkey.delim                                       | :                     |
|                               | serialization.format                               | ,                     |
+-------------------------------+----------------------------------------------------+-----------------------+
37 rows selected (0.998 seconds)
0: jdbc:hive2://hive1:10000> 
  • formatted displays the table's detailed information

2.3 Altering tables

These statements change only the table's metadata; the data in the table does not change with them.

  1. Renaming a table
0: jdbc:hive2://hive1:10000> 
0: jdbc:hive2://hive1:10000> alter table person_table rename to person_table_new;
No rows affected (2.535 seconds)
0: jdbc:hive2://hive1:10000> 
  2. Modifying table properties
0: jdbc:hive2://hive1:10000> alter table person_table_new set tblproperties('comment' = 'new person table');
No rows affected (0.454 seconds)
0: jdbc:hive2://hive1:10000>
  3. Changing a table column
    The syntax is as follows:
ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type
  [COMMENT col_comment] [FIRST|AFTER column_name];

If the columns' data types are not consistent, the FIRST|AFTER column_name clause cannot be used (the clause repositions the column in the metadata only; the data files are not rewritten)

0: jdbc:hive2://hive1:10000> alter table person_table_new change name person_name varchar(64) comment '人员名字';
No rows affected (0.522 seconds)
0: jdbc:hive2://hive1:10000>
  4. Adding table columns
    The syntax is as follows:
ALTER TABLE table_name 
  ADD COLUMNS (col_name data_type [COMMENT col_comment], ...)

Example:

0: jdbc:hive2://hive1:10000> 
0: jdbc:hive2://hive1:10000> alter table person_table_new add columns (age int comment '年龄');
No rows affected (0.414 seconds)
0: jdbc:hive2://hive1:10000> 
  5. Replacing table columns
    Replaces all of the table's columns. The syntax is as follows:
ALTER TABLE table_name 
  REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)

Example:

0: jdbc:hive2://hive1:10000> alter table person_table_new replace columns(username string comment '用户名', password string comment '密码');
No rows affected (0.365 seconds)
0: jdbc:hive2://hive1:10000>

2.4 Dropping tables

0: jdbc:hive2://hive1:10000> 
0: jdbc:hive2://hive1:10000> drop table if exists person_table_new purge;
No rows affected (0.789 seconds)
0: jdbc:hive2://hive1:10000> 
  • purge means that when the table is dropped, a managed table's data is deleted directly rather than moved to the trash, and cannot be recovered. Without purge, a managed table's data is moved to the trash when the table is dropped; the trash feature must first be enabled on HDFS by adding the following to core-site.xml and restarting the Hadoop cluster. Deleted data then lands in the default trash directory /user/root/.Trash/Current
    <property>
        <name>fs.trash.interval</name>
        <value>1440</value>
        <description>How long the trash retains deleted data, in minutes</description>
    </property>

    <property>
        <name>fs.trash.checkpoint.interval</name>
        <value>60</value>
        <description>Interval at which the trash purges expired data, in minutes</description>
    </property>

3. Partitioned tables

Creating a partitioned table

0: jdbc:hive2://hive1:10000> 
0: jdbc:hive2://hive1:10000> create table if not exists default.partitioned_table(
. . . . . . . . . . . . . .> username string,
. . . . . . . . . . . . . .> age int
. . . . . . . . . . . . . .> ) partitioned by (
. . . . . . . . . . . . . .> province string comment '省份',
. . . . . . . . . . . . . .> city string
. . . . . . . . . . . . . .> )
. . . . . . . . . . . . . .> row format delimited
. . . . . . . . . . . . . .> fields terminated by ','
. . . . . . . . . . . . . .> lines terminated by '\n'
. . . . . . . . . . . . . .> stored as textfile
. . . . . . . . . . . . . .> tblproperties('comment'='分区表');
0: jdbc:hive2://hive1:10000> 
  • Once a partitioned table has been created, its partition columns cannot be modified or added to

Listing a partitioned table's partitions
The syntax is as follows:

SHOW PARTITIONS [db_name.]table_name [PARTITION(partition_column = partition_col_value, 
partition_column = partition_col_value, ...)];

Example:

0: jdbc:hive2://hive1:10000> show partitions default.partitioned_table;
+------------+
| partition  |
+------------+
+------------+
No rows selected (0.554 seconds)
0: jdbc:hive2://hive1:10000>

Adding partitions
The syntax is as follows:

ALTER TABLE table_name ADD [IF NOT EXISTS] 
PARTITION (partition_column = partition_col_value, partition_column = partition_col_value, ...) [LOCATION 'location']
[, PARTITION (partition_column = partition_col_value, partition_column = partition_col_value, ...) [LOCATION 'location'], ...];

Example:

0: jdbc:hive2://hive1:10000> 
0: jdbc:hive2://hive1:10000> alter table default.partitioned_table add 
. . . . . . . . . . . . . .> partition (province='HuNan', city='ChangSha') location 'hdfs://nnha/user/hive/warehouse/default.db/partitioned_table' 
. . . . . . . . . . . . . .> partition (province='GuangDong', city='GuangZhou') location 'hdfs://nnha/user/hive/warehouse/default.db/partitioned_table';
No rows affected (2.439 seconds)
0: jdbc:hive2://hive1:10000>

Since both partitions' location was pointed at the table directory itself, no new province=GuangDong/city=GuangZhou or province=HuNan/city=ChangSha directories appeared on HDFS
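Hive's default partition-directory naming can be sketched in plain Python (the helper itself is invented; only the key=value convention is Hive's): without an explicit LOCATION, each partition maps to one key=value subdirectory per partition column under the table directory.

```python
# Sketch of Hive's default partition-directory naming (illustrative helper).
def partition_path(table_location, spec):
    """Append one key=value path segment per partition column, in order."""
    return table_location + "/" + "/".join(f"{k}={v}" for k, v in spec)

p = partition_path(
    "hdfs://nnha/user/hive/warehouse/default.db/partitioned_table",
    [("province", "HuNan"), ("city", "ChangSha")],
)
print(p)
# hdfs://nnha/user/hive/warehouse/default.db/partitioned_table/province=HuNan/city=ChangSha
```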

Renaming a partition
The syntax is as follows:

ALTER TABLE table_name  
PARTITION (partition_column = partition_col_value, partition_column = partition_col_value, ...) RENAME TO 
PARTITION (partition_column = partition_col_value, partition_column = partition_col_value, ...);

Example:

0: jdbc:hive2://hive1:10000> alter table default.partitioned_table 
. . . . . . . . . . . . . .> partition (province='GuangDong', city='GuangZhou') 
. . . . . . . . . . . . . .> rename to 
. . . . . . . . . . . . . .> partition (province='GuangDong', city='ShenZhen');
No rows affected (5.675 seconds)
0: jdbc:hive2://hive1:10000>

Exchanging a partition
Moves partition 1 of partitioned table A into another partitioned table B that has the same table structure (B must not already contain partition 1). The syntax is as follows:

ALTER TABLE table_name_B EXCHANGE PARTITION 
(partition_column = partition_col_value, partition_column = partition_col_value, ...) 
WITH TABLE table_name_A;

Create the target partitioned table:

0: jdbc:hive2://hive1:10000> create table if not exists default.partitioned_table_B(
. . . . . . . . . . . . . .> username string,
. . . . . . . . . . . . . .> age int
. . . . . . . . . . . . . .> ) partitioned by (
. . . . . . . . . . . . . .> province string comment '省份',
. . . . . . . . . . . . . .> city string
. . . . . . . . . . . . . .> )
. . . . . . . . . . . . . .> row format delimited
. . . . . . . . . . . . . .> fields terminated by ','
. . . . . . . . . . . . . .> lines terminated by '\n'
. . . . . . . . . . . . . .> stored as textfile
. . . . . . . . . . . . . .> tblproperties('comment'='目标分区表');
No rows affected (0.647 seconds)
0: jdbc:hive2://hive1:10000>

Check the source table's partitions:

0: jdbc:hive2://hive1:10000> show partitions default.partitioned_table;
+-----------------------------------+
|             partition             |
+-----------------------------------+
| province=GuangDong/city=ShenZhen  |
| province=HuNan/city=ChangSha      |
+-----------------------------------+
2 rows selected (0.628 seconds)
0: jdbc:hive2://hive1:10000>

Exchange the partition, then check the partitions of the source and target tables:

0: jdbc:hive2://hive1:10000> alter table default.partitioned_table_B exchange partition 
. . . . . . . . . . . . . .> (province='GuangDong', city='ShenZhen') 
. . . . . . . . . . . . . .> with table default.partitioned_table;
No rows affected (1.479 seconds)
0: jdbc:hive2://hive1:10000> show partitions default.partitioned_table;
+-------------------------------+
|           partition           |
+-------------------------------+
| province=HuNan/city=ChangSha  |
+-------------------------------+
1 row selected (0.361 seconds)
0: jdbc:hive2://hive1:10000> 
0: jdbc:hive2://hive1:10000> show partitions default.partitioned_table_B;
+-----------------------------------+
|             partition             |
+-----------------------------------+
| province=GuangDong/city=ShenZhen  |
+-----------------------------------+
1 row selected (0.719 seconds)
0: jdbc:hive2://hive1:10000>

Dropping partitions
Drops actual partitions of a partitioned table. The syntax is as follows:

ALTER TABLE table_name DROP [IF EXISTS] PARTITION 
(partition_column = partition_col_value, partition_column = partition_col_value, ...) 
[PURGE];

Without PURGE, the partition's data (but not its metadata) goes to the trash. With PURGE, the data skips the trash and is deleted immediately

Example:

0: jdbc:hive2://hive1:10000> alter table default.partitioned_table_B drop if exists partition 
. . . . . . . . . . . . . .> (province='GuangDong', city='ShenZhen');
No rows affected (4.819 seconds)
0: jdbc:hive2://hive1:10000> 

Changing the storage location of a table or partition
The syntax is as follows:

ALTER TABLE table_name 
[PARTITION (partition_column = partition_col_value, partition_column = partition_col_value, ...)] 
SET LOCATION "new location";

Changing the file format of a table or partition
The syntax is as follows:

ALTER TABLE table_name 
[PARTITION (partition_column = partition_col_value, partition_column = partition_col_value, ...)] 
SET FILEFORMAT file_format;

4. Bucketed tables

Sometimes data skew leaves a single partition holding a very large amount of data. Bucketing helps: the values of one or more columns are hashed and taken modulo the bucket count, which spreads rows evenly across the bucket files
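The hash-and-modulo routing can be sketched in plain Python (illustrative only: Hive computes its own hash over the bucketing columns, so actual bucket assignments differ; the point is that identical keys always land in the same bucket and bucket ids stay in range):

```python
import zlib

NUM_BUCKETS = 5  # e.g. INTO 5 BUCKETS

def bucket_of(*cols):
    """Hash the bucketing column values and take the modulus to pick a bucket."""
    key = "\x01".join(map(str, cols))                 # combine CLUSTERED BY columns
    return zlib.crc32(key.encode()) % NUM_BUCKETS     # stable stand-in, not Hive's hash

# Identical keys always land in the same bucket, and ids stay within range:
for row_key in [("HuNan", "ChangSha"), ("GuangDong", "ShenZhen")]:
    b = bucket_of(*row_key)
    assert 0 <= b < NUM_BUCKETS
    print(row_key, "-> bucket", b)
```

Because the mapping is deterministic, all rows with the same bucketing key end up in the same bucket file, which is what makes bucketed joins and sampling possible.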

Creating a bucketed table

0: jdbc:hive2://hive1:10000> create table if not exists default.clustered_table(
. . . . . . . . . . . . . .> username string,
. . . . . . . . . . . . . .> age int,
. . . . . . . . . . . . . .> province string,
. . . . . . . . . . . . . .> city string
. . . . . . . . . . . . . .> )
. . . . . . . . . . . . . .> clustered by (province, city) sorted by (username asc, age desc) into 5 buckets 
. . . . . . . . . . . . . .> row format delimited
. . . . . . . . . . . . . .> fields terminated by ','
. . . . . . . . . . . . . .> lines terminated by '\n'
. . . . . . . . . . . . . .> stored as textfile
. . . . . . . . . . . . . .> tblproperties('comment'='分桶表');
No rows affected (1.72 seconds)
0: jdbc:hive2://hive1:10000> 
  • When data is inserted, 5 small data files are generated under the bucketed table's storage directory

6. Temporary tables

A temporary table is visible only to the current session; its data is stored in a scratch directory and deleted when the session ends. An example of creating one:

0: jdbc:hive2://hive1:10000> 
0: jdbc:hive2://hive1:10000> create temporary table if not exists default.temporary_table(
. . . . . . . . . . . . . .> username string,
. . . . . . . . . . . . . .> age int,
. . . . . . . . . . . . . .> province string,
. . . . . . . . . . . . . .> city string
. . . . . . . . . . . . . .> )
. . . . . . . . . . . . . .> row format delimited
. . . . . . . . . . . . . .> fields terminated by ','
. . . . . . . . . . . . . .> lines terminated by '\n'
. . . . . . . . . . . . . .> stored as textfile
. . . . . . . . . . . . . .> tblproperties('comment'='临时表');
No rows affected (3.405 seconds)
0: jdbc:hive2://hive1:10000> 

desc formatted default.temporary_table shows a Location of hdfs://nnha/tmp/hive/root/5c1b0948-7c2b-4c18-a945-fbac7fc3e463/_tmp_space.db/d21677bb-a747-400f-a396-698bf26b406c, a path under the hive.exec.scratchdir parameter configured in hive-site.xml

Notes:

  • Temporary tables do not support partitioning
  • Temporary tables do not support indexes

7. Views

A view's data is really just the result set of a select query; the data of a view is never actually stored.

Data cannot be inserted into a view. A view may be created on top of other views

Creating a view

The syntax for creating a view is as follows:

CREATE VIEW [IF NOT EXISTS] [db_name.]view_name 
[(column_name [COMMENT column_comment], ...) ] 
[COMMENT view_comment] 
[TBLPROPERTIES (property_name = property_value, ...)] 
AS SELECT ...;
  • The data types of the view's columns match the data types of the query's result columns

Example:

0: jdbc:hive2://hive1:10000> create view if not exists default.view_table (
. . . . . . . . . . . . . .> username, 
. . . . . . . . . . . . . .> age, 
. . . . . . . . . . . . . .> province, 
. . . . . . . . . . . . . .> upper_city comment '大写的城市'
. . . . . . . . . . . . . .> ) 
. . . . . . . . . . . . . .> comment '视图' 
. . . . . . . . . . . . . .> tblproperties('my_create_date'='2022-03-07', 'my_author'='Tom') 
. . . . . . . . . . . . . .> as select username, age, province, upper(city) from default.clustered_table;
No rows affected (2.589 seconds)
0: jdbc:hive2://hive1:10000> 
0: jdbc:hive2://hive1:10000> desc default.view_table;
+-------------+------------+----------+
|  col_name   | data_type  | comment  |
+-------------+------------+----------+
| username    | string     |          |
| age         | int        |          |
| province    | string     |          |
| upper_city  | string     | 大写的城市    |
+-------------+------------+----------+
4 rows selected (0.236 seconds)
0: jdbc:hive2://hive1:10000>

Listing views

0: jdbc:hive2://hive1:10000> show views from default like 'view*';
+-------------+
|  tab_name   |
+-------------+
| view_table  |
+-------------+
1 row selected (0.128 seconds)
0: jdbc:hive2://hive1:10000>

Altering views

  1. Modifying view properties
    The syntax is as follows:
ALTER VIEW [db_name.]view_name SET TBLPROPERTIES 
(property_name = property_value, property_name = property_value, ...);

Example:

0: jdbc:hive2://hive1:10000> alter view default.view_table set tblproperties 
. . . . . . . . . . . . . .> ('my_author'='Mary');
No rows affected (0.576 seconds)
0: jdbc:hive2://hive1:10000>
  2. Modifying the view definition
    Modifying a view's structure means changing its select statement, which overwrites the previous definition

The syntax is as follows:

ALTER VIEW [db_name.]view_name AS select_statement;

Example:

0: jdbc:hive2://hive1:10000> alter view default.view_table 
. . . . . . . . . . . . . .> as select username from default.clustered_table;
No rows affected (0.863 seconds)
0: jdbc:hive2://hive1:10000>
0: jdbc:hive2://hive1:10000> desc default.view_table;
+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
| username  | string     |          |
+-----------+------------+----------+
1 row selected (0.192 seconds)
0: jdbc:hive2://hive1:10000>

Dropping a view

0: jdbc:hive2://hive1:10000> drop view if exists default.view_table;
No rows affected (0.648 seconds)
0: jdbc:hive2://hive1:10000> 
0: jdbc:hive2://hive1:10000> show views in default like 'view*';
+-----------+
| tab_name  |
+-----------+
+-----------+
No rows selected (0.185 seconds)
0: jdbc:hive2://hive1:10000>

If view B is based on view A, dropping view A produces no warning

8. Indexes

A column that has an index is called an index column; queries that filter on an index column can avoid a full-table or full-partition scan.

A Hive index is in fact an index table, which stores each index-column value, the HDFS path of the file containing that value, and the value's position within the file, so that only part of the data has to be scanned. (Note that Hive removed indexing in version 3.0, so this feature is only available in earlier releases.)
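The idea can be sketched in plain Python (a conceptual illustration: the dict stands in for the index table, and all paths and offsets are invented):

```python
# Conceptual sketch of an index table: indexed value -> (HDFS file, row offsets).
# The dict stands in for the index table; all paths and offsets are invented.

index_table = {
    "GuangDong": [("hdfs://nnha/demo/part-00000", [0, 128])],
    "HuNan":     [("hdfs://nnha/demo/part-00003", [64])],
}

def files_to_scan(value):
    """Return only the files (and offsets) that can contain the value."""
    return index_table.get(value, [])

print(files_to_scan("HuNan"))    # [('hdfs://nnha/demo/part-00003', [64])]
print(files_to_scan("ShanXi"))   # []  (no file needs to be scanned)
```

A query that filters on the indexed column consults this mapping first and then reads only the listed files, instead of scanning the whole table.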

The syntax for creating an index is as follows:

CREATE INDEX index_name
  ON TABLE base_table_name (col_name, ...)
  AS index_type
  [WITH DEFERRED REBUILD]
  [IDXPROPERTIES (property_name=property_value, ...)]
  [IN TABLE index_table_name]
  [
     [ ROW FORMAT ...] STORED AS ...
     | STORED BY ...
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (...)]
  [COMMENT "index comment"];
  • AS index_type: specifies the index type (the index handler)
  • WITH DEFERRED REBUILD: creates the index empty; it is populated later with an index rebuild
  • IDXPROPERTIES: specifies index properties
  • IN TABLE index_table_name: specifies the name of the index table
  • ROW FORMAT: specifies how row objects are serialized
  • STORED AS: specifies the storage format, such as RCFILE or SEQUENCEFILE
  • STORED BY: specifies a storage handler, for example to store the index table in HBase
  • LOCATION hdfs_path: specifies where the index table is stored on HDFS
  • TBLPROPERTIES: specifies index-table properties
  • COMMENT "index comment": describes the index

Example: