1. Databases
The default database
Hive ships with a default database, but its location is as follows:
0: jdbc:hive2://hive1:10000> show create database default;
+--------------------------------------+
| createdb_stmt |
+--------------------------------------+
| CREATE DATABASE `default` |
| COMMENT |
| 'Default Hive database' |
| LOCATION |
| 'hdfs://nnha/user/hive/warehouse' |
+--------------------------------------+
5 rows selected (0.135 seconds)
0: jdbc:hive2://hive1:10000>
Changing the default database's location
0: jdbc:hive2://hive1:10000> alter schema default set location 'hdfs://nnha/user/hive/warehouse/default.db';
No rows affected (0.264 seconds)
0: jdbc:hive2://hive1:10000>
Even though the default database's location was changed, a table created under default still gets a location like hdfs://nnha/user/hive/warehouse/test_tb
Creating a database
0: jdbc:hive2://hive1:10000> create schema if not exists base_db
. . . . . . . . . . . . . .> comment '基础数据库'
. . . . . . . . . . . . . .> location 'hdfs://nnha/user/hive/warehouse/base_db.db/'
. . . . . . . . . . . . . .> with dbproperties('my_name'='base', 'my_date'='2022-02-14');
No rows affected (4.018 seconds)
0: jdbc:hive2://hive1:10000>
Listing databases
0: jdbc:hive2://hive1:10000> show schemas like 'base*';
+----------------+
| database_name |
+----------------+
| base_db |
+----------------+
1 row selected (0.173 seconds)
0: jdbc:hive2://hive1:10000>
Viewing database details
0: jdbc:hive2://hive1:10000> desc schema extended base_db;
+----------+----------+---------------------------------------------+-------------+-------------+-------------------------------------+
| db_name | comment | location | owner_name | owner_type | parameters |
+----------+----------+---------------------------------------------+-------------+-------------+-------------------------------------+
| base_db | 基础数据库 | hdfs://nnha/user/hive/warehouse/base_db.db | root | USER | {my_name=base, my_date=2022-02-14} |
+----------+----------+---------------------------------------------+-------------+-------------+-------------------------------------+
1 row selected (1.294 seconds)
0: jdbc:hive2://hive1:10000>
- extended is optional; it adds the database's parameters to the output
Showing the current database and switching databases
0: jdbc:hive2://hive1:10000>
0: jdbc:hive2://hive1:10000> select current_database();
+----------+
| _c0 |
+----------+
| default |
+----------+
1 row selected (0.998 seconds)
0: jdbc:hive2://hive1:10000>
0: jdbc:hive2://hive1:10000> use base_db;
No rows affected (0.146 seconds)
0: jdbc:hive2://hive1:10000>
Modifying database properties
0: jdbc:hive2://hive1:10000>
0: jdbc:hive2://hive1:10000> alter schema base_db set dbproperties('my_date' = '2022-02-15', 'my_locale' = 'Beijing');
No rows affected (0.915 seconds)
0: jdbc:hive2://hive1:10000>
Dropping a database
0: jdbc:hive2://hive1:10000>
0: jdbc:hive2://hive1:10000> drop schema if exists base_db cascade;
No rows affected (0.998 seconds)
0: jdbc:hive2://hive1:10000>
cascade is optional; it means that even if the database still contains tables, the database and its tables are dropped
2. Basic table operations
2.1 create table
The syntax is as follows:
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
[(col_name data_type [column_constraint_specification] [COMMENT col_comment], ... [constraint_specification])]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)]
ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
[STORED AS DIRECTORIES]
[
[ROW FORMAT row_format]
[STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later)
[AS select_statement]; -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
LIKE existing_table_or_view_name
[LOCATION hdfs_path];
data_type
: primitive_type
| array_type
| map_type
| struct_type
| union_type -- (Note: Available in Hive 0.7.0 and later)
primitive_type
: TINYINT
| SMALLINT
| INT
| BIGINT
| BOOLEAN
| FLOAT
| DOUBLE
| DOUBLE PRECISION -- (Note: Available in Hive 2.2.0 and later)
| STRING
| BINARY -- (Note: Available in Hive 0.8.0 and later)
| TIMESTAMP -- (Note: Available in Hive 0.8.0 and later)
| DECIMAL -- (Note: Available in Hive 0.11.0 and later)
| DECIMAL(precision, scale) -- (Note: Available in Hive 0.13.0 and later)
| DATE -- (Note: Available in Hive 0.12.0 and later)
| VARCHAR -- (Note: Available in Hive 0.12.0 and later)
| CHAR -- (Note: Available in Hive 0.13.0 and later)
array_type
: ARRAY < data_type >
map_type
: MAP < primitive_type, data_type >
struct_type
: STRUCT < col_name : data_type [COMMENT col_comment], ...>
union_type
: UNIONTYPE < data_type, data_type, ... > -- (Note: Available in Hive 0.7.0 and later)
row_format
: DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
[NULL DEFINED AS char] -- (Note: Available in Hive 0.13 and later)
| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
file_format:
: SEQUENCEFILE
| TEXTFILE -- (Default, depending on hive.default.fileformat configuration)
| RCFILE -- (Note: Available in Hive 0.6.0 and later)
| ORC -- (Note: Available in Hive 0.11.0 and later)
| PARQUET -- (Note: Available in Hive 0.13.0 and later)
| AVRO -- (Note: Available in Hive 0.14.0 and later)
| JSONFILE -- (Note: Available in Hive 4.0.0 and later)
| INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
column_constraint_specification:
: [ PRIMARY KEY|UNIQUE|NOT NULL|DEFAULT [default_value]|CHECK [check_expression] ENABLE|DISABLE NOVALIDATE RELY/NORELY ]
default_value:
: [ LITERAL|CURRENT_USER()|CURRENT_DATE()|CURRENT_TIMESTAMP()|NULL ]
constraint_specification:
: [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE RELY/NORELY ]
[, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE ]
[, CONSTRAINT constraint_name UNIQUE (col_name, ...) DISABLE NOVALIDATE RELY/NORELY ]
[, CONSTRAINT constraint_name CHECK [check_expression] ENABLE|DISABLE NOVALIDATE RELY/NORELY ]
Notes:
- If the table being created is empty, the column definitions may be omitted
- constraint_specification: specifies column constraints
- CLUSTERED BY: creates a bucketed table, specifying the bucketing columns, the number of buckets, and the columns each bucket is sorted by
- SKEWED BY: creates a skewed table to mitigate Hive data skew. SKEWED BY (col_name, col_name, ...) names the columns that cause the skew; ON ((col_value, col_value, ...), (col_value, col_value, ...), ...) lists the skew-causing values in those columns; STORED AS DIRECTORIES splits the skewed values into subdirectories (without it, they are split into files)
- ROW FORMAT: specifies how rows are serialized and deserialized
- STORED AS: specifies the file storage format of the Hive table
- AS select_statement: creates the table and inserts the query result into it at the same time
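As an illustration of the SKEWED BY clause above, here is a minimal sketch (the table name skewed_tb and the skewed province values are made up for illustration):

```sql
CREATE TABLE default.skewed_tb (
  id INT,
  province STRING
)
-- most rows carry these two province values, so list them as skewed
SKEWED BY (province) ON ('GuangDong', 'HuNan')
STORED AS DIRECTORIES   -- each skewed value gets its own subdirectory
STORED AS TEXTFILE;
```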
SerDe: serialization and deserialization
Each row of a table is read and written through a SerDe. Reading data from HDFS deserializes the byte stream into row objects, and inserting data serializes row objects back into bytes. For a custom SerDe, set row_format to SERDE and name the implementing class; for the built-in one, set row_format to DELIMITED. Common DELIMITED clauses:
Built-in DELIMITED clause | Description |
---|---|
FIELDS TERMINATED BY char [ESCAPED BY char] | FIELDS TERMINATED BY sets the field delimiter; ESCAPED BY sets an escape character so that data containing the field delimiter is not misparsed |
COLLECTION ITEMS TERMINATED BY char | Sets the delimiter between elements of a collection (MAP, ARRAY, or STRUCT) |
MAP KEYS TERMINATED BY char | Sets the delimiter between a MAP key and its value |
LINES TERMINATED BY char | Sets the row delimiter |
NULL DEFINED AS char | Customizes the NULL representation; the default is '\N' |
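For a custom SerDe, row_format names the implementing class instead of DELIMITED. A minimal sketch using Hive's bundled OpenCSVSerde (the table name csv_tb is made up for illustration):

```sql
CREATE TABLE default.csv_tb (
  id   STRING,
  name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',   -- field delimiter
  'quoteChar'     = '"',   -- quote character
  'escapeChar'    = '\\'   -- escape character
)
STORED AS TEXTFILE;
```

Note that OpenCSVSerde reads every column as STRING regardless of the declared types.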
Table properties
Hive supports user-defined as well as predefined table properties. Predefined properties must use keys and values that Hive recognizes:
Property | Value | Description |
---|---|---|
comment | table_comment | Table description |
hbase.table.name | table_name | HBase integration |
immutable | true or false | If true, data cannot be updated or inserted via insert |
orc.compress | ZLIB, SNAPPY, or NONE | ORC compression codec |
transactional | true or false | Whether the table supports ACID operations (update, insert, delete) |
NO_AUTO_COMPACTION | true or false | Transactional-table property; controls whether automatic compaction runs for the table |
compactor.mapreduce.map.memory.mb | mapper_memory | Transactional-table property; memory (in MB) for the compaction job's map tasks |
compactorthreshold.hive.compactor.delta.num.threshold | threshold_num | Transactional-table property; minor compaction is triggered when there are more than threshold_num delta directories |
compactorthreshold.hive.compactor.delta.pct.threshold | threshold_pct | Transactional-table property; major compaction is triggered when the ratio of delta file size to base file size exceeds threshold_pct (between 0 and 1) |
auto.purge | true or false | If true, deleted or overwritten data bypasses the trash and is removed immediately |
EXTERNAL | true or false | Converts between managed (internal) and external tables |
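For example, the transactional property can be set at creation time. A minimal sketch (the table name acid_tb is made up; note that ACID tables must be managed tables stored as ORC, and older Hive versions additionally require bucketing):

```sql
CREATE TABLE default.acid_tb (
  id   INT,
  name STRING
)
CLUSTERED BY (id) INTO 2 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true', 'orc.compress'='SNAPPY');
```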
An example of creating a table:
0: jdbc:hive2://hive1:10000> create external table if not exists default.person_table(
. . . . . . . . . . . . . .> id int comment '人员id',
. . . . . . . . . . . . . .> name string,
. . . . . . . . . . . . . .> salary float,
. . . . . . . . . . . . . .> hobby array<string>,
. . . . . . . . . . . . . .> deduction map<string, float>,
. . . . . . . . . . . . . .> address struct<street:string, city:string>
. . . . . . . . . . . . . .> )
. . . . . . . . . . . . . .> row format delimited
. . . . . . . . . . . . . .> fields terminated by ','
. . . . . . . . . . . . . .> collection items terminated by '_'
. . . . . . . . . . . . . .> map keys terminated by ':'
. . . . . . . . . . . . . .> lines terminated by '\n'
. . . . . . . . . . . . . .> stored as textfile
. . . . . . . . . . . . . .> location 'hdfs://nnha/user/hive/warehouse/default.db/person_table/'
. . . . . . . . . . . . . .> tblproperties('comment'='person table');
No rows affected (3.24 seconds)
0: jdbc:hive2://hive1:10000>
To create a managed (internal) table, omit the external clause; the location clause is then optional as well
2.2 Viewing tables
0: jdbc:hive2://hive1:10000>
0: jdbc:hive2://hive1:10000> show tables like 'person*';
+---------------+
| tab_name |
+---------------+
| person_table |
+---------------+
1 row selected (0.334 seconds)
0: jdbc:hive2://hive1:10000>
0: jdbc:hive2://hive1:10000> desc formatted person_table;
+-------------------------------+----------------------------------------------------+-----------------------+
| col_name | data_type | comment |
+-------------------------------+----------------------------------------------------+-----------------------+
| # col_name | data_type | comment |
| id | int | 人员id |
| name | string | |
| salary | float | |
| hobby | array<string> | |
| deduction | map<string,float> | |
| address | struct<street:string,city:string> | |
| | NULL | NULL |
| # Detailed Table Information | NULL | NULL |
| Database: | default | NULL |
| OwnerType: | USER | NULL |
| Owner: | root | NULL |
| CreateTime: | Tue Feb 22 11:05:45 CST 2022 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Retention: | 0 | NULL |
| Location: | hdfs://nnha/user/hive/warehouse/default.db/person_table | NULL |
| Table Type: | EXTERNAL_TABLE | NULL |
| Table Parameters: | NULL | NULL |
| | EXTERNAL | TRUE |
| | bucketing_version | 2 |
| | comment | person table |
| | transient_lastDdlTime | 1645499145 |
| | NULL | NULL |
| # Storage Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
| InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL |
| Compressed: | No | NULL |
| Num Buckets: | -1 | NULL |
| Bucket Columns: | [] | NULL |
| Sort Columns: | [] | NULL |
| Storage Desc Params: | NULL | NULL |
| | collection.delim | _ |
| | field.delim | , |
| | line.delim | \n |
| | mapkey.delim | : |
| | serialization.format | , |
+-------------------------------+----------------------------------------------------+-----------------------+
37 rows selected (0.998 seconds)
0: jdbc:hive2://hive1:10000>
- formatted shows detailed information
2.3 Altering tables
Altering a table only changes its metadata; the table's data does not change with it
- Renaming a table
0: jdbc:hive2://hive1:10000>
0: jdbc:hive2://hive1:10000> alter table person_table rename to person_table_new;
No rows affected (2.535 seconds)
0: jdbc:hive2://hive1:10000>
- Modifying table properties
0: jdbc:hive2://hive1:10000> alter table person_table_new set tblproperties('comment' = 'new person table');
No rows affected (0.454 seconds)
0: jdbc:hive2://hive1:10000>
- Changing a column
The syntax is as follows:
ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type
[COMMENT col_comment] [FIRST|AFTER column_name];
If the data types of the table's columns are not all consistent, the FIRST|AFTER column_name clause cannot be used
0: jdbc:hive2://hive1:10000> alter table person_table_new change name person_name varchar(64) comment '人员名字';
No rows affected (0.522 seconds)
0: jdbc:hive2://hive1:10000>
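The FIRST|AFTER clause reorders a column in the table's metadata. A minimal sketch (assuming the salary and hobby columns from the earlier person_table definition are still present):

```sql
alter table person_table_new change salary salary float after hobby;
```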
- Adding columns
The syntax is as follows:
ALTER TABLE table_name
ADD COLUMNS (col_name data_type [COMMENT col_comment], ...)
Example:
0: jdbc:hive2://hive1:10000>
0: jdbc:hive2://hive1:10000> alter table person_table_new add columns (age int comment '年龄');
No rows affected (0.414 seconds)
0: jdbc:hive2://hive1:10000>
- Replacing columns
Replaces all of the table's columns. The syntax is as follows:
ALTER TABLE table_name
REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)
Example:
0: jdbc:hive2://hive1:10000> alter table person_table_new replace columns(username string comment '用户名', password string comment '密码');
No rows affected (0.365 seconds)
0: jdbc:hive2://hive1:10000>
2.4 Dropping tables
0: jdbc:hive2://hive1:10000>
0: jdbc:hive2://hive1:10000> drop table if exists person_table_new purge;
No rows affected (0.789 seconds)
0: jdbc:hive2://hive1:10000>
- purge means that when the table is dropped, a managed table's data does not go to the trash; it is deleted immediately and cannot be recovered. Without purge, a managed table's data is moved to the trash when the table is dropped. The trash must be enabled in HDFS: add the following to core-site.xml and restart the Hadoop cluster. Deleted data is then placed in the default trash directory /user/root/.Trash/Current
<property>
<name>fs.trash.interval</name>
<value>1440</value>
<description>How long the trash retains data, in minutes</description>
</property>
<property>
<name>fs.trash.checkpoint.interval</name>
<value>60</value>
<description>Interval at which the trash purges expired data, in minutes</description>
</property>
3. Partitioned tables
Creating a partitioned table
0: jdbc:hive2://hive1:10000>
0: jdbc:hive2://hive1:10000> create table if not exists default.partitioned_table(
. . . . . . . . . . . . . .> username string,
. . . . . . . . . . . . . .> age int
. . . . . . . . . . . . . .> ) partitioned by (
. . . . . . . . . . . . . .> province string comment '省份',
. . . . . . . . . . . . . .> city string
. . . . . . . . . . . . . .> )
. . . . . . . . . . . . . .> row format delimited
. . . . . . . . . . . . . .> fields terminated by ','
. . . . . . . . . . . . . .> lines terminated by '\n'
. . . . . . . . . . . . . .> stored as textfile
. . . . . . . . . . . . . .> tblproperties('comment'='分区表');
0: jdbc:hive2://hive1:10000>
- Once a partitioned table is created, its partition columns cannot be modified or added to
Listing a table's partitions
The syntax is as follows:
SHOW PARTITIONS [db_name.]table_name [PARTITION(partition_column = partition_col_value,
partition_column = partition_col_value, ...)];
Example:
0: jdbc:hive2://hive1:10000> show partitions default.partitioned_table;
+------------+
| partition |
+------------+
+------------+
No rows selected (0.554 seconds)
0: jdbc:hive2://hive1:10000>
Adding partitions
The syntax is as follows:
ALTER TABLE table_name ADD [IF NOT EXISTS]
PARTITION (partition_column = partition_col_value, partition_column = partition_col_value, ...) [LOCATION 'location']
[, PARTITION (partition_column = partition_col_value, partition_column = partition_col_value, ...) [LOCATION 'location'], ...];
Example:
0: jdbc:hive2://hive1:10000>
0: jdbc:hive2://hive1:10000> alter table default.partitioned_table add
. . . . . . . . . . . . . .> partition (province='HuNan', city='ChangSha') location 'hdfs://nnha/user/hive/warehouse/default.db/partitioned_table'
. . . . . . . . . . . . . .> partition (province='GuangDong', city='GuangZhou') location 'hdfs://nnha/user/hive/warehouse/default.db/partitioned_table';
No rows affected (2.439 seconds)
0: jdbc:hive2://hive1:10000>
No new province=GuangDong/city=GuangZhou or province=HuNan/city=ChangSha directories appear on HDFS, because each partition was given an explicit, pre-existing location
Renaming a partition
The syntax is as follows:
ALTER TABLE table_name
PARTITION (partition_column = partition_col_value, partition_column = partition_col_value, ...) RENAME TO
PARTITION (partition_column = partition_col_value, partition_column = partition_col_value, ...);
Example:
0: jdbc:hive2://hive1:10000> alter table default.partitioned_table
. . . . . . . . . . . . . .> partition (province='GuangDong', city='GuangZhou')
. . . . . . . . . . . . . .> rename to
. . . . . . . . . . . . . .> partition (province='GuangDong', city='ShenZhen');
No rows affected (5.675 seconds)
0: jdbc:hive2://hive1:10000>
Exchanging a partition
Moves a partition of partitioned table A into another partitioned table B with the same schema (B must not already contain that partition). The syntax is as follows:
ALTER TABLE table_name_B EXCHANGE PARTITION
(partition_column = partition_col_value, partition_column = partition_col_value, ...)
WITH TABLE table_name_A;
Create the target partitioned table
0: jdbc:hive2://hive1:10000> create table if not exists default.partitioned_table_B(
. . . . . . . . . . . . . .> username string,
. . . . . . . . . . . . . .> age int
. . . . . . . . . . . . . .> ) partitioned by (
. . . . . . . . . . . . . .> province string comment '省份',
. . . . . . . . . . . . . .> city string
. . . . . . . . . . . . . .> )
. . . . . . . . . . . . . .> row format delimited
. . . . . . . . . . . . . .> fields terminated by ','
. . . . . . . . . . . . . .> lines terminated by '\n'
. . . . . . . . . . . . . .> stored as textfile
. . . . . . . . . . . . . .> tblproperties('comment'='目标分区表');
No rows affected (0.647 seconds)
0: jdbc:hive2://hive1:10000>
Check the source table's partitions
0: jdbc:hive2://hive1:10000> show partitions default.partitioned_table;
+-----------------------------------+
| partition |
+-----------------------------------+
| province=GuangDong/city=ShenZhen |
| province=HuNan/city=ChangSha |
+-----------------------------------+
2 rows selected (0.628 seconds)
0: jdbc:hive2://hive1:10000>
Exchange the partition, then check the partitions of the source and target tables
0: jdbc:hive2://hive1:10000> alter table default.partitioned_table_B exchange partition
. . . . . . . . . . . . . .> (province='GuangDong', city='ShenZhen')
. . . . . . . . . . . . . .> with table default.partitioned_table;
No rows affected (1.479 seconds)
0: jdbc:hive2://hive1:10000> show partitions default.partitioned_table;
+-------------------------------+
| partition |
+-------------------------------+
| province=HuNan/city=ChangSha |
+-------------------------------+
1 row selected (0.361 seconds)
0: jdbc:hive2://hive1:10000>
0: jdbc:hive2://hive1:10000> show partitions default.partitioned_table_B;
+-----------------------------------+
| partition |
+-----------------------------------+
| province=GuangDong/city=ShenZhen |
+-----------------------------------+
1 row selected (0.719 seconds)
0: jdbc:hive2://hive1:10000>
Dropping partitions
Drops actual partitions of a partitioned table. The syntax is as follows:
ALTER TABLE table_name DROP [IF EXISTS] PARTITION
(partition_column = partition_col_value, partition_column = partition_col_value, ...)
[PURGE];
Without PURGE, the partition's data (but not its metadata) goes to the trash. With PURGE, the data bypasses the trash and is deleted immediately
Example:
0: jdbc:hive2://hive1:10000> alter table default.partitioned_table_B drop if exists partition
. . . . . . . . . . . . . .> (province='GuangDong', city='ShenZhen');
No rows affected (4.819 seconds)
0: jdbc:hive2://hive1:10000>
Changing the storage location of a table or partition
The syntax is as follows:
ALTER TABLE table_name
[PARTITION (partition_column = partition_col_value, partition_column = partition_col_value, ...)]
SET LOCATION "new location";
Changing the file format of a table or partition
The syntax is as follows:
ALTER TABLE table_name
[PARTITION (partition_column = partition_col_value, partition_column = partition_col_value, ...)]
SET FILEFORMAT file_format;
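A minimal sketch of the file-format change (reusing the partition values from the examples above; note this only changes metadata, so existing data files must already be in the new format or be rewritten):

```sql
alter table default.partitioned_table
partition (province='HuNan', city='ChangSha')
set fileformat orc;
```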
4. Bucketed tables
Sometimes data skew leaves a single partition with too much data. Bucketing applies a hash to one or more columns and takes the result modulo the number of buckets, spreading rows evenly across the bucket files
Creating a bucketed table
0: jdbc:hive2://hive1:10000> create table if not exists default.clustered_table(
. . . . . . . . . . . . . .> username string,
. . . . . . . . . . . . . .> age int,
. . . . . . . . . . . . . .> province string,
. . . . . . . . . . . . . .> city string
. . . . . . . . . . . . . .> )
. . . . . . . . . . . . . .> clustered by (province, city) sorted by (username asc, age desc) into 5 buckets
. . . . . . . . . . . . . .> row format delimited
. . . . . . . . . . . . . .> fields terminated by ','
. . . . . . . . . . . . . .> lines terminated by '\n'
. . . . . . . . . . . . . .> stored as textfile
. . . . . . . . . . . . . .> tblproperties('comment'='分桶表');
No rows affected (1.72 seconds)
0: jdbc:hive2://hive1:10000>
- When data is inserted, 5 bucket files are generated under the bucketed table's storage directory
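The hash-mod bucket assignment described above can be sketched as follows. This is a simplified illustration that mirrors Hive's legacy bucketing scheme (bucketing_version 1, where string hashing follows Java's String.hashCode and column hashes are combined as h = 31*h + hash(col)); newer Hive versions (bucketing_version 2, as shown in the desc formatted output earlier) use Murmur3 instead, so this sketch shows the idea rather than the exact bucket numbering on a given cluster:

```python
def java_string_hash(s: str) -> int:
    """Java String.hashCode: h = 31*h + ord(c) in 32-bit signed arithmetic."""
    h = 0
    for c in s:
        h = (31 * h + ord(c)) & 0xFFFFFFFF
    # reinterpret as a signed 32-bit integer
    return h - 0x100000000 if h >= 0x80000000 else h

def bucket_for(cols, num_buckets: int) -> int:
    """Combine the column hashes, then take the result modulo the bucket count."""
    h = 0
    for col in cols:
        h = (31 * h + java_string_hash(col)) & 0xFFFFFFFF
    # keep the value non-negative before taking the modulus
    return (h & 0x7FFFFFFF) % num_buckets

# Rows with the same bucketing-column values always land in the same bucket file,
# and every row maps to one of the 5 buckets:
assert 0 <= bucket_for(("HuNan", "ChangSha"), 5) < 5
```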
6. Temporary tables
A temporary table is visible only to the current session; its data lives in a temporary directory and is deleted when the session ends. Example:
0: jdbc:hive2://hive1:10000>
0: jdbc:hive2://hive1:10000> create temporary table if not exists default.temporary_table(
. . . . . . . . . . . . . .> username string,
. . . . . . . . . . . . . .> age int,
. . . . . . . . . . . . . .> province string,
. . . . . . . . . . . . . .> city string
. . . . . . . . . . . . . .> )
. . . . . . . . . . . . . .> row format delimited
. . . . . . . . . . . . . .> fields terminated by ','
. . . . . . . . . . . . . .> lines terminated by '\n'
. . . . . . . . . . . . . .> stored as textfile
. . . . . . . . . . . . . .> tblproperties('comment'='临时表');
No rows affected (3.405 seconds)
0: jdbc:hive2://hive1:10000>
Running desc formatted default.temporary_table shows a Location of hdfs://nnha/tmp/hive/root/5c1b0948-7c2b-4c18-a945-fbac7fc3e463/_tmp_space.db/d21677bb-a747-400f-a396-698bf26b406c. This path comes from the hive.exec.scratchdir parameter configured in hive-site.xml
Note:
- Temporary tables do not support partitioning
- Temporary tables do not support indexes
7. Views
A view's data is simply the result set of a select query; the data is never actually stored.
Data cannot be inserted into a view. Views can be created on top of other views
Creating a view
The syntax for creating a view is as follows:
CREATE VIEW [IF NOT EXISTS] [db_name.]view_name
[(column_name [COMMENT column_comment], ...) ]
[COMMENT view_comment]
[TBLPROPERTIES (property_name = property_value, ...)]
AS SELECT ...;
- The data types of a view's columns match those of the query's result columns
Example:
0: jdbc:hive2://hive1:10000> create view if not exists default.view_table (
. . . . . . . . . . . . . .> username,
. . . . . . . . . . . . . .> age,
. . . . . . . . . . . . . .> province,
. . . . . . . . . . . . . .> upper_city comment '大写的城市'
. . . . . . . . . . . . . .> )
. . . . . . . . . . . . . .> comment '视图'
. . . . . . . . . . . . . .> tblproperties('my_create_date'='2022-03-07', 'my_author'='Tom')
. . . . . . . . . . . . . .> as select username, age, province, upper(city) from default.clustered_table;
No rows affected (2.589 seconds)
0: jdbc:hive2://hive1:10000>
0: jdbc:hive2://hive1:10000> desc default.view_table;
+-------------+------------+----------+
| col_name | data_type | comment |
+-------------+------------+----------+
| username | string | |
| age | int | |
| province | string | |
| upper_city | string | 大写的城市 |
+-------------+------------+----------+
4 rows selected (0.236 seconds)
0: jdbc:hive2://hive1:10000>
Listing views
0: jdbc:hive2://hive1:10000> show views from default like 'view*';
+-------------+
| tab_name |
+-------------+
| view_table |
+-------------+
1 row selected (0.128 seconds)
0: jdbc:hive2://hive1:10000>
Altering views
- Modifying view properties
The syntax is as follows:
ALTER VIEW [db_name.]view_name SET TBLPROPERTIES
(property_name = property_value, property_name = property_value, ...);
Example:
0: jdbc:hive2://hive1:10000> alter view default.view_table set tblproperties
. . . . . . . . . . . . . .> ('my_author'='Mary');
No rows affected (0.576 seconds)
0: jdbc:hive2://hive1:10000>
- Modifying a view's definition
Modifying a view's definition means replacing its select query; the previous definition is overwritten
The syntax is as follows:
ALTER VIEW [db_name.]view_name AS select_statement;
Example:
0: jdbc:hive2://hive1:10000> alter view default.view_table
. . . . . . . . . . . . . .> as select username from default.clustered_table;
No rows affected (0.863 seconds)
0: jdbc:hive2://hive1:10000>
0: jdbc:hive2://hive1:10000> desc default.view_table;
+-----------+------------+----------+
| col_name | data_type | comment |
+-----------+------------+----------+
| username | string | |
+-----------+------------+----------+
1 row selected (0.192 seconds)
0: jdbc:hive2://hive1:10000>
Dropping a view
0: jdbc:hive2://hive1:10000> drop view if exists default.view_table;
No rows affected (0.648 seconds)
0: jdbc:hive2://hive1:10000>
0: jdbc:hive2://hive1:10000> show views in default like 'view*';
+-----------+
| tab_name |
+-----------+
+-----------+
No rows selected (0.185 seconds)
0: jdbc:hive2://hive1:10000>
If view B is built on view A, dropping view A produces no warning
8. Indexes
A column an index is created on is called an index column; querying by an index column can avoid a full-table or full-partition scan.
A Hive index is really an index table that stores the index column's values, the HDFS path of the file containing each value, and the value's position within that file, so only part of the data needs to be scanned. (Note that the index feature was removed in Hive 3.0.)
The syntax for creating an index is as follows:
CREATE INDEX index_name
ON TABLE base_table_name (col_name, ...)
AS index_type
[WITH DEFERRED REBUILD]
[IDXPROPERTIES (property_name=property_value, ...)]
[IN TABLE index_table_name]
[
[ ROW FORMAT ...] STORED AS ...
| STORED BY ...
]
[LOCATION hdfs_path]
[TBLPROPERTIES (...)]
[COMMENT "index comment"];
- AS index_type: specifies the index type (the index handler)
- WITH DEFERRED REBUILD: creates the index empty; it is populated later by rebuilding it with ALTER INDEX ... REBUILD
- IDXPROPERTIES: specifies index properties
- IN TABLE index_table_name: names the index table
- ROW FORMAT: specifies row serialization
- STORED AS: specifies the storage format, e.g. the RCFILE or SEQUENCEFILE file format
- STORED BY: specifies a storage handler, e.g. to store the index table in HBase
- LOCATION hdfs_path: specifies the index table's storage location on HDFS
- TBLPROPERTIES: specifies index table properties
- COMMENT "index comment": describes the index
Example:
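The original example is missing here; the following is a minimal sketch (the index name idx_username and the index table name default.clustered_table_idx are made up) of a compact index on the clustered_table created earlier, followed by the rebuild that populates it:

```sql
CREATE INDEX idx_username
ON TABLE default.clustered_table (username)
AS 'COMPACT'                          -- shorthand for the compact index handler
WITH DEFERRED REBUILD                 -- create empty; populate via REBUILD below
IN TABLE default.clustered_table_idx
COMMENT 'index on username';

-- build (or rebuild) the index so the index table is filled
ALTER INDEX idx_username ON default.clustered_table REBUILD;
```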