In AI use cases across many industries, applications need to interact with and analyze content in different modalities. Some of these applications process complex documents, such as insurance claims and medical bills. Mobile apps also need to analyze user-generated images and videos. Organizations need to build a semantic index on top of digital assets that include documents, images, audio, and video files so the data can be searched later. However, getting insights from unstructured multimodal content is not easy: each data format has to be handled and converted separately, and it takes multiple steps to extract the information you need. That usually means using several models in one solution and dealing with cost optimization (such as fine-tuning and prompt engineering), safeguards (for example, mitigating hallucinations), integration with third-party applications (including unifying data formats and APIs), and keeping up with model updates.
To simplify this process, at the AWS re:Invent conference Dr. Swami announced Amazon Bedrock Data Automation, a new capability of Amazon Bedrock that streamlines the generation of valuable summaries, insights, and reports from unstructured multimodal content such as documents, images, audio, and video. With Bedrock Data Automation, you can reduce the development time and effort needed to build intelligent document processing, media analysis, and other multimodal data automation solutions.
You can use Bedrock Data Automation as a standalone capability or as a parser for Amazon Bedrock Knowledge Bases, to index summaries and insights extracted from multimodal content and improve the relevance of AI responses in Retrieval-Augmented Generation (RAG) scenarios.
Bedrock Data Automation is now generally available with support for cross-Region inference endpoints, which makes it available in more AWS Regions and lets it use compute resources across different Regions. The service has also improved accuracy and added support for logo recognition in images and videos. Let's take a look at how it works in practice.
Multimodal summarization with Amazon Bedrock Data Automation
In this post, we show how to integrate Bedrock Data Automation into your applications, both from the console and through API calls in code. When you open the Data Automation section of the Amazon Bedrock console for the first time, you are asked to confirm whether to enable cross-Region support.
From an API perspective, the InvokeDataAutomationAsync operation now requires an additional parameter (dataAutomationProfileArn) to specify the Data Automation profile to use. The value of this parameter depends on your Region and AWS account ID:
arn:aws:bedrock:<REGION>:<ACCOUNT_ID>:data-automation-profile/us.data-automation-v1
In addition, the dataAutomationArn parameter has been renamed to dataAutomationProjectArn to better reflect that it contains the project Amazon Resource Name (ARN). When invoking Bedrock Data Automation, you now need to specify a project or a blueprint to use. If you pass blueprints, you will get custom output. To keep getting the standard default output, set the DataAutomationProjectArn parameter to:
arn:aws:bedrock:<REGION>:aws:data-automation-project/public-default
As the name suggests, the InvokeDataAutomationAsync operation is asynchronous. You pass the input and output configuration up front, and when the results are ready, they are written to the Amazon S3 bucket specified in the output configuration. You can also use the notificationConfiguration parameter to receive notifications from Bedrock Data Automation through Amazon EventBridge.
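As an illustration, here is a minimal sketch of an asynchronous invocation that uses the standard default project ARN from above and turns on EventBridge notifications. The inputConfiguration, outputConfiguration, dataAutomationConfiguration, and dataAutomationProfileArn parameters are the ones used later in this post; the shape of notificationConfiguration is an assumption based on my reading of the Boto3 reference for the bedrock-data-automation-runtime client, so check the field names before relying on them. The Region, account ID, and bucket names are placeholders.

import boto3

REGION = '<REGION>'
ACCOUNT_ID = '<ACCOUNT_ID>'

bda_runtime = boto3.client('bedrock-data-automation-runtime', region_name=REGION)

# Asynchronous invocation: results are written to the S3 location in outputConfiguration
response = bda_runtime.invoke_data_automation_async(
    inputConfiguration={'s3Uri': 's3://<BUCKET>/BDA/Input/bda-drivers-license.jpeg'},
    outputConfiguration={'s3Uri': 's3://<BUCKET>/BDA/Output'},
    # Standard default output (no custom blueprints)
    dataAutomationConfiguration={
        'dataAutomationProjectArn': f'arn:aws:bedrock:{REGION}:aws:data-automation-project/public-default'
    },
    dataAutomationProfileArn=f'arn:aws:bedrock:{REGION}:{ACCOUNT_ID}:data-automation-profile/us.data-automation-v1',
    # Assumed field names: emit job status events to Amazon EventBridge
    notificationConfiguration={'eventBridgeConfiguration': {'eventBridgeEnabled': True}},
)
print(response['invocationArn'])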
When using Bedrock Data Automation, you can configure the output in two ways:
Standard output: delivers predefined insights for each data type, such as document semantics, video chapter summaries, and audio transcripts. With standard output, you can set up the insights you need in just a few steps.
Custom output: lets you define extraction requirements with blueprints so that the results are tailored to your specific needs.
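As a hedged sketch only, custom output can also be requested by passing one or more blueprints directly in the invocation instead of a project; the blueprints parameter shape below is my assumption from the Boto3 reference, and the blueprint ARN is a placeholder.

import boto3

REGION = '<REGION>'
ACCOUNT_ID = '<ACCOUNT_ID>'

bda_runtime = boto3.client('bedrock-data-automation-runtime', region_name=REGION)

response = bda_runtime.invoke_data_automation_async(
    inputConfiguration={'s3Uri': 's3://<BUCKET>/BDA/Input/bda-drivers-license.jpeg'},
    outputConfiguration={'s3Uri': 's3://<BUCKET>/BDA/Output'},
    # Assumed parameter shape: request custom output from a specific blueprint
    blueprints=[{'blueprintArn': f'arn:aws:bedrock:{REGION}:{ACCOUNT_ID}:blueprint/<BLUEPRINT_ID>'}],
    dataAutomationProfileArn=f'arn:aws:bedrock:{REGION}:{ACCOUNT_ID}:data-automation-profile/us.data-automation-v1',
)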
Getting started with Bedrock Data Automation
To try these new capabilities, I created a project and configured the standard output settings. For documents, I chose plain text output instead of Markdown. You can also automate these configuration steps with the Bedrock Data Automation API, as shown in the sketch below.
For videos, I want a full audio transcript and a summary of the entire video, plus a title and a summary for each chapter.
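Here is a minimal sketch of how such a project could be created with the Boto3 bedrock-data-automation control-plane client instead of the console. The create_data_automation_project operation exists in the SDK, but the nested standardOutputConfiguration layout shown below (plain-text document output, video transcript, and summaries) is a partial, assumed shape based on my reading of the API reference, so validate the field names and allowed values before using it.

import boto3

# Control-plane client for managing Data Automation projects and blueprints
bda_build = boto3.client('bedrock-data-automation', region_name='<REGION>')

# Assumed configuration shape: plain-text document output plus video transcript and summaries
response = bda_build.create_data_automation_project(
    projectName='my-multimodal-project',
    standardOutputConfiguration={
        'document': {
            'outputFormat': {
                'textFormat': {'types': ['PLAIN_TEXT']}  # Plain text instead of Markdown
            }
        },
        'video': {
            'extraction': {
                'category': {'state': 'ENABLED', 'types': ['TRANSCRIPT']}
            },
            'generativeField': {
                'state': 'ENABLED',
                'types': ['VIDEO_SUMMARY', 'CHAPTER_SUMMARY']
            }
        }
    }
)
print(response['projectArn'])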
To configure a blueprint, choose Custom output setup in the Data Automation section of the Amazon Bedrock console navigation pane. There, I selected the US-Driver-License sample blueprint. You can browse the other sample blueprints to apply AI-powered summarization and extraction to more use cases.
Sample blueprints can't be edited, so I used the Actions menu to duplicate the blueprint and add it to my project. I can then fine-tune what data is extracted and how by modifying the duplicated blueprint, for example by adding custom fields that use generative AI to extract or compute data in the format I need.
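The same can be done programmatically. The sketch below creates a small document blueprint with two custom fields using the bedrock-data-automation client. The create_blueprint operation exists in the SDK, but the schema layout (per-field type, inferenceType, and instruction) is my assumption about the expected blueprint schema format, so treat the whole payload as illustrative and check it against the current documentation.

import json
import boto3

bda_build = boto3.client('bedrock-data-automation', region_name='<REGION>')

# Assumed blueprint schema format: per-field extraction instructions
schema = {
    'class': 'US-drivers-licenses',
    'description': 'Extract key fields from a US driver license.',
    'properties': {
        'EXPIRATION_DATE': {
            'type': 'string',
            'inferenceType': 'explicit',
            'instruction': 'The expiration date of the license in YYYY-MM-DD format.'
        },
        'IS_EXPIRED': {
            'type': 'boolean',
            'inferenceType': 'inferred',
            'instruction': 'True if the expiration date is in the past.'
        }
    }
}

response = bda_build.create_blueprint(
    blueprintName='US-Driver-License-demo',
    type='DOCUMENT',
    schema=json.dumps(schema)
)
print(response['blueprint']['blueprintArn'])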
Next, I uploaded an image of a US driver's license to an S3 bucket. Then I used the following Python sample script, which calls Bedrock Data Automation through the AWS SDK for Python (Boto3) to extract text information from the image.
import json
import sys
import time

import boto3

DEBUG = False

AWS_REGION = '<REGION>'
BUCKET_NAME = '<BUCKET>'
INPUT_PATH = 'BDA/Input'
OUTPUT_PATH = 'BDA/Output'

PROJECT_ID = '<PROJECT_ID>'
BLUEPRINT_NAME = 'US-Driver-License-demo'

# Fields to display
BLUEPRINT_FIELDS = [
    'NAME_DETAILS/FIRST_NAME',
    'NAME_DETAILS/MIDDLE_NAME',
    'NAME_DETAILS/LAST_NAME',
    'DATE_OF_BIRTH',
    'DATE_OF_ISSUE',
    'EXPIRATION_DATE'
]

# AWS SDK for Python (Boto3) clients
bda = boto3.client('bedrock-data-automation-runtime', region_name=AWS_REGION)
s3 = boto3.client('s3', region_name=AWS_REGION)
sts = boto3.client('sts')


def log(data):
    # Print debug output only when DEBUG is enabled
    if DEBUG:
        if type(data) is dict:
            text = json.dumps(data, indent=4)
        else:
            text = str(data)
        print(text)


def get_aws_account_id() -> str:
    return sts.get_caller_identity().get('Account')


def get_json_object_from_s3_uri(s3_uri) -> dict:
    # Download a JSON object from an s3://bucket/key URI and parse it
    s3_uri_split = s3_uri.split('/')
    bucket = s3_uri_split[2]
    key = '/'.join(s3_uri_split[3:])
    object_content = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
    return json.loads(object_content)


def invoke_data_automation(input_s3_uri, output_s3_uri, data_automation_arn, aws_account_id) -> dict:
    # Start an asynchronous Bedrock Data Automation job
    params = {
        'inputConfiguration': {
            's3Uri': input_s3_uri
        },
        'outputConfiguration': {
            's3Uri': output_s3_uri
        },
        'dataAutomationConfiguration': {
            'dataAutomationProjectArn': data_automation_arn
        },
        'dataAutomationProfileArn': f"arn:aws:bedrock:{AWS_REGION}:{aws_account_id}:data-automation-profile/us.data-automation-v1"
    }

    response = bda.invoke_data_automation_async(**params)
    log(response)

    return response


def wait_for_data_automation_to_complete(invocation_arn, loop_time_in_seconds=1) -> dict:
    # Poll the job status until it reaches a terminal state
    while True:
        response = bda.get_data_automation_status(
            invocationArn=invocation_arn
        )
        status = response['status']
        if status not in ['Created', 'InProgress']:
            print(f" {status}")
            return response
        print(".", end='', flush=True)
        time.sleep(loop_time_in_seconds)


def print_document_results(standard_output_result):
    print(f"Number of pages: {standard_output_result['metadata']['number_of_pages']}")
    for page in standard_output_result['pages']:
        print(f"- Page {page['page_index']}")
        if 'text' in page['representation']:
            print(f"{page['representation']['text']}")
        if 'markdown' in page['representation']:
            print(f"{page['representation']['markdown']}")


def print_video_results(standard_output_result):
    print(f"Duration: {standard_output_result['metadata']['duration_millis']} ms")
    print(f"Summary: {standard_output_result['video']['summary']}")
    statistics = standard_output_result['statistics']
    print("Statistics:")
    print(f"- Speaker count: {statistics['speaker_count']}")
    print(f"- Chapter count: {statistics['chapter_count']}")
    print(f"- Shot count: {statistics['shot_count']}")
    for chapter in standard_output_result['chapters']:
        print(f"Chapter {chapter['chapter_index']} {chapter['start_timecode_smpte']}-{chapter['end_timecode_smpte']} ({chapter['duration_millis']} ms)")
        if 'summary' in chapter:
            print(f"- Chapter summary: {chapter['summary']}")


def print_custom_results(custom_output_result):
    matched_blueprint_name = custom_output_result['matched_blueprint']['name']
    log(custom_output_result)
    print('\n- Custom output')
    print(f"Matched blueprint: {matched_blueprint_name} Confidence: {custom_output_result['matched_blueprint']['confidence']}")
    print(f"Document class: {custom_output_result['document_class']['type']}")
    if matched_blueprint_name == BLUEPRINT_NAME:
        print('\n- Fields')
        for field_with_group in BLUEPRINT_FIELDS:
            print_field(field_with_group, custom_output_result)


def print_results(job_metadata_s3_uri) -> None:
    job_metadata = get_json_object_from_s3_uri(job_metadata_s3_uri)
    log(job_metadata)

    for segment in job_metadata['output_metadata']:
        asset_id = segment['asset_id']
        print(f'\nAsset ID: {asset_id}')

        for segment_metadata in segment['segment_metadata']:

            # Standard output
            standard_output_path = segment_metadata['standard_output_path']
            standard_output_result = get_json_object_from_s3_uri(standard_output_path)
            log(standard_output_result)

            print('\n- Standard output')
            semantic_modality = standard_output_result['metadata']['semantic_modality']
            print(f"Semantic modality: {semantic_modality}")
            match semantic_modality:
                case 'DOCUMENT':
                    print_document_results(standard_output_result)
                case 'VIDEO':
                    print_video_results(standard_output_result)

            # Custom output
            if 'custom_output_status' in segment_metadata and segment_metadata['custom_output_status'] == 'MATCH':
                custom_output_path = segment_metadata['custom_output_path']
                custom_output_result = get_json_object_from_s3_uri(custom_output_path)
                print_custom_results(custom_output_result)


def print_field(field_with_group, custom_output_result) -> None:
    inference_result = custom_output_result['inference_result']
    explainability_info = custom_output_result['explainability_info'][0]
    if '/' in field_with_group:
        # For fields part of a group
        (group, field) = field_with_group.split('/')
        inference_result = inference_result[group]
        explainability_info = explainability_info[group]
    else:
        field = field_with_group
    value = inference_result[field]
    confidence = explainability_info[field]['confidence']
    print(f"{field}: {value or '<EMPTY>'} Confidence: {confidence}")


def main() -> None:
    if len(sys.argv) < 2:
        print("Please provide a filename as command line argument")
        sys.exit(1)

    file_name = sys.argv[1]

    aws_account_id = get_aws_account_id()
    input_s3_uri = f"s3://{BUCKET_NAME}/{INPUT_PATH}/{file_name}"  # File
    output_s3_uri = f"s3://{BUCKET_NAME}/{OUTPUT_PATH}"  # Folder

    data_automation_arn = f"arn:aws:bedrock:{AWS_REGION}:{aws_account_id}:data-automation-project/{PROJECT_ID}"

    print(f"Invoking Bedrock Data Automation for '{file_name}'", end='', flush=True)

    data_automation_response = invoke_data_automation(input_s3_uri, output_s3_uri, data_automation_arn, aws_account_id)
    data_automation_status = wait_for_data_automation_to_complete(data_automation_response['invocationArn'])

    if data_automation_status['status'] == 'Success':
        job_metadata_s3_uri = data_automation_status['outputConfiguration']['s3Uri']
        print_results(job_metadata_s3_uri)


if __name__ == "__main__":
    main()
In the initial configuration of the script, you specify the S3 bucket to use for input and output, the location of the input file in the bucket, the output path for the results, the project ID used to get custom output from Bedrock Data Automation, and the blueprint name to show in the output.
After I ran the script on a multimodal file, the output shows the information extracted by Bedrock Data Automation. There is a match for the US-Driver-License blueprint, and the names and dates from the driver's license are printed in the output.
python bda-ga.py bda-drivers-license.jpeg
Invoking Bedrock Data Automation for 'bda-drivers-license.jpeg'................ Success
Asset ID: 0
- Standard output
Semantic modality: DOCUMENT
Number of pages: 1
- Page 0
NEW JERSEY
Motor Vehicle
Commission
AUTO DRIVER LICENSE
Could DL M6454 64774 51685 CLASS D
DOB 01-01-1968
ISS 03-19-2019 EXP 01-01-2023
MONTOYA RENEE MARIA 321 GOTHAM AVENUE TRENTON, NJ 08666 OF
END NONE
RESTR NONE
SEX F HGT 5'-08" EYES HZL ORGAN DONOR
CM ST201907800000019 CHG 11.00
[SIGNATURE]
- Custom output
Matched blueprint: US-Driver-License-copy Confidence: 1
Document class: US-drivers-licenses
- Fields
FIRST_NAME: RENEE Confidence: 0.859375
MIDDLE_NAME: MARIA Confidence: 0.83203125
LAST_NAME: MONTOYA Confidence: 0.875
DATE_OF_BIRTH: 1968-01-01 Confidence: 0.890625
DATE_OF_ISSUE: 2019-03-19 Confidence: 0.79296875
EXPIRATION_DATE: 2023-01-01 Confidence: 0.93359375
I then ran the same script on a video file. To keep the output compact, I didn't print the full audio transcript or the text detected in the video.
python bda.py mike-video.mp4
Invoking Bedrock Data Automation for 'mike-video.mp4'.......................................................................................................................................................................................................................................................................... Success
Asset ID: 0
- Standard output
Semantic modality: VIDEO
Duration: 810476 ms
Summary: In this comprehensive demonstration, a technical expert explores the capabilities and limitations of Large Language Models (LLMs) while showcasing a practical application using AWS services. He begins by addressing a common misconception about LLMs, explaining that while they possess general world knowledge from their training data, they lack current, real-time information unless connected to external data sources.
To illustrate this concept, he demonstrates an "Outfit Planner" application that provides clothing recommendations based on location and weather conditions. Using Brisbane, Australia as an example, the application combines LLM capabilities with real-time weather data to suggest appropriate attire like lightweight linen shirts, shorts, and hats for the tropical climate.
The demonstration then shifts to the Amazon Bedrock platform, which enables users to build and scale generative AI applications using foundation models. The speaker showcases the "OutfitAssistantAgent," explaining how it accesses real-time weather data to make informed clothing recommendations. Through the platform's "Show Trace" feature, he reveals the agent's decision-making process and how it retrieves and processes location and weather information.
The technical implementation details are explored as the speaker configures the OutfitAssistant using Amazon Bedrock. The agent's workflow is designed to be fully serverless and managed within the Amazon Bedrock service.
Further diving into the technical aspects, the presentation covers the AWS Lambda console integration, showing how to create action group functions that connect to external services like the OpenWeatherMap API. The speaker emphasizes that LLMs become truly useful when connected to tools providing relevant data sources, whether databases, text files, or external APIs.
The presentation concludes with the speaker encouraging viewers to explore more AWS developer content and engage with the channel through likes and subscriptions, reinforcing the practical value of combining LLMs with external data sources for creating powerful, context-aware applications.
Statistics:
- Speaker count: 1
- Chapter count: 6
- Shot count: 48
Chapter 0 00:00:00:00-00:01:32:01 (92025 ms)
- Chapter summary: A man with a beard and glasses, wearing a gray hooded sweatshirt with various logos and text, is sitting at a desk in front of a colorful background. He discusses the frequent release of new large language models (LLMs) and how people often test these models by asking questions like "Who won the World Series?" The man explains that LLMs are trained on general data from the internet, so they may have information about past events but not current ones. He then poses the question of what he wants from an LLM, stating that he desires general world knowledge, such as understanding basic concepts like "up is up" and "down is down," but does not need specific factual knowledge. The man suggests that he can attach other systems to the LLM to access current factual data relevant to his needs. He emphasizes the importance of having general world knowledge and the ability to use tools and be linked into agentic workflows, which he refers to as "agentic workflows." The man encourages the audience to add this term to their spell checkers, as it will likely become commonly used.
Chapter 1 00:01:32:01-00:03:38:18 (126560 ms)
- Chapter summary: The video showcases a man with a beard and glasses demonstrating an "Outfit Planner" application on his laptop. The application allows users to input their location, such as Brisbane, Australia, and receive recommendations for appropriate outfits based on the weather conditions. The man explains that the application generates these recommendations using large language models, which can sometimes provide inaccurate or hallucinated information since they lack direct access to real-world data sources.
The man walks through the process of using the Outfit Planner, entering Brisbane as the location and receiving weather details like temperature, humidity, and cloud cover. He then shows how the application suggests outfit options, including a lightweight linen shirt, shorts, sandals, and a hat, along with an image of a woman wearing a similar outfit in a tropical setting.
Throughout the demonstration, the man points out the limitations of current language models in providing accurate and up-to-date information without external data connections. He also highlights the need to edit prompts and adjust settings within the application to refine the output and improve the accuracy of the generated recommendations.
Chapter 2 00:03:38:18-00:07:19:06 (220620 ms)
- Chapter summary: The video demonstrates the Amazon Bedrock platform, which allows users to build and scale generative AI applications using foundation models (FMs). [speaker_0] introduces the platform's overview, highlighting its key features like managing FMs from AWS, integrating with custom models, and providing access to leading AI startups. The video showcases the Amazon Bedrock console interface, where [speaker_0] navigates to the "Agents" section and selects the "OutfitAssistantAgent" agent. [speaker_0] tests the OutfitAssistantAgent by asking it for outfit recommendations in Brisbane, Australia. The agent provides a suggestion of wearing a light jacket or sweater due to cool, misty weather conditions. To verify the accuracy of the recommendation, [speaker_0] clicks on the "Show Trace" button, which reveals the agent's workflow and the steps it took to retrieve the current location details and weather information for Brisbane. The video explains that the agent uses an orchestration and knowledge base system to determine the appropriate response based on the user's query and the retrieved data. It highlights the agent's ability to access real-time information like location and weather data, which is crucial for generating accurate and relevant responses.
Chapter 3 00:07:19:06-00:11:26:13 (247214 ms)
- Chapter summary: The video demonstrates the process of configuring an AI assistant agent called "OutfitAssistant" using Amazon Bedrock. [speaker_0] introduces the agent's purpose, which is to provide outfit recommendations based on the current time and weather conditions. The configuration interface allows selecting a language model from Anthropic, in this case the Claud 3 Haiku model, and defining natural language instructions for the agent's behavior. [speaker_0] explains that action groups are groups of tools or actions that will interact with the outside world. The OutfitAssistant agent uses Lambda functions as its tools, making it fully serverless and managed within the Amazon Bedrock service. [speaker_0] defines two action groups: "get coordinates" to retrieve latitude and longitude coordinates from a place name, and "get current time" to determine the current time based on the location. The "get current weather" action requires calling the "get coordinates" action first to obtain the location coordinates, then using those coordinates to retrieve the current weather information. This demonstrates the agent's workflow and how it utilizes the defined actions to generate outfit recommendations. Throughout the video, [speaker_0] provides details on the agent's configuration, including its name, description, model selection, instructions, and action groups. The interface displays various options and settings related to these aspects, allowing [speaker_0] to customize the agent's behavior and functionality.
Chapter 4 00:11:26:13-00:13:00:17 (94160 ms)
- Chapter summary: The video showcases a presentation by [speaker_0] on the AWS Lambda console and its integration with machine learning models for building powerful agents. [speaker_0] demonstrates how to create an action group function using AWS Lambda, which can be used to generate text responses based on input parameters like location, time, and weather data. The Lambda function code is shown, utilizing external services like OpenWeatherMap API for fetching weather information. [speaker_0] explains that for a large language model to be useful, it needs to connect to tools providing relevant data sources, such as databases, text files, or external APIs. The presentation covers the process of defining actions, setting up Lambda functions, and leveraging various tools within the AWS environment to build intelligent agents capable of generating context-aware responses.
Chapter 5 00:13:00:17-00:13:28:10 (27761 ms)
- Chapter summary: A man with a beard and glasses, wearing a gray hoodie with various logos and text, is sitting at a desk in front of a colorful background. He is using a laptop computer that has stickers and logos on it, including the AWS logo. The man appears to be presenting or speaking about AWS (Amazon Web Services) and its services, such as Lambda functions and large language models. He mentions that if a Lambda function can do something, then it can be used to augment a large language model. The man concludes by expressing hope that the viewer found the video useful and insightful, and encourages them to check out other videos on the AWS developers channel. He also asks viewers to like the video, subscribe to the channel, and watch other videos.
More about multimodal summarization with Bedrock Data Automation
Amazon Bedrock Data Automation is now available via cross-Region inference in two AWS Regions: US East (N. Virginia) and US West (Oregon). When you use Bedrock Data Automation from those Regions, data can be processed with cross-Region inference in any of these four Regions: US East (Ohio, N. Virginia) and US West (N. California, Oregon). All of these Regions are in the US, so data is processed within the same geography. AWS plans to add support for more Regions in Europe and Asia later in 2025.
Compared with the preview, there is no change in pricing, and there is no additional cost for using cross-Region inference. For more information, visit the Amazon Bedrock pricing page. Bedrock Data Automation now also includes a number of security, governance, and manageability capabilities, such as the following (a short sketch after the list shows how they can surface in an API call):
AWS KMS: support for customer managed keys to encrypt multimodal content, for granular encryption control.
AWS PrivateLink: connect to the Bedrock Data Automation APIs directly from your VPC instead of over the public internet.
Tagging of resources and jobs: track costs and enforce tag-based access policies in AWS IAM.
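As a rough illustration of how these options appear in the API, the invocation below adds a customer managed KMS key and a cost-allocation tag to the asynchronous call shown earlier. The encryptionConfiguration and tags parameter shapes are assumptions based on my reading of the Boto3 reference, and the key ARN and tag values are placeholders.

import boto3

REGION = '<REGION>'
ACCOUNT_ID = '<ACCOUNT_ID>'

bda_runtime = boto3.client('bedrock-data-automation-runtime', region_name=REGION)

response = bda_runtime.invoke_data_automation_async(
    inputConfiguration={'s3Uri': 's3://<BUCKET>/BDA/Input/bda-drivers-license.jpeg'},
    outputConfiguration={'s3Uri': 's3://<BUCKET>/BDA/Output'},
    dataAutomationConfiguration={
        'dataAutomationProjectArn': f'arn:aws:bedrock:{REGION}:aws:data-automation-project/public-default'
    },
    dataAutomationProfileArn=f'arn:aws:bedrock:{REGION}:{ACCOUNT_ID}:data-automation-profile/us.data-automation-v1',
    # Assumed shape: encrypt results with a customer managed AWS KMS key
    encryptionConfiguration={'kmsKeyId': 'arn:aws:kms:<REGION>:<ACCOUNT_ID>:key/<KEY_ID>'},
    # Assumed shape: tags for cost tracking and tag-based IAM access policies
    tags=[{'key': 'project', 'value': 'media-analysis'}],
)
print(response['invocationArn'])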
In this post we used Python, but Bedrock Data Automation is available with any of the AWS SDKs. For example, you can use Java, .NET, or Rust to build a back-end document processing application; JavaScript for a web app that processes images, videos, or audio files; or Swift for a native mobile app that processes content from end users. Getting summaries and insights from multimodal data with cloud-native services is actually quite simple.