
[AWS] How to manage access restrictions on text-to-speech files using S3 presigned URLs


Introduction

I am developing "Mia," a talking cat-shaped robot that speaks various dialects.

https://mia-cat.com/en

As the next feature after the beta release, we are developing a function that lets users enter any text they want Mia to speak, together with a playback time, in the application. The ESP32 then plays the audio at that time.

The text created by the user is sent from the application to the server via an API request. The server runs text-to-speech synthesis and stores the resulting audio file in S3 under that user's directory.

However, objects under the user directory are access-restricted by default, so as things stand the ESP32 cannot download the audio.

Therefore, this time we will use a presigned URL.

What is a presigned URL?

A presigned URL is a URL that grants temporary access to an object in a cloud storage service (e.g., Amazon S3).

The URL is signed with specific permissions and a validity period, so a client can download or upload a file securely without ever holding your cloud storage credentials.

AWS official site

https://docs.aws.amazon.com/ja_jp/AmazonS3/latest/userguide/ShareObjectPreSignedURL.html

How a presigned URL works

A presigned URL works by embedding a signature and other parameters in the URL itself. The signature is generated with the signer's cloud storage credentials and ensures that the URL can be used only within the specified constraints (object, operation, and validity period).

Specific examples of query parameters

A presigned URL usually contains the following query parameters:

  • X-Amz-Algorithm: Algorithm used for the signature (e.g. AWS4-HMAC-SHA256)
  • X-Amz-Credential: Access key ID and credential scope (date, region, service) used for signing
  • X-Amz-Date: Date and time (UTC) at which the URL was signed
  • X-Amz-Expires: Validity period of the URL in seconds, counted from X-Amz-Date
  • X-Amz-SignedHeaders: Headers included in the signature
  • X-Amz-Signature: Signature of the request
ShellScript
https://your-bucket-name.s3.amazonaws.com/your-object-key?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=YOUR_CREDENTIALS&X-Amz-Date=20240724T123456Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=YOUR_SIGNATURE

Setting the validity period of a presigned URL

You can specify how long a presigned URL remains valid.

This is important because it limits how long access is allowed. In this implementation, the server updates the device shadow immediately after generating the presigned URL, and the ESP32 starts downloading as soon as it sees the update, so the validity period should be short (e.g., 2 minutes).

Now that you understand the concept, I would like to get into the implementation.

Server side (Go)

The server synthesizes the text entered by the user into speech and stores it in AWS S3. A presigned URL is then generated so that the ESP32 can access the audio file.

Presigned URL generation function

  1. Load AWS config: Load the default AWS configuration.
  2. Initialize S3 client: Create an S3 client from that configuration.
  3. Generate presigned URL: Generate a presigned URL for the specified bucket and object key.

synthesize_speech.go

Go
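import (
	"context"
	"fmt"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// GeneratePresignedURL returns a presigned GET URL for the given object
// that stays valid for the duration given by expiry.
func GeneratePresignedURL(ctx context.Context, bucketName, key string, expiry time.Duration) (string, error) {
	// Load the default AWS configuration (region, credentials).
	awsCfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		return "", fmt.Errorf("failed to load AWS config: %w", err)
	}
	s3Client := s3.NewFromConfig(awsCfg)

	// The presign client signs the request locally; no call is made to S3 here.
	presignClient := s3.NewPresignClient(s3Client)
	req, err := presignClient.PresignGetObject(ctx, &s3.GetObjectInput{
		Bucket: aws.String(bucketName),
		Key:    aws.String(key),
	}, s3.WithPresignExpires(expiry))
	if err != nil {
		return "", fmt.Errorf("failed to generate presigned URL: %w", err)
	}
	return req.URL, nil
}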

User-defined phrase processing functions

  1. Set time zone: Load JST (Japan Standard Time).
  2. Get Current Time: Get the current time in JST and format.
  3. Get schedule from database: Get user’s phrase information.
  4. Generate presigned URL: Generate a presigned URL for the obtained voice_path.
  5. Device shadow update: Update device shadow with presign URL.

worker.go

Go

// ProcessUserPhraseMessage handles a scheduled user-defined phrase.
func ProcessUserPhraseMessage(ctx context.Context, db *sqlx.DB, message Message, config *Config, shadowManager ShadowManager) {
	// Load the JST time zone
	jst, err := time.LoadLocation("Asia/Tokyo")
	if err != nil {
		log.Fatalf("Failed to load the 'Asia/Tokyo' time zone: %v", err)
	}

	// Get the current time in JST
	currentTimeJST := time.Now().In(jst)

	// Log the time and weekday used in the query (debug output)
	formattedTime := currentTimeJST.Format("15:04")
	weekday := currentTimeJST.Weekday().String()[:3]
	log.Printf("Query Time: %s", formattedTime)
	log.Printf("Query Weekday: %s", weekday)

	// Fetch the user phrase information from the schedule
	var schedule struct {
		VoicePath string `db:"voice_path"`
	}
	err = db.Get(&schedule, "SELECT up.voice_path FROM phrase_schedules ps JOIN user_phrases up ON ps.phrase_id = up.id WHERE ps.user_id = ? AND TIME_FORMAT(ps.time, '%H:%i') = ? AND FIND_IN_SET(?, ps.days) > 0;", message.UserID, formattedTime, weekday)
	if err != nil {
		log.Printf("Failed to fetch user phrase schedule for user %d: %v", message.UserID, err)
		return
	}

	// Debug log: print the fetched voice_path
	log.Printf("Fetched voice path for user %d: %s", message.UserID, schedule.VoicePath)

	// Generate the presigned URL (valid for 2 minutes)
	s3Url, err := GeneratePresignedURL(ctx, config.AWSS3ApiBucket, schedule.VoicePath, 2*time.Minute)
	if err != nil {
		log.Printf("Failed to generate presigned URL for user %d: %v", message.UserID, err)
		return
	}
	log.Printf("Generated presigned URL: %s", s3Url)

	// Update the device shadow
	user, err := GetUser(db, message.UID)
	if err != nil {
		log.Printf("Failed to get user info for device shadow update: %v", err)
		return
	}
	log.Printf("Updating device shadow for user %d with presigned URL %s", message.UserID, s3Url)
	err = UpdateDeviceShadow(ctx, shadowManager, user.DeviceID.V, s3Url, "user_phrase")
	if err != nil {
		log.Printf("Failed to update device shadow for user %d: %v", message.UserID, err)
		return
	}
	log.Printf("Scheduled task completed for user: %d", message.UserID)
}

Operation check

Server-side log

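  • Get the user's audio file path.
  • Generate a presigned URL. The code above requests 2 minutes, while the X-Amz-Expires=900 in this sample log corresponds to 15 minutes, so the log appears to have been captured with a different expiry setting.
  • Write the presigned URL to the device shadow.
  • Record that the process is complete.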
ShellScript
clocky_api_local  | 2024/07/23 21:44:00 Fetched voice path for user 1: users/1/user_phrase/user_phrase_20240721-140517.mp3
clocky_api_local  | 2024/07/23 21:44:00 Generated presigned URL: https://mia-dev-api.s3.ap-northeast-1.amazonaws.com/users/1/user_phrase/user_phrase_20240721-140517.mp3?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=XXX%2F20240723%2Fap-northeast-1%2Fs3%2Faws4_request&X-Amz-Date=20240723T214400Z&X-Amz-Expires=900&X-Amz-SignedHeaders=host&x-id=GetObject&X-Amz-Signature=XXX
clocky_api_local  | 2024/07/23 21:44:00 Updating device shadow for user 1 with presigned URL https://mia-dev-api.s3.ap-northeast-1.amazonaws.com/users/1/user_phrase/user_phrase_20240721-140517.mp3?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=XXX%2F20240723%2Fap-northeast-1%2Fs3%2Faws4_request&X-Amz-Date=20240723T214400Z&X-Amz-Expires=900&X-Amz-SignedHeaders=host&x-id=GetObject&X-Amz-Signature=XXX
clocky_api_local  | 2024/07/23 21:44:00 Scheduled task completed for user: 1

Device-side log

  • The device receives an MQTT message and detects a device shadow update that includes the presigned URL.
  • This triggers the ESP32 to start downloading the audio file.
ShellScript
06:45:05.157 > MQTTPubSubClient::onMessage: $aws/things/XXX/shadow/update/delta {"version":391,"timestamp":XXX,"state":{"config":{"user_phrase_audio_url":"https://mia-dev-api.s3.ap-northeast-1.amazonaws.com/users/1/user_phrase/user_phrase_20240721-140517.mp3?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=XXX%2F20240723%2Fap-northeast-1%2Fs3%2Faws4_request&X-Amz-Date=20240723T214400Z&X-Amz-Expires=900&X-Amz-SignedHeaders=host&x-id=GetObject&X-Amz-Signature=XXX"}},"metadata":{"config":{"user_phrase_audio_url":{"timestamp":XXX}}}}
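To show the shape of this delta document, the following sketch extracts user_phrase_audio_url from a payload like the one in the log. It is written in Go purely for illustration; the actual device firmware runs on the ESP32 and is not shown in this article. The type and function names are hypothetical, and the payload in main is a shortened placeholder.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// shadowDelta mirrors only the fields of the delta document needed here.
type shadowDelta struct {
	Version int `json:"version"`
	State   struct {
		Config struct {
			UserPhraseAudioURL string `json:"user_phrase_audio_url"`
		} `json:"config"`
	} `json:"state"`
}

// audioURLFromDelta extracts state.config.user_phrase_audio_url from a
// shadow delta payload.
func audioURLFromDelta(payload []byte) (string, error) {
	var d shadowDelta
	if err := json.Unmarshal(payload, &d); err != nil {
		return "", fmt.Errorf("decode shadow delta: %w", err)
	}
	audioURL := d.State.Config.UserPhraseAudioURL
	if audioURL == "" {
		return "", errors.New("delta has no user_phrase_audio_url")
	}
	return audioURL, nil
}

func main() {
	// Shortened placeholder payload with the same structure as the log above.
	payload := []byte(`{"version":391,"state":{"config":{"user_phrase_audio_url":"https://example.com/a.mp3?X-Amz-Expires=120"}}}`)
	audioURL, err := audioURLFromDelta(payload)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(audioURL) // https://example.com/a.mp3?X-Amz-Expires=120
}
```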

I confirmed that opening the user_phrase_audio_url from the MQTT message directly downloads the file while the URL is still valid, and returns Access Denied once it has expired.
