
[Development] Reflections on Developing a Large Feature: Priorities for Feature Development and Testing


Introduction

We are developing “Mia,” a talking cat-shaped robot that speaks in various dialects.


As the next big feature after the beta release, we spent the past two weeks developing the following:

Arbitrary-text voice playback

When the user enters a phrase they want Mia to speak, along with a playback time, the app plays the phrase at the specified time.

The feature is similar to Alexa’s voice reminders.

We decided to build it after receiving the following request from a user who had actually used Mia, since the feature seemed highly extensible.

I would like to have a mode where I can freely customize the wording & even specify the time.

It could be used as a reminder to check for forgotten items before going to school, or it could be used as a study tool by entering proverbs or miscellaneous knowledge.

Although we initially estimated about a month, we finished development in about half that time (it is currently waiting for PR review). That was good in itself, but I want to record my reflections on developing a large feature this time, namely the mindset and how to proceed with development efficiently, as a reminder to myself.

Actual development: Voice playback of arbitrary text entered in the app

First, I will describe what was actually developed, written in the style of test cases.

  • Until now, users could only select one of the default phrase types we created (5 personalities and 4 dialects) and play the phrases stored in that type. With this feature, you can enter a phrase you want Mia to speak, along with a playback time, and Mia speaks it at that time.
  • You can choose to make your phrases public or private, and if you make them public, other users will be able to import your phrases, similar to Slack’s emoji importing.
  • Once a phrase is created, it is synthesized into speech and an emotion ID (joy, anger, sadness, etc.) is assigned based on the content of the phrase, and an eye image matching that emotion is displayed during speech playback.

At its core this is a CRUD feature for arbitrary phrases, but it is more complex because the user can also set a playback schedule (date and time) for a phrase, switch it between public and private, and import public phrases.

Display (Get)
・Your phrases
If a schedule is set, it appears in the subtext. If you import a public phrase and its creator later makes it private, the imported phrase cannot be edited and an error message appears in the subtext. It can still be deleted.
・Public phrases
Public phrases are displayed.
・Infinite scroll
Scrolling down loads 20 more items at a time.
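
As a rough illustration of loading 20 items at a time, here is a minimal sketch that assumes offset-based paging over an in-memory list. The function name `fetch_page` and its return shape are hypothetical; the article does not describe how the actual API implements this.

```python
PAGE_SIZE = 20  # the article states that 20 items are loaded at a time


def fetch_page(phrases, offset, limit=PAGE_SIZE):
    """Return one page of phrases plus the information needed for the next request.

    `phrases` stands in for a database query result (assumption for illustration).
    """
    items = phrases[offset:offset + limit]
    next_offset = offset + len(items)
    has_more = next_offset < len(phrases)
    return {"items": items, "next_offset": next_offset, "has_more": has_more}


if __name__ == "__main__":
    all_phrases = [f"phrase {i}" for i in range(45)]
    page = fetch_page(all_phrases, offset=0)
    while True:
        print(len(page["items"]), "items loaded")
        if not page["has_more"]:
            break
        page = fetch_page(all_phrases, offset=page["next_offset"])
```

The client keeps the returned `next_offset` and requests the next page when the user scrolls to the bottom, stopping once `has_more` is false.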

Create
・Create phrase text and set public/private; the result is reflected.
→A new record is created in the user_phrases table only.

・Create phrase text + schedule
The playback date and time cannot be set separately; both are required. Once a schedule is set, it is reflected.
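
The rule that the date and time must be given together could be checked along these lines. This is only a sketch: the function name `build_schedule` and the choice to raise `ValueError` are assumptions for illustration, not the app's actual validation code.

```python
from datetime import date, datetime, time
from typing import Optional


def build_schedule(play_date: Optional[date], play_time: Optional[time]) -> Optional[datetime]:
    """Combine date and time into one schedule, or return None for "no schedule".

    Supplying only one of the two values is rejected.
    """
    if play_date is None and play_time is None:
        return None  # phrase text only, no schedule
    if play_date is None or play_time is None:
        raise ValueError("playback date and time must be set together")
    return datetime.combine(play_date, play_time)


if __name__ == "__main__":
    print(build_schedule(date(2024, 4, 1), time(7, 30)))
    print(build_schedule(None, None))
```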

・expression_id: An expression_id is assigned asynchronously based on the created phrase.
・voice_path: Based on the created phrase, the voice is asynchronously synthesized and uploaded to S3, and the voice_path is assigned.

Change (Update)
・Phrase text only: The phrase text can be changed, and a schedule can be added.
In this case, the existing phrase text is deleted and a new one is created.

Delete
・Phrase text only: Records are deleted from the user_phrases table.
・Phrase text + schedule: Data is deleted from both the user_phrases and phrase_schedule tables.
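
To make the two delete cases concrete, the following sketch uses an in-memory SQLite database. The table names `user_phrases` and `phrase_schedule` come from the description above; the column names and the function `delete_phrase` are hypothetical, and the real project may use a different database and ORM.

```python
import sqlite3


def create_tables(conn):
    # Column names are assumptions made for this illustration.
    conn.executescript(
        """
        CREATE TABLE user_phrases (
            id INTEGER PRIMARY KEY,
            phrase_text TEXT NOT NULL
        );
        CREATE TABLE phrase_schedule (
            id INTEGER PRIMARY KEY,
            user_phrase_id INTEGER NOT NULL REFERENCES user_phrases(id),
            play_at TEXT NOT NULL
        );
        """
    )


def delete_phrase(conn, phrase_id):
    """Delete a phrase and any schedule attached to it in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        # Remove child rows first so the delete also works with foreign keys enforced.
        conn.execute("DELETE FROM phrase_schedule WHERE user_phrase_id = ?", (phrase_id,))
        conn.execute("DELETE FROM user_phrases WHERE id = ?", (phrase_id,))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_tables(conn)
    conn.execute("INSERT INTO user_phrases (id, phrase_text) VALUES (1, 'Check your bag')")
    conn.execute("INSERT INTO phrase_schedule (user_phrase_id, play_at) VALUES (1, '2024-04-01T07:30')")
    delete_phrase(conn, 1)
    print(conn.execute("SELECT COUNT(*) FROM phrase_schedule").fetchone()[0])
```

The same function covers both cases: for a phrase without a schedule, the first `DELETE` simply matches no rows.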

Importing public phrases (Copy)
・You can import public phrases.
The imported public phrase is displayed as one of your phrases. In this case, a phrase_schedule can be added, but the phrase text cannot be edited.
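
The import rule could be modeled roughly as follows. This is a sketch under assumptions: the `UserPhrase` fields, the `source_phrase_id` marker, the choice to store the imported copy as private, and the use of `PermissionError` are all hypothetical and not taken from the actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserPhrase:
    id: int
    owner_id: int
    text: str
    is_public: bool = False
    source_phrase_id: Optional[int] = None  # set only on imported copies (assumption)


def import_phrase(source: UserPhrase, new_id: int, importer_id: int) -> UserPhrase:
    """Copy a public phrase into the importer's own phrase list."""
    if not source.is_public:
        raise PermissionError("only public phrases can be imported")
    return UserPhrase(
        id=new_id,
        owner_id=importer_id,
        text=source.text,
        is_public=False,  # assumption: the copy itself is not re-published
        source_phrase_id=source.id,
    )


def update_text(phrase: UserPhrase, new_text: str) -> None:
    """Change the phrase text; imported phrases are read-only."""
    if phrase.source_phrase_id is not None:
        raise PermissionError("the text of an imported phrase cannot be edited")
    phrase.text = new_text


if __name__ == "__main__":
    original = UserPhrase(id=1, owner_id=10, text="Good morning!", is_public=True)
    copy = import_phrase(original, new_id=2, importer_id=20)
    print(copy)
```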

Device (ESP32)
・Phrases with a defined schedule: played at the scheduled time with the matching eye expression.
・Phrases with no defined schedule: played at the talk-frequency interval between the start and end of the talk time, offset by 2 minutes to avoid colliding with playback of the phrase type (personality/dialect).
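
The 2-minute stagger can be illustrated with a small timing calculation. The device firmware runs on an ESP32 and is presumably not written in Python, so this is only a sketch of the idea; the function `talk_slots` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def talk_slots(start, end, interval_minutes, offset_minutes=0):
    """List playback times between start and end at a fixed interval.

    `offset_minutes` shifts every slot, which is how user phrases can avoid
    landing on the same minute as the default phrase type.
    """
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    slots = []
    current = start + timedelta(minutes=offset_minutes)
    step = timedelta(minutes=interval_minutes)
    while current <= end:
        slots.append(current)
        current += step
    return slots


if __name__ == "__main__":
    start = datetime(2024, 4, 1, 8, 0)
    end = datetime(2024, 4, 1, 9, 0)
    default_slots = talk_slots(start, end, interval_minutes=30)
    user_slots = talk_slots(start, end, interval_minutes=30, offset_minutes=2)
    print([t.strftime("%H:%M") for t in default_slots])
    print([t.strftime("%H:%M") for t in user_slots])
```

With a talk time of 08:00 to 09:00 and a 30-minute interval, the default phrase type plays at 08:00, 08:30, and 09:00, while unscheduled user phrases play at 08:02 and 08:32, so the two never start in the same minute.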

Some of the actual development was described in the following article.

Reflections on developing larger features

We have probably developed features of similar complexity and size before, but the feature we developed this time was quite a handful.

Get started first

As previously mentioned in this article, heavy tasks tend to be put off, so it is important to start them anyway, however reluctant you feel.

A task feels heavy because you are unsure about it or have not broken it down into doable chunks of work, so the whole thing looks vaguely large. Once you actually start, it may not be that heavy.

On the other hand, there are quite a few cases where more work turns up along the way than you imagined, and the task becomes heavy after all.

Concentrate

Smaller development tasks finish quickly, so doing them gives a sense of getting work done. However, overall efficiency is better when you focus on a single task as much as possible rather than multitasking, so this time I put off all the smaller tasks and concentrated on the large feature.

It was quite a challenge, with quite a few bugs along the way and progress that came in fits and starts, but I felt like I was climbing the mountain step by step.

Since my head is clearest and I perform best in the morning, I did the heavy development in the mornings, sometimes continuing into the afternoon, and handled other small tasks in the afternoons.

Refine features and tests while developing

Ideally, the entire design and all use cases would be worked out before development. For complex structures and processes, however, it is nearly impossible to foresee everything at the initial design stage, although this does depend on the engineer’s ability.

A very few super-engineers might be able to envision everything in advance and implement even a complex feature in the shortest possible time, but I am not one of them. I therefore developed the code as I went, adding features and tests as I noticed the need for them during development.

Only the DB design was decided up front, after consulting with the engineers working on the project with me.

There are two kinds of features and implementation work that you notice along the way.

  • Additional features that occur to you during development because they would make the product better.
  • Things that were not taken into account in the initial design.

In this case, for the first kind, we thought, “Wouldn’t it be interesting if phrases could be made public or private, and if interesting public phrases created by other users could be imported?” We added this feature in the middle of the project.

As for the second kind, examples include the following.

  • When the number of created phrases grows large, infinite scroll or paging is needed to retrieve the phrases 20 at a time.
  • Each created phrase also needs a matching eye expression, so a dedicated expression_id must be added.
  • Phrases can be scheduled, so several patterns arise: creating a phrase only, creating a phrase with a schedule, creating a phrase only and adding a schedule afterward, and so on.

In this case, we first developed the test code and API for the phrase CRUD functionality, then realized that we also needed to account for phrase schedules and the importing of public phrases, so we continued developing the feature through trial and error. Once all conceivable patterns passed, the code was refactored and the test code was finalized.
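
One way to keep track of “all conceivable patterns” is to enumerate them explicitly and drive the tests from that list. The sketch below assumes just two dimensions, schedule timing and visibility; the names and the choice of dimensions are hypothetical, and the real test suite may be organized differently.

```python
from itertools import product

# Dimensions taken from the patterns described in this article (labels are assumptions).
SCHEDULE_TIMINGS = ("none", "at_creation", "added_later")
VISIBILITIES = ("private", "public")


def enumerate_patterns():
    """Return every combination of schedule timing and visibility as a test case."""
    return [
        {"schedule": schedule, "visibility": visibility}
        for schedule, visibility in product(SCHEDULE_TIMINGS, VISIBILITIES)
    ]


if __name__ == "__main__":
    for number, pattern in enumerate(enumerate_patterns(), start=1):
        print(number, pattern)
```

Each dictionary can then be fed to a parameterized test, so that adding a newly discovered dimension (for example, imported versus self-created phrases) extends the matrix in one place.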

Summary

I am sure that development of the same or even greater weight will come up in the future. When it does, I would like to look back on this article and try to “not put it off, concentrate on it in the morning, and squash initial oversights one at a time as development proceeds.”
