2022/05/31

Android automated testing to support one-week releases

Authors: kenken, amane, HiroseKoichiro, anzaiyuki

* This article is a translation of the Japanese article written on December 10, 2021.

This article is for day 10 of Merpay Advent Calendar 2021.
Today’s article on Android automated testing to support a one-week release is brought to you by @amane, @kenken, @anzai, and @hiroP from the Merpay Android Team.

Background leading up to automation

To deliver value to users more quickly, we set out to shorten the release cycle for the Mercari app from once every two weeks to once a week. (For more information on the release cycle change, refer to this article from @stamaki.)

To achieve this faster cycle, we needed to cut regression testing from two days to one.
We decided to automate the manual tests to reduce the amount of work required.

Building a regression testing environment for Android

At Merpay, we run regression tests against the master branch once a night on devices in Firebase Test Lab. The same tests also run on any branch whose name includes "uitest".
We use a tool called TestRail to manage test cases and test tracing, and we use the TestRail API to register test results from CircleCI and Firebase Test Lab.

We use Espresso as our testing framework. We referred to the following document during implementation. Thank you very much to the author for sharing such crucial information!

Android UI Testing Starting with Espresso

Selecting test cases

When automating regression testing, you have to maintain the same quality of testing as the previous manual tests. This led us to automate the test cases designed by the QA Team, without modifying them.

However, our current system cannot automate cases that make use of the following device functions, so these are handled manually.

Booting up the camera
Using NFC
Tapping a push notification
Using position information from Google Maps
Checking screen brightness

Regression testing implementation goal and results

The QA Team has designed 238 regression tests, and running them manually took around 24 hours in total (roughly 12 hours per person).

To complete regression testing in one day, we introduced automation with the goal of reducing the number of manual tests to 120 or fewer and the total time to run them to 16 hours or fewer (eight hours per person).

We were able to automate 146 of the 238 tests. Our results are shown below.

Table. Number of manual regression tests and time required, before and after automation(1)

(1) The number of tests after automation was calculated as follows: (Total number of regression tests [238]) – (Number automated [146]) + (Number that failed during automated testing).
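
The formula in the footnote can be checked with a few lines of arithmetic. The sketch below uses the figures from this article (238 tests in total, 146 automated, a goal of 120 or fewer manual tests). The number of failed automated tests passed in is a made-up example value, not a measured result.

```python
TOTAL_REGRESSION_TESTS = 238  # from the article
AUTOMATED_TESTS = 146         # from the article
MANUAL_TEST_GOAL = 120        # goal stated in the article: 120 or fewer


def remaining_manual_tests(failed_automated: int) -> int:
    """Manual tests left = total - automated + automated tests that failed."""
    if not 0 <= failed_automated <= AUTOMATED_TESTS:
        raise ValueError("failed_automated must be between 0 and 146")
    return TOTAL_REGRESSION_TESTS - AUTOMATED_TESTS + failed_automated


def failure_budget() -> int:
    """Largest number of automated failures that still meets the goal."""
    return MANUAL_TEST_GOAL - (TOTAL_REGRESSION_TESTS - AUTOMATED_TESTS)


if __name__ == "__main__":
    # 10 is a hypothetical number of failed automated tests.
    print(remaining_manual_tests(10))  # 102
    print(failure_budget())            # 28
```

With no failures, 92 tests remain manual, so up to 28 automated tests can fail in a given run before the 120-test goal is exceeded.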

Issues and solutions

We encountered some issues while automating regression testing. These issues, and their solutions, are described below.

Issue 1: Long execution times

Automated tests run on multiple OS versions. Running the 146 automated tests on all applicable OS versions takes roughly nine hours in total, and doing this for every pull request would be very costly.
In response, we now run automated tests only on nightly builds and on pull requests that revise UI tests.

Nightly builds still run every automated test, but we use a tool called Flank to run them on Firebase Test Lab in parallel. This lets us run 10 tests in parallel for each OS version. Tests that used to take nine hours in total now finish in around 30 minutes.
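
To illustrate why splitting tests across parallel shards shortens the wall-clock time, here is a minimal sketch of a greedy "longest test first" assignment. This is not Flank's actual sharding algorithm, and the test names and durations in the example are hypothetical.

```python
import heapq
from typing import Dict, List


def assign_to_shards(durations_min: Dict[str, float], shard_count: int) -> List[List[str]]:
    """Assign each test, longest first, to the currently least-loaded shard."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    # Heap entries are (total minutes already on the shard, shard index).
    heap = [(0.0, index) for index in range(shard_count)]
    heapq.heapify(heap)
    shards: List[List[str]] = [[] for _ in range(shard_count)]
    ordered = sorted(durations_min.items(), key=lambda item: item[1], reverse=True)
    for name, minutes in ordered:
        load, index = heapq.heappop(heap)
        shards[index].append(name)
        heapq.heappush(heap, (load + minutes, index))
    return shards


def wall_clock_minutes(durations_min: Dict[str, float], shards: List[List[str]]) -> float:
    """Shards run in parallel, so the slowest shard determines the total time."""
    return max((sum(durations_min[name] for name in shard) for shard in shards), default=0.0)


if __name__ == "__main__":
    # Hypothetical data: 20 tests that take 3 minutes each.
    durations = {f"test_{i}": 3.0 for i in range(20)}
    print(wall_clock_minutes(durations, assign_to_shards(durations, 1)))   # 60.0
    print(wall_clock_minutes(durations, assign_to_shards(durations, 10)))  # 6.0
```

In practice the speed-up is smaller than the ideal ratio because each shard also pays for device setup and app installation.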

For pull requests that revise UI tests, we built a mechanism that runs only the revised tests in order to save time. By writing the package name of the tests we want to run in a comment on the pull request, we can restrict the scope of the automated run.

Figure. Example comment

Figure. Procedure to restrict testing based on GitHub comments

This process is shown above.
The GitHub API is used to obtain the comments on the pull request, and the package name found there is written into the YAML file generated for the Flank test configuration.

Excerpt from generated YAML

gcloud:
  ## Omitted

  test-targets:
    - size large
    - package <package name> # The package name obtained from the comment is placed here

  ## Omitted
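
The comment-to-YAML step could be sketched roughly as follows. The comment convention (`uitest-package: <package name>`) and the function names are assumptions made for illustration, since the team's actual comment format is not shown here. Fetching the comments from GitHub is left out, and the functions simply take comment bodies that have already been retrieved.

```python
import re
from typing import List, Optional

# Hypothetical convention: a comment line such as
#   uitest-package: com.example.payment
# The real comment format is not given in the article.
PACKAGE_LINE = re.compile(
    r"^\s*uitest-package:\s*([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)\s*$",
    re.MULTILINE,
)


def find_package(comment_bodies: List[str]) -> Optional[str]:
    """Return the package named in the most recent matching comment, if any."""
    for body in reversed(comment_bodies):
        match = PACKAGE_LINE.search(body)
        if match:
            return match.group(1)
    return None


def render_test_targets(package: Optional[str]) -> str:
    """Render the test-targets part of the Flank config as YAML text."""
    lines = ["gcloud:", "  test-targets:", "    - size large"]
    if package is not None:
        # Restrict the run to the package taken from the comment.
        lines.append(f"    - package {package}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    comments = ["LGTM", "Please run:\nuitest-package: com.example.payment"]
    print(render_test_targets(find_package(comments)))
```

Validating the package name with a strict pattern before writing it into the YAML keeps arbitrary comment text from ending up in the CI configuration.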

Issue 2: Increasing number of automated testing failures

As described earlier, we do not run regression tests on every pull request, so feature work can break tests in unanticipated ways. Left unresolved, this would increase the number of tests that fall back to manual execution, so we maintain the existing automated tests alongside development work.

An automated test can fail for any number of reasons (including API problems), and confirming the cause of every single failure would take an enormous amount of work.
To reduce the time spent on this, the Merpay Android Team prepares a report on TestRail that summarizes the results for each test case. We check it weekly to find tests with a high failure ratio and make changes as required.
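
The idea behind the weekly check can be sketched as a failure-ratio calculation per test case. This is an illustration only and does not use the TestRail API. The input format, the case IDs, and the 30% threshold are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def failure_ratios(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Map each test case ID to failures / runs, given (case_id, passed) pairs."""
    runs: Dict[str, int] = defaultdict(int)
    failures: Dict[str, int] = defaultdict(int)
    for case_id, passed in results:
        runs[case_id] += 1
        if not passed:
            failures[case_id] += 1
    return {case_id: failures[case_id] / runs[case_id] for case_id in runs}


def cases_to_review(
    results: Iterable[Tuple[str, bool]], threshold: float = 0.3
) -> List[Tuple[str, float]]:
    """Return cases at or above the threshold, highest failure ratio first."""
    flagged = [(c, r) for c, r in failure_ratios(results).items() if r >= threshold]
    return sorted(flagged, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical results collected over one week of nightly runs.
    week = [("C1", True), ("C1", False), ("C2", True),
            ("C2", True), ("C3", False), ("C3", False)]
    print(cases_to_review(week))  # [('C3', 1.0), ('C1', 0.5)]
```

Sorting by ratio puts the least reliable tests at the top, so the weekly review can start with the cases most likely to need a fix.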

Figure. TestRail report

Issue 3: Cases where reusing test customer data results in failure

Test results for Merpay may differ depending on the status of customer data used for testing.

For example, tests for adding points to an account require special attention.
We might not get the results we expect when we reuse customer data, as shown in the test below.

First round

Add 1,000 points.
1,000 points are added to the current amount, so "1,000 points" is displayed.

Second round

Add 1,000 points.
The expected test result is 1,000 points, but since the account already has 1,000 points, "2,000 points" is displayed.

In this case, the test would succeed if we obtained the current number of points before adding more and checked only the difference. However, tests run in parallel when executed from CI, so the number of points displayed may still differ from the expected value depending on when the test runs.

At Merpay, we use a tool called user-tkool, which provides an API for creating customer data in a test environment. With it, we can create new customer data before each test run, which resolves this issue.
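
The difference between reusing and recreating customer data can be shown with a small in-memory simulation. `FakeTestEnvironment` and its methods are hypothetical stand-ins written for this example. They are not the user-tkool API, whose interface is not described here.

```python
import itertools
from typing import Dict


class FakeTestEnvironment:
    """In-memory stand-in for a test backend (hypothetical, not user-tkool)."""

    def __init__(self) -> None:
        self._balances: Dict[str, int] = {}
        self._ids = itertools.count(1)

    def create_customer(self) -> str:
        """Create a new customer that starts with zero points."""
        customer_id = f"customer-{next(self._ids)}"
        self._balances[customer_id] = 0
        return customer_id

    def add_points(self, customer_id: str, points: int) -> None:
        self._balances[customer_id] += points

    def displayed_points(self, customer_id: str) -> str:
        return f"{self._balances[customer_id]:,} points"


def run_add_points_test(env: FakeTestEnvironment, customer_id: str) -> bool:
    """Add 1,000 points and check that "1,000 points" is displayed."""
    env.add_points(customer_id, 1000)
    return env.displayed_points(customer_id) == "1,000 points"


if __name__ == "__main__":
    env = FakeTestEnvironment()
    shared = env.create_customer()
    print(run_add_points_test(env, shared))  # True  (first round)
    print(run_add_points_test(env, shared))  # False (shows "2,000 points")
    # A fresh customer for each run keeps the result independent of history.
    print(run_add_points_test(env, env.create_customer()))  # True
    print(run_add_points_test(env, env.create_customer()))  # True
```

Because every run starts from a customer that no other test touches, the result no longer depends on earlier runs or on tests executing in parallel.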

Future issues

We’ve just begun to automate regression testing, and there are still plenty of issues left to resolve. Of particular concern are maintenance costs. We’d like to reduce the amount of work involved in adding more tests or modifying existing tests.

We plan to enhance automated testing templates and snippets, and implement a mechanism to automatically identify tests requiring modification from the weekly report.

Summary

In this article, we covered the efforts of the Merpay Android Team to automate regression testing, and also discussed some issues we’ve encountered and the solutions we’ve implemented.

Although automating and operating regression tests can be a lot of work, the changes we made have allowed us to maintain a one-week release schedule. We hope to continue to deliver high-quality applications quickly to our customers.