Using the Google Benchmark Library for Benchmark Testing in C++

March 9, 2022 / October 30, 2022

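Handy benchmarking tools exist, so use them!

The importance of unit testing has become widely recognized in recent years. Performance, on the other hand, is rarely discussed.

One reason is probably that performance is where differences in programmer skill show up most clearly.

Bugs are comparatively easy to deal with because tests can find them. Since the system does the detecting, it makes up for differences in skill between engineers.

With performance, however, it is hard for the system to detect a problem even when the code is extremely slow.

The reason is that measuring performance requires statistical data analysis and objective performance metrics, which calls for more specialized knowledge.

There are a great many engineers, but far fewer who can consistently write high-performance programs.

Once benchmark tests become part of your everyday performance evaluation, the quality of the software improves, of course, and you can also expect your own ability as an engineer to rise by several levels.

This article introduces how to install Google Benchmark, a benchmarking library for C++, and walks through an example.

The example compares the speed of converting strings held as char[] and as std::string to a numeric type.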

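Why benchmarks are needed

Benchmarks are an important part of measuring performance. Using them to examine performance at the level of individual functions and routines leads to a faster program overall.

Unit tests matter, as a matter of course, for keeping bugs out. At the same time, software is also expected to run fast (or at least not extremely slowly). Speed is valued especially highly in C++, so you should make a point of using benchmark tests and similar tools to maintain a certain level of speed.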

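What is Google Benchmark?

Google Benchmark is a library for benchmark testing that makes it easy to measure speed. Like Google Test for unit tests, it comes with a variety of convenient tools.

Writing code to measure speed, for benchmarks and the like, is more tedious than you might expect.

The usual approach is to record a start time and an end time around the code of interest, with std::chrono for example, and take the difference as the elapsed time. Writing this into the code every time is a chore, and to get a fair result you also have to run the measurement several times and compute the mean, deviation, and so on. When someone who does not understand the characteristics of the language well, or who is not used to data analysis, evaluates performance, the evaluation often turns out to be meaningless (or, in the worst case, wrong).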
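
As a rough illustration of the manual approach just described, the sketch below times a callable several times with std::chrono::steady_clock and returns the individual durations. The helper name time_runs_ns is made up for this sketch and is not part of any library.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: runs `fn` `repeats` times and returns each run's
// wall-clock duration in nanoseconds, measured with a monotonic clock.
template <typename Fn>
std::vector<double> time_runs_ns(Fn&& fn, std::size_t repeats) {
  using Clock = std::chrono::steady_clock;
  std::vector<double> samples;
  samples.reserve(repeats);
  for (std::size_t i = 0; i < repeats; ++i) {
    const auto start = Clock::now();  // start time
    fn();                             // the code being measured
    const auto stop = Clock::now();   // end time
    samples.push_back(
        std::chrono::duration<double, std::nano>(stop - start).count());
  }
  return samples;
}
```

Even with a helper like this, the mean and deviation still have to be computed from the returned samples, and nothing here stops the compiler from optimizing the measured code away. That is the kind of bookkeeping a benchmark library takes over.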

With Google Benchmark, benchmarks can be run comparatively easily and with stable results.

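Basic usage

Basic usage is as follows, and it is extremely simple.

Create a function of the form static void BM_SomeFunction(benchmark::State& state).
Write the code to be measured inside for (auto _ : state) {}.
Register the function name with the BENCHMARK() macro.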


static void BM_SomeFunction(benchmark::State& state) {
  // Perform setup here
  for (auto _ : state) {
    // This code gets timed
    SomeFunction();
  }
}
// Register the function as a benchmark
BENCHMARK(BM_SomeFunction);
// Run the benchmark
BENCHMARK_MAIN();

Installing Google Benchmark

First, to set up Google Benchmark, this section shows how to download the source code from the GitHub repository, then build and install it.

Download and build


git clone https://github.com/google/benchmark.git
cd benchmark
cmake -DBENCHMARK_DOWNLOAD_DEPENDENCIES=on -DCMAKE_BUILD_TYPE=Release -S . -B "build"
cmake --build "build" --config Release
cmake --install build --prefix /path/to/install

Test

To be safe, check that all tests pass.


cmake -E chdir "build" ctest --build-config Release

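If you see output like the following, the build is fine. The set of tests changes between versions, so look at the line 100% tests passed, 0 tests failed out of ... near the bottom; as long as every test passes, there is no problem.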


      Start  1: benchmark
 1/71 Test  #1: benchmark ..................................   Passed    3.28 sec
      Start  2: spec_arg
 2/71 Test  #2: spec_arg ...................................   Passed    0.00 sec
      Start  3: benchmark_setup_teardown
 3/71 Test  #3: benchmark_setup_teardown ...................   Passed    0.01 sec
      Start  4: filter_simple
 4/71 Test  #4: filter_simple ..............................   Passed    0.00 sec
      Start  5: filter_simple_list_only
 5/71 Test  #5: filter_simple_list_only ....................   Passed    0.00 sec
      Start  6: filter_simple_negative
 6/71 Test  #6: filter_simple_negative .....................   Passed    0.00 sec
      Start  7: filter_simple_negative_list_only
 7/71 Test  #7: filter_simple_negative_list_only ...........   Passed    0.00 sec
      Start  8: filter_suffix
 8/71 Test  #8: filter_suffix ..............................   Passed    0.00 sec
      Start  9: filter_suffix_list_only
 9/71 Test  #9: filter_suffix_list_only ....................   Passed    0.00 sec
      Start 10: filter_suffix_negative
10/71 Test #10: filter_suffix_negative .....................   Passed    0.00 sec
      Start 11: filter_suffix_negative_list_only
11/71 Test #11: filter_suffix_negative_list_only ...........   Passed    0.00 sec
      Start 12: filter_regex_all
12/71 Test #12: filter_regex_all ...........................   Passed    0.00 sec
      Start 13: filter_regex_all_list_only
13/71 Test #13: filter_regex_all_list_only .................   Passed    0.00 sec
      Start 14: filter_regex_all_negative
14/71 Test #14: filter_regex_all_negative ..................   Passed    0.00 sec
      Start 15: filter_regex_all_negative_list_only
15/71 Test #15: filter_regex_all_negative_list_only ........   Passed    0.00 sec
      Start 16: filter_regex_blank
16/71 Test #16: filter_regex_blank .........................   Passed    0.00 sec
      Start 17: filter_regex_blank_list_only
17/71 Test #17: filter_regex_blank_list_only ...............   Passed    0.00 sec
      Start 18: filter_regex_blank_negative
18/71 Test #18: filter_regex_blank_negative ................   Passed    0.00 sec
      Start 19: filter_regex_blank_negative_list_only
19/71 Test #19: filter_regex_blank_negative_list_only ......   Passed    0.00 sec
      Start 20: filter_regex_none
20/71 Test #20: filter_regex_none ..........................   Passed    0.00 sec
      Start 21: filter_regex_none_list_only
21/71 Test #21: filter_regex_none_list_only ................   Passed    0.00 sec
      Start 22: filter_regex_none_negative
22/71 Test #22: filter_regex_none_negative .................   Passed    0.00 sec
      Start 23: filter_regex_none_negative_list_only
23/71 Test #23: filter_regex_none_negative_list_only .......   Passed    0.00 sec
      Start 24: filter_regex_wildcard
24/71 Test #24: filter_regex_wildcard ......................   Passed    0.00 sec
      Start 25: filter_regex_wildcard_list_only
25/71 Test #25: filter_regex_wildcard_list_only ............   Passed    0.00 sec
      Start 26: filter_regex_wildcard_negative
26/71 Test #26: filter_regex_wildcard_negative .............   Passed    0.01 sec
      Start 27: filter_regex_wildcard_negative_list_only
27/71 Test #27: filter_regex_wildcard_negative_list_only ...   Passed    0.00 sec
      Start 28: filter_regex_begin
28/71 Test #28: filter_regex_begin .........................   Passed    0.00 sec
      Start 29: filter_regex_begin_list_only
29/71 Test #29: filter_regex_begin_list_only ...............   Passed    0.00 sec
      Start 30: filter_regex_begin_negative
30/71 Test #30: filter_regex_begin_negative ................   Passed    0.00 sec
      Start 31: filter_regex_begin_negative_list_only
31/71 Test #31: filter_regex_begin_negative_list_only ......   Passed    0.00 sec
      Start 32: filter_regex_begin2
32/71 Test #32: filter_regex_begin2 ........................   Passed    0.00 sec
      Start 33: filter_regex_begin2_list_only
33/71 Test #33: filter_regex_begin2_list_only ..............   Passed    0.00 sec
      Start 34: filter_regex_begin2_negative
34/71 Test #34: filter_regex_begin2_negative ...............   Passed    0.00 sec
      Start 35: filter_regex_begin2_negative_list_only
35/71 Test #35: filter_regex_begin2_negative_list_only .....   Passed    0.00 sec
      Start 36: filter_regex_end
36/71 Test #36: filter_regex_end ...........................   Passed    0.00 sec
      Start 37: filter_regex_end_list_only
37/71 Test #37: filter_regex_end_list_only .................   Passed    0.00 sec
      Start 38: filter_regex_end_negative
38/71 Test #38: filter_regex_end_negative ..................   Passed    0.00 sec
      Start 39: filter_regex_end_negative_list_only
39/71 Test #39: filter_regex_end_negative_list_only ........   Passed    0.00 sec
      Start 40: options_benchmarks
40/71 Test #40: options_benchmarks .........................   Passed    2.26 sec
      Start 41: basic_benchmark
41/71 Test #41: basic_benchmark ............................   Passed    0.76 sec
      Start 42: repetitions_benchmark
42/71 Test #42: repetitions_benchmark ......................   Passed    0.02 sec
      Start 43: diagnostics_test
43/71 Test #43: diagnostics_test ...........................   Passed    0.04 sec
      Start 44: skip_with_error_test
44/71 Test #44: skip_with_error_test .......................   Passed    0.16 sec
      Start 45: donotoptimize_test
45/71 Test #45: donotoptimize_test .........................   Passed    0.00 sec
      Start 46: fixture_test
46/71 Test #46: fixture_test ...............................   Passed    0.05 sec
      Start 47: register_benchmark_test
47/71 Test #47: register_benchmark_test ....................   Passed    0.00 sec
      Start 48: map_test
48/71 Test #48: map_test ...................................   Passed    0.25 sec
      Start 49: multiple_ranges_test
49/71 Test #49: multiple_ranges_test .......................   Passed    0.34 sec
      Start 50: args_product_test
50/71 Test #50: args_product_test ..........................   Passed    0.26 sec
      Start 51: link_main_test
51/71 Test #51: link_main_test .............................   Passed    0.02 sec
      Start 52: reporter_output_test
52/71 Test #52: reporter_output_test .......................   Passed    0.24 sec
      Start 53: templated_fixture_test
53/71 Test #53: templated_fixture_test .....................   Passed    0.03 sec
      Start 54: user_counters_test
54/71 Test #54: user_counters_test .........................   Passed    0.24 sec
      Start 55: perf_counters_test
55/71 Test #55: perf_counters_test .........................   Passed    0.00 sec
      Start 56: internal_threading_test
56/71 Test #56: internal_threading_test ....................   Passed    1.82 sec
      Start 57: report_aggregates_only_test
57/71 Test #57: report_aggregates_only_test ................   Passed    0.00 sec
      Start 58: display_aggregates_only_test
58/71 Test #58: display_aggregates_only_test ...............   Passed    0.00 sec
      Start 59: user_counters_tabular_test
59/71 Test #59: user_counters_tabular_test .................   Passed    0.26 sec
      Start 60: user_counters_thousands_test
60/71 Test #60: user_counters_thousands_test ...............   Passed    0.01 sec
      Start 61: memory_manager_test
61/71 Test #61: memory_manager_test ........................   Passed    0.03 sec
      Start 62: cxx03
62/71 Test #62: cxx03 ......................................   Passed    0.18 sec
      Start 63: complexity_benchmark
63/71 Test #63: complexity_benchmark .......................   Passed    1.29 sec
      Start 64: benchmark_gtest
64/71 Test #64: benchmark_gtest ............................   Passed    0.00 sec
      Start 65: benchmark_name_gtest
65/71 Test #65: benchmark_name_gtest .......................   Passed    0.00 sec
      Start 66: benchmark_random_interleaving_gtest
66/71 Test #66: benchmark_random_interleaving_gtest ........   Passed    0.01 sec
      Start 67: commandlineflags_gtest
67/71 Test #67: commandlineflags_gtest .....................   Passed    0.00 sec
      Start 68: statistics_gtest
68/71 Test #68: statistics_gtest ...........................   Passed    0.00 sec
      Start 69: string_util_gtest
69/71 Test #69: string_util_gtest ..........................   Passed    0.00 sec
      Start 70: perf_counters_gtest
70/71 Test #70: perf_counters_gtest ........................   Passed    0.00 sec
      Start 71: time_unit_gtest
71/71 Test #71: time_unit_gtest ............................   Passed    0.00 sec

100% tests passed, 0 tests failed out of 71

Total Test time (real) =  11.73 sec

Install

Install with the following command.


cmake --install build --prefix /path/to/install

/path/to/install is the installation destination. Set it to any path you like.

If the installation succeeds, output like the following is printed.


-- Install configuration: "Release"
-- Installing: /path/to/install/google_benchmark/lib/libbenchmark.a
-- Installing: /path/to/install/google_benchmark/lib/libbenchmark_main.a
-- Installing: /path/to/install/google_benchmark/include/benchmark
-- Installing: /path/to/install/google_benchmark/include/benchmark/benchmark.h
-- Up-to-date: /path/to/install/google_benchmark/include/benchmark
-- Installing: /path/to/install/google_benchmark/include/benchmark/export.h
-- Installing: /path/to/install/google_benchmark/lib/cmake/benchmark/benchmarkConfig.cmake
-- Installing: /path/to/install/google_benchmark/lib/cmake/benchmark/benchmarkConfigVersion.cmake
-- Installing: /path/to/install/google_benchmark/lib/pkgconfig/benchmark.pc
-- Installing: /path/to/install/google_benchmark/lib/cmake/benchmark/benchmarkTargets.cmake
-- Installing: /path/to/install/google_benchmark/lib/cmake/benchmark/benchmarkTargets-release.cmake
-- Installing: /path/to/install/google_benchmark/share/doc/benchmark
-- Installing: /path/to/install/google_benchmark/share/doc/benchmark/perf_counters.md
-- Installing: /path/to/install/google_benchmark/share/doc/benchmark/user_guide.md
-- Installing: /path/to/install/google_benchmark/share/doc/benchmark/dependencies.md
-- Installing: /path/to/install/google_benchmark/share/doc/benchmark/tools.md
-- Installing: /path/to/install/google_benchmark/share/doc/benchmark/random_interleaving.md
-- Installing: /path/to/install/google_benchmark/share/doc/benchmark/_config.yml
-- Installing: /path/to/install/google_benchmark/share/doc/benchmark/AssemblyTests.md
-- Installing: /path/to/install/google_benchmark/share/doc/benchmark/releasing.md
-- Installing: /path/to/install/google_benchmark/share/doc/benchmark/index.md
-- Installing: /path/to/install/google_benchmark/share/doc/benchmark/platform_specific_build_instructions.md

If you do not need to install

If you do not need, or do not want, to install the library, you can also use it without installing.

Once the build has finished, the header files are in benchmark/include and benchmark/build/include and the library is in benchmark/build/src, so pointing the include paths and the library search path at these directories is enough.

For example, with the repository cloned to /path/to/benchmark:


g++ main.cpp -std=c++11 -isystem /path/to/benchmark/include -isystem /path/to/benchmark/build/include -L /path/to/benchmark/build/src -lbenchmark -lpthread -o mybenchmark

Verifying that Google Benchmark works

Let's run the simple benchmark given in the Google Benchmark documentation.

This benchmark measures and compares two operations:

1. Simply constructing a std::string
2. Copying a string into a std::string


#include <benchmark/benchmark.h>

#include <string>

static void BM_StringCreation(benchmark::State& state) {
  for (auto _ : state)
    std::string empty_string;
}
// Register the function as a benchmark
BENCHMARK(BM_StringCreation);

// Define another benchmark
static void BM_StringCopy(benchmark::State& state) {
  std::string x = "hello";
  for (auto _ : state)
    std::string copy(x);
}
BENCHMARK(BM_StringCopy);

BENCHMARK_MAIN();

Compile it.


g++ main.cpp -std=c++11 -isystem /path/to/install/include -L /path/to/install/lib -lbenchmark -lpthread -o mybenchmark

Run it and check the result.


./mybenchmark


Running ./mybenchmark
Run on (32 X 3493.48 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 64 KiB (x16)
  L2 Unified 512 KiB (x16)
  L3 Unified 8192 KiB (x4)
Load Average: 0.09, 0.09, 0.22
------------------------------------------------------------
Benchmark                  Time             CPU   Iterations
------------------------------------------------------------
BM_StringCreation       4.07 ns         4.07 ns    174563674
BM_StringCopy           10.8 ns         10.8 ns     59074443

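Lines 2-8 of the output show information about the CPU (and the load average). Lines 12 and 13 show the benchmark results.

The execution time is 4.07 ns for "1. Simply constructing a std::string" (BM_StringCreation) and 10.8 ns for "2. Copying a string into a std::string" (BM_StringCopy). As expected, copying the string takes longer.

The above was built with the compile options exactly as given for Google Benchmark, that is, without optimization. Release builds, however, normally use optimization options.

Let's measure again with the optimization option -O2.

GCC

The following is the result of compiling with g++.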


g++ main.cpp -std=c++11 -O2 -isystem /path/to/install/include -L /path/to/install/lib -lbenchmark -lpthread -o mybenchmark


./mybenchmark


Running ./mybenchmark
Run on (32 X 3493.48 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 64 KiB (x16)
  L2 Unified 512 KiB (x16)
  L3 Unified 8192 KiB (x4)
Load Average: 0.10, 0.09, 0.09
------------------------------------------------------------
Benchmark                  Time             CPU   Iterations
------------------------------------------------------------
BM_StringCreation      0.000 ns        0.000 ns   1000000000
BM_StringCopy           4.29 ns         4.29 ns    159471390

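With optimization, BM_StringCreation reports 0.000 ns, so its measured time is effectively zero. This most likely means that the compiler removed the construction of the unused string altogether, not that constructing a string costs nothing. BM_StringCopy came in at 4.29 ns, down from 10.8 ns. Both numbers are lower than without optimization.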

Clang

Let's also try clang++.


clang++ main.cpp -std=c++11 -O2 -isystem /path/to/install/include -L /path/to/install/lib -lbenchmark -lpthread -o mybenchmark


./mybenchmark


Running ./mybenchmark
Run on (32 X 3493.48 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 64 KiB (x16)
  L2 Unified 512 KiB (x16)
  L3 Unified 8192 KiB (x4)
Load Average: 0.15, 0.08, 0.08
------------------------------------------------------------
Benchmark                  Time             CPU   Iterations
------------------------------------------------------------
BM_StringCreation      0.032 ns        0.032 ns   1000000000
BM_StringCopy           3.90 ns         3.89 ns    181667338

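With clang++, BM_StringCreation is 0.032 ns, slightly more than the 0.000 ns of g++, although at this scale both are effectively zero. BM_StringCopy is 3.89 ns, faster than the 4.29 ns of g++. In this way, differences between compilers are also easy to check.

Measuring something ourselves

Just running the tutorial is not very interesting, and it is also hard to learn much from, so let's create a new example.

The example measures the speed of converting a string to a number. The functions used are std::atoi and std::stoi.

The following benchmark measures the speed of converting the string "123" to an int, once from a char[] and once from a std::string.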


#include <benchmark/benchmark.h>

#include <cstdlib>
#include <string>

static void BM_CharToInt(benchmark::State &state) {
  char str[] = "123";
  for (auto _ : state)
    auto val = std::atoi(str);
}
BENCHMARK(BM_CharToInt);

static void BM_StringToInt(benchmark::State &state) {
  std::string str = "123";
  for (auto _ : state)
    auto val = std::stoi(str);
}
BENCHMARK(BM_StringToInt);

BENCHMARK_MAIN();

Without optimization


clang++ main.cpp -std=c++11 -isystem /path/to/install/include -L /path/to/install/lib -lbenchmark -lpthread -o mybenchmark


./mybenchmark


Running ./mybenchmark
Run on (32 X 3493.48 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 64 KiB (x16)
  L2 Unified 512 KiB (x16)
  L3 Unified 8192 KiB (x4)
Load Average: 0.08, 0.11, 0.10
---------------------------------------------------------
Benchmark               Time             CPU   Iterations
---------------------------------------------------------
BM_CharToInt         13.8 ns         13.8 ns     49526467
BM_StringToInt       33.0 ns         33.0 ns     21382077

Without optimization, converting from char[] is faster than converting from std::string.

With optimization (-O2)


clang++ main.cpp -O2 -std=c++11 -isystem /path/to/install/include -L /path/to/install/lib -lbenchmark -lpthread -o mybenchmark


./mybenchmark


Running ./mybenchmark
Run on (32 X 3493.48 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 64 KiB (x16)
  L2 Unified 512 KiB (x16)
  L3 Unified 8192 KiB (x4)
Load Average: 0.20, 0.16, 0.11
---------------------------------------------------------
Benchmark               Time             CPU   Iterations
---------------------------------------------------------
BM_CharToInt         11.3 ns         11.3 ns     62193614
BM_StringToInt       11.1 ns         11.1 ns     648850377

With optimization, the two conversions, from char[] and from std::string, run at practically the same speed in this measurement.
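
When the two calls are about equally fast, the choice between them may come down to behaviour instead. The sketch below is an illustration added for this point and is not part of the measured code. It checks that both functions produce the same value for well-formed input and shows how they differ on text that is not a number: std::atoi returns 0, while std::stoi throws std::invalid_argument. The helper names stoi_rejects and conversions_agree are hypothetical.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// Returns true if std::stoi rejects `text` by throwing std::invalid_argument.
inline bool stoi_rejects(const std::string& text) {
  try {
    (void)std::stoi(text);
    return false;
  } catch (const std::invalid_argument&) {
    return true;
  }
}

// Returns true if both conversions yield the same int.
// Only meant for well-formed numeric input; std::stoi throws otherwise.
inline bool conversions_agree(const char* text) {
  return std::atoi(text) == std::stoi(std::string(text));
}
```

For "123" both conversions agree, whereas for "abc" std::atoi silently yields 0 and stoi_rejects returns true.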

Operational tips

Changing the output format

The default is the console format shown below; json and csv can be selected as well.

Pass --benchmark_format={json|console|csv} as a command-line argument.

The formats look like this.

json format

{
  "context": {
    "date": "2022-05-06T16:36:02+00:00",
    "host_name": "d4eaabbfede7",
    "executable": "./mybenchmark",
    "num_cpus": 32,
    "mhz_per_cpu": 3493,
    "cpu_scaling_enabled": false,
    "caches": [
      {
        "type": "Data",
        "level": 1,
        "size": 32768,
        "num_sharing": 2
      },
      {
        "type": "Instruction",
        "level": 1,
        "size": 65536,
        "num_sharing": 2
      },
      {
        "type": "Unified",
        "level": 2,
        "size": 524288,
        "num_sharing": 2
      },
      {
        "type": "Unified",
        "level": 3,
        "size": 8388608,
        "num_sharing": 8
      }
    ],
    "load_avg": [0.2,0.09,0.11],
    "library_build_type": "release"
  },
  "benchmarks": [
    {
      "name": "BM_StringCreation",
      "family_index": 0,
      "per_family_instance_index": 0,
      "run_name": "BM_StringCreation",
      "run_type": "iteration",
      "repetitions": 1,
      "repetition_index": 0,
      "threads": 1,
      "iterations": 174704710,
      "real_time": 4.0049870435675450e+00,
      "cpu_time": 4.0050181131350140e+00,
      "time_unit": "ns"
    },
    {
      "name": "BM_StringCopy",
      "family_index": 1,
      "per_family_instance_index": 0,
      "run_name": "BM_StringCopy",
      "run_type": "iteration",
      "repetitions": 1,
      "repetition_index": 0,
      "threads": 1,
      "iterations": 64403112,
      "real_time": 1.0557832981747081e+01,
      "cpu_time": 1.0557795794091440e+01,
      "time_unit": "ns"
    }
  ]
}

console format

2022-05-06T16:35:05+00:00
Running ./mybenchmark
Run on (32 X 3493.48 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 64 KiB (x16)
  L2 Unified 512 KiB (x16)
  L3 Unified 8192 KiB (x4)
Load Average: 0.04, 0.05, 0.11
------------------------------------------------------------
Benchmark                  Time             CPU   Iterations
------------------------------------------------------------
BM_StringCreation       4.01 ns         4.01 ns    173218570
BM_StringCopy           10.5 ns         10.5 ns     66407769

csv format


2022-05-06T16:35:33+00:00
Running ./mybenchmark
Run on (32 X 3493.48 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 64 KiB (x16)
  L2 Unified 512 KiB (x16)
  L3 Unified 8192 KiB (x4)
Load Average: 0.03, 0.05, 0.10
name,iterations,real_time,cpu_time,time_unit,bytes_per_second,items_per_second,label,error_occurred,error_message
"BM_StringCreation",173374409,3.99505,3.99508,ns,,,,,
"BM_StringCopy",65329035,10.4927,10.4925,ns,,,,,

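The json format carries the most information.

Writing results to a file

Specify the output file name with the command-line argument --benchmark_out=<filename>.

Specify the format of that file with --benchmark_out_format={json|console|csv}. For example, ./mybenchmark --benchmark_out=result.json --benchmark_out_format=json writes the results to result.json as JSON (the file name is only an example).

Repeating benchmarks

With the default settings, each benchmark is run only once (a single repetition, although that repetition consists of many iterations).

Real benchmarks are subject to noise (variation between results), and its influence needs to be reduced.

To deal with the noise, statistical measures such as the mean, median, and deviation are used.

Google Benchmark provides the option --benchmark_repetitions=<num>, which computes these statistics automatically.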
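
To make concrete what these statistics are, here is a small standard-library sketch that computes the mean, median, and sample standard deviation of a set of per-repetition times. It only illustrates the statistics; it is not Google Benchmark's own implementation, and the names RepetitionSummary and summarize are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct RepetitionSummary {
  double mean;
  double median;
  double stddev;  // sample standard deviation (divides by n - 1)
};

// Summarizes per-repetition times. `times` is taken by value because it is
// sorted in place to find the median. An empty input yields all zeros.
inline RepetitionSummary summarize(std::vector<double> times) {
  RepetitionSummary out{0.0, 0.0, 0.0};
  const std::size_t n = times.size();
  if (n == 0) return out;

  double sum = 0.0;
  for (double t : times) sum += t;
  out.mean = sum / static_cast<double>(n);

  std::sort(times.begin(), times.end());
  out.median = (n % 2 == 1) ? times[n / 2]
                            : (times[n / 2 - 1] + times[n / 2]) / 2.0;

  if (n > 1) {
    double sq = 0.0;
    for (double t : times) sq += (t - out.mean) * (t - out.mean);
    out.stddev = std::sqrt(sq / static_cast<double>(n - 1));
  }
  return out;
}
```

The median is less sensitive than the mean to a single slow repetition, which is one reason to look at both when the results are noisy.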

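Summary

With the Google Benchmark library, the processing time of routines and functions can be measured easily. As individual routines get faster, the performance of the software as a whole should improve as a result.

However, in computational science involving parallelization, many factors affect performance, such as problem size, memory bandwidth, and parallel efficiency, so this kind of benchmark may not be sufficient. That is the territory of tuning that takes the hardware into account and cannot be captured by a simple benchmark, but for small pieces of processing the library is a useful tool.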
