Trying Object Detection on Image Files with a Raspberry Pi

by souichirou · Published November 30, 2019 · Updated July 19, 2021

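Object Detection

These are my notes from trying object detection on an image file using an Edge TPU (Edge Tensor Processing Unit) on a Raspberry Pi.

I used this sample page as a reference.

The photo below should give an idea of what I wanted to do.

The program loads a photo (image file) containing several objects, detects the objects in it, surrounds each one with a bounding box (a rectangular frame), and uses machine learning to predict what each object is.

By default it predicts 10 objects. I also modified the program to keep only the top 3 by prediction score and to display the predicted object's name at the top-left corner of each bounding box.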

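Preparation

Setting up a TensorFlow runtime environment on the Raspberry Pi

Beforehand, complete steps 1 through 3 on the Raspberry Pi by following the page below.

* Step 2 is not required.

1. Install the Edge TPU runtime
2. Install the runtime that operates at the maximum clock frequency (optional)
3. Install the TensorFlow Lite interpreter

Installing the samples

Install the sample programs used here on the Raspberry Pi.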

sudo apt-get install edgetpu-examples

Installer output

The files are installed in the following directory layout.

├─usr
│  │      
│  ├─share
│  │  │      
│  │  ├──edgetpu
│  │  │  │
│  │  │  ├──examples
│  │  │  │  │  backprop_last_layer.py
│  │  │  │  │  classify_capture.py
│  │  │  │  │  classify_image.py
│  │  │  │  │  imprinting_learning.py
│  │  │  │  │  object_detection.py
│  │  │  │  │  semantic_segmetation.py
│  │  │  │  │  two_models_inference.py
│  │  │  │  ├──images
│  │  │  │  │     COPYRIGHT
│  │  │  │  │     bird.bmp
│  │  │  │  │     cat.bmp
│  │  │  │  │     grace_hopper.bmp
│  │  │  │  │     parrot.jpg
│  │  │  │  │     sunflower.bmp
│  │  │  │  │
│  │  │  │  ├──models
│  │  │  │  │     coco_labels.txt
│  │  │  │  │     deeplabv3_mnv2_pascal_quant_edgetpu.tflite
│  │  │  │  │     inat_bird_labels.txt
│  │  │  │  │     mobilenet_ssd_v1_coco_quant_postprocess_edgetpu.tflite
│  │  │  │  │     mobilenet_ssd_v2_coco_quant_postprocess_edgetpu.tflite
│  │  │  │  │     mobilenet_ssd_v2_face_quant_postprocess_edgetpu.tflite
│  │  │  │  │     mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite
│  │  │  │  │

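backprop_last_layer.py	Probably a sample of backpropagation on the last layer using flower data (not verified).
classify_capture.py	Sample program that classifies whatever appears in the camera image from the Raspberry Pi. I plan to try it in a separate article.
classify_image.py	Sample program that classifies whatever appears in an image file.
imprinting_learning.py	Probably a sample program that retrains the last layer of a classification model (not verified). It is based on the technique from "Low-Shot Learning with Imprinted Weights."
object_detection.py	The sample program tried here: object detection from an image file. I made a backup before modifying it.
semantic_segmetation.py	Sample program for semantic segmentation. It loads a bird image and cuts out the bird by tracing its outline (semantic segmentation) rather than with a bounding box.
two_models_inference.py	Probably a sample program that runs two models (a classification model and a detection model) at the same time and compares them (not verified).
images directory	Holds the sample images to load. The image file used here is grace_hopper.bmp, a photo of Grace Hopper, a United States Navy officer who apparently developed the COBOL language. The goal is to detect the objects in this image.
models directory	Holds the labels and the models.

Labels

coco_labels.txt: the labels used here. Label IDs run from 0 (person) to 89 (toothbrush). COCO defines 80 object classes, so some IDs in that range appear to be unused rather than there being 90 classes.
inat_bird_labels.txt: labels for bird classification. 965 bird species were listed.

Models

The model used here is mobilenet_ssd_v2_coco_quant_postprocess_edgetpu.tflite. The directory also contained the following models:

deeplabv3_mnv2_pascal_quant_edgetpu.tflite
mobilenet_ssd_v1_coco_quant_postprocess_edgetpu.tflite
mobilenet_ssd_v2_face_quant_postprocess_edgetpu.tflite
mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite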
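The label files map a numeric ID to a name. The sample reads them with dataset_utils.read_label_file, but as a rough illustration of the idea, here is a minimal sketch of a reader. It assumes each line has the form "ID, whitespace, name"; parse_labels and the sample text are hypothetical and not part of the edgetpu library.

```python
def parse_labels(text):
    """Parse 'ID name' lines into a dict {id: name}. Hypothetical helper."""
    labels = {}
    for line in text.splitlines():
        parts = line.strip().split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        label_id, name = parts
        labels[int(label_id)] = name.strip()
    return labels


# Assumed sample content in the same shape as a label file.
sample = "0  person\n1  bicycle\n89  toothbrush\n"
labels = parse_labels(sample)
print(labels[0], labels[89])
```

Splitting only once keeps names that contain spaces in one piece.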

Installing feh

When the sample program runs on a Raspberry Pi, it uses feh* to display the result image, so feh must be installed beforehand.

* feh is an image viewer for the X Window System.

sudo apt-get install feh

Installer output
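If you want to confirm from Python that feh is available before a script tries to launch it, something like the following might work. This is only a sketch: viewer_command is a hypothetical helper, and the sample program itself does not perform this check.

```python
import shutil


def viewer_command(image_path, viewer="feh"):
    """Return the command list to show image_path, or None if the viewer is missing."""
    if shutil.which(viewer) is None:
        return None
    return [viewer, image_path]


cmd = viewer_command("object_detection_results.jpg")
if cmd is None:
    print("feh not found; install it with: sudo apt-get install feh")
else:
    print("would run:", " ".join(cmd))
```

The returned list is in the form that subprocess.Popen accepts, so it could be passed on directly.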

Running the sample program

First, change to the examples directory.

cd /usr/share/edgetpu/examples/

Then run the sample program with the following command.

python3 object_detection.py \
--model ./models/mobilenet_ssd_v2_coco_quant_postprocess_edgetpu.tflite \
--label ./models/coco_labels.txt \
--input ./images/grace_hopper.bmp \
--output ${HOME}/object_detection_results.jpg

--output specifies the image file that the detection result is written to.

Here ${HOME} places it in the home directory, but the current directory is fine too.
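The program listing later in this article falls back to a default file name when --output is omitted. A minimal sketch of that behavior, isolated from the rest of the program, might look like this (resolve_output is a hypothetical name used only for this illustration):

```python
import argparse


def resolve_output(argv):
    """Return --output if given, otherwise the default name used in the listing."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--output', help='File path of the output image.')
    args = parser.parse_args(argv)
    return args.output if args.output else 'object_detection_result.jpg'


print(resolve_output([]))
print(resolve_output(['--output', 'object_detection_results.jpg']))
```

With no --output, the file lands in the current directory under the default name.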

Results

The results are as follows.

Each detected object is surrounded by a bounding box.

By default, up to 10 objects are detected.

object_detection_results.jpg

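The console also shows, in descending order of score, each object's name, score (probability), and box values (x1, y1, x2, y2).

* x1, y1 is the top-left corner and x2, y2 is the bottom-right corner.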
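To make the box format concrete, here is a small sketch that works with [x1, y1, x2, y2] values. The listing later passes relative_coord=False, which I take to mean pixel coordinates; to_pixels assumes the alternative is values between 0 and 1. Both helper names and all numbers are made up for illustration.

```python
def box_size(box):
    """Return (width, height) of a box given as [x1, y1, x2, y2]."""
    x1, y1, x2, y2 = box
    return x2 - x1, y2 - y1


def to_pixels(rel_box, image_width, image_height):
    """Scale a relative box (values in 0..1) to pixel coordinates."""
    x1, y1, x2, y2 = rel_box
    return [x1 * image_width, y1 * image_height,
            x2 * image_width, y2 * image_height]


# Made-up numbers for illustration only, not output from the run above.
box = [10.0, 20.0, 110.0, 220.0]
print(box_size(box))
print(to_pixels([0.25, 0.5, 0.75, 1.0], 200, 100))
```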

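First place is person with a score of 87.89%, and second place is tie (necktie) with a score of 78.9%.

Third place drops sharply to 12.1% and is detected as remote (a remote control), so the Edge TPU is probably not confident about it either.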
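One way to hide low-confidence results like the third one would be to raise the score threshold. The sketch below filters the three results reported above; the 0.5 cutoff is an arbitrary value chosen for illustration, and filter_by_score is a hypothetical helper rather than anything from the sample.

```python
def filter_by_score(detections, threshold):
    """Keep (label, score) pairs whose score is at least threshold."""
    return [(label, score) for label, score in detections if score >= threshold]


# Top three results reported above, as (label, score) pairs.
detections = [('person', 0.8789), ('tie', 0.789), ('remote', 0.121)]
confident = filter_by_score(detections, threshold=0.5)  # 0.5 is an arbitrary cutoff
print(confident)
```

In the actual program the same effect would presumably come from the threshold argument passed to the engine, which the listing sets to 0.05.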

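Modifying the program

After backing up the original program, I reduced the number of detections from 10 to 3 and changed the program to draw each detected object's name on the image.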

# Detects objects in an image file
# cd /usr/share/edgetpu/examples/
# Uses engine.detect_with_image to run detection on the image
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
r"""A demo for object detection.

For Raspberry Pi, you need to install 'feh' as image viewer:
sudo apt-get install feh

Example (Running under edgetpu repo's root directory):

  - Face detection:
    python3 examples/object_detection.py \
    --model='test_data/mobilenet_ssd_v2_face_quant_postprocess_edgetpu.tflite' \
    --input='test_data/face.jpg' \
    --keep_aspect_ratio

  - Pet detection:
    python3 examples/object_detection.py \
    --model='test_data/ssd_mobilenet_v1_fine_tuned_edgetpu.tflite' \
    --label='test_data/pet_labels.txt' \
    --input='test_data/pets.jpg' \
    --keep_aspect_ratio

'--output' is an optional flag to specify file name of output image.
At this moment we only support SSD model with postprocessing operator. Other
models such as YOLO won't work.
"""

import argparse
import platform
import subprocess
from edgetpu.detection.engine import DetectionEngine
from edgetpu.utils import dataset_utils
from PIL import Image, ImageDraw, ImageFont

TOP_K = 3  # Show the top n results by confidence score

def main():
  parser = argparse.ArgumentParser()
  parser.add_argument(
      '--model',
      help='Path of the detection model, it must be a SSD model with postprocessing operator.',
      required=True)
  parser.add_argument('--label', help='Path of the labels file.')
  parser.add_argument(
      '--input', help='File path of the input image.', required=True)
  parser.add_argument('--output', help='File path of the output image.')
  parser.add_argument(
      '--keep_aspect_ratio',
      dest='keep_aspect_ratio',
      action='store_true',
      help=(
          'keep the image aspect ratio when down-sampling the image by adding '
          'black pixel padding (zeros) on bottom or right. '
          'By default the image is resized and reshaped without cropping. This '
          'option should be the same as what is applied on input images during '
          'model training. Otherwise the accuracy may be affected and the '
          'bounding box of detection result may be stretched.'))
  parser.set_defaults(keep_aspect_ratio=False)
  args = parser.parse_args()

  if not args.output:
    output_name = 'object_detection_result.jpg'
  else:
    output_name = args.output

  # Initialize engine.
  engine = DetectionEngine(args.model)
  labels = dataset_utils.read_label_file(args.label) if args.label else None

  # Open image.
  img = Image.open(args.input)
  draw = ImageDraw.Draw(img)

  # Run inference.
  ans = engine.detect_with_image(
      img,
      threshold=0.05,
      keep_aspect_ratio=args.keep_aspect_ratio,
      relative_coord=False,
      top_k=TOP_K)

  # Display result.
  if ans:
    for obj in ans:
      print('-----------------------------------------')
      print('obj = ', obj)
      if labels:
        print(labels[obj.label_id])
      print('score = ', obj.score)
      box = obj.bounding_box.flatten().tolist()
      print('box = ', box)
      # Draw a rectangle.
      draw.rectangle(box, outline='red')
      fnt = ImageFont.truetype('Pillow/Tests/fonts/FreeMono.ttf', 40)
      draw.text((box[0], box[1]), labels[obj.label_id] if labels else str(obj.label_id), fill='yellow', font=fnt, spacing=10, align='left')
    img.save(output_name)  # Save once, after all boxes are drawn.
    if platform.machine() == 'x86_64':
      # For gLinux, simply show the image.
      img.show()
    elif platform.machine() == 'armv7l':
      # For Raspberry Pi, you need to install 'feh' to display image.
      subprocess.Popen(['feh', output_name])
    else:
      print('Please check ', output_name)
  else:
    print('No object detected!')


if __name__ == '__main__':
  main()

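Brief explanation of the changes

The comment block I added (lines 1-3).

# Detects objects in an image file
# cd /usr/share/edgetpu/examples/
# Uses engine.detect_with_image to run detection on the image

A constant that sets how many results to show (line 49).

TOP_K = 3  # Show the top n results by confidence score

Passing TOP_K to the engine.

At most this many objects are detected (line 94).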

      top_k=TOP_K)
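I do not know how the engine applies top_k internally, but the general idea of keeping only the k highest-scoring results can be sketched in plain Python as follows. top_k_by_score is a hypothetical helper and the scores are made up.

```python
import heapq


def top_k_by_score(detections, k):
    """Return the k highest-scoring (label, score) pairs, best first."""
    return heapq.nlargest(k, detections, key=lambda d: d[1])


# Made-up scores for illustration only.
detections = [('cup', 0.30), ('person', 0.90), ('book', 0.10),
              ('tie', 0.75), ('chair', 0.55)]
print(top_k_by_score(detections, 3))
```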

Draws the detected object's name in yellow at the top-left corner of the bounding box (line 108).

      fnt = ImageFont.truetype('Pillow/Tests/fonts/FreeMono.ttf', 40)
      draw.text((box[0], box[1]), labels[obj.label_id] if labels else str(obj.label_id), fill='yellow', font=fnt, spacing=10, align='left')
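The two decisions behind this drawing step, what text to show and where to put it, can be pulled out into small pure-Python helpers. This is just a sketch under my own assumptions: label_text and label_anchor are hypothetical names, and falling back to the numeric ID when no label is known is one possible choice.

```python
def label_text(labels, label_id):
    """Return the label name, or the numeric ID as text when no name is known."""
    if labels and label_id in labels:
        return labels[label_id]
    return str(label_id)


def label_anchor(box):
    """Return the (x, y) point at the top-left corner of [x1, y1, x2, y2]."""
    return (box[0], box[1])


labels = {0: 'person', 31: 'tie'}  # small made-up subset of a label map
print(label_text(labels, 31), label_anchor([50, 60, 200, 300]))
print(label_text(None, 31))
```

The fallback matters when the program is run without --label, since there is then no name to look up.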

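Results

The top 3 were detected, but the detected objects differ from the earlier run.

First place is person with a score of 87.89% and second place is tie with a score of 73.04%, but third place is again tie, with a score of 26.95%.

Apart from TOP_K, I did not change the model, the detection engine, or the other parameters, so this was quite an interesting result.

That the same image can seemingly yield different prediction scores, or even different detected objects, from one run to the next felt similar to how a person behaves, and I found myself feeling a certain kinship with it.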
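To see at a glance what changed between the two runs, the scores reported in this article can be compared rank by rank. The sketch below only does arithmetic on those reported numbers; score_changes is a hypothetical helper and says nothing about why the results differ.

```python
def score_changes(run_a, run_b):
    """Pair results by rank and return (rank, label_a, label_b, score_difference)."""
    changes = []
    pairs = zip(run_a, run_b)
    for rank, ((label_a, score_a), (label_b, score_b)) in enumerate(pairs, start=1):
        changes.append((rank, label_a, label_b, round(score_b - score_a, 4)))
    return changes


# Scores reported in this article, in percent.
default_run = [('person', 87.89), ('tie', 78.9), ('remote', 12.1)]
top3_run = [('person', 87.89), ('tie', 73.04), ('tie', 26.95)]
changes = score_changes(default_run, top3_run)
for row in changes:
    print(row)
```

Running the program several times with identical settings and comparing the outputs this way would be one way to check whether the variation really comes from timing.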
