TensorFlow Research and Practice Notes


I. A Comparison of Three Open-Source Libraries: Caffe, TensorFlow, and MXNet

Decided to start by learning TensorFlow.

II. Deep Learning Research

Applying TensorFlow to image recognition

Deep convolutional neural network models have achieved excellent results on difficult visual recognition tasks, reaching human-level performance and in some areas even surpassing it.


III. Installing TensorFlow

Installation environment: Ubuntu 15.10, 64-bit

1. Download the source code
sudo apt-get install git

git clone --recurse-submodules https://github.com/tensorflow/tensorflow

The --recurse-submodules flag must be included; it pulls in the protobuf library that TensorFlow depends on.
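If the flag is forgotten, the protobuf submodule can usually still be fetched afterwards; a minimal sketch, not part of the original session:

cd tensorflow
git submodule update --init    # fetches google/protobuf into the existing clone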


Cloning into 'tensorflow'...
remote: Counting objects: 40348, done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 40348 (delta 0), reused 0 (delta 0), pack-reused 40341
Receiving objects: 100% (40348/40348), 35.45 MiB | 404.00 KiB/s, done.
Resolving deltas: 100% (29338/29338), done.
Checking connectivity... done.
Submodule 'google/protobuf' (https://github.com/google/protobuf.git) registered for path 'google/protobuf'
Cloning into 'google/protobuf'...
remote: Counting objects: 32801, done.
remote: Compressing objects: 100% (34/34), done.
remote: Total 32801 (delta 12), reused 0 (delta 0), pack-reused 32767
Receiving objects: 100% (32801/32801), 31.27 MiB | 1.27 MiB/s, done.
Resolving deltas: 100% (22019/22019), done.
Checking connectivity... done.
Submodule path 'google/protobuf': checked out 'fb714b3606bd663b823f6960a73d052f97283b74'

2. Install Bazel

OpenJDK is the GPL-licensed open-source implementation of the Java platform, and more than six years have passed since Sun officially released it. Since its release, the Java community has been working to adapt to this new open-source code base.
OpenJDK developed rapidly in 2013 and was named by the well-known IT magazine SD Times to the 2013 SD Times 100 list, ranked 9th in the "Hugely Influential" category.


Google recently open-sourced Bazel, the build tool it uses internally.
Bazel is a Make-like tool tailored to the way Google develops its own software, and Google now uses it to build most of its internal software. Its features have several highlights:

Multi-language support: Bazel currently supports Java, Objective-C, and C++ out of the box, but it can be extended to any other programming language.

High-level build description language: a project is described in a language called BUILD, a concise text language that treats a project as a collection of interrelated libraries, binaries, and test cases. By contrast, a tool like Make has to describe how each individual file is passed to the compiler. (A minimal BUILD sketch follows this list.)

Multi-platform support: the same tools and the same BUILD files can be used to build software for different architectures, and even for different platforms. At Google, Bazel is used both for server applications in the data center and for mobile apps on phones.

Reproducibility: in a BUILD file, every library, test case, and binary must explicitly declare its dependencies. When a source file is modified, Bazel uses these dependencies to decide which parts need to be rebuilt and which tasks can run in parallel. This means all builds are incremental, and the same build always produces the same result.

Scalability: Bazel can handle large projects; at Google, it is common for a server program to have a hundred thousand lines of code, and rebuilding such a project when nothing has changed takes only about 200 milliseconds.
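To make the BUILD idea concrete, here is a minimal, hypothetical workspace; all file and target names are invented for illustration, and it assumes Bazel is already on the PATH:

mkdir -p bazel-hello && cd bazel-hello
touch WORKSPACE                     # an (empty) WORKSPACE file marks the workspace root
cat > hello.cc <<'EOF'
#include <cstdio>
int main() { std::printf("hello, bazel\n"); return 0; }
EOF
cat > BUILD <<'EOF'
cc_binary(
    name = "hello",        # the target; built as //:hello
    srcs = ["hello.cc"],   # sources; library dependencies would be listed in deps = [...]
)
EOF
bazel build //:hello                # Bazel decides what to (re)build from the BUILD description
./bazel-bin/hello                   # run the resulting binary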

Install the Bazel dependencies:
sudo apt-get install openjdk-8-jdk openjdk-8-source


oot.pem
Adding debian:E-Tugra_Certification_Authority.pem
Adding debian:Staat_der_Nederlanden_EV_Root_CA.pem
Adding debian:GlobalSign_ECC_Root_CA_-_R4.pem
Adding debian:Certinomis_-_Autorité_Racine.pem
Adding debian:ssl-cert-snakeoil.pem
Adding debian:COMODO_Certification_Authority.pem
done.
Processing triggers for libc-bin (2.21-0ubuntu4) ...
Processing triggers for ca-certificates (20150426ubuntu1) ...
Updating certificates in /etc/ssl/certs...
0 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...

done.
done.
learning@learning-virtual-machine:~$ 
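An optional sanity check, not in the original log, to confirm that the JDK Bazel needs is now available:

java -version     # should report an OpenJDK 1.8 runtime
javac -version    # confirms the compiler from openjdk-8-jdk is present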

sudo apt-get install pkg-config zip g++ zlib1g-dev unzip

Processing triggers for mime-support (3.54ubuntu1.1) ...
Setting up libstdc++-4.8-dev:amd64 (4.8.4-2ubuntu1~14.04.1) ...
Setting up g++-4.8 (4.8.4-2ubuntu1~14.04.1) ...
Setting up g++ (4:4.8.2-1ubuntu6) ...
update-alternatives: using /usr/bin/g++ to provide /usr/bin/c++ (c++) in auto mode
Setting up unzip (6.0-9ubuntu1.5) ...
Setting up zlib1g-dev:amd64 (1:1.2.8.dfsg-1ubuntu1) ...
@ubuntu:~$ 

Download link: https://github.com/bazelbuild/bazel/releases/download/0.2.2b/bazel-0.2.2b-installer-linux-x86_64.sh
@ubuntu:~$ chmod +x bazel-0.2.2b-installer-linux-x86_64.sh
@ubuntu:~$ ./bazel-0.2.2b-installer-linux-x86_64.sh --user

Bazel is now installed!

Make sure you have "/home/learning/bin" in your path. You can also activate bash
completion by adding the following line to your ~/.bashrc:
  source /home/learning/.bazel/bin/bazel-complete.bash

See http://bazel.io/docs/getting-started.html to start a new project!
learning@learning-virtual-machine:~$ source /home/learning/.bazel/bin/bazel-complete.bash
learning@learning-virtual-machine:~$ 
 export PATH="$PATH:$HOME/bin"
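To make the PATH entry and the bash completion survive new shells, one option (an assumption, not shown in the log above) is to append them to ~/.bashrc:

echo 'export PATH="$PATH:$HOME/bin"' >> ~/.bashrc
echo 'source $HOME/.bazel/bin/bazel-complete.bash' >> ~/.bashrc
source ~/.bashrc    # reload the current shell
bazel version       # sanity check: prints the installed Bazel version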

sudo apt-get install python-numpy swig python-dev

blapack.so.3 (liblapack.so.3) in auto mode
Setting up libpython-dev:amd64 (2.7.5-5ubuntu3) ...
Setting up python2.7-dev (2.7.6-8ubuntu0.2) ...
Setting up python-dev (2.7.5-5ubuntu3) ...
Setting up python-numpy (1:1.8.2-0ubuntu0.1) ...
Setting up swig2.0 (2.0.11-1ubuntu2) ...
Setting up swig (2.0.11-1ubuntu2) ...
Processing triggers for libc-bin (2.19-0ubuntu6.5) ...
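A quick optional check, not in the original log, that the Python-side prerequisites are visible:

python -c "import numpy; print(numpy.__version__)"   # the 0.8 wheel later requires numpy>=1.8.2
swig -version                                         # SWIG generates the Python bindings during the build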

3. Build the pip package and install TensorFlow

mkdir /tmp/tensorflow_pkg
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-0.5.0-py2-none-any.whl

learning@learning-virtual-machine:~$ pip install /tmp/tensorflow_pkg/tensorflow-0.5.0-py2-none-any.whl
Requirement '/tmp/tensorflow_pkg/tensorflow-0.5.0-py2-none-any.whl' looks like a filename, but the file does not exist
Unpacking /tmp/tensorflow_pkg/tensorflow-0.5.0-py2-none-any.whl
Cleaning up...
Exception:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/pip/basecommand.py", line 122, in main
    status = self.run(options, args)
  File "/usr/lib/python2.7/dist-packages/pip/commands/install.py", line 304, in run
    requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle)
  File "/usr/lib/python2.7/dist-packages/pip/req.py", line 1198, in prepare_files
    do_download,
  File "/usr/lib/python2.7/dist-packages/pip/req.py", line 1365, in unpack_url
    unpack_file_url(link, location, download_dir)
  File "/usr/lib/python2.7/dist-packages/pip/download.py", line 640, in unpack_file_url
    unpack_file(from_path, location, content_type, link)
  File "/usr/lib/python2.7/dist-packages/pip/util.py", line 640, in unpack_file
    unzip_file(filename, location, flatten=not filename.endswith(('.pybundle', '.whl')))
  File "/usr/lib/python2.7/dist-packages/pip/util.py", line 508, in unzip_file
    zipfp = open(filename, 'rb')
IOError: [Errno 2] No such file or directory: '/tmp/tensorflow_pkg/tensorflow-0.5.0-py2-none-any.whl'

Storing debug log for failure in /home/learning/.pip/pip.log
learning@learning-virtual-machine:~$ 

Build the pip package with Bazel, then install it with pip (the error above occurred because the wheel had not been built yet):
bazel build -c opt tensorflow/tools/pip_package:build_pip_package

learning@learning-virtual-machine:~/tensorflow$ bazel build -c opt tensorflow/tools/pip_package:build_pip_package
Sending SIGTERM to previous Bazel server (pid=17411)... done.
.......................................
INFO: Waiting for response from Bazel server (pid 18433)...
INFO: Downloading from https://bitbucket.org/eigen/eigen/get/50812b426b7c.tar.gz: 0B

A problem occurred:

ERROR: /home/learning/tensorflow/tensorflow/core/kernels/BUILD:640:1: C++ compilation of rule '//tensorflow/core/kernels:padding_fifo_queue' failed: gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -Wl,-z,-relro,-z,now -B/usr/bin -B/usr/bin -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 ... (remaining 70 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 4.
gcc: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-5/README.Bugs> for instructions.
[1,604 / 2,192] Still waiting for 199 jobs to complete:
      Running (standalone):
        Compiling tensorflow/core/kernels/queue_base.cc, 5653 s
        Compiling tensorflow/core/kernels/split_lib_cpu.cc, 15 s


Solution: the compiler was killed because the machine ran out of memory. After increasing the virtual machine's memory to 4 GB, the build succeeded.
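If the virtual machine's memory cannot be increased, an alternative workaround (an assumption, not what was done here) is to lower Bazel's parallelism so fewer compiler processes run at once:

bazel build -c opt --jobs=2 tensorflow/tools/pip_package:build_pip_package   # --jobs limits concurrent build actions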

INFO: From Compiling tensorflow/contrib/tensor_forest/core/ops/update_fertile_slots_op.cc:
tensorflow/contrib/tensor_forest/core/ops/update_fertile_slots_op.cc: In member function 'virtual void tensorflow::UpdateFertileSlots::Compute(tensorflow::OpKernelContext*)':
tensorflow/contrib/tensor_forest/core/ops/update_fertile_slots_op.cc:176:14: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
  for (; i < values->size(); ++i) {
tensorflow/contrib/tensor_forest/core/ops/update_fertile_slots_op.cc: In member function 'void tensorflow::UpdateFertileSlots::SetNewNonFertileLeaves(tensorflow::UpdateFertileSlots::HeapValuesType*, int, tensorflow::OpKernelContext*)':
tensorflow/contrib/tensor_forest/core/ops/update_fertile_slots_op.cc:340:29: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
  for (int32 i = start; i < values->size(); ++i) {
Target //tensorflow/tools/pip_package:build_pip_package up-to-date:
  bazel-bin/tensorflow/tools/pip_package/build_pip_package
INFO: Elapsed time: 9696.811s, Critical Path: 7936.35s

bazel build -c opt tensorflow/tools/pip_package:build_pip_package


learning@learning-virtual-machine:~/tensorflow$ mkdir /tmp/tensorflow_pkg
learning@learning-virtual-machine:~/tensorflow$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
2016年 05月 06日 星期五 11:22:55 CST : === Using tmpdir: /tmp/tmp.n9viqhep4u
/tmp/tmp.n9viqhep4u ~/tensorflow
2016年 05月 06日 星期五 11:23:01 CST : === Building wheel
2016年 05月 06日 星期五 11:24:09 CST : === Output wheel file is in: /tmp/tensorflow_pkg
learning@learning-virtual-machine:~/tensorflow$
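The wheel's file name encodes the version that was actually built (0.8.0 here, not the 0.5.0 guessed earlier), so it is safer to list the output directory or install with a glob:

ls /tmp/tensorflow_pkg/                                        # shows the exact wheel name
pip install /tmp/tensorflow_pkg/tensorflow-*-py2-none-any.whl  # the glob avoids hard-coding the version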

pip install /tmp/tensorflow_pkg/tensorflow-0.8.0-py2-none-any.whl

learning@learning-virtual-machine:/tmp/tensorflow_pkg$ pip install /tmp/tensorflow_pkg/tensorflow-0.8.0-py2-none-any.whl
Unpacking ./tensorflow-0.8.0-py2-none-any.whl
Downloading/unpacking six>=1.10.0 (from tensorflow==0.8.0)
  Cannot fetch index base URL https://pypi.python.org/simple/
  Downloading six-1.10.0-py2.py3-none-any.whl
Downloading/unpacking protobuf==3.0.0b2 (from tensorflow==0.8.0)
  Downloading protobuf-3.0.0b2-py2.py3-none-any.whl (326kB): 326kB downloaded
Downloading/unpacking wheel (from tensorflow==0.8.0)
  Downloading wheel-0.29.0-py2.py3-none-any.whl (66kB): 66kB downloaded
Downloading/unpacking numpy>=1.8.2 (from tensorflow==0.8.0)


m/mtrand/randomkit.o build/temp.linux-x86_64-2.7/numpy/random/mtrand/initarray.o build/temp.linux-x86_64-2.7/numpy/random/mtrand/distributions.o -Lbuild/temp.linux-x86_64-2.7 -o build/lib.linux-x86_64-2.7/numpy/random/mtrand.so
Creating build/scripts.linux-x86_64-2.7/f2py
adding 'build/scripts.linux-x86_64-2.7/f2py' to scripts
changing mode of build/scripts.linux-x86_64-2.7/f2py from 664 to 775

warning: no previously-included files matching '*.pyo' found anywhere in distribution
warning: no previously-included files matching '*.pyd' found anywhere in distribution
changing mode of /home/learning/.local/bin/f2py to 775

Successfully installed tensorflow six protobuf wheel numpy setuptools
Cleaning up...

The pip package has been created and installed; the build and installation are complete.
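A small smoke test, not part of the original session, to confirm the installed package works:

python -c "
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
"
# expected output: Hello, TensorFlow!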


1. Problem:

The 'build' command is only supported from within a workspace.

Solution: run the build from inside the TensorFlow source directory (the Bazel workspace), as shown below.

learning@learning-virtual-machine:~/tensorflow$ bazel build -c opt tensorflow/tools/pip_package:build_pip_package
.........................
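Spelled out explicitly (the directory name assumes the clone from step 1):

cd ~/tensorflow                                                     # the workspace root: it contains the WORKSPACE file
bazel build -c opt tensorflow/tools/pip_package:build_pip_package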

2. Problem:

INFO: Waiting for response from Bazel server (pid 15464)...
ERROR: /home/learning/tensorflow/WORKSPACE:16:6: First argument of load() is a path, not a label. It should start with a single slash if it is an absolute path..
ERROR: /home/learning/tensorflow/WORKSPACE:20:6: First argument of load() is a path, not a label. It should start with a single slash if it is an absolute path..
ERROR: WORKSPACE file could not be parsed.
ERROR: no such package 'external': Package 'external' contains errors.
INFO: Elapsed time: 9.814s


Solution: the installed Bazel version was too old; switching to 0.2.2 fixed it.

Source code analysis: example_trainer.cc

This example builds a small graph that estimates the largest eigenvalue of a 2x2 matrix by power iteration (repeatedly computing y = A*x and renormalizing x), and runs that graph from several concurrent sessions, each driving several concurrent step threads.

/* Copyright 2015 Google Inc. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/

#include <cstdio>
#include <functional>
#include <string>
#include <vector>

#include "tensorflow/cc/ops/standard_ops.h"
#include "tensorflow/core/framework/graph.pb.h"
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/graph/default_device.h"
#include "tensorflow/core/graph/graph_def_builder.h"
#include "tensorflow/core/lib/core/threadpool.h"
#include "tensorflow/core/lib/strings/stringprintf.h"
#include "tensorflow/core/platform/init_main.h"
#include "tensorflow/core/platform/logging.h"
#include "tensorflow/core/platform/types.h"
#include "tensorflow/core/public/session.h"

using tensorflow::string;
using tensorflow::int32;

namespace tensorflow {
namespace example {

struct Options {
  int num_concurrent_sessions = 10;  // The number of concurrent sessions
  int num_concurrent_steps = 10;     // The number of concurrent steps
  int num_iterations = 100;          // Each step repeats this many times
  bool use_gpu = false;              // Whether to use gpu in the training
};

// A = [3 2; -1 0]; x = rand(2, 1);
// We want to compute the largest eigenvalue for A.
// repeat x = y / y.norm(); y = A * x; end
GraphDef CreateGraphDef() {
  // TODO(jeff,opensource): This should really be a more interesting
  // computation.  Maybe turn this into an mnist model instead?
  GraphDefBuilder b;
  using namespace ::tensorflow::ops;  // NOLINT(build/namespaces)
  // Store rows [3, 2] and [-1, 0] in row major format.
  Node* a = Const({3.f, 2.f, -1.f, 0.f}, {2, 2}, b.opts());

  // x is from the feed.
  Node* x = Const({0.f}, {2, 1}, b.opts().WithName("x"));

  // y = A * x
  Node* y = MatMul(a, x, b.opts().WithName("y"));

  // y2 = y.^2
  Node* y2 = Square(y, b.opts());

  // y2_sum = sum(y2)
  Node* y2_sum = Sum(y2, Const(0, b.opts()), b.opts());

  // y_norm = sqrt(y2_sum)
  Node* y_norm = Sqrt(y2_sum, b.opts());

  // y_normalized = y ./ y_norm
  Div(y, y_norm, b.opts().WithName("y_normalized"));

  GraphDef def;
  TF_CHECK_OK(b.ToGraphDef(&def));
  return def;
}

string DebugString(const Tensor& x, const Tensor& y) {
  CHECK_EQ(x.NumElements(), 2);
  CHECK_EQ(y.NumElements(), 2);
  auto x_flat = x.flat<float>();
  auto y_flat = y.flat<float>();
  const float lambda = y_flat(0) / x_flat(0);
  return strings::Printf("lambda = %8.6f x = [%8.6f %8.6f] y = [%8.6f %8.6f]",
                         lambda, x_flat(0), x_flat(1), y_flat(0), y_flat(1));
}

void ConcurrentSteps(const Options* opts, int session_index) {
  // Creates a session.
  SessionOptions options;
  std::unique_ptr<Session> session(NewSession(options));
  GraphDef def = CreateGraphDef();
  if (options.target.empty()) {
    graph::SetDefaultDevice(opts->use_gpu ? "/gpu:0" : "/cpu:0", &def);
  }

  TF_CHECK_OK(session->Create(def));

  // Spawn M threads for M concurrent steps.
  const int M = opts->num_concurrent_steps;
  thread::ThreadPool step_threads(Env::Default(), "trainer", M);

  for (int step = 0; step < M; ++step) {
    step_threads.Schedule([&session, opts, session_index, step]() {
      // Randomly initialize the input.
      Tensor x(DT_FLOAT, TensorShape({2, 1}));
      x.flat<float>().setRandom();

      // Iterations.
      std::vector<Tensor> outputs;
      for (int iter = 0; iter < opts->num_iterations; ++iter) {
        outputs.clear();
        TF_CHECK_OK(
            session->Run({{"x", x}}, {"y:0", "y_normalized:0"}, {}, &outputs));
        CHECK_EQ(size_t{2}, outputs.size());

        const Tensor& y = outputs[0];
        const Tensor& y_norm = outputs[1];
        // Print out lambda, x, and y.
        std::printf("%06d/%06d %s\n", session_index, step,
                    DebugString(x, y).c_str());
        // Copies y_normalized to x.
        x = y_norm;
      }
    });
  }

  TF_CHECK_OK(session->Close());
}

void ConcurrentSessions(const Options& opts) {
  // Spawn N threads for N concurrent sessions.
  const int N = opts.num_concurrent_sessions;
  thread::ThreadPool session_threads(Env::Default(), "trainer", N);
  for (int i = 0; i < N; ++i) {
    session_threads.Schedule(std::bind(&ConcurrentSteps, &opts, i));
  }
}

}  // end namespace example
}  // end namespace tensorflow

namespace {

bool ParseInt32Flag(tensorflow::StringPiece arg, tensorflow::StringPiece flag,
                    int32* dst) {
  if (arg.Consume(flag) && arg.Consume("=")) {
    char extra;
    return (sscanf(arg.data(), "%d%c", dst, &extra) == 1);
  }

  return false;
}

bool ParseBoolFlag(tensorflow::StringPiece arg, tensorflow::StringPiece flag,
                   bool* dst) {
  if (arg.Consume(flag)) {
    if (arg.empty()) {
      *dst = true;
      return true;
    }

    if (arg == "=true") {
      *dst = true;
      return true;
    } else if (arg == "=false") {
      *dst = false;
      return true;
    }
  }

  return false;
}

}  // namespace

int main(int argc, char* argv[]) {
  tensorflow::example::Options opts;
  std::vector<char*> unknown_flags;
  for (int i = 1; i < argc; ++i) {
    if (string(argv[i]) == "--") {
      while (i < argc) {
        unknown_flags.push_back(argv[i]);
        ++i;
      }
      break;
    }

    if (ParseInt32Flag(argv[i], "--num_concurrent_sessions",
                       &opts.num_concurrent_sessions) ||
        ParseInt32Flag(argv[i], "--num_concurrent_steps",
                       &opts.num_concurrent_steps) ||
        ParseInt32Flag(argv[i], "--num_iterations", &opts.num_iterations) ||
        ParseBoolFlag(argv[i], "--use_gpu", &opts.use_gpu)) {
      continue;
    }

    fprintf(stderr, "Unknown flag: %s\n", argv[i]);
    return -1;
  }

  // Passthrough any unknown flags.
  int dst = 1;  // Skip argv[0]
  for (char* f : unknown_flags) {
    argv[dst++] = f;
  }
  argv[dst++] = nullptr;
  argc = unknown_flags.size() + 1;
  tensorflow::port::InitMain(argv[0], &argc, &argv);
  tensorflow::example::ConcurrentSessions(opts);
}
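
The flags parsed in main() can be exercised by building and running this example. The Bazel target name below is taken from the TensorFlow README of that era and should be treated as an assumption; the flags themselves come straight from the code above:

bazel build -c opt //tensorflow/cc:tutorials_example_trainer
bazel-bin/tensorflow/cc/tutorials_example_trainer --num_concurrent_sessions=2 --num_concurrent_steps=2 --num_iterations=50
# add --use_gpu=true to place the graph on /gpu:0 instead of /cpu:0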

 
