GPU Setup and Allocation for Distributed Training (source code included, ready to test)

Tags: python, tensorflow, deep learning, distributed, docker

0 Preface

Why distributed training?

   The dataset is too large
   The model is too complex

Why configure GPUs for distributed training?

By default, single-machine training runs on only one GPU, yet TensorFlow's default strategy grabs every visible GPU and fills up its memory regardless of how much compute is actually needed, so other processes can no longer use the GPUs.
To avoid this:
1. Memory growth: allocate GPU memory only as it is needed
2. Virtual device mechanism: even with a single physical GPU, split it manually into several virtual (logical) GPUs
Multi-GPU usage
1. Virtual GPUs & physical GPUs
2. Manual placement & distribution mechanisms

API list

tf.debugging.set_log_device_placement: prints log messages showing which device each op is placed on
tf.config.set_soft_device_placement: automatically places an op on an available device when the specified device cannot run it
tf.config.experimental.set_visible_devices: sets which devices are visible to the process; for example, if the machine has 4 GPUs but only one is made visible, the process cannot access the other devices
tf.config.experimental.list_physical_devices: lists all physical devices (whole cards)
tf.config.experimental.VirtualDeviceConfiguration: defines a logical partition of a physical device
tf.config.experimental.list_logical_devices: lists all logical devices (partitions)
tf.config.experimental.set_memory_growth: enables memory growth; must be set at the very beginning of the program
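To show how these APIs fit together, here is a minimal configuration sketch added for orientation (not from the original post); the device indices and memory limits are placeholder assumptions:

### minimal GPU configuration sketch (assumption: a machine with at least one GPU)
import tensorflow as tf

tf.debugging.set_log_device_placement(True)   # log which device each op is placed on
tf.config.set_soft_device_placement(True)     # fall back to an available device if needed

gpus = tf.config.experimental.list_physical_devices('GPU')

# Option A: enable memory growth on every GPU (must run before the GPUs are first used)
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

# Option B (mutually exclusive with A, see section 3.3): expose only the first GPU
# and split it into two 5 GB logical GPUs
# tf.config.experimental.set_visible_devices(gpus[0], 'GPU')
# tf.config.experimental.set_virtual_device_configuration(
#     gpus[0],
#     [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=5120),
#      tf.config.experimental.VirtualDeviceConfiguration(memory_limit=5120)])

print(tf.config.experimental.list_logical_devices('GPU'))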

1 Checking the local GPU environment

[email protected]:~$ nvidia-smi
Mon Jun 13 16:58:43 2022       
	+-----------------------------------------------------------------------------+
	| NVIDIA-SMI 495.44       Driver Version: 495.44       CUDA Version: 11.5     |
	|-------------------------------+----------------------+----------------------+
	| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
	| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
	|                               |                      |               MIG M. |
	|===============================+======================+======================|
	|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
	| 23%   29C    P8     8W / 250W |     11MiB / 11178MiB |      0%      Default |
	|                               |                      |                  N/A |
	+-------------------------------+----------------------+----------------------+
	|   1  NVIDIA GeForce ...  Off  | 00000000:04:00.0 Off |                  N/A |
	| 23%   30C    P8     7W / 250W |     11MiB / 11178MiB |      0%      Default |
	|                               |                      |                  N/A |
	+-------------------------------+----------------------+----------------------+
	                                                                               
	+-----------------------------------------------------------------------------+
	| Processes:                                                                  |
	|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
	|        ID   ID                                                   Usage      |
	|=============================================================================|
	|    0   N/A  N/A      2035      G   /usr/lib/xorg/Xorg                  4MiB |
	|    0   N/A  N/A      4804      G   /usr/lib/xorg/Xorg                  4MiB |
	|    1   N/A  N/A      2035      G   /usr/lib/xorg/Xorg                  4MiB |
	|    1   N/A  N/A      4804      G   /usr/lib/xorg/Xorg                  4MiB |
	+-----------------------------------------------------------------------------+
# As shown above, there are two GPUs

# Activate the preconfigured tensorflow-gpu environment
[email protected]:/home/hqc# source activate tf
# Start Python to check
(tf) [email protected]:/home/hqc# python
	Python 3.9.7 (default, Sep 16 2021, 13:09:58) 
	[GCC 7.5.0] :: Anaconda, Inc. on linux
	Type "help", "copyright", "credits" or "license" for more information.
	>>> import tensorflow as tf
	>>> print(tf.__version__)
		2.4.1
	>>> tf.test.is_gpu_available()
		...
		True # the GPU is available
	>>> gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
		...
		2022-06-13 17:14:35.957644: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
		pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1080 Ti computeCapability: 6.1
		coreClock: 1.582GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
		...
		2022-06-13 17:14:35.958938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 1 with properties: 
		pciBusID: 0000:04:00.0 name: NVIDIA GeForce GTX 1080 Ti computeCapability: 6.1
		coreClock: 1.582GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
		...
		2022-06-13 17:14:35.962910: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0, 1
		# two GPUs were found
	>>> cpus = tf.config.experimental.list_physical_devices(device_type='CPU')
	# print the details
	>>> print(gpus, cpus)
	[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')] [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]

	# or query directly
	>>> tf.config.list_physical_devices('GPU')
	[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')]

So this machine has two physical GPUs.

2 Hands-on GPU configuration

2.1 Baseline: no GPU configuration

First, run an experiment with the default GPU settings as a control group.

Base code:

### import some necessary modules
import os
import sys
import time

import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras

### load data
fashion_mnist = keras.datasets.fashion_mnist
(x_train_all, y_train_all), (x_test, y_test) = fashion_mnist.load_data()

x_valid, x_train = x_train_all[:5000], x_train_all[5000:]
y_valid, y_train = y_train_all[:5000], y_train_all[5000:]

### normalize data
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# x_train [None, 28, 28] --> [None, 784]
x_train_scaler = scaler.fit_transform(x_train.reshape(-1, 784)).reshape(-1, 28, 28, 1) # the trailing 1 means a single channel
x_valid_scaler = scaler.transform(x_valid.reshape(-1, 784)).reshape(-1, 28, 28, 1)
x_test_scaler = scaler.transform(x_test.reshape(-1, 784)).reshape(-1, 28, 28, 1)

### make dataset
def make_dataset(images, labels, epochs, batch_size, shuffle=True):
    dataset = tf.data.Dataset.from_tensor_slices((images, labels))
    if shuffle:
        dataset = dataset.shuffle(10000)

    # prefetch: pre-fetch this many batches ahead of time, so the input pipeline prepares
    # the next data while the model trains on the current batch; this overlap is why it speeds training up
    dataset = dataset.repeat(epochs).batch(batch_size).prefetch(50)
    return dataset

batch_size = 128
epochs = 100
train_dataset = make_dataset(x_train_scaler, y_train, epochs, batch_size)

### build a model
model = keras.models.Sequential()
model.add(keras.layers.Conv2D(filters=32, kernel_size=3,
                              padding='same',
                              activation='selu',
                              input_shape=(28, 28, 1)))
model.add(keras.layers.SeparableConv2D(filters=32, kernel_size=3,
                                       padding='same',
                                       activation='selu'))
model.add(keras.layers.MaxPool2D(pool_size=2))

# Each pooling layer shrinks the feature maps and greatly reduces the intermediate data; to limit this loss of information, the number of filters is doubled.
model.add(keras.layers.SeparableConv2D(filters=64, kernel_size=3,
                                       padding='same',
                                       activation='selu'))
model.add(keras.layers.SeparableConv2D(filters=64, kernel_size=3,
                                       padding='same',
                                       activation='selu'))
model.add(keras.layers.MaxPool2D(pool_size=2))

model.add(keras.layers.SeparableConv2D(filters=128, kernel_size=3,
                                       padding='same',
                                       activation='selu'))
model.add(keras.layers.SeparableConv2D(filters=128, kernel_size=3,
                                       padding='same',
                                       activation='selu'))
model.add(keras.layers.MaxPool2D(pool_size=2))

# flatten the feature maps
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(128, activation='selu')) # fully connected layer
model.add(keras.layers.Dense(10, activation="softmax")) # fully connected output layer

model.compile(loss=keras.losses.SparseCategoricalCrossentropy(),
              optimizer=keras.optimizers.SGD(),
              metrics=["accuracy"])

model.summary()

### training
history = model.fit(train_dataset,
                    steps_per_epoch = x_train_scaler.shape[0] // batch_size,
                    epochs=10)

Training inside the container:

# first install three missing modules
(tf2_py3) [email protected]:/share/distributed tensorflow/config_gpu# pip install pandas
	...
	Successfully installed pandas-1.1.5 python-dateutil-2.8.2 pytz-2022.1
(tf2_py3) [email protected]:/share/distributed tensorflow/config_gpu# pip install matplotlib
	...
	Successfully installed cycler-0.11.0 kiwisolver-1.3.1 matplotlib-3.3.4 pillow-8.4.0 pyparsing-3.0.9
(tf2_py3) [email protected]:/share/distributed tensorflow/config_gpu# pip install sklearn
	...
	Successfully installed joblib-1.1.0 scikit-learn-0.24.2 scipy-1.5.4 sklearn-0.0 threadpoolctl-3.1.0
# run it
(tf2_py3) [email protected]:/share/distributed tensorflow/config_gpu# python default.py 
	...
	# shows which GPU is used and how much memory was allocated
	2022-06-13 11:58:13.110072: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10261 MB memory) -> physical GPU (device: 1, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:04:00.0, compute capability: 6.1)
	# the model summary
	Model: "sequential"
	_________________________________________________________________
	Layer (type)                 Output Shape              Param #   
	=================================================================
	conv2d (Conv2D)              (None, 28, 28, 32)        320       
	_________________________________________________________________
	separable_conv2d (SeparableC (None, 28, 28, 32)        1344      
	_________________________________________________________________
	max_pooling2d (MaxPooling2D) (None, 14, 14, 32)        0         
	_________________________________________________________________
	separable_conv2d_1 (Separabl (None, 14, 14, 64)        2400      
	_________________________________________________________________
	separable_conv2d_2 (Separabl (None, 14, 14, 64)        4736      
	_________________________________________________________________
	max_pooling2d_1 (MaxPooling2 (None, 7, 7, 64)          0         
	_________________________________________________________________
	separable_conv2d_3 (Separabl (None, 7, 7, 128)         8896      
	_________________________________________________________________
	separable_conv2d_4 (Separabl (None, 7, 7, 128)         17664     
	_________________________________________________________________
	max_pooling2d_2 (MaxPooling2 (None, 3, 3, 128)         0         
	_________________________________________________________________
	flatten (Flatten)            (None, 1152)              0         
	_________________________________________________________________
	dense (Dense)                (None, 128)               147584    
	_________________________________________________________________
	dense_1 (Dense)              (None, 10)                1290      
	=================================================================
	Total params: 184,234
	Trainable params: 184,234
	Non-trainable params: 0
	_________________________________________________________________
	# training starts
	Epoch 1/10
	...
	429/429 [==============================] - 4s 6ms/step - loss: 2.3024 - accuracy: 0.1021
	Epoch 2/10
	429/429 [==============================] - 3s 6ms/step - loss: 2.3014 - accuracy: 0.1101
	Epoch 3/10
	429/429 [==============================] - 3s 6ms/step - loss: 2.2998 - accuracy: 0.1240
	Epoch 4/10
	429/429 [==============================] - 3s 6ms/step - loss: 2.2933 - accuracy: 0.1750
	Epoch 5/10
	429/429 [==============================] - 3s 6ms/step - loss: 1.9980 - accuracy: 0.3968
	Epoch 6/10
	429/429 [==============================] - 3s 6ms/step - loss: 0.8706 - accuracy: 0.6798
	Epoch 7/10
	429/429 [==============================] - 3s 6ms/step - loss: 0.7657 - accuracy: 0.7071
	Epoch 8/10
	429/429 [==============================] - 3s 6ms/step - loss: 0.7207 - accuracy: 0.7247
	Epoch 9/10
	429/429 [==============================] - 3s 6ms/step - loss: 0.6953 - accuracy: 0.7382
	Epoch 10/10
	429/429 [==============================] - 3s 6ms/step - loss: 0.6702 - accuracy: 0.7470
# training finished; the accuracy is a bit low, but that does not matter here

With the default settings, this demo takes 6 ms per step.

Monitor GPU usage: watch -n 0.2 nvidia-smi
[screenshot: nvidia-smi showing this single process occupying nearly all GPU memory]
This one process alone almost fills the GPU, which is a serious waste of resources. Configuring the GPU properly is therefore essential.

2.2 Enabling GPU memory growth

Note: this must be configured at the very beginning of the program, otherwise an error is raised.

The modified code, placed after the imports and before loading the data:

### set gpu self_growth
tf.debugging.set_log_device_placement(True)### log which device each op is placed on
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)### set gpu self_growth

logical_gpus = tf.config.experimental.list_logical_devices('GPU')

print("the num of gpu", len(gpus))
print("the num of logical gpu", len(logical_gpus))

Output:

# copy the file so it can be modified
(tf2_py3) [email protected]:/share/distributed tensorflow/config_gpu# cp default.py self_growth.py
(tf2_py3) [email protected]:/share/distributed tensorflow/config_gpu# ls
default.py  self_growth.py
(tf2_py3) [email protected]:/share/distributed tensorflow/config_gpu# python self_growth.py
	...
	the num of gpu 2
	the num of logical gpu 2
	# two physical GPUs and two logical GPUs are reported; each physical GPU also counts as a logical GPU
	...
	420/429 [============================>.] - ETA: 0s - loss: 0.7134 - accuracy: 0.72912022-06-13 12:27:38.592831: I tensorflow/core/common_runtime/eager/execute.cc:760] Executing op __inference_train_function_927 in device /job:localhost/replica:0/task:0/device:GPU:0
	2022-06-13 12:27:38.599417: I tensorflow/core/common_runtime/eager/execute.cc:760] Executing op __inference_train_function_927 in device /job:localhost/replica:0/task:0/device:GPU:0
	2022-06-13 12:27:38.605374: I tensorflow/core/common_runtime/eager/execute.cc:760] Executing op __inference_train_function_927 in device /job:localhost/replica:0/task:0/device:GPU:0
	2022-06-13 12:27:38.611573: I tensorflow/core/common_runtime/eager/execute.cc:760] Executing op __inference_train_function_927 in device /job:localhost/replica:0/task:0/device:GPU:0
	2022-06-13 12:27:38.617694: I tensorflow/core/common_runtime/eager/execute.cc:760] Executing op __inference_train_function_927 in device /job:localhost/replica:0/task:0/device:GPU:0
	2022-06-13 12:27:38.625279: I tensorflow/core/common_runtime/eager/execute.cc:760] Executing op __inference_train_function_927 in device /job:localhost/replica:0/task:0/device:GPU:0
	2022-06-13 12:27:38.631732: I tensorflow/core/common_runtime/eager/execute.cc:760] Executing op __inference_train_function_927 in device /job:localhost/replica:0/task:0/device:GPU:0
	2022-06-13 12:27:38.637800: I tensorflow/core/common_runtime/eager/execute.cc:760] Executing op __inference_train_function_927 in device /job:localhost/replica:0/task:0/device:GPU:0
	428/429 [============================>.] - ETA: 0s - loss: 0.7133 - accuracy: 0.72922022-06-13 12:27:38.643905: I tensorflow/core/common_runtime/eager/execute.cc:760] Executing op __inference_train_function_927 in device /job:localhost/replica:0/task:0/device:GPU:0
	429/429 [==============================] - 3s 6ms/step - loss: 0.7133 - accuracy: 0.7292
# the log shows which device every op is placed on; since the first GPU is used by default, everything runs on GPU:0

Monitor GPU usage:
[screenshot: nvidia-smi showing the process using less than 800 MB]

With memory growth enabled, the process occupies less than 800 MB, which greatly reduces the waste of resources.

2.3 Manually specifying the visible GPU

Here I make only the second GPU visible.
A single added line is enough: tf.config.experimental.set_visible_devices(gpus[1], 'GPU')
It goes in the following position:

### set gpu self_growth
#tf.debugging.set_log_device_placement(True)### log which device each op is placed on
gpus = tf.config.experimental.list_physical_devices('GPU')
### set gpu visible
tf.config.experimental.set_visible_devices(gpus[1], 'GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)### set gpu self_growth

logical_gpus = tf.config.experimental.list_logical_devices('GPU')

print("the num of gpu", len(gpus))
print("the num of logical gpu", len(logical_gpus))

Run it:

# make a copy
(tf2_py3) [email protected]:/share/distributed tensorflow/config_gpu# cp self_growth.py visible.py
(tf2_py3) [email protected]:/share/distributed tensorflow/config_gpu# ls
default.py  self_growth.py  visible.py
# edit it
(tf2_py3) [email protected]:/share/distributed tensorflow/config_gpu# vim visible.py 
(tf2_py3) [email protected]:/share/distributed tensorflow/config_gpu# python visible.py 
	...
	the num of gpu 2
	the num of logical gpu 1
	# only one logical GPU is reported now
	...
	Epoch 10/10
	429/429 [==============================] - 3s 7ms/step - loss: 0.7120 - accuracy: 0.7311

2.4 Splitting a GPU into logical GPUs

When splitting into logical GPUs, memory growth can no longer be used, since memory growth has to be configured before any other GPU operation; so the memory-growth code is removed
and the logical-splitting statements are added instead.

Here the second GPU is split into two logical GPUs, each capped at 5 GB.

The code goes in the following position:

### set gpu self_growth
#tf.debugging.set_log_device_placement(True)### log which device each op is placed on
gpus = tf.config.experimental.list_physical_devices('GPU')
### set gpu visible
tf.config.experimental.set_visible_devices(gpus[1], 'GPU')
### divided into logical gpu
tf.config.experimental.set_virtual_device_configuration(
        gpus[1],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=5120),
         tf.config.experimental.VirtualDeviceConfiguration(memory_limit=5120)])
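As an aside (my addition, not part of the original post): recent TensorFlow releases expose the same functionality under non-experimental names, which should be a drop-in replacement for the statement above, though it is worth checking against the version installed in your container:

### assumed equivalent spelling, using `gpus` from the snippet above
tf.config.set_logical_device_configuration(
        gpus[1],
        [tf.config.LogicalDeviceConfiguration(memory_limit=5120),
         tf.config.LogicalDeviceConfiguration(memory_limit=5120)])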

Run it:

# copy
(tf2_py3) [email protected]:/share/distributed tensorflow/config_gpu# cp visible.py virtual_device.py
(tf2_py3) [email protected]:/share/distributed tensorflow/config_gpu# ls
	default.py  self_growth.py  virtual_device.py  visible.py
# edit
(tf2_py3) [email protected]:/share/distributed tensorflow/config_gpu# vim virtual_device.py 
# run it
(tf2_py3) [email protected]:/share/distributed tensorflow/config_gpu# python virtual_device.py 
	...
	the num of gpu 2
	the num of logical gpu 2
	# both logical GPUs here come from splitting the second physical GPU
	...
	Epoch 10/10
	429/429 [==============================] - 3s 7ms/step - loss: 0.7171 - accuracy: 0.7252

Monitor the GPUs:
[screenshot: nvidia-smi after splitting the second GPU into two logical GPUs]
Splitting the GPU also helps reduce memory usage, though not as effectively as enabling memory growth.

2.5 Manually placing computation on multiple GPUs

This part runs a simple matrix-multiplication example whose results are summed on the CPU, and places different layers of the model on different GPUs.
The full code:

### import some necessary modules
import os
import sys
import time

import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras

### set gpu self_growth
#tf.debugging.set_log_device_placement(True)### log which device each op is placed on
gpus = tf.config.experimental.list_physical_devices('GPU')

for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)### set gpu self_growth

logical_gpus = tf.config.experimental.list_logical_devices('GPU')

print("the num of gpu", len(gpus))
print("the num of logical gpu", len(logical_gpus))

### specify computation on specific device
c = []
for gpu in logical_gpus:
    print(gpu.name)
    with tf.device(gpu.name):
        a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
        b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
        c.append(tf.matmul(a, b))
### sum the partial results on the CPU
with tf.device('/cpu:0'):
    matmul_sum = tf.add_n(c)

print(matmul_sum)

### load data
fashion_mnist = keras.datasets.fashion_mnist
(x_train_all, y_train_all), (x_test, y_test) = fashion_mnist.load_data()

x_valid, x_train = x_train_all[:5000], x_train_all[5000:]
y_valid, y_train = y_train_all[:5000], y_train_all[5000:]

### normalize data
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# x_train [None, 28, 28] --> [None, 784]
x_train_scaler = scaler.fit_transform(x_train.reshape(-1, 784)).reshape(-1, 28, 28, 1) 
x_valid_scaler = scaler.transform(x_valid.reshape(-1, 784)).reshape(-1, 28, 28, 1)
x_test_scaler = scaler.transform(x_test.reshape(-1, 784)).reshape(-1, 28, 28, 1)

### make dataset
def make_dataset(images, labels, epochs, batch_size, shuffle=True):
    dataset = tf.data.Dataset.from_tensor_slices((images, labels))
    if shuffle:
        dataset = dataset.shuffle(10000)

    dataset = dataset.repeat(epochs).batch(batch_size).prefetch(50)
    return dataset

batch_size = 128
epochs = 100
train_dataset = make_dataset(x_train_scaler, y_train, epochs, batch_size)

### place different layers on different devices
model = keras.models.Sequential()
with tf.device(logical_gpus[0].name):
    model.add(keras.layers.Conv2D(filters=32, kernel_size=3,
                                padding='same',
                                activation='selu',
                                input_shape=(28, 28, 1)))
    model.add(keras.layers.SeparableConv2D(filters=32, kernel_size=3,
                                        padding='same',
                                        activation='selu'))
    model.add(keras.layers.MaxPool2D(pool_size=2))

    model.add(keras.layers.SeparableConv2D(filters=64, kernel_size=3,
                                        padding='same',
                                        activation='selu'))
    model.add(keras.layers.SeparableConv2D(filters=64, kernel_size=3,
                                        padding='same',
                                        activation='selu'))
    model.add(keras.layers.MaxPool2D(pool_size=2))


with tf.device(logical_gpus[1].name):
    model.add(keras.layers.SeparableConv2D(filters=128, kernel_size=3,
                                        padding='same',
                                        activation='selu'))
    model.add(keras.layers.SeparableConv2D(filters=128, kernel_size=3,
                                        padding='same',
                                        activation='selu'))
    model.add(keras.layers.MaxPool2D(pool_size=2))

    # flatten the feature maps
    model.add(keras.layers.Flatten())
    
    model.add(keras.layers.Dense(128, activation='selu'))
    model.add(keras.layers.Dense(10, activation="softmax"))

model.compile(loss=keras.losses.SparseCategoricalCrossentropy(),
              optimizer=keras.optimizers.SGD(),
              metrics=["accuracy"])

model.summary()

### training
history = model.fit(train_dataset,
                    steps_per_epoch = x_train_scaler.shape[0] // batch_size,
                    epochs=10)

Run it:

(tf2_py3) [email protected]:/share/distributed tensorflow/config_gpu# python manual_multi_gpu.py 
	...
	the num of gpu 2
	the num of logical gpu 2
	...
	tf.Tensor(
	[[ 44.  56.]
	 [ 98. 128.]], shape=(2, 2), dtype=float32)
	# this is the result of the sum computed on the CPU
	...
	Epoch 10/10
	429/429 [==============================] - 3s 6ms/step - loss: 0.7345 - accuracy: 0.7169

Monitor the GPUs:
[screenshot: nvidia-smi showing both GPUs in use]
Both GPUs are in use, so the goal of manually assigning computation to multiple GPUs is achieved.
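Manual placement like this mainly demonstrates how device assignment works. For real single-machine multi-GPU training, the "distribution mechanism" route mentioned in the preface is the usual alternative; a minimal sketch of my own (an assumption, not one of the original experiments) using tf.distribute.MirroredStrategy could look like this, with dummy data standing in for the fashion-mnist pipeline above:

### minimal MirroredStrategy sketch (assumption: at least one visible GPU; falls back to a single replica otherwise)
import numpy as np
import tensorflow as tf
from tensorflow import keras

strategy = tf.distribute.MirroredStrategy()            # replicates the model across all visible (logical) GPUs
print("num replicas:", strategy.num_replicas_in_sync)

with strategy.scope():                                 # variables must be created inside the scope
    model = keras.models.Sequential([
        keras.layers.Conv2D(32, 3, padding='same', activation='selu', input_shape=(28, 28, 1)),
        keras.layers.MaxPool2D(2),
        keras.layers.Flatten(),
        keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(loss=keras.losses.SparseCategoricalCrossentropy(),
                  optimizer=keras.optimizers.SGD(),
                  metrics=["accuracy"])

# dummy data just to show the call pattern; replace with the fashion-mnist dataset used above
x = np.random.rand(512, 28, 28, 1).astype("float32")
y = np.random.randint(0, 10, size=(512,))
model.fit(x, y, batch_size=128, epochs=1)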

3 Experiment requirements

The sections above were just tests; what I actually need is a distributed training experiment with single-machine multi-GPU and multi-machine multi-GPU setups.

At the moment my host machine has only two physical GPUs, while I have prepared five tensorflow-gpu development environments packaged in docker. Not all of them necessarily have to be used, but two GPUs is certainly not enough.

The problems to solve right now are:

  1. The single-machine multi-GPU experiment could work by splitting the two physical GPUs into 4 or 5 logical GPUs, but how do I assign computation to each individual logical GPU? (One possible approach is sketched after this list.)
  2. For the multi-machine multi-GPU experiment, how can each development environment use only one of the logical GPUs?
  3. After splitting into logical GPUs, can memory growth still be enabled on all of them?
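For question 1, here is a sketch of one possible approach (my assumption, not verified in the original experiments): logical GPUs created by splitting show up in list_logical_devices with their own names, so they can be targeted with tf.device exactly as in section 2.5:

### assumption: two physical GPUs, each split into two 5 GB logical GPUs -> four logical GPUs
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_virtual_device_configuration(
        gpu,
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=5120),
         tf.config.experimental.VirtualDeviceConfiguration(memory_limit=5120)])

logical_gpus = tf.config.experimental.list_logical_devices('GPU')
print([g.name for g in logical_gpus])          # e.g. ['/device:GPU:0', ..., '/device:GPU:3']

results = []
for i, gpu in enumerate(logical_gpus):         # place one op on each logical GPU by name
    with tf.device(gpu.name):
        results.append(tf.square(tf.constant(float(i))))
print(results)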

3.3 Answering question 3

After consulting the documentation and verifying with some experiments of my own, I found that enabling per-GPU memory growth and splitting a GPU into logical devices cannot be done at the same time,
so I have put this aside for now...

Copyright notice: this is an original article by the blogger, released under the CC 4.0 BY-SA license. Please include the original source link and this notice when reposting.
Original link: https://blog.csdn.net/qq_47058489/article/details/125264275
