Commit 05c15bf

dropout: ompsimd + CUDA implementation (#44)

* dropout: ompsimd + CUDA implementation
1 parent bea5115 commit 05c15bf

File tree

14 files changed: +440 −138 lines changed

CODE_OF_CONDUCT.md

+45

@@ -0,0 +1,45 @@
+# DeepX Code of Conduct
+
+## Our Pledge
+
+As contributors and maintainers, we pledge to provide an open and welcoming environment for everyone.
+
+## Our Standards
+
+Behaviors that help create a positive environment include, but are not limited to:
+
+- Using friendly and inclusive language
+- Respecting differing viewpoints and experiences
+- Accepting constructive criticism with patience
+- Focusing on what is best for the community
+- Being kind to other community members
+
+Unacceptable behaviors include, but are not limited to:
+
+- The use of sexualized language or imagery, and unwelcome sexual attention or advances
+- Trolling, inflammatory, insulting, or derogatory comments, and personal or political attacks
+- Public or private harassment
+- Publishing others' private information, such as a physical or electronic address, without explicit permission
+- Other conduct that could reasonably be considered unprofessional
+
+## Our Responsibilities
+
+Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instance of unacceptable behavior.
+
+Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that do not comply with this Code of Conduct, and to temporarily or permanently ban any contributor they deem inappropriate, threatening, offensive, or harmful.
+
+## Scope
+
+This Code of Conduct applies within project spaces and in public spaces when an individual is representing the project or its community.
+
+## Enforcement
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team through the channels below. All complaints will be reviewed and investigated and will result in a response deemed necessary and appropriate.
+
+## Contact Information
+
+Please contact us via [your contact information].
+
+## Attribution
+
+This Code of Conduct is adapted from the [Contributor Covenant](https://www.contributor-covenant.org), version 1.4.

CONTRIBUTING.md

+28

@@ -0,0 +1,28 @@
+# deepx Contribution Guide
+
+Development of the deepx framework falls into six main areas:
+
++ front: new models, modules, Python class functions, etc.
++ middle layer: computation-graph optimizers, the plugin system (automatic KV-cache), automatic distribution, automatic release of stack tensors, automatic in-place rewriting, and similar passes
++ adding or modifying an excuter
++ adding or modifying operators, which further divide into leaftensorfunc (indivisible basic operators) and fusedtensorfunc (fused operators)
++ enriching the documentation
++ operations and automation
+
+Pick whichever direction suits you.
+
+## Steps
+
+First submission
+1. Fork the main branch of this repository (github.com/array2d/deepx) to your github/yourname/deepx
+2. Clone github/yourname/deepx locally
+3. Commit and push your changes to your GitHub fork: `git commit -m 'Add some feature'`
+4. Open a Pull Request.
+
+Nth submission
+
+1. Make sure your local work and github/yourname/deepx have both been submitted as pull requests and merged
+2. Sync fork on github/yourname/deepx [dangerous: this discards code you have added] to pull the latest main branch of github.com/array2d/deepx
+3. Clone github/yourname/deepx locally
+4. Commit and push your changes to your GitHub fork: `git commit -m 'Add some feature'`
+5. Open a Pull Request.

README.md

-23

@@ -66,29 +66,6 @@ DeepX can integrate existing tensor-computation frameworks as executors, making full use of existing
 
 This architecture lets DeepX incorporate a wide range of advanced computation frameworks as execution engines while providing unified distributed scheduling and execution, giving users more flexible choices and higher performance.
 
-## II. Contribution Guide
-
-You can also follow the guide in the official documentation:
-
-https://deepx.array2d.com
-
-You are welcome to help build the project in the following ways:
-
-1. **Code contributions**
-   - Open an Issue describing the change before submitting a PR
-   - The front project is currently Python-centric
-   - excuter: three kinds of executors are currently planned; see [excuter](doc/excuter/excuter.md) for how to add a new operator to an excuter
-     - cpu:
-     - cuda:
-     - jax:
-
-2. **Documentation improvements**
-   - Submit documentation updates to the `doc/` directory
-
-3. **Issue reporting**
-   - Development is iterating rapidly; report problems via issues
-
-
 
 ### Official Documentation

doc/excuter/op-mem-cuda/list.md

+1

@@ -48,6 +48,7 @@
 | Operation | Author | Math Formula | IR Instruction |
 |-----------|--------|--------------|----------------|
+| dropout | miaobyte | T1.dropout(p,seed)->T3 | dropout(tensor<any> A, var<float32> p, var<int32> seed)->(tensor<any> C) |
 | switch | miaobyte | C=switch(tensors,cases) | switch(listtensor<any> tensors, tensor<int8> cases)->(tensor<any> result) |
 | greaterscalar | miaobyte | mask=compare(T1, scalar) | greaterscalar(tensor<any> A, var<any> scalar)->(tensor<bool> mask) |
 | equalscalar | miaobyte | mask=compare(T1, scalar) | equalscalar(tensor<any> A, var<any> scalar, var<float64> epsilon)->(tensor<bool> mask) |

doc/excuter/op-mem-ompsimd/list.md

+1

@@ -49,6 +49,7 @@
 | Operation | Author | Math Formula | IR Instruction |
 |-----------|--------|--------------|----------------|
+| dropout | miaobyte | A.dropout(p,seed)->C | dropout(tensor<any> A, var<float32> p, var<int32> seed)->(tensor<any> C) |
 | switch | miaobyte | C=switch([tensors],case) | switch(listtensor<any> tensors, tensor<int8> cases)->(tensor<any> C) |
 | greaterscalar | miaobyte | mask=greater(T1,scalar) | greaterscalar(tensor<any> A, var<any> scalar)->(tensor<bool> mask) |
 | equalscalar | miaobyte | mask=equal(T1,scalar) | equalscalar(tensor<any> A, var<any> scalar)->(tensor<bool> mask) |

excuter/cpp-common/src/deepx/tensorfunc/elementwise.hpp

+13

@@ -412,6 +412,19 @@ namespace deepx::tensorfunc
         invertDispatcher<Author, T>::invert(input, output);
     }
 
+    // dropout(A,p) => C
+    template <typename Author, typename T>
+    struct dropoutDispatcher
+    {
+        static void dropout(const Tensor<T> &input, const float p, const unsigned int seed, Tensor<T> &output) = delete;
+    };
+
+    template <typename Author, typename T>
+    void dropout(const Tensor<T> &input, const float p, const unsigned int seed, Tensor<T> &output)
+    {
+        dropoutDispatcher<Author, T>::dropout(input, p, seed, output);
+    }
+
 } // namespace deepx::tensorfunc
 
 #endif // DEEPX_TENSORFUNC_ELEMENTWISE_HPP
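To make the idiom concrete, here is a minimal, self-contained sketch of this deleted-primary-template dispatch pattern, compilable on its own. The toy `Tensor`, the `toyauthor` tag, and the LCG stand-in for a real RNG are hypothetical simplifications; in the repository, backends specialize `dropoutDispatcher` on their author tag (e.g. `miaobyte`) exactly as `elementwise_miaobyte_basic.hpp` below does.

```cpp
// Sketch of the Author-dispatch idiom, under simplified assumptions:
// a toy Tensor with only `data`/`size`, and a hypothetical `toyauthor` backend.
#include <cstddef>
#include <cstdio>

template <typename T>
struct Tensor { T *data; std::size_t size; };

struct toyauthor {};

// Primary template: deleted, so an (Author, T) pair no backend implements
// fails at compile time instead of at link time.
template <typename Author, typename T>
struct dropoutDispatcher
{
    static void dropout(const Tensor<T> &, float, unsigned int, Tensor<T> &) = delete;
};

// A backend opts in by specializing on its author tag.
template <typename T>
struct dropoutDispatcher<toyauthor, T>
{
    static void dropout(const Tensor<T> &in, float p, unsigned int seed, Tensor<T> &out)
    {
        unsigned int s = seed; // trivial LCG stand-in for a real RNG
        for (std::size_t i = 0; i < in.size; ++i)
        {
            s = s * 1664525u + 1013904223u;
            float r = (s >> 8) * (1.0f / 16777216.0f); // uniform in [0,1)
            out.data[i] = (r < p) ? T(0) : in.data[i];
        }
    }
};

// The public entry point mirrors tensorfunc::dropout above.
template <typename Author, typename T>
void dropout(const Tensor<T> &in, float p, unsigned int seed, Tensor<T> &out)
{
    dropoutDispatcher<Author, T>::dropout(in, p, seed, out);
}

int main()
{
    float a[4] = {1, 2, 3, 4}, c[4];
    Tensor<float> A{a, 4}, C{c, 4};
    dropout<toyauthor>(A, 0.5f, 42u, C); // resolves to the toyauthor specialization
    for (float v : c)
        std::printf("%g ", v);
    std::printf("\n");
}
```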

excuter/op-mem-cuda/src/client/tfs.cpp

+11

@@ -462,6 +462,17 @@ namespace deepx::tf
         {
             Param("result", DataCategory::Tensor, Precision::Any),
         })));
+        // dropout
+        tffactory.add_tf(std::make_shared<Dropout<miaobyte>>(
+            vector<Param>({
+                Param("A", DataCategory::Tensor, Precision::Any),
+                Param("p", DataCategory::Var, Precision::Float32),
+                Param("seed", DataCategory::Var, Precision::Int32),
+            }),
+            vector<Param>({
+                Param("C", DataCategory::Tensor, Precision::Any),
+            })));
 }
 // matmul
 void register_matmul(TfFactory &tffactory)

excuter/op-mem-cuda/src/deepx/tensorfunc/elementwise_miaobyte_basic.cu

+45

@@ -3,6 +3,9 @@
 
 #include <cuda_bf16.h>
 #include <cuda_fp16.h>
+#include <curand_kernel.h>
+
 #include "deepx/tensorfunc/cuda.hpp"
 #include "deepx/tensorfunc/authors.hpp"
 #include "deepx/tensorfunc/cuda_math.cuh"
@@ -404,6 +407,48 @@ namespace deepx::tensorfunc
     template void launch_invert<int16_t>(const int16_t *a, int16_t *c, const int size);
     template void launch_invert<int8_t>(const int8_t *a, int8_t *c, const int size);
 
+    // dropout
+    template <typename T>
+    __global__ void dropout_kernel(const T *A, const float p, const unsigned int seed, T *C, const int size)
+    {
+        int stride = blockDim.x * gridDim.x;
+        curandState state;
+        // Initialize the RNG state once per thread. Seeding the subsequence with the
+        // global thread id keeps threads in different blocks on independent streams.
+        curand_init(seed, blockIdx.x * blockDim.x + threadIdx.x, 0, &state);
+
+        for (int idx = blockIdx.x * blockDim.x + threadIdx.x; idx < size; idx += stride)
+        {
+            float rand = curand_uniform(&state);
+            if (rand < p)
+            {
+                C[idx] = 0;
+            }
+            else
+            {
+                C[idx] = A[idx];
+            }
+        }
+    }
+
+    template <typename T>
+    void launch_dropout(const T *a, const float p, const unsigned int seed, T *c, const int size)
+    {
+        auto [numBlocks, blockSize] = BestDims(size);
+        dropout_kernel<<<numBlocks, blockSize>>>(a, p, seed, c, size);
+        cudaError_t err = cudaGetLastError();
+        if (err != cudaSuccess)
+        {
+            throw std::runtime_error("Failed to launch dropout kernel: " +
+                                     std::string(cudaGetErrorString(err)));
+        }
+    }
+    template void launch_dropout<double>(const double *a, const float p, const unsigned int seed, double *c, const int size);
+    template void launch_dropout<float>(const float *a, const float p, const unsigned int seed, float *c, const int size);
+    template void launch_dropout<half>(const half *a, const float p, const unsigned int seed, half *c, const int size);
+    template void launch_dropout<nv_bfloat16>(const nv_bfloat16 *a, const float p, const unsigned int seed, nv_bfloat16 *c, const int size);
+    template void launch_dropout<int64_t>(const int64_t *a, const float p, const unsigned int seed, int64_t *c, const int size);
+    template void launch_dropout<int32_t>(const int32_t *a, const float p, const unsigned int seed, int32_t *c, const int size);
+    template void launch_dropout<int16_t>(const int16_t *a, const float p, const unsigned int seed, int16_t *c, const int size);
+    template void launch_dropout<int8_t>(const int8_t *a, const float p, const unsigned int seed, int8_t *c, const int size);
 }
 
 #endif // DEEPX_TENSORFUNC_ELEMENTWISE_MIAOBYTE_BASIC_CU
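Two properties of this kernel are worth noting. First, passing the same `seed` reproduces the same mask. Second, surviving elements are copied through unscaled: this is plain masking dropout, without the `1/(1-p)` rescaling of the inverted-dropout convention, so any such scaling would have to happen at a higher layer. For experimentation, a standalone harness around a float-only copy of the kernel might look like the sketch below; the file name and the fixed launch configuration (in place of the project's `BestDims`) are hypothetical.

```cuda
// dropout_demo.cu -- standalone sketch, not project code.
// Build: nvcc dropout_demo.cu -o dropout_demo
#include <cstdio>
#include <cuda_runtime.h>
#include <curand_kernel.h>

__global__ void dropout_f32(const float *A, float p, unsigned int seed, float *C, int size)
{
    int stride = blockDim.x * gridDim.x;
    curandState state;
    // One RNG state per thread, subsequence = global thread id.
    curand_init(seed, blockIdx.x * blockDim.x + threadIdx.x, 0, &state);
    for (int idx = blockIdx.x * blockDim.x + threadIdx.x; idx < size; idx += stride)
        C[idx] = (curand_uniform(&state) < p) ? 0.0f : A[idx];
}

int main()
{
    const int n = 8;
    float h[n] = {1, 2, 3, 4, 5, 6, 7, 8}, out[n];
    float *dA, *dC;
    cudaMalloc(&dA, n * sizeof(float));
    cudaMalloc(&dC, n * sizeof(float));
    cudaMemcpy(dA, h, n * sizeof(float), cudaMemcpyHostToDevice);
    dropout_f32<<<1, 128>>>(dA, 0.5f, 42u, dC, n); // same seed => same mask
    cudaMemcpy(out, dC, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (float v : out)
        std::printf("%g ", v); // roughly half the entries are zeroed
    std::printf("\n");
    cudaFree(dA);
    cudaFree(dC);
}
```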

excuter/op-mem-cuda/src/deepx/tensorfunc/elementwise_miaobyte_basic.cuh

+6

@@ -89,6 +89,12 @@ namespace deepx::tensorfunc
     template <typename T>
     void launch_invert(const T *a, T *c, const int size);
 
+    // dropout
+    template <typename T>
+    __global__ void dropout_kernel(const T *A, const float p, const unsigned int seed, T *C, const int size);
+
+    template <typename T>
+    void launch_dropout(const T *a, const float p, const unsigned int seed, T *c, const int size);
 }
 
 #endif // DEEPX_TENSORFUNC_ELEMENTWISE_MIAO_BYTE_BASIC_CUH

excuter/op-mem-cuda/src/deepx/tensorfunc/elementwise_miaobyte_basic.hpp

+9

@@ -140,6 +140,15 @@ namespace deepx::tensorfunc
             launch_invert(A.data, C.data, A.shape.size);
         }
     };
+
+    template <typename T>
+    struct dropoutDispatcher<miaobyte, T>
+    {
+        static void dropout(const Tensor<T> &A, const float p, const unsigned int seed, Tensor<T> &C)
+        {
+            launch_dropout(A.data, p, seed, C.data, A.shape.size);
+        }
+    };
 }
 
 #endif // DEEPX_TENSORFUNC_ELEMENTWISE_MIAO_BYTE_BASIC_HPP

excuter/op-mem-cuda/src/deepx/tf/elementwise_basic.hpp

+68

@@ -1033,6 +1033,74 @@ namespace deepx::tf
             return 0;
         }
     };
+
+    // dropout
+    template <typename Author>
+    class Dropout : public TF
+    {
+    public:
+        Dropout(const vector<Param> &args, const vector<Param> &returns)
+        {
+            this->name = "dropout";
+            this->metadata.author = Author::name();
+            this->tftype = "elementwise";
+            this->args = args;
+            this->returns = returns;
+        }
+        string math_formula() const override
+        {
+            return "T1.dropout(p,seed)->T3";
+        }
+        shared_ptr<TF> clone() const override
+        {
+            return make_shared<Dropout<Author>>(*this);
+        }
+        int run(shared_ptr<MemBase> mem, string &error) override
+        {
+            if (!checktensors({this->args[0].textvalue, this->returns[0].textvalue}, mem, error))
+            {
+                return 1;
+            }
+            Precision a_type = mem->gettensor(this->args[0].textvalue).get()->shape.dtype;
+            Precision c_type = mem->gettensor(this->returns[0].textvalue).get()->shape.dtype;
+            if (a_type != c_type)
+            {
+                error = "Type mismatch: " + precision_str(a_type) + " != " + precision_str(c_type);
+                return 1;
+            }
+            switch (a_type)
+            {
+            case Precision::Float64:
+                tensorfunc::dropout<Author>(*mem->gettensor<double>(this->args[0].textvalue), this->getvar<float>(1, mem), this->getvar<unsigned int>(2, mem), *mem->gettensor<double>(this->returns[0].textvalue));
+                break;
+            case Precision::Float32:
+                tensorfunc::dropout<Author>(*mem->gettensor<float>(this->args[0].textvalue), this->getvar<float>(1, mem), this->getvar<unsigned int>(2, mem), *mem->gettensor<float>(this->returns[0].textvalue));
+                break;
+            case Precision::Float16:
+                tensorfunc::dropout<Author>(*mem->gettensor<half>(this->args[0].textvalue), this->getvar<float>(1, mem), this->getvar<unsigned int>(2, mem), *mem->gettensor<half>(this->returns[0].textvalue));
+                break;
+            case Precision::BFloat16:
+                tensorfunc::dropout<Author>(*mem->gettensor<nv_bfloat16>(this->args[0].textvalue), this->getvar<float>(1, mem), this->getvar<unsigned int>(2, mem), *mem->gettensor<nv_bfloat16>(this->returns[0].textvalue));
+                break;
+            case Precision::Int64:
+                tensorfunc::dropout<Author>(*mem->gettensor<int64_t>(this->args[0].textvalue), this->getvar<float>(1, mem), this->getvar<unsigned int>(2, mem), *mem->gettensor<int64_t>(this->returns[0].textvalue));
+                break;
+            case Precision::Int32:
+                tensorfunc::dropout<Author>(*mem->gettensor<int32_t>(this->args[0].textvalue), this->getvar<float>(1, mem), this->getvar<unsigned int>(2, mem), *mem->gettensor<int32_t>(this->returns[0].textvalue));
+                break;
+            case Precision::Int16:
+                tensorfunc::dropout<Author>(*mem->gettensor<int16_t>(this->args[0].textvalue), this->getvar<float>(1, mem), this->getvar<unsigned int>(2, mem), *mem->gettensor<int16_t>(this->returns[0].textvalue));
+                break;
+            case Precision::Int8:
+                tensorfunc::dropout<Author>(*mem->gettensor<int8_t>(this->args[0].textvalue), this->getvar<float>(1, mem), this->getvar<unsigned int>(2, mem), *mem->gettensor<int8_t>(this->returns[0].textvalue));
+                break;
+            default:
+                error = "Unsupported dtype: " + precision_str(a_type);
+                return 1;
+            }
+            return 0;
+        }
+    };
 };
 
 #endif // DEEPX_TF_ELEMENTWISE_BASIC_HPP
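Taken together with the factory registration in `tfs.cpp`, a single `Dropout<miaobyte>` instance covers all eight supported precisions: `run()` reads the tensor dtype at execution time and forwards to the templated `tensorfunc::dropout<Author>`, which the dispatcher resolves to the backend kernel. An IR instruction driving it would follow the signature documented in `doc/excuter/op-mem-cuda/list.md`, for example (illustrative values) `dropout(A, 0.5, 42)->(C)`.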

excuter/op-mem-ompsimd/src/client/tfs.cpp

+11

@@ -465,6 +465,17 @@ namespace deepx::tf
         {
             Param("C", DataCategory::Tensor, Precision::Any),
         })));
+        // dropout author=miaobyte
+        tffactory.add_tf(std::make_shared<Dropout<miaobyte>>(
+            vector<Param>({
+                Param("A", DataCategory::Tensor, Precision::Any),
+                Param("p", DataCategory::Var, Precision::Float32),
+                Param("seed", DataCategory::Var, Precision::Int32),
+            }),
+            vector<Param>({
+                Param("C", DataCategory::Tensor, Precision::Any),
+            })));
 }
 // matmul
 void register_matmul(TfFactory &tffactory)
