instruction_fine_tuning_profiler_output.txt
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg CPU Mem Self CPU Mem CUDA Mem Self CUDA Mem # of Calls
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
aten::empty 1.66% 485.399ms 1.69% 491.527ms 9.732us 518.846ms 1.62% 518.846ms 10.273us 10.15 Mb 10.15 Mb 322.27 Gb 322.27 Gb 50506
aten::mul 2.22% 648.933ms 2.87% 836.420ms 17.702us 1.853s 5.78% 1.853s 39.215us 6.47 Kb 6.47 Kb 317.48 Gb 317.48 Gb 47250
aten::empty_strided 0.72% 211.125ms 0.75% 220.148ms 6.453us 221.721ms 0.69% 221.721ms 6.499us 3.59 Gb 3.59 Gb 293.08 Gb 293.08 Gb 34116
aten::mm 2.52% 736.088ms 3.55% 1.035s 31.809us 4.218s 13.16% 4.218s 129.620us 0 b 0 b 246.24 Gb 246.24 Gb 32544
aten::_log_softmax 0.01% 2.825ms 0.08% 22.404ms 155.583us 907.859ms 2.83% 907.859ms 6.305ms 0 b 0 b 121.72 Gb 121.72 Gb 144
aten::pow 1.34% 390.995ms 1.71% 497.627ms 52.360us 404.700ms 1.26% 577.662ms 60.781us 0 b 0 b 77.87 Gb 77.87 Gb 9504
aten::add 0.73% 213.593ms 0.98% 285.418ms 17.444us 465.307ms 1.45% 465.307ms 28.438us 0 b 0 b 60.50 Gb 60.50 Gb 16362
aten::nll_loss_backward 0.02% 4.662ms 0.04% 10.337ms 143.569us 4.230ms 0.01% 102.742ms 1.427ms 0 b 0 b 52.76 Gb 52.76 Gb 72
aten::_log_softmax_backward_data 0.01% 2.249ms 0.01% 3.183ms 44.208us 386.095ms 1.20% 386.095ms 5.362ms 0 b 0 b 52.76 Gb 52.76 Gb 72
aten::silu 0.11% 31.661ms 0.16% 47.563ms 20.644us 145.470ms 0.45% 145.470ms 63.138us 0 b 0 b 35.14 Gb 35.14 Gb 2304
aten::div 0.27% 77.636ms 0.34% 100.235ms 38.671us 81.978ms 0.26% 81.978ms 31.627us 3.56 Kb 3.56 Kb 23.62 Gb 23.62 Gb 2592
aten::cat 0.28% 81.837ms 0.39% 112.308ms 22.270us 243.632ms 0.76% 243.632ms 48.311us 6.69 Mb 6.69 Mb 18.09 Gb 18.09 Gb 5043
aten::silu_backward 0.06% 16.527ms 0.07% 21.721ms 18.855us 94.765ms 0.30% 94.765ms 82.261us 0 b 0 b 15.20 Gb 15.20 Gb 1152
aten::neg 0.31% 91.550ms 0.45% 130.836ms 18.929us 171.270ms 0.53% 171.270ms 24.779us 0 b 0 b 13.42 Gb 13.42 Gb 6912
aten::gather 0.86% 250.273ms 1.00% 291.717ms 42.204us 291.261ms 0.91% 347.372ms 50.256us 0 b 0 b 12.52 Gb 12.52 Gb 6912
aten::index 0.17% 49.896ms 0.24% 70.161ms 60.904us 52.352ms 0.16% 70.780ms 61.441us 0 b 0 b 3.11 Gb 3.11 Gb 1152
aten::resize_ 0.05% 14.524ms 0.05% 14.524ms 5.073us 27.282ms 0.09% 27.282ms 9.529us 0 b 0 b 923.10 Mb 923.10 Mb 2863
aten::mean 0.31% 89.330ms 0.40% 115.483ms 23.850us 162.711ms 0.51% 162.937ms 33.651us 0 b 0 b 85.33 Mb 85.33 Mb 4842
aten::rsqrt 0.38% 109.845ms 0.50% 144.672ms 30.444us 75.325ms 0.23% 75.325ms 15.851us 0 b 0 b 85.29 Mb 85.29 Mb 4752
aten::bmm 0.05% 14.624ms 0.12% 35.610ms 247.292us 36.333ms 0.11% 36.333ms 252.312us 0 b 0 b 15.40 Mb 15.40 Mb 144
aten::cos 0.01% 2.660ms 0.02% 4.925ms 34.201us 5.606ms 0.02% 5.606ms 38.931us 0 b 0 b 14.01 Mb 14.01 Mb 144
aten::sin 0.01% 2.249ms 0.02% 4.484ms 31.139us 5.171ms 0.02% 5.171ms 35.910us 0 b 0 b 14.01 Mb 14.01 Mb 144
aten::cumsum 0.39% 112.368ms 0.47% 137.500ms 59.679us 117.214ms 0.37% 125.436ms 54.443us 0 b 0 b 1.12 Mb 1.12 Mb 2304
aten::eq 0.04% 10.846ms 0.08% 22.555ms 62.479us 25.028ms 0.08% 25.028ms 69.330us 653.96 Kb 653.96 Kb 722.00 Kb 722.00 Kb 361
aten::nll_loss_forward 0.02% 5.396ms 0.04% 11.829ms 82.146us 14.282ms 0.04% 14.459ms 100.410us 0 b 0 b 144.00 Kb 144.00 Kb 144
aten::sum 0.90% 263.343ms 1.51% 440.856ms 47.100us 272.273ms 0.85% 405.496ms 43.322us 0 b 0 b 41.46 Mb 94.50 Kb 9360
aten::any 0.02% 5.651ms 0.05% 14.733ms 102.312us 14.759ms 0.05% 15.468ms 107.417us 0 b 0 b 72.00 Kb 72.00 Kb 144
aten::ne 0.01% 3.317ms 0.02% 5.303ms 73.653us 6.082ms 0.02% 6.082ms 84.472us 0 b 0 b 36.00 Kb 36.00 Kb 72
cudaMemsetAsync 0.09% 26.679ms 0.09% 26.679ms 3.702us 0.000us 0.00% 0.000us 0.000us 0 b 0 b 13.00 Kb 13.00 Kb 7206
aten::linalg_vector_norm 0.00% 701.000us 0.07% 19.688ms 1.094ms 19.695ms 0.06% 19.771ms 1.098ms 0 b 0 b 9.00 Kb 9.00 Kb 18
aten::reciprocal 0.00% 317.000us 0.03% 8.766ms 487.000us 8.847ms 0.03% 8.847ms 491.500us 0 b 0 b 9.00 Kb 9.00 Kb 18
aten::clamp 0.00% 555.000us 0.05% 14.246ms 791.444us 14.245ms 0.04% 14.323ms 795.722us 0 b 0 b 9.00 Kb 9.00 Kb 18
aten::to 0.93% 272.565ms 13.40% 3.909s 94.282us 216.460ms 0.68% 3.517s 84.832us 3.58 Gb 0 b 275.57 Gb 0 b 41462
aten::_has_compatible_shallow_copy_type 0.00% 0.000us 0.00% 0.000us 0.000us 909.000us 0.00% 909.000us 3.092us 0 b 0 b 0 b 0 b 294
aten::lift_fresh 0.00% 46.000us 0.00% 46.000us 0.079us 4.320ms 0.01% 4.320ms 7.423us 0 b 0 b 0 b 0 b 582
aten::detach_ 0.02% 5.861ms 0.02% 6.202ms 14.192us 5.753ms 0.02% 8.402ms 19.227us 0 b 0 b 0 b 0 b 437
detach_ 0.00% 341.000us 0.00% 341.000us 0.780us 2.649ms 0.01% 2.649ms 6.062us 0 b 0 b 0 b 0 b 437
aten::_to_copy 2.13% 621.848ms 12.47% 3.637s 144.200us 328.578ms 1.02% 3.301s 130.888us 3.58 Gb 1.13 Kb 275.57 Gb 0 b 25219
aten::copy_ 1.54% 449.099ms 19.41% 5.661s 103.549us 6.808s 21.24% 6.808s 124.536us 0 b 0 b 0 b 0 b 54671
cudaMemcpyAsync 15.18% 4.429s 15.18% 4.429s 245.789us 0.000us 0.00% 0.000us 0.000us 0 b 0 b 0 b 0 b 18020
cudaStreamSynchronize 2.71% 791.652ms 2.71% 791.652ms 59.666us 0.000us 0.00% 0.000us 0.000us 0 b 0 b 0 b 0 b 13268
aten::rsub 0.00% 82.000us 0.00% 115.000us 57.500us 83.000us 0.00% 128.000us 64.000us 16 b 0 b 0 b 0 b 2
aten::sub 0.00% 33.000us 0.00% 33.000us 16.500us 45.000us 0.00% 45.000us 22.500us 16 b 16 b 0 b 0 b 2
aten::add_ 0.71% 208.259ms 1.28% 373.442ms 17.761us 543.524ms 1.70% 645.314ms 30.691us 5.13 Kb -5.13 Kb 0 b 0 b 21026
aten::random_ 0.00% 316.000us 0.00% 316.000us 16.632us 435.000us 0.00% 435.000us 22.895us 0 b 0 b 0 b 0 b 19
aten::item 0.29% 83.508ms 0.46% 134.796ms 16.747us 72.282ms 0.23% 148.303ms 18.425us 0 b 0 b 0 b 0 b 8049
aten::_local_scalar_dense 0.05% 15.436ms 0.18% 51.288ms 6.372us 76.021ms 0.24% 76.021ms 9.445us 0 b 0 b 0 b 0 b 8049
enumerate(DataLoader)#_SingleProcessDataLoaderIter._... 5.75% 1.678s 5.98% 1.744s 10.701ms 1.664s 5.19% 1.745s 10.708ms 0 b -15.90 Mb 0 b 0 b 163
aten::clone 0.51% 147.568ms 1.18% 345.474ms 55.785us 103.463ms 0.32% 818.823ms 132.218us 5.11 Mb 56.19 Kb 134.90 Gb 0 b 6193
aten::index_put_ 0.11% 32.383ms 0.59% 172.226ms 70.325us 25.334ms 0.08% 202.614ms 82.733us 0 b 0 b 0 b 0 b 2449
aten::_index_put_impl_ 0.27% 79.947ms 0.48% 139.843ms 57.102us 123.823ms 0.39% 177.280ms 72.389us 0 b 0 b 0 b 0 b 2449
aten::masked_fill_ 0.01% 3.196ms 0.01% 3.196ms 22.041us 4.056ms 0.01% 4.056ms 27.972us 0 b 0 b 0 b 0 b 145
aten::pin_memory 0.04% 12.933ms 0.12% 36.271ms 83.382us 10.491ms 0.03% 38.601ms 88.738us 0 b 0 b 0 b 0 b 435
aten::is_pinned 0.01% 3.118ms 0.02% 4.383ms 10.076us 6.839ms 0.02% 6.839ms 15.722us 0 b 0 b 0 b 0 b 435
cudaPointerGetAttributes 0.00% 1.265ms 0.00% 1.265ms 2.908us 0.000us 0.00% 0.000us 0.000us 0 b 0 b 0 b 0 b 435
aten::_pin_memory 0.05% 14.587ms 0.06% 18.955ms 43.575us 13.172ms 0.04% 21.271ms 48.899us 0 b 0 b 0 b 0 b 435
cudaHostAlloc 0.00% 1.138ms 0.00% 1.138ms 189.667us 0.000us 0.00% 0.000us 0.000us 0 b 0 b 0 b 0 b 6
aten::set_ 0.88% 256.625ms 0.88% 256.625ms 19.366us 118.984ms 0.37% 118.984ms 8.979us 0 b 0 b 0 b 0 b 13251
aten::slice 2.48% 723.577ms 2.54% 739.834ms 18.853us 380.502ms 1.19% 527.870ms 13.451us 0 b 0 b 0 b 0 b 39243
aten::as_strided 0.24% 68.643ms 0.24% 68.643ms 0.388us 654.135ms 2.04% 654.135ms 3.698us 0 b 0 b 0 b 0 b 176871
aten::embedding 0.03% 7.521ms 1.53% 446.916ms 3.104ms 4.899ms 0.02% 449.327ms 3.120ms 0 b 0 b 877.84 Mb 0 b 144
aten::reshape 3.21% 935.469ms 4.81% 1.403s 17.477us 704.506ms 2.20% 1.427s 17.780us 0 b 0 b 13.17 Gb 0 b 80280
aten::view 0.57% 166.288ms 0.57% 166.288ms 1.869us 419.066ms 1.31% 419.066ms 4.709us 0 b 0 b 0 b 0 b 88992
aten::index_select 0.02% 7.027ms 1.50% 437.108ms 3.035ms 437.545ms 1.36% 441.374ms 3.065ms 0 b 0 b 877.84 Mb 0 b 144
cudaLaunchKernel 6.18% 1.802s 6.18% 1.802s 6.892us 0.000us 0.00% 0.000us 0.000us 0 b 0 b 0 b 0 b 261489
aten::arange 0.03% 9.115ms 0.08% 23.586ms 59.261us 10.999ms 0.03% 23.607ms 59.314us 0 b 0 b 5.07 Mb 0 b 398
aten::unsqueeze 0.75% 217.702ms 0.76% 222.462ms 12.237us 202.878ms 0.63% 274.808ms 15.116us 0 b 0 b 0 b 0 b 18180
aten::expand 0.60% 174.327ms 0.62% 180.087ms 13.520us 159.775ms 0.50% 219.830ms 16.504us 0 b 0 b 0 b 0 b 13320
aten::matmul 1.83% 534.078ms 4.72% 1.376s 83.827us 214.489ms 0.67% 2.908s 177.137us 0 b 0 b 180.60 Gb 0 b 16416
cudaFree 0.06% 16.380ms 0.06% 16.380ms 5.460ms 0.000us 0.00% 0.000us 0.000us 0 b 0 b 0 b 0 b 3
cudaDeviceGetAttribute 0.00% 712.000us 0.00% 712.000us 0.082us 0.000us 0.00% 0.000us 0.000us 0 b 0 b 0 b 0 b 8674
cudaGetSymbolAddress 0.00% 96.000us 0.00% 96.000us 96.000us 0.000us 0.00% 0.000us 0.000us 0 b 0 b 0 b 0 b 1
cudaMalloc 0.10% 30.383ms 0.10% 30.383ms 137.480us 0.000us 0.00% 0.000us 0.000us 0 b 0 b 0 b 0 b 221
cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFla... 0.25% 72.298ms 0.25% 72.298ms 1.871us 0.000us 0.00% 0.000us 0.000us 0 b 0 b 0 b 0 b 38640
aten::_unsafe_view 0.13% 37.084ms 0.13% 37.084ms 1.672us 49.695ms 0.15% 49.695ms 2.241us 0 b 0 b 0 b 0 b 22176
aten::transpose 3.07% 896.009ms 3.19% 930.840ms 12.928us 579.266ms 1.81% 818.032ms 11.362us 0 b 0 b 0 b 0 b 72000
cudaStreamIsCapturing 0.00% 166.000us 0.00% 166.000us 0.617us 0.000us 0.00% 0.000us 0.000us 0 b 0 b 0 b 0 b 269
aten::result_type 0.00% 732.000us 0.00% 732.000us 0.029us 52.041ms 0.16% 52.041ms 2.059us 0 b 0 b 0 b 0 b 25272
aten::linear 1.75% 510.304ms 7.63% 2.225s 136.755us 177.676ms 0.55% 3.317s 203.856us 0 b 0 b 180.59 Gb 0 b 16272
aten::t 2.02% 588.759ms 4.18% 1.219s 23.847us 359.015ms 1.12% 861.688ms 16.856us 0 b 0 b 0 b 0 b 51120
cudaFuncSetAttribute 0.26% 76.492ms 0.26% 76.492ms 2.117us 0.000us 0.00% 0.000us 0.000us 0 b 0 b 0 b 0 b 36124
aten::flatten 0.18% 52.156ms 0.21% 62.011ms 13.457us 46.768ms 0.15% 75.563ms 16.398us 0 b 0 b 0 b 0 b 4608
aten::nonzero 0.60% 174.178ms 1.55% 451.984ms 196.174us 268.666ms 0.84% 367.573ms 159.537us 0 b 0 b 42.70 Mb 0 b 2304
cudaGetDeviceCount 0.00% 0.000us 0.00% 0.000us 0.000us 0.000us 0.00% 0.000us 0.000us 0 b 0 b 0 b 0 b 1
cudaFuncGetAttributes 0.05% 15.373ms 0.05% 15.373ms 15.373ms 0.000us 0.00% 0.000us 0.000us 0 b 0 b 0 b 0 b 1
cudaPeekAtLastError 0.00% 6.000us 0.00% 6.000us 0.000us 0.000us 0.00% 0.000us 0.000us 0 b 0 b 0 b 0 b 26597
aten::max 0.27% 79.518ms 0.37% 109.248ms 47.417us 86.674ms 0.27% 119.339ms 51.796us 0 b 0 b 1.12 Mb 0 b 2304
aten::pad 0.09% 26.246ms 0.87% 253.937ms 110.216us 25.438ms 0.08% 262.489ms 113.928us 0 b 0 b 1.12 Mb 0 b 2304
aten::constant_pad_nd 1.60% 466.323ms 3.46% 1.009s 97.294us 302.797ms 0.94% 1.048s 101.066us 0 b 0 b 17.34 Gb 0 b 10368
aten::fill_ 0.40% 117.610ms 0.74% 215.881ms 10.085us 624.363ms 1.95% 624.363ms 29.168us 0 b 0 b 0 b 0 b 21406
aten::narrow 0.92% 267.889ms 1.61% 468.769ms 31.301us 165.683ms 0.52% 409.290ms 27.330us 0 b 0 b 0 b 0 b 14976
IndexFirstAxis 2.28% 663.665ms 4.42% 1.288s 186.331us 520.924ms 1.62% 1.317s 190.590us 0 b 0 b 12.52 Gb 0 b 6912
aten::_pad_enum 0.31% 90.771ms 2.99% 871.829ms 108.114us 80.434ms 0.25% 891.237ms 110.520us 0 b 0 b 17.34 Gb 0 b 8064
aten::empty_like 0.49% 143.698ms 0.77% 225.536ms 18.363us 124.056ms 0.39% 241.991ms 19.703us 0 b 0 b 148.75 Gb 0 b 12282
IndexPutFirstAxis 0.37% 108.270ms 1.31% 383.154ms 166.299us 70.304ms 0.22% 383.823ms 166.590us 0 b 0 b 13.27 Gb 0 b 2304
aten::zeros 1.22% 355.768ms 2.40% 698.760ms 65.908us 163.127ms 0.51% 741.034ms 69.896us 0 b 0 b 149.25 Gb 0 b 10602
aten::zero_ 0.61% 178.760ms 1.00% 290.513ms 26.492us 115.546ms 0.36% 588.507ms 53.667us 0 b 0 b 0 b 0 b 10966
aten::contiguous 0.01% 2.676ms 0.07% 19.737ms 68.531us 1.116ms 0.00% 468.548ms 1.627ms 0 b 0 b 121.73 Gb 0 b 288
aten::log_softmax 0.03% 8.701ms 0.11% 31.105ms 216.007us 948.000us 0.00% 908.997ms 6.312ms 0 b 0 b 121.72 Gb 0 b 144
aten::nll_loss_nd 0.01% 1.682ms 0.06% 16.847ms 116.993us 1.131ms 0.00% 16.175ms 112.326us 0 b 0 b 108.00 Kb 0 b 144
aten::ones_like 0.00% 1.437ms 0.01% 3.691ms 51.264us 494.000us 0.00% 1.188ms 16.500us 0 b 0 b 36.00 Kb 0 b 72
autograd::engine::evaluate_function: DivBackward0 0.02% 5.742ms 0.06% 16.720ms 232.222us 396.000us 0.00% 1.143ms 15.875us -576 b -576 b 36.00 Kb 0 b 72
DivBackward0 0.01% 4.130ms 0.04% 10.978ms 152.472us 333.000us 0.00% 747.000us 10.375us 0 b 0 b 36.00 Kb 0 b 72
NllLossBackward0 0.01% 2.511ms 0.04% 12.848ms 178.444us 352.000us 0.00% 103.094ms 1.432ms 0 b 0 b 52.76 Gb 0 b 72
LogSoftmaxBackward0 0.01% 2.358ms 0.02% 5.541ms 76.958us 284.000us 0.00% 386.379ms 5.366ms 0 b 0 b 52.76 Gb 0 b 72
ViewBackward0 0.70% 203.592ms 1.99% 580.223ms 33.300us 148.619ms 0.46% 508.898ms 29.207us 0 b 0 b 8.01 Gb 0 b 17424
autograd::engine::evaluate_function: CloneBackward0 0.01% 1.960ms 0.01% 2.011ms 27.931us 283.000us 0.00% 375.000us 5.208us 0 b 0 b 0 b 0 b 72
CloneBackward0 0.00% 51.000us 0.00% 51.000us 0.708us 92.000us 0.00% 92.000us 1.278us 0 b 0 b 0 b 0 b 72
SliceBackward0 0.29% 85.760ms 4.07% 1.188s 250.023us 52.227ms 0.16% 1.161s 244.340us 0 b 0 b 121.51 Gb 0 b 4752
aten::slice_backward 1.32% 383.845ms 3.78% 1.102s 231.976us 96.864ms 0.30% 1.109s 233.349us 0 b 0 b 121.51 Gb 0 b 4752
ToCopyBackward0 0.19% 56.094ms 1.17% 341.089ms 70.707us 29.413ms 0.09% 454.542ms 94.225us 0 b 0 b 62.13 Gb 0 b 4824
autograd::engine::evaluate_function: UnsafeViewBackw... 0.39% 115.070ms 1.07% 310.879ms 38.210us 70.026ms 0.22% 218.950ms 26.911us 0 b 0 b 0 b 0 b 8136
UnsafeViewBackward0 0.31% 90.571ms 0.67% 195.809ms 24.067us 62.522ms 0.20% 148.924ms 18.304us 0 b 0 b 0 b 0 b 8136
MmBackward0 1.66% 483.092ms 5.27% 1.536s 188.772us 188.336ms 0.59% 2.327s 285.975us 0 b 0 b 65.65 Gb 0 b 8136
autograd::engine::evaluate_function: TBackward0 0.39% 114.703ms 1.36% 396.536ms 48.738us 65.188ms 0.20% 268.897ms 33.050us 0 b 0 b 0 b 0 b 8136
TBackward0 0.30% 87.155ms 0.97% 281.833ms 34.640us 58.321ms 0.18% 203.709ms 25.038us 0 b 0 b 0 b 0 b 8136
MulBackward0 1.18% 344.865ms 2.09% 608.417ms 57.878us 103.641ms 0.32% 840.206ms 79.928us 0 b 0 b 117.62 Gb 0 b 10512
aten::detach 0.44% 128.551ms 0.51% 148.859ms 18.256us 117.022ms 0.36% 184.291ms 22.601us 0 b 0 b 0 b 0 b 8154
detach 0.07% 20.308ms 0.07% 20.308ms 2.491us 67.269ms 0.21% 67.269ms 8.250us 0 b 0 b 0 b 0 b 8154
autograd::engine::evaluate_function: AddBackward0 0.43% 123.973ms 0.43% 125.329ms 17.945us 70.726ms 0.22% 93.223ms 13.348us 0 b 0 b 0 b 0 b 6984
AddBackward0 0.00% 1.356ms 0.00% 1.356ms 0.194us 22.497ms 0.07% 22.497ms 3.221us 0 b 0 b 0 b 0 b 6984
MeanBackward1 0.21% 62.689ms 0.61% 179.244ms 75.439us 59.918ms 0.19% 167.225ms 70.381us 0 b -3.56 Kb 23.62 Gb 0 b 2376
SiluBackward0 0.05% 15.852ms 0.13% 37.573ms 32.615us 6.828ms 0.02% 101.593ms 88.188us 0 b 0 b 15.20 Gb 0 b 1152
IndexPutFirstAxisBackward 0.13% 36.584ms 0.37% 106.745ms 92.661us 22.589ms 0.07% 93.369ms 81.049us 0 b 0 b 3.11 Gb 0 b 1152
aten::scatter_ 0.41% 118.517ms 0.46% 135.402ms 39.179us 116.267ms 0.36% 144.801ms 41.898us 0 b 0 b 0 b 0 b 3456
autograd::engine::evaluate_function: TransposeBackwa... 0.34% 100.595ms 1.01% 293.282ms 42.431us 82.314ms 0.26% 259.290ms 37.513us 0 b 0 b 0 b 0 b 6912
TransposeBackward0 0.30% 87.926ms 0.66% 192.687ms 27.877us 65.783ms 0.21% 176.976ms 25.604us 0 b 0 b 0 b 0 b 6912
autograd::engine::evaluate_function: CatBackward0 0.50% 145.633ms 1.69% 492.024ms 213.552us 29.222ms 0.09% 194.766ms 84.534us 0 b 0 b 0 b 0 b 2304
CatBackward0 0.42% 123.958ms 1.19% 346.391ms 150.343us 39.117ms 0.12% 165.544ms 71.851us 0 b 0 b 0 b 0 b 2304
autograd::engine::evaluate_function: NegBackward0 0.35% 103.392ms 0.59% 172.790ms 74.996us 26.157ms 0.08% 102.905ms 44.664us 0 b 0 b 4.04 Gb 0 b 2304
NegBackward0 0.10% 28.331ms 0.24% 69.398ms 30.121us 21.194ms 0.07% 76.748ms 33.311us 0 b 0 b 4.04 Gb 0 b 2304
EmbeddingBackward0 0.00% 1.008ms 0.19% 56.678ms 787.194us 420.000us 0.00% 56.731ms 787.931us 0 b 0 b 4.35 Gb 0 b 72
aten::embedding_backward 0.00% 985.000us 0.19% 55.670ms 773.194us 458.000us 0.00% 56.311ms 782.097us 0 b 0 b 4.35 Gb 0 b 72
aten::isnan 0.01% 2.431ms 0.03% 7.734ms 107.417us 2.343ms 0.01% 8.425ms 117.014us 0 b 0 b 36.00 Kb 0 b 72
aten::is_nonzero 0.02% 4.438ms 0.06% 16.192ms 112.444us 4.298ms 0.01% 17.367ms 120.604us 0 b 0 b 0 b 0 b 144
aten::abs 0.04% 12.994ms 0.10% 28.959ms 201.104us 15.918ms 0.05% 31.917ms 221.646us 0 b 0 b 72.00 Kb 0 b 144
aten::select 0.10% 30.362ms 0.10% 30.581ms 11.326us 27.965ms 0.09% 38.177ms 14.140us 0 b 0 b 0 b 0 b 2700
aten::stack 0.08% 24.474ms 0.18% 51.517ms 2.862ms 14.295ms 0.04% 51.702ms 2.872ms 0 b 0 b 9.00 Kb 0 b 18
aten::_foreach_mul_ 0.20% 58.016ms 0.23% 68.430ms 1.267ms 79.845ms 0.25% 93.483ms 1.731ms 0 b 0 b 0 b 0 b 54
aten::zeros_like 0.02% 5.776ms 0.07% 19.178ms 65.678us 4.574ms 0.01% 20.422ms 69.938us 0 b 0 b 415.56 Mb 0 b 292
aten::_foreach_add_ 0.19% 55.766ms 0.72% 209.848ms 5.829ms 72.482ms 0.23% 210.497ms 5.847ms 0 b -5.13 Kb 0 b 0 b 36
aten::_foreach_lerp_ 0.00% 1.104ms 0.01% 2.346ms 130.333us 21.642ms 0.07% 21.642ms 1.202ms 0 b 0 b 0 b 0 b 18
aten::_foreach_addcmul_ 0.09% 25.974ms 0.10% 29.834ms 1.657ms 31.787ms 0.10% 35.281ms 1.960ms 0 b 0 b 0 b 0 b 18
aten::_foreach_sqrt 0.09% 27.613ms 0.25% 71.749ms 3.986ms 58.009ms 0.18% 85.408ms 4.745ms 0 b 0 b 3.66 Gb 0 b 18
aten::_foreach_div_ 0.08% 23.658ms 0.11% 31.191ms 1.733ms 29.365ms 0.09% 33.190ms 1.844ms 0 b 0 b 0 b 0 b 18
aten::_foreach_addcdiv_ 0.09% 25.719ms 0.09% 26.183ms 1.455ms 35.658ms 0.11% 40.303ms 2.239ms 0 b 0 b 0 b 0 b 18
aten::sub_ 0.00% 366.000us 0.00% 563.000us 31.278us 683.000us 0.00% 683.000us 37.944us 0 b 0 b 0 b 0 b 18
aten::repeat 0.02% 4.643ms 0.03% 9.274ms 128.806us 1.257ms 0.00% 3.154ms 43.806us 0 b 0 b 36.00 Kb 0 b 72
aten::alias 0.00% 52.000us 0.00% 52.000us 0.722us 102.000us 0.00% 102.000us 1.417us 0 b 0 b 0 b 0 b 72
aten::unfold 0.00% 817.000us 0.00% 820.000us 11.389us 274.000us 0.00% 373.000us 5.181us 0 b 0 b 0 b 0 b 72
aten::expand_as 0.00% 709.000us 0.01% 1.459ms 20.264us 291.000us 0.00% 666.000us 9.250us 0 b 0 b 0 b 0 b 72
aten::atleast_1d 0.00% 0.000us 0.00% 0.000us 0.000us 130.000us 0.00% 130.000us 1.204us 0 b 0 b 0 b 0 b 108
aten::resolve_conj 0.00% 2.000us 0.00% 2.000us 0.111us 81.000us 0.00% 81.000us 4.500us 0 b 0 b 0 b 0 b 18
aten::resolve_neg 0.00% 0.000us 0.00% 0.000us 0.000us 75.000us 0.00% 75.000us 4.167us 0 b 0 b 0 b 0 b 18
cudaDeviceSynchronize 0.00% 56.000us 0.00% 56.000us 56.000us 0.000us 0.00% 0.000us 0.000us 0 b 0 b 0 b 0 b 1
aten::isinf 0.02% 5.949ms 0.10% 28.848ms 400.667us 3.777ms 0.01% 29.592ms 411.000us 0 b 0 b 36.50 Kb -35.50 Kb 72
aten::nll_loss 0.01% 3.336ms 0.05% 15.165ms 105.312us 585.000us 0.00% 15.044ms 104.472us 0 b 0 b 108.00 Kb -36.00 Kb 144
autograd::engine::evaluate_function: NllLossBackward... 0.01% 3.823ms 0.06% 16.671ms 231.542us 312.000us 0.00% 103.406ms 1.436ms 0 b 0 b 52.75 Gb -2.28 Mb 72
aten::_foreach_norm 0.10% 28.942ms 0.21% 61.034ms 3.391ms 23.044ms 0.07% 61.107ms 3.395ms 0 b 0 b 9.00 Kb -4.97 Mb 18
RsqrtBackward0 0.32% 93.556ms 1.01% 294.391ms 123.902us 61.198ms 0.19% 163.064ms 68.630us 0 b -3.05 Kb 44.67 Mb -65.88 Mb 2376
autograd::engine::evaluate_function: RsqrtBackward0 0.14% 41.239ms 1.15% 335.630ms 141.258us 22.690ms 0.07% 185.754ms 78.179us 0 b 0 b -36.85 Mb -81.52 Mb 2376
aten::embedding_dense_backward 0.06% 16.368ms 0.19% 54.685ms 759.514us 39.926ms 0.12% 55.853ms 775.736us 0 b 0 b 4.35 Gb -699.01 Mb 72
autograd::engine::evaluate_function: torch::autograd... 0.50% 145.359ms 1.44% 419.437ms 39.901us 96.994ms 0.30% 366.324ms 34.848us 0 b 0 b -11.31 Gb -3.31 Gb 10512
Optimizer.step#AdamW.step 0.38% 112.070ms 2.13% 620.209ms 34.456ms 58.660ms 0.18% 641.401ms 35.633ms 584 b 60 b 415.56 Mb -3.66 Gb 18
IndexFirstAxisBackward 1.45% 424.307ms 4.00% 1.167s 337.531us 305.988ms 0.95% 1.062s 307.336us 0 b 0 b 10.11 Gb -5.16 Gb 3456
autograd::engine::evaluate_function: IndexFirstAxisB... 0.24% 69.146ms 4.24% 1.236s 357.539us 59.360ms 0.19% 1.122s 324.512us 0 b 0 b 4.36 Gb -5.75 Gb 3456
autograd::engine::evaluate_function: IndexPutFirstAx... 0.07% 19.629ms 0.43% 126.374ms 109.700us 14.257ms 0.04% 107.626ms 93.425us 0 b 0 b -2.77 Gb -5.88 Gb 1152
torch::autograd::AccumulateGrad 0.49% 143.360ms 0.94% 274.078ms 26.073us 98.628ms 0.31% 269.330ms 25.621us 0 b 0 b -8.00 Gb -8.00 Gb 10512
FlashAttnVarlenFunc 0.93% 271.989ms 3.81% 1.113s 482.948us 293.156ms 0.91% 1.161s 504.031us 0 b 0 b 13.65 Gb -8.32 Gb 2304
autograd::engine::evaluate_function: EmbeddingBackwa... 0.01% 2.462ms 0.21% 60.210ms 836.250us 623.000us 0.00% 82.191ms 1.142ms 0 b 0 b -356.50 Mb -9.06 Gb 72
autograd::engine::evaluate_function: FlashAttnVarlen... 0.12% 35.821ms 2.74% 798.668ms 693.288us 29.646ms 0.09% 726.970ms 631.050us 0 b 0 b -6.78 Gb -12.51 Gb 1152
FlashAttnVarlenFuncBackward 1.15% 335.956ms 2.62% 762.847ms 662.194us 333.446ms 1.04% 697.324ms 605.316us 0 b 0 b 5.73 Gb -23.17 Gb 1152
autograd::engine::evaluate_function: MeanBackward1 0.16% 45.519ms 0.77% 224.763ms 94.597us 27.750ms 0.09% 194.975ms 82.060us 0 b 0 b -32.75 Mb -23.66 Gb 2376
autograd::engine::evaluate_function: SiluBackward0 0.07% 21.614ms 0.20% 59.187ms 51.378us 6.127ms 0.02% 107.720ms 93.507us 0 b 0 b -15.17 Gb -30.38 Gb 1152
autograd::engine::evaluate_function: ViewBackward0 1.04% 302.567ms 3.21% 936.930ms 53.772us 175.760ms 0.55% 803.855ms 46.135us 0 b 0 b -17.49 Gb -37.11 Gb 17424
PowBackward0 0.38% 111.004ms 1.21% 353.855ms 148.929us 67.970ms 0.21% 509.822ms 214.572us 0 b -3.42 Kb 28.86 Gb -41.84 Gb 2376
aten::cross_entropy_loss 0.01% 3.407ms 0.18% 51.359ms 356.660us 1.512ms 0.00% 926.684ms 6.435ms 0 b 0 b 69.68 Gb -52.04 Gb 144
autograd::engine::evaluate_function: PowBackward0 0.28% 81.017ms 1.58% 461.883ms 194.395us 23.884ms 0.07% 673.733ms 283.558us 0 b 0 b -47.23 Gb -76.09 Gb 2376
autograd::engine::evaluate_function: MmBackward0 0.54% 156.511ms 5.80% 1.692s 208.008us 52.411ms 0.16% 2.379s 292.417us 0 b 0 b -34.09 Gb -99.74 Gb 8136
autograd::engine::evaluate_function: ToCopyBackward0... 0.39% 113.244ms 1.64% 477.933ms 99.074us 36.161ms 0.11% 561.412ms 116.379us 0 b 0 b -38.06 Gb -100.19 Gb 4824
autograd::engine::evaluate_function: LogSoftmaxBackw... 0.01% 2.583ms 0.03% 8.124ms 112.833us 282.000us 0.00% 386.661ms 5.370ms 0 b 0 b -52.76 Gb -105.51 Gb 72
autograd::engine::evaluate_function: SliceBackward0 0.50% 146.626ms 4.66% 1.359s 285.940us 66.966ms 0.21% 1.281s 269.546us 0 b 0 b -3.97 Gb -125.48 Gb 4752
autograd::engine::evaluate_function: MulBackward0 1.20% 349.259ms 3.82% 1.113s 105.873us 135.042ms 0.42% 1.190s 113.231us 0 b 0 b -27.10 Gb -144.75 Gb 10512
[memory] 0.00% 0.000us 0.00% 0.000us 0.000us 0.000us 0.00% 0.000us 0.000us -3.59 Gb -3.59 Gb -744.01 Gb -744.01 Gb 120645
------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 29.169s
Self CUDA time total: 32.062s
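In this layout the timing columns are derived from one another: `CPU time avg` is `CPU total` divided by `# of Calls` (likewise `CUDA time avg` from `CUDA total`), and each percentage column divides a row's time by the `Self CPU time total` or `Self CUDA time total` footer. A minimal stdlib-only sketch that checks those relationships on three rows copied from the table above follows. The helper names `to_us`, `pct`, `parse_row`, and `check_row` are hypothetical, and the parser assumes the whitespace-separated form shown here and a Name containing no spaces (rows such as `autograd::engine::evaluate_function: ...` would need a different split).

```python
import math

# Footer totals from the table, converted to microseconds.
SELF_CPU_TOTAL_US = 29.169e6   # "Self CPU time total: 29.169s"
SELF_CUDA_TOTAL_US = 32.062e6  # "Self CUDA time total: 32.062s"

# Multipliers from the profiler's time suffixes to microseconds.
_UNITS_US = {"us": 1.0, "ms": 1e3, "s": 1e6}

# Three rows copied verbatim from the table (names without spaces).
ROWS = [
    "aten::empty 1.66% 485.399ms 1.69% 491.527ms 9.732us 518.846ms 1.62% 518.846ms 10.273us 10.15 Mb 10.15 Mb 322.27 Gb 322.27 Gb 50506",
    "aten::mm 2.52% 736.088ms 3.55% 1.035s 31.809us 4.218s 13.16% 4.218s 129.620us 0 b 0 b 246.24 Gb 246.24 Gb 32544",
    "cudaMemcpyAsync 15.18% 4.429s 15.18% 4.429s 245.789us 0.000us 0.00% 0.000us 0.000us 0 b 0 b 0 b 0 b 18020",
]


def to_us(text: str) -> float:
    """Convert a time string such as '485.399ms' or '1.035s' to microseconds."""
    for suffix in ("us", "ms", "s"):  # test 'us' and 'ms' before the bare 's'
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * _UNITS_US[suffix]
    raise ValueError(f"unrecognised time value: {text!r}")


def pct(text: str) -> float:
    """Convert a percentage string such as '13.16%' to 13.16."""
    return float(text.rstrip("%"))


def parse_row(line: str) -> dict:
    """Extract the timing columns and call count from one whitespace-separated row."""
    tok = line.split()
    return {
        "name": tok[0],
        "self_cpu_pct": pct(tok[1]),
        "self_cpu_us": to_us(tok[2]),
        "cpu_total_pct": pct(tok[3]),
        "cpu_total_us": to_us(tok[4]),
        "cpu_avg_us": to_us(tok[5]),
        "self_cuda_us": to_us(tok[6]),
        "self_cuda_pct": pct(tok[7]),
        "cuda_total_us": to_us(tok[8]),
        "cuda_avg_us": to_us(tok[9]),
        "calls": int(tok[-1]),  # "# of Calls" is always the last column
    }


def check_row(r: dict) -> bool:
    """True if averages and percentages agree with the totals, within rounding."""
    def close(a: float, b: float) -> bool:
        return math.isclose(a, b, rel_tol=5e-3, abs_tol=1e-3)

    averages_ok = (close(r["cpu_avg_us"], r["cpu_total_us"] / r["calls"])
                   and close(r["cuda_avg_us"], r["cuda_total_us"] / r["calls"]))
    # Percentages are printed with two decimals, so allow 0.01 percentage points.
    percents_ok = (
        abs(r["self_cpu_pct"] - 100 * r["self_cpu_us"] / SELF_CPU_TOTAL_US) < 0.01
        and abs(r["cpu_total_pct"] - 100 * r["cpu_total_us"] / SELF_CPU_TOTAL_US) < 0.01
        and abs(r["self_cuda_pct"] - 100 * r["self_cuda_us"] / SELF_CUDA_TOTAL_US) < 0.01
    )
    return averages_ok and percents_ok


if __name__ == "__main__":
    for row in map(parse_row, ROWS):
        print(f"{row['name']:<16} consistent={check_row(row)}")
```

The tolerances only absorb the table's own rounding (times printed to about four significant figures, percentages to two decimals). For example, `aten::mm` gives 1.035 s / 32544 calls ≈ 31.80 us against the printed 31.809 us, and 4.218 s / 32.062 s ≈ 13.16 % of self CUDA time.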