lizhiping / text-generation-inference
Commit 4b10c8c3
authored 11 months ago by drbh
fix: improve scales change and revert conditional
parent ab4d480d
Showing 1 changed file with 5 additions and 2 deletions
server/text_generation_server/layers/marlin/fp8.py
@@ -38,9 +38,12 @@ class GPTQMarlinFP8Linear(nn.Module):
        log_once(logger.info, "GPU does not support FP8, using Marlin FP8 kernel")
        # if scales is a scalar (0D tensor), convert it to a 1D tensor
        if scales.dim() == 0:
            scales = scales.unsqueeze(0)
        scales = scales.unsqueeze(0)
        # repack weights for Marlin if a single scale is provided
        if scales.size(0) == 1:
        if scales.shape[1] == 1:
            out_features, in_features = qweight.shape
            scales = scales.repeat(1, out_features)
        qweight, scales = repack_fp8_for_marlin(qweight, scales)
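The scale-handling lines in this hunk can be hard to follow without seeing the tensor shapes. Below is a minimal sketch of how they might combine. It assumes the scalar check, the unconditional `unsqueeze(0)`, and the `scales.shape[1] == 1` conditional all apply in that order; the page does not preserve which lines were added or removed, so this ordering is a guess. `normalize_scales` is a hypothetical helper, not a function in the repository, and the call to `repack_fp8_for_marlin` is left out.

```python
import torch


def normalize_scales(scales: torch.Tensor, out_features: int) -> torch.Tensor:
    """Hypothetical helper: bring FP8 scales to shape (1, out_features)."""
    # A per-tensor scale may arrive as a 0-D tensor; make it 1-D first.
    if scales.dim() == 0:
        scales = scales.unsqueeze(0)
    # Add a leading axis: (n,) -> (1, n).
    scales = scales.unsqueeze(0)
    # A single scale is repeated so there is one value per output feature.
    if scales.shape[1] == 1:
        scales = scales.repeat(1, out_features)
    return scales


# Per-tensor (scalar) scale and per-channel scale, both with 4 output features.
per_tensor = normalize_scales(torch.tensor(0.5), out_features=4)
per_channel = normalize_scales(torch.tensor([0.1, 0.2, 0.3, 0.4]), out_features=4)
print(per_tensor.shape, per_channel.shape)
```

Under these assumptions both inputs end up with shape `(1, 4)`. Checking `shape[1]` rather than `size(0)` matters here because after the leading `unsqueeze(0)` the first axis always has length 1, so only the second axis distinguishes a single scale from per-channel scales.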