1. 30 Jan, 2024 1 commit
  2. 26 Jan, 2024 13 commits
    • Dean Wyatte's avatar
      use static rotary embedding · f2a00be1
      Dean Wyatte authored
      f2a00be1
    • Nicolas Patry's avatar
      Fixing top_n_tokens. (#1497) · 069895b9
      Nicolas Patry authored
      # What does this PR do?
      
      Supersedes #1459

      The fix works as follows.
      We updated next_token_chooser to return all logprobs; then
      batch_top_n_tokens now also gets accepted_ids + speculated_length (so
      it knows how to interpret the flat logprobs).

      We then update the code to return the lists of `Tokens` that it expects.
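
As a rough illustration of why the extra arguments are needed, the sketch below regroups a flat list of log-probabilities into one list per request. It assumes `accepted_ids` holds the number of accepted tokens for each request in the batch; the function name `split_flat_logprobs` and the data layout are hypothetical and are not taken from the TGI code.

```python
from typing import List, Sequence


def split_flat_logprobs(
    flat_logprobs: Sequence[float], accepted_ids: Sequence[int]
) -> List[List[float]]:
    """Regroup a flat sequence into one list per request.

    Assumption: accepted_ids[i] is the number of entries in
    flat_logprobs that belong to request i.
    """
    if sum(accepted_ids) != len(flat_logprobs):
        raise ValueError("accepted_ids does not cover flat_logprobs exactly")

    groups: List[List[float]] = []
    start = 0
    for n_accepted in accepted_ids:
        # Take the next n_accepted entries for this request.
        groups.append(list(flat_logprobs[start : start + n_accepted]))
        start += n_accepted
    return groups


# Two requests: the first accepted 1 token, the second accepted 3.
print(split_flat_logprobs([-0.1, -0.2, -0.3, -0.4], [1, 3]))
```

Without the per-request counts, a flat list like this cannot be attributed to requests once speculation makes the number of tokens per step variable.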
      <!--
      Congratulations! You've made it this far! You're not quite done yet
      though.
      
      Once merged, your PR is going to appear in the release notes with the
      title you set, so make sure it's a great title that fully reflects the
      extent of your awesome contribution.
      
      Then, please replace this with a description of the change and which
      issue is fixed (if applicable). Please also include relevant motivation
      and context. List any dependencies (if any) that are required for this
      change.
      
      Once you're done, someone will review your PR shortly (see the section
      "Who can review?" below to tag some potential reviewers). They may
      suggest changes to make the code even better. If no one reviewed your PR
      after a week has passed, don't hesitate to post a new comment
      @-mentioning the same persons---sometimes notifications get lost.
      -->
      
      <!-- Remove if not applicable -->
      
      Fixes # (issue)
      
      
      ## Before submitting
      - [ ] This PR fixes a typo or improves the docs (you can dismiss the
      other checks if that's the case).
      - [ ] Did you read the [contributor
      guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
            Pull Request section?
      - [ ] Was this discussed/approved via a Github issue or the
      [forum](https://discuss.huggingface.co/)? Please add a link
            to it if that's the case.
      - [ ] Did you make sure to update the documentation with your changes?
      Here are the
      [documentation
      guidelines](https://github.com/huggingface/transformers/tree/main/docs),
      and
      [here are tips on formatting
      docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
      - [ ] Did you write any new necessary tests?
      
      
      ## Who can review?
      
      Anyone in the community is free to review the PR once the tests have
      passed. Feel free to tag
      members/contributors who may be interested in your PR.
      
      <!-- Your PR will be replied to more quickly if you can figure out the
      right person to tag with @
      
      
      @OlivierDehaene OR @Narsil
      
       -->
      069895b9
    • OlivierDehaene's avatar
      v1.4.0 (#1494) · c2d4a3b5
      OlivierDehaene authored
      c2d4a3b5
    • drbh's avatar
      feat: add tokenizer-config-path to launcher args (#1495) · d9758851
      drbh authored
      This PR adds the `tokenizer-config-path` to the launcher and passes it
      to the router
      
      Fixes:
      https://github.com/huggingface/text-generation-inference/pull/1427
      d9758851
    • fxmarty's avatar
      GPTQ support on ROCm (#1489) · 650fea18
      fxmarty authored
      
      Tested with
      ```
      CUDA_VISIBLE_DEVICES=0 text-generation-launcher --model-id TheBloke/Llama-2-7B-Chat-GPTQ --quantize gptq
      EXLLAMA_VERSION=1 CUDA_VISIBLE_DEVICES=0 text-generation-launcher --model-id TheBloke/Llama-2-7B-Chat-GPTQ --quantize gptq
      CUDA_VISIBLE_DEVICES="0,1" text-generation-launcher --model-id TheBloke/Llama-2-7B-Chat-GPTQ --quantize gptq
      ```
      
      all with good and identical results on MI210.
      
      ---------
      Co-authored-by: Felix Marty <felix@hf.co>
      Co-authored-by: OlivierDehaene <olivier@huggingface.co>
      Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
      650fea18
    • Nicolas Patry's avatar
      ebecc061
    • Andrés Restrepo's avatar
      fix: launcher doc typos (#1462) · 50a20a83
      Andrés Restrepo authored
      # What does this PR do?
      
      fixes launcher doc typos
      
      
      ## Before submitting
      - [x] This PR fixes a typo or improves the docs (you can dismiss the
      other checks if that's the case).
      - [ ] Did you read the [contributor
      guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
            Pull Request section?
      - [ ] Was this discussed/approved via a Github issue or the
      [forum](https://discuss.huggingface.co/)? Please add a link
            to it if that's the case.
      - [ ] Did you make sure to update the documentation with your changes?
      Here are the
      [documentation
      guidelines](https://github.com/huggingface/transformers/tree/main/docs),
      and
      [here are tips on formatting
      docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
      - [ ] Did you write any new necessary tests?
      
      
      ## Who can review?
      
      @OlivierDehaene OR @Narsil
      
      Anyone in the community is free to review the PR once the tests have
      passed. Feel free to tag
      members/contributors who may be interested in your PR.
      
      Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
      50a20a83
    • Nicolas Patry's avatar
      Trying to fix that flaky test. (#1491) · 4c7315dd
      Nicolas Patry authored
      4c7315dd
    • Nicolas Patry's avatar
      Add sealion mpt support (#1477) · ac499727
      Nicolas Patry authored
      
      ---------
      Co-authored-by: Choon Meng Tan <choonmeng@aisingapore.org>
      Co-authored-by: David Ong Tat-Wee <13075447+ongtw@users.noreply.github.com>
      ac499727
    • Nicolas Patry's avatar
      Reinstate exl2 with tp (#1490) · b9573218
      Nicolas Patry authored
      b9573218
    • Nicolas Patry's avatar
      fix: launcher doc typos (#1473) · 16958fe3
      Nicolas Patry authored
      
      ---------
      Co-authored-by: Andres Restrepo <andres@thelinuxkid.com>
      16958fe3
    • drbh's avatar
      fix: show warning with tokenizer config parsing error (#1488) · 13dd8e23
      drbh authored
      This tiny PR just prints the parsing error when a tokenizer config fails
      to load.

      This is helpful when a chat_template won't load due to formatting issues:
      https://github.com/huggingface/text-generation-inference/pull/1427#issuecomment-1909226388
      13dd8e23
    • Nicolas Patry's avatar
      Update the docs · 17b7b75e
      Nicolas Patry authored
      17b7b75e
  3. 25 Jan, 2024 3 commits
    • OlivierDehaene's avatar
      fix: read stderr in download (#1486) · 9c320e26
      OlivierDehaene authored
      #1186
      9c320e26
    • drbh's avatar
      feat: adds phi model (#1442) · 7e2a7433
      drbh authored
      This PR adds basic modeling for phi-2 
      
      run
      ```bash
      text-generation-server \
          serve \
          microsoft/phi-2 \
          --revision 834565c23f9b28b96ccbeabe614dd906b6db551a
      ```
      
      
      test
      ```bash
      curl -s localhost:3000/generate \
         -X POST \
         -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
         -H 'Content-Type: application/json' | jq .
      # {
      #   "generated_text": "\nDeep learning is a subset of machine learning that uses artificial neural networks to learn from data. These"
      # }
      ```
      
      
      
      notes 
      - recently (~1 day ago) the Phi weights and model were updated to
      accommodate adding [GQA/MQA attention to the
      model.](https://github.com/huggingface/transformers/pull/28163) This
      impl expects the original model format so a fixed revision is required
      at the moment.
      - this PR only includes a basic implementation of the model and can
      later be extended to support Flash and Sharded versions as well as to
      make use of better optimization
      7e2a7433
    • Nicolas Patry's avatar
      Add a new `/tokenize` route to get the tokenized input (#1471) · 86c8335f
      Nicolas Patry authored
      # What does this PR do?
      
      
      Ideally this is done client side, but this is a recurring request,
      therefore we implemented it.
      
      - Runs only if rust tokenizer is present (not encumbering the main
      inference pipeline is important).
      - Returns simple results, ID, text (gotten with offsets from the
      original string) and offsets (so users can do things like highlighting
      text).
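
To make the role of the offsets concrete, here is a minimal sketch of how a token's text can be recovered from the original string and how a client might highlight it. The entry layout (`id`, `text`, `start`, `stop`), the helper names, and the token ids are illustrative assumptions, not the route's documented response format.

```python
from typing import Dict, List, Tuple, Union

TokenInfo = Dict[str, Union[int, str]]


def describe_tokens(text: str, encoded: List[Tuple[int, int, int]]) -> List[TokenInfo]:
    """Build one entry per token from (id, start, stop) triples.

    The token text is sliced out of the original string using the offsets.
    """
    return [
        {"id": token_id, "text": text[start:stop], "start": start, "stop": stop}
        for token_id, start, stop in encoded
    ]


def highlight(text: str, token: TokenInfo) -> str:
    """Wrap the token's span in brackets, as a UI might highlight it."""
    start, stop = int(token["start"]), int(token["stop"])
    return f"{text[:start]}[{text[start:stop]}]{text[stop:]}"


text = "Deep learning"
# Made-up ids and offsets, for illustration only.
tokens = describe_tokens(text, [(101, 0, 4), (202, 4, 13)])
print(tokens[1]["text"])
print(highlight(text, tokens[0]))
```

Slicing by offsets returns exactly what the user typed, which is why it is preferable to decoding the id back into text for display purposes.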
      
      86c8335f
  4. 24 Jan, 2024 2 commits
    • drbh's avatar
      Add messages api compatibility docs (#1478) · 7872b8c5
      drbh authored
      This PR adds a new page to the docs that describes the Messages API and
      how to use it.
      
      Additionally this page will contain cloud provider specific information
      for enabling and using this feature. This PR includes a SageMaker
      example/information.
      7872b8c5
    • Nicolas Patry's avatar
      Fixing non divisible embeddings. (#1476) · 7e542d4d
      Nicolas Patry authored
      7e542d4d
  5. 23 Jan, 2024 1 commit
    • Jacob Keisling's avatar
      Disable `decoder_input_details` on OpenAI-compatible chat streaming, pass temp... · 82f87ada
      Jacob Keisling authored
      Disable `decoder_input_details` on OpenAI-compatible chat streaming, pass temp and top-k from API (#1470)
      
      This PR makes some minor tweaks to the new OpenAI-compatible chat
      endpoint #1427 in `GenerateParameters`:
      - Disables `decoder_input_details` when streaming is enabled. This was
      causing all streaming chat requests to fail before, since
      [`decoder_input_details`==true is not enabled when streaming
      tokens](https://github.com/huggingface/text-generation-inference/blob/98e5faff9daec6170cc2b0f963f2d73cf846b341/router/src/validation.rs#L406).
      - Passes through `temperature` and `top_p` hyperparameters from the API
      request to `GenerateParameters`
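
The two tweaks can be sketched in Python as follows. This is only an illustration of the mapping described above: the real change lives in the Rust router, and the function name, the dictionary representation, and the `max_tokens` to `max_new_tokens` mapping are assumptions.

```python
from typing import Any, Dict


def to_generate_parameters(chat_request: Dict[str, Any]) -> Dict[str, Any]:
    """Map an OpenAI-style chat request onto generation parameters."""
    stream = bool(chat_request.get("stream", False))
    return {
        # Prefill details are rejected by validation when streaming,
        # so only request them for non-streaming calls.
        "decoder_input_details": not stream,
        # Pass the sampling hyperparameters through unchanged.
        "temperature": chat_request.get("temperature"),
        "top_p": chat_request.get("top_p"),
        # Assumed mapping, not stated in the PR description.
        "max_new_tokens": chat_request.get("max_tokens"),
    }


print(to_generate_parameters({"stream": True, "temperature": 0.7, "max_tokens": 20}))
```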
      
      ## Testing
      
      ```bash
      curl localhost:8080/v1/chat/completions \
          -X POST \
          -d '{
        "model": "",
        "messages": [
          {
            "role": "system",
            "content": "You are a helpful assistant."
          },
          {
            "role": "user",
            "content": "What is deep learning?"
          }
        ],
        "stream": true, 
        "max_tokens": 20
      }' \
          -H 'Content-Type: application/json'
      ```
      
      Should work correctly. Currently, the most recent release from `main`
      returns an error:
      ```
      data:{"error":"Input validation error: `decoder_input_details` == true is not supported when streaming tokens","error_type":"validation"}
      ```
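
A client can detect this failure by parsing the `data:` line of the event stream. The sketch below is a minimal, assumed client-side helper (the name `parse_sse_data` is hypothetical); it handles only lines whose payload is JSON.

```python
import json
from typing import Any, Dict, Optional


def parse_sse_data(line: str) -> Optional[Dict[str, Any]]:
    """Return the JSON payload of a `data:` line, or None for other lines."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):])


raw = (
    'data:{"error":"Input validation error: `decoder_input_details` == true '
    'is not supported when streaming tokens","error_type":"validation"}'
)
event = parse_sse_data(raw)
if event is not None and "error" in event:
    print(f"{event['error_type']}: {event['error']}")
```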
      
      It's my first time contributing to this project, so I could be missing
      something. Would especially appreciate @drbh's eyes on this one
      82f87ada
  6. 22 Jan, 2024 2 commits
    • drbh's avatar
      feat: conditionally toggle chat on invocations route (#1454) · 98e5faff
      drbh authored
      This PR adds support for reading the `OAI_ENABLED` env var, which
      changes the function called when the `/invocations` route is called.
      
      If `OAI_ENABLED=true` the `chat_completions` method is used otherwise it
      defaults to `compat_generate`.
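
The toggle amounts to choosing a handler from the environment at startup. The following Python sketch shows the idea only; the router itself is written in Rust, the two handlers here are stand-ins, and treating any value other than `true` (case-insensitive) as disabled is an assumption.

```python
import os
from typing import Callable, Dict, Mapping


def chat_completions(body: Dict) -> str:
    """Stand-in for the OpenAI-compatible chat handler."""
    return "chat_completions"


def compat_generate(body: Dict) -> str:
    """Stand-in for the default generate handler."""
    return "compat_generate"


def select_invocations_handler(
    env: Mapping[str, str] = os.environ,
) -> Callable[[Dict], str]:
    """Pick the function that will serve `/invocations`."""
    if env.get("OAI_ENABLED", "").lower() == "true":
        return chat_completions
    return compat_generate


handler = select_invocations_handler({"OAI_ENABLED": "true"})
print(handler({}))
```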
      
      example running the router
      ```bash
      OAI_ENABLED=true \
        cargo run -- \
        --tokenizer-name mistralai/Mistral-7B-Instruct-v0.2
      ```
      
      example request
      ```bash
      curl localhost:3000/invocations \
          -X POST \
          -d '{ "model": "tgi", "messages": [ { "role": "user", "content": "What is the IP address of the Google DNS servers?" } ], "stream": false, "max_tokens": 20, "logprobs": true, "seed": 0 }' \
          -H 'Content-Type: application/json' | jq 
      ```
      
      Please let me know if any naming changes are needed or if any other
      routes need similar functionality.
      98e5faff
    • drbh's avatar
      chore: bump rust version and annotate/fix all clippy warnings (#1455) · becd0997
      drbh authored
      This PR just bumps the latest rust version and makes clippy happy
      
      ```bash
      cargo clippy --all -- -D warnings
      #    Finished dev [unoptimized + debuginfo] target(s) in 0.10s
      ```
      becd0997
  7. 18 Jan, 2024 1 commit
    • drbh's avatar
      feat: support raise_exception, bos and eos tokens (#1450) · 3ccb3bb0
      drbh authored
      This PR adds support to handle the custom jinja function
      `raise_exception` and passes the `bos` and `eos` tokens into the
      template
      
      Additionally this PR adds 3 tests to validate and show examples of what
      can and cannot be parsed currently.
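
For readers unfamiliar with `raise_exception`, the sketch below mirrors in plain Python (not Jinja) the kind of logic a chat template typically encodes with it: the template rejects a conversation whose roles do not alternate, and otherwise wraps turns using the `bos` and `eos` tokens. The `[INST]` format, the alternation rule, and all names are illustrative assumptions, not the router's implementation.

```python
from typing import Dict, List


class TemplateError(Exception):
    """Raised when the template itself rejects the conversation."""


def raise_exception(message: str) -> None:
    """Counterpart of the custom function exposed to the template."""
    raise TemplateError(message)


def render_chat(messages: List[Dict[str, str]], bos_token: str, eos_token: str) -> str:
    """Hypothetical template logic: roles must alternate user/assistant."""
    out = bos_token
    for index, message in enumerate(messages):
        expected = "user" if index % 2 == 0 else "assistant"
        if message["role"] != expected:
            raise_exception("Conversation roles must alternate user/assistant")
        if message["role"] == "user":
            out += f"[INST] {message['content']} [/INST]"
        else:
            out += f"{message['content']}{eos_token}"
    return out


print(render_chat([{"role": "user", "content": "Hi"}], "<s>", "</s>"))
```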
      
      ```bash
      cargo test --package text-generation-router --lib -- infer::tests --nocapture
      #     Finished test [unoptimized + debuginfo] target(s) in 7.82s
      #      Running unittests src/lib.rs (target/debug/deps/text_generation_router-18a0bbf99c2ca1b4)
      
      # running 3 tests
      # test infer::tests::test_chat_template_valid_with_raise ... ok
      # test infer::tests::test_chat_template ... ok
      # test infer::tests::test_chat_template_invalid_with_raise ... ok
      
      # test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 15 filtered out; finished in 0.00s
      ```
      3ccb3bb0
  8. 16 Jan, 2024 1 commit
    • drbh's avatar
      feat: supports openai chat completions API (#1427) · 0eabc835
      drbh authored
      This PR adds support to make TGI a drop in replacement for OpenAI
      clients by exposing the same HTTP interface.
      
      Notes
      - TGI inits a single model at startup so the `model` field is unused in
      HTTP requests.
      - `max_tokens` and `stream` should work as expected, but other params may
      be unimplemented or unsupported
      
      General approach
      - fetch the `tokenizer_config` at startup from the hub
      - pass `tokenizer_config` into `Infer` so we have it at request time
      - use the `chat_template` on the config to format chat request
      - parse jinja template and render chat string
      - pass inputs into existing generate function
      - wrap generation output in expected structure before returning
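
The last step, wrapping the generation output, can be sketched as below. The field names follow the `ChatCompletion` object printed in the synchronous example of this PR description; the function name and signature are hypothetical, and the real code is part of the Rust router.

```python
import time
from typing import Any, Dict, Optional


def wrap_chat_completion(
    generated_text: str,
    prompt_tokens: int,
    completion_tokens: int,
    created: Optional[int] = None,
) -> Dict[str, Any]:
    """Wrap raw generated text in an OpenAI-style chat completion structure."""
    return {
        "id": "",
        "object": "text_completion",
        "created": int(time.time()) if created is None else created,
        "model": "",
        "system_fingerprint": "",
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": generated_text},
                "logprobs": None,
                "finish_reason": None,
            }
        ],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
    }


response = wrap_chat_completion("\nDeep learning is ...", 76, 100, created=1704486762)
print(response["usage"]["total_tokens"])
```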
      
      # How to test
      
      ### Streaming curl
      ```bash
      curl localhost:3000/v1/chat/completions \
          -X POST \
          -d '{
        "model": "tgi",
        "messages": [
          {
            "role": "system",
            "content": "You are a helpful assistant."
          },
          {
            "role": "user",
            "content": "What is deep learning?"
          }
        ],
        "stream": true,
        "max_tokens": 20
      }' \
          -H 'Content-Type: application/json'
      ```
      
      
      It is also possible to use the `openai` python library and change the
      base url
      
      ###  🌊 STREAMING REQUEST
      ```python
      from openai import OpenAI
      
      # init the client but point it to TGI
      client = OpenAI(
          base_url="http://localhost:3000/v1",
          api_key="not needed for a local LLM"
      )
      
      chat_completion = client.chat.completions.create(
          model="tgi",
          messages=[
              {"role": "system", "content": "You are a helpful assistant." },
              {"role": "user", "content": "What is deep learning?"}
          ],
          stream=True
      )
      
      # iterate and print stream
      for message in chat_completion:
          print(message)
      
      # ChatCompletionChunk(id='', choices=[Choice(delta=ChoiceDelta(content=' that', function_call=None, role='assistant', tool_calls=None), finish_reason=None, index=2, logprobs=None)], created=1704486761, model='', object='text_completion', system_fingerprint='')
      ```
      
      ### 🚗 SYNCHRONOUS REQUEST
      ```python
      from openai import OpenAI
      
      # init the client but point it to TGI
      client = OpenAI(
          base_url="http://localhost:3000/v1",
          api_key="not needed for a local LLM"
      )
      
      chat_completion = client.chat.completions.create(
          model="tgi",
          messages=[
              {"role": "system", "content": "You are a helpful assistant." },
              {"role": "user", "content": "What is deep learning?"}
          ],
          stream=False
      )
      
      print(chat_completion)
      # ChatCompletion(id='', choices=[Choice(finish_reason=None, index=0, logprobs=None, message=ChatCompletionMessage(content='\nDeep learning is a new field of research that has been gaining traction in the last ...', role='assistant', function_call=None, tool_calls=None))], created=1704486762, model='', object='text_completion', system_fingerprint='', usage=CompletionUsage(completion_tokens=100, prompt_tokens=76, total_tokens=176))
      ```
      
      
      ## How to run dev
      
      ```bash
      cd text-generation-inference/server
      MASTER_ADDR=127.0.0.1 MASTER_PORT=5555 text-generation-server serve --trust-remote-code gpt2
      ```
      
      **Note:** many of the existing `chat_template`s use non-standard `jinja`
      (i.e. they add a `raise` to the template), which throws an error when
      parsing; hence the use of `upstage/SOLAR-10.7B-Instruct-v1.0`, which has
      a valid template.
      ```bash
      cd text-generation-inference/router
      cargo run -- --tokenizer-name upstage/SOLAR-10.7B-Instruct-v1.0
      ```
      
      Trigger a request:
      ```bash
      curl localhost:3000/v1/chat/completions \
          -X POST \
          -d '{ "model": "gpt-3.5-turbo", "messages": [ { "role": "system", "content": "You are a helpful assistant." }, { "role": "user", "content": "What is the IP address of the Google DNS servers?" } ], "stream": true, "max_tokens": 20, "logprobs": true }' \
          -H 'Content-Type: application/json'
      ```
      
      The endpoint above supports both `stream: true` and `stream: false` requests.
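For clients that do not use the `openai` library, a streamed response can be reassembled by hand. The sketch below assumes the stream arrives as server-sent events with `data: <json>` lines, as OpenAI's API does, and that each chunk carries its text in `choices[].delta.content` like the `ChatCompletionChunk` shown earlier. The `[DONE]` sentinel is OpenAI's convention; whether this endpoint sends it is not stated in this description. `iter_chunks` and `collect_text` are hypothetical helper names.

```python
import json
from typing import Iterable, Iterator


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each `data:` line."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the delta contents of all streamed chunks."""
    parts = []
    for chunk in iter_chunks(lines):
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```

With `stream: false` none of this is needed, because the full message arrives in a single JSON body.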
      0eabc835
  9. 11 Jan, 2024 1 commit
    • Nicolas Patry's avatar
      Return prompt vs generated tokens. (#1436) · ac08b4ef
      Nicolas Patry authored
      # What does this PR do?
      
      Fixes #637 
       
      ac08b4ef
  10. 10 Jan, 2024 2 commits
    • PYNing's avatar
      Fix local load for Medusa (#1420) · da27fbdf
      PYNing authored
      # What does this PR do?
      
      
      Close #1418 
      Close #1415
      
      da27fbdf
    • OlivierDehaene's avatar
      fix: follow base model for tokenizer in router (#1424) · fbeb1c44
      OlivierDehaene authored
      Close #1422
      fbeb1c44
  11. 09 Jan, 2024 3 commits
  12. 22 Dec, 2023 1 commit
  13. 21 Dec, 2023 3 commits
    • Nicolas Patry's avatar
      Fix local load for peft (#1373) · 529d7c25
      Nicolas Patry authored
      An overloaded local directory still needs the directory to locate the
      weight files correctly.
      
      529d7c25
    • OlivierDehaene's avatar
      564199ba
    • regisss's avatar
      987c959f
  14. 20 Dec, 2023 1 commit
    • Nicolas Patry's avatar
      Peft safetensors. (#1364) · eb8923a9
      Nicolas Patry authored
      Works by preventing `adapter_model.safetensors` from being detected as
      the core model file (a false match that skips the real peft detection).
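The idea can be illustrated with a small filter over candidate weight files. This is only a sketch under stated assumptions: `core_weight_files` and `looks_like_core_model` are hypothetical helpers written for illustration, not the server's actual functions, and the file listing is made up.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# File name taken from the commit message above.
ADAPTER_WEIGHTS = "adapter_model.safetensors"


def core_weight_files(filenames: Iterable[str]) -> List[str]:
    """Return safetensors files that can count as core model weights.

    The adapter file is excluded so that a repository holding only an
    adapter is not mistaken for a full model.
    """
    return [
        name
        for name in filenames
        if name.endswith(".safetensors")
        and PurePosixPath(name).name != ADAPTER_WEIGHTS
    ]


def looks_like_core_model(filenames: Iterable[str]) -> bool:
    """True when at least one non-adapter safetensors file is present."""
    return bool(core_weight_files(filenames))
```

When `looks_like_core_model` is false for a repository that still contains the adapter file, the loader can fall through to the peft path instead of stopping early.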
      
      eb8923a9
  15. 18 Dec, 2023 2 commits
  16. 15 Dec, 2023 3 commits