Building a Llama 3.1 8b Streamlit Chat App with Local LLMs: A Step-by-Step Guide using Ollama

Large Language Models (LLMs) have revolutionized the AI landscape, offering impressive language understanding and generation capabilities.

Meta recently released Llama 3.1, its most capable family of LLMs to date: https://ai.meta.com/blog/meta-llama-3-1/

This article will guide you through building a Streamlit chat application that uses a local LLM, specifically the Llama 3.1 8b model from Meta, integrated via the Ollama library.

Prerequisites

Before we dive into the code, make sure you have the following installed:

  • Python
  • Streamlit
  • Ollama

Setting Up Ollama and Downloading Llama 3.1 8b

First, you’ll need to install Ollama and download the Llama 3.1 8b model. Install the Ollama application from https://ollama.com, then open your command line interface and execute the following commands:

# Install the Ollama Python client (the Ollama application itself comes from https://ollama.com)
pip install ollama

# Download and run the Llama 3.1 8b model
ollama run llama3.1:8b

Creating the Modelfile

To create a custom model that integrates seamlessly with your Streamlit app, follow these steps:

  1. In your project directory, create a file named Modelfile without any extension.
  2. Open Modelfile in a text editor and add the following content:
FROM llama3.1:8b

The FROM instruction tells Ollama to build your custom model on top of Llama 3.1 8b. You register the Modelfile with the ollama create command shown later in the Ollama Commands section.

The code

Importing Libraries and Setting Up Logging

import streamlit as st
from llama_index.core.llms import ChatMessage
import logging
import time
from llama_index.llms.ollama import Ollama

logging.basicConfig(level=logging.INFO)
  • streamlit as st: This imports Streamlit, a library for creating interactive web applications.
  • ChatMessage and Ollama: These are imported from the llama_index library to handle chat messages and to talk to models served by Ollama. Depending on your llama-index version, you may also need to install the llama-index-llms-ollama integration package.
  • logging: This is used to log information, warnings, and errors, which helps in debugging and tracking the application’s behavior.
  • time: This library is used to measure the time taken to generate responses.

Initializing Chat History

if 'messages' not in st.session_state:
    st.session_state.messages = []
  • st.session_state: This is a Streamlit feature that allows you to store variables across different runs of the app. Here, it’s used to store the chat history.
  • The if statement checks whether ‘messages’ already exists in session_state; if not, it initializes it as an empty list (an illustration of the stored format follows below).
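For reference, each chat turn is stored as a plain dictionary with a role and a content key. After one exchange the history might look like this (illustrative values only, not output captured from the app):

# Illustrative shape of the chat history kept in st.session_state.messages
st.session_state.messages = [
    {"role": "user", "content": "What are Large Language Models?"},
    {"role": "assistant", "content": "Large Language Models are ...\n\nDuration: 4.21 seconds"},
]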

Function to Stream Chat Response

def stream_chat(model, messages):
    try:
        llm = Ollama(model=model, request_timeout=120.0)
        resp = llm.stream_chat(messages)
        response = ""
        response_placeholder = st.empty()
        for r in resp:
            response += r.delta
            response_placeholder.write(response)
        logging.info(f"Model: {model}, Messages: {messages}, Response: {response}")
        return response
    except Exception as e:
        logging.error(f"Error during streaming: {str(e)}")
        raise e
  • stream_chat: This function handles the streaming interaction with the Llama model (a standalone sanity check of the same call is shown after this list).
  • Ollama(model=model, request_timeout=120.0): Initializes the Llama model with a specified timeout.
  • llm.stream_chat(messages): Streams chat responses from the model.
  • response_placeholder = st.empty(): Creates a placeholder in the Streamlit app to dynamically update the response.
  • The for loop appends each part of the response to the final response string and updates the placeholder.
  • logging.info logs the model, messages, and response.
  • The except block catches and logs any errors that occur during the streaming process.
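Before wiring this function into the UI, it can help to confirm that the local model responds at all. Here is a minimal standalone sketch that exercises the same llama_index Ollama client outside Streamlit, assuming the Ollama server is running and llama3.1:8b has already been pulled:

# sanity_check.py - stream a reply from the local model outside Streamlit
from llama_index.core.llms import ChatMessage
from llama_index.llms.ollama import Ollama

llm = Ollama(model="llama3.1:8b", request_timeout=120.0)
messages = [ChatMessage(role="user", content="In one sentence, what is an LLM?")]

response = ""
for chunk in llm.stream_chat(messages):  # yields incremental chunks with a .delta attribute
    response += chunk.delta
    print(chunk.delta, end="", flush=True)
print()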

Main Function

def main():
    st.title("Chat with LLMs Models")
    logging.info("App started")

    model = st.sidebar.selectbox("Choose a model", ["mymodel", "llama3.1:8b", "phi3", "mistral"])
    logging.info(f"Model selected: {model}")

    if prompt := st.chat_input("Your question"):
        st.session_state.messages.append({"role": "user", "content": prompt})
        logging.info(f"User input: {prompt}")

        for message in st.session_state.messages:
            with st.chat_message(message["role"]):
                st.write(message["content"])

        if st.session_state.messages[-1]["role"] != "assistant":
            with st.chat_message("assistant"):
                start_time = time.time()
                logging.info("Generating response")

                with st.spinner("Writing..."):
                    try:
                        messages = [ChatMessage(role=msg["role"], content=msg["content"]) for msg in st.session_state.messages]
                        response_message = stream_chat(model, messages)
                        duration = time.time() - start_time
                        response_message_with_duration = f"{response_message}\n\nDuration: {duration:.2f} seconds"
                        st.session_state.messages.append({"role": "assistant", "content": response_message_with_duration})
                        st.write(f"Duration: {duration:.2f} seconds")
                        logging.info(f"Response: {response_message}, Duration: {duration:.2f} s")

                    except Exception as e:
                        st.session_state.messages.append({"role": "assistant", "content": str(e)})
                        st.error("An error occurred while generating the response.")
                        logging.error(f"Error: {str(e)}")

if __name__ == "__main__":
    main()
  • main: This is the main function that sets up and runs the Streamlit app.
  • st.title("Chat with LLMs Models"): Sets the title of the app.
  • model = st.sidebar.selectbox("Choose a model", ["mymodel", "llama3.1:8b", "phi3", "mistral"]): Creates a dropdown menu in the sidebar for model selection; the names must match models available to Ollama (here, mymodel would be the custom model built from the Modelfile, and llama3.1:8b is the model pulled earlier).
  • if prompt := st.chat_input("Your question"): Takes user input and appends it to the chat history.
  • The for loop displays each message in the chat history.
  • The if statement checks if the last message is not from the assistant. If true, it generates a response from the model.
  • with st.spinner("Writing..."): Shows a spinner while the response is being generated.
  • messages = [ChatMessage(role=msg["role"], content=msg["content"]) for msg in st.session_state.messages]: Prepares the messages for the Llama model.
  • response_message = stream_chat(model, messages): Calls the stream_chat function to get the model’s response.
  • duration = time.time() - start_time: Calculates the time taken to generate the response.
  • response_message_with_duration = f"{response_message}\n\nDuration: {duration:.2f} seconds": Appends the duration to the response message.
  • st.session_state.messages.append({"role": "assistant", "content": response_message_with_duration}): Adds the assistant’s response to the chat history.
  • st.write(f"Duration: {duration:.2f} seconds"): Displays the duration of the response generation.
  • The except block handles errors during the response generation and displays an error message. (An optional extension using the same session_state mechanism is sketched below.)
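As a small optional extension (not part of the original code above), the same st.session_state mechanism can be used to reset the conversation from the sidebar. A minimal sketch, to be placed inside main() after the model selectbox:

# Hypothetical addition: a sidebar button that clears the stored chat history
if st.sidebar.button("Clear chat"):
    st.session_state.messages = []  # wipe the stored conversation
    st.rerun()  # rerun the script so the cleared history is rendered (Streamlit >= 1.27)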


Running the Streamlit App

To run your Streamlit app, execute the following command in your project directory:

streamlit run app.py

Make sure the Ollama server is running in the background (it starts automatically with the desktop app, or can be started with ollama serve); otherwise the app will not return any responses.

“Llama 3.1 8b generating a detailed response to the question ‘What are Large Language Models?’ in the Streamlit app.”
“Continuation of the conversation, showing the final part of the Llama 3.1 8b’s response about Large Language Models.”

Customizing Models with Ollama

The same workflow can be used to build customized models with Ollama, for example by layering a system prompt or generation parameters on top of a base model. Note that Ollama packages and serves model variants; it does not itself fine-tune models on new datasets. Here’s how you can manage and create custom models with Ollama.

“Interactive chat interface with Llama 3.1 8b in the Streamlit app, showcasing real-time response generation.”

Ollama Commands

To use Ollama for model management and customization, you’ll need to be familiar with the following commands:

Example: Creating and Using a Model

  1. Create a Modelfile: Create a Modelfile in your project directory with instructions for your custom model.
  2. Content of Modelfile:
# Example content for a customized model
FROM llama3.1:8b
SYSTEM "You are a concise assistant that answers questions about machine learning."
PARAMETER temperature 0.7

3. Create the Model: Use the create command to build a model named custom_model from the Modelfile.

ollama create custom_model -f Modelfile

4. Run the Model: Once the model is created, you can run it using:

ollama run custom_model

5. Integrate with Streamlit: You can integrate this custom model with your Streamlit application in the same way as the pre-trained models, as shown in the sketch below.
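For example, assuming the custom model was created as custom_model above, exposing it in the Streamlit app only requires adding its name to the model selector in main(). A minimal sketch (the names must match what ollama list reports):

# Add the custom Ollama model to the model picker in main()
model = st.sidebar.selectbox(
    "Choose a model",
    ["custom_model", "llama3.1:8b", "phi3", "mistral"],
)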

By following these steps, you can create a Streamlit application that interacts with local LLMs using the Ollama library.

For reference, running ollama with no arguments prints the full usage information:

C:\your\path\location>ollama
Usage:
  ollama [flags]
  ollama [command]

Available Commands:
  serve     Start ollama
  create    Create a model from a Modelfile
  show      Show information for a model
  run       Run a model
  pull      Pull a model from a registry
  push      Push a model to a registry
  list      List models
  ps        List running models
  cp        Copy a model
  rm        Remove a model
  help      Help about any command

Flags:
  -h, --help      help for ollama
  -v, --version   Show version information

Use "ollama [command] --help" for more information about a command.

Additionally, you can use the same steps and Ollama commands to customize and manage models for different use cases. This flexibility allows you to leverage custom models in your Streamlit applications, providing a more tailored and interactive user experience.

Implementation with Flask

This methodology can also be utilized to implement chat applications using Flask. Here is an outline for integrating Ollama with a Flask app:

Flask Application Setup

  1. Install Flask:
pip install Flask

2. Create a Flask App:

from flask import Flask, request, jsonify
from llama_index.core.llms import ChatMessage
from llama_index.llms.ollama import Ollama
import logging

app = Flask(__name__)
logging.basicConfig(level=logging.INFO)

@app.route('/chat', methods=['POST'])
def chat():
    data = request.json
    # Convert the incoming JSON messages into ChatMessage objects expected by llama_index
    messages = [ChatMessage(role=msg["role"], content=msg["content"]) for msg in data.get('messages', [])]
    model = data.get('model', 'llama3.1:8b')

    try:
        llm = Ollama(model=model, request_timeout=120.0)
        resp = llm.stream_chat(messages)
        response = ""
        for r in resp:
            response += r.delta
        logging.info(f"Model: {model}, Messages: {messages}, Response: {response}")
        return jsonify({'response': response})
    except Exception as e:
        logging.error(f"Error during streaming: {str(e)}")
        return jsonify({'error': str(e)}), 500

if __name__ == '__main__':
    app.run(debug=True)

Running the Flask Application

Save the code in a file (e.g., app.py) and run the following command:

python app.py

This will start the Flask application, and you can make POST requests to the /chat endpoint with JSON data containing the messages and model to get responses from the Llama model.
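For example, assuming the server is running locally on Flask’s default port 5000, a request could look like this (a minimal sketch using the requests library; the payload keys match the chat() route above):

# client.py - minimal sketch of calling the Flask /chat endpoint
import requests

payload = {
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "What are Large Language Models?"}],
}
resp = requests.post("http://127.0.0.1:5000/chat", json=payload, timeout=180)
resp.raise_for_status()
print(resp.json()["response"])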

Integrating Flask with Ollama

By following similar steps as shown for Streamlit, you can integrate Ollama with a Flask application. The stream_chat function can be reused, and the Flask routes can handle the interaction with the model, making it easy to create scalable chat applications.

Conclusion

By following this guide, you’ve successfully set up a Streamlit chat application using a local LLM. This setup allows you to interact with powerful language models directly from your local machine, providing a visually appealing and interactive experience. Whether you’re asking general questions or delving into specific inquiries, your app is now equipped to handle it all.

“Thank you for exploring the power of Large Language Models with us. Goodbye!”

Git Repo here : Click me!!

Engage with your app and explore the capabilities of LLMs, and make sure to share your experiences and any improvements you make. Happy coding!

I hope you found this article informative.
