Large Language Models (LLMs) have revolutionized the AI landscape, offering impressive language understanding and generation capabilities.

This article will guide you through building a Streamlit chat application that uses a local LLM, specifically the Llama 3.1 8b model from Meta, integrated via the Ollama library.
Prerequisites
Before we dive into the code, make sure you have the following installed:
- Python
- Streamlit
- Ollama
- LlamaIndex (specifically the llama-index-llms-ollama integration used in the code below)
Setting Up Ollama and Downloading Llama 3.1 8b
First, you’ll need to install Ollama and download the Llama 3.1 8b model. Note that the ollama command comes from the Ollama application itself (available from ollama.com), not from pip; pip is only needed for the Python libraries used later in the article. Open your command line interface and execute the following commands:
# Install the Python dependencies (Streamlit and the LlamaIndex Ollama integration)
pip install streamlit llama-index-llms-ollama
# Download the Llama 3.1 8b model (this also starts an interactive session you can exit)
ollama run llama3.1:8b
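To confirm the download succeeded, you can list the models available locally; list is one of the standard Ollama commands shown in the CLI reference later in this article:
# Verify that llama3.1:8b now appears among the local models
ollama list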
Creating the Modelfile
To create a custom model that integrates seamlessly with your Streamlit app, follow these steps:
- In your project directory, create a file named Modelfile without any extension.
- Open Modelfile in a text editor and add the following content:
FROM llama3.1:8b
This file instructs Ollama to base your custom model on the Llama 3.1 8b model (FROM is the Modelfile directive that specifies the base model).
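The Modelfile on its own does nothing until it is registered with Ollama. Assuming you want the custom model to appear under the name mymodel (the name used in the app’s model dropdown later in this article), register it like this:
# Build the custom model from the Modelfile and register it as "mymodel"
ollama create mymodel -f Modelfile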
The Code
Importing Libraries and Setting Up Logging
import streamlit as st
from llama_index.core.llms import ChatMessage
import logging
import time
from llama_index.llms.ollama import Ollama
logging.basicConfig(level=logging.INFO)
- streamlit as st: Imports Streamlit, a library for creating interactive web applications.
- ChatMessage and Ollama: Imported from the llama_index library to handle chat messages and to talk to the Llama model served by Ollama.
- logging: Used to log information, warnings, and errors, which helps with debugging and tracking the application’s behavior (an optional richer setup is sketched right after this list).
- time: Used to measure how long response generation takes.
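This is optional and not something the app requires, but basicConfig also accepts a format string, so the log lines can carry timestamps and level names:
import logging

# Optional: include timestamps and level names in every log line
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logging.getLogger(__name__).info("Logging configured")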
Initializing Chat History
if 'messages' not in st.session_state:
    st.session_state.messages = []
- st.session_state: A Streamlit feature that stores variables across reruns of the app. Here it holds the chat history (a small standalone sketch follows this list).
- The if statement checks whether 'messages' already exists in session_state. If not, it is initialized as an empty list.
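To make the rerun behaviour concrete, here is a tiny standalone sketch (not part of the chat app) showing how a value stored in st.session_state survives across button clicks, whereas a normal Python variable would be reset on every rerun:
import streamlit as st

# The counter persists across reruns because it lives in session_state
if "counter" not in st.session_state:
    st.session_state.counter = 0

if st.button("Increment"):
    st.session_state.counter += 1

st.write(f"Button clicked {st.session_state.counter} times")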
Function to Stream Chat Response
def stream_chat(model, messages):
    try:
        llm = Ollama(model=model, request_timeout=120.0)
        resp = llm.stream_chat(messages)
        response = ""
        response_placeholder = st.empty()
        for r in resp:
            response += r.delta
            response_placeholder.write(response)
        logging.info(f"Model: {model}, Messages: {messages}, Response: {response}")
        return response
    except Exception as e:
        logging.error(f"Error during streaming: {str(e)}")
        raise e
- stream_chat: Handles the interaction with the Llama model.
- Ollama(model=model, request_timeout=120.0): Initializes the Llama model with a specified timeout.
- llm.stream_chat(messages): Streams the chat response from the model chunk by chunk (a standalone version of this call follows the list).
- response_placeholder = st.empty(): Creates a placeholder in the Streamlit app that is updated dynamically as the response arrives.
- The for loop appends each chunk of the response to the final response string and refreshes the placeholder.
- logging.info logs the model, messages, and response.
- The except block catches and logs any error that occurs during streaming.
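The same streaming call works outside Streamlit as well. Here is a minimal sketch, assuming the Ollama server is running locally and llama3.1:8b has already been pulled, that prints each chunk to the terminal instead of writing it to a placeholder:
from llama_index.core.llms import ChatMessage
from llama_index.llms.ollama import Ollama

# Stream a single answer to the terminal, chunk by chunk
llm = Ollama(model="llama3.1:8b", request_timeout=120.0)
messages = [ChatMessage(role="user", content="Explain what a Modelfile is in one sentence.")]

for chunk in llm.stream_chat(messages):
    print(chunk.delta, end="", flush=True)
print()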
Main Function
def main():
    st.title("Chat with LLMs Models")
    logging.info("App started")

    model = st.sidebar.selectbox("Choose a model", ["mymodel", "llama3.1:8b", "phi3", "mistral"])
    logging.info(f"Model selected: {model}")

    if prompt := st.chat_input("Your question"):
        st.session_state.messages.append({"role": "user", "content": prompt})
        logging.info(f"User input: {prompt}")

        for message in st.session_state.messages:
            with st.chat_message(message["role"]):
                st.write(message["content"])

        if st.session_state.messages[-1]["role"] != "assistant":
            with st.chat_message("assistant"):
                start_time = time.time()
                logging.info("Generating response")

                with st.spinner("Writing..."):
                    try:
                        messages = [ChatMessage(role=msg["role"], content=msg["content"]) for msg in st.session_state.messages]
                        response_message = stream_chat(model, messages)
                        duration = time.time() - start_time
                        response_message_with_duration = f"{response_message}\n\nDuration: {duration:.2f} seconds"
                        st.session_state.messages.append({"role": "assistant", "content": response_message_with_duration})
                        st.write(f"Duration: {duration:.2f} seconds")
                        logging.info(f"Response: {response_message}, Duration: {duration:.2f} s")
                    except Exception as e:
                        st.session_state.messages.append({"role": "assistant", "content": str(e)})
                        st.error("An error occurred while generating the response.")
                        logging.error(f"Error: {str(e)}")

if __name__ == "__main__":
    main()
- main: Sets up and runs the Streamlit app.
- st.title("Chat with LLMs Models"): Sets the title of the app.
- model = st.sidebar.selectbox("Choose a model", ["mymodel", "llama3.1:8b", "phi3", "mistral"]): Creates a dropdown menu in the sidebar for model selection.
- if prompt := st.chat_input("Your question"): Takes user input and appends it to the chat history.
- The for loop displays each message in the chat history.
- The if statement checks whether the last message is from the assistant. If it is not, a response is generated from the model.
- with st.spinner("Writing..."): Shows a spinner while the response is being generated.
- messages = [ChatMessage(role=msg["role"], content=msg["content"]) for msg in st.session_state.messages]: Converts the chat history into ChatMessage objects for the Llama model.
- response_message = stream_chat(model, messages): Calls the stream_chat function to get the model’s response.
- duration = time.time() - start_time: Calculates the time taken to generate the response.
- response_message_with_duration = f"{response_message}\n\nDuration: {duration:.2f} seconds": Appends the duration to the response message.
- st.session_state.messages.append({"role": "assistant", "content": response_message_with_duration}): Adds the assistant’s response to the chat history.
- st.write(f"Duration: {duration:.2f} seconds"): Displays how long the response took.
- The except block handles errors during response generation and displays an error message.
Running the Streamlit App
To run your Streamlit app, execute the following command in your project directory:
streamlit run app.py
Make sure the Ollama server is running in the background; otherwise the app will not be able to reach the model.
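If you installed the Ollama desktop application, the server usually starts automatically. If it is not running (for example on a headless machine), you can start it manually; serve is one of the commands listed in the CLI reference below:
# Start the Ollama server manually if it is not already running
ollama serve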


Customizing Models with Ollama
The same steps can be used to create and manage customized models with Ollama. Note that Ollama does not fine-tune a model on a raw dataset itself; instead, it builds a new model variant from a Modelfile (a base model plus optional parameters and a system prompt). Here’s how you can manage and customize models with Ollama.

Ollama Commands
To use Ollama for model management and customization, you’ll need to be familiar with its CLI commands; the full command list is reproduced at the end of this section.

Example: Creating and Using a Model
1. Create a Modelfile: Create a Modelfile in your project directory with the instructions for your custom model.
2. Content of Modelfile: A Modelfile describes a model variant rather than a training run, so it lists a base model plus optional parameters and a system prompt, for example:
# Example Modelfile for a customized Llama 3.1 variant
FROM llama3.1:8b
PARAMETER temperature 0.7
SYSTEM """You are a concise assistant that answers questions about local LLMs."""
3. Create the Model: Use the create command to build the model from the Modelfile:
ollama create custom_model -f Modelfile
4. Run the Model: Once the model is created, you can run it using:
ollama run custom_model
5. Integrate with Streamlit: You can integrate this custom model with your Streamlit application in the same way as the pre-built models, as shown in the sketch after this list.
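For example, assuming you keep the model names used earlier in this article, making custom_model selectable in the Streamlit app is a one-line change to the sidebar dropdown:
# Add the custom model to the sidebar model picker
model = st.sidebar.selectbox("Choose a model", ["mymodel", "custom_model", "llama3.1:8b", "phi3", "mistral"])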
By following these steps, you can serve your own customized model alongside the pre-built ones in your Streamlit application. For reference, here is the full list of Ollama CLI commands, as printed when you run ollama without arguments:
C:\your\path\location>ollama
Usage:
ollama [flags]
ollama [command]
Available Commands:
serve Start ollama
create Create a model from a Modelfile
show Show information for a model
run Run a model
pull Pull a model from a registry
push Push a model to a registry
list List models
ps List running models
cp Copy a model
rm Remove a model
help Help about any command
Flags:
-h, --help      help for ollama
-v, --version   Show version information
Use "ollama [command] --help" for more information about a command.
Additionally, you can use the same steps and Ollama commands to customize and manage models built from different base models, parameters, and system prompts. This flexibility lets you use tailored models in your Streamlit applications, providing a more interactive user experience.
Implementation with Flask
The same approach can also be used to implement chat applications with Flask. Here is an outline for integrating Ollama with a Flask app:
Flask Application Setup
1. Install Flask:
pip install Flask
2. Create a Flask app:
from flask import Flask, request, jsonify
from llama_index.core.llms import ChatMessage
from llama_index.llms.ollama import Ollama
import logging

app = Flask(__name__)
logging.basicConfig(level=logging.INFO)

@app.route('/chat', methods=['POST'])
def chat():
    data = request.json
    # Convert the incoming JSON messages into ChatMessage objects for the LLM
    messages = [ChatMessage(role=m["role"], content=m["content"]) for m in data.get('messages', [])]
    model = data.get('model', 'llama3.1:8b')
    try:
        llm = Ollama(model=model, request_timeout=120.0)
        resp = llm.stream_chat(messages)
        response = ""
        for r in resp:
            response += r.delta
        logging.info(f"Model: {model}, Messages: {messages}, Response: {response}")
        return jsonify({'response': response})
    except Exception as e:
        logging.error(f"Error during streaming: {str(e)}")
        return jsonify({'error': str(e)}), 500

if __name__ == '__main__':
    app.run(debug=True)
Running the Flask Application
Save the code in a file (e.g., app.py) and run the following command:
python app.py
This starts the Flask application, and you can then make POST requests to the /chat endpoint with JSON data containing the messages and the model name to get responses from the Llama model.
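To test the endpoint, you can send a request from another terminal. Here is a minimal sketch using the requests package (an extra dependency, installed with pip install requests) and Flask’s default address of http://127.0.0.1:5000:
import requests

# Ask the local Flask/Ollama service a question
payload = {
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "What can you help me with?"}],
}
resp = requests.post("http://127.0.0.1:5000/chat", json=payload)
print(resp.json())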
Integrating Flask with Ollama
By following similar steps as shown for Streamlit, you can integrate Ollama with a Flask application. The stream_chat function can be reused, and the Flask routes can handle the interaction with the model, making it easy to create scalable chat applications.
Conclusion
By following this guide, you’ve successfully set up a Streamlit chat application using a local LLM. This setup allows you to interact with powerful language models directly from your local machine, providing a visually appealing and interactive experience. Whether you’re asking general questions or delving into specific inquiries, your app is now equipped to handle it all.
“Thank you for exploring the power of Large Language Models with us. Goodbye!”
Git repo here: Click me!!
Engage with your app and explore the capabilities of LLMs, and make sure to share your experiences and any improvements you make. Happy coding!
Hope you found this article informative!