AI - LLMs - chatbots - Inferencing

Prologue


Funny story: in January 2025 my descent into madness began on X (formerly Twitter). I predicted myself getting banned, and not long after I achieved that end result of being permanently banned from X. It's a proud moment for me, as I've never been perma-banned from anything.

Anyways, moving on. If you've been following along with me on this website or on my BlueSky, then you've seen some of my deep dive into AI software and Large Language Models (LLMs). Here I want to explore that a little further and explain what I've been up to.

For some context, I've been using ChatGPT for a few years now, ever since it was in its infancy and kind of cumbersome to do anything with. In the past year or so there's been a huge leap forward in what LLMs like ChatGPT can do, including new features such as agent-based Retrieval-Augmented Generation (agent RAG) and Chain-of-Thought reasoning like what you see with DeepSeek R1.

As a software developer for over 20 years, I've always been keen to learn about and utilize the newest technologies to their maximum potential, and when I was a teenager I remember dreaming of how to build an artificial intelligence akin to the LLMs we have today. In fact the ideas I had were very similar to what we see in modern LLMs (and I knew nothing about the history of AI or neural networks, I just wanted to make one): I envisioned chatting at a command line and having the program (AI) understand what I was asking, no matter what, and answer accordingly, or with an "I don't know" type of response. I actually started developing such a "chatbot" with hard-coded variables, "knowledge" and regular expressions that could somewhat effectively figure out what you were asking and how to answer. The neat thing is I probably still have the code and program somewhere to this day...

The long story short of it is I've been fascinated by AI and the idea of AI since I was a young teenager and it's quite amazing to be alive to see it come to fruition.

The Server


I had been exploring more and more the idea of running Large Language Models locally, in my own server room, on my own hardware: something that could stay offline with no way of phoning home, and with the ability to run fully uncensored models. I started learning about software such as LM Studio, Text Generation WebUI, GPT4All and the like, and quickly realized that these didn't give me the freedom and control I was truly after in running models locally.

It's now February 2025 and I had ordered a new dual-PCIe-GPU server (this one) so I could spin up an LLM in my server rack and have something to tinker with and see what I could really do. This led to finding some cheap Nvidia Tesla K80 GPUs (24GB, dual GPUs per card) to put in it and start my first true journey into LLMs and AI.

One thing led to another: the server is in the rack and now it's loaded with 256GB of RAM, a 20-core (40-thread) E5-2698 v4, an Intel Datacenter NVMe PCIe SSD and 2x Nvidia Tesla K80 GPUs. This is where the problems began. Most of the software I was testing, such as LM Studio and GPT4All, didn't support the K80 with its ancient compute capability and CUDA versions, so I did the most me-thing I could think of and compiled llama-server from source, including the necessary core support for the old compute capability and CUDA versions (plus I added in parallelism support). This enabled me to use these old GPUs with modern LLM software and I was ecstatic! I was getting 30-40 response tokens per second while inferencing on the DeepSeek R1 Distill Qwen 1.5B Q8 model! Other models like 14B, 30/32B and 70B all worked just fine, albeit a little slower, but with 48GB of VRAM I could load many mainstream models and run them at a base level.
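If you want to sanity-check numbers like these yourself, a simple way is to time a request against llama-server's OpenAI-compatible endpoint. Here's a minimal sketch (the port and model name are placeholders for whatever you're running locally):

```python
# Rough token-throughput check against a local llama-server instance.
# Assumes llama-server is already running with its OpenAI-compatible API
# on localhost:8080 -- adjust the URL and model name for your setup.
import time
import requests

URL = "http://localhost:8080/v1/chat/completions"  # placeholder port

payload = {
    "model": "deepseek-r1-distill-qwen-1.5b",  # placeholder name
    "messages": [{"role": "user", "content": "Explain CUDA compute capability in one paragraph."}],
    "max_tokens": 256,
    "stream": False,
}

start = time.time()
resp = requests.post(URL, json=payload, timeout=300)
resp.raise_for_status()
elapsed = time.time() - start

# The OpenAI-compatible response includes a usage block with token counts.
completion_tokens = resp.json().get("usage", {}).get("completion_tokens", 0)
print(f"{completion_tokens} tokens in {elapsed:.1f}s "
      f"= {completion_tokens / elapsed:.1f} tokens/sec")
```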

It's now March, and with a half-decent AI server that was a bit slow on larger models, paired with my fully custom llama-server foundation, it was time to upgrade the GPUs. I had, and in fact still have, many better GPUs in my inventory, such as a GTX 1080 Ti, GTX 1060, RTX 2070... and so on, but I wanted to give this system a serious upgrade! I ordered and have now installed the first of two Nvidia Tesla V100 SXM2 GPUs!

The V100 SXM2 is a much more modern GPU, with a form factor initially meant for Nvidia's DGX rackmount systems and, in some cases, self-driving cars. That should have meant I needed a new server, but eBay exists, so instead I found someone who makes SXM2-to-PCIe adapter boards with a fan and shroud in the same dimensions as a standard PCIe Tesla GPU.

Now the system cooks at 150 response tokens per second on the same model I had been benchmarking before (~5x speed improvement), and I only have 1 V100 installed right now! The system sees about 50 tokens per second on DeepSeek R1 Distill Qwen 14B Q4 and I'm still very happy with that!

LLM Controller


Now that I have a capable server with a modern GPU, I'm using the available software to experiment with coding, writing, ideas, planning and more. However, as usual, I'm not happy with the existing software, so I've started developing my own AI inferencing software built on top of the llama-server I previously compiled; it exposes a web UI for the chat interface and runs on bare hardware.

To keep a long, dry, boring story short, I've already spent significant time developing this new software, which I'm currently referring to as "LLM Controller", and I'm already putting the last touches on version 3.4. The software isn't fully mature yet, but here's a breakdown of its current features:

Backend & Communication:

  • A Flask application combined with SocketIO manages real-time chat interactions.
  • The app uses a SQLite database to persist chat sessions, messages, and associated analytics.
  • It launches an external llama-server via a subprocess and streams logs for real-time monitoring (see the sketch after this list).
  • There’s a built-in auto-title generation mechanism that leverages the LLM itself.
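To give a flavour of the subprocess piece, here's a minimal sketch of how launching llama-server and streaming its logs over SocketIO can work. This isn't the actual LLM Controller code; the paths, port, routes and event names are placeholders, and it assumes Flask and Flask-SocketIO are installed.

```python
# Sketch: launch llama-server as a subprocess and stream its log output
# to connected browsers over SocketIO. All names here are placeholders.
import subprocess

from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app)

# Placeholder command line; point it at your own binary and model file.
LLAMA_CMD = ["./llama-server", "--model", "models/example.gguf", "--port", "8080"]

def run_llama_server():
    proc = subprocess.Popen(
        LLAMA_CMD, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    # Forward each log line to the UI as it arrives.
    for line in proc.stdout:
        socketio.emit("llama_log", {"line": line.rstrip()})

@app.route("/api/model/start", methods=["POST"])
def start_model():
    socketio.start_background_task(run_llama_server)
    return {"status": "starting"}

if __name__ == "__main__":
    socketio.run(app, host="0.0.0.0", port=5000)
```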

Endpoints & Features:

  • REST endpoints handle chat sessions (creation, retrieval, deletion, renaming) and model operations (start, stop, status); a rough sketch follows this list.
  • Additional endpoints provide analytics, logs, and even export functionalities for chats.
  • Admin settings are managed via a configuration file (settings.config) and can be updated through the UI.
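For the session endpoints, a bare-bones Flask + SQLite version might look something like this. The table, column and route names are my illustration here, not necessarily what LLM Controller actually uses.

```python
# Sketch of chat-session REST endpoints backed by SQLite.
import sqlite3
from flask import Flask, jsonify, request

app = Flask(__name__)
DB = "chats.db"  # placeholder path

# Create a minimal sessions table so the sketch is self-contained.
with sqlite3.connect(DB) as conn:
    conn.execute("CREATE TABLE IF NOT EXISTS sessions (id INTEGER PRIMARY KEY, title TEXT)")

def db():
    conn = sqlite3.connect(DB)
    conn.row_factory = sqlite3.Row
    return conn

@app.route("/api/sessions", methods=["POST"])
def create_session():
    data = request.get_json(silent=True) or {}
    title = data.get("title", "New chat")
    with db() as conn:
        cur = conn.execute("INSERT INTO sessions (title) VALUES (?)", (title,))
    return jsonify({"id": cur.lastrowid, "title": title}), 201

@app.route("/api/sessions", methods=["GET"])
def list_sessions():
    with db() as conn:
        rows = conn.execute("SELECT id, title FROM sessions").fetchall()
    return jsonify([dict(r) for r in rows])

@app.route("/api/sessions/<int:session_id>", methods=["DELETE"])
def delete_session(session_id):
    with db() as conn:
        conn.execute("DELETE FROM sessions WHERE id = ?", (session_id,))
    return jsonify({"deleted": session_id})
```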

Frontend:

  • A modern web UI (HTML/CSS/JS) with a sidebar, chat area, and multiple drawers for logs, analytics, and admin settings.
  • The JavaScript handles session management, dynamic UI updates, and integrates with SocketIO for chat updates.

Chat History Management:

  • Adding export and import capabilities for conversations (supporting JSON, CSV, Markdown) along with a quick “delete all” feature from the UI.
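As a rough idea of what the export side can look like (the exact formats in LLM Controller may differ; this is just an illustration):

```python
# Illustrative export of a chat session to JSON or Markdown.
import json

def export_chat(messages, fmt="json"):
    """messages: list of {"role": ..., "content": ...} dicts."""
    if fmt == "json":
        return json.dumps(messages, indent=2)
    if fmt == "markdown":
        lines = [f"**{m['role'].title()}:** {m['content']}" for m in messages]
        return "\n\n".join(lines)
    raise ValueError(f"Unsupported format: {fmt}")

chat = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi, how can I help?"},
]
print(export_chat(chat, fmt="markdown"))
```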

Enhanced REST API:

  • Exposing core functionalities (model start/stop, chat interactions, model status) via a REST API with an integrated API testing interface in the UI.

Expanded Admin Settings:

  • A more detailed configuration panel, enabling customizations such as model defaults, logging verbosity, and system preferences.
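Because the settings live in a plain settings.config file, reading and writing them is straightforward. Here's a hypothetical example using Python's configparser (the section and key names are placeholders, not the real ones):

```python
# Illustrative read/update of settings.config; section and key names are
# placeholders, not necessarily what LLM Controller actually uses.
import configparser

config = configparser.ConfigParser()
config.read("settings.config")

# Read a setting with a sensible fallback.
log_level = config.get("logging", "verbosity", fallback="INFO")

# Update a value from the admin UI and persist it back to disk.
if not config.has_section("model"):
    config.add_section("model")
config.set("model", "default_context_size", "8192")
with open("settings.config", "w") as f:
    config.write(f)
```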

Improved Analytics Dashboard:

  • Integrating more comprehensive metrics (usage frequency, tokens per second, response time per model) and incorporating real-time GPU/CPU utilization charts for better resource management.
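Here's a rough sketch of how that kind of resource polling can work; it's not my exact code, and it assumes psutil is installed and nvidia-smi is on the PATH:

```python
# Sketch of a resource-polling loop for the analytics dashboard.
import subprocess
import time

import psutil

def gpu_utilization():
    # nvidia-smi can report per-GPU utilization as plain numbers.
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return [int(line) for line in out.strip().splitlines()]

while True:
    stats = {
        "cpu_percent": psutil.cpu_percent(interval=None),
        "gpu_percent": gpu_utilization(),
    }
    # In the real app this would be pushed to the UI, e.g.
    # socketio.emit("resource_stats", stats)
    print(stats)
    time.sleep(2)
```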

Future Roadmap (v4.0 and Beyond):

  • Database Upgrade: Transitioning from SQLite to a full database server (probably MySQL?) for better scalability and performance.
  • User Account System: Implementing sign-up, login, and role-based access (Admin vs. Client/User).
  • Notifications & Alerts: Integrating email/webhook alerts triggered by events like model crashes or performance thresholds.
  • Enhanced Security: Limiting model directory scans to admin users and ensuring multi-platform compatibility.
  • Advanced Features: Automated model benchmarking, GPU cluster management, and automated SSL certificate management.
  • Community Integration: Eventually opening up a marketplace for shared models, plugins, and configuration scripts.

HuggingFace: 

  • Support for downloading models from HuggingFace, with easy-to-see compatibility info for your system.
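The basic idea is to pull a GGUF with the huggingface_hub library and compare its size against the VRAM in the box. A rough sketch (the repo and file names are just examples):

```python
# Sketch of the HuggingFace download + "will it fit?" check.
# Repo and file names are examples only; requires the huggingface_hub package.
import os
import subprocess

from huggingface_hub import hf_hub_download

def total_vram_mib():
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.total",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return sum(int(line) for line in out.strip().splitlines())

path = hf_hub_download(
    repo_id="TheBloke/Example-Model-GGUF",   # example repo
    filename="example-model.Q4_K_M.gguf",    # example file
)

model_mib = os.path.getsize(path) / (1024 * 1024)
print(f"Model is {model_mib:.0f} MiB; GPUs have {total_vram_mib()} MiB total VRAM")
if model_mib > total_vram_mib():
    print("Warning: this model likely won't fit fully in VRAM.")
```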

...I'm sure there's more, but at over 2500 lines of code so far this is roughly where I'm at 😅

Conclusion...


While this software is definitely still in its infancy (even though I'm already on version 3.4), I'm getting much deeper control of the hardware than with any other AI/LLM/inferencing software I've used before. It makes using LLMs a lot more enjoyable: I have much deeper insight into what's going on, and when I find bugs I can just fix them myself instead of waiting on some other developer's timeline.

My end goal for this software is two-fold: release the code for everyone to enjoy, and build upon/use it on top of my own hardware and incorporate it into my hosted services as a value-added feature (or not?). Time will tell, but one thing's for sure: my passion for software development hasn't been this strong since the mid-2010s when I was making bank developing software for BlackBerry.

As always, stay tuned to this website and my socials for updates about this project. It's been side-burnered for a few weeks because of work, but I'm done with my long stretch of working and will continue the journey of development, probably tomorrow.

EDIT: I'll post some screenshots soon. GUI isn't my strong suit so I'm gonna improve that before posting screenshots

December 2024 Code Update

I still don't know what I want to name this web blog/CMS software I'm currently developing, or what to refer to it as quite yet. @ me on X, BlueSky or heck reply to a video on my (neglected) YouTube channel if you have any ideas! 😅

Features and Functions:

  • Create Blog Posts
  • Create "tweet" style posts
  • Create Test or any other style of posts
  • Edit/Delete Posts
  • Categorize and Tag posts
  • Create photo albums (Which makes its own style of blog post with a mini-gallery to show the photos)
  • Full rich text editor on the backend for posts
  • Upload single or multiple photos (including drag-n-drop)
  • File manager to manage already uploaded photos
  • Photo picker
  • Gallery Light-box
  • Image caching
  • Image thumbnail-ifying on the fly
  • GZIP encoding
  • User profile
  • Site customization including Title, logo, footer text, blog name, etc
  • Site analytics with: sophisticated backend, traffic analysis/logging, user-flow-path, most frequently visited pages, User Agent Distribution charts, DB-IP Geo-location estimations based on IP address, total hits, average visit length, referring sites...(and more features being added)
  • BlueSky post/delete integration (one-way integration right now: post to BlueSky and delete that post from BlueSky; the vice-versa will be in a future update)
  • Image watermarking on-the-fly
  • Right click blocking on photos 
  • Secure user/session handling
  • Site login/Forgot Password/Reset Password functionality
  • Responsive design (I almost forgot to put that, because, isn't that the norm now!?)

I've built all of this website from the ground up, relying on as few external libraries as possible. Most of the outside code is JavaScript for GUI elements and functionality.


Some technical jargon:

The BlueSky code was largely written from scratch, as it's a new protocol for me and I had to figure it all out. Most of the other people creating BlueSky integrations online use dependency managers like Composer to build their code; I don't use such dependency managers, so I had to scrape everywhere I could for code samples that would let my integration work without relying on a bloated library.
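To give an idea of what the raw protocol involves, posting boils down to two XRPC calls: createSession to get an access token, then createRecord to write an app.bsky.feed.post record. Here's a rough illustration of that flow (in Python purely for illustration, not my actual integration code; the handle and app password are placeholders):

```python
# Minimal illustration of posting to BlueSky with raw HTTP calls (no SDK).
# Handle and app password are placeholders.
from datetime import datetime, timezone
import requests

PDS = "https://bsky.social"

# 1. Create a session to get an access token and your DID.
session = requests.post(
    f"{PDS}/xrpc/com.atproto.server.createSession",
    json={"identifier": "example.bsky.social", "password": "app-password-here"},
).json()

# 2. Create a post record in the app.bsky.feed.post collection.
post = {
    "$type": "app.bsky.feed.post",
    "text": "Hello from my hand-rolled integration!",
    "createdAt": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
}
resp = requests.post(
    f"{PDS}/xrpc/com.atproto.repo.createRecord",
    headers={"Authorization": f"Bearer {session['accessJwt']}"},
    json={"repo": session["did"], "collection": "app.bsky.feed.post", "record": post},
)
print(resp.json())  # contains the record's uri/cid, needed later to delete the post
```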

The text editor is a fully free, locally hosted version of the TinyMCE rich-text editor, with many added features to boost the overall functionality for creating posts.

The light-box code is a highly adapted version of the jQuery Magnific Popup plugin.

The gallery code on the Photography page I wrote from scratch, after endlessly searching the internet for a suitable endless-scroll gallery that is also responsive and handles cached images.


The wrap-up:

I've said it before and I'll say it again: I've long used nickdodd.com as a place to refine my craft of website development, and I've been building my personal websites like this for over 20 years. I take pride in writing my code from scratch using just a text editor and creating nearly every bit of UI and other graphics on the site.

The code I craft here usually lives on in my repertoire for years, if not decades, to come, so I always take as much time as I need to make it perfect, and for the first time ever I'm going to turn my website code into open-source software that anyone can use! 

I'm not sure quite yet when I'll start the official open-source project, but for now the code is an absolute mess and there are many more features and functions I'm still working on and adding. The code as it stands is around 4500 lines and is contained within a very limited number of files as I build out the functionality (also my preferred way to code). Once the code is closer to feature-complete for a 1.0 build, I will start to modularize it into an MVC-style code base, which will make it easier for other people to create plugins/addons for and work on overall, and that's when the Git repo will be born.

/* Happy Coding */

[TEST POST]
Test

test

Code Stress Test

This is a test post.

Photos and Videos added by various methods:


This post is being edited to include tags and change the category. Edit 2: change tags. Edit 3: change title. Edit 4: attempt to remove tags. Edit 5: add tags back. Edit 6: change to a non-defined category. Edit 7: new category handling code. Code checks out.

UPDATE:

Today's developments include new post-rendering code, plus updates to the new-post and edit-post backends to reduce bugs and work effectively, including image handling, image uploads, and changing the title, category, tags and content.

Up next on the to-do list: adding a galleries feature to the photography page, adding functionality to the notifications icon on the homepage so you can be alerted to new posts by turning on notifications, and more.

Functionality test after 1000+ lines of code added

This is a test. This post will be edited.


I would really like the photos in a post to be cached, and for clicking on them to bring up the image popup, but that's a project for another day.

There is much left to add to the images/albums part of the website. Stay Tuned.

New website: nickdodd.com Version 2023

I've left my various websites in neglect for the past few years, mostly due to not really needing them to be up to date, but with a large project I've got going for my web business right now I figured it was finally time to get back to some serious web development.

That's where nickdodd.com comes into play, since it has always been my testing ground for web development. Nearly every version of this website I've ever made has been entirely hand-coded from the ground up, with all of its features being products of my own development.

This new version of the site was designed over a year ago and has mostly been a placeholder, but I'm finally developing it out to be a full website again!

This new site will have photo galleries as a place to showcase some of the photography I've been shooting for over 15 years.

Additionally, this website will be a hub of information about my businesses, where you can learn more about them!

Stay tuned right here for more developments and have a great day!

UPDATE: August 2024

I have spent around 14 hours updating the entire website, adding nearly 100% of the features I want on the backend, including management of photos, posts, post categories, post tags, messaging, and analytics. This update added around 1000 lines of code to the codebase.

There remains some work to be done on the front end user experience to get it to where I want it to be, which will include some updating of the photography page and adding functionality such as photo albums and more.