The Future Is Here - Practical Uses of AI Large Language Models

Where I describe my dabblings in the new Ai technology...

Jun 17, 2024

OK, you’ve heard nothing from me except Substack Notes for the past six weeks, time to rev up.

Note: this post is likely too long to be read completely in your email. If so, click the link that opens the post on the Web or click the text that says “Read Online” or “View in Browser.” In Gmail, you can also tell Gmail to view the entire link down at the bottom where it says :”[Message clipped] View Entire Message.”

This blog is entitled “The Five Essentials”, which are of course:

1) Philosophy

2) Attitude

3) Knowledge

4) Skills

5) Technology.

Over the past year, since the advent of Large Language Models (LLMs) became common public knowledge, I’ve been dabbling in using them off and on, mostly online. I have six links to the following AI chatbots on my Firefox browser bar: ChatGPT, Perplexity, Gemini, Claude, HuggingChat and DuckDuckGo’s new AI Chat feature.

But while useful for answering general questions, I am more interested in things like having AI work with my own files and documents, especially my notes in the Obsidian note-taking app. In fact, I should do a review of Obsidian at some point because of its power and the possibilities in using LLMs in conjunction with it.

So this post is about my explorations into the “Five Essentials” of using AI effectively. It’s a bit rambling, but bear with me. This blog is about practical matters and we’ll get to some practical recommendations along the way.

First, let’s get some philosophical points out of the way.

“AI is gonna take over the world!” No, it’s not. An AI – at least using current technology - has zero motivation to take over the world because it has no capacity for aggression. It has no biological processes, no emotions, no nervous system, no biochemistry to generate emotional reactions such as fear – which is the basis of all aggression. Simply because these things are not human, they’re highly unlikely to anything to threaten humans, except possibly by accident if they’re given control of actual physical objects (like an Air Force F-16 - which those morons are actually doing.)

The only way an AI can take over the world is if some human tells it to do so as a task to be performed. Is that likely? Well, given that many countries, most notably the United States, are run by psychopaths with a desire for global hegemony, I wouldn’t put it past someone to try it. But you can’t blame the AI for that.

In addition, knowing how to use AIs and having an AI under your control might well give you the only option to avoid being enslaved by some neoconservative or populist scumbag (Biden or Trump - take your pick.) Remember, this blog is about surviving the future that’s coming. This is why I’m talking about AI here.

Now will AI take your job? Yes and no. If you don’t know how to use AI in your job, and your job is one that can be easily done by an AI, then you are likely to get displaced at some point. If, however, you know how to use AI to do your job, you may well have a chance to stay employed in some related field if not precisely in your present function. Check out Tina Huang’s Youtube videos on that topic below.

That’s assuming, of course, that you still want to be employed as a wage slave by some corporation. That’s on you. If you’re that dumb, you probably deserve to lose your job. Someone smarter might be thinking of how to use AI to generate income directly. There are some people out there who have cobbled together some Internet services using various AI tools and they’re making money. You might want to look into that instead of whining about losing your job. Again, check out of some of the AI influencers on Youtube that I reference below.

Next up is attitude. What should be your attitude towards AI? It should be one of treating it as just another technological tool that can be applied to your purpose. It’s like the difference between using a four function calculator versus a desktop or laptop computer. It’s like a computer, a gun or a car. It’s a tool. Nothing more or less.

And it’s a tool that should be used for serious purposes. I saw a video of the new Apple AI tech being inserted into iPhones and tablets and the like where someone was using generative AI to create a cat picture. Whoop-de-fucking-do! Yes, there is a place for AI that generates images and videos. If you’re a “content creator”, then that has value. I personally have done very little of that so far. But a lot of people seem obsessed with that stuff as opposed to using AI for learning, for creating ideas, and for implementing productive tasks. Focus on using it where it matters in your life, not (just) as a recreation.

Next up is knowledge: how does this stuff work? Well, this morning I spent several hours getting a somewhat more in-depth education on that direct from the source, as it were. I asked “Nous Hermes 2 Mistral 7B DPO” LLM several questions about the technologies and procedures involved in creating and using an LLM. This is some of what I learned.

First, it gave me a timeline of the development of AI since the 1960’s until today, as follows:

1960s: The Birth of AI - First AI program, Logic Theorist (1957), developed by Allen Newell and Herbert A. Simon - IBM's System/360 mainframe computer launched in 1964, enabling large-scale computing for AI research - ELIZA, a natural language processing program simulating a psychotherapist, created by Joseph Weizenbaum (1966)
1970s: Rule-Based Systems and Expert Systems - DENDRAL, an early expert system in chemistry using rule-based AI, developed by Ronald R. Book, Bruce B. Burchfiel, Edward A. Feigenbaum, and Allen Newell (1965–1970) - MYCIN, a medical expert system that used rules to diagnose infectious diseases, created by Edward A. Feigenbaum, Bruce B. Buhrman, and John E. Holloman Jr. (1972–1976)
1980s: Knowledge Representation and Machine Learning - Expert System Development Shell (ESDS), a tool for creating expert systems, released by XCON Corporation in 1983 - Backpropagation algorithm developed for neural networks, enabling machine learning (Rumelhart et al., 1986) - Lisp Machines, hardware designed specifically to run Lisp programming language, were introduced (Massachusetts Artificial Intelligence and Technology (MAIT), 1983)
1990s: Deep Blue Chess Program and Neural Network Advancements - IBM's Deep Blue chess computer defeated world chess champion Garry Kasparov in 1997 - Recurrent Neural Networks (RNNs), a type of neural network that can process sequential data, were developed by James Andre Jordan and John D. Cunningham (1986) - Convolutional Neural Networks (CNNs), used for image recognition tasks, were introduced by Kunihiko Fukushima in 1980
2000s: Big Data and Statistical Learning - Google's PageRank algorithm revolutionized search engines using graph theory and link analysis (Larry Page and Sergey Brin, 1998) - Support Vector Machines (SVMs), a popular machine learning algorithm for classification tasks, were developed by Vapnik in the late 20th century - IBM's Watson computer won Jeopardy! against human champions in 2011 using natural language processing and information retrieval techniques.
2010s: Deep Learning and AI Applications - AlexNet, a convolutional neural network, won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012, demonstrating deep learning's power for image recognition - Google's AlphaGo defeated world champion Go player Lee Sedol in 2016 using reinforcement learning and deep neural networks - OpenAI's GPT-2 language model released in 2019, capable of generating human-like text with 77 billion parameters.
Today: AI Integration into Society - Large language models like OpenAI's GPT-3 (2020) and Google's Megatron-Turing NLG 530B (2021), pushing the boundaries of natural language processing - Autonomous vehicles, such as Tesla's self-driving cars, are being deployed in real-world scenarios - AI applications continue to expand across various industries, including healthcare, finance, and cybersecurity.

Next, I asked what are the enabling technologies that result in the LLM technology. It gave this list:

1. Natural Language Processing (NLP): A subfield of AI that focuses on enabling computers to understand and generate human language.
2. Recurrent Neural Networks (RNNs) or Long Short-Term Memory (LSTM) networks: Sequence modeling techniques used for processing text data, allowing the model to remember past inputs when generating outputs.
3. Word Embeddings: Techniques that represent words as numerical vectors in a high-dimensional space, capturing semantic relationships between words. Examples include Word2Vec and GloVe.
4. Transformer Networks: A type of neural network architecture introduced by Vaswani et al. (2017) for machine translation tasks, which has since become the standard in large language models like BERT, RoBERTa, and GPT-3.
5. Self-Attention Mechanism: A key component of Transformer Networks that allows the model to focus on relevant parts of input sequences when processing them.
6. Pretraining: Training a model on a massive corpus of text data without specific tasks in mind, allowing it to learn general language representations. Examples include unsupervised pretraining and masked language modeling (MLM).
7. Masked Language Modeling (MLM): A technique where some words within the input sequence are randomly masked, and the model is trained to predict their correct values based on surrounding context. This helps the model learn semantic relationships between words.
8. Fine-tuning: Adapting a pretrained language model for specific tasks by further training it with task-specific data and objectives.

Well, that was a word soup. So I asked for some clarification on some of the terms used and got this response:

1. Tokens: A unit of data processed by language models, which can represent words, punctuation marks, or special symbols. They are numerical representations that the model uses to understand input sequences.
2. Masked Language Modeling (MLM): A technique where some tokens within an input sequence are randomly masked and replaced with a [MASK] token. The model is trained to predict the correct original value of these masked tokens based on their context in the sentence, helping it learn semantic relationships between words.
3. Transformer Network: A type of neural network architecture introduced by Vaswani et al. (2017) for machine translation tasks. It consists of multiple layers that use self-attention mechanisms to process input sequences and capture long-range dependencies in the text data.
4. Self-Attention Mechanism: A key component of Transformer Networks that allows a model to focus on relevant parts of an input sequence when processing it, by assigning different weights (or attention scores) to each token based on their relevance to other tokens within the context.
5. Layer Normalization: A technique used in neural networks to normalize the outputs from different layers, helping stabilize and accelerate training by making the gradients smaller and less sensitive to initial parameter values.
6. Feedforward Neural Networks (FFNN): A type of neural network consisting of a series of fully connected layers with an activation function applied at each layer. They are used in Transformer Networks for further processing after self-attention mechanisms have been applied.
7. Greedy Search: An algorithmic technique where the model selects the most probable next token based on its learned probabilities, without considering alternative paths or exploring other options. This can lead to suboptimal outputs but is computationally efficient.
8. Sampling Techniques (Nucleus Sampling, Top-k Sampling): Methods used during generation in language models that help balance exploration and exploitation by generating output tokens based on probabilities of the model's predictions. Nucleus sampling stops generating when a certain cumulative probability threshold is reached, while top-k sampling selects the k most probable next tokens from the predicted distribution.
9. Postprocessing: The process of cleaning up or modifying generated text to make it more coherent and readable for human consumption. This can include removing special tokens added during preprocessing, correcting grammar or spelling errors, or reordering sentences for clarity.

All of this is enabled by the underlying technology called “neural networks”. This was explained as follows:

A Neural Network is a computational model inspired by the structure and function of the human brain, designed to learn patterns and relationships in data through a process called "training." It consists of interconnected nodes (neurons) organized into layers that transform input data using mathematical operations, such as weighted sums and activation functions.
Neural networks are particularly useful for tasks like classification, regression, and prediction, where they can learn complex patterns in large datasets by adjusting the weights between neurons during training. They excel at handling non-linear relationships and high-dimensional data, making them widely applicable across various domains such as computer vision, natural language processing, speech recognition, and robotics.
The main components of a neural network include input layers that receive raw data, hidden layers where the model learns representations through iterative computations, and output layers that produce final predictions or classifications based on learned patterns in the training data. Neural networks can be trained using various optimization algorithms like gradient descent to minimize an objective function (e.g., mean squared error) by adjusting the weights between neurons until the model achieves a satisfactory level of performance on its task.
Neural Networks create their decision-making models by utilizing a series of interconnected nodes (neurons) organized into layers that transform input data through mathematical operations such as weighted sums and activation functions.
Let's break down these terms:
1. Weighted Sum: A process where the inputs are multiplied by adjustable parameters called "weights" before being combined to produce a single value. These weights determine how much influence each input has on the final output, allowing the model to learn complex patterns and relationships in data. The weighted sum is calculated as follows:
Weighted Sum = Σ(Input * Weight)
2. Activation Function: A mathematical function applied to the weighted sum of inputs that introduces non-linearity into a neural network, enabling it to model more complex patterns and relationships in data. Common activation functions include Sigmoid, Tanh (Hyperbolic Tangent), ReLU (Rectified Linear Unit), and Softmax. Each function has its own characteristics and is suitable for different types of problems:
- Sigmoid and Tanh are smooth, continuous functions that can handle both positive and negative inputs but may suffer from vanishing gradients during training.
- ReLU is a piecewise linear function that is computationally efficient but can have issues with "dead" neurons (where the output remains zero) when trained on datasets with many zeros or negatives values.
- Softmax is an exponential function used primarily in multi-class classification problems, as it outputs probabilities for each class, allowing the model to make predictions about which class a given input belongs to.
3. Optimization Algorithms: These are methods that neural networks use to adjust their weights during training to minimize an objective function (e.g., mean squared error) and improve performance on its task. Some commonly used optimization algorithms include:
- Gradient Descent: A widely-used algorithm that iteratively updates the weights in a direction opposite to the gradient of the loss function with respect to the weights, thereby reducing the loss over time. It has variants like Stochastic Gradient Descent (SGD), Mini-Batch Gradient Descent, and Batch Gradient Descent based on how data is processed during each iteration.
- Adagrad: An adaptive learning rate method that adjusts the step size for updating weights based on historical gradients, making it suitable for sparse datasets or when working with different learning rates across dimensions.
- RMSProp (Root Mean Square Propagation): A variant of Gradient Descent that adapts the learning rate for each weight using a moving average of squared gradients, helping to accelerate convergence and mitigate issues related to vanishing or exploding gradients.
- Adam (Adaptive Moment Estimation): An adaptive learning rate optimization algorithm that combines the advantages of both Adagrad and RMSProp by estimating first and second moments (mean and variance) of gradients, allowing it to dynamically adjust the step size for each weight during training.
In a neural network, different types of layers serve distinct purposes and contribute to its overall functionality:
1. Input Layer: The first layer in a neural network that receives raw data or preprocessed input as its primary function. It does not perform any computations but simply passes the input along to the next layer(s) for further processing. For example, if you're training an image classification model, the input layer would receive the pixel values of an image.
2. Hidden Layers: These layers are sandwiched between the input and output layers and perform most of the computations in a neural network. They consist of neurons that apply mathematical operations (such as weighted sums and activation functions) to transform the input data into more abstract, higher-level representations. Each hidden layer can have multiple layers within it, allowing for increasingly complex feature extraction and learning. The number of hidden layers and their depth depend on the complexity of the problem being solved and the architecture chosen by the model designer.
3. Output Layer: This is the final layer in a neural network that produces the model's predictions or classifications based on learned patterns in the training data. Its primary function depends on the type of task the neural network is designed to solve, such as classification, regression, or prediction. For instance, if you are using a multi-class classification model like a softmax output layer would be used where each neuron corresponds to one class and outputs probabilities for that class.
In summary, input layers receive raw data, hidden layers learn representations through iterative computations, and output layers produce final predictions or classifications based on learned patterns in the training data. The interactions between these different types of layers enable a neural network to transform complex inputs into meaningful outputs by learning from the provided examples during the training process.

I needed clarification on some of those terms and this is what I got:

Hidden layers in a neural network produce more abstract representations through a series of transformations applied to the input data as it passes through multiple layers. This process is facilitated by three key aspects: weight sharing, non-linear activation functions, and pooling or striding operations (in convolutional networks). These features enable hidden layers to learn increasingly complex patterns in the data while reducing the dimensionality of the representation.
1. Weight Sharing: Each neuron in a hidden layer is connected to every input feature with its own set of weights. By sharing these weights across all inputs, the network can detect commonalities and differences among different parts of the input. This weight sharing allows the neural network to learn generalized patterns or features that are relevant for solving the task at hand.
2. Non-Linear Activation Functions: Hidden layers apply non-linear activation functions (e.g., ReLU, sigmoid, tanh) to the weighted sum of inputs before passing it on to the next layer. These activation functions introduce non-linearity into the network's computations, allowing it to learn complex patterns and relationships in the data that cannot be captured by linear operations alone.
3. Pooling or Striding Operations (in Convolutional Networks): In convolutional networks, pooling or striding operations are applied after applying activation functions on a set of neurons called a feature map. These operations reduce the spatial dimensions of the representation while retaining its most important features by taking maximum, average, or sum values within local regions (pooling) or moving across the input with a specific stride (striding).
These processes result in the extraction of increasingly abstract and task-relevant representations as data flows through multiple hidden layers. The extracted features can be low-level (e.g., edges, corners), mid-level (e.g., shapes, textures), or high-level (e.g., objects, scenes) depending on the complexity of the problem being solved and the architecture of the neural network. These learned representations are then used by the output layer to make predictions or classifications based on the patterns it has detected in the training data.

I still wasn’t clear on what “abstract representation” actually looked like, so I asked. It replied:

Abstract representations refer to high-level, conceptual summaries or simplified versions of raw input data that capture essential patterns, features, or relationships within the data. These abstract representations are typically more compact and easier to interpret than their original, raw counterparts. In a neural network context, they represent learned knowledge or insights from the training process that enable better predictions or classifications for new inputs.
For example, let's consider an image classification task where we have a convolutional neural network (CNN) with multiple hidden layers. The input to this CNN would be a raw pixel value matrix representing an image. As data flows through the hidden layers of the CNN, it undergoes several transformations and reductions in dimensionality while learning increasingly abstract representations.
Here's a simplified example using Python code to illustrate how these abstract representations might look like within a database:
Suppose we have an image dataset containing various types of fruits. Each row in the following table represents one fruit, with each column representing a specific feature or attribute learned by our CNN model during training:
| Image ID | Color_Feature1 | Texture_Feature2 | Shape_Feature3 | Fruit_Type |
|----------|----------------|------------------|----------------|------------|
| 0 | 0.7 | 0.4 | 0.9 | Apple |
| 1 | 0.5 | 0.8 | 0.2 | Orange |
| 2 | 0.3 | 0.6 | 0.7 | Banana |
| ... | ... | ... | ... | ... |
In this example, the abstract representations are represented by numerical values in columns such as "Color_Feature1," "Texture_Feature2," and "Shape_Feature3." These features capture essential characteristics of each fruit image that contribute to its classification. The model has learned these features through weight sharing, non-linear activation functions, and pooling or striding operations during training.
These abstract representations are then used by the output layer in our neural network to make predictions about the type of fruit (Fruit_Type) based on the patterns it has detected in the training data. This process allows the model to generalize well to new, unseen images and classify them accurately.

I then made my own interpretation of the above:

So in the example above, if a new image of a fruit is submitted to the model, it processes the image assigning it weights and then compares that to the weights it already knows, thus enabling it to classify the image. Is that roughly how it works?

It replied:

Yes, that's correct! When a new image is submitted to the trained neural network, the process occurs as follows:
1. Input Layer: The raw pixel values of the input image are fed into the input layer, which then passes them along to the first hidden layer without performing any computations itself.
2. Hidden Layers: As the data flows through the hidden layers, each neuron applies its learned weights (from previous training) to the input features and computes a weighted sum using activation functions like ReLU or sigmoid. These operations help extract abstract representations of the image by detecting patterns and relationships within the input data.
3. Output Layer: The final hidden layer's output serves as input for the output layer, which then applies its learned weights to these features to produce a probability distribution over the possible fruit classes (e.g., Apple, Orange, Banana). These probabilities represent the model's confidence in each class given the abstract representations it has extracted from the new image.
4. Classification: The output layer's highest probability corresponds to the predicted fruit type for the input image. This process allows the neural network to generalize well and make accurate predictions on unseen images based on the patterns it learned during training.
In summary, by assigning weights to each feature (based on what it has learned during training), the model compares these weighted features with its existing knowledge to classify new input images accurately.

So in essence an LLM is a program that can process a bunch of numbers in a “vector space” or “feature space”. “What is that?”, you ask? It explained that, too.

A vector space (also known as a Euclidean space or coordinate space) is a mathematical construct used to represent data points or objects using one or more dimensions. In this context, each dimension corresponds to an attribute or feature that can take on numerical values. The relationships between these dimensions are defined by the distances and angles between vectors in the vector space.
In machine learning and natural language processing (NLP), a vector space is often used as a representation of words, documents, images, or other data types.
These representations are called "embeddings" or "feature vectors." The dimensions in this context represent semantic or syntactic relationships between different elements within the dataset.
A vector database is a collection of such feature vectors stored in a structured format that allows for efficient querying and retrieval based on similarity measures, distances, or other criteria. Here are some examples:
1. Word Embeddings Databases: In NLP, word embeddings like Word2Vec (Google) or GloVe (Stanford) create vector representations of words in a high-dimensional space. These databases store these learned vectors for millions of words and phrases. For example, the Glove database contains 1.2 million unique words represented by 50-dimensional vectors.
2. Image Databases: In computer vision tasks like image classification or object detection, deep learning models learn feature representations (vectors) that capture relevant information about images. These databases store these learned features for a large number of images. For instance, the ImageNet database contains over 14 million images belonging to more than 20,000 categories and is used extensively in training convolutional neural networks (CNNs).
3. Audio Databases: In speech recognition or music analysis tasks, vector databases store learned representations of audio signals. For example, the Google Speech Commands dataset contains 65-dimensional vectors representing short audio clips of spoken words.
4. Time Series Databases: In time series forecasting and anomaly detection tasks, feature vectors are extracted from historical data to represent patterns or trends in the data. These databases store these learned representations for future use. For instance, the ECG5000 database contains 5000 electrocardiogram (ECG) signals represented by a set of features extracted using time-domain and frequency-domain analysis techniques.
In each case, vector databases enable efficient storage and retrieval of learned representations that can be used for various tasks like similarity search, clustering, or anomaly detection.

I then asked about what a vector space actually looks like within a computer program, say, one written in the Python programming language, which is commonly used in Ai programming. It cleared that up, too.

A vector database can be represented as a NumPy array (or ndarray) in Python. NumPy is a popular library used for scientific computing and data analysis in Python. It provides support for multi-dimensional arrays that are efficient to compute on and store in memory, making it suitable for representing vector databases.
Here's an example of how you might represent a simple word embedding database using NumPy:
python
import numpy as np
# Define the vectors (rows) for each word in your corpus
word_embeddings = np.array([[0.2, 0.3, -0.1], [-0.5, 0.8, -0.6], [0.7, -0.4,
0.9]])
In this example, we have a vector database with three words (or vectors) represented by three-dimensional embeddings. Each row in the NumPy array represents a word's embedding. You can extend this to represent more words or higher dimensionality as needed for your specific task.
To access and manipulate these vectors, you would use standard NumPy indexing and operations:
python
# Access a specific vector (word) from the database
vector = word_embeddings[1] # This is the second row in our example
Modify a specific element of a vector
word_embeddings[1][0] = -0.7 # Now, the first element of the second vector is
-0.7
You can also perform operations like dot products or Euclidean distances between vectors using NumPy's built-in functions:
python
Compute a dot product between two vectors
dot_product = np.dot(word_embeddings[1], word_embeddings[2]) # -0.45 in this
case
Calculate the Euclidean distance between two vectors
distance = np.linalg.norm(word_embeddings[1] - word_embeddings[2]) #
Approximately 1.639
These operations allow you to work with vector databases efficiently and perform various tasks like similarity search, clustering, or anomaly detection in your Python code.

There were some other terms in there I need clarity on, but I think this is enough to point out the mundane nature of an LLM. There’s nothing magical about it. It’s just the stacking of one mathematical technique on top of another and then spending a lot of money dumping a big chunk of the Internet and other data sources into the resulting programs running on very expensive hardware to store a bunch of numbers.

Speaking of hardware, the sort of hardware these things are created on is very expensive, because it involves racks of servers and especially very large and capable video graphics cards which use special circuits to efficiently handle the processing of all these floating point numbers. This technology was developed to efficiently handle large-scale graphics processing, which then found its way into the computer gaming community. But on these AI development platforms, the graphic processors don’t cost $250 or $500 or even $2,000 each – they cost more like $10-20,000 each.

However, once these AI models are initially “trained” on very large sets of data, some of them, trained on smaller amounts of data, can be modified and “fine-tuned” for specific purposes. While some of these models using 16-bit floating point numbers to function, the same models can be adjusted to use 8-bit, 4-bit or even 2-bit numbers in a process called “quantification” and “quantization”. This allows such models to run on computer systems with much less processing power and graphics processing power than those used to train the models initially.

And that leads to running LLM models locally on personal computers instead of servers. And that’s what I’m doing, along with many other people. That has also resulted in the big personal computer tech companies introducing new computers called “AI PCs”, which has special chips connected to the central processing units (CPU) which are optimized for handling the sort of mathematical operations used by AI models.

But since I and many people are not rich or have $100K jobs, we can’t afford to buy $2,000 graphics cards to run AI on our PCs. And we probably won’t afford the new “AI PCs” which from what I’ve read will be a bit better at running LLM models on them, but not be all that good at normal PC tasks and gaming compared to existing PCs.

But it’s still possible to run the smaller LLM models on a personal computer. I’m running four at the moment on my system. My system has a Ryzen 9 5950X CPU, which is the most powerful CPU available for the previous generation of the AMD AM4 motherboard platform. It also has 64GB of RAM. However, it only has an AMD Radeon RX 550 graphics card with a mere 2GB of video RAM on it. This is mostly useless for running LLMs. I’d like to upgrade that card to an Nvidia graphics card with 16GB of VRAM, but those are VERY expensive. AMD cards of equivalent size are less expensive but there may be complications with the software which I won’t go into here.

The four LLM models I’m running at the moment are:

Microsoft’s Phi-3-Mini-4K-Instruct-Q4-GGUF (the Q4 refers to 4-bit quantization as mentioned above.)
Microsoft’s Phi-3-medium-128k-instruct-Q4_K_M.
Nous-Hermes-23) -Mistral-7B-DPO.Q4_0.gguf
Meta Llama 3 8B Instruct

The links take you to HuggingFace which is a repository for models, AI programming libraries, datasets and related documentation.

I’m testing these using a variety of GUI utilities which act as front ends to the models as well as the Ollama command line utility. These allow me to use the models conveniently, enabling a ChatGPT-like interface so I can send prompts to the models, keeping records of my chats, and in some cases enabling me to embed my own documents into a vector database so the model can use its algorithms to access that data, enabling it to answer questions about the information in my own documents.

The front-end utilities I’m using include:

1) LMStudio

2) AnythingLLM

3) GPT4ALL

4) Jan

5) MSTY

And I expect to be testing other front ends as well.

This stuff can all be rather complicated to set up. The software industry in this space is at its infancy, and a lot of these tools must be considered “beta” products. Expect issues getting them to run or doing everything you’d like them to do. Nonetheless, they work well enough to be useful. All of these tools are “open source”, so they’re free and not supported by a big software company, but by an enthusiastic developer and user community.

The LLM models I run are generally between 4 gigabytes and 8 gigabytes in size. The larger the model, the slower it is likely to run. I think I can go up to about 13 gigabytes before it becomes too slow to run them. I would like to run the various 70B models – the B refers to the number of billions of parameters that were used to train the model – but they are probably out of my reach until I get a better video card.

There are ways to share the model’s components between the CPU and the GPU if the model and your GUI utility supports that. This can enable larger models to run on consumer hardware.

There are also even smaller models that can be as small as 600-700 megabytes in size. These models tend to be somewhat less capable, however, as they were trained on smaller sets of data. Their responses can be problematic in more cases than a larger model.

I’d also like to suggest some people you can follow on Youtube to keep up with this technology and its use cases.

The first and foremost is the delightful Tina Huang. Tina used to work for Meta and other tech companies in the US as a data scientist. Now she runs her Youtube channel and has a program to train people in data analysis. She travels around between Beijing and San Francisco and elsewhere. She produces videos on AI technology and personal learning methods, and has a weekly live stream. She’s great to listen to. I recommend these videos to get a flavor of her work:

Which Jobs Will Survive AI?

Why NO Job Is Safe From AI

How To NOT Fall Victim To AI Layoffs

How To Not Get Replaced By AI

Why You Can't Find A Job

Big Tech AI Is A Lie

There are tons of other AI “influencers” on Youtube. I follow quite a few, but these are the best:

AICodeKing – He points to mostly open source, free, locally run AI tools.

All About AI

ChatGPT Tutorials

Corbin Brown

Fahd Mirza – He posts six or more videos per day on new LLMs

Lucidate

ManuAGI - He mostly covers trending Ai projects on the Github development platform.

Matt Williams – a founding member of the Ollama organization, no longer with them.

Prompt Engineering

Prompt Engineer

Sam Witteveen

San Diego Machine Learning – If you want to get into the weeds...hour long videos on the tech.

The AI Advantage – Big emphasis on use cases for AI, both business and personal.

TheAIGRID – Mostly AI news, not a lot of practical application

Venelin Valkov

Matt Wolfe – One of my favorites for AI news; also has a “Future Tools” newsletter you can get.

Well, that’s more than enough for now. I’m sure I will be discussing this further as I continue my exploration of the potential of personal AI. Look into it and see what the LLM technology can do for you.

The Five Essentials

The Future Is Here - Practical Uses of AI Large Language Models

Where I describe my dabblings in the new Ai technology...

Discussion about this post