<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>LLM on Mini Fish</title>
    <link>https://blog.minifish.org/tags/llm/</link>
    <description>Recent content in LLM on Mini Fish</description>
    <image>
      <title>Mini Fish</title>
      <url>https://blog.minifish.org/android-chrome-512x512.png</url>
      <link>https://blog.minifish.org/android-chrome-512x512.png</link>
    </image>
    <generator>Hugo -- 0.154.5</generator>
    <language>en-US</language>
    <copyright>Mini Fish 2014-present. Licensed under CC-BY-NC</copyright>
    <lastBuildDate>Wed, 27 Nov 2024 18:26:14 +0800</lastBuildDate>
    <atom:link href="https://blog.minifish.org/tags/llm/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Exploring Local LLMs with Ollama: My Journey and Practices</title>
      <link>https://blog.minifish.org/posts/exploring-local-llms-with-ollama-my-journey-and-practices/</link>
      <pubDate>Wed, 27 Nov 2024 18:26:14 +0800</pubDate>
      <guid>https://blog.minifish.org/posts/exploring-local-llms-with-ollama-my-journey-and-practices/</guid>
      <description>&lt;p&gt;Local Large Language Models (LLMs) have been gaining traction as developers and enthusiasts seek more control over their AI tools without relying solely on cloud-based solutions. In this blog post, I&amp;rsquo;ll share my experiences with &lt;strong&gt;Ollama&lt;/strong&gt;, a remarkable tool for running local LLMs, along with other tools like &lt;strong&gt;llamaindex&lt;/strong&gt; and &lt;strong&gt;Candle&lt;/strong&gt;. I&amp;rsquo;ll also discuss various user interfaces (UIs) that enhance the local LLM experience.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;table-of-contents&#34;&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#introduction-to-ollama&#34;&gt;Introduction to Ollama&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#a-popular-choice&#34;&gt;A Popular Choice&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#ease-of-use&#34;&gt;Ease of Use&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#built-with-golang&#34;&gt;Built with Golang&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#my-practices-with-ollama&#34;&gt;My Practices with Ollama&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#preferred-models&#34;&gt;Preferred Models&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#llama-31&#34;&gt;Llama 3.1&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#mistral&#34;&gt;Mistral&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#phi-3&#34;&gt;Phi-3&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#qwen-2&#34;&gt;Qwen-2&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#hardware-constraints&#34;&gt;Hardware Constraints&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#exploring-uis-for-ollama&#34;&gt;Exploring UIs for Ollama&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#openwebui&#34;&gt;OpenWebUI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#page-assist&#34;&gt;Page Assist&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#enchanted&#34;&gt;Enchanted&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#anythingllm&#34;&gt;AnythingLLM&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#dify&#34;&gt;Dify&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#diving-into-llamaindex&#34;&gt;Diving into llamaindex&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#experimenting-with-candle&#34;&gt;Experimenting with Candle&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#conclusion&#34;&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id=&#34;introduction-to-ollama&#34;&gt;Introduction to Ollama&lt;/h2&gt;
&lt;h3 id=&#34;a-popular-choice&#34;&gt;A Popular Choice&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/jmorganca/ollama&#34;&gt;Ollama&lt;/a&gt; has rapidly become a favorite among developers interested in local LLMs. Within a year, it has garnered significant attention on GitHub, reflecting its growing user base and community support.&lt;/p&gt;</description>
      <content:encoded><![CDATA[<p>Local Large Language Models (LLMs) have been gaining traction as developers and enthusiasts seek more control over their AI tools without relying solely on cloud-based solutions. In this blog post, I&rsquo;ll share my experiences with <strong>Ollama</strong>, a remarkable tool for running local LLMs, along with other tools like <strong>llamaindex</strong> and <strong>Candle</strong>. I&rsquo;ll also discuss various user interfaces (UIs) that enhance the local LLM experience.</p>
<hr>
<h2 id="table-of-contents">Table of Contents</h2>
<ul>
<li><a href="#introduction-to-ollama">Introduction to Ollama</a>
<ul>
<li><a href="#a-popular-choice">A Popular Choice</a></li>
<li><a href="#ease-of-use">Ease of Use</a></li>
<li><a href="#built-with-golang">Built with Golang</a></li>
</ul>
</li>
<li><a href="#my-practices-with-ollama">My Practices with Ollama</a>
<ul>
<li><a href="#preferred-models">Preferred Models</a>
<ul>
<li><a href="#llama-31">Llama 3.1</a></li>
<li><a href="#mistral">Mistral</a></li>
<li><a href="#phi-3">Phi-3</a></li>
<li><a href="#qwen-2">Qwen-2</a></li>
</ul>
</li>
<li><a href="#hardware-constraints">Hardware Constraints</a></li>
</ul>
</li>
<li><a href="#exploring-uis-for-ollama">Exploring UIs for Ollama</a>
<ul>
<li><a href="#openwebui">OpenWebUI</a></li>
<li><a href="#page-assist">Page Assist</a></li>
<li><a href="#enchanted">Enchanted</a></li>
<li><a href="#anythingllm">AnythingLLM</a></li>
<li><a href="#dify">Dify</a></li>
</ul>
</li>
<li><a href="#diving-into-llamaindex">Diving into llamaindex</a></li>
<li><a href="#experimenting-with-candle">Experimenting with Candle</a></li>
<li><a href="#conclusion">Conclusion</a></li>
</ul>
<hr>
<h2 id="introduction-to-ollama">Introduction to Ollama</h2>
<h3 id="a-popular-choice">A Popular Choice</h3>
<p><a href="https://github.com/jmorganca/ollama">Ollama</a> has rapidly become a favorite among developers interested in local LLMs. Within a year, it has garnered significant attention on GitHub, reflecting its growing user base and community support.</p>
<h3 id="ease-of-use">Ease of Use</h3>
<p>One of Ollama&rsquo;s standout features is its simplicity. It&rsquo;s as easy to use as Docker, making it accessible even to those who may not be deeply familiar with machine learning frameworks. The straightforward command-line interface allows users to download and run models with minimal setup.</p>
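<p>Besides the CLI, a running Ollama instance also exposes a local HTTP API, so the same models can be called from a script. The sketch below is a minimal example, assuming the server listens on its default address (<code>http://localhost:11434</code>) and that a model tagged <code>llama3.1</code> has already been pulled; the helper names are my own, not part of Ollama.</p>
<pre><code class="language-python">import json
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(model, prompt, base_url=OLLAMA_URL):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        base_url + "/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, base_url=OLLAMA_URL):
    """Send the request and return the generated text."""
    request = build_generate_request(model, prompt, base_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["response"]


# Usage (needs a running Ollama server and a pulled model):
# print(generate("llama3.1", "Explain what a local LLM is in one sentence."))
</code></pre>
<p>Building the request separately from sending it makes the payload easy to inspect without a server running.</p>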
<h3 id="built-with-golang">Built with Golang</h3>
<p>Ollama is written in <strong>Golang</strong>, which keeps its CLI and server lightweight. Golang&rsquo;s concurrency features help Ollama serve requests and manage loaded models, while the resource-intensive inference itself is delegated to its llama.cpp backend.</p>
<h2 id="my-practices-with-ollama">My Practices with Ollama</h2>
<h3 id="preferred-models">Preferred Models</h3>
<h4 id="llama-31">Llama 3.1</h4>
<p>I&rsquo;ve found that <strong>Llama 3.1</strong> works exceptionally well with Ollama. It&rsquo;s my go-to choice due to its performance and compatibility.</p>
<h4 id="mistral">Mistral</h4>
<p>While <strong>Mistral</strong> also performs well, it hasn&rsquo;t gained as much popularity as Llama. Nevertheless, it&rsquo;s a solid option worth exploring.</p>
<h4 id="phi-3">Phi-3</h4>
<p>Developed by Microsoft, <strong>Phi-3</strong> is both fast and efficient. The 3.8B-parameter Phi-3 Mini strikes a balance between size and performance, making it one of the best small LLMs available.</p>
<h4 id="qwen-2">Qwen-2</h4>
<p>Despite its impressive benchmarks, <strong>Qwen-2</strong> didn&rsquo;t meet my expectations in practice. It might work well in certain contexts, but it didn&rsquo;t suit my specific needs.</p>
<h3 id="hardware-constraints">Hardware Constraints</h3>
<p>Running large models on hardware with limited resources can be challenging. On my 16GB MacBook, models around <strong>7B to 8B parameters</strong> are the upper limit. Attempting to run larger models results in performance issues.</p>
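<p>A rough back-of-the-envelope calculation shows why the limit sits around 7B to 8B. The sketch below estimates the memory taken by the weights alone, assuming roughly 4 bits per weight (a common quantization level for locally run models) versus 16-bit weights. The function name and the chosen model sizes are illustrative assumptions, and the real footprint is higher once the context cache and runtime overhead are included.</p>
<pre><code class="language-python">def estimate_weights_gib(params_billion, bits_per_weight):
    """Approximate memory needed for the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Weights only: the KV cache, runtime buffers and the OS need memory too.
for params, bits in [(8, 4), (8, 16), (70, 4)]:
    size = estimate_weights_gib(params, bits)
    print(f"{params}B at {bits}-bit: about {size:.1f} GiB")
</code></pre>
<p>Under these assumptions, an 8B model needs about 3.7 GiB at 4-bit and about 14.9 GiB at 16-bit, while a 70B model needs about 32.6 GiB even at 4-bit, well beyond a 16GB machine.</p>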
<h2 id="exploring-uis-for-ollama">Exploring UIs for Ollama</h2>
<p>Enhancing the user experience with UIs can make interacting with local LLMs more intuitive. Here&rsquo;s a look at some UIs I&rsquo;ve tried:</p>
<h3 id="openwebui">OpenWebUI</h3>
<p><a href="https://github.com/open-webui/open-webui">OpenWebUI</a> offers a smooth and user-friendly chat interface for Ollama. It is usually run with Docker, which might be a barrier for some users.</p>
<ul>
<li><strong>Features</strong>:
<ul>
<li>Basic Retrieval-Augmented Generation (RAG) capabilities.</li>
<li>Connection to OpenAI APIs.</li>
</ul>
</li>
</ul>
<h3 id="page-assist">Page Assist</h3>
<p><a href="https://chrome.google.com/webstore/detail/page-assist/">Page Assist</a> is a Chrome extension that I&rsquo;ve chosen for its simplicity and convenience.</p>
<ul>
<li><strong>Advantages</strong>:
<ul>
<li>No requirement for Docker.</li>
<li>Accesses the current browser page as input, enabling context-aware interactions.</li>
</ul>
</li>
</ul>
<h3 id="enchanted">Enchanted</h3>
<p><a href="https://apps.apple.com/app/enchanted-ai-assistant/id">Enchanted</a> is unique as it provides an iOS UI for local LLMs with support for Ollama.</p>
<ul>
<li><strong>Usage</strong>:
<ul>
<li>By using <strong>Tailscale</strong>, I can connect it to Ollama running on my MacBook.</li>
<li>Serves as an alternative to Apple’s native intelligence features.</li>
</ul>
</li>
</ul>
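<p>Before pointing a client such as Enchanted at a remote machine, it can help to check that Ollama is reachable over the network. The sketch below queries the server&rsquo;s <code>/api/tags</code> endpoint, which lists the pulled models. The hostname <code>my-macbook</code> is a hypothetical Tailscale machine name, and the helper names are my own. Note that Ollama listens only on localhost by default, so the server side typically needs the <code>OLLAMA_HOST</code> environment variable set (for example to <code>0.0.0.0</code>) before other devices can connect.</p>
<pre><code class="language-python">import json
import urllib.request

# Hypothetical Tailscale hostname of the MacBook running Ollama.
REMOTE_OLLAMA = "http://my-macbook:11434"


def parse_model_names(payload):
    """Extract model names from the JSON returned by /api/tags."""
    return [model["name"] for model in payload.get("models", [])]


def list_remote_models(base_url=REMOTE_OLLAMA):
    """Ask a remote Ollama server which models it has pulled."""
    with urllib.request.urlopen(base_url + "/api/tags", timeout=5) as response:
        return parse_model_names(json.load(response))


# Usage (needs network access to the remote Ollama server):
# print(list_remote_models())
</code></pre>
<p>If the call times out, the usual suspects are the bind address on the server or the device not being connected to the same tailnet.</p>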
<h3 id="anythingllm">AnythingLLM</h3>
<p><a href="https://github.com/Mintplex-Labs/anything-llm">AnythingLLM</a> offers enhanced RAG capabilities. However, in my experience, it hasn&rsquo;t performed consistently well enough for regular use.</p>
<h3 id="dify">Dify</h3>
<p><a href="https://github.com/langgenius/dify">Dify</a> is a powerful and feature-rich option.</p>
<ul>
<li><strong>Pros</strong>:
<ul>
<li>Easy to set up with an extensive feature set.</li>
</ul>
</li>
<li><strong>Cons</strong>:
<ul>
<li>Resource-intensive, requiring Docker and running multiple containers like Redis and PostgreSQL.</li>
</ul>
</li>
</ul>
<h2 id="diving-into-llamaindex">Diving into llamaindex</h2>
<p><a href="https://github.com/jerryjliu/llama_index">llamaindex</a> is geared towards developers who are comfortable writing code. While it offers robust functionality, it has a learning curve.</p>
<ul>
<li><strong>Observations</strong>:
<ul>
<li>Documentation is somewhat limited, so I often had to dig into the source code.</li>
<li>The <code>llamaindex-cli</code> tool aims to simplify getting started but isn&rsquo;t entirely stable.
<ul>
<li>Works seamlessly with OpenAI.</li>
<li>Requires code modifications to function with Ollama.</li>
</ul>
</li>
</ul>
</li>
</ul>
<h2 id="experimenting-with-candle">Experimenting with Candle</h2>
<p><strong>Candle</strong> is an intriguing project written in <strong>Rust</strong>.</p>
<ul>
<li>
<p><strong>Features</strong>:</p>
<ul>
<li>Uses <a href="https://huggingface.co/">Hugging Face</a> to download models.</li>
<li>Simple to run, but slower than Ollama in my experience.</li>
</ul>
</li>
<li>
<p><strong>Additional Tools</strong>:</p>
<ul>
<li><strong>Cake</strong>: A distributed solution based on Candle that opens up possibilities for scaling and extending use cases.</li>
</ul>
</li>
</ul>
<h2 id="conclusion">Conclusion</h2>
<p>Exploring local LLMs has been an exciting journey filled with learning and experimentation. Tools like Ollama, llamaindex, and Candle offer various pathways to harnessing the power of LLMs on personal hardware. While there are challenges, especially with hardware limitations and setup complexities, the control and privacy afforded by local models make the effort worthwhile.</p>
<hr>
<p><em>Feel free to share your experiences or ask questions in the comments below!</em></p>
]]></content:encoded>
    </item>
  </channel>
</rss>
