Problems with UTF-8

Started by JohnClaw, April 28, 2024, 09:37:07 PM


MrBcx

I'm attaching two screenshots.

Success.png is using my original chatbot and your Dllama.dll from several days ago.
It took just over 2-1/2 minutes to complete the processing successfully.


Fail.png is using today's chatbot, modified to use your new system.
It ran for over 5 minutes and eventually crashed.

No more time to experiment right now - I've gotta run.

tinyBigGAMES

Are you running with GPU enabled? It's gonna be much, much slower if running in CPU mode. It should use your GPU if you have good Vulkan drivers. I ran the same query in streaming mode so I could watch for issues, and it seemed to work fine for me. I do have GPU enabled, so it's much faster, but it seems to be working.


https://www.youtube.com/watch?v=oqKuay3NfMs
Jarrod Davis
tinyBigGAMES LLC

Projects - Dllama - Local LLM Inference

MrBcx

Quote from: tinyBigGAMES on May 06, 2024, 01:07:31 PM
Are you running with GPU enabled? It's gonna be much, much slower if running in CPU mode. It should use your GPU if you have good Vulkan drivers. I ran the same query in streaming mode so I could watch for issues, and it seemed to work fine for me. I do have GPU enabled, so it's much faster, but it seems to be working.


https://www.youtube.com/watch?v=oqKuay3NfMs

No GPU installed.  My two earlier screenshots used CPU processing only.

One crashed, one did not.



MrBcx

#63
Jarrod,

I downloaded your latest package and recompiled using Pelles C and MSVC.

After setting the GPU flag to zero in config.json and running my test prompt:

List the countries with provinces

your latest Dllama.dll successfully returned a result. 

What's strange is that the result did not have as much detail as the previous version's - specifically, the countries were provided but the names of the various provinces were not. The processing time (GPU = 0) averaged 75 seconds.

The Dllama.dll from a couple of days ago would average 2 minutes but gave a much more in-depth response.

I have attached a screenshot using the latest Dllama.dll.

Success.png, found in my earlier post, shows the more in-depth response from your earlier Dllama.dll.

ref:  https://bcxbasiccoders.com/smf/index.php?topic=1034.msg5388#msg5388


And to be clear, Version 1 and Version 2 of my chatbot are using the same LLM.



tinyBigGAMES

Setting gpu_layers to zero will cause it to use the CPU only; setting it to a value < 0 will cause it to use the maximum number of layers your GPU supports (for example, 33 for my RTX 3060).
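To illustrate, the relevant part of config.json would look something like the sketch below. The exact key names are placeholders for illustration only, so go by the example config that ships with the release rather than this sketch:

    {
      "model": "models/your-model.gguf",
      "gpu_layers": 0
    }

With gpu_layers at 0 everything runs on the CPU; a negative value asks the library to offload as many layers as the GPU can hold.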

The new code base is set up to generate concise and predictable responses, paving the way for future function calling and agent support. For function calling to work, the model needs to be deterministic and follow instructions more reliably. With that in place, agents can be implemented. Function calling + agents are planned for a future release.

It will use the Vulkan backend for acceleration. It will be slower overall than CUDA, but it requires no other DLLs in the distro (CUDA requires almost 800MB across 3 DLLs that will not be on the end user's machine unless they have the CUDA dev kit installed, for example, which means roughly 99% of end users will not have them). Vulkan gives me the widest range of GPU support with no additional dependency other than vulkan-1.dll, which, if your GPU supports Vulkan, will already be on the end user's machine.

If you try to init for the GPU and nothing is found, it should default to the CPU. If you do have a GPU but all its resources are already taken (things running in the background that use the GPU, like modern browsers), that will degrade performance, may interfere with operations, and could possibly cause a crash. The more feedback I get in these areas, the better I can assess how to handle these situations.
Jarrod Davis
tinyBigGAMES LLC

Projects - Dllama - Local LLM Inference

MrBcx

#65
Here's another interesting discovery.  Using the Dllama.dll from several days ago and setting GPU to non-zero results in yet another version of the response to the "list the countries with provinces" prompt.  My Intel 630 integrated graphics is Vulkan driver compatible.  Indeed, my chatbot used my integrated Intel 630 GPU to produce the following in 52 seconds using the outdated Dllama.dll, rarely using more than 5% CPU.

Again, I'm not fixated on the speed, but rather on the varying results when using the CPU vs the GPU.

The following was produced using my GPU...

The countries with provinces are:

1. Canada
2. Argentina
3. Brazil
4. China
5. India
6. Indonesia
7. Italy
8. Japan
9. Pakistan
10. Russia
11. Spain
12. United States

Please note that the number of provinces can vary within each country, and not all countries listed will have provinces.
For example, the United States



The following was produced using my CPU


While most countries are not divided into provinces, there are several nations that have administrative divisions known as provinces. Here is a list
of countries that have provinces:

1. Canada - 10 provinces (Alberta, British Columbia, Manitoba, New Brunswick, Newfoundland and Labrador, Nova Scotia, Ontario, Prince Edward Island,
Quebec, and Saskatchewan)
2. China - 23 provinces (Anhui, Fujian, Gansu, Guangdong, Guizhou, Hainan, Hebei, Heilongjiang, Henan, Hubei, Hunan, Jiangsu, Jiangxi, Jilin,
Liaoning, Shaanxi, Shandong, Shanxi, Sichuan, Yunnan, Zhejiang)
3. India - 28 states and 8 Union territories (not all states are provinces, but they are administrative divisions)
4. Pakistan - 4 provinces (Balochistan, Khyber Pakhtunkhwa, Punjab, and Sindh)
5. Brazil - 26 states (not provinces, but administrative divisions)
6. Argentina - 23 provinces (not provinces, but administrative divisions)
7. United Kingdom - 4 countries (England, Northern Ireland, Scotland, and Wales) - while not provinces, they are administrative divisions
8. Spain - 17 autonomous communities (not provinces, but administrative divisions)
9. Italy - 20 regions (not provinces, but administrative divisions)
10. Indonesia - 34 provinces

Please note that the term "province" can have different meanings in different countries, and some countries may not have provinces but have other administrative divisions. The list above includes countries with provinces or similar administrative divisions.


tinyBigGAMES

You may be interested in my new inference library, LMEngine.  It's a robust and lightweight LLM inference library designed for optimal performance. Utilizing a Vulkan backend for acceleration, it supports a wider range of GPUs, ensuring broad compatibility. The library maintains a minimal overhead of approximately 2MB.

The API is highly flexible, enabling inference execution through the Inference_Run function, which completes the process and returns the results. Users can define callbacks for customization or implement their own inference loop for finer control.

LMEngine provides out-of-the-box bindings for Pascal and C/C++, featuring a straightforward procedural API that facilitates the creation of bindings for additional languages.

https://github.com/tinyBigGAMES/LMEngine
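To give a rough idea of the call pattern from C, here is a minimal sketch. Aside from the Inference_Run name, everything below (prototypes, parameters, the init/callback/shutdown helpers) is a hypothetical placeholder, not the actual C header, so write any wrapper against the real C/C++ bindings in the repo:

    #include <stdio.h>

    /* Placeholder prototypes -- names and signatures are illustrative only,
       not the actual LMEngine C binding. */
    typedef void (*TokenCallback)(const char *token, void *user_data);

    int  Engine_Init(const char *config_path);                       /* hypothetical */
    void Engine_SetTokenCallback(TokenCallback cb, void *user_data); /* hypothetical */
    const char *Inference_Run(const char *prompt);                   /* name from above; signature assumed */
    void Engine_Shutdown(void);                                      /* hypothetical */

    /* Example callback: print tokens as they stream in. */
    static void on_token(const char *token, void *user_data)
    {
        (void)user_data;
        printf("%s", token);
        fflush(stdout);
    }

    int main(void)
    {
        if (!Engine_Init("config.json"))          /* assumed init step */
            return 1;

        Engine_SetTokenCallback(on_token, NULL);  /* optional streaming callback */

        const char *result = Inference_Run("List the countries with provinces");
        if (result)
            printf("\n--- full result ---\n%s\n", result);

        Engine_Shutdown();
        return 0;
    }

Either run to completion like this and take the returned text, or skip the callback and drive your own loop when you need finer control.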
Jarrod Davis
tinyBigGAMES LLC

Projects - Dllama - Local LLM Inference


tinyBigGAMES

Hi, I'm trying to make a BCX wrapper for LMEngine and wondering if anyone can vet LMEngine.bcx to see if it's OK and what changes are needed.


Thanks.


Jarrod

Jarrod Davis
tinyBigGAMES LLC

Projects - Dllama - Local LLM Inference