    Introducing mall for R…and Python

    By big tee tech hub · March 18, 2025 · 6 min read


    The beginning

    A few months ago, while working on the Databricks with R workshop, I came
    across some of their custom SQL functions. These particular functions are
    prefixed with “ai_”, and they run NLP with a simple SQL call:

    > SELECT ai_analyze_sentiment('I am happy');
      positive
    
    > SELECT ai_analyze_sentiment('I am sad');
      negative

    This was a revelation to me. It showcased a new way to use
    LLMs in our daily work as analysts. To date, I had primarily employed LLMs
    for code completion and development tasks. However, this new approach
    focuses on using LLMs directly against our data instead.

    My first reaction was to try and access the custom functions via R. With
    dbplyr we can access SQL functions
    in R, and it was great to see them work:

    orders |>
      mutate(
        sentiment = ai_analyze_sentiment(o_comment)
      )
    #> # Source:   SQL [6 x 2]
    #>   o_comment                   sentiment
    #>   <chr>                       <chr>
    #> 1 ", pending theodolites …    neutral  
    #> 2 "uriously special foxes …   neutral  
    #> 3 "sleep. courts after the …  neutral  
    #> 4 "ess foxes may sleep …      neutral  
    #> 5 "ts wake blithely unusual … mixed    
    #> 6 "hins sleep. fluffily …     neutral

    One downside of this integration is that, even though the functions are
    accessible through R, we require a live connection to Databricks to use
    an LLM in this manner, which limits the number of people who can benefit
    from it.

    According to their documentation, Databricks is leveraging the Llama 3.1 70B
    model. While this is a highly effective Large Language Model, its enormous size
    poses a significant challenge for most users’ machines, making it impractical
    to run on standard hardware.

    Reaching viability

    LLM development has been accelerating at a rapid pace. Initially, only
    online LLMs were viable for daily use, which sparked concerns among
    companies hesitant to share their data externally. Moreover, the cost of
    using online LLMs can be substantial; per-token charges add up quickly.

    The ideal solution would be to integrate an LLM into our own systems, requiring
    three essential components:

    1. A model that can fit comfortably in memory
    2. A model that achieves sufficient accuracy for NLP tasks
    3. An intuitive interface between the model and the user’s laptop

    As recently as a year ago, having all three of these elements was nearly
    impossible: models small enough to fit in memory were either inaccurate
    or excessively slow.
    However, recent advancements, such as
    Llama from Meta
    and cross-platform interaction engines like Ollama, have
    made it feasible to deploy these models, offering a promising solution for
    companies looking to integrate LLMs into their workflows.
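
    To make the local-model idea concrete, here is a minimal sketch of
    calling a locally running Ollama server directly from R with httr2. It
    assumes Ollama is installed and serving on its default port, and that
    the model has already been downloaded (for example with
    ollama pull llama3.2):

    library(httr2)

    # Ollama exposes a local REST API; /api/generate runs a single prompt
    resp <- request("http://localhost:11434/api/generate") |>
      req_body_json(list(
        model  = "llama3.2",
        prompt = "Classify the sentiment of this text: I am happy",
        stream = FALSE
      )) |>
      req_perform() |>
      resp_body_json()

    # the model's text reply is returned in the `response` field
    resp$response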

    The project

    This project started as an exploration, driven by my interest in leveraging a
    “general-purpose” LLM to produce results comparable to those from Databricks AI
    functions. The primary challenge was determining how much setup and preparation
    would be required for such a model to deliver reliable and consistent results.

    Without access to a design document or open-source code, I relied solely on the
    LLM’s output as a testing ground. This presented several obstacles, including
    the numerous options available for fine-tuning the model. Even within prompt
    engineering, the possibilities are vast. To ensure the model was not too
    specialized or focused on a specific subject or outcome, I needed to strike a
    delicate balance between accuracy and generality.

    Fortunately, after conducting extensive testing, I discovered that a simple
    “one-shot” prompt yielded the best results. By “best,” I mean that the answers
    were both accurate for a given row and consistent across multiple rows.
    Consistency was crucial, as it meant providing answers that were one of the
    specified options (positive, negative, or neutral), without any additional
    explanations.

    The following is an example of a prompt that worked reliably against
    Llama 3.2:

    >>> You are a helpful sentiment engine. Return only one of the 
    ... following answers: positive, negative, neutral. No capitalization. 
    ... No explanations. The answer is based on the following text: 
    ... I am happy
    positive
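
    mall builds this prompt for you, but as an illustration, a hypothetical
    helper that assembles the same one-shot prompt for a single row could
    look like this:

    # hypothetical helper: prepend the fixed instructions to one row's text
    sentiment_prompt <- function(text) {
      paste0(
        "You are a helpful sentiment engine. Return only one of the ",
        "following answers: positive, negative, neutral. No capitalization. ",
        "No explanations. The answer is based on the following text: ",
        text
      )
    }

    sentiment_prompt("I am happy")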

    As a side note, my attempts to submit multiple rows at once proved
    unsuccessful. I spent a significant amount of time exploring different
    approaches, such as submitting 2 or 10 rows at a time, formatted as JSON
    or CSV. The results were often inconsistent, and batching did not
    accelerate the process enough to be worth the effort.

    Once I became comfortable with the approach, the next step was wrapping the
    functionality within an R package.

    The approach

    One of my goals was to make the mall package as “ergonomic” as possible. In
    other words, I wanted to ensure that using the package in R and Python
    integrates seamlessly with how data analysts use their preferred language on a
    daily basis.

    For R, this was relatively straightforward. I simply needed to verify that the
    functions worked well with pipes (%>% and |>) and could be easily
    incorporated into packages like those in the tidyverse:

    reviews |> 
      llm_sentiment(review) |> 
      filter(.sentiment == "positive") |> 
      select(review) 
    #>                                                               review
    #> 1 This has been the best TV I've ever used. Great screen, and sound.

    Python, however, being a non-native language for me, meant that I had to
    adapt my thinking about data manipulation. Specifically, I learned that
    in Python, objects (like pandas DataFrames) “contain” transformation
    functions by design.

    This insight led me to investigate whether the pandas API allows for
    extensions, and fortunately, it did! After exploring the possibilities,
    I decided to start with Polars, which allowed me to extend its API by
    creating a new namespace. This simple addition enabled users to easily
    access the necessary functions:

    >>> import polars as pl
    >>> import mall
    >>> df = pl.DataFrame(dict(x = ["I am happy", "I am sad"]))
    >>> df.llm.sentiment("x")
    shape: (2, 2)
    ┌────────────┬───────────┐
    │ x          ┆ sentiment │
    │ ---        ┆ ---       │
    │ str        ┆ str       │
    ╞════════════╪═══════════╡
    │ I am happy ┆ positive  │
    │ I am sad   ┆ negative  │
    └────────────┴───────────┘
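
    For readers curious about the mechanism behind df.llm, Polars exposes
    this extension point through a registration decorator. The following is
    a simplified, hypothetical sketch of the pattern; the real mall code
    does more, including actually calling the LLM:

    import polars as pl

    # registering a namespace attaches `df.llm` to every DataFrame
    @pl.api.register_dataframe_namespace("llm")
    class LLMNamespace:
        def __init__(self, df: pl.DataFrame) -> None:
            self._df = df

        def sentiment(self, col: str) -> pl.DataFrame:
            # a real implementation would send each value to the model;
            # a canned answer keeps this sketch self-contained
            def fake_model(text: str) -> str:
                return "positive" if "happy" in text else "negative"

            return self._df.with_columns(
                pl.col(col)
                .map_elements(fake_model, return_dtype=pl.String)
                .alias("sentiment")
            )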

    Keeping all of the new functions within the llm namespace makes it very
    easy for users to find and utilize the ones they need:

    [Figure: the functions available under the llm namespace]

    What’s next

    I think it will be easier to know what is to come for mall once the
    community uses it and provides feedback. I anticipate that adding more
    LLM back ends will be the main request. Another likely enhancement will
    come as new, updated models become available: the prompts may need to be
    adjusted for a given model. I experienced this going from Llama 3.1 to
    Llama 3.2, which required tweaking one of the prompts. The package is
    structured in such a way that future tweaks like that will be additions
    to the package, not replacements of the existing prompts, so as to
    retain backwards compatibility.

    This is the first time I have written an article about the history and
    structure of a project. This particular effort was so unique, because of
    its combination of R, Python, and LLMs, that I figured it was worth
    sharing.

    If you wish to learn more about mall, feel free to visit its official
    site.


    Posts also available at r-bloggers


