[B! LLM] arrowKato's Bookmarks

arrowKato id:arrowKato

arrowKato's bookmarks about LLM (96)

GPT-4o vs. GPT-4 vs. Gemini 1.5 ⭐ — Performance Analysis
arrowKato 2024/06/04
A comparison of benchmarks.

LLM

GPT-4o

Gemini1.5Pro

Claude3 Opus
Link
Applied LLMs - What We’ve Learned From A Year of Building with LLMs
A practical guide to building successful LLM products, covering the tactical, operational, and strategic. Also published on O’Reilly Media in three parts: Tactical, Operational, Strategic. Also see podcast. It’s an exciting time to build with large language models (LLMs). Over the past year, LLMs have become “good enough” for real-world applications. And they’re getting better and cheaper every ye
arrowKato 2024/06/03
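A comprehensive, high-quality article. For example, even with a 10-million-token context window, you still need to select what information to feed the model. There is not yet convincing data that models can reason effectively over very large contexts.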

LLM

RAG
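Link
Development of Large Language Models
Tutorial Lecture 1, 38th Annual Conference of the Japanese Society for Artificial Intelligence (2024). This lecture surveys the fundamentals and latest trends needed to develop large language models. Then, drawing on the experience of developing the large language model Swallow, built by the Okazaki and Yokota laboratories at the School of Computing, Tokyo Institute of Technology, together with a research team at the National Institute of Advanced Industrial Science and Technology (AIST), it explains training-data construction, model training, and evaluation, and discusses the current state and open challenges of large language models that are strong in Japanese.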
arrowKato 2024/06/03
Material that gives a bird's-eye view of LLM development. It even covers DPO and the Nejumi leaderboard.

LLM
Link
rinna/llama-3-youko-8b · Hugging Face
","eos_token":"<|end_of_text|>"}},"createdAt":"2024-05-01T07:53:45.000Z","discussionsDisabled":false,"downloads":1359,"downloadsAllTime":1359,"id":"rinna/llama-3-youko-8b","isLikedByUser":false,"isWatchedByUser":false,"inference":"ExplicitOptOut","lastModified":"2024-05-07T01:59:47.000Z","likes":28,"pipeline_tag":"text-generation","library_name":"transf ormers","librariesOther":[],"model-index":nul
arrowKato 2024/05/31
Llama 3 with continued pre-training on Japanese.

LLM
Link
Foundation Model Transparency Index
arrowKato 2024/05/27
Interesting, but isn't it only natural that a gap appears between companies that also release closed models and companies that release only open models?

LLM
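Link
Testing the Code Interpreter of the OpenAI Assistants API with GPT-4o and Streamlit: Current State and Issues
Introduction: Using OpenAI's Assistants API as-is saves you the effort of implementing equivalent processing yourself with LangChain agents and the like, which is very convenient. However, as of now (2024/05/18) it is still in beta, and the API interface changes frequently. Implementation examples of a Streamlit UI for the code interpreter using the Assistants API have been published, but they sometimes do not work as-is, so this article presents a Streamlit implementation example that also serves as a check against the latest version. The implementation in this article supports streaming, which allows more real-time interaction. The model used is GPT-4o, announced on 2024/05/14. Table of contents: Introduction, Implementation example, app.py, openai_handler.py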
arrowKato 2024/05/20
A decent implementation example that uses the Code Interpreter of the Assistants API.

CodeInterpreter

LLM

advenced data analysis
Link
mlabonne/Meta-Llama-3-120B-Instruct · Hugging Face
","chat_template":"{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }
arrowKato 2024/05/14
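They enlarged the model by duplicating layers roughly in place (shallow layers copied among the shallow layers, middle layers among the middle layers, and deep layers likewise), and accuracy ended up going up.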

LLM
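Link
How to Gather Information on X in the LLM Era | べいえりあ
AI for Everyone also has a Japanese version, and both courses can be watched with Japanese subtitles (probably machine-translated from the English subtitles, but the translation quality is not bad), so I think you can manage to some extent even if you don't understand English. Also, for those with energy to spare, or who want to understand the latest NLP research, I recommend reading this book. You don't necessarily need to understand the details of the algorithms, but it is worth understanding what kinds of tasks exist. As for how NLP knowledge actually helps when applying LLMs: for example, in NLP terms, dialogue is divided into "task-oriented dialogue" and "chit-chat dialogue". The two differ completely, from what is valued in the dialogue to how it is evaluated, but when I talk with people who work on LLMs without having done NLP, these two get jumbled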
arrowKato 2024/05/10
A roundup of information sources on X.

LLM

NLP
Link
DeepSeek
arrowKato 2024/05/08
Impressive if true. Let's consider using it once it ranks on Chatbot Arena and the Nejumi leaderboard.

LLM

DeepSeek
Link
Retrieval-Augmented Generation for Large Language Models: A Survey
arrowKato 2024/05/01
A survey paper on RAG.

RAG

LLM
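Link
Trying Weave, a Platform for Logging, Experimenting with, and Evaluating LLM Applications | npaka
"Weave", a platform for logging, experimenting with, and evaluating LLM applications, has been released, so I tried it out. This introductory article is provided with the support of "Weights & Biases". The Weights & Biases Japan note page carries many other useful articles, so please take a look. 1. Weave: "Weave" is a tool for logging, experimenting with, and evaluating LLM applications. It is one of the features offered by "Weights & Biases". Its main features are as follows. Logging: record every interaction with the LLM. Experimentation: try various parameters and check the results. Evaluation: run evaluations to measure whether the model has improved. 2. Preparing Weave: This time, using "Weave" on "Google Colab", the logging, experimentation, and evaluation of "OpenAI" models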
arrowKato 2024/04/30
Weave: an evaluation platform from Weights & Biases.

LLM

evaluation
Link
OWASP Top 10 for LLM Applications
arrowKato 2024/04/30
The top 10 vulnerabilities in applications that use LLMs.

LLM

security
Link
AgentBench: Evaluating LLMs as Agents
Large Language Models (LLMs) are becoming increasingly smart and autonomous, targeting real-world pragmatic missions beyond traditional NLP tasks. As a result, there has been an urgent need to evaluate LLMs as agents on challenging tasks in interactive environments. We present AgentBench, a multi-dimensional evolving benchmark that currently consists of 8 distinct environments to assess LLM-as-Age
arrowKato 2024/04/26
A benchmark paper.

LLM

Agent
Link
New additions to Amazon Bedrock make it easier and faster than ever to build generative AI applications securely
arrowKato 2024/04/26
Something new came out.

AWS

llm
Link
Vals.ai: LegalBench
arrowKato 2024/04/25
A benchmark that ranks large language models by their performance on tasks related to income tax, corporate finance, and contract law.

LLM

evaluation
Link
GitHub - meta-llama/llama3: The official Meta Llama 3 GitHub site
arrowKato 2024/04/19
Llama 3: as expected, 700 million MAU appears to be the threshold.

LLM
Link
Meta Llama 3
Build the future of AI with Meta Llama 3. Now available with both 8B and 70B pretrained and instruction-tuned versions to support a wide range of applications.
arrowKato 2024/04/19
Llama3

LLM
Link
GitHub - GoogleCloudPlatform/genai-databases-retrieval-app
arrowKato 2024/04/18
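It goes as far as separating responsibilities between the app and the API server, so this architecture may be a useful reference if you are building a mid-sized or larger RAG app.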

LLM

RAG

Agent

ReAct
Link
Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
arrowKato 2024/04/18
A paper showing that accuracy drops as the context gets longer. GPT-4 and GPT-3.5 degrade relatively little.

LLM
Link
Jan | Rethink the Computer
Built with love. Jan is entirely open-source. We build it transparently, guided by the belief that AI's future should be open and shared with everyone.
arrowKato 2024/04/15
OSS models that run locally.

LLM
Link