Elasticsearch: RBAC and RAG - Best Friends (Part 2)

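In the previous article, “Elasticsearch: RBAC and RAG - Best Friends (Part 1)”, we described in detail how RBAC can be used to control access to data in RAG. In this article, we use a Jupyter notebook to show how to implement it.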

Installation

If you have not yet installed Elasticsearch and Kibana, follow the installation guides first.

For this installation we choose Elastic Stack 8.x. Note in particular that ES|QL is only available in Elastic Stack 8.11 and later, so you need to download and install version 8.11 or later.

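When Elasticsearch starts for the first time, it prints its security auto-configuration output, including the password generated for the elastic superuser.

Write down this password for the elastic superuser; we will need it later.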

We can also find the Elasticsearch certificates under config/certs in the Elasticsearch installation directory:

    $ pwd
    /Users/liuxg/elastic/elasticsearch-8.13.2/config/certs
    $ ls
    http.p12 http_ca.crt transport.p12

Here, http_ca.crt is the CA certificate we need in order to connect to Elasticsearch over HTTPS.
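On first startup, Elasticsearch also prints the SHA-256 fingerprint of this HTTP CA certificate. As a sanity check, the sketch below recomputes that fingerprint from http_ca.crt using only the Python standard library, so you can compare it with the startup output. The helper name ca_fingerprint is our own (hypothetical), and it assumes the file holds a single PEM certificate. In elasticsearch-py 8.x, such a fingerprint can be passed as ssl_assert_fingerprint instead of using ca_certs.

```python
import hashlib
import ssl


def ca_fingerprint(pem_text: str) -> str:
    """Return the SHA-256 fingerprint of a single PEM certificate as lowercase hex."""
    # PEM_cert_to_DER_cert checks the BEGIN/END CERTIFICATE markers
    # (raising ValueError if they are missing) and base64-decodes the body.
    der_bytes = ssl.PEM_cert_to_DER_cert(pem_text)
    # The fingerprint is the hash of the DER bytes, not of the PEM text.
    return hashlib.sha256(der_bytes).hexdigest()
```

For example, `ca_fingerprint(open("http_ca.crt").read())` returns a 64-character hex string for the copied certificate. The openssl tool prints the same digest in uppercase, with colons between byte pairs.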

First, clone the prepared code:

git clone https://github.com/liu-xiao-guo/elasticsearch-labs

Then change into the directory for this article:

    $ pwd
    /Users/liuxg/python/elasticsearch-labs/supporting-blog-content/rbac-and-rag-best-friends
    $ cp ~/elastic/elasticsearch-8.13.2/config/certs/http_ca.crt .
    $ ls
    http_ca.crt rbac-and-rag-best-friends.ipynb

Here we copy the Elasticsearch certificate into the current directory. rbac-and-rag-best-friends.ipynb is the notebook we walk through below.

Walkthrough

Before starting Jupyter Notebook, set the following environment variables in the terminal:

    export ES_USER="elastic"
    export ES_PASSWORD="VDMlz5QnM_0g-349fFq7"
    export ES_ENDPOINT="localhost"

Change these values to match your own setup. Then run the following command in the same terminal:

jupyter notebook
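The notebook reads these variables with os.getenv(), which returns None for anything unset. A typo in a variable name therefore only shows up later, as a connection error against a URL like https://None:None@None:9200. Below is a minimal sketch of a fail-fast check that could run right after load_dotenv(). require_env is a hypothetical helper, not part of the original notebook.

```python
import os


def require_env(names):
    """Return {name: value} for the given variables, raising if any is unset or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        # Fail here instead of letting os.getenv() hand back None further down.
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}
```

Usage: `config = require_env(["ES_USER", "ES_PASSWORD", "ES_ENDPOINT"])`. The notebook only inherits the exported variables if Jupyter is started from the same terminal session.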

Install and import the required Python libraries

    !pip install elasticsearch python-dotenv

    from elasticsearch import Elasticsearch
    from IPython.display import HTML, display
    from pprint import pprint
    from dotenv import load_dotenv
    import os
    import json

After running the cell above, we can check the version of the installed elasticsearch package:

    $ pip list | grep elasticsearch
    elasticsearch 8.13.0
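The Elasticsearch language clients are documented as forward compatible: a client works with servers of the same major version and an equal or newer minor version. The sketch below encodes that rule, so the installed client (8.13.0 here) can be checked against the version reported by the server. The function names are hypothetical, and the version parsing is deliberately simple.

```python
def parse_version(version: str) -> tuple:
    """Turn '8.13.2' into (8, 13, 2); suffixes such as '-SNAPSHOT' are dropped."""
    return tuple(int(part.split("-")[0]) for part in version.split(".")[:3])


def client_is_compatible(client_version: str, server_version: str) -> bool:
    """Same major version, and the client's minor version is not newer than the server's."""
    client_major, client_minor = parse_version(client_version)[:2]
    server_major, server_minor = parse_version(server_version)[:2]
    return client_major == server_major and client_minor <= server_minor
```

In the notebook, the two inputs could come from `importlib.metadata.version("elasticsearch")` and `es.info()["version"]["number"]`.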

Connect the client to Elasticsearch

Create the Elasticsearch connection

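    # Read the connection settings; load_dotenv() also loads a .env file if one exists
    load_dotenv()
    ES_USER = os.getenv("ES_USER")
    ES_PASSWORD = os.getenv("ES_PASSWORD")
    ES_ENDPOINT = os.getenv("ES_ENDPOINT")

    # Credentials are embedded in the URL; the copied CA certificate is used to verify TLS
    url = f"https://{ES_USER}:{ES_PASSWORD}@{ES_ENDPOINT}:9200"
    print(url)  # this also prints the password, which is acceptable only for a local demo
    es = Elasticsearch(url, ca_certs="./http_ca.crt", verify_certs=True)
    print(es.info())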

For more on connecting to Elasticsearch from Python, see the article “Elasticsearch: Everything you need to know about using Elasticsearch in Python - 8.x”.
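Embedding the credentials in the URL, as the cell above does, only works while the password contains no reserved URL characters such as @, : or /. Below is a minimal sketch of a hypothetical build_es_url helper that percent-encodes the user name and password first. We assume here that the client decodes percent-encoded credentials from the URL. Passing basic_auth=(ES_USER, ES_PASSWORD) to Elasticsearch() sidesteps the question entirely.

```python
from urllib.parse import quote


def build_es_url(user: str, password: str, host: str, port: int = 9200) -> str:
    """Build an https URL with embedded credentials, percent-encoding reserved characters."""
    # safe="" makes quote() encode "/" as well, so "@", ":" and "/" in a
    # password cannot be mistaken for URL delimiters.
    return f"https://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"
```

With the sample password above, nothing changes, because "_" and "-" are never encoded. A password such as p@ss:w/rd becomes p%40ss%3Aw%2Frd.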

Delete the demo indices (if they already exist)

    # Delete indices so the notebook can be re-run from a clean state
    def delete_indices():
        for index in ["rbac_rag_demo-data_public", "rbac_rag_demo-data_sensitive"]:
            try:
                es.indices.delete(index=index)
                print(f"Deleted index: {index}")
            except Exception as e:
                # e.g. a not-found error on the first run, when the index does not exist yet
                print(f"Error deleting index {index}: {str(e)}")


    delete_indices()

Create the indices and load data into them

    # Both indices store the same two fields used by the sample documents below
    MAPPINGS = {
        "properties": {
            "title": {"type": "text"},
            "confidentiality_level": {"type": "keyword"},
        }
    }


    # Create indices
    def create_indices():
        for index in ["rbac_rag_demo-data_public", "rbac_rag_demo-data_sensitive"]:
            # ignore_status=400 tolerates the error raised when the index already exists
            es.options(ignore_status=400).indices.create(
                index=index,
                settings={"number_of_shards": 1},
                mappings=MAPPINGS,
            )


    # Populate sample data
    def populate_data():
        # Public HR information
        public_docs = [
            {"title": "Annual leave policies updated.", "confidentiality_level": "low"},
            {"title": "Remote work guidelines available.", "confidentiality_level": "low"},
            {
                "title": "Health benefits registration period starts next month.",
                "confidentiality_level": "low",
            },
        ]
        for doc in public_docs:
            es.index(index="rbac_rag_demo-data_public", document=doc)

        # Sensitive HR information
        sensitive_docs = [
            {
                "title": "Executive compensation details Q2 2024.",
                "confidentiality_level": "high",
            },
            {
                "title": "Bonus payout structure for all levels.",
                "confidentiality_level": "high",
            },
            {
                "title": "Employee stock options plan details.",
                "confidentiality_level": "high",
            },
        ]
        for doc in sensitive_docs:
            es.index(index="rbac_rag_demo-data_sensitive", document=doc)

        # Make the new documents searchable right away
        es.indices.refresh(index="rbac_rag_demo-data_*")


    create_indices()
    populate_data()

We can then inspect the new indices in Kibana.
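populate_data() sends one index request per document, which is fine for six documents. For larger datasets the usual tool is the Bulk API (POST _bulk). Its body is newline-delimited JSON: an action line followed by the document source, with a trailing newline at the end. The sketch below builds such a body; bulk_ndjson is a hypothetical helper. Its output can be pasted after POST _bulk in Kibana Dev Tools. From Python, the client's elasticsearch.helpers.bulk() does the same job.

```python
import json


def bulk_ndjson(index: str, docs) -> str:
    """Build a _bulk request body: one action line plus one source line per document."""
    lines = []
    for doc in docs:
        # Action line: index this document into the given index
        lines.append(json.dumps({"index": {"_index": index}}))
        # Source line: the document itself, on a single line
        lines.append(json.dumps(doc))
    # The Bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"
```

For example, `print(bulk_ndjson("rbac_rag_demo-data_public", public_docs))` would produce six lines for the three public documents.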

Create two users with different access levels

    # Create roles
    def create_roles():
        # Role for the engineer: read access to the public index only
        es.security.put_role(
            name="engineer_role",
            indices=[{"names": ["rbac_rag_demo-data_public"], "privileges": ["read"]}],
        )
        # Role for the manager: read access to both indices
        es.security.put_role(
            name="manager_role",
            indices=[
                {
                    "names": [
                        "rbac_rag_demo-data_public",
                        "rbac_rag_demo-data_sensitive",
                    ],
                    "privileges": ["read"],
                }
            ],
        )


    # Create users with respective roles
    def create_users():
        # User 'engineer'
        es.security.put_user(
            username="engineer",
            password="password123",
            roles=["engineer_role"],
            full_name="Engineer User",
        )
        # User 'manager'
        es.security.put_user(
            username="manager",
            password="password123",
            roles=["manager_role"],
            full_name="Manager User",
        )


    create_roles()
    create_users()

After running the code above, we can check the new roles and users in Kibana.

We could also create these users and roles through the Kibana UI. For details, see the article “Elasticsearch: User security settings”.
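Whether roles and users are created from Python or in the Kibana UI, the underlying REST endpoints are PUT _security/role/<name> and PUT _security/user/<username>. As a small illustration, the sketch below prints the Dev Tools requests equivalent to create_roles() and create_users(). dev_tools_requests is a hypothetical helper, and the definitions, including the demo passwords, are copied from the cells above.

```python
import json

# Role and user definitions mirroring create_roles() and create_users() above
ROLES = {
    "engineer_role": {
        "indices": [{"names": ["rbac_rag_demo-data_public"], "privileges": ["read"]}]
    },
    "manager_role": {
        "indices": [
            {
                "names": ["rbac_rag_demo-data_public", "rbac_rag_demo-data_sensitive"],
                "privileges": ["read"],
            }
        ]
    },
}
USERS = {
    "engineer": {"password": "password123", "roles": ["engineer_role"], "full_name": "Engineer User"},
    "manager": {"password": "password123", "roles": ["manager_role"], "full_name": "Manager User"},
}


def dev_tools_requests(roles, users) -> str:
    """Render one Dev Tools request per role and per user, separated by blank lines."""
    blocks = []
    for name, body in roles.items():
        blocks.append(f"PUT _security/role/{name}\n{json.dumps(body, indent=2)}")
    for username, body in users.items():
        blocks.append(f"PUT _security/user/{username}\n{json.dumps(body, indent=2)}")
    return "\n\n".join(blocks)


print(dev_tools_requests(ROLES, USERS))
```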

Test how security roles affect the ability to query data

Create helper functions

Helper functions for querying as each user, plus some output formatting

    # For an Elastic Cloud deployment, a cloud_id could be used instead:
    # def get_es_connection(cid, username, password):
    #     return Elasticsearch(cloud_id=cid, basic_auth=(username, password))


    def get_es_connection(username, password):
        # Connect to the local cluster as the given user
        url = f"https://{username}:{password}@{ES_ENDPOINT}:9200"
        print(url)  # prints the password as well; demo only
        return Elasticsearch(url, ca_certs="./http_ca.crt", verify_certs=True)


    def query_index(es, index_name, username):
        try:
            # match_all returns every document this user is allowed to read
            response = es.search(index=index_name, query={"match_all": {}})
            # Prepare the message
            results_message = f'Results from querying as <span style="color: orange;">{username}:</span><br>'
            for hit in response["hits"]["hits"]:
                confidentiality_level = hit["_source"].get("confidentiality_level", "N/A")
                # Use a separate name so index_name still holds the queried pattern
                hit_index = hit.get("_index", "N/A")
                title = hit["_source"].get("title", "No title")
                # Set color based on confidentiality level
                if confidentiality_level == "low":
                    conf_color = "lightgreen"
                elif confidentiality_level == "high":
                    conf_color = "red"
                else:
                    conf_color = "black"
                # Set color based on index name
                if hit_index == "rbac_rag_demo-data_public":
                    index_color = "lightgreen"
                elif hit_index == "rbac_rag_demo-data_sensitive":
                    index_color = "red"
                else:
                    index_color = "black"  # Default color
                results_message += (
                    f'Index: <span style="color: {index_color};">{hit_index}</span>\t '
                    f'confidentiality level: <span style="color: {conf_color};">{confidentiality_level}</span> '
                    f'title: <span style="color: lightblue;">{title}</span><br>'
                )
            display(HTML(results_message))
        except Exception as e:
            print(f"Error accessing {index_name}: {str(e)}")
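The two if/elif chains in query_index() are really lookups. Below is a sketch of a hypothetical format_hit() helper that uses dictionaries instead. Unlike the original, it passes document values through html.escape() before they reach display(HTML(...)), so a title containing < or & cannot break the rendered output.

```python
import html

# Colors taken from query_index(); anything unknown falls back to black
CONFIDENTIALITY_COLORS = {"low": "lightgreen", "high": "red"}
INDEX_COLORS = {
    "rbac_rag_demo-data_public": "lightgreen",
    "rbac_rag_demo-data_sensitive": "red",
}


def format_hit(hit: dict) -> str:
    """Render one search hit as an HTML line, escaping values taken from the document."""
    source = hit.get("_source", {})
    level = source.get("confidentiality_level", "N/A")
    index_name = hit.get("_index", "N/A")
    title = source.get("title", "No title")
    conf_color = CONFIDENTIALITY_COLORS.get(level, "black")
    index_color = INDEX_COLORS.get(index_name, "black")
    return (
        f'Index: <span style="color: {index_color};">{html.escape(index_name)}</span>\t '
        f'confidentiality level: <span style="color: {conf_color};">{html.escape(level)}</span> '
        f'title: <span style="color: lightblue;">{html.escape(title)}</span><br>'
    )
```

Inside the loop of query_index(), `results_message += format_hit(hit)` would then replace the color logic and the f-string.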

Simulate queries as the “engineer” and the “manager”

    index_pattern = "rbac_rag_demo-data*"
    print(
        f"Each user will log in with their credentials and query the same index pattern: {index_pattern}\n\n"
    )
    for user in ["engineer", "manager"]:
        print(f"Logged in as {user}:")
        # Connect as this user; the user's roles decide what the pattern resolves to
        es_conn = get_es_connection(user, "password123")
        query_index(es_conn, index_pattern, user)
        print("\n\n")

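The output shows that the manager can read documents from both indices. The engineer only gets results from the public index, rbac_rag_demo-data_public, even though both users queried the same index pattern.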
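This happens because, with security enabled, Elasticsearch expands a wildcard expression such as rbac_rag_demo-data* only to the indices the user is authorized to read, rather than rejecting the search. Naming an unauthorized index explicitly fails with a security exception instead. The following is a simplified plain-Python model of that behavior, not Elasticsearch's actual resolution logic. All names in it are hypothetical, and role index names are treated as literal names rather than patterns.

```python
from fnmatch import fnmatchcase

# Index privileges from create_roles(): role -> indices it may read
ROLE_INDICES = {
    "engineer_role": ["rbac_rag_demo-data_public"],
    "manager_role": ["rbac_rag_demo-data_public", "rbac_rag_demo-data_sensitive"],
}
USER_ROLES = {"engineer": ["engineer_role"], "manager": ["manager_role"]}
EXISTING_INDICES = ["rbac_rag_demo-data_public", "rbac_rag_demo-data_sensitive"]


def resolve_for_user(user: str, expression: str) -> list:
    """Model how an index expression resolves for a user (simplified)."""
    allowed = {name for role in USER_ROLES[user] for name in ROLE_INDICES[role]}
    if "*" not in expression:
        # A concrete name is not filtered: the request is rejected if it is not allowed
        if expression not in allowed:
            raise PermissionError(f"{user} is not authorized to read {expression}")
        return [expression]
    # A wildcard silently expands to the matching indices the user may read
    return sorted(i for i in EXISTING_INDICES if fnmatchcase(i, expression) and i in allowed)


for user in USER_ROLES:
    print(user, resolve_for_user(user, "rbac_rag_demo-data*"))
```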

The final source code is available at elasticsearch-labs/supporting-blog-content/rbac-and-rag-best-friends/rbac-and-rag-best-friends.ipynb at main · liu-xiao-guo/elasticsearch-labs · GitHub

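Disclaimer: This article was contributed by a user and does not represent the position of the [wpsshop blog]. Copyright belongs to the original author, and this site assumes no legal responsibility. If you find infringing content, please contact us. When reposting, please cite the source: https://www.wpsshop.cn/w/花生_TL007/article/detail/613524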