Working with Elasticsearch from Python

Python: 2.7

ES client package: pyelasticsearch

Elasticsearch: 5.5.1 / 6.0.1

Operating system: Windows 10 / CentOS 7

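This article summarizes the basic CRUD operations against ES. The ES site lists many Python client packages, e.g. pyelasticsearch, esclient, elasticutils, pyes, rawes and Surfiki Refine. I have only used pyelasticsearch at work, so this article covers that package only; for the others, see the official site.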

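Install command for the client package: pip install elasticsearch

Note that this command installs the official elasticsearch-py client, and the examples in this article call its API; pyelasticsearch is a separate package with a different interface.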

The client does not expose many interfaces. They are discussed below in two groups: single-document operations and bulk operations.

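Insert

create: the index, type, id and body of the document to insert must all be specified; omitting any of them raises an error.

index: compared with create, index is much more flexible. The id is optional: if it is given, it becomes the document's id; if it is omitted, the system generates a globally unique id for the document.

e.g.: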

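from elasticsearch import Elasticsearch

body = {'name': 'jack', 'age': 12}  # illustrative document; the original literal was lost
es = Elasticsearch(['localhost:9200'])
es.index(index='indexname', doc_type='typename', body=body, id=None)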
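The difference between create and index can be illustrated without a running cluster. The sketch below is a toy in-memory model, not the client's implementation: `FakeIndex`, its methods and the use of `uuid4` for generated ids are hypothetical stand-ins (Elasticsearch generates its ids in its own way). It only mirrors the behaviour described above: create needs an explicit id and refuses to overwrite an existing document, while index accepts an optional id and replaces whatever is stored under it.

```python
import uuid


class FakeIndex(object):
    """Toy stand-in for one index/type pair; NOT the real client."""

    def __init__(self):
        self.docs = {}

    def create(self, id, body):
        # create: the id is mandatory and must not exist yet
        if id is None:
            raise ValueError("create requires an explicit id")
        if id in self.docs:
            raise KeyError("document %r already exists" % (id,))
        self.docs[id] = body
        return id

    def index(self, body, id=None):
        # index: the id is optional; an existing document is replaced
        if id is None:
            id = uuid.uuid4().hex  # hypothetical id scheme for this sketch
        self.docs[id] = body
        return id


store = FakeIndex()
store.create("1", {"name": "jack"})
auto_id = store.index({"name": "rose"})            # id generated for us
store.index({"name": "jack", "age": 12}, id="1")   # replaces document "1"
```

Calling `store.create("1", ...)` a second time raises, which is the toy equivalent of the conflict the real create call reports.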

Delete

delete: deletes the document with the given index, type and id.

es.delete(index='indexname', doc_type='typename', id='idvalue')

Get

get: fetches the document with the given index, type and id.

es.get(index='indexname', doc_type='typename', id='idvalue')

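Update

update: updates the document with the given index, type and id.

update_body = {'doc': {'age': 13}}  # illustrative partial document; the original literal was lost
es.update(index='indexname', doc_type='typename', id='idvalue', body=update_body)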
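When the update body carries a partial document under the `doc` key, Elasticsearch merges it into the stored document instead of replacing the whole thing. The function below is a hypothetical pure-Python illustration of that merge (nested objects are merged recursively, all other values are overwritten); `merge_partial` is a name made up for this sketch and is not part of any client library.

```python
import copy


def merge_partial(source, partial):
    """Return a copy of source with partial merged in (illustrative only)."""
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # nested object on both sides: merge field by field
            merged[key] = merge_partial(merged[key], value)
        else:
            # scalars, lists and new keys simply overwrite
            merged[key] = copy.deepcopy(value)
    return merged


stored = {"name": "jack", "age": 12,
          "address": {"city": "Beijing", "zip": "100000"}}
update_body = {"doc": {"age": 13, "address": {"city": "Shanghai"}}}
result = merge_partial(stored, update_body["doc"])
```

Fields that the partial document does not mention (here `name` and `address.zip`) survive the update, which is why only the changed fields need to be sent.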

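Search

search: returns all documents that match the conditions. It takes no id parameter, and index, doc_type and body may all be None.

The body must follow the query DSL (domain-specific language) format.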

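query = {'query': {'match_all': {}}}  # match all documents
query = {'query': {'term': {'name': 'jack'}}}  # all documents whose name is jack
query = {'query': {'range': {'age': {'gt': 11}}}}  # all documents with age greater than 11
alldoc = es.search(index='indexname', doc_type='typename', body=query)
print(alldoc['hits']['hits'][0])  # the first matching document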
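Since every search body is just a nested dict, small builder functions can keep the nesting in one place. This is a minimal sketch; the helper names `match_all`, `term` and `range_query` are made up for illustration and are not provided by the client.

```python
def match_all():
    """Body that matches every document."""
    return {"query": {"match_all": {}}}


def term(field, value):
    """Body that matches documents whose field equals value exactly."""
    return {"query": {"term": {field: value}}}


def range_query(field, **bounds):
    """Body for a range query; bounds are gt, gte, lt or lte."""
    allowed = {"gt", "gte", "lt", "lte"}
    unknown = set(bounds) - allowed
    if unknown:
        raise ValueError("unsupported range operators: %s" % sorted(unknown))
    return {"query": {"range": {field: bounds}}}


older_than_11 = range_query("age", gt=11)
named_jack = term("name", "jack")
```

A body built this way is passed unchanged, e.g. `es.search(index='indexname', doc_type='typename', body=older_than_11)`.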

Delete by query

delete_by_query: deletes all data that matches the conditions; the query must follow the DSL format.

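query = {'query': {'match': {'sex': 'female'}}}  # delete all documents whose sex is female (field name assumed)
query = {'query': {'range': {'age': {'lt': 11}}}}  # delete all documents with age less than 11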
es.delete_by_query(index='indexname', body=query, doc_type='typename')

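Update by query

update_by_query: updates all data that matches the conditions; it is written the same way as the delete and search calls above.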
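Unlike delete_by_query, an update_by_query body usually also needs a script that says what to change. The sketch below only builds such a body; `update_by_query_body` is a hypothetical helper, and the field names are illustrative. It assumes the script text goes under the `source` key, as in Elasticsearch 6.x; to my knowledge 5.5.x expects `inline` instead, so the key is left configurable.

```python
def update_by_query_body(query, script_source, params=None, script_key="source"):
    """Build an update_by_query body: which documents, and how to change them."""
    script = {script_key: script_source, "lang": "painless"}
    if params:
        script["params"] = params
    return {"query": query, "script": script}


# Set age to 11 on every document whose age is below 11 (illustrative fields)
body = update_by_query_body(
    {"range": {"age": {"lt": 11}}},
    "ctx._source.age = params.new_age",
    params={"new_age": 11},
)
# With a live cluster this would be sent as:
# es.update_by_query(index='indexname', doc_type='typename', body=body)
```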

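Bulk insert, delete and update

bulk: this method deserves a closer look. All the methods above are simple, but bulk cost me a good deal of time when I first used it. It performs multiple operations in a single request, which greatly reduces overhead for batch work. A single bulk request is also not limited to one kind of operation: it can mix inserts, deletes and updates.

Note, however, that each kind of operation has a fixed format, and a request only succeeds if it follows that format exactly. Without further ado, here is the code: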

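# Example 1, bulk insert: each document follows an action header (values are illustrative; the original literals were lost)
doc = [
    {'index': {}}, {'name': 'jack', 'age': 12},
    {'index': {}}, {'name': 'rose', 'age': 10},
]
# Example 2, insert, delete and update mixed in one request (ids are illustrative)
doc = [
    {'index': {'_id': '1'}}, {'name': 'jack', 'age': 12},
    {'delete': {'_id': '2'}},
    {'update': {'_id': '3'}}, {'doc': {'age': 13}},
]
es.bulk(index='indexname', doc_type='typename', body=doc)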

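As the two examples above show, when using bulk, every operation must be paired with the action header for its operation type (e.g. {'index': {}}, {'delete': {...}}, ...); otherwise the call fails with TransportError(400, u'illegal_argument_exception').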
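A missing header is easier to catch before the request is sent. The following is a rough sketch of such a check, plus a serializer showing the newline-delimited JSON that a bulk body becomes on the wire. `check_bulk_body` and `to_ndjson` are hypothetical helpers written for this article, and the check is deliberately simple: it only knows the four action names and whether each one expects a following line.

```python
import json

# Whether each action must be followed by a second line (source or update body)
NEEDS_PAYLOAD = {"index": True, "create": True, "update": True, "delete": False}


def check_bulk_body(lines):
    """Raise ValueError on a missing action header; return the action count."""
    count = 0
    i = 0
    while i < len(lines):
        header = lines[i]
        if not (isinstance(header, dict) and len(header) == 1):
            raise ValueError("line %d is not an action header: %r" % (i, header))
        action = list(header)[0]
        if action not in NEEDS_PAYLOAD:
            raise ValueError("line %d is not an action header: %r" % (i, header))
        i += 1
        if NEEDS_PAYLOAD[action]:
            if i >= len(lines):
                raise ValueError("action %r has no payload line" % (action,))
            i += 1  # skip the source document or update body
        count += 1
    return count


def to_ndjson(lines):
    """Serialize to the newline-delimited form sent to the bulk endpoint."""
    return "".join(json.dumps(line) + "\n" for line in lines)


good = [
    {"index": {"_id": "1"}}, {"name": "jack"},
    {"delete": {"_id": "2"}},
    {"update": {"_id": "3"}}, {"doc": {"age": 13}},
]
bad = [{"name": "jack"}, {"name": "rose"}]  # documents without headers
```

`check_bulk_body(good)` counts three actions, while `check_bulk_body(bad)` raises on the first line because a plain document sits where a header is expected.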

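In practice, this means you often have to assemble such a list of dicts by hand. Consider the following scenario:

To bulk insert a batch of data, as in the first example, there is an easy solution when the dataset already exists as a list: interleave the action headers and the documents at even and odd positions to build the required list quickly. A handy Python technique for this is slice assignment with [::2] and [1::2].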
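The even/odd interleaving mentioned above can be sketched as follows. The function name `interleave_with_header` and the default `{'index': {}}` header are assumptions made for this illustration; the slice assignments themselves are plain Python.

```python
import copy


def interleave_with_header(docs, header=None):
    """Return [header, doc0, header, doc1, ...] ready to pass as a bulk body."""
    if header is None:
        header = {"index": {}}
    body = [None] * (2 * len(docs))
    # even positions (0, 2, 4, ...) get one header copy per document
    body[::2] = [copy.deepcopy(header) for _ in docs]
    # odd positions (1, 3, 5, ...) get the documents themselves
    body[1::2] = docs
    return body


docs = [{"name": "jack", "age": 12}, {"name": "rose", "age": 10}]
bulk_body = interleave_with_header(docs)
```

Extended slice assignment requires both sides to have the same length, which is why the list is pre-sized to twice the number of documents.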

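Source 2:

Elasticsearch exposes a REST interface over HTTP, which looks quite powerful. However, some of our data platforms often analyze the data in Elasticsearch, especially in real time, and for that plain HTTP calls are of course not suitable.

Below is a demo of the HTTP approach:

Below is a search: /ceshi is the index, rui is the type, and the search looks for documents whose title is jones.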
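The request itself is not shown here, so the following is only a guess at its shape: a sketch that builds a URI-search URL for the index, type and title described above. The host `http://localhost:9200` and the helper name `search_url` are assumptions, not taken from the original command.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2.7


def search_url(index, doc_type, field, value, base="http://localhost:9200"):
    """Build a URI-search URL of the form /index/type/_search?q=field:value."""
    query = urlencode({"q": "%s:%s" % (field, value)})
    return "%s/%s/%s/_search?%s" % (base, index, doc_type, query)


url = search_url("ceshi", "rui", "title", "jones")
```

The resulting URL can be fetched with any HTTP client, for example `curl -X GET "<url>"`; the colon in `title:jones` is percent-encoded by `urlencode`.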

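Adding data

curl -X POST -d ''

I have heard that curl cannot be used directly after 1.x, but that is beside the point, so ignore it. Next is an example of using Elasticsearch from Python.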

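from datetime import datetime
from elasticsearch import Elasticsearch

# Connect to Elasticsearch; the default port is 9200
es = Elasticsearch()
# Create an index named my-index; ignore the 400 returned if it already exists.
# The index can be created now, or implicitly later when data is inserted.
es.indices.create(index='my-index', ignore=400)
# Insert data (two more inserts, used later, are omitted here).
# The body literals were lost; the illustrative fields follow the any=data query further down.
es.index(index="my-index", doc_type="test-type", id=1, body={"any": "data01", "timestamp": datetime.now()})
# {..., u'_version': 1, u'_index': u'my-index', u'_id': u'1'}
# The index can also be created at insert time, here test-index
es.index(index="test-index", doc_type="test-type", id=42, body={"any": "data", "timestamp": datetime.now()})
# Two ways to read data: get and search
# get
res = es.get(index="my-index", doc_type="test-type", id=1)
print(res)
# {..., u'_index': u'my-index', u'_version': 1, u'found': True, u'_id': u'1'}
print(res['_source'])
# search
res = es.search(index="my-index", body={"query": {"match_all": {}}})
print(res)
# {u'hits': {u'hits': [
#        {..., u'_index': u'my-index'},
#        {..., u'_index': u'my-index'},
#        {..., u'_index': u'my-index'}
#    ],
#    u'total': 5,
#    u'max_score': 1.0
#    },
# u'_shards': {...},
# u'took': 1,
# u'timed_out': False
# }
for hit in res['hits']['hits']:
    print(hit["_source"])
res = es.search(index="test-index", body={"query": {"match": {"any": "data"}}})  # all documents where any=data
print(res)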
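The loop over `res['hits']['hits']` works on an ordinary nested dict, so it can be tried without a cluster. In the sketch below, `sample` is a hand-written response shaped like the output shown in the comments above (its values are made up), and `sources` is a hypothetical helper that collects the `_source` of every hit.

```python
def sources(response):
    """Return the _source of each hit in a search response."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


# Hand-written stand-in for what es.search(...) returns on 5.x/6.x
sample = {
    "took": 1,
    "timed_out": False,
    "hits": {
        "total": 2,
        "max_score": 1.0,
        "hits": [
            {"_index": "my-index", "_id": "1", "_source": {"any": "data01"}},
            {"_index": "my-index", "_id": "2", "_source": {"any": "data02"}},
        ],
    },
}
docs = sources(sample)
```

Note that `hits.total` counts all matching documents, while `hits.hits` holds only the page that was returned, so the two numbers can differ.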

For how to set the parameters inside body, see:
