Problem Log 002

2021-10-24 02:01:36

While debugging a Spark job, I ran into the following error:

Log Length: 2195
Traceback (most recent call last):
TypeError: Can not infer schema for type:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "330925675.py", line 209, in run()
  File "330925675.py", line 199, in run
    .createDataFrame(input_data, ["doc_id", "doc_title", "doc_attribute","box_map","has_summary_img","catalog_map","content_img_num","index"])
TypeError: not supported type:

The Spark code is as follows:

import json

from pyspark import SparkConf
from pyspark.sql import SparkSession


# get_infobox_data, get_summary_img, parser_catalog and parser_content_img
# are helpers defined elsewhere in the script (not shown here).
def analysis_row2(row):
    box_map = {}
    attr_list, link_list = get_infobox_data(row.json_info_box)
    for attr in attr_list:
        box_map[attr.key] = attr.value
    # Infobox serialized as a JSON string
    box_map_str = json.dumps(box_map, ensure_ascii=False)

    # 0 = no summary image, 1 = has one, -1 = parsing error
    has_summary_img = 0
    summary_img = get_summary_img(row.json_summary)
    if len(summary_img) > 0:
        has_summary_img = 1
    if summary_img == "err":
        has_summary_img = -1

    catalog_map = parser_catalog(row.json_catalog)
    catalog_map_str = json.dumps(catalog_map, ensure_ascii=False)

    content_img_arr = parser_content_img(row.json_content)
    content_img_num = len(content_img_arr)

    # Note the row.index here
    return (row.doc_id, row.doc_title, row.doc_attribute, box_map_str,
            has_summary_img, catalog_map_str, content_img_num, row.index)


def run():
    spark = SparkSession \
        .builder \
        .enableHiveSupport() \
        .config("hive.exec.dynamic.partition", "true") \
        .config("hive.exec.dynamic.partition.mode", "nonstrict") \
        .getOrCreate()

    conf = SparkConf()
    date = str(conf.get("spark.biz.date"))
    task_type = "doc_detail_stat"

    df = spark.sql("""
        select
            id as doc_id,
            index,
            stat.doc_title,
            doc_attribute,
            json_catalog,
            json_info_box,
            json_summary,
            json_content
        from
            bk.xiaoxu
        join (
            select
                doc_id,
                json_catalog,
                doc_title,
                doc_attribute,
                json_info_box,
                json_summary,
                json_content
            from
                bk.mds_midas_latest_doc_stats
            where
                date = 20200828
        ) as stat on stat.doc_id = id
    """)

    input_data = df.rdd.map(lambda row: analysis_row2(row))

When I ran this Spark job during debugging, it failed with the error shown above.

Troubleshooting

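The culprit turned out to be the row.index in the return value of analysis_row2. My first guess was that index is a keyword that cannot be used as a column name. More precisely, a PySpark Row is a tuple subclass, so row.index resolves to the built-in tuple.index method rather than to the column value, and createDataFrame cannot infer a schema for a method object. After renaming the column to this_index, the job ran successfully.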
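To make the name clash concrete without needing a Spark cluster, here is a minimal sketch. MiniRow is a hypothetical, heavily simplified stand-in for pyspark.sql.Row, not the real class. It only assumes that Row is a tuple subclass that resolves column names through __getattr__, which Python calls only after normal attribute lookup has failed.

```python
class MiniRow(tuple):
    """Hypothetical, simplified stand-in for pyspark.sql.Row."""

    def __new__(cls, **fields):
        row = super().__new__(cls, fields.values())
        row._names = list(fields)  # column names, in insertion order
        return row

    def __getattr__(self, name):
        # Reached only when normal attribute lookup fails, so names that
        # already exist on tuple (index, count) never get here.
        try:
            return self[self._names.index(name)]
        except ValueError:
            raise AttributeError(name)

    def __getitem__(self, key):
        # Allow lookup by column name as well as by position.
        if isinstance(key, str):
            return super().__getitem__(self._names.index(key))
        return super().__getitem__(key)


row = MiniRow(doc_id=1, index=7)

print(row.doc_id)           # the column value
print(callable(row.index))  # True: this is the tuple.index method, not the column
print(row["index"])         # lookup by name reaches the column value

# Renaming the column, as in the fix above, avoids the clash entirely.
renamed = MiniRow(doc_id=1, this_index=7)
print(renamed.this_index)
```

In the job above, the rename could be done in the SQL with an alias such as index as this_index. With a real Row, reading the column as row["index"] should also avoid the clash, and the same caveat applies to a column named count.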
