What exactly is this part doing?
Source: 8-9 batch_flow_bucket(1)

StephenLee147
2019-01-09
all_data = list(zip(*data))
lengths = sorted(list(set([len(x[bucket_ind]) for x in all_data])))
if n_bucket > len(lengths):
    n_bucket = len(lengths)
splits = np.array(lengths)[
    (np.linspace(0, 1, 5, endpoint=False) * len(lengths)).astype(int)
].tolist()
splits += [np.inf]  # np.inf is positive infinity (a float); it is the upper bound of the last bucket
if debug:
    print(splits)
ind_data = {}
for x in all_data:
    l = len(x[bucket_ind])
    for ind, s in enumerate(splits[:-1]):
        if l >= s and l <= splits[ind + 1]:
            if ind not in ind_data:
                ind_data[ind] = []
            ind_data[ind].append(x)
            break
inds = sorted(list(ind_data.keys()))
ind_p = [len(ind_data[x]) / len(all_data) for x in inds]
if debug:
    print(np.sum(ind_p), ind_p)
1 answer
Mr_Ricky
2019-04-29
This is essentially partitioning the sentences: the code groups the samples into buckets according to their length.
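
To make the bucketing concrete, here is a minimal plain-Python sketch of what the snippet in the question appears to do. It is an illustration under assumptions, not the course's actual code: the function name `bucket_by_length` and the toy data are made up, `math.inf` stands in for `np.inf`, and the boundaries are picked with `n_bucket` rather than the hard-coded 5 in the question.

```python
import math

def bucket_by_length(data, bucket_ind=0, n_bucket=2):
    """Group samples into buckets by the length of field `bucket_ind`."""
    # data is a tuple of parallel lists, e.g. (questions, answers);
    # zip(*data) turns it into one tuple per sample.
    all_data = list(zip(*data))
    # Distinct lengths of the chosen field, in ascending order.
    lengths = sorted({len(x[bucket_ind]) for x in all_data})
    n_bucket = min(n_bucket, len(lengths))
    # Lower boundary of each bucket: n_bucket evenly spaced positions in `lengths`.
    # Assumption: the question's snippet hard-codes 5 here; n_bucket is used instead.
    splits = [lengths[int(i * len(lengths) / n_bucket)] for i in range(n_bucket)]
    splits.append(math.inf)  # open upper bound for the last bucket
    ind_data = {}
    for x in all_data:
        l = len(x[bucket_ind])
        for ind, s in enumerate(splits[:-1]):
            # The first matching range wins, so a length equal to a boundary
            # lands in the lower of the two neighbouring buckets.
            if s <= l <= splits[ind + 1]:
                ind_data.setdefault(ind, []).append(x)
                break
    inds = sorted(ind_data)
    # Share of all samples that each bucket holds.
    ind_p = [len(ind_data[i]) / len(all_data) for i in inds]
    return splits, ind_data, ind_p

# Toy data: four token lists of lengths 1 to 4, with made-up labels.
questions = [["a"], ["a", "b"], ["a", "b", "c"], ["a", "b", "c", "d"]]
answers = ["w", "x", "y", "z"]
splits, ind_data, ind_p = bucket_by_length((questions, answers), bucket_ind=0, n_bucket=2)
print(splits)
print({k: len(v) for k, v in ind_data.items()})
print(ind_p)
```

With this toy input the boundaries come out as 1 and 3, so the samples of length 1, 2 and 3 share the first bucket and the sample of length 4 sits alone in the second. `ind_p` is then the fraction of samples in each bucket, presumably so that later code can draw a bucket in proportion to its size; that later part is not shown in the question, so treat this as a guess.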