I have a dataframe of transactions. Each row represents a transaction of 2 items (think of a transaction of 2 event tickets or something). I want to duplicate each row based on the quantity sold.
Here's some example code:
    import pandas as pd

    # dictionary of transactions
    d = {'1': ['20', 'nyc', '2'], '2': ['30', 'nyc', '2'], '3': ['5', 'nyc', '2'],
         '4': ['300', 'la', '2'], '5': ['30', 'la', '2'], '6': ['100', 'la', '2']}
    columns = ['price', 'city', 'quantity']

    # create dataframe and rename columns
    df = pd.DataFrame.from_dict(data=d, orient='index')
    df.columns = columns
This produces a dataframe that looks like this:
    price  city  quantity
       20   nyc         2
       30   nyc         2
        5   nyc         2
      300    la         2
       30    la         2
      100    la         2
So in the case above, each row should be transformed into 2 duplicate rows. If the 'quantity' column were 3, that row would be transformed into 3 duplicate rows.
First, I recreated your data using integers instead of text. I also varied the quantity so that one can more easily understand the problem.
    d = {1: [20, 'nyc', 1], 2: [30, 'nyc', 2], 3: [5, 'sf', 3],
         4: [300, 'la', 1], 5: [30, 'la', 2], 6: [100, 'sf', 3]}
    columns = ['price', 'city', 'quantity']

    # create dataframe and rename columns
    df = pd.DataFrame.from_dict(data=d, orient='index').sort_index()
    df.columns = columns

    >>> df
       price city  quantity
    1     20  nyc         1
    2     30  nyc         2
    3      5   sf         3
    4    300   la         1
    5     30   la         2
    6    100   sf         3
I created a new dataframe using a nested list comprehension structure.
    # emit each row once per unit of its 'quantity' value
    df_new = pd.DataFrame([df.loc[idx]
                           for idx in df.index
                           for _ in range(df.loc[idx, 'quantity'])]).reset_index(drop=True)

    >>> df_new
        price city  quantity
    0      20  nyc         1
    1      30  nyc         2
    2      30  nyc         2
    3       5   sf         3
    4       5   sf         3
    5       5   sf         3
    6     300   la         1
    7      30   la         2
    8      30   la         2
    9     100   sf         3
    10    100   sf         3
    11    100   sf         3
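As a possible alternative to the list comprehension, here is a minimal sketch that repeats the index labels with `Index.repeat` and then selects them with `.loc`. This is not the approach used above, just another way to get the same rows. It assumes 'quantity' holds non-negative integers (not strings, as in the original question) and that the index labels are unique. The name `df_repeat` is my own, chosen for illustration.

```python
import pandas as pd

# Same sample data as in the answer above, built column-wise.
df = pd.DataFrame(
    {
        "price": [20, 30, 5, 300, 30, 100],
        "city": ["nyc", "nyc", "sf", "la", "la", "sf"],
        "quantity": [1, 2, 3, 1, 2, 3],
    },
    index=[1, 2, 3, 4, 5, 6],
)

# Repeat each index label as many times as that row's quantity ...
repeated_labels = df.index.repeat(df["quantity"])

# ... then select those labels, which duplicates the rows, and renumber.
df_repeat = df.loc[repeated_labels].reset_index(drop=True)

print(df_repeat)
```

This avoids building one Series per output row, so the numeric columns keep their integer dtype and it should scale better on large frames.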