
Automated & manual Artist disambiguation: How to drastically improve this site,…

 
    • maz35 said...
    • User
    • 30 Jul 2009, 01:23
    JLizard79 said:
    I see the exact same problem on spotify, but that I can take since it's so new, whereas last.fm has been around since 2004 if I'm not mistaken.

    I'd say it's an easier problem to deal with when you are new, as you won't have years of past data to worry about. Should be easier for spotify anyway since they are just a streaming service.

    • [Deleted user] said...
    • User
    • 30 Jul 2009, 06:37
    All they need to do with the years of past data is to give end-users the ability to move tracks from the disambiguation page to the new unique artist page. Newly-disambiguated bands can enlist their fans to do the work.

    As I said before, all these problems are challenges to be overcome, not excuses as to why it can't be done.

    • Russ said...
    • Alumni
    • 30 Jul 2009, 11:19
    We've never said it's impossible - we've had a plan on how to do this for a few months now. But we're still a small team and this is a pretty massive change, and as yet we haven't had time to work on it.

    It will happen, but I can't give you any indication as to when.

    • [Deleted user] said...
    • User
    • 31 Jul 2009, 23:22
    Glad to hear you're addressing this. Last.fm matters a lot for exposure for independent artists who are locked out of mainstream radio, and these are the bands most affected by the lack of disambiguation.

    I apologise a bit for the angry tone of my earlier post - I was angry at the fan of the Italian Panic Room for completely ignoring the style guidelines on the Wiki, and pushing the established Panic Room (who I know personally) "below the fold" in favour of the new band.

  • Russ said:
    We've never said it's impossible - we've had a plan on how to do this for a few months now. But we're still a small team and this is a pretty massive change, and as yet we haven't had time to work on it.

    It will happen, but I can't give you any indication as to when.


    yaaay I got staff input. Russ, what do you think about my ideas? I think the concepts I presented are the path of least resistance to get this done, your thoughts?

    • fmera said...
    • User
    • 3 Aug 2009, 09:18

    Re:How to drastically improve this site: Follow the discogs.com model

    iTranscendence said:
    Mainly, artists having to share pages, I mean are you serious? If this site is supposed to support the artist how is that possible when you are mashing all the ones with the same names together.

    ...There seems to be animosity on band pages where they have to share with other bands between their fans, and with good cause.

    ...Another thing I have noticed, is that sometimes the album covers uploaded by the labels are of a pathetically low quality.

    ...the one thing that last.fm is missing from the equation, fully wielding the power of the listener base, the fans of the artists themselves to help improve your own website.

    all solid points!

    U.G.L.Y. - changing the face of music, one artist at a time.
    there are some things pngs can't fix. for everything else, there's pngoptimizer.
  • Thanks fmera, I just want to see this site live up to the potential it has, it's already positioned itself to become the dominant force in this niche area of the internet, they just need to harness the power of the user more, like they do in the wiki model.

    There is an old adage, "Many hands make light work."

  • I just had a thought about how you could solve the problem of tracks that are mislabeled. If the scrobbler found songs with key words, instead of an exact match it could link the stats, and redirect all others to that page for that version.

  • Bump again; this is pertinent to the other thread currently being discussed on the issue.

    • [Deleted user] said...
    • User
    • 13 Aug 2009, 08:31
    The artist suffixes proposed here would look terrible imho. Too complicated and messy. For that reason alone it's a horrible idea.

  • higginst said:
    The artist suffixes proposed here would look terrible imho. Too complicated and messy. For that reason alone it's a horrible idea.



    Are you saying that because you are under the impression that every person would have to label their libraries as such? Because that's not how it would work. The suffix would only be internal to the database, and visible only if you looked at the web address of the page for that specific artist.

    The scrobbler would determine whether it was "Artist" or "Artist (1)" you are listening to, based on the song and album keywords coming through your scrobbler, and would direct you and the scrobble data to the corresponding page accordingly.
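
A minimal sketch of the keyword-routing idea above, assuming each disambiguated page already has a list of known albums and tracks to match against. The names `ArtistPage` and `route_scrobble`, the scoring weights, and the page keys are all hypothetical; nothing here reflects how Last.fm actually works.

```python
from dataclasses import dataclass, field


@dataclass
class ArtistPage:
    """One disambiguated page, e.g. "Artist" or "Artist (1)" (hypothetical model)."""
    page_id: str
    known_albums: set[str] = field(default_factory=set)
    known_tracks: set[str] = field(default_factory=set)


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so tag variants compare equal."""
    return " ".join(text.lower().split())


def route_scrobble(artist_pages, track, album=None):
    """Pick the page whose known catalogue best matches the incoming tags.

    Returns the winning page_id, or None when nothing matches or two
    pages tie (those scrobbles would stay on a shared page).
    """
    track_key = normalize(track)
    album_key = normalize(album) if album else None
    scores = {}
    for page in artist_pages:
        albums = {normalize(a) for a in page.known_albums}
        tracks = {normalize(t) for t in page.known_tracks}
        score = 0
        if album_key and album_key in albums:
            score += 2  # assumed weight: an album match is stronger evidence
        if track_key in tracks:
            score += 1
        scores[page.page_id] = score
    best = max(scores.values(), default=0)
    winners = [pid for pid, s in scores.items() if s == best]
    if best == 0 or len(winners) != 1:
        return None
    return winners[0]
```

A scrobble whose tags match no page, or match two pages equally, is left unrouted here, which is where a manual "move tracks" feature like the one suggested earlier in the thread would come in.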

    • Bloopy said...
    • Forum Moderator
    • 16 Aug 2009, 03:28
    Discogs URLs are a little bit ugly. It would be better if it looked like this:

    last.fm/music/Pitch+Black/1
    last.fm/music/Pitch+Black/2

    • [Deleted user] said...
    • User
    • 16 Aug 2009, 03:53
    Bloopy said:
    Discogs URLs are a little bit ugly. It would be better if it looked like this:

    last.fm/music/Pitch+Black/1
    last.fm/music/Pitch+Black/2

    That would decrease usability imho.

    • [Deleted user] said...
    • User
    • 16 Aug 2009, 05:11
    I agree with Bloopy. How would such a system decrease usability?

  • higginst said:
    I agree with Bloopy. How would such a system decrease usability?


    I'm going to have to third that; you need to describe how it would decrease usability vs. this model.

    last.fm/music/Aes+Dana
    last.fm/music/Aes+Dana+(1)

  • gwalla said:
    Disambiguation pages work great for Wikipedia...but Wikipedia is not dependent on ID3 tags for its information, and all of Wikipedia's information is meant to be used directly by humans interacting with the site. Last.FM, on the other hand, collects most of its information automatically (through passive scrobbling) and uses it to fuel computer algorithms, without any direct user input.

    Many, probably most, tracks in the Last.FM database were not submitted directly by a label or artist. All Last.FM knows about them is that somebody played a song with one string of characters in the artist tag, another string of characters in the title tag, and maybe another string of characters in the album tag.


    Here's how it should be done.

    Artist.YearOfFirstAlbum

    Or maybe...

    Artist.YearOfFirstAlbum.MonthOfFirstAlbum

    The Discogs suffix is just an infoless arbitrary number. The year gives you something to work with. If I see the year I can work with that and figure out which band I want.

    "Are you not entertained?" Then take a listen to my thousands of lovingly loved Loved tracks and love the loving, Lovely:
    http://www.last.fm/listen/user/mINDsELFiNDULGE/loved
    The Bestest: Gonna Get Got, Veronica Lipgloss, King Hell, DearestAzazel, Tsunami Bomb, Mindless Self Indulgence, Moloko, The White Stripes, Muse, Von Iva, The Mentalists!, Escape The Fate
  • iTranscendence said...User input is what has made wiki what it is today.

    OMG!, Yes it has and half the posts are wrong as well, while also creating 100's of copyright issues.

    As a journalist, we are explicitly instructed, to avoid Wikipedia at all costs. Those writers, who have used Wikipedia in the past, as a sole source and reference for information, have been also terminated from employment.

    • dankine said...
    • User
    • 19 Aug 2009, 12:56
    sigh, a journalist who hasn't quite mastered apostrophes or commas it seems.

    on subject, if there were a way to classify by fingerprints that would seem to be a good way to go, but isn't very simple at all. the numbers are arbitrary and i don't think they would really 'fit' with lfm.

    "Those who can make you believe absurdities can make you commit atrocities"
    "I don't want to believe, I want to know"

    Auto Corrections Group
  • CrybKeeper said:
    iTranscendence said...User input is what has made wiki what it is today.

    OMG!, Yes it has and half the posts are wrong as well, while also creating 100's of copyright issues.

    As a journalist, we are explicitly instructed, to avoid Wikipedia at all costs. Those writers, who have used Wikipedia in the past, as a sole source and reference for information, have been also terminated from employment.



    The funny thing about this whole comment is, you can go to the citations on wiki for the subject you are researching and then just cite the source material yourself.

    BTW, where were you like half a decade ago.

    http://news.bbc.co.uk/2/hi/technology/4530930.stm

    Wikipedia survives research test
    The free online resource Wikipedia is about as accurate on science as the Encyclopedia Britannica, a study shows.

    They also found it was more accurate on fast changing subjects like technology.

  • Wikipedia has ended a couple careers, due to Hoaxers. One in particular, just recently.

    As for journalism- I am an amateur, self absorbed, semi-cynic, who procrastinates learning better grammatics, in favor of live participation.
    I am constantly berated for over using punctuations. lol.
    I have a copy editor that fixes my crap, I am published here and there, freelance. I also have a 3rd place award in short story writing, so who cares? Apparently, not myself. And, 3rd place sucks.

    EXAMPLE(below) was not published, so I still have the rights:

    Gannett Owns Our Newspaper, But For How Long?
    3/2/2009 10:33 AM EST - chillicothegazette.com

    Excerpts from a recent Letter To The Shareholders of Gannett, by Company Chairman and President, CRAIG A. DUBOW, show signs of profit losses in the billions of dollars, while profits from the transition to digital media offerings are sadly lacking in comparison.

    "Overall, operating revenues were $6.8 billion. Net income from continuing operations
    would have been $747 million were it not for impairment and other charges in the second
    and fourth quarters totaling $8.4 billion pre-tax ($7.4 billion after-tax). These
    charges caused us to report a net loss from continuing operations of $6.65 billion or
    $29.11 per share."

    "With its acquisition, Ripple6 joined CareerBuilder, ShopLocal, PointRoll, Planet
    Discover and Schedule Star (which operates HighSchoolSports.net) in our new Digital
    segment. Pro forma revenues for this segment were $689 million in 2008. When added
    with revenues from Web sites associated with our Publishing and Broadcast operations,
    our pro forma online revenues surpassed $1 billion."


    The majority of the Letter touted broader access to more localized markets for national advertising customers, which shows Gannett's profit focus is still applied to their advertisers primarily.

    MORE....

    "Driving a national advertising sales strategy is a project called SalesOne, which allows
    us to approach national advertisers in a unified, organized way from a print, digital and
    broadcast perspective and to reduce redundancy of effort both on our part and the
    advertisers’."

    "In 2008, we aggressively continued our transformation into a company that is innovative,
    nimble and intently focused on the customer. We welcomed the explosive growth
    in people’s need for instant, accurate and credible information as a true opportunity
    and continued to adjust quickly and creatively to the changing nature of our industry.
    After all, providing news and information is what we do best."

    "In a move that strengthened our core operations, Bob Dickey was appointed president
    of the U.S. Community Publishing division in late February, 2008.
    He dove in quickly by overhauling the operation with a reorganization of regions, a
    consolidation of operations and printing facilities, and a ramping up of the regional
    toning centers for photo production."


    The above paragraph, would explain the loss of 50 jobs in Lancaster, Ohio, one year later.

  • That's their own fault for not being able to smell a troll.

    • gwalla said...
    • User
    • 24 Aug 2009, 06:46
    mINDsELFiNDULGE said:
    gwalla said:
    Disambiguation pages work great for Wikipedia...but Wikipedia is not dependent on ID3 tags for its information, and all of Wikipedia's information is meant to be used directly by humans interacting with the site. Last.FM, on the other hand, collects most of its information automatically (through passive scrobbling) and uses it to fuel computer algorithms, without any direct user input.

    Many, probably most, tracks in the Last.FM database were not submitted directly by a label or artist. All Last.FM knows about them is that somebody played a song with one string of characters in the artist tag, another string of characters in the title tag, and maybe another string of characters in the album tag.


    Here's how it should be done.

    Artist.YearOfFirstAlbum

    Or maybe...

    Artist.YearOfFirstAlbum.MonthOfFirstAlbum

    The Discogs suffix is just an infoless arbitrary number. The year gives you something to work with. If I see the year I can work with that and figure out which band I want.
    You missed the point entirely. How do they get "year of first album" from a title, artist name, and optional album name? It's simply not part of the data the scrobbler sends (nor is it data the scrobbler could send, since it's not stored in the tags for mp3s or in CDDB data for CDs).

    • Sinriel said...
    • User
    • 25 Aug 2009, 15:23
    Bloopy said:
    Discogs URLs are a little bit ugly. It would be better if it looked like this:

    last.fm/music/Pitch+Black/1
    last.fm/music/Pitch+Black/2


    Can't believe someone is objecting because URLs are "ugly". Get a grip. It's all about functionality and data being as correct as possible.

    Terminus Est.
    • Bloopy said...
    • Forum Moderator
    • 25 Aug 2009, 21:24
    The reason I posted that is because someone objected to the Discogs suffixes. I was basically agreeing and just showing that they can be more simple than that.

  • Following the Discogs, MusicBrainz, cddb, or any other model is not the solution to LFM's data problems. It might be a piece of the puzzle, but it's definitely not the solution.

    LFM has the quintessential data cleansing problem... there is none bigger, IMO. Not only do they not have a true system of record, they also have multiple vendors (labels & artists) supplying potentially overlapping data (multiple labels per artist, multiple releases of the same work across different labels, multiple releases of the same work across multiple countries, multiple artists with the same name, etc.), several possible data lookup integration points (Discogs, MB, etc.), every man, woman, and child on the face of the earth a potential data provider, and the requirement to implement licensing restrictions per track regardless of data source.

    It's a complex and interesting problem that requires a great deal of analysis in order to yield an appropriate, working design. IMO, the very first (and key) thing that I'd look at doing (assuming data is captured in a DB) is abstracting the access identification system for artists, albums, and tracks. You do so by implementing identifier types (LFM, Label Defined, User Defined, MusicBrainz, Discogs, etc.), modeling/implementing the notion of external identifiers, populating/integrating baseline external ID data, then mapping those external IDs/types to internal IDs identifying the actual data. Do so and you can browse data by any number of methods (Discogs, LFM, User Defined, etc.). More important, you can filter what's viewed on a user-by-user basis.

    For instance, a user might want to browse and see only data that maps up to Discogs plus his/her scrobbles that don't match to anything (i.e., self-scrobbled, erroneous ID3 tags). If you've abstracted the access identification system, you can do so. A user can choose to view only baseline "clean" data filtering out all un-cleansed user scrobbles, a Label/Artist can choose to view only data mapping up to their own identification system, LFM can compare their baseline dataset to Discogs, MB, or any other external source... there's a great deal of flexibility provided, an automatic implicit map created between id types, and a system that provides the means to maintain a data system of record within a sea of garbage data and also allows "cleanse over time" and explicit user control over their own scrobbles.
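
To make the identifier abstraction described above concrete, here is a rough sketch, assuming an in-memory store rather than a real database. The class name `IdentifierRegistry`, its methods, and the identifier type strings are hypothetical illustrations, not anything Last.fm, Discogs, or MusicBrainz actually expose.

```python
from collections import defaultdict


class IdentifierRegistry:
    """Maps (identifier type, external ID) pairs onto internal IDs.

    Hypothetical sketch: identifier types are plain strings such as
    "lfm", "discogs", "musicbrainz", or "user".
    """

    def __init__(self):
        # (id_type, external_id) -> internal_id
        self._to_internal = {}
        # internal_id -> {id_type: external_id}
        self._by_internal = defaultdict(dict)

    def link(self, id_type, external_id, internal_id):
        """Record that an external identifier points at an internal record."""
        self._to_internal[(id_type, external_id)] = internal_id
        self._by_internal[internal_id][id_type] = external_id

    def resolve(self, id_type, external_id):
        """Return the internal ID for an external identifier, or None."""
        return self._to_internal.get((id_type, external_id))

    def translate(self, id_type, external_id, target_type):
        """Map one external ID to another type via the shared internal ID."""
        internal_id = self.resolve(id_type, external_id)
        if internal_id is None:
            return None
        return self._by_internal[internal_id].get(target_type)

    def visible(self, internal_ids, allowed_types):
        """Keep only records that map to at least one allowed identifier type."""
        result = []
        for internal_id in internal_ids:
            linked = self._by_internal.get(internal_id, {})
            if any(id_type in linked for id_type in allowed_types):
                result.append(internal_id)
        return result
```

With a structure like this, the "implicit map between id types" falls out of `translate`, and the per-user filtering (for example, show only records that map up to Discogs) is a call to `visible` with the user's chosen identifier types.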

    Yet, that's still only a small part of the overall solution. You need processes in place for maintaining the baseline dataset (data associated with LFM external IDs), specifically, propagating non-cleansed data to cleansed status (probably via appointed user community data managers, mods, etc.), processes allowing users to correct their own data scrobbles without impacting charts (not that difficult), and processes for abstracting and enumerating ambiguous artists (again, not that difficult once you've abstracted the identification system and provided users the ability to correct their own scrobbles). Devil's always in the details ;-)
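
One possible reading of "correct their own scrobbles without impacting charts" is to store each user's fixes as an overlay and leave the raw scrobbles untouched. The sketch below assumes that reading; the function names and the (artist, track) tuple layout are hypothetical.

```python
from collections import Counter


def apply_user_corrections(scrobbles, corrections):
    """Return a corrected copy of a user's scrobbles.

    scrobbles:   list of (artist, track) tuples as originally submitted
    corrections: dict mapping a raw (artist, track) to its corrected form
    The raw list is never modified, so anything computed from it is unaffected.
    """
    return [corrections.get(scrobble, scrobble) for scrobble in scrobbles]


def artist_chart(scrobbles):
    """Count plays per artist name for a list of (artist, track) tuples."""
    return Counter(artist for artist, _track in scrobbles)
```

Charts built from the raw list stay as they were, while the user's own library view can be built from the corrected copy.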

    Russ already mentioned that they have a plan in place to attack the problem. I'd be willing to bet it's probably the single most thought about architectural issue they have... and, it's really not an issue they could attack until "after" they built up a baseline dataset given the nature of their primary data source (scrobbles). It will be interesting to watch the change once it takes place.
